DeepSeek has launched DeepSeek-OCR 2, a 3B parameter model that advances image, document understanding, and OCR to state-of-the-art levels. The key innovation is DeepEncoder V2, which mimics human reading by first forming a global understanding and then determining the logical reading order.

This approach improves handling of complex document layouts, tables, and structured text, outperforming Gemini 3 Pro on several benchmarks and delivering over 4% improvement compared to the previous version. DeepSeek-OCR 2 can be easily run and fine-tuned via Unsloth.

For more details, visit DeepSeek-OCR 2 Guide and model repository.

Source: DeepSeek via ai_machinelearning_big_data