Tensorlake Updates: Barcode detection, Gemini 3 OCR, and DOCX Tracked Changes

Engineers building production document workflows: we shipped some game changers.

Hello AI Engineers! 👋

We're excited to share some powerful new capabilities: Gemini 3 is now available as an OCR model for complex visual reasoning, we've solved the tracked changes problem for legal teams working with DOCX files, and barcodes are now automatically detected and decoded in your document pipelines. Plus, we've added support for new file formats and improved processing speed.

If you haven't tried Tensorlake yet, sign up for free and get in touch to discuss your document processing use case.

🎯 The Highlights

  • Gemini 3 Now Available for Document Parsing: Google's latest model is now integrated as an OCR engine in Tensorlake. Gemini 3 excels at complex visual reasoning: counting symbols on blueprints, understanding semi-wireless tables, and correlating chart legends to data. Use it when standard OCR falls short on non-trivial layouts. Tensorlake handles rate limits and page chunking automatically. Read the blog →

  • DOCX Tracked Changes with Full Spatial Metadata: Legal tech teams, this one's for you. Parse Word documents while preserving insertions, deletions, comments, and bounding boxes in a single API call. Build contract intelligence systems that know both what changed and exactly where. No more choosing between revision history and spatial precision. Read the blog →

  • Barcode Detection & Decoding: Barcodes are everywhere: shipping labels, lab reports, insurance docs. Tensorlake now automatically detects and decodes them as part of standard parsing. Get barcode type, decoded value, and bounding boxes alongside your text and tables with no extra tooling. Read the changelog → | Try the notebook →

🔧 Product Updates

  • New File Format Support: Added parsing support for .ppt (legacy binary PowerPoint) and .rtf (Rich Text Format) files. Broader pipeline compatibility with fewer preprocessing steps.

  • Embedded Images in Output Markdown: Images embedded in your source documents are now preserved in the output markdown. Better context preservation for documents with diagrams, signatures, and visual elements.

  • Faster On-Prem Processing: Improved Model03 speed for on-prem deployments. Same accuracy, higher throughput.

💡 When to Use What

Quick guide on our new OCR options:

Use Case                                                               | Recommended Model
-----------------------------------------------------------------------|------------------------------------
Complex visual reasoning (blueprints, charts, symbols)                 | Gemini 3
Business documents, layout-aware parsing with accurate bounding boxes  | Model03
Tracked changes with spatial data                                      | Standard DOCX parsing
Barcode extraction                                                     | Model03 with barcode_detection=true
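
If you route documents programmatically, the guide above can be captured as a small lookup. This is a minimal sketch, not the Tensorlake SDK: `ParseOptions`, the use-case keys, and the model identifier strings (`"gemini3"`, `"model03"`) are hypothetical names chosen for illustration. Only the `barcode_detection` flag name comes from the guide, so check the API reference for the real parameter names before wiring this into a pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParseOptions:
    """Hypothetical container for per-document parse settings (not an SDK type)."""
    ocr_model: Optional[str]          # None = no specific OCR model recommended
    barcode_detection: bool = False   # flag name taken from the guide


# One entry per row of the guide. Keys and model strings are assumptions.
ROUTES = {
    "visual_reasoning": ParseOptions(ocr_model="gemini3"),
    "business_document": ParseOptions(ocr_model="model03"),
    # The guide recommends standard DOCX parsing here, not a specific OCR model.
    "tracked_changes": ParseOptions(ocr_model=None),
    "barcode_extraction": ParseOptions(ocr_model="model03", barcode_detection=True),
}


def options_for(use_case: str) -> ParseOptions:
    """Return the recommended settings for a use case, or raise on unknown keys."""
    try:
        return ROUTES[use_case]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(options_for("barcode_extraction"))
```

Keeping the mapping in one place means a change in recommendation only touches the `ROUTES` table, and unknown use cases fail loudly instead of silently falling back to a default model.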

Try These Updates

Thank you for being part of the Tensorlake community. We're here to help you build document pipelines that actually work in production.

Cheers,
The Tensorlake Team