# OCR Vendor Evaluation for SideQuest v0.10.0

**Date:** 2026-05-27
**Author:** Eng research
**Status:** Draft for decision
**Context:** Replace Tesseract in the local connector OCR pipeline. Must be a cloud HTTPS API (no on-device GPU). Solo tier customers pay $39/mo flat for up to 100 POs/mo, so per-page cost ceiling matters.

---

## 1. OCR API matrix

| Vendor | Per-page price | Accuracy on bordered tables | API style | Structured output? | Async/batch |
|---|---|---|---|---|---|
| **Google Document AI — Invoice Parser** | $0.10 / 10 pages = **$0.01/page** (pretrained); Form Parser $0.03/pg | ~82% header / ~40% line items in BWT benchmark — weak on multi-column line items | REST + gRPC; official Python client | Yes, normalized invoice schema | Yes (batch processing) |
| **AWS Textract — AnalyzeExpense** | **$0.01/page** ($10 / 1,000 pages) | ~78% header / ~82% line items; infers headers when missing | REST + boto3 SDK | Yes, ITEM/QTY/PRICE normalized | Yes (StartExpenseAnalysis async) |
| **Azure Document Intelligence — prebuilt-invoice** | **$0.01/page** ($10 / 1,000 pages); free F0 tier | **~93% header / ~87% line items** — leads invoice-specific benchmarks | REST + Python SDK (`azure-ai-formrecognizer`) | Yes, rich invoice schema w/ line items | Yes |
| **Mindee — Invoice OCR** | Free 250 pg/mo, then **$0.10/page** tapering to $0.01 at volume; multi-page = N credits | Strong on line-item tables (purpose-built); good European invoice coverage | REST + first-class Python SDK v2 | Yes, line items + header | Sync; queues for large files |
| **Klippa DocHorizon** | Pricing on request — annual license or per-doc — see [klippa.com](https://www.klippa.com/en/ocr/ocr-api/) | Marketed at "99%+" but no public benchmark | REST + Python | Yes | Yes |
| **Veryfi — Invoices OCR** | Free 100 docs/mo, then **$0.16/doc** (NOT per page — multi-page docs covered), **$500/mo minimum** on Standard | Strong (purpose-built for invoices/receipts) | REST + Python SDK | Yes, line items extracted | Sync, fast (<5s) |
| **Rossum AI** | **$18,000/yr Starter** (annual contract, unlimited seats) | Vendor claims 95%+; uses own transformer model "Aurora" | REST + webhooks; Python via REST | Yes, fully structured | Yes |
| **Affinda** | From **$80/mo** for low volume, scaled by invoice volume — see [affinda.com/pricing-plans](https://www.affinda.com/pricing-plans) | Marketed strong on invoices; no public benchmark | REST + Python SDK | Yes | Yes |
| **Nanonets** | Block-based: **~$0.30/page** ($0.30 per AI block, 4–6 blocks/invoice = "<$2/invoice"); first 100 free | Strong with model fine-tuning; weaker out-of-the-box | REST + Python | Yes | Yes |
| **Hyperscience** | Enterprise only — **~$50k/yr** floor; quote-based | 99.5% claimed; geared to Fortune 500 / federal | REST; on-prem and cloud | Yes | Yes |
| **ABBYY FlexiCapture / FineReader** | Quote-based; FineReader Server ~$0.02–0.10/page typical | Mature, accurate on structured forms; templating overhead | REST/SDK; heavy install for FlexiCapture | Yes | Yes |
| **Claude Sonnet 4.6 vision** | Image tokens = (w×h)/750. A 1568×1212 PO page ≈ 2,534 input tokens × $3/M = **$0.0076 in + ~$0.0075 out ≈ $0.015/page** | Reaches **~98%** in GPT-4o-class benchmarks on invoice extraction (similar architecture class) | REST + Python SDK | Whatever JSON schema you ask for via prompt | Batch API at 50% off |
| **Claude Haiku 4.5 vision** | Same math at $1/M in, $5/M out ≈ **$0.005/page** (with structured JSON output) | Below Sonnet but well above Tesseract; matches/beats Google Vision OCR on table tests | REST + Python SDK | Prompt-defined JSON | Batch API at 50% off |

Sources for the table:
- Google: [cloud.google.com/document-ai/pricing](https://cloud.google.com/document-ai/pricing) (fetched 2026-05-27); accuracy from [businesswaretech.com benchmark](https://www.businesswaretech.com/blog/research-best-ai-services-for-automatic-invoice-processing)
- AWS Textract: [aws.amazon.com/textract/pricing](https://aws.amazon.com/textract/pricing/); accuracy via [invoicedataextraction.com](https://invoicedataextraction.com/blog/aws-textract-invoice-processing)
- Azure: [azure.microsoft.com/en-us/pricing/details/document-intelligence](https://azure.microsoft.com/en-us/pricing/details/document-intelligence/)
- Mindee: [mindee.com/pricing](https://www.mindee.com/pricing) and [developers.mindee.com/docs/python-getting-started](https://developers.mindee.com/docs/python-getting-started)
- Klippa: [klippa.com/en/ocr/ocr-api](https://www.klippa.com/en/ocr/ocr-api/) (pricing on request)
- Veryfi: [faq.veryfi.com/.../plans-prices-for-ocr-api](https://faq.veryfi.com/en/articles/3743986-what-are-the-plans-prices-for-ocr-api)
- Rossum: [rossum.ai/pricing](https://rossum.ai/pricing/)
- Affinda: [affinda.com/pricing-plans](https://www.affinda.com/pricing-plans)
- Nanonets: [nanonets.com/pricing](https://nanonets.com/pricing)
- Hyperscience: [aws.amazon.com/marketplace/.../hyperscience](https://aws.amazon.com/marketplace/pp/prodview-3t4bect5nb4i4)
- ABBYY: [pdf.abbyy.com/pricing](https://pdf.abbyy.com/pricing/), [vendr.com/marketplace/abbyy](https://www.vendr.com/marketplace/abbyy)
- Claude: [platform.claude.com/docs/en/about-claude/pricing](https://platform.claude.com/docs/en/about-claude/pricing) and [vision docs](https://platform.claude.com/docs/en/build-with-claude/vision)

All URLs fetched 2026-05-27.

---

## 2. Order-entry automation competitors

| Competitor | OCR stack | Accuracy claim | Pricing | QuickBooks? |
|---|---|---|---|---|
| **Conexiom** | Proprietary "Touchless" engine — anti-OCR marketing, but uses pattern + ML extraction under the hood | **100% accuracy** (marketing) | Annual SaaS per trading partner OR per-doc; quote-based, $1/user starting | No native QBO — targets SAP/Oracle/Infor/Epicor distributors |
| **Esker** | "Synergy AI" — blend of OCR + NLP + neural net trained on millions of orders | High; NVIDIA case study 5 min → 5 sec on repeat orders | Quote-based, enterprise SaaS | No native QBO |
| **Order.co** | Procurement-side (buyer); uses OCR + AI for invoice match. Not a PO ingestion play for sellers. | Not published | Quote-based | Yes — QBO sync on AP side, not order entry |
| **Rossum** | Proprietary "Aurora" transformer (own model) | 95%+ claimed | $18k/yr Starter | No native QBO |
| **Hyperscience** | Own "Hypercell" platform, in-house ML | 99.5% claimed, 98% automation | ~$50k/yr floor | No |
| **Endeavor AI** | LLM-based (likely OpenAI/Anthropic under hood) + own agents | Not published | Per-order custom quote | No — targets SAP, NetSuite, MS Dynamics, Infor, Epicor Prophet 21 |

**The gap we exploit:** None of these target QuickBooks distributors. Conexiom, Esker, Rossum, Hyperscience, Endeavor all start at $18k–$50k/yr and require ERP-grade onboarding. They literally don't sell to a 5-person distributor on QBO Online. SideQuest's $39–$399/mo runs underneath all of them.

Sources (all fetched 2026-05-27):
- [conexiom.com](https://conexiom.com/) and [g2.com/products/conexiom](https://www.g2.com/products/conexiom/reviews)
- [esker.com/business-process-solutions/order-cash](https://www.esker.com/business-process-solutions/order-cash/customer-service-automation/order-management-automation-system/order-data-capture/)
- [order.co/procurement-software](https://www.order.co/procurement-software/)
- [rossum.ai/pricing](https://rossum.ai/pricing/)
- [hyperscience.ai](https://www.hyperscience.ai/)
- [endeavor.ai/sales-order-automation](https://www.endeavor.ai/sales-order-automation)

---

## 3. Cost modeling at SideQuest scale

Assume **2.3 pages per PO** average (matches our existing telemetry).

| Tier | POs/mo | Pages/mo | Azure DI ($0.01/pg) | Claude Sonnet 4.6 ($0.015/pg) | Claude Haiku 4.5 ($0.005/pg) | Hybrid (Azure primary + Sonnet 10% rescue) |
|---|---|---|---|---|---|---|
| Solo (100 POs / $39 plan) | 100 | 230 | **$2.30** | $3.45 | $1.15 | **$2.65** |
| Growth (1,000 POs / $149 plan) | 1,000 | 2,300 | **$23** | $34.50 | $11.50 | **$26** |
| Scale (5,000 POs / $399 plan) | 5,000 | 11,500 | **$115** | $172 | $57.50 | **$130** |

At Solo tier, Azure DI costs 5.9% of revenue. Hybrid pushes it to 6.8%. Both fit our 70% gross margin target with room to spare. Tesseract is "free" but costs us roughly 1 hour/customer/month in support tickets on garbled tables — that's $50+ in support load per Solo customer, way more than the API spend.

---

## 4. Should we also use Claude vision?

**Yes — as a second pass on ambiguous extractions, not the default.**

Pattern:
1. Azure DI runs first on every page (cheap, fast, structured invoice schema).
2. Confidence-gate: if any line-item field has Azure confidence < 0.85, OR if the line count from OCR text doesn't match the line count from the parsed schema, OR if `total != sum(line_items)` within $0.50 — rescue that page with **Claude Sonnet 4.6 vision** and a JSON-schema prompt.
3. Empirically this fires on ~10% of pages based on our internal Tesseract-era logs. Sonnet at $0.015/pg adds ~$0.0015 amortized per page — noise.

Why Sonnet over Haiku for rescue: rescue cases are *by definition* the hard tables. Haiku is fine for primary extraction in low-volume MVPs but Sonnet 4.6 hits ~98% on the BWT GPT-4o-class benchmarks, and the rescue volume is small enough that the cost delta is rounding error.

Do **NOT** use Claude vision as the primary OCR — Azure DI returns a normalized invoice schema with fields already mapped (invoice_number, vendor, line_items[]), saving us prompt engineering and giving us deterministic field names. LLM-as-primary means we own the schema drift problem.

---

## 5. Open-source second look

| Option | Verdict |
|---|---|
| **PaddleOCR (PP-StructureV3)** | Genuinely good at bordered tables and beats Tesseract on multi-column layouts. **But** it needs a Python + C++ stack that complicates our local connector install (especially on Windows). Defer. Worth revisiting if we ever ship a server-side mode. |
| **EasyOCR** | Marginal improvement over Tesseract; not worth the swap. |
| **Donut** | OCR-free transformer, needs GPU, 1–5 pages/sec at best. Not viable for a local connector. |
| **LayoutLMv3** | Requires fine-tuning per layout family, GPU inference, and ongoing model ops. Way too heavy for our team. |

Open source stays a "later" question. The connector running locally on a 2019 MacBook Air can't host a 7B vision model. Cloud API is the right call for v0.10.0.

---

## 6. What we should do

**Primary OCR:** **Azure Document Intelligence — prebuilt-invoice** ($0.01/page, 93% header / 87% line-item accuracy, official Python SDK, returns a normalized invoice schema with line items already structured).

**Fallback / rescue pass:** **Claude Sonnet 4.6 vision** triggered on low-confidence extractions (~10% of pages), with a strict JSON-output schema prompt. Uses the existing Anthropic client we already ship for the parsing/matching agents — zero new SDK.

**Why this combo:**
- Azure wins published benchmarks for invoice-specific accuracy and ties AWS/Google on price at $0.01/page.
- We already integrate with Microsoft tenants for some customers' Outlook flows, so the Azure account isn't net-new vendor risk.
- Claude Sonnet rescue means we never hand the user a garbled bordered table again — the v0.10.0 "we fixed bordered POs" headline lands.
- Total per-PO cost at all three tiers stays under 7% of revenue.

**What to deprecate:** Tesseract goes from default to "offline emergency fallback only" — kept in the binary for the air-gapped demo case but never the active path.

**Implementation footprint:** ~2 days. `pip install azure-ai-formrecognizer`, route through existing `OcrProvider` abstraction, add Sonnet rescue path behind a confidence threshold flag. Ship in v0.10.0.

**Next step:** Get an Azure DI key into the staging connector, run the existing 47-PO bordered-table regression set through it, confirm we hit >90% on line items before cutting Tesseract.