Reading handwritten POs without losing your mind
The handwritten PO is the test case nobody wants to talk about. Every distributor has three customers who fax in POs. Or print them, write quantities in pen, scan them, and email the PDF. Or photograph them with a phone, sideways, with a coffee mug in the frame.
These POs don't have machine-readable text. The "PDF" is an image embedded in a PDF wrapper. OCR on most of them returns garbage. For years the answer was "make Marcia type them." Now the answer is "make Marcia validate Claude's read of them."
SideQuest uses Claude's vision model directly on the image, with no OCR layer in between. That matters because OCR is brittle in ways that vision models aren't. OCR sees "0" and "O" as different glyphs and gets confused at low resolution. OCR breaks on tables when the cells aren't pixel-aligned. OCR can't read handwriting. The vision model handles all of these because it reads the image the way a person does: pattern-recognizing characters in context.
The flow
Here's the actual path for a scanned PO that hit our pipeline this morning:
- Email arrives in the
purchase-ordersGmail label with a PNG attachment. - Parser tries to extract text from the attachment with pdfplumber, then PyPDF, then Tika.
- All three return empty, because the attachment is a raster image.
- Parser sets
is_likely_handwritten_or_scanned: trueand passes the raw image to Claude. - Claude reads the image, returns a structured list of line items, customer name, PO number, ship date.
- Structured output goes into the standard matcher pipeline (exact SKU → cross-ref → description).
The PO from this morning was from Datamoto. Letterhead with their address. Vendor block. Two line items: DM19012 Rollerblade × 10 @ $123 and DM78123 Gas Can Sunglasses × 5 @ $90. Sub Total $1,680. Claude read every field correctly on the first pass.
Where it still trips
Two patterns we watch for.
Sideways or rotated images. A customer takes a phone photo of a PO lying flat on a desk, and the image is landscape when the PO is portrait. Claude handles this fine. But a customer who scans a stack of POs in a feeder and the feeder skews one of them: the skewed PO sometimes reads with column drift, where the qty for line 2 appears under line 1. We mitigate by always returning confidence per line and requiring rep review when confidence drops below threshold.
Mixed printed and handwritten. A printed PO with hand-corrected quantities. Qty: 50 with the 50 crossed out and 75 written above. Vision models handle this less reliably than fully-printed or fully-handwritten POs, because they have to choose which value to trust. Our heuristic: if both values appear, use the handwritten one (which is the customer's correction) and flag the line for review.
The cost question
Running every PO through a vision model isn't free. We don't do it unless the cheaper text-extraction pipeline returns empty. That's why the order matters: try the cheap path first, only escalate to vision when the cheap path fails. Across our customers, about 18% of POs hit the vision path. The other 82% are machine-readable PDFs or HTML emails that parse for fractions of a cent.
What this changes for a distributor: the "I'll just retype it" tax on scanned POs goes away. Marcia no longer has a queue of 8 scanned POs she's been avoiding because they're annoying to enter. The 8 POs get drafted, surface in her review queue with confidence scores, and she handles them in the same flow as everything else.
The bar isn't "no human ever touches a scanned PO again." The bar is "the human reviews structured output, doesn't transcribe raw images." Smaller difference. Bigger impact.