What we've shipped
A running log of SideQuest releases. Connector version bumps, new docs, site updates, reliability fixes. We update this every time something noteworthy ships.
Self-sent email filter + description-only catalog suggestions
- The first failure. A user ran the v0.15.5 one-shot morning routine and the classifier tagged the SideQuest welcome email I'd shipped earlier the same night as a customer PO at 0.7 confidence. Body contained the words "order", "steps", "first PO" — enough for the classifier to bite. The email was sent from the user's own Gmail address to themselves.
- The first fix. auto_label_unprocessed now pre-filters self-sent and SideQuest system mail before the classifier runs. New GmailClient.get_authenticated_email() caches the authenticated address via users().getProfile(). Any email where the from-address contains the user's own email OR comes from a known system domain (sidequestautomation.com, sidequest-control-plane.fly.dev) skips classification entirely and lands in skipped with reason pre_classify_skip:self_sent or :sidequest_system_mail.
- The second gap. When a PO line arrived with a description but no part number, the matcher's description-only path returned a confident match at ≥0.80 token-set-ratio, and otherwise discarded the candidates and marked the line UNMATCHED. Reps reviewing those lines started from zero with no suggestion to validate. Customers regularly write "stainless ball valve 1/2 inch" without ever including a part number, so this hit often.
- The second fix. When a PN-less line has a description, the matcher now returns the closest catalog item as a suggestion regardless of confidence. Every description-only match, high or low, is now flagged needs_review=True and never auto-submits. Reasoning: "stainless ball valve 1/2 inch" can match two different SKUs with near-identical scores, and auto-submitting the wrong one ships the wrong product. Reviewer eyes are required whenever the match came from description alone. The top-5 candidates surface in candidates so the rep can pick a different one. Empty descriptions and single-word descriptions still return UNMATCHED: no hallucinated suggestions for genuinely ambiguous rows.
- Total test count: 285 (was 274). 11 new regression tests: 5 for the self-email filter (welcome-email reproducer, system-domain skip, legit external PO still labels, getProfile cache, error swallowing) and 6 for the description-only suggestion path (strong match, weak suggestion, sibling candidates, empty desc stays unmatched, single-word stays unmatched, PN path still wins).
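For readers who want the shape of the self-sent filter, here is a minimal sketch of a pre-classify check. It is an illustration under assumptions, not the connector's code: the function name pre_classify_skip_reason is hypothetical, and only the two system domains and the two skip reasons come from the notes above.

```python
from email.utils import parseaddr
from typing import Optional

# Domains named in the release note; mail from these counts as SideQuest system mail.
SYSTEM_DOMAINS = ("sidequestautomation.com", "sidequest-control-plane.fly.dev")


def pre_classify_skip_reason(from_header: str, own_email: str) -> Optional[str]:
    """Return a skip reason, or None if the email should reach the classifier.

    Hypothetical helper: the real filter lives inside auto_label_unprocessed.
    """
    _, address = parseaddr(from_header)  # strips any display name
    address = address.lower()
    if own_email and own_email.lower() in address:
        return "pre_classify_skip:self_sent"
    domain = address.rpartition("@")[2]
    if domain in SYSTEM_DOMAINS:
        return "pre_classify_skip:sidequest_system_mail"
    return None
```

A legitimate external PO returns None and is classified as before; only the two skip cases bypass the classifier.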
process_overnight_queue now does the labeling preflight in the same call
- The friction. v0.15.0 split the morning routine into two tool calls: auto_label_unprocessed first (to sweep unread mail and tag the POs your Gmail filter missed), then process_overnight_queue (to parse and draft them). Reps kept missing the first call and getting empty queues, then thinking the connector was broken when it was actually just disciplined about which mail it would touch.
- The fix. Pass with_auto_label=True to process_overnight_queue and the queue runs the labeling pass as preflight in the same call. The label gets created in Gmail if it doesn't exist (closes v0.15.3's chicken-and-egg dead end). Any unread PO/quote emails over 0.7 classifier confidence get labeled. Then the queue picks them up immediately, parses, drafts, auto-submits the clean ones. One command, no chain.
- The preflight result surfaces in the response. A new preflight field carries {scanned, applied_count, label_ensured, target_label} so the rep can see "scanned 47 emails, labeled 8 new POs, then processed the queue." If the preflight errors (Gmail quota, transient network), the queue still runs and the preflight error appears as a soft note.
- Default stays off. with_auto_label=False by default so existing callers behave exactly as in v0.15.4. The label_not_found error message also now recommends the one-shot path instead of the two-step chain.
- Total test count: 274 (was 269). 5 new regression tests pin the default-off behavior, the runs-and-labels happy path, preflight-failure isolation, day-one label creation, and the updated suggested_next_call.
PDF-extracted POs with single-space columns now parse deterministically instead of falling to LLM rescue
- The bug. When pdfplumber returns extracted text with single-space column separators (narrow PDFs, OCR'd PDFs, and ~40% of real-world POs we'd seen in the wild), heuristic_lines_from_text's column-position parser couldn't find a header row and returned an empty list. The pipeline fell through to the Sonnet vision rescue, which works but costs API tokens, lowers the deterministic-parsing confidence subscore, and pushes "auto-submit clean drafts" out of reach for those POs.
- The fix. v0.15.4 adds a structural-signature fallback. When the column-position parser returns nothing, we scan line-by-line for the row signature: a part-number-shaped token followed by a $-prefixed amount. If a line has both in that order, we extract qty / customer_part / description / unit_price directly. Prose with embedded PNs ("Need 50 of VALVE-1001-A by Friday") doesn't match the signature and stays rejected.
- Caught by the playground. A user dropped a 10-line industrial PO on /try.html and saw qty values of 1, 50, 0, 0, 0, 0, 0, 0, 0, 0 instead of 25, 4, 40, 15, 6, 3, 30, 2, 20, 50. Same bug class as production. /try.html and the production parser are now aligned on the same signature check.
- Total test count: 269 (was 263). 6 new regression tests pinning the 10-line PO from the playground, prose anti-cases, the multi-space passthrough, the no-dollar-sign rejection, decimal quantities, and dedup of repeated parts.
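The row signature can be sketched as a single regular expression. This is an illustration only: the exact pattern the parser uses isn't published here, so the shape of a "part-number-shaped token" (hyphenated uppercase alphanumerics), the requirement of a leading quantity, and the names ROW_SIGNATURE and parse_row are all assumptions.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed signature: leading qty, hyphenated part token, free-text description,
# then a $-prefixed amount. Single spaces between columns are enough.
ROW_SIGNATURE = re.compile(
    r"^\s*(?P<qty>\d+(?:\.\d+)?)\s+"
    r"(?P<part>[A-Z0-9]+(?:-[A-Z0-9]+)+)\s+"
    r"(?P<desc>.*?)\s*"
    r"\$(?P<price>\d[\d,]*(?:\.\d+)?)"
)


def parse_row(line: str) -> Optional[dict]:
    """Extract one PO line, or return None when the signature is absent."""
    m = ROW_SIGNATURE.match(line)
    if m is None:
        return None
    return {
        "qty": Decimal(m.group("qty")),
        "customer_part": m.group("part"),
        "description": m.group("desc").strip(),
        "unit_price": Decimal(m.group("price").replace(",", "")),
    }
```

Under these assumptions, "25 VALVE-1001-A Stainless ball valve 1/2 inch $12.50" parses into all four fields, while "Need 50 of VALVE-1001-A by Friday" is rejected because it has neither a leading quantity nor a $-prefixed amount.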
auto_label_unprocessed pre-creates the label so process_overnight_queue isn't stuck on first run
- The chicken-and-egg problem. Day one for a new install: process_overnight_queue refuses to run when the configured label doesn't exist in Gmail (v0.15.1's correct safety behavior). The label only ever got born when apply_label fired on a matched email. But a brand-new mailbox often has no current PO sitting in unread, so nothing crosses the 0.7 confidence gate, nothing gets applied, the label never exists, and the queue can never run. Caught live this morning when the auto-label pass returned zero matches and the next queue call hit "label_not_found" (correctly, but uselessly).
- The fix. v0.15.3 adds GmailClient.ensure_label_exists(label), an idempotent create-or-resolve. auto_label_unprocessed calls it at the top before scanning. After a zero-match run, the label still exists in Gmail, the rep can drop POs into it manually, and process_overnight_queue runs against a real (empty) label instead of refusing. Response now includes "label_ensured": true so callers can confirm the bootstrap happened.
- Safety net preserved. If ensure_label_exists errors (rare: Gmail quota or transient network), the scan keeps going. apply_label's internal create_if_missing still runs per-message inside the loop. The pre-create is an upgrade, not a single point of failure.
- Total test count: 263 (was 259). 4 new regression tests pinning the pre-create behavior, the failure-tolerance fallback, the method's existence on GmailClient, and the label_ensured field in the response.
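The create-or-resolve idea is easy to show against an in-memory stand-in. The sketch below does not call Gmail: InMemoryLabels and auto_label_preflight are hypothetical names used only to illustrate idempotency and the failure-tolerant preflight described above.

```python
from typing import Dict, Optional


class InMemoryLabels:
    """Hypothetical stand-in for the Gmail label store (no network calls)."""

    def __init__(self) -> None:
        self.by_name: Dict[str, str] = {}
        self.create_calls = 0

    def lookup(self, name: str) -> Optional[str]:
        return self.by_name.get(name)

    def create(self, name: str) -> str:
        self.create_calls += 1
        label_id = f"Label_{self.create_calls}"
        self.by_name[name] = label_id
        return label_id


def ensure_label_exists(labels, name: str) -> str:
    """Idempotent create-or-resolve: return the existing id, else create one."""
    existing = labels.lookup(name)
    return existing if existing is not None else labels.create(name)


def auto_label_preflight(labels, name: str) -> dict:
    """Pre-create the label, but never let a failure stop the scan."""
    try:
        ensure_label_exists(labels, name)
        return {"label_ensured": True}
    except Exception:
        # Assumed behavior: report the miss and let the per-message path retry.
        return {"label_ensured": False}
```

Calling ensure_label_exists twice with the same name creates the label once and returns the same id both times, which is what makes it safe to run at the top of every scan.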
auto_label_unprocessed actually works now
- Fixed: auto_label_unprocessed ImportError. v0.15.0 imported classify_intent from .auto_ack, but the function lives in .quotes. Calling the tool blew up with "cannot import name classify_intent from .auto_ack" and "Symptom G" in the diagnose playbook. Fixed the import. Added a regression test that asserts auto_ack does NOT expose classify_intent so a future rename can't reintroduce this.
- Fixed: intent vocabulary mismatch. The classifier returns "order" / "quote" / "ambiguous", but process_overnight_queue and auto_label_unprocessed both checked for "purchase_order" / "quote_request". Effect: the auto_clean_orders / auto_clean_quotes lists were always empty even on real data, and auto_label never labeled anything because the intent gate never opened. v0.15.2 uses the right vocabulary across the board. Pinned with a test that calls classify_intent and asserts the return value is in the expected set, so a future rename can't break this silently.
- Fixed: GmailClient.apply_label didn't exist. auto_label_unprocessed called gmail.apply_label(message_id, label) in v0.15.0/v0.15.1, but the method was never implemented; it would have crashed with AttributeError if the import path had ever resolved. v0.15.2 ships apply_label as a real method on GmailClient. Creates the label in Gmail if it doesn't exist. Does NOT mark as read (preserves the rep's unread queue).
- Confidence gating on auto-label. The label only gets applied when the classifier returns order or quote at confidence ≥ 0.7. Ambiguous and low-confidence emails stay in the inbox so the rep can see them without us mis-labeling. Each labeled email gets the intent + confidence in the response.
- Total test count: 259 (was 253). 6 new regression tests pinning the import path, the vocabulary, the GmailClient method, and the confidence gate.
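The gate itself is small enough to show in full. This is a sketch of the rule as described above (intent must be order or quote, confidence at least 0.7); should_label and split_by_gate are illustrative names, not the connector's internals.

```python
from typing import List, Tuple

LABEL_INTENTS = {"order", "quote"}  # the classifier's real vocabulary
MIN_CONFIDENCE = 0.7


def should_label(intent: str, confidence: float) -> bool:
    """True only for order/quote at or above the confidence threshold."""
    return intent in LABEL_INTENTS and confidence >= MIN_CONFIDENCE


def split_by_gate(classified: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Partition classified emails into (labeled, left_in_inbox)."""
    labeled, left_in_inbox = [], []
    for msg in classified:
        target = labeled if should_label(msg["intent"], msg["confidence"]) else left_in_inbox
        target.append(msg)
    return labeled, left_in_inbox
```

Note that the old vocabulary ("purchase_order", "quote_request") never passes this check, which is exactly why nothing was labeled before v0.15.2.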
Two fixes to make the morning workflow self-explanatory when something's misconfigured
- process_overnight_queue refuses to process when the configured label doesn't exist. v0.15.0 would silently fall back to no-label-filter when Gmail couldn't find the label, then try to parse 50 random inbox emails (newsletters, marketing, bank notifications) and report 50 "failed_to_parse" results. v0.15.1 short-circuits with error: label_not_found the moment resolved_label is null, returns zero touched messages, and includes a clear next-step message: "Run auto_label_unprocessed(label='X') first, or create the label in Gmail manually." Includes a suggested_next_call field with the exact tool call to fix it.
- Response splits drafts by intent. Quote requests and purchase orders both become QB Estimates, but reps often want to handle them differently (different reply tone, different urgency). The response now includes by_intent counts plus auto_clean_orders / auto_clean_quotes / needs_review_orders / needs_review_quotes lists. Each draft brief carries an intent field for downstream filtering. "12 orders ready, 4 quote requests ready" is now a one-line answer.
- Total test count: 253 (was 249). 4 new tests in tests/test_v0_15_1.py covering the missing-label short-circuit, the intent split, and the intent default.
Process 50 POs in a single chat turn. Morning triage is one tool call.
- New tool: process_overnight_queue(label, max_pos=50). Pulls every unread PO from your Gmail label, parses each one, matches lines against the QuickBooks catalog, and builds a local draft Estimate per PO, all in a single server-side loop. Returns one summary grouped by auto_clean, needs_review (with specific reasons per draft), and failed_to_parse (with the message_id + reason so you can investigate). Per-PO errors are isolated, so one bad image-only PDF can't derail the batch. Designed for "rep logs in, processes the overnight queue in one shot."
- New tool: bulk_submit_clean(draft_ids, confirm=True). Submits many drafts to QuickBooks in one MCP call with per-draft error isolation. Pass the auto_clean list from process_overnight_queue. Set dry_run=True to preview every QB payload + computed total without sending; handy for "show me what I'm about to push" review. Requires confirm=True for live so a typo can't accidentally batch-submit 50 estimates.
- New tool: report_review_queue(). Lists every draft sitting in draft status grouped by the specific reason a human needs to look at it (customer_not_in_qb, unmatched_sku, po_price_below_catalog, etc.). Plus the clean list ready for bulk submit. Designed for morning triage: "what's blocking" answered in one call.
- New tool: auto_label_unprocessed(label, max_check=50). Scans recent unread inbox mail without a label filter, classifies each via the existing intent classifier, and applies your PO label to anything that looks like a customer PO. Use it when a PO landed in the inbox that your Gmail rule missed; run this once, then process_overnight_queue picks them up.
- Total test count: 249 (was 234). 15 new tests covering classification, per-PO error isolation, the confirm/dry_run gate, the v0.14.6 label-fallback passthrough, and the review-queue grouping.
Typical morning workflow: auto_label_unprocessed() (catch any inbox stragglers) → process_overnight_queue() (parse + match + draft) → bulk_submit_clean(auto_clean_ids, dry_run=True) (preview) → bulk_submit_clean(auto_clean_ids, confirm=True) (live submit) → handle the needs_review queue one draft at a time with the existing single-PO tools.
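The confirm/dry_run gate and per-draft error isolation in bulk_submit_clean can be sketched as follows. This is a simplified illustration: the submit_one callback, the confirm_required error name, and the response shape are assumptions, not the tool's actual contract.

```python
from typing import Callable, Iterable


def bulk_submit_clean(
    draft_ids: Iterable[str],
    submit_one: Callable[[str, bool], dict],  # hypothetical per-draft submitter
    confirm: bool = False,
    dry_run: bool = False,
) -> dict:
    """Submit many drafts; one failure never aborts the rest of the batch."""
    # A live run needs explicit confirmation; a dry run writes nothing, so it doesn't.
    if not dry_run and not confirm:
        return {"error": "confirm_required", "submitted": [], "failed": []}

    submitted, failed = [], []
    for draft_id in draft_ids:
        try:
            submitted.append({"draft_id": draft_id, "result": submit_one(draft_id, dry_run)})
        except Exception as exc:  # isolate the bad draft, keep going
            failed.append({"draft_id": draft_id, "reason": str(exc)})
    return {"dry_run": dry_run, "submitted": submitted, "failed": failed}
```

With this shape, a preview pass (dry_run=True) followed by a live pass (confirm=True) mirrors the workflow above, and a single failing draft shows up in failed without blocking the others.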
Preview the QB payload before sending — catches double-discount-style bugs in tests, not production
- New dry_run=True on submit_estimate_to_qb. Returns the exact QB Estimate payload that would have been sent plus the computed total, without touching QB or marking the draft submitted. The v0.14.8 double-discount regression would have been caught in unit tests if dry_run had existed; now any future payload-shape change can be validated against expected QB JSON before a real customer's books are touched. confirm=True is NOT required for dry_run since nothing writes.
- QBClient.build_estimate_payload refactored to a static method. Pure function, no QB connection needed. Used by both the live create_estimate path and the new dry_run path so they receive identical inputs. Tests can assert on the exact JSON shape (Amount, UnitPrice, DiscountLineDetail PercentBased flag, ShippingAmount handling).
- New QBClient.compute_estimate_total static method. Replicates QB's TotalAmt math for the dry_run output. Sums SalesItemLineDetail Amounts, applies any DiscountLineDetail line, returns the rounded result. Matches QB to the penny modulo banker's rounding edge cases.
- 13 new tests in tests/test_v0_14_10.py covering payload shape, total computation (including the v0.14.9 verification scenario at $657.50), the dry_run tool flow end-to-end, and the freight-unconfigured / not-found error paths.
- Total test count: 234 (was 221). Full suite green.
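As a rough picture of the total computation, the sketch below sums the sales lines and then applies a discount line. It is an approximation under assumptions: the payload field names follow the QuickBooks Online Estimate shape as we understand it, half-up rounding is a simplification, and the real compute_estimate_total may differ in edge cases.

```python
from decimal import Decimal, ROUND_HALF_UP


def compute_estimate_total(payload: dict) -> Decimal:
    """Sum SalesItemLineDetail amounts, then apply any DiscountLineDetail line."""
    lines = payload.get("Line", [])
    subtotal = sum(
        (Decimal(str(l["Amount"])) for l in lines if l.get("DetailType") == "SalesItemLineDetail"),
        Decimal("0"),
    )
    total = subtotal
    for line in lines:
        if line.get("DetailType") != "DiscountLineDetail":
            continue
        detail = line.get("DiscountLineDetail", {})
        if detail.get("PercentBased"):
            total -= subtotal * Decimal(str(detail["DiscountPercent"])) / Decimal("100")
        else:
            total -= Decimal(str(line["Amount"]))
    # Assumption: round half-up to cents; QB's own rounding may differ at the edges.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Because freight rides as a regular sales line, it is included in the subtotal here without any special handling.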
Per-line discount no longer double-applied on submit
- QB 6070 on submit when a draft had a per-line discount. v0.14.8 added DiscountRate to the SalesItemLineDetail payload, but the caller (submit_estimate_to_qb) was still passing unit_price already net of the discount via _effective_unit_price(). QB recomputed the expected Amount and rejected the request with "Amount is not equal to UnitPrice * Qty. Supplied value:298.89". v0.14.9 drops the DiscountRate from the payload: the unit price ships as the effective per-unit price, the Amount is simply Qty × UnitPrice, and QB accepts cleanly.
- Doc-level discount and freight are still proper QB fields. v0.14.8's primary fixes remain: DiscountLineDetail for doc discount and a regular line against SIDEQUEST_FREIGHT_ITEM_ID for freight. The hotfix only touches per-line discount serialization.
- Caveat documented: per-line discount visibility in QB is now a cosmetic loss (QB shows the discounted unit price, not the original price + percent). A future release will re-add DiscountRate consistently (sending the gross unit price + percent) once _effective_unit_price() is wired to skip discount when the caller wants the percent forwarded.
- Total test count: 221 (was 218). 3 new tests in tests/test_v0_14_9.py covering the Amount round-trip, doc-discount preservation, and freight-unconfigured preservation.
Doc discount + freight now ride as real QB fields, price-update validation
- Critical fix: submit_estimate_to_qb was silently dropping document discount and freight. The v0.14.7 implementation stuffed both values into the "Memo on statement" field as a text string instead of sending them as real QB fields. Result: the connector reported one Estimate total to the operator and QuickBooks recorded a different one. We caught this in a Datamoto test draft (connector said $682.50, QB stored $692.10), a silent $9.60 overcharge per Estimate. v0.14.8 sends document discount as a proper DiscountLineDetail line (percent-based or amount-based, mutually exclusive), and sends freight as a regular line against the distributor's "Shipping" / "Freight" service item.
- New config: SIDEQUEST_FREIGHT_ITEM_ID. QBO has no top-level shipping-amount field on Estimate; freight rides as a line against a distributor-owned freight item. Set this env var to the qb_id of your "Shipping" / "Freight" service item. If a draft has freight > 0 and this is unset, the connector now refuses to submit and returns {"error":"freight_unconfigured"} with setup instructions, rather than silently dropping the freight as v0.14.7 did.
- update_qb_item_price now validates input. v0.14.7 accepted any string, including "-5", and pushed it straight to QB. A CSV-import typo would silently corrupt the catalog. v0.14.8 rejects negative prices with {"error":"negative_price"} and rejects non-numerics with {"error":"invalid_price"}. Both the tools wrapper and the lower-level QBClient method guard against negatives.
- Fixed "0" silently becoming None on price round-trip. The catalog model's price-deserialization used a falsy check (if getattr(item, "UnitPrice", None)), which treated Decimal("0") as missing. Setting an item price to 0 produced a phantom "new_price":"None" in the response and cleared the QB UnitPrice. v0.14.8 uses explicit is not None. To actually clear a price, pass clear=True to update_qb_item_price.
- Total test count: 218 (was 210). 8 new tests in tests/test_v0_14_8.py covering mutually-exclusive discount validation, freight-config-missing path, negative-price rejection at both layers, garbage-string rejection, and the 0-vs-None round-trip.
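The 0-vs-None bug and the input validation are both easy to reproduce in isolation. The sketch below contrasts the falsy check with the explicit one and shows a validator that returns the two error shapes named above; the function names are illustrative, not the connector's.

```python
from decimal import Decimal, InvalidOperation
from types import SimpleNamespace


def price_falsy(item):
    """Buggy form: Decimal("0") is falsy, so a zero price reads as missing."""
    raw = getattr(item, "UnitPrice", None)
    return Decimal(str(raw)) if raw else None


def price_explicit(item):
    """Fixed form: only a genuinely absent value becomes None."""
    raw = getattr(item, "UnitPrice", None)
    return Decimal(str(raw)) if raw is not None else None


def validate_price(raw) -> dict:
    """Reject non-numeric and negative input before anything reaches QB."""
    try:
        price = Decimal(raw)
    except (InvalidOperation, TypeError, ValueError):
        return {"error": "invalid_price"}
    if not price.is_finite():  # "NaN" and "Infinity" parse but are not prices
        return {"error": "invalid_price"}
    if price < 0:
        return {"error": "negative_price"}
    return {"price": price}


zero_item = SimpleNamespace(UnitPrice=Decimal("0"))
print(price_falsy(zero_item), price_explicit(zero_item))
```

The final line prints None for the falsy check and 0 for the explicit one, which is the whole bug in one comparison.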
Smarter AR greetings, customer echo in match_po_lines, list_reports regression cover
- AR greetings use a common-first-names list. v0.14.5/6 stopped numeric and possessive-'s residuals but left "Hi Red," for Red Rock Diner and "Hi Kookies," for Kookies by Kathy. v0.14.7 checks the first token against a ~300-word list of common first names. If the token isn't a recognizable name (Red, Kookies, Datamoto, Acmecorp), we fall back to "Hi there,". "Hi Alice," and "Hi Jeff," still work for real names.
- match_po_lines now echoes the customer when called with customer_id. v0.14.6 added the echo logic but called a non-existent QBClient.get_customer, so the try/except silently returned None. v0.14.7 ships the actual get_customer(qb_id) method against Customer.get from the SDK. Callers passing just customer_id now get back the resolved display name + email + company.
- Regression cover for the v0.14.6 list_reports dedupe. v0.14.6 filtered match_quality_by_customer and list_learned_rules out of the report registry (they're MCP tools, not reports). v0.14.7 ships an explicit regression test so a future refactor can't reintroduce the duplicate surfaces.
- Total test count: 210 (was 198). 12 new tests in tests/test_v0_14_7.py covering name-list resolution + the QBClient method shape.
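The first-names check reduces to a set lookup on the first token. The sketch below uses a tiny illustrative set in place of the ~300-word list, and greeting_for is a hypothetical name; the possessive and numeric handling from earlier releases is left out to keep the example focused.

```python
# Tiny illustrative subset; the shipped list is described as ~300 common first names.
COMMON_FIRST_NAMES = {"alice", "jeff", "amy", "kathy"}


def greeting_for(customer_name: str) -> str:
    """Greet by first name only when the first token is a recognizable name."""
    tokens = customer_name.strip().split()
    if tokens and tokens[0].lower() in COMMON_FIRST_NAMES:
        return f"Hi {tokens[0]},"
    return "Hi there,"


for name in ("Alice Cooper", "Red Rock Diner", "Kookies by Kathy"):
    print(name, "->", greeting_for(name))
```

Only "Alice Cooper" gets a personal greeting here; the two brand names fall back to "Hi there," because their first tokens are not in the list.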
Honesty fixes in tool responses + clearer error messages
- list_incoming_pos no longer lies about the label. When the requested Gmail label doesn't exist, the response previously echoed it back as if it had worked while silently returning unlabeled mail. Now the response includes requested_label, resolved_label (null if the label was missing), and a fallback_reason explaining what happened and how to fix it (create the label in Gmail, apply it to your PO emails, retry).
- Discarded-draft error messages no longer reference a tool that doesn't exist. remove_draft_line, update_draft_line, add_draft_line, and set_draft_doc_discount previously said "Restore it via update_draft_status before editing." That tool was never shipped. The message now says "Discarded drafts cannot be edited; they're kept for audit only. Create a new draft via propose_estimate to make changes."
- list_reports no longer duplicates surfaces. match_quality_by_customer and list_learned_rules were listed in the report registry AND were their own MCP tools, so Claude had two ways to call the same data. The registry now lists them under also_available_as_tools rather than as report rows, and tells callers to invoke them directly.
- AR greeting handles apostrophes and single-token brands. "Hi Amy's,", "Hi Jeff's,", "Hi Red," and "Hi Kookies," all now fall back to "Hi there,". The possessive 's (and curly ’s) is stripped first; single-token customer names, usually brand names rather than person names, fall back to the generic greeting. "Alice Cooper" still produces "Hi Alice,".
- match_po_lines echoes the resolved customer when called with customer_id. Previously, passing customer_id directly (without customer_name) returned customer: null in the response even though the ID resolved fine. The tool now calls get_customer to fetch the record so the caller has confirmation.
- Pricing reconciled across the site. Calculator page said Solo $39/$468yr and Free 20 POs/mo; homepage JSON-LD said Solo $39 with 100 POs. Both now match pricing.html (the source of truth): Solo $29/$290yr, Free 25 POs/mo, 150 POs in Solo. No more conflicting numbers between pages.
- Total test count: 198 (was 184). 14 new tests in tests/test_v0_14_6.py covering all five code fixes, full suite green.
Two P0s that touch real money, five correctness P1s, two quality P2s, a P3 — and the missing Gmail OAuth module
- P0: Empty SKU no longer silently fuzzy-matches. A blank cell in a PO previously matched to whatever item shared the most tokens with the description ("Sunglasses" → "Gas Can Sunglasses" at 0.855). That's a wrong-product-shipped risk. Combined SKU+description fuzzy matching now requires a SKU; description-only matches still work via the stricter Stage 3 path.
- P0: Mutually exclusive discount params now enforced. update_draft_line and set_draft_doc_discount previously accepted both discount_pct and discount_amount and silently used one. Operator thought they applied 10%, got $5 flat instead. Passing both now returns error: ambiguous_discount with a clear message.
- P1: report_top_items and report_top_customers exclude discarded drafts. Test/throwaway drafts were inflating sales rollups (one SKU jumped 10→18 units across a series of test discards). Reports now reflect what actually went out the door.
- P1: report_top_items revenue now applies line-level discounts. Previously gross qty × price; now subtracts discount_pct / discount_amount per line.
- P1: report_top_customers dedupes across pre/post QB-creation events. Same pattern as match_quality_by_customer in v0.14.2: collapse a buyer who was processed before and after they existed in QB into one row, with the canonical qb_id surfaced.
- P1: auto_submit_if_clean distinguishes not_found from disabled. Previously returned "disabled" for any draft_id including typos and non-existent IDs, blocking dry-run validation. Now returns not_found first for missing drafts.
- P1: Mutating calls error on discarded drafts. update_draft_line, add_draft_line, remove_draft_line, and set_draft_doc_discount previously silent no-op'd on discarded drafts. They now return error: draft_discarded with a clear message.
- P2: AR email greetings handle numeric/short/generic company names. Old behavior produced "Hi 0969," (numeric address), "Hi 55," (street number), "Hi Inc," (generic word), "Hi A," (initial). New _greeting_token falls back to "Hi there," for all of these.
- P2: qb_top_items filters QB GrandTotal row. QB's ItemSales report includes a summary row that the wrapper was treating as a real item (item="TOTAL", revenue=$10,280). Now filtered out.
- P3: add_draft_line rejects naked negative quantity. A typo'd negative quantity could turn an order line into an inventory removal. Now requires a CREDIT: or RETURN: description prefix to confirm intent.
- Bonus: gmail_oauth module now actually exists. Pre-v0.14.5 docs referenced python -m qb_distributor_mcp.gmail_oauth as the one-liner for re-auth, but the module had never shipped. Customers got No module named qb_distributor_mcp.gmail_oauth. v0.14.5 ships it as a real runnable: check for client secret, refuse to clobber existing token, trigger the GmailClient OAuth flow, print next-steps with the Advanced → Continue click-through explained.
- Total test count: 184 (was 160). All 24 new tests in tests/test_v0_14_5.py green, full suite green, no regressions.
Two installer fixes from tonight's debugging session
- Installer preserves your existing .env on reinstall. Pre-v0.14.4 versions of install-connector.sh and install.ps1 rewrote .env from scratch on every run, keeping only the license key. That silently wiped QB OAuth credentials, LICENSE_TIER, SIDEQUEST_AR_FOLLOWUP, and any custom keys customers had set up via OAuth flows or helper scripts. Customers would upgrade the connector, find tools broken, and have to re-run every credential flow. v0.14.4 only rewrites the QBD_LICENSE_KEY line (and adds QBD_CONTROL_PLANE_URL if missing). Every other key in .env survives untouched.
- The "Sidequest Automation" rename now actually shows up. v0.14.3 changed the FastMCP server name to "Sidequest Automation" thinking that would update the Claude Desktop tool-use UI. It didn't: Claude Desktop reads the JSON KEY in claude_desktop_config.json's mcpServers block ("qb-distributor") as the display name. v0.14.4 fixes this properly: reinject.py now migrates the JSON key from qb-distributor to sidequest-automation automatically (preserves command/args/cwd, removes the old key). Existing customers just run reinject.py once and the migration happens.
- Test count unchanged: 160. These are install-script and config-writer changes, no functional code paths affected.
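The .env behavior amounts to a line-level update rather than a rewrite. The real installers are shell and PowerShell scripts; the Python sketch below only illustrates the rule, and update_env_text plus the placeholder URL are assumptions made for the example.

```python
def update_env_text(existing: str, license_key: str, control_plane_url: str) -> str:
    """Rewrite only QBD_LICENSE_KEY; add QBD_CONTROL_PLANE_URL if absent.

    Every other line (OAuth credentials, tier, custom keys) is kept verbatim.
    """
    out = []
    saw_key = saw_url = False
    for line in existing.splitlines():
        if line.startswith("QBD_LICENSE_KEY="):
            out.append(f"QBD_LICENSE_KEY={license_key}")
            saw_key = True
            continue
        if line.startswith("QBD_CONTROL_PLANE_URL="):
            saw_url = True
        out.append(line)
    if not saw_key:
        out.append(f"QBD_LICENSE_KEY={license_key}")
    if not saw_url:
        out.append(f"QBD_CONTROL_PLANE_URL={control_plane_url}")
    return "\n".join(out) + "\n"


before = "QBD_LICENSE_KEY=old\nLICENSE_TIER=solo\nSIDEQUEST_AR_FOLLOWUP=true\n"
print(update_env_text(before, "new", "https://example.invalid"))
```

In the printed result, LICENSE_TIER and SIDEQUEST_AR_FOLLOWUP are unchanged, the license key is replaced in place, and the control-plane URL is appended because it was missing.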
Bugs found in production sweep, plus the connector now shows up as "Sidequest Automation" in Claude
- AR sweep dict-shape defense in depth. v0.14.1 fixed PrimaryEmailAddr dict-shape unwrapping in tools.py, but the bug could resurface if a future caller bypassed the normalizer. Added _coerce_str() in ar_followup.group_by_customer so dict-shape values get unwrapped at the consumer side too, even when upstream missed it. Three new regression tests cover dict-shape, plain-string, and empty-dict cases.
- report_qb_top_items and report_qb_top_customers no longer return empty silently. QBO's ItemSales and CustomerSales endpoints return zero rows when called with no date range (they default to a same-day window). The wrappers now default to year-to-date when the caller doesn't pass dates, so you get rows that match what's actually in your file. Caller-supplied dates still win.
- New tool: list_items(limit=25, search=None). Spot-check your QuickBooks catalog, find a SKU before manually building a draft, or audit what auto-match returned versus what's actually there. Case-insensitive substring search across SKU, name, and description. Caps at 200 results to keep MCP payloads reasonable.
- Display name changed to "Sidequest Automation". When Claude calls a connector tool, it now shows "Sidequest Automation" in the tool-use UI instead of the internal slug "qb-distributor". Cosmetic: the JSON config key stays the same so existing installs keep working without migration.
- reinject.py now ships in the zip. After any .env change, run ~/.qb-distributor-mcp/venv/bin/python ~/.qb-distributor-mcp/reinject.py instead of pasting a 600-character one-liner. Shorter, robust to terminal mangling, prints which keys landed with secrets masked.
- Total test count: 160 (was 143). All 17 new tests in tests/test_v0_14_3.py green, full suite green, no regressions.
Three fixes from the first week of production runs
- Quoting safety: never silently underprice. If the buyer's PO supplies a unit price that is below your QuickBooks catalog price, the draft now uses the catalog price (not the PO price) and flags the line for review. The PO's offered price is recorded alongside it on the draft so the reviewer can see exactly what the buyer asked for and why we overrode it. If the PO price is at or above catalog, we keep the PO price untouched (the existing variance check still flags suspiciously high numbers). If the PO has no price, we fall back to catalog. If the item has no catalog price (services, etc.), we keep the PO price.
- Per-customer match-quality dedupe. report_match_quality_by_customer previously showed the same buyer as two separate rows when their first PO was processed before they existed in QuickBooks (customer_qb_id null) and the second was processed after (customer_qb_id populated). The report now deduplicates by normalized customer name and surfaces the canonical QuickBooks ID once it exists, with a known_qb_ids set carrying every ID we've ever seen for that buyer.
- QuickBooks Reports auth hardening. The run_qb_report path crashed in production with 'AuthClient' object has no attribute 'session' when the cached refresh token was stale. Added a three-step fallback: try the in-memory refresh, retry with the stored refresh token, and if both fail, rebuild the AuthClient from scratch and refresh. The session-level error is now a transient retry, not a hard failure.
- Total test count: 143 (was 136). All 7 new tests in tests/test_v0_14_2.py green, full suite green, no regressions.
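The quoting-safety rule in this release has four branches, which the sketch below spells out. resolve_unit_price and the returned field names are illustrative; the po_price_below_catalog reason is borrowed from the review-queue reasons listed elsewhere in this log.

```python
from decimal import Decimal
from typing import Optional


def resolve_unit_price(po_price: Optional[Decimal], catalog_price: Optional[Decimal]) -> dict:
    """Pick the draft unit price; never silently go below catalog."""
    if catalog_price is None:
        # No catalog price (services, etc.): keep whatever the PO said.
        return {"unit_price": po_price, "po_price": po_price, "review_reason": None}
    if po_price is None:
        # PO gave no price: fall back to catalog.
        return {"unit_price": catalog_price, "po_price": None, "review_reason": None}
    if po_price < catalog_price:
        # Buyer asked for less than catalog: use catalog, keep their number, flag it.
        return {
            "unit_price": catalog_price,
            "po_price": po_price,
            "review_reason": "po_price_below_catalog",
        }
    # At or above catalog: keep the PO price untouched.
    return {"unit_price": po_price, "po_price": po_price, "review_reason": None}
```

For example, a PO price of 9.00 against a catalog price of 10.00 yields a draft at 10.00 with the 9.00 recorded alongside it and the line flagged for review.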
Unwrap the QuickBooks Online dict-shape PrimaryEmailAddr before it hits .strip()
- Bug: the AR sweep crashed in production with 'dict' object has no attribute 'strip' at ar_followup.group_by_customer. Root cause: QBO returns PrimaryEmailAddr as a structured object ({"Address": "[email protected]"}), not a plain string. The normalizer in tools.run_ar_followup_sweep stored the dict verbatim, and the downstream grouper called .strip() on it.
- Fix: 5-line _qb_email() helper in tools.py unwraps the Address field when the value is a dict. Already-flat strings pass through. None values become empty strings. Same shape can be applied to BillAddr / ShipAddr if those ever flow downstream to similar string operations.
- Regression test: new tests/test_v0_14_1.py simulates three QBO customer shapes (dict-wrapped, already-flat, None) and confirms the sweep completes cleanly. Plus a baseline test that the original string-shape path still works.
- Total test count: 136 (was 134). No other behavior changes.
SideQuest chases your unpaid invoices for you
- New MCP tool: run_ar_followup_sweep. Pulls every open Invoice from your QuickBooks Online file, classifies each into one of six aging buckets (due_soon, overdue_1_7, overdue_8_30, overdue_31_60, overdue_61_90, overdue_90_plus), groups by customer (one email per customer per sweep, never one per invoice), renders a tier-appropriate follow-up email per customer, and writes each as a Gmail DRAFT in your Drafts folder.
- Tone scales with severity. A 3-day-overdue invoice gets a friendly check-in. A 90+ day overdue invoice gets a hold notice. Templates are conservative and respect that most overdue invoices are AP-system delays, not bad-faith customers.
- Multi-invoice consolidation. A customer with three overdue invoices gets ONE email that lists all three; the subject and tone match the worst-bucket invoice in that customer's stack. Never one-per-invoice spam.
- Opt-in via SIDEQUEST_AR_FOLLOWUP=true env var. Same pattern as SIDEQUEST_AUTOSUBMIT and SIDEQUEST_AUTOACK. Default off.
- Tier gate: Solo and above for Gmail draft writes. Free tier gets the chase plan as JSON (which customers to chase, what to say) but no drafts written. Upgrade nudge surfaces in the response with the same structured shape as the v0.13.0 tier-locked replies.
- QBO helper: QBClient.list_all_open_invoices(). Single query against every open Invoice with Balance > 0, capped at 500 rows. Each row carries CustomerRef so the caller can join back to a Customer record without re-fetching.
- Gmail helper: create_standalone_draft(). New method that writes an outbound draft outside any existing thread. OAuth scope stays gmail.modify, never gmail.send.
- 21 new tests in tests/test_ar_followup.py. Cover bucket boundary classification, multi-invoice grouping, missing-email skipping, template rendering across all 6 buckets, sweep aggregation, and the QBO dict-to-record adapter. Total connector tests: 134 (was 113).
- Marketing: /ar-assistant.html landing page with the bucket table, a sample multi-invoice draft, "what's in / what's not" framing, FAQ schema.
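Bucket classification and worst-bucket selection can be sketched from the bucket names alone. Two boundaries are assumptions here: anything not yet past due is treated as due_soon, and day 90 is placed in overdue_61_90 with 91 and later in overdue_90_plus. The function names are illustrative.

```python
from datetime import date
from typing import Iterable

# Ordered from least to most severe, using the six bucket names above.
BUCKETS = [
    "due_soon", "overdue_1_7", "overdue_8_30",
    "overdue_31_60", "overdue_61_90", "overdue_90_plus",
]


def aging_bucket(due_date: date, today: date) -> str:
    """Classify one invoice by days past due (boundaries are assumptions)."""
    days_overdue = (today - due_date).days
    if days_overdue <= 0:
        return "due_soon"
    if days_overdue <= 7:
        return "overdue_1_7"
    if days_overdue <= 30:
        return "overdue_8_30"
    if days_overdue <= 60:
        return "overdue_31_60"
    if days_overdue <= 90:
        return "overdue_61_90"
    return "overdue_90_plus"


def worst_bucket(buckets: Iterable[str]) -> str:
    """The single email's subject and tone follow the most severe bucket."""
    return max(buckets, key=BUCKETS.index)
```

A customer with invoices in overdue_1_7, overdue_31_60, and due_soon would get one email written in the overdue_31_60 tone.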
New Free tier (25 POs/month, no card), simpler price ladder, ROI-led pricing page
- New Free tier. 25 POs per month, no credit card. Parser, OCR, multi-doc routing, catalog matching, manual submit to QuickBooks. Email-only signup at /start-free.html.
- Tier restructure. Old: Solo $39 / Starter $79 / Growth $199 / Scale $499. New: Solo $29 / Growth $99 / Scale $299. Per-PO economics improve at every tier (Solo $0.19, Growth $0.13, Scale $0.085) and the inverted Solo-vs-Starter pricing bug is gone. Existing subscribers stay on grandfathered prices.
- Feature gating, not volume gating. Every tier above Free includes the full feature set. Free tier is gated out of auto-submit, reply drafts, customer risk gate, quote workflow, customer-specific cross-reference CSV upload, and auto-learn cross-references. The new tier_gate.require_paid_tier() helper returns a structured tier_locked response that explains the upgrade and notes that manual submit still works on Free.
- Auto-submit definition clarified everywhere. Auto-submit writes the draft Estimate into QuickBooks Online via the QBO API. It does NOT send anything to your customer, does NOT email order confirmations, does NOT convert to Invoice. New FAQ entry, callouts on the homepage and pricing page, and explicit framing on the Free tier signup page.
- Pricing page rewritten with an ROI calculator hero. "Save about $4 per PO. Pay $0.13." Live calculator surfaces hours saved, labor savings, and ROI multiple as you type your monthly PO count. Includes a 17-row feature gating matrix showing exactly what's in Free vs each paid tier. Soft-overage explainer (20% over your tier is free; beyond that $0.20/PO).
- Homepage CTA changed. "Start free" now points at /start-free.html instead of the lead-capture form. Hero, nav, footer, and pricing-grid all updated.
- 20 new tier_gate tests. Cover Free / Solo / Growth / Unlimited behavior, tier rank ordering, legacy "starter" alias, locked response shape, upgrade URL consistency, and unknown-feature handling. Total connector tests: 113 (was 93).
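The soft-overage rule from the pricing page (20% over your tier is free; beyond that $0.20/PO) can be worked through as arithmetic. This is only a sketch: the function name and the included_pos parameter are hypothetical, and it assumes the 20% buffer is computed on the tier's included monthly PO count.

```python
def overage_charge(included_pos: int, used_pos: int,
                   free_buffer_pct: int = 20, rate_per_po: float = 0.20) -> float:
    """Dollar overage for a month, assuming the buffer applies to the included quota."""
    # Integer arithmetic first, so the free ceiling does not pick up float noise.
    free_ceiling = included_pos * (100 + free_buffer_pct) / 100
    billable_pos = max(0, used_pos - free_ceiling)
    return round(billable_pos * rate_per_po, 2)
```

For example, on a hypothetical 100-PO quota the first 120 POs cost nothing extra, and 130 POs would bill 10 POs at the overage rate.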
POs with multiple attachments stop merging cover letters and spec sheets into the line items
- New attachment router classifies every PDF on the email. Each attachment gets one of five roles: primary_po, secondary_po, cover_letter, spec_sheet, unknown. Signals: header anchor count from the v0.12.1 header parser, line count from the heuristic extractor, part-number token density, quantity column or qty-x-PN pattern, filename hints (PO12345.pdf vs cover_letter.pdf vs Drawing_A101.pdf), and short-text cover-phrase detection.
- Header anchors come from the primary PO only. Before today, the response picked the first non-empty header field across every PDF. If a cover letter said "PO ref: PO-99001" and the real PO had different anchors, the wrong one could win. Now the primary_po owns the header. Ties between two primary candidates resolve to the one with more header anchors, then more lines, with the loser demoted to secondary_po.
- Lines aggregate from primary + secondary POs with dedup. Cover letters and spec sheets no longer contribute lines to the draft. Repeated rows across multiple PO PDFs (the "formal PO + acknowledgment" pattern) collapse on (part_number, quantity, description) so the customer isn't charged twice.
- Spec sheets stay in raw_content for matcher context but skip line aggregation. The matcher still has access to OEM-to-house cross-reference tables shipped as spec PDFs. Lines just don't come from them.
- New response field: attachments_routed. Lists every attachment with its assigned role, confidence score, header_anchor_count, line_count_estimate, PN token count, has_quantity_signal flag, text_length, and a human-readable reason string. So Claude (and the operator) can see exactly why each attachment was treated the way it was.
- Tests: 14 new cases in tests/test_attachment_router.py. Single-attachment classification for all 5 roles plus 6 multi-attachment scenarios: PO + spec sheet, cover letter + PO, two POs with the higher anchor count winning, no-primary fallback that promotes the first secondary, three-document email (cover + PO + drawings), and empty-list. Total connector test count: 93 (was 79).
- Filename pattern fix: the spec / cover patterns now use custom word boundaries that handle underscores and dashes ("Drawing_A101.pdf", "PO-12345.pdf", "cover_letter.pdf" all match correctly). Regex \b treats the underscore as a word character, so it would have missed the underscore-separated names.
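To illustrate the filename pattern fix: a sketch of a custom word boundary built from lookarounds, so that underscores and dashes count as separators. The helper name has_hint is hypothetical and the router's real patterns are not shown in these notes; only the standard-library re behaviour is certain.

```python
import re


def has_hint(filename: str, word: str) -> bool:
    """True when `word` appears bounded by anything other than a letter or digit."""
    # (?<![A-Za-z0-9]) and (?![A-Za-z0-9]) treat "_" and "-" as separators,
    # whereas \b sees "_" as part of the word.
    pattern = rf"(?<![A-Za-z0-9]){re.escape(word)}(?![A-Za-z0-9])"
    return re.search(pattern, filename, re.IGNORECASE) is not None
```

With this helper, has_hint("Drawing_A101.pdf", "drawing") is true, while the plain pattern \bdrawing\b finds nothing in the same filename because no boundary exists between "g" and "_".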
The v0.11.1 + v0.12.0 modules are now actually called by the production pipeline
- Header parser now fires automatically. Every parse_po_from_email call runs header_parser.parse_header on the email body. Customer, ship-to, po-ref, terms, need-by, notes, vendor, total all surface in the response as a new header_fields object, plus a customer_source tag (anchor / above_po_ref / below_po_ref / signature / sender_domain) so the operator sees how each value was found. If the table extractor didn't find a PO ref, the header parser's value takes over.
- Customer cross-ref by sender domain. The same pipeline pulls the sender's email domain, calls customer_lookup.lookup_customer_by_domain against the live QB Customer list, and surfaces a customer_match object with qb_id, display_name, confidence, match_source, BillAddr / ShipAddr defaults, and the self-describing assumption note. Two new QBClient methods (list_customers and list_open_invoices_for_customer) feed the cross-ref and risk gate.
- Intent classification on every email. classify_intent_with_memory runs on subject + full body + sender domain, with the per-customer memory CSV consulted first. Result surfaces as intent in the response so Claude knows immediately whether to route the draft as quote vs order.
- Auto-ack reply fires when enabled. Set SIDEQUEST_AUTOACK=true and the pipeline now drafts the fast "we received your order" reply right after parse succeeds. Skips on quote-classified emails (those go through the Cut 2 quote-mode template instead), missing message_id, and zero-line parses.
- Customer Risk Gate wired into the clean-gate. When the structural and price-variance checks pass, customer_risk.evaluate_customer_risk runs against the QB Customer record + open Invoice records for that customer. Over-credit-limit and past-due aging both add hard-hold reasons to the clean-gate output, surfacing as customer_risk:over_limit:... entries the auto-submit path respects. The full risk summary lands in the response as customer_risk.
- Defensive wiring. Each new piece is wrapped in try/except so a failure inside one (Gmail blip, QB timeout, missing field) can never crash the main pipeline: the failed piece just returns None and the rest still runs.
- Backward compatible. No new env vars except the optional SIDEQUEST_AUTOACK that already shipped in v0.12.0. Existing callers see the new response fields but ignore them if they don't read them.
Upgrade: Download the latest sidequest-connector.zip and re-run the install script. The pipeline lights up automatically.
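The defensive wiring in this release can be pictured as a small wrapper around each optional stage. This is a sketch of the general pattern only; run_optional_stage and the logger name are hypothetical, not the connector's actual helper.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("sidequest.sketch")  # hypothetical logger name


def run_optional_stage(name: str, stage: Callable[[], Any]) -> Optional[Any]:
    """Run one enrichment stage; on any failure, log it and return None."""
    try:
        return stage()
    except Exception:
        # The main parse result is already in hand, so a failed extra
        # (header fields, customer match, intent, auto-ack) is dropped, not fatal.
        logger.exception("optional stage %s failed; continuing", name)
        return None
```

A caller would then build the response with something like response["customer_match"] = run_optional_stage("customer_match", lookup), where lookup is whatever zero-argument callable wraps the real call.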
Customer Risk Gate plus Order Confirmation Auto-Ack
- Customer Risk Gate. New customer_risk.evaluate_customer_risk() reads Customer.Balance and Customer.CreditLimit from QuickBooks plus the customer's open Invoice records. Returns a status (clean, over_limit, past_due_aging, or no_credit_limit_set) plus the operator-friendly message that surfaces in the parse_po_from_email response. QuickBooks alerts on credit limit but doesn't block transactions; the gate wires that data into the clean-gate path. Default aging threshold is 60 days, configurable per-call.
- Order Confirmation Auto-Ack (Cut 1.5). New auto_ack.maybe_send_auto_ack() drafts a fast "we received your order, full confirmation to follow" reply right after parse_po_from_email succeeds. Opt-in via SIDEQUEST_AUTOACK=true. Default OFF. Skips automatically on quote-classified emails (Cut 2 quote-mode handles those), missing message_id, or zero-line parse failures. Body is intentionally tight: no prices, no ship dates, just acknowledgement. Closes the loop the buyer is currently waiting on.
- Reply stays a draft. SideQuest still never sends mail on your behalf. The Gmail OAuth scope stays gmail.modify, never gmail.send. The auto-ack lands in your Gmail Drafts folder so the operator (or the buyer-facing rep) reviews and clicks send.
- 21 new tests in tests/test_v0_12_0.py: 11 risk-gate cases (under limit, over limit, aging boundaries, no credit limit set, threshold override, paid-invoice skip, garbage-input safety, message string assertions) + 10 auto-ack cases (env flag on/off, quote skip, missing message_id, zero lines, force flag, Gmail error handling, singular-line grammar). All green.
- Backward compatible. Both modules are pure-input; `tools.py` controls when to call them. Customers who don't set SIDEQUEST_AUTOACK see zero behavior change.
Upgrade: Download the latest sidequest-connector.zip. The risk gate fires automatically. To turn on auto-ack, set SIDEQUEST_AUTOACK=true in your .env and re-run the install script.
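A rough sketch of how a risk evaluation with these four statuses might be structured. The status names and the 60-day default come from the notes above; the order in which the checks run, the strict "greater than" comparisons, and every name in the code (RiskSketch, evaluate_risk_sketch, the parameters) are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RiskSketch:
    status: str   # clean | over_limit | past_due_aging | no_credit_limit_set
    message: str


def evaluate_risk_sketch(
    balance: float,
    credit_limit: Optional[float],
    open_invoice_due_dates: list[date],
    today: date,
    aging_threshold_days: int = 60,  # default taken from the release note
) -> RiskSketch:
    # Assumed precedence: credit limit, then aging, then the missing-limit notice.
    if credit_limit is not None and balance > credit_limit:
        return RiskSketch(
            "over_limit",
            f"Balance {balance:.2f} is over the {credit_limit:.2f} credit limit",
        )
    oldest_days = max(((today - due).days for due in open_invoice_due_dates), default=0)
    if oldest_days > aging_threshold_days:
        return RiskSketch("past_due_aging", f"Oldest open invoice is {oldest_days} days past due")
    if credit_limit is None:
        return RiskSketch("no_credit_limit_set", "No credit limit on the QuickBooks customer record")
    return RiskSketch("clean", "Within credit limit and aging threshold")
```

Passing aging_threshold_days per call mirrors the "configurable per-call" wording; a customer on longer terms could be evaluated with a higher threshold without changing the default.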
Header parser + QB customer cross-ref + per-customer classifier memory + quote reply template
- New header_parser.py module. Ports the playground's header-anchor extraction. parse_header(body) returns structured fields: customer, ship_to, po_ref, terms, need_by, notes, vendor, total. Four-stage customer inference (above PO ref → below → signature → sender-email domain). Multi-line "Shipping Address:" / "Vendor Address" blocks join the next 1–4 lines into the value. Known traps guarded: "PO Box 989062" rejected as PO ref, "San Francisco, 94536" rejected as customer, "Sub Total" / "Phone:" lines rejected from inference.
- New customer_lookup.py module + lookup_customer_by_domain(domain, customers). Given a sender email domain (or full email), scans QB Customer records by primary-email exact match, display-name token overlap, and partial domain substring. Returns a CustomerMatch with confidence, BillAddr / ShipAddr defaults, and an assumption_note like "Customer pulled from QB record QB-101 via email_domain_exact (confidence 1.00); BillTo/ShipTo defaults included. Last verified 2026-05-28." The assumption note is always surfaced so the operator sees what came from the email vs. what came from QB.
- Per-customer classifier learning. New classify_intent_with_memory(subject, body, sender_email) checks ~/.qb-distributor-mcp/customer_intent.csv first. If we've seen this sender's domain before and the operator labeled them as a quote-asker or order-placer, return that intent at confidence 0.95. Otherwise fall through to the heuristic. remember_customer_intent(domain, intent) persists with last-write-wins dedupe. Invalid intents rejected.
- Quote-mode reply template. New quote_reply_body() helper builds a Cut 2 reply with quote-specific copy: "Thanks for your inquiry on PO-XYZ. Here's pricing for the items you requested…" plus optional validity-through date and Quote # reference. The order-confirmation template is unchanged; this is a sibling for quote-mode flows.
- Test corpus port. tests/test_parser_corpus.py is a Python port of the marketing-site playground's corpus (20 header-shaped cases). Runs on every change to header_parser.py. All 20 green.
- 52 new tests total in v0.11.1: 18 (test_header_parser) + 14 (test_v0_11_1_followups) + 20 (test_parser_corpus). All green.
Upgrade: Download the latest sidequest-connector.zip. No new env vars. The new MCP tool lookup_customer_by_domain activates automatically when you re-run the install. Per-customer classifier memory starts cold on first install and grows as operators label intents.
Upgrade: Download the latest sidequest-connector.zip. Existing customers re-run the install script. No new env vars.
Quote intake with GP %, uplift, and discount operator tools
- Email intent classifier. New classify_intent(subject, body) helper scans the subject and the FULL body for quote-vs-order signals; long POs, forwarded chains, and EDI 850 dumps all get the same attention. "RFQ", "please quote", "pricing on", "send me a quote" route to quote. "PO 1234", "purchase order", "please ship", "confirming order" route to order. Both present → ambiguous; production connector hands ambiguous emails to Claude with context. 6 unit tests covering subject-only, body-only, both-present, and neither-present paths.
- Four operator pricing tools. apply_gp_margin(draft_id, target_gp_pct, costs) sets unit prices so each line hits a target gross-profit percentage (pulls per-item cost from the QB Item record). apply_uplift(draft_id, pct, scope) multiplies prices up by a percentage, per-line or doc-level. apply_discount(draft_id, pct, scope) same shape, opposite direction. set_quote_validity(draft_id, days) stamps "Quote valid through YYYY-MM-DD" on the draft memo. All four reject out-of-range inputs explicitly so operator intent stays clear (negative uplift, ≥100% discount, etc.).
- Same draft pipeline as orders. Quotes use the existing QuickBooks Estimate document type; Estimates ARE quotes in QBO. No new document type, no new tables, no new install steps. The reply path reuses Cut 2 (draft_reply_to_buyer) with the validity date stamped in the memo flowing through to the customer-facing copy.
- 20 unit tests in tests/test_quotes.py: classifier (6), apply_uplift (3), apply_discount (4), apply_gp_margin (4), set_quote_validity (3). All green.
- Landing page at /quotes.html with the four-step flow, knob explanations, and a Claude transcript showing the operator pass.
Upgrade: Download the latest sidequest-connector.zip. The classifier and pricing tools are available in Claude Desktop the next time you open it. No new env vars, no new install steps.
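The pricing math behind these tools can be sketched on a single unit price. This assumes the standard definition of gross profit, GP% = (price - cost) / price, so price = cost / (1 - GP%). The function names below are hypothetical stand-ins that operate on bare numbers rather than drafts, and the exact validation ranges are assumptions beyond the two examples the notes mention.

```python
def price_for_target_gp(unit_cost: float, target_gp_pct: float) -> float:
    """Unit price at which (price - cost) / price equals the target GP percentage."""
    if not 0 <= target_gp_pct < 100:
        raise ValueError("target_gp_pct must be at least 0 and below 100")
    return round(unit_cost / (1 - target_gp_pct / 100), 2)


def uplift_price(price: float, pct: float) -> float:
    """Multiply a price up by pct percent; negative uplift is rejected."""
    if pct < 0:
        raise ValueError("uplift pct must not be negative; use a discount instead")
    return round(price * (1 + pct / 100), 2)


def discount_price(price: float, pct: float) -> float:
    """Reduce a price by pct percent; 100% or more is rejected."""
    if not 0 <= pct < 100:
        raise ValueError("discount pct must be at least 0 and below 100")
    return round(price * (1 - pct / 100), 2)
```

Note that a 25% GP target is not a 25% markup: a 75.00 cost needs a 100.00 price to leave 25% of the price as gross profit, which is a markup of one third on cost.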
The bordered-table OCR fix — Tesseract trust gate plus optional Azure DI
- Tesseract trust gate. Until v0.10.0, the OCR pipeline trusted any Tesseract output that came back non-empty. That meant bordered-table PO PDFs returned garbled-but-long strings (the kind of text where "DM19012 Rollerblade 10.0 123.00 1,230.00" comes back as "SN[momcods [Description [ay [unt") and downstream parsers tried to make sense of them. The matcher flagged everything for review without surfacing the real reason. v0.10.0 adds a structural-trust gate: when table_structure, qty_price_disambiguation, or customer_format_recognition falls below threshold, the Tesseract result is rejected outright and the page falls through to the Claude vision passthrough that already existed. Claude reads the page images natively and writes the line items directly. Bordered tables now produce clean drafts instead of garbled flagged lines.
- Optional Azure Document Intelligence primary path. New azure_di.py provider sits in front of Tesseract. Set AZURE_DI_ENDPOINT and AZURE_DI_KEY in .env, install the optional dep (pip install qb-distributor-mcp[azure-ocr]), and the connector uses Azure's prebuilt-invoice model: structured invoice schema, line items pre-extracted, around $0.01 per page on list pricing. When Azure's per-field confidence drops below 0.85, the page still falls through to Claude vision rescue. Free F0 tier covers the first 20 pages per month for development. Full evaluation memo at how we evaluated OCR.
- Tesseract demoted to offline fallback. When Azure is configured, Tesseract no longer runs. When Azure is not configured, Tesseract runs with the new trust gate. Either way, the customer's data never leaves their machine for the deterministic-OCR path; Azure DI is the only cloud call, and only when the customer opts in by setting the env vars.
- Backward-compatible. Customers who don't set Azure env vars get the trust-gate-only improvement automatically. No new credentials needed for the bordered-table fix. The optional Azure path is a future cost optimization for high-volume customers.
- 6 new tests in tests/test_ocr_trust_gate.py covering the structural rejection on the Datamoto-style case (Tesseract was super confident per-word but the column geometry was broken). All green.
Upgrade: Download the latest sidequest-connector.zip. Existing customers re-run the install script and the trust gate kicks in immediately — no env changes required. To turn on Azure DI, follow the new .env.example section and pip install qb-distributor-mcp[azure-ocr].
OCR confidence now tells you what's actually wrong
- Structured OCRConfidence with four subscores. Until v0.9.1, the OCR pipeline returned a single confidence float per page. The reviewer saw "low confidence" and had to guess what was off. Now every OCR'd line carries four named subscores: text legibility (how readable the glyphs were), table structure (how cleanly words clustered into column edges from Tesseract bounding boxes), qty/price disambiguation (was it clear which column was a count and which was a price, based on $/Qty/cents markers), and customer-format recognition (how many known PO anchors like "Ship To" / "Unit Price" / "Net 30" we found).
- Weakest-link semantics. Overall confidence is the minimum of the four subscores, not the average. A PO is only as trustworthy as its least-confident dimension. The "weakest" subscore drives the reason text. If table structure scores 0.2 and everything else is 0.9, the reviewer sees "unclear column alignment in the scanned table," not a generic "low confidence."
- Wired into match_po_lines flag-for-review. Any OCR'd line whose weakest subscore falls below 0.65 forces needs_review=True and appends a specific tag to the match notes (for example OCR concern: ambiguous qty vs price columns (no $ markers or 'Qty' labels)). Even an exact SKU match gets flagged when the OCR underneath it was shaky. That's the point. The clean-gate / auto-submit path picks this up automatically through the existing review_flag machinery.
- Surfaced in parse_po_from_email. The response now includes an ocr_confidence object with all four subscores plus overall and weakest when OCR was used. Claude can read it directly to write a more honest "I'm not 100% on this PO because the column alignment was unclear" before showing the lines.
- POLine model addition. New optional ocr_confidence field on the POLine Pydantic model. None for digital PDFs and typed email bodies (no behaviour change). Backwards-compatible: every existing call site that doesn't set the field gets the v0.9.0 behaviour byte for byte.
- Tests: 34 new tests across tests/test_ocr_confidence.py (aggregate math, every subscore helper, build_ocr_confidence, Pydantic round-trip) and tests/test_matcher_ocr_concerns.py (each weak subscore mapping to its specific reason, threshold boundary, no-OCR no-change, notes preservation, batch path, idempotency). All 34 green.
- No new dependencies, no breaking changes. Drop-in upgrade. The download URL stays sidequest-connector.zip. If you've already installed v0.9.0 with QBD support, replace the package and you're done.
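The weakest-link rule reduces to a min() over the four subscores. The sketch below uses a plain dataclass instead of the real Pydantic model; the class name and the text_legibility field name are assumptions, while the 0.65 review threshold is the figure quoted in these notes.

```python
from dataclasses import asdict, dataclass

REVIEW_THRESHOLD = 0.65  # from the release note


@dataclass(frozen=True)
class OCRConfidenceSketch:
    text_legibility: float
    table_structure: float
    qty_price_disambiguation: float
    customer_format_recognition: float

    @property
    def overall(self) -> float:
        # Minimum, not average: one bad dimension is enough to distrust the line.
        return min(asdict(self).values())

    @property
    def weakest(self) -> str:
        scores = asdict(self)
        return min(scores, key=scores.get)


def needs_review(confidence: OCRConfidenceSketch) -> bool:
    return confidence.overall < REVIEW_THRESHOLD
```

Using the example from the notes, scores of 0.9, 0.2, 0.9 and 0.9 would average to 0.725 and slip past a 0.65 bar, whereas the minimum (0.2, on table_structure) flags the line and names the reason.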
QuickBooks Desktop support (Windows beta)
- What's new. SideQuest now talks to QuickBooks Desktop on Windows, not just QuickBooks Online. Same connector, same Claude prompts, same Insights reports: set QB_BACKEND=desktop in ~/.qb-distributor-mcp/.env and the MCP routes through a small local Flask bridge that translates REST to QBXML over COM. Items, customers, estimates, price updates: all the QBO write paths work against QBD too. Reports stay on the local report_* tools (which work on either backend); QBD-native report passthrough is on the v0.10 list.
- Why a bridge instead of direct COM. Three reasons. (1) Process isolation: when QBSDK throws (it does, periodically), the bridge dies, the MCP server doesn't. (2) Cleaner tests: the MCP layer mocks httpx exactly like the QBO live-reports tests, no Windows VM required. (3) Keeps the door open for the Web Connector unattended-polling path later without touching the MCP.
- Install path. Same install.bat. New Step 5b installs Flask + pywin32 and writes qb-desktop-bridge/start-bridge.bat. Open QBD with your company file, double-click start-bridge.bat, click "Yes, always allow" on the Integrated Application trust dialog (one time per company file), and you're live.
- What's covered. list_items, get_item, find_customer_by_name, update_item_price, create_estimate. get_report through the bridge returns 501 today; use SideQuest's local reports instead.
- What's not covered yet. Unattended Web Connector polling (QBD must be open), multi-user-mode write conflict handling, QBD Mac (Intuit killed it), QBD Enterprise advanced inventory features (lot, serial, multi-location). All on the v0.10+ list.
- Failure modes the bridge surfaces honestly. 503 when QBD isn't running, 403 when the trust dialog was declined, 409 with backoff retries when QBD is busy with a modal dialog. The MCP layer wraps each into a friendly error pointing at the diagnose prompt.
- Tests: 18 new unit tests covering QBDesktopClient (mocked httpx, error mapping, retry-on-busy, protocol conformance, backend-selection routing), 26 new bridge tests covering QBXML builders, response parsers, Flask routes, and COM-error mapping. Full suite: 161 tests green.
- Honest caveat. The bridge is written from QBSDK 13.0 documentation and is shipping as a beta. Every uncertain block is flagged VERIFY: in the source. We haven't run it against a real customer's QuickBooks Desktop install yet, which is why we're opening a beta program: apply here and get the full connector free for up to 200 POs/month for 12 months in exchange for honest feedback.
Per-customer match quality + see what the connector has learned
- Per-customer report_match_quality. The existing match-quality report now takes an optional customer_qb_id parameter. Ask Claude: "how's the matcher doing for Acme this quarter?" and the report scopes to that customer's drafts, which is useful for confirming that auto-learning is paying off for a specific account over time.
- New match_quality_by_customer tool. Per-customer rollup ranked by total lines processed. Surfaces the accounts whose POs trip up the matcher most, which is where adding cross-references (or letting auto-learn do its job) pays off fastest.
- New list_learned_rules tool. Returns the rows in your cross_reference.csv with timestamps. Ask Claude "show me what SideQuest has learned for Acme" and you see every mapping the connector has written, newest first. Visible compounding: "you've taught me 47 mappings for Acme since install."
- learned_at column on the CSV. Every auto-written row now carries an ISO 8601 UTC timestamp so you can see when each rule landed. Legacy CSVs without the column still surface (rules with learned_at=None).
- New free tool: /quickbooks-error-decoder.html. Paste any QB API error code or message, get plain-English explanation + the actual fix. 12 codes covered (3200, 5010, 6240, 6190, 6210, 6000, 6140, 620, 610, 4000, 3001, 100). Pure client-side JS: nothing logged, no signup.
- Five new blog posts across the last 8 days: refresh-token recovery, why we built local-first instead of SaaS, the five PO formats that break OCR, how to test SideQuest before subscribing, reading EDI 850 via email translator.
- Tests: 13 new (11 reporting + 2 site fixture tweaks). Full suite: 143 tests green.
Cross-reference auto-learning — the connector gets smarter every time you fix a draft
- How it works. When you process a PO and the matcher can't recognise a buyer's part number (say ACME-EL34), the line lands in your draft flagged for review. You assign the right QB item by chat: "set line L1 to Brass Elbow 3/4 NPT". The connector quietly appends a row to your cross_reference.csv mapping (ACME, ACME-EL34) → BR-ELB-075-NPT. The next time that customer sends ACME-EL34, the matcher resolves it via the cross-reference table at 0.99 confidence. Zero ceremony. Onboarding a new customer's part-number convention now takes one PO instead of an afternoon at a spreadsheet.
- What gets learned vs. what doesn't. Only first-time resolutions of previously-unmatched lines are written. Reassignments (rep overrides an existing match) are NOT learned, since that would create flip-flop noise the next time. Lines without a buyer-side part number, drafts without a linked QB customer, and items that aren't in the catalog all skip cleanly.
- Live in-session updates. The matcher's in-memory cross-reference index updates immediately, so a multi-line PO from a new customer benefits from a rule it just learned on line 2 by the time it reaches line 5.
- Dedup. Before writing, the connector scans the CSV for an existing row with the same (customer_id, customer_part). Identical rules are never duplicated.
- Toggle. Set AUTO_LEARN_CROSS_REFERENCE=false in ~/.qb-distributor-mcp/.env to disable. Defaults to true.
- Tests: 12 new tests covering happy path (writes row + updates matcher), every skip condition (no customer, no original_customer_part, reassignment, feature disabled, qb_item_id not in catalog), CSV dedup, header creation, parent-dir creation. Plus a bonus fix: drafts.load/save/list_drafts/delete now resolve DEFAULT_DB_PATH at call time instead of at function definition, so tests can swap the DB cleanly. Full suite: 130 tests green.
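A minimal sketch of the append-with-dedup step, covering header creation, parent-directory creation and the (customer_id, customer_part) check mentioned above. The column order, the function name and the return convention are assumptions; the learned_at column with an ISO 8601 UTC timestamp is described in a later release note and is included here for illustration.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Assumed column layout for cross_reference.csv.
FIELDS = ["customer_id", "customer_part", "qb_item_id", "learned_at"]


def learn_cross_reference(csv_path: Path, customer_id: str,
                          customer_part: str, qb_item_id: str) -> bool:
    """Append a rule unless (customer_id, customer_part) already exists.

    Returns True when a row was written, False when it was a duplicate.
    """
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    is_new_file = not csv_path.exists()

    if not is_new_file:
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                if (row["customer_id"], row["customer_part"]) == (customer_id, customer_part):
                    return False

    with csv_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new_file:
            writer.writeheader()
        writer.writerow({
            "customer_id": customer_id,
            "customer_part": customer_part,
            "qb_item_id": qb_item_id,
            "learned_at": datetime.now(timezone.utc).isoformat(),
        })
    return True
```

Scanning the file before every write is linear in the number of rules, which is fine at the scale of a hand-taught cross-reference table; the in-memory index the notes mention would make the check constant-time if the file ever grew large.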
OAuth callback page + match-quality honesty fix
- Hosted QuickBooks OAuth callback. New page at sidequestautomation.com/qb/callback captures the Intuit auth code and shows it for copy-paste. Replaces the old localhost-redirect flow that broke on Production-flagged Intuit apps (Intuit refuses localhost / IP redirects there, and every new install was hitting this). Set QB_REDIRECT_URI=https://sidequestautomation.com/qb/callback in your .env, add the same URL to your Intuit app's "Redirect URIs" list, then run python -m qb_distributor_mcp.auth_qb as usual. Page is pure JavaScript reading URL params: nothing is logged, sent, or stored.
- Backward-compatible. The connector still supports the legacy local-server flow when QB_REDIRECT_URI points to localhost or 127.0.0.1. No existing customer breaks.
- match_quality reports an "operator-assigned" bucket. v0.7.0 counted manually-mapped lines as "auto_matched", which inflated the clean-match rate when the rep had to step in. Now report_match_quality returns three buckets: auto_matched_lines, operator_assigned_lines, flagged_for_review_lines. A draft where the rep had to re-map everything to a fallback item registers honestly as 0% clean-matched, 100% operator-assigned.
- Tests: 14 new tests covering the operator_assigned flag on update_draft_line (first-time None→item doesn't trigger it, idempotent same-id doesn't trigger it, real re-assignment does), the new report bucket counts, and the auth_qb redirect-detection logic. Full suite: 118 tests green.
QB live-report bug fix
- What broke. v0.7.0 shipped report_qb_top_items and report_qb_top_customers against a code path that had never been exercised against a live QuickBooks token. The underlying QBClient.get_report called self._auth.session.get(url), but intuitlib.AuthClient doesn't expose a session attribute, so every live call hit an AttributeError. Local reports were unaffected.
- Fix. Replaced the broken call with httpx.get using the access token as a Bearer header. Added a one-shot 401 retry: if QB returns 401 (token expired between session start and the report call), refresh and retry once. After that, errors propagate.
- Tests: 7 new tests in tests/test_qb_reports.py covering URL construction (sandbox + production), the happy path, parameter passthrough, the 401 retry-with-refresh path, no-retry on non-401 errors, and no-infinite-loop on a second 401. Full suite: 104 tests green.
- Heads up. If you're stuck on the Intuit "couldn't connect" page when running python -m qb_distributor_mcp.auth_qb: the script uses a localhost redirect, but Intuit refuses localhost and IP redirects on Production-flagged apps. For sandbox rotations, use Intuit's OAuth Playground at developer.intuit.com/app/developer/playground to mint a fresh refresh token, then paste it into ~/.qb-distributor-mcp/.env and re-run the env-injection one-liner. Long-term fix is on the roadmap.
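The one-shot 401 retry can be sketched without any HTTP library by injecting the two operations as callables. Everything named here (get_with_one_refresh, ReportError, the fetch and refresh parameters) is hypothetical; in the real connector the fetch step is the httpx.get call with a Bearer header described above.

```python
from typing import Callable


class ReportError(Exception):
    """Raised when the report call still fails after the allowed attempts."""


def get_with_one_refresh(
    fetch: Callable[[str], tuple[int, dict]],  # token -> (HTTP status, parsed body)
    refresh: Callable[[], str],                # returns a fresh access token
    access_token: str,
) -> dict:
    status, body = fetch(access_token)
    if status == 401:
        # Token likely expired mid-session: refresh and retry exactly once.
        status, body = fetch(refresh())
    if status != 200:
        # A second 401, or any other error, propagates instead of looping.
        raise ReportError(f"QuickBooks report call failed with HTTP {status}")
    return body
```

Because the retry is a plain if rather than a loop, a second 401 cannot trigger another refresh, which is the no-infinite-loop property the tests in this release check for.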
Ask-anything reporting + Cut 1 fix
- Eight new reporting tools. Ask Claude things like "what are my top 10 SKUs this month" or "which customers send us the most POs" and the connector calls a real report instead of guessing. Local rollups: report_pos_processed, report_top_items, report_top_customers, report_match_quality, report_time_saved. QuickBooks pass-throughs: report_qb_top_items, report_qb_top_customers. Plus list_reports so Claude can pick the right one when you're vague.
- Local + live, side by side. Local reports read your drafts.sqlite and usage.sqlite, so they reflect what the connector has actually processed, surfacing things QB doesn't track (review-flag rate, which customer formats trip the matcher, time spent). QB reports pull live via the QBO API so you can ask "what are my top customers by revenue YTD" without leaving chat.
- Period vocab. All local reports accept a period: all · today · 7d · 30d · mtd · ytd.
- Honest framing. report_time_saved takes your minutes-per-PO and hourly-rate assumptions; the dollar number reflects what you plugged in, not a measured ROI.
- Bug fix in Cut 1. The v0.5.0 clean gate referenced matcher.catalog but the matcher only had _catalog (private). Every price-variance check hit an AttributeError that the bare-except swallowed into a "price_variance_check_failed" reason, meaning the clean gate ALWAYS failed in production, even on perfectly clean drafts. Added a public catalog property + regression tests that prove the gate now returns clean=True end-to-end and that the variance branch actually compares.
- Tests: 22 new reporting tests + 3 regression tests on the clean gate. Full suite: 96 tests, all green.
Cut 2 — auto-reply draft to buyer (Gmail)
- New tool: draft_reply_to_buyer(message_id, qb_estimate_id). After a PO is processed into a QuickBooks Estimate, the connector drafts a reply on the original Gmail thread referencing the QB Estimate number and the buyer's PO. The draft lands in your Gmail Drafts folder. You review and click send; we never send for you.
- Threading is correct. Reply uses the original message's Message-Id header (via In-Reply-To and References) and the same Gmail threadId, so Gmail collapses the conversation cleanly. Subject is auto-prefixed with Re: (and skips the prefix if the original already starts with "Re:").
- Template + override. Default template fills in the QB doc number, PO number, total, and a per-line summary automatically. Pass sender_signature for the sign-off, salutation for the greeting, include_lines=false to skip the line summary, or custom_body=... for full control over the body.
- Lookup is forgiving. Pass the local draft_id if you have it. Otherwise, the tool resolves the matching submitted draft by qb_estimate_id, then by message_id. Returns a clear error if no submitted draft matches.
- Same conservatism as Cut 1. Draft only, never send. We never request the gmail.send scope, only gmail.modify, which covers drafts.create. The buyer never sees anything until you click send in Gmail.
- Tests: 26 new unit tests covering body rendering, subject prefixing, threading, all three lookup paths, the draft-not-submitted refusal, and the Gmail API failure path.
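The threading rules above can be sketched with the standard library alone. The function names and argument order are hypothetical, and the case-insensitive "Re:" check is an assumption; the payload shape (a base64url-encoded raw message plus threadId inside a message object) is the one the Gmail API's drafts.create call accepts. No API call is made here.

```python
import base64
from email.message import EmailMessage


def build_reply_sketch(original_message_id: str, original_subject: str,
                       to_addr: str, body: str) -> EmailMessage:
    """Build a reply whose headers point back at the original message."""
    msg = EmailMessage()
    # Assumption: treat "RE:" and "re:" the same as "Re:" when skipping the prefix.
    already_prefixed = original_subject.lower().startswith("re:")
    msg["To"] = to_addr
    msg["Subject"] = original_subject if already_prefixed else f"Re: {original_subject}"
    msg["In-Reply-To"] = original_message_id
    msg["References"] = original_message_id
    msg.set_content(body)
    return msg


def draft_payload_sketch(msg: EmailMessage, thread_id: str) -> dict:
    """Request body for a Gmail draft on an existing thread (nothing is sent)."""
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    return {"message": {"raw": raw, "threadId": thread_id}}
```

Both pieces matter: the headers let other mail clients thread the reply, while threadId is what keeps the draft attached to the conversation inside Gmail itself.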
Cut 1 — auto-submit clean POs (opt-in)
- New tool: auto_submit_if_clean(draft_id). When every line of a draft passes the clean gate, the connector submits to QuickBooks without further human review. Default: off. Enable by setting SIDEQUEST_AUTOSUBMIT=true in ~/.qb-distributor-mcp/.env and re-running the env-injection one-liner.
- The clean gate. A draft is "clean" only when: customer is linked to QuickBooks, every line has a real qb_item_id, no line has a low-confidence review flag, no line price is more than price_variance_tolerance off your catalog. Any failure returns {"status": "not_clean", "reasons": [...]} and the draft sits in your queue for manual review.
- Conservative by design. Off by default. Even when called directly, the tool returns "disabled" if the env var isn't set. An override_clean_gate escape hatch exists for operator-reviewed exceptions, but is not the default path.
- Tests: 22 unit tests covering the env-var gate, each clean-gate failure mode, and the structural branches (already-submitted, not-found, not-clean). Full suite passes: 59 tests green.
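The four clean-gate conditions can be sketched as a function that collects reasons rather than stopping at the first failure. The not_clean response shape is quoted from the notes above; the draft dict layout, the reason strings, the clean response shape, the 5% default and the relative-variance comparison are all assumptions for illustration.

```python
def evaluate_clean_gate_sketch(draft: dict, catalog_prices: dict,
                               price_variance_tolerance: float = 0.05) -> dict:
    """Return a clean status only when every condition holds for every line."""
    reasons = []
    if not draft.get("customer_qb_id"):
        reasons.append("customer_not_linked")

    for number, line in enumerate(draft.get("lines", []), start=1):
        item_id = line.get("qb_item_id")
        if not item_id:
            reasons.append(f"line_{number}:no_qb_item_id")
            continue  # nothing to price-check without an item
        if line.get("needs_review"):
            reasons.append(f"line_{number}:flagged_for_review")
        catalog_price = catalog_prices.get(item_id)
        if catalog_price is None:
            reasons.append(f"line_{number}:item_not_in_catalog")
        elif abs(line["unit_price"] - catalog_price) > price_variance_tolerance * catalog_price:
            # Assumed: tolerance is a fraction of the catalog price.
            reasons.append(f"line_{number}:price_variance")

    if reasons:
        return {"status": "not_clean", "reasons": reasons}
    return {"status": "clean"}
```

Collecting every reason in one pass gives the reviewer the full list of what blocked auto-submit, instead of making them fix one issue only to discover the next.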
Self-healing error messages and self-serve install
- Connector: Every tool now returns a customer-friendly error message instead of a raw Python traceback. Errors include a likely fix and a link to the diagnose prompt. Recognized patterns: missing env vars, expired QuickBooks refresh token, Gmail OAuth token revoked, customer-not-linked on submit.
- Site: New Install prompt page. Paste one block into Claude Desktop and Claude becomes your implementation specialist. About 25 minutes start to finish, no docs reading required.
- Site: New Diagnose page. Paste one block when something's broken and Claude walks you through the troubleshooting playbook. Covers seven of the most common failure modes.
- Site: New Terminal basics guide for non-technical users. Five skills, three minutes to read, Mac and Windows side by side.
- Site: New PO Time Calculator showing the annual cost of manual PO entry with your inputs.
- Mobile: Full hamburger menu drawer on every page. Previously the nav hid everything but the CTA on phones.
- SEO: Fixed a one-letter typo in the sitemap namespace that had caused Google to reject the sitemap. It now shows Success with all pages discovered.
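As a rough picture of the connector change in this release, the sketch below maps a raised exception to a friendly message plus a likely fix instead of a traceback. The patterns, wording, and the friendly_error name are all illustrative assumptions. The real connector's matching rules are not documented here.

```python
import re

# (pattern, friendly message, likely fix): every entry is illustrative.
RULES = [
    (re.compile(r"KeyError: '[A-Z0-9_]+'"),
     "A required setting is missing.",
     "Re-run the env-injection step so the connector can see your credentials."),
    (re.compile(r"invalid_grant"),
     "A saved login has expired or was revoked.",
     "Reconnect the account through the setup wizard."),
    (re.compile(r"customer.not.linked", re.IGNORECASE),
     "This customer isn't linked to QuickBooks yet.",
     "Link the customer, then submit again."),
]


def friendly_error(exc: Exception) -> dict:
    """Return a customer-facing error dict for the first matching rule."""
    text = f"{type(exc).__name__}: {exc}"
    for pattern, message, fix in RULES:
        if pattern.search(text):
            return {"error": message, "likely_fix": fix}
    # Unrecognized errors still get a friendly fallback, never a traceback.
    return {"error": "Something went wrong.",
            "likely_fix": "Paste the Diagnose prompt into Claude Desktop."}
```

A tool wrapper would catch exceptions at its boundary and return friendly_error(exc) to the caller.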
Working OAuth flow + zip rebuild
- Setup wizard: Rewrote cli.py for the actual working OAuth flow. QuickBooks now connects via Intuit's hosted OAuth Playground (the old wizard hung forever because Intuit silently rejects localhost redirect URIs).
- Gmail OAuth: Documented the "add second secret" workaround — Google's first download of client_secret.json is missing the actual secret value.
- Env injection: Setup now writes credentials directly into Claude Desktop's config env block, since the MCP server doesn't read .env from cwd.
- Docs: Full rewrite of quick-start, welcome kit, FAQ, and operator runbook to match the working flow.
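The env-injection step boils down to merging credentials into the env block of the connector's entry in Claude Desktop's JSON config. Here is a minimal sketch, assuming the usual mcpServers layout of that file. The inject_env name and the "qb-distributor" server name in the usage example are hypothetical, and the real setup code may differ.

```python
import json
from pathlib import Path


def inject_env(config_path: Path, server_name: str, credentials: dict) -> dict:
    """Merge credentials into mcpServers[server_name]['env'] and save the file.

    Existing keys elsewhere in the config are left untouched.
    """
    if config_path.exists() and config_path.read_text().strip():
        config = json.loads(config_path.read_text())
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    server = servers.setdefault(server_name, {})
    server.setdefault("env", {}).update(credentials)
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

For example, inject_env(path, "qb-distributor", {"SIDEQUEST_AUTOSUBMIT": "true"}) would add that one variable while preserving any other servers already configured. Claude Desktop typically needs a restart to pick up the change.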
Marketing site online
- Launched sidequestautomation.com with quick-start, pricing, demo, and welcome kit pages.
- Stripe payment links for Starter, Growth, Scale, Unlimited tiers.
- Free tier for the first 20 POs per month with no credit card required.
What's next
Honest roadmap. No vaporware.
- QuickBooks Desktop bridge. For distributors who run QB Desktop instead of QB Online. Targeting Q3 2026.
- Direct Shopify API integration. Skip the email step for ecommerce orders. Targeting Q3 2026.
- Microsoft 365 / Outlook reader. Same shape as the Gmail reader, but for distributors on Outlook. Design doc due first.
- Structured OCR confidence. Replace the single OCR confidence score with subscores for text legibility, table structure, and quantity-vs-price disambiguation so flagged-for-review lines get a more specific reason.