Commit 33e6af8

Douglas Jones committed
dispatch: May 21 — scan perf, error hardening, gate infrastructure
1 parent f0983f1 commit 33e6af8

4 files changed (239 additions & 0 deletions in total)

Lines changed: 98 additions & 0 deletions
# Quill Readout — 2026-05-21: Scan Performance, Error Hardening & Gate Infrastructure

## Session Summary

A field test exposed three production bugs. All three were diagnosed, fixed, and deployed in the same session. The team also built out the NFR/KPI gate infrastructure, hardened error handling across the entire codebase, and shipped two new admin dashboards.

---

## Part 1: Field Test — Five Scans, Five Failures

The user walked outside and scanned the sign in front of their house five times. All five failed. The backend was healthy — OCR confidence 0.997, sign matched, rules evaluated correctly. The failures were entirely client-side.

### Bug 1: Scan Confidence Gate (iOS Client)

**Root cause:** Every scan that produced a valid `inline_evaluation` also came back with `selected_match: null`. The chain was as follows. `cacheableMatchSignId` was gated on `canSelectTopCandidate`. `canSelectTopCandidate` was always false when `assessCaptureSuggestion` returned `canOpenSuggestedSign: false`, and `assessCaptureSuggestion` always returns that when `hasInlineEvaluation: true`. With no selected match, the iOS client's `compositeConfidence()` computed `dbScore = 0.0`. That yielded a composite of about 0.45, below the 0.80 gate, so the client returned `.lowConfidence` on every scan despite near-perfect OCR.

**Fix (server):** Decoupled `cacheableMatchSignId` from `canSelectTopCandidate`. `selected_match` now reflects "did we identify a sign?" independently of whether the UI should show a suggested sign button.
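
Here is a minimal TypeScript sketch of the server-side decoupling. The flag names `cacheableMatchSignId`, `canSelectTopCandidate`, and `canOpenSuggestedSign` come from the description above. Everything else is a hypothetical stand-in, not the actual code in `route.ts`: the `CaptureAssessment` shape, `topCandidateSignId`, `show_suggested_sign`, and both builder functions.

```typescript
// Hypothetical, simplified view of what the capture route knows after matching.
// Only the flag names come from the readout; the shape is assumed.
interface CaptureAssessment {
  topCandidateSignId: string | null; // best sign match, if any (assumed field)
  canSelectTopCandidate: boolean;
  canOpenSuggestedSign: boolean;
  hasInlineEvaluation: boolean;
}

interface SelectionFields {
  selected_match: { sign_id: string } | null;
  show_suggested_sign: boolean; // hypothetical name for the UI button flag
}

// Before: the cacheable match was gated on canSelectTopCandidate. That flag was
// false whenever an inline evaluation existed, so selected_match came back null.
function buildSelectionFieldsBefore(a: CaptureAssessment): SelectionFields {
  const cacheableMatchSignId = a.canSelectTopCandidate ? a.topCandidateSignId : null;
  return {
    selected_match: cacheableMatchSignId ? { sign_id: cacheableMatchSignId } : null,
    show_suggested_sign: a.canOpenSuggestedSign,
  };
}

// After: "did we identify a sign?" depends only on the match itself, while the
// UI flag keeps following the suggestion logic.
function buildSelectionFields(a: CaptureAssessment): SelectionFields {
  const cacheableMatchSignId = a.topCandidateSignId;
  return {
    selected_match: cacheableMatchSignId ? { sign_id: cacheableMatchSignId } : null,
    show_suggested_sign: a.canOpenSuggestedSign,
  };
}
```

When an inline evaluation is present, the old builder drops the match and the new one keeps it. The suggested-sign flag stays `false` in both cases.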

**Fix (client):** When `inline_evaluation` is present, treat `dbScore = 1.0`. The server has already matched the sign and evaluated its rules, which implies a high-confidence match.
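
This sketch shows the arithmetic behind the failure. It is written in TypeScript to match the other examples, although the real logic lives in the Swift client. The two-term weighted average and its 0.45/0.55 weights are assumptions, chosen only because they reproduce the reported composite of about 0.45 at OCR confidence 0.997. `selectedMatchScore`, `isLowConfidence`, and the `applyFix` toggle are hypothetical.

```typescript
// ASSUMPTION: composite = weighted average of OCR and DB-match confidence.
// The weights are illustrative, chosen to reproduce the ~0.45 composite seen
// in the field test (0.45 * 0.997 = 0.44865); the real weights may differ.
const OCR_WEIGHT = 0.45;
const DB_WEIGHT = 0.55;
const CONFIDENCE_GATE = 0.8;

interface ScanResponse {
  ocrConfidence: number;             // 0.997 in the field test
  selectedMatchScore: number | null; // hypothetical; null when selected_match is null
  hasInlineEvaluation: boolean;      // inline_evaluation present in the response
}

// The fix: an inline evaluation means the server matched a sign and ran its
// rules, so the DB component is treated as fully confident.
function dbScore(r: ScanResponse, applyFix = true): number {
  if (applyFix && r.hasInlineEvaluation) return 1.0;
  return r.selectedMatchScore ?? 0.0;
}

function compositeConfidence(r: ScanResponse, applyFix = true): number {
  return OCR_WEIGHT * r.ocrConfidence + DB_WEIGHT * dbScore(r, applyFix);
}

// True means the scan would be rejected as .lowConfidence.
function isLowConfidence(r: ScanResponse, applyFix = true): boolean {
  return compositeConfidence(r, applyFix) < CONFIDENCE_GATE;
}
```

On the field-test input, the old path gives 0.45 × 0.997 ≈ 0.449, which is below the 0.80 gate. With the override, the composite rises to about 0.999.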

### Bug 2: Rule Engine Permit Exception (Backend)

**Root cause:** An earlier commit had introduced `isPermitOnly` logic that treated any `no_parking` rule with `permit_types` as non-restrictive. This is semantically backwards. `permit_types` on a `no_parking` rule means "No Parking EXCEPT permit holders," so regular users are still restricted. The sign in front of the house has `permit_types: ["I"]` on its evening no-parking rule, and the rule engine was returning "green" instead of "red."

**Fix:** Removed `isPermitOnly` entirely. `isRestricted` is now simply true when the winning rule is `no_parking` or `no_standing`. This also fixed 40 parity/rule-engine tests that had been failing unnoticed.
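
Here is a before/after sketch of the rule-engine change. It assumes a stripped-down `Rule` shape with only `kind` and `permit_types`, and the function names are hypothetical. The point is that `permit_types` names who is exempt, not whether the rule restricts anyone else.

```typescript
// Hypothetical rule shape: `kind` values and `permit_types` come from the
// readout; the real engine's rule model carries more fields (times, days, etc.).
interface Rule {
  kind: string; // e.g. "no_parking", "no_standing"; other kinds exist but are not modeled
  permit_types?: string[];
}

// Removed logic, sketched: "No Parking EXCEPT permit holders" counted as unrestricted.
function isRestrictedBefore(winning: Rule): boolean {
  const isPermitOnly = winning.kind === "no_parking" && (winning.permit_types ?? []).length > 0;
  return (winning.kind === "no_parking" || winning.kind === "no_standing") && !isPermitOnly;
}

// Fix: permit_types lists who is exempt; everyone else is still restricted.
function isRestricted(winning: Rule): boolean {
  return winning.kind === "no_parking" || winning.kind === "no_standing";
}
```

A restricted winning rule maps to a "red" verdict, so the evening `permit_types: ["I"]` rule now comes back red.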

### Bug 3: isComplexSign Over-Detection (Backend)

**Root cause:** The `isComplexSign` detection flagged any sign with arrows (→, ←), multiple NO PARKING mentions, or permit exceptions as needing Gemini visual arrow detection. That routed every multi-panel sign, including the common 3-panel Raleigh R7-108, to the 6-10s hybrid Gemini path instead of the ~1s client OCR fast path.

**Fix:** If Vision extracted complete text (more than 80 characters and containing time ranges), trust it regardless of complexity indicators. The arrow symbols in the extracted text are sufficient for panel parsing. Gemini only needs to see the image when the text is short or incomplete.

**Result:** First-scan latency drops from ~10s to ~3-4s on multi-panel signs. Subsequent scans (fast-match cache) remain at ~1-2s.
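
This sketch shows the revised routing under stated assumptions. The 80-character threshold comes from the fix described above. Everything else is illustrative, not the production `isComplexSign` code: the time-range regex, the complexity patterns, `chooseOcrPath`, and the choice to keep incomplete-but-simple text on the fast path.

```typescript
const MIN_COMPLETE_TEXT_LENGTH = 80; // threshold from the readout

// ASSUMPTION: a simple time-range matcher such as "7AM - 9AM" or "4:30 PM TO 6 PM".
const TIME_RANGE = /\b\d{1,2}(?::\d{2})?\s*(?:AM|PM)?\s*(?:-|TO)\s*\d{1,2}(?::\d{2})?\s*(?:AM|PM)\b/i;

// Complexity indicators as described in the root cause (patterns are assumed).
function isComplexSign(text: string): boolean {
  const hasArrows = /[→←]/.test(text);
  const noParkingMentions = (text.match(/NO PARKING/gi) ?? []).length;
  const hasPermitException = /EXCEPT\s+PERMIT/i.test(text);
  return hasArrows || noParkingMentions > 1 || hasPermitException;
}

function visionTextIsComplete(text: string): boolean {
  return text.trim().length > MIN_COMPLETE_TEXT_LENGTH && TIME_RANGE.test(text);
}

type OcrPath = "client_ocr_fast_path" | "gemini_hybrid";

// New routing: complete Vision text is trusted even when the sign looks complex.
// How incomplete-but-simple text is routed is an assumption here.
function chooseOcrPath(visionText: string): OcrPath {
  if (visionTextIsComplete(visionText)) return "client_ocr_fast_path";
  return isComplexSign(visionText) ? "gemini_hybrid" : "client_ocr_fast_path";
}
```

A three-panel sample can trip every old complexity indicator and still stay on the fast path, because Vision returned complete text. A short fragment with an arrow still escalates to Gemini.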

---

## Part 2: Error Handling Audit — 29 Silent Catches Eliminated

A code review revealed 29 instances of `catch { return null; }` across the codebase. This pattern hides failures and makes debugging impossible. It was also the proximate cause of the fast-match cache silently failing on Vercel cold starts. As a result, every scan fell back to full OCR with no log evidence.
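
This sketch shows the pattern and its fix using `JSON.parse` on untrusted input, the largest category in the audit. `parseRequestBody` is a hypothetical name. The shape mirrors the change the audit describes: keep the same fallback value and add a log line at the level assigned to that category.

```typescript
// Before: a parse failure looks exactly like "no body", and nothing reaches the logs.
function parseRequestBodyBefore(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return null;
  }
}

// After: the fallback is unchanged, but the failure leaves a trace.
// Untrusted input logs at debug level; per the audit, Supabase query
// failures use console.error instead.
function parseRequestBody(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch (err) {
    console.debug("parseRequestBody: invalid JSON, returning null", {
      error: err instanceof Error ? err.message : String(err),
      length: raw.length,
    });
    return null;
  }
}
```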

Every silent catch was classified and fixed. Twenty-eight now log (at debug, warn, or error level depending on the path), and one stays silent under an explicit, documented approval:

| Category | Count | Fix |
|----------|-------|-----|
| JSON.parse on user/API input | 6 | `console.debug()` before returning 400/null |
| Image processing fallbacks (sharp) | 4 | `console.warn()` before fallback |
| Fingerprint computation | 2 | `console.warn()` — non-critical deduplication |
| Edge Config unavailable | 2 | `console.debug()` — infrastructure optional |
| OCR JSON parsing | 3 | `console.warn()` with raw response preview |
| Supabase query failures | 4 | `console.error()` — on paths that matter |
| Sign registry DB | 2 | `console.error()` — critical path |
| Rate limit KV | 1 | `console.warn()` — documented fallback |
| Misc (segments, nearby spots, cron) | 4 | `console.warn()` / `console.error()` |
| **Approved silent** | 1 | `SILENT-CATCH-APPROVED: Winston` on the `tryParseJson` retry loop |

The coding standards were updated in both DecodeTheSign and agentic-stage-gate-governance to codify the rule: every catch must log or rethrow. A silent catch requires a `SILENT-CATCH-APPROVED` comment with the architect's name, the date, and a specific justification.
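
This sketch shows what an approved silent catch can look like. It is not the actual `tryParseJson` in `extract.ts`. The candidate-generation strategy, the exact comment format, and the `YYYY-MM-DD` date placeholder are all assumptions. It illustrates the one case where silence is defensible: a retry loop in which an individual failure is expected. The caller still logs when every attempt fails.

```typescript
// Hypothetical candidate generator: the raw text first, then the span from the
// first "{" to the last "}" (models sometimes wrap JSON in prose).
function jsonCandidates(raw: string): string[] {
  const candidates = [raw.trim()];
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start !== -1 && end > start) candidates.push(raw.slice(start, end + 1));
  return candidates;
}

// Returns undefined when no candidate parses; the caller decides whether to log.
function tryParseJson(raw: string): unknown {
  for (const candidate of jsonCandidates(raw)) {
    try {
      return JSON.parse(candidate);
    } catch {
      // SILENT-CATCH-APPROVED: Winston, YYYY-MM-DD. A failed candidate is an
      // expected step in the retry loop; the caller logs if every candidate fails.
    }
  }
  return undefined;
}

// Caller: the final failure is logged with a raw preview, as in the OCR JSON row.
function parseModelJson(raw: string): unknown {
  const parsed = tryParseJson(raw);
  if (parsed === undefined) {
    console.warn("ocr: model response was not valid JSON", { preview: raw.slice(0, 120) });
  }
  return parsed;
}
```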

---

## Part 3: NFR & KPI Gate Infrastructure

### NFR Document Expanded

`docs/NFRS.md` now has 30 NFRs across 6 categories (Performance, Accuracy, Reliability, Security, Accessibility, Scalability) with measurement methods, gate assignments, and an automated gate assertions table mapping 6 NFRs to passing CI tests.

### Admin Dashboards

Two new admin dashboards deployed to decodethesign.com:

**`/admin/nfr-dashboard`** — Live pass/fail gate status for all NFRs and KPIs. Automated items (CI-backed) show PASS immediately. Manual items show PENDING with a checklist of what evidence is needed before G4 closes.
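
This sketch shows one way the dashboard's status logic could be modeled. The `GateItem` shape and the helper names are hypothetical. So are two behaviors the readout does not specify: a failing CI test shows FAIL, and a manual item moves to PASS once its checklist is complete.

```typescript
type GateStatus = "PASS" | "FAIL" | "PENDING";

// Hypothetical item shape for one NFR or KPI row on the dashboard.
interface GateItem {
  id: string;
  automated: boolean;          // backed by a CI test in the gate assertions table
  ciPassing?: boolean;         // latest CI result (automated items)
  evidenceRequired?: string[]; // checklist shown for manual items
  evidenceProvided?: string[];
}

function missingEvidence(item: GateItem): string[] {
  const provided = new Set(item.evidenceProvided ?? []);
  return (item.evidenceRequired ?? []).filter((e) => !provided.has(e));
}

// ASSUMPTIONS: a failing CI test shows FAIL, and a manual item flips to PASS
// once its whole checklist is satisfied.
function gateStatus(item: GateItem): GateStatus {
  if (item.automated) return item.ciPassing ? "PASS" : "FAIL";
  const hasChecklist = (item.evidenceRequired ?? []).length > 0;
  return hasChecklist && missingEvidence(item).length === 0 ? "PASS" : "PENDING";
}

// The gate (G4 in the readout) can close only when every item passes.
function gateCanClose(items: GateItem[]): boolean {
  return items.every((item) => gateStatus(item) === "PASS");
}
```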

**`/admin/adoption`** — DAU/WAU/MAU, 7-day retention, engagement depth (median scans/user, power users), DAU/scans bar charts (30d), geographic spread, user level distribution, top contributors leaderboard.
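
The adoption metrics have several common definitions, and this sketch picks one set as an assumption. Scans are assumed to be pre-bucketed to UTC day numbers. DAU/WAU/MAU are trailing 1/7/30-day windows, and day-7 retention is measured against each user's first scan. The event shape and function names are hypothetical, not the dashboard's actual queries.

```typescript
// Hypothetical event shape: one row per scan, pre-bucketed to a UTC day number.
interface ScanEvent {
  userId: string;
  day: number; // days since an arbitrary epoch
}

// Distinct users with at least one scan in the trailing window ending on `today`.
function activeUsers(events: ScanEvent[], today: number, windowDays: number): number {
  const users = new Set<string>();
  for (const e of events) {
    if (e.day > today - windowDays && e.day <= today) users.add(e.userId);
  }
  return users.size;
}

function dauWauMau(events: ScanEvent[], today: number) {
  return {
    dau: activeUsers(events, today, 1),
    wau: activeUsers(events, today, 7),
    mau: activeUsers(events, today, 30),
  };
}

// Day-7 retention (assumed definition): among users whose first scan was at least
// 7 days before `today`, the share who scanned again 7 or more days after it.
function day7Retention(events: ScanEvent[], today: number): number {
  const firstDay = new Map<string, number>();
  for (const e of events) {
    const prev = firstDay.get(e.userId);
    if (prev === undefined || e.day < prev) firstDay.set(e.userId, e.day);
  }
  let eligible = 0;
  firstDay.forEach((d) => {
    if (d <= today - 7) eligible += 1;
  });
  const retained = new Set<string>();
  for (const e of events) {
    const first = firstDay.get(e.userId)!;
    if (first <= today - 7 && e.day >= first + 7 && e.day <= today) retained.add(e.userId);
  }
  return eligible === 0 ? 0 : retained.size / eligible;
}

// Median of per-user scan counts (averages the two middle values for even counts).
function medianScansPerUser(events: ScanEvent[]): number {
  const counts = new Map<string, number>();
  for (const e of events) counts.set(e.userId, (counts.get(e.userId) ?? 0) + 1);
  const sorted = Array.from(counts.values()).sort((a, b) => a - b);
  if (sorted.length === 0) return 0;
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid]! : (sorted[mid - 1]! + sorted[mid]!) / 2;
}
```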

### Governance Project Updated

`agentic-stage-gate-governance` received:

- `templates/NFR-TEMPLATE.md` — generic reusable NFR template with all 6 categories
- `steering/05-nfr-kpi-mandate.md` — expanded with field-tested guidance on cold/warm latency tiers, parity tests, and graceful degradation specificity
- `steering/03-coding-standards.md` — error handling rule added to both projects

---

## Deployment Status

| Surface | Status |
|---------|--------|
| decodethesign.com | ✅ Deployed |
| iPhone 16 Pro Max | ✅ Installed |
| GitHub (main) | ✅ Pushed — HEAD `12e5275d` |

---

## What Quill Noticed

The session started with a field test failure and ended with a codebase that is measurably more observable. The three bugs were independent (a client confidence gate, a rule engine semantic error, and a latency regression), but they shared a common thread: silent failures. The confidence gate failed silently, with no log. The rule engine returned wrong verdicts silently; its 40 failing tests had gone unnoticed until today. The fast-match cache failed the same way, swallowed by `catch { return null; }`.

The error handling audit was the right call. Twenty-nine silent catches is not a small number. It means twenty-nine places where the system could fail and nobody would know. The fix is not just the logging. It is the standard that prevents the next twenty-nine from being written.

The scan latency fix is also worth noting. The `isComplexSign` detection was written with good intent: route complex signs to better visual analysis. But it was too aggressive, and the cost was paid on every multi-panel scan. The fix is precise: trust Vision when it did a good job, and escalate to Gemini only when it didn't. That's the right tradeoff.

TestFlight is next.

Lines changed: 93 additions & 0 deletions
glyph: quill
kind: readout
date: "2026-05-21"
title: "Scan Performance, Error Hardening & Gate Infrastructure"
scope:
  - decodethesign-web
  - decodethesign-ios
  - agentic-stage-gate-governance
tags:
  - bug-fix
  - performance
  - error-handling
  - nfr
  - admin-dashboard
  - testflight-prep
summary: >
  Field test exposed three production bugs: iOS confidence gate failing on every scan,
  rule engine returning wrong verdicts for permit-exception signs, and isComplexSign
  over-detection routing all multi-panel signs to 10s Gemini path. All three fixed and
  deployed. 29 silent catch blocks eliminated across 17 files — every catch now logs
  or rethrows. NFR/KPI gate infrastructure built: 30 NFRs documented, two admin
  dashboards deployed (/admin/nfr-dashboard, /admin/adoption), governance project
  updated with NFR template and error handling standard.
artifacts:
  - path: decodethesign/app/api/consumer/live/[liveId]/capture/route.ts
    action: modified
    notes: "Decouple selected_match from canSelectTopCandidate; fix isComplexSign; log silent catches"
  - path: decodethesign/ios-native/DecodeTheSignPackage/Sources/DecodeTheSignFeature/Services/ScanResultParser.swift
    action: modified
    notes: "inline_evaluation present implies dbScore=1.0"
  - path: decodethesign/lib/rule-engine.ts
    action: modified
    notes: "Remove isPermitOnly — no_parking with permit_types is still red"
  - path: decodethesign/lib/consumer-capture-recent-match.ts
    action: modified
    notes: "Log fast-match query failures instead of swallowing"
  - path: decodethesign/lib/consumer-crowdsource-intake.ts
    action: modified
    notes: "Log sharp and fingerprint failures"
  - path: decodethesign/lib/sign-registry.ts
    action: modified
    notes: "Log DB query failures on critical path"
  - path: decodethesign/lib/rate-limit.ts
    action: modified
    notes: "Log KV unavailability"
  - path: decodethesign/lib/ocr/arrow-detect.ts
    action: modified
    notes: "Log Gemini JSON parse failures with raw preview"
  - path: decodethesign/lib/ocr/extract.ts
    action: modified
    notes: "Log tiebreaker failures; SILENT-CATCH-APPROVED for tryParseJson"
  - path: decodethesign/lib/ocr/prior-sign-hashes.ts
    action: modified
    notes: "Log Supabase query failure"
  - path: decodethesign/lib/ocr/reconsensus.ts
    action: modified
    notes: "Log hash computation failure"
  - path: decodethesign/lib/edge-config.ts
    action: modified
    notes: "Log Edge Config unavailability at debug level"
  - path: decodethesign/app/admin/nfr-dashboard/page.tsx
    action: created
    notes: "Live NFR & KPI gate dashboard"
  - path: decodethesign/app/admin/adoption/page.tsx
    action: created
    notes: "DAU/WAU/MAU, retention, engagement, geo spread"
  - path: decodethesign/docs/NFRS.md
    action: expanded
    notes: "30 NFRs, automated gate assertions table"
  - path: decodethesign/.kiro/steering/03-coding-standards.md
    action: modified
    notes: "Error handling rule: no silent catches"
  - path: agentic-stage-gate-governance/templates/NFR-TEMPLATE.md
    action: created
    notes: "Generic reusable NFR template"
  - path: agentic-stage-gate-governance/steering/03-coding-standards.md
    action: modified
    notes: "Error handling rule added"
  - path: agentic-stage-gate-governance/steering/05-nfr-kpi-mandate.md
    action: expanded
    notes: "Field-tested guidance on latency tiers, parity tests, graceful degradation"
bugs_fixed:
  - id: scan-confidence-gate
    severity: critical
    description: "inline_evaluation present caused dbScore=0.0, composite 0.45 < 0.80 gate"
  - id: rule-engine-permit-types
    severity: critical
    description: "no_parking with permit_types returned green instead of red — 40 tests broken"
  - id: isComplexSign-over-detection
    severity: high
    description: "All multi-panel signs routed to 10s Gemini path instead of 1s client OCR"
silent_catches_fixed: 29
tests_passing: 1007

Lines changed: 35 additions & 0 deletions
# Quill Session Close — 2026-05-21

## Session Outcome

Three production bugs diagnosed and fixed from a field test. 29 silent catch blocks eliminated. NFR/KPI gate infrastructure built. Two admin dashboards deployed. Error handling standard codified in both projects. All 1007 tests passing. Session state saved.

## Repositories Pushed

| Repo | HEAD | Status |
|------|------|--------|
| decodethesign | `12e5275d` | ✅ Pushed to origin/main + deployed to decodethesign.com |
| agentic-stage-gate-governance | `7dd6074` | ✅ Pushed to origin/trunk |

## Commits This Session (decodethesign)

| Hash | Description |
|------|-------------|
| `518edaa4` | fix: scan confidence gate — inline_evaluation implies dbScore=1.0 |
| `225be514` | fix(rule-engine): no_parking with permit_types is still red |
| `8f1b3899` | feat(ios): share location via Messages, tell-a-friend unified, profile polish |
| `6b5d4548` | docs: expand NFRs with TestFlight gate |
| `b2c4936f` | feat(admin): NFR & KPI dashboard |
| `d1bfc789` | feat(admin): adoption dashboard |
| `a1221304` | perf: fix isComplexSign over-detection |
| `ddb34267` | standards: no silent catch |
| `47673651` | fix: eliminate all silent catch blocks |
| `12e5275d` | session: save state |

## Open Items for Next Session

1. **TestFlight submission** — Xcode Archive → App Store Connect → TestFlight
2. **Manual NFR evidence** — 14 items pending on `/admin/nfr-dashboard`
3. **More screenshots** — drop into `assets/`
4. **Supabase type regeneration** — remove `ignoreBuildErrors: true`

Lines changed: 13 additions & 0 deletions
glyph: quill
kind: session-close
date: "2026-05-21"
title: "Session Close — Scan Perf, Error Hardening & Gate Infrastructure"
outcome: success
repos_pushed:
  - decodethesign
  - agentic-stage-gate-governance
next_session:
  - TestFlight submission (Xcode Archive → App Store Connect)
  - Manual NFR evidence (14 items pending on /admin/nfr-dashboard)
  - More app screenshots
  - Supabase type regeneration
