diff --git a/rag-agentic-dashboard/public/veridical-week6.html b/rag-agentic-dashboard/public/veridical-week6.html
new file mode 100644
index 00000000..4c6baa5c
--- /dev/null
+++ b/rag-agentic-dashboard/public/veridical-week6.html
@@ -0,0 +1,570 @@
Week 6 delivered the single largest accuracy improvement of the programme: Cohere Rerank v3 integration lifted retrieval accuracy from 88.2% to 92.5% (+4.3 pp) in production A/B testing, surpassing the 92% North Star target four weeks ahead of the Week 10 gate. The reranker was deployed via a 50/50 traffic split with automatic rollback criteria; no rollback was triggered. Budget remains well controlled at $638K of $1.42M (44.9% consumed at 50% schedule completion). The Operations department was onboarded, bringing the total pilot user base to 438 across five departments.
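
The report does not say how the 50/50 split or the automatic rollback criteria were implemented. The following is only a minimal sketch, assuming deterministic hash-based bucketing and two purely hypothetical rollback thresholds (`max_acc_drop_pp` and `max_p95_s`); the function names and threshold values are illustrative and do not come from the Veridical codebase.

```python
import hashlib


def assign_arm(request_key: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user or session key to an A/B arm."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


def should_rollback(
    control_acc: float,
    treatment_acc: float,
    treatment_p95_s: float,
    *,
    max_acc_drop_pp: float = 1.0,  # hypothetical threshold, not from the report
    max_p95_s: float = 1.5,        # hypothetical threshold, not from the report
) -> bool:
    """Return True if the treatment arm breaches either illustrative guardrail."""
    accuracy_regressed = (control_acc - treatment_acc) > max_acc_drop_pp
    too_slow = treatment_p95_s > max_p95_s
    return accuracy_regressed or too_slow


# Week 6 figures from the report: 88.2% control, 92.5% treatment, P95 1.21 s.
rollback_needed = should_rollback(88.2, 92.5, 1.21)
```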

| Track | Completion | Status | Week 6 Highlight |
|---|---|---|---|
| Infrastructure | 58% | GREEN | AKS reranker endpoint deployed; Pinecone 3.6M vectors; vector quantisation at 65% |
| Ingestion | 52% | GREEN | 1.06M docs indexed; 15,100 docs/hr (new high); Operations corpus integrated |
| Retrieval | 55% | GREEN | BREAKTHROUGH: 92.5% accuracy post-reranker; P95 1.21s |
| Governance | 42% | GREEN | ISO 42001 at 72%; reranker confidence scores in audit trail |

| Domain | Control | Treatment | Delta | Commentary |
|---|---|---|---|---|
| Legal | 85.5% | 90.8% | +5.3 pp | Highest lift — excels at multi-clause queries |
| Finance | 87.8% | 92.6% | +4.8 pp | Second-highest lift; financial reporting queries |
| Operations | 87.2% | 91.4% | +4.2 pp | New department; strong baseline from reranker |
| Compliance | 89.4% | 93.2% | +3.8 pp | Strong regulatory disambiguation |
| Engineering | 89.1% | 92.8% | +3.7 pp | Consistent improvement across tech docs |
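
The Delta column can be re-derived from the Control and Treatment columns. The sketch below recomputes each lift and adds two derived figures that are not in the report: the unweighted mean lift across domains, and the relative error reduction implied by the headline 88.2% to 92.5% result. The small gap between the unweighted mean and the headline +4.3 pp is presumably down to per-domain traffic weighting, which is an assumption.

```python
# Control and treatment accuracy (%) per domain, copied from the A/B table.
AB_RESULTS = {
    "Legal": (85.5, 90.8),
    "Finance": (87.8, 92.6),
    "Operations": (87.2, 91.4),
    "Compliance": (89.4, 93.2),
    "Engineering": (89.1, 92.8),
}


def lift_pp(control: float, treatment: float) -> float:
    """Absolute lift in percentage points, rounded to one decimal place."""
    return round(treatment - control, 1)


def relative_error_reduction(control: float, treatment: float) -> float:
    """Fraction of control-arm errors removed in the treatment arm."""
    return (treatment - control) / (100.0 - control)


lifts = {domain: lift_pp(c, t) for domain, (c, t) in AB_RESULTS.items()}
ranked = sorted(lifts, key=lifts.get, reverse=True)  # highest lift first
mean_lift = sum(lifts.values()) / len(lifts)         # unweighted, ~4.36 pp

# Headline figures from the executive summary: 88.2% -> 92.5%.
headline_reduction = relative_error_reduction(88.2, 92.5)  # ~0.36

for domain in ranked:
    print(f"{domain:<12} +{lifts[domain]:.1f} pp")
```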

| Category | Spent | Budget % | Commentary |
|---|---|---|---|
| Cloud Infrastructure | $212K | 48.2% | Reranker AKS endpoint +$8K/mo; quantisation offsets storage growth |
| Pinecone Vector DB | $89K | 42.1% | Storage costs stabilising via quantisation; query volume offset by efficiency |
| LLM API (OpenAI + Cohere) | $48K | 28.2% | Cohere Enterprise $1K/mo activated; GPT-4o-mini at 77% |
| Personnel | $258K | 46.0% | On track; 2 extra engineer-days for reranker sprint |
| Tooling & Licensing | $23K | 32.4% | Cohere license + monitoring upgrades for reranker observability |
| Contingency | $8K | 5.7% | Reserve healthy at $133K |
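
As a cross-check, the category spend figures sum to the $638K headline, and the 44.9% consumption can be compared against schedule progress. This is a small sketch; the 12-week programme length is inferred from the Week 12 release milestone and the "50% schedule completion" statement, and `burn_ratio` is an illustrative name rather than a metric defined in the report.

```python
# Spend to date in $K, copied from the cost table.
SPENT_K = {
    "Cloud Infrastructure": 212,
    "Pinecone Vector DB": 89,
    "LLM API (OpenAI + Cohere)": 48,
    "Personnel": 258,
    "Tooling & Licensing": 23,
    "Contingency": 8,
}
TOTAL_BUDGET_K = 1420       # $1.42M programme budget
WEEK, TOTAL_WEEKS = 6, 12   # assumption: 12-week programme

total_spent_k = sum(SPENT_K.values())       # 638
consumed = total_spent_k / TOTAL_BUDGET_K   # ~0.449
schedule = WEEK / TOTAL_WEEKS               # 0.5
# Below 1.0 means spend is running behind schedule progress.
burn_ratio = consumed / schedule

print(f"Spent ${total_spent_k}K ({consumed:.1%}) at {schedule:.0%} of schedule")
print(f"Burn ratio: {burn_ratio:.2f}")
```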

| Stage | P50 | P95 | WoW Change (P50 / P95) |
|---|---|---|---|
| Embedding | 41ms | 66ms | -1ms / -2ms |
| Vector Search | 82ms | 138ms | -3ms / -4ms (quantisation) |
| Reranker (NEW) | 38ms | 55ms | New — Cohere Rerank v3 |
| Generation | 615ms | 885ms | -5ms / -5ms |
| End-to-End | 810ms | 1.21s | +30ms / +70ms (reranker add) |
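
The week-over-week column implies last week's latencies, and the per-stage P50 values can be compared with the end-to-end figure. The sketch below does both. Percentiles are not additive across stages, so the "unattributed" remainder is only an indicative figure for orchestration and network overhead, not a measured quantity.

```python
# (P50, P95) in milliseconds, copied from the latency table.
STAGES_MS = {
    "Embedding": (41, 66),
    "Vector Search": (82, 138),
    "Reranker": (38, 55),
    "Generation": (615, 885),
}
END_TO_END_MS = (810, 1210)

# Week-over-week change (P50, P95) in ms; the reranker is new, so it has none.
WOW_CHANGE_MS = {
    "Embedding": (-1, -2),
    "Vector Search": (-3, -4),
    "Generation": (-5, -5),
    "End-to-End": (30, 70),
}


def previous_week(current, change):
    """Recover last week's (P50, P95) from this week's values and the change."""
    return (current[0] - change[0], current[1] - change[1])


current = {**STAGES_MS, "End-to-End": END_TO_END_MS}
last_week = {
    name: previous_week(current[name], delta)
    for name, delta in WOW_CHANGE_MS.items()
}

stage_p50_sum = sum(p50 for p50, _ in STAGES_MS.values())      # 776 ms
unattributed_p50 = END_TO_END_MS[0] - stage_p50_sum            # 34 ms, indicative
reranker_share = STAGES_MS["Reranker"][0] / END_TO_END_MS[0]   # ~4.7% of P50
```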

| Week | Milestone |
|---|---|
| Week 7 | Full corpus portability (3 vendors); vector quantisation 100%; Legal tuning sprint; semantic cache design |
| Week 8 | Semantic cache deployment (targets: P95 0.85–0.95s, 62% hit rate); 1.2M corpus milestone |
| Week 9 | Legal multi-hop synthesis; domain-specific accuracy targets; provenance chain v2 |
| Week 10 | Golden Set accuracy gate (≥92% confirmed); go/no-go for production release |
| Week 12 | Full production release; SOC 2 Type II evidence package; programme retrospective |

The 92.5% accuracy achievement transforms Project Veridical from a technology implementation into a regulatory asset. As retrieval accuracy crosses the production threshold, the strategic question shifts from "Can the system answer correctly?" to "Can the system prove it answered correctly, and can we defend that proof under regulatory scrutiny?"

This is the domain of algorithmic liability: the legal and regulatory framework governing accountability for AI-generated outputs in regulated industries.

The Cohere Rerank v3 integration creates a three-layer audit trail that exceeds current regulatory requirements:

Board implication: Embedding algorithmic liability protections now avoids a $60–$100M retrofit when enforcement of EU AI Act Art. 52 and SEC Rule 10b-5 begins, and positions Veridical as the de facto compliance standard, creating a regulatory moat that competitors would need 12–18 months to replicate.

| Endpoint | Description |
|---|---|
| /api/veridical-week6 | Complete Week 6 report (all sections) |
| /api/veridical-week6/meta | Report metadata & classification |
| /api/veridical-week6/reasoning | Strategic reasoning (architect rationale) |
| /api/veridical-week6/health | Programme health & executive summary |
| /api/veridical-week6/metrics | Key metrics, cost breakdown, benchmarks |
| /api/veridical-week6/risks | Risk landscape (REI 0.09) |
| /api/veridical-week6/next-steps | Week 7 objectives, decisions, look-ahead |
| /api/veridical-week6/ab-test | A/B test results (Cohere v3 production) |
| /api/veridical-week6/visionary | Algorithmic Liability visionary theme |
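
The endpoints above share one path prefix, so a small client helper can build and fetch any section. This is a minimal sketch under stated assumptions: the report does not specify the host, HTTP method, or response format, so the base URL is supplied by the caller, and `fetch_section` assumes a plain GET that returns JSON. Both function names are hypothetical.

```python
import json
from urllib.request import urlopen

REPORT_PATH = "/api/veridical-week6"
# Section names taken from the endpoint table.
SECTIONS = (
    "meta", "reasoning", "health", "metrics",
    "risks", "next-steps", "ab-test", "visionary",
)


def endpoint_url(base_url: str, section=None) -> str:
    """Build the URL for the full report (section=None) or a single section."""
    if section is None:
        return base_url.rstrip("/") + REPORT_PATH
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    return f"{base_url.rstrip('/')}{REPORT_PATH}/{section}"


def fetch_section(base_url: str, section=None, timeout: float = 10.0):
    """GET one endpoint and decode the body as JSON (format is an assumption)."""
    with urlopen(endpoint_url(base_url, section), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_section("http://localhost:3000", "ab-test")` would request the A/B test results, where the host and port are placeholders for wherever the dashboard is served.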