spec 015: pipeline convergence protocol (closes #239) by jeremymanning · Pull Request #250 · ContextLab/llmXive

jeremymanning · 2026-05-29T18:06:15Z

Summary

Implements spec 015, the Pipeline Convergence Protocol (issue #239). Replaces the legacy accumulated-review-points model (≥10 LLMs / ≥5 humans, 0.5/1.0 points) with a convergence-based gate: each reviewable stage runs identify → revise → re-review with its LLM panel and advances only on unanimous panel acceptance within a 3-round cap. If the panel does not converge within the cap, the stage triggers an adaptive kickback to the prior stage with full provenance. Human/personality reviews are advisory only and route through stage-aware triage.

Key behavior (selected FRs)

Convergence engine: R1 identify → R2 revise → R3 re-review; unanimous-acceptance gate; honest converged reporting (FR-016).
FR-012 selective re-review: dissenters always re-review; R1-accepters re-review only when R2 changed a lens-relevant artifact.
FR-011 reviser self-consistency: a second code-level audit call + one corrective re-pass, exception-guarded.
FR-048 living-document batched recompile: render Discussion → sha256 material-change → FR-054 sign-off gate → version DOI; cron auto-triggers but never auto-mints.
HF Inference-API backend removed — HF models run locally via transformers; backend chain is dartmouth → local.
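
The gate described above can be sketched roughly as follows. This is a minimal illustration, not the project's `run_convergence`: the `Reviewer` and `Reviser` callable shapes, the `Result` dataclass, and the name `run_convergence_sketch` are all hypothetical, and the sketch assumes that one "round" is one revise plus re-review cycle. It also simplifies FR-012 by letting R1-accepters keep their acceptance without checking whether a lens-relevant artifact changed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

MAX_ROUNDS = 3  # the 3-round cap named in the summary

# Hypothetical shapes: a reviewer maps an artifact to a list of concerns
# (an empty list means "accept"); a reviser maps (artifact, concerns) to a
# revised artifact.
Reviewer = Callable[[str], list]
Reviser = Callable[[str, list], str]


@dataclass
class Result:
    converged: bool
    rounds_used: int
    artifact: str
    open_concerns: list


def run_convergence_sketch(
    artifact: str,
    panel: dict,
    reviser: Reviser,
    max_rounds: int = MAX_ROUNDS,
) -> Result:
    # R1: every panel member identifies concerns.
    concerns = {name: review(artifact) for name, review in panel.items()}
    rounds = 0
    while any(concerns.values()) and rounds < max_rounds:
        rounds += 1
        open_now = [c for cs in concerns.values() for c in cs]
        # R2: a single revision pass addresses all open concerns.
        artifact = reviser(artifact, open_now)
        # R3: dissenters always re-review; accepters keep their accept
        # (simplified stand-in for the selective re-review rule).
        concerns = {
            name: (panel[name](artifact) if cs else [])
            for name, cs in concerns.items()
        }
    remaining = [c for cs in concerns.values() for c in cs]
    # Honest reporting: converged only if every panel member accepted.
    return Result(not remaining, rounds, artifact, remaining)
```

A caller would treat `converged=False` as the signal to route a kickback rather than advance the stage.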

Hardening in this PR

Fixed 2 latent `return`-inside-`finally` bugs (implementer/publisher) that double-appended run-log entries on the skip path and swallowed re-raised exceptions.
Fixed a real NameError in agents/librarian.py (loop var/body mismatch on the marginal-fallback path), surfaced by the mypy pass.
Introduced LLMXIVE_REPO_ROOT repo-root override (centralized ~60 __file__ climbs) and de-rotted the Phase-3 real-call e2e so it runs hermetically against a synthetic repo (verified: real Specifier+Clarifier run, 95s).
(str, Enum) → StrEnum migration; mypy strict: 213 → 0; ruff check .: clean (repo-wide); offline suite 1232 passed.
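
For readers unfamiliar with the `finally: return` hazard mentioned in this list, here is a generic illustration. It is not the implementer/publisher code (which this page does not show); the function names and the `log` parameter are made up for the example. A `return` inside `finally` discards any exception that is propagating, so a re-raise never reaches the caller.

```python
def buggy(log: list) -> str:
    try:
        raise ValueError("boom")
    finally:
        log.append("run-log entry")
        # A return here silently discards the in-flight ValueError.
        return "skipped"


def fixed(log: list) -> str:
    try:
        raise ValueError("boom")
    finally:
        # Cleanup only, with no return, so the exception propagates.
        log.append("run-log entry")
```

Calling `buggy([])` returns `"skipped"` with no error, while `fixed([])` raises `ValueError` after recording its log entry.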

Verification

ruff check . → All checks passed
mypy src/llmxive → 0 errors (154 files)
offline suite → 1232 passed, 1 skipped, 2 deselected
Phase-3 e2e (real-call) → passes (95s); prompts-check → OK (53 agents)

Note: part 7 of #239 (full sequential end-to-end pipeline run with per-step artifact-quality review) is in progress as follow-up work on this branch.

🤖 Generated with Claude Code

…+ review-model overhaul (#239) Comprehensive Spec Kit specification for umbrella issue #239, grounded in the 2026-05-27 design doc SSoT and a code-verified audit. Covers: the inode-table summarize/desummarize primitive (no silent loss of check-critical elements), the generic identify->revise->re-review convergence engine + adaptive kickback, removal of the point system for unanimous-panel acceptance + advisory triage, per-step ReviewSpec adapters across the whole research + paper track, reviewer calibration (9 domains, held-out generality), end-to-end traversal proof, living-document discussion board, and all 10 audit bug fixes + arXiv resilience. Three scope decisions resolved with maintainer up front (living-doc=full; point cutover=migrate-forward; overflow floor=inode-table pointers). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Five clarifications integrated into the spec (Clarifications + FRs/SCs/scenarios/assumptions):
- Publish target: real public Zenodo/GitHub/site, but a MANDATORY manual maintainer sign-off before every DOI mint for the duration of this spec (new FR-054, SC-014; FR-036/FR-048 updated).
- E2E coverage: all 9 domains traverse end-to-end to posted (FR-045, SC-007).
- Calibration: differential clean-vs-injected test + manual adjudication + adaptive sensitivity tuning (no fixed over-flag % / K) (FR-042, FR-044, SC-005).
- Kickback budget: NO global cap; monotonic-improvement-until-convergence; per-step 3-round cap retained (FR-017, edge case, assumptions).
- Cutover: no posted/done projects exist -> migration applies to in-flight only (FR-025).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

plan.md (Constitution Check: points-removal + no-global-cap tracked as authorized deviations -> constitution amendment task), research.md (10 grounded technical decisions incl. inode-table summarizer format, engine-as-callables, adaptive kickback, manual DOI sign-off, differential calibration), data-model.md (pydantic entities), quickstart.md, and 6 contracts (summarize-api, convergence-engine, reviewspec-registry, review-intake-triage, kickback-record, publisher-signoff). CLAUDE.md SPECKIT ref -> 015. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Organized by user story (US1-US8) with Setup/Foundational/Polish. TDD + real-call + manual-QC tasks included per spec. Dependency chain: summarizer first -> engine -> bug fixes -> review model -> per-step panels -> calibration (9 domains) -> e2e to posted (9 domains, manual DOI sign-off) -> living-doc -> polish. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Closed 4 coverage/underspecification findings from /speckit-analyze (0 remain):
- C1 (HIGH): FR-006 authoring-side overflow routing + paper twins -> T054-T057
- C2 (MED): FR-026 repository_hygiene line-count/gitignore -> T043
- U1 (MED): FR-053 convergence principle encoding -> T007
- U2 (LOW): FR-017 ProgressRecord emission -> T026
Constitution point-conflict (CRITICAL) resolved by explicit amendment task T007.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- T001: new package dirs (convergence/, calibration/, agents/prompts/panels/)
- T002: STATUS.md living progress doc (FR-052)
- T003: Stage.AWAITING_PUBLICATION_SIGNOFF; config CONVERGENCE_MAX_ROUNDS=3 + CONVERGENCE_PER_ROUND_BUDGET_SECONDS=600.
Imports verified.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

New SSoT primitive src/llmxive/tools/summarize.py: summarize()/desummarize() with on-disk inode-table pointer hierarchy. Deterministic no-loss guarantee (URLs/DOIs/ arXiv/citations/FR-SC-task ids/numbers preserved verbatim; full content on disk, recursively paged in). 12 tests pass (7 edge cases + core no-loss + manifest contract + no-dangling-pointer); ruff + mypy clean. Remaining for US1: T009 real-call fidelity, T017 re-point paper_reviewer (SSoT), T018 real-call verification. See STATUS.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

_build_corpus_with_summaries now delegates context reduction to tools/summarize.summarize() (inode-table, no silent truncation), preserving the 1-arg summarize_fn contract + _cached_summarize memoization. Supersedes the old truncate-with-notice fallback (Const. I SSoT). Updated the 2 coupled unit tests to the new behavior (full source recoverable via desummarize); _chunk_corpus + its 3 tests untouched. 24 paper_reviewer + 12 summarizer tests pass; mypy-clean for the changed function. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

tests/real_call/test_summarize_fidelity.py: real qwen3.5-122b summarize_fn over an over-budget doc; desummarize recovers EVERY critical element verbatim (no loss through a real-LLM reduction). PASSED in 334s. US1 (summarizer) fully done & verified: 12 offline + 1 real-call, ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…tion (#239) - T004/T005: convergence/types.py — Severity (ordered + legacy mapping) and the Concern/ConcernResponse/Verdict/ProgressRecord/ConvergenceResult/KickbackRecord/ TriageRecord pydantic models + Reviewer/Reviser Protocols + ReviewSpec dataclass. - T006: tests/contract/test_convergence_types.py (7 pass; ruff + mypy clean). - T007: constitution -> v1.1.0; added Principle VI (Convergent Review, NON-NEGOTIABLE), replaced the point-based Review-thresholds gate with unanimous-panel convergence + advisory triage, Sync Impact Report updated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

convergence/engine.py run_convergence: identify->revise->re-review loop with honest converged reporting (FR-016), 3-round cap, self-review/producer exclusion + stale-never-passes (FR-018), per-round wall-clock budget (FR-013), and overflow inputs routed through tools/summarize (FR-006). convergence/kickback.py route_kickback (adaptive worst-severity->stage, full-provenance KickbackRecord) + progress_record (FR-017). 15 unit tests pass; ruff + mypy clean. US2 remaining (coupled to US4/US3): T021 real-project integration, T025 advancement.py _produced_by stub, T027 tasker Mode-A/B refactor into the engine. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Addressed the tech debt I had flagged (per "fix issues as you notice them"):
- types-PyYAML dev dep -> yaml stubs resolve under `python -m mypy` (clears yaml errors codebase-wide).
- ReviewRecord.score: invalid Literal[float] -> float + field_validator (PEP 586; identical {0.0,0.5,1.0} constraint).
- paper_reviewer: list[dict]->list[dict[str,Any]]; text coerced to str.
- removed 2 unused PaperReviewerAgent imports in test.
- FIX: T003 added Stage.AWAITING_PUBLICATION_SIGNOFF but not the project-state schema enum -> contract test failed; added it (single SSoT schema).
- FIX: T001 panels dir was under src/llmxive/agents/prompts/ but prompts live at repo-root agents/prompts/ -> relocated; corrected 7 path refs in tasks.md.
Finding (STATUS.md): project does NOT gate on ruff/mypy (no config, no CI step; gates = pytest + checks.*). ~273 legacy mypy errors are pre-existing, out of #239.
Focused regression: 92 passed (all contract + score/paper_reviewer/convergence).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…239) New agents/prompts/implementer_research.md: instructs the research speckit implementer to emit the artifacts/verdict YAML the parser expects (write real runnable code/data, no stubs/diffs, fail-loud verdicts). implement_cmd.py now renders it instead of the paper-revision LaTeX implementer.md (which stays for the separate paper-revision agent). Also fixed 2 pre-existing ruff nits in implement_cmd.py (I001 import sort, F541) since I touched the file. tests/integration/test_audit_bugfixes.py verifies the fix (2 pass). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

theoremsearch.search() now retries transient failures (429/500/502/503/504 + RequestException/timeout) with exponential backoff (MAX_TRANSIENT_RETRIES=3), then degrades via TransientBackendError (the librarian wrapper already treats that as "optional source unavailable"). Non-transient 4xx are not retried. retry_backoff_base_seconds is injectable (tests pass 0). 4 unit tests; ruff+mypy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
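
The retry policy in this commit could look roughly like the sketch below. It is stdlib-only and does not reproduce `theoremsearch.search()`: the `fetch` callable returning `(status, body)`, the `sleep` parameter, and the function name are assumptions for illustration. The sketch also assumes that `MAX_TRANSIENT_RETRIES=3` means three retries after the first attempt, and it stands in `ConnectionError`/`TimeoutError` for the `RequestException`/timeout cases.

```python
import time

TRANSIENT_STATUSES = {429, 500, 502, 503, 504}
MAX_TRANSIENT_RETRIES = 3


class TransientBackendError(RuntimeError):
    """Raised once transient retries are exhausted."""


def search_with_retry(fetch, retry_backoff_base_seconds=1.0, sleep=time.sleep):
    # `fetch` is a hypothetical zero-argument callable returning
    # (status_code, body); it may raise on network-level failures.
    for attempt in range(MAX_TRANSIENT_RETRIES + 1):
        try:
            status, body = fetch()
        except (ConnectionError, TimeoutError):
            status, body = None, None  # treat as transient
        if status is not None and 200 <= status < 300:
            return body
        if status is not None and status not in TRANSIENT_STATUSES:
            # Non-transient errors (e.g. 404) are not retried.
            raise ValueError(f"non-transient HTTP {status}")
        if attempt < MAX_TRANSIENT_RETRIES:
            # Exponential backoff: base, 2*base, 4*base, ...
            sleep(retry_backoff_base_seconds * (2 ** attempt))
    raise TransientBackendError("transient failures persisted after retries")
```

Passing `retry_backoff_base_seconds=0` mirrors the commit's note that tests inject a zero base so they do not wait.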

#239) Full offline suite verified green: tests/contract + 599 tests/unit (7.45s) + real-call summarize_fidelity. Flagged pre-existing live-PDF test in tests/unit (not CI-gated, hangs offline) for separate gating. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…yze (#239) Discrepancy #4 fix: ANALYZE_SYSTEM_PROMPT_PATH was defined but unused (inline prompt hardcoded; paper reused research tasker.md). Now there are TWO real analyze prompts that ARE used via render_prompt: - agents/prompts/analyze.md (research): requirements_coverage / internal_consistency / testability / scope / constitution_alignment lenses (same vocabulary as the US4 Tasks panel). - agents/prompts/paper_analyze.md (paper): reader_scenario_coverage / claims_supported / required_sections_figures / scope_vs_research / internal_consistency / constitution_alignment. run_analyze() gains kind={"research","paper"} + constitution_text kwargs. paper_tasks_cmd passes kind="paper" + paper constitution; tasks_cmd passes research constitution (FR-030: constitution is a standard analyze input from `specified` onward). 6 audit-bugfix tests + 38 phase4 integration tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

clarifier.attempts_so_far was hardcoded 0 (escalation unreachable) and paper_clarifier never branched on verdict=escalate AND silently substituted a "Resolved by default" stub on missing patches — a no-silent-shortcuts violation. Fixes: - New shared _clarify_attempts.py: persists per-project attempt count under .specify/memory/clarifier_attempts.yaml; bump/read/reset + write_human_input_needed. - Both clarifiers now read REAL attempts and pass them to the prompt. - Both branch on verdict=escalate -> write human_input_needed.yaml + raise. - Both escalate at TASKER_MAX_REVISION_ROUNDS (=5) -> write human_input_needed.yaml + raise. - paper_clarifier no longer substitutes the silent "Resolved by default" stub (matches research clarifier's loud failure behavior). - Also removed 2 pre-existing F841 dead locals in clarify_cmd._spec_path. 29 tests pass (audit + phase3 integration); ruff clean for touched files. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…239) paper_specifier.md advertised `code_summary` / `data_summary` inputs that the code never supplied (silent drift between prompt and reality). paper_specify_cmd now injects both blocks into the user message, reusing research_reviewer's _summarize_tree() as the SSoT tree-summary helper — Const. I (share, don't fork). The advertised inputs ARE now present, grounding the paper-spec generation in the project's actual code/ and data/ trees. 11 audit-bugfix tests pass; ruff clean for touched files. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…R-054) (#239) Discrepancy #2 fix (FR-036): graph._decide_next_stage no longer shortcuts PAPER_ACCEPTED -> POSTED. It now routes paper_accepted -> AWAITING_PUBLICATION_SIGNOFF, then AWAITING_PUBLICATION_SIGNOFF -> POSTED ONLY when the maintainer sign-off record exists. The PaperPublisher itself enforces the same gate (defense-in-depth) — at PAPER_ACCEPTED or AWAITING_PUBLICATION_SIGNOFF with NO signoff record it SKIPs with a clear "awaiting manual maintainer DOI sign-off (FR-054)" reason. No Zenodo DOI is minted without recorded approval. New surface: - src/llmxive/speckit/_publication_signoff.py: read/write/has/clear_signoff persistence under <project>/.specify/memory/publication_signoff.yaml; FR-054 who/when/what record (kinds "initial" / "version"). - `llmxive project publish-approve <PROJ-ID> --who X --what Y [--kind initial|version]` CLI command writes the sign-off record. - 6 new audit-bugfix tests + 27 publisher/graph regression tests pass. Also fixed 38 pre-existing ruff issues in touched files (auto-fix). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
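
As a rough picture of the routing change, the sketch below models only the three stages named in this commit. The `Stage` values come from the text; the function name `decide_next_stage_sketch` and the boolean `has_signoff` parameter are illustrative assumptions, not the signature of `graph._decide_next_stage`.

```python
from enum import Enum


class Stage(str, Enum):
    PAPER_ACCEPTED = "paper_accepted"
    AWAITING_PUBLICATION_SIGNOFF = "awaiting_publication_signoff"
    POSTED = "posted"


def decide_next_stage_sketch(stage: Stage, has_signoff: bool) -> Stage:
    if stage is Stage.PAPER_ACCEPTED:
        # No shortcut to POSTED: acceptance always parks at the sign-off gate.
        return Stage.AWAITING_PUBLICATION_SIGNOFF
    if stage is Stage.AWAITING_PUBLICATION_SIGNOFF:
        # Advance only when a maintainer sign-off record exists.
        return Stage.POSTED if has_signoff else stage
    return stage
```

Note that a sign-off recorded early does not skip the gate in this sketch: `PAPER_ACCEPTED` still moves to `AWAITING_PUBLICATION_SIGNOFF` first.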

Discrepancy #7 fix (FR-018): advancement._produced_by was a stub returning None. It now scans state/run-log/<YYYY-MM>/*.jsonl for the latest entry whose outputs list contains the artifact path and returns that entry's agent_name. Exact + suffix path matching tolerates relative-vs-absolute bookkeeping. A repo_root kwarg keeps the production call (no repo_root) working while making tests hermetic. Defensive: returns None on missing run-log instead of raising. T029: the audit-bugfix test file (now 18 tests) verifies T030/T031/T032/T033/ T034/T035/T025 fixes. 38 tests pass (audit + advancement regression). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
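
A minimal sketch of the run-log scan described here, under stated assumptions: it takes `repo_root` as a required argument, assumes that sorted file order and line order are chronological (so the last match is the "latest" entry), and uses the `outputs` and `agent_name` field names from the commit message. The function name `produced_by_sketch` is hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def produced_by_sketch(artifact_path: str, repo_root: Path) -> str | None:
    log_root = repo_root / "state" / "run-log"
    if not log_root.is_dir():
        return None  # defensive: a missing run-log is not an error
    latest = None
    # Layout assumed from the commit message: state/run-log/<YYYY-MM>/*.jsonl
    for log_file in sorted(log_root.glob("*/*.jsonl")):
        for line in log_file.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            entry = json.loads(line)
            for out in entry.get("outputs", []):
                # Exact match, or suffix match in either direction to
                # tolerate relative-vs-absolute bookkeeping.
                if (
                    out == artifact_path
                    or out.endswith("/" + artifact_path)
                    or artifact_path.endswith("/" + out)
                ):
                    latest = entry.get("agent_name")
    return latest
```

Pointing `repo_root` at a temporary directory is what makes a test of this function hermetic.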

…to US3 (#239) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

New convergence/triage.py — stage-aware triage for submitted human + simulated- personality reviews. Three filters: quality (length + evidence-indicator regex sweep — FR/SC/T ids, citations, URLs, DOIs, quoted phrases, code fences, scientific topic vocab), safety + on-topic (rule-based stop-list + stage/lens vocabulary overlap), and aspect-mapping to LLM reviewer lenses (preserved but mapped_lenses=[] when no match -> routes to the step's generic reviewer per FR-022). Injectable judge_fn for the real-LLM path (US4 wiring); rule-based default keeps unit tests offline. tests/integration/test_triage.py: 8 tests covering quality pass/fail, safety exclusion, off-topic exclusion, lens mapping, unmapped-but-preserved, record provenance, and the judge_fn injection override. All pass; ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

) Rewrote the user-facing status-model descriptions in README + web/index.html + docs/index.html (HTML mirror copy) to convergence semantics: identify -> revise -> re-review; unanimous panel acceptance within a 3-round cap; advisory triage for human + simulated-personality reviews; no accumulated points. Replaces 6 stale "points threshold" / "Human reviews count double" passages. status_reporter.py + repository_hygiene.py needed no change for the new status model — their FR-026 duties (projects.json regen, GitHub issue comment/close on POSTED, line-count delta, gitignore assertions) are not point-dependent and remain in force unchanged. The points_research_total / points_paper_total fields the web JS displays will be removed in a follow-up (part of T041 point-system removal). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


…239) Discrepancy #9 + Const. I cleanup: the accumulated review-point system is gone from the advancement decision path. Unanimous LLM-panel acceptance is now the sole gate everywhere (research + paper both). advancement.py: - Research-review gate no longer reads `accept_total` / `RESEARCH_ACCEPT_THRESHOLD`. It now uses `_all_specialists_accept(records, required)` with a defensive backstop (require ≥1 accept AND zero non-accept records when the registry isn't loaded) — mirroring the paper-side default. - Paper-review gate's `_award_review_points` call removed (the all-specialists- accept-most-recent check was already the real decision). - `_award_review_points` definition DELETED (no remaining callers). - `RESEARCH_ACCEPT_THRESHOLD` import dropped; replaced with an FR-019 comment. config.py: - `RESEARCH_ACCEPT_THRESHOLD` and `PAPER_ACCEPT_THRESHOLD` constants kept for back-compat with `web/about.html` mirror consumers, but VALUES set to 0.0 and no advancement code reads them. T038 tests (`tests/integration/test_no_points.py`, 3 tests): grep guard + behavioral assertion that no-accept records cannot trip the gate. T044: per clarify Q3 there are no posted/done projects to grandfather; the gate change applies on next tick automatically — no data-migration logic needed. Broad regression: 784 passed, 1 skipped (was 781 — three new T038 tests added). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
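
The unanimous gate with its defensive backstop might be pictured like this. The record shape (a chronological list of `(specialist, verdict)` pairs) and the function name are assumptions; only the rule itself comes from the commit message: with a loaded registry every required specialist's most recent verdict must be an accept, and without one the gate needs at least one accept and zero non-accept records.

```python
def all_specialists_accept_sketch(records, required):
    # records: chronological (specialist, verdict) pairs (hypothetical shape).
    if required:
        latest = {}
        for specialist, verdict in records:
            latest[specialist] = verdict  # later records overwrite earlier ones
        return all(latest.get(name) == "accept" for name in required)
    # Backstop when the registry is not loaded: at least one accept and
    # no non-accept record at all.
    verdicts = [verdict for _, verdict in records]
    return bool(verdicts) and all(v == "accept" for v in verdicts)
```

An empty record list never passes in either branch, which matches the behavioral assertion that no-accept records cannot trip the gate.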


src/llmxive/convergence/reviewspecs.py: reviewspec_for(stage) -> ReviewSpec | None. 9 stage entries (idea + 4 research + 4 paper) matching contracts/reviewspec- registry.md; EXEMPT_STAGES frozenset of 7 mechanical steps. Constitution input is True for every spec from `specified` onward (FR-030); idea-stage opts out (no constitution yet). Kickback routing per the contract's worst-severity -> prior-stage table. Stages whose panel prompts (T049-T053) or wiring (T054-T059) haven't landed yet get _TodoReviewer / _TodoReviser placeholders that conform to the Protocol but raise NotImplementedError with a clear pointer to the follow-up task -- fail-loud SSoT structure, no silent empty verdicts. 15 contract tests pass; ruff clean. Also marked T060 (constitution-as-analyze-input, done in T031) and T061 (publisher wired into graph, done in T035) as already complete. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


US4 panel-prompt authoring: 27 lens prompts + 1 SSoT shared block + a contract test that catches future registry/file-name drift. agents/prompts/_shared/panel_review_block.md - SSoT (Constitution Principle I) for the panel R1/R3 output contract. Severity vocabulary matches the spec-015 Severity enum (trivial → fatal); identify and re-review phases both defined. agents/prompts/panels/ — 27 files total T049: panel_idea_{rq_validity,novelty,feasibility,idea_quality}.md T050: panel_spec_{requirements_coverage,internal_consistency,testability,scope}.md T051: panel_plan_{methodology,spec_coverage,data_resources,consistency}.md T052: panel_tasks_{coverage,ordering,executability,constraint_preservation}.md T053: panel_paper_spec_* (4) + panel_paper_plan_* (3) + panel_paper_tasks_* (4) Each per-lens file is thin: lens + scope ("what NOT to flag") + inputs (constitution from `specified` onward per FR-030) + per-severity-class guidance + reference to the SSoT block. T054-T059 wiring will concatenate lens-prompt + SSoT-block at render time. tests/contract/test_panel_prompts.py (16 tests) - Every lens in the ReviewSpec registry resolves to a real prompt file. - Every panel file references the SSoT block (Principle I drift guard). - Every panel file has `## Lens` and `## Output format` sections. - Reuse-stages (research_review/paper_review) map to existing specialist files, with the _research/_paper suffix convention preserved. - The SSoT block enumerates every Severity enum value + defines R1 and R3. Tech debt fixed inline (surfaced by ruff+mypy installation in venv): - reviewspecs.py: _todo_reviewers now returns list[Reviewer] (list is invariant). Removed an unused `# type: ignore`. - triage.py: JudgeFn return-type narrowed to dict[str, object]; the mapped_lenses access narrowed with isinstance(list|tuple) at the callsite — honest about the contract boundary rather than ignore. 
Verification:
- ruff check src/llmxive/convergence + summarize.py: All checks passed
- mypy src/llmxive/convergence + summarize.py: 0 errors (7 source files)
- pytest tests/contract: 43 passed
- pytest 4 conv-related unit files: 27 passed
- pytest 3 spec-015 integration files: 29 passed
- llmxive.checks.prompts: OK (53 agents)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Spec convergence unit: the new SpecReviser implements the Reviser Protocol and folds BOTH `[NEEDS CLARIFICATION]` marker resolution AND every panel concern into ONE LLM round. This is the spec-015 "collapse" — the previous two-step author + refine flow becomes one R2 call that produces a fully- revised spec.md plus a per-concern change-log. src/llmxive/convergence/revisers/spec_reviser.py - `SpecReviser` class (Reviser-protocol-conformant): constructed with (backend, repo_root, project_id, model?, token_budget?, cache_dir?). - `.revise(artifacts, concerns)`: - Picks the spec.md artifact (suffix match; excludes paper-side spec). - Gathers idea text from artifacts (`idea/` keys). - Overflow routing (FR-006): when bundle approx-tokens > budget, routes idea + comments_block through `tools.summarize.summarize` with a preservation goal that pins FR/SC ids verbatim. spec.md itself is NEVER summarized — the reviser must see what it's editing. - Composes a system (clarifier.md SSoT) + user (current spec + concerns + remaining markers + comments) prompt asking for ONE JSON document with `new_spec_md` + `responses[]`. - Honest failure modes: missing `new_spec_md` raises; non-JSON raises; fewer responses than concerns → padded with `<missing>` entries (Constitution Principle II: no silent omission). - `_scan_markers` + `_strip_json_fences` helpers (testable in isolation). src/llmxive/convergence/revisers/__init__.py - Package docstring documenting the build_*_reviewspec pattern. src/llmxive/convergence/reviewspecs.py - New `build_spec_reviewspec(backend=, repo_root=, project_id=, model=?)` returns a LIVE ReviewSpec for the spec stage with the SpecReviser bound as `.reviser`. Static `reviewspec_for("clarified")` still returns the TodoReviser placeholder; the build_* path is the live wiring (T058 will add reviewer-side wiring for the panel). - Local import of SpecReviser keeps the static-registry import graph clean for callers that never touch the live path. 
tests/integration/test_spec_reviser.py (8 tests) - `_scan_markers` handles bracket + bold marker forms; returns empty on clean specs. - `_strip_json_fences` handles fenced + bare JSON. - End-to-end revise: backend called with system+user; new spec text written; markers resolved; ConcernResponse per concern. - Padded missing responses: backend omits one concern → `<missing>` marker preserved (honest no-silent-omission). - Missing `new_spec_md` → RuntimeError. - Non-JSON reply → RuntimeError. - No spec.md in artifacts → ValueError (engine misuse). Verification - ruff check src/llmxive/convergence + tests: All checks passed - mypy src/llmxive/convergence + summarize.py: 0 errors (9 source files) - pytest tests/integration/test_spec_reviser.py + tests/contract: 51 passed - pytest broader unit + integration suite: 52 passed (no regressions) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…nalyze) Research verified live: math.pi/e/tau, golden ratio, scipy.constants CODATA (c/h/G), and sympy evaluation of every spec example (1+2=3, 1>2=False, identity, 5 km=5000 m, round(pi,2)=3.14). 29 tasks; analyze clean after one fix-loop (decouple US1 approximate from the US2 constants channel; mixed-claim routing; constants top authority rank). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…oximate comparator (T001-T008) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…13-T017) US2 (T013/T014): fill/channels/constants.py wraps verify.constants into a zero-network FetchedSource; AUTHORITY["constants"]=0 (top rank); constants added to NUMERIC channel list in channels_for; wired in _get_channel. US5 (T016/T017): compute.py fully replaces the NotImplementedError placeholder with evaluate() (sympy parse_expr, no eval/exec; arithmetic, comparisons, percentages, unit conversions, algebraic identities), extract_expression() (deterministic regex for backend=None), and verify_computational() returning ComputeVerdict. Reuses approximate.is_valid_rounding for real-valued results. Also fixes test_fill_wikidata_parse.py authority assertion to use AUTHORITY lookup instead of hardcoded 1 (broken by the authority re-ranking). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…009-T012,T018-T019) - resolve.py: mode router before kind dispatch (LLMXIVE_CLAIM_FILL=1); approximate path uses _extract_constant_from_text + verify.approximate for precision compare + correction; computational path uses verify_computational (sympy); RESULT-kind never goes to compute; not_evaluable/no-constant falls through to existing kind dispatch unchanged - fill/extract.py: present_in_source mode-aware for constants channel only (decimal values, never bare integers — FR-003 exact-count gate untouched) - tests/integration/test_verify_approximate_wireup.py: pi 3.14→VERIFIED, pi 3.15→corrected, knot count→exact route, 1+1=2→compute, 1+2=1→corrected (T011) - tests/real_call/test_verify_pi_e_real.py: pi/e zero-network constants path (T012) - tests/real_call/test_compute_real.py: arithmetic/comparison/pct/unit-conv via sympy (T019) - tasks.md: T009/T010/T011/T012/T018/T019 marked [X] Offline: 1838 passed (+5 vs 1833 baseline), 0 failures. Real-call: 11 passed (0.30s, zero HTTP for constants path). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- _FILLABLE_KINDS now includes MAGNITUDE and RELATIONAL (fill/service.py)
- channels_for(MAGNITUDE/RELATIONAL) → [wikidata, wikipedia, paper] (fill/channels/__init__.py)
- resolve_magnitude/resolve_relational wire _maybe_fill at NEI/REFUTED sites (claims/resolve.py)
- present_in_source uses entity-name check for MAGNITUDE/RELATIONAL (fill/extract.py)
- subject_query extracts entity name for RELATIONAL, category for MAGNITUDE (fill/subject_query.py)
- wikidata channel resolves referenced Q-IDs to labels (e.g. P36→Canberra) (fill/channels/wikidata.py)
- wikidata channel scans 60 P-claims (up from 20) to reach P36 at position 28 (fill/channels/wikidata.py)
- _chat_reasoning_safe always passes model kwarg to satisfy DartmouthBackend (fill/extract.py)
- Updated spec-017 deferral tests to reflect spec-018 enablement (test_fill_service_blocks.py, test_fill_service_logic.py, test_fill_conflict.py)
- New integration tests T021 + T024 (test_fill_magnitude_wireup.py, test_fill_relational_wireup.py)
- New real-call tests T022 + T025 (test_fill_superlative_real.py, test_fill_relational_real.py)
- T020-T025 marked [X] in specs/018/tasks.md
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…start (T026/T027/T029) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


…2e no-regress; all 29 tasks done Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremymanning · 2026-05-30T22:24:00Z

Update — claim-trustworthiness stack added (specs 016 → 017 → 018)

Beyond the spec-015 convergence protocol, this branch now adds a three-layer system that closes the fabricated-facts gap surfaced in the Part-7 shakeout (the spec reviser inventing "27,635 prime knots at 13 crossings"; correct = 9,988). All verified with real models/sources (no mocks); full offline gate 1858 passed, 0 regressions.

Spec 016 — Claim-Verification Layer (detective)

Every check-worthy claim a doc-producing agent writes is extracted → registered → substituted with a pointer → resolved (external source or harness-signed execution receipt) → rendered from the verified value. Unresolved claims hard-block via the unified [UNRESOLVED-CLAIM:] marker and auto-route for re-resolution (no routine human input). Receipts are HMAC-signed by the harness; an LLM can never mint or alter one. Verified live: 27,635 fabrication blocked; receipt forgery rejected.
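
To make the receipt idea concrete, here is a small stdlib sketch of HMAC signing and verification. The receipt layout, the canonical-JSON encoding, the choice of SHA-256, and both function names are assumptions for illustration; the page does not describe the harness's actual format. The property shown is the one claimed above: without the harness key, a party cannot mint or alter a receipt that verifies.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Stable byte encoding so signer and verifier hash the same content.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(payload: dict, key: bytes) -> dict:
    signature = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    expected = hmac.new(key, _canonical(receipt["payload"]), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, str(receipt.get("signature", "")))
```

In this sketch the key would live only in the harness process, so model output can carry a receipt but never produce a valid one.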

Spec 017 — Authoritative-Fill (constructive)

When an external claim can't be verified as written, the layer searches authoritative sources (OEIS b-file via the Wikipedia→A-number bridge, Wikipedia, Wikidata, papers, theorem search), extracts the correct value, verifies it is literally present in a fetched source (never model memory), substitutes it, and repairs the citation. Verified live, end-to-end through the real chokepoint: 27,635 → sourced 9,988 (OEIS A002863); capital of Australia Sydney → Canberra; unsourceable claims stay blocked.

Spec 018 — Per-Claim Verification Modes

The verifier picks a mode per claim:

exact-count (literal; the 9,988 path — unchanged, no regression)
approximate-constant — precision-aware rounding vs a library-backed constants table (math + scipy.constants CODATA, zero-network): "π is 3.14"/"about 3" verify; "π is 3.15" → corrected 3.14
computational — safe sympy evaluation (no eval/exec): "1 plus 2 is 1" → REFUTED→3, "1 is larger than 2" → REFUTED, unit conversions, algebraic identities — the evaluator computes, never the LLM
source-fact (016/017)
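
The approximate-constant mode can be illustrated with a short precision-aware check. This is a sketch under assumptions, not the project's `approximate.is_valid_rounding`: it takes the claimed value as a string, infers precision from its decimal places, and uses round-half-up. The `CONSTANTS` table here covers only three `math` values, and `check_rounding` is a hypothetical name.

```python
import math
from decimal import ROUND_HALF_UP, Decimal

# Tiny illustrative table; the PR describes a larger library-backed one.
CONSTANTS = {"pi": math.pi, "e": math.e, "tau": math.tau}


def check_rounding(name: str, claimed: str) -> tuple:
    claimed_dec = Decimal(claimed)
    # Precision implied by the claim: "3.14" -> exponent -2, "3" -> 0.
    exponent = claimed_dec.as_tuple().exponent
    quantum = Decimal(1).scaleb(exponent)
    # Round the reference value to the same precision as the claim.
    reference = Decimal(repr(CONSTANTS[name])).quantize(quantum, rounding=ROUND_HALF_UP)
    if claimed_dec == reference:
        return ("VERIFIED", claimed)
    return ("CORRECTED", str(reference))
```

With this rule, "3.14" and "3" both verify for pi because each is a valid rounding at its own precision, while "3.15" is corrected to "3.14".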

Plus the 017 fast-follow: magnitude/superlative ("largest planet is Saturn" → Jupiter) and set/relational fills.

Each spec went through the full speckit pipeline (specify → clarify → plan → tasks → analyze → implement → verify). New dependency: sympy (free/open-source). Commits 74abda95…aec34068.

…o longer drops every claim Part-7 finding (PROJ-552 spec stage): the extraction model emits a verbatim claim_text containing an embedded double-quoted paper title (e.g. "A Census of Knots."). That broke yaml.safe_load, _parse_extraction_reply returned [], so NO claims were extracted — a silent fabrication passthrough that let the wrong 27,635 prime-knot count survive un-flagged-and-un-filled (panel kicked back to a human instead of the fill layer correcting 27,635 -> 9,988 from OEIS A002863). General fix (applies to every project, not just PROJ-552): - _tolerant_parse_claims: line-oriented recovery parser that scans for the known field keys and takes the line remainder as the value (one outer quote pair stripped), robust to embedded quotes/colons. _parse_extraction_reply falls back to it on any YAML failure OR an empty strict-parse result. - prompts/claim_extraction.md: explicit quoting rules (single line per field; no raw embedded double-quotes — use single quotes for inner marks) to reduce malformed YAML at the source. - 6 new offline regression tests reproducing the exact embedded-quote failure. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
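
A line-oriented recovery parser of the kind described might look like the sketch below. The field keys in `FIELD_KEYS` are hypothetical (the commit message does not list them), and the claim-boundary rule (a leading `-` or a repeated key starts a new claim) is an assumption. The part taken from the text is the core idea: find a known key, take the rest of the line as the value, and strip one outer quote pair, so embedded quotes and colons cannot break parsing.

```python
import re

# Hypothetical key set for illustration only.
FIELD_KEYS = ("claim_text", "kind", "source")
_FIELD_RE = re.compile(r"^\s*(-\s*)?(%s)\s*:\s*(.*)$" % "|".join(FIELD_KEYS))


def tolerant_parse_claims(reply: str) -> list:
    claims = []
    current = {}
    for line in reply.splitlines():
        match = _FIELD_RE.match(line)
        if not match:
            continue  # ignore anything that is not a known field line
        is_item_start, key, value = match.group(1), match.group(2), match.group(3).strip()
        # Strip exactly one outer quote pair; inner quotes are kept verbatim.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        # A list dash or a repeated key begins the next claim.
        if current and (is_item_start or key in current):
            claims.append(current)
            current = {}
        current[key] = value
    if current:
        claims.append(current)
    return claims
```

Because the value is simply the remainder of the line, a title such as "A Census of Knots." inside a double-quoted `claim_text` survives intact instead of emptying the whole extraction.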

… 016-018) Ran ruff --fix (import sorting, unused imports, __all__/datetime/Optional modernization) and hand-fixed the remaining non-autofixable findings: RUF005 (iterable unpacking), B007/RUF059/F841 (unused loop/unpack/locals renamed to _ or removed as dead code), E402 (moved always-available imports above pytestmark), RUF002/RUF003 (non-semantic en-dashes -> ASCII hyphen). str+Enum classes keep the mixin (UP042 noqa) to preserve str() repr. No runtime behavior change. ruff check . clean; mypy src/llmxive unchanged (same 63 pre-existing errors); offline gate 1864 passed / 10 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Annotation-only cleanup, no runtime behavior changes:
- Add generic params (dict[str, Any], list[str], Callable[..., Verdict], re.Pattern[str], re.Match[str]) across verify/, fill/, results/, state/, claims/.
- Add missing function param/return annotations (backend: Any, Iterator[Path], dict[str, Any] returns, channel callable, __getattr__ -> Any).
- Widen repo_root passthrough to str | Path | None in select_mode / verify_computational (fixes [arg-type] without changing the forwarded value).
- Make Any returns explicit via bool() wraps and str(m.group(0)).
- Add [mypy-scipy.*] and [mypy-sympy.*] ignore_missing_imports sections.
mypy src/llmxive: Success (0 errors); ruff clean; offline gate 1864 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s missing
Part-7 finding (PROJ-552 spec stage): a single reviewer (scope_fidelity) emitted an opening `---` + valid metadata + concerns but NO closing `---` delimiter (the reasoning model ended after concerns: / the endpoint hung mid-response). The strict both-delimiters regex (^---\n(.*?)\n---) matched nothing, so _parse_response raised RuntimeError "no YAML frontmatter", crashing the ENTIRE spec panel/run instead of degrading gracefully.
General fix (every reviewer, every stage):
- _extract_frontmatter: recovers the YAML frontmatter in three shapes: proper both-delimiters (fast path), opening + later doc-boundary (---/...), and opening with NO closing delimiter (take the longest leading line-block that still parses to a non-empty YAML mapping, dropping any unfenced trailing prose). Only a response with no opening --- at all is rejected.
- panel_review_block.md: explicit instruction to always emit BOTH delimiters and start at column 0 with no leading blank lines.
- 8 new offline tests incl. the exact PROJ-552 failure shape.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
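A rough sketch of the three recovery shapes, under stated assumptions: `looks_like_mapping` is a naive stand-in for "yaml.safe_load returns a non-empty mapping" (so the example stays dependency-free), and `extract_frontmatter` is an illustrative name, not the repository's `_extract_frontmatter`.

```python
import re
from typing import Callable, Optional

# Naive stand-in for "yaml.safe_load(block) returns a non-empty dict".
_KEY_RE = re.compile(r"^[A-Za-z_][\w-]*:(\s|$)")


def looks_like_mapping(block: str) -> bool:
    lines = [ln for ln in block.splitlines() if ln.strip()]
    if not lines or not _KEY_RE.match(lines[0]):
        return False
    # Each line is a top-level key, an indented continuation, or a list item.
    return all(_KEY_RE.match(ln) or ln.startswith((" ", "-")) for ln in lines)


def extract_frontmatter(
    response: str,
    is_mapping: Callable[[str], bool] = looks_like_mapping,
) -> Optional[str]:
    lines = response.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no opening delimiter at all: reject
    body = lines[1:]
    # Shapes 1 and 2: a later '---' or '...' line closes the frontmatter.
    for i, line in enumerate(body):
        if line.strip() in ("---", "..."):
            candidate = "\n".join(body[:i])
            return candidate if is_mapping(candidate) else None
    # Shape 3: no closing delimiter. Keep the longest leading block that
    # still parses, which drops unfenced trailing prose.
    for end in range(len(body), 0, -1):
        candidate = "\n".join(body[:end])
        if is_mapping(candidate):
            return candidate
    return None
```

Shrinking from the longest candidate downward means a truncated reply keeps as much metadata as still parses, rather than failing the whole panel.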

…pping Deeper root cause of the PROJ-552 reviewer crash (commit 1c8451e handled the missing closing `---`, but the run still died identically). _parse_response stripped a ```yaml fence BEFORE extracting frontmatter — and a reviewer's prose body routinely contains a fenced YAML/code example. With no closing `---` on the frontmatter, _CODE_FENCE_RE.search hijacked `candidate` to the prose example's contents (e.g. "foo: bar"), which has no `---`, so extraction returned None and the whole panel/run crashed. Fix: run _extract_frontmatter on the RAW stripped response FIRST; only fall back to unwrapping a ```yaml fence when the raw response has no recoverable frontmatter (the wholly-fence-wrapped case). +4 tests: fenced example in prose (the exact crash), whole-response fence wrap, and proper-delims-with-fenced-prose no-hijack. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
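The ordering fix (raw response first, fence unwrapping only as a fallback) might look like the sketch below. `simple_extract` is a deliberately minimal, hypothetical extractor used only to make the example runnable; the point being illustrated is the order of the two attempts.

```python
import re
from typing import Callable, Optional

FENCE = "`" * 3  # built indirectly so the example can sit inside a fence itself
_CODE_FENCE_RE = re.compile(FENCE + r"(?:yaml|yml)?\n(.*?)\n" + FENCE, re.DOTALL)


def simple_extract(text: str) -> Optional[str]:
    """Hypothetical extractor: text after an opening '---' up to the next '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    body = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        body.append(line)
    return "\n".join(body) or None


def parse_frontmatter(
    response: str,
    extract: Callable[[str], Optional[str]] = simple_extract,
) -> Optional[str]:
    raw = response.strip()
    # 1. Try the RAW response first, so a fenced example in the prose body
    #    can never replace the real frontmatter.
    found = extract(raw)
    if found is not None:
        return found
    # 2. Only when nothing is recoverable, unwrap a fence (the case where
    #    the whole response was wrapped in one).
    fenced = _CODE_FENCE_RE.search(raw)
    return extract(fenced.group(1).strip()) if fenced else None
```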

The claim layer was not idempotent across convergence rounds. Two root causes are fixed here:
A. Prose-preserving render (claims/pointer.py). A {{claim:<id>}} pointer stands in for the claim's FULL raw_text sentence, but render() replaced it with the BARE resolved_value, collapsing a whole sentence to a number (PROJ-552 garble). render() now reconstructs the prose: for a VERIFIED NUMERIC/RESULT claim it swaps ONLY the asserted numeric token (selected by thousands-separator/idempotency heuristic) for resolved_value; an already-correct assertion is returned byte-for-byte unchanged. ENTITY_FACT/RELATIONAL/MAGNITUDE leave prose intact unless the object span is locatable. A non-verified claim now PRESERVES the prose and APPENDS one inline [UNRESOLVED-CLAIM:] marker instead of replacing the sentence with a marker.
B. Idempotent extraction (claims/gate.py strip_claim_artifacts + service.py). process_document now strips prior-round [UNRESOLVED-CLAIM:] markers and stray {{claim:<id>}} pointers BEFORE extraction, so the layer no longer re-extracts its own marker bodies as new claims or accumulates markers.
New render contract: a VERIFIED pointer renders the claim's sentence with only the asserted token swapped for the verified value (idempotent); a non-verified pointer renders the sentence followed by one [UNRESOLVED-CLAIM:] marker.
Updated the chokepoint test's _make_claim fixture (raw_text now carries a numeric token): its old "some number"/resolved="9988" pair encoded the OLD bare-value-substitution contract, which the prose-preserving render supersedes.
Added 13 tests (pointer prose-preservation incl. the exact PROJ-552 garble + idempotency; gate strip_claim_artifacts; process_document double-run stability).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
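The render contract and the pre-extraction strip can be illustrated with a small sketch. The function names and the token-selection heuristic here are simplified assumptions (prefer a token with a thousands separator, and do nothing if any token already equals the verified value); the real render() in claims/pointer.py handles more claim types.

```python
import re

_NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")
_MARKER_RE = re.compile(r"\s*\[UNRESOLVED-CLAIM:[^\]]*\]")
_POINTER_RE = re.compile(r"\{\{claim:[^}]+\}\}")


def _normalize(number: str) -> str:
    return number.replace(",", "")


def render_claim(raw_text: str, resolved_value: str, verified: bool,
                 reason: str = "unverified") -> str:
    if not verified:
        # Preserve the prose and append exactly one inline marker.
        return f"{raw_text} [UNRESOLVED-CLAIM: {reason}]"
    tokens = list(_NUMBER_RE.finditer(raw_text))
    target = _normalize(resolved_value)
    # Idempotency: an already-correct sentence is returned unchanged.
    if not tokens or any(_normalize(t.group(0)) == target for t in tokens):
        return raw_text
    # Assumed heuristic: prefer a token written with a thousands separator.
    with_separator = [t for t in tokens if "," in t.group(0)]
    chosen = (with_separator or tokens)[0]
    return raw_text[: chosen.start()] + resolved_value + raw_text[chosen.end():]


def strip_claim_artifacts(text: str) -> str:
    """Remove prior-round markers and stray pointers before re-extraction."""
    return _POINTER_RE.sub("", _MARKER_RE.sub("", text))
```

Running render twice on its own output leaves the sentence unchanged, and stripping before extraction means a marker from round N is never mistaken for a claim in round N+1.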

…shared parser
C. Drop the redundant F-19 reviser pass (convergence/revisers/_self_consistency.py). The reviser chokepoint ran F-19 _ground_factual_claims THEN 016 _verify_claims on its output; 016 then re-extracted F-19's inline [UNVERIFIED] marker reasons as new "claims" and the two text-mutation models fought (PROJ-552 root cause 2). _clean_citations now runs ONLY _verify_claims (spec 016 is the SSoT). _ground_factual_claims remains defined for other importers but no longer runs in the chokepoint; docstring updated. The grounding *service* (used BY 016 resolvers) is untouched.
D. Extraction precision (claims/extract.py). A purely promotional "standing" statement (well-established / peer-reviewed / community-standard / widely-used / well-known / established-reference / gold-standard) with no crisp checkable core is now dropped: it cannot be substantiated and otherwise left a residual [UNRESOLVED-CLAIM:] marker that blocked convergence (root cause 5). A statement with a salient NUMBER or explicit citation still passes.
E. Shared tolerant parser (claims/extract.py + agents/grounding_guard.py). The grounding guard had a SECOND YAML claim parser with the same embedded-quote fragility already fixed in claims/extract, which is exactly why the bug recurred. The tolerant field-recovery is now a shared tolerant_field_entries(); the grounding guard's _parse_extraction_reply falls back to it on YAML failure or no usable claims, recovering an embedded-quote cited claim ("A Census of Knots.") instead of silently dropping every claim. Strict-path behavior preserved.
Added 4 tests (F-19-not-invoked-in-reviser-path; promotional-statement drop + number/citation survival; grounding-guard embedded-quote recovery).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
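The extraction-precision rule in item D could be approximated like this. The phrase list comes from the commit message; the "checkable core" test (any digit, or a citation-like marker) is an assumed simplification, and `is_checkable` is a hypothetical name.

```python
import re

# Phrases taken from the commit message; spacing variants are an assumption.
_STANDING_RE = re.compile(
    r"\b(well[- ]established|peer[- ]reviewed|community[- ]standard|"
    r"widely[- ]used|well[- ]known|established[- ]reference|gold[- ]standard)\b",
    re.IGNORECASE,
)
_NUMBER_RE = re.compile(r"\d")
# Assumed citation markers; the real extractor's notion may differ.
_CITATION_RE = re.compile(r"\bdoi:|\barxiv:|\bet al\.", re.IGNORECASE)


def is_checkable(statement: str) -> bool:
    """Return False for purely promotional 'standing' statements."""
    if not _STANDING_RE.search(statement):
        return True  # not a standing statement, so keep it
    # A standing statement survives only with a number or explicit citation.
    return bool(_NUMBER_RE.search(statement) or _CITATION_RE.search(statement))


def filter_claims(statements: list[str]) -> list[str]:
    return [s for s in statements if is_checkable(s)]
```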

…actually route Part-7 finding (PROJ-552): a project that kicks back at a doc-stage convergence panel was STUCK re-running that stage forever and never escalating. Root cause: a non-converging panel writes convergence_kickback.yaml then RAISES StagePanelKickback from inside the agent run. That raise propagated straight out of run_one_step (caught only by the CLI as a FAIL), so _decide_next_stage — the ONLY place consume_convergence_kickback runs — was never reached. The sentinel was never consumed, current_stage never advanced, and the per-stage kickback cap never incremented (so the 3-strikes→human escalation never fired either). The adaptive-kickback resilience (F-20 Part B) worked in its unit tests but was dead in the real `llmxive run` path because every test exercised _decide_next_stage in isolation, never the raise-through-run_one_step seam. Fix: run_one_step now catches StagePanelKickback (controlled non-convergence) and StagePanelEscalation (engine failure) around the speckit agent call and falls through to _decide_next_stage, which consumes the sentinel and routes the project to the content stage (or to HUMAN_INPUT_NEEDED at the cap / on engine failure) instead of crashing the run loop. +2 regression tests exercising the REAL run_one_step exception handling + real _decide_next_stage/_kickback routing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
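The seam described above can be sketched as follows. The `Project` fields, the `decide_next_stage` body, and the cap of 3 (taken from the "3-strikes" wording) are illustrative assumptions; only the exception names and the catch-then-fall-through shape come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, Optional

HUMAN_INPUT_NEEDED = "human_input_needed"
KICKBACK_CAP = 3  # assumed from the "3-strikes" wording


class StagePanelKickback(Exception):
    """Controlled non-convergence; the sentinel is written before raising."""


class StagePanelEscalation(Exception):
    """Engine failure inside the panel."""


@dataclass
class Project:
    current_stage: str
    next_stage: str
    kickback_target: str
    kickback_sentinel: Optional[str] = None  # stands in for convergence_kickback.yaml
    kickbacks: int = 0


def decide_next_stage(project: Project, engine_failed: bool) -> str:
    if engine_failed:
        return HUMAN_INPUT_NEEDED
    if project.kickback_sentinel is not None:
        project.kickback_sentinel = None  # consume the sentinel
        project.kickbacks += 1
        if project.kickbacks >= KICKBACK_CAP:
            return HUMAN_INPUT_NEEDED
        return project.kickback_target
    return project.next_stage


def run_one_step(project: Project, run_agent: Callable[[Project], None]) -> str:
    engine_failed = False
    try:
        run_agent(project)
    except StagePanelKickback:
        pass  # fall through so the sentinel is actually consumed
    except StagePanelEscalation:
        engine_failed = True
    project.current_stage = decide_next_stage(project, engine_failed)
    return project.current_stage
```

Testing `run_one_step` with an agent that really raises, rather than calling `decide_next_stage` in isolation, is what exercises the seam that was previously dead.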

Part-7 finding (PROJ-552 plan stage): the planner emitted a contracts/knot_record.schema.yaml whose body contained a `---` document separator (a second YAML doc at line 115). The FR-007 guard (_research_guard.assert_data_model_contracts_consistent) parses each schema with yaml.safe_load (single-document) and correctly rejected it as invalid — but the rejection hard-crashes the plan stage (plan_cmd unlinks all artifacts + re-raises → CLI FAIL), stranding the project at `clarified`. This commit addresses the TRIGGER: the planner prompt now explicitly forbids an internal `---` separator and tells the model to emit a separate `` block per schema. NOTE (deeper robustness gap, tracked separately in notes/spec-015-review-status): the deterministic plan guards (FR-005/006/007) fail-closed by raising, with NO revision loop — a malformed planner artifact strands the project instead of driving a bounded planner re-run with the guard feedback (the identify→revise philosophy of #239). That fix is a careful plan-stage flow change for a follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…/ChunkedEncoding) Part-7 finding (PROJ-552 plan stage, run #12): the run crashed with a raw ('Connection broken: IncompleteRead(77671 bytes read)', ...) — the flaky Dartmouth endpoint dropped the connection while streaming the planner's ~75KB multi-file reply. The DartmouthBackend's transient-error classifier had "connection reset"/"connection refused" but NOT "connection broken" / IncompleteRead / ChunkedEncodingError, so the drop was classified Permanent → no retry → the plan stage failed and stranded the project at `clarified`. Fix: add the connection-dropped-mid-stream markers (connection broken, incompleteread, chunkedencodingerror, connection aborted, remotedisconnected, remote end closed, broken pipe, eof occurred) to the transient set so _retry_with_backoff retries them. Also extracted the marker tuple + match into a module-level _is_transient_error_text() (was buried in a closure, untestable) and added regression tests for the exact failure + a no-over-match guard on genuine permanent errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
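A minimal sketch of the module-level classifier and a retry wrapper built on it. The marker strings are the ones listed in the commit message; `retry_with_backoff`, its parameters, and the exponential delay schedule are assumptions for illustration, not the DartmouthBackend's actual `_retry_with_backoff`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Markers named in the commit message (matched case-insensitively).
_TRANSIENT_MARKERS = (
    "connection reset", "connection refused",
    "connection broken", "incompleteread", "chunkedencodingerror",
    "connection aborted", "remotedisconnected", "remote end closed",
    "broken pipe", "eof occurred",
)


def is_transient_error_text(text: str) -> bool:
    lowered = text.lower()
    return any(marker in lowered for marker in _TRANSIENT_MARKERS)


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures; re-raise permanent ones immediately."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            is_last = attempt == attempts - 1
            if is_last or not is_transient_error_text(str(exc)):
                raise
            sleep(base_delay * 2 ** attempt)  # assumed exponential backoff
    raise RuntimeError("attempts must be >= 1")
```

Keeping the classifier at module level, and injecting `sleep`, is what makes both the exact failure string and the no-over-match case testable offline.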

Part-7 (PROJ-552 plan run #13): after the single-document fix, the planner hit a SECOND YAML pitfall — an unquoted schema description "... (target: ≥95%)" whose bare ": " made yaml.safe_load read it as a nested mapping ("mapping values are not allowed here"), again rejected by the FR-007 guard. Prompt now requires quoting any string value containing a colon, '#', leading ≥/≤/%, or brackets. (The robust backstop — a bounded planner revision-with-feedback loop on guard failure — is implemented separately; prompt nudges alone are whack-a-mole.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
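The quoting rule given to the planner could also be checked mechanically. The sketch below is a hypothetical lint helper, not something this PR adds; treating "brackets" as square and curly brackets is an assumption.

```python
# Characters from the prompt rule; the bracket set is an assumption.
_LEADING = ("≥", "≤", "%")
_ANYWHERE = (":", "#", "[", "]", "{", "}")


def needs_quotes(value: str) -> bool:
    """True when a plain YAML scalar would be risky under the prompt's rule."""
    stripped = value.strip()
    return stripped.startswith(_LEADING) or any(ch in stripped for ch in _ANYWHERE)


def quote_scalar(value: str) -> str:
    """Wrap a risky value in double quotes, escaping backslashes and quotes."""
    if not needs_quotes(value):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

For example, `quote_scalar("coverage (target: ≥95%)")` returns the value wrapped in double quotes, so the embedded `: ` is no longer read as a nested mapping.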

A malformed planner artifact (e.g. a contracts schema with an internal `---` multi-document marker, or an unquoted `: ` that breaks YAML) previously hard-crashed PlannerAgent.write_artifacts on the first deterministic-guard failure, marking the run FAILED and stranding the project at 'clarified' with no retry.
Wrap the split → FR-005 → write → FR-007 → FR-006 pipeline in a bounded retry loop:
- Refactor the write+guard body into _write_and_validate(ctx, mechanical_output, response_text) -> list[str], which raises the guard exception (and unlinks partial writes) on failure, exactly as before.
- write_artifacts now calls it in a loop: on a guard exception, if attempts remain, re-call the planner LLM via _revise_with_feedback with ONE corrective user message quoting the exact guard error and demanding all five files re-emitted in the FILE-marker format; else re-raise the last guard exception (fail-closed preserved).
- Cap: MAX_PLAN_REVISION_RETRIES = 2 (up to 3 total attempts). Each retry logged at INFO.
- Offline-safety gate: the corrective re-call builds the backend via make_backend(ctx.default_backend.value); if it returns None or raises, the loop does NOT retry and re-raises the guard exception. Offline unit tests therefore stay network-free.
- The plan convergence panel runs only AFTER the artifacts pass the guards.
Tests:
- tests/unit/test_plan_revision_loop.py drives the real write_artifacts / _write_and_validate with a fake backend collaborator: invalid-then-valid retries-and-succeeds (both PROJ-552 failure modes), all-invalid raises the last guard exception with no partial artifacts, and no-backend / make_backend-raises both fail closed without retrying.
- Updated two existing phase4 tests that encoded the old hard-crash-on-first-failure contract (template-reject unlink, bad-URL unlink) to force the offline path (make_backend -> None), asserting the same fail-closed-and-unlink behavior via the loop's no-backend branch.
ruff + mypy clean; offline gate 1899 passed (baseline 1894 + 5), 0 failures.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
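The bounded revision loop can be sketched as below. `MAX_PLAN_REVISION_RETRIES = 2` is from the commit message; `GuardError`, the callable parameters, and their signatures are hypothetical simplifications of `_write_and_validate`, `make_backend`, and `_revise_with_feedback`.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)
MAX_PLAN_REVISION_RETRIES = 2  # up to 3 total attempts


class GuardError(Exception):
    """Hypothetical stand-in for the FR-005/006/007 guard exceptions."""


def _safe_backend(make_backend: Callable[[], Optional[object]]) -> Optional[object]:
    """Offline-safety gate: no backend (or a failing factory) means no retry."""
    try:
        return make_backend()
    except Exception:
        return None


def write_artifacts(
    response_text: str,
    write_and_validate: Callable[[str], list[str]],
    make_backend: Callable[[], Optional[object]],
    revise_with_feedback: Callable[[object, str, str], str],
) -> list[str]:
    retries = 0
    while True:
        try:
            return write_and_validate(response_text)
        except GuardError as exc:
            if retries >= MAX_PLAN_REVISION_RETRIES:
                raise  # fail closed with the last guard exception
            backend = _safe_backend(make_backend)
            if backend is None:
                raise  # offline: never retry
            retries += 1
            logger.info("plan guard failed (%s); revision retry %d", exc, retries)
            # One corrective message quoting the exact guard error.
            response_text = revise_with_feedback(backend, response_text, str(exc))
```

Passing the backend factory in as a collaborator is what lets offline tests force the no-backend branch and stay network-free.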

…el to human Part-7 finding (PROJ-552 plan panel, run #14): the qwen endpoint hung past its 180s deadline; the backend's own retry+backoff exhausted and surfaced a TransientBackendError. _stage_panel's engine-failure handler caught it like any exception → wrote human_input_needed.yaml + raised StagePanelEscalation → the project was stranded at human_input_needed. But a transiently-degraded model endpoint is NOT human-actionable: a human cannot fix a hung endpoint. Fix: catch TransientBackendError separately in run_stage_panel and re-raise it AS-IS (no human_input_needed.yaml, no StagePanelEscalation wrap) so the run fails transiently and the project STAYS at its current stage to retry on the next scheduler tick when the endpoint recovers. run_one_step does not catch TransientBackendError, so it surfaces as a transient CLI FAIL with no stage change. Genuine (non-transient) engine failures still escalate to human as before. +1 regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
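The split between transient and genuine engine failures might look like this sketch. The exception names come from the commit message; the `run_stage_panel` signature and the `write_human_input_needed` callback are assumptions made to keep the example self-contained.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientBackendError(Exception):
    """Backend retries exhausted on a temporarily degraded endpoint."""


class StagePanelEscalation(Exception):
    """Genuine engine failure that needs a human."""


def run_stage_panel(
    run_engine: Callable[[], T],
    write_human_input_needed: Callable[[str], None],
) -> T:
    try:
        return run_engine()
    except TransientBackendError:
        # Not human-actionable: re-raise as-is, write nothing, and let the
        # project stay at its current stage for the next scheduler tick.
        raise
    except Exception as exc:
        write_human_input_needed(str(exc))
        raise StagePanelEscalation(str(exc)) from exc
```

The order of the `except` clauses matters: the transient handler must come first, otherwise the generic handler would catch it and escalate.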

jeremymanning · 2026-05-31T10:42:49Z

Part-7 end-to-end pipeline hardening (driving PROJ-552 live)

Drove a real project (PROJ-552, knot-complexity, mathematics) through the live pipeline stage-by-stage per issue #239 Part 7, and fixed 8 general bugs — each surfaced by a real failure and fixed so it benefits every project, not just the test case:

| # | Bug (effect before fix) | Commit |
|---|---|---|
| 1 | Claim extraction dropped all claims on embedded-quote YAML (paper titles) → fabrications passed un-checked | `1eb590d3` |
| 2 | Reviewer crashed the whole panel on frontmatter edge cases (missing closing `---`, fenced prose) | `1c8451ee`, `ff2ba357` |
| 3 | Claim layer not idempotent under the convergence loop → garbled/accumulating prose (016 vs F-19 conflict) | `d2b76008`, `14814f40` |
| 4 | Kickbacks never routed: `StagePanelKickback` bypassed `_decide_next_stage` → project looped forever, no escalation | `1032b6b0` |
| 5A | Planner emitted invalid YAML schemas (internal `---`, unquoted colons) → FR-007 hard-crash | `84dd3137`, `67f1a001` |
| 5B | Plan deterministic guards hard-crashed instead of driving a bounded planner revision-with-feedback loop | `0d230f2d` |
| 6 | Connection-dropped-mid-stream (`IncompleteRead`/`ChunkedEncoding`) not classified transient → plan crash | `7ea73cc0` |
| 8 | A transient backend failure (hung endpoint) wrongly escalated a panel to `human_input_needed` | `26d1c14a` |
| - | Cleared accumulated lint/type debt from specs 016/017/018 (ruff 217→0, mypy 63→0) | `cf9f0d79`, `00d5f171` |

Recurring lesson: controlled/expected failures (panel non-convergence, malformed planner output, connection drops, hung endpoints) must route / revise / fail-transiently, never hard-crash and strand the project.

Proven end-to-end: the convergence protocol works — PROJ-552 went brainstorm → idea → drift-kickback → flesh_out realign → idea-converge → spec converged (with the correct 9,988-prime-knots-at-13-crossings count verified against a real 2025 J. Knot Theory Ramifications DOI), and the plan stage now self-heals malformed artifacts + survives transient endpoint failures.

Offline gate: 1900 passed, 0 failures; ruff + mypy clean. Traversal continues (plan panel → tasks → … → publisher, then the 9-domain repetition) once the (currently-degraded) model endpoint recovers.

🤖 Generated with Claude Code

…w notes Persists PROJ-552's pipeline state so the traversal can resume from the plan panel on another machine or in a GitHub Action while this laptop is suspended. Stage=clarified (spec converged at specs/002 with the verified 9,988 knot count); plan convergence panel is next. Includes the claims/citations registries, librarian cache, run-log telemetry, and the living review tracker (notes/spec-015-review-status.md) documenting all 8 fixes + the resume command. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The llmxive-pipeline workflow ran `llmxive run` but never committed the result, so an Action's progress (advanced stage, new artifacts, run-log) was discarded when the ephemeral runner tore down. Add an always() commit+push step so a GitHub-Action run (e.g. driving PROJ-552 while a laptop is suspended) actually persists. [skip ci] avoids retriggering. Pushes to the checked-out branch via HEAD:${GITHUB_REF_NAME}. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…nce-protocol # Conflicts: # .github/workflows/spec015-calibration.yml

…m/ContextLab/llmXive into 015-pipeline-convergence-protocol

…-552 plan panel Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremymanning and others added 30 commits May 27, 2026 20:08

docs(015): T036 US8 roll-up - 7 of 10 discrepancies closed; 3 fold into US3 (#239)

e9b3b77

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

docs(015): T043 follow-up — 2 more stragglers (#239)

8b2f066

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

docs(015): T045 US3 sweep — points gone, triage live (#239)

1ded8d4

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix(015): T048 ruff — replace en-dashes in docstrings (#239)

97bcfff

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

jeremymanning and others added 8 commits May 30, 2026 15:51

feat(018): verify foundation - mode selector, library constants, approximate comparator (T001-T008)

d5d22f3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(018): cli enablement note + exact-count no-regress guard + quickstart (T026/T027/T029)

f0cf3b4

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(018): constants zero-network integration test (T015)

99113a4

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore(018): mark T028 - full gate 1858 passed, real-call green, 017 e2e no-regress; all 29 tasks done

aec3406

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremymanning and others added 13 commits May 30, 2026 18:55

jeremymanning and others added 5 commits May 31, 2026 07:57

chore(pipeline): persist run progress [skip ci]

7ebeb1e

Merge remote-tracking branch 'origin/main' into 015-pipeline-convergence-protocol

29723ae

# Conflicts: # .github/workflows/spec015-calibration.yml

Merge branch '015-pipeline-convergence-protocol' of https://github.com/ContextLab/llmXive into 015-pipeline-convergence-protocol

18d282e

jeremymanning merged commit df968f0 into main May 31, 2026
3 of 5 checks passed

jeremymanning added a commit that referenced this pull request May 31, 2026

docs(notes): update spec-015 tracker - PR #250 merged, resume at PROJ-552 plan panel

b6250b1

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremymanning merged 213 commits into main from 015-pipeline-convergence-protocol
