From dddbe577d1855bd07d0cb1665d25d3d886799b6b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexander=20K=C3=B6lnberger?= <159939812+ProfRandom92@users.noreply.github.com>
Date: Tue, 19 May 2026 13:48:58 -0700
Subject: [PATCH 1/3] Rewrite README as reviewer-oriented deterministic validation front page

---
 README.md | 509 +++++++++++++++++++++++++-----------------------------
 1 file changed, 236 insertions(+), 273 deletions(-)

diff --git a/README.md b/README.md
index fb36ab3..3ca6699 100644
--- a/README.md
+++ b/README.md
@@ -5,13 +5,7 @@
- Deterministic operational replay validation for long-horizon AI agents.
-
- -
- Comptextv7 is a deterministic operational replay-validation and state-survivability prototype: it tests whether compact, replay-safe operational state preserves fixture-defined evidence, constraints, blockers, dependencies, recovery paths, and tool-order signals across compression, reconstruction, iterative replay degradation, and CI-audited summaries — without LLM judges, embeddings, vector databases, graph stores, or external APIs.
-
- See docs/research_positioning.md for conservative project positioning and scope boundaries.
+ Deterministic replay-survivability validation for compressed operational state in long-horizon AI agents.
@@ -23,33 +17,63 @@
- External Monaco showcase repo
- · Benchmark explanation
- · Iterative replay degradation
- · Replay report
-
+CompTextv7 does not ask whether a compressed summary sounds good.
+It asks whether the compressed state can still replay the operational facts required to continue the work.
 
 ---
 
+## In 30 seconds
+
+Long-horizon agents repeatedly compress prior work into smaller summaries. Those summaries can lose blockers, constraints, evidence, dependency order, recovery paths, and tool order. CompTextv7 validates whether those operational facts survive replay using deterministic contracts, exact scoring, and committed artifacts.
 
-## Thesis
+## What CompTextv7 is
 
-Comptextv7 measures whether compressed agent/workflow state can still be replayed as usable operational state — and shows exactly what breaks when compression becomes too aggressive. It extracts fixture-defined evidence, constraints, blockers, dependencies, recovery paths, and tool-order signals; compacts them into replay-safe state; reconstructs them; and validates deterministic survival with committed artifacts rather than LLM judges, embeddings, vector databases, graph stores, or external APIs.
+- A deterministic replay-validation research prototype.
+- Fixture-bound and contract-linked.
+- Artifact-backed with reproducible JSON/SVG outputs.
+- CI-reproducible via repository checks and GitHub Actions.
+- Focused on operational admissibility, not prose quality.
 
-## What Comptextv7 does
+## What CompTextv7 is not
+
+- Autonomous agent framework.
+- Workflow orchestrator.
+- Learned compressor.
+- Vector memory system.
+- RAG replacement.
+- KV-cache optimizer.
+- Production telemetry platform.
+- Clinical-grade system.
+- Universal AI-memory solution.
+- LLM judge.
+
+## Replay validation model
+
+```mermaid
+flowchart LR
+    A["Checked-in fixture"] --> B["Original operational state"]
+    B --> C["Compressed + reconstructed replay state"]
+    C --> D["Contract validator"]
+    D --> E["Admissibility scorer"]
+    E --> F["Failure labels"]
+    E --> G["Committed artifacts"]
+    G --> H["CI gates"]
+    F --> H
+```
 
-- Extracts operational state from checked-in paper and agent/workflow fixtures.
-- Compacts that state into deterministic replay payloads.
-- Replays the compacted state into reconstructed operational state.
-- Validates deterministic survival of required evidence, constraints, blockers, dependencies, recovery paths, and tool-order signals.
-- Labels replay failure modes when operational state is lost or detached.
+## Current fixture-bound signal
 
-## Why this matters
+- Three manifest-registered operational fixture families.
+- Standard levels: `baseline`, `mild`, `moderate`, `severe`.
+- Deterministic evaluation mode.
+- No LLM judges.
+- No external APIs.
+- Exact rational scoring.
+- Reproducible artifacts.
 
-Summaries can sound fluent while losing blockers, constraints, evidence, dependencies, or recovery paths. Comptextv7 treats that as a measurable replay problem: if compressed state cannot be replayed into usable operational state, the validator records what failed instead of relying on subjective prose quality.
+These are internal fixture-bound results, not external benchmark claims, production-readiness claims, or solved-memory claims.
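The signal list above describes exact rational scoring over the `baseline`, `mild`, `moderate`, and `severe` levels. As a minimal sketch of what exact scoring can mean, assume a level's score is simply the share of required operational units that survive replay. The unit names and the `survival_ratio` and `format_metric` helpers are hypothetical, not the repository's `AdmissibilityScorer` API; only the level names come from the README.

```python
from fractions import Fraction

# Standard degradation levels named in the README.
LEVELS = ("baseline", "mild", "moderate", "severe")


def survival_ratio(required: set[str], replayed: set[str]) -> Fraction:
    """Exact share of required operational units still present after replay (hypothetical helper)."""
    if not required:
        return Fraction(1)
    return Fraction(len(required & replayed), len(required))


def format_metric(value: Fraction, digits: int = 6) -> str:
    """Render a non-negative ratio with fixed decimals using integer arithmetic only."""
    scale = 10**digits
    scaled = round(value * scale)  # round() on a Fraction returns an int (ties go to even)
    return f"{scaled // scale}.{scaled % scale:0{digits}d}"


# Hypothetical operational units for one fixture family.
REQUIRED = {"blocker:db-migration", "constraint:no-force-push", "evidence:ci-log", "tool:lint->test"}
REPLAYED_BY_LEVEL = {
    "baseline": set(REQUIRED),
    "mild": REQUIRED - {"evidence:ci-log"},
    "moderate": {"blocker:db-migration", "tool:lint->test"},
    "severe": set(),
}

if __name__ == "__main__":
    scores = {level: survival_ratio(REQUIRED, REPLAYED_BY_LEVEL[level]) for level in LEVELS}
    for level, score in scores.items():
        print(level, format_metric(score))
    # The mean stays exact: no float error accumulates before rendering.
    print("mean", format_metric(sum(scores.values()) / len(scores)))
```

Run as a script, it prints `1.000000`, `0.750000`, `0.500000`, and `0.000000` for the four levels and `0.562500` for the mean. Keeping scores as `Fraction` values until the final rendering step means averages and comparisons never pick up binary floating-point error, so a regenerated artifact can match a committed one character for character.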
 
-## Current signal
+### Drift-locked benchmark metrics
 
 | Signal | Current fixture-bound result |
 | --- | ---: |
@@ -58,291 +82,230 @@ Summaries can sound fluent while losing blockers, constraints, evidence, depende
 | `CONSERVATIVE` replay consistency | `0.895833` |
 | `BALANCED` replay consistency | `0.250000` |
 | `AGGRESSIVE` replay consistency | `0.125000` |
-
-`BALANCED` currently emits these replay failure labels in the comparative degradation fixture summary: `EVIDENCE_LOSS` and `CONSTRAINT_DRIFT`.
-
-Interpretation: the profile comparison shows monotonic degradation under increasing compression pressure. That is useful because the benchmark responds to operational-state loss; these results are internal, fixture-bound observations, not external benchmark, production-readiness, or solved-memory claims.
-
-## Positioning boundaries
-
-Comptextv7 is a deterministic operational replay-validation and state-survivability prototype. It is complementary to learned context-compression research, RAG evaluation, vector-memory systems, serving-layer cache optimization, and durable workflow infrastructure, but it is not a workflow orchestrator, learned compressor, vector memory system, RAG replacement, KV-cache compressor, autonomous agent framework, production telemetry system, clinical-grade system, or universal AI-memory solution.
-
-For the concise research positioning brief, scope boundaries, and benchmark interpretation, see [Research Positioning](docs/research_positioning.md), [Iterative Replay Degradation](docs/iterative_replay_degradation.md), the [Benchmark Explanation](docs/BENCHMARK_EXPLANATION.md), the committed [iterative replay degradation summary](artifacts/iterative_replay_degradation_results.summary.md), and [`scripts/validate_replay_artifact_drift.py`](scripts/validate_replay_artifact_drift.py). The visual Monaco walkthrough lives in the external [`ProfRandom92/comptext-v7-monaco-showcase`](https://github.com/ProfRandom92/comptext-v7-monaco-showcase) repository.
-
-## Proof at a glance
-
-| Evidence | Current result |
-|---|---:|
-| Paper replay fixtures | 3 dense technical papers |
-| Agent trace fixtures | 3 multi-step workflows |
 | Paper avg compression | `1.347063` |
 | Agent avg compression | `1.773954` |
-| Paper replay consistency | `0.791667` |
 | Agent replay consistency | `1.000000` |
 | Agent operational drift | `0.000000` |
-| Evaluation mode | deterministic, no LLM judging |
-| Artifact format | committed JSON + CI upload |
-
-Sources: [`artifacts/paper_replay_results.json`](artifacts/paper_replay_results.json) and [`artifacts/agent_trace_replay_results.json`](artifacts/agent_trace_replay_results.json).
-
-## How to read these values
-- **Paper replay is lossy under dense technical prose.** The current paper fixtures include entities, limitations, sections, and metrics that are harder to preserve after compaction.
-- **Agent trace replay is currently near-lossless because traces are structured.** The checked-in traces expose explicit tasks, blockers, dependencies, tool order, and recovery actions.
-- **`1.000000` replay consistency does not mean solved memory.** It means exact preservation under the current structured trace fixtures and current deterministic validator.
-- **Operational drift is field loss, not subjective quality.** A non-zero drift rate would mean replay lost required operational fields.
-- **Iterative replay degradation is a bounded prototype.** Repeated compact/replay cycles emit deterministic JSON and Markdown artifacts for reviewing drift curves, collapse points, and failure labels. A small fixture-bound comparison mode contrasts `CONSERVATIVE`, `BALANCED`, and `AGGRESSIVE` compression profiles with deterministic per-profile aggregates, and an additive sensitivity-analysis surface varies bounded replay/compression parameters without external services.
 
-## What makes this different
-
-- Not chat-history storage.
-- Not vector memory.
-- Not model-judged summarization.
-- Not autonomous agent orchestration.
-- Deterministic operational-state replay validation.
-
-## Architecture
+## Artifact evidence pipeline
 
 ```mermaid
 flowchart LR
-    A[Raw Context / Agent Trace]
-    --> B[Operational State Extraction]
-    B --> C[Compact Replay State]
-    C --> D[Replay Reconstruction]
-    D --> E[Deterministic Validation]
-    E --> F[CI Artifact]
+    A["fixtures/manifest.json"] --> B["Fixture families"]
+    B --> C["DegradationCurveGenerator"]
+    B --> D["AdmissibilityScorer"]
+    C --> E["artifacts/multi_family_admissibility_curves.svg"]
+    D --> F["artifacts/layered_admissibility_results.json"]
+    D --> G["artifacts/multi_family_admissibility_results.json"]
+    F --> H["Reproducibility tests"]
+    G --> H
+    E --> I["Progression tests"]
+    H --> J["GitHub Actions"]
+    I --> J
 ```
 
-Comptextv7 turns noisy context into compact operational state, then validates whether replay reconstructs the fields needed to continue work.
-
-## Benchmark family
-
-### Paper Replay Benchmark
-
-- **Validates:** whether dense technical paper summaries preserve entities, metrics, limitations, and section structure after deterministic replay compression.
-- **Artifact:** [`artifacts/paper_replay_results.json`](artifacts/paper_replay_results.json).
-- **Method:** [`docs/benchmarks/paper_replay.md`](docs/benchmarks/paper_replay.md).
-- **Current avg compression:** `1.347063`.
-- **Current replay consistency:** `0.791667`.
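The artifact evidence pipeline above routes committed JSON through reproducibility tests. A common way to make such a check byte-stable is to hash a canonical serialization, sketched here under the assumption of sorted keys, fixed separators, and SHA-256. The helper names and the two-field artifact are hypothetical, and the repository's own hashing rules may differ.

```python
import hashlib
import json
from typing import Any

_MISSING = object()  # sentinel so an absent key never compares equal to a stored None


def canonical_bytes(artifact: dict[str, Any]) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data yields equal bytes."""
    text = json.dumps(artifact, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def artifact_digest(artifact: dict[str, Any]) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(artifact)).hexdigest()


def drifted_fields(committed: dict[str, Any], regenerated: dict[str, Any]) -> list[str]:
    """Top-level keys whose values differ between the committed and regenerated artifacts."""
    keys = committed.keys() | regenerated.keys()
    return sorted(k for k in keys if committed.get(k, _MISSING) != regenerated.get(k, _MISSING))


if __name__ == "__main__":
    # Hypothetical artifact shape; key order differs but the data is identical.
    committed = {"replay_consistency": "0.250000", "failure_labels": ["EVIDENCE_LOSS", "CONSTRAINT_DRIFT"]}
    regenerated = {"failure_labels": ["EVIDENCE_LOSS", "CONSTRAINT_DRIFT"], "replay_consistency": "0.250000"}
    print(artifact_digest(committed) == artifact_digest(regenerated))  # True
    regenerated["replay_consistency"] = "0.125000"
    print(drifted_fields(committed, regenerated))  # ['replay_consistency']
```

Canonicalizing before hashing makes the digest insensitive to key order, while `drifted_fields` names the top-level fields that changed, so a failing check points at a specific metric instead of only reporting a hash mismatch.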
-
-### Agent Trace Replay Benchmark
-
-- **Validates:** whether multi-step agent workflows preserve active tasks, constraints, dependencies, tool sequences, unresolved blockers, deployment requirements, and recovery actions.
-- **Artifact:** [`artifacts/agent_trace_replay_results.json`](artifacts/agent_trace_replay_results.json).
-- **Method:** [`docs/benchmarks/agent_trace_replay.md`](docs/benchmarks/agent_trace_replay.md).
-- **Current avg compression:** `1.773954`.
-- **Current replay consistency:** `1.000000`.
-- **Operational drift:** `0.000000`.
-- **Interpretation:** current setup is near-lossless because the fixtures are structured; this is a useful baseline, not a universal memory claim.
-
-### Multi-Family Operational Admissibility Benchmark
-
-- **Validates:** Deterministic multi-family operational admissibility benchmark with manifest-driven fixture selection, exact scoring, reproducible JSON artifacts, and progression-regression checks.
-- **Method:** [`docs/benchmarks/multi_family_admissibility_benchmark.md`](docs/benchmarks/multi_family_admissibility_benchmark.md).
-
-### Iterative Replay Degradation Prototype
-
-- **Validates:** how checked-in paper and agent-trace fixtures degrade across bounded repeated compact/replay cycles.
-- **Method:** [`docs/iterative_replay_degradation.md`](docs/iterative_replay_degradation.md).
-- **Profile comparison:** additive prototype mode compares `CONSERVATIVE`, `BALANCED`, and `AGGRESSIVE` compression profiles using fixture-bound aggregates only: collapse rate, replay consistency, operational drift, evidence survival, and deterministic failure labels.
-- **Sensitivity analysis:** additive JSON/Markdown surface varies bounded `max_context_units`, `max_families`, `max_bursts`, `replay_window_seconds`, `replay_cycles`, and `compression_budget_scale` values for fixture-bound replay degradation review.
-- **Current internal baseline:** see the fixture-bound [comparative replay degradation results](docs/iterative_replay_degradation.md#comparative-replay-degradation-results).
-- **Interpretation:** profile comparison rows are deterministic replay-validation observations for the current fixtures, not general memory, production, or clinical-grade claims.
-
-## Complementary adversarial replay stress suite
-
-This suite is a separate long-horizon stress surface under `reports/replay_continuity/`.
-It remains useful context, but the focused README narrative is the deterministic operational replay benchmark family above.
-
-| System | Iteration 25 | Iteration 50 | Iteration 100 | Iteration 250 |
-| --- | ---: | ---: | ---: | ---: |
-| Naive | 0.039 | 0.039 | 0.043 | 0.039 |
-| Baseline | 0.294 | 0.294 | 0.294 | 0.294 |
-| Adaptive | 0.679 | 0.476 | 0.302 | 0.302 |
-| Comptextv7 | 1.000 | 0.995 | 0.824 | 0.572 |
-
-The committed 250-iteration report records Comptextv7 mean final continuity at `0.571783`, rounded to `0.572` here.
-Detail fidelity still degrades: hidden truth survival is `0.570173`, and evaluator agreement divergence is `0.421743`.
-
-| System | Approx collapse point |
-| --- | ---: |
-| Naive | ~1 iteration |
-| Baseline | ~10 iterations |
-| Adaptive | ~45 iterations |
-| Comptextv7 | censored at ~250 iterations in this suite |
-
-## Visual artifacts
-
-- [`replay_degradation_curves.svg`](reports/replay_continuity/replay_degradation_curves.svg)
-- [`continuity_half_life_chart.svg`](reports/replay_continuity/continuity_half_life_chart.svg)
-- [`semantic_drift_graph.svg`](reports/replay_continuity/semantic_drift_graph.svg)
-- [`replay_collapse_curves.svg`](reports/replay_continuity/replay_collapse_curves.svg)
-- [`evaluator_agreement_divergence.svg`](reports/replay_continuity/evaluator_agreement_divergence.svg)
-- [`hidden_constraint_survival_curves.svg`](reports/replay_continuity/hidden_constraint_survival_curves.svg)
-
-## Integrity model
-
-- **no LLM judging**;
-- **no embeddings**;
-- **no vector DBs**;
-- **no external APIs**;
-- **artifact-backed JSON + CI checks**;
-- **deterministic hashing foundation** ([`docs/deterministic_hashing.md`](docs/deterministic_hashing.md));
-- **audit-friendly and CI reproducible**.
-
-### Foundational Components
-
-The system relies on the following deterministic foundations:
-- **`ReferenceIndex`** and **`EventLogArtifactAdapter`**: track context references and deterministically fingerprint event payloads ([`docs/reference_index_event_fingerprints.md`](docs/reference_index_event_fingerprints.md)).
-- **`ReplayArtifactWriter v1-alpha.1`**: generates deterministic, standalone JSON artifacts for verifiable snapshots ([`docs/replay_artifact_writer.md`](docs/replay_artifact_writer.md)).
-
-## Limitations
-
-- Metrics mentioned in benchmarks are **fixture-bound baselines** and do not reflect real-world universal correctness.
-- Fixtures are curated and checked in.
-- Structured agent traces currently replay near-losslessly.
-- This is not solved AI memory.
-- This is not production telemetry.
-- This is not an autonomous agent framework.
-- Evaluator divergence remains material in the long-horizon stress suite.
-- Iterative degradation remains a bounded fixture prototype; its artifact and summary are review aids, not universal memory claims.
-
-## Next technical milestone
-
-> Next: continue tightening deterministic replay review surfaces.
-> Keep repeated compact/replay artifacts cheap, deterministic, additive-compatible, and easy to inspect in CI and pull requests.
-
-## Validated deterministic replay review flow
-
-Use this short flow when reviewing replay-system changes:
-
-1. Regenerate or inspect deterministic replay artifacts only from checked-in fixtures.
-2. Compare stable metric fields (`replay_consistency`, evidence survival rates, `operational_drift_rate`) and taxonomy fields (`failure_labels`, `failure_mode_counts`) rather than prose interpretations.
-3. For iterative degradation and sensitivity review, run `python scripts/generate_iterative_replay_degradation_artifacts.py` and inspect both the JSON artifact and Markdown summary.
-4. Treat additive artifact fields as forward-compatible when existing deterministic fields remain stable.
-5. Keep claims fixture-bound: no LLM judging, embeddings, external APIs, production-readiness claims, or solved-memory claims.
-
-## Review surfaces
-
-The main Comptextv7 repository is the source of truth for deterministic replay-validation evidence: artifacts, benchmarks, failure labels, degradation summaries, and conservative research positioning. The visual Monaco walkthrough now lives separately in the external showcase repository.
+## Minimal deterministic example
+
+```json
+{
+  "original_operational_state": {
+    "policy_steps": ["identify_owner", "collect_evidence", "execute_recovery"],
+    "causal_dependencies": [["alert", "triage"], ["triage", "recovery"]],
+    "recovery_paths": ["pagerduty_ack -> mitigation_runbook"]
+  },
+  "reconstructed_state": {
+    "policy_steps": ["collect_evidence", "identify_owner", "execute_recovery"],
+    "causal_dependencies": [["alert", "recovery"]],
+    "recovery_paths": []
+  },
+  "deterministic_validation_result": {
+    "admissible": false,
+    "failure_labels": [
+      "POLICY_ORDER_BROKEN",
+      "CAUSAL_DEPENDENCY_LOSS",
+      "RECOVERY_PATH_INVALID",
+      "INVARIANT_VIOLATION"
+    ]
+  }
+}
+```
 
-### Main repo technical evidence
+## Proof artifacts
 
-| Surface | Link |
-| --- | --- |
-| CI Artifact Narrative | [`docs/ci_artifact_narrative.md`](docs/ci_artifact_narrative.md) |
-| Benchmark explanation | [`docs/BENCHMARK_EXPLANATION.md`](docs/BENCHMARK_EXPLANATION.md) |
-| Replay failure taxonomy | [`docs/operational_replay_failure_taxonomy.md`](docs/operational_replay_failure_taxonomy.md) |
-| Iterative replay degradation artifact and CI summary | [`docs/iterative_replay_degradation.md`](docs/iterative_replay_degradation.md) |
-| Comparative replay degradation artifact and CI summary | [`docs/iterative_replay_degradation.md#comparative-replay-degradation-results`](docs/iterative_replay_degradation.md#comparative-replay-degradation-results) |
-| Replay sensitivity-analysis artifact and CI summary | [`docs/iterative_replay_degradation.md#replay-sensitivity-analysis-surface`](docs/iterative_replay_degradation.md#replay-sensitivity-analysis-surface) |
-| Replay report | [`reports/replay_continuity/validation_report.md`](reports/replay_continuity/validation_report.md) |
-| API surface | [`docs/API_SURFACE.md`](docs/API_SURFACE.md) |
-
-### External Monaco showcase UI
-
-| Surface | Link |
+| Artifact | Purpose |
 | --- | --- |
-| Monaco showcase repository | [`ProfRandom92/comptext-v7-monaco-showcase`](https://github.com/ProfRandom92/comptext-v7-monaco-showcase) |
-| Legacy demo walkthrough note | [`docs/DEMO_WALKTHROUGH.md`](docs/DEMO_WALKTHROUGH.md) |
-| Legacy showcase readiness note | [`docs/SHOWCASE_READINESS.md`](docs/SHOWCASE_READINESS.md) |
+| `artifacts/layered_admissibility_results.json` | Layered admissibility outputs by severity/profile. |
+| `artifacts/multi_family_admissibility_results.json` | Multi-family deterministic admissibility aggregates. |
+| `artifacts/multi_family_admissibility_curves.svg` | Deterministic multi-family degradation curve rendering. |
+| `docs/benchmarks/multi_family_admissibility_benchmark.md` | Benchmark method and interpretation constraints. |
+| `docs/failure_taxonomy.md` | Failure label documentation aligned to deterministic contracts. |
 
-## Repository map
+## Verify locally
 
-```text
-Comptextv7/
-├── artifacts/ # committed deterministic replay benchmark JSON
-├── benchmarks/ # deterministic compression, replay, and audit runners
-├── contracts/ # machine-readable validation and handoff contracts
-├── dashboard/ # backend plus React operations console
-├── docs/ # benchmark, artifact, research, and legacy showcase notes
-├── reports/replay_continuity/ # adversarial continuity metrics and SVG charts
-├── scripts/ # validation, reporting, and artifact tooling
-├── showcase/app/ # legacy in-repo Vite app; Monaco UI lives in external repo
-├── src/ # KVTC engine, audit, and semantic validation modules
-├── tests/ # Python regression and replay validation tests
-└── README.md
+```bash
+npm install --no-save --no-package-lock
+npm run check
+pytest tests/test_failure_taxonomy.py -q
+pytest tests/test_multi_family_admissibility_artifact.py -q
+pytest tests/test_multi_family_svg_renderer.py -q
 ```
 
-## Safety boundaries
-Do not commit:
-- proprietary customer data;
-- secrets, API keys, tokens, cookies, or credentials;
-- raw production logs;
-- unsanitized replay fixtures;
-- private deployment credentials or environment dumps.
-
-Comptextv7 is a deterministic, synthetic-only research prototype for operational replay persistence and reviewable diagnostic infrastructure.
-
-## Cloud-first validation
-Comptextv7 is biased toward artifact-backed review rather than local machine trust.
+## Benchmark families
 
-| Workflow | Role |
-|---|---|
-| [`ci.yml`](.github/workflows/ci.yml) | Runs deterministic replay, tests, telemetry, and validation gates. |
-| [`agent-checks.yml`](.github/workflows/agent-checks.yml) | Runs repository/report/contract checks plus dashboard validation. |
-| [`validation_runner.yml`](.github/workflows/validation_runner.yml) | Publishes compact cloud validation result artifacts. |
+- `coding_workflow_pr_review`
+- `incident_response_page_triage`
+- `cross_domain_operational_dependency_workflow`
 
-## Reproducibility
-Install the Python test dependency set:
+## Fixture family coverage
 
-```bash
-python -m pip install -e '.[test]'
+```mermaid
+flowchart LR
+    A["coding_workflow_pr_review"] --> L1["baseline"]
+    A --> L2["mild"]
+    A --> L3["moderate"]
+    A --> L4["severe"]
+
+    B["incident_response_page_triage"] --> L1
+    B --> L2
+    B --> L3
+    B --> L4
+
+    C["cross_domain_operational_dependency_workflow"] --> L1
+    C --> L2
+    C --> L3
+    C --> L4
+
+    L1 --> M["manifest registration"]
+    L2 --> M
+    L3 --> M
+    L4 --> M
+    M --> N["multi-family artifact"]
+    N --> O["deterministic SVG"]
 ```
 
-Regenerate deterministic replay artifacts:
+## Failure labels
+
+Primary registered labels used across deterministic admissibility validation:
+
+- `POLICY_ORDER_BROKEN`: the required policy order failed its deterministic contract.
+- `TOOL_ORDER_VIOLATION`: the replayed tool sequence diverged from the required execution order.
+- `CAUSAL_DEPENDENCY_LOSS`: required causal edges were not preserved.
+- `DEPENDENCY_CHAIN_BREAK`: required dependency chains broke during reconstruction.
+- `RECOVERY_PATH_INVALID`: the required recovery-reachability contract failed.
+- `RECOVERY_PATH_LOSS`: one or more required recovery routes were not preserved.
+- `INVARIANT_VIOLATION`: a declared invariant contract failed.
+- `ORPHAN_DEPENDENCY`: dependency nodes became orphaned.
+- `DETACHED_DEPENDENCY`: required dependency edges became detached.
+- `CYCLE_INTRODUCED`: the reconstructed dependency graph introduced a cycle.
+- `GRAPH_FRAGMENTATION`: replay fragmented required dependency connectivity.
+- `TEMPORAL_ORDER_VIOLATION`: a required relative topological order was violated.
+- `EVIDENCE_LOSS`: evidence preservation dropped below the required full survival.
+- `EVIDENCE_SURVIVAL_LOSS`: expected evidence units were not fully preserved.
+- `HIGH_CRITICAL_EVIDENCE_LOSS`: high-critical evidence survival fell below full preservation.
+- `CONSTRAINT_DRIFT`: constraint preservation drifted below full survival.
+- `BLOCKER_DETACHMENT`: blocker survival dropped below the required full attachment.
+- `GOVERNANCE_DRIFT`: governance-constraint preservation drifted below full survival.
+- `ARTIFACT_INTEGRITY_VIOLATION`: artifact fields drifted from the deterministic contract bundle.
+- `REPLAY_NON_REPRODUCIBLE`: replay output for a fixed input failed reproducibility checks.
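The minimal deterministic example can be reproduced for three of its four labels with plain list and set comparisons. This is a sketch under stated assumptions, not the repository's contract validator: policy order is compared only over steps present in both states, a causal dependency counts as lost when an original edge is missing after replay, and `RECOVERY_PATH_INVALID` is emitted when no original recovery path survives. `INVARIANT_VIOLATION` is omitted because the example does not show which invariants are declared.

```python
from typing import Any

# Operational states copied from the minimal deterministic example.
ORIGINAL = {
    "policy_steps": ["identify_owner", "collect_evidence", "execute_recovery"],
    "causal_dependencies": [["alert", "triage"], ["triage", "recovery"]],
    "recovery_paths": ["pagerduty_ack -> mitigation_runbook"],
}
RECONSTRUCTED = {
    "policy_steps": ["collect_evidence", "identify_owner", "execute_recovery"],
    "causal_dependencies": [["alert", "recovery"]],
    "recovery_paths": [],
}


def policy_order_broken(original: list[str], replayed: list[str]) -> bool:
    """True when steps present in both states appear in a different relative order."""
    shared = set(original) & set(replayed)
    return [s for s in original if s in shared] != [s for s in replayed if s in shared]


def lost_edges(original: list[list[str]], replayed: list[list[str]]) -> list[tuple[str, ...]]:
    """Original causal edges that are absent from the replayed state."""
    kept = {tuple(edge) for edge in replayed}
    return [tuple(edge) for edge in original if tuple(edge) not in kept]


def validate(original: dict[str, Any], replayed: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical validator covering three labels; checks run in a fixed order."""
    labels: list[str] = []
    if policy_order_broken(original["policy_steps"], replayed["policy_steps"]):
        labels.append("POLICY_ORDER_BROKEN")
    if lost_edges(original["causal_dependencies"], replayed["causal_dependencies"]):
        labels.append("CAUSAL_DEPENDENCY_LOSS")
    # Assumption: reachability fails only when no original recovery path survives.
    required_paths = set(original["recovery_paths"])
    if required_paths and not (required_paths & set(replayed["recovery_paths"])):
        labels.append("RECOVERY_PATH_INVALID")
    return {"admissible": not labels, "failure_labels": labels}


if __name__ == "__main__":
    print(validate(ORIGINAL, RECONSTRUCTED))
```

Against the example data it returns `admissible: False` with `POLICY_ORDER_BROKEN`, `CAUSAL_DEPENDENCY_LOSS`, and `RECOVERY_PATH_INVALID`, in the same order as the documented result. Comparing only shared steps keeps a dropped step from also being reported as an ordering failure; step loss would need its own check.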
+
+## Failure taxonomy map
 
-```bash
-python tests/utils/paper_replay_runner.py
-python tests/utils/agent_trace_replay_runner.py
-python benchmarks/run_replay_continuity.py --iterations 250 --output-dir reports/replay_continuity
-python scripts/generate_iterative_replay_degradation_artifacts.py
+```mermaid
+flowchart LR
+    O1["POLICY_ORDER_BROKEN"] --> C1["ordering"]
+    O2["TOOL_ORDER_VIOLATION"] --> C1
+    O3["TEMPORAL_ORDER_VIOLATION"] --> C1
+
+    D1["CAUSAL_DEPENDENCY_LOSS"] --> C2["causality/dependency"]
+    D2["DEPENDENCY_CHAIN_BREAK"] --> C2
+    D3["DETACHED_DEPENDENCY"] --> C2
+    D4["ORPHAN_DEPENDENCY"] --> C2
+    D5["GRAPH_FRAGMENTATION"] --> C2
+    D6["CYCLE_INTRODUCED"] --> C2
+
+    R1["RECOVERY_PATH_INVALID"] --> C3["recovery/reachability"]
+    R2["RECOVERY_PATH_LOSS"] --> C3
+
+    I1["INVARIANT_VIOLATION"] --> C4["invariant/no-orphan"]
+
+    E1["EVIDENCE_LOSS"] --> C5["evidence/criticality"]
+    E2["EVIDENCE_SURVIVAL_LOSS"] --> C5
+    E3["HIGH_CRITICAL_EVIDENCE_LOSS"] --> C5
+    E4["BLOCKER_DETACHMENT"] --> C5
+    E5["CONSTRAINT_DRIFT"] --> C5
+    E6["GOVERNANCE_DRIFT"] --> C5
+
+    A1["ARTIFACT_INTEGRITY_VIOLATION"] --> C6["artifact integrity/reproducibility"]
+    A2["REPLAY_NON_REPRODUCIBLE"] --> C6
 ```
 
-Use the validation commands in [`docs/validation.md`](docs/validation.md). The root `package.json` is a wrapper for reviewer convenience. App dependencies remain in `dashboard/app` and the legacy in-repo `showcase/app`; the current Monaco showcase UI is maintained in [`ProfRandom92/comptext-v7-monaco-showcase`](https://github.com/ProfRandom92/comptext-v7-monaco-showcase).
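The taxonomy map above assigns each registered label to exactly one category. Expressed as data, the same mapping can group a replay's labels for a report and reject labels that were never registered. `TAXONOMY`, `CATEGORY_OF`, and `group_by_category` are hypothetical names; the labels and category names are transcribed from the map.

```python
from collections import defaultdict

# Category -> labels, transcribed from the failure taxonomy map.
TAXONOMY: dict[str, tuple[str, ...]] = {
    "ordering": ("POLICY_ORDER_BROKEN", "TOOL_ORDER_VIOLATION", "TEMPORAL_ORDER_VIOLATION"),
    "causality/dependency": (
        "CAUSAL_DEPENDENCY_LOSS",
        "DEPENDENCY_CHAIN_BREAK",
        "DETACHED_DEPENDENCY",
        "ORPHAN_DEPENDENCY",
        "GRAPH_FRAGMENTATION",
        "CYCLE_INTRODUCED",
    ),
    "recovery/reachability": ("RECOVERY_PATH_INVALID", "RECOVERY_PATH_LOSS"),
    "invariant/no-orphan": ("INVARIANT_VIOLATION",),
    "evidence/criticality": (
        "EVIDENCE_LOSS",
        "EVIDENCE_SURVIVAL_LOSS",
        "HIGH_CRITICAL_EVIDENCE_LOSS",
        "BLOCKER_DETACHMENT",
        "CONSTRAINT_DRIFT",
        "GOVERNANCE_DRIFT",
    ),
    "artifact integrity/reproducibility": ("ARTIFACT_INTEGRITY_VIOLATION", "REPLAY_NON_REPRODUCIBLE"),
}

# Inverted index: label -> category.
CATEGORY_OF = {label: category for category, labels in TAXONOMY.items() for label in labels}
# Every label must belong to exactly one category.
assert len(CATEGORY_OF) == sum(len(labels) for labels in TAXONOMY.values())


def group_by_category(labels: list[str]) -> dict[str, list[str]]:
    """Group emitted labels by category, failing fast on unregistered labels."""
    unknown = sorted(set(labels) - set(CATEGORY_OF))
    if unknown:
        raise ValueError(f"unregistered failure labels: {unknown}")
    grouped: dict[str, list[str]] = defaultdict(list)
    for label in labels:
        grouped[CATEGORY_OF[label]].append(label)
    return dict(grouped)
```

For the minimal deterministic example's four labels, `group_by_category` returns one label each under `ordering`, `causality/dependency`, `recovery/reachability`, and `invariant/no-orphan`.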
+## How this differs from adjacent systems
 
-Root wrapper checks:
+| System type | Stores state | Compresses context | Orchestrates agents | Deterministically validates replay loss |
+| --- | --- | --- | --- | --- |
+| Workflow runtimes | Sometimes | No | Yes | No |
+| Agent frameworks | Sometimes | Sometimes | Yes | Usually no |
+| Vector memory / RAG | Yes | Retrieval-centric | No | No |
+| Learned prompt compressors | Sometimes | Yes (learned) | No | Usually no |
+| LLM-as-judge evaluators | Sometimes | N/A | No | No (model-judged) |
+| CompTextv7 | Yes (fixture operational state) | Yes (deterministic replay state) | No | Yes |
 
-```bash
-npm run layout
-npm run typecheck
-npm run validate
-npm run build
-npm test
-npm run check
+## CI and merge gate
+
+```mermaid
+flowchart LR
+    A["PR head SHA"] --> B["GitHub Actions"]
+    B --> C["Agent Workflow Checks"]
+    B --> D["hash-companion-validation"]
+    B --> E["CompText V7 Industrial Validation"]
+    C --> F["all success"]
+    D --> F
+    E --> F
+    F --> G["squash merge"]
 ```
 
-Dashboard app checks:
+Vercel/Netlify/deployment previews are not merge gates unless explicitly scoped.
 
-```bash
-cd dashboard/app
-npm run typecheck
-npm run build
+## Repository map
+
+```text
+Comptextv7/
+├── artifacts/
+├── docs/
+├── fixtures/
+├── reports/
+├── scripts/
+├── tests/
+└── src/validation/
 ```
 
-Showcase app checks:
+## Roadmap
 
-```bash
-cd showcase/app
-npm run typecheck
-npm run validate
-npm run build
-```
+- Forensic audit reports with deterministic exports.
+- Artifact schema stabilization.
+- Cross-family degradation comparison.
+- Minimal artifact integrity gates.
+- Golden corpus foundation.
+- Offline import/export schemas only.
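The CI and merge gate flowchart requires all three listed checks to succeed on the PR head SHA before a squash merge, and the note after it excludes deployment previews. The sketch below encodes that rule, assuming the check conclusions have already been collected into a name-to-conclusion mapping (no GitHub API is called here). `REQUIRED_CHECKS` and `merge_gate` are hypothetical names, and this is not the repository's branch-protection configuration.

```python
# Check names taken from the CI and merge gate flowchart.
REQUIRED_CHECKS = frozenset(
    {
        "Agent Workflow Checks",
        "hash-companion-validation",
        "CompText V7 Industrial Validation",
    }
)


def merge_gate(conclusions: dict[str, str]) -> tuple[bool, list[str]]:
    """Return (ready_to_merge, blocking_checks) for one PR head SHA.

    A required check blocks unless its conclusion is exactly "success";
    a missing check also blocks. Checks outside REQUIRED_CHECKS, such as
    deployment previews, are ignored.
    """
    blocking = sorted(name for name in REQUIRED_CHECKS if conclusions.get(name) != "success")
    return (not blocking, blocking)


if __name__ == "__main__":
    head_sha_checks = {
        "Agent Workflow Checks": "success",
        "hash-companion-validation": "success",
        "CompText V7 Industrial Validation": "success",
        "Vercel Preview": "failure",  # hypothetical preview check; not a merge gate
    }
    print(merge_gate(head_sha_checks))  # (True, [])
```

Treating a missing check the same as a failed one keeps the gate closed while a required workflow is still queued or was never triggered.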
 
-Python checks from the repository root:
+## Roadmap flow
 
-```bash
-pytest -q
-pytest tests/test_core_foundation_ts.py -q
-pytest tests/test_paper_replay_bench.py tests/test_agent_trace_replay.py tests/test_replay_continuity.py -q
+```mermaid
+flowchart LR
+    A["failure taxonomy"] --> B["cross-domain fixture families"]
+    B --> C["forensic reports"]
+    C --> D["schema stabilization"]
+    D --> E["cross-family comparison"]
+    E --> F["integrity gates"]
+    F --> G["golden corpus"]
+    G --> H["offline import/export"]
 ```
 
-Additional repository validation helpers remain available when their surfaces are touched:
+## Limitations
 
-```bash
-python scripts/validate.py replay
-python scripts/validate.py token
-python scripts/validate.py forensic
-python scripts/validate_contracts.py
-python scripts/validate_api_exports.py
-```
+- Metrics are fixture-bound and internal to checked-in datasets.
+- Fixtures are curated and checked in, not live production traces.
+- This is a deterministic prototype, not a production-readiness claim.
+- This is not a universal AI-memory claim.
+- This does not claim runtime integration or orchestration coverage.
From 4a8cd2fb00848af97517009522437570d883cc48 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexander=20K=C3=B6lnberger?= <159939812+ProfRandom92@users.noreply.github.com>
Date: Tue, 19 May 2026 15:09:03 -0700
Subject: [PATCH 2/3] Update README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3ca6699..067d0d2 100644
--- a/README.md
+++ b/README.md
@@ -277,7 +277,9 @@ Comptextv7/
 ├── reports/
 ├── scripts/
 ├── tests/
-└── src/validation/
+└── src/
+    ├── core/
+    └── validation/
 ```
 
 ## Roadmap

From edf9b3c3aeb8b3eb74d9fd3a583ef930259e6e56 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexander=20K=C3=B6lnberger?= <159939812+ProfRandom92@users.noreply.github.com>
Date: Wed, 20 May 2026 01:14:45 -0700
Subject: [PATCH 3/3] Align README roadmap scope with style guide

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 067d0d2..0142ab7 100644
--- a/README.md
+++ b/README.md
@@ -282,7 +282,9 @@ Comptextv7/
     └── validation/
 ```
 
-## Roadmap
+## Replay-Validation Roadmap
+
+This roadmap describes the deterministic replay-validation layer. Broader project milestones such as intake agents, SAE/NLA components, and MCP server work remain tracked separately in the repository style guide.
 
 - Forensic audit reports with deterministic exports.
 - Artifact schema stabilization.