From 0b46ec3e6064ffb6d53bcb1b0adb599810e6422d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexander=20K=C3=B6lnberger?= <159939812+ProfRandom92@users.noreply.github.com>
Date: Wed, 20 May 2026 01:52:23 -0700
Subject: [PATCH 1/3] Resolve README conflict against current main

---
 README.md | 530 +++++++++++++++++++++++-------------------------------
 1 file changed, 221 insertions(+), 309 deletions(-)

diff --git a/README.md b/README.md
index 707085d..b27dc4a 100644
--- a/README.md
+++ b/README.md
@@ -5,345 +5,273 @@

Comptextv7

- Deterministic Operational Replay Evaluation Infrastructure -

- -

- A replay-native evaluation layer for long-horizon AI agents. + Deterministic replay-survivability validation for compressed operational state in long-horizon AI agents.

No embeddings • No vector DB • No semantic scoring • No LLM judges

-

- Comptextv7 evaluates whether compressed agent state
- remains operationally admissible
- under deterministic replay constraints. -

- -

- Not semantically.
- Operationally. -

-

CI Python Deterministic Replay Replay Native - No LLM Judging Replay Artifacts

- Monaco Showcase → - · Research Positioning + Research Positioning · Benchmark Details - · Replay Degradation - · Validation Report + · Multi-Family Benchmark + · Failure Taxonomy

---- - -## Why this exists - -Most AI memory systems optimize for: - -- semantic similarity -- retrieval quality -- conversational continuity -- long-context recall - -Comptextv7 evaluates something different: - -> Can compressed operational state still remain replayable and admissible after deterministic reconstruction? - -This includes: - -- evidence preservation -- constraint survival -- blocker continuity -- dependency integrity -- tool-order preservation -- recovery-path continuity -- relational admissibility - -All deterministically validated. +CompTextv7 does not ask whether a compressed summary sounds good. It asks whether the compressed state can still replay the operational facts required to continue the work. --- -## Core thesis - -Comptextv7 measures what operationally survives compression. - -Not whether outputs sound similar. - -But whether replayed state still preserves: - -- operational invariants -- admissible execution structure -- dependency continuity -- reconstructable agent behavior - ---- - -## Invariant-first evaluation - -Comptextv7 does not ask: +## In 30 seconds -> “Does the replay semantically resemble the original?” +Long-horizon agents compress prior work into smaller summaries. Those summaries can silently lose blockers, constraints, evidence, dependency order, recovery paths, and tool order. -It asks: - -> “Which operational invariants survived replay?” - -Examples: - -- evidence integrity -- dependency reachability -- blocker attachment -- causal continuity -- tool sequencing -- operational admissibility -- constraint preservation - -This makes the system auditable, reproducible, CI-compatible, and deterministic. +CompTextv7 treats that as a deterministic replay-validation problem. It checks whether compressed operational state remains admissible after reconstruction using fixture-defined contracts, exact scoring, failure labels, committed artifacts, and CI gates. --- -## Architecture - -```mermaid -flowchart TD - A[Raw Agent Trace
or Research Paper] - --> B[Operational State Extraction] - - B --> C[Operational State
Evidence
Constraints
Dependencies
Tool Order
Recovery Paths] +## What CompTextv7 is - C --> D[Compression Profiles] - D --> E[CONSERVATIVE
BALANCED
AGGRESSIVE] +- Deterministic replay-validation infrastructure for operational state. +- Fixture-bound and contract-linked. +- Artifact-backed with reproducible JSON/SVG outputs. +- CI-reproducible through repository checks. +- Focused on operational admissibility, not prose quality. - E --> F[Compact Replay State] - F --> G[Deterministic Replay Reconstruction] +## What CompTextv7 is not - G --> H[Invariant Validation Engine
+ Failure Taxonomy] - - H --> I[Structural Drift
Relational Drift
Operational Drift] - - I --> J[Committed JSON Artifacts
+ Deterministic CI Validation] - - style F fill:#172554,stroke:#60a5fa,stroke-width:2px,color:#ffffff - style H fill:#0f172a,stroke:#38bdf8,stroke-width:2px,color:#ffffff - style J fill:#064e3b,stroke:#34d399,stroke-width:2px,color:#ffffff -``` +- Agent framework. +- Workflow orchestrator. +- Learned compressor. +- Vector memory system. +- RAG replacement. +- KV-cache optimizer. +- Production telemetry platform. +- Clinical-grade system. +- Universal AI-memory solution. +- LLM judge. --- -## Deterministic replay degradation +## Replay validation model ```mermaid -flowchart TD - A[Aggressive Compression] - --> B[Structural Drift] - B --> C[Relational Drift] - C --> D[Operational Drift] - D --> E[Admissibility Collapse] - - style A fill:#1e3a8a,stroke:#60a5fa,color:#ffffff - style E fill:#7f1d1d,stroke:#f87171,color:#ffffff +flowchart LR + A["Checked-in fixture"] --> B["Original operational state"] + B --> C["Reconstructed replay state"] + C --> D["Contract validator"] + D --> E["Admissibility scorer"] + E --> F["Failure labels"] + E --> G["Committed artifacts"] + G --> H["CI gates"] + F --> H ``` --- -## Current deterministic results - -| Profile | Replay Consistency | Evidence Survival | Operational Drift | Failure Labels | -|---|---:|---:|---:|---| -| `CONSERVATIVE` | `0.895833` | `0.916667` | `0.104167` | `EVIDENCE_LOSS` | -| `BALANCED` | `0.250000` | `0.166667` | `0.750000` | `EVIDENCE_LOSS`, `CONSTRAINT_DRIFT` | -| `AGGRESSIVE` | `0.125000` | `0.083333` | `0.875000` | `EVIDENCE_LOSS`, `CONSTRAINT_DRIFT`, `BLOCKER_DETACHMENT` | +## Current fixture-bound signal -Values are fixture-bound and CI-validated against committed replay artifacts. +- Three manifest-registered operational fixture families. +- Standard levels: `baseline`, `mild`, `moderate`, `severe`. +- Deterministic evaluation mode. +- Exact rational scoring. +- Reproducible artifacts. +- No LLM judges or external APIs. 
-### Additional replay baselines +These are internal fixture-bound results, not external benchmark claims, production-readiness claims, or solved-memory claims. | Signal | Current fixture-bound result | -|---|---:| +| --- | ---: | | Agent trace replay consistency | `1.000000` | -| Agent avg compression | `1.773954x` | -| Agent operational drift | `0.000000` | | Paper replay consistency | `0.791667` | -| Paper avg compression | `1.347063x` | - -Interpretation: the profile comparison shows monotonic degradation under increasing compression pressure. These results are internal fixture-bound observations, not external benchmark, production-readiness, or solved-memory claims. - ---- - -## Failure taxonomy - -Comptextv7 classifies replay degradation into deterministic failure classes. - -| Failure Type | Meaning | -|---|---| -| `EVIDENCE_LOSS` | Critical evidence disappeared | -| `HIGH_CRITICAL_EVIDENCE_LOSS` | High-priority evidence degraded | -| `CONSTRAINT_DRIFT` | Operational constraints degraded | -| `BLOCKER_DETACHMENT` | Blocking dependencies became orphaned | -| `RELATIONAL_DRIFT` | Dependency graph fragmentation | -| `TEMPORAL_DRIFT` | Replay ordering degraded | - -No probabilistic scoring. - -No hidden heuristics. - -Every failure is reproducible. 
+| `CONSERVATIVE` replay consistency | `0.895833` | +| `BALANCED` replay consistency | `0.250000` | +| `AGGRESSIVE` replay consistency | `0.125000` | +| Paper avg compression | `1.347063` | +| Agent avg compression | `1.773954` | +| Agent operational drift | `0.000000` | --- -## What makes Comptextv7 different +## Artifact evidence pipeline -| Traditional systems | Comptextv7 | -|---|---| -| Semantic similarity | Operational admissibility | -| LLM judges | Deterministic validation | -| Embedding recall | Invariant preservation | -| Memory retrieval | Replay degradation analysis | -| Conversational continuity | Operational continuity | -| Black-box scoring | Auditable metrics | +```mermaid +flowchart LR + A["fixtures/manifest.json"] --> B["Fixture families"] + B --> C["DegradationCurveGenerator"] + B --> D["AdmissibilityScorer"] + C --> E["multi_family_admissibility_curves.svg"] + D --> F["layered_admissibility_results.json"] + D --> G["multi_family_admissibility_results.json"] + F --> H["Reproducibility tests"] + G --> H + E --> I["Progression tests"] + H --> J["GitHub Actions"] + I --> J +``` --- -## What Comptextv7 is not - -Comptextv7 is not: - -- an agent runtime -- a vector memory system -- a RAG framework -- a semantic evaluator -- an orchestration engine -- an LLM-judge benchmark -- a universal memory layer - -It is deterministic replay evaluation infrastructure for operational state degradation analysis. 
+## Minimal deterministic example + +```json +{ + "original_operational_state": { + "policy_steps": ["identify_owner", "collect_evidence", "execute_recovery"], + "causal_dependencies": [["alert", "triage"], ["triage", "recovery"]], + "recovery_paths": ["ack -> mitigation_runbook"] + }, + "reconstructed_state": { + "policy_steps": ["collect_evidence", "identify_owner", "execute_recovery"], + "causal_dependencies": [["alert", "recovery"]], + "recovery_paths": [] + }, + "deterministic_validation_result": { + "admissible": false, + "failure_labels": [ + "POLICY_ORDER_BROKEN", + "CAUSAL_DEPENDENCY_LOSS", + "RECOVERY_PATH_INVALID", + "INVARIANT_VIOLATION" + ] + } +} +``` --- -## Research positioning - -Current long-context benchmarks mainly focus on: - -- conversational memory -- semantic retrieval -- QA recall -- context-window scaling - -Comptextv7 focuses on: - -- operational admissibility -- invariant preservation -- replay integrity -- deterministic degradation analysis -- relational continuity -- execution-faithful reconstruction +## Proof artifacts -This positions Comptextv7 closer to replayable execution systems, event-sourced orchestration, execution lineage validation, operational semantics, and deterministic governance infrastructure. - -For conservative scope boundaries and benchmark interpretation, see [Research Positioning](docs/research_positioning.md), [Iterative Replay Degradation](docs/iterative_replay_degradation.md), the [Benchmark Explanation](docs/BENCHMARK_EXPLANATION.md), and the committed [iterative replay degradation summary](artifacts/iterative_replay_degradation_results.summary.md). +| Artifact | Purpose | +| --- | --- | +| `artifacts/layered_admissibility_results.json` | Layered admissibility outputs. | +| `artifacts/multi_family_admissibility_results.json` | Multi-family deterministic aggregates. | +| `artifacts/multi_family_admissibility_curves.svg` | Deterministic degradation curve rendering. 
| +| `docs/benchmarks/multi_family_admissibility_benchmark.md` | Benchmark method and interpretation boundaries. | +| `docs/failure_taxonomy.md` | Failure label documentation. | --- -## Benchmark family - -### Paper Replay Benchmark - -- Validates whether dense technical paper summaries preserve entities, metrics, limitations, and section structure after deterministic replay compression. -- Artifact: [`artifacts/paper_replay_results.json`](artifacts/paper_replay_results.json). -- Method: [`docs/benchmarks/paper_replay.md`](docs/benchmarks/paper_replay.md). -- Current avg compression: `1.347063x`. -- Current replay consistency: `0.791667`. - -### Agent Trace Replay Benchmark - -- Validates whether multi-step agent workflows preserve active tasks, constraints, dependencies, tool sequences, unresolved blockers, deployment requirements, and recovery actions. -- Artifact: [`artifacts/agent_trace_replay_results.json`](artifacts/agent_trace_replay_results.json). -- Method: [`docs/benchmarks/agent_trace_replay.md`](docs/benchmarks/agent_trace_replay.md). -- Current avg compression: `1.773954x`. -- Current replay consistency: `1.000000`. -- Operational drift: `0.000000`. - -### Multi-Family Operational Admissibility Benchmark +## Verify locally -- Validates deterministic multi-family operational admissibility with manifest-driven fixture selection, exact scoring, reproducible JSON artifacts, and progression-regression checks. -- Method: [`docs/benchmarks/multi_family_admissibility_benchmark.md`](docs/benchmarks/multi_family_admissibility_benchmark.md). - -### Iterative Replay Degradation Prototype - -- Validates how checked-in paper and agent-trace fixtures degrade across bounded repeated compact/replay cycles. -- Method: [`docs/iterative_replay_degradation.md`](docs/iterative_replay_degradation.md). -- Profile comparison: fixture-bound aggregates for collapse rate, replay consistency, operational drift, evidence survival, and deterministic failure labels. 
-- Sensitivity analysis: bounded variations of `max_context_units`, `max_families`, `max_bursts`, `replay_window_seconds`, `replay_cycles`, and `compression_budget_scale`. +```bash +npm install --no-save --no-package-lock +npm run check +pytest tests/test_failure_taxonomy.py -q +pytest tests/test_multi_family_admissibility_artifact.py -q +pytest tests/test_multi_family_svg_renderer.py -q +``` --- -## Integrity model +## Benchmark families -- no LLM judging -- no embeddings -- no vector databases -- no external APIs -- artifact-backed JSON + CI checks -- deterministic hashing foundation: [`docs/deterministic_hashing.md`](docs/deterministic_hashing.md) -- audit-friendly and CI reproducible +- `coding_workflow_pr_review` +- `incident_response_page_triage` +- `cross_domain_operational_dependency_workflow` -Foundational deterministic components: - -- `ReferenceIndex` and `EventLogArtifactAdapter`: track context references and deterministically fingerprint event payloads. -- `ReplayArtifactWriter v1-alpha.1`: generates deterministic standalone JSON artifacts for verifiable snapshots. +```mermaid +flowchart LR + A["coding_workflow_pr_review"] --> L1["baseline"] + A --> L2["mild"] + A --> L3["moderate"] + A --> L4["severe"] + B["incident_response_page_triage"] --> L1 + B --> L2 + B --> L3 + B --> L4 + C["cross_domain_operational_dependency_workflow"] --> L1 + C --> L2 + C --> L3 + C --> L4 + L1 --> M["manifest registration"] + L2 --> M + L3 --> M + L4 --> M + M --> N["multi-family artifact"] + N --> O["deterministic SVG"] +``` --- -## Quick start +## Failure labels + +Primary registered labels used across deterministic admissibility validation: + +- `POLICY_ORDER_BROKEN`: required policy order failed. +- `TOOL_ORDER_VIOLATION`: replayed tool sequence violated required order. +- `CAUSAL_DEPENDENCY_LOSS`: required causal edges were not preserved. +- `DEPENDENCY_CHAIN_BREAK`: required dependency chain broke. +- `RECOVERY_PATH_INVALID`: recovery reachability contract failed. 
+- `RECOVERY_PATH_LOSS`: required recovery route was not preserved. +- `INVARIANT_VIOLATION`: declared invariant failed. +- `EVIDENCE_LOSS`: required evidence did not survive replay. +- `EVIDENCE_SURVIVAL_LOSS`: expected evidence units were not preserved. +- `HIGH_CRITICAL_EVIDENCE_LOSS`: high-critical evidence was lost. +- `CONSTRAINT_DRIFT`: constraint preservation drifted. +- `BLOCKER_DETACHMENT`: blocker attachment was lost. +- `GOVERNANCE_DRIFT`: governance constraint drifted. +- `ARTIFACT_INTEGRITY_VIOLATION`: artifact integrity drifted. +- `REPLAY_NON_REPRODUCIBLE`: deterministic replay was not reproducible. -Install the Python test dependency set: - -```bash -python -m pip install -e '.[test]' +```mermaid +flowchart LR + O1["POLICY_ORDER_BROKEN"] --> C1["ordering"] + O2["TOOL_ORDER_VIOLATION"] --> C1 + D1["CAUSAL_DEPENDENCY_LOSS"] --> C2["causality/dependency"] + D2["DEPENDENCY_CHAIN_BREAK"] --> C2 + R1["RECOVERY_PATH_INVALID"] --> C3["recovery/reachability"] + R2["RECOVERY_PATH_LOSS"] --> C3 + I1["INVARIANT_VIOLATION"] --> C4["invariant/no-orphan"] + E1["EVIDENCE_LOSS"] --> C5["evidence/criticality"] + E2["EVIDENCE_SURVIVAL_LOSS"] --> C5 + E3["HIGH_CRITICAL_EVIDENCE_LOSS"] --> C5 + E4["CONSTRAINT_DRIFT"] --> C5 + E5["BLOCKER_DETACHMENT"] --> C5 + E6["GOVERNANCE_DRIFT"] --> C5 + A1["ARTIFACT_INTEGRITY_VIOLATION"] --> C6["artifact/reproducibility"] + A2["REPLAY_NON_REPRODUCIBLE"] --> C6 ``` -Run the full reviewer check: +--- -```bash -npm run check -``` +## How this differs from adjacent systems -Run core replay tests: +| System type | Stores state | Compresses context | Orchestrates agents | Deterministically validates replay loss | +| --- | --- | --- | --- | --- | +| Workflow runtimes | Sometimes | No | Yes | No | +| Agent frameworks | Sometimes | Sometimes | Yes | Usually no | +| Vector memory / RAG | Yes | Retrieval-centric | No | No | +| Learned prompt compressors | Sometimes | Yes | No | Usually no | +| LLM-as-judge evaluators | Sometimes | N/A | No | 
No | +| CompTextv7 | Yes | Yes | No | Yes | -```bash -pytest tests/test_multi_family_admissibility_artifact.py -q -pytest tests/test_failure_taxonomy.py -q -pytest tests/test_evidence_metrics_adaptive_policy.py -q -pytest tests/test_paper_replay_bench.py tests/test_agent_trace_replay.py -q -``` +--- -Regenerate deterministic replay artifacts: +## CI and merge gate -```bash -python tests/utils/paper_replay_runner.py -python tests/utils/agent_trace_replay_runner.py -python scripts/generate_iterative_replay_degradation_artifacts.py +```mermaid +flowchart LR + A["PR head SHA"] --> B["GitHub Actions"] + B --> C["Agent Workflow Checks"] + B --> D["hash-companion-validation"] + B --> E["CompText V7 Industrial Validation"] + C --> F["all success"] + D --> F + E --> F + F --> G["squash merge"] ``` -Additional validation helpers: - -```bash -python scripts/validate.py replay -python scripts/validate.py token -python scripts/validate.py forensic -python scripts/validate_contracts.py -python scripts/validate_api_exports.py -``` +Vercel/Netlify/deployment previews are not merge gates unless explicitly scoped. 
--- @@ -351,61 +279,45 @@ python scripts/validate_api_exports.py ```text Comptextv7/ -├── artifacts/ # committed deterministic replay benchmark JSON -├── benchmarks/ # deterministic compression, replay, and audit runners -├── contracts/ # machine-readable validation and handoff contracts -├── dashboard/ # backend plus React operations console -├── docs/ # benchmark, artifact, research, and legacy showcase notes -├── reports/replay_continuity/ # adversarial continuity metrics and SVG charts -├── scripts/ # validation, reporting, and artifact tooling -├── showcase/app/ # legacy in-repo Vite app; Monaco UI lives in external repo -├── src/ # compression, audit, and validation modules -├── tests/ # Python regression and replay validation tests -└── README.md +├── artifacts/ +├── docs/ +├── fixtures/ +├── reports/ +├── scripts/ +├── tests/ +└── src/ + ├── core/ + └── validation/ ``` --- -## Cloud-first validation +## Replay-validation roadmap -Comptextv7 is biased toward artifact-backed review rather than local machine trust. +```mermaid +flowchart LR + A["failure taxonomy"] --> B["cross-domain fixture families"] + B --> C["forensic reports"] + C --> D["schema stabilization"] + D --> E["cross-family comparison"] + E --> F["integrity gates"] + F --> G["golden corpus"] + G --> H["offline import/export"] +``` -| Workflow | Role | -|---|---| -| [`ci.yml`](.github/workflows/ci.yml) | Runs deterministic replay, tests, telemetry, and validation gates. | -| [`agent-checks.yml`](.github/workflows/agent-checks.yml) | Runs repository, report, contract, and dashboard validation. | -| [`validation_runner.yml`](.github/workflows/validation_runner.yml) | Publishes compact cloud validation result artifacts. | +- Forensic audit reports with deterministic exports. +- Artifact schema stabilization. +- Cross-family degradation comparison. +- Minimal artifact integrity gates. +- Golden corpus foundation. +- Offline import/export schemas only. 
--- ## Limitations -- Metrics are fixture-bound baselines and do not reflect universal real-world correctness. -- Fixtures are curated and checked in. -- Structured agent traces currently replay near-losslessly. -- This is not solved AI memory. -- This is not production telemetry. -- This is not an autonomous agent framework. -- Iterative degradation remains a bounded fixture prototype. - ---- - -## Safety boundaries - -Do not commit: - -- proprietary customer data -- secrets, API keys, tokens, cookies, or credentials -- raw production logs -- unsanitized replay fixtures -- private deployment credentials or environment dumps - -Comptextv7 is a deterministic, synthetic-only research prototype for operational replay persistence and reviewable diagnostic infrastructure. - ---- - -## Final principle - -> Comptextv7 does not evaluate whether an agent remembers. -> -> It evaluates whether compressed operational state remains admissible under deterministic replay. +- Metrics are fixture-bound and internal to checked-in datasets. +- Fixtures are curated and checked in, not live production traces. +- This is a deterministic prototype, not a production-readiness claim. +- This is not a universal AI-memory claim. +- This does not claim runtime integration or orchestration coverage. 
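Reviewer sketch for the "Minimal deterministic example" added in this patch: the JSON shows an original state, a reconstructed state, and the resulting failure labels, but not the comparison itself. The Python below is one hedged way such a check could look. The function name `validate_replay` and the rule that any contract failure also raises `INVARIANT_VIOLATION` are assumptions made for illustration, not the repository's validator.

```python
from typing import List


def validate_replay(original: dict, reconstructed: dict) -> dict:
    """Hypothetical check: compare replayed state to the original and label failures."""
    labels: List[str] = []

    # Policy order: the replayed steps must equal the original sequence exactly.
    # Simplification: a missing step is also reported under this label.
    if reconstructed.get("policy_steps", []) != original.get("policy_steps", []):
        labels.append("POLICY_ORDER_BROKEN")

    # Causal dependencies: every original edge must survive replay.
    original_edges = {tuple(e) for e in original.get("causal_dependencies", [])}
    replayed_edges = {tuple(e) for e in reconstructed.get("causal_dependencies", [])}
    if not original_edges <= replayed_edges:
        labels.append("CAUSAL_DEPENDENCY_LOSS")

    # Recovery paths: every original path must still be present after replay.
    original_paths = set(original.get("recovery_paths", []))
    replayed_paths = set(reconstructed.get("recovery_paths", []))
    if not original_paths <= replayed_paths:
        labels.append("RECOVERY_PATH_INVALID")

    # Assumption: any contract failure also counts as an invariant violation.
    if labels:
        labels.append("INVARIANT_VIOLATION")

    return {"admissible": not labels, "failure_labels": labels}


# The two states from the README's minimal example.
ORIGINAL = {
    "policy_steps": ["identify_owner", "collect_evidence", "execute_recovery"],
    "causal_dependencies": [["alert", "triage"], ["triage", "recovery"]],
    "recovery_paths": ["ack -> mitigation_runbook"],
}
RECONSTRUCTED = {
    "policy_steps": ["collect_evidence", "identify_owner", "execute_recovery"],
    "causal_dependencies": [["alert", "recovery"]],
    "recovery_paths": [],
}

RESULT = validate_replay(ORIGINAL, RECONSTRUCTED)
```

Under these assumed rules, `RESULT` carries the same `admissible` flag and the same four labels as the `deterministic_validation_result` block in the README example. The checks use only equality and set containment, so repeated runs on the same fixture give the same labels.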
From 75fccb41d095416bc79bba1feecbb362a118f2d5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexander=20K=C3=B6lnberger?= <159939812+ProfRandom92@users.noreply.github.com>
Date: Wed, 20 May 2026 01:58:47 -0700
Subject: [PATCH 2/3] Add drift-locked agent replay consistency metric

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index b27dc4a..e4722c8 100644
--- a/README.md
+++ b/README.md
@@ -98,6 +98,7 @@ These are internal fixture-bound results, not external benchmark claims, product
 | `AGGRESSIVE` replay consistency | `0.125000` |
 | Paper avg compression | `1.347063` |
 | Agent avg compression | `1.773954` |
+| Agent replay consistency | `1.000000` |
 | Agent operational drift | `0.000000` |

 ---

From c1101fceb02dbfdceabf399b0b21cc947f0cea9a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexander=20K=C3=B6lnberger?= <159939812+ProfRandom92@users.noreply.github.com>
Date: Wed, 20 May 2026 02:01:51 -0700
Subject: [PATCH 3/3] Address README review feedback

---
 README.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index e4722c8..0c5095d 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@ Comptextv7 logo

-

Comptextv7

+

CompText V7

Deterministic replay-survivability validation for compressed operational state in long-horizon AI agents. @@ -27,7 +27,7 @@ · Failure Taxonomy

-CompTextv7 does not ask whether a compressed summary sounds good. It asks whether the compressed state can still replay the operational facts required to continue the work. +CompText V7 does not ask whether a compressed summary sounds good. It asks whether the compressed state can still replay the operational facts required to continue the work. --- @@ -35,11 +35,11 @@ CompTextv7 does not ask whether a compressed summary sounds good. It asks whethe Long-horizon agents compress prior work into smaller summaries. Those summaries can silently lose blockers, constraints, evidence, dependency order, recovery paths, and tool order. -CompTextv7 treats that as a deterministic replay-validation problem. It checks whether compressed operational state remains admissible after reconstruction using fixture-defined contracts, exact scoring, failure labels, committed artifacts, and CI gates. +CompText V7 treats that as a deterministic replay-validation problem. It checks whether compressed operational state remains admissible after reconstruction using fixture-defined contracts, exact scoring, failure labels, committed artifacts, and CI gates. --- -## What CompTextv7 is +## What CompText V7 is - Deterministic replay-validation infrastructure for operational state. - Fixture-bound and contract-linked. @@ -47,7 +47,7 @@ CompTextv7 treats that as a deterministic replay-validation problem. It checks w - CI-reproducible through repository checks. - Focused on operational admissibility, not prose quality. -## What CompTextv7 is not +## What CompText V7 is not - Agent framework. - Workflow orchestrator. 
@@ -165,11 +165,13 @@ flowchart LR ## Verify locally ```bash +python -m pip install -e '.[test]' npm install --no-save --no-package-lock npm run check pytest tests/test_failure_taxonomy.py -q pytest tests/test_multi_family_admissibility_artifact.py -q pytest tests/test_multi_family_svg_renderer.py -q +pytest tests/test_paper_replay_bench.py tests/test_agent_trace_replay.py -q ``` --- @@ -254,7 +256,7 @@ flowchart LR | Vector memory / RAG | Yes | Retrieval-centric | No | No | | Learned prompt compressors | Sometimes | Yes | No | Usually no | | LLM-as-judge evaluators | Sometimes | N/A | No | No | -| CompTextv7 | Yes | Yes | No | Yes | +| CompText V7 | Yes | Yes | No | Yes | ---
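Reviewer sketch on "exact rational scoring": the README lists six-decimal values such as `0.895833` and `0.104167` without showing how an exact score becomes that string. A minimal Python illustration follows, assuming a score is a ratio of passed checks to total checks and that operational drift is `1 - consistency` (the profile rows in the table removed by patch 1 are consistent with that relation). The helper names and the counts 43/48 and 19/24 are invented for illustration and are not taken from the repository's fixtures.

```python
from fractions import Fraction


def consistency(passed: int, total: int) -> Fraction:
    """Exact ratio of passed checks; no floating point involved."""
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("expected 0 <= passed <= total and total > 0")
    return Fraction(passed, total)


def render(score: Fraction, places: int = 6) -> str:
    """Render a non-negative Fraction to fixed decimals with integer-only half-up rounding."""
    scale = 10 ** places
    # floor(score * scale + 1/2), computed with integers only.
    scaled = (2 * score.numerator * scale + score.denominator) // (2 * score.denominator)
    return f"{scaled // scale}.{scaled % scale:0{places}d}"


# Hypothetical counts, chosen only because they reproduce values shown in the README.
replay_consistency = consistency(43, 48)
operational_drift = 1 - replay_consistency  # assumed relation, see lead-in

CONSISTENCY_TEXT = render(replay_consistency)
DRIFT_TEXT = render(operational_drift)
PAPER_TEXT = render(consistency(19, 24))
```

Keeping scores as `Fraction` until the final rendering step avoids platform-dependent float rounding, which is one way a committed artifact could stay byte-identical across CI runs.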