
Context-Pipe Scenarios: The Mental Supply Chain Lab

This document consolidates the details, objectives, and findings for all 27 test scenarios executed in the std-context-lab.

The Verifiable Proof Standard: Every scenario directory contains a raw terminal .log file and a structured EVIDENCE.md file, providing immutable proof of technical claims.

Current Baseline: context-pipe v0.5.4 | semantic-sift v0.3.5 | Last update: 2026-05-30

Channel Key: ✅ Verified · ⚠️ Partial (infra drift) · ❌ Hard fail · ⏳ Pending

Note on Shell vs pi.dev: Every bash command in this session IS a pi.dev execution, so the two columns are identical. Gemini CLI refers to a separate environment run on a different machine.


Phase 1: Feature Validation

Scenario 01: Protocol Basics

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Prove the fundamental stdin/stdout contract and language agnosticism of CPP.
  • Setup: basics-pipe (Node.js transformer.js → semantic-sift-cli).
  • Status: ✅ Verified. Multi-language orchestration confirmed (Node.js → Rust via stdio).
  • Note: Must run from scenario directory — transformer.js is referenced by relative path.
  • Proof: EVIDENCE.md
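
The only contract is text in on stdin and text out on stdout, so a node can be written in any language. As a minimal sketch, here is a hypothetical Python node playing the role transformer.js plays in basics-pipe. The line-tagging transformation and the function names are illustrative assumptions, not what transformer.js actually does.

```python
import io
from typing import TextIO


def transform(text: str) -> str:
    """Hypothetical transformation: prefix every non-empty line with a tag."""
    return "".join(
        f"[node] {line}" if line.strip() else line
        for line in text.splitlines(keepends=True)
    )


def run_node(stdin: TextIO, stdout: TextIO) -> None:
    """The entire contract: read all of stdin, write the result to stdout."""
    stdout.write(transform(stdin.read()))
    stdout.flush()


def run_on_string(text: str) -> str:
    """Test helper: drive the node with in-memory streams instead of a pipe."""
    out = io.StringIO()
    run_node(io.StringIO(text), out)
    return out.getvalue()


# As a real pipe node:  import sys; run_node(sys.stdin, sys.stdout)
```

Because the node only touches its two streams, the orchestrator can place it before or after a Node.js or Rust node without either side knowing the other's language.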

Scenario 02: Shadow Discovery

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Prove "Zero Tool Bloat" — MCP nodes execute without registering in the global IDE.
  • Setup: mcp-pipe list against local pipes.json with shadow server configured.
  • Status: ✅ Verified. mcp-pipe list correctly surfaces configured pipes + PATH tools.
  • Proof: EVIDENCE.md

Scenario 03: Research Synthesizer

  • Last Verified In: v0.5.2 Shell (regression) · v0.4.3 Gemini CLI
  • Channels: Shell ❌ · Gemini CLI ✅ · pi.dev ❌
  • Objective: Prove the "Mental Supply Chain" by chaining fetch → markitdown → sift.
  • Setup: 3-node pipe: mcp-server-fetch → markitdown → semantic-sift.
  • Status: ❌ Blocked by REPORT_041 (_run_mcp_node hang; all MCP node pipes were broken). REPORT_041 is ✅ closed in v0.5.5, so this scenario needs re-verification.
  • Proof: EVIDENCE.md

Scenario 04: Core Pre-Filters

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Prove massive context reduction using deterministic OS-native binaries.
  • Setup: log-optimizer (rg/findstr pre-filter → semantic-sift-cli).
  • Status: ✅ Verified.
  • Proof: EVIDENCE.md

Scenario 05: Pipe-Tee Inspection

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Prove non-breaking stream auditing using the T-Pipe stream splitter.
  • Setup: tee-pipe (findstr ERROR → semantic-sift-cli, with tee snapshots to the .tee/ folder).
  • Status: ✅ Verified. Snapshots written to .tee/ without interrupting stdout.
  • Proof: EVIDENCE.md
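
The T-Pipe pattern writes a copy of the stream to disk and forwards the identical data downstream, so auditing never alters the pipeline's output. A minimal sketch in Python; the tee_snapshot helper and the <step>-<node>.txt file naming are hypothetical and not taken from context-pipe.

```python
from pathlib import Path


def tee_snapshot(data: str, tee_dir: Path, step: int, node: str) -> str:
    """Write a copy of the stream into tee_dir, then return it unchanged."""
    tee_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: zero-padded step number plus node name.
    snapshot = tee_dir / f"{step:02d}-{node}.txt"
    snapshot.write_text(data, encoding="utf-8")
    # Downstream receives exactly what it would have received without the tee.
    return data
```

The key design point is that the function returns its input untouched; the snapshot is a side effect, so removing the tee cannot change what the next node sees.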

Scenario 06: Agent-to-Agent (A2A) Handoff

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Prove "Refined Handoff ROI" during multi-agent workflows.
  • Setup: mcp-pipe handoff --from-agent AgentA --to-agent AgentB.
  • Status: ✅ Verified. Handoff distillation + ROI telemetry working.
  • Proof: EVIDENCE.md

Scenario 07: The Mental Supply Chain (E2E Flagship)

  • Last Verified In: v0.5.2 Shell (regression) · v0.4.3 Gemini CLI
  • Channels: Shell ❌ · Gemini CLI ✅ · pi.dev ❌
  • Objective: Full E2E orchestration across MCP servers, Node.js scripts, Python CLIs, and Rust engines.
  • Setup: e2e-supply-chain — 5-node pipe including an MCP node.
  • Status: ❌ Blocked by REPORT_041 (the same _run_mcp_node hang as S03). REPORT_041 is ✅ closed in v0.5.5, so this scenario needs re-verification.
  • Proof: EVIDENCE.md

Scenario 08: Multi-Modal Distillation

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Prove format agnosticism (HTML, PDF, DOCX) using markitdown as pre-refinery node.
  • Setup: multi-modal-pipe (markitdown → semantic-sift).
  • Status: ✅ Verified. HTML document distilled to Markdown.
  • Proof: EVIDENCE.md

Scenario 09: Adaptive Pressure Simulation

  • Last Verified In: v0.4.3 (original) · v0.5.2 Shell ⚠️
  • Channels: Shell ⚠️ · Gemini CLI ✅ · pi.dev ⚠️
  • Objective: Prove "Adaptive Signaling" — dynamic ${VAR} injection into pipeline arguments.
  • Setup: adaptive-sift pipe with SIFT_RATE env var injection.
  • Status: ⚠️ Infrastructure drift. adaptive-sift pipe removed from referenced pipes.json. Core ${VAR} substitution confirmed working via Scenario 25.
  • Proof: EVIDENCE.md

Scenario 10: The Structured Data Auditor

  • Last Verified In: v0.4.3 (original) · v0.5.2 Shell ⚠️
  • Channels: Shell ⚠️ · Gemini CLI ✅ · pi.dev ⚠️
  • Objective: Prove the "Structured Data Exemption" — automatic bypass of valid JSON payloads.
  • Setup: json-auditor pipe against mock SQLite DB output (1,000 telemetry rows).
  • Status: ⚠️ Infrastructure drift. json-auditor pipe removed from referenced pipes.json.
  • Proof: EVIDENCE.md
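
Conceptually, the exemption is a parse check in front of the sifter: if the payload is already valid JSON, it passes through untouched. A minimal sketch; sift is a hypothetical stand-in for the semantic-sift call, and exempting only JSON objects and arrays (not bare scalars) is an assumption.

```python
import json
from typing import Callable


def sift_or_exempt(payload: str, sift: Callable[[str], str]) -> str:
    """Bypass sifting for valid JSON objects/arrays; sift everything else."""
    try:
        parsed = json.loads(payload)
    except json.JSONDecodeError:
        # Not JSON at all: ordinary text goes through the sifter.
        return sift(payload)
    # Assumption: only structured containers are exempt, not bare scalars.
    if isinstance(parsed, (dict, list)):
        return payload
    return sift(payload)
```

For 1,000 telemetry rows serialized as a JSON array, this returns the payload byte for byte, so no row can be dropped by the semantic step.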

Scenario 11: Supply Chain Visualization

  • Last Verified In: v0.4.3 (original) · v0.5.2 Shell ⚠️
  • Channels: Shell ⚠️ · Gemini CLI ✅ · pi.dev ⚠️
  • Objective: Prove the "System over Patch" (Observability) claim via Mermaid diagram generation.
  • Setup: pipes_to_mermaid.py + viz-pipe meta-pipeline.
  • Status: ⚠️ Infrastructure drift. Script-based test; pipe engine unchanged.
  • Proof: EVIDENCE.md

Phase 2: Operational Hardening

Scenario 12: The "Giant File" Heart-Attack

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Stress-test stream stability and Rust-sidecar memory management.
  • Setup: heart-attack-pipe — 50.6 MB raw log via --input-file.
  • Status: ✅ Verified. The 50.6 MB log was processed without memory exhaustion.
  • Proof: EVIDENCE.md

Scenario 13: The Resiliency Gauntlet

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Test error handling and failure bypass under cascading node failures.
  • Setup: gauntlet-pipe — 5-node pipe with intentional failures and optional: true schema.
  • Status: ✅ Verified. Orchestrator bypassed failures and completed the pipeline.

Gap tests 2026-05-30:

  • required-timeout-pipe: timeout works via the env var. The node-level "timeout" field was silently ignored (REPORT_039) and is fixed in v0.5.3.
  • optional-condition-pipe: the interaction of optional: true with condition is confirmed on both paths.
  • False pass corrected: forever_sleep.py reads one character and exits, so the original test never actually triggered a timeout.
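
The gauntlet's behaviour can be modelled as a loop that runs each node with a per-node timeout and, for nodes marked optional, forwards the previous stage's output when the node fails or times out. This is a simplified sketch with a hypothetical node shape (cmd, timeout, optional), not the context-pipe orchestrator. It also shows why the forever_sleep.py correction matters: a timeout test is only meaningful if the test node genuinely blocks past the limit.

```python
import subprocess
from typing import Any


def run_pipeline(nodes: list[dict[str, Any]], data: str,
                 default_timeout: float = 30.0) -> str:
    """Run nodes in order; an optional node that fails or times out is bypassed."""
    for node in nodes:
        try:
            result = subprocess.run(
                node["cmd"],
                input=data,
                capture_output=True,
                text=True,
                timeout=node.get("timeout", default_timeout),  # per-node override
            )
            if result.returncode != 0:
                raise RuntimeError(f"{node['cmd'][0]} exited with {result.returncode}")
            data = result.stdout
        except (subprocess.TimeoutExpired, OSError, RuntimeError):
            if not node.get("optional", False):
                raise
            # Optional node failed: keep the previous stage's output unchanged.
    return data
```

For example, a node [sys.executable, "-c", "import time; time.sleep(5)"] with timeout 0.5 and optional True is bypassed, while the same node without optional raises subprocess.TimeoutExpired.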

Scenario 14: The Security "Black Hole" (PII)

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Prove "Zero-Trust Context" via PII scrubbing before sifting.
  • Setup: security-gateway (pii_scrubber.py → semantic-sift-cli) against 1,500 fake secrets.
  • Status: ✅ Verified. Must run from scenario directory (pii_scrubber.py relative path).
  • Proof: EVIDENCE.md
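
A scrubber node of this kind is usually a series of regex substitutions applied before any text reaches the sifter. A minimal sketch; the three patterns below (email addresses, AWS-style access key IDs, and password=/token= style secrets) are illustrative assumptions, not the contents of pii_scrubber.py, and a real scrubber needs a broader, tested set.

```python
import re

# Hypothetical redaction rules: (compiled pattern, replacement).
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[AWS_KEY]"),
    (re.compile(r"(?i)\b(password|secret|token|api_key)\s*[=:]\s*\S+"), r"\1=[REDACTED]"),
]


def scrub(text: str) -> str:
    """Apply every redaction rule in order and return the scrubbed text."""
    for pattern, replacement in PII_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Placing this node before semantic-sift-cli means the sifter, and any agent downstream of it, only ever sees placeholders.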

Scenario 15: Recursive Supply Chains

  • Last Verified In: v0.4.3 (original) · v0.5.2 Shell ⚠️
  • Channels: Shell ⚠️ · Gemini CLI ✅ · pi.dev ⚠️
  • Objective: Prove "Pipeline Encapsulation" — calling a pipe inside another pipe.
  • Setup: recursive-pipe invoking mcp-pipe run inner-distiller.
  • Status: ⚠️ Infrastructure drift. Referenced pipes removed from pipes.json. Echo Guard logic unchanged.
  • Proof: EVIDENCE.md

Scenario 16: Protocol Violation Stress

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Stress-test against malicious or corrupt binary data streams.
  • Setup: bad_actor.py emitting invalid UTF-8 bytes piped through basics-pipe.
  • Status: ✅ Verified. Orchestrator sanitized non-UTF8 bytes without crashing.
  • Proof: EVIDENCE.md
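
Surviving invalid UTF-8 generally means decoding with a replacement error handler instead of the strict default, which raises UnicodeDecodeError. In a streaming orchestrator the decoder should also be incremental, so that a valid multi-byte character split across two reads is not mistaken for corruption. A minimal sketch using Python's codecs module; whether the orchestrator uses errors="replace" specifically is an assumption.

```python
import codecs
from typing import Iterable, Iterator


def sanitize_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode a byte stream chunk by chunk, replacing invalid UTF-8 with U+FFFD."""
    # The incremental decoder buffers an incomplete multi-byte sequence
    # at a chunk boundary instead of treating it as an error.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush: a sequence still incomplete at end of stream becomes U+FFFD.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

A bad_actor-style stream such as b"\xff\xfe" yields two U+FFFD characters rather than crashing the reader.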

Scenario 17: Version Awareness & Self-Update

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Verify mcp-pipe verify reports correct component versions and paths.
  • Setup: mcp-pipe verify — reports context-pipe, pipes.json, semantic-sift-cli, node resolution.
  • Status: ✅ Verified. All components correctly identified.
  • Proof: EVIDENCE.md

Scenario 18: Autonomous Dynamic Sifting

  • Last Verified In: v0.4.3 (original) · v0.5.2 Shell ⚠️
  • Channels: Shell ⚠️ · Gemini CLI ✅ · pi.dev ⚠️
  • Objective: Prove "Dynamic Sifting" — agent-assembled JIT processing graphs.
  • Setup: run-dynamic with JSON node array, tested via mcp-pipe run-dynamic.
  • Status: ⚠️ Partial. The hardcoded grep is not on the Windows PATH; a re-run with rg passes (10 MB haystack in 2.9 s).

Gap tests 2026-05-30: Phase 11 features (type:"validator", condition, id+next) all work via run-dynamic — node schemas pass through unmodified to run_pipe.


Phase 3: Battle Testing & Rust Core

Scenario 19: Proactive Gating (BeforeTool)

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Test BeforeTool hook boundary conditions against massive files.
  • Setup: wrap_payload() (formerly wrap()) called with 50 MB file path — expects {"decision":"deny"}.
  • Status: ✅ Verified. Denied 50MB read, allowed small config, fail-safe on unknown tools.
  • Proof: EVIDENCE.md
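
The gate reduces to a size check on the file a tool is about to read. A minimal sketch; the tool name read_file, the file_path argument key, the 1,000,000-byte threshold, the "allow" decision value, and treating unknown tools as allowed (one reading of "fail-safe") are all assumptions. Only the {"decision": "deny"} shape comes from the scenario, and this is not wrap_payload() itself.

```python
import json
import os

MAX_READ_BYTES = 1_000_000   # hypothetical threshold
GATED_TOOLS = {"read_file"}  # hypothetical tool names


def before_tool(tool_name: str, tool_args: dict) -> str:
    """Return a JSON decision for a pending tool call."""
    if tool_name not in GATED_TOOLS:
        # Assumption: unknown tools are allowed rather than blocked.
        return json.dumps({"decision": "allow"})
    path = tool_args.get("file_path", "")
    try:
        size = os.path.getsize(path)
    except OSError:
        # Missing or unreadable file: let the tool report its own error.
        return json.dumps({"decision": "allow"})
    if size > MAX_READ_BYTES:
        return json.dumps({
            "decision": "deny",
            "reason": f"{path} is {size} bytes; route it through a pipe instead",
        })
    return json.dumps({"decision": "allow"})
```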

Scenario 20: Line Range Precision

  • Last Verified In: v0.5.2 (regression 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Stress-test --start-line / --end-line slicing logic.
  • Setup: standard-distill with --start-line 10 --end-line 20 against numbered lines file.
  • Status: ✅ Verified. Bit-perfect extraction for valid ranges; graceful fallback for out-of-bounds and inverted ranges.
  • Proof: EVIDENCE.md
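
The slicing rule can be sketched with 1-indexed, inclusive bounds. What "graceful fallback" means is not spelled out here, so this sketch assumes out-of-range bounds are clamped to the file and an inverted (or entirely out-of-range) request returns the input unchanged.

```python
from typing import Optional


def slice_lines(text: str, start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Return lines start..end (1-indexed, inclusive) with assumed fallbacks."""
    lines = text.splitlines(keepends=True)
    lo = max(1, start if start is not None else 1)
    hi = min(len(lines), end if end is not None else len(lines))
    if lo > hi:
        # Inverted or fully out-of-bounds range: fall back to the whole input.
        return text
    return "".join(lines[lo - 1:hi])
```

With --start-line 10 --end-line 20 on a numbered file this returns exactly 11 lines, 10 through 20; keeping line endings (keepends=True) is what makes the extraction byte-exact.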

Scenario 21: Rust Core Stress, Concurrency & Parity

  • Last Verified In: v0.4.5 (original) · v0.5.2 (regression)
  • Channels: Shell ✅ · Gemini CLI ✅ · pi.dev ✅
  • Objective: Verify cpipe (Rust) functional parity with Python core.
  • Setup: stress-test pipe run via both mcp-pipe (Python) and cpipe (Rust).
  • Status: ✅ Verified. 21.4x speedup, 100% functional parity across all 21 scenarios.
  • Proof: EVIDENCE.md

Phase 4: v0.5.0 Feature Validation (New)

Scenario 22: Pipe Transparency Layer (Phase 9)

  • Last Verified In: v0.5.2 (first run 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ⏳ · pi.dev ✅
  • Objective: Verify real-time [PIPE] log lines emitted to stderr via logging block in pipes.json.
  • Setup: transparent-compact, transparent-verbose, custom-prefix-pipe, no-logging-pipe pipes.
  • Status: ✅ Verified. All 7 tests pass: compact level (one [PIPE] ✓ exit line per node), verbose level (entry + exit lines), custom prefix [XPIPE] override, env var fallback (PIPE_LOG_LEVEL), per-pipe logging block overrides env var, no logging → silent stderr, Rust cpipe parity.
  • Proof: EVIDENCE.md

Scenario 23: Conditional Branching — condition Predicates (Phase 11A)

  • Last Verified In: v0.5.2 (first run 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ⏳ · pi.dev ✅
  • Objective: Verify condition key on nodes skips or executes based on 5 predicates.
  • Setup: condition-size-gate, condition-artifact-pipe, condition-contains-error-pipe, condition-fail-open-pipe.
  • Status: ✅ Verified. All predicates confirmed: size:>N (skip/execute), size:<N (inverse), artifact:missing (skip when exists, execute when absent), artifact:exists (inverse), contains:<string> (skip when absent, execute when present), unknown predicate fails-open (warns + runs). Rust cpipe parity.
  • Proof: EVIDENCE.md
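
The predicates can be modelled as a small evaluator that returns True when a node should execute. In this sketch, size is measured in UTF-8 bytes of the incoming payload and the artifact path is written as artifact:missing:<path>; both are assumptions, since the exact syntax is not shown here. Unknown predicates fail open (warn and run), matching the scenario.

```python
import os
import warnings


def should_run(condition: str, payload: str) -> bool:
    """Evaluate a node condition: True means execute, False means skip."""
    kind, _, arg = condition.partition(":")
    if kind == "size" and arg[:1] in (">", "<") and arg[1:].isdigit():
        size = len(payload.encode("utf-8"))  # assumption: bytes of the payload
        limit = int(arg[1:])
        return size > limit if arg[0] == ">" else size < limit
    if kind == "artifact":
        # Assumed syntax: artifact:missing:<path> / artifact:exists:<path>
        state, _, path = arg.partition(":")
        if state == "missing":
            return not os.path.exists(path)
        if state == "exists":
            return os.path.exists(path)
    if kind == "contains":
        return arg in payload
    # Anything unrecognised fails open: warn, then run the node.
    warnings.warn(f"unknown condition {condition!r}; running node (fail-open)")
    return True
```

Failing open keeps a typo in pipes.json from silently dropping a node; the warning makes the mistake visible without breaking the run.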

Scenario 24: DAG Validator Nodes + Loop Guard (Phase 11B/C)

  • Last Verified In: v0.5.2 (first run 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ⏳ · pi.dev ✅
  • Objective: Verify type: "validator" nodes route by exit code; id+next jump; 100-step loop guard.
  • Setup: validator-exit-router, validator-exit-1-router, explicit-jump-pipe, loop-guard-pipe.
  • Status: ✅ Verified: Exit 0 → pass-sift branch, Exit 1 → fail-passthrough branch, id+next explicit jump (node B skipped — [C][A]input), 100-step loop guard. Gap tests added: artifact-fork-pipe (two-route fork via validator), validator-loop-pipe (validator-based cycle → loop guard at 100 steps), nested-validator-pipe (validator inside branch_sequences — two levels of DAG routing).
  • Proof: EVIDENCE.md
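
The routing model can be sketched as a small interpreter: ordinary nodes transform the payload and continue to next (or the following node), validator nodes pick a branch by exit code, and a step counter stops runaway cycles. The node dict shape, the on_exit map, the END marker, and Python callables standing in for subprocesses are hypothetical simplifications, not the pipes.json schema; only the 100-step limit comes from the scenario.

```python
MAX_STEPS = 100  # loop-guard limit taken from the scenario


class LoopGuardError(RuntimeError):
    """Raised when a pipe exceeds MAX_STEPS node executions."""


def run_dag(nodes: list[dict], payload: str) -> str:
    """Walk nodes by id: validators route on exit code, other nodes follow next."""
    index = {node["id"]: i for i, node in enumerate(nodes)}
    current = 0
    steps = 0
    while current < len(nodes):
        steps += 1
        if steps > MAX_STEPS:
            raise LoopGuardError(f"loop guard: exceeded {MAX_STEPS} steps")
        node = nodes[current]
        if node.get("type") == "validator":
            # Hypothetical: a Python callable stands in for the validator's exit code.
            exit_code = node["check"](payload)
            target = node["on_exit"].get(exit_code)
        else:
            payload = node["run"](payload)
            target = node.get("next")
        if target == "END":  # hypothetical terminal marker
            break
        # No explicit target: fall through to the next node in list order.
        current = index[target] if target is not None else current + 1
    return payload
```

With nodes A (next: C), B, and C, each prefixing its own tag, running on the string input skips B and returns [C][A]input, the same shape as the explicit-jump result recorded above.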

Scenario 25: Runtime Variable Injection (Phase 12A)

  • Last Verified In: v0.5.2 (first run 2026-05-30)
  • Channels: Shell ⚠️ · Gemini CLI ⏳ · pi.dev ⚠️
  • Objective: Verify --var KEY=VALUE substitution, vars defaults, env fallback, and missing-var fail-fast.
  • Setup: var-rate-pipe, var-missing-pipe, var-multi-pipe, var-env-fallback-pipe.
  • Status: ⚠️ Partial. Results: --var injection ✅ · pipe vars defaults ✅ · caller overrides default ✅ · multiple --var flags ✅ · env var fallback ✅ · empty-default fail-fast (var-empty-default-fail-pipe) ✅, which fails with Missing pipe variable: TOKEN before spawn · --manifest + --var combined ✅. REPORT_038 was closed in v0.5.3: a missing variable now fails fast with a clear error before the subprocess is spawned. The full test suite passes.
  • Proof: EVIDENCE.md
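
The resolution order exercised here can be sketched as: caller --var values first, then the pipe's vars defaults, then the environment, with a fail-fast error raised before any subprocess is spawned if nothing supplies a value. Consulting defaults before the environment and treating an empty default as unset are assumptions; only the error text Missing pipe variable: TOKEN comes from the scenario.

```python
import os
import re
from typing import Mapping, Optional

VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


class MissingVariableError(ValueError):
    """Raised before spawn when a ${VAR} placeholder cannot be resolved."""


def resolve_args(
    args: list[str],
    cli_vars: Mapping[str, str],
    defaults: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Substitute ${VAR} placeholders; raise if any placeholder is unresolved."""
    env = os.environ if env is None else env

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        # Assumed precedence: --var, then pipe defaults, then environment.
        for source in (cli_vars, defaults, env):
            value = source.get(name)
            if value:  # assumption: an empty string counts as unset
                return value
        raise MissingVariableError(f"Missing pipe variable: {name}")

    return [VAR_PATTERN.sub(lookup, arg) for arg in args]
```

Resolving every argument before the first subprocess.run call is what turns the REPORT_038 failure mode (a literal ${TOKEN} reaching the child process) into an immediate, readable error.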

Scenario 26: Run Manifests (Phase 12B)

  • Last Verified In: v0.5.2 (first run 2026-05-30)
  • Channels: Shell ✅ · Gemini CLI ⏳ · pi.dev ✅
  • Objective: Verify --manifest <path> and "manifest": "auto" write structured JSON execution traces.
  • Setup: standard-distill with --manifest, auto-manifest-pipe, manifest-fail-pipe.
  • Status: ✅ Verified: Explicit path creates manifest with pipe, startedAt, completedAt, status, steps ✅. Fail pipe records status:"fail" and ok:false ✅. "manifest":"auto" writes to project root .pipe_cache/<name>-<iso>.json ✅. No manifest created by default ✅.
  • Proof: EVIDENCE.md
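
The manifest can be sketched as a JSON document assembled from per-step results. The top-level keys (pipe, startedAt, completedAt, status, steps), status "fail", ok:false, and the .pipe_cache/<name>-<iso>.json location come from the scenario. Placing ok on each step, the "ok" status value for a passing run, and the colon-free timestamp (colons are not allowed in Windows file names) are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_manifest(
    pipe: str,
    steps: list[dict],
    started_at: datetime,
    project_root: Path,
    path: Optional[Path] = None,
) -> Path:
    """Write an execution trace; path=None mimics "manifest": "auto"."""
    completed_at = datetime.now(timezone.utc)
    manifest = {
        "pipe": pipe,
        "startedAt": started_at.isoformat(),
        "completedAt": completed_at.isoformat(),
        # Assumption: the run passes only if every step reports ok.
        "status": "ok" if all(step.get("ok") for step in steps) else "fail",
        "steps": steps,
    }
    if path is None:
        # Assumed compact ISO form; expects started_at in UTC (hence the Z).
        stamp = started_at.strftime("%Y%m%dT%H%M%SZ")
        path = project_root / ".pipe_cache" / f"{pipe}-{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return path
```

Only an explicit path or the auto mode writes anything, which matches the scenario's "no manifest created by default" check.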

Scenario 27: MCP Banner Tolerance (Phase 13)

  • Last Verified In: v0.5.4 (2026-05-30)
  • Channels: Shell ❌ · Gemini CLI ⏳ · pi.dev ❌
  • Objective: Verify graceful skip of non-JSON stdout lines from noisy MCP servers.
  • Setup: mock_noisy_server.py (configurable banner count) + banner-pipe, banner-verbose-pipe, banner-overflow-pipe.
  • Status: ✅ Fixed in v0.5.5. REPORT_041 resolved — MCP node no longer hangs. Pipe completes with correct [ECHO] output. Note: MCP SDK internal reader logs Failed to parse JSONRPC warnings for banner lines on stderr (cosmetic — pipe succeeds, banner tolerance works).
  • Proof: EVIDENCE.md
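
Banner tolerance amounts to reading the server's stdout line by line and treating only lines that parse as JSON-RPC objects as protocol messages. A minimal sketch of that filtering, independent of the MCP SDK; the jsonrpc == "2.0" check and returning skipped lines separately are assumptions. Decoding with errors="replace" follows the spirit of the REPORT_040 fix for non-UTF8 banner bytes.

```python
import json
from typing import Iterable


def split_banner(lines: Iterable[bytes]) -> tuple[list[dict], list[str]]:
    """Separate JSON-RPC messages from banner noise on a server's stdout."""
    messages: list[dict] = []
    skipped: list[str] = []
    for raw in lines:
        line = raw.decode("utf-8", errors="replace").strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            skipped.append(line)  # banner or log line: record it, do not crash
            continue
        if isinstance(obj, dict) and obj.get("jsonrpc") == "2.0":
            messages.append(obj)
        else:
            skipped.append(line)  # valid JSON but not a JSON-RPC message
    return messages, skipped
```

Keeping the skipped lines, rather than discarding them, lets a caller surface them on stderr the way the scenario's verbose pipe does.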

Active Blockers

| Bug | Scenarios Affected | Description | Status |
|---|---|---|---|
| REPORT_041 | 03, 07, 27 | _run_mcp_node hangs when called from module context; fixed in v0.5.5 (shlex posix=False + server_args) | ✅ Closed in v0.5.5 |
| REPORT_037 | 03, 07, 27 | _StdoutToleranceWrapper missing async context manager protocol; all MCP node pipes broken since v0.5.0 | ✅ Closed in v0.5.3 |
| REPORT_038 | 25 | Missing ${VAR} not caught before node spawn; literal string passed to subprocess | ✅ Closed in v0.5.3 |
| REPORT_039 | 13 | node.get("timeout") ignored by orchestrator; per-node "timeout" in pipes.json silently ignored | ✅ Closed in v0.5.3 |
| REPORT_040 | 27 | StdioServerParameters missing encoding/encoding_error_handler on Windows; UnicodeDecodeError on non-UTF8 banner lines | ✅ Closed in v0.5.4 |
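
The REPORT_041 fix mentions shlex with posix=False. Without claiming this was the whole root cause, the standard-library behaviour involved is easy to show: in POSIX mode, shlex treats a backslash as an escape character, so an unquoted Windows path loses its separators, while non-POSIX mode keeps them (and also keeps quote characters inside tokens). The command string, including the --banners flag, is hypothetical.

```python
import shlex

# Hypothetical server command as it might appear in a pipes.json entry.
command = r"python C:\lab\servers\mock_noisy_server.py --banners 3"

posix_tokens = shlex.split(command)                 # default: posix=True
windows_tokens = shlex.split(command, posix=False)  # backslashes preserved

print(posix_tokens)
# ['python', 'C:labserversmock_noisy_server.py', '--banners', '3']
print(windows_tokens)
# ['python', 'C:\\lab\\servers\\mock_noisy_server.py', '--banners', '3']
```

In POSIX mode the server path no longer exists, so the child process either fails to start or starts the wrong thing; non-POSIX splitting is the usual choice when commands are written with Windows paths.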

Infrastructure Drift (Not Engine Regressions)

| Scenario | Drift | Fix Required |
|---|---|---|
| 01, 14 | Relative script paths require cd to the scenario dir | Optionally update pipes.json to use absolute paths |
| 09, 10, 15 | Referenced pipes removed from cross-borrowed pipes.json | Add a standalone pipes.json to each scenario |
| 18 | Hardcoded grep not on Windows PATH | Update scenario to use rg on Windows |

Summary

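| Phase | Scenarios | Shell ✅ | Shell ⚠️ | Shell ❌ | pi.dev ✅ | pi.dev ⚠️ | pi.dev ❌ |
|---|---|---|---|---|---|---|---|
| Phase 1 (Feature Validation) | 01–11 | 6 | 3 | 2 | 6 | 3 | 2 |
| Phase 2 (Operational Hardening) | 12–18 | 5 | 2 | 0 | 5 | 2 | 0 |
| Phase 3 (Battle Testing & Rust) | 19–21 | 3 | 0 | 0 | 3 | 0 | 0 |
| Phase 4 (v0.5.0 New Features) | 22–27 | 5 | 1 | 0 | 5 | 1 | 0 |
| Total | 27 | 19 | 6 | 2 | 19 | 6 | 2 |

Counts follow the per-scenario channel lines (Phase 1 ⚠️: 09, 10, 11; ❌: 03, 07). Phase 4 counts Scenario 27 as ✅ per its v0.5.5 status, although its channel line still shows ❌.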

The core sifting engine, DAG orchestration, resiliency, line ranges, A2A handoff, tee-pipes, multi-modal, dynamic pipes, Phase 9 transparency, Phase 11 branching, Phase 12 manifests, Phase 12 variables, and Phase 13 (MCP banner tolerance) are all verified working: most in the v0.5.2 regression runs, and Phase 13 on v0.5.5.

Scenario 27 is now ✅ fixed in v0.5.5. Scenarios 03 and 07 were blocked by the same bug (REPORT_041), which is closed in v0.5.5; both are pending re-verification. Scenarios 09, 10, 11, 15, and 18 have infrastructure drift (removed pipes or Windows PATH issues) that does not affect the engine. Scenario 25 is still marked partial; its failures were caused by REPORT_038, which was closed in v0.5.3.

The Gemini CLI column remains pending for scenarios 22–27.