Problem
A campaign can pick parameter values (in particular, KV cache capacity K) that demonstrate the desired effect without corresponding to any realistic (model, GPU, workload) tuple. The math works, the apparatus passes, and the result is real for the synthetic regime, but it says nothing about real-hardware behavior, and a reviewer can object: "you constructed your own contention".
In paper-memorytime-mirage, the campaign locked workload parameters (concurrency=32, P_A=1024, P_B mixture, D=1) and chose K (~500–1000 blocks = 8–16K tokens) to make the mirage manifest. The realistic K for llama-3.1-8b on H100 is ~24,576 blocks ≈ 393K tokens (derived from 48 GiB available for KV at gpu_memory_utilization=0.9, divided by 128 KiB/token). At realistic K, the chosen workload uses ~0.6% of cache — fully un-contended — and the mirage cannot manifest at all.
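The derivation of the realistic K can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the 48 GiB KV budget and 128 KiB/token figure come from the text above, the 16-tokens-per-block size is an assumption inferred from "24,576 blocks ≈ 393K tokens", and the architecture numbers (32 layers, 8 KV heads, head dimension 128, 2-byte dtype) are the published llama-3.1-8b configuration. The function names are hypothetical.

```python
GIB = 1024 ** 3
KIB = 1024


def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes):
    """Bytes of KV cache one token occupies across all layers."""
    # The factor 2 covers the separate K and V tensors in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def realistic_k(kv_budget_bytes, bytes_per_token, block_size_tokens):
    """Return (blocks, tokens) that fit in the given KV budget."""
    tokens = kv_budget_bytes // bytes_per_token
    return tokens // block_size_tokens, tokens


# llama-3.1-8b with a 2-byte KV dtype (fp16/bf16).
bytes_per_token = kv_bytes_per_token(
    num_layers=32, num_kv_heads=8, head_dim=128, dtype_bytes=2
)

# 48 GiB KV budget is taken from the issue text; block size 16 is assumed.
k_realistic_blocks, k_realistic_tokens = realistic_k(
    kv_budget_bytes=48 * GIB,
    bytes_per_token=bytes_per_token,
    block_size_tokens=16,
)

k_used = 1000  # upper end of the K range used in the campaign
k_realism_ratio = round(k_used / k_realistic_blocks, 3)

print(bytes_per_token // KIB, "KiB/token")
print(k_realistic_tokens, "tokens")
print(k_realistic_blocks, "blocks")
print("k_realism_ratio =", k_realism_ratio)
```

Under these assumptions the script reproduces the figures quoted above: 128 KiB/token, 393,216 tokens, 24,576 blocks, and a ratio of 0.041 for K=1000.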
The agent picked K based on bucket-engagement math (K such that ω·K is below typical occupancy), which is mathematically correct for showing the mechanism, but it answers a different question from "does this matter on real hardware?".
Desired behavior
A new optional schema block on bundle.yaml:
physical_realism_check:
  model: meta-llama/llama-3.1-8b-instruct
  gpu: H100-80GB
  gpu_memory_utilization: 0.9
  derived_k_realistic: 24576
  k_used_in_experiment: 1000
  k_realism_ratio: 0.041  # k_used / k_realistic
  justification: |
    K is set 24x smaller than physical to demonstrate mechanism isolation
    in the contested-cache regime. Production scenarios where K contention
    actually happens: long-context RAG, smaller GPUs, larger models, multi-LoRA.
    A separate campaign at realistic K with a contention-inducing workload
    (RAG-scale P̄=8K, concurrency=64) is needed to test under production conditions.
The DESIGN agent must populate this block when its verified_parameters includes any K-class quantity (KV blocks, max_model_len, batched-tokens budget). If k_realism_ratio < 0.5 (configurable threshold), nous emits a soft warning: "your K is 24× smaller than physical realism. State your reasoning in justification or raise K."
Suggested implementation sketch
- Add physical_realism_check to bundle.yaml schema, optional but expected when K-class parameters are set.
- Add a methodology-prompt section for DESIGN: "When you set K-class parameters, populate physical_realism_check with derivation and justification."
- The validator emits a soft warning if k_realism_ratio < threshold and justification is empty/perfunctory.
- Document in nous schema bundle rendering.
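One possible shape for the validator step is sketched below. It is not the nous implementation: the function name, the `min_justification_words` cutoff, and the word-count heuristic for "perfunctory" are all assumptions made for illustration. Only the field names and the 0.5 default threshold come from this issue.

```python
def check_physical_realism(block, threshold=0.5, min_justification_words=10):
    """Return a list of soft warnings for a physical_realism_check block.

    `block` is the parsed physical_realism_check mapping from bundle.yaml.
    The word-count test for a perfunctory justification is a placeholder
    heuristic, not a rule defined by the issue.
    """
    warnings = []

    k_realistic = block["derived_k_realistic"]
    k_used = block["k_used_in_experiment"]
    ratio = k_used / k_realistic

    # Treat a missing, empty, or very short justification as perfunctory.
    justification = (block.get("justification") or "").strip()
    perfunctory = len(justification.split()) < min_justification_words

    if ratio < threshold and perfunctory:
        factor = k_realistic // k_used  # whole-number "N x smaller" factor
        warnings.append(
            f"your K is {factor}x smaller than physical realism. "
            "State your reasoning in justification or raise K."
        )
    return warnings


# Example: the values from this issue, with the justification left empty.
example = {
    "derived_k_realistic": 24576,
    "k_used_in_experiment": 1000,
    "justification": "",
}
for message in check_physical_realism(example):
    print("WARNING:", message)
```

Returning a list instead of raising keeps the check soft, so a bundle with a small K and a substantive justification still validates.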
Acceptance criteria
Severity
HIGH (paper-defensibility) — vulnerable to a reviewer "you constructed your own contention" criticism if the realism check isn't surfaced.
Source
friction-report.md F15, paper-memorytime-mirage campaign (2026-05).
Part of friction-report tracking issue #245.