[F15] Bundle schema: physical_realism_check block — DESIGN must justify K vs (model, GPU) #260

@sriumcp

Problem

A campaign can pick parameter values (in particular, KV cache capacity K) that demonstrate the desired effect without corresponding to any realistic (model, GPU, workload) tuple. The math works, the apparatus passes, and the result is real for the synthetic regime, but it says nothing about real-hardware behavior, and a reviewer can fairly object: "you constructed your own contention".

In paper-memorytime-mirage, the campaign locked workload parameters (concurrency=32, P_A=1024, P_B mixture, D=1) and chose K (~500–1000 blocks = 8–16K tokens) to make the mirage manifest. The realistic K for llama-3.1-8b on H100 is ~24,576 blocks ≈ 393K tokens (derived from 48 GiB available for KV at gpu_memory_utilization=0.9, divided by 128 KiB/token). At realistic K, the chosen workload uses ~0.6% of cache — fully un-contended — and the mirage cannot manifest at all.
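
The derivation of the realistic K can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name `derive_k_realistic` is hypothetical, and the 16-tokens-per-block size is an assumption inferred from the figures above (24,576 blocks ≈ 393K tokens). The 128 KiB/token figure is taken from the text; the comment showing where it comes from assumes an fp16 KV cache and the llama-3.1-8b shape of 32 layers, 8 KV heads, and head dimension 128.

```python
GIB = 1024 ** 3
KIB = 1024


def derive_k_realistic(kv_bytes_available: int,
                       kv_bytes_per_token: int,
                       block_size_tokens: int = 16) -> tuple[int, int]:
    """Return (tokens, blocks) that fit in the KV cache memory budget.

    block_size_tokens=16 is an assumption, not stated in the issue.
    """
    tokens = kv_bytes_available // kv_bytes_per_token
    blocks = tokens // block_size_tokens
    return tokens, blocks


# Assumed origin of 128 KiB/token: 2 (K and V) * 32 layers * 8 KV heads
# * 128 head dim * 2 bytes (fp16) = 131072 bytes.
kv_bytes_per_token = 2 * 32 * 8 * 128 * 2

tokens, blocks = derive_k_realistic(48 * GIB, kv_bytes_per_token)
print(tokens, blocks)  # 393216 24576
```

With the experiment's K of 1000 blocks, the same numbers give a ratio of 1000 / 24576 ≈ 0.041.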

The agent picked K based on bucket-engagement math (K such that ω·K is below typical occupancy), which is mathematically correct for showing the mechanism, but it answers a different question from "does this matter on real hardware?".

Desired behavior

A new optional schema block on bundle.yaml:

physical_realism_check:
  model: meta-llama/llama-3.1-8b-instruct
  gpu: H100-80GB
  gpu_memory_utilization: 0.9
  derived_k_realistic: 24576
  k_used_in_experiment: 1000
  k_realism_ratio: 0.041   # k_used / k_realistic
  justification: |
    K is set 24x smaller than physical to demonstrate mechanism isolation
    in the contested-cache regime. Production scenarios where K contention
    actually happens: long-context RAG, smaller GPUs, larger models, multi-LoRA.
    A separate campaign at realistic K with a contention-inducing workload
    (RAG-scale P̄=8K, concurrency=64) is needed to test under production conditions.

The DESIGN agent must populate this block when its verified_parameters includes any K-class quantity (KV blocks, max_model_len, batched-tokens budget). If k_realism_ratio < 0.5 (configurable threshold), nous emits a soft warning: "your K is 24× smaller than physical realism. State your reasoning in justification or raise K."
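
As a rough illustration of the threshold rule, the sketch below computes k_realism_ratio and builds the warning text. The function name `realism_warning` and its signature are hypothetical and not part of nous. The message reports the factor to one decimal place (24.6x for the example values) rather than the rounded 24× quoted above.

```python
from typing import Optional


def realism_warning(k_used: int,
                    k_realistic: int,
                    threshold: float = 0.5) -> Optional[str]:
    """Return a soft-warning string when k_used / k_realistic < threshold.

    Returns None when the ratio is at or above the threshold.
    """
    ratio = k_used / k_realistic
    if ratio >= threshold:
        return None
    factor = k_realistic / k_used
    return (
        f"your K is {factor:.1f}x smaller than physical realism "
        f"(k_realism_ratio={ratio:.3f}). "
        "State your reasoning in justification or raise K."
    )


# Values from the example block: k_used=1000, derived_k_realistic=24576.
print(realism_warning(1000, 24576))
```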

Suggested implementation sketch

  1. Add physical_realism_check to bundle.yaml schema, optional but expected when K-class parameters are set.
  2. Add a methodology-prompt section for DESIGN: "When you set K-class parameters, populate physical_realism_check with derivation and justification."
  3. The validator emits a soft warning if k_realism_ratio < threshold and justification is empty/perfunctory.
  4. Document the block in the nous schema bundle rendering.
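
The validator step could look roughly like the sketch below, which operates on an already-parsed bundle (a plain dict). Everything here is an assumption made for illustration: the K-class key names, the layout of verified_parameters as a mapping, the word-count heuristic for a "perfunctory" justification, and the function name `check_physical_realism`. None of it reflects the actual nous validator.

```python
# Hypothetical key names for K-class quantities (KV blocks, max_model_len,
# batched-tokens budget); the real bundle may name them differently.
K_CLASS_KEYS = {"kv_blocks", "max_model_len", "max_num_batched_tokens"}

# Hypothetical heuristic: fewer words than this counts as perfunctory.
MIN_JUSTIFICATION_WORDS = 15


def check_physical_realism(bundle: dict, threshold: float = 0.5) -> list[str]:
    """Return soft warnings for a parsed bundle; an empty list means no warnings."""
    warnings: list[str] = []
    params = bundle.get("verified_parameters") or {}
    if not K_CLASS_KEYS & set(params):
        return warnings  # no K-class parameters, block not expected

    block = bundle.get("physical_realism_check")
    if block is None:
        warnings.append(
            "K-class parameters are set but physical_realism_check is missing."
        )
        return warnings

    ratio = block["k_used_in_experiment"] / block["derived_k_realistic"]
    words = len((block.get("justification") or "").split())
    if ratio < threshold and words < MIN_JUSTIFICATION_WORDS:
        warnings.append(
            f"k_realism_ratio={ratio:.3f} is below {threshold} and "
            "justification is empty or perfunctory."
        )
    return warnings


example = {
    "verified_parameters": {"kv_blocks": 1000},
    "physical_realism_check": {
        "derived_k_realistic": 24576,
        "k_used_in_experiment": 1000,
        "justification": "demo",
    },
}
print(check_physical_realism(example))
```

Because the warnings are soft, the sketch returns them as a list instead of raising, so a caller can decide whether to print them or escalate.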

Acceptance criteria

  • Bundle schema documents physical_realism_check.
  • DESIGN methodology prompt instructs the agent to populate it for K-class parameter choices.
  • Validator emits a soft warning when k_realism_ratio is below the threshold and justification is missing or perfunctory.
  • Friction report F15 row in the tracking issue checks off.

Severity

HIGH (paper-defensibility): if the realism check isn't surfaced, the paper is vulnerable to the reviewer criticism "you constructed your own contention".

Source

friction-report.md F15, paper-memorytime-mirage campaign (2026-05).


Part of friction-report tracking issue #245.

Labels: enhancement (New feature or request), friction-report (From external campaign-author friction reports)
