DeliciousBuding
diff --git a/.gitignore b/.gitignore
Lines changed: 3 additions & 0 deletions
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 1 addition & 1 deletion
diff --git a/ROADMAP.md b/ROADMAP.md
Lines changed: 27 additions & 0 deletions
diff --git a/docs/evidence/h2-output-cloud-geometry-20260525.md b/docs/evidence/h2-output-cloud-geometry-20260525.md
Lines changed: 127 additions & 0 deletions
diff --git a/docs/evidence/reproduction-status.md b/docs/evidence/reproduction-status.md
Lines changed: 1 addition & 0 deletions
@@ -142,6 +142,9 @@ workspaces/**/artifacts/**
 !workspaces/black-box/artifacts/copymark-commoncanvas-response-contract-probe-20260512.json
 !workspaces/black-box/artifacts/copymark-commoncanvas-multiseed-stability-20260513.json
 !workspaces/black-box/artifacts/commoncanvas-denoising-loss-20260513.json
+!workspaces/black-box/artifacts/h2-output-cloud-geometry-20260525.json
+!workspaces/black-box/artifacts/h2-output-cloud-geometry-seed177-20260525.json
+!workspaces/black-box/artifacts/h2-output-cloud-geometry-label-shuffle-20260525.json
 !workspaces/black-box/artifacts/beans-lora-member-denoising-loss-scout-20260513.json
 !workspaces/black-box/artifacts/clid-image-identity-boundary-20260511.json
 !workspaces/black-box/artifacts/midfreq-same-noise-residual-cache-audit-20260512.json
 
@@ -28,7 +28,7 @@ Do not start from memory or old chat context. Re-anchor on repository files.
 
 ## Current Operating State
 
-- Active work: `2026-05-25 feature-packet channel consumer verdict is the latest consumer-boundary update. Tracing the Roots remains positive Research-side feature-packet evidence (AUC = 0.815826, TPR@1%FPR = 0.134000), but the Platform/Runtime feature-packet channel is deferred because the public surface still has only one singleton feature tensor packet, no second non-source-equivalent public feature-packet, and no raw target checkpoint / raw sample manifest / feature-regeneration assets. Do not create feature-packet schema, bundle export, validators, tests, Platform UI types, or Runtime runners from this singleton. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after feature-packet channel consumer verdict. LeakyCLIP remains CLIP / multimodal privacy watch-plus, not a second diffusion asset. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
+- Active work: `2026-05-25 H2 output-cloud geometry cache review is the latest metric verdict. It is a strong Research-side candidate on the existing H2 response cache (seed 176 logistic AUC = 0.961529, TPR@1%FPR = 0.333984, TPR@0.1%FPR = 0.117188; seed 177 AUC = 0.961048; label-shuffle AUC = 0.507595), but it is not admitted because the source cache used class-ordered sample offsets and needs a reseeded or interleaved order-control cache before any promotion. Do not create Platform/Runtime schema, bundle export, UI type, runner, KDE/shadow-density/repeat-count sweeps, or same-cache feature sweeps from this result. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after H2 output-cloud cache review. Feature-packet consumer lane remains deferred. LeakyCLIP remains CLIP / multimodal privacy watch-plus. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
 - Next GPU candidate: none selected
 - Long-horizon control: follow `ROADMAP.md` section
   `Long-Horizon Research Task Board（2026-05-13 起）` before reopening any
 
@@ -2,6 +2,33 @@
 
 > Last updated: 2026-05-25
 
+## 2026-05-25 H2 output-cloud geometry candidate signal
+
+Latest decision: the existing `512 / 512` H2 response-strength response cache exposes a strong
+output-output geometry candidate signal, but until order-control passes it is not promoted, not
+released for product consumption, and not extended with same-cache feature engineering. This review only read the existing
+`workspaces/black-box/runs/h2-response-strength-512-20260501-r1/response-cache.npz`;
+it generated no new responses, downloaded no assets, and ran no GPU work.
+
+The scorer deliberately excludes seed-to-output distance and uses only same-timestep inter-repeat RMSE,
+cross-timestep centroid RMSE, and response-cloud Gram/PCA features. The main result is
+`AUC = 0.961529`, `ASR = 0.900391`, `TPR@1%FPR = 0.333984`,
+`TPR@0.1%FPR = 0.117188`, an improvement over raw H2 logistic of
+`AUC +0.055836`, `TPR@1%FPR +0.199218`, `TPR@0.1%FPR +0.117188`.
+Seed `177` stability holds at `AUC = 0.961048`, `TPR@1%FPR = 0.353516`,
+`TPR@0.1%FPR = 0.130859`; the label-shuffle sanity check returns to chance level,
+`AUC = 0.507595`.
+
+Key caveat: when the source cache was generated, the member side used `sample_offset = 0` and the nonmember side
+`sample_offset = len(member_indices)`. Output-output geometry is sensitive to sampling seeds and response-cloud shape,
+so the current strong signal may be confounded by a class-ordered sampling effect. The result can only stand as a
+strong Research-side candidate; the next re-evaluation may only be a single bounded reseeded / interleaved
+order-control response-cache scout. The current slots remain
+`active_gpu_question = none`, `next_gpu_candidate = none`,
+`CPU sidecar = none selected after H2 output-cloud cache review`.
+See
+[docs/evidence/h2-output-cloud-geometry-20260525.md](docs/evidence/h2-output-cloud-geometry-20260525.md).
+
 ## 2026-05-25 Feature-Packet channel consumer verdict
 
 Latest decision: as of 2026-05-25, do not open for the Tracing the Roots singleton the Platform/Runtime
 
@@ -0,0 +1,127 @@
+# H2 Output-Cloud Geometry Cache Review
+
+> Date: 2026-05-25
+> Status: candidate complementary signal / CPU-only cache review / order-control required before promotion / no GPU release / no admitted row
+
+## Question
+
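+On the existing H2 response-strength cache, does the geometry among outputs carry a membership signal distinct from
+seed-to-output distance?
+
+This round reuses only the existing
+`workspaces/black-box/runs/h2-response-strength-512-20260501-r1/response-cache.npz`.
+It generated no new responses, downloaded no assets, ran no GPU work, and did not extend the same line with KDE, shadow
+density, repeat-count, or feature sweeps.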
+
+## Contract
+
+Script:
+`scripts/review_h2_output_cloud_geometry.py`
+
+Input cache:
+
+| Field | Value |
+| --- | ---: |
+| Samples | `1024` |
+| Members | `512` |
+| Nonmembers | `512` |
+| Timesteps | `40 / 80 / 120 / 160` |
+| Repeats per timestep | `2` |
+| Response shape | `[1024, 4, 2, 3, 32, 32]` |
+
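+The features use output-output geometry only:
+
+| Feature family | Meaning |
+| --- | --- |
+| within-timestep pair RMSE | Response distance between different repeats at the same timestep |
+| timestep centroid RMSE | Distance between response-cloud centroids at different timesteps |
+| response-cloud PCA trace/top share | Scale and concentration of the small response cloud's Gram spectrum |
+
+The script deliberately does not read seed-to-output distance features, so it cannot degenerate into the original H2 simple
+distance scorer.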
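The three feature families in the table above can be sketched in NumPy as follows. This is a minimal illustration and not the contents of `scripts/review_h2_output_cloud_geometry.py`: the function name `output_cloud_features`, the feature ordering, and the normalisation of the Gram matrix by the flattened response dimension are all assumptions made for the example. The only thing taken from the contract is the response layout `[samples, timesteps, repeats, channels, height, width]`.

```python
import numpy as np


def output_cloud_features(responses: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: map responses [N, T, R, C, H, W] to features [N, F]."""
    n, t, r = responses.shape[:3]
    flat = responses.reshape(n, t, r, -1).astype(np.float64)  # [N, T, R, D]
    feats = []

    # Within-timestep pair RMSE: distance between repeats at the same timestep.
    for ti in range(t):
        for a in range(r):
            for b in range(a + 1, r):
                diff = flat[:, ti, a] - flat[:, ti, b]
                feats.append(np.sqrt(np.mean(diff ** 2, axis=1)))

    # Timestep centroid RMSE: distance between per-timestep centroids.
    centroids = flat.mean(axis=2)  # [N, T, D]
    for ti in range(t):
        for tj in range(ti + 1, t):
            diff = centroids[:, ti] - centroids[:, tj]
            feats.append(np.sqrt(np.mean(diff ** 2, axis=1)))

    # Response-cloud PCA trace / top share from the Gram spectrum of the
    # centered cloud of T * R points (assumed normalisation: divide by D).
    cloud = flat.reshape(n, t * r, -1)
    centered = cloud - cloud.mean(axis=1, keepdims=True)
    gram = centered @ centered.transpose(0, 2, 1) / centered.shape[-1]
    eig = np.linalg.eigvalsh(gram)  # batched, eigenvalues in ascending order
    trace = eig.sum(axis=1)
    feats.append(trace)
    feats.append(eig[:, -1] / np.maximum(trace, 1e-12))

    return np.stack(feats, axis=1)
```

For the cache shape in the contract (4 timesteps, 2 repeats) this sketch yields 4 pair-RMSE features, 6 centroid-RMSE features, and 2 spectrum features per sample. None of them touch the seed image, which is the property the review relies on.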
+
+## Result
+
+Main result:
+`workspaces/black-box/artifacts/h2-output-cloud-geometry-20260525.json`
+
+| Metric | Output-cloud logistic | Raw H2 logistic | Lowpass H2 logistic |
+| --- | ---: | ---: | ---: |
+| AUC | `0.961529` | `0.905693` | `0.895679` |
+| ASR | `0.900391` | `0.841797` | `0.831055` |
+| TPR@1%FPR | `0.333984` | `0.134766` | `0.148438` |
+| TPR@0.1%FPR | `0.117188` | `0.0` | `0.025391` |
+
+Relative to raw H2: `AUC +0.055836`, `TPR@1%FPR +0.199218`,
+`TPR@0.1%FPR +0.117188`.
+
+Relative to lowpass H2: `AUC +0.065850`, `TPR@1%FPR +0.185546`,
+`TPR@0.1%FPR +0.091797`.
+
+No simple single feature explains this result:
+
+| Best simple view | Feature | Orientation | AUC | TPR@1%FPR | TPR@0.1%FPR |
+| --- | --- | --- | ---: | ---: | ---: |
+| Best AUC | `centroid_rmse_40_160` | negative | `0.801182` | `0.03125` | `0.005859` |
+| Best low-FPR | `cloud_pca_top_share` | negative | `0.650913` | `0.078125` | `0.017578` |
+
+Seed stability check:
+`workspaces/black-box/artifacts/h2-output-cloud-geometry-seed177-20260525.json`
+
+| Metric | Seed 177 |
+| --- | ---: |
+| AUC | `0.961048` |
+| ASR | `0.900391` |
+| TPR@1%FPR | `0.353516` |
+| TPR@0.1%FPR | `0.130859` |
+
+Label-shuffle sanity:
+`workspaces/black-box/artifacts/h2-output-cloud-geometry-label-shuffle-20260525.json`
+
+| Metric | Label shuffle |
+| --- | ---: |
+| AUC | `0.507595` |
+| ASR | `0.521484` |
+| TPR@1%FPR | `0.011719` |
+| TPR@0.1%FPR | `0.003906` |
+
+This indicates that the scorer/evaluation pipeline has no obvious label pass-through leakage.
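To read the low-FPR rows in the tables above, it helps to see how a finite sample turns a target FPR into a false-positive budget. The sketch below is an assumption about the convention, not the project's evaluation code: `tpr_at_fpr` is a hypothetical helper that allows `floor(target_fpr * n_nonmembers)` false positives and counts members scoring strictly above the resulting threshold. Under that convention, 512 nonmembers give a budget of 5 false positives at 1% FPR and 0 at 0.1% FPR, so the 0.1% row is effectively a zero-false-positive point. The reported `0.117188` and `0.333984` are consistent with 60 and 171 of 512 members respectively.

```python
import numpy as np


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """Hypothetical helper: empirical TPR with at most floor(fpr * n) false positives."""
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]  # descending
    allowed_fp = int(np.floor(target_fpr * nonmembers.size))
    if allowed_fp >= nonmembers.size:
        return 1.0
    # Anything strictly above the (allowed_fp + 1)-th highest nonmember score
    # is called a member, so at most allowed_fp nonmembers are flagged.
    threshold = nonmembers[allowed_fp]
    return float(np.mean(members > threshold))


# False-positive budgets for a 512-nonmember evaluation set.
budget_1pct = int(np.floor(0.01 * 512))    # 5
budget_01pct = int(np.floor(0.001 * 512))  # 0
```

Because the budget is so small, these values are empirical strict-tail points on one cache rather than calibrated sub-percent FPR estimates.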
+
+## Critical Caveat
+
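+This result still cannot be promoted. Response generation for the source cache used a class-ordered seed offset:
+in `scripts/run_h2_response_strength_validation.py` the member side uses
+`sample_offset = 0` and the nonmember side uses `sample_offset = len(member_indices)`.
+Output-output geometry is sensitive to sampling seeds and response-cloud shape, so the current strong signal may be confounded by a
+class-ordered sampling effect.
+
+This is not a reason to keep adding tables on the same cache; it defines only one very narrow next step:
+if the line is advanced at all, release at most one bounded order-control / reseeded / interleaved
+response-cache scout to determine whether the strong signal survives class-order control.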
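The confound described above can be illustrated with a small sketch. How `sample_offset` feeds into the sampling seed is not shown in this diff, so the example assumes each sample's offset is simply `sample_offset + i`. Both function names are hypothetical and neither is taken from `scripts/run_h2_response_strength_validation.py`.

```python
import numpy as np


def class_ordered_offsets(n_members: int, n_nonmembers: int):
    """Layout described in the caveat: members first, nonmembers after them."""
    member = np.arange(n_members)
    nonmember = n_members + np.arange(n_nonmembers)
    return member, nonmember


def interleaved_offsets(n_members: int, n_nonmembers: int, rng_seed: int):
    """One possible order-control layout: offsets shuffled across both classes."""
    rng = np.random.default_rng(rng_seed)
    perm = rng.permutation(n_members + n_nonmembers)
    return perm[:n_members], perm[n_members:]


ordered_m, ordered_n = class_ordered_offsets(512, 512)
mixed_m, mixed_n = interleaved_offsets(512, 512, rng_seed=0)

# In the class-ordered layout the offset alone separates the two classes.
offset_is_perfect_label = bool(ordered_m.max() < ordered_n.min())
```

In the class-ordered layout any quantity that varies with the offset is a perfect proxy for the membership label, which is why a strong AUC on this cache cannot be attributed to membership alone. With shuffled offsets that shortcut is gone, so a signal that survives is harder to explain as an ordering artifact.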
+
+## Decision
+
+`candidate complementary signal / order-control required / no admitted row`。
+
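+It is kept as a strong Research-side candidate because it meets three valuable conditions:
+
+- It is a different observable: output-output cloud geometry rather than seed-to-output distance.
+- It is clearly stronger than raw/lowpass H2 logistic on the same H2 cache.
+- It passed the seed-177 stability check and the label-shuffle sanity check.
+
+For now, however, the following are out of scope:
+
+- No promotion to the Platform/Runtime admitted bundle.
+- No new product schema, Runtime runner, UI type, or bundle row.
+- No KDE, shadow density, repeat-count, feature-family, or fusion sweeps on the same cache.
+- No GPU release or large downloads.
+
+The next re-evaluation may be based only on the result of one order-control cache. If the reseeded /
+interleaved cache still shows strong AUC and strict-tail recovery, then discuss whether to open a more formal H2
+output-cloud mechanism line; if it does not, the candidate is closed outright as a class-ordered response-cache
+artifact.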
+
+## Platform and Runtime Impact
+
+None. The admitted Platform/Runtime bundle remains the existing five rows:
+`recon`, `PIA baseline`, `PIA defended`, `GSA`, and `DPDM W-1`.
@@ -73,6 +73,7 @@ Smoke tests and dry runs are engineering validation, not benchmark claims.
 | FERMI multi-relational tabular MIA | `hold-paper-source-only` | arXiv `2605.11527` reports strong multi-relational TabDDPM/TabDiff/TabSyn membership metrics, but the public surface has no code tree, target/split manifests, generated synthetic tables, feature/score rows, ROC arrays, metric JSON, or replay command. It does not reopen MIDST/tabular execution and releases no tabular dataset download, model training, or GPU work. See [fermi-tabular-artifact-gate-20260515.md](fermi-tabular-artifact-gate-20260515.md). |
 | True known-split mechanisms | `hold-weak` | MNIST/DDPM raw-loss and x0 scouts are weak. Tiny overfit final-layer gradient norm was positive only on the extreme `8 / 64` target, weakened at `16 / 64`, and a more optimistic `64 / 64` oracle gradient-prototype alignment follow-up is effectively random (`AUC = 0.500977`, `ASR = 0.562500`, zero low-FPR recovery). Fashion-MNIST DDPM now has three weak clean-split scouts: fixed-timestep PIA-style loss (`AUC = 0.535889`, `TPR@1%FPR = 0.03125`), SimA single-query score-norm (`AUC = 0.515137`, zero low-FPR recovery), and score-Jacobian sensitivity (`AUC = 0.511719`, zero low-FPR recovery). The Beans member-LoRA denoising-loss scout repaired pseudo-membership semantics by creating an exact `SD1.5 + Beans-member LoRA` target, but the internal conditional denoising-loss score is weak (`AUC = 0.414400`, reverse `0.585600`, `TPR@1%FPR = 0.080000`) and parameter-delta sensitivity is also weak (`AUC = 0.512000`). Do not run more final-layer gradient norm/cosine variants, Fashion-MNIST timestep/seed/`p`-norm/perturbation/norm/packet-size sweeps, or Beans LoRA train-step/rank/resolution/prompt/timestep/layer matrices by default. See [fashion-mnist-ddpm-score-jacobian-sensitivity-20260514.md](fashion-mnist-ddpm-score-jacobian-sensitivity-20260514.md), [fashion-mnist-ddpm-sima-score-norm-20260514.md](fashion-mnist-ddpm-sima-score-norm-20260514.md), [beans-lora-delta-sensitivity-20260513.md](beans-lora-delta-sensitivity-20260513.md), [beans-lora-member-denoising-loss-scout-20260513.md](beans-lora-member-denoising-loss-scout-20260513.md), [fashion-mnist-ddpm-pia-loss-scout-20260513.md](fashion-mnist-ddpm-pia-loss-scout-20260513.md), [tiny-known-split-gradient-prototype-alignment-20260513.md](tiny-known-split-gradient-prototype-alignment-20260513.md), [gradient-norm-stability-gate-20260512.md](gradient-norm-stability-gate-20260512.md), and [tiny-overfit-gradient-norm-scout-20260512.md](tiny-overfit-gradient-norm-scout-20260512.md). |
 | Black-box `H2 response-strength` | candidate-only | Positive-but-bounded DDPM/CIFAR10 candidate: frozen cutoff-0.50 lowpass follow-up passed, and raw H2 recovered strict-tail signal on the fresh packet. SD/CelebA text-to-image transfer is blocked by protocol mismatch. The frozen SD/CelebA image-to-image micro-packet is runnable, but H2 logistic does not beat the same-cache simple distance comparator, so H2 is not promoted beyond candidate-only. A separate simple-distance line now has bounded single-asset evidence: first 10/10 packet `AUC = 0.92`, non-overlapping 10/10 packet `AUC = 0.99` with 9/10 TP at 0 FP, and non-overlapping 25/25 admission packet `AUC = 0.8768`, `ASR = 0.84`, 11/25 TP at 0 FP. This is not a conditional-diffusion generalization or a `recon` product replacement. See [black-box-response-strength-preflight.md](black-box-response-strength-preflight.md), [h2-lowpass-followup-contract.md](h2-lowpass-followup-contract.md), [h2-cross-asset-contract-preflight.md](h2-cross-asset-contract-preflight.md), [h2-image-to-image-contract.md](h2-image-to-image-contract.md), [h2-img2img-micro-result.md](h2-img2img-micro-result.md), [h2-img2img-simple-distance-review.md](h2-img2img-simple-distance-review.md), [h2-img2img-simple-distance-stability-result.md](h2-img2img-simple-distance-stability-result.md), and [h2-img2img-simple-distance-admission-result.md](h2-img2img-simple-distance-admission-result.md). |
+| Black-box `H2 output-cloud geometry` | candidate-only | CPU-only review on the existing H2 response cache found a strong output-output geometry signal that excludes seed-to-output distance (`AUC = 0.961529`, `ASR = 0.900391`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`). Seed `177` remains stable (`AUC = 0.961048`), and the label-shuffle sanity check returns to random level (`AUC = 0.507595`). This is not admitted because the source cache used class-ordered sample offsets, so a reseeded or interleaved order-control cache is required before promotion. Do not expand into KDE, shadow-density, repeat-count, same-cache feature sweeps, Platform schema, Runtime runner, or bundle rows. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md). |
 | Black-box mid-frequency same-noise residual | `candidate-only` | Distinct paper-backed observable gap: unlike H2/H3 response-cache frequency filters, this line requires `x_t`, `tilde_x_t`, timestep, noise provenance, and residual scores at the same noise level. The frozen `64/64` sign-check on the collaborator 750k checkpoint produced `AUC = 0.733398`, `ASR = 0.710938`, and finite `4/64` zero-FP recovery. The seed-only repeat retained signal with `AUC = 0.719238`, `ASR = 0.6875`, and finite `3/64` zero-FP recovery. A CPU comparator audit shows low-frequency and full-band residual comparators are at least as strong as the frozen mid-band score on AUC, so the line is candidate-stable-but-bounded but not a proven mid-frequency-specific mechanism. Same-contract GPU expansion is closed. See [midfreq-residual-comparator-audit-20260512.md](midfreq-residual-comparator-audit-20260512.md), [midfreq-residual-stability-result-20260512.md](midfreq-residual-stability-result-20260512.md), [midfreq-residual-stability-decision-20260512.md](midfreq-residual-stability-decision-20260512.md), [midfreq-residual-signcheck-20260512.md](midfreq-residual-signcheck-20260512.md), [midfreq-same-noise-residual-preflight-20260512.md](midfreq-same-noise-residual-preflight-20260512.md), [midfreq-residual-scorer-contract-20260512.md](midfreq-residual-scorer-contract-20260512.md), [midfreq-residual-collector-contract-20260512.md](midfreq-residual-collector-contract-20260512.md), [midfreq-residual-tiny-runner-contract-20260512.md](midfreq-residual-tiny-runner-contract-20260512.md), and [midfreq-residual-real-asset-preflight-20260512.md](midfreq-residual-real-asset-preflight-20260512.md). |
 | Gray-box `PIA` | `evidence-ready` | Strongest admitted local DDPM/CIFAR10 gray-box line. PIA baseline exposes `epsilon-trajectory consistency`; stochastic dropout is a provisional defended comparator that weakens but does not eliminate the signal. The review is bounded to repeated-query adaptive checks with `adaptive repeats=3`; low-FPR values are finite empirical strict-tail points, not calibrated sub-percent FPR. Paper-aligned release provenance remains blocked. See [pia-stochastic-dropout-truth-hardening-review.md](pia-stochastic-dropout-truth-hardening-review.md). |
 | Gray-box `ReDiffuse` | `hold-weak` | Candidate baseline-alignment line. The collaborator 750k bundle and checkpoint are runnable, a 64/64 direct-distance compatibility packet exists, and the existing PIA 800k checkpoint is runtime-probe compatible, but prior exact replay showed only modest AUC with weak strict-tail evidence and was not admitted. The collaborator Stable Diffusion ReDiffuse `5000`-row packet remains replayable (`AUC = 0.71031888`), but its member/nonmember labels are perfectly aligned with `LAION-5B member subset` versus `COCO2017-val non-member subset`, so it is a cross-source stress-test candidate rather than a same-distribution second asset. The official OpenReview supplement still does not release third-party target checkpoints, generated response/feature caches, score packets, ROC CSVs, or metric artifacts. A local ReDiffuse DDPM/STL-10 bounded scout now proves the split and official model path are executable and scoreable, but the short target fixed-timestep denoising-loss packet is random-level (`AUC = 0.4996337890625`, `ASR = 0.509765625`, `TPR@1%FPR = 0.01171875`, `TPR@0.1%FPR = 0.0`). Reusing the same checkpoint and `256 / 256` split for a genuinely different SimA-style denoiser-output score norm also remained random-level (`AUC = 0.5052947998046875`, `ASR = 0.525390625`, `TPR@1%FPR = 0.03125`, `TPR@0.1%FPR = 0.01953125`). Do not expand into step-count, seed, timestep, batch-size, subset-size, EMA, scheduler, denoising-loss matrices, score-norm matrices, checkpoint-step/fusion sweeps, full DDPM/DiT/Stable Diffusion targets, `800k`-step training, Tiny-ImageNet downloads, request `coco_data`, download Stable Diffusion weights, or rerun same-family attack scripts by default. See [rediffuse-stl10-sima-score-norm-20260525.md](rediffuse-stl10-sima-score-norm-20260525.md), [rediffuse-stl10-bounded-scout-20260525.md](rediffuse-stl10-bounded-scout-20260525.md), [rediffuse-stl10-split-and-microtrain-preflight-20260525.md](rediffuse-stl10-split-and-microtrain-preflight-20260525.md), [stable-diffusion-rediffuse-collaborator-artifact-20260517.md](stable-diffusion-rediffuse-collaborator-artifact-20260517.md), [rediffuse-openreview-split-manifest-audit-20260515.md](rediffuse-openreview-split-manifest-audit-20260515.md), [rediffuse-collaborator-integration-report.md](rediffuse-collaborator-integration-report.md), [rediffuse-800k-runtime-probe.md](rediffuse-800k-runtime-probe.md), [rediffuse-resnet-parity-packet.md](rediffuse-resnet-parity-packet.md), [rediffuse-direct-distance-boundary-review.md](rediffuse-direct-distance-boundary-review.md), [rediffuse-checkpoint-portability-gate.md](rediffuse-checkpoint-portability-gate.md), [rediffuse-resnet-contract-scout.md](rediffuse-resnet-contract-scout.md), [rediffuse-exact-replay-preflight.md](rediffuse-exact-replay-preflight.md), and [rediffuse-exact-replay-packet.md](rediffuse-exact-replay-packet.md). |