diff --git a/.gitignore b/.gitignore
index e43efbb7..9215e577 100644
--- a/.gitignore
+++ b/.gitignore
@@ -142,6 +142,9 @@ workspaces/**/artifacts/**
 !workspaces/black-box/artifacts/copymark-commoncanvas-response-contract-probe-20260512.json
 !workspaces/black-box/artifacts/copymark-commoncanvas-multiseed-stability-20260513.json
 !workspaces/black-box/artifacts/commoncanvas-denoising-loss-20260513.json
+!workspaces/black-box/artifacts/h2-output-cloud-geometry-20260525.json
+!workspaces/black-box/artifacts/h2-output-cloud-geometry-seed177-20260525.json
+!workspaces/black-box/artifacts/h2-output-cloud-geometry-label-shuffle-20260525.json
 !workspaces/black-box/artifacts/beans-lora-member-denoising-loss-scout-20260513.json
 !workspaces/black-box/artifacts/clid-image-identity-boundary-20260511.json
 !workspaces/black-box/artifacts/midfreq-same-noise-residual-cache-audit-20260512.json
diff --git a/AGENTS.md b/AGENTS.md
index f2cbab0c..76ddda65 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -28,7 +28,7 @@ Do not start from memory or old chat context. Re-anchor on repository files.
 
 ## Current Operating State
 
-- Active work: `2026-05-25 feature-packet channel consumer verdict is the latest consumer-boundary update. Tracing the Roots remains positive Research-side feature-packet evidence (AUC = 0.815826, TPR@1%FPR = 0.134000), but the Platform/Runtime feature-packet channel is deferred because the public surface still has only one singleton feature tensor packet, no second non-source-equivalent public feature-packet, and no raw target checkpoint / raw sample manifest / feature-regeneration assets. Do not create feature-packet schema, bundle export, validators, tests, Platform UI types, or Runtime runners from this singleton. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after feature-packet channel consumer verdict. LeakyCLIP remains CLIP / multimodal privacy watch-plus, not a second diffusion asset. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
+- Active work: `2026-05-25 H2 output-cloud geometry cache review is the latest metric verdict. It is a strong Research-side candidate on the existing H2 response cache (seed 176 logistic AUC = 0.961529, TPR@1%FPR = 0.333984, TPR@0.1%FPR = 0.117188; seed 177 AUC = 0.961048; label-shuffle AUC = 0.507595), but it is not admitted because the source cache used class-ordered sample offsets and needs a reseeded or interleaved order-control cache before any promotion. Do not create Platform/Runtime schema, bundle export, UI type, runner, KDE/shadow-density/repeat-count sweeps, or same-cache feature sweeps from this result. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after H2 output-cloud cache review. Feature-packet consumer lane remains deferred. LeakyCLIP remains CLIP / multimodal privacy watch-plus. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
 - Next GPU candidate: none selected
 - Long-horizon control: follow `ROADMAP.md` section
   `Long-Horizon Research Task Board（2026-05-13 起）` before reopening any
diff --git a/ROADMAP.md b/ROADMAP.md
index 4fd7b8fa..23b6b9ef 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -2,6 +2,33 @@
 
 > Last updated: 2026-05-25
 
+## 2026-05-25 H2 output-cloud geometry candidate signal
+
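+Latest decision: the existing `512 / 512` H2 response-strength response cache exposes a strong
+output-output geometry candidate signal, but until order-control passes it is not promoted, not released
+for product consumption, and not extended with same-cache feature engineering. This review only reads the existing
+`workspaces/black-box/runs/h2-response-strength-512-20260501-r1/response-cache.npz`;
+it generated no new responses, downloaded no assets, and ran no GPU work.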
+
+The scorer deliberately excludes seed-to-output distance and uses only same-timestep inter-repeat RMSE,
+cross-timestep centroid RMSE, and response-cloud Gram/PCA features. The main result is
+`AUC = 0.961529`, `ASR = 0.900391`, `TPR@1%FPR = 0.333984`,
+`TPR@0.1%FPR = 0.117188`, an improvement over raw H2 logistic of
+`AUC +0.055836`, `TPR@1%FPR +0.199218`, `TPR@0.1%FPR +0.117188`.
+The seed `177` stability check holds at `AUC = 0.961048`, `TPR@1%FPR = 0.353516`,
+`TPR@0.1%FPR = 0.130859`; the label-shuffle sanity check returns to random level,
+`AUC = 0.507595`.
+
+Key caveat: when the source cache was generated, the member side used `sample_offset = 0` and the nonmember side used
+`sample_offset = len(member_indices)`. Output-output geometry is sensitive to sampling seeds and response-cloud shape,
+so the current strong signal may be confounded by a class-ordered sampling effect. The result stands only as a
+strong Research-side candidate; the next re-evaluation may only be one bounded reseeded / interleaved
+order-control response-cache scout. The current slots remain:
+`active_gpu_question = none`, `next_gpu_candidate = none`,
+`CPU sidecar = none selected after H2 output-cloud cache review`.
+See
+[docs/evidence/h2-output-cloud-geometry-20260525.md](docs/evidence/h2-output-cloud-geometry-20260525.md).
+
 ## 2026-05-25 Feature-Packet 通道消费者裁决
 
 最新决策：不在 2026-05-25 为 Tracing the Roots 单例开通 Platform/Runtime
diff --git a/docs/evidence/h2-output-cloud-geometry-20260525.md b/docs/evidence/h2-output-cloud-geometry-20260525.md
new file mode 100644
index 00000000..3a2b7746
--- /dev/null
+++ b/docs/evidence/h2-output-cloud-geometry-20260525.md
@@ -0,0 +1,127 @@
+# H2 Output-Cloud Geometry Cache Review
+
+> Date: 2026-05-25
+> Status: candidate complementary signal / CPU-only cache review / order-control required before promotion / no GPU release / no admitted row
+
+## Question
+
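+On the existing H2 response-strength cache, does the geometry between outputs carry a membership
+signal distinct from seed-to-output distance?
+
+This round only reuses the existing
+`workspaces/black-box/runs/h2-response-strength-512-20260501-r1/response-cache.npz`.
+No new responses were generated, no assets were downloaded, no GPU was run, and no KDE, shadow
+density, repeat-count, or feature sweeps were added on the same line of work.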
+
+## Contract
+
+Script:
+`scripts/review_h2_output_cloud_geometry.py`
+
+Input cache:
+
+| Field | Value |
+| --- | ---: |
+| Samples | `1024` |
+| Members | `512` |
+| Nonmembers | `512` |
+| Timesteps | `40 / 80 / 120 / 160` |
+| Repeats per timestep | `2` |
+| Response shape | `[1024, 4, 2, 3, 32, 32]` |
+
+The features use output-output geometry only:
+
+| Feature family | Meaning |
+| --- | --- |
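+| within-timestep pair RMSE | Response distance between different repeats at the same timestep |
+| timestep centroid RMSE | Distance between response-cloud centroids at different timesteps |
+| response-cloud PCA trace/top share | Scale and concentration of the small response cloud's Gram spectrum |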
+
+The script deliberately does not read seed-to-output distance features, so it cannot collapse back into the original H2 simple
+distance scorer.
+
+## Result
+
+Main result:
+`workspaces/black-box/artifacts/h2-output-cloud-geometry-20260525.json`
+
+| Metric | Output-cloud logistic | Raw H2 logistic | Lowpass H2 logistic |
+| --- | ---: | ---: | ---: |
+| AUC | `0.961529` | `0.905693` | `0.895679` |
+| ASR | `0.900391` | `0.841797` | `0.831055` |
+| TPR@1%FPR | `0.333984` | `0.134766` | `0.148438` |
+| TPR@0.1%FPR | `0.117188` | `0.0` | `0.025391` |
+
+Versus raw H2: `AUC +0.055836`, `TPR@1%FPR +0.199218`,
+`TPR@0.1%FPR +0.117188`.
+
+Versus lowpass H2: `AUC +0.065850`, `TPR@1%FPR +0.185546`,
+`TPR@0.1%FPR +0.091797`.
+
+No simple single feature explains this result:
+
+| Best simple view | Feature | Orientation | AUC | TPR@1%FPR | TPR@0.1%FPR |
+| --- | --- | --- | ---: | ---: | ---: |
+| Best AUC | `centroid_rmse_40_160` | negative | `0.801182` | `0.03125` | `0.005859` |
+| Best low-FPR | `cloud_pca_top_share` | negative | `0.650913` | `0.078125` | `0.017578` |
+
+Seed stability check:
+`workspaces/black-box/artifacts/h2-output-cloud-geometry-seed177-20260525.json`
+
+| Metric | Seed 177 |
+| --- | ---: |
+| AUC | `0.961048` |
+| ASR | `0.900391` |
+| TPR@1%FPR | `0.353516` |
+| TPR@0.1%FPR | `0.130859` |
+
+Label-shuffle sanity:
+`workspaces/black-box/artifacts/h2-output-cloud-geometry-label-shuffle-20260525.json`
+
+| Metric | Label shuffle |
+| --- | ---: |
+| AUC | `0.507595` |
+| ASR | `0.521484` |
+| TPR@1%FPR | `0.011719` |
+| TPR@0.1%FPR | `0.003906` |
+
+This indicates that the scorer/evaluation pipeline has no obvious label pass-through leakage.
+
+## Critical Caveat
+
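+This result still cannot be promoted. Response generation for the source cache has a class-ordered seed offset:
+in `scripts/run_h2_response_strength_validation.py` the member side uses
+`sample_offset = 0` and the nonmember side uses `sample_offset = len(member_indices)`.
+Output-output geometry is sensitive to sampling seeds and response-cloud shape, so the current strong signal may be confounded by a
+class-ordered sampling effect.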
+
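+This is not a call to keep adding tables on the same cache; it defines only one very narrow next step:
+if the line moves forward, release at most one bounded order-control / reseeded / interleaved
+response-cache scout to test whether the strong signal survives class-order control.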
+
+## Decision
+
+`candidate complementary signal / order-control required / no admitted row`。
+
+It is retained as a strong Research-side candidate because it meets three useful conditions:
+
+- It is a different observable: output-output cloud geometry rather than seed-to-output distance.
+- On the same H2 cache it is clearly stronger than raw/lowpass H2 logistic.
+- It passed the seed-177 stability check and the label-shuffle sanity check.
+
+For now, the following are not done:
+
+- No upgrade to the Platform/Runtime admitted bundle.
+- No new product schema, Runtime runner, UI type, or bundle row.
+- No KDE, shadow density, repeat-count, feature-family, or fusion sweeps on the same cache.
+- No GPU release or large download.
+
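+The next re-evaluation may only be based on the result of one order-control cache. If the reseeded /
+interleaved cache retains strong AUC and strict-tail recovery, then discuss whether to open a more formal H2
+output-cloud mechanism line; if it does not, the candidate is closed outright as a class-ordered response-cache
+artifact.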
+
+## Platform and Runtime Impact
+
+None. The admitted Platform/Runtime bundle remains the existing five rows:
+`recon`, `PIA baseline`, `PIA defended`, `GSA`, and `DPDM W-1`.
diff --git a/docs/evidence/reproduction-status.md b/docs/evidence/reproduction-status.md
index 7c00b1d7..1a447b74 100644
--- a/docs/evidence/reproduction-status.md
+++ b/docs/evidence/reproduction-status.md
@@ -73,6 +73,7 @@ Smoke tests and dry runs are engineering validation, not benchmark claims.
 | FERMI multi-relational tabular MIA | `hold-paper-source-only` | arXiv `2605.11527` reports strong multi-relational TabDDPM/TabDiff/TabSyn membership metrics, but the public surface has no code tree, target/split manifests, generated synthetic tables, feature/score rows, ROC arrays, metric JSON, or replay command. It does not reopen MIDST/tabular execution and releases no tabular dataset download, model training, or GPU work. See [fermi-tabular-artifact-gate-20260515.md](fermi-tabular-artifact-gate-20260515.md). |
 | True known-split mechanisms | `hold-weak` | MNIST/DDPM raw-loss and x0 scouts are weak. Tiny overfit final-layer gradient norm was positive only on the extreme `8 / 64` target, weakened at `16 / 64`, and a more optimistic `64 / 64` oracle gradient-prototype alignment follow-up is effectively random (`AUC = 0.500977`, `ASR = 0.562500`, zero low-FPR recovery). Fashion-MNIST DDPM now has three weak clean-split scouts: fixed-timestep PIA-style loss (`AUC = 0.535889`, `TPR@1%FPR = 0.03125`), SimA single-query score-norm (`AUC = 0.515137`, zero low-FPR recovery), and score-Jacobian sensitivity (`AUC = 0.511719`, zero low-FPR recovery). The Beans member-LoRA denoising-loss scout repaired pseudo-membership semantics by creating an exact `SD1.5 + Beans-member LoRA` target, but the internal conditional denoising-loss score is weak (`AUC = 0.414400`, reverse `0.585600`, `TPR@1%FPR = 0.080000`) and parameter-delta sensitivity is also weak (`AUC = 0.512000`). Do not run more final-layer gradient norm/cosine variants, Fashion-MNIST timestep/seed/`p`-norm/perturbation/norm/packet-size sweeps, or Beans LoRA train-step/rank/resolution/prompt/timestep/layer matrices by default. See [fashion-mnist-ddpm-score-jacobian-sensitivity-20260514.md](fashion-mnist-ddpm-score-jacobian-sensitivity-20260514.md), [fashion-mnist-ddpm-sima-score-norm-20260514.md](fashion-mnist-ddpm-sima-score-norm-20260514.md), [beans-lora-delta-sensitivity-20260513.md](beans-lora-delta-sensitivity-20260513.md), [beans-lora-member-denoising-loss-scout-20260513.md](beans-lora-member-denoising-loss-scout-20260513.md), [fashion-mnist-ddpm-pia-loss-scout-20260513.md](fashion-mnist-ddpm-pia-loss-scout-20260513.md), [tiny-known-split-gradient-prototype-alignment-20260513.md](tiny-known-split-gradient-prototype-alignment-20260513.md), [gradient-norm-stability-gate-20260512.md](gradient-norm-stability-gate-20260512.md), and [tiny-overfit-gradient-norm-scout-20260512.md](tiny-overfit-gradient-norm-scout-20260512.md). |
 | Black-box `H2 response-strength` | candidate-only | Positive-but-bounded DDPM/CIFAR10 candidate: frozen cutoff-0.50 lowpass follow-up passed, and raw H2 recovered strict-tail signal on the fresh packet. SD/CelebA text-to-image transfer is blocked by protocol mismatch. The frozen SD/CelebA image-to-image micro-packet is runnable, but H2 logistic does not beat the same-cache simple distance comparator, so H2 is not promoted beyond candidate-only. A separate simple-distance line now has bounded single-asset evidence: first 10/10 packet `AUC = 0.92`, non-overlapping 10/10 packet `AUC = 0.99` with 9/10 TP at 0 FP, and non-overlapping 25/25 admission packet `AUC = 0.8768`, `ASR = 0.84`, 11/25 TP at 0 FP. This is not a conditional-diffusion generalization or a `recon` product replacement. See [black-box-response-strength-preflight.md](black-box-response-strength-preflight.md), [h2-lowpass-followup-contract.md](h2-lowpass-followup-contract.md), [h2-cross-asset-contract-preflight.md](h2-cross-asset-contract-preflight.md), [h2-image-to-image-contract.md](h2-image-to-image-contract.md), [h2-img2img-micro-result.md](h2-img2img-micro-result.md), [h2-img2img-simple-distance-review.md](h2-img2img-simple-distance-review.md), [h2-img2img-simple-distance-stability-result.md](h2-img2img-simple-distance-stability-result.md), and [h2-img2img-simple-distance-admission-result.md](h2-img2img-simple-distance-admission-result.md). |
+| Black-box `H2 output-cloud geometry` | candidate-only | CPU-only review on the existing H2 response cache found a strong output-output geometry signal that excludes seed-to-output distance (`AUC = 0.961529`, `ASR = 0.900391`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`). Seed `177` remains stable (`AUC = 0.961048`), and label-shuffle sanity returns random-level (`AUC = 0.507595`). This is not admitted because the source cache used class-ordered sample offsets, so a reseeded or interleaved order-control cache is required before promotion. Do not expand into KDE, shadow-density, repeat-count, same-cache feature sweeps, Platform schema, Runtime runner, or bundle rows. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md). |
 | Black-box mid-frequency same-noise residual | `candidate-only` | Distinct paper-backed observable gap: unlike H2/H3 response-cache frequency filters, this line requires `x_t`, `tilde_x_t`, timestep, noise provenance, and residual scores at the same noise level. The frozen `64/64` sign-check on the collaborator 750k checkpoint produced `AUC = 0.733398`, `ASR = 0.710938`, and finite `4/64` zero-FP recovery. The seed-only repeat retained signal with `AUC = 0.719238`, `ASR = 0.6875`, and finite `3/64` zero-FP recovery. A CPU comparator audit shows low-frequency and full-band residual comparators are at least as strong as the frozen mid-band score on AUC, so the line is candidate-stable-but-bounded but not a proven mid-frequency-specific mechanism. Same-contract GPU expansion is closed. See [midfreq-residual-comparator-audit-20260512.md](midfreq-residual-comparator-audit-20260512.md), [midfreq-residual-stability-result-20260512.md](midfreq-residual-stability-result-20260512.md), [midfreq-residual-stability-decision-20260512.md](midfreq-residual-stability-decision-20260512.md), [midfreq-residual-signcheck-20260512.md](midfreq-residual-signcheck-20260512.md), [midfreq-same-noise-residual-preflight-20260512.md](midfreq-same-noise-residual-preflight-20260512.md), [midfreq-residual-scorer-contract-20260512.md](midfreq-residual-scorer-contract-20260512.md), [midfreq-residual-collector-contract-20260512.md](midfreq-residual-collector-contract-20260512.md), [midfreq-residual-tiny-runner-contract-20260512.md](midfreq-residual-tiny-runner-contract-20260512.md), and [midfreq-residual-real-asset-preflight-20260512.md](midfreq-residual-real-asset-preflight-20260512.md). |
 | Gray-box `PIA` | `evidence-ready` | Strongest admitted local DDPM/CIFAR10 gray-box line. PIA baseline exposes `epsilon-trajectory consistency`; stochastic dropout is a provisional defended comparator that weakens but does not eliminate the signal. The review is bounded to repeated-query adaptive checks with `adaptive repeats=3`; low-FPR values are finite empirical strict-tail points, not calibrated sub-percent FPR. Paper-aligned release provenance remains blocked. See [pia-stochastic-dropout-truth-hardening-review.md](pia-stochastic-dropout-truth-hardening-review.md). |
 | Gray-box `ReDiffuse` | `hold-weak` | Candidate baseline-alignment line. The collaborator 750k bundle and checkpoint are runnable, a 64/64 direct-distance compatibility packet exists, and the existing PIA 800k checkpoint is runtime-probe compatible, but prior exact replay showed only modest AUC with weak strict-tail evidence and was not admitted. The collaborator Stable Diffusion ReDiffuse `5000`-row packet remains replayable (`AUC = 0.71031888`), but its member/nonmember labels are perfectly aligned with `LAION-5B member subset` versus `COCO2017-val non-member subset`, so it is a cross-source stress-test candidate rather than a same-distribution second asset. The official OpenReview supplement still does not release third-party target checkpoints, generated response/feature caches, score packets, ROC CSVs, or metric artifacts. A local ReDiffuse DDPM/STL-10 bounded scout now proves the split and official model path are executable and scoreable, but the short target fixed-timestep denoising-loss packet is random-level (`AUC = 0.4996337890625`, `ASR = 0.509765625`, `TPR@1%FPR = 0.01171875`, `TPR@0.1%FPR = 0.0`). Reusing the same checkpoint and `256 / 256` split for a genuinely different SimA-style denoiser-output score norm also remained random-level (`AUC = 0.5052947998046875`, `ASR = 0.525390625`, `TPR@1%FPR = 0.03125`, `TPR@0.1%FPR = 0.01953125`). Do not expand into step-count, seed, timestep, batch-size, subset-size, EMA, scheduler, denoising-loss matrices, score-norm matrices, checkpoint-step/fusion sweeps, full DDPM/DiT/Stable Diffusion targets, `800k`-step training, Tiny-ImageNet downloads, request `coco_data`, download Stable Diffusion weights, or rerun same-family attack scripts by default. See [rediffuse-stl10-sima-score-norm-20260525.md](rediffuse-stl10-sima-score-norm-20260525.md), [rediffuse-stl10-bounded-scout-20260525.md](rediffuse-stl10-bounded-scout-20260525.md), [rediffuse-stl10-split-and-microtrain-preflight-20260525.md](rediffuse-stl10-split-and-microtrain-preflight-20260525.md), [stable-diffusion-rediffuse-collaborator-artifact-20260517.md](stable-diffusion-rediffuse-collaborator-artifact-20260517.md), [rediffuse-openreview-split-manifest-audit-20260515.md](rediffuse-openreview-split-manifest-audit-20260515.md), [rediffuse-collaborator-integration-report.md](rediffuse-collaborator-integration-report.md), [rediffuse-800k-runtime-probe.md](rediffuse-800k-runtime-probe.md), [rediffuse-resnet-parity-packet.md](rediffuse-resnet-parity-packet.md), [rediffuse-direct-distance-boundary-review.md](rediffuse-direct-distance-boundary-review.md), [rediffuse-checkpoint-portability-gate.md](rediffuse-checkpoint-portability-gate.md), [rediffuse-resnet-contract-scout.md](rediffuse-resnet-contract-scout.md), [rediffuse-exact-replay-preflight.md](rediffuse-exact-replay-preflight.md), and [rediffuse-exact-replay-packet.md](rediffuse-exact-replay-packet.md). |
diff --git a/docs/evidence/workspace-evidence-index.md b/docs/evidence/workspace-evidence-index.md
index 04399a80..a2368f56 100644
--- a/docs/evidence/workspace-evidence-index.md
+++ b/docs/evidence/workspace-evidence-index.md
@@ -5,6 +5,18 @@ This index separates current track state from archived research history.
 ## Current Track State
 
 Latest Research update:
+[h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md)
+records a CPU-only metric verdict on the existing H2 response-strength cache.
+The output-output geometry scorer is a strong Research-side candidate
+(`AUC = 0.961529`, `TPR@1%FPR = 0.333984`,
+`TPR@0.1%FPR = 0.117188`) and is stable under seed `177`
+(`AUC = 0.961048`), while label-shuffle sanity returns random-level
+(`AUC = 0.507595`). It is not admitted because the source cache used
+class-ordered sample offsets and needs a reseeded or interleaved order-control
+cache before promotion. Decision: `candidate complementary signal /
+order-control required / no admitted row / no download / no GPU release`.
+
+Previous Research update:
 [feature-packet-channel-consumer-verdict-20260525.md](feature-packet-channel-consumer-verdict-20260525.md)
 records a consumer-boundary verdict for the gray-box feature-packet lane.
 Tracing the Roots remains positive Research-side feature-packet evidence
diff --git a/scripts/review_h2_output_cloud_geometry.py b/scripts/review_h2_output_cloud_geometry.py
new file mode 100644
index 00000000..608f04df
--- /dev/null
+++ b/scripts/review_h2_output_cloud_geometry.py
@@ -0,0 +1,272 @@
+from __future__ import annotations
+
+import argparse
+import json
+from pathlib import Path
+from typing import Any
+
+import numpy as np
+
+from diffaudit.attacks.h2_response_strength import evaluate_logistic_holdout, metric_delta, score_metrics
+
+
+def _sanitize(value: Any) -> Any:
+    if isinstance(value, dict):
+        return {str(key): _sanitize(item) for key, item in value.items()}
+    if isinstance(value, list):
+        return [_sanitize(item) for item in value]
+    if isinstance(value, tuple):
+        return [_sanitize(item) for item in value]
+    if isinstance(value, np.ndarray):
+        return _sanitize(value.tolist())
+    if isinstance(value, np.generic):
+        return _sanitize(value.item())
+    if isinstance(value, Path):
+        return str(value)
+    return value
+
+
+def _slope(values: np.ndarray, axis_values: np.ndarray) -> np.ndarray:
+    if values.shape[1] <= 1:
+        return np.zeros(values.shape[0], dtype=np.float32)
+    return np.polyfit(axis_values.astype(np.float64), values.T.astype(np.float64), deg=1)[0].astype(np.float32)
+
+
+def _cloud_eigen_features(flat_responses: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
+    centered = flat_responses - flat_responses.mean(axis=1, keepdims=True)
+    # Eigenvalues of the response Gram matrix give the non-zero PCA spectrum of
+    # each small response cloud without building a huge pixel covariance matrix.
+    gram = np.einsum("nrd,nsd->nrs", centered, centered, optimize=True)
+    gram /= max(flat_responses.shape[1] - 1, 1)
+    eigvals = np.linalg.eigvalsh(gram).astype(np.float64)
+    eigvals = np.clip(eigvals, 0.0, None)
+    trace = eigvals.sum(axis=1)
+    top_share = np.divide(
+        eigvals[:, -1],
+        trace,
+        out=np.zeros_like(trace),
+        where=trace > 0,
+    )
+    return trace.astype(np.float32), top_share.astype(np.float32)
+
+
+def compute_output_cloud_features(responses: np.ndarray, axis_values: np.ndarray) -> tuple[np.ndarray, list[str]]:
+    responses_f32 = np.asarray(responses, dtype=np.float32)
+    if responses_f32.ndim != 6:
+        raise ValueError("responses must have shape [sample, timestep, repeat, channel, height, width]")
+    if responses_f32.shape[2] < 2:
+        raise ValueError("output-cloud geometry needs at least two repeats per timestep")
+
+    sample_count, timestep_count, repeat_count = responses_f32.shape[:3]
+    flat = responses_f32.reshape(sample_count, timestep_count, repeat_count, -1)
+    names: list[str] = []
+    columns: list[np.ndarray] = []
+
+    pair_distances = []
+    for left in range(repeat_count):
+        for right in range(left + 1, repeat_count):
+            pair_distances.append(np.sqrt(np.mean((flat[:, :, left] - flat[:, :, right]) ** 2, axis=2)))
+    pair_rmse = np.stack(pair_distances, axis=2).mean(axis=2).astype(np.float32)
+    for idx, axis_value in enumerate(axis_values.tolist()):
+        names.append(f"within_timestep_pair_rmse_{int(axis_value)}")
+        columns.append(pair_rmse[:, idx])
+    names.extend(
+        [
+            "within_timestep_pair_rmse_mean",
+            "within_timestep_pair_rmse_std",
+            "within_timestep_pair_rmse_slope",
+        ]
+    )
+    columns.extend([pair_rmse.mean(axis=1), pair_rmse.std(axis=1), _slope(pair_rmse, axis_values)])
+
+    centroids = flat.mean(axis=2)
+    centroid_pairs: list[np.ndarray] = []
+    centroid_pair_names: list[str] = []
+    for left in range(timestep_count):
+        for right in range(left + 1, timestep_count):
+            distance = np.sqrt(np.mean((centroids[:, left] - centroids[:, right]) ** 2, axis=1)).astype(np.float32)
+            centroid_pairs.append(distance)
+            centroid_pair_names.append(
+                f"centroid_rmse_{int(axis_values[left])}_{int(axis_values[right])}"
+            )
+    centroid_pair_matrix = np.stack(centroid_pairs, axis=1)
+    names.extend(centroid_pair_names)
+    columns.extend(centroid_pairs)
+    names.extend(["centroid_rmse_mean", "centroid_rmse_std"])
+    columns.extend([centroid_pair_matrix.mean(axis=1), centroid_pair_matrix.std(axis=1)])
+
+    cloud_flat = flat.reshape(sample_count, timestep_count * repeat_count, -1)
+    cloud_trace, cloud_top_share = _cloud_eigen_features(cloud_flat)
+    names.extend(["cloud_pca_trace", "cloud_pca_top_share"])
+    columns.extend([cloud_trace, cloud_top_share])
+
+    features = np.stack(columns, axis=1).astype(np.float32)
+    return features, names
+
+
+def _simple_candidates(labels: np.ndarray, features: np.ndarray, names: list[str], *, seed: int) -> list[dict[str, Any]]:
+    candidates: list[dict[str, Any]] = []
+    for idx, name in enumerate(names):
+        raw_scores = features[:, idx]
+        for orientation, scores in (
+            ("negative_higher_is_member", -raw_scores),
+            ("positive_higher_is_member", raw_scores),
+        ):
+            candidates.append(
+                {
+                    "name": name,
+                    "orientation": orientation,
+                    "metrics": score_metrics(labels, scores),
+                }
+            )
+    return candidates
+
+
+def _best_by_low_fpr(candidates: list[dict[str, Any]]) -> dict[str, Any]:
+    return max(
+        candidates,
+        key=lambda item: (
+            float(item["metrics"]["tpr_at_1pct_fpr"]),
+            float(item["metrics"]["tpr_at_0_1pct_fpr"]),
+            float(item["metrics"]["auc"]),
+        ),
+    )
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Review output-output cloud geometry on an existing H2 response-cache.npz."
+    )
+    parser.add_argument("--response-cache", type=Path, required=True)
+    parser.add_argument("--output", type=Path, required=True)
+    parser.add_argument("--seed", type=int, default=176)
+    parser.add_argument("--holdout-repeats", type=int, default=7)
+    parser.add_argument("--bootstrap-iters", type=int, default=200)
+    parser.add_argument(
+        "--shuffle-labels",
+        action="store_true",
+        help="Run the scorer after a seeded label permutation as a leakage sanity check.",
+    )
+    return parser.parse_args()
+
+
+def main() -> int:
+    args = parse_args()
+    cache = np.load(args.response_cache)
+    labels = cache["labels"].astype(np.int64)
+    label_mode = "original"
+    if args.shuffle_labels:
+        labels = np.random.default_rng(args.seed).permutation(labels)
+        label_mode = f"shuffled_seed_{args.seed}"
+    if "timesteps" not in cache.files:
+        raise KeyError("output-cloud geometry review expects a timestep-based H2 cache")
+    timesteps = cache["timesteps"].astype(np.int64)
+    responses = cache["responses"].astype(np.float32)
+    features, feature_names = compute_output_cloud_features(responses, timesteps)
+
+    simple = _simple_candidates(labels, features, feature_names, seed=args.seed)
+    best_auc = max(simple, key=lambda item: float(item["metrics"]["auc"]))
+    best_low = _best_by_low_fpr(simple)
+    logistic = evaluate_logistic_holdout(
+        labels,
+        features,
+        seed=args.seed,
+        repeats=args.holdout_repeats,
+        bootstrap_iters=args.bootstrap_iters,
+    )
+
+    raw_h2_metrics = None
+    lowpass_h2_metrics = None
+    summary_path = args.response_cache.with_name("summary.json")
+    if summary_path.exists():
+        summary = json.loads(summary_path.read_text(encoding="utf-8"))
+        raw_h2_metrics = summary.get("raw_h2", {}).get("logistic", {}).get("aggregate_metrics")
+        lowpass_h2_metrics = summary.get("lowpass_h2", {}).get("logistic", {}).get("aggregate_metrics")
+
+    logistic_metrics = logistic["aggregate_metrics"]
+    verdict = "weak_non_complementary_output_cloud_geometry"
+    if args.shuffle_labels:
+        verdict = "label_shuffle_sanity_random_level"
+    elif (
+        float(logistic_metrics["tpr_at_0_1pct_fpr"]) > 0
+        and raw_h2_metrics is not None
+        and float(logistic_metrics["auc"]) >= float(raw_h2_metrics["auc"]) - 0.03
+    ):
+        verdict = "candidate_complementary_output_cloud_geometry"
+
+    result: dict[str, Any] = {
+        "status": "ready",
+        "track": "black-box",
+        "method": "H2 output-cloud geometry scorer",
+        "mode": "cpu-cache-review",
+        "response_cache": str(args.response_cache),
+        "inputs": {
+            "sample_count": int(labels.shape[0]),
+            "member_count": int((labels == 1).sum()),
+            "nonmember_count": int((labels == 0).sum()),
+            "timesteps": [int(value) for value in timesteps.tolist()],
+            "repeat_count": int(responses.shape[2]),
+            "feature_count": int(features.shape[1]),
+            "feature_names": feature_names,
+            "seed": int(args.seed),
+            "label_mode": label_mode,
+            "holdout_repeats": int(args.holdout_repeats),
+            "bootstrap_iters": int(args.bootstrap_iters),
+        },
+        "simple": {
+            "best_by_auc": {
+                "name": best_auc["name"],
+                "orientation": best_auc["orientation"],
+                "metrics": best_auc["metrics"],
+            },
+            "best_by_low_fpr": {
+                "name": best_low["name"],
+                "orientation": best_low["orientation"],
+                "metrics": best_low["metrics"],
+            },
+        },
+        "logistic": {
+            "aggregate_metrics": logistic_metrics,
+            "aggregate_ci95": logistic["aggregate_ci95"],
+            "mean_coefficients": logistic["mean_coefficients"],
+            "prediction_count": logistic["prediction_count"],
+        },
+        "comparison": {
+            "raw_h2_logistic": raw_h2_metrics if not args.shuffle_labels else None,
+            "lowpass_h2_logistic": lowpass_h2_metrics if not args.shuffle_labels else None,
+            "output_cloud_minus_raw_h2": metric_delta(logistic_metrics, raw_h2_metrics)
+            if raw_h2_metrics is not None and not args.shuffle_labels
+            else None,
+            "output_cloud_minus_lowpass_h2": metric_delta(logistic_metrics, lowpass_h2_metrics)
+            if lowpass_h2_metrics is not None and not args.shuffle_labels
+            else None,
+        },
+        "decision_gate": {
+            "uses_only_output_output_geometry": True,
+            "does_not_generate_new_responses": True,
+            "nonzero_strict_tail": bool(float(logistic_metrics["tpr_at_0_1pct_fpr"]) > 0),
+            "beats_best_simple_low_fpr": bool(
+                float(logistic_metrics["tpr_at_1pct_fpr"])
+                > float(best_low["metrics"]["tpr_at_1pct_fpr"])
+                and float(logistic_metrics["tpr_at_0_1pct_fpr"])
+                >= float(best_low["metrics"]["tpr_at_0_1pct_fpr"])
+            ),
+            "reopen_allowed": False,
+            "requires_reseeded_or_interleaved_cache_before_promotion": True,
+        },
+        "verdict": verdict,
+        "notes": [
+            "This is a CPU-only scorer review on an existing H2 response cache.",
+            "It intentionally excludes seed-to-output distance features so it cannot collapse back into H2 simple distance.",
+            "A positive result is candidate-only until reseeded or interleaved response-cache controls rule out class-ordered sampling effects.",
+            "Do not expand this cache into KDE, shadow density, repeat-count, or same-cache feature sweeps.",
+        ],
+    }
+    args.output.parent.mkdir(parents=True, exist_ok=True)
+    args.output.write_text(json.dumps(_sanitize(result), indent=2, ensure_ascii=True), encoding="utf-8")
+    print(json.dumps(_sanitize(result), indent=2, ensure_ascii=False))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
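
Review note: the two computed `decision_gate` fields above can be reproduced outside the script. The following is a minimal sketch, assuming metric dicts shaped like the committed artifacts; the helper name `evaluate_gate` is hypothetical and is not part of the script.

```python
from __future__ import annotations


def evaluate_gate(logistic: dict[str, float], best_low: dict[str, float]) -> dict[str, bool]:
    """Hypothetical helper mirroring the two computed decision_gate fields."""
    tpr_1 = float(logistic["tpr_at_1pct_fpr"])
    tpr_01 = float(logistic["tpr_at_0_1pct_fpr"])
    return {
        # True as soon as any member is recovered at the strict 0.1% FPR point.
        "nonzero_strict_tail": tpr_01 > 0,
        # Strictly better at 1% FPR and not worse at 0.1% FPR.
        "beats_best_simple_low_fpr": (
            tpr_1 > float(best_low["tpr_at_1pct_fpr"])
            and tpr_01 >= float(best_low["tpr_at_0_1pct_fpr"])
        ),
    }


# Values copied from the seed-176 and label-shuffle artifacts in this change.
original = evaluate_gate(
    {"tpr_at_1pct_fpr": 0.333984, "tpr_at_0_1pct_fpr": 0.117188},
    {"tpr_at_1pct_fpr": 0.078125, "tpr_at_0_1pct_fpr": 0.017578},
)
shuffled = evaluate_gate(
    {"tpr_at_1pct_fpr": 0.011719, "tpr_at_0_1pct_fpr": 0.003906},
    {"tpr_at_1pct_fpr": 0.033203, "tpr_at_0_1pct_fpr": 0.021484},
)
print(original, shuffled)
```

`nonzero_strict_tail` is also true in the label-shuffle artifact, so that field alone does not distinguish a real signal from chance; `beats_best_simple_low_fpr` is the field that flips.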
diff --git a/tests/test_review_h2_output_cloud_geometry_script.py b/tests/test_review_h2_output_cloud_geometry_script.py
new file mode 100644
index 00000000..e5c387f1
--- /dev/null
+++ b/tests/test_review_h2_output_cloud_geometry_script.py
@@ -0,0 +1,111 @@
+from __future__ import annotations
+
+import importlib.util
+import json
+import subprocess
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import numpy as np
+
+
+def _load_script_module():
+    repo_root = Path(__file__).resolve().parents[1]
+    script_path = repo_root / "scripts" / "review_h2_output_cloud_geometry.py"
+    spec = importlib.util.spec_from_file_location("review_h2_output_cloud_geometry", script_path)
+    if spec is None or spec.loader is None:
+        raise RuntimeError("Could not load review_h2_output_cloud_geometry.py")
+    module = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(module)
+    return module
+
+
+class ReviewH2OutputCloudGeometryScriptTests(unittest.TestCase):
+    def test_compute_features_uses_all_repeat_pairs(self) -> None:
+        module = _load_script_module()
+        responses = np.asarray(
+            [
+                [
+                    [[[0.0]], [[2.0]], [[4.0]]],
+                    [[[1.0]], [[1.0]], [[1.0]]],
+                ],
+                [
+                    [[[1.0]], [[3.0]], [[9.0]]],
+                    [[[4.0]], [[8.0]], [[12.0]]],
+                ],
+            ],
+            dtype=np.float32,
+        )
+        responses = responses.reshape(2, 2, 3, 1, 1, 1)
+
+        features, names = module.compute_output_cloud_features(responses, np.asarray([10, 20]))
+
+        self.assertEqual(features.shape, (2, 10))
+        self.assertEqual(names[0], "within_timestep_pair_rmse_10")
+        self.assertEqual(names[1], "within_timestep_pair_rmse_20")
+        self.assertEqual(names[-2:], ["cloud_pca_trace", "cloud_pca_top_share"])
+        np.testing.assert_allclose(features[:, 0], np.asarray([8.0 / 3.0, 16.0 / 3.0]), rtol=1e-6)
+        np.testing.assert_allclose(features[:, 1], np.asarray([0.0, 16.0 / 3.0]), rtol=1e-6)
+
+    def test_compute_features_rejects_single_repeat_cache(self) -> None:
+        module = _load_script_module()
+        responses = np.zeros((2, 2, 1, 1, 1, 1), dtype=np.float32)
+
+        with self.assertRaisesRegex(ValueError, "at least two repeats"):
+            module.compute_output_cloud_features(responses, np.asarray([10, 20]))
+
+    def test_shuffle_label_mode_writes_sanity_payload(self) -> None:
+        repo_root = Path(__file__).resolve().parents[1]
+        with tempfile.TemporaryDirectory() as tmpdir:
+            tmp = Path(tmpdir)
+            cache = tmp / "response-cache.npz"
+            summary = tmp / "summary.json"
+            output = tmp / "shuffle-review.json"
+            labels = np.asarray([1] * 8 + [0] * 8, dtype=np.int64)
+            responses = np.zeros((16, 2, 2, 1, 1, 1), dtype=np.float32)
+            responses[:, 0, 0, 0, 0, 0] = np.linspace(0.0, 1.5, 16)
+            responses[:, 0, 1, 0, 0, 0] = np.linspace(0.2, 1.7, 16)
+            responses[:, 1, 0, 0, 0, 0] = np.linspace(0.4, 1.9, 16)
+            responses[:, 1, 1, 0, 0, 0] = np.linspace(0.7, 2.2, 16)
+            np.savez_compressed(cache, labels=labels, timesteps=np.asarray([40, 80]), responses=responses)
+            summary.write_text(
+                json.dumps({"raw_h2": {"logistic": {"aggregate_metrics": {"auc": 0.9}}}}),
+                encoding="utf-8",
+            )
+
+            completed = subprocess.run(
+                [
+                    sys.executable,
+                    "-X",
+                    "utf8",
+                    "scripts/review_h2_output_cloud_geometry.py",
+                    "--response-cache",
+                    str(cache),
+                    "--output",
+                    str(output),
+                    "--seed",
+                    "176",
+                    "--holdout-repeats",
+                    "2",
+                    "--bootstrap-iters",
+                    "0",
+                    "--shuffle-labels",
+                ],
+                check=False,
+                capture_output=True,
+                text=True,
+                cwd=repo_root,
+            )
+
+            self.assertEqual(completed.returncode, 0, completed.stderr)
+            payload = json.loads(output.read_text(encoding="utf-8"))
+            self.assertEqual(payload["inputs"]["label_mode"], "shuffled_seed_176")
+            self.assertEqual(payload["verdict"], "label_shuffle_sanity_random_level")
+            self.assertIsNone(payload["comparison"]["raw_h2_logistic"])
+            self.assertIsNone(payload["comparison"]["output_cloud_minus_raw_h2"])
+
+
+if __name__ == "__main__":
+    unittest.main()
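
Review note: the expected values in `test_compute_features_uses_all_repeat_pairs` follow from averaging the RMSE over every unordered repeat pair at each timestep. The sketch below reproduces them under that assumption. It is not the script's `compute_output_cloud_features`, and the name `pair_rmse_per_timestep` is hypothetical.

```python
from itertools import combinations

import numpy as np


def pair_rmse_per_timestep(responses: np.ndarray) -> np.ndarray:
    """Mean RMSE over all unordered repeat pairs, per sample and timestep.

    Assumes `responses` has shape (samples, timesteps, repeats, C, H, W).
    """
    samples, timesteps, repeats = responses.shape[:3]
    if repeats < 2:
        raise ValueError("at least two repeats are required")
    # Flatten each response so RMSE is taken over all pixel/channel entries.
    flat = responses.reshape(samples, timesteps, repeats, -1).astype(np.float64)
    pair_values = [
        np.sqrt(np.mean((flat[:, :, i] - flat[:, :, j]) ** 2, axis=-1))
        for i, j in combinations(range(repeats), 2)
    ]
    # Average across pairs; the result has shape (samples, timesteps).
    return np.mean(pair_values, axis=0)


# Same toy cache as the unit test: 2 samples, 2 timesteps, 3 repeats.
toy = np.asarray(
    [
        [[0.0, 2.0, 4.0], [1.0, 1.0, 1.0]],
        [[1.0, 3.0, 9.0], [4.0, 8.0, 12.0]],
    ],
    dtype=np.float32,
).reshape(2, 2, 3, 1, 1, 1)
pair_rmse = pair_rmse_per_timestep(toy)
print(pair_rmse)
```

For sample 0 at the first timestep the pair distances are 2, 4 and 2, giving the `8.0 / 3.0` the test asserts.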
diff --git a/workspaces/black-box/README.md b/workspaces/black-box/README.md
index 6297a817..aafee7dd 100644
--- a/workspaces/black-box/README.md
+++ b/workspaces/black-box/README.md
@@ -5,6 +5,13 @@
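 - Direction: black-box membership inference attacks.
 - Primary method: `recon` is the admitted black-box product row and also the audit line selected for bounded tail-confidence hardening.
 - Supporting methods: `CLiD`, `variation`, `H2 response-strength`, and a semantic auxiliary classifier.
+- H2 output-cloud geometry status: a CPU-only review that reuses the existing H2 `512 / 512` response cache
+  found a strong candidate signal, with logistic `AUC = 0.961529`,
+  `TPR@1%FPR = 0.333984`, and `TPR@0.1%FPR = 0.117188`; the result stays stable at seed `177`,
+  and the label-shuffle sanity check returns to random level. However, the source cache uses a class-ordered sample offset,
+  so the result must first pass a reseeded / interleaved order-control cache before promotion can be discussed.
+  Do not expand it into KDE, shadow density, repeat-count, or same-cache feature sweeps;
+  do not add Platform/Runtime schema, runners, or admitted bundle rows.
 - Imported candidate artifact: the Stable Diffusion ReDiffuse result package handed over by a collaborator is now audited through
   `diffaudit probe-rediffuse-sd-artifacts`. The imported `5000`-row
   `2500 / 2500` package replays at `AUC = 0.710319` and `ASR = 0.6846`, so it is worth keeping as candidate evidence. The same imported subset now also supports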
@@ -41,6 +48,9 @@
 Current H2 candidate boundary:
 [../../docs/evidence/black-box-response-strength-preflight.md](../../docs/evidence/black-box-response-strength-preflight.md)。
 
+Current H2 output-cloud geometry candidate:
+[../../docs/evidence/h2-output-cloud-geometry-20260525.md](../../docs/evidence/h2-output-cloud-geometry-20260525.md)。
+
 Current mid-frequency same-noise residual preflight:
 [../../docs/evidence/midfreq-same-noise-residual-preflight-20260512.md](../../docs/evidence/midfreq-same-noise-residual-preflight-20260512.md)。
 
diff --git a/workspaces/black-box/artifacts/h2-output-cloud-geometry-20260525.json b/workspaces/black-box/artifacts/h2-output-cloud-geometry-20260525.json
new file mode 100644
index 00000000..30087312
--- /dev/null
+++ b/workspaces/black-box/artifacts/h2-output-cloud-geometry-20260525.json
@@ -0,0 +1,166 @@
+{
+  "status": "ready",
+  "track": "black-box",
+  "method": "H2 output-cloud geometry scorer",
+  "mode": "cpu-cache-review",
+  "response_cache": "workspaces\\black-box\\runs\\h2-response-strength-512-20260501-r1\\response-cache.npz",
+  "inputs": {
+    "sample_count": 1024,
+    "member_count": 512,
+    "nonmember_count": 512,
+    "timesteps": [
+      40,
+      80,
+      120,
+      160
+    ],
+    "repeat_count": 2,
+    "feature_count": 17,
+    "feature_names": [
+      "within_timestep_pair_rmse_40",
+      "within_timestep_pair_rmse_80",
+      "within_timestep_pair_rmse_120",
+      "within_timestep_pair_rmse_160",
+      "within_timestep_pair_rmse_mean",
+      "within_timestep_pair_rmse_std",
+      "within_timestep_pair_rmse_slope",
+      "centroid_rmse_40_80",
+      "centroid_rmse_40_120",
+      "centroid_rmse_40_160",
+      "centroid_rmse_80_120",
+      "centroid_rmse_80_160",
+      "centroid_rmse_120_160",
+      "centroid_rmse_mean",
+      "centroid_rmse_std",
+      "cloud_pca_trace",
+      "cloud_pca_top_share"
+    ],
+    "seed": 176,
+    "label_mode": "original",
+    "holdout_repeats": 7,
+    "bootstrap_iters": 200
+  },
+  "simple": {
+    "best_by_auc": {
+      "name": "centroid_rmse_40_160",
+      "orientation": "negative_higher_is_member",
+      "metrics": {
+        "auc": 0.801182,
+        "asr": 0.739258,
+        "tpr_at_1pct_fpr": 0.03125,
+        "tpr_at_0_1pct_fpr": 0.005859,
+        "member_score_mean": -0.035516,
+        "nonmember_score_mean": -0.0456
+      }
+    },
+    "best_by_low_fpr": {
+      "name": "cloud_pca_top_share",
+      "orientation": "negative_higher_is_member",
+      "metrics": {
+        "auc": 0.650913,
+        "asr": 0.618164,
+        "tpr_at_1pct_fpr": 0.078125,
+        "tpr_at_0_1pct_fpr": 0.017578,
+        "member_score_mean": -0.26242,
+        "nonmember_score_mean": -0.275406
+      }
+    }
+  },
+  "logistic": {
+    "aggregate_metrics": {
+      "auc": 0.961529,
+      "asr": 0.900391,
+      "tpr_at_1pct_fpr": 0.333984,
+      "tpr_at_0_1pct_fpr": 0.117188,
+      "member_score_mean": 0.829101,
+      "nonmember_score_mean": 0.171006
+    },
+    "aggregate_ci95": {
+      "auc": {
+        "p025": 0.950939,
+        "p975": 0.972625
+      },
+      "asr": {
+        "p025": 0.887671,
+        "p975": 0.920922
+      },
+      "tpr_at_1pct_fpr": {
+        "p025": 0.202197,
+        "p975": 0.617334
+      },
+      "tpr_at_0_1pct_fpr": {
+        "p025": 0.093701,
+        "p975": 0.294971
+      }
+    },
+    "mean_coefficients": [
+      2.430795,
+      0.320931,
+      0.262193,
+      0.973249,
+      0.860887,
+      -0.599108,
+      -0.058332,
+      0.20929,
+      -3.265317,
+      -4.968652,
+      0.839256,
+      0.448905,
+      2.486766,
+      -0.861475,
+      -0.441132,
+      -0.35279,
+      -0.293748
+    ],
+    "prediction_count": {
+      "min": 7,
+      "max": 7,
+      "mean": 7.0
+    }
+  },
+  "comparison": {
+    "raw_h2_logistic": {
+      "auc": 0.905693,
+      "asr": 0.841797,
+      "tpr_at_1pct_fpr": 0.134766,
+      "tpr_at_0_1pct_fpr": 0.0,
+      "member_score_mean": 0.743293,
+      "nonmember_score_mean": 0.256827
+    },
+    "lowpass_h2_logistic": {
+      "auc": 0.895679,
+      "asr": 0.831055,
+      "tpr_at_1pct_fpr": 0.148438,
+      "tpr_at_0_1pct_fpr": 0.025391,
+      "member_score_mean": 0.735716,
+      "nonmember_score_mean": 0.264013
+    },
+    "output_cloud_minus_raw_h2": {
+      "auc": 0.055836,
+      "asr": 0.058594,
+      "tpr_at_1pct_fpr": 0.199218,
+      "tpr_at_0_1pct_fpr": 0.117188
+    },
+    "output_cloud_minus_lowpass_h2": {
+      "auc": 0.06585,
+      "asr": 0.069336,
+      "tpr_at_1pct_fpr": 0.185546,
+      "tpr_at_0_1pct_fpr": 0.091797
+    }
+  },
+  "decision_gate": {
+    "uses_only_output_output_geometry": true,
+    "does_not_generate_new_responses": true,
+    "nonzero_strict_tail": true,
+    "beats_best_simple_low_fpr": true,
+    "reopen_allowed": false,
+    "requires_reseeded_or_interleaved_cache_before_promotion": true
+  },
+  "verdict": "candidate_complementary_output_cloud_geometry",
+  "notes": [
+    "This is a CPU-only scorer review on an existing H2 response cache.",
+    "It intentionally excludes seed-to-output distance features so it cannot collapse back into H2 simple distance.",
+    "A positive result is candidate-only until reseeded or interleaved response-cache controls rule out class-ordered sampling effects.",
+    "Do not expand this cache into KDE, shadow density, repeat-count, or same-cache feature sweeps."
+  ]
+}
\ No newline at end of file
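
Review note: the `comparison` deltas in this artifact can be checked by hand as the logistic metric minus the baseline metric, rounded to six decimals. The sketch below does that check with values copied from the artifact. The script's own `metric_delta` is not shown in this part of the diff, so `metric_delta_sketch` is a hypothetical stand-in and the rounding step is an assumption.

```python
import math

KEYS = ("auc", "asr", "tpr_at_1pct_fpr", "tpr_at_0_1pct_fpr")


def metric_delta_sketch(candidate: dict, baseline: dict) -> dict:
    """Hypothetical stand-in: candidate minus baseline on the four shared metrics."""
    return {key: round(candidate[key] - baseline[key], 6) for key in KEYS}


# Values copied from h2-output-cloud-geometry-20260525.json.
logistic = {"auc": 0.961529, "asr": 0.900391, "tpr_at_1pct_fpr": 0.333984, "tpr_at_0_1pct_fpr": 0.117188}
raw_h2 = {"auc": 0.905693, "asr": 0.841797, "tpr_at_1pct_fpr": 0.134766, "tpr_at_0_1pct_fpr": 0.0}
lowpass_h2 = {"auc": 0.895679, "asr": 0.831055, "tpr_at_1pct_fpr": 0.148438, "tpr_at_0_1pct_fpr": 0.025391}

delta_raw = metric_delta_sketch(logistic, raw_h2)
delta_lowpass = metric_delta_sketch(logistic, lowpass_h2)
print(delta_raw)
print(delta_lowpass)
```

Both results agree with the committed `output_cloud_minus_raw_h2` and `output_cloud_minus_lowpass_h2` blocks.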
diff --git a/workspaces/black-box/artifacts/h2-output-cloud-geometry-label-shuffle-20260525.json b/workspaces/black-box/artifacts/h2-output-cloud-geometry-label-shuffle-20260525.json
new file mode 100644
index 00000000..d1bfc4b4
--- /dev/null
+++ b/workspaces/black-box/artifacts/h2-output-cloud-geometry-label-shuffle-20260525.json
@@ -0,0 +1,142 @@
+{
+  "status": "ready",
+  "track": "black-box",
+  "method": "H2 output-cloud geometry scorer",
+  "mode": "cpu-cache-review",
+  "response_cache": "workspaces\\black-box\\runs\\h2-response-strength-512-20260501-r1\\response-cache.npz",
+  "inputs": {
+    "sample_count": 1024,
+    "member_count": 512,
+    "nonmember_count": 512,
+    "timesteps": [
+      40,
+      80,
+      120,
+      160
+    ],
+    "repeat_count": 2,
+    "feature_count": 17,
+    "feature_names": [
+      "within_timestep_pair_rmse_40",
+      "within_timestep_pair_rmse_80",
+      "within_timestep_pair_rmse_120",
+      "within_timestep_pair_rmse_160",
+      "within_timestep_pair_rmse_mean",
+      "within_timestep_pair_rmse_std",
+      "within_timestep_pair_rmse_slope",
+      "centroid_rmse_40_80",
+      "centroid_rmse_40_120",
+      "centroid_rmse_40_160",
+      "centroid_rmse_80_120",
+      "centroid_rmse_80_160",
+      "centroid_rmse_120_160",
+      "centroid_rmse_mean",
+      "centroid_rmse_std",
+      "cloud_pca_trace",
+      "cloud_pca_top_share"
+    ],
+    "seed": 176,
+    "label_mode": "shuffled_seed_176",
+    "holdout_repeats": 7,
+    "bootstrap_iters": 100
+  },
+  "simple": {
+    "best_by_auc": {
+      "name": "within_timestep_pair_rmse_std",
+      "orientation": "negative_higher_is_member",
+      "metrics": {
+        "auc": 0.522099,
+        "asr": 0.53418,
+        "tpr_at_1pct_fpr": 0.015625,
+        "tpr_at_0_1pct_fpr": 0.001953,
+        "member_score_mean": -0.01289,
+        "nonmember_score_mean": -0.013093
+      }
+    },
+    "best_by_low_fpr": {
+      "name": "within_timestep_pair_rmse_40",
+      "orientation": "positive_higher_is_member",
+      "metrics": {
+        "auc": 0.483044,
+        "asr": 0.518555,
+        "tpr_at_1pct_fpr": 0.033203,
+        "tpr_at_0_1pct_fpr": 0.021484,
+        "member_score_mean": 0.027483,
+        "nonmember_score_mean": 0.027669
+      }
+    }
+  },
+  "logistic": {
+    "aggregate_metrics": {
+      "auc": 0.507595,
+      "asr": 0.521484,
+      "tpr_at_1pct_fpr": 0.011719,
+      "tpr_at_0_1pct_fpr": 0.003906,
+      "member_score_mean": 0.500857,
+      "nonmember_score_mean": 0.49935
+    },
+    "aggregate_ci95": {
+      "auc": {
+        "p025": 0.479549,
+        "p975": 0.540294
+      },
+      "asr": {
+        "p025": 0.514136,
+        "p975": 0.551343
+      },
+      "tpr_at_1pct_fpr": {
+        "p025": 0.004834,
+        "p975": 0.041406
+      },
+      "tpr_at_0_1pct_fpr": {
+        "p025": 0.0,
+        "p975": 0.011719
+      }
+    },
+    "mean_coefficients": [
+      -0.014965,
+      0.099394,
+      0.015372,
+      0.004088,
+      0.026973,
+      -0.264673,
+      -0.009396,
+      -0.70348,
+      -0.126196,
+      0.492217,
+      0.100179,
+      -0.049061,
+      -0.155517,
+      -0.024371,
+      -0.286773,
+      0.775695,
+      0.077593
+    ],
+    "prediction_count": {
+      "min": 7,
+      "max": 7,
+      "mean": 7.0
+    }
+  },
+  "comparison": {
+    "raw_h2_logistic": null,
+    "lowpass_h2_logistic": null,
+    "output_cloud_minus_raw_h2": null,
+    "output_cloud_minus_lowpass_h2": null
+  },
+  "decision_gate": {
+    "uses_only_output_output_geometry": true,
+    "does_not_generate_new_responses": true,
+    "nonzero_strict_tail": true,
+    "beats_best_simple_low_fpr": false,
+    "reopen_allowed": false,
+    "requires_reseeded_or_interleaved_cache_before_promotion": true
+  },
+  "verdict": "label_shuffle_sanity_random_level",
+  "notes": [
+    "This is a CPU-only scorer review on an existing H2 response cache.",
+    "It intentionally excludes seed-to-output distance features so it cannot collapse back into H2 simple distance.",
+    "A positive result is candidate-only until reseeded or interleaved response-cache controls rule out class-ordered sampling effects.",
+    "Do not expand this cache into KDE, shadow density, repeat-count, or same-cache feature sweeps."
+  ]
+}
\ No newline at end of file
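
Review note: the label-shuffle artifact is a sanity control. If the scorer were exploiting something unrelated to membership labels, permuting the labels would not pull AUC back toward 0.5. The sketch below illustrates the idea on synthetic scores only. It does not read the response cache, and the script's actual shuffle implementation is not shown in this part of the diff, so the seeded permutation here is an assumption.

```python
import numpy as np


def rank_auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=np.float64)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u_stat = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return float(u_stat / (n_pos * n_neg))


rng = np.random.default_rng(176)
labels = np.asarray([1] * 512 + [0] * 512)
# Synthetic scores: members are shifted upward so the true labels separate well.
scores = rng.normal(size=1024) + 2.0 * labels
auc_original = rank_auc(scores, labels)
# A permutation keeps the 512 / 512 balance but breaks the score-label link.
auc_shuffled = rank_auc(scores, rng.permutation(labels))
print(round(auc_original, 3), round(auc_shuffled, 3))
```

A shuffled AUC near 0.5, as in the committed `auc = 0.507595`, shows the score does not survive when labels are decoupled from samples. It does not rule out class-ordered sampling effects, which is why the order-control cache is still required.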
diff --git a/workspaces/black-box/artifacts/h2-output-cloud-geometry-seed177-20260525.json b/workspaces/black-box/artifacts/h2-output-cloud-geometry-seed177-20260525.json
new file mode 100644
index 00000000..d46ffff1
--- /dev/null
+++ b/workspaces/black-box/artifacts/h2-output-cloud-geometry-seed177-20260525.json
@@ -0,0 +1,166 @@
+{
+  "status": "ready",
+  "track": "black-box",
+  "method": "H2 output-cloud geometry scorer",
+  "mode": "cpu-cache-review",
+  "response_cache": "workspaces\\black-box\\runs\\h2-response-strength-512-20260501-r1\\response-cache.npz",
+  "inputs": {
+    "sample_count": 1024,
+    "member_count": 512,
+    "nonmember_count": 512,
+    "timesteps": [
+      40,
+      80,
+      120,
+      160
+    ],
+    "repeat_count": 2,
+    "feature_count": 17,
+    "feature_names": [
+      "within_timestep_pair_rmse_40",
+      "within_timestep_pair_rmse_80",
+      "within_timestep_pair_rmse_120",
+      "within_timestep_pair_rmse_160",
+      "within_timestep_pair_rmse_mean",
+      "within_timestep_pair_rmse_std",
+      "within_timestep_pair_rmse_slope",
+      "centroid_rmse_40_80",
+      "centroid_rmse_40_120",
+      "centroid_rmse_40_160",
+      "centroid_rmse_80_120",
+      "centroid_rmse_80_160",
+      "centroid_rmse_120_160",
+      "centroid_rmse_mean",
+      "centroid_rmse_std",
+      "cloud_pca_trace",
+      "cloud_pca_top_share"
+    ],
+    "seed": 177,
+    "label_mode": "original",
+    "holdout_repeats": 7,
+    "bootstrap_iters": 100
+  },
+  "simple": {
+    "best_by_auc": {
+      "name": "centroid_rmse_40_160",
+      "orientation": "negative_higher_is_member",
+      "metrics": {
+        "auc": 0.801182,
+        "asr": 0.739258,
+        "tpr_at_1pct_fpr": 0.03125,
+        "tpr_at_0_1pct_fpr": 0.005859,
+        "member_score_mean": -0.035516,
+        "nonmember_score_mean": -0.0456
+      }
+    },
+    "best_by_low_fpr": {
+      "name": "cloud_pca_top_share",
+      "orientation": "negative_higher_is_member",
+      "metrics": {
+        "auc": 0.650913,
+        "asr": 0.618164,
+        "tpr_at_1pct_fpr": 0.078125,
+        "tpr_at_0_1pct_fpr": 0.017578,
+        "member_score_mean": -0.26242,
+        "nonmember_score_mean": -0.275406
+      }
+    }
+  },
+  "logistic": {
+    "aggregate_metrics": {
+      "auc": 0.961048,
+      "asr": 0.900391,
+      "tpr_at_1pct_fpr": 0.353516,
+      "tpr_at_0_1pct_fpr": 0.130859,
+      "member_score_mean": 0.829446,
+      "nonmember_score_mean": 0.170744
+    },
+    "aggregate_ci95": {
+      "auc": {
+        "p025": 0.948315,
+        "p975": 0.969186
+      },
+      "asr": {
+        "p025": 0.887183,
+        "p975": 0.919458
+      },
+      "tpr_at_1pct_fpr": {
+        "p025": 0.208887,
+        "p975": 0.542041
+      },
+      "tpr_at_0_1pct_fpr": {
+        "p025": 0.106397,
+        "p975": 0.296338
+      }
+    },
+    "mean_coefficients": [
+      2.431525,
+      0.319316,
+      0.265638,
+      0.973178,
+      0.861645,
+      -0.602296,
+      -0.057459,
+      0.220761,
+      -3.269253,
+      -4.971886,
+      0.839791,
+      0.442868,
+      2.490737,
+      -0.861653,
+      -0.443531,
+      -0.354449,
+      -0.290938
+    ],
+    "prediction_count": {
+      "min": 7,
+      "max": 7,
+      "mean": 7.0
+    }
+  },
+  "comparison": {
+    "raw_h2_logistic": {
+      "auc": 0.905693,
+      "asr": 0.841797,
+      "tpr_at_1pct_fpr": 0.134766,
+      "tpr_at_0_1pct_fpr": 0.0,
+      "member_score_mean": 0.743293,
+      "nonmember_score_mean": 0.256827
+    },
+    "lowpass_h2_logistic": {
+      "auc": 0.895679,
+      "asr": 0.831055,
+      "tpr_at_1pct_fpr": 0.148438,
+      "tpr_at_0_1pct_fpr": 0.025391,
+      "member_score_mean": 0.735716,
+      "nonmember_score_mean": 0.264013
+    },
+    "output_cloud_minus_raw_h2": {
+      "auc": 0.055355,
+      "asr": 0.058594,
+      "tpr_at_1pct_fpr": 0.21875,
+      "tpr_at_0_1pct_fpr": 0.130859
+    },
+    "output_cloud_minus_lowpass_h2": {
+      "auc": 0.065369,
+      "asr": 0.069336,
+      "tpr_at_1pct_fpr": 0.205078,
+      "tpr_at_0_1pct_fpr": 0.105468
+    }
+  },
+  "decision_gate": {
+    "uses_only_output_output_geometry": true,
+    "does_not_generate_new_responses": true,
+    "nonzero_strict_tail": true,
+    "beats_best_simple_low_fpr": true,
+    "reopen_allowed": false,
+    "requires_reseeded_or_interleaved_cache_before_promotion": true
+  },
+  "verdict": "candidate_complementary_output_cloud_geometry",
+  "notes": [
+    "This is a CPU-only scorer review on an existing H2 response cache.",
+    "It intentionally excludes seed-to-output distance features so it cannot collapse back into H2 simple distance.",
+    "A positive result is candidate-only until reseeded or interleaved response-cache controls rule out class-ordered sampling effects.",
+    "Do not expand this cache into KDE, shadow density, repeat-count, or same-cache feature sweeps."
+  ]
+}
\ No newline at end of file
diff --git a/workspaces/black-box/plan.md b/workspaces/black-box/plan.md
index 687225ad..4322b370 100644
--- a/workspaces/black-box/plan.md
+++ b/workspaces/black-box/plan.md
@@ -36,7 +36,12 @@
   not selected for GPU.
 - `H2 response-strength`: candidate-only with positive non-overlap signal;
   frozen lowpass follow-up is positive-but-bounded on `DDPM/CIFAR10`; SD/CelebA
-  text-to-image transfer is protocol-blocked.
+  text-to-image transfer is protocol-blocked. The 2026-05-25 output-cloud
+  geometry cache review found a stronger output-output candidate signal
+  (`AUC = 0.961529`, `TPR@0.1%FPR = 0.117188`) and a random-level label-shuffle
+  sanity check, but it remains candidate-only until a reseeded or interleaved
+  order-control cache preserves the signal. Do not promote it into Platform or
+  Runtime runners from the existing cache.
 - `simple image-to-image distance`: bounded single-asset evidence on
   SD1.5/CelebA; not a product row and not portability evidence.
 - `mid-frequency same-noise residual`: distinct paper-backed observable gap;
@@ -69,15 +74,18 @@
 
 ## Next Action
 
-No black-box GPU or CPU sidecar is selected. The next action belongs to the
-root long-horizon queue: continue Lane A only with a non-duplicate asset that
-has exact target identity, member/nonmember split artifacts, and response or
-score coverage. The imported Stable Diffusion ReDiffuse collaborator artifact,
-CLiD gated ZIP, CopyMark `laion_mi`, and CopyMark `laion_ridar` do not satisfy
-that gate by themselves, so preserve them as support/candidate evidence instead
-of turning them into rerun tasks. Do not reopen CommonCanvas, Beans,
-Fashion-MNIST, MIDST, or same-contract mid-frequency residual variants unless
-a genuinely new artifact or observable changes the decision gate.
+No black-box GPU or CPU sidecar is selected. The only H2 output-cloud reopen
+path is one bounded reseeded or interleaved order-control response-cache scout;
+until that exists, preserve the signal as candidate evidence instead of
+turning it into same-cache feature work. The broader root long-horizon queue
+still continues Lane A only with a non-duplicate asset that has exact target
+identity, member/nonmember split artifacts, and response or score coverage.
+The imported Stable Diffusion ReDiffuse collaborator artifact, CLiD gated ZIP,
+CopyMark `laion_mi`, and CopyMark `laion_ridar` do not satisfy that gate by
+themselves, so preserve them as support/candidate evidence instead of turning
+them into rerun tasks. Do not reopen CommonCanvas, Beans, Fashion-MNIST,
+MIDST, or same-contract mid-frequency residual variants unless a genuinely new
+artifact or observable changes the decision gate.
 
 ## Current Status