
Commit 67e6ab6

docs: record h2 order-control scout results
Record H2 output-cloud shared-position order-control scout results and keep the candidate out of admitted Platform/Runtime consumption.
1 parent 6912fbc commit 67e6ab6

12 files changed

Lines changed: 719 additions & 50 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -145,6 +145,10 @@ workspaces/**/artifacts/**
145145
!workspaces/black-box/artifacts/h2-output-cloud-geometry-20260525.json
146146
!workspaces/black-box/artifacts/h2-output-cloud-geometry-seed177-20260525.json
147147
!workspaces/black-box/artifacts/h2-output-cloud-geometry-label-shuffle-20260525.json
148+
!workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-256-20260525.json
149+
!workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-256-label-shuffle-20260525.json
150+
!workspaces/black-box/artifacts/h2-output-cloud-geometry-class-ordered-subset-256-20260525.json
151+
!workspaces/black-box/artifacts/h2-output-cloud-geometry-class-ordered-subset-256-label-shuffle-20260525.json
148152
!workspaces/black-box/artifacts/beans-lora-member-denoising-loss-scout-20260513.json
149153
!workspaces/black-box/artifacts/clid-image-identity-boundary-20260511.json
150154
!workspaces/black-box/artifacts/midfreq-same-noise-residual-cache-audit-20260512.json

AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ Do not start from memory or old chat context. Re-anchor on repository files.
2828

2929
## Current Operating State
3030

31-
- Active work: `2026-05-25 H2 output-cloud geometry cache review is the latest metric verdict. It is a strong Research-side candidate on the existing H2 response cache (seed 176 logistic AUC = 0.961529, TPR@1%FPR = 0.333984, TPR@0.1%FPR = 0.117188; seed 177 AUC = 0.961048; label-shuffle AUC = 0.507595), but it is not admitted because the source cache used class-ordered sample offsets and needs a reseeded or interleaved order-control cache before any promotion. Do not create Platform/Runtime schema, bundle export, UI type, runner, KDE/shadow-density/repeat-count sweeps, or same-cache feature sweeps from this result. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after H2 output-cloud cache review. Feature-packet consumer lane remains deferred. LeakyCLIP remains CLIP / multimodal privacy watch-plus. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
31+
- Active work: `2026-05-25 H2 output-cloud geometry is the latest metric verdict. It is a strong Research-side candidate on the existing H2 response cache (seed 176 logistic AUC = 0.961529, TPR@1%FPR = 0.333984, TPR@0.1%FPR = 0.117188; seed 177 AUC = 0.961048; label-shuffle AUC = 0.507595). The bounded 256/256 shared-position order-control scout preserved the signal (AUC = 0.967819, TPR@1%FPR = 0.410156, TPR@0.1%FPR = 0.132812; label-shuffle AUC = 0.464066), so class-ordered seed offset is not a sufficient explanation. It is still not admitted because this remains Research-side H2 response-cache geometry, not a second public asset or Platform/Runtime contract. Do not create Platform/Runtime schema, bundle export, UI type, runner, KDE/shadow-density/repeat-count sweeps, same-cache feature sweeps, or a full 512/512 rerun just to complete a table. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after H2 output-cloud order-control scout. Feature-packet consumer lane remains deferred. LeakyCLIP remains CLIP / multimodal privacy watch-plus. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
3232
- Next GPU candidate: none selected
3333
- Long-horizon control: follow `ROADMAP.md` section
3434
`Long-Horizon Research Task Board(2026-05-13 起)` before reopening any

ROADMAP.md

Lines changed: 18 additions & 11 deletions
@@ -4,11 +4,12 @@
44
55
## 2026-05-25 H2 output-cloud geometry candidate signal
66

7-
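Latest decision: the existing `512 / 512` response cache for H2 response-strength exposes a strong
8-
output-output geometry candidate signal, but until order-control passes it is not promoted, not released for product consumption,
9-
and same-cache feature engineering is not expanded. This review only reads the existing
10-
`workspaces/black-box/runs/h2-response-strength-512-20260501-r1/response-cache.npz`.
11-
It generated no new responses, downloaded no assets, and ran no GPU.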
7+
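Latest decision: H2 response-strength output-cloud geometry is a strong Research-side candidate
8+
and has passed one bounded `256 / 256` shared-position order-control scout; it is still not promoted,
9+
not released for product consumption, not extended with same-cache feature engineering, and gets no default full `512 / 512`
10+
shared-position rerun. The first-round review read the existing
11+
`workspaces/black-box/runs/h2-response-strength-512-20260501-r1/response-cache.npz`.
12+
The control round generated a local `256 / 256` shared-position cache and downloaded no assets.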
1213

1314
The scorer deliberately excludes seed-to-output distance and uses only same-timestep inter-repeat RMSE,
1415
cross-timestep centroid RMSE, and response-cloud Gram/PCA features. The main result is
@@ -19,13 +20,19 @@ seed `177` stability remains `AUC = 0.961048`, `TPR@1%FPR = 0.353516`,
1920
`TPR@0.1%FPR = 0.130859`; label-shuffle sanity returns to random level at
2021
`AUC = 0.507595`.
2122

22-
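Key caveat: the source cache was generated with `sample_offset = 0` on the member side and
23-
`sample_offset = len(member_indices)` on the nonmember side. Output-output geometry is sensitive to sampling seeds and response-cloud shape,
24-
so the current strong signal may include a class-ordered sampling effect. The result can only count as
25-
a strong Research-side candidate; the next re-evaluation may only be one bounded reseeded / interleaved
26-
order-control response-cache scout. Current slots remain: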
23+
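Completed order-control: `--seed-offset-policy shared-position` makes member / nonmember
24+
use the same per-position seed offset. On this `256 / 256` control cache the output-cloud
25+
logistic still gives `AUC = 0.967819`, `ASR = 0.923828`,
26+
`TPR@1%FPR = 0.410156`, `TPR@0.1%FPR = 0.132812`; label-shuffle
27+
returns to random level at `AUC = 0.464066`. The same-size old class-ordered subset gives
28+
`AUC = 0.967438`, `TPR@1%FPR = 0.179688`,
29+
`TPR@0.1%FPR = 0.105469`. Class-ordered seed offset is therefore no longer a sufficient explanation for the strong signal.
30+
31+
The result can only count as a strong Research-side candidate; the next step is neither a same-cache sweep nor a full
32+
`512 / 512` shared-position run just to complete a table. Reopening should rest only on formal mechanism promotion, a second public asset, or an independent consumption contract.
33+
Current slots remain:
2734
`active_gpu_question = none`, `next_gpu_candidate = none`,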
28-
`CPU sidecar = none selected after H2 output-cloud cache review`
35+
`CPU sidecar = none selected after H2 output-cloud order-control scout`
2936
See
3037
[docs/evidence/h2-output-cloud-geometry-20260525.md](docs/evidence/h2-output-cloud-geometry-20260525.md)
3138

docs/evidence/h2-output-cloud-geometry-20260525.md

Lines changed: 77 additions & 23 deletions
@@ -1,16 +1,17 @@
11
# H2 Output-Cloud Geometry Cache Review
22

33
> Date: 2026-05-25
4-
> Status: candidate complementary signal / CPU-only cache review / order-control required before promotion / no GPU release / no admitted row
4+
> Status: candidate complementary signal / order-control scout passed / no admitted row / no 512/512 rerun selected
55
66
## Question
77

88
On the existing H2 response-strength cache, does the geometry between outputs carry a membership signal distinct from
99
seed-to-output distance?
1010

11-
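This round only reuses the existing
11+
The first round only reuses the existing
1212
`workspaces/black-box/runs/h2-response-strength-512-20260501-r1/response-cache.npz`.
13-
It generated no new responses, downloaded no assets, ran no GPU, and did not extend the same route with KDE, shadow
13+
After that, only one bounded `256 / 256` shared-position order-control scout was released, to answer
14+
the class-ordered seed-offset caveat. No assets were downloaded, and the same route was not extended with KDE, shadow
1415
density, repeat-count, or feature sweeps.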
1516

1617
## Contract
@@ -87,46 +88,99 @@ Label-shuffle sanity:
8788

8889
This indicates that the scorer/evaluation pipeline has no obvious label pass-through leakage.
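
A label-shuffle sanity check of this kind can be sketched as follows. This is a simplified illustration, not the repository's scorer: the helper names `pairwise_auc` and `label_shuffle_auc` are hypothetical, the scores are synthetic, and the labels are permuted against fixed scores, whereas a full pipeline would typically shuffle before refitting the logistic model.

```python
import random


def pairwise_auc(scores, labels):
    """AUC as the fraction of (member, nonmember) pairs ranked correctly; ties count half."""
    members = [s for s, y in zip(scores, labels) if y == 1]
    nonmembers = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for m in members:
        for n in nonmembers:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(members) * len(nonmembers))


def label_shuffle_auc(scores, labels, seed):
    """Permute the labels with a fixed seed and re-evaluate the same scores."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return pairwise_auc(scores, shuffled)


# Synthetic data (assumption): members score clearly higher than nonmembers.
labels = [1] * 200 + [0] * 200
scores = [1.0 + 0.001 * i for i in range(200)] + [0.001 * i for i in range(200)]

true_auc = pairwise_auc(scores, labels)                      # fully separated classes
shuffled_auc = label_shuffle_auc(scores, labels, seed=176)   # expected to sit near 0.5
```

If the shuffled AUC stayed far from 0.5, that would point to label information reaching the scorer through some path other than the features.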
8990

90-
## Critical Caveat
91+
## Shared-Position Order-Control Scout
9192

92-
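The result still cannot be promoted. Response generation for the source cache has a class-ordered seed offset:
93-
`scripts/run_h2_response_strength_validation.py` uses, on the member side,
94-
`sample_offset = 0`, and on the nonmember side `sample_offset = len(member_indices)`.
95-
Output-output geometry is sensitive to sampling seeds and response-cloud shape, so the current strong signal may include a
93+
Response generation for the `512 / 512` cache has a class-ordered seed offset:
94+
the historical default behavior of `scripts/run_h2_response_strength_validation.py` is member
95+
`sample_offset = 0`, nonmember `sample_offset = len(member_indices)`.
96+
Output-output geometry is sensitive to sampling seeds and response-cloud shape, so it must be checked whether the strong signal is merely a
9697
class-ordered sampling effect.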
9798

98-
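This is not a call to keep filling in tables on the same cache; it defines only one very narrow next step:
99-
if progress is needed, release at most one bounded order-control / reseeded / interleaved
100-
response-cache scout, to judge whether the strong signal survives class-order control.
101-
102-
The only script change currently allowed is the minimum needed to generate this control cache:
99+
This round adds only one narrow seed-policy control: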
103100
`scripts/run_h2_response_strength_validation.py --seed-offset-policy shared-position`
104-
This mode makes member / nonmember use the same per-position seed offset and, in
105-
`summary.json`, marks `order_control_scout = true`. It is only used to re-evaluate the
106-
class-ordered sampling effect; it does not imply admission and must not directly produce a Platform /
107-
Runtime row.
101+
This mode makes member / nonmember use the same per-position seed offset and, in
102+
`summary.json`, marks `order_control_scout = true`. The run bounds are
103+
`256 / 256`, timesteps `40 / 80 / 120 / 160`, repeats `2`, seed `176`,
104+
holdout repeats `7`, bootstrap iters `100`. The GPU scout took `208.866516s`.
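
The difference between the historical class-ordered offsets and the shared-position control can be sketched as below. This is an illustration under stated assumptions, not the code in `scripts/run_h2_response_strength_validation.py`: the function name `seed_offsets` is hypothetical, the string `"class-ordered"` is only a label for the historical default, and it assumes each sample's seed offset is `sample_offset` plus its position within its split.

```python
def seed_offsets(n_member, n_nonmember, policy):
    """Return (member_offsets, nonmember_offsets) for an assumed per-position scheme."""
    # Member side: sample_offset = 0 under both policies.
    member = list(range(n_member))
    if policy == "class-ordered":
        # Historical default: nonmember sample_offset = len(member_indices),
        # so the two classes never share a seed offset.
        nonmember = [n_member + i for i in range(n_nonmember)]
    elif policy == "shared-position":
        # Order control: position i uses the same offset in both classes.
        nonmember = list(range(n_nonmember))
    else:
        raise ValueError(f"unknown policy: {policy}")
    return member, nonmember


member, nonmember = seed_offsets(4, 4, "class-ordered")
# member == [0, 1, 2, 3], nonmember == [4, 5, 6, 7]

member, nonmember = seed_offsets(4, 4, "shared-position")
# member == nonmember == [0, 1, 2, 3]
```

Under the shared-position layout, any membership signal that survives cannot come from the two classes having drawn disjoint seed ranges.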
105+
106+
The H2 distance scorer in the runner summary stays positive under shared-position, but its tail is weak:
107+
108+
| Metric | Raw H2 logistic | Lowpass H2 logistic |
109+
| --- | ---: | ---: |
110+
| AUC | `0.906967` | `0.898102` |
111+
| ASR | `0.837891` | `0.828125` |
112+
| TPR@1%FPR | `0.058594` | `0.066406` |
113+
| TPR@0.1%FPR | `0.003906` | `0.003906` |
114+
115+
Output-cloud geometry review on the same shared-position cache:
116+
`workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-256-20260525.json`
117+
118+
| Metric | Shared-position `256 / 256` |
119+
| --- | ---: |
120+
| AUC | `0.967819` |
121+
| ASR | `0.923828` |
122+
| TPR@1%FPR | `0.410156` |
123+
| TPR@0.1%FPR | `0.132812` |
124+
125+
Label-shuffle sanity for the shared-position cache:
126+
`workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-256-label-shuffle-20260525.json`
127+
128+
| Metric | Shared-position label shuffle |
129+
| --- | ---: |
130+
| AUC | `0.464066` |
131+
| ASR | `0.505859` |
132+
| TPR@1%FPR | `0.003906` |
133+
| TPR@0.1%FPR | `0.0` |
134+
135+
Same-size historical class-ordered subset from the old cache:
136+
`workspaces/black-box/artifacts/h2-output-cloud-geometry-class-ordered-subset-256-20260525.json`
137+
138+
| Metric | Class-ordered subset `256 / 256` |
139+
| --- | ---: |
140+
| AUC | `0.967438` |
141+
| ASR | `0.916016` |
142+
| TPR@1%FPR | `0.179688` |
143+
| TPR@0.1%FPR | `0.105469` |
144+
145+
Class-ordered subset label shuffle:
146+
`workspaces/black-box/artifacts/h2-output-cloud-geometry-class-ordered-subset-256-label-shuffle-20260525.json`
147+
148+
| Metric | Class-ordered subset label shuffle |
149+
| --- | ---: |
150+
| AUC | `0.427902` |
151+
| ASR | `0.5` |
152+
| TPR@1%FPR | `0.0` |
153+
| TPR@0.1%FPR | `0.0` |
154+
155+
Interpretation: shared-position order-control did not collapse the output-cloud
156+
geometry signal, and its label-shuffle check returns random-level. This removes
157+
the previous class-ordered seed-offset caveat as a sufficient explanation for
158+
the signal. The result still does not imply product admission: it is one
159+
controlled scout on H2 DDPM/CIFAR10 response-cache geometry, not a second
160+
public asset or Platform/Runtime contract.
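
For readers checking the tables, AUC and TPR at a fixed FPR can be computed from per-sample scores roughly as follows. This is a minimal sketch, not the repository's evaluation code: the function names are hypothetical, higher scores are assumed to mean "more member-like", and the threshold convention (allow at most `floor(max_fpr * N)` false positives among `N` nonmembers) is an assumption.

```python
def auc_from_scores(member_scores, nonmember_scores):
    """Probability that a random member outscores a random nonmember; ties count half."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def tpr_at_fpr(member_scores, nonmember_scores, max_fpr):
    """Fraction of members flagged when at most floor(max_fpr * N) nonmembers are flagged."""
    allowed = int(max_fpr * len(nonmember_scores))
    if allowed >= len(nonmember_scores):
        return 1.0  # budget covers every nonmember, so every sample may be flagged
    # Flag scores strictly above the (allowed + 1)-th largest nonmember score.
    threshold = sorted(nonmember_scores, reverse=True)[allowed]
    hits = sum(1 for m in member_scores if m > threshold)
    return hits / len(member_scores)


members = [0.9, 0.8, 0.7, 0.2]
nonmembers = [0.6, 0.5, 0.3, 0.1]
print(auc_from_scores(members, nonmembers))   # 0.8125
print(tpr_at_fpr(members, nonmembers, 0.01))  # 0.75
```

Under this convention a `256 / 256` split gives `floor(0.001 * 256) = 0`, so TPR@0.1%FPR permits no false positives, and every rate is a multiple of `1/256` (about `0.003906`). The tail rates reported above are all multiples of `1/256`, for example `0.410156 = 105/256`.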
108161

109162
## Decision
110163

111-
`candidate complementary signal / order-control required / no admitted row`
164+
`candidate complementary signal / order-control scout passed / no admitted row`
112165

113166
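It is kept as a strong Research-side candidate because it meets three valuable conditions:
114167

115168
- It is a different observable: output-output cloud geometry rather than seed-to-output distance.
116169
- On the same H2 cache it is clearly stronger than raw/lowpass H2 logistic.
117170
- It passed the seed-177 stability check and label-shuffle sanity.
171+
- It did not collapse under seed-offset control in the `256 / 256` shared-position order-control scout.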
118172

119173
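For now, however, none of the following is done:
120174

121175
- No upgrade to a Platform/Runtime admitted bundle.
122176
- No new product schema, Runtime runner, UI type, or bundle row.
123177
- No KDE, shadow-density, repeat-count, feature-family, or fusion sweeps on the same cache.
124-
- No GPU release or large download.
178+
- No full `512 / 512` shared-position GPU rerun or large download is released; the current `256 / 256`
179+
order-control has already answered the caveat that could have changed the route.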
125180

126-
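The next re-evaluation may only be based on the result of one order-control cache. If the reseeded /
127-
interleaved cache keeps a strong AUC and strict-tail recovery, then discuss whether to enter a more formal H2
128-
output-cloud mechanism line; if it does not, the candidate is closed outright as a class-ordered response-cache
129-
artifact.
181+
The next re-evaluation should not be a same-cache feature sweep or a `512 / 512` rerun done only to make the table look complete.
182+
Only when the mechanism line needs formal promotion, a second public asset is found, or an independent consumption contract is to be established should
183+
a higher-cost validation task be redefined.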
130184

131185
## Platform and Runtime Impact
132186

docs/evidence/reproduction-status.md

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ Smoke tests and dry runs are engineering validation, not benchmark claims.
3232
| Track | Status | Notes |
3333
| --- | --- | --- |
3434
| Black-box `recon` | `evidence-ready` | Strongest black-box method and admitted non-CLiD product row. Public data limits strict paper-aligned claims. The bounded public-100 step30 rerun plus unified artifact summary yields the promoted coherent packet: `AUC = 0.837`, `ASR = 0.74`, `TPR@1%FPR = 0.22`, `TPR@0.1%FPR = 0.11`. See [non-clid-black-box-reselection.md](non-clid-black-box-reselection.md), [recon-product-validation-contract.md](recon-product-validation-contract.md), [recon-product-validation-result.md](recon-product-validation-result.md), and [../product-bridge/recon-product-validation-handoff.md](../product-bridge/recon-product-validation-handoff.md). |
35+
| Black-box `H2 output-cloud geometry` | `hold-candidate` | Strong Research-side output-output geometry signal on H2 response caches, but not an admitted Platform/Runtime row. Existing `512 / 512` cache review gives `AUC = 0.961529`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`; seed `177` is stable and label shuffle is random-level. The `256 / 256` shared-position order-control scout preserves the signal (`AUC = 0.967819`, `TPR@1%FPR = 0.410156`, `TPR@0.1%FPR = 0.132812`) with random-level label shuffle (`AUC = 0.464066`), so class-ordered seed offset is not a sufficient explanation. Do not promote, add schema/runner/UI/bundle rows, run same-cache feature sweeps, or schedule a full `512 / 512` rerun by default. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md). |
3536
| Black-box `CLiD` | `hold-candidate` | Selected as a bounded black-box lane after H2 SD/CelebA text-to-image transfer was protocol-blocked. The official CPU `inter_output/*` replay is strong (`AUC = 0.961277`, `TPR@1%FPR = 0.675470`, `ASR = 0.891957`) and now has a machine-readable candidate-only card, but row identity remains blocked because the public score rows are numeric-only and the 2026-05-15 authenticated HF `mia_COCO.zip` `HEAD`/`Range` recheck still returned `403`. Earlier local prompt-conditioned packets were strong and repeat-stable, but prompt-neutral perturbation collapses the signal, swapped-prompt control is degraded, within-split prompt shuffle is weak and seed-sensitive, prompt-text-only review is moderate AUC but weak strict-tail, and control attribution shows auxiliary-feature instability under prompt controls. Current evidence supports a prompt-conditioned diagnostic claim only, not admitted general black-box evidence. No next CLiD GPU task is selected. See [../product-bridge/clid-candidate-evidence-card.md](../product-bridge/clid-candidate-evidence-card.md), [clid-official-inter-output-replay-20260515.md](clid-official-inter-output-replay-20260515.md), [clid-identity-manifest-gate-20260515.md](clid-identity-manifest-gate-20260515.md), [black-box-next-lane-selection.md](black-box-next-lane-selection.md), [clid-bridge-contract.md](clid-bridge-contract.md), [clid-score-schema-gate.md](clid-score-schema-gate.md), [clid-tiny-score-bridge.md](clid-tiny-score-bridge.md), [clid-100-score-packet.md](clid-100-score-packet.md), [clid-candidate-integrity-review.md](clid-candidate-integrity-review.md), [clid-repeat-stability.md](clid-repeat-stability.md), [clid-prompt-perturbation.md](clid-prompt-perturbation.md), [clid-prompt-conditioning-boundary.md](clid-prompt-conditioning-boundary.md), [clid-swapped-prompt-control.md](clid-swapped-prompt-control.md), [clid-within-split-shuffle-control.md](clid-within-split-shuffle-control.md), 
[clid-prompt-text-only-review.md](clid-prompt-text-only-review.md), and [clid-control-attribution.md](clid-control-attribution.md). |
3637
| Black-box `variation` | `code-ready` | API-only support method; needs real query data for stronger claims. |
3738
| Feature-packet consumer lane | `deferred-candidate` | 2026-05-25 consumer verdict keeps the gray-box feature-packet lane out of Platform/Runtime. Tracing the Roots remains positive Research evidence (`AUC = 0.815826`, `TPR@1%FPR = 0.134000`), but live narrow public-surface recheck found no second non-source-equivalent public feature-packet and no raw checkpoint/sample/regeneration assets. Do not add feature-packet schema, bundle export, validators, tests, Platform UI type, Runtime runner, GPU task, or download from this singleton. See [feature-packet-channel-consumer-verdict-20260525.md](feature-packet-channel-consumer-verdict-20260525.md) and [../product-bridge/feature-packet-lane.md](../product-bridge/feature-packet-lane.md). |

docs/evidence/workspace-evidence-index.md

Lines changed: 9 additions & 5 deletions
@@ -6,15 +6,19 @@ This index separates current track state from archived research history.
66

77
Latest Research update:
88
[h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md)
9-
records a CPU-only metric verdict on the existing H2 response-strength cache.
9+
records a metric verdict on the H2 response-strength cache plus a bounded
10+
`256 / 256` shared-position order-control scout.
1011
The output-output geometry scorer is a strong Research-side candidate
1112
(`AUC = 0.961529`, `TPR@1%FPR = 0.333984`,
1213
`TPR@0.1%FPR = 0.117188`) and is stable under seed `177`
1314
(`AUC = 0.961048`), while label-shuffle sanity returns random-level
14-
(`AUC = 0.507595`). It is not admitted because the source cache used
15-
class-ordered sample offsets and needs a reseeded or interleaved order-control
16-
cache before promotion. Decision: `candidate complementary signal /
17-
order-control required / no admitted row / no download / no GPU release`.
15+
(`AUC = 0.507595`). The shared-position order-control scout also stays strong
16+
(`AUC = 0.967819`, `TPR@1%FPR = 0.410156`,
17+
`TPR@0.1%FPR = 0.132812`) with random-level label shuffle (`AUC = 0.464066`).
18+
It is not admitted because this remains a Research-side H2 response-cache
19+
geometry candidate, not a second public asset or Platform/Runtime contract.
20+
Decision: `candidate complementary signal / order-control scout passed /
21+
no admitted row / no download / no 512/512 rerun selected`.
1822

1923
Previous Research update:
2024
[feature-packet-channel-consumer-verdict-20260525.md](feature-packet-channel-consumer-verdict-20260525.md)
