2 changes: 2 additions & 0 deletions .gitignore
@@ -147,6 +147,8 @@ workspaces/**/artifacts/**
!workspaces/black-box/artifacts/h2-output-cloud-geometry-label-shuffle-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-256-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-256-label-shuffle-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-seed177-256-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-seed177-256-label-shuffle-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-geometry-class-ordered-subset-256-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-geometry-class-ordered-subset-256-label-shuffle-20260525.json
!workspaces/black-box/artifacts/beans-lora-member-denoising-loss-scout-20260513.json
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -28,7 +28,7 @@ Do not start from memory or old chat context. Re-anchor on repository files.

## Current Operating State

- Active work: `2026-05-25 H2 output-cloud geometry is the latest metric verdict. It is a strong Research-side candidate on the existing H2 response cache (seed 176 logistic AUC = 0.961529, TPR@1%FPR = 0.333984, TPR@0.1%FPR = 0.117188; seed 177 AUC = 0.961048; label-shuffle AUC = 0.507595). The bounded 256/256 shared-position order-control scout preserved the signal (AUC = 0.967819, TPR@1%FPR = 0.410156, TPR@0.1%FPR = 0.132812; label-shuffle AUC = 0.464066), so class-ordered seed offset is not a sufficient explanation. It is still not admitted because this remains Research-side H2 response-cache geometry, not a second public asset or Platform/Runtime contract. Do not create Platform/Runtime schema, bundle export, UI type, runner, KDE/shadow-density/repeat-count sweeps, same-cache feature sweeps, or a full 512/512 rerun just to complete a table. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after H2 output-cloud order-control scout. Feature-packet consumer lane remains deferred. LeakyCLIP remains CLIP / multimodal privacy watch-plus. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
- Active work: `2026-05-25 H2 output-cloud geometry is the latest metric verdict. It is a strong Research-side candidate on the existing H2 response cache (seed 176 logistic AUC = 0.961529, TPR@1%FPR = 0.333984, TPR@0.1%FPR = 0.117188; seed 177 AUC = 0.961048; label-shuffle AUC = 0.507595). The bounded 256/256 shared-position order-control scout preserved the signal (AUC = 0.967819, TPR@1%FPR = 0.410156, TPR@0.1%FPR = 0.132812; label-shuffle AUC = 0.464066), so class-ordered seed offset is not a sufficient explanation. The same controlled boundary at seed 177 remains strong (AUC = 0.956192, TPR@1%FPR = 0.285156, TPR@0.1%FPR = 0.109375; label-shuffle AUC = 0.484070), so the controlled signal is not single-seed. It is still not admitted because this remains Research-side H2 response-cache geometry, not a second public asset or Platform/Runtime contract. Do not create Platform/Runtime schema, bundle export, UI type, runner, KDE/shadow-density/repeat-count sweeps, same-cache feature sweeps, or a full 512/512 rerun just to complete a table. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after H2 output-cloud order-control seed-stability scout. Feature-packet consumer lane remains deferred. LeakyCLIP remains CLIP / multimodal privacy watch-plus. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
- Next GPU candidate: none selected
- Long-horizon control: follow `ROADMAP.md` section
`Long-Horizon Research Task Board(2026-05-13 起)` before reopening any
11 changes: 8 additions & 3 deletions ROADMAP.md
@@ -5,11 +5,13 @@
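## 2026-05-25 H2 output-cloud geometry candidate signal

Latest decision: H2 response-strength output-cloud geometry is a strong Research-side candidate,
and it has passed one bounded `256 / 256` shared-position order-control scout; but it is still not promoted,
and it has passed bounded `256 / 256` shared-position order-control and seed-stability scouts;
but it is still not promoted,
is not released for product consumption, does not expand same-cache feature engineering, and does not by default rerun the full `512 / 512`
shared-position run. The first review round read the existing
`workspaces/black-box/runs/h2-response-strength-512-20260501-r1/response-cache.npz`;
the control round generated a local `256 / 256` shared-position cache, without downloading assets.
the control round generated a local `256 / 256` shared-position cache, and the stability round only changed the seed from `176`
to `177`; no assets were downloaded.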

The scorer deliberately excludes seed-to-output distance and uses only RMSE between repeats at the same timestep,
centroid RMSE across different timesteps, and response-cloud Gram/PCA features. The main result is
@@ -27,12 +29,15 @@ logistic is still `AUC = 0.967819`, `ASR = 0.923828`,
returns to random level, `AUC = 0.464066`. The same-size earlier class-ordered subset gives
`AUC = 0.967438`, `TPR@1%FPR = 0.179688`,
`TPR@0.1%FPR = 0.105469`. Class-ordered seed offset is therefore no longer a sufficient explanation for this strong signal.
The same-boundary seed `177` shared-position scout keeps the strong signal: output-cloud logistic
`AUC = 0.956192`, `ASR = 0.896484`, `TPR@1%FPR = 0.285156`,
`TPR@0.1%FPR = 0.109375`; label shuffle returns to random level, `AUC = 0.484070`.
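A minimal pure-Python sketch of the three feature families named in this section (same-timestep repeat RMSE, cross-timestep centroid RMSE, and response-cloud Gram/PCA structure) is shown below. The input layout (one dict per sample mapping each timestep to its repeated flattened outputs), the function names, and summarizing the Gram/PCA part by a single top-eigenvalue share are assumptions for illustration, not the repository's scorer.

```python
import math
import random
from itertools import combinations


def rmse(a, b):
    """Root-mean-square error between two equal-length flattened outputs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def top_eigen_share(gram, iters=200, seed=0):
    """lambda_max / trace of a symmetric PSD matrix, via power iteration.

    A random start vector is used because the all-ones vector lies in the
    null space of a Gram matrix built from mean-centered outputs.
    """
    n = len(gram)
    trace = sum(gram[i][i] for i in range(n))
    if trace <= 0:
        return 0.0
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector approximates lambda_max.
    lam = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam / trace


def output_cloud_features(cloud):
    """cloud: {timestep: [flattened output per repeat, ...]} for ONE sample.

    Only output-output geometry is used; the seed/input never enters.
    """
    timesteps = sorted(cloud)
    # 1) Mean pairwise RMSE between repeats at the same timestep.
    per_t = []
    for t in timesteps:
        pairs = list(combinations(cloud[t], 2))
        per_t.append(sum(rmse(a, b) for a, b in pairs) / max(1, len(pairs)))
    repeat_rmse = sum(per_t) / len(per_t)
    # 2) Mean pairwise RMSE between per-timestep centroids.
    cent_pairs = list(combinations([centroid(cloud[t]) for t in timesteps], 2))
    centroid_rmse = sum(rmse(a, b) for a, b in cent_pairs) / max(1, len(cent_pairs))
    # 3) Gram matrix of all mean-centered outputs, summarized by top-eigen share.
    outputs = [o for t in timesteps for o in cloud[t]]
    mu = centroid(outputs)
    centered = [[x - m for x, m in zip(o, mu)] for o in outputs]
    gram = [[sum(x * y for x, y in zip(a, b)) for b in centered] for a in centered]
    return {
        "repeat_rmse": repeat_rmse,
        "centroid_rmse": centroid_rmse,
        "top_gram_share": top_eigen_share(gram),
    }


# Toy example: two timesteps, two repeats each, 2-dimensional outputs.
feats = output_cloud_features({40: [[0.0, 0.0], [2.0, 0.0]], 80: [[4.0, 0.0], [4.0, 2.0]]})
# repeat_rmse = sqrt(2) ~ 1.414214, centroid_rmse = sqrt(5) ~ 2.236068, top_gram_share ~ 6/7
```

Per-sample feature dicts like this would then be stacked as inputs to a logistic scorer; because no seed or input enters any feature, a strong AUC cannot come from seed-to-output distance.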

This result counts only as a strong Research-side candidate; the next step is neither a same-cache sweep nor a full
`512 / 512` shared-position run just to fill out a table. Reopening should rest only on formal mechanism promotion, a second public asset, or an independent consumer contract.
The current slots remain:
`active_gpu_question = none`, `next_gpu_candidate = none`,
`CPU sidecar = none selected after H2 output-cloud order-control scout`.
`CPU sidecar = none selected after H2 output-cloud order-control seed-stability scout`.
See
[docs/evidence/h2-output-cloud-geometry-20260525.md](docs/evidence/h2-output-cloud-geometry-20260525.md).

52 changes: 48 additions & 4 deletions docs/evidence/h2-output-cloud-geometry-20260525.md
@@ -1,7 +1,7 @@
# H2 Output-Cloud Geometry Cache Review

> Date: 2026-05-25
> Status: candidate complementary signal / order-control scout passed / no admitted row / no 512/512 rerun selected
> Status: candidate complementary signal / order-control scout passed / shared-position seed-stable / no admitted row / no 512/512 rerun selected

## Question

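@@ -11,8 +11,9 @@ seed-to-output distance membership signal?
The first round only reused the existing
`workspaces/black-box/runs/h2-response-strength-512-20260501-r1/response-cache.npz`.
It then released only one bounded `256 / 256` shared-position order-control scout to address the
class-ordered seed-offset caveat. No assets were downloaded, and the same route was not expanded into KDE, shadow
density, repeat-count, or feature sweeps.
class-ordered seed-offset caveat, followed by one same-boundary seed `177` stability scout
to check whether the strong signal after order control is only a single-seed phenomenon. No assets were downloaded, and the same
route was not expanded into KDE, shadow density, repeat-count, or feature sweeps.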

## Contract

@@ -159,16 +160,59 @@ the signal. The result still does not imply product admission: it is one
controlled scout on H2 DDPM/CIFAR10 response-cache geometry, not a second
public asset or Platform/Runtime contract.

## Shared-Position Seed-177 Stability Scout

To avoid resting the order-control conclusion on a single random seed, the next step runs only the same-boundary
`256 / 256` shared-position seed `177`. The run boundary is not widened: timesteps
`40 / 80 / 120 / 160`, repeats `2`, holdout repeats `7`, bootstrap iters
`100`. The GPU scout took `185.470864s`.
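The boundary lists `bootstrap iters 100` without saying what is resampled. One common reading, assumed here rather than confirmed by the runner, is a percentile bootstrap confidence interval for AUC that resamples members and non-members separately. A pure-Python sketch (function names are hypothetical):

```python
import random


def auc(member_scores, nonmember_scores):
    """Rank-based AUC (Mann-Whitney U / (n_pos * n_neg)); ties count 0.5."""
    labeled = [(s, 1) for s in member_scores] + [(s, 0) for s in nonmember_scores]
    labeled.sort(key=lambda item: item[0])
    rank_sum = 0.0
    i = 0
    while i < len(labeled):
        j = i
        while j + 1 < len(labeled) and labeled[j + 1][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank across the tie block
        rank_sum += avg_rank * sum(y for _, y in labeled[i : j + 1])
        i = j + 1
    n_pos, n_neg = len(member_scores), len(nonmember_scores)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(member_scores, nonmember_scores, iters=100, alpha=0.05, seed=0):
    """Percentile CI: resample each class with replacement, recompute AUC."""
    rng = random.Random(seed)
    stats = []
    for _ in range(iters):
        m = rng.choices(member_scores, k=len(member_scores))
        n = rng.choices(nonmember_scores, k=len(nonmember_scores))
        stats.append(auc(m, n))
    stats.sort()
    lo = stats[int((alpha / 2) * (iters - 1))]
    hi = stats[int((1 - alpha / 2) * (iters - 1))]
    return lo, hi


print(auc([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1]))  # 0.875
```

Resampling per class keeps the `256 / 256` balance fixed in every replicate; with only 100 iterations the percentile endpoints are coarse, which fits a scout rather than a reported benchmark.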

H2 distance scorers from the runner summary:

| Metric | Raw H2 logistic | Lowpass H2 logistic |
| --- | ---: | ---: |
| AUC | `0.911255` | `0.896698` |
| ASR | `0.851562` | `0.828125` |
| TPR@1%FPR | `0.113281` | `0.093750` |
| TPR@0.1%FPR | `0.0` | `0.062500` |

Output-cloud geometry review:
`workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-seed177-256-20260525.json`

| Metric | Shared-position seed `177` |
| --- | ---: |
| AUC | `0.956192` |
| ASR | `0.896484` |
| TPR@1%FPR | `0.285156` |
| TPR@0.1%FPR | `0.109375` |

Label-shuffle sanity:
`workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-seed177-256-label-shuffle-20260525.json`

| Metric | Seed `177` label shuffle |
| --- | ---: |
| AUC | `0.484070` |
| ASR | `0.513672` |
| TPR@1%FPR | `0.023438` |
| TPR@0.1%FPR | `0.011719` |

Interpretation: the shared-position output-cloud signal remains strong under
seed `177`, and label shuffle stays random-level. Together with seed `176`, this
supports the narrower conclusion that output-cloud geometry is a stable H2
mechanism candidate after seed-offset control. It still does not create a
second public asset, a product contract, or an admitted row.

## Decision

`candidate complementary signal / order-control scout passed / no admitted row`.
`candidate complementary signal / order-control scout passed / seed-stable / no admitted row`.

It is kept as a strong Research-side candidate because it meets the following valuable conditions:

- It is a different observable: output-output cloud geometry rather than seed-to-output distance.
- On the same H2 cache it is clearly stronger than the raw/lowpass H2 logistic scorers.
- It passed seed-177 stability and label-shuffle sanity.
- It did not collapse under seed-offset control in the `256 / 256` shared-position order-control scout.
- It still keeps a strong AUC and nonzero strict-tail recovery in the shared-position seed `177` scout.

However, the following are not done for now:

2 changes: 1 addition & 1 deletion docs/evidence/reproduction-status.md
@@ -32,7 +32,7 @@ Smoke tests and dry runs are engineering validation, not benchmark claims.
| Track | Status | Notes |
| --- | --- | --- |
| Black-box `recon` | `evidence-ready` | Strongest black-box method and admitted non-CLiD product row. Public data limits strict paper-aligned claims. The bounded public-100 step30 rerun plus unified artifact summary yields the promoted coherent packet: `AUC = 0.837`, `ASR = 0.74`, `TPR@1%FPR = 0.22`, `TPR@0.1%FPR = 0.11`. See [non-clid-black-box-reselection.md](non-clid-black-box-reselection.md), [recon-product-validation-contract.md](recon-product-validation-contract.md), [recon-product-validation-result.md](recon-product-validation-result.md), and [../product-bridge/recon-product-validation-handoff.md](../product-bridge/recon-product-validation-handoff.md). |
| Black-box `H2 output-cloud geometry` | `hold-candidate` | Strong Research-side output-output geometry signal on H2 response caches, but not an admitted Platform/Runtime row. Existing `512 / 512` cache review gives `AUC = 0.961529`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`; seed `177` is stable and label shuffle is random-level. The `256 / 256` shared-position order-control scout preserves the signal (`AUC = 0.967819`, `TPR@1%FPR = 0.410156`, `TPR@0.1%FPR = 0.132812`) with random-level label shuffle (`AUC = 0.464066`), so class-ordered seed offset is not a sufficient explanation. Do not promote, add schema/runner/UI/bundle rows, run same-cache feature sweeps, or schedule a full `512 / 512` rerun by default. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md). |
| Black-box `H2 output-cloud geometry` | `hold-candidate` | Strong Research-side output-output geometry signal on H2 response caches, but not an admitted Platform/Runtime row. Existing `512 / 512` cache review gives `AUC = 0.961529`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`; seed `177` is stable and label shuffle is random-level. The `256 / 256` shared-position order-control scout preserves the signal (`AUC = 0.967819`, `TPR@1%FPR = 0.410156`, `TPR@0.1%FPR = 0.132812`) with random-level label shuffle (`AUC = 0.464066`), so class-ordered seed offset is not a sufficient explanation. The same controlled boundary at seed `177` remains strong (`AUC = 0.956192`, `TPR@1%FPR = 0.285156`, `TPR@0.1%FPR = 0.109375`) with random-level label shuffle (`AUC = 0.484070`), so the controlled signal is not single-seed. Do not promote, add schema/runner/UI/bundle rows, run same-cache feature sweeps, or schedule a full `512 / 512` rerun by default. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md). |
| Black-box `CLiD` | `hold-candidate` | Selected as a bounded black-box lane after H2 SD/CelebA text-to-image transfer was protocol-blocked. The official CPU `inter_output/*` replay is strong (`AUC = 0.961277`, `TPR@1%FPR = 0.675470`, `ASR = 0.891957`) and now has a machine-readable candidate-only card, but row identity remains blocked because the public score rows are numeric-only and the 2026-05-15 authenticated HF `mia_COCO.zip` `HEAD`/`Range` recheck still returned `403`. Earlier local prompt-conditioned packets were strong and repeat-stable, but prompt-neutral perturbation collapses the signal, swapped-prompt control is degraded, within-split prompt shuffle is weak and seed-sensitive, prompt-text-only review is moderate AUC but weak strict-tail, and control attribution shows auxiliary-feature instability under prompt controls. Current evidence supports a prompt-conditioned diagnostic claim only, not admitted general black-box evidence. No next CLiD GPU task is selected. See [../product-bridge/clid-candidate-evidence-card.md](../product-bridge/clid-candidate-evidence-card.md), [clid-official-inter-output-replay-20260515.md](clid-official-inter-output-replay-20260515.md), [clid-identity-manifest-gate-20260515.md](clid-identity-manifest-gate-20260515.md), [black-box-next-lane-selection.md](black-box-next-lane-selection.md), [clid-bridge-contract.md](clid-bridge-contract.md), [clid-score-schema-gate.md](clid-score-schema-gate.md), [clid-tiny-score-bridge.md](clid-tiny-score-bridge.md), [clid-100-score-packet.md](clid-100-score-packet.md), [clid-candidate-integrity-review.md](clid-candidate-integrity-review.md), [clid-repeat-stability.md](clid-repeat-stability.md), [clid-prompt-perturbation.md](clid-prompt-perturbation.md), [clid-prompt-conditioning-boundary.md](clid-prompt-conditioning-boundary.md), [clid-swapped-prompt-control.md](clid-swapped-prompt-control.md), [clid-within-split-shuffle-control.md](clid-within-split-shuffle-control.md), [clid-prompt-text-only-review.md](clid-prompt-text-only-review.md), and [clid-control-attribution.md](clid-control-attribution.md). |
| Black-box `variation` | `code-ready` | API-only support method; needs real query data for stronger claims. |
| Feature-packet consumer lane | `deferred-candidate` | 2026-05-25 consumer verdict keeps the gray-box feature-packet lane out of Platform/Runtime. Tracing the Roots remains positive Research evidence (`AUC = 0.815826`, `TPR@1%FPR = 0.134000`), but live narrow public-surface recheck found no second non-source-equivalent public feature-packet and no raw checkpoint/sample/regeneration assets. Do not add feature-packet schema, bundle export, validators, tests, Platform UI type, Runtime runner, GPU task, or download from this singleton. See [feature-packet-channel-consumer-verdict-20260525.md](feature-packet-channel-consumer-verdict-20260525.md) and [../product-bridge/feature-packet-lane.md](../product-bridge/feature-packet-lane.md). |
9 changes: 6 additions & 3 deletions docs/evidence/workspace-evidence-index.md
@@ -6,18 +6,21 @@ This index separates current track state from archived research history.

Latest Research update:
[h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md)
records a metric verdict on the H2 response-strength cache plus a bounded
`256 / 256` shared-position order-control scout.
records a metric verdict on the H2 response-strength cache plus bounded
`256 / 256` shared-position order-control and seed-stability scouts.
The output-output geometry scorer is a strong Research-side candidate
(`AUC = 0.961529`, `TPR@1%FPR = 0.333984`,
`TPR@0.1%FPR = 0.117188`) and is stable under seed `177`
(`AUC = 0.961048`), while the label-shuffle sanity check returns to random level
(`AUC = 0.507595`). The shared-position order-control scout also stays strong
(`AUC = 0.967819`, `TPR@1%FPR = 0.410156`,
`TPR@0.1%FPR = 0.132812`) with random-level label shuffle (`AUC = 0.464066`).
The same controlled boundary at seed `177` remains strong (`AUC = 0.956192`,
`TPR@1%FPR = 0.285156`, `TPR@0.1%FPR = 0.109375`) with random-level label
shuffle (`AUC = 0.484070`).
It is not admitted because this remains a Research-side H2 response-cache
geometry candidate, not a second public asset or Platform/Runtime contract.
Decision: `candidate complementary signal / order-control scout passed /
Decision: `candidate complementary signal / order-control scout passed / seed-stable /
no admitted row / no download / no 512/512 rerun selected`.

Previous Research update:
7 changes: 5 additions & 2 deletions workspaces/black-box/README.md
@@ -11,8 +11,11 @@
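label-shuffle sanity returns to random level. The follow-up `256 / 256` shared-position
order-control scout still gives `AUC = 0.967819`, `TPR@1%FPR = 0.410156`,
`TPR@0.1%FPR = 0.132812`, with label-shuffle `AUC = 0.464066`, so
class-ordered seed offset is not a sufficient explanation. But it is still only a Research-side H2
response-cache geometry candidate, not a second public asset or product contract.
class-ordered seed offset is not a sufficient explanation. The same-boundary seed `177` shared-position
scout still gives `AUC = 0.956192`, `TPR@1%FPR = 0.285156`,
`TPR@0.1%FPR = 0.109375`, with label-shuffle `AUC = 0.484070`, showing that the candidate
is not a single-seed phenomenon after order control. But it is still only a Research-side H2 response-cache
geometry candidate, not a second public asset or product contract.
Do not expand it into KDE, shadow density, repeat-count, or same-cache feature sweeps;
do not rerun the full `512 / 512` just to make the table look good; do not add Platform/Runtime schema,
runner, or admitted bundle rows.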