Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -151,6 +151,7 @@ workspaces/**/artifacts/**
!workspaces/black-box/artifacts/h2-output-cloud-geometry-shared-position-seed177-256-label-shuffle-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-geometry-class-ordered-subset-256-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-geometry-class-ordered-subset-256-label-shuffle-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-transfer-shared-position-256-20260525.json
!workspaces/black-box/artifacts/beans-lora-member-denoising-loss-scout-20260513.json
!workspaces/black-box/artifacts/clid-image-identity-boundary-20260511.json
!workspaces/black-box/artifacts/midfreq-same-noise-residual-cache-audit-20260512.json
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -28,7 +28,7 @@ Do not start from memory or old chat context. Re-anchor on repository files.

## Current Operating State

- Active work: `2026-05-25 H2 output-cloud geometry is the latest metric verdict. It is a strong Research-side candidate on the existing H2 response cache (seed 176 logistic AUC = 0.961529, TPR@1%FPR = 0.333984, TPR@0.1%FPR = 0.117188; seed 177 AUC = 0.961048; label-shuffle AUC = 0.507595). The bounded 256/256 shared-position order-control scout preserved the signal (AUC = 0.967819, TPR@1%FPR = 0.410156, TPR@0.1%FPR = 0.132812; label-shuffle AUC = 0.464066), so class-ordered seed offset is not a sufficient explanation. The same controlled boundary at seed 177 remains strong (AUC = 0.956192, TPR@1%FPR = 0.285156, TPR@0.1%FPR = 0.109375; label-shuffle AUC = 0.484070), so the controlled signal is not single-seed. It is still not admitted because this remains Research-side H2 response-cache geometry, not a second public asset or Platform/Runtime contract. Do not create Platform/Runtime schema, bundle export, UI type, runner, KDE/shadow-density/repeat-count sweeps, same-cache feature sweeps, or a full 512/512 rerun just to complete a table. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after H2 output-cloud order-control seed-stability scout. Feature-packet consumer lane remains deferred. LeakyCLIP remains CLIP / multimodal privacy watch-plus. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
- Active work: `2026-05-25 H2 output-cloud geometry is the latest metric verdict. It is a strong Research-side candidate on the existing H2 response cache (seed 176 logistic AUC = 0.961529, TPR@1%FPR = 0.333984, TPR@0.1%FPR = 0.117188; seed 177 AUC = 0.961048; label-shuffle AUC = 0.507595). The bounded 256/256 shared-position order-control scout preserved the signal (AUC = 0.967819, TPR@1%FPR = 0.410156, TPR@0.1%FPR = 0.132812; label-shuffle AUC = 0.464066), so class-ordered seed offset is not a sufficient explanation. The same controlled boundary at seed 177 remains strong (AUC = 0.956192, TPR@1%FPR = 0.285156, TPR@0.1%FPR = 0.109375; label-shuffle AUC = 0.484070), so the controlled signal is not single-seed. A CPU-only fold-disjoint transfer review across the two shared-position caches is also strong (seed 176 -> 177 AUC = 0.948990, TPR@1%FPR = 0.375000, TPR@0.1%FPR = 0.058594; seed 177 -> 176 AUC = 0.970520, TPR@1%FPR = 0.390625, TPR@0.1%FPR = 0.074219; mean AUC = 0.959755). It is still not admitted because this remains Research-side H2 response-cache geometry, not a second public asset or Platform/Runtime contract. Do not create Platform/Runtime schema, bundle export, UI type, runner, KDE/shadow-density/repeat-count sweeps, same-cache feature sweeps, or a full 512/512 rerun just to complete a table. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after H2 output-cloud cross-cache transfer review. Feature-packet consumer lane remains deferred. LeakyCLIP remains CLIP / multimodal privacy watch-plus. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
- Next GPU candidate: none selected
- Long-horizon control: follow `ROADMAP.md` section
`Long-Horizon Research Task Board(2026-05-13 起)` before reopening any
14 changes: 12 additions & 2 deletions ROADMAP.md
@@ -5,7 +5,8 @@
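## 2026-05-25 H2 output-cloud geometry candidate signal

Latest decision: output-cloud geometry on H2 response strength is a strong Research-side candidate,
and it has passed the bounded `256 / 256` shared-position order-control and seed-stability scouts;
and it has passed the bounded `256 / 256` shared-position order-control and seed-stability scouts
and the cross-cache transfer review;
but it is still not promoted,
not released for product consumption, not extended with same-cache feature engineering, and not backfilled by default with a full `512 / 512`
shared-position rerun. The first review pass read the existing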
@@ -33,11 +34,20 @@ logistic still gives `AUC = 0.967819`, `ASR = 0.923828`,
`AUC = 0.956192`, `ASR = 0.896484`, `TPR@1%FPR = 0.285156`,
`TPR@0.1%FPR = 0.109375`; label shuffle falls back to random level, `AUC = 0.484070`.

A further CPU-only existing-cache transfer review runs fold-disjoint transfer between the two
shared-position caches: seed `176` -> seed `177` gives `AUC = 0.948990`,
`ASR = 0.884766`, `TPR@1%FPR = 0.375000`, `TPR@0.1%FPR = 0.058594`;
seed `177` -> seed `176` gives `AUC = 0.970520`, `ASR = 0.935547`,
`TPR@1%FPR = 0.390625`, `TPR@0.1%FPR = 0.074219`. The primary result does not use the
same-sample all-train/all-test diagnostic; the decision gate is
`mean_auc = 0.959755`, `min_tpr_at_1pct_fpr = 0.375000`,
`min_tpr_at_0_1pct_fpr = 0.058594`.

This result stands only as a strong Research-side candidate; the next step is neither a same-cache sweep nor a
full `512 / 512` shared-position run just to fill out a table. Reopening should rest only on formal mechanism promotion, a second public asset, or an independent consumer contract.
The current slots remain:
`active_gpu_question = none`, `next_gpu_candidate = none`,
`CPU sidecar = none selected after H2 output-cloud order-control seed-stability scout`.
`CPU sidecar = none selected after H2 output-cloud cross-cache transfer review`.
See
[docs/evidence/h2-output-cloud-geometry-20260525.md](docs/evidence/h2-output-cloud-geometry-20260525.md).

50 changes: 45 additions & 5 deletions docs/evidence/h2-output-cloud-geometry-20260525.md
@@ -1,7 +1,7 @@
# H2 Output-Cloud Geometry Cache Review

> Date: 2026-05-25
> Status: candidate complementary signal / order-control scout passed / shared-position seed-stable / no admitted row / no 512/512 rerun selected
> Status: candidate complementary signal / order-control scout passed / shared-position seed-stable / cross-cache transfer strong / no admitted row / no 512/512 rerun selected

## Question

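@@ -12,8 +12,11 @@ seed-to-output distance membership signal?
`workspaces/black-box/runs/h2-response-strength-512-20260501-r1/response-cache.npz`.
It then released only one bounded `256 / 256` shared-position order-control scout to address
the class-ordered seed-offset caveat, followed by one seed `177` stability scout with the same boundary
to judge whether the strong post-order-control signal is only a single-seed effect. No assets were downloaded, and
the same route was not extended with KDE, shadow-density, repeat-count, or feature sweeps.
to judge whether the strong post-order-control signal is only a single-seed effect. Finally, it ran exactly one
CPU-only existing-cache transfer review to check whether the output-cloud logistic scorer transfers between
the seed `176` and seed `177` shared-position response caches.
No assets were downloaded, and the same route was not extended with KDE, shadow-density, repeat-count, or feature
sweeps.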

## Contract

@@ -202,9 +205,43 @@ supports the narrower conclusion that output-cloud geometry is a stable H2
mechanism candidate after seed-offset control. It still does not create a
second public asset, a product contract, or an admitted row.

## Cross-Cache Transfer Review

Script:
`scripts/review_h2_output_cloud_transfer.py`

Primary result:
`workspaces/black-box/artifacts/h2-output-cloud-transfer-shared-position-256-20260525.json`

The review reads only the two existing `256 / 256` shared-position response caches:

| Cache | Path |
| --- | --- |
| Seed `176` | `workspaces/black-box/runs/h2-response-strength-256-shared-position-20260525-r1/response-cache.npz` |
| Seed `177` | `workspaces/black-box/runs/h2-response-strength-256-shared-position-seed177-20260525-r1/response-cache.npz` |

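The primary transfer uses repeated stratified folds: a logistic scorer is trained on the source cache's train fold
and then scores only the corresponding held-out sample identities in the target cache.
This keeps the all-train/all-test diagnostic over the sample identities shared by both caches from being
mistaken for the main conclusion. Same-sample full transfer is written only to the JSON's diagnostic block.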
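The fold-disjoint protocol can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `scripts/review_h2_output_cloud_transfer.py`: it assumes row `i` of the source and target feature lists describes the same sample identity, uses a hand-rolled gradient-descent logistic regression without feature standardisation, and averages each identity's held-out logits across repeats. The function names (`stratified_folds`, `fit_logistic`, `fold_disjoint_transfer`) and all hyperparameters are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank AUC: P(member score > non-member score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Split indices into k folds with members and non-members spread evenly."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_logistic(X, y, lr=0.1, epochs=300, l2=1e-3) -> Tuple[List[float], float]:
    """Batch gradient descent on L2-regularised log-loss (hypothetical settings)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def fold_disjoint_transfer(X_src, X_tgt, labels, k=4, repeats=2, seed=0) -> List[float]:
    """Fit on source rows outside the held-out fold, then score only the target rows
    of that fold, so no identity is scored by a model trained on that identity."""
    rng = random.Random(seed)
    per_id: Dict[int, List[float]] = {i: [] for i in range(len(labels))}
    for _ in range(repeats):
        for held_out in stratified_folds(labels, k, rng):
            held = set(held_out)
            train = [i for i in range(len(labels)) if i not in held]
            w, b = fit_logistic([X_src[i] for i in train], [labels[i] for i in train])
            for i in held_out:
                per_id[i].append(b + sum(wj * xj for wj, xj in zip(w, X_tgt[i])))
    # Each identity is held out exactly once per repeat; average its logits.
    return [sum(per_id[i]) / len(per_id[i]) for i in range(len(labels))]
```

Swapping `X_src` and `X_tgt` gives the reverse direction. The same-sample diagnostic would instead fit on all source rows and score all target rows, which lets the scorer see every identity it later scores; that is why it stays out of the headline.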

| Direction | AUC | ASR | TPR@1%FPR | TPR@0.1%FPR |
| --- | ---: | ---: | ---: | ---: |
| seed `176` -> seed `177` | `0.948990` | `0.884766` | `0.375000` | `0.058594` |
| seed `177` -> seed `176` | `0.970520` | `0.935547` | `0.390625` | `0.074219` |
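To make the columns concrete, here is a minimal sketch of how they can be computed from per-sample scores (higher means more likely member). It assumes ASR is the best accuracy over score thresholds, a common membership-inference convention that may differ from the repository's own definition; the function names are illustrative. The reported TPR values are multiples of 1/256 and the ASR values multiples of 1/512 (for example `0.058594` ≈ 15/256 and `0.884766` ≈ 453/512), consistent with 256 members and 256 non-members. At that size, `TPR@0.1%FPR` admits zero false positives and `TPR@1%FPR` admits at most two.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random member outscores a random non-member (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Best TPR over thresholds "member iff score >= t" whose FPR stays <= max_fpr."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best


def best_threshold_accuracy(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed ASR definition: max accuracy of "member iff score >= t" over all t."""
    candidates = sorted(set(scores)) + [float("inf")]
    return max(
        sum(1 for s, y in zip(scores, labels) if (s >= t) == (y == 1)) / len(labels)
        for t in candidates
    )


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(roc_auc(scores, labels))                  # 0.75
print(tpr_at_fpr(scores, labels, 0.0))          # 0.5
print(best_threshold_accuracy(scores, labels))  # 0.75
```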

Decision gate: `mean_auc = 0.959755`, `min_tpr_at_1pct_fpr = 0.375000`,
`min_tpr_at_0_1pct_fpr = 0.058594`, verdict =
`cross_cache_transfer_strong_candidate`.
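The gate values follow from the two direction rows: `mean_auc` matches the arithmetic mean of the two AUCs, (0.948990 + 0.970520) / 2 = 0.959755, and each `min_*` field is the weaker direction's tail TPR. A minimal sketch of that aggregation follows. The pass thresholds `MIN_MEAN_AUC` and `MIN_STRICT_TAIL_TPR` are hypothetical placeholders, because the review's actual cutoffs are not stated here, and the fallback verdict string is likewise illustrative.

```python
from statistics import mean
from typing import Dict

# Per-direction held-out transfer metrics, copied from the table above.
DIRECTIONS: Dict[str, Dict[str, float]] = {
    "seed176->seed177": {"auc": 0.948990, "tpr_1pct": 0.375000, "tpr_0_1pct": 0.058594},
    "seed177->seed176": {"auc": 0.970520, "tpr_1pct": 0.390625, "tpr_0_1pct": 0.074219},
}

# HYPOTHETICAL cutoffs for illustration only; not taken from the review script.
MIN_MEAN_AUC = 0.90
MIN_STRICT_TAIL_TPR = 0.0  # the 0.1% FPR tail must be strictly above this


def decision_gate(rows: Dict[str, Dict[str, float]]) -> Dict[str, object]:
    """Aggregate both transfer directions into one conservative gate."""
    mean_auc = mean(r["auc"] for r in rows.values())
    min_tpr_1 = min(r["tpr_1pct"] for r in rows.values())
    min_tpr_01 = min(r["tpr_0_1pct"] for r in rows.values())
    strong = mean_auc >= MIN_MEAN_AUC and min_tpr_01 > MIN_STRICT_TAIL_TPR
    return {
        "mean_auc": mean_auc,
        "min_tpr_at_1pct_fpr": min_tpr_1,
        "min_tpr_at_0_1pct_fpr": min_tpr_01,
        "verdict": "cross_cache_transfer_strong_candidate" if strong else "not_strong (hypothetical label)",
    }


print(decision_gate(DIRECTIONS))
```

Taking the minimum across directions, rather than the mean, makes the tail fields report the weaker transfer direction, so one strong direction cannot mask a weak one.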

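Interpretation: this is neither a new asset nor a product consumer contract, but it is stronger than the single-cache review:
output-cloud geometry remains transferable between two independently generated shared-position response caches,
and the primary result does not depend on the same-sample full train/test diagnostic. This conclusion only supports
a more stable Research-side mechanism candidate; it does not release new GPU work, large downloads, Platform/Runtime schema, or a bundle row.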

## Decision

`candidate complementary signal / order-control scout passed / seed-stable / no admitted row`.
`candidate complementary signal / order-control scout passed / seed-stable / cross-cache transfer strong / no admitted row`.

It is kept as a strong Research-side candidate because it meets the following valuable conditions:

@@ -213,14 +250,17 @@ second public asset, a product contract, or an admitted row.
- It passed the seed-177 stability and label-shuffle sanity checks.
- It did not collapse under seed-offset control in the `256 / 256` shared-position order-control scout.
- It keeps strong AUC and non-zero strict-tail recovery in the shared-position seed `177` scout.
- It passed the fold-disjoint transfer review between the seed `176` and seed `177`
  shared-position response caches, with primary result `mean_auc = 0.959755` and non-zero strict tails.
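The label-shuffle sanity check mentioned in the bullets above can be illustrated with a small sketch: permute the membership labels while keeping the feature rows fixed, rerun the same held-out scoring, and expect AUC near 0.5 (the reported shuffles land between 0.464066 and 0.507595). To stay short, the sketch swaps in a nearest-centroid scorer in place of the review's logistic scorer; the scorer, the synthetic features, and the function names are all hypothetical.

```python
import math
import random
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank AUC: P(member score > non-member score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows: List[List[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def heldout_centroid_scores(X, labels, k=5, seed=0) -> List[float]:
    """Stand-in scorer: each held-out row is scored as (distance to the non-member
    centroid) minus (distance to the member centroid), with both centroids fitted
    only on the other k-1 folds."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    scores = [0.0] * len(labels)
    for f in range(k):
        held = idx[f::k]
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        mu_in = centroid([X[i] for i in train if labels[i] == 1])
        mu_out = centroid([X[i] for i in train if labels[i] == 0])
        for i in held:
            scores[i] = math.dist(X[i], mu_out) - math.dist(X[i], mu_in)
    return scores


def label_shuffle_auc(X, labels, seed=0) -> float:
    """Permute labels, keep features fixed, and rerun the same held-out pipeline."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return roc_auc(heldout_centroid_scores(X, shuffled, seed=seed), shuffled)


rng = random.Random(3)
labels = [1] * 200 + [0] * 200
X = [[rng.gauss(1.0 if y else -1.0, 1.0), rng.gauss(0.0, 1.0)] for y in labels]
print("real labels:", round(roc_auc(heldout_centroid_scores(X, labels), labels), 3))
print("shuffled labels:", round(label_shuffle_auc(X, labels, seed=7), 3))
```

The shuffle reruns the full held-out pipeline rather than only permuting labels at evaluation time, so any leakage in the pipeline itself would surface as a shuffled AUC well away from 0.5.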

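However, for now:

- Do not upgrade to a Platform/Runtime admitted bundle.
- Do not add product schema, a Runtime runner, a UI type, or a bundle row.
- Do not run KDE, shadow-density, repeat-count, feature-family, or fusion sweeps on the same cache.
- Do not treat the same-sample all-train/all-test transfer diagnostic as a headline result.
- Do not release a full `512 / 512` shared-position GPU rerun or a large download; the current `256 / 256`
  order-control has already answered the route-changing caveat.
  order-control and cross-cache transfer have already answered the route-changing caveat.

The next re-evaluation should not be a same-cache feature sweep or a `512 / 512` backfill run just to make the table look complete.
Only when the mechanism line needs formal promotion, a second public asset is found, or an independent consumer contract must be established should we redefine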
2 changes: 1 addition & 1 deletion docs/evidence/reproduction-status.md
@@ -32,7 +32,7 @@ Smoke tests and dry runs are engineering validation, not benchmark claims.
| Track | Status | Notes |
| --- | --- | --- |
| Black-box `recon` | `evidence-ready` | Strongest black-box method and admitted non-CLiD product row. Public data limits strict paper-aligned claims. The bounded public-100 step30 rerun plus unified artifact summary yields the promoted coherent packet: `AUC = 0.837`, `ASR = 0.74`, `TPR@1%FPR = 0.22`, `TPR@0.1%FPR = 0.11`. See [non-clid-black-box-reselection.md](non-clid-black-box-reselection.md), [recon-product-validation-contract.md](recon-product-validation-contract.md), [recon-product-validation-result.md](recon-product-validation-result.md), and [../product-bridge/recon-product-validation-handoff.md](../product-bridge/recon-product-validation-handoff.md). |
| Black-box `H2 output-cloud geometry` | `hold-candidate` | Strong Research-side output-output geometry signal on H2 response caches, but not an admitted Platform/Runtime row. Existing `512 / 512` cache review gives `AUC = 0.961529`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`; seed `177` is stable and label shuffle is random-level. The `256 / 256` shared-position order-control scout preserves the signal (`AUC = 0.967819`, `TPR@1%FPR = 0.410156`, `TPR@0.1%FPR = 0.132812`) with random-level label shuffle (`AUC = 0.464066`), so class-ordered seed offset is not a sufficient explanation. The same controlled boundary at seed `177` remains strong (`AUC = 0.956192`, `TPR@1%FPR = 0.285156`, `TPR@0.1%FPR = 0.109375`) with random-level label shuffle (`AUC = 0.484070`), so the controlled signal is not single-seed. Do not promote, add schema/runner/UI/bundle rows, run same-cache feature sweeps, or schedule a full `512 / 512` rerun by default. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md). |
| Black-box `H2 output-cloud geometry` | `hold-candidate` | Strong Research-side output-output geometry signal on H2 response caches, but not an admitted Platform/Runtime row. Existing `512 / 512` cache review gives `AUC = 0.961529`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`; seed `177` is stable and label shuffle is random-level. The `256 / 256` shared-position order-control scout preserves the signal (`AUC = 0.967819`, `TPR@1%FPR = 0.410156`, `TPR@0.1%FPR = 0.132812`) with random-level label shuffle (`AUC = 0.464066`), so class-ordered seed offset is not a sufficient explanation. The same controlled boundary at seed `177` remains strong (`AUC = 0.956192`, `TPR@1%FPR = 0.285156`, `TPR@0.1%FPR = 0.109375`) with random-level label shuffle (`AUC = 0.484070`), so the controlled signal is not single-seed. A CPU-only fold-disjoint transfer review across the two shared-position caches is also strong: seed `176` -> `177` gives `AUC = 0.948990`, `TPR@1%FPR = 0.375000`, `TPR@0.1%FPR = 0.058594`, and seed `177` -> `176` gives `AUC = 0.970520`, `TPR@1%FPR = 0.390625`, `TPR@0.1%FPR = 0.074219`; same-sample all-train/all-test transfer is diagnostic only. Do not promote, add schema/runner/UI/bundle rows, run same-cache feature sweeps, or schedule a full `512 / 512` rerun by default. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md). |
| Black-box `CLiD` | `hold-candidate` | Selected as a bounded black-box lane after H2 SD/CelebA text-to-image transfer was protocol-blocked. The official CPU `inter_output/*` replay is strong (`AUC = 0.961277`, `TPR@1%FPR = 0.675470`, `ASR = 0.891957`) and now has a machine-readable candidate-only card, but row identity remains blocked because the public score rows are numeric-only and the 2026-05-15 authenticated HF `mia_COCO.zip` `HEAD`/`Range` recheck still returned `403`. Earlier local prompt-conditioned packets were strong and repeat-stable, but prompt-neutral perturbation collapses the signal, swapped-prompt control is degraded, within-split prompt shuffle is weak and seed-sensitive, prompt-text-only review is moderate AUC but weak strict-tail, and control attribution shows auxiliary-feature instability under prompt controls. Current evidence supports a prompt-conditioned diagnostic claim only, not admitted general black-box evidence. No next CLiD GPU task is selected. See [../product-bridge/clid-candidate-evidence-card.md](../product-bridge/clid-candidate-evidence-card.md), [clid-official-inter-output-replay-20260515.md](clid-official-inter-output-replay-20260515.md), [clid-identity-manifest-gate-20260515.md](clid-identity-manifest-gate-20260515.md), [black-box-next-lane-selection.md](black-box-next-lane-selection.md), [clid-bridge-contract.md](clid-bridge-contract.md), [clid-score-schema-gate.md](clid-score-schema-gate.md), [clid-tiny-score-bridge.md](clid-tiny-score-bridge.md), [clid-100-score-packet.md](clid-100-score-packet.md), [clid-candidate-integrity-review.md](clid-candidate-integrity-review.md), [clid-repeat-stability.md](clid-repeat-stability.md), [clid-prompt-perturbation.md](clid-prompt-perturbation.md), [clid-prompt-conditioning-boundary.md](clid-prompt-conditioning-boundary.md), [clid-swapped-prompt-control.md](clid-swapped-prompt-control.md), [clid-within-split-shuffle-control.md](clid-within-split-shuffle-control.md), [clid-prompt-text-only-review.md](clid-prompt-text-only-review.md), and [clid-control-attribution.md](clid-control-attribution.md). |
| Black-box `variation` | `code-ready` | API-only support method; needs real query data for stronger claims. |
| Feature-packet consumer lane | `deferred-candidate` | 2026-05-25 consumer verdict keeps the gray-box feature-packet lane out of Platform/Runtime. Tracing the Roots remains positive Research evidence (`AUC = 0.815826`, `TPR@1%FPR = 0.134000`), but live narrow public-surface recheck found no second non-source-equivalent public feature-packet and no raw checkpoint/sample/regeneration assets. Do not add feature-packet schema, bundle export, validators, tests, Platform UI type, Runtime runner, GPU task, or download from this singleton. See [feature-packet-channel-consumer-verdict-20260525.md](feature-packet-channel-consumer-verdict-20260525.md) and [../product-bridge/feature-packet-lane.md](../product-bridge/feature-packet-lane.md). |
10 changes: 8 additions & 2 deletions docs/evidence/workspace-evidence-index.md
@@ -7,7 +7,8 @@ This index separates current track state from archived research history.
Latest Research update:
[h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md)
records a metric verdict on the H2 response-strength cache plus bounded
`256 / 256` shared-position order-control and seed-stability scouts.
`256 / 256` shared-position order-control, seed-stability, and cross-cache
transfer scouts.
The output-output geometry scorer is a strong Research-side candidate
(`AUC = 0.961529`, `TPR@1%FPR = 0.333984`,
`TPR@0.1%FPR = 0.117188`) and is stable under seed `177`
@@ -18,10 +19,15 @@ The output-output geometry scorer is a strong Research-side candidate
The same controlled boundary at seed `177` remains strong (`AUC = 0.956192`,
`TPR@1%FPR = 0.285156`, `TPR@0.1%FPR = 0.109375`) with random-level label
shuffle (`AUC = 0.484070`).
The CPU-only fold-disjoint transfer review across the two shared-position
caches is also strong: seed `176` -> seed `177` gives `AUC = 0.948990`,
`TPR@1%FPR = 0.375000`, `TPR@0.1%FPR = 0.058594`; seed `177` -> seed `176`
gives `AUC = 0.970520`, `TPR@1%FPR = 0.390625`,
`TPR@0.1%FPR = 0.074219`; `mean_auc = 0.959755`.
It is not admitted because this remains a Research-side H2 response-cache
geometry candidate, not a second public asset or Platform/Runtime contract.
Decision: `candidate complementary signal / order-control scout passed / seed-stable /
no admitted row / no download / no 512/512 rerun selected`.
cross-cache transfer strong / no admitted row / no download / no 512/512 rerun selected`.

Previous Research update:
[feature-packet-channel-consumer-verdict-20260525.md](feature-packet-channel-consumer-verdict-20260525.md)