1 change: 1 addition & 0 deletions .gitignore
@@ -152,6 +152,7 @@ workspaces/**/artifacts/**
!workspaces/black-box/artifacts/h2-output-cloud-geometry-class-ordered-subset-256-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-geometry-class-ordered-subset-256-label-shuffle-20260525.json
!workspaces/black-box/artifacts/h2-output-cloud-transfer-shared-position-256-20260525.json
!workspaces/black-box/artifacts/h2-img2img-output-cloud-portability-20260525.json
!workspaces/black-box/artifacts/beans-lora-member-denoising-loss-scout-20260513.json
!workspaces/black-box/artifacts/clid-image-identity-boundary-20260511.json
!workspaces/black-box/artifacts/midfreq-same-noise-residual-cache-audit-20260512.json
17 changes: 15 additions & 2 deletions ROADMAP.md
@@ -43,13 +43,26 @@ same-sample all-train/all-test diagnostic; the decision gate is
`mean_auc = 0.959755`, `min_tpr_at_1pct_fpr = 0.375000`,
`min_tpr_at_0_1pct_fpr = 0.058594`.

The img2img portability review completed the same day reads only the existing
SD/CelebA img2img response caches; it does not generate new responses, download
models, or release GPU work. The result does not extend H2: on the admission
`25 / 25` cache, output-cloud logistic reaches only `AUC = 0.7888`,
`TPR@1%FPR = 0.0`, `TPR@0.1%FPR = 0.0`, which is `AUC -0.0880` below the best
simple-distance scorer; the stability `10 / 10` cache reaches `AUC = 0.9600`
but is still below simple-distance `AUC = 0.9900`. The decision gate is
`img2img_output_cloud_weak_or_unstable`, so this review only restricts
output-cloud geometry to an H2 response-strength Research-side diagnostic. It
does not open an img2img Runtime runner, Platform row,
strength/seed/repeat/feature sweep, or input-distance fusion.

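This result can serve only as a strong Research-side candidate; the next step is
neither a same-cache sweep nor a full `512 / 512` shared-position run just to
fill in the table. Reopen only on the basis of formal mechanism promotion, a
second public asset, or an independent consumption contract.
The current slots remain:
`active_gpu_question = none`, `next_gpu_candidate = none`,
`CPU sidecar = none selected after H2 output-cloud cross-cache transfer review`.
`CPU sidecar = none selected after H2 img2img output-cloud portability review`.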
See
[docs/evidence/h2-output-cloud-geometry-20260525.md](docs/evidence/h2-output-cloud-geometry-20260525.md).
[docs/evidence/h2-output-cloud-geometry-20260525.md](docs/evidence/h2-output-cloud-geometry-20260525.md)
and
[docs/evidence/h2-img2img-output-cloud-portability-20260525.md](docs/evidence/h2-img2img-output-cloud-portability-20260525.md).

## 2026-05-25 Feature-Packet Channel Consumer Verdict

86 changes: 86 additions & 0 deletions docs/evidence/h2-img2img-output-cloud-portability-20260525.md
@@ -0,0 +1,86 @@
# H2 Img2img Output-Cloud Portability Review

> Date: 2026-05-25
> Status: weak or unstable / not distinct from simple distance / no admitted row / no Runtime runner

## Question

H2 output-cloud geometry is strong on the DDPM/CIFAR10 response-strength
cache. This review asks a narrower portability question: does the same
output-output geometry carry useful signal on the existing SD/CelebA
image-to-image response caches, without using the known stronger
input-to-output simple distance?

This is a CPU-only existing-cache review. It does not generate new responses,
download models, or release GPU work.

## Contract

Script:
`scripts/review_h2_img2img_output_cloud_portability.py`

Output:
`workspaces/black-box/artifacts/h2-img2img-output-cloud-portability-20260525.json`

Inputs:

| Packet | Cache | Samples | Members | Nonmembers | Strength | Repeats |
| --- | --- | ---: | ---: | ---: | ---: | ---: |
| Admission | `workspaces/black-box/runs/h2-img2img-simple-distance-admission-20260501-r1/response-cache.npz` | `50` | `25` | `25` | `0.75` | `2` |
| Stability | `workspaces/black-box/runs/h2-img2img-simple-distance-stability-20260501-r1/response-cache.npz` | `20` | `10` | `10` | `0.75` | `2` |

Features use only output-output geometry:

- within-strength repeat-pair RMSE
- response-cloud PCA trace
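
The two features above can be sketched as follows. This is a minimal illustration, not the code in `scripts/review_h2_img2img_output_cloud_portability.py`: the function name `output_cloud_features` and the assumed array layout `(n_samples, n_repeats, ...)` are hypothetical, and the actual cache format and normalization may differ.

```python
import numpy as np


def output_cloud_features(responses: np.ndarray) -> np.ndarray:
    """Return [repeat-pair RMSE, PCA trace] for each sample.

    responses: assumed shape (n_samples, n_repeats, ...), holding the
    generated outputs for a single strength. Outputs are compared only
    with each other; the query image is never used.
    """
    n_samples, n_repeats = responses.shape[:2]
    flat = responses.reshape(n_samples, n_repeats, -1).astype(np.float64)

    features = np.zeros((n_samples, 2))
    for i in range(n_samples):
        cloud = flat[i]  # (n_repeats, n_dims)
        # Mean RMSE over all unordered pairs of repeats.
        pair_rmse = [
            np.sqrt(np.mean((cloud[a] - cloud[b]) ** 2))
            for a in range(n_repeats)
            for b in range(a + 1, n_repeats)
        ]
        features[i, 0] = np.mean(pair_rmse)
        # PCA trace = total variance = sum of the covariance eigenvalues,
        # computed directly without an eigendecomposition.
        centered = cloud - cloud.mean(axis=0, keepdims=True)
        features[i, 1] = np.sum(centered ** 2) / max(n_repeats - 1, 1)
    return features
```

Computing the trace as a sum of squared deviations avoids forming a covariance matrix over image-sized vectors.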

The raw feature builder also considers duplicate mean/slope/std and PCA
top-share views, but this single-strength packet makes those columns duplicate
or constant. The review script prunes degenerate columns and records the
dropped feature names in the JSON artifact.
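
The pruning step could look like the sketch below. The helper name `prune_degenerate_columns`, the tolerance, and the column names in the usage lines are assumptions for illustration; the JSON artifact remains the authoritative record of which columns the review script dropped.

```python
import numpy as np


def prune_degenerate_columns(matrix, names, tol=1e-12):
    """Drop constant columns and duplicates of an earlier kept column.

    Returns (pruned_matrix, kept_names, dropped_names).
    """
    matrix = np.asarray(matrix, dtype=np.float64)
    kept, dropped = [], []
    for j, name in enumerate(names):
        column = matrix[:, j]
        # A column with no spread carries no ranking information.
        is_constant = (column.max() - column.min()) <= tol
        # A column equal to an already kept one adds nothing new.
        is_duplicate = any(
            np.allclose(column, matrix[:, k], rtol=0.0, atol=tol) for k in kept
        )
        if is_constant or is_duplicate:
            dropped.append(name)
        else:
            kept.append(j)
    return matrix[:, kept], [names[j] for j in kept], dropped


# Hypothetical column names: with one strength, a mean over strengths
# repeats the base value and a slope over strengths is constant.
columns = ["pair_rmse", "pair_rmse_mean", "pair_rmse_slope", "pca_trace"]
values = np.array([
    [1.0, 1.0, 0.0, 2.0],
    [2.0, 2.0, 0.0, 8.0],
    [3.0, 3.0, 0.0, 18.0],
])
pruned, kept_names, dropped_names = prune_degenerate_columns(values, columns)
```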

The review intentionally excludes input-to-output distance so it cannot
silently become the already-known img2img simple-distance scorer.

## Result

| Packet | Output-cloud logistic AUC | TPR@1%FPR | TPR@0.1%FPR | Best simple-distance AUC | Delta vs simple distance |
| --- | ---: | ---: | ---: | ---: | ---: |
| Admission `25 / 25` | `0.7888` | `0.0` | `0.0` | `0.8768` | `-0.0880` |
| Stability `10 / 10` | `0.9600` | `0.8` | `0.8` | `0.9900` | `-0.0300` |
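
For reference, the metric columns can be computed from per-sample scores roughly as below. This is a sketch under one common convention (predict "member" when the score is strictly above a threshold, and pick the best threshold whose false-positive rate stays within the budget). The function names are hypothetical and the review script may use a different tie or interpolation rule.

```python
import numpy as np


def auc(member_scores, nonmember_scores):
    """AUC as the probability that a member outscores a nonmember."""
    m = np.asarray(member_scores, dtype=np.float64)[:, None]
    n = np.asarray(nonmember_scores, dtype=np.float64)[None, :]
    # Ties count as half a win.
    return float(np.mean((m > n) + 0.5 * (m == n)))


def tpr_at_fpr(member_scores, nonmember_scores, max_fpr):
    """Highest TPR over thresholds whose FPR does not exceed max_fpr."""
    m = np.asarray(member_scores, dtype=np.float64)
    n = np.sort(np.asarray(nonmember_scores, dtype=np.float64))[::-1]
    allowed_false_positives = int(np.floor(max_fpr * n.size))
    if allowed_false_positives >= n.size:
        return 1.0
    # The threshold is the highest-ranked nonmember that must be rejected.
    threshold = n[allowed_false_positives]
    return float(np.mean(m > threshold))
```

Under this convention, a packet with `25` or `10` nonmembers allows zero false positives at both the 1% and 0.1% budgets, so the two TPR columns coincide, which matches the table above.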

Decision gate:

| Field | Value |
| --- | ---: |
| `min_auc` | `0.7888` |
| `min_tpr_at_0_1pct_fpr` | `0.0` |
| `max_auc_delta_vs_best_simple_distance` | `-0.0300` |
| `verdict` | `img2img_output_cloud_weak_or_unstable` |

The admission packet is the blocking result: output-cloud AUC stays below
`0.8`, strict-tail recovery is zero, and it is materially weaker than the
existing simple-distance comparator.
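
The gate fields can be reproduced from the two result rows with a small aggregation like the one below. The field names and verdict string come from the table above; the `PacketResult` type, the exact weak-or-unstable rule, the `0.8` AUC floor as a hard threshold, and the alternative label are assumptions of this sketch rather than the script's confirmed logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketResult:
    name: str
    output_cloud_auc: float
    tpr_at_0_1pct_fpr: float
    best_simple_distance_auc: float


def decision_gate(packets, auc_floor=0.8):
    """Aggregate per-packet results into the gate fields.

    auc_floor and the non-weak label are assumptions of this sketch.
    """
    min_auc = min(p.output_cloud_auc for p in packets)
    min_tpr = min(p.tpr_at_0_1pct_fpr for p in packets)
    max_delta = max(
        p.output_cloud_auc - p.best_simple_distance_auc for p in packets
    )
    # Weak if any packet misses the AUC floor, has no strict-tail recovery,
    # or if output-cloud never beats the simple-distance comparator.
    weak = min_auc < auc_floor or min_tpr <= 0.0 or max_delta < 0.0
    return {
        "min_auc": round(min_auc, 4),
        "min_tpr_at_0_1pct_fpr": round(min_tpr, 4),
        "max_auc_delta_vs_best_simple_distance": round(max_delta, 4),
        "verdict": (
            "img2img_output_cloud_weak_or_unstable"
            if weak
            else "hypothetical_not_weak_label"
        ),
    }


# Values taken from the result table above.
gate = decision_gate([
    PacketResult("admission", 0.7888, 0.0, 0.8768),
    PacketResult("stability", 0.9600, 0.8, 0.9900),
])
```

Taking the maximum delta is the generous choice: even the best packet is `-0.0300` behind simple distance.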

## Decision

`weak or unstable / not distinct from simple distance / no admitted row`.

This narrows, rather than expands, H2 output-cloud geometry:

- It remains a strong Research-side candidate on the DDPM/CIFAR10
response-strength cache.
- It does not port cleanly to the existing SD/CelebA img2img caches.
- It does not justify a Runtime runner, Platform schema, admitted bundle row,
image-to-image product claim, or same-contract sweep.

Do not expand this into strength, seed, repeat-count, feature-family,
input-distance fusion, or GPU response-generation matrices. Reopen only if a
second public asset, independent consumption contract, or formal mechanism
promotion changes the decision value.

## Platform and Runtime Impact

Expose only a watch-only boundary metadata row. The admitted
Platform/Runtime bundle remains the existing five rows: `recon`,
`PIA baseline`, `PIA defended`, `GSA`, and `DPDM W-1`.
2 changes: 1 addition & 1 deletion docs/evidence/reproduction-status.md
@@ -32,7 +32,7 @@ Smoke tests and dry runs are engineering validation, not benchmark claims.
| Track | Status | Notes |
| --- | --- | --- |
| Black-box `recon` | `evidence-ready` | Strongest black-box method and admitted non-CLiD product row. Public data limits strict paper-aligned claims. The bounded public-100 step30 rerun plus unified artifact summary yields the promoted coherent packet: `AUC = 0.837`, `ASR = 0.74`, `TPR@1%FPR = 0.22`, `TPR@0.1%FPR = 0.11`. See [non-clid-black-box-reselection.md](non-clid-black-box-reselection.md), [recon-product-validation-contract.md](recon-product-validation-contract.md), [recon-product-validation-result.md](recon-product-validation-result.md), and [../product-bridge/recon-product-validation-handoff.md](../product-bridge/recon-product-validation-handoff.md). |
| Black-box `H2 output-cloud geometry` | `hold-candidate` | Strong Research-side output-output geometry signal on H2 response caches, but not an admitted Platform/Runtime row. Existing `512 / 512` cache review gives `AUC = 0.961529`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`; seed `177` is stable and label shuffle is random-level. The `256 / 256` shared-position order-control scout preserves the signal (`AUC = 0.967819`, `TPR@1%FPR = 0.410156`, `TPR@0.1%FPR = 0.132812`) with random-level label shuffle (`AUC = 0.464066`), so class-ordered seed offset is not a sufficient explanation. The same controlled boundary at seed `177` remains strong (`AUC = 0.956192`, `TPR@1%FPR = 0.285156`, `TPR@0.1%FPR = 0.109375`) with random-level label shuffle (`AUC = 0.484070`), so the controlled signal is not single-seed. A CPU-only fold-disjoint transfer review across the two shared-position caches is also strong: seed `176` -> `177` gives `AUC = 0.948990`, `TPR@1%FPR = 0.375000`, `TPR@0.1%FPR = 0.058594`, and seed `177` -> `176` gives `AUC = 0.970520`, `TPR@1%FPR = 0.390625`, `TPR@0.1%FPR = 0.074219`; same-sample all-train/all-test transfer is diagnostic only. Do not promote, add schema/runner/UI/bundle rows, run same-cache feature sweeps, or schedule a full `512 / 512` rerun by default. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md). |
| Black-box `H2 output-cloud geometry` | `hold-candidate` | Strong Research-side output-output geometry signal on H2 response caches, but not an admitted Platform/Runtime row. Existing `512 / 512` cache review gives `AUC = 0.961529`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`; seed `177` is stable and label shuffle is random-level. The `256 / 256` shared-position order-control scout preserves the signal (`AUC = 0.967819`, `TPR@1%FPR = 0.410156`, `TPR@0.1%FPR = 0.132812`) with random-level label shuffle (`AUC = 0.464066`), so class-ordered seed offset is not a sufficient explanation. The same controlled boundary at seed `177` remains strong (`AUC = 0.956192`, `TPR@1%FPR = 0.285156`, `TPR@0.1%FPR = 0.109375`) with random-level label shuffle (`AUC = 0.484070`), so the controlled signal is not single-seed. A CPU-only fold-disjoint transfer review across the two shared-position caches is also strong: seed `176` -> `177` gives `AUC = 0.948990`, `TPR@1%FPR = 0.375000`, `TPR@0.1%FPR = 0.058594`, and seed `177` -> `176` gives `AUC = 0.970520`, `TPR@1%FPR = 0.390625`, `TPR@0.1%FPR = 0.074219`; same-sample all-train/all-test transfer is diagnostic only. The SD/CelebA img2img portability check is weak or unstable on the admission cache (`AUC = 0.7888`, zero strict-tail recovery) and not distinct from simple distance (`AUC -0.0880` on admission, `-0.0300` on stability), so it narrows output-cloud geometry to a Research-side H2 response-strength diagnostic. Do not promote, add schema/runner/UI/bundle rows, run same-cache or img2img feature sweeps, or schedule a full `512 / 512` rerun by default. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md) and [h2-img2img-output-cloud-portability-20260525.md](h2-img2img-output-cloud-portability-20260525.md). |
| Black-box `CLiD` | `hold-candidate` | Selected as a bounded black-box lane after H2 SD/CelebA text-to-image transfer was protocol-blocked. The official CPU `inter_output/*` replay is strong (`AUC = 0.961277`, `TPR@1%FPR = 0.675470`, `ASR = 0.891957`) and now has a machine-readable candidate-only card, but row identity remains blocked because the public score rows are numeric-only and the 2026-05-15 authenticated HF `mia_COCO.zip` `HEAD`/`Range` recheck still returned `403`. Earlier local prompt-conditioned packets were strong and repeat-stable, but prompt-neutral perturbation collapses the signal, swapped-prompt control is degraded, within-split prompt shuffle is weak and seed-sensitive, prompt-text-only review is moderate AUC but weak strict-tail, and control attribution shows auxiliary-feature instability under prompt controls. Current evidence supports a prompt-conditioned diagnostic claim only, not admitted general black-box evidence. No next CLiD GPU task is selected. See [../product-bridge/clid-candidate-evidence-card.md](../product-bridge/clid-candidate-evidence-card.md), [clid-official-inter-output-replay-20260515.md](clid-official-inter-output-replay-20260515.md), [clid-identity-manifest-gate-20260515.md](clid-identity-manifest-gate-20260515.md), [black-box-next-lane-selection.md](black-box-next-lane-selection.md), [clid-bridge-contract.md](clid-bridge-contract.md), [clid-score-schema-gate.md](clid-score-schema-gate.md), [clid-tiny-score-bridge.md](clid-tiny-score-bridge.md), [clid-100-score-packet.md](clid-100-score-packet.md), [clid-candidate-integrity-review.md](clid-candidate-integrity-review.md), [clid-repeat-stability.md](clid-repeat-stability.md), [clid-prompt-perturbation.md](clid-prompt-perturbation.md), [clid-prompt-conditioning-boundary.md](clid-prompt-conditioning-boundary.md), [clid-swapped-prompt-control.md](clid-swapped-prompt-control.md), [clid-within-split-shuffle-control.md](clid-within-split-shuffle-control.md), 
[clid-prompt-text-only-review.md](clid-prompt-text-only-review.md), and [clid-control-attribution.md](clid-control-attribution.md). |
| Black-box `variation` | `code-ready` | API-only support method; needs real query data for stronger claims. |
| Feature-packet consumer lane | `deferred-candidate` | 2026-05-25 consumer verdict keeps the gray-box feature-packet lane out of Platform/Runtime. Tracing the Roots remains positive Research evidence (`AUC = 0.815826`, `TPR@1%FPR = 0.134000`), but live narrow public-surface recheck found no second non-source-equivalent public feature-packet and no raw checkpoint/sample/regeneration assets. Do not add feature-packet schema, bundle export, validators, tests, Platform UI type, Runtime runner, GPU task, or download from this singleton. See [feature-packet-channel-consumer-verdict-20260525.md](feature-packet-channel-consumer-verdict-20260525.md) and [../product-bridge/feature-packet-lane.md](../product-bridge/feature-packet-lane.md). |
10 changes: 10 additions & 0 deletions docs/evidence/workspace-evidence-index.md
@@ -29,6 +29,16 @@ geometry candidate, not a second public asset or Platform/Runtime contract.
Decision: `candidate complementary signal / order-control scout passed / seed-stable /
cross-cache transfer strong / no admitted row / no download / no 512/512 rerun selected`.

Latest portability check:
[h2-img2img-output-cloud-portability-20260525.md](h2-img2img-output-cloud-portability-20260525.md)
records a CPU-only existing-cache review on the SD/CelebA img2img packets.
The admission `25 / 25` cache is weak or unstable for output-cloud geometry
(`AUC = 0.7888`, `TPR@1%FPR = 0.0`, `TPR@0.1%FPR = 0.0`) and underperforms the
existing simple-distance comparator (`AUC -0.0880`). The stability `10 / 10`
cache is positive (`AUC = 0.9600`) but still not distinct from simple distance
(`AUC -0.0300`). Decision: `img2img output-cloud weak-or-unstable /
not distinct from simple distance / no Runtime runner / no Platform row`.

Previous Research update:
[feature-packet-channel-consumer-verdict-20260525.md](feature-packet-channel-consumer-verdict-20260525.md)
records a consumer-boundary verdict for the gray-box feature-packet lane.