Commit cfd2ea5

docs: close rediffuse stl10 bounded scout (#294)
1 parent c4994c6 commit cfd2ea5

6 files changed

Lines changed: 169 additions & 28 deletions

AGENTS.md

Lines changed: 12 additions & 11 deletions
@@ -28,8 +28,8 @@ Do not start from memory or old chat context. Re-anchor on repository files.
2828

2929
## Current Operating State
3030

31-
- Active work: `2026-05-25 ReDiffuse DDPM/STL-10 split/statistics/resource preflight is the latest roadmap operating-system update. The official STL-10 split is exact and public (50k / 50k, SHA256 14a06133f36c74e7d3cb97dbe74385fb42c22335a7cb955fd9944ca503baca52), binds to the local STL-10 unlabeled payload, and does not show obvious low-level image-statistics leakage (linear-probe holdout AUC = 0.4994776215625). The CUDA-capable surface is conda env diffaudit-research, not the default PATH Python. Official ReDiffuse DDPM UNet + GaussianDiffusionTrainer calibration succeeded at batch 4 / 20 steps and batch 64 / 10 steps, with batch 64 peak allocated VRAM 4.419 GB. This is not a membership metric, checkpoint, score packet, or admitted row. active_gpu_question = ReDiffuse DDPM/STL-10 bounded scout; next_gpu_candidate = one bounded STL-10 DDPM pipeline scout only; CPU sidecar = none; split/statistics/resource preflight complete.`
32-
- Next GPU candidate: one bounded ReDiffuse DDPM/STL-10 pipeline scout only
31+
- Active work: `2026-05-25 ReDiffuse DDPM/STL-10 bounded scout is the latest roadmap operating-system update. The official STL-10 split is exact and public, and the local pipeline produced a short-target checkpoint plus 256 / 256 score packet, but fixed-timestep denoising-loss is random-level: AUC = 0.4996337890625, ASR = 0.509765625, TPR@1%FPR = 0.01171875, TPR@0.1%FPR = 0.0. This is scoreable negative evidence, not a second asset, not a full-paper reproduction, and not an admitted row. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after ReDiffuse STL-10 bounded scout weak result.`
32+
- Next GPU candidate: none selected
3333
- Long-horizon control: follow `ROADMAP.md` section
3434
`Long-Horizon Research Task Board(2026-05-13 起)` before reopening any
3535
Research lane. The selected forward path is Lane A external asset acquisition
@@ -553,15 +553,16 @@ Do not start from memory or old chat context. Re-anchor on repository files.
553553
package it as a black-box/conditional response-contract candidate unless a
554554
separate reproducibility-maintenance task explicitly reopens admitted GSA
555555
provenance.
556-
- ReDiffuse is open only for one bounded DDPM/STL-10 scout after the 2026-05-25
557-
split/statistics/resource preflight. The official OpenReview supplement gives
558-
exact DDPM split manifests, the STL-10 `50k / 50k` split binds to local data,
559-
low-level statistics do not separate labels, and the official UNet/trainer path
560-
fits local CUDA at batch 64 calibration scale. This still has no third-party
561-
trained checkpoint, generated response/feature cache, score packet, ROC CSV,
562-
or metric artifact. Do not run full DDPM/DiT/Stable Diffusion training,
563-
`800k`-step jobs, Tiny-ImageNet downloads, Stable Diffusion downloads, or
564-
same-family attack-script sweeps by default.
556+
- ReDiffuse STL-10 is closed after the one bounded DDPM/STL-10 scout. The
557+
official OpenReview split binds cleanly to STL-10, and the local pipeline is
558+
executable, but the `300`-step short target with fixed-timestep denoising-loss
559+
produced random-level membership metrics (`AUC = 0.4996337890625`). This still
560+
has no third-party trained checkpoint, generated response/feature cache,
561+
strong score packet, ROC CSV, or admitted metric artifact. Do not expand into
562+
step-count, seed, timestep, batch-size, subset-size, EMA, scheduler, or
563+
denoising-loss matrices, full DDPM/DiT/Stable Diffusion training, `800k`-step jobs,
564+
Tiny-ImageNet downloads, Stable Diffusion downloads, or same-family
565+
attack-script sweeps by default.
565566
- `YuxinWenRick/diffusion_memorization` is closed as memorization semantic-shift
566567
watch. It has a real `500`-row `sdv1_500_memorized.jsonl` prompt manifest, but
567568
the ground-truth image package is `2.60G`, `CompVis/stable-diffusion-v1-4` is

ROADMAP.md

Lines changed: 32 additions & 4 deletions
Large diffs are not rendered by default.
Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1+
# ReDiffuse STL-10 Bounded Scout
2+
3+
> Date: 2026-05-25
4+
> Status: bounded scout completed / score packet produced / weak denoising-loss signal / no GPU expansion
5+
6+
## Question
7+
8+
After the STL-10 split and resource preflight, can a short ReDiffuse DDPM/STL-10
9+
target produce a scoreable membership packet, and does fixed-timestep
10+
denoising-loss show any immediate membership signal?
11+
12+
This is a bounded scout, not a paper-level ReDiffuse reproduction. It does not
13+
claim a trained STL-10 DDPM benchmark, does not use full-paper training length,
14+
and does not promote Platform/Runtime evidence.
15+
16+
## Frozen Contract
17+
18+
| Field | Value |
19+
| --- | --- |
20+
| Target family | Official ReDiffuse DDPM `UNet` + `GaussianDiffusionTrainer` |
21+
| Dataset | STL-10 unlabeled |
22+
| Split source | `STL10_train_ratio0.5.npz` |
23+
| Train member subset | `1024` samples from official member split |
24+
| Score packet | `256` trained members + `256` official nonmembers |
25+
| Training budget | `300` steps, batch `32` |
26+
| Hard guards | `900s` wall-clock or `7.4 GB` allocated CUDA memory |
27+
| Score definition | `score = -mean_fixed_timestep_denoising_mse` |
28+
| Score timesteps | `50`, `200`, `500`, `800` |
29+
| Seed | `20260525` |
30+
| CUDA env | `conda run -n diffaudit-research python` |
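
The score definition in the contract can be illustrated with a minimal sketch. It assumes the per-timestep denoising MSE values for one sample have already been computed elsewhere; the function name `membership_score` and the dictionary input are hypothetical and are not taken from the run script.

```python
from statistics import fmean
from typing import Mapping

# Score timesteps listed in the Frozen Contract table.
SCORE_TIMESTEPS = (50, 200, 500, 800)


def membership_score(mse_by_timestep: Mapping[int, float]) -> float:
    """Return score = -mean_fixed_timestep_denoising_mse for one sample.

    `mse_by_timestep` maps each fixed timestep to that sample's denoising MSE.
    A lower reconstruction error gives a higher score, so a higher score is
    read as "more member-like".
    """
    missing = [t for t in SCORE_TIMESTEPS if t not in mse_by_timestep]
    if missing:
        raise KeyError(f"missing denoising MSE for timesteps: {missing}")
    return -fmean(mse_by_timestep[t] for t in SCORE_TIMESTEPS)
```

The sign flip only fixes the direction of the score so that thresholding with "score above threshold means member" is meaningful.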
31+
32+
Batch `32` was selected instead of `64` for the scout because live free VRAM
33+
before the run was only about `4.6 GB`, while the earlier batch-64 calibration
34+
had peaked at `4.419 GB`. This was a safety change, not a change in hypothesis.
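
The step budget and hard guards from the contract can be sketched as a single stop check. This is an illustration under stated assumptions, not the logic of the run script: the labels `cuda_memory_guard` and `wall_clock_guard` are hypothetical (only `step_budget` appears in the results), and "GB" is read here as `2**30` bytes, which the source does not specify. In a PyTorch training loop the `allocated_bytes` argument could plausibly come from `torch.cuda.max_memory_allocated()`.

```python
from typing import Optional

# Limits taken from the Frozen Contract table.
MAX_STEPS = 300
MAX_WALL_CLOCK_S = 900.0
MAX_ALLOCATED_GB = 7.4

# Assumption: 1 GB = 2**30 bytes; the source does not state the convention.
BYTES_PER_GB = 2 ** 30


def stop_reason(completed_steps: int, elapsed_s: float,
                allocated_bytes: int) -> Optional[str]:
    """Return why training should stop, or None if it may continue."""
    if allocated_bytes / BYTES_PER_GB >= MAX_ALLOCATED_GB:
        return "cuda_memory_guard"  # hypothetical label
    if elapsed_s >= MAX_WALL_CLOCK_S:
        return "wall_clock_guard"  # hypothetical label
    if completed_steps >= MAX_STEPS:
        return "step_budget"  # label reported by the run
    return None
```

Checking the guards before the step budget means a run that crosses a safety limit on its final step is still recorded as a guard stop.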
35+
36+
## Run Artifacts
37+
38+
Artifacts are stored outside Git under:
39+
`<DOWNLOAD_ROOT>/shared/runs/rediffuse-stl10-bounded-scout-20260525/`.
40+
41+
| Artifact | Size | SHA256 |
42+
| --- | ---: | --- |
43+
| `summary.json` | `2,243` bytes | `02cfe2af7346e7b380608ef748d164031ec89ba67d10822ef9d0badb8c3b209e` |
44+
| `scores.csv` | `28,732` bytes | `c0f396502114986c8c3549f626ce5083fcaaec2fcf8a319aabe509588d8abd0a` |
45+
| `checkpoint-step-final.pt` | `573,473,892` bytes | `006f5247ef2f91a331a097d16bdf1c153f94f3c2f112e1e9b8d3efdd5bb2ec5e` |
46+
| `run_rediffuse_stl10_bounded_scout.py` | `10,945` bytes | `4ee76c37594cb7834459b3059f8c0af58c1eabb5c34323275f3990c8c9933d1f` |
47+
48+
The script was intentionally kept as a run artifact rather than promoted into a
49+
repo CLI. The run answered the decision question without needing a new reusable
50+
tool surface.
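
Because the artifacts live outside Git, the sizes and SHA256 digests in the table are the only integrity record. A minimal verification sketch follows; the helper names `sha256_of` and `verify_artifact` are hypothetical, and the run directory must be supplied by the caller since `<DOWNLOAD_ROOT>` is not resolved here.

```python
import hashlib
from pathlib import Path
from typing import List

# (size in bytes, SHA256) copied from the Run Artifacts table.
EXPECTED = {
    "summary.json": (2243, "02cfe2af7346e7b380608ef748d164031ec89ba67d10822ef9d0badb8c3b209e"),
    "scores.csv": (28732, "c0f396502114986c8c3549f626ce5083fcaaec2fcf8a319aabe509588d8abd0a"),
    "checkpoint-step-final.pt": (573473892, "006f5247ef2f91a331a097d16bdf1c153f94f3c2f112e1e9b8d3efdd5bb2ec5e"),
    "run_rediffuse_stl10_bounded_scout.py": (10945, "4ee76c37594cb7834459b3059f8c0af58c1eabb5c34323275f3990c8c9933d1f"),
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so the large checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_size: int, expected_sha256: str) -> List[str]:
    """Return a list of problems; an empty list means the artifact matches."""
    path = Path(path)
    if not path.is_file():
        return [f"{path.name}: missing"]
    problems = []
    actual_size = path.stat().st_size
    if actual_size != expected_size:
        problems.append(f"{path.name}: size {actual_size} != {expected_size}")
    actual_sha = sha256_of(path)
    if actual_sha != expected_sha256.lower():
        problems.append(f"{path.name}: sha256 {actual_sha} != {expected_sha256}")
    return problems


def verify_run(run_dir) -> List[str]:
    """Check every artifact in EXPECTED under the caller-supplied run directory."""
    problems = []
    for name, (size, sha) in EXPECTED.items():
        problems.extend(verify_artifact(Path(run_dir) / name, size, sha))
    return problems
```

Calling `verify_run(...)` with the local run directory should return an empty list if the stored artifacts are the ones recorded here.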
51+
52+
## Result
53+
54+
| Metric | Value |
55+
| --- | ---: |
56+
| Completed steps | `300` |
57+
| Stop reason | `step_budget` |
58+
| Elapsed time | `92.750s` |
59+
| Peak allocated VRAM | `2.430 GB` |
60+
| First training loss | `1.0019083023` |
61+
| Last training loss | `0.0491645448` |
62+
| Mean last-25 loss | `0.0447090799` |
63+
| Score packet size | `256` members + `256` nonmembers |
64+
| AUC | `0.4996337890625` |
65+
| ASR | `0.509765625` |
66+
| TPR@1%FPR | `0.01171875` |
67+
| TPR@0.1%FPR | `0.0` |
68+
| Nonmember denominator | `256` |
69+
| Minimum nonzero FPR | `0.00390625` |
70+
71+
The target did train in the narrow engineering sense: loss fell quickly over
72+
the `300` steps, and a checkpoint plus row-level score packet were produced.
73+
The membership signal is at chance level under this fixed-timestep
74+
denoising-loss scorer: AUC is `0.4996`, and TPR@1%FPR is `3/256`.
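
The metrics in the table can be recomputed from a row-level score packet with a short sketch like the one below. The exact definitions used by the run script are not shown in this commit, so the following are assumptions: a sample is predicted "member" when its score is at or above a threshold, AUC is the trapezoidal area under the ROC curve, TPR@x%FPR is the best TPR among thresholds with FPR no greater than x, and ASR is the best balanced accuracy over thresholds. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (FPR, TPR)


def roc_points(member_scores: Sequence[float],
               nonmember_scores: Sequence[float]) -> List[Point]:
    """ROC points for the rule 'score >= threshold predicts member'."""
    labeled = [(s, 1) for s in member_scores] + [(s, 0) for s in nonmember_scores]
    labeled.sort(key=lambda pair: pair[0], reverse=True)
    n_pos, n_neg = len(member_scores), len(nonmember_scores)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(labeled):
        j = i
        # Tied scores cross the threshold together.
        while j < len(labeled) and labeled[j][0] == labeled[i][0]:
            if labeled[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points: Sequence[Point]) -> float:
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points: Sequence[Point], max_fpr: float) -> float:
    """Best TPR among thresholds whose FPR does not exceed max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


def asr(points: Sequence[Point]) -> float:
    """Best balanced accuracy over all thresholds (assumed ASR definition)."""
    return max((tpr + 1.0 - fpr) / 2.0 for fpr, tpr in points)
```

With `256` nonmembers, FPR can only take multiples of `1/256 = 0.00390625`, which is the minimum nonzero FPR in the table. A `0.1%` FPR budget therefore admits only thresholds with zero false positives, so TPR@0.1%FPR is resolution-limited at this packet size.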
75+
76+
## Decision
77+
78+
`bounded scout completed / score packet produced / weak denoising-loss signal /
79+
no GPU expansion`.
80+
81+
This scout answers the only question released by the preflight: the ReDiffuse
82+
STL-10 path is executable and scoreable locally, but the short target plus
83+
fixed-timestep denoising-loss does not produce useful membership evidence. This
84+
is a negative-but-useful result, not a reason to launch full-paper training.
85+
86+
Do not expand this into step-count, seed, timestep, batch-size, subset-size,
87+
EMA, scheduler, or denoising-loss matrices by default. Reopen ReDiffuse
88+
STL-10 only if one of these appears:
89+
90+
- a public third-party STL-10 checkpoint or score packet for the official split;
91+
- a clearly different membership observable with a one-run falsifiable
92+
hypothesis; or
93+
- an explicit decision to spend a reviewed long-training budget for scientific
94+
portability, with checkpoint publication and score-packet contract defined in
95+
advance.
96+
97+
## Platform and Runtime Impact
98+
99+
None. The admitted Platform/Runtime bundle remains the existing five rows:
100+
`recon`, `PIA baseline`, `PIA defended`, `GSA`, and `DPDM W-1`.
