# ReDiffuse STL-10 Bounded Scout

> Date: 2026-05-25
>
> Status: bounded scout completed / score packet produced / weak denoising-loss signal / no GPU expansion

## Question

After the STL-10 split and resource preflight, can a short ReDiffuse DDPM/STL-10
target produce a scoreable membership packet, and does a fixed-timestep
denoising-loss score show any immediate membership signal?

This is a bounded scout, not a paper-level ReDiffuse reproduction. It does not
claim a trained STL-10 DDPM benchmark, does not use the full-paper training
length, and does not promote any result into the Platform/Runtime evidence
bundle.

## Frozen Contract

| Field | Value |
| --- | --- |
| Target family | Official ReDiffuse DDPM `UNet` + `GaussianDiffusionTrainer` |
| Dataset | STL-10 unlabeled |
| Split source | `STL10_train_ratio0.5.npz` |
| Train member subset | `1024` samples from the official member split |
| Score packet | `256` trained members + `256` official nonmembers |
| Training budget | `300` steps, batch `32` |
| Hard guards | stop at `900s` wall-clock or `7.4 GB` allocated CUDA memory |
| Score definition | `score = -mean_fixed_timestep_denoising_mse` |
| Score timesteps | `50`, `200`, `500`, `800` |
| Seed | `20260525` |
| CUDA env | `conda run -n diffaudit-research python` |

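The score definition can be read as follows: forward-noise each image to every score timestep, ask the model to predict the injected noise, and negate the mean squared error, so a lower denoising error gives a higher (more member-like) score. This is a minimal, dependency-free sketch, not the run script. It assumes the standard DDPM forward process with a linear beta schedule (`1e-4` to `0.02` over `T = 1000`, 0-indexed timesteps), flattens images to plain lists of floats, and uses a hypothetical `eps_model(x_t, t)` callable in place of the trained `UNet`; the per-sample `seed` argument is likewise hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def linear_alpha_bars(T: int = 1000, beta_1: float = 1e-4, beta_T: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(T):
        beta = beta_1 + (beta_T - beta_1) * i / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def denoising_score(
    x0: Vector,
    eps_model: Callable[[Vector, int], Vector],
    timesteps: Sequence[int] = (50, 200, 500, 800),
    seed: int = 0,
) -> float:
    """score = -mean over timesteps of the noise-prediction MSE for one sample."""
    rng = random.Random(seed)  # hypothetical per-sample seeding
    alpha_bars = linear_alpha_bars()
    mses = []
    for t in timesteps:
        a = alpha_bars[t]
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
        # Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
        x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
        eps_hat = eps_model(x_t, t)
        mses.append(sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / len(x0))
    # Lower denoising error -> higher score -> more member-like under this scorer.
    return -sum(mses) / len(mses)
```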
Batch `32` was used for the scout instead of `64` because live free VRAM before
the run was only about `4.6 GB`, while the earlier batch-64 calibration had
peaked at `4.419 GB`, leaving less than `0.2 GB` of headroom. This was a safety
change, not a change in hypothesis.

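The hard guards and the batch-size reasoning amount to a simple stop-reason check. This is an illustrative sketch under stated assumptions: only the `step_budget` stop reason appears in this note, so the `wall_clock` and `cuda_memory` labels and the `headroom_gb` helper are hypothetical, the `>=` comparisons are assumed, and the clock and memory readings are plain numbers rather than live CUDA queries.

```python
from typing import Optional

# Limits copied from the frozen contract; the >= comparisons are an assumption.
MAX_STEPS = 300
MAX_SECONDS = 900.0
MAX_ALLOCATED_GB = 7.4


def stop_reason(step: int, elapsed_s: float, allocated_gb: float) -> Optional[str]:
    """Reason to stop after `step` completed steps, or None to keep training."""
    if step >= MAX_STEPS:
        return "step_budget"  # the only label recorded in this note
    if elapsed_s >= MAX_SECONDS:
        return "wall_clock"  # hypothetical label
    if allocated_gb >= MAX_ALLOCATED_GB:
        return "cuda_memory"  # hypothetical label
    return None


def headroom_gb(free_gb: float, calibrated_peak_gb: float) -> float:
    """Free VRAM left if a run peaks at its calibrated allocation."""
    return free_gb - calibrated_peak_gb
```

For the recorded run, `stop_reason(300, 92.75, 2.43)` returns `"step_budget"`, and `headroom_gb(4.6, 4.419)` is roughly `0.18`, the thin batch-64 margin that motivated batch `32`. In a real PyTorch loop the allocation reading would come from something like `torch.cuda.memory_allocated()`, which reports bytes rather than GB.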
## Run Artifacts

Artifacts are stored outside Git under:
`<DOWNLOAD_ROOT>/shared/runs/rediffuse-stl10-bounded-scout-20260525/`.

| Artifact | Size | SHA256 |
| --- | ---: | --- |
| `summary.json` | `2,243` bytes | `02cfe2af7346e7b380608ef748d164031ec89ba67d10822ef9d0badb8c3b209e` |
| `scores.csv` | `28,732` bytes | `c0f396502114986c8c3549f626ce5083fcaaec2fcf8a319aabe509588d8abd0a` |
| `checkpoint-step-final.pt` | `573,473,892` bytes | `006f5247ef2f91a331a097d16bdf1c153f94f3c2f112e1e9b8d3efdd5bb2ec5e` |
| `run_rediffuse_stl10_bounded_scout.py` | `10,945` bytes | `4ee76c37594cb7834459b3059f8c0af58c1eabb5c34323275f3990c8c9933d1f` |

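A checksum check along these lines can confirm that a local copy of the run directory matches the table. It is a sketch using only Python's standard `hashlib` and `pathlib`; the `sha256_file` and `verify_artifacts` helpers and the `EXPECTED` mapping are illustrative names, and the `<DOWNLOAD_ROOT>` placeholder is deliberately left unresolved. Calling `verify_artifacts(run_dir, EXPECTED)` returns one boolean per artifact, with missing files reported as `False`.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so the ~573 MB checkpoint is never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(root: Path, expected: Dict[str, str]) -> Dict[str, bool]:
    """Map each artifact name to whether its on-disk SHA-256 matches the expected digest."""
    return {
        name: (root / name).is_file() and sha256_file(root / name) == digest
        for name, digest in expected.items()
    }


# Digests copied from the artifact table above.
EXPECTED = {
    "summary.json": "02cfe2af7346e7b380608ef748d164031ec89ba67d10822ef9d0badb8c3b209e",
    "scores.csv": "c0f396502114986c8c3549f626ce5083fcaaec2fcf8a319aabe509588d8abd0a",
    "checkpoint-step-final.pt": "006f5247ef2f91a331a097d16bdf1c153f94f3c2f112e1e9b8d3efdd5bb2ec5e",
    "run_rediffuse_stl10_bounded_scout.py": "4ee76c37594cb7834459b3059f8c0af58c1eabb5c34323275f3990c8c9933d1f",
}
```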
The run script was intentionally kept as a run artifact rather than promoted
into a repo CLI: the run answered the decision question without needing a new
reusable tool surface.

## Result

| Metric | Value |
| --- | ---: |
| Completed steps | `300` |
| Stop reason | `step_budget` |
| Elapsed time | `92.750s` |
| Peak allocated VRAM | `2.430 GB` |
| First training loss | `1.0019083023` |
| Last training loss | `0.0491645448` |
| Mean last-25 loss | `0.0447090799` |
| Score packet size | `256` members + `256` nonmembers |
| AUC | `0.4996337890625` |
| ASR | `0.509765625` |
| TPR@1%FPR | `0.01171875` |
| TPR@0.1%FPR | `0.0` |
| Nonmember denominator | `256` |
| Minimum nonzero FPR | `0.00390625` |

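The target did train in the narrow engineering sense: training loss fell from
about `1.00` to about `0.05` over the `300` steps, and a checkpoint plus a
row-level score packet were produced.

The membership signal, however, is indistinguishable from chance under this
fixed-timestep denoising-loss scorer: AUC is `0.4996` and ASR is `0.510`. The
low-FPR metrics are also resolution-limited: with `256` nonmembers the smallest
nonzero FPR is `1/256 = 0.00390625`, so TPR@0.1%FPR effectively measures TPR at
zero false positives.
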
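The table's metrics can be recomputed from the member and nonmember score lists with a short helper. This sketch assumes common conventions that the note does not confirm for the run script: a higher score means "member", ASR is the best balanced accuracy over thresholds (equal to plain accuracy here because the packet is balanced at `256`/`256`), and TPR@FPR is the best empirical TPR among thresholds whose FPR does not exceed the target, without interpolation. It makes no assumption about the column layout of `scores.csv`.

```python
from typing import Dict, List, Sequence, Tuple


def membership_metrics(member: Sequence[float], nonmember: Sequence[float]) -> Dict[str, float]:
    """AUC, ASR and TPR at fixed FPRs, treating a higher score as more member-like."""
    m, n = len(member), len(nonmember)
    # AUC = P(member > nonmember) + 0.5 * P(tie), i.e. the Mann-Whitney U statistic / (m * n).
    wins = sum((a > b) + 0.5 * (a == b) for a in member for b in nonmember)
    auc = wins / (m * n)
    # Predict "member" when score >= tau; tau = +inf is the predict-nobody rule (FPR = TPR = 0).
    roc: List[Tuple[float, float]] = []
    for tau in sorted(set(member) | set(nonmember) | {float("inf")}):
        tpr = sum(s >= tau for s in member) / m
        fpr = sum(s >= tau for s in nonmember) / n
        roc.append((fpr, tpr))
    # ASR as the best balanced accuracy over all thresholds.
    asr = max(0.5 * (tpr + 1.0 - fpr) for fpr, tpr in roc)

    def tpr_at(max_fpr: float) -> float:
        # Best empirical TPR among thresholds with FPR <= max_fpr (no interpolation).
        return max(tpr for fpr, tpr in roc if fpr <= max_fpr)

    return {"auc": auc, "asr": asr, "tpr@1%fpr": tpr_at(0.01), "tpr@0.1%fpr": tpr_at(0.001)}
```

Because the `+inf` threshold always contributes the point `(FPR, TPR) = (0, 0)`, `tpr_at` is defined for any target FPR; with `256` nonmembers, a target of `0.001` can only select thresholds with zero false positives.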
## Decision

`bounded scout completed / score packet produced / weak denoising-loss signal / no GPU expansion`.

This scout answers the only question the preflight cleared for execution: the
ReDiffuse STL-10 path is executable and scoreable locally, but the short target
combined with the fixed-timestep denoising-loss score does not produce useful
membership evidence. This is a negative-but-useful result, not a reason to
launch full-paper training.

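One back-of-envelope way to see why an AUC of `0.4996` carries no usable evidence is to compare it with the spread AUC would show if members and nonmembers were scored from the same distribution. The sketch assumes independent, untied scores and uses the Mann-Whitney null variance, Var(AUC) = (m + n + 1) / (12 m n); it is an illustration, not an analysis the run performed.

```python
import math


def null_auc_se(m: int, n: int) -> float:
    """Standard error of AUC when both classes share one score distribution (no ties)."""
    # Mann-Whitney null: Var(U) = m*n*(m+n+1)/12 and AUC = U/(m*n).
    return math.sqrt((m + n + 1) / (12.0 * m * n))


se = null_auc_se(256, 256)         # about 0.0255
z = (0.4996337890625 - 0.5) / se   # about -0.014 null standard errors from chance
```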
Do not expand this into step-count, seed, timestep, batch-size, subset-size,
EMA, scheduler, or denoising-loss matrices by default. Reopen ReDiffuse
STL-10 only if one of these appears:

- a public third-party STL-10 checkpoint or score packet for the official split;
- a clearly different membership observable with a one-run falsifiable
  hypothesis; or
- an explicit decision to spend a reviewed long-training budget for scientific
  portability, with checkpoint publication and score-packet contract defined in
  advance.

## Platform and Runtime Impact

None. The admitted Platform/Runtime bundle remains the existing five rows:
`recon`, `PIA baseline`, `PIA defended`, `GSA`, and `DPDM W-1`.