You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/evidence/reproduction-status.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -74,7 +74,7 @@ Smoke tests and dry runs are engineering validation, not benchmark claims.
74
74
| FERMI multi-relational tabular MIA |`hold-paper-source-only`| arXiv `2605.11527` reports strong multi-relational TabDDPM/TabDiff/TabSyn membership metrics, but the public surface has no code tree, target/split manifests, generated synthetic tables, feature/score rows, ROC arrays, metric JSON, or replay command. It does not reopen MIDST/tabular execution and releases no tabular dataset download, model training, or GPU work. See [fermi-tabular-artifact-gate-20260515.md](fermi-tabular-artifact-gate-20260515.md). |
75
75
| True known-split mechanisms | `hold-weak` | MNIST/DDPM raw-loss and x0 scouts are weak. Tiny overfit final-layer gradient norm was positive only on the extreme `8 / 64` target, weakened at `16 / 64`, and a more optimistic `64 / 64` oracle gradient-prototype alignment follow-up is effectively random (`AUC = 0.500977`, `ASR = 0.562500`, zero low-FPR recovery). Fashion-MNIST DDPM now has three weak clean-split scouts: fixed-timestep PIA-style loss (`AUC = 0.535889`, `TPR@1%FPR = 0.03125`), SimA single-query score-norm (`AUC = 0.515137`, zero low-FPR recovery), and score-Jacobian sensitivity (`AUC = 0.511719`, zero low-FPR recovery). The Beans member-LoRA denoising-loss scout repaired pseudo-membership semantics by creating an exact `SD1.5 + Beans-member LoRA` target, but the internal conditional denoising-loss score is weak (`AUC = 0.414400`, reverse `0.585600`, `TPR@1%FPR = 0.080000`) and parameter-delta sensitivity is also weak (`AUC = 0.512000`). Do not run more final-layer gradient norm/cosine variants, Fashion-MNIST timestep/seed/`p`-norm/perturbation/norm/packet-size sweeps, or Beans LoRA train-step/rank/resolution/prompt/timestep/layer matrices by default. See [fashion-mnist-ddpm-score-jacobian-sensitivity-20260514.md](fashion-mnist-ddpm-score-jacobian-sensitivity-20260514.md), [fashion-mnist-ddpm-sima-score-norm-20260514.md](fashion-mnist-ddpm-sima-score-norm-20260514.md), [beans-lora-delta-sensitivity-20260513.md](beans-lora-delta-sensitivity-20260513.md), [beans-lora-member-denoising-loss-scout-20260513.md](beans-lora-member-denoising-loss-scout-20260513.md), [fashion-mnist-ddpm-pia-loss-scout-20260513.md](fashion-mnist-ddpm-pia-loss-scout-20260513.md), [tiny-known-split-gradient-prototype-alignment-20260513.md](tiny-known-split-gradient-prototype-alignment-20260513.md), [gradient-norm-stability-gate-20260512.md](gradient-norm-stability-gate-20260512.md), and [tiny-overfit-gradient-norm-scout-20260512.md](tiny-overfit-gradient-norm-scout-20260512.md). |
76
76
| Black-box `H2 response-strength` | candidate-only | Positive-but-bounded DDPM/CIFAR10 candidate: frozen cutoff-0.50 lowpass follow-up passed, and raw H2 recovered strict-tail signal on the fresh packet. SD/CelebA text-to-image transfer is blocked by protocol mismatch. The frozen SD/CelebA image-to-image micro-packet is runnable, but H2 logistic does not beat the same-cache simple distance comparator, so H2 is not promoted beyond candidate-only. A separate simple-distance line now has bounded single-asset evidence: first 10/10 packet `AUC = 0.92`, non-overlapping 10/10 packet `AUC = 0.99` with 9/10 TP at 0 FP, and non-overlapping 25/25 admission packet `AUC = 0.8768`, `ASR = 0.84`, 11/25 TP at 0 FP. This is not a conditional-diffusion generalization or a `recon` product replacement. See [black-box-response-strength-preflight.md](black-box-response-strength-preflight.md), [h2-lowpass-followup-contract.md](h2-lowpass-followup-contract.md), [h2-cross-asset-contract-preflight.md](h2-cross-asset-contract-preflight.md), [h2-image-to-image-contract.md](h2-image-to-image-contract.md), [h2-img2img-micro-result.md](h2-img2img-micro-result.md), [h2-img2img-simple-distance-review.md](h2-img2img-simple-distance-review.md), [h2-img2img-simple-distance-stability-result.md](h2-img2img-simple-distance-stability-result.md), and [h2-img2img-simple-distance-admission-result.md](h2-img2img-simple-distance-admission-result.md). |
77
-
| Black-box `H2 output-cloud geometry`| candidate-only | CPU-only review on the existing H2 response cache found a strong output-output geometry signal that excludes seed-to-output distance (`AUC = 0.961529`, `ASR = 0.900391`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`). Seed `177` remains stable (`AUC = 0.961048`), and label-shuffle sanity returns random-level (`AUC = 0.507595`). This is not admitted because the source cache used class-ordered sample offsets, so a reseeded or interleaved order-control cache is required before promotion. Do not expand into KDE, shadow-density, repeat-count, same-cache feature sweeps, Platform schema, Runtime runner, or bundle rows. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md). |
77
+
| Black-box `H2 output-cloud geometry` | candidate-only | Output-output geometry is now a strong H2 Research-side candidate after seed-offset control, not a Platform/Runtime row. The existing `512 / 512` cache review gives `AUC = 0.961529`, `ASR = 0.900391`, `TPR@1%FPR = 0.333984`, `TPR@0.1%FPR = 0.117188`; seed `177` remains stable (`AUC = 0.961048`), and label-shuffle sanity returns random-level (`AUC = 0.507595`). The bounded `256 / 256` shared-position order-control scout preserves the signal (`AUC = 0.967819`, `TPR@1%FPR = 0.410156`, `TPR@0.1%FPR = 0.132812`) with random-level label shuffle (`AUC = 0.464066`), and the same controlled boundary at seed `177` remains strong (`AUC = 0.956192`, `TPR@1%FPR = 0.285156`, `TPR@0.1%FPR = 0.109375`) with random-level label shuffle (`AUC = 0.484070`). Class-ordered seed offset is no longer a sufficient explanation, but this remains a single H2 response-cache geometry candidate, not a second public asset or consumer contract. Do not expand into KDE, shadow-density, repeat-count, same-cache feature sweeps, full `512 / 512` reruns, Platform schema, Runtime runner, or bundle rows. See [h2-output-cloud-geometry-20260525.md](h2-output-cloud-geometry-20260525.md). |
78
78
| Black-box mid-frequency same-noise residual | `candidate-only` | Distinct paper-backed observable gap: unlike H2/H3 response-cache frequency filters, this line requires `x_t`, `tilde_x_t`, timestep, noise provenance, and residual scores at the same noise level. The frozen `64/64` sign-check on the collaborator 750k checkpoint produced `AUC = 0.733398`, `ASR = 0.710938`, and finite `4/64` zero-FP recovery. The seed-only repeat retained signal with `AUC = 0.719238`, `ASR = 0.6875`, and finite `3/64` zero-FP recovery. A CPU comparator audit shows low-frequency and full-band residual comparators are at least as strong as the frozen mid-band score on AUC, so the line is candidate-stable-but-bounded but not a proven mid-frequency-specific mechanism. Same-contract GPU expansion is closed. See [midfreq-residual-comparator-audit-20260512.md](midfreq-residual-comparator-audit-20260512.md), [midfreq-residual-stability-result-20260512.md](midfreq-residual-stability-result-20260512.md), [midfreq-residual-stability-decision-20260512.md](midfreq-residual-stability-decision-20260512.md), [midfreq-residual-signcheck-20260512.md](midfreq-residual-signcheck-20260512.md), [midfreq-same-noise-residual-preflight-20260512.md](midfreq-same-noise-residual-preflight-20260512.md), [midfreq-residual-scorer-contract-20260512.md](midfreq-residual-scorer-contract-20260512.md), [midfreq-residual-collector-contract-20260512.md](midfreq-residual-collector-contract-20260512.md), [midfreq-residual-tiny-runner-contract-20260512.md](midfreq-residual-tiny-runner-contract-20260512.md), and [midfreq-residual-real-asset-preflight-20260512.md](midfreq-residual-real-asset-preflight-20260512.md). |
79
79
| Gray-box `PIA`|`evidence-ready`| Strongest admitted local DDPM/CIFAR10 gray-box line. PIA baseline exposes `epsilon-trajectory consistency`; stochastic dropout is a provisional defended comparator that weakens but does not eliminate the signal. The review is bounded to repeated-query adaptive checks with `adaptive repeats=3`; low-FPR values are finite empirical strict-tail points, not calibrated sub-percent FPR. Paper-aligned release provenance remains blocked. See [pia-stochastic-dropout-truth-hardening-review.md](pia-stochastic-dropout-truth-hardening-review.md). |
80
80
| Gray-box `ReDiffuse` | `hold-weak` | Candidate baseline-alignment line. The collaborator 750k bundle and checkpoint are runnable, a 64/64 direct-distance compatibility packet exists, and the existing PIA 800k checkpoint is runtime-probe compatible, but prior exact replay showed only modest AUC with weak strict-tail evidence and was not admitted. The collaborator Stable Diffusion ReDiffuse `5000`-row packet remains replayable (`AUC = 0.71031888`), but its member/nonmember labels are perfectly aligned with `LAION-5B member subset` versus `COCO2017-val non-member subset`, so it is a cross-source stress-test candidate rather than a same-distribution second asset. The official OpenReview supplement still does not release third-party target checkpoints, generated response/feature caches, score packets, ROC CSVs, or metric artifacts. A local ReDiffuse DDPM/STL-10 bounded scout now proves the split and official model path are executable and scoreable, but the short target fixed-timestep denoising-loss packet is random-level (`AUC = 0.4996337890625`, `ASR = 0.509765625`, `TPR@1%FPR = 0.01171875`, `TPR@0.1%FPR = 0.0`). Reusing the same checkpoint and `256 / 256` split for a genuinely different SimA-style denoiser-output score norm also remained random-level (`AUC = 0.5052947998046875`, `ASR = 0.525390625`, `TPR@1%FPR = 0.03125`, `TPR@0.1%FPR = 0.01953125`). Do not expand into step-count, seed, timestep, batch-size, subset-size, EMA, scheduler, denoising-loss matrices, score-norm matrices, checkpoint-step/fusion sweeps, full DDPM/DiT/Stable Diffusion targets, `800k`-step training, Tiny-ImageNet downloads, request `coco_data`, download Stable Diffusion weights, or rerun same-family attack scripts by default. See [rediffuse-stl10-sima-score-norm-20260525.md](rediffuse-stl10-sima-score-norm-20260525.md), [rediffuse-stl10-bounded-scout-20260525.md](rediffuse-stl10-bounded-scout-20260525.md), [rediffuse-stl10-split-and-microtrain-preflight-20260525.md](rediffuse-stl10-split-and-microtrain-preflight-20260525.md), [stable-diffusion-rediffuse-collaborator-artifact-20260517.md](stable-diffusion-rediffuse-collaborator-artifact-20260517.md), [rediffuse-openreview-split-manifest-audit-20260515.md](rediffuse-openreview-split-manifest-audit-20260515.md), [rediffuse-collaborator-integration-report.md](rediffuse-collaborator-integration-report.md), [rediffuse-800k-runtime-probe.md](rediffuse-800k-runtime-probe.md), [rediffuse-resnet-parity-packet.md](rediffuse-resnet-parity-packet.md), [rediffuse-direct-distance-boundary-review.md](rediffuse-direct-distance-boundary-review.md), [rediffuse-checkpoint-portability-gate.md](rediffuse-checkpoint-portability-gate.md), [rediffuse-resnet-contract-scout.md](rediffuse-resnet-contract-scout.md), [rediffuse-exact-replay-preflight.md](rediffuse-exact-replay-preflight.md), and [rediffuse-exact-replay-packet.md](rediffuse-exact-replay-packet.md). |
0 commit comments