DeliciousBuding · May 25, 2026
@@ -10,6 +10,7 @@ venv/
 build/
 dist/
 *.egg-info/
+papers/**/build/
 
 # Tool caches
 .pytest_cache/

@@ -1,6 +1,104 @@
 # DiffAudit Research Roadmap
 
-> Last updated: 2026-05-25
+> Last updated: 2026-05-26
+
+## 2026-05-26 Publishable paper portfolio and primary manuscript
+
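+Latest decision: the Research mainline shifts from scattered experiment pushes
+to consolidating "publishable research output". The new
+[`papers/diffaudit-evidence-paper/`](papers/diffaudit-evidence-paper/) workspace
+organizes the current evidence into a multi-paper portfolio and one compilable
+primary manuscript. The primary direction is a security/privacy measurement
+paper, not a single-point SOTA attack paper. DiffAudit's core thesis is
+evidence-contracted / evidence-calibrated auditing. A diffusion-model membership
+audit must bind its scores to target identity, split, score/response coverage,
+metric provenance, consumer boundary, and surface delta. Only then does an
+"apparently strong AUC" become reusable, consumable, and publishable scientific
+evidence.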
+
+Landed assets:
+
+- `paper_portfolio.md`: four candidate paper directions and research-team
+  assignments. Direction A, evidence-contracted auditing, is the current primary
+  track. Direction B, output-cloud geometry, is a candidate technical-mechanism
+  short paper. Direction C, artifact claim-support / negative results, is a
+  candidate extensible measurement paper and must not be framed as a pooled
+  reproducibility rate. Direction D, a consumer-boundary systems paper, is a
+  later artifact/demo direction.
+- `versions/`: writing-ready briefs for the four directions. Each brief records
+  the research team, abstract draft, section spine, evidence boundary, minimal
+  next step, and rejected moves. The directions do not fork four separate LaTeX
+  sources. All of them keep sharing `source_map.md`, `claim_register.md`, and
+  `evidence_bank.md`. Direction C now has `direction-c-corpus-protocol.md`,
+  which freezes the metadata-only v0 corpus and the initial six-gate matrix.
+- `multi_direction_paper_drafts.md` and `versions/drafts/`: all four directions
+  are expanded into manuscript-level Markdown drafts. Direction A is the primary
+  full paper. Direction B is a mechanism short/workshop paper and cannot claim
+  cross-model/cross-dataset portability. Direction C is a selected-corpus
+  claim-support paper. It already has v1 corpus, fixed-search batch, and
+  gate-summary assets, but a standalone submission still needs a larger frozen
+  corpus or a second label review. Direction D is a systems/artifact/demo paper
+  awaiting deployment / external-use / user-study / report-drift evidence.
+- `source_map.md`, `claim_register.md`, `evidence_bank.md`: keep admitted,
+  candidate, support-only, negative, and prohibited claims separate. This
+  prevents H2, Tracing Roots, ReDiffuse, CommonCanvas, MIDST, or the
+  collaborator SD ReDiffuse artifact from being written up as conclusions
+  beyond their evidence boundary. All paths are now relative to the Research
+  repository root. `source_map.md` now includes a replay/availability matrix.
+  The matrix states that recon/GSA currently depend on point estimates or
+  summaries, PIA/PIA-dropout/DPDM have row-level score sidecars, and H2 is a
+  frozen-artifact/recorded-CI candidate.
+- `scripts/build_paper_assets.py`: generates the paper CSVs and PDF figures from
+  existing JSON artifacts and from curated CSVs that carry an evidence source.
+  It does not scatter hand-entered metrics through the script.
+- `scripts/export_admitted_evidence_bundle.py` and
+  `workspaces/implementation/artifacts/admitted-evidence-bundle.json`: the
+  white-box `target_eval_size=2000` is now uniformly read as
+  `1000 member + 1000 nonmember`, and the strict-tail denominator is exported as
+  `1000`. Docs and paper now describe DPDM W-1 as a `runtime-smoke` defended
+  comparator, no longer as `runtime-mainline`.
+- `main.tex`, `refs.bib`, and `paper.pdf`: the IEEEtran conference-format
+  manuscript compiles through the manual `pdflatex + bibtex + pdflatex + pdflatex`
+  chain and is currently an `8`-page draft. The 2026-05-26 reviewer-hardening
+  pass made the following changes:
+  - The abstract and introduction now use RQ and claim-admission framing.
+  - Measurement Protocol adds formal evidence-packet / gate-vector definitions.
+  - Related Work is reorganized as an access-surface taxonomy, with CDI as a
+    dataset-level auditing contrast.
+  - H2 is framed explicitly as a contract stress test, not a hidden main
+    contribution.
+  A later publishability pass compressed repeated defensive boundary statements
+  into a reviewable claim-first gate adjudication procedure. It also added
+  main/comparator/bridge role labels to admitted rows.
+  `admitted-evidence-bundle.json` also gained `required_access` and
+  `report_role`, so a white-box comparator is not read as a main risk row at the
+  same access level. The text now states the AUC-interval boundary of the
+  existing `metric_uncertainty.csv`. Only PIA baseline, PIA-dropout, and DPDM
+  W-1 use row-score bootstrap intervals. Recon/GSA remain point estimates, and
+  the H2 interval serves only as a candidate-side control.
+  The body covers Related Work, Reporting/Reproducibility positioning, Audit
+  Surfaces, Evidence Contract, Measurement Protocol, candidate gate matrix,
+  Artifact Corpus, fixed-search batch summary, selected-corpus gate-summary
+  figure, Admitted Bundle, H2 case study, H2 admission-decision table,
+  Negative/Support Evidence, decision-value table, Threats to Validity, and
+  Discussion/Conclusion.
+  `BUILD.md` records the manual compile chain for when `latexmk` is unavailable.
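`metric_uncertainty.csv` reports row-score bootstrap intervals for the PIA baseline, PIA-dropout, and DPDM W-1 rows. What follows is a minimal sketch of such an AUC interval. It assumes per-class resampling with replacement, a percentile interval, and higher scores meaning "more likely member". The function names are hypothetical, and this is not the code in `scripts/build_paper_assets.py`.

```python
import random


def auc(member_scores, nonmember_scores):
    """Rank-based (Mann-Whitney) AUC; tied scores receive their average rank."""
    pooled = sorted([(s, 1) for s in member_scores] + [(s, 0) for s in nonmember_scores])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of scores tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_m, n_n = len(member_scores), len(nonmember_scores)
    member_rank_sum = sum(r for r, (_, label) in zip(ranks, pooled) if label == 1)
    return (member_rank_sum - n_m * (n_m + 1) / 2.0) / (n_m * n_n)


def bootstrap_auc_interval(member_scores, nonmember_scores, n_boot=1000, level=0.95, seed=0):
    """Return (point_auc, low, high) from a percentile bootstrap.

    Members and nonmembers are resampled separately, so every replicate keeps
    the original class sizes (an assumption of this sketch).
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        m = rng.choices(member_scores, k=len(member_scores))
        n = rng.choices(nonmember_scores, k=len(nonmember_scores))
        replicates.append(auc(m, n))
    replicates.sort()
    tail = (1.0 - level) / 2.0
    low = replicates[int(tail * (n_boot - 1))]
    high = replicates[int((1.0 - tail) * (n_boot - 1))]
    return auc(member_scores, nonmember_scores), low, high
```

The recon and GSA rows have no row-level score arrays to resample. This is why they stay point estimates in the paper.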
+
+The current main paper uses three classes of evidence:
+
+- admitted evidence bundle: five rows (`recon`, `PIA baseline`, `PIA defended`,
+  `GSA`, and `DPDM W-1`). Each row reports AUC / ASR / TPR@1%FPR / TPR@0.1%FPR /
+  cost / boundary.
+- H2 output-cloud geometry: a Research-side mechanism case study. It keeps the
+  strong candidate and its controls, but it is not promoted to a product row and
+  does not claim generalization to SD/CelebA img2img.
+- negative/support evidence: ReDiffuse STL-10, CommonCanvas, MIDST, Tracing
+  Roots, CopyMark, the Stable Diffusion ReDiffuse collaborator artifact, and
+  similar items. They explain why a second asset and a consumer boundary cannot
+  be established from a headline metric alone.
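An admitted bundle row and the documented denominator rule can be sketched as a small typed record. The rule is that a white-box `target_eval_size` counts members plus nonmembers. This sketch assumes a Python dataclass view of the bundle. Apart from `required_access`, `report_role`, and `target_eval_size`, the field names are hypothetical and not the schema of `admitted-evidence-bundle.json`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdmittedRow:
    """Hypothetical in-memory view of one admitted evidence-bundle row."""

    name: str
    auc: float
    asr: float
    tpr_at_1pct_fpr: float
    tpr_at_0_1pct_fpr: float
    evidence_level: str  # e.g. "runtime-mainline" or "runtime-smoke"
    required_access: str  # e.g. "black-box", "gray-box", "white-box"
    report_role: str  # "main", "comparator", or "bridge"
    target_eval_size: Optional[int] = None  # white-box: member + nonmember total
    nonmember_count: Optional[int] = None  # rows that record nonmembers directly


def nonmember_denominator(row: AdmittedRow) -> int:
    """Finite strict-tail denominator: the number of target nonmembers."""
    if row.nonmember_count is not None:
        return row.nonmember_count
    if row.target_eval_size is None:
        raise ValueError(f"{row.name}: no evaluation size recorded")
    if row.target_eval_size % 2 != 0:
        # A 1:1 member/nonmember split is assumed; an odd total cannot be one.
        raise ValueError(f"{row.name}: odd target_eval_size {row.target_eval_size}")
    return row.target_eval_size // 2


# Metric values copied from the DPDM defended row of the results table.
dpdm_w1 = AdmittedRow(
    name="DPDM W-1",
    auc=0.488783,
    asr=0.4985,
    tpr_at_1pct_fpr=0.009,
    tpr_at_0_1pct_fpr=0.0,
    evidence_level="runtime-smoke",
    required_access="white-box",
    report_role="comparator",
    target_eval_size=2000,
)
```

Under this rule, `nonmember_denominator(dpdm_w1)` yields `1000`, matching the exported strict-tail denominator. A gray-box PIA row would instead record its 512 target nonmembers directly.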
+
+The next research steps are not heavy model runs. They close the remaining gaps
+to submission readiness:
+
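+1. Keep polishing the `8`-page Direction A draft into a submittable paper.
+   Tighten the motivation, artifact-corpus framing, method detail,
+   reviewer-facing contribution wording, and related-work contrasts.
+2. Direction C has a `21`-row v1 corpus and a `17`-row 2026-05-26 fixed-search
+   metadata batch. It has completed a selected-corpus gate-label consistency
+   pass and a bounded agent second-pass review:
+   - The consistency pass found no invalid gate values and no route-changing
+     contradictions.
+   - The bounded second pass found no admitted-like fixed-search row and no
+     metadata-batch evidence/metric promotion.
+   - Two gates were additionally tightened: `v1-10.delta_gate=Fail` and
+     `fs20260526-arxiv-03.target_gate=Fail`.
+   The write-up must be framed as claim-support measurement, never as "these
+   papers failed". A standalone paper still needs a larger fixed corpus or an
+   external independent label review.
+3. Direction B continues only if a second independent response asset appears or
+   a bounded response-contract experiment is explicitly approved. Otherwise it
+   stays a case study in the main manuscript.
+4. Every paper claim must keep following `claim_register.md`. A finite empirical
+   low-FPR tail must not be written as continuous calibration, and a candidate
+   must not be written as admitted.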
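Item 4 follows from simple counting. With n target nonmembers, an empirical FPR can only take the values k/n, so a low-FPR budget fixes how many nonmember false positives a threshold may admit. The sketch below (function names hypothetical) makes that arithmetic explicit.

```python
import math


def allowed_false_positives(fpr_budget: float, n_nonmembers: int) -> int:
    """Largest k with k / n_nonmembers <= fpr_budget.

    The small epsilon absorbs float error in products such as 0.001 * 1000.
    """
    return math.floor(fpr_budget * n_nonmembers + 1e-9)


def fpr_resolution(n_nonmembers: int) -> float:
    """Smallest nonzero empirical FPR that a packet of this size can express."""
    return 1.0 / n_nonmembers


# Packet sizes that appear in the admitted rows: 100, 512, and 1000 nonmembers.
for n in (100, 512, 1000):
    k = allowed_false_positives(0.001, n)
    print(f"n={n}: TPR@0.1%FPR allows {k} false positive(s); FPR step = {fpr_resolution(n):.4f}")
```

For 100 and 512 nonmembers, the 0.1% budget admits zero false positives, so TPR@0.1%FPR is a zero-false-positive point. Only the 1000-nonmember white-box packet admits exactly one. In no case is the readout a calibrated sub-percent rate.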
+
+The current slots still release no heavy experiments:
+`active_gpu_question = none`, `next_gpu_candidate = none`,
+`CPU sidecar = paper asset generation / metadata-only corpus expansion only`.
 
 ## 2026-05-25 H2 output-cloud geometry candidate signal
 

@@ -11,11 +11,11 @@
 
 | Track | Method | Attack | Defense | AUC | ASR | TPR@1%FPR | TPR@0.1%FPR | Evidence Level | Quality / Cost | Evidence Location | Limitations |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| Black-box | Recon risk proof | `recon DDIM public-100 step30` | `none` | 0.837 | 0.74 | 0.22 | 0.11 | `runtime-mainline` | `100 public samples per split; DDIM step30; runtime mainline plus unified artifact threshold replay; cuda runtime` | `Research/workspaces/implementation/artifacts/unified-attack-defense-table.json` (black-box row), [recon-product-validation-result.md](recon-product-validation-result.md) | Evidence holds only under `fine-tuned / controlled / public-subset / proxy-shadow-member` semantics. Demonstrates membership leakage risk under minimal permissions. Current black-box main evidence uses `upstream_threshold_reimplementation` for coherent four-metric reporting; `TPR@0.1%FPR` is a zero-false-positive empirical tail on 100 target nonmembers. |
+| Black-box | Recon risk evidence | `recon DDIM public-100 step30` | `none` | 0.837 | 0.74 | 0.22 | 0.11 | `runtime-mainline` | `100 public samples per split; DDIM step30; runtime mainline plus unified artifact threshold replay; cuda runtime` | `Research/workspaces/implementation/artifacts/unified-attack-defense-table.json` (black-box row), [recon-product-validation-result.md](recon-product-validation-result.md) | Evidence holds only under `fine-tuned / controlled / public-subset / proxy-shadow-member` semantics. Demonstrates membership leakage risk under minimal permissions. Current black-box main evidence uses `upstream_threshold_reimplementation` for coherent four-metric reporting; `TPR@0.1%FPR` is a zero-false-positive empirical tail on 100 target nonmembers. |
 | Gray-box | PIA baseline | `PIA GPU512 baseline` | `none` | 0.841339 | 0.786133 | 0.058594 | 0.011719 | `runtime-mainline` | `attack_num=30; interval=10; batch_size=8; 512 samples per split; single GPU serial; adaptive repeats=3; wall-clock=212.993833s` | `Research/workspaces/implementation/artifacts/unified-attack-defense-table.json` (gray-box baseline row) | Workspace-verified local DDPM/CIFAR10 baseline with bounded repeated-query adaptive review (`adaptive repeats=3`). Read as `epsilon-trajectory consistency` baseline exposure. `TPR@0.1%FPR` is a finite empirical strict-tail point over 512 target nonmembers, not calibrated sub-percent FPR. Still blocked by checkpoint/source provenance from paper-aligned release. |
 | Gray-box | PIA defended | `PIA GPU512 baseline` | `stochastic-dropout all-steps prototype` | 0.828075 | 0.767578 | 0.052734 | 0.009766 | `runtime-mainline` | `attack_num=30; interval=10; batch_size=8; 512 samples per split; single GPU serial; adaptive repeats=3; wall-clock=223.128438s` | `Research/workspaces/implementation/artifacts/unified-attack-defense-table.json` (gray-box defended row) | Workspace-verified local DDPM/CIFAR10 defended comparator with bounded repeated-query adaptive review (`adaptive repeats=3`). Shows inference-time randomization weakening `epsilon-trajectory consistency`, but remains provisional. `TPR@0.1%FPR` is a finite empirical strict-tail point over 512 target nonmembers, not calibrated sub-percent FPR. Blocked by checkpoint/source provenance. Not validated privacy protection. |
-| White-box | GSA attack | `GSA 1k-3shadow` | `none` | 0.998192 | 0.9895 | 0.987 | 0.432 | `runtime-mainline` | `target_eval_size=2000; shadow_train_size=4200; 3 shadows; cuda` | `Research/workspaces/implementation/artifacts/unified-attack-defense-table.json` (white-box attack row) | Admitted white-box attack line. Treat as risk upper bound, not final paper-level benchmark. |
-| White-box | DPDM defended | `GSA 1k-3shadow` | `DPDM strong-v3 full-scale` | 0.488783 | 0.4985 | 0.009 | 0.0 | `runtime-mainline` | `target_eval_size=2000; shadow_train_size=6000; classifier=logistic-regression-1d` | `Research/workspaces/implementation/artifacts/unified-attack-defense-table.json` (white-box defended row) | Admitted white-box defense comparator. Bridge frozen; not a finished benchmark. Comparison informs governance decisions. |
+| White-box | GSA attack | `GSA 1k-3shadow` | `none` | 0.998192 | 0.9895 | 0.987 | 0.432 | `runtime-mainline` | `target_eval_size=2000 (1000 member + 1000 nonmember); shadow_train_size=4200; 3 shadows; cuda` | `Research/workspaces/implementation/artifacts/unified-attack-defense-table.json` (white-box attack row) | Admitted white-box attack line. Treat as risk upper bound, not final paper-level benchmark. |
+| White-box | DPDM defended | `GSA 1k-3shadow` | `DPDM strong-v3 full-scale` | 0.488783 | 0.4985 | 0.009 | 0.0 | `runtime-smoke` | `target_eval_size=2000 (1000 member + 1000 nonmember); shadow_train_size=6000; classifier=logistic-regression-1d` | `Research/workspaces/implementation/artifacts/unified-attack-defense-table.json` (white-box defended row) | Admitted white-box defense comparator. Runtime-smoke bridge frozen; not a finished benchmark. Comparison informs governance decisions. |
 
 Each row records only the admitted primary value and can be cited directly.
 Gray-box PIA results must be reported with all four metrics (`AUC / ASR / TPR@1%FPR / TPR@0.1%FPR`).
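All four reported metrics can be computed from one packet of member and nonmember scores. The sketch below assumes higher scores mean "member", reads ASR as the best single-threshold accuracy on the pooled packet, and uses the strictest threshold whose empirical FPR stays within budget. These conventions and the function names are assumptions, not the definitions behind `unified-attack-defense-table.json`.

```python
import math


def auc(members, nonmembers):
    """Pairwise AUC: share of (member, nonmember) pairs ranked correctly; ties count 1/2."""
    wins = sum((m > n) + 0.5 * (m == n) for m in members for n in nonmembers)
    return wins / (len(members) * len(nonmembers))


def best_accuracy(members, nonmembers):
    """ASR read as the best accuracy of the rule 'member if score >= t' over all t."""
    total = len(members) + len(nonmembers)
    best = 0.0
    for t in set(members) | set(nonmembers) | {math.inf}:
        tp = sum(s >= t for s in members)
        tn = sum(s < t for s in nonmembers)
        best = max(best, (tp + tn) / total)
    return best


def tpr_at_fpr(members, nonmembers, fpr_budget):
    """Empirical TPR for the rule 'member if score > threshold' with FPR <= budget."""
    ranked = sorted(nonmembers, reverse=True)
    k = math.floor(fpr_budget * len(ranked) + 1e-9)  # false positives allowed
    # At most k nonmembers score strictly above ranked[k]; ties are excluded.
    threshold = ranked[k] if k < len(ranked) else -math.inf
    return sum(s > threshold for s in members) / len(members)


def four_metrics(members, nonmembers):
    """Report AUC / ASR / TPR@1%FPR / TPR@0.1%FPR from the same score arrays."""
    return {
        "AUC": auc(members, nonmembers),
        "ASR": best_accuracy(members, nonmembers),
        "TPR@1%FPR": tpr_at_fpr(members, nonmembers, 0.01),
        "TPR@0.1%FPR": tpr_at_fpr(members, nonmembers, 0.001),
    }
```

Computing the four values from the same arrays keeps AUC and the strict-tail readouts tied to a single packet. A partial report cannot hide a weak tail behind a strong AUC.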

@@ -56,7 +56,10 @@ Per row, the bundle carries:
 
 `low_fpr_interpretation.nonmember_denominator` is the finite packet denominator
 used to interpret the strict-tail readout. It does not convert the value into a
-calibrated continuous FPR estimate. For admitted rows, the bundle also exposes:
+calibrated continuous FPR estimate. When a white-box source row records
+`target_eval_size`, that value is the total target evaluation size
+(`member + nonmember`), so the exported denominator is the nonmember half. For
+admitted rows, the bundle also exposes:
 
 | Field | Meaning |
 | --- | --- |

@@ -0,0 +1,55 @@
+# Build Notes
+
+## Generate Figures
+
+```powershell
+python -X utf8 .\scripts\build_paper_assets.py
+```
+
+This rebuilds:
+
+- `data/admitted_rows.csv`
+- `data/h2_output_cloud_rows.csv`
+- `data/negative_support_rows.csv`
+- `data/metric_uncertainty.csv`
+- `data/artifact_gate_summary.csv`
+- `data/artifact_strata_summary.csv`
+- `figures/admitted_rows_metrics.pdf`
+- `figures/h2_output_cloud_controls.pdf`
+- `figures/evidence_contract_pipeline.pdf`
+- `figures/artifact_gate_summary.pdf`
+
+## Compile Paper
+
+If `latexmk` is unavailable, use the manual chain:
+
+```powershell
+pdflatex -interaction=nonstopmode -halt-on-error -output-directory=build main.tex
+bibtex build/main
+pdflatex -interaction=nonstopmode -halt-on-error -output-directory=build main.tex
+pdflatex -interaction=nonstopmode -halt-on-error -output-directory=build main.tex
+Copy-Item -LiteralPath .\build\main.pdf -Destination .\paper.pdf -Force
+```
+
+The current verified compile path produced an 8-page `paper.pdf` on 2026-05-26.
+
+## Pre-Submission Sanity Checks
+
+Run these checks before claiming the PDF is submission-ready:
+
+```powershell
+pdfinfo .\paper.pdf | Select-String 'Pages|Page size'
+Select-String -Path .\build\main.log -Pattern 'Undefined|Citation.*undefined|Reference.*undefined|Overfull|LaTeX Warning'
+pdftotext .\paper.pdf - | Select-String -Pattern 'D:\\|C:\\|Users\\|Documents\\|secret|token|product-ready|disprove|field-wide prevalence' -Context 0,1
+```
+
+From the repository root, also run:
+
+```powershell
+python -X utf8 scripts\run_pr_checks.py
+git diff --check
+```
+
+The PDF text scan is intentionally conservative. Hits are not automatically
+failures, but each hit must either be confirmed as a harmless boundary/caveat
+phrase or be removed before public release.
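For contributors who prefer Python over `Select-String`, the same text scan can be sketched as a small filter. It reuses the exact pattern from the PowerShell command and matches case-insensitively, as `Select-String` does by default. Like `-Context 0,1`, it keeps one line of trailing context. The `harmless_phrases` argument is a hypothetical way to record hits already judged to be caveats; it is not part of the current checks.

```python
import re

# Same alternation as the Select-String pattern in the pre-submission check.
SCAN_PATTERN = re.compile(
    r"D:\\|C:\\|Users\\|Documents\\|secret|token|product-ready|disprove|field-wide prevalence",
    re.IGNORECASE,  # Select-String matches case-insensitively by default
)


def scan_text(text):
    """Return (line_number, line, next_line) for every matching line."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if SCAN_PATTERN.search(line):
            next_line = lines[i + 1] if i + 1 < len(lines) else ""
            hits.append((i + 1, line, next_line))
    return hits


def unreviewed_hits(text, harmless_phrases=()):
    """Drop hits whose line contains a phrase already judged a harmless caveat."""
    return [
        hit
        for hit in scan_text(text)
        if not any(p.lower() in hit[1].lower() for p in harmless_phrases)
    ]
```

Feed it the stdout of `pdftotext .\paper.pdf -`. Anything still returned by `unreviewed_hits` needs a decision before public release.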
@@ -0,0 +1,69 @@
+# DiffAudit Evidence Paper Workspace
+
+This directory turns the current DiffAudit Research evidence into paper-grade
+artifacts. It intentionally separates paper tracks from raw experiment
+workspaces.
+
+## Layout
+
+| Path | Purpose |
+| --- | --- |
+| `paper_portfolio.md` | Candidate paper directions, teams, venues, risks, and immediate choice. |
+| `multi_direction_paper_drafts.md` | Manuscript-level comparison of four paper versions and their research teams. |
+| `versions/` | Direction-specific paper briefs with team assignment, abstract, outline, evidence boundary, and go/no-go criteria. |
+| `versions/drafts/` | Full Markdown paper-version drafts for Directions A-D; only Direction A currently has LaTeX. |
+| `versions/direction-c-corpus-protocol.md` | Frozen v0 metadata-only corpus protocol for the artifact claim-support paper direction. |
+| `versions/direction-c-corpus-v1.md` | Structured Direction C corpus expansion from existing metadata-gate evidence notes. |
+| `versions/direction-c-fixed-search-batch-20260526.md` | Independent fixed-search metadata batch for Direction C; no downloads, no clone, no model/data execution. |
+| `versions/direction-c-second-pass-label-review-20260526.md` | Bounded Direction C agent second-pass label review; label hygiene only, not independent human reliability. |
+| `source_map.md` | Authoritative evidence sources and forbidden moves. |
+| `claim_register.md` | Allowed, support-only, and prohibited claims. |
+| `evidence_bank.md` | Human-readable metric ledger for manuscript drafting. |
+| `research_team_pitches.md` | A-D paper-team pitches with venue fit, evidence needs, and risks. |
+| `BUILD.md` | Figure generation and LaTeX compile commands. |
+| `scripts/build_paper_assets.py` | Rebuilds paper CSV/PDF figure assets from repository JSON artifacts. |
+| `data/` | Generated CSV tables used by figures and LaTeX. |
+| `data/metric_uncertainty.csv` | Generated AUC uncertainty sidecar for rows with direct score arrays or recorded H2 aggregate CIs; not all rows have intervals. |
+| `data/artifact_corpus_v1.csv` | Direction C metadata-only corpus table; not a generated metric table. |
+| `data/artifact_corpus_fixed_search_20260526.csv` | Direction C fixed-search corpus batch; curated metadata, not a generated metric table. |
+| `data/artifact_second_pass_label_review_20260526.csv` | Direction C second-pass disagreements and adjudications; not a generated metric table. |
+| `data/artifact_gate_summary.csv` | Generated selected-corpus gate counts for Direction C claim-control framing; not prevalence evidence. |
+| `data/artifact_strata_summary.csv` | Generated selected-corpus stratum and inclusion-decision counts. |
+| `figures/` | Generated PDF figures. |
+| `figures/artifact_gate_summary.pdf` | Generated gate-count figure used in `main.tex`; selected-corpus only, not field-wide prevalence. |
+| `main.tex` | Primary Direction A manuscript draft. |
+| `refs.bib` | Initial bibliography for the manuscript draft. |
+| `paper.pdf` | Current compiled PDF snapshot when built locally. |
+
+## Current Primary Track
+
+Direction A, "Evidence-Contracted Auditing", is the first manuscript because it
+can use current positive, candidate, and negative evidence without pretending
+that the project has already solved broad cross-asset generalization.
+The competing paper versions are kept under `versions/` so later work can
+advance one direction without duplicating the evidence register or claim rules.
+The current paper-version drafts are:
+
+- Direction A: evidence-contracted security/privacy measurement paper.
+- Direction B: output-cloud geometry short/workshop paper; full-paper promotion
+  requires a second independent response asset.
+- Direction C: selected-corpus claim-support measurement paper; it now has a
+  21-row v1 corpus, a fixed-search metadata batch, gate-summary outputs, and
+  label-hygiene review. The consistency pass found no invalid labels or
+  route-changing contradictions; the bounded agent second-pass review found no
+  admitted-like fixed-search row or metadata-batch evidence/metric promotion and
+  adopted two gate tightenings. Standalone aggregate claims still require a
+  larger frozen corpus or an external independent label review.
+- Direction D: audit systems/artifact paper; it remains downstream material
+  until deployment, external-use, user-study, or report-drift evidence exists.
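The Direction C consistency pass and the adopted gate tightenings can be sketched as a validation-and-count step over a corpus CSV. The sketch assumes an `id` column and gate columns whose names end in `_gate`, as `target_gate` and `delta_gate` do. The label vocabulary is hypothetical; the real vocabulary and column set live in `direction-c-corpus-protocol.md` and the corpus CSVs. Overrides use the `row_id.gate=Value` notation from the roadmap, for example `v1-10.delta_gate=Fail`.

```python
import csv
import io
from collections import Counter

# Hypothetical label vocabulary; only "Fail" appears in the roadmap notes.
ALLOWED_LABELS = {"Pass", "Fail", "Unclear"}


def parse_override(spec):
    """Split 'row_id.gate=Value' into (row_id, gate, value)."""
    target, value = spec.split("=", 1)
    row_id, gate = target.rsplit(".", 1)
    return row_id, gate, value


def gate_summary(csv_text, overrides=(), allowed=ALLOWED_LABELS):
    """Apply adjudicated overrides, then count labels per gate and list invalid ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    gates = [c for c in (reader.fieldnames or []) if c.endswith("_gate")]
    rows = {row["id"]: row for row in reader}
    for spec in overrides:
        row_id, gate, value = parse_override(spec)
        rows[row_id][gate] = value
    counts = {gate: Counter() for gate in gates}
    invalid = []
    for row_id, row in rows.items():
        for gate in gates:
            label = (row[gate] or "").strip()
            if label in allowed:
                counts[gate][label] += 1
            else:
                invalid.append((row_id, gate, label))
    return counts, invalid
```

An empty `invalid` list corresponds to "no invalid labels". The counts describe a selected corpus only and are not prevalence evidence.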
+
+## Academic Discipline
+
+- Use only recorded metrics and cited sources.
+- Keep candidate and admitted claims separate.
+- Treat finite-tail low-FPR numbers as empirical packet readouts.
+- Do not turn bounded negative scouts into universal negative claims.
+- Treat the v1 artifact corpus as a structured starter corpus, not a complete
+  survey of all diffusion or generative privacy papers.
+- Treat the fixed-search batch as selection-process evidence, not prevalence
+  evidence over the field.