Merged
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -28,7 +28,7 @@ Do not start from memory or old chat context. Re-anchor on repository files.

## Current Operating State

- Active work: `2026-05-25 ReDiffuse DDPM/STL-10 bounded scout plus SimA-style score-norm one-pass scorer are the latest roadmap operating-system update. The official STL-10 split is exact and public, and the local pipeline produced a short-target checkpoint plus 256 / 256 score packet, but fixed-timestep denoising-loss is random-level: AUC = 0.4996337890625, ASR = 0.509765625, TPR@1%FPR = 0.01171875, TPR@0.1%FPR = 0.0. The same checkpoint and split also failed a different denoiser-output norm observable: AUC = 0.5052947998046875, ASR = 0.525390625, TPR@1%FPR = 0.03125, TPR@0.1%FPR = 0.01953125. This is scoreable negative evidence, not a second asset, not a full-paper reproduction, and not an admitted row. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after ReDiffuse STL-10 denoising-loss and score-norm weak results.`
- Active work: `2026-05-25 LeakyCLIP CLIP-inversion boundary gate is the latest Lane A metadata-only update. The official dongdongunique/LeakyCLIP repo is code-public and exposes CLIP inversion, embedding alignment, Stable Diffusion refinement, metrics, configs, and scripts, but the audited target is CLIP and diffusion is only an optional refinement stage. The checked public surface has no frozen target hashes, immutable member/nonmember manifests, generated reconstruction packet, per-row membership score file, ROC array, metric JSON, trained alignment weights, or no-training verifier. This is CLIP / multimodal privacy watch-plus, not a second diffusion asset, not a Platform/Runtime row, and not a GPU or download release. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after LeakyCLIP CLIP-inversion boundary gate. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`


medium

This line is very long and difficult to read. The use of backticks for a long paragraph of text is also unconventional. For better maintainability and readability, consider restructuring this information. Instead of a single long line within backticks, you could use a multi-line blockquote or break it down into a nested list. This would make the information much easier to parse and edit.

Suggested change
- Active work: `2026-05-25 LeakyCLIP CLIP-inversion boundary gate is the latest Lane A metadata-only update. The official dongdongunique/LeakyCLIP repo is code-public and exposes CLIP inversion, embedding alignment, Stable Diffusion refinement, metrics, configs, and scripts, but the audited target is CLIP and diffusion is only an optional refinement stage. The checked public surface has no frozen target hashes, immutable member/nonmember manifests, generated reconstruction packet, per-row membership score file, ROC array, metric JSON, trained alignment weights, or no-training verifier. This is CLIP / multimodal privacy watch-plus, not a second diffusion asset, not a Platform/Runtime row, and not a GPU or download release. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after LeakyCLIP CLIP-inversion boundary gate. ReDiffuse DDPM/STL-10 remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).`
- Active work:
> **2026-05-25 LeakyCLIP CLIP-inversion boundary gate:** This is the latest Lane A metadata-only update. The official `dongdongunique/LeakyCLIP` repo is code-public and exposes CLIP inversion, embedding alignment, Stable Diffusion refinement, metrics, configs, and scripts, but the audited target is CLIP and diffusion is only an optional refinement stage. The checked public surface has no frozen target hashes, immutable member/nonmember manifests, generated reconstruction packet, per-row membership score file, ROC array, metric JSON, trained alignment weights, or no-training verifier. This is CLIP / multimodal privacy watch-plus, not a second diffusion asset, not a Platform/Runtime row, and not a GPU or download release.
>
> **Slots:** `active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after LeakyCLIP CLIP-inversion boundary gate`.
>
> **ReDiffuse DDPM/STL-10:** Remains closed by default after the weak bounded scout (AUC = 0.4996337890625) and weak SimA-style score-norm scorer (AUC = 0.5052947998046875).

- Next GPU candidate: none selected
- Long-horizon control: follow `ROADMAP.md` section
`Long-Horizon Research Task Board(2026-05-13 起)` before reopening any
25 changes: 25 additions & 0 deletions ROADMAP.md
@@ -2,6 +2,31 @@

> Last updated: 2026-05-25

## 2026-05-25 LeakyCLIP CLIP-inversion boundary gate

Latest decision: `dongdongunique/LeakyCLIP` is a real, official public code
surface, but it is not a diffusion-model membership target for the current
DiffAudit. The work audits CLIP, and Stable Diffusion appears only as an
optional refinement stage. It is therefore multimodal privacy / CLIP inversion
watch-plus, not a second diffusion asset and not an admitted row, and it
releases no model/data download or GPU.

Completed metadata-only gate: arXiv `2508.00756v4`, GitHub default branch `main`,
repository pushed `2026-02-27T08:12:16Z`, updated `2026-05-11T07:59:31Z`, `0`
releases, `0` tags. The public tree contains CLIP inversion, embedding
alignment, Stable Diffusion refinement, metrics, configs, and scripts, but no
frozen CLIP/SDXL/VAE hashes, immutable member/nonmember manifests, generated
reconstruction packet, per-row membership score file, ROC array, metric JSON,
trained alignment weights, or no-training verifier.

The current slots therefore remain:
`active_gpu_question = none`, `next_gpu_candidate = none`,
`CPU sidecar = none selected after LeakyCLIP CLIP-inversion boundary gate`.

Do not download CLIP, robust CLIP, SDXL, VAE, SSCD, LAION, Flickr, LFW,
Furniture, or generated reconstructions; do not run LeakyCLIP
inversion/refinement/metrics. Re-evaluate only if row-bound replay artifacts
appear and DiffAudit explicitly opens a CLIP / multimodal privacy consumer
boundary. See
[docs/evidence/leakyclip-clip-inversion-boundary-gate-20260525.md](docs/evidence/leakyclip-clip-inversion-boundary-gate-20260525.md).

## 2026-05-25 ReDiffuse STL-10 bounded scout and score-norm results

Latest decision: for ReDiffuse DDPM/STL-10, the only bounded training scout and a different-observable
132 changes: 132 additions & 0 deletions docs/evidence/leakyclip-clip-inversion-boundary-gate-20260525.md
@@ -0,0 +1,132 @@
# LeakyCLIP CLIP-Inversion Boundary Gate

> Date: 2026-05-25
> Status: CLIP inversion privacy watch-plus / official code-public / diffusion used only as refinement / no row-bound membership artifact / no download / no GPU release / no admitted row

## Question

Does arXiv `2508.00756` / `LeakyCLIP: Extracting Training Data from CLIP`
expose a public DiffAudit-ready diffusion membership target, split, score
packet, or verifier that should change the active Research slots?

This was selected as a narrow Lane A metadata gate because the recent arXiv
search surfaced a privacy paper whose abstract mentions diffusion models and
training-data membership leakage, and GitHub search found an official public
repository. The check used arXiv API metadata, GitHub repository metadata, the
recursive tree, releases/tags metadata, and the official README. It did not
clone the repository, download CLIP models, download Stable Diffusion or VAE
weights, download LAION/Flickr/LFW/Furniture datasets, generate images, train
embedding alignment, run inversion, or compute metrics.
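
A metadata-only check of this kind can be sketched with the public GitHub REST API, which returns repository metadata, the recursive tree, releases, and tags as JSON without cloning anything. This is a minimal illustration, not the script used for this gate: the helper names `metadata_urls`, `fetch_json`, and `tree_paths` are hypothetical, and unauthenticated requests are assumed to be within rate limits.

```python
import json
import urllib.request

API = "https://api.github.com"


def metadata_urls(owner: str, repo: str, branch: str) -> dict[str, str]:
    """Build the metadata-only endpoints; none of them downloads file contents."""
    base = f"{API}/repos/{owner}/{repo}"
    return {
        "repo": base,                                        # pushed/updated times, stars, license
        "tree": f"{base}/git/trees/{branch}?recursive=1",    # file paths only
        "releases": f"{base}/releases",
        "tags": f"{base}/tags",
    }


def fetch_json(url: str):
    """Fetch one endpoint as parsed JSON (performs a network call)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def tree_paths(tree_response: dict) -> list[str]:
    """Extract sorted file paths from a recursive tree response, skipping directories."""
    return sorted(
        entry["path"]
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob"
    )


urls = metadata_urls("dongdongunique", "LeakyCLIP", "main")
# Network calls are left commented out so the sketch stays side-effect free:
# paths = tree_paths(fetch_json(urls["tree"]))
# release_count = len(fetch_json(urls["releases"]))
```

Only paths and metadata are read, which matches the stated boundary of the check: no clone, no weights, no datasets.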

## Public Surface

| Field | Value |
| --- | --- |
| Paper line | `LeakyCLIP: Extracting Training Data from CLIP` |
| arXiv | `https://arxiv.org/abs/2508.00756v4` |
| Published / updated | `2025-08-01T16:32:48Z` / `2026-05-21T07:46:00Z` |
| Official code | `https://github.com/dongdongunique/LeakyCLIP` |
| Repository state | default branch `main`, pushed `2026-02-27T08:12:16Z`, updated `2026-05-11T07:59:31Z`, `24` stars, no license in GitHub metadata, `0` releases, `0` tags |
| Repository topics | `clip-inversion`, `clip-memorization`, `clip-privacy-leakage`, `multimodal-model-privacy`, `training-data-extraction-from-clip` |
| Current scope | CLIP inversion / multimodal privacy; Stable Diffusion is an optional refinement component, not the audited target model |

The official repository is a real code surface. The checked tree includes
configuration files, inversion/refinement code, evaluation metrics, scripts,
and documentation images:

```text
README.md
config.py
data.py
ea_train.py
eval/metrics.py
inversion/inverter.py
main.py
refinement/sd_refiner.py
configs/baseline/{flickr,furniture_object,laion}.json
configs/method/{flickr,furniture_object,laion,single}.json
scripts/run_dataset_inversion_baseline.sh
scripts/run_ea_train.sh
scripts/run_inversion.sh
scripts/run_laion_inversion.sh
docs/example_output.png
docs/pipeline.png
```

The README describes a three-stage pipeline: adversarial fine-tuning of CLIP,
embedding alignment from text embeddings to pseudo-image embeddings, and
optional Stable Diffusion refinement for texture/detail recovery. It lists
required public CLIP, robust CLIP, Stable Diffusion XL, VAE, SSCD, and dataset
downloads. It also states that membership can be inferred from reconstruction
metrics.

No committed result packet was visible in the checked public tree: no frozen
CLIP checkpoint hashes, no immutable LAION/Flickr/LFW/Furniture member and
nonmember manifests, no generated reconstruction packet, no per-row membership
score file, no ROC arrays, no metric JSON, no trained embedding-alignment
weights, and no no-training verifier output.
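
The absence check above can be approximated mechanically by matching the listed tree against filename patterns for each artifact class. The sketch below uses the paths from the tree listing in this document; the glob patterns in `REQUIRED_ARTIFACTS` are illustrative assumptions about how such artifacts might be named, not a DiffAudit convention, so a real review would still need to inspect the tree by eye.

```python
from fnmatch import fnmatchcase

# Paths taken from the checked public tree listed in this document.
CHECKED_TREE = [
    "README.md", "config.py", "data.py", "ea_train.py",
    "eval/metrics.py", "inversion/inverter.py", "main.py",
    "refinement/sd_refiner.py",
    "configs/baseline/flickr.json", "configs/baseline/furniture_object.json",
    "configs/baseline/laion.json",
    "configs/method/flickr.json", "configs/method/furniture_object.json",
    "configs/method/laion.json", "configs/method/single.json",
    "scripts/run_dataset_inversion_baseline.sh", "scripts/run_ea_train.sh",
    "scripts/run_inversion.sh", "scripts/run_laion_inversion.sh",
    "docs/example_output.png", "docs/pipeline.png",
]

# Hypothetical naming patterns per artifact class (assumptions, not a standard).
REQUIRED_ARTIFACTS = {
    "target hashes": ["*.sha256", "*hashes*.json"],
    "member/nonmember manifests": ["*manifest*"],
    "reconstruction packet": ["*reconstruction*"],
    "per-row membership scores": ["*scores*.csv", "*scores*.jsonl"],
    "ROC arrays": ["*roc*"],
    "metric JSON": ["*metrics*.json"],
    "alignment weights": ["*.pt", "*.pth", "*.safetensors"],
    "no-training verifier": ["*verif*"],
}


def missing_artifacts(paths, required):
    """Return artifact classes for which no path matches any pattern."""
    lowered = [p.lower() for p in paths]
    return [
        name
        for name, patterns in required.items()
        if not any(fnmatchcase(p, pat) for p in lowered for pat in patterns)
    ]


missing = missing_artifacts(CHECKED_TREE, REQUIRED_ARTIFACTS)
```

Note that `eval/metrics.py` is code rather than a metric JSON, so it does not satisfy the `metric JSON` class under these patterns.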

## Claim Boundary

LeakyCLIP is scientifically relevant privacy evidence, but it does not audit a
diffusion model as the target. The target under attack is CLIP. Diffusion
appears as an image-refinement stage after CLIP inversion and as a general
background risk in the paper abstract, not as the membership target whose
training set is being inferred.

That makes it adjacent to DiffAudit's image-generation privacy map but outside
the current diffusion / latent-image per-sample membership consumer contract.
Treating it as a diffusion MIA asset would conflate a CLIP inversion threat
model with a diffusion-model membership row.

## Gate Result

| Gate | Result |
| --- | --- |
| Current image/latent-image fit | Partial. It is multimodal image privacy work, but the audited target family is CLIP rather than a diffusion or latent-diffusion generator. |
| Target identity | Fail for DiffAudit replay. The README lists CLIP/robust CLIP/SDXL/VAE model sources, but no paper-bound hashes or frozen target bundle is committed. |
| Exact member split | Fail. No immutable member row IDs, image filenames, captions, URLs, or split manifests are committed. |
| Exact nonmember split | Fail. No row-bound holdout/nonmember manifest is committed. |
| Query/response or score coverage | Fail. The repository ships code and examples, not reconstruction packets, per-row membership scores, ROC arrays, metric JSON, or verifier output. |
| Mechanism delta | Pass as watch-plus only. CLIP inversion with reconstruction metrics is a different privacy surface, but not a diffusion membership mechanism for the current admitted boundary. |
| Download justification | Fail. Running it would require CLIP/SDXL/VAE/SSCD/model and dataset downloads without a released row-bound replay packet. |
| GPU release | Fail. The blocker is target-family boundary plus missing replay artifacts, not local compute. |
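
The table can also be read as data. The sketch below encodes the eight gate results and applies one possible release rule, namely that any non-pass gate blocks release. That rule, the `Gate` type, and the function names are assumptions made for illustration; the source states the per-gate results and the final decision, not a formal aggregation rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    result: str  # "pass", "partial", or "fail"


# Results transcribed from the gate table above.
GATES = [
    Gate("Current image/latent-image fit", "partial"),
    Gate("Target identity", "fail"),
    Gate("Exact member split", "fail"),
    Gate("Exact nonmember split", "fail"),
    Gate("Query/response or score coverage", "fail"),
    Gate("Mechanism delta", "pass"),  # the table qualifies this as watch-plus only
    Gate("Download justification", "fail"),
    Gate("GPU release", "fail"),
]


def blocking_gates(gates):
    """Names of gates that did not pass outright (assumed to block release)."""
    return [g.name for g in gates if g.result != "pass"]


def release_allowed(gates):
    """Assumed rule: release only when no gate blocks."""
    return not blocking_gates(gates)


blockers = blocking_gates(GATES)
allowed = release_allowed(GATES)
```

Under this rule the gate stays closed, which agrees with the decision recorded in the next section.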

## Decision

`CLIP inversion privacy watch-plus / official code-public / diffusion used only
as refinement / no row-bound membership artifact / no download / no GPU release
/ no admitted row`.

Keep LeakyCLIP as Research-only adjacent privacy evidence. It is useful for
framing multimodal model privacy and CLIP training-data extraction risk, but it
does not reopen the current diffusion asset path and does not justify model,
dataset, or GPU work.

Current slots become `active_gpu_question = none`, `next_gpu_candidate = none`,
and `CPU sidecar = none selected after LeakyCLIP CLIP-inversion boundary gate`.
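
Because the slot line is written as `key = value` pairs separated by semicolons, it can be parsed into a mapping for tooling that needs to confirm no GPU work is queued. This is a minimal sketch; `parse_slots` is a hypothetical helper, and the string format is inferred from the slot line as it appears in this change.

```python
def parse_slots(line: str) -> dict[str, str]:
    """Parse 'key = value; key = value' into a dict, stripping whitespace."""
    slots = {}
    for part in line.split(";"):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in slot: {part!r}")
        slots[key.strip()] = value.strip()
    return slots


SLOT_LINE = (
    "active_gpu_question = none; next_gpu_candidate = none; "
    "CPU sidecar = none selected after LeakyCLIP CLIP-inversion boundary gate"
)

slots = parse_slots(SLOT_LINE)
# Both GPU slots must read exactly "none" for the GPU to be considered idle.
gpu_idle = (
    slots["active_gpu_question"] == "none"
    and slots["next_gpu_candidate"] == "none"
)
```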

Smallest valid reopen condition:

- authors publish compact row-bound reconstruction and membership score
packets with immutable member/nonmember manifests, model hashes, ROC arrays,
metric JSON, and a no-training verifier; and
- DiffAudit explicitly opens a CLIP / multimodal-model privacy consumer
boundary separate from diffusion / latent-image admitted rows.

Stop condition:

- Do not download CLIP, robust CLIP, SDXL, VAE, SSCD, LAION, Flickr, LFW,
Furniture, generated reconstructions, or embedding-alignment weights from
this gate.
- Do not clone or run LeakyCLIP for execution from this gate.
- Do not run `main.py`, `ea_train.py`, `scripts/run_*`, inversion, refinement,
metric computation, or GPU work from this gate.
- Do not add Platform/Runtime rows, schemas, product copy, or recommendation
logic until a reviewed CLIP privacy consumer boundary or row-bound replay
artifacts exist.

## Platform and Runtime Impact

None. Platform and Runtime continue consuming only the admitted `recon / PIA
baseline / PIA defended / GSA / DPDM W-1` set.
1 change: 1 addition & 0 deletions docs/evidence/reproduction-status.md
@@ -39,6 +39,7 @@ Smoke tests and dry runs are engineering validation, not benchmark claims.
| GUARD surgical mitigation | `hold-semantic-shift` | arXiv `2603.00133` / `You Don't Need All That Attention: Surgical Memorization Mitigation in Text-to-Image Diffusion Models` is official code-public memorization-mitigation watch evidence. The `kairanzhao/GUARD` repo exposes `sdv1_500_mem` inference, detection, mask-generation, metric, and vendored `open_clip` code, but the release requires Google Drive benchmark assets and Stable Diffusion/reference model execution rather than shipping checkpoint-bound target identities, immutable row manifests, generated response packets, pre/post GUARD score rows, ROC arrays, metric JSON, retained-utility artifacts, or a no-training verifier. No arXiv source, GitHub archive, Google Drive payload, Stable Diffusion/reference model weight, generated image, mask, checkpoint, script execution, CPU/GPU sidecar, or Platform/Runtime row is selected. See [guard-surgical-mitigation-artifact-gate-20260523.md](guard-surgical-mitigation-artifact-gate-20260523.md). |
| BAF LoRA parameter-space mitigation | `hold-semantic-shift` | arXiv `2605.10439` / `Filtering Memorization from Parameter-Space in Diffusion Models` is a weight-only LoRA memorization-mitigation watch item. The paper proposes Base-Anchored Filtering, a post-hoc, training-free, data-free method that decomposes LoRA updates into spectral channels and suppresses weakly backbone-aligned channels as possible memorization carriers. The public surface is supplementary-code-claim-only: arXiv HTML says code is in supplementary material, but GitHub exact-title/arXiv-id/BAF searches found no official public repository, target LoRA/checkpoint bundle, training-image manifest, member/nonmember rows, generated response packet, per-row score file, ROC array, metric JSON, retained-utility artifact, or verifier. No arXiv source/supplement, LoRA weights, SD base weights, training images, mitigation implementation, CPU/GPU sidecar, or Platform/Runtime row is selected. See [baf-lora-parameter-space-mitigation-gate-20260523.md](baf-lora-parameter-space-mitigation-gate-20260523.md). |
| Broken Memories | `hold-semantic-shift` | arXiv `2605.22050` / `Broken Memories: Detecting and Mitigating Memorization in Diffusion Models with Degraded Generations` is fresh Stable Diffusion memorization detection/mitigation evidence with reported SD `1.4` `AUC > 0.999`, `0.0%` post-mitigation memorization rate, and about `0.01s` overhead. It is not a current per-sample membership row: arXiv metadata and GitHub searches expose no official code, exact prompt/image manifest, generated image packet, internal trace, per-row score file, ROC array, metric JSON, mitigation-decision artifact, or verifier. No arXiv source tarball, Stable Diffusion weights, LAION/Webster assets, implementation-from-paper, CPU/GPU sidecar, or Platform/Runtime row is selected. See [broken-memories-artifact-gate-20260523.md](broken-memories-artifact-gate-20260523.md). |
| LeakyCLIP CLIP inversion | `hold-semantic-shift` | arXiv `2508.00756` / `LeakyCLIP: Extracting Training Data from CLIP` is official code-public CLIP inversion / multimodal privacy evidence. The repository exposes real inversion/refinement code, configs, metrics, scripts, and README instructions, but the audited target is CLIP; Stable Diffusion is only an optional refinement component. The checked public tree has no frozen CLIP/SDXL/VAE hashes, immutable member/nonmember manifests, generated reconstruction packet, per-row membership score file, ROC array, metric JSON, trained embedding-alignment weights, or no-training verifier. No CLIP/SDXL/VAE/SSCD/model/dataset download, inversion/refinement run, CPU/GPU sidecar, or Platform/Runtime row is selected. See [leakyclip-clip-inversion-boundary-gate-20260525.md](leakyclip-clip-inversion-boundary-gate-20260525.md). |


medium

This table row contains a very long line of text, which harms readability and maintainability of the markdown source. Consider adding line breaks (`<br>`) to split the cell content into shorter paragraphs. The row still has to stay on one source line to remain a valid table row, so the gain is mainly in the rendered cell, where each `<br>` shows up as a visible line break.

Suggested change
| LeakyCLIP CLIP inversion | `hold-semantic-shift` | arXiv `2508.00756` / `LeakyCLIP: Extracting Training Data from CLIP` is official code-public CLIP inversion / multimodal privacy evidence. The repository exposes real inversion/refinement code, configs, metrics, scripts, and README instructions, but the audited target is CLIP; Stable Diffusion is only an optional refinement component. The checked public tree has no frozen CLIP/SDXL/VAE hashes, immutable member/nonmember manifests, generated reconstruction packet, per-row membership score file, ROC array, metric JSON, trained embedding-alignment weights, or no-training verifier. No CLIP/SDXL/VAE/SSCD/model/dataset download, inversion/refinement run, CPU/GPU sidecar, or Platform/Runtime row is selected. See [leakyclip-clip-inversion-boundary-gate-20260525.md](leakyclip-clip-inversion-boundary-gate-20260525.md). |
| LeakyCLIP CLIP inversion | `hold-semantic-shift` | arXiv `2508.00756` / `LeakyCLIP: Extracting Training Data from CLIP` is official code-public CLIP inversion / multimodal privacy evidence. The repository exposes real inversion/refinement code, configs, metrics, scripts, and README instructions, but the audited target is CLIP; Stable Diffusion is only an optional refinement component.<br><br>The checked public tree has no frozen CLIP/SDXL/VAE hashes, immutable member/nonmember manifests, generated reconstruction packet, per-row membership score file, ROC array, metric JSON, trained embedding-alignment weights, or no-training verifier.<br><br>No CLIP/SDXL/VAE/SSCD/model/dataset download, inversion/refinement run, CPU/GPU sidecar, or Platform/Runtime row is selected. See [leakyclip-clip-inversion-boundary-gate-20260525.md](leakyclip-clip-inversion-boundary-gate-20260525.md). |

| IAR Privacy Attacks | `hold-code-report-only` | arXiv `2502.02514` / `Privacy Attacks on Image AutoRegressive Models` is image-generation privacy watch-plus with official code and strong reported MIA/DI/extraction claims, including IAR `TPR@FPR=1% = 94.57%`, dataset inference with as few as `4` samples, and `698` extracted samples from `VAR-d30`. The target family is image autoregressive generation rather than current diffusion/latent-image rows. The official repository exposes `main.py`, `environment.yaml`, MIA/DI/memorization analysis scripts, VAR/RAR/MAR configs, and attack code, but no model hashes, immutable ImageNet row manifests, generated sample packet, per-row MIA scores, ROC arrays, metric JSON, DI CSV, memorization CSV, or verifier is committed. No ImageNet/model/upstream-repo download, MIA/DI/extraction run, CPU/GPU sidecar, or Platform/Runtime row is selected. See [iar-privacy-attacks-artifact-gate-20260523.md](iar-privacy-attacks-artifact-gate-20260523.md). |
| Silent Brush / Art Arena | `hold-semantic-shift` | arXiv `2605.17500` / `The Silent Brush: Evaluating Artistic Style Leakage in AI Art Generation` is text-to-image diffusion-adjacent privacy evidence, but the claim is style leakage / copyright evaluation rather than the current per-sample membership contract. The anonymous resource surface exposes code/notebook inventory only: `ArtArena.ipynb`, `README.md`, ET/MD eval and infer scripts, `FT_models.py`, `get_leadger.py`, prep scripts, `CSD/model.py`, `CSD/utils.py`, and figure PDFs. It exposes no target checkpoint hash, immutable member/nonmember artwork manifest, generated image packet, per-row membership score file, ROC array, metric JSON, or ready verifier. No artwork/model/source-tarball download, script execution, CPU/GPU sidecar, or Platform/Runtime row is selected. See [silent-brush-artarena-artifact-gate-20260523.md](silent-brush-artarena-artifact-gate-20260523.md). |
| Trajectory Generation Privacy | `hold-paper-source-only` | arXiv `2605.15246` / `Privacy Evaluation of Generative Models for Trajectory Generation` is cross-domain trajectory/mobility privacy evidence, not a current image or latent-image asset. It evaluates LSTM-TrajGAN, MoveSim, DiffTraj, and Diff-RNTraj; the diffusion trajectory results are near random (`0.5012` and `0.4949` AUC-ROC), while the only clearly positive table value is GAN MoveSim `AUC-ROC = 0.7002`. The checked public surface is TeX/source only, GitHub searches returned no official code or artifact hits, and no model checkpoint, immutable member/nonmember trajectory manifest, generated trajectory packet, per-row score file, ROC array, metric JSON, or verifier is public. No trajectory dataset/model download, attack implementation, CPU/GPU sidecar, or Platform/Runtime row is selected. See [trajectory-generation-privacy-artifact-gate-20260523.md](trajectory-generation-privacy-artifact-gate-20260523.md). |