Commit 6433f2d

Merge branch 'main' into revisit-deps-tests
2 parents e2f8851 + 85ffcf1 commit 6433f2d

26 files changed

Lines changed: 3098 additions & 119 deletions

.ai/review-rules.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# PR Review Rules

Review-specific rules for Claude. Focus on correctness — style is handled by ruff.

Before reviewing, read and apply the guidelines in:
- [AGENTS.md](AGENTS.md) — coding style, dependencies, copied code, model conventions
- [skills/model-integration/SKILL.md](skills/model-integration/SKILL.md) — attention pattern, pipeline rules, implementation checklist, gotchas
- [skills/parity-testing/SKILL.md](skills/parity-testing/SKILL.md) — testing rules, comparison utilities
- [skills/parity-testing/pitfalls.md](skills/parity-testing/pitfalls.md) — known pitfalls (dtype mismatches, config assumptions, etc.)

## Common mistakes (add new rules below this line)
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
name: Claude PR Review

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]

permissions:
  contents: write
  pull-requests: write
  issues: read

jobs:
  claude-review:
    if: |
      (
        github.event_name == 'issue_comment' &&
        github.event.issue.pull_request &&
        github.event.issue.state == 'open' &&
        contains(github.event.comment.body, '@claude') &&
        (github.event.comment.author_association == 'MEMBER' ||
         github.event.comment.author_association == 'OWNER' ||
         github.event.comment.author_association == 'COLLABORATOR')
      ) || (
        github.event_name == 'pull_request_review_comment' &&
        contains(github.event.comment.body, '@claude') &&
        (github.event.comment.author_association == 'MEMBER' ||
         github.event.comment.author_association == 'OWNER' ||
         github.event.comment.author_association == 'COLLABORATOR')
      )
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          claude_args: |
            --append-system-prompt "Review this PR against the rules in .ai/review-rules.md. Focus on correctness, not style (ruff handles style). Only review changes under src/diffusers/. Do NOT commit changes unless the comment explicitly asks you to using the phrase 'commit this'."

docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
@@ -670,6 +670,10 @@
   - local: api/pipelines/z_image
     title: Z-Image
   title: Image
+- sections:
+  - local: api/pipelines/llada2
+    title: LLaDA2
+  title: Text
 - sections:
   - local: api/pipelines/allegro
     title: Allegro
@@ -718,6 +722,8 @@
 - sections:
   - local: api/schedulers/overview
     title: Overview
+  - local: api/schedulers/block_refinement
+    title: BlockRefinementScheduler
   - local: api/schedulers/cm_stochastic_iterative
     title: CMStochasticIterativeScheduler
   - local: api/schedulers/ddim_cogvideox
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LLaDA2

[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) is a family of discrete diffusion language models that generate text through block-wise iterative refinement. Instead of autoregressive token-by-token generation, LLaDA2 starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement steps.

## Usage

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from diffusers import BlockRefinementScheduler, LLaDA2Pipeline

model_id = "inclusionAI/LLaDA2.1-mini"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
scheduler = BlockRefinementScheduler()

pipe = LLaDA2Pipeline(model=model, scheduler=scheduler, tokenizer=tokenizer)
output = pipe(
    prompt="Write a short poem about the ocean.",
    gen_length=256,
    block_length=32,
    num_inference_steps=32,
    threshold=0.7,
    editing_threshold=0.5,
    max_post_steps=16,
    temperature=0.0,
)
print(output.texts[0])
```
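The refinement loop the pipeline runs can be sketched in plain Python. This is a toy illustration, not the actual `LLaDA2Pipeline` internals; `MASK`, `toy_model`, and `refine_block` are invented here, and a real model returns confidences from its logits.

```python
MASK = -1  # sentinel for a masked position (real models use a mask token id)

def toy_model(tokens):
    # Stand-in for the language model: per position, return a
    # (predicted_token, confidence) pair. Deterministic toy logic.
    return [(pos % 7, 0.9 if pos % 2 == 0 else 0.4) for pos in range(len(tokens))]

def refine_block(tokens, num_inference_steps=8, threshold=0.7):
    # Start from a masked window; each step commits predictions
    # whose confidence clears the threshold.
    for _ in range(num_inference_steps):
        if MASK not in tokens:
            break
        preds = toy_model(tokens)
        committed = False
        for pos, (tok, conf) in enumerate(preds):
            if tokens[pos] == MASK and conf >= threshold:
                tokens[pos] = tok
                committed = True
        if not committed:
            # Guarantee progress: commit the most confident masked position.
            pos = max((p for p, t in enumerate(tokens) if t == MASK),
                      key=lambda p: preds[p][1])
            tokens[pos] = preds[pos][0]
    return tokens

print(refine_block([MASK] * 8))  # every position ends up committed
```

The confident positions fill in early and the low-confidence ones are forced in one at a time, which is the intuition behind tuning `threshold` and `num_inference_steps` together.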
## Callbacks

Callbacks run after each refinement step. Pass `callback_on_step_end_tensor_inputs` to select which tensors are included in `callback_kwargs`. In the current implementation, `block_x` (the sequence window being refined) and `transfer_index` (the commit mask marking which masked positions were filled this step) are provided; return `{"block_x": ...}` from the callback to replace the window.

```py
def on_step_end(pipe, step, timestep, callback_kwargs):
    block_x = callback_kwargs["block_x"]
    # Inspect or modify `block_x` here.
    return {"block_x": block_x}

out = pipe(
    prompt="Write a short poem.",
    callback_on_step_end=on_step_end,
    callback_on_step_end_tensor_inputs=["block_x"],
)
```
## Recommended parameters

LLaDA2.1 models support two modes:

| Mode | `threshold` | `editing_threshold` | `max_post_steps` |
|------|-------------|---------------------|------------------|
| Quality | 0.7 | 0.5 | 16 |
| Speed | 0.5 | `None` | 16 |

Pass `editing_threshold=None`, `0.0`, or a negative value to turn off post-mask editing.

For LLaDA2.0 models, disable editing by passing `editing_threshold=None` or `0.0`.

For all models: `block_length=32`, `temperature=0.0`, `num_inference_steps=32`.
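One way to keep these presets handy is a small helper that merges the shared defaults with a mode row from the table. This is hypothetical user code; `MODE_PRESETS` and `generation_kwargs` are not part of diffusers.

```python
# Hypothetical presets mirroring the table above; not a diffusers API.
MODE_PRESETS = {
    "quality": {"threshold": 0.7, "editing_threshold": 0.5, "max_post_steps": 16},
    "speed": {"threshold": 0.5, "editing_threshold": None, "max_post_steps": 16},
}

def generation_kwargs(mode, **overrides):
    # Defaults recommended for all models, plus the per-mode preset.
    kwargs = {"block_length": 32, "temperature": 0.0, "num_inference_steps": 32}
    kwargs.update(MODE_PRESETS[mode])
    kwargs.update(overrides)
    return kwargs
```

A call like `pipe(prompt="...", gen_length=256, **generation_kwargs("speed"))` then selects the speed preset, and any keyword passed as an override wins over the preset.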
## LLaDA2Pipeline

[[autodoc]] LLaDA2Pipeline
  - all
  - __call__

## LLaDA2PipelineOutput

[[autodoc]] pipelines.LLaDA2PipelineOutput

docs/source/en/api/pipelines/overview.md

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
 | [Latent Diffusion](latent_diffusion) | text2image, super-resolution |
 | [Latte](latte) | text2image |
 | [LEDITS++](ledits_pp) | image editing |
+| [LLaDA2](llada2) | text2text |
 | [Lumina-T2X](lumina) | text2image |
 | [Marigold](marigold) | depth-estimation, normals-estimation, intrinsic-decomposition |
 | [MultiDiffusion](panorama) | text2image |
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BlockRefinementScheduler

The `BlockRefinementScheduler` manages block-wise iterative refinement for discrete token diffusion. At each step it commits the most confident tokens and optionally edits already-committed tokens when the model predicts a different token with high confidence.

This scheduler is used by [`LLaDA2Pipeline`].
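The commit-and-edit behavior described above can be sketched as a toy step function. This is illustrative only; the real `BlockRefinementScheduler.step` works on tensors and has a different signature.

```python
def toy_step(tokens, mask, preds, threshold=0.7, editing_threshold=0.5):
    # tokens: current ids; mask[i] is True while position i is still masked.
    # preds: per-position (token, confidence) pairs from the model.
    for pos, (tok, conf) in enumerate(preds):
        if mask[pos]:
            if conf >= threshold:  # commit a confident masked position
                tokens[pos], mask[pos] = tok, False
        elif (editing_threshold is not None and tok != tokens[pos]
              and conf >= editing_threshold):
            tokens[pos] = tok  # edit a committed token the model now disputes
    return tokens, mask

# Position 1 is already committed but gets edited; position 2 gets committed.
tokens, mask = toy_step([5, 9, 0], [False, False, True],
                        [(5, 0.9), (7, 0.6), (3, 0.8)])
```

Setting `editing_threshold=None` skips the `elif` branch entirely, which matches disabling post-mask editing in the pipeline.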
## BlockRefinementScheduler

[[autodoc]] BlockRefinementScheduler

## BlockRefinementSchedulerOutput

[[autodoc]] schedulers.scheduling_block_refinement.BlockRefinementSchedulerOutput

docs/source/en/optimization/fp16.md

Lines changed: 18 additions & 0 deletions
@@ -248,6 +248,24 @@ Refer to the [diffusers/benchmarks](https://huggingface.co/datasets/diffusers/be
 
 The [diffusers-torchao](https://github.com/sayakpaul/diffusers-torchao#benchmarking-results) repository also contains benchmarking results for compiled versions of Flux and CogVideoX.
 
+## Kernels
+
+[Kernels](https://huggingface.co/docs/kernels/index) is a library for building, distributing, and loading optimized compute kernels on the [Hub](https://huggingface.co/kernels-community). It supports [attention](./attention_backends#set_attention_backend) kernels and custom CUDA kernels for operations like RMSNorm, GEGLU, RoPE, and AdaLN.
+
+The [Diffusers Pipeline Integration](https://github.com/huggingface/kernels/blob/main/skills/cuda-kernels/references/diffusers-integration.md) guide shows how to integrate a kernel with the [add cuda-kernels](https://github.com/huggingface/kernels/blob/main/skills/cuda-kernels/SKILL.md) skill. This skill enables an agent, like Claude or Codex, to write custom kernels tailored to a specific model and your hardware.
+
+> [!TIP]
+> Install the [add cuda-kernels](https://github.com/huggingface/kernels/blob/main/skills/cuda-kernels/SKILL.md) skill to teach an agent how to write a kernel. The [Custom kernels for all from Codex and Claude](https://huggingface.co/blog/custom-cuda-kernels-agent-skills) blog post covers this in more detail.
+
+For example, a custom RMSNorm kernel (generated by the `add cuda-kernels` skill) combined with [torch.compile](#torchcompile) speeds up LTX-Video generation by 1.43x on an H100.
+
+<iframe
+  src="https://huggingface.co/datasets/docs-benchmarks/kernel-ltx-video/embed/viewer/default/train"
+  frameborder="0"
+  width="100%"
+  height="560px"
+></iframe>
+
 ## Dynamic quantization
 
 [Dynamic quantization](https://pytorch.org/tutorials/recipes/recipes/dynamic_quantization.html) improves inference speed by reducing precision to enable faster math operations. This particular type of quantization determines how to scale the activations based on the data at runtime rather than using a fixed scaling factor. As a result, the scaling factor is more accurately aligned with the data.
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
# Discrete Token Diffusion (Experimental)

This folder contains **training and sampling examples** for *discrete diffusion over token IDs* (language-model style), built to follow the `diffusers` + `accelerate` training conventions.

## LLaDA2

[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) generates text through block-wise iterative refinement. Instead of autoregressive token-by-token generation, it starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement steps.

### Train

The training script uses a confidence-aware loss and works with any causal LM from the Hub (e.g. Qwen, Llama, Mistral):

```bash
accelerate launch examples/discrete_diffusion/train_llada2.py \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name wikitext \
  --dataset_config_name wikitext-2-raw-v1 \
  --text_column text \
  --output_dir llada2-output \
  --max_train_steps 1000 \
  --prompt_length 32 \
  --block_length 32 \
  --lambda_conf 2.0 \
  --conf_temperature 0.5
```

If you don't want to download a dataset, you can use random-token data:

```bash
accelerate launch examples/discrete_diffusion/train_llada2.py \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --output_dir llada2-output \
  --use_dummy_data \
  --num_dummy_samples 2048
```

### Sample

```bash
python examples/discrete_diffusion/sample_llada2.py \
  --model_id inclusionAI/LLaDA2.1-mini \
  --prompt "Write a short poem about the ocean." \
  --gen_length 256 \
  --num_inference_steps 32 \
  --threshold 0.7 \
  --editing_threshold 0.5 \
  --max_post_steps 16 \
  --use_chat_template \
  --add_generation_prompt
```
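The `--lambda_conf` and `--conf_temperature` flags suggest a loss that up-weights positions where the model is already confident, so confident mistakes are penalized harder. A toy sketch under that assumption (this is not the code in `train_llada2.py`, and the weighting formula is invented for illustration):

```python
import math

def confidence_weighted_nll(logprobs, targets, lambda_conf=2.0, conf_temperature=0.5):
    # logprobs: per masked position, a dict of token_id -> log probability.
    # targets: the ground-truth token id for each position.
    total = 0.0
    for dist, target in zip(logprobs, targets):
        nll = -dist[target]                  # standard cross-entropy term
        conf = math.exp(max(dist.values()))  # model's top-1 probability
        # Sharpen confidence with a temperature, then scale the penalty.
        weight = 1.0 + lambda_conf * conf ** (1.0 / conf_temperature)
        total += weight * nll
    return total / len(targets)
```

With `lambda_conf=0.0` this reduces to the plain average NLL over masked positions, which is a useful sanity check when tuning the flags.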
