Commit 6433f2d

Merge branch 'main' into revisit-deps-tests
2 parents e2f8851 + 85ffcf1 commit 6433f2d

26 files changed

Lines changed: 3098 additions & 119 deletions

.ai/review-rules.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# PR Review Rules

Review-specific rules for Claude. Focus on correctness — style is handled by ruff.

Before reviewing, read and apply the guidelines in:
- [AGENTS.md](AGENTS.md) — coding style, dependencies, copied code, model conventions
- [skills/model-integration/SKILL.md](skills/model-integration/SKILL.md) — attention pattern, pipeline rules, implementation checklist, gotchas
- [skills/parity-testing/SKILL.md](skills/parity-testing/SKILL.md) — testing rules, comparison utilities
- [skills/parity-testing/pitfalls.md](skills/parity-testing/pitfalls.md) — known pitfalls (dtype mismatches, config assumptions, etc.)

## Common mistakes (add new rules below this line)
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
name: Claude PR Review

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]

permissions:
  contents: write
  pull-requests: write
  issues: read

jobs:
  claude-review:
    if: |
      (
        github.event_name == 'issue_comment' &&
        github.event.issue.pull_request &&
        github.event.issue.state == 'open' &&
        contains(github.event.comment.body, '@claude') &&
        (github.event.comment.author_association == 'MEMBER' ||
         github.event.comment.author_association == 'OWNER' ||
         github.event.comment.author_association == 'COLLABORATOR')
      ) || (
        github.event_name == 'pull_request_review_comment' &&
        contains(github.event.comment.body, '@claude') &&
        (github.event.comment.author_association == 'MEMBER' ||
         github.event.comment.author_association == 'OWNER' ||
         github.event.comment.author_association == 'COLLABORATOR')
      )
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          claude_args: |
            --append-system-prompt "Review this PR against the rules in .ai/review-rules.md. Focus on correctness, not style (ruff handles style). Only review changes under src/diffusers/. Do NOT commit changes unless the comment explicitly asks you to using the phrase 'commit this'."

docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
@@ -670,6 +670,10 @@
   - local: api/pipelines/z_image
     title: Z-Image
   title: Image
+- sections:
+  - local: api/pipelines/llada2
+    title: LLaDA2
+  title: Text
 - sections:
   - local: api/pipelines/allegro
     title: Allegro
@@ -718,6 +722,8 @@
 - sections:
   - local: api/schedulers/overview
     title: Overview
+  - local: api/schedulers/block_refinement
+    title: BlockRefinementScheduler
   - local: api/schedulers/cm_stochastic_iterative
     title: CMStochasticIterativeScheduler
   - local: api/schedulers/ddim_cogvideox
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LLaDA2

[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) is a family of discrete diffusion language models that generate text through block-wise iterative refinement. Instead of autoregressive token-by-token generation, LLaDA2 starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement steps.

## Usage

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from diffusers import BlockRefinementScheduler, LLaDA2Pipeline

model_id = "inclusionAI/LLaDA2.1-mini"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
scheduler = BlockRefinementScheduler()

pipe = LLaDA2Pipeline(model=model, scheduler=scheduler, tokenizer=tokenizer)
output = pipe(
    prompt="Write a short poem about the ocean.",
    gen_length=256,
    block_length=32,
    num_inference_steps=32,
    threshold=0.7,
    editing_threshold=0.5,
    max_post_steps=16,
    temperature=0.0,
)
print(output.texts[0])
```
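The refinement loop the pipeline runs can be sketched in plain Python. This is a toy illustration, not the actual `LLaDA2Pipeline` internals; `MASK`, `toy_model`, and `refine_block` are invented here, and a real model returns confidences from its logits.

```python
MASK = -1  # sentinel for a masked position (real models use a mask token id)

def toy_model(tokens):
    # Stand-in for the language model: per position, return a
    # (predicted_token, confidence) pair. Deterministic toy logic.
    return [(pos % 7, 0.9 if pos % 2 == 0 else 0.4) for pos in range(len(tokens))]

def refine_block(tokens, num_inference_steps=8, threshold=0.7):
    # Start from a masked window; each step commits predictions
    # whose confidence clears the threshold.
    for _ in range(num_inference_steps):
        if MASK not in tokens:
            break
        preds = toy_model(tokens)
        committed = False
        for pos, (tok, conf) in enumerate(preds):
            if tokens[pos] == MASK and conf >= threshold:
                tokens[pos] = tok
                committed = True
        if not committed:
            # Guarantee progress: commit the most confident masked position.
            pos = max((p for p, t in enumerate(tokens) if t == MASK),
                      key=lambda p: preds[p][1])
            tokens[pos] = preds[pos][0]
    return tokens

print(refine_block([MASK] * 8))  # every position ends up committed
```

The confident positions fill in early and the low-confidence ones are forced in one at a time, which is the intuition behind tuning `threshold` and `num_inference_steps` together.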
## Callbacks

Callbacks run after each refinement step. Pass `callback_on_step_end_tensor_inputs` to select which tensors are included in `callback_kwargs`. In the current implementation, `block_x` (the sequence window being refined) and `transfer_index` (the commit mask marking which masked positions were filled this step) are provided; return `{"block_x": ...}` from the callback to replace the window.

```py
def on_step_end(pipe, step, timestep, callback_kwargs):
    block_x = callback_kwargs["block_x"]
    # Inspect or modify `block_x` here.
    return {"block_x": block_x}

out = pipe(
    prompt="Write a short poem.",
    callback_on_step_end=on_step_end,
    callback_on_step_end_tensor_inputs=["block_x"],
)
```
## Recommended parameters

LLaDA2.1 models support two modes:

| Mode | `threshold` | `editing_threshold` | `max_post_steps` |
|------|-------------|---------------------|------------------|
| Quality | 0.7 | 0.5 | 16 |
| Speed | 0.5 | `None` | 16 |

Pass `editing_threshold=None`, `0.0`, or a negative value to turn off post-mask editing.

For LLaDA2.0 models, disable editing by passing `editing_threshold=None` or `0.0`.

For all models: `block_length=32`, `temperature=0.0`, `num_inference_steps=32`.
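One way to keep these presets handy is a small helper that merges the shared defaults with a mode row from the table. This is hypothetical user code; `MODE_PRESETS` and `generation_kwargs` are not part of diffusers.

```python
# Hypothetical presets mirroring the table above; not a diffusers API.
MODE_PRESETS = {
    "quality": {"threshold": 0.7, "editing_threshold": 0.5, "max_post_steps": 16},
    "speed": {"threshold": 0.5, "editing_threshold": None, "max_post_steps": 16},
}

def generation_kwargs(mode, **overrides):
    # Defaults recommended for all models, plus the per-mode preset.
    kwargs = {"block_length": 32, "temperature": 0.0, "num_inference_steps": 32}
    kwargs.update(MODE_PRESETS[mode])
    kwargs.update(overrides)
    return kwargs
```

A call like `pipe(prompt="...", gen_length=256, **generation_kwargs("speed"))` then selects the speed preset, and any keyword passed as an override wins over the preset.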
## LLaDA2Pipeline

[[autodoc]] LLaDA2Pipeline
  - all
  - __call__

## LLaDA2PipelineOutput

[[autodoc]] pipelines.LLaDA2PipelineOutput

docs/source/en/api/pipelines/overview.md

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
 | [Latent Diffusion](latent_diffusion) | text2image, super-resolution |
 | [Latte](latte) | text2image |
 | [LEDITS++](ledits_pp) | image editing |
+| [LLaDA2](llada2) | text2text |
 | [Lumina-T2X](lumina) | text2image |
 | [Marigold](marigold) | depth-estimation, normals-estimation, intrinsic-decomposition |
 | [MultiDiffusion](panorama) | text2image |
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BlockRefinementScheduler

The `BlockRefinementScheduler` manages block-wise iterative refinement for discrete token diffusion. At each step it commits the most confident tokens and optionally edits already-committed tokens when the model predicts a different token with high confidence.

This scheduler is used by [`LLaDA2Pipeline`].
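The commit-and-edit behavior described above can be sketched as a toy step function. This is illustrative only; the real `BlockRefinementScheduler.step` works on tensors and has a different signature.

```python
def toy_step(tokens, mask, preds, threshold=0.7, editing_threshold=0.5):
    # tokens: current ids; mask[i] is True while position i is still masked.
    # preds: per-position (token, confidence) pairs from the model.
    for pos, (tok, conf) in enumerate(preds):
        if mask[pos]:
            if conf >= threshold:  # commit a confident masked position
                tokens[pos], mask[pos] = tok, False
        elif (editing_threshold is not None and tok != tokens[pos]
              and conf >= editing_threshold):
            tokens[pos] = tok  # edit a committed token the model now disputes
    return tokens, mask

# Position 1 is already committed but gets edited; position 2 gets committed.
tokens, mask = toy_step([5, 9, 0], [False, False, True],
                        [(5, 0.9), (7, 0.6), (3, 0.8)])
```

Setting `editing_threshold=None` skips the `elif` branch entirely, which matches disabling post-mask editing in the pipeline.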
## BlockRefinementScheduler

[[autodoc]] BlockRefinementScheduler

## BlockRefinementSchedulerOutput

[[autodoc]] schedulers.scheduling_block_refinement.BlockRefinementSchedulerOutput

docs/source/en/optimization/fp16.md

Lines changed: 18 additions & 0 deletions
@@ -248,6 +248,24 @@ Refer to the [diffusers/benchmarks](https://huggingface.co/datasets/diffusers/be
 
 The [diffusers-torchao](https://github.com/sayakpaul/diffusers-torchao#benchmarking-results) repository also contains benchmarking results for compiled versions of Flux and CogVideoX.
 
+## Kernels
+
+[Kernels](https://huggingface.co/docs/kernels/index) is a library for building, distributing, and loading optimized compute kernels on the [Hub](https://huggingface.co/kernels-community). It supports [attention](./attention_backends#set_attention_backend) kernels and custom CUDA kernels for operations like RMSNorm, GEGLU, RoPE, and AdaLN.
+
+The [Diffusers Pipeline Integration](https://github.com/huggingface/kernels/blob/main/skills/cuda-kernels/references/diffusers-integration.md) guide shows how to integrate a kernel with the [add cuda-kernels](https://github.com/huggingface/kernels/blob/main/skills/cuda-kernels/SKILL.md) skill. This skill enables an agent, like Claude or Codex, to write custom kernels tailored to a specific model and your hardware.
+
+> [!TIP]
+> Install the [add cuda-kernels](https://github.com/huggingface/kernels/blob/main/skills/cuda-kernels/SKILL.md) skill to teach an agent how to write a kernel. The [Custom kernels for all from Codex and Claude](https://huggingface.co/blog/custom-cuda-kernels-agent-skills) blog post covers this in more detail.
+
+For example, a custom RMSNorm kernel (generated by the `add cuda-kernels` skill) combined with [torch.compile](#torchcompile) speeds up LTX-Video generation by 1.43x on an H100.
+
+<iframe
+  src="https://huggingface.co/datasets/docs-benchmarks/kernel-ltx-video/embed/viewer/default/train"
+  frameborder="0"
+  width="100%"
+  height="560px"
+></iframe>
+
 ## Dynamic quantization
 
 [Dynamic quantization](https://pytorch.org/tutorials/recipes/recipes/dynamic_quantization.html) improves inference speed by reducing precision to enable faster math operations. This particular type of quantization determines how to scale the activations based on the data at runtime rather than using a fixed scaling factor. As a result, the scaling factor is more accurately aligned with the data.
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
# Discrete Token Diffusion (Experimental)

This folder contains **training and sampling examples** for *discrete diffusion over token IDs* (language-model style), built to follow the `diffusers` + `accelerate` training conventions.

## LLaDA2

[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) generates text through block-wise iterative refinement. Instead of autoregressive token-by-token generation, it starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement steps.

### Train

The training script uses a confidence-aware loss and works with any causal LM from the Hub (e.g. Qwen, Llama, Mistral):

```bash
accelerate launch examples/discrete_diffusion/train_llada2.py \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name wikitext \
  --dataset_config_name wikitext-2-raw-v1 \
  --text_column text \
  --output_dir llada2-output \
  --max_train_steps 1000 \
  --prompt_length 32 \
  --block_length 32 \
  --lambda_conf 2.0 \
  --conf_temperature 0.5
```

If you don't want to download a dataset, you can use random-token data:

```bash
accelerate launch examples/discrete_diffusion/train_llada2.py \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --output_dir llada2-output \
  --use_dummy_data \
  --num_dummy_samples 2048
```

### Sample

```bash
python examples/discrete_diffusion/sample_llada2.py \
  --model_id inclusionAI/LLaDA2.1-mini \
  --prompt "Write a short poem about the ocean." \
  --gen_length 256 \
  --num_inference_steps 32 \
  --threshold 0.7 \
  --editing_threshold 0.5 \
  --max_post_steps 16 \
  --use_chat_template \
  --add_generation_prompt
```
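The `--lambda_conf` and `--conf_temperature` flags suggest a loss that up-weights positions where the model is already confident, so confident mistakes are penalized harder. A toy sketch under that assumption (this is not the code in `train_llada2.py`, and the weighting formula is invented for illustration):

```python
import math

def confidence_weighted_nll(logprobs, targets, lambda_conf=2.0, conf_temperature=0.5):
    # logprobs: per masked position, a dict of token_id -> log probability.
    # targets: the ground-truth token id for each position.
    total = 0.0
    for dist, target in zip(logprobs, targets):
        nll = -dist[target]                  # standard cross-entropy term
        conf = math.exp(max(dist.values()))  # model's top-1 probability
        # Sharpen confidence with a temperature, then scale the penalty.
        weight = 1.0 + lambda_conf * conf ** (1.0 / conf_temperature)
        total += weight * nll
    return total / len(targets)
```

With `lambda_conf=0.0` this reduces to the plain average NLL over masked positions, which is a useful sanity check when tuning the flags.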
