
Commit 3ac2704

[None][doc] Blog19 for DWDP. (#12725)

Authored by: Wanqian Li (wanqian-nv)
Signed-off-by: Wanqian Li <serli@nvidia.com>
Signed-off-by: Wanqian Li <serli@serli-mlt.client.nvidia.com>
Co-authored-by: Wanqian Li <serli@serli-mlt.client.nvidia.com>

1 parent: 1045f38
11 files changed: +1443 −0 lines

README.md

Lines changed: 3 additions & 0 deletions

@@ -22,6 +22,9 @@ state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.

## Tech Blogs

<!-- Use github markdown link to link for the latest blog since the doc build has not happened yet. When the doc build is updated, it should be updated to the webpage link. -->

* [04/03] DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72

  [➡️ link](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog19_DWDP_Distributed_Weight_Data_Parallelism_for_High_Performance_LLM_Inference_on_NVL72.md)

* [03/16] Optimizing MoE Communication with One-Sided AlltoAll Over NVLink

  [➡️ link](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog18_Optimizing_MoE_Communication_with_One_Sided_AlltoAll_Over_NVLink.md)

Five image files added (previews not rendered): 351 KB, 308 KB, 472 KB, 147 KB, and 24.5 KB.

docs/source/blogs/tech_blog/blog19_DWDP_Distributed_Weight_Data_Parallelism_for_High_Performance_LLM_Inference_on_NVL72.md

Lines changed: 357 additions & 0 deletions
Large diffs are not rendered by default.

examples/dwdp/README.md

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@

# DWDP Reproduction

This directory provides a thin reproduction layer on top of
`examples/disaggregated/slurm/benchmark/submit_dwdp.py`.
It does not modify that launcher. Instead, it combines:

- `env.yaml`: cluster, container, model, and dataset inputs provided by the user
- `dwdp_reproduce.yaml`: the DWDP reproduction matrix
- `reproduce.py`: the script that merges both files, generates full benchmark configs, and forwards them to `submit_dwdp.py`

## Files

- `env.yaml`
  Holds environment-specific inputs such as Slurm settings, container image, mount list, model path, and dataset mapping.
- `dwdp_reproduce.yaml`
  Holds only experiment parameters such as `isl`, `osl`, `ctx_tp`, `gen_tp`, `batch`, `prefetch`, and DWDP settings.
- `generated/`
  Output directory for the generated full configs that are passed to `submit_dwdp.py`.
## How It Works

`reproduce.py` reads `env.yaml` and `dwdp_reproduce.yaml`, generates one full benchmark config per experiment, writes each config to `generated/` by default, and then invokes:

```bash
python examples/disaggregated/slurm/benchmark/submit_dwdp.py -c <generated_config>
```
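The merge step can be sketched roughly as follows. This is a minimal illustration, not the actual implementation of `reproduce.py`: the function name `build_configs` and the `experiment` wrapper key are assumptions; only the `experiment_defaults`, `experiments`, `datasets`, `dataset_key`, and `dataset_file` names come from this README.

```python
# Minimal sketch of the config generation performed by reproduce.py.
# Hypothetical structure: build_configs and the "experiment" wrapper key
# are illustrative; only the YAML section names come from this README.
import copy

def build_configs(env, reproduce):
    """Merge env-level settings into one full config per experiment."""
    defaults = reproduce.get("experiment_defaults", {})
    configs = []
    for exp in reproduce.get("experiments", []):
        cfg = copy.deepcopy(env)        # slurm/hardware/environment inputs
        merged = {**defaults, **exp}    # experiment entry overrides defaults
        # dataset_key resolves through the env-level datasets mapping;
        # dataset_file, if given, is already a full path.
        if "dataset_key" in merged:
            merged["dataset_file"] = env["datasets"][merged.pop("dataset_key")]
        cfg["experiment"] = merged
        configs.append(cfg)
    return configs
```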
## Configure `env.yaml`

Update these sections before running:

- `slurm`
  Set `partition`, `account`, `time`, and any cluster-specific `extra_args`.
- `hardware`
  Set `gpus_per_node` for your cluster.
- `environment`
  Set `container_image`, `container_mount`, `model_path`, and usually `trtllm_repo`.
  Leave `log_dir` unset unless you intentionally want a fixed log location. When `log_dir` is omitted, `submit_dwdp.py` automatically creates a unique per-run log directory.
- `datasets`
  Map short dataset keys to concrete dataset files.

`environment.work_dir` is optional. If omitted, `reproduce.py` automatically points it to `examples/disaggregated/slurm/benchmark`, which is where `submit_dwdp.py` expects to find the benchmark shell scripts.
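For reference, a minimal `env.yaml` could look like the sketch below. The section and key names are the ones described above; every value is a placeholder to replace with your own cluster details.

```yaml
# Hypothetical env.yaml sketch; all values are placeholders.
slurm:
  partition: batch
  account: my_account
  time: "02:00:00"
  extra_args: []
hardware:
  gpus_per_node: 4
environment:
  container_image: /path/to/container.sqsh
  container_mount: /path/to/mounts
  model_path: /models/my-model
  trtllm_repo: /path/to/TensorRT-LLM
  # log_dir intentionally unset: submit_dwdp.py creates a per-run directory
datasets:
  short_isl: /datasets/short_isl.json
```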
## Configure `dwdp_reproduce.yaml`

This file can define both context-only and end-to-end reproduction experiments.

The reproduction matrix is split into:

- `experiment_defaults`
  Common fields shared across many experiments.
- `experiments`
  One entry per benchmark case.

Each experiment may reference datasets in two ways:

- `dataset_key`: resolves through `env.yaml -> datasets`
- `dataset_file`: directly provides the full dataset path

`dataset_key` is the preferred path when several experiments share the same dataset file.
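A matching `dwdp_reproduce.yaml` sketch is shown below, using the field names listed in this README (`isl`, `osl`, `ctx_tp`, `gen_tp`, `batch`, `prefetch`, `dataset_key`, `dataset_file`); the `name` key and all values are hypothetical placeholders.

```yaml
# Hypothetical dwdp_reproduce.yaml sketch; values are placeholders.
experiment_defaults:
  isl: 1024
  osl: 128
  batch: 32
experiments:
  - name: ctx_tp4_gen_tp2
    ctx_tp: 4
    gen_tp: 2
    prefetch: true
    dataset_key: short_isl          # resolved via env.yaml -> datasets
  - name: custom_dataset_case
    ctx_tp: 8
    gen_tp: 4
    dataset_file: /datasets/custom.json  # bypasses dataset_key resolution
```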
## Usage

Install the required Python dependency first:

```bash
python3 -m pip install pyyaml
```

Then run:

```bash
python3 examples/dwdp/reproduce.py \
  --env-config /path/to/env.yaml \
  --reproduce-config /path/to/dwdp_reproduce.yaml \
  --output-dir /path/to/generated
```

Before running, update `dwdp_reproduce.yaml` as needed so it includes the reproduction experiments you want to launch.
## Generated Configs

Generated configs are written by default to `examples/dwdp/generated/`. The filenames include both the experiment name and the generated benchmark identifier, so they can be inspected or reused directly with `submit_dwdp.py`.

> **IMPORTANT:** Leave `environment.log_dir` unset by default. Logs are then written under `examples/disaggregated/slurm/benchmark/logs/`.
