
Commit 3ac2704

[None][doc] Blog19 for DWDP. (#12725)

Authored by: Wanqian Li (wanqian-nv)
Signed-off-by: Wanqian Li <serli@nvidia.com>
Signed-off-by: Wanqian Li <serli@serli-mlt.client.nvidia.com>
Co-authored-by: Wanqian Li <serli@serli-mlt.client.nvidia.com>

1 parent: 1045f38
11 files changed: +1443 −0 lines

README.md

Lines changed: 3 additions & 0 deletions

@@ -22,6 +22,9 @@ state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.

## Tech Blogs

<!-- Use github markdown link to link for the latest blog since the doc build has not happened yet. When the doc build is updated, it should be updated to the webpage link. -->

* [04/03] DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72

  [➡️ link](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog19_DWDP_Distributed_Weight_Data_Parallelism_for_High_Performance_LLM_Inference_on_NVL72.md)

* [03/16] Optimizing MoE Communication with One-Sided AlltoAll Over NVLink

  [➡️ link](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog18_Optimizing_MoE_Communication_with_One_Sided_AlltoAll_Over_NVLink.md)

Five image files added (previews not rendered): 351 KB, 308 KB, 472 KB, 147 KB, and 24.5 KB.

docs/source/blogs/tech_blog/blog19_DWDP_Distributed_Weight_Data_Parallelism_for_High_Performance_LLM_Inference_on_NVL72.md

Lines changed: 357 additions & 0 deletions
Large diffs are not rendered by default.

examples/dwdp/README.md

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@

# DWDP Reproduction

This directory provides a thin reproduction layer on top of
`examples/disaggregated/slurm/benchmark/submit_dwdp.py`.
It does not modify that launcher. Instead, it combines:

- `env.yaml`: cluster, container, model, and dataset inputs provided by the user
- `dwdp_reproduce.yaml`: the DWDP reproduction matrix
- `reproduce.py`: the script that merges both files, generates full benchmark configs, and forwards them to `submit_dwdp.py`

## Files

- `env.yaml`
  Holds environment-specific inputs such as Slurm settings, container image, mount list, model path, and dataset mapping.
- `dwdp_reproduce.yaml`
  Holds only experiment parameters such as `isl`, `osl`, `ctx_tp`, `gen_tp`, `batch`, `prefetch`, and DWDP settings.
- `generated/`
  Output directory for the generated full configs that are passed to `submit_dwdp.py`.
## How It Works

`reproduce.py` reads `env.yaml` and `dwdp_reproduce.yaml`, generates one full benchmark config per experiment, writes each config to `generated/` by default, and then invokes:

```bash
python examples/disaggregated/slurm/benchmark/submit_dwdp.py -c <generated_config>
```
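The merge step can be sketched roughly as follows. This is a minimal illustration, not the actual implementation of `reproduce.py`: the function name `build_configs` and the `experiment` wrapper key are assumptions; only the `experiment_defaults`, `experiments`, `datasets`, `dataset_key`, and `dataset_file` names come from this README.

```python
# Minimal sketch of the config generation performed by reproduce.py.
# Hypothetical structure: build_configs and the "experiment" wrapper key
# are illustrative; only the YAML section names come from this README.
import copy

def build_configs(env, reproduce):
    """Merge env-level settings into one full config per experiment."""
    defaults = reproduce.get("experiment_defaults", {})
    configs = []
    for exp in reproduce.get("experiments", []):
        cfg = copy.deepcopy(env)        # slurm/hardware/environment inputs
        merged = {**defaults, **exp}    # experiment entry overrides defaults
        # dataset_key resolves through the env-level datasets mapping;
        # dataset_file, if given, is already a full path.
        if "dataset_key" in merged:
            merged["dataset_file"] = env["datasets"][merged.pop("dataset_key")]
        cfg["experiment"] = merged
        configs.append(cfg)
    return configs
```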
## Configure `env.yaml`

Update these sections before running:

- `slurm`
  Set `partition`, `account`, `time`, and any cluster-specific `extra_args`.
- `hardware`
  Set `gpus_per_node` for your cluster.
- `environment`
  Set `container_image`, `container_mount`, `model_path`, and usually `trtllm_repo`.
  Leave `log_dir` unset unless you intentionally want a fixed log location. When `log_dir` is omitted, `submit_dwdp.py` automatically creates a unique per-run log directory.
- `datasets`
  Map short dataset keys to concrete dataset files.

`environment.work_dir` is optional. If omitted, `reproduce.py` automatically points it to `examples/disaggregated/slurm/benchmark`, which is where `submit_dwdp.py` expects to find the benchmark shell scripts.
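For reference, a minimal `env.yaml` could look like the sketch below. The section and key names are the ones described above; every value is a placeholder to replace with your own cluster details.

```yaml
# Hypothetical env.yaml sketch; all values are placeholders.
slurm:
  partition: batch
  account: my_account
  time: "02:00:00"
  extra_args: []
hardware:
  gpus_per_node: 4
environment:
  container_image: /path/to/container.sqsh
  container_mount: /path/to/mounts
  model_path: /models/my-model
  trtllm_repo: /path/to/TensorRT-LLM
  # log_dir intentionally unset: submit_dwdp.py creates a per-run directory
datasets:
  short_isl: /datasets/short_isl.json
```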
## Configure `dwdp_reproduce.yaml`

This file can define both context-only and end-to-end reproduction experiments.

The reproduction matrix is split into:

- `experiment_defaults`
  Common fields shared across many experiments.
- `experiments`
  One entry per benchmark case.

Each experiment may reference datasets in two ways:

- `dataset_key`: resolves through `env.yaml -> datasets`
- `dataset_file`: directly provides the full dataset path

`dataset_key` is the preferred path when several experiments share the same dataset file.
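A matching `dwdp_reproduce.yaml` sketch is shown below, using the field names listed in this README (`isl`, `osl`, `ctx_tp`, `gen_tp`, `batch`, `prefetch`, `dataset_key`, `dataset_file`); the `name` key and all values are hypothetical placeholders.

```yaml
# Hypothetical dwdp_reproduce.yaml sketch; values are placeholders.
experiment_defaults:
  isl: 1024
  osl: 128
  batch: 32
experiments:
  - name: ctx_tp4_gen_tp2
    ctx_tp: 4
    gen_tp: 2
    prefetch: true
    dataset_key: short_isl          # resolved via env.yaml -> datasets
  - name: custom_dataset_case
    ctx_tp: 8
    gen_tp: 4
    dataset_file: /datasets/custom.json  # bypasses dataset_key resolution
```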
## Usage

Install the required Python dependency first:

```bash
python3 -m pip install pyyaml
```

Then run:

```bash
python3 examples/dwdp/reproduce.py \
  --env-config /path/to/env.yaml \
  --reproduce-config /path/to/dwdp_reproduce.yaml \
  --output-dir /path/to/generated
```

Before running, update `dwdp_reproduce.yaml` as needed so it includes the reproduction experiments you want to launch.
## Generated Configs

Generated configs are written by default to `examples/dwdp/generated/`. The filenames include both the experiment name and the generated benchmark identifier, so they can be inspected or reused directly with `submit_dwdp.py`.

> **IMPORTANT:** Leave `environment.log_dir` unset by default. Logs are then written under `examples/disaggregated/slurm/benchmark/logs/`.
