
Commit 8944a35

[AMD] Re-apply: Update search-space for ATOM DSR1 configs (#792)

* Revert "Revert "[AMD] Update search-space for ATOM DSR1 configs (#699)" (#791)"

  This reverts commit 96e54ca.

* Update amd-master.yaml
Parent: d2c2acb

2 files changed, 18 additions and 5 deletions


.github/configs/amd-master.yaml

Lines changed: 7 additions & 5 deletions
@@ -33,17 +33,17 @@ dsr1-fp4-mi355x-atom:
 - isl: 1024
 osl: 1024
 search-space:
-- { tp: 4, ep: 1, conc-start: 32, conc-end: 128 }
+- { tp: 4, ep: 1, conc-start: 32, conc-end: 256 }
 - { tp: 8, ep: 1, conc-start: 4, conc-end: 32 }
 - isl: 1024
 osl: 8192
 search-space:
-- { tp: 4, ep: 1, conc-start: 128, conc-end: 128 }
+- { tp: 4, ep: 1, conc-start: 128, conc-end: 256 }
 - { tp: 8, ep: 1, conc-start: 4, conc-end: 128 }
 - isl: 8192
 osl: 1024
 search-space:
-- { tp: 4, ep: 1, conc-start: 4, conc-end: 128 }
+- { tp: 4, ep: 1, conc-start: 4, conc-end: 256 }
 - { tp: 8, ep: 1, conc-start: 4, conc-end: 4 }
 
 dsr1-fp4-mi355x-atom-mtp:
@@ -64,11 +64,13 @@ dsr1-fp4-mi355x-atom-mtp:
 - isl: 1024
 osl: 8192
 search-space:
-- { tp: 8, conc-start: 256, conc-end: 256, spec-decoding: mtp }
+# - { tp: 4, conc-start: 4, conc-end: 256, spec-decoding: mtp }
+- { tp: 8, conc-start: 4, conc-end: 256, spec-decoding: mtp }
 - isl: 8192
 osl: 1024
 search-space:
-- { tp: 8, conc-start: 4, conc-end: 256, spec-decoding: mtp }
+#- { tp: 4, conc-start: 32, conc-end: 256, spec-decoding: mtp }
+- { tp: 8, conc-start: 4, conc-end: 256, spec-decoding: mtp }
 
 dsr1-fp8-mi300x-sglang:
 image: lmsysorg/sglang:v0.5.8-rocm700-mi30x
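Each `search-space` row pairs a parallelism setting (`tp`, optionally `ep` and `spec-decoding`) with a concurrency range from `conc-start` to `conc-end`. The Python sketch below is a hypothetical illustration of how widening that range grows a sweep. It assumes concurrency doubles at each step from `conc-start` up to and including `conc-end`; this commit does not state the step rule the InferenceX benchmark runner actually uses, and `SearchSpaceEntry` and `concurrency_sweep` are hypothetical names, not part of the repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SearchSpaceEntry:
    """Hypothetical model of one search-space row,
    e.g. { tp: 4, ep: 1, conc-start: 32, conc-end: 256 }."""
    tp: int
    conc_start: int
    conc_end: int
    ep: Optional[int] = None
    spec_decoding: Optional[str] = None


def concurrency_sweep(entry: SearchSpaceEntry) -> List[int]:
    """Return the concurrency points for one row.

    ASSUMPTION: concurrency doubles from conc-start until it would exceed
    conc-end. The real runner may step differently.
    """
    if entry.conc_start <= 0 or entry.conc_end < entry.conc_start:
        raise ValueError("expected 0 < conc-start <= conc-end")
    points = []
    conc = entry.conc_start
    while conc <= entry.conc_end:
        points.append(conc)
        conc *= 2
    return points


# dsr1-fp4-mi355x-atom, isl=1024 / osl=1024, TP=4 row before and after this commit
before = SearchSpaceEntry(tp=4, ep=1, conc_start=32, conc_end=128)
after = SearchSpaceEntry(tp=4, ep=1, conc_start=32, conc_end=256)

# dsr1-fp4-mi355x-atom-mtp, isl=1024 / osl=8192, TP=8 row before and after
mtp_before = SearchSpaceEntry(tp=8, conc_start=256, conc_end=256, spec_decoding="mtp")
mtp_after = SearchSpaceEntry(tp=8, conc_start=4, conc_end=256, spec_decoding="mtp")

print(concurrency_sweep(before))      # [32, 64, 128]
print(concurrency_sweep(after))       # [32, 64, 128, 256]
print(concurrency_sweep(mtp_before))  # [256]
print(concurrency_sweep(mtp_after))   # [4, 8, 16, 32, 64, 128, 256]
```

Under the doubling assumption, raising `conc-end` from 128 to 256 adds one point per TP=4 row, while lowering the MTP 1k8k `conc-start` from 256 to 4 turns a single-point run into a seven-point sweep.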

perf-changelog.yaml

Lines changed: 11 additions & 0 deletions
@@ -723,3 +723,14 @@
 - "Gains: CUTLASS MoE optimizations (~8% throughput), FP4 kernel improvements (~4% E2E on B200), torch.compile cold-start fix"
 - "v0.15.1 includes fix for prefix cache hit rate of 0% on GPT-OSS hybrid attention models"
 pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/789
+
+- config-keys:
+- dsr1-fp4-mi355x-atom
+- dsr1-fp4-mi355x-atom-mtp
+description:
+- "Update search-space configurations for DSR1 FP4 MI355X ATOM and ATOM-MTP"
+- "Comment out TP=4 configs, consolidate to TP=8 only"
+- "Extend concurrency range to conc-end: 256 across all sequence lengths (1k1k, 1k8k, 8k1k)"
+- "Fix MTP 1k8k conc-start from 256 to 4 to enable full concurrency sweep"
+pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/699
+