Performance Regression Detected
Commit: 6619cc7d
Run: https://github.com/ROCm/ATOM/actions/runs/26050231461
Date: 2026-05-19T05:00:10.609891+00:00
Regressed Configurations
| Model | ISL/OSL | Conc | Tput (cur) | Tput (base) | Tput Δ% | TPOT (cur) | TPOT (base) | TPOT Δ% |
|-------|---------|------|------------|-------------|---------|------------|-------------|---------|
| DeepSeek-R1-0528 | 1024/1024 | 8 | 580.9 | 679.2 | -14.5% | 13.33 | 11.41 | 16.9% |
| DeepSeek-R1-0528 | 8192/1024 | 4 | 312.7 | 330.3 | -5.3% | 12.14 | 11.49 | 5.7% |
| DeepSeek-R1-0528 | 8192/1024 | 8 | 569.9 | 587.1 | -2.9% | 13.36 | 13.00 | 2.7% |
| DeepSeek-R1-0528 MTP3 | 1024/1024 | 4 | 405.1 | 567.5 | -28.6% | 9.35 | 6.70 | 39.5% |
| DeepSeek-R1-0528 MTP3 | 1024/1024 | 16 | 1081.2 | 1440.9 | -25.0% | 14.36 | 10.48 | 37.1% |
| DeepSeek-R1-0528 MTP3 | 1024/1024 | 32 | 2091.8 | 2465.0 | -15.1% | 14.08 | 12.36 | 13.9% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 4 | 479.6 | 485.6 | -1.2% | 7.50 | 7.53 | -0.5% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 16 | 995.2 | 1117.1 | -10.9% | 11.82 | 13.15 | -10.2% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 32 | 1487.7 | 1575.8 | -5.6% | 19.44 | 16.94 | 14.7% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 64 | 2025.8 | 2155.2 | -6.0% | 27.93 | 27.34 | 2.2% |
| DeepSeek-R1-0528-MXFP4 | 1024/1024 | 16 | 1127.5 | 1137.3 | -0.9% | 13.58 | 13.55 | 0.2% |
| DeepSeek-R1-0528-MXFP4 | 1024/1024 | 32 | 1716.0 | 1758.4 | -2.4% | 17.97 | 17.55 | 2.4% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 1024/1024 | 4 | 617.1 | 633.6 | -2.6% | 5.88 | 6.07 | -3.2% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 1024/1024 | 16 | 1543.0 | 1571.3 | -1.8% | 9.76 | 9.70 | 0.6% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 1024/1024 | 32 | 2328.8 | 2026.0 | 14.9% | 12.95 | 15.15 | -14.6% |
| DeepSeek-V4-Pro | 1024/1024 | 4 | 170.1 | 175.3 | -3.0% | 21.70 | 21.86 | -0.7% |
| DeepSeek-V4-Pro | 1024/1024 | 8 | 336.0 | 331.3 | 1.4% | 22.84 | 23.36 | -2.2% |
| DeepSeek-V4-Pro | 1024/1024 | 64 | 1545.3 | 1535.7 | 0.6% | 39.55 | 40.00 | -1.1% |
| DeepSeek-V4-Pro | 1024/1024 | 128 | 2441.5 | 2393.8 | 2.0% | 50.31 | 51.44 | -2.2% |
| DeepSeek-V4-Pro | 8192/1024 | 64 | 1005.9 | 995.8 | 1.0% | 59.83 | 60.88 | -1.7% |
| DeepSeek-V4-Pro | 8192/1024 | 128 | 1297.2 | 1284.1 | 1.0% | 91.74 | 93.24 | -1.6% |
| GLM-5-FP8 | 8192/1024 | 32 | 754.5 | 756.9 | -0.3% | 40.01 | 40.01 | 0.0% |
| Kimi-K2.5-MXFP4 | 1024/1024 | 16 | 1091.1 | 1103.0 | -1.1% | 14.16 | 14.09 | 0.5% |
| Kimi-K2.5-MXFP4 | 1024/1024 | 256 | 4959.6 | 4960.6 | -0.0% | 49.54 | 49.82 | -0.6% |
| MiniMax-M2.7-MXFP4 | 1024/1024 | 4 | 380.1 | 389.1 | -2.3% | 10.12 | 9.89 | 2.3% |
| MiniMax-M2.7-MXFP4 | 1024/1024 | 16 | 1050.0 | 1039.5 | 1.0% | 14.79 | 14.96 | -1.2% |
| MiniMax-M2.7-MXFP4 | 1024/1024 | 32 | 1642.0 | 1620.3 | 1.3% | 18.82 | 19.18 | -1.9% |
| Qwen3.5-397B-A17B-FP8 | 1024/1024 | 64 | 2818.3 | 2857.6 | -1.4% | 21.92 | 21.64 | 1.3% |
| Qwen3.5-397B-A17B-FP8 | 8192/1024 | 4 | 409.9 | 423.2 | -3.1% | 9.23 | 8.96 | 3.0% |
| Qwen3.5-397B-A17B-FP8 | 8192/1024 | 8 | 708.6 | 731.4 | -3.1% | 10.64 | 10.40 | 2.4% |
| Qwen3.5-397B-A17B-MXFP4 | 8192/1024 | 64 | 2326.7 | 2355.1 | -1.2% | 25.81 | 25.72 | 0.3% |
| gpt-oss-120b | 1024/1024 | 16 | 2641.3 | 2621.1 | 0.8% | 5.71 | 5.84 | -2.3% |
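The Δ% values in the table are consistent with a relative change of the current value against the baseline, `(cur - base) / base * 100`. The following is a minimal Python sketch of that arithmetic, not the CI job's actual code. The `is_regression` helper and its `threshold_pct` default are hypothetical, since this report does not state the criteria used to flag a configuration.

```python
def pct_change(cur: float, base: float) -> float:
    """Relative change of cur against base, in percent."""
    return (cur - base) / base * 100.0


def is_regression(tput_cur, tput_base, tpot_cur, tpot_base, threshold_pct=5.0):
    """Hypothetical rule: flag a config when throughput drops, or TPOT rises,
    by more than threshold_pct. The real CI threshold is not given in this report."""
    tput_delta = pct_change(tput_cur, tput_base)
    tpot_delta = pct_change(tpot_cur, tpot_base)
    return tput_delta < -threshold_pct or tpot_delta > threshold_pct


# DeepSeek-R1-0528, 1024/1024, concurrency 8 (values copied from the table)
tput_delta = round(pct_change(580.9, 679.2), 1)
print(tput_delta)
```

Recomputing from the rounded table values can differ from the reported Δ% by about 0.1, presumably because the report computes deltas from unrounded measurements.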
Performance Summary
# Trace Performance Summary
**File:** `DeepSeek-R1-0528_ts_20260519_054632_204.pt.trace.json.gz`
## Prefill
| # | Label | Duration |
|---|-------|----------|
| 0 | `prefill[bs=2 tok=901 ctx=[991, 886]]` | 87.24 ms |
| 1 | `prefill[bs=6 tok=5577 ctx=[1014, 866, 922]...+3]` | 87.64 ms |
| 2 | `prefill[bs=1 tok=840 ctx=840]` | 87.61 ms |
| 3 | `prefill[bs=1 tok=855 ctx=855]` | 83.61 ms |
| 4 | `prefill[bs=1 tok=906 ctx=906]` | 87.18 ms |
| 5 | `prefill[bs=1 tok=889 ctx=889]` | 82.08 ms |
| 6 | `prefill[bs=2 tok=1866 ctx=[907, 959]]` | 85.97 ms |
| 7 | `prefill[bs=1 tok=877 ctx=877]` | 86.78 ms |
| 8 | `prefill[bs=1 tok=1012 ctx=1012]` | 80.91 ms |
**Total prefill:** 769.02 ms
## Decode
- **Iterations:** 2006
- **Mean:** 898.6 us
- **Min:** 723.8 us
- **Max:** 7.57 ms
- **Total:** 1802.49 ms
Profiler Traces
Download from workflow artifacts.
Open in Perfetto UI or Chrome (`chrome://tracing`) for analysis.
Next Steps
- Download the `profiler-analysis-26050231461` artifact
- Open trace files in Perfetto UI
- Compare kernel durations against previous traces
- Identify bottleneck changes
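The kernel comparison step above could be scripted along these lines. This is a sketch under assumptions: it expects both traces to be gzip-compressed Chrome Trace Event JSON (the format Perfetto and `chrome://tracing` open), and it assumes GPU kernels are recorded as complete events (`"ph": "X"`) with `"cat": "kernel"` and a `dur` in microseconds. Check one event in the actual trace before relying on these field values. The function names are illustrative, not part of any ATOM tooling.

```python
import gzip
import json
from collections import defaultdict


def load_events(path):
    """Read a gzip-compressed Chrome trace JSON file and return its event list."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    # A trace is either an object with a "traceEvents" key or a bare event list.
    return data["traceEvents"] if isinstance(data, dict) else data


def kernel_totals(events):
    """Sum durations (us) of complete events in the assumed 'kernel' category, by name."""
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X" and ev.get("cat") == "kernel":
            totals[ev["name"]] += ev.get("dur", 0)
    return dict(totals)


def compare_kernels(cur_events, base_events, top=10):
    """Return (name, cur_us, base_us, delta_us) rows, largest slowdown first."""
    cur, base = kernel_totals(cur_events), kernel_totals(base_events)
    rows = []
    for name in set(cur) | set(base):
        c, b = cur.get(name, 0.0), base.get(name, 0.0)
        rows.append((name, c, b, c - b))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows[:top]
```

A typical use would be `compare_kernels(load_events(cur_path), load_events(base_path))`, where both paths point to `.pt.trace.json.gz` files taken from the same model and workload so the totals are comparable.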