Commit 86d4e15
[PyT] Reduce test sizes in fused attn fp8 vs fp16 to avoid OOM (#3020)
* tests/attention: shrink fp8_vs_f16 configs from B=2 to B=1
The 9 fp8_9..fp8_17 configs in `model_configs_fp8_vs_f16` use shapes
(B=2, S=4096-8192, H=32-128, D=64-192) for the bf16-vs-fp8 reference
comparison. The reference path in `test_dpa_fp8_vs_f16` materializes the
full (B, H, S, S) attention matrix in bf16 and keeps several tensors of
that shape live simultaneously (S, P, dP, dS, dropout mask). At B=2,
S=8192, H=64 the per-test peak allocation is ~70 GiB; with allocator
overhead on top, that exceeds the memory of common 80 GB cards (H100)
and pushes the suite into OOM territory on Blackwell (~91 GB measured
with the cuDNN caching allocator residue).
Halving B to 1 halves the bytes of every (B, H, S, S) tensor. Measured
on B200 (SM_100, cuDNN 9.23, TE main):
  per-test peak `torch.cuda.max_memory_allocated`:
    before: 70.0 GiB (fp8_14)
    after : 36.1 GiB (fp8_14)  -48%
  per-test peak `nvidia-smi memory.used`:
    before: 96.8 GiB
    after : 51.3 GiB  -47%
  test outcome (B200, develop FE, this TE):
    identical 618 failed / 2196 passed / 891 skipped, wall time within ~3%
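As a rough sanity check on the figures above, the size of a single dense
(B, H, S, S) bf16 tensor can be estimated by hand. The sketch below is
illustrative only: the helper name `attn_matrix_gib` is hypothetical, and
it assumes 2 bytes per element (bf16) with no padding or allocator
overhead. The dropout mask may use a different dtype in the actual test.

```python
# Back-of-the-envelope size of one dense (B, H, S, S) tensor.
# Assumption: bf16 storage (2 bytes per element), no allocator overhead.
BYTES_PER_BF16 = 2
GIB = 1024**3


def attn_matrix_gib(batch, heads, seqlen, bytes_per_elem=BYTES_PER_BF16):
    """Return the size in GiB of one (batch, heads, seqlen, seqlen) tensor."""
    return batch * heads * seqlen * seqlen * bytes_per_elem / GIB


before = attn_matrix_gib(batch=2, heads=64, seqlen=8192)
after = attn_matrix_gib(batch=1, heads=64, seqlen=8192)

print(f"B=2: {before:.1f} GiB per tensor")
print(f"B=1: {after:.1f} GiB per tensor")
```

Under these assumptions each such tensor is 16 GiB at B=2 and 8 GiB at
B=1, so a handful of them live at once is the right order of magnitude
for the ~70 GiB and ~36 GiB peaks measured above.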
The shrunk configs still exercise every distinct shape/mask/SWA/GQA
combination that the originals did -- only B is smaller. The suite now
fits comfortably on 80 GB cards.
fp8_19/20 (B=2, S=2048) are left at B=2 because their peak is small
(a few GiB) and the larger batch is useful coverage for padding_causal.
Signed-off-by: Vedaanta Agarwalla <vagarwalla@nvidia.com>
* address review changes recommended by Kshitij
Signed-off-by: Vedaanta Agarwalla <142048820+vedaanta@users.noreply.github.com>
* tests/attention: black format fp8_13 ModelConfig
Line was 105 chars; black requires <=100 with the project's preview+
string_processing settings.
Signed-off-by: Vedaanta Agarwalla <vagarwalla@nvidia.com>
---------
Signed-off-by: Vedaanta Agarwalla <vagarwalla@nvidia.com>
Signed-off-by: Vedaanta Agarwalla <142048820+vedaanta@users.noreply.github.com>
1 parent 815bf36
1 file changed
Lines changed: 12 additions & 10 deletions