[wave] NSA: performance benchmarks on MI350 #1261

## Parent
Part of #1243 — DeepSeek NSA kernels for MI350

## Description

Build performance benchmarks for the NSA (Native Sparse Attention) kernels on MI350, comparing against a dense FlashAttention (FA) baseline and against the speedups reported in the NSA paper.

### Benchmark suite

1. **Per-kernel microbenchmarks**
   - Mean pooling: vary N, block_size
   - Compressed attention: vary N/block_size
   - Top-k selection: vary N/block_size, block_count
   - Selection attention forward: vary N, block_count, block_size
   - Selection attention backward: vary N, block_count, block_size
   - Sliding window: vary N, window_size
   - Gated combination: vary M, H, D

2. **End-to-end pipeline benchmarks**
   - Full NSA forward vs dense FA forward
   - Full NSA forward+backward vs dense FA forward+backward
   - Prefill mode: query length M = context length N (long prompt)
   - Decode mode: M=1 (single query token), varying N (context length)

3. **Scaling benchmarks**
   - Context length scaling: N = 1k, 4k, 16k, 64k, 128k, 256k
   - Batch size scaling: B = 1, 2, 4, 8, 16
   - Head count scaling: H = 32, 64, 128
   - block_count sensitivity: T = 4, 8, 16, 32, 64
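
A minimal sketch of how the scaling sweep could be enumerated, varying one axis at a time around a baseline instead of taking the full cross product. `SweepPoint`, `AXES`, `one_axis_sweeps`, and the baseline values (B=1, H=64, N=64k, T=16) are hypothetical names and choices for illustration, not part of wave. It also assumes "1k" means 1024 tokens.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SweepPoint:
    batch: int        # B
    heads: int        # H
    context_len: int  # N
    block_count: int  # T
    mode: str         # "prefill" (M = N) or "decode" (M = 1)

    @property
    def query_len(self) -> int:
        # M follows from the mode, as described in the end-to-end section.
        return self.context_len if self.mode == "prefill" else 1


# Axis values copied from the scaling list; 1k is assumed to be 1024.
AXES = {
    "context_len": [1024 * k for k in (1, 4, 16, 64, 128, 256)],
    "batch": [1, 2, 4, 8, 16],
    "heads": [32, 64, 128],
    "block_count": [4, 8, 16, 32, 64],
}

# Hypothetical baseline; pick whatever configuration the team agrees on.
BASELINE = SweepPoint(batch=1, heads=64, context_len=64 * 1024,
                      block_count=16, mode="prefill")


def one_axis_sweeps(baseline: SweepPoint) -> list[SweepPoint]:
    """Vary each axis independently, keeping the others at the baseline."""
    points = [
        replace(baseline, **{axis: value})
        for axis, values in AXES.items()
        for value in values
    ]
    # dict.fromkeys drops duplicates (the baseline itself) and keeps order.
    return list(dict.fromkeys(points))
```

Running the same sweep for decode only needs `replace(BASELINE, mode="decode")` as the baseline.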

### Metrics
- Wall-clock time (ms)
- Memory bandwidth utilization (% of MI350 peak)
- Compute utilization (TFLOPS achieved vs peak)
- Peak memory usage (MB)
- Speedup vs dense FlashAttention
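
A rough sketch of how the metrics above could be derived from raw timings. All function names are hypothetical. Hardware peaks are passed in as parameters because this issue does not state MI350 peak numbers. The FLOP count is the usual non-causal dense-attention estimate (two matmuls, softmax ignored), and `sync` stands in for whatever device synchronization the real harness uses.

```python
import statistics
import time


def median_time_ms(fn, warmup=3, iters=10, sync=lambda: None):
    """Median wall-clock time of fn in ms. sync is a placeholder for a
    device synchronize call; the default is a no-op."""
    for _ in range(warmup):
        fn()
    sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def dense_attention_flops(B, H, M, N, D):
    # QK^T and PV each cost 2*M*N*D FLOPs per head (non-causal estimate).
    return 4 * B * H * M * N * D


def achieved_tflops(flops, time_ms):
    return flops / (time_ms / 1e3) / 1e12


def achieved_bandwidth_tbps(bytes_moved, time_ms):
    return bytes_moved / (time_ms / 1e3) / 1e12


def utilization_pct(achieved, peak):
    # Works for both compute (TFLOPS) and bandwidth (TB/s).
    return 100.0 * achieved / peak


def speedup(dense_ms, nsa_ms):
    return dense_ms / nsa_ms
```

For NSA itself the FLOP and byte counts depend on block_size, block_count, and window_size, so they would need their own estimates.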

### Target speedups (from paper, 64k context)
| Mode | Target speedup vs dense |
|------|------------------------|
| Decode (M=1) | 11.6x |
| Forward (prefill) | 9.0x |
| Backward | 6.0x |
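
One possible way to report progress against this table, sketched under the assumption that measured speedups arrive as a plain dict keyed by mode. `PAPER_TARGETS_64K` and `compare_to_targets` are hypothetical names.

```python
# Targets copied from the table above (64k context).
PAPER_TARGETS_64K = {"decode": 11.6, "forward": 9.0, "backward": 6.0}


def compare_to_targets(measured, targets=PAPER_TARGETS_64K):
    """Return, per mode, the target, the measured speedup (or None if
    missing), and the measured speedup as a fraction of the target."""
    report = {}
    for mode, target in targets.items():
        got = measured.get(mode)
        report[mode] = {
            "target": target,
            "measured": got,
            "fraction_of_target": None if got is None else got / target,
        }
    return report
```

A mode that has not been benchmarked yet shows up with `measured` set to `None` instead of being silently dropped.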

### Infrastructure
- Use wave's existing benchmark infrastructure
- Output results as JSON for CI tracking
- Generate roofline plots comparing achieved vs theoretical performance
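
A sketch of the roofline bound and a JSON result record, to make the last two bullets concrete. The record schema and the names `roofline_bound_tflops` and `make_record` are assumptions for illustration; the actual format should follow whatever wave's benchmark infrastructure already emits. Peak values are parameters, not MI350 specs.

```python
import json


def roofline_bound_tflops(flops, bytes_moved, peak_tflops, peak_bw_tbps):
    """Classic roofline: min(compute peak, arithmetic intensity * bandwidth).
    FLOP/byte times TB/s gives TFLOP/s, so the units line up."""
    intensity = flops / bytes_moved
    return min(peak_tflops, intensity * peak_bw_tbps)


def make_record(name, params, time_ms, flops, bytes_moved,
                peak_tflops, peak_bw_tbps):
    """Build one JSON-serializable result entry (hypothetical schema)."""
    achieved = flops / (time_ms / 1e3) / 1e12
    bound = roofline_bound_tflops(flops, bytes_moved,
                                  peak_tflops, peak_bw_tbps)
    return {
        "kernel": name,
        "params": params,
        "time_ms": time_ms,
        "achieved_tflops": achieved,
        "roofline_tflops": bound,
        "fraction_of_roofline": achieved / bound,
    }


def dump_records(records):
    # sort_keys keeps diffs stable when results are tracked in CI.
    return json.dumps(records, indent=2, sort_keys=True)
```

Plotting `achieved_tflops` against arithmetic intensity, with `roofline_tflops` as the ceiling, gives the roofline comparison.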

### Depends on
- All kernel tickets complete
- #1251 (inference pipeline)
- #1255 (training integration)
