[wave] NSA: performance benchmarks on MI350 #1261

@harsh-nod

Parent

Part of #1243 — DeepSeek NSA kernels for MI350

Description

Build performance benchmarks for NSA kernels on MI350, measuring against dense FlashAttention and the paper's reported speedups.

Benchmark suite

  1. Per-kernel microbenchmarks

    • Mean pooling: vary N, block_size
    • Compressed attention: vary N/block_size
    • Top-k selection: vary N/block_size, block_count
    • Selection attention forward: vary N, block_count, block_size
    • Selection attention backward: vary N, block_count, block_size
    • Sliding window: vary N, window_size
    • Gated combination: vary M, H, D
  2. End-to-end pipeline benchmarks

    • Full NSA forward vs dense FA forward
    • Full NSA forward+backward vs dense FA forward+backward
    • Prefill mode: M=N (long prompt)
    • Decode mode: M=1, varying N (context length)
  3. Scaling benchmarks

    • Context length scaling: N = 1k, 4k, 16k, 64k, 128k, 256k
    • Batch size scaling: B = 1, 2, 4, 8, 16
    • Head count scaling: H = 32, 64, 128
    • block_count sensitivity: T = 4, 8, 16, 32, 64
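A minimal sketch of how the scaling axes above could be enumerated into benchmark configs. The names (`scaling_configs`, the dict keys) are hypothetical and not part of wave; "1k" is assumed to mean 1024, and the full cross product is only one option (varying one axis at a time from a fixed baseline would be cheaper).

```python
from itertools import product

# Axis values copied from the scaling list; "1k" is assumed to mean 1024.
CONTEXT_LENGTHS = [1024, 4096, 16384, 65536, 131072, 262144]
BATCH_SIZES = [1, 2, 4, 8, 16]
HEAD_COUNTS = [32, 64, 128]
BLOCK_COUNTS = [4, 8, 16, 32, 64]


def scaling_configs(mode="prefill"):
    """Yield one config dict per point of the full scaling grid.

    mode="prefill" sets M = N (long prompt); mode="decode" sets M = 1.
    """
    if mode not in ("prefill", "decode"):
        raise ValueError(f"unknown mode: {mode}")
    for n, b, h, t in product(CONTEXT_LENGTHS, BATCH_SIZES, HEAD_COUNTS, BLOCK_COUNTS):
        m = n if mode == "prefill" else 1
        yield {"mode": mode, "B": b, "H": h, "M": m, "N": n, "block_count": t}


decode_configs = list(scaling_configs("decode"))
prefill_configs = list(scaling_configs("prefill"))
print(len(decode_configs), "decode configs,", len(prefill_configs), "prefill configs")
```

Each dict can then be handed to whatever runner the benchmark harness provides.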

Metrics

  • Wall-clock time (ms)
  • Memory bandwidth utilization (% of MI350 peak)
  • Compute utilization (TFLOPS achieved vs peak)
  • Peak memory usage (MB)
  • Speedup vs dense FlashAttention
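For reference, a rough sketch of how the derived metrics could be computed from a measured wall-clock time. The function name and arguments are hypothetical; `bytes_moved` and `flops` would come from an analytical model of each kernel, and the peak numbers are passed in rather than hard-coded because this issue does not state the MI350 peaks. Peak memory usage is not covered here since it has to be read from the runtime.

```python
def derived_metrics(time_ms, bytes_moved, flops, peak_gbps, peak_tflops, dense_time_ms):
    """Turn one timing into utilization and speedup numbers.

    time_ms       -- measured wall-clock time of the NSA kernel or pipeline
    bytes_moved   -- modeled bytes read + written (assumption: analytical count)
    flops         -- modeled floating-point operations (assumption: analytical count)
    peak_gbps     -- device peak memory bandwidth in GB/s (caller-supplied)
    peak_tflops   -- device peak compute in TFLOP/s (caller-supplied)
    dense_time_ms -- wall-clock time of the dense FlashAttention baseline
    """
    seconds = time_ms / 1e3
    achieved_gbps = bytes_moved / seconds / 1e9
    achieved_tflops = flops / seconds / 1e12
    return {
        "time_ms": time_ms,
        "achieved_gbps": achieved_gbps,
        "bandwidth_util_pct": 100.0 * achieved_gbps / peak_gbps,
        "achieved_tflops": achieved_tflops,
        "compute_util_pct": 100.0 * achieved_tflops / peak_tflops,
        "speedup_vs_dense": dense_time_ms / time_ms,
    }


# Placeholder inputs for illustration only; these are not MI350 measurements or specs.
example = derived_metrics(
    time_ms=2.0, bytes_moved=4e9, flops=1e12,
    peak_gbps=4000.0, peak_tflops=1000.0, dense_time_ms=10.0,
)
print(example)
```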

Target speedups (from paper, 64k context)

Mode | Target speedup vs dense
Decode (M=1) | 11.6x
Forward (prefill) | 9.0x
Backward | 6.0x
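One possible way to report measured speedups against these targets, as a sketch only. `PAPER_TARGETS` mirrors the table above; `compare_to_targets` and the mode keys are hypothetical names.

```python
# Target speedups vs dense at 64k context, copied from the table above.
PAPER_TARGETS = {"decode": 11.6, "forward": 9.0, "backward": 6.0}


def compare_to_targets(measured):
    """Return measured speedup, target, and fraction of target for each mode present."""
    report = {}
    for mode, target in PAPER_TARGETS.items():
        if mode in measured:
            report[mode] = {
                "measured": measured[mode],
                "target": target,
                "fraction_of_target": measured[mode] / target,
            }
    return report


# The measured values below are placeholders, not results.
report = compare_to_targets({"decode": 5.8, "forward": 9.0})
for mode, row in report.items():
    print(mode, row)
```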

Infrastructure

  • Use wave's existing benchmark infrastructure
  • Output results as JSON for CI tracking
  • Generate roofline plots comparing achieved vs theoretical performance
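A sketch of what a JSON result record with a roofline bound might look like. This is independent of wave's benchmark harness (whose API is not described in this issue); the record schema, function names, and peak values are all assumptions. The roofline bound is the standard min(peak compute, arithmetic intensity × peak bandwidth).

```python
import json


def roofline_bound_tflops(flops, bytes_moved, peak_tflops, peak_gbps):
    """Attainable TFLOP/s under the roofline model for a given kernel."""
    intensity = flops / bytes_moved              # FLOP per byte
    bandwidth_bound = intensity * peak_gbps / 1e3  # GFLOP/s converted to TFLOP/s
    return min(peak_tflops, bandwidth_bound)


def to_json_record(name, config, time_ms, flops, bytes_moved, peak_tflops, peak_gbps):
    """Serialize one benchmark result as a JSON string (hypothetical schema)."""
    achieved = flops / (time_ms / 1e3) / 1e12
    bound = roofline_bound_tflops(flops, bytes_moved, peak_tflops, peak_gbps)
    record = {
        "name": name,
        "config": config,
        "time_ms": time_ms,
        "achieved_tflops": achieved,
        "roofline_tflops": bound,
        "fraction_of_roofline": achieved / bound,
    }
    return json.dumps(record, sort_keys=True)


# Placeholder numbers for illustration; not MI350 specs or measurements.
line = to_json_record(
    "selection_attention_fwd", {"N": 65536, "block_count": 16},
    time_ms=5.0, flops=1e12, bytes_moved=1e10,
    peak_tflops=1000.0, peak_gbps=4000.0,
)
print(line)
```

Writing one such record per line keeps the output easy to diff and to ingest from CI.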

Depends on
