Update on "Add GEMM-based standard SDPA benchmark"
Add bench_sdpa.cpp with a standalone GEMM-based SDPA implementation
(run_standard_sdpa) alongside ExecuTorch's tiled flash attention
(custom_sdpa_out) for comparative benchmarking.
The standalone SDPA uses full GEMM per head with 3-pass softmax and
supports both [B,S,H,D] and [B,H,S,D] layouts via BLAS leading
dimension parameters, allowing isolation of algorithm vs layout effects.
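
As a rough illustration of the two ideas in the paragraph above, the sketch below computes attention for a single head with a 3-pass softmax and takes the row stride (the BLAS "leading dimension") as a parameter. It is not the code in bench_sdpa.cpp: `softmax_3pass` and `sdpa_one_head` are hypothetical names, plain loops stand in for the GEMM calls, and masking and batching are left out. The assumption is that one head's rows are `D` floats apart in [B,H,S,D] and `H*D` floats apart in [B,S,H,D].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Three-pass softmax over one row: (1) find the max, (2) exponentiate and
// accumulate the sum, (3) normalize. Subtracting the max keeps exp() stable.
void softmax_3pass(float* row, std::size_t n) {
  assert(n > 0);
  float max_val = row[0];
  for (std::size_t i = 1; i < n; ++i) max_val = std::max(max_val, row[i]);
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    row[i] = std::exp(row[i] - max_val);
    sum += row[i];
  }
  const float inv_sum = 1.0f / sum;
  for (std::size_t i = 0; i < n; ++i) row[i] *= inv_sum;
}

// Attention for one head (hypothetical helper, no mask).
// q, k, v, out point at the first row of this head; ld is the distance in
// floats between consecutive sequence positions (the leading dimension).
void sdpa_one_head(const float* q, const float* k, const float* v, float* out,
                   std::size_t S, std::size_t D, std::size_t ld) {
  assert(ld >= D);
  const float scale = 1.0f / std::sqrt(static_cast<float>(D));
  std::vector<float> scores(S * S);

  // scores = scale * Q * K^T (a GEMM in a BLAS-backed version).
  for (std::size_t i = 0; i < S; ++i) {
    for (std::size_t j = 0; j < S; ++j) {
      float dot = 0.0f;
      for (std::size_t d = 0; d < D; ++d) dot += q[i * ld + d] * k[j * ld + d];
      scores[i * S + j] = dot * scale;
    }
  }

  // Row-wise softmax over the full S x S score matrix.
  for (std::size_t i = 0; i < S; ++i) softmax_3pass(&scores[i * S], S);

  // out = scores * V (the second GEMM).
  for (std::size_t i = 0; i < S; ++i) {
    for (std::size_t d = 0; d < D; ++d) {
      float acc = 0.0f;
      for (std::size_t j = 0; j < S; ++j) acc += scores[i * S + j] * v[j * ld + d];
      out[i * ld + d] = acc;
    }
  }
}
```

Under these assumptions, head `h` of a [B,H,S,D] tensor would be passed as `base + h*S*D` with `ld = D`, and head `h` of a [B,S,H,D] tensor as `base + h*D` with `ld = H*D`. The arithmetic is identical in both cases, which is what lets a benchmark separate layout effects from algorithm effects.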
Includes validation tests that verify the GEMM-based implementation
matches custom_sdpa_out within tolerance.
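
A "within tolerance" check of this kind is commonly written as a combined absolute and relative bound. The helper below is a minimal sketch of that idea only: `all_close` is a hypothetical name, and the tolerance values the actual tests use are not stated here, so they are left as parameters.

```cpp
#include <cmath>
#include <cstddef>

// Returns true when every element satisfies
//   |actual - expected| <= atol + rtol * |expected|.
// The negated comparison makes any NaN fail the check.
bool all_close(const float* actual, const float* expected, std::size_t n,
               float atol, float rtol) {
  for (std::size_t i = 0; i < n; ++i) {
    const float diff = std::fabs(actual[i] - expected[i]);
    if (!(diff <= atol + rtol * std::fabs(expected[i]))) return false;
  }
  return true;
}
```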
Differential Revision: [D96044313](https://our.internmc.facebook.com/intern/diff/D96044313/)
[ghstack-poisoned]
.ci/scripts/test_model_e2e.sh (22 additions, 1 deletion)
@@ -354,7 +354,7 @@ EOF
 fi
 ;;
 qwen3_5_moe)
-RUNNER_ARGS="$RUNNER_ARGS --tokenizer_path ${MODEL_DIR}/$TOKENIZER_FILE --prompt 'What is the capital of France?' --max_new_tokens 128 --temperature 0"
+RUNNER_ARGS="$RUNNER_ARGS --tokenizer_path ${MODEL_DIR}/$TOKENIZER_FILE --prompt 'What is the capital of France?' --max_new_tokens 128 --temperature 0 --cuda_graph"