Update on "Fix SLEEF preprocessor macro name to match ATen vec headers"
The ATen NEON vectorized math headers (vec128_float_neon.h) check for
AT_BUILD_ARM_VEC256_WITH_SLEEF to enable SLEEF intrinsics for exp(),
log(), etc. ExecuTorch's get_vec_preprocessor_flags() was defining
ET_BUILD_ARM_VEC256_WITH_SLEEF (wrong prefix), so the USE_SLEEF macro
always took the fallback path, map(std::exp): scalar exp called once
per element, wrapped in full vector load/store overhead.
With this fix, Vectorized<float>::exp() correctly dispatches to
Sleef_expf4_u10 on ARM, which is the intended behavior.
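
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the ATen or ExecuTorch source: every `DEMO_`-prefixed macro, `Vec4`, `map`, `fake_vector_exp`, `exp_path`, and `demo_exp` is a hypothetical name made up for illustration. It assumes the header selects between two code paths through a two-argument macro, and shows that defining a flag with the wrong prefix compiles cleanly but silently selects the scalar fallback.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical flag names, for illustration only.
// The build system defines this flag (wrong prefix)...
#define DEMO_ET_BUILD_WITH_SLEEF 1

// ...but the header tests a differently named flag, so the
// #else branch is chosen and no diagnostic is emitted.
#if defined(DEMO_AT_BUILD_WITH_SLEEF)
#define DEMO_USE_SLEEF(sleef_code, fallback_code) sleef_code
#else
#define DEMO_USE_SLEEF(sleef_code, fallback_code) fallback_code
#endif

using Vec4 = std::array<float, 4>;

// Stand-in for the scalar fallback: applies f to each lane in turn.
template <typename F>
inline Vec4 map(const Vec4& v, F f) {
  Vec4 out{};
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = f(v[i]);
  }
  return out;
}

// Stand-in for a vector intrinsic. A real build would call into SLEEF
// here; this sketch only needs a distinct function to dispatch to.
inline Vec4 fake_vector_exp(const Vec4& v) {
  return map(v, [](float x) { return std::exp(x); });
}

enum class Path { Vectorized, ScalarFallback };

// Reports which branch the preprocessor selected.
inline Path exp_path() {
  return DEMO_USE_SLEEF(Path::Vectorized, Path::ScalarFallback);
}

// Dispatches exp through the macro, as a vectorized header might.
inline Vec4 demo_exp(const Vec4& v) {
  return DEMO_USE_SLEEF(
      fake_vector_exp(v),
      map(v, [](float x) { return std::exp(x); }));
}
```

In this sketch, renaming `DEMO_ET_BUILD_WITH_SLEEF` to `DEMO_AT_BUILD_WITH_SLEEF` flips `exp_path()` to `Path::Vectorized` without any other change. The results are numerically the same on both paths, which is why a mismatch like this tends to show up only as a performance regression.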
Differential Revision: [D96044314](https://our.internmc.facebook.com/intern/diff/D96044314/)
[ghstack-poisoned]
.ci/scripts/test_model_e2e.sh (22 additions & 1 deletion)
@@ -354,7 +354,7 @@ EOF
 fi
 ;;
 qwen3_5_moe)
-RUNNER_ARGS="$RUNNER_ARGS --tokenizer_path ${MODEL_DIR}/$TOKENIZER_FILE --prompt 'What is the capital of France?' --max_new_tokens 128 --temperature 0"
+RUNNER_ARGS="$RUNNER_ARGS --tokenizer_path ${MODEL_DIR}/$TOKENIZER_FILE --prompt 'What is the capital of France?' --max_new_tokens 128 --temperature 0 --cuda_graph"