Off by default. Opt in to permit LLVM optimizations on f64 arithmetic that produce observably different results from Node's V8 in exchange for faster code on a narrow class of numeric workloads.
| Mode | Bit-exact with Node | Speed |
|---|---|---|
| Default | Yes (~94% of random FP programs match Node bit-for-bit; the residual ~6% comes from the LLVM SLP vectorizer at -O3, not from fast-math) | Same as Node within noise on realistic FP code |
| `--fast-math` | No (~70%; ~30% of random FP programs diverge by 1 ULP) | ~7× faster on tight `sum += constant` loops; ~0% difference on dot products, array reductions, or any data-dependent FP-heavy code (M-series ARM64 numbers; x86_64 may differ) |
If your program does scientific computing, signal processing, or any
hand-tuned numeric kernel that benefits from autovectorization or FMA
fusion, --fast-math may help. For everything else (UI, business logic,
crypto, networking, framework code), it changes nothing observable
except correctness — leave it off.
CLI flag wins over env var, env var wins over package.json:

1. Per-build CLI flag:

```sh
perry --fast-math myapp.ts
```

2. Per-shell environment:

```sh
PERRY_FAST_MATH=1 perry myapp.ts
```

3. Per-project `package.json` (most common):

```json
{
  "perry": {
    "fastMath": true
  }
}
```

Exactly two LLVM per-instruction fast-math flags are emitted on every `fadd` / `fsub` / `fmul` / `fdiv` / `frem` / `fneg`:
- `reassoc` — permits the optimizer to reorder associative chains. `(a + b) + c` may become `a + (b + c)`. This is what the loop vectorizer needs to break a serial accumulator dependency chain into 4 parallel accumulators. Worst-case observable behavior: tiny ULP-level differences in long sum chains over operands of widely different magnitudes; rewrites like `(a / b) * b → (a * b) / b` (algebraically equal, IEEE-different).
- `contract` — permits fused multiply-add. `a * b + c` may become a single FMA instruction with one rounding step instead of two. ARM and modern x86 both have hardware FMA. Worst-case observable behavior: the intermediate `a * b` no longer rounds independently, so code that depends on the rounding structure (Kahan summation, compensated arithmetic) sees different bits.
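The reassociation hazard is easy to reproduce in plain JS/TS with no compiler involved; this self-contained sketch shows why `(a + b) + c` and `a + (b + c)` can differ when operand magnitudes vary widely:

```typescript
// Reassociation is not value-preserving in IEEE 754 f64: a large
// positive and a large negative term cancel exactly in one grouping,
// but absorb the small term in the other (the ulp at 1e16 is 2,
// so 1e16 + 1 rounds back to 1e16).
const a = 1e16;
const b = -1e16;
const c = 1.0;

console.log((a + b) + c); // 1 — cancellation happens first, c survives
console.log(a + (b + c)); // 0 — c is absorbed into b before the cancellation
```

This is the per-statement version of what happens when a long serial sum is split into parallel partial accumulators: the rounding order changes, so the bits can change.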
The full clang `-ffast-math` is off even with `--fast-math`. In particular, these flags stay clear:

- `nnan` / `ninf` — these tell LLVM to assume no NaN/Inf inputs, which is catastrophic for Perry: NaN-boxing uses NaN bit patterns for every non-number value (strings, objects, null, undefined, booleans). Enabling them caused LLVM to replace `TAG_NULL` / `TAG_UNDEFINED` constants with `0.0` at codegen time. Tried at v0.2.x commit `083ce16`, reverted two days later in `b5a8c83f`. Will not return.
- `nsz` (no signed zeros) — would make `(a + 0) → a` a valid rewrite even when `a` is `-0`. `Object.is(-0, 0)` is observable in JS.
- `arcp` (allow reciprocal) — would rewrite `a / b → a * (1 / b)`, which loses precision when `b` is far from a power of two.
- `afn` (approximate functions) — would let LLVM substitute lower-precision math intrinsics.
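The signed-zero point can be checked directly in Node; this short sketch shows why the `(a + 0) → a` rewrite that `nsz` would license is observable from JS:

```typescript
// Under round-to-nearest, -0 + 0 is +0 — so rewriting (a + 0) to a
// changes the observable result when a is -0.
const a = -0;
console.log(Object.is(a + 0, a)); // false: a + 0 is +0, a is -0
console.log(Object.is(-0, 0));    // false: JS distinguishes signed zeros
console.log(1 / a);               // -Infinity — another way -0 leaks out
```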
For reference, Rust nightly's `#![feature(float_algebraic)]` enables `reassoc` + `contract` + `nsz` + `arcp` + `afn`. Perry's `--fast-math` is strictly more conservative than that.
Benchmarks on Apple Silicon (M-series, ARM64), min of 3 runs each, LLVM 19, perry 0.5.569. Run `scripts/perf_bench.sh` to reproduce.

| Benchmark | Default | `--fast-math` | Ratio | Node |
|---|---|---|---|---|
| sum_loop (100M `sum += 1`) | 96 ms | 13 ms | 7.4× faster | 53 ms |
| dot_product (10M `sum += a[i]*b[i]`) | 13 ms | 13 ms | 1.00× | 12 ms |
| array_sum (10M `sum += xs[i]`) | 10 ms | 10 ms | 1.00× | 11 ms |
Read these together: `--fast-math` produces a large speedup ONLY on loops where the accumulator step is constant or trivially redundant enough that LLVM can split it into parallel partial sums. Real FP workloads rarely look like `sum += 1` and so rarely benefit. The default mode beats Node on array_sum and matches it on dot_product without giving up bit-exact parity.
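For context, the `sum_loop` shape in the table is roughly the following (an illustrative reconstruction, not the actual `scripts/perf_bench.sh` source):

```typescript
// sum_loop: a constant-step accumulator. With reassoc, LLVM can split
// this serial dependency chain into several parallel partial sums,
// which is where the large speedup comes from. Data-dependent loops
// like dot_product have no such cheap reshaping.
let sum = 0;
const t0 = performance.now();
for (let i = 0; i < 100_000_000; i++) {
  sum += 1;
}
const t1 = performance.now();
console.log(`sum=${sum} in ${(t1 - t0).toFixed(1)} ms`);
```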
`scripts/fp_fuzz.mjs` — randomly generates TS programs exercising the six patterns most likely to trip per-instruction FMFs (left-fold, tree-fold, right-fold reductions; FMA-shaped chains; algebraic identities like `(a/b)*b`; cancellation predicates). Each program is compiled with both Node and Perry, and stdout is diffed byte-for-byte.
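Two of those reduction shapes can be sketched as follows (an illustration of the kind of divergence the fuzzer looks for, not the fuzzer's actual code). A left fold and a tree fold over the same data round in a different order, so byte-diffing stdout catches reassociation:

```typescript
// Left-fold vs tree-fold reductions accumulate in a different order,
// so their results can differ at the ULP level — or by more, under
// cancellation, as with the inputs below.
function leftFold(xs: number[]): number {
  let s = 0;
  for (const x of xs) s += x;
  return s;
}

function treeFold(xs: number[]): number {
  if (xs.length === 1) return xs[0];
  const mid = xs.length >> 1;
  return treeFold(xs.slice(0, mid)) + treeFold(xs.slice(mid));
}

const xs = [1e16, 1, -1e16, 1];
console.log(leftFold(xs)); // 1 — the first 1 is absorbed, the second survives
console.log(treeFold(xs)); // 0 — both 1s are absorbed before the cancellation
```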
| Mode | Pass rate (100 random programs, seed=200) |
|---|---|
| Default | 94/100 |
| `--fast-math` | ~70/100 |
The 6/100 default-mode failures are residual divergences from sources not gated by per-instruction FMFs — most originate in the LLVM SLP vectorizer at `-O3`, which can apply pairwise reduction even without the `reassoc` permission. Tracked separately; out of scope for this flag.
Perry's per-module `.o` cache (in `.perry-cache/objects/`) keys on the `fast_math` setting alongside the source hash and other compile options. Toggling the flag invalidates affected cache entries — `perry --fast-math` right after `perry` does a clean recompile of every module that contains f64 arithmetic. No `--no-cache` necessary.
(This is a deliberate fix. During the original investigation, an early version of the flag was omitted from the cache key, and the result was that toggling the flag appeared to do nothing because all `.o` files came from the cache. If you ever see fast-math settings that seem not to take effect, suspect the cache key first.)
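The invariant is simple to state in code. This is a hypothetical sketch of the idea, not Perry's actual implementation: any compile option that changes codegen must feed the object-cache key.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache-key composition: fast_math participates in the
// key, so toggling the flag can never reuse an object that was
// compiled under the other setting.
interface CompileOpts {
  fastMath: boolean;
  optLevel: number;
}

function objectCacheKey(sourceHash: string, opts: CompileOpts): string {
  return createHash("sha256")
    .update(sourceHash)
    .update(JSON.stringify(opts)) // drop this line and toggles silently no-op
    .digest("hex");
}

const on = objectCacheKey("deadbeef", { fastMath: true, optLevel: 3 });
const off = objectCacheKey("deadbeef", { fastMath: false, optLevel: 3 });
console.log(on !== off); // true — different settings, different cache entries
```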
- For library authors: if your TS library publishes benchmark numbers, document which mode you measured under. The 7× sum-loop case is the only place the gap is large; if your benchmark doesn't look like that, the numbers are mode-independent and you can publish one set.
- For app authors: there is no migration. Default behavior is the pre-flag behavior with `--fast-math` removed; bit-exact results are more compatible with Node, not less.
- For determinism-critical code (lockstep simulations, financial reconciliation, hash function correctness): leave the default. Even with `--fast-math` off there's a residual ~6% divergence rate on random FP code, which is too high for true determinism work — but it's an order of magnitude better than the ~30% with the flag on.