You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Speed up Integer#to_s with a two digit lookup table (ruby#16719)
* numeric: emit two decimal digits per iteration in rb_fix2str
Replace the digit-at-a-time loop in rb_fix2str with the standard
itoa 2-digit lookup table for base 10. Each iteration now
writes two digits using a single (u % 100, u / 100) pair, so the
number of loop iterations is halved for multi-digit integers.
The classic per-digit loop is kept for non-base-10 conversion.
Benchmark (Apple M-series, 5M-10M ops, best of 3 runs):
case base patch delta
--------- ----- ----- -----
1-digit (5) 64 ns/op 64 ns/op -0%
2-digit (42) 64 ns/op 65 ns/op +2% (noise)
3-digit (400) 66 ns/op 64 ns/op -3%
5-digit (12345) 69 ns/op 67 ns/op -3%
10-digit (1234567890) 77 ns/op 67 ns/op -13%
19-digit (2^62-1) 111 ns/op 75 ns/op -33%
The crossover is at ~3 digits: below that the constant setup
dominates and the benefit is within noise, above that the halved
iteration count shows up linearly. Typical Rails payloads mix
short IDs (1-5 digits) and longer values (timestamps, nanos,
large counts), so the win is workload-dependent but strictly
non-negative for real code.
Correctness: 100k random fuzz across the full fixnum range plus
targeted edges (0, ±1, ±99, ±100, 2^30-1, 2^62-1, etc.) all pass.
make test-all shows 34694 tests, 7325860 assertions, 0 new
failures (same pre-existing TestArgf#test_puts flake as on
master) — test_integer.rb alone runs 38 tests / 421628 assertions
of which Integer#to_s exercises the bulk, all pass.
The 200-byte lookup table sits in .rodata and fits in a single
cache line of its own (3 lines for the whole table). No change
to public API, no change to bignum conversion, no change to
non-base-10 conversion paths.
* bignum: emit two decimal digits per iteration in big2str_2bdigits
Extend the 2-digit lookup-table itoa optimisation from rb_fix2str to
the inner conversion loop used by Bignum#to_s. big2str_2bdigits has
two code paths — a leading-chunk path that emits variable-length
digits, and a recursive-chunk path that emits a fixed-width zero-
padded block — and both gain from the halved division count. The
classic per-digit loop is preserved for non-base-10 conversion.
Moves the ruby_decimal_digit_pairs table from a file-static in
numeric.c to bignum.c next to ruby_digitmap, and exposes it through
internal/bignum.h so both files share the same 200-byte .rodata
instance.
Benchmark (Apple M-series, best of 3 runs, measures bignum-only
speedup against the preceding fixnum commit):
case base patch delta
--------- ----- ----- -----
big_20dig 10^19+... 146 ns/op 124 ns/op -15%
big_40dig 10^39+... 174 ns/op 152 ns/op -13%
big_100dig 10^99+42 236 ns/op 213 ns/op -10%
big_500dig 10^499+7 1119 ns/op 1086 ns/op -3%
big_1000dig 10^999 3490 ns/op 3459 ns/op -1%
fix_19dig 2^62-1 76 ns/op 76 ns/op 0% (unchanged path)
Wins concentrate in the 20-100 digit range where big2str_2bdigits
is the dominant cost. Above ~500 digits the Karatsuba divmod
recursion dominates and the digit-emission saving shrinks to the
noise floor. The 20-100 range is what actual Ruby code exercises
(financial high-precision sums, nanosecond timestamps, large
counters); crypto-size (1000+ digit) bignums are rare in to_s paths.
Correctness: 100k random fixnum fuzz unchanged, 500 random bignum
fuzz up to 2^256 with cross-check against sprintf("%d"), bases
2/8/16/36 round-trip, plus edge cases (0, just-above-fixnum, ±2^100,
20-digit strings near the fixnum boundary). test/ruby/test_integer.rb
stays at 38 tests / 421628 assertions / 0 failures, test_bignum.rb
passes 74 / 607 / 0 failures, full make test-all reports 34694
tests / 0 new failures (same TestArgf#test_puts pre-existing flake
as master).
* benchmark: add int_to_s yaml for Integer#to_s
Reproducible benchmark for the two preceding commits. Covers:
- 1/2/3/5/10/19-digit positive fixnums (spans the break-even point
and the two large-number wins at the top)
- A negative fixnum (exercises the minus-sign prepend path)
- 20/40/100-digit bignums (spans the big2str_2bdigits win range)
- Two string-interpolation scenarios, so reviewers can see how much
of the Integer#to_s speedup reaches real code that allocates the
result string too
Intended to be consumed by benchmark-driver against master vs
int-to-s-twodigit for A/B comparison. Matches the numbers in the
commit messages of 5bfb7e0 and c5df6de.
---------
Co-authored-by: tomoya ishida <tomoyapenguin@gmail.com>
0 commit comments