
Commit 4a0072d

khasinski and tompng authored
Speed up Integer#to_s with a two digit lookup table (ruby#16719)
* numeric: emit two decimal digits per iteration in rb_fix2str

Replace the digit-at-a-time loop in rb_fix2str with the standard itoa 2-digit lookup table for base 10. Each iteration now writes two digits using a single (u % 100, u / 100) pair, so the number of loop iterations is halved for multi-digit integers. The classic per-digit loop is kept for non-base-10 conversion.

Benchmark (Apple M-series, 5M-10M ops, best of 3 runs):

case                    base        patch       delta
---------               -----       -----       -----
1-digit (5)             64 ns/op    64 ns/op    -0%
2-digit (42)            64 ns/op    65 ns/op    +2% (noise)
3-digit (400)           66 ns/op    64 ns/op    -3%
5-digit (12345)         69 ns/op    67 ns/op    -3%
10-digit (1234567890)   77 ns/op    67 ns/op    -13%
19-digit (2^62-1)       111 ns/op   75 ns/op    -33%

The crossover is at ~3 digits: below that the constant setup dominates and the benefit is within noise, above that the halved iteration count shows up linearly. Typical Rails payloads mix short IDs (1-5 digits) and longer values (timestamps, nanos, large counts), so the win is workload-dependent but strictly non-negative for real code.

Correctness: 100k random fuzz across the full fixnum range plus targeted edges (0, ±1, ±99, ±100, 2^30-1, 2^62-1, etc.) all pass. make test-all shows 34694 tests, 7325860 assertions, 0 new failures (same pre-existing TestArgf#test_puts flake as on master). test_integer.rb alone runs 38 tests / 421628 assertions, of which Integer#to_s exercises the bulk; all pass.

The 200-byte lookup table sits in .rodata and spans only a few cache lines (200 bytes is just over three 64-byte lines). No change to public API, no change to bignum conversion, no change to non-base-10 conversion paths.

* bignum: emit two decimal digits per iteration in big2str_2bdigits

Extend the 2-digit lookup-table itoa optimisation from rb_fix2str to the inner conversion loop used by Bignum#to_s. big2str_2bdigits has two code paths (a leading-chunk path that emits variable-length digits, and a recursive-chunk path that emits a fixed-width zero-padded block) and both gain from the halved division count. The classic per-digit loop is preserved for non-base-10 conversion.

Moves the ruby_decimal_digit_pairs table from a file-static in numeric.c to bignum.c next to ruby_digitmap, and exposes it through internal/bignum.h so both files share the same 200-byte .rodata instance.

Benchmark (Apple M-series, best of 3 runs, measures bignum-only speedup against the preceding fixnum commit):

case                      base         patch        delta
---------                 -----        -----        -----
big_20dig   10^19+...     146 ns/op    124 ns/op    -15%
big_40dig   10^39+...     174 ns/op    152 ns/op    -13%
big_100dig  10^99+42      236 ns/op    213 ns/op    -10%
big_500dig  10^499+7      1119 ns/op   1086 ns/op   -3%
big_1000dig 10^999        3490 ns/op   3459 ns/op   -1%
fix_19dig   2^62-1        76 ns/op     76 ns/op     0% (unchanged path)

Wins concentrate in the 20-100 digit range where big2str_2bdigits is the dominant cost. Above ~500 digits the Karatsuba divmod recursion dominates and the digit-emission saving shrinks to the noise floor. The 20-100 range is what actual Ruby code exercises (financial high-precision sums, nanosecond timestamps, large counters); crypto-size (1000+ digit) bignums are rare in to_s paths.

Correctness: 100k random fixnum fuzz unchanged, 500 random bignum fuzz up to 2^256 with cross-check against sprintf("%d"), bases 2/8/16/36 round-trip, plus edge cases (0, just-above-fixnum, ±2^100, 20-digit strings near the fixnum boundary). test/ruby/test_integer.rb stays at 38 tests / 421628 assertions / 0 failures, test_bignum.rb passes 74 / 607 / 0 failures, full make test-all reports 34694 tests / 0 new failures (same TestArgf#test_puts pre-existing flake as master).

* benchmark: add int_to_s yaml for Integer#to_s

Reproducible benchmark for the two preceding commits. Covers:

- 1/2/3/5/10/19-digit positive fixnums (spans the break-even point and the two large-number wins at the top)
- A negative fixnum (exercises the minus-sign prepend path)
- 20/40/100-digit bignums (spans the big2str_2bdigits win range)
- Two string-interpolation scenarios, so reviewers can see how much of the Integer#to_s speedup reaches real code that allocates the result string too

Intended to be consumed by benchmark-driver against master vs int-to-s-twodigit for A/B comparison. Matches the numbers in the commit messages of 5bfb7e0 and c5df6de.

---------

Co-authored-by: tomoya ishida <tomoyapenguin@gmail.com>
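The "halved iteration count" claim in the first commit message can be checked with a small standalone sketch. It is not code from the patch: the helper names per_digit_steps, two_digit_steps and steps_saved are hypothetical, and the sketch only counts division steps rather than measuring time.

```c
#include <assert.h>
#include <stdint.h>

/* Division steps taken by the classic loop: one per decimal digit. */
static unsigned per_digit_steps(uint64_t u)
{
    unsigned steps = 0;
    do {
        steps++;
    } while (u /= 10);
    return steps;
}

/* Division steps taken by the two-digit loop: one per digit pair while
 * u >= 100. The last one or two digits are emitted without dividing. */
static unsigned two_digit_steps(uint64_t u)
{
    unsigned steps = 0;
    while (u >= 100) {
        u /= 100;
        steps++;
    }
    return steps;
}

/* Number of division steps the two-digit loop avoids for u. */
static unsigned steps_saved(uint64_t u)
{
    unsigned classic = per_digit_steps(u);
    unsigned paired = two_digit_steps(u);
    /* For d decimal digits the paired loop divides (d - 1) / 2 times. */
    assert(paired == (classic - 1) / 2);
    return classic - paired;
}
```

For the 19-digit benchmark input this gives 19 steps against 9, and for the 1-digit input 1 against 0, which is consistent with the gain growing with the digit count while short inputs stay within noise.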
1 parent 33744d2 commit 4a0072d

4 files changed

Lines changed: 135 additions & 13 deletions


benchmark/int_to_s.yml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+prelude: |
+  # frozen_string_literal: true
+  N1 = 5
+  N2 = 42
+  N3 = 400
+  N5 = 12345
+  N10 = 1_234_567_890
+  N19 = 4_611_686_018_427_387_903
+  NEG = -1_234_567_890
+  BIG20 = 10 ** 19 + 12_345_678_901_234_567
+  BIG40 = 10 ** 39 + 123_456_789_012_345
+  BIG100 = 10 ** 99 + 42
+benchmark:
+  fix_1digit: "N1.to_s"
+  fix_2digit: "N2.to_s"
+  fix_3digit: "N3.to_s"
+  fix_5digit: "N5.to_s"
+  fix_10digit: "N10.to_s"
+  fix_19digit: "N19.to_s"
+  fix_negative: "NEG.to_s"
+  big_20digit: "BIG20.to_s"
+  big_40digit: "BIG40.to_s"
+  big_100digit: "BIG100.to_s"
+  interp_id: '"id=#{N10}"'
+  interp_mixed: '"a=#{N2},b=#{N5},c=#{N10}"'

bignum.c

Lines changed: 76 additions & 10 deletions
@@ -64,6 +64,21 @@ static const bool debug_integer_pack = (
 
 const char ruby_digitmap[] = "0123456789abcdefghijklmnopqrstuvwxyz";
 
+/* Two-digit decimal lookup table. Offset 2*n holds the ASCII pair for
+ * n in the range 0..99. Used by both rb_fix2str in numeric.c and
+ * big2str_2bdigits below to emit two base-10 digits per iteration. */
+const char ruby_decimal_digit_pairs[201] =
+    "00010203040506070809"
+    "10111213141516171819"
+    "20212223242526272829"
+    "30313233343536373839"
+    "40414243444546474849"
+    "50515253545556575859"
+    "60616263646566676869"
+    "70717273747576777879"
+    "80818283848586878889"
+    "90919293949596979899";
+
 #ifndef SIZEOF_BDIGIT_DBL
 # if SIZEOF_INT*2 <= SIZEOF_LONG_LONG
 # define SIZEOF_BDIGIT_DBL SIZEOF_LONG_LONG
@@ -4811,23 +4826,74 @@ big2str_2bdigits(struct big2str_struct *b2s, BDIGIT *xds, size_t xn, size_t tail
             return;
         p = buf;
         j = sizeof(buf);
-        do {
-            BDIGIT_DBL idx = num % b2s->base;
-            num /= b2s->base;
-            p[--j] = ruby_digitmap[idx];
-        } while (num);
+        if (b2s->base == 10) {
+            /* Emit two decimal digits per iteration from ruby_decimal_digit_pairs.
+             * See the comment on the table in bignum.c near ruby_digitmap. */
+            while (num >= 100) {
+                BDIGIT_DBL idx = (num % 100) * 2;
+                num /= 100;
+                j -= 2;
+                p[j] = ruby_decimal_digit_pairs[idx];
+                p[j + 1] = ruby_decimal_digit_pairs[idx + 1];
+            }
+            if (num >= 10) {
+                BDIGIT_DBL idx = num * 2;
+                j -= 2;
+                p[j] = ruby_decimal_digit_pairs[idx];
+                p[j + 1] = ruby_decimal_digit_pairs[idx + 1];
+            }
+            else {
+                /* num is 1..9 here (0 was handled above) */
+                p[--j] = (char)('0' + num);
+            }
+        }
+        else {
+            do {
+                BDIGIT_DBL idx = num % b2s->base;
+                num /= b2s->base;
+                p[--j] = ruby_digitmap[idx];
+            } while (num);
+        }
         len = sizeof(buf) - j;
         big2str_alloc(b2s, len + taillen);
         MEMCPY(b2s->ptr, buf + j, char, len);
     }
     else {
         p = b2s->ptr;
         j = b2s->hbase2_numdigits;
-        do {
-            BDIGIT_DBL idx = num % b2s->base;
-            num /= b2s->base;
-            p[--j] = ruby_digitmap[idx];
-        } while (j);
+        if (b2s->base == 10) {
+            /* Non-beginning chunks must emit EXACTLY hbase2_numdigits,
+             * zero-padded on the left. Consume num in 2-digit groups,
+             * handle the odd trailing digit, then memset remaining
+             * positions with '0'. */
+            while (num >= 100) {
+                BDIGIT_DBL idx = (num % 100) * 2;
+                num /= 100;
+                j -= 2;
+                p[j] = ruby_decimal_digit_pairs[idx];
+                p[j + 1] = ruby_decimal_digit_pairs[idx + 1];
+            }
+            if (num >= 10) {
+                BDIGIT_DBL idx = num * 2;
+                j -= 2;
+                p[j] = ruby_decimal_digit_pairs[idx];
+                p[j + 1] = ruby_decimal_digit_pairs[idx + 1];
+            }
+            else if (num > 0) {
+                p[--j] = (char)('0' + num);
+            }
+            if (j > 0) {
+                memset(p, '0', j);
+                j = 0;
+            }
+        }
+        else {
+            do {
+                BDIGIT_DBL idx = num % b2s->base;
+                num /= b2s->base;
+                p[--j] = ruby_digitmap[idx];
+            } while (j);
+        }
         len = b2s->hbase2_numdigits;
     }
     b2s->ptr += len;
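The fixed-width path in the second hunk differs from the leading-chunk path in one respect: it must always fill the whole chunk, so leftover positions are padded with '0'. The following standalone sketch isolates that behaviour under stated assumptions. The names digit_pairs and emit_fixed_width are hypothetical stand-ins (the patch uses ruby_decimal_digit_pairs and inlines the loop), uint64_t stands in for BDIGIT_DBL, and the caller is assumed to pass a num with at most width decimal digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for ruby_decimal_digit_pairs: offset 2*n holds the
 * two ASCII digits of n for n in 0..99. */
static const char digit_pairs[201] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Write num into exactly `width` bytes at p, zero-padded on the left.
 * No NUL terminator is written. Assumes num has at most `width` digits. */
static void emit_fixed_width(char *p, size_t width, uint64_t num)
{
    size_t j = width;

    while (num >= 100) {
        size_t idx = (size_t)(num % 100) * 2;
        num /= 100;
        assert(j >= 2);
        j -= 2;
        memcpy(p + j, digit_pairs + idx, 2);
    }
    if (num >= 10) {
        assert(j >= 2);
        j -= 2;
        memcpy(p + j, digit_pairs + (size_t)num * 2, 2);
    }
    else if (num > 0) {
        assert(j >= 1);
        p[--j] = (char)('0' + num);
    }
    /* Whatever is left in front becomes leading zeros (j may be 0). */
    memset(p, '0', j);
}
```

A num of 0 skips every branch and the whole chunk is filled by the final memset, which matches the `else if (num > 0)` guard in the patch.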

internal/bignum.h

Lines changed: 1 addition & 0 deletions
@@ -107,6 +107,7 @@ struct RBignum {
 
 /* bignum.c */
 extern const char ruby_digitmap[];
+extern const char ruby_decimal_digit_pairs[];
 double rb_big_fdiv_double(VALUE x, VALUE y);
 VALUE rb_big_uminus(VALUE x);
 VALUE rb_big_hash(VALUE);
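Because both files now index the shared table directly, its layout (offset 2*n holds the two ASCII digits of n) is the contract worth checking. A minimal sketch of one way to regenerate and verify such a table follows; build_digit_pairs and digit_pairs_ok are hypothetical helpers for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fill out[0..199] with "00", "01", ..., "99" and NUL-terminate it. */
static void build_digit_pairs(char out[201])
{
    for (int n = 0; n < 100; n++) {
        out[2 * n] = (char)('0' + n / 10);
        out[2 * n + 1] = (char)('0' + n % 10);
    }
    out[200] = '\0';
}

/* Return 1 if every pair in `table` matches snprintf("%02d"), else 0. */
static int digit_pairs_ok(const char *table)
{
    assert(table != NULL);
    for (int n = 0; n < 100; n++) {
        char expect[12];
        snprintf(expect, sizeof expect, "%02d", n);
        if (memcmp(table + 2 * n, expect, 2) != 0)
            return 0;
    }
    return 1;
}
```

The same digit_pairs_ok check could be pointed at a hand-written literal like the one in bignum.c to catch a mistyped row.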

numeric.c

Lines changed: 33 additions & 3 deletions
@@ -4040,6 +4040,11 @@ rb_int_uminus(VALUE num)
     }
 }
 
+/* ruby_decimal_digit_pairs is defined in bignum.c and declared in
+ * internal/bignum.h. See there for the rationale of the 2-digit
+ * lookup-table itoa optimisation; both rb_fix2str here and big2str_2bdigits
+ * in bignum.c consume it. */
+
 VALUE
 rb_fix2str(VALUE x, int base)
 {
@@ -4072,9 +4077,34 @@ rb_fix2str(VALUE x, int base)
     else {
         u = val;
     }
-    do {
-        *--b = ruby_digitmap[(int)(u % base)];
-    } while (u /= base);
+    if (base == 10) {
+        /* Emit two digits per iteration from a precomputed table. The
+         * compiler lowers `u % 100` and `u / 100` to a single multiply +
+         * shift, so each iteration costs roughly one multiply, one shift,
+         * and two stores. About 2x fewer iterations than the classic
+         * per-digit loop for multi-digit inputs. */
+        while (u >= 100) {
+            unsigned long idx = (u % 100) * 2;
+            u /= 100;
+            b -= 2;
+            b[0] = ruby_decimal_digit_pairs[idx];
+            b[1] = ruby_decimal_digit_pairs[idx + 1];
+        }
+        if (u >= 10) {
+            unsigned long idx = u * 2;
+            b -= 2;
+            b[0] = ruby_decimal_digit_pairs[idx];
+            b[1] = ruby_decimal_digit_pairs[idx + 1];
+        }
+        else {
+            *--b = (char)('0' + u);
+        }
+    }
+    else {
+        do {
+            *--b = ruby_digitmap[(int)(u % base)];
+        } while (u /= base);
+    }
     if (neg) {
         *--b = '-';
     }
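For readers who want to try the base-10 branch outside of the Ruby tree, here is a standalone sketch of the same back-to-front technique including the sign handling. It is an illustration, not the code of rb_fix2str: long_to_decimal and digit_pairs are hypothetical names, long long replaces the fixnum type, and the caller-supplied buffer replaces the internal one.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for ruby_decimal_digit_pairs. */
static const char digit_pairs[201] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Format val in decimal at the END of buf and return a pointer to the
 * first character. The result is NUL-terminated. */
static char *long_to_decimal(long long val, char *buf, size_t size)
{
    char *b = buf + size;
    int neg = val < 0;
    unsigned long long u;

    assert(size >= 21); /* sign + 19 digits + NUL for a 64-bit value */
    /* Negate in unsigned arithmetic so the minimum value cannot overflow. */
    u = neg ? 0ULL - (unsigned long long)val : (unsigned long long)val;

    *--b = '\0';
    while (u >= 100) {
        size_t idx = (size_t)(u % 100) * 2;
        u /= 100;
        b -= 2;
        b[0] = digit_pairs[idx];
        b[1] = digit_pairs[idx + 1];
    }
    if (u >= 10) {
        size_t idx = (size_t)u * 2;
        b -= 2;
        b[0] = digit_pairs[idx];
        b[1] = digit_pairs[idx + 1];
    }
    else {
        *--b = (char)('0' + u); /* covers 0 as well as 1..9 */
    }
    if (neg) {
        *--b = '-';
    }
    return b;
}
```

Calling long_to_decimal(-1234567890, buf, sizeof buf) with a 32-byte buf returns a pointer into buf at the string "-1234567890". Writing from the end of the buffer is what lets the sign be prepended last without moving any digits.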
