
feat(perf): Add fast path for decodeLatin1() #1037

Merged
boorad merged 6 commits into margelo:main from wh201906:wh201906/fast-decode-latin1 on May 12, 2026

@wh201906
Contributor

@wh201906 wh201906 commented May 8, 2026

This PR adds a fast path for decodeLatin1() that is used when running on Hermes and getStringData() is available.
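The fast path itself lives in C++ (HybridUtils.cpp) and is not reproduced here. As a reference for the output any decodeLatin1() implementation has to produce, here is a minimal TypeScript sketch: Latin-1 maps every byte 0x00 to 0xFF to the Unicode code point with the same value. `decodeLatin1Reference` and `CHUNK_SIZE` are hypothetical names used only for this illustration.

```typescript
// Hypothetical pure-TypeScript reference for Latin-1 decoding.
// This is NOT the PR's C++/JSI fast path; it only models the expected output.
const CHUNK_SIZE = 0x8000; // arbitrary chunk size that keeps each fromCharCode call small

function decodeLatin1Reference(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
    // subarray() shares memory with `bytes`; no copy is made.
    const chunk = bytes.subarray(i, i + CHUNK_SIZE);
    // Each byte becomes exactly one UTF-16 code unit with the same value.
    out += String.fromCharCode(...chunk);
  }
  return out;
}
```

Chunking only exists to keep the argument count of each `String.fromCharCode` call bounded; the chunk size is a choice made for this sketch.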

@vercel

vercel Bot commented May 8, 2026

@wh201906 is attempting to deploy a commit to the Margelo Team on Vercel.

A member of the Team first needs to authorize it.

@wh201906 wh201906 changed the title feat: Add fast path for decodeLatin1() feat(perf): Add fast path for decodeLatin1() May 8, 2026
@wh201906
Contributor Author

wh201906 commented May 8, 2026

The following test cases are ported from Node.js v24.15.0.

Try to create 0-length buffers. Should not throw.

Current encoding_tests.ts:

```typescript
test(
  SUITE,
  '[Node.js] Try to create 0-length buffers. Should not throw.',
  () => {
    const encodings = ['ascii', 'latin1', 'binary'] as const;

    for (const encoding of encodings) {
      const ab = stringToBuffer('', encoding);
      expect(ab.byteLength).to.equal(0);
      expect(bufferToString(ab, encoding)).to.equal('');
    }
  },
);
```

Original Node.js (test/parallel/test-buffer-alloc.js):

```javascript
// Try to create 0-length buffers. Should not throw.
Buffer.from('');
Buffer.from('', 'ascii');
Buffer.from('', 'latin1');
new Buffer('', 'binary');
```

Buffer.from('foo', encoding).toString(encoding) returns 'foo'.

Current encoding_tests.ts:

```typescript
test(
  SUITE,
  "[Node.js] Buffer.from('foo', encoding).toString(encoding) returns 'foo'.",
  () => {
    const encodings = ['utf8', 'utf16le', 'ascii', 'latin1', 'binary'] as const;

    for (const encoding of encodings) {
      const ab = stringToBuffer('foo', encoding);
      expect(bufferToString(ab, encoding)).to.equal('foo');
    }
  },
);
```

Original Node.js (test/parallel/test-buffer-tostring.js):

```javascript
// utf8, ucs2, ascii, latin1, utf16le
for (const encoding of [
  'utf8',
  'utf-8',
  'ucs2',
  'ucs-2',
  'ascii',
  'latin1',
  'binary',
  'utf16le',
  'utf-16le',
].flatMap(e => [e, e.toUpperCase()])) {
  assert.strictEqual(Buffer.from('foo', encoding).toString(encoding), 'foo');
}
```

Data "Hello, ÆÊÎÖÿ".

Current encoding_tests.ts:

```typescript
test(SUITE, '[Node.js] Data "Hello, ÆÊÎÖÿ".', () => {
  const str = 'Hello, ÆÊÎÖÿ';
  const expected = new Uint8Array([
    ...Array.from('Hello, ', c => c.charCodeAt(0)),
    0xc6,
    0xca,
    0xce,
    0xd6,
    0xff,
  ]);
  const ab = stringToBuffer(str, 'latin1');

  expect(toU8(ab)).to.deep.equal(expected);
  expect(bufferToString(expected.buffer as ArrayBuffer, 'latin1')).to.equal(
    str,
  );
});
```

Original Node.js (test/cctest/test_string_bytes.cc):

```cpp
// Data "Hello, ÆÊÎÖÿ"
static const char latin1_data[] = "Hello, \xC6\xCA\xCE\xD6\xFF";
static const char utf8_data[] = "Hello, ÆÊÎÖÿ";
```

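The two C++ constants differ because the same text has different byte representations in Latin-1 and UTF-8. A small illustrative sketch (not RNQC code) that computes both with standard JavaScript APIs; `TextEncoder` always produces UTF-8:

```typescript
// Illustrative comparison of the Latin-1 and UTF-8 bytes for one string.
const text = 'Hello, ÆÊÎÖÿ';

// Latin-1: one byte per character (valid here because every code unit is <= 0xFF).
const latin1Bytes = Uint8Array.from(text, c => c.charCodeAt(0));

// UTF-8: U+0080..U+07FF take two bytes each, e.g. 'Æ' (U+00C6) -> 0xC3 0x86.
const utf8Bytes = new TextEncoder().encode(text);

console.log(latin1Bytes.length, utf8Bytes.length); // 12 17
```

Only the Latin-1 sequence matches `latin1_data`, which is why the RNQC test compares against `0xc6, 0xca, 0xce, 0xd6, 0xff`.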
Verify that StringBytes::Write converts two-byte characters to one-byte characters, even if there is no valid one-byte representation.

Current encoding_tests.ts:

```typescript
test(
  SUITE,
  '[Node.js] Verify that StringBytes::Write converts two-byte characters to one-byte characters, even if there is no valid one-byte representation.',
  () => {
    const expected = new Uint8Array([
      ...Array.from('Hello, ', c => c.charCodeAt(0)),
      0x16,
      0x4c,
    ]);
    const ab = stringToBuffer('Hello, 世界', 'latin1');

    expect(toU8(ab)).to.deep.equal(expected);
    expect(bufferToString(ab, 'latin1')).to.equal(
      String.fromCharCode(...expected),
    );
  },
);
```

Original Node.js (test/cctest/test_string_bytes.cc):

```cpp
// Verify that StringBytes::Write converts two-byte characters to one-byte
// characters, even if there is no valid one-byte representation.
Local<String> utf8_str =
    String::NewFromUtf8(isolate_, "Hello, 世界").ToLocalChecked();
ASSERT_STREQ("Hello, \x16\x4C", buf.out());
```

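The expected bytes `0x16, 0x4C` come from keeping only the low 8 bits of each UTF-16 code unit: '世' is U+4E16 and '界' is U+754C. A sketch of that arithmetic; `toLatin1Bytes` is a hypothetical helper that models the expected bytes, not RNQC's `stringToBuffer` implementation:

```typescript
// Hypothetical model of latin1 writing: each UTF-16 code unit is truncated
// to its low byte, so characters above U+00FF are not rejected.
function toLatin1Bytes(str: string): Uint8Array {
  const out = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    out[i] = str.charCodeAt(i) & 0xff; // '世' U+4E16 -> 0x16, '界' U+754C -> 0x4C
  }
  return out;
}

const truncated = toLatin1Bytes('Hello, 世界');
// Decoding the truncated bytes gives 'Hello, \x16L' (0x4C is 'L'), not the original text.
const roundTrip = String.fromCharCode(...truncated);
```

In this sketch a character outside the BMP occupies two code units and would therefore produce two bytes.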
Manually controlled string for checking binary output.

Current encoding_tests.ts:

```typescript
test(
  SUITE,
  '[Node.js] Manually controlled string for checking binary output',
  () => {
    const ucs2Control = 'a\u0000';
    const writeStr = 'a';
    const bytes = toU8(stringToBuffer(writeStr, 'utf16le'));

    expect(bytes[0]).to.equal(0x61);
    expect(bytes[1]).to.equal(0);
    expect(bufferToString(bytes.buffer as ArrayBuffer, 'latin1')).to.equal(
      ucs2Control,
    );
    expect(bufferToString(bytes.buffer as ArrayBuffer, 'binary')).to.equal(
      ucs2Control,
    );
  },
);
```

Original Node.js (test/parallel/test-stringbytes-external.js):

```javascript
// Manually controlled string for checking binary output
let ucs2_control = 'a\u0000';
let write_str = 'a';

// first check latin1
let c = b.toString('latin1');
// now check binary
c = b.toString('binary');
```

Correspondence: Node creates b with Buffer.from(write_str, 'ucs2'), where write_str is 'a', so the bytes are [0x61, 0x00]. RNQC does not expose ucs2, so the equivalent non-alias encoding is utf16le; stringToBuffer(writeStr, 'utf16le') produces the same bytes. The assertions then match Node directly: byte 0 is 0x61, byte 1 is 0, and both latin1 and binary toString output equal ucs2_control.
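The correspondence can be checked in isolation with a few lines of TypeScript. This is a sketch; `encodeUtf16le` is a hypothetical helper, not an RNQC or Node.js API. It encodes 'a' as UTF-16LE and decodes the two resulting bytes as Latin-1:

```typescript
// Hypothetical UTF-16LE encoder: low byte first, then high byte, per code unit.
function encodeUtf16le(str: string): Uint8Array {
  const out = new Uint8Array(str.length * 2);
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    out[2 * i] = unit & 0xff; // low byte
    out[2 * i + 1] = unit >> 8; // high byte
  }
  return out;
}

const ucs2Bytes = encodeUtf16le('a'); // [0x61, 0x00]
// Latin-1 decoding maps each byte to one character, so the NUL byte survives.
const asLatin1 = String.fromCharCode(...ucs2Bytes); // 'a\u0000', same as ucs2_control
```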

ASCII slice test.

Current encoding_tests.ts:

```typescript
test(SUITE, '[Node.js] ASCII slice test', () => {
  {
    const asciiString = 'hello world';
    const bytes = new Uint8Array(128);

    for (let i = 0; i < asciiString.length; i++) {
      bytes[i] = asciiString.charCodeAt(i);
    }
    const asciiSlice = bufferToString(
      bytes.buffer as ArrayBuffer,
      'ascii',
      0,
      asciiString.length,
    );

    expect(asciiSlice).to.equal(asciiString);
  }

  {
    const asciiString = 'hello world';
    const offset = 100;
    const bytes = new Uint8Array(128);

    bytes.set(toU8(stringToBuffer(asciiString, 'ascii')), offset);
    const asciiSlice = bufferToString(
      bytes.buffer as ArrayBuffer,
      'ascii',
      offset,
      offset + asciiString.length,
    );

    expect(asciiSlice).to.equal(asciiString);
  }
});
```

Original Node.js (test/parallel/test-buffer-alloc.js):

```javascript
// ASCII slice test
{
  const asciiString = 'hello world';

  for (let i = 0; i < asciiString.length; i++) {
    b[i] = asciiString.charCodeAt(i);
  }
  const asciiSlice = b.toString('ascii', 0, asciiString.length);
  assert.strictEqual(asciiString, asciiSlice);
}

{
  const asciiString = 'hello world';
  const offset = 100;

  assert.strictEqual(asciiString.length, b.write(asciiString, offset, 'ascii'));
  const asciiSlice = b.toString('ascii', offset, offset + asciiString.length);
  assert.strictEqual(asciiString, asciiSlice);
}
```

Correspondence: the first RNQC block writes asciiString.charCodeAt(i) into a byte array and reads toString('ascii', 0, asciiString.length), matching Node's first block. The second RNQC block uses stringToBuffer(asciiString, 'ascii') to model Node's b.write(asciiString, offset, 'ascii'), then reads the same offset range.
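A ranged decode like `toString('ascii', start, end)` can be sketched as follows. `decodeAsciiRange` is a hypothetical helper, not an RNQC export. The high-bit masking follows the Node.js Buffer documentation for 'ascii' decoding; whether RNQC masks the same way is an assumption of this sketch and is not exercised by the test above, which only uses 7-bit input.

```typescript
// Hypothetical ranged 'ascii' decode over bytes[start, end).
// Assumption (Node.js Buffer docs): 'ascii' decoding clears the high bit
// of each byte before treating it as latin1.
function decodeAsciiRange(bytes: Uint8Array, start = 0, end = bytes.length): string {
  // Clamp the range so out-of-bounds start/end values are safe.
  const from = Math.max(0, Math.min(start, bytes.length));
  const to = Math.max(from, Math.min(end, bytes.length));
  let s = '';
  for (let i = from; i < to; i++) {
    s += String.fromCharCode(bytes[i] & 0x7f);
  }
  return s;
}

const buf = new Uint8Array(128);
const msg = 'hello world';
for (let i = 0; i < msg.length; i++) buf[100 + i] = msg.charCodeAt(i);
const slice = decodeAsciiRange(buf, 100, 100 + msg.length); // 'hello world'
```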

@wh201906
Contributor Author

wh201906 commented May 8, 2026

The new implementation can be 10x faster than the old one, tested on 1 MB of data.

Benchmark screenshots: 06ebc8c (old) vs. ba70dd3 (new).

Collaborator

@boorad boorad left a comment


Nice perf win — reusing the decodeUtf16Le chunk-callback pattern keeps this very readable, and the Node.js-derived tests ('Hello, 世界' truncation, manually-controlled binary output, ASCII slice with offset) directly exercise the new path. A few small things below — mostly nits, plus one question about why the Hermes-only gate differs from decodeUtf16Le's unconditional use of getStringData.
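For readers unfamiliar with the term, a chunk-callback decoder has a producer walk the input and hand each chunk to a callback, together with a flag saying whether the chunk is pure ASCII. The sketch below is a generic TypeScript illustration of that shape only; all names are hypothetical, and it is neither the JSI `getStringData` API nor RNQC's decodeUtf16Le code.

```typescript
// Hypothetical chunk-callback shape: the producer reports each chunk and
// whether it contains only bytes below 0x80.
type ChunkCallback = (ascii: boolean, chunk: Uint8Array) => void;

function forEachLatin1Chunk(bytes: Uint8Array, chunkSize: number, onChunk: ChunkCallback): void {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  for (let i = 0; i < bytes.length; i += chunkSize) {
    const chunk = bytes.subarray(i, i + chunkSize);
    onChunk(chunk.every(b => b < 0x80), chunk);
  }
}

// Consumer: collects decoded chunks and counts how many were pure ASCII.
function decodeViaChunks(bytes: Uint8Array, chunkSize: number): { text: string; asciiChunks: number } {
  const parts: string[] = [];
  let asciiChunks = 0;
  forEachLatin1Chunk(bytes, chunkSize, (ascii, chunk) => {
    if (ascii) asciiChunks++;
    // For Latin-1 both kinds of chunk decode the same way; the flag only
    // tells the consumer that a cheaper copy would be possible.
    parts.push(String.fromCharCode(...chunk));
  });
  return { text: parts.join(''), asciiChunks };
}
```

Keeping the per-chunk work in a callback lets the consumer decide how to handle ASCII and non-ASCII runs without the producer knowing about the output format.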

Verified bun tsc passes on both packages.

Comment thread packages/react-native-quick-crypto/cpp/utils/HybridUtils.hpp Outdated
Comment thread packages/react-native-quick-crypto/cpp/utils/HybridUtils.cpp
Comment thread packages/react-native-quick-crypto/cpp/utils/HybridUtils.cpp
Comment thread packages/react-native-quick-crypto/cpp/utils/HybridUtils.cpp Outdated
wh201906 added 3 commits May 12, 2026 21:31
Avoid zero-filling for pure ASCII path
Allocate less memory than single resize() (might work)
Unify the private member name to xxx_
Remove this pointer
Add comment
@boorad boorad merged commit 67b3332 into margelo:main May 12, 2026
1 check failed
@wh201906 wh201906 deleted the wh201906/fast-decode-latin1 branch May 12, 2026 17:10