Skip to content

Add *AsBigInt() methods for accurately reading integers up to 64 bits#38

Open
generalmimon wants to merge 4 commits into
masterfrom
add-bigint-methods
Open

Add *AsBigInt() methods for accurately reading integers up to 64 bits#38
generalmimon wants to merge 4 commits into
masterfrom
add-bigint-methods

Conversation

@generalmimon
Copy link
Copy Markdown
Member

@generalmimon generalmimon commented May 12, 2026

Implements the necessary methods for kaitai-io/kaitai_struct#183

These are purely additive changes, so they don't break backward compatibility (all existing methods remain unchanged).

As @Azq2 pointed out in #37 (comment), the BigInt type was added in ES2020, so it might not be available in particular on JavaScript engines that only support ES5, as discussed in #37 (comment). Calling the new *AsBigInt() methods will fail if the JavaScript engine doesn't have the required support for BigInt.

Note that in readBitsInt{Be,Le}AsBigInt(), the BigInt literals (created by appending n to the end of an integer literal, e.g. 0n) were intentionally avoided, because using them would trigger a SyntaxError in JS engines without support for ES2020 syntax, rendering the entire library unusable. That's not really worth the small convenience gain, so I've worked around this problem by using e.g. BigInt(0) instead for now.

See kaitai-io/kaitai_struct#183

The compiler will switch to using only these new `*AsBigInt()` methods
for reading 64-bit integers. After that, we will be able to deprecate
the legacy `Number` methods (`read{S,U}8{be,le}()`) and eventually
remove them.
See kaitai-io/kaitai_struct#183

The compiler will use these `*AsBigInt()` methods to read bit-sized
integers wider than 32 bits (i.e. starting with `b33`). The existing
`readBitsInt{Be,Le}()` methods use the `Number` type, which limits
bitwise operators to a width of 32 bits (hence the maximum supported
width of 32 bits for these methods).
The most notable changes concern the `readBytes(len)` method:

1. If `len < 0`, a `RangeError` is now explicitly raised with the error
   message `negative length ${len} given`. This message is consistent
   with our Ruby runtime library:
   https://github.com/kaitai-io/kaitai_struct_ruby_runtime/blob/41f9c4ee78cb3a01fa9d543c76c96897fc94d653/lib/kaitai/struct/struct.rb#L590

   Previously, a `RangeError` was being thrown by the `Uint8Array`
   constructor with the message `Invalid typed array length: ${len}`
   (V8, i.e. Chrome/Node.js) or `invalid or out-of-range index`
   (Firefox).

   By throwing a `RangeError` ourselves, we can ensure a consistent
   error message across JS engines. Additionally (which was the original
   reason why this change was included in this commit), we check that
   `len` is not a negative number *before* the conversion from `BigInt`
   to `Number`, so that we can include the exact (unrounded) integer
   value in the error message.

2. If the `len` argument is a `bigint`, we call `ensureBytesLeft(len)`
   already from `readBytes(len)`. This is explained in the code comment:
   this is done so that a potential `EOFError` will contain the original
   `bigint` value.

3. The `length |= 0;` statement was removed from the
   `mapUint8Array(length)` method. The intention was apparently to
   truncate the `length: number` argument to an integer, but using the
   bitwise OR `|` operator for this purpose silently truncates the value
   to the signed 32-bit integer range (i.e. from `-0x8000_0000` to
   `0x7fff_ffff`), which was clearly a bug.

   This meant that not even `u4` lengths worked correctly: for example
   `readBytes(0xffff_ffff)` failed with `Invalid typed array length:
   -1`, which is an incorrect and confusing error message. And any
   values beyond 32 bits could even cause incorrect successful parsing:
   for example, `readBytes(0x1_0000_0004)` was equivalent to
   `readBytes(4)`.

   To preserve the behavior of truncating numbers to integers,
   `Math.floor(len)` is now called in the `readBytes(len)` method.

---

I believe the other changes are pretty straightforward and don't require
any further explanation beyond what's already stated in the code
comments.
This duplication sucks, but it seems to be the right solution - see
https://stackoverflow.com/a/76887928

In any case, this fixes the issue where the generated `KaitaiStream.js`
file was missing documentation comments for the overloaded methods, and
where the generated `KaitaiStream.d.ts` file contained documentation
comments only for the first overload (with type `number`), while these
comments were missing for the second overloead (with type `bigint`).
Comment thread KaitaiStream.ts
* @returns The read bytes.
*/
public readBytes(len: number): Uint8Array {
public readBytes(len: number | bigint): Uint8Array {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better visibility, here's the commit message of 9fd0954, which explains several changes related to the readBytes() method:

The most notable changes concern the readBytes(len) method:

  1. If len < 0, a RangeError is now explicitly raised with the error message negative length ${len} given. This message is consistent with our Ruby runtime library: https://github.com/kaitai-io/kaitai_struct_ruby_runtime/blob/41f9c4ee78cb3a01fa9d543c76c96897fc94d653/lib/kaitai/struct/struct.rb#L590

    Previously, a RangeError was being thrown by the Uint8Array constructor with the message Invalid typed array length: ${len} (V8, i.e. Chrome/Node.js) or invalid or out-of-range index (Firefox).

    By throwing a RangeError ourselves, we can ensure a consistent error message across JS engines. Additionally (which was the original reason why this change was included in this commit), we check that len is not a negative number before the conversion from BigInt to Number, so that we can include the exact (unrounded) integer value in the error message.

  2. If the len argument is a bigint, we call ensureBytesLeft(len) already from readBytes(len). This is explained in the code comment: this is done so that a potential EOFError will contain the original bigint value.

  3. The length |= 0; statement was removed from the mapUint8Array(length) method. The intention was apparently to truncate the length: number argument to an integer, but using the bitwise OR | operator for this purpose silently truncates the value to the signed 32-bit integer range (i.e. from -0x8000_0000 to 0x7fff_ffff), which was clearly a bug.

    This meant that not even u4 lengths worked correctly: for example readBytes(0xffff_ffff) failed with Invalid typed array length: -1, which is an incorrect and confusing error message. And any values beyond 32 bits could even cause incorrect successful parsing: for example, readBytes(0x1_0000_0004) was equivalent to readBytes(4).

    To preserve the behavior of truncating numbers to integers, Math.floor(len) is now called in the readBytes(len) method.


I believe the other changes are pretty straightforward and don't require any further explanation beyond what's already stated in the code comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant