ctutils: add BytesCtEq and BytesCtSelect traits #1359

Merged: tarcieri merged 1 commit into master from ctutils/bytes-traits on Jan 16, 2026
@tarcieri (Member)

The generic implementations of `CtEq` and `CtSelect` for `[T]` and `[T; N]` are suboptimal for `[u8]` and `[u8; N]`: the `cmov` crate now has optimized implementations for these types that operate a word at a time instead of a byte at a time.

Since Rust lacks stable specialization, this PR adds redundant sealed traits, implemented for `[u8; N]` and `[u8]`, that give access to the optimized `cmov` implementations for these types without exposing `cmov` in the public API of `ctutils`.
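
The sealed-trait approach described above can be sketched roughly as follows. This is only an illustration of the pattern, not the code from this PR: the names `BytesEq`, `bytes_eq`, and `fast_bytes_eq` are hypothetical, and the backend here is a plain byte loop standing in for the optimized `cmov` implementation. It also makes no constant-time guarantee, since the compiler is free to rewrite it.

```rust
// Hypothetical sketch: none of these names are taken from `ctutils` or `cmov`.
mod sealed {
    /// Supertrait living in a private module: downstream crates cannot name
    /// it, so they cannot implement `BytesEq` for their own types.
    pub trait Sealed {}
    impl Sealed for [u8] {}
    impl<const N: usize> Sealed for [u8; N] {}
}

/// Byte-specific equality trait, implemented only for `[u8]` and `[u8; N]`.
pub trait BytesEq: sealed::Sealed {
    /// Returns 1 if the inputs are equal and 0 otherwise.
    fn bytes_eq(&self, other: &Self) -> u8;
}

impl BytesEq for [u8] {
    fn bytes_eq(&self, other: &Self) -> u8 {
        fast_bytes_eq(self, other)
    }
}

impl<const N: usize> BytesEq for [u8; N] {
    fn bytes_eq(&self, other: &Self) -> u8 {
        fast_bytes_eq(self, other)
    }
}

/// Stand-in for the optimized backend (assumption: a real implementation
/// would delegate to a dependency that is hidden from the public API).
fn fast_bytes_eq(a: &[u8], b: &[u8]) -> u8 {
    if a.len() != b.len() {
        return 0;
    }
    // Accumulate all differences without exiting early on a mismatch.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    // Map zero to 1 and any nonzero value to 0 using arithmetic only.
    let d = u16::from(diff);
    ((d.wrapping_sub(1) >> 8) & 1) as u8
}
```

Because the backend function is private, callers only ever see the trait, which is the property the description is after.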

tarcieri merged commit 1658a0f into master on Jan 16, 2026 (12 checks passed)
tarcieri deleted the ctutils/bytes-traits branch on January 16, 2026, 20:23
tarcieri mentioned this pull request on Jan 16, 2026
tarcieri added a commit that referenced this pull request on Jan 16, 2026:

### Added
- `BytesCtEq` and `BytesCtSelect` traits (#1359)
- `CtFind` trait (#1361)
- `CtLookup` trait (#1362)

### Changed
- Bump `cmov` crate dependency to v0.5.0-pre.0 (#1357)

tarcieri added a commit that referenced this pull request on Jan 17, 2026:

`cmov` now contains optimized implementations of its `Cmov` and `CmovEq`
traits for the following slice types:

- `[i8]`, `[i16]`, `[i32]`, `[i64]`, `[i128]`
- `[u8]`, `[u16]`, `[u32]`, `[u64]`, `[u128]`

These impls coalesce the inputs into word-size chunks and perform the
underlying predication/equality testing operation on words, whereas the
generic implementation in this crate previously would widen each array
element to a word and perform the operation, which is much less
efficient.
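
As a rough illustration of the coalescing idea described above, the following sketch compares byte slices eight bytes at a time. It is not the `cmov` implementation: `eq_wordwise` is a hypothetical name, the word size is fixed at 64 bits for simplicity, and the sketch only shows the chunking, with no constant-time guarantee.

```rust
/// Hypothetical helper: compare two byte slices by packing them into `u64`
/// words and XOR-ing whole words, instead of handling one byte at a time.
pub fn eq_wordwise(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }

    let mut acc = 0u64;
    let mut a_chunks = a.chunks_exact(8);
    let mut b_chunks = b.chunks_exact(8);

    // Full 8-byte chunks: one XOR and one OR per word.
    for (x, y) in a_chunks.by_ref().zip(b_chunks.by_ref()) {
        let mut wx = [0u8; 8];
        let mut wy = [0u8; 8];
        wx.copy_from_slice(x);
        wy.copy_from_slice(y);
        acc |= u64::from_ne_bytes(wx) ^ u64::from_ne_bytes(wy);
    }

    // Trailing bytes that do not fill a whole word.
    for (x, y) in a_chunks.remainder().iter().zip(b_chunks.remainder()) {
        acc |= u64::from(x ^ y);
    }

    acc == 0
}
```

For a 32-byte input this performs four word operations, where widening each byte to a word individually would perform 32.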

We'd previously added duplicated specialized traits for operating on
bytes in #1359, which called the fast path for `[u8]`. However, the
redundant traits are annoying, and the same optimization is useful for
any integer type smaller than the word size.

This commit removes the generic trait impls of `CtAssign`, `CtEq`, and
`CtSelect` for arrays and slices, factoring them into free functions in
the `array` and `slice` modules so they're still available, just not via
a trait-based interface.

Next, the existing macros for delegating to `cmov`, such as
`impl_ct_assign_with_cmov!`, are used to implement the traits for the
above slice types. The array impls are now generic, applying whenever
the underlying slice type impls the same trait.

This means we have a bunch of specialized impls of these traits for the
core integer types which are always fast, instead of a leaky generic
implementation which is potentially quite slow. It means users may need
to add their own concrete impls where the generic impl would've
previously sufficed, but they can call into the generic implementations
in `array` and `slice` for that, and if it's too annoying we can add a
macro to write the impl for them.
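
A user-side concrete impl that calls into a generic free function might look something like the sketch below. All names here (`MyCtEq`, `slice_eq_generic`, `Limb`) are hypothetical stand-ins rather than the actual `ctutils` API, and the trait returns a plain `u8` flag for simplicity.

```rust
/// Hypothetical stand-in for an equality trait (not the real `CtEq`).
pub trait MyCtEq {
    /// Returns 1 if equal and 0 otherwise.
    fn my_ct_eq(&self, other: &Self) -> u8;
}

/// Hypothetical generic free function, playing the role of the helpers the
/// commit message says are retained in the `slice` module.
pub fn slice_eq_generic<T: MyCtEq>(a: &[T], b: &[T]) -> u8 {
    if a.len() != b.len() {
        return 0;
    }
    // AND together the per-element results without exiting early.
    let mut acc = 1u8;
    for (x, y) in a.iter().zip(b) {
        acc &= x.my_ct_eq(y);
    }
    acc
}

/// A user-defined element type with no blanket slice impl available.
pub struct Limb(pub u32);

impl MyCtEq for Limb {
    fn my_ct_eq(&self, other: &Self) -> u8 {
        let d = u64::from(self.0 ^ other.0);
        // Zero maps to 1, any nonzero difference maps to 0.
        ((d.wrapping_sub(1) >> 32) & 1) as u8
    }
}

/// The concrete impl the user now writes by hand, delegating to the
/// generic free function.
impl MyCtEq for [Limb] {
    fn my_ct_eq(&self, other: &Self) -> u8 {
        slice_eq_generic(self, other)
    }
}
```

The impl body is a one-line delegation, which is the kind of boilerplate a macro could generate.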

The impl of `CtSelect` for `[T; N]` has been changed to bound on
`T: Clone` and `[T]: CtAssign`, but should perform well, and this is
actually more consistent with the impls on `Box` and `Vec`. The old
generic implementation which avoids a clone bound by using
`core::array::from_fn` is retained in the new `array` module.
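
The two strategies contrasted above can be sketched side by side. The traits `Select` and `Assign` and both function names are hypothetical stand-ins for illustration, with `choice` modeled as a `u8` that is 0 (keep the first input) or 1 (take the second). As with any such sketch, nothing here guarantees constant-time code generation.

```rust
/// Hypothetical stand-in for a select trait: returns `self` when `choice`
/// is 0 and `other` when `choice` is 1.
pub trait Select: Sized {
    fn select(&self, other: &Self, choice: u8) -> Self;
}

/// Hypothetical stand-in for an in-place conditional assignment trait.
pub trait Assign {
    fn assign(&mut self, other: &Self, choice: u8);
}

impl Select for u32 {
    fn select(&self, other: &Self, choice: u8) -> Self {
        // 0 -> all zeros, 1 -> all ones.
        let mask = u32::from(choice).wrapping_neg();
        (*self & !mask) | (*other & mask)
    }
}

impl Assign for [u32] {
    fn assign(&mut self, other: &Self, choice: u8) {
        assert_eq!(self.len(), other.len());
        for (dst, src) in self.iter_mut().zip(other) {
            *dst = dst.select(src, choice);
        }
    }
}

/// Element-wise construction with `core::array::from_fn`: no `Clone` bound.
pub fn select_from_fn<T: Select, const N: usize>(
    a: &[T; N],
    b: &[T; N],
    choice: u8,
) -> [T; N] {
    core::array::from_fn(|i| a[i].select(&b[i], choice))
}

/// Clone the first array, then conditionally overwrite it through the slice
/// impl, mirroring the `T: Clone` and `[T]: Assign` shape of the new bounds.
pub fn select_clone_assign<T, const N: usize>(
    a: &[T; N],
    b: &[T; N],
    choice: u8,
) -> [T; N]
where
    T: Clone,
    [T]: Assign,
{
    let mut out = a.clone();
    out[..].assign(&b[..], choice);
    out
}
```

The second form lets the array reuse whatever fast path the slice impl provides, at the cost of requiring `Clone`.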

Since the impls of `CtAssign` and `CtEq` for `[T]` and `[T; N]` should
now be fast (along with the `CtSelect` impl for `[T; N]`), this removes
the `BytesCt*` traits since they're no longer needed.