Commit 23c9db5

cmov: add asm! optimized maskgen32 for ARM32
In #1332 we ran into LLVM inserting branches in this routine for `thumbv6m-none-eabi` targets. It was "fixed" by fiddling around with `black_box`, but that seems brittle.

In #1334 we attempted a simple portable `asm!` optimization barrier approach, but it did not work as expected.

This commit instead implements one of the fiddliest bits, mask generation, in ARM assembly. The resulting assembly is actually more efficient than what rustc/LLVM outputs and avoids touching the stack pointer.

It's a simple enough function to implement in assembly on other platforms with stable `asm!` too, but this is a start.
1 parent 3008a4f commit 23c9db5
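
For readers unfamiliar with what "mask generation" means in the commit message, here is a minimal portable sketch of a branch-free non-zero mask. The function name `nzmask32_portable` and its body are assumptions made for illustration; the diff does not show the crate's `bitnz!` macro, and it may be implemented differently. Source-level code like this is exactly what a compiler remains free to rewrite with branches, which is the problem the commit message describes.

```rust
/// Hypothetical portable mask generator (illustrative only; this is NOT
/// the crate's `bitnz!` macro, whose definition the diff does not show).
///
/// Returns `u32::MAX` if `condition` is non-zero, otherwise `0`.
pub fn nzmask32_portable(condition: u8) -> u32 {
    let x = condition as u32;

    // For any non-zero `x`, either `x` or its two's-complement negation
    // has bit 31 set, so `x | -x` has bit 31 set. For `x == 0` both are 0.
    let nonzero_bit = (x | x.wrapping_neg()) >> 31;

    // Negating turns 1 into 0xFFFF_FFFF and leaves 0 as 0.
    nonzero_bit.wrapping_neg()
}
```

Nothing in this sketch prevents an optimizer from recognizing the pattern and emitting a compare-and-branch on some targets, so it should be read as a description of the intended result rather than a constant-time guarantee.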

2 files changed

Lines changed: 27 additions & 3 deletions

.github/workflows/cmov.yml

Lines changed: 0 additions & 1 deletion
@@ -135,7 +135,6 @@ jobs:
     strategy:
       matrix:
         target:
-          - armv7-unknown-linux-gnueabi
           - powerpc-unknown-linux-gnu
           - s390x-unknown-linux-gnu
           - x86_64-unknown-linux-gnu

cmov/src/portable.rs

Lines changed: 27 additions & 2 deletions
@@ -100,15 +100,40 @@ impl CmovEq for u64 {
 }
 
 /// Return a [`u32::MAX`] mask if `condition` is non-zero, otherwise return zero for a zero input.
-pub fn nzmask32(condition: Condition) -> u32 {
+#[cfg(not(target_arch = "arm"))]
+fn nzmask32(condition: Condition) -> u32 {
     bitnz!(condition as u32, u32::BITS).wrapping_neg()
 }
 
 /// Return a [`u64::MAX`] mask if `condition` is non-zero, otherwise return zero for a zero input.
-pub fn nzmask64(condition: Condition) -> u64 {
+#[cfg(not(target_arch = "arm"))]
+fn nzmask64(condition: Condition) -> u64 {
     bitnz!(condition as u64, u64::BITS).wrapping_neg()
 }
 
+/// Optimized mask generation for ARM32 targets.
+#[cfg(target_arch = "arm")]
+fn nzmask32(condition: u8) -> u32 {
+    let mut out = condition as u32;
+    unsafe {
+        core::arch::asm!(
+            "uxtb {0}, {0}",      // Extend 8-bit value to 32-bit
+            "rsbs {0}, {0}, #0",  // Reverse subtract
+            "sbcs {0}, {0}, {0}", // Subtract with carry, setting flags
+            inout(reg) out,
+            options(nostack, nomem),
+        );
+    }
+    out
+}
+
+/// 64-bit wrapper for targets that implement 32-bit mask generation in assembly.
+#[cfg(target_arch = "arm")]
+fn nzmask64(condition: u8) -> u64 {
+    let mask = nzmask32(condition) as u64;
+    mask | mask << 32
+}
+
 #[cfg(test)]
 mod tests {
     #[test]
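
To see why the `rsbs`/`sbcs` pair in the diff yields an all-ones mask for any non-zero input, the sequence can be modeled on the host in plain Rust. This is only a sketch of the flag arithmetic: the helper names `rsbs_zero`, `sbcs_self`, `nzmask32_model`, and `nzmask64_model` are hypothetical and not part of the commit. It assumes the usual ARM convention that the carry flag is set when a subtraction does not borrow.

```rust
/// Model of `rsbs rd, rn, #0`: computes `0 - rn` and returns the result
/// together with the ARM carry flag (set when there is NO borrow).
fn rsbs_zero(rn: u32) -> (u32, bool) {
    let (result, borrow) = 0u32.overflowing_sub(rn);
    // `0 - rn` borrows for every non-zero `rn`, so carry is set only for 0.
    (result, !borrow)
}

/// Model of `sbcs rd, rn, rn`: computes `rn - rn - (1 - carry)`.
/// The register value cancels out, leaving 0 or 0xFFFF_FFFF.
fn sbcs_self(rn: u32, carry: bool) -> u32 {
    rn.wrapping_sub(rn).wrapping_sub(!carry as u32)
}

/// Hypothetical host-side model of the ARM32 `nzmask32` from the diff.
pub fn nzmask32_model(condition: u8) -> u32 {
    let x = condition as u32; // corresponds to `uxtb`
    let (negated, carry) = rsbs_zero(x);
    sbcs_self(negated, carry)
}

/// Hypothetical model of the 64-bit wrapper: duplicate the 32-bit mask
/// into both halves. `<<` binds tighter than `|` in Rust, so the
/// parentheses only make the grouping in the diff explicit.
pub fn nzmask64_model(condition: u8) -> u64 {
    let mask = nzmask32_model(condition) as u64;
    mask | (mask << 32)
}
```

The model says nothing about timing or code generation; it only checks the arithmetic that the three instructions perform.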
