Commit 3855956

cmov: add asm! optimized maskgen32 for ARM32
In #1332 we ran into LLVM inserting branches in this routine on `thumbv6m-none-eabi` targets. It was "fixed" by fiddling around with `black_box`, but that seems brittle. In #1334 we attempted a simple portable `asm!` optimization barrier, but it did not work as expected. This commit instead implements one of the fiddliest bits, mask generation, directly in ARM assembly. The resulting assembly is actually more efficient than what rustc/LLVM outputs and avoids touching the stack pointer. The function is simple enough to implement in assembly on other platforms with stable `asm!` too, but this is a start.
1 parent 3008a4f commit 3855956

1 file changed

Lines changed: 27 additions & 2 deletions

cmov/src/portable.rs

@@ -100,15 +100,40 @@ impl CmovEq for u64 {
 }

 /// Return a [`u32::MAX`] mask if `condition` is non-zero, otherwise return zero for a zero input.
-pub fn nzmask32(condition: Condition) -> u32 {
+#[cfg(not(target_arch = "arm"))]
+fn nzmask32(condition: Condition) -> u32 {
     bitnz!(condition as u32, u32::BITS).wrapping_neg()
 }

 /// Return a [`u64::MAX`] mask if `condition` is non-zero, otherwise return zero for a zero input.
-pub fn nzmask64(condition: Condition) -> u64 {
+#[cfg(not(target_arch = "arm"))]
+fn nzmask64(condition: Condition) -> u64 {
     bitnz!(condition as u64, u64::BITS).wrapping_neg()
 }

+/// Optimized mask generation for ARM32 targets.
+#[cfg(target_arch = "arm")]
+fn nzmask32(condition: u8) -> u32 {
+    let mut out = condition as u32;
+    unsafe {
+        core::arch::asm!(
+            "uxtb {0}, {0}", // Extend 8-bit value to 32-bit
+            "rsbs {0}, {0}, #0", // Reverse subtract
+            "sbcs {0}, {0}, {0}", // Subtract with carry, setting flags
+            inout(reg) out,
+            options(nostack, nomem),
+        );
+    }
+    out
+}
+
+/// 64-bit wrapper for targets that implement 32-bit mask generation in assembly.
+#[cfg(target_arch = "arm")]
+fn nzmask64(condition: u8) -> u64 {
+    let mask = nzmask32(condition) as u64;
+    mask | mask << 32
+}
+
 #[cfg(test)]
 mod tests {
     #[test]
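
To make the carry-flag trick in the ARM sequence easier to follow, here is a rough software model of it in portable Rust. This is only a sketch and not part of the commit: the names `nzmask32_model`, `nzmask64_model`, and `nzmask32_portable` are hypothetical. The portable formula is a generic branch-free idiom and is only assumed to be similar in spirit to the crate's `bitnz!` path. The model also assumes ARM's convention that the carry flag after a subtraction means "no borrow".

```rust
/// Hypothetical portable branch-free mask: all ones if `x != 0`, else zero.
/// A generic idiom; not necessarily what the crate's `bitnz!` macro expands to.
fn nzmask32_portable(x: u32) -> u32 {
    // For any non-zero x, at least one of x and -x has its top bit set.
    ((x | x.wrapping_neg()) >> 31).wrapping_neg()
}

/// Hypothetical software model of the `uxtb` / `rsbs` / `sbcs` sequence.
fn nzmask32_model(condition: u8) -> u32 {
    // uxtb: zero-extend the low 8 bits to 32 bits.
    let x = condition as u32;

    // rsbs x, x, #0: x = 0 - x. On ARM the carry flag is set when the
    // subtraction does NOT borrow, which here happens only for x == 0.
    let (neg, borrow) = 0u32.overflowing_sub(x);
    let carry = !borrow;

    // sbcs x, x, x: x = x - x - (1 - carry).
    // carry set   (input was zero)     -> 0
    // carry clear (input was non-zero) -> 0 - 1 = 0xFFFF_FFFF
    neg.wrapping_sub(neg).wrapping_sub((!carry) as u32)
}

/// Widen the 32-bit mask to 64 bits the same way the diff's wrapper does.
fn nzmask64_model(condition: u8) -> u64 {
    let mask = nzmask32_model(condition) as u64;
    mask | mask << 32
}

fn main() {
    // Exhaustively check every possible 8-bit condition value.
    for c in 0..=u8::MAX {
        let expected32 = if c == 0 { 0 } else { u32::MAX };
        let expected64 = if c == 0 { 0 } else { u64::MAX };
        assert_eq!(nzmask32_model(c), expected32);
        assert_eq!(nzmask32_portable(c as u32), expected32);
        assert_eq!(nzmask64_model(c), expected64);
    }
}
```

The model is plain Rust, so the compiler may still lower it with branches. It only illustrates the arithmetic and offers none of the codegen guarantees that the `asm!` version is meant to provide.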
