Since the max length of &[char] is 3, we could simplify the table to:
pub const CASE_FOLDING_SIMPLE: [[char; 4]; 2938] = [
['A', 'a', '\0', '\0'],
['B', 'b', '\0', '\0'],
['C', 'c', '\0', '\0'],
['D', 'd', '\0', '\0'],
['E', 'e', '\0', '\0'],
['F', 'f', '\0', '\0'],
['G', 'g', '\0', '\0'],
['H', 'h', '\0', '\0'],
['I', 'i', '\0', '\0'],
['J', 'j', '\0', '\0'],
['K', 'k', 'K', '\0'],
// other items
];
A simple test shows a drop from 220.5kb to 179.5kb on Windows:
use std::env;
mod case_folding_simple;
#[allow(unused)]
fn search_old_version<'a>(data: &'a [(char, &[char])], c: char) -> Result<&'a [char], usize> {
match data.binary_search_by_key(&c, |&(c1, _)| c1) {
Ok(i) => Ok(data[i].1),
Err(i) => Err(i)
}
}
fn search(data: &[[char; 4]], c: char) -> Result<&[char], usize> {
match data.binary_search_by_key(&c, |row| row[0]) {
Ok(i) => {
let row = &data[i];
let variants_part = &row[1..];
let actual_len = variants_part
.iter()
.position(|&x| x == '\0')
.unwrap_or(3);
Ok(&variants_part[..actual_len])
}
Err(i) => Err(i),
}
}
fn main() {
let data = case_folding_simple::CASE_FOLDING_SIMPLE;
let mut c = 'S';
let args: Vec<String> = env::args().collect();
if args.len() > 1 {
if let Some(ch) = args[1].chars().next() {
c = ch;
}
}
match search(&data, c) {
Ok(v) => println!("found: {:?}", v),
Err(_) => println!("not found.")
}
}
p.s. Some code was written with AI assistance. I'm not familiar with Rust, although the idea should be clear enough.
After another digging, this could go further if not mind modifying more code , given that chars in every mapping row
<= u16::MAX, or
> u16::MAX and 1:1 mapping
Therefore, it can be divided into two tables: [[u16; 4]; N] and [[char; 2]; M]
Since the max length of
&[char]is 3, we could simplify the table to:A simple test shows a drop from 220.5kb to 179.5kb on Windows:
p.s. Some code was written with AI assistance. I'm not familiar with Rust, although the idea should be clear enough.
After another digging, this could go further if not mind modifying more code , given that chars in every mapping row
<= u16::MAX, or> u16::MAXand 1:1 mappingTherefore, it can be divided into two tables:
[[u16; 4]; N]and[[char; 2]; M]