| theme | gaia |
|---|---|
| _class | lead |
| paginate | true |
| backgroundColor | |
| backgroundImage | url('https://marp.app/assets/hero-background.svg') |
| style | section.photo h1,section.photo h2,section.photo h3,section.photo h4,section.photo h5,section.photo h6 { background-color: #888; color: #FFF; } h6 { font-size: 30%; } img[alt~="centre"] { display: block; margin: 0 auto; } |
| marp | true |
- A String is just a String, right?
- A Brief History of the String
- Not all Strings are alike
- String
- Byte String
- OS String
- C Strings
let s: String = "Hi 😀!".to_owned();
dbg!(&s);
dbg!(s.len());
dbg!(s.bytes().count());
dbg!(s.chars().count());

- A Vector of `u8` inside
- Iterates as 32-bit `char`
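A minimal sketch of the two points above, assuming nothing beyond the standard library. The helper name `byte_and_char_counts` is made up for this illustration.

```rust
use std::mem::size_of;

/// Returns (length in bytes, number of `char`s) for a string slice.
/// Hypothetical helper, not part of the standard library.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let s = String::from("Hi 😀!");
    let (bytes, chars) = byte_and_char_counts(&s);
    println!("{bytes} bytes, {chars} chars");

    // A `char` is always four bytes wide, however many bytes it needs in UTF-8.
    println!("size_of::<char>() = {}", size_of::<char>());

    // Consuming the String hands back the Vec<u8> it was wrapping.
    let raw: Vec<u8> = s.into_bytes();
    println!("{raw:02x?}");
}
```

The emoji takes four bytes in the `Vec<u8>` but counts as a single `char`, which is why `len()` and `chars().count()` disagree.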
let s: [u8; 13] = b"Hello, world!".to_owned();
dbg!(&s);
dbg!(s.len());

- Iterates as octets (`u8`)
- A Vector of octets (`u8`) inside
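As a rough sketch, a byte string can be checked and viewed as text when it happens to hold valid UTF-8. The helper `bytes_as_text` is a hypothetical name for this example; `std::str::from_utf8` is the standard-library call doing the work.

```rust
/// Tries to view raw octets as UTF-8 text.
/// Returns None when the bytes are not valid UTF-8.
/// Hypothetical helper for illustration only.
fn bytes_as_text(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // An owned byte string: a Vec of octets.
    let owned: Vec<u8> = b"Hello, world!".to_vec();

    // Iteration yields one `u8` at a time, with no decoding.
    for octet in owned.iter().take(5) {
        println!("{octet:#04x}");
    }

    println!("{:?}", bytes_as_text(&owned));

    // Arbitrary octets are allowed in a byte string, but are not always text.
    println!("{:?}", bytes_as_text(&[0xff, 0xfe]));
}
```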
- Computers work in numbers
- Humans like to write words
- Words are made of characters
- Technically grapheme clusters
- Is ï one character or two?
- We need a conversion table!
- AKA: A Character Set
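To make the "one character or two?" question concrete, here is a small sketch comparing the two common ways to write ï. The helper `describe` is a made-up name; the code points themselves (U+00EF, and `i` followed by the combining diaeresis U+0308) are standard Unicode.

```rust
/// Returns (number of `char`s, number of UTF-8 bytes).
/// Hypothetical helper for illustration only.
fn describe(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len())
}

fn main() {
    // Precomposed: a single code point, U+00EF.
    let precomposed = "\u{00EF}";
    // Decomposed: 'i' followed by U+0308 COMBINING DIAERESIS.
    let decomposed = "i\u{0308}";

    println!("{precomposed}: {:?}", describe(precomposed));
    println!("{decomposed}: {:?}", describe(decomposed));

    // Both render as one grapheme cluster, yet they compare as different strings.
    println!("equal? {}", precomposed == decomposed);
}
```

The standard library does not segment grapheme clusters; a third-party crate such as `unicode-segmentation` is one option if that is needed.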
- Morse Code
- Telegraph / Baudot codes
- BCD
- EBCDIC
- ASA X3.4-1963
- aka ASCII
- We get 128 more characters!
- MS-DOS Code Page 437, 850, ...
- Windows Code Page 1252, 1250, ...
- Macintosh Code Page 1275, 1282, ...
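The trouble with code pages is that the same byte means different things depending on which table is in use. The sketch below is illustrative only: `decode_cp437_excerpt` is a hypothetical helper covering a tiny, hand-picked excerpt of Code Page 437, not a full table.

```rust
/// Decodes a byte as Latin-1. `char::from(u8)` maps a byte to U+0000..=U+00FF,
/// and Windows-1252 assigns the same character to 0xE9.
fn decode_latin1(byte: u8) -> char {
    char::from(byte)
}

/// Hypothetical, deliberately incomplete excerpt of Code Page 437.
fn decode_cp437_excerpt(byte: u8) -> Option<char> {
    match byte {
        0x20..=0x7E => Some(char::from(byte)), // printable ASCII range
        0x82 => Some('\u{00E9}'),              // é
        0xE9 => Some('\u{0398}'),              // Θ (Greek capital theta)
        _ => None,                             // not covered by this excerpt
    }
}

fn main() {
    let byte = 0xE9;
    println!("Latin-1 / Windows-1252: {}", decode_latin1(byte));
    println!("Code Page 437:          {:?}", decode_cp437_excerpt(byte));
}
```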
Unicode is intended to address the need for a workable, reliable world text encoding. Unicode could be roughly described as "wide-body ASCII" that has been stretched to 16 bits to encompass the characters of all the world's living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose.
- Microsoft used it in Windows
- Sun used it in Java
- Netscape used it in JavaScript
- The Standard C Library added `wcslen` and friends
- Unicode Transformation Format 16 (UTF-16) arrives
- Unit length != number of characters
- Not ASCII compatible
- Enter Plan 9 and UTF-8...
- Variable-length encoding
- Can encode any Unicode Scalar Value as one, two, three or four bytes.
- Unit length != number of characters
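- One byte: `0b0xxxxxxx`
- Two bytes: `0b110xxxxx 0b10xxxxxx`
- Three bytes: `0b1110xxxx 0b10xxxxxx 0b10xxxxxx`
- Four bytes: `0b11110xxx 0b10xxxxxx 0b10xxxxxx 0b10xxxxxx`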
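The bit patterns can be seen directly by encoding a few characters, as in this sketch. `utf8_bits` is a hypothetical helper name; `encode_utf8`, `len_utf8`, `len_utf16` and `encode_utf16` are standard-library methods.

```rust
/// Encodes one `char` as UTF-8 and renders each byte in binary.
/// Hypothetical helper for illustration only.
fn utf8_bits(c: char) -> Vec<String> {
    let mut buf = [0u8; 4]; // four bytes is the longest UTF-8 sequence
    c.encode_utf8(&mut buf)
        .bytes()
        .map(|b| format!("{b:#010b}"))
        .collect()
}

fn main() {
    for c in ['A', 'é', '€', '😀'] {
        println!(
            "{c}: {} UTF-8 byte(s), {} UTF-16 unit(s): {}",
            c.len_utf8(),
            c.len_utf16(),
            utf8_bits(c).join(" ")
        );
    }

    // In both encodings, unit length != number of characters.
    let s = "Hi 😀!";
    println!("chars:        {}", s.chars().count());
    println!("UTF-8 bytes:  {}", s.len());
    println!("UTF-16 units: {}", s.encode_utf16().count());
}
```

The leading bits of the first byte give the sequence length, and every continuation byte starts with `10`.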
- POSIX says file names are an array of 8-bit values
- Windows says file names are an array of 16-bit `wchar_t`
- :(
`String` / `&str` / `"hi"`
- use this by default

`Vec<u8>` / `&[u8]` / `b"hi"`
- use for exchanging data with 8-bit / ASCII systems

`OsString` / `OsStr`
- use for exchanging data with your Operating System

`CString` / `CStr`
- use for exchanging data with 8-bit C APIs
- null-terminated
- Might not be UTF-8
- https://docs.rs/widestring/
- use for exchanging data with 'wide' C APIs
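A short sketch of moving between these types using only `std::ffi`. The helper `to_c_string` is a made-up name; the `widestring` crate is not shown here.

```rust
use std::ffi::{CStr, CString, OsStr, OsString};

/// Builds a null-terminated C string.
/// Returns None if the text contains an interior NUL byte.
/// Hypothetical helper for illustration only.
fn to_c_string(text: &str) -> Option<CString> {
    CString::new(text).ok()
}

fn main() {
    let c = to_c_string("hi").expect("no interior NUL");
    // The stored bytes include the trailing NUL terminator.
    println!("{:?}", c.as_bytes_with_nul());

    // Converting back is fallible, because a C string might not be UTF-8.
    let borrowed: &CStr = c.as_c_str();
    println!("{:?}", borrowed.to_str());

    // OS strings use whatever representation the platform prefers.
    let os: OsString = OsString::from("hi");
    let os_ref: &OsStr = os.as_os_str();
    println!("{:?}", os_ref.to_str()); // None if not valid Unicode
    println!("{}", os_ref.to_string_lossy()); // replaces invalid data with U+FFFD
}
```

Each conversion back to `&str` returns a `Result` or `Option`, which reflects that only `String` and `&str` guarantee valid UTF-8.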




