# AsciiHash

Efficient matching of well-known short string tokens is a common, high-volume scenario, for example when matching RESP literals.

The purpose of this generator is to efficiently interpret input tokens like `bin`, `f32`, etc., whether as byte or character data.

There are multiple ways of using this tool; the main distinction is whether you are confirming a single
token or choosing between multiple tokens (in which case an `enum` is more appropriate).

## Isolated literals (part 1)

When using individual tokens, a `static partial class` can be used to generate helpers:

``` c#
[AsciiHash] public static partial class bin { }
[AsciiHash] public static partial class f32 { }
```

Usually the token is inferred from the name; `[AsciiHash("real value")]` can be used if the token is not a valid identifier.
Underscores are replaced with hyphens, so an identifier such as `my_token` has the default value `"my-token"`.
The generator requires *all* of `[AsciiHash] public static partial class`, and any *containing* types must
*also* be declared `partial`.
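
A minimal sketch of the default-naming rule, assuming it is exactly "use the explicit value if one is given, otherwise the identifier with every `_` replaced by `-`". The real generator may apply further checks; `TokenNaming` and `DefaultToken` are hypothetical names used only for illustration:

``` c#
#nullable enable

public static class TokenNaming
{
    // Hypothetical helper mirroring the documented default: an explicit token,
    // if supplied, wins; otherwise the identifier is used with '_' -> '-'.
    public static string DefaultToken(string identifier, string? explicitValue = null)
        => explicitValue ?? identifier.Replace('_', '-');
}
```

For example, `DefaultToken("my_token")` returns `"my-token"`, while `DefaultToken("x", "real value")` returns `"real value"`.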

The output is of the form:

``` c#
static partial class bin
{
    public const int Length = 3;
    public const long HashCS = ...
    public const long HashUC = ...
    public static ReadOnlySpan<byte> U8 => @"bin"u8;
    public static string Text => @"bin";
    public static bool IsCS(in ReadOnlySpan<byte> value, long cs) => ...
    public static bool IsCI(in RawResult value, long uc) => ...
}
```

The `CS` and `UC` suffixes denote the case-sensitive and case-insensitive (upper-case) variants, respectively.

(This API is strictly an internal implementation detail and can change at any time.)
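
The hash algorithm itself is not described here. As a hedged illustration only, one cheap scheme that fits the `long`-typed constants packs up to the first 8 bytes of a token into a `long`, with the upper-case variant folding ASCII letters first. `AsciiHashSketch` is a hypothetical name and this is an assumed scheme, not the generator's actual algorithm (C# 11 or later is assumed for `u8` literals):

``` c#
using System;

public static class AsciiHashSketch
{
    // Case-sensitive: pack the first (up to) 8 bytes little-endian into a long.
    // Tokens sharing an 8-byte prefix collide, which is one reason every hash
    // hit still needs a full sequence-equality check.
    public static long HashCS(ReadOnlySpan<byte> value)
    {
        long hash = 0;
        int count = Math.Min(value.Length, 8);
        for (int i = 0; i < count; i++)
        {
            hash |= (long)value[i] << (8 * i);
        }
        return hash;
    }

    // Case-insensitive: same packing, but ASCII 'a'..'z' are upper-cased first,
    // so "bin", "BIN" and "Bin" all produce the same value.
    public static long HashUC(ReadOnlySpan<byte> value)
    {
        long hash = 0;
        int count = Math.Min(value.Length, 8);
        for (int i = 0; i < count; i++)
        {
            byte b = value[i];
            if (b >= (byte)'a' && b <= (byte)'z')
            {
                b = (byte)(b - 32); // 'a' (0x61) -> 'A' (0x41)
            }
            hash |= (long)b << (8 * i);
        }
        return hash;
    }
}
```

Under this scheme `HashCS("bin"u8)` is `0x6E6962` (the bytes `b`, `i`, `n` in little-endian order), and `HashUC` of any casing of `bin` equals `HashCS("BIN"u8)`, which is what would let a single upper-case constant serve case-insensitive matching.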

This generated code allows for fast, efficient, and safe matching of well-known tokens, for example:

``` c#
var key = ... // the input token
var hash = key.HashCS();
switch (key.Length)
{
    case bin.Length when bin.Is(key, hash):
        // handle bin
        break;
    case f32.Length when f32.Is(key, hash):
        // handle f32
        break;
}
```

Switching on `Length` is optional but recommended: the compiler can often implement a switch over such small values
as a simple jump table, which is very fast. Switching on the hash itself is also valid. Either way, a matching hash
does not by itself prove that the content matches, so every hash match must also be confirmed with a sequence-equality
check; the `Is(value, hash)` convenience method validates both the hash and the content.
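
Putting the pieces together, this self-contained sketch hand-writes stand-ins for the generated `bin` and `f32` types and a caller shaped like the switch above. The hash function, the hard-coded constants derived from it, and the `Dispatch.Classify` helper are assumptions for illustration, not generator output (C# 11 or later is assumed for `u8` literals):

``` c#
using System;

public static class Hashing
{
    // Assumed hash (not the generator's): first (up to) 8 bytes, little-endian.
    public static long HashCS(this ReadOnlySpan<byte> value)
    {
        long hash = 0;
        int count = Math.Min(value.Length, 8);
        for (int i = 0; i < count; i++)
        {
            hash |= (long)value[i] << (8 * i);
        }
        return hash;
    }
}

// Hand-written stand-ins for the generated types (lower-case names match the tokens).
public static class bin
{
    public const int Length = 3;
    public const long HashCS = 0x6E6962; // bytes 'b' (0x62), 'i' (0x69), 'n' (0x6E)
    public static ReadOnlySpan<byte> U8 => "bin"u8;

    // Cheap hash comparison first, then the mandatory sequence-equality check.
    public static bool Is(ReadOnlySpan<byte> value, long hash)
        => hash == HashCS && value.SequenceEqual(U8);
}

public static class f32
{
    public const int Length = 3;
    public const long HashCS = 0x323366; // bytes 'f' (0x66), '3' (0x33), '2' (0x32)
    public static ReadOnlySpan<byte> U8 => "f32"u8;

    public static bool Is(ReadOnlySpan<byte> value, long hash)
        => hash == HashCS && value.SequenceEqual(U8);
}

public static class Dispatch
{
    // Hypothetical caller: both tokens have length 3, so the guarded cases
    // share a label value and the 'when' clauses pick between them.
    public static string Classify(ReadOnlySpan<byte> key)
    {
        long hash = key.HashCS();
        switch (key.Length)
        {
            case bin.Length when bin.Is(key, hash):
                return "bin";
            case f32.Length when f32.Is(key, hash):
                return "f32";
            default:
                return "unknown";
        }
    }
}
```

With these definitions, `Classify("bin"u8)` returns `"bin"`, while `Classify("binary"u8)` returns `"unknown"` because no case matches length 6.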

Note that `switch` requires `const` values, which is why we generate *types* rather than partial properties that
return an instance holding the known values. Also, the `"..."u8` syntax produces a `ReadOnlySpan<byte>`, which is
awkward to store (a span cannot be a field of a class or regular struct), but easy to return from a property.

## Isolated literals (part 2)

In some cases, you want to be able to say "match this value", where the value is only known at runtime. For this,
note that `AsciiHash` is also a `struct` that you can create an instance of and pass to other code; the best place
to create it is *inside* your `partial class`:

``` c#
[AsciiHash]
public static partial class bin
{
    public static readonly AsciiHash Hash = new(U8);
}
```

Now, `bin.Hash` can be supplied to a caller that takes an `AsciiHash` instance (commonly with `in` semantics),
which then has *instance* methods for case-sensitive and case-insensitive matching; the instance already knows
the target hash and payload values.
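
The shape of such a runtime matcher can be sketched as follows. This is *not* the real `AsciiHash` struct: `RuntimeToken`, its members, and the `Caller.Matches` helper are hypothetical, and the hash is an assumed 8-byte packing. It illustrates an instance carrying both the hash and the payload, with the caller receiving it by `in`:

``` c#
using System;

// Hypothetical stand-in for a runtime-known token.
public readonly struct RuntimeToken
{
    private readonly byte[] _payload;
    private readonly long _hashCS, _hashUC;

    public RuntimeToken(ReadOnlySpan<byte> value)
    {
        _payload = value.ToArray(); // one-off copy; fine for a static readonly instance
        _hashCS = Hash(value, upper: false);
        _hashUC = Hash(value, upper: true);
    }

    // Case-sensitive: length and hash as a prefilter, then exact bytes.
    public bool IsCS(ReadOnlySpan<byte> value)
        => value.Length == _payload.Length
        && Hash(value, upper: false) == _hashCS
        && value.SequenceEqual(_payload);

    // Case-insensitive (ASCII only): upper-cased hash, then bytes ignoring case.
    public bool IsCI(ReadOnlySpan<byte> value)
    {
        if (value.Length != _payload.Length || Hash(value, upper: true) != _hashUC) return false;
        for (int i = 0; i < value.Length; i++)
        {
            if (Upper(value[i]) != Upper(_payload[i])) return false;
        }
        return true;
    }

    private static byte Upper(byte b) => b >= (byte)'a' && b <= (byte)'z' ? (byte)(b - 32) : b;

    // Assumed hash: first (up to) 8 bytes packed little-endian into a long.
    private static long Hash(ReadOnlySpan<byte> value, bool upper)
    {
        long hash = 0;
        int count = Math.Min(value.Length, 8);
        for (int i = 0; i < count; i++)
        {
            byte b = upper ? Upper(value[i]) : value[i];
            hash |= (long)b << (8 * i);
        }
        return hash;
    }
}

public static class Caller
{
    // The caller only knows "some token" supplied at runtime; 'in' avoids a struct copy.
    public static bool Matches(in RuntimeToken token, ReadOnlySpan<byte> input, bool caseSensitive)
        => caseSensitive ? token.IsCS(input) : token.IsCI(input);
}
```

Because the struct is `readonly`, passing it with `in` avoids a copy without the compiler having to make defensive copies when its methods are called.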

`AsciiHash` implements `IEquatable<AsciiHash>` with case-sensitive equality; independent case-sensitive and
case-insensitive comparers are also available via the static `CaseSensitiveEqualityComparer` and
`CaseInsensitiveEqualityComparer` properties, respectively.
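
This is the standard .NET arrangement of a default `IEquatable<T>` plus separately exposed `IEqualityComparer<T>` instances. A minimal sketch on a hypothetical `TokenValue` struct (only the two property names come from the text above; everything else is illustrative) shows the rule that is easiest to get wrong: the case-insensitive comparer must also hash case-insensitively:

``` c#
#nullable enable
using System;
using System.Collections.Generic;
using System.Text;

public readonly struct TokenValue : IEquatable<TokenValue>
{
    private readonly byte[] _bytes;

    public TokenValue(string ascii) => _bytes = Encoding.ASCII.GetBytes(ascii);

    public ReadOnlySpan<byte> Span => _bytes;

    // Default equality is case-sensitive: exact byte comparison.
    public bool Equals(TokenValue other) => Span.SequenceEqual(other.Span);
    public override bool Equals(object? obj) => obj is TokenValue other && Equals(other);
    public override int GetHashCode() => CaseSensitiveEqualityComparer.GetHashCode(this);

    public static IEqualityComparer<TokenValue> CaseSensitiveEqualityComparer { get; } = new TokenComparer(ignoreCase: false);
    public static IEqualityComparer<TokenValue> CaseInsensitiveEqualityComparer { get; } = new TokenComparer(ignoreCase: true);

    private sealed class TokenComparer : IEqualityComparer<TokenValue>
    {
        private readonly bool _ignoreCase;

        public TokenComparer(bool ignoreCase) => _ignoreCase = ignoreCase;

        // ASCII-only folding: 'a'..'z' become 'A'..'Z' when ignoring case.
        private byte Normalize(byte b)
            => _ignoreCase && b >= (byte)'a' && b <= (byte)'z' ? (byte)(b - 32) : b;

        public bool Equals(TokenValue x, TokenValue y)
        {
            ReadOnlySpan<byte> left = x.Span, right = y.Span;
            if (left.Length != right.Length) return false;
            for (int i = 0; i < left.Length; i++)
            {
                if (Normalize(left[i]) != Normalize(right[i])) return false;
            }
            return true;
        }

        // Values that compare equal must hash equally, so hash the normalized bytes.
        public int GetHashCode(TokenValue obj)
        {
            var hash = new HashCode();
            foreach (byte b in obj.Span)
            {
                hash.Add(Normalize(b));
            }
            return hash.ToHashCode();
        }
    }
}
```

With this in place, a `HashSet<TokenValue>` built with `CaseInsensitiveEqualityComparer` finds `"BIN"` after `"bin"` was added, while plain `Equals` still treats them as different.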

Comparison values can be constructed on the fly on top of transient buffers using the constructors **that take
arrays**. Note that the other constructors may allocate on each use.
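
The reason is that an ordinary (non-`ref`) struct cannot hold a `ReadOnlySpan<byte>` field, so a span-based constructor has to copy. A sketch of the difference, using a hypothetical `BufferToken` type (the real constructor signatures are not shown here) that can wrap a caller-owned array segment, such as one rented from `ArrayPool<byte>.Shared`, without allocating:

``` c#
using System;

public readonly struct BufferToken
{
    private readonly byte[] _array;
    private readonly int _offset, _length;

    // Wraps caller-owned memory: no allocation, but the caller must keep the
    // buffer alive and unchanged while this value is in use.
    public BufferToken(byte[] array, int offset, int length)
    {
        ArgumentNullException.ThrowIfNull(array);
        if ((uint)offset > (uint)array.Length || (uint)length > (uint)(array.Length - offset))
        {
            throw new ArgumentOutOfRangeException(nameof(length));
        }
        _array = array;
        _offset = offset;
        _length = length;
    }

    // A span cannot be stored in a regular struct, so this overload has to copy,
    // allocating a new array every time it is called.
    public BufferToken(ReadOnlySpan<byte> value) : this(value.ToArray(), 0, value.Length) { }

    public ReadOnlySpan<byte> Span => new ReadOnlySpan<byte>(_array, _offset, _length);

    // Exact, case-sensitive byte comparison against the wrapped token.
    public bool Matches(ReadOnlySpan<byte> value) => value.SequenceEqual(Span);
}
```

The array-based form is the one to use per request; the span-based form is fine for values created once, such as a `static readonly` field.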

## Enum parsing (part 1)

When identifying multiple values, an `enum` may be more convenient. Consider:

``` c#
[AsciiHash]
public static partial bool TryParse(ReadOnlySpan<byte> value, out SomeEnum result);
```

This generates an efficient parser; the input can be either `byte` or `char` data. Case sensitivity
is controlled either by the optional `CaseSensitive` property on the attribute, or by a third (`bool`) parameter
on the method, for example:

``` c#
[AsciiHash(CaseSensitive = false)]
public static partial bool TryParse(ReadOnlySpan<byte> value, out SomeEnum result);
```

or

``` c#
[AsciiHash]
public static partial bool TryParse(ReadOnlySpan<byte> value, out SomeEnum result, bool caseSensitive = true);
```
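
For intuition, the following hand-written parser has the shape one might expect (switch on length, then confirm the token), with both `byte` and `char` inputs and the optional `caseSensitive` parameter. It is a sketch under assumptions: `Color`, `ColorParser`, and the lower-case tokens are hypothetical, and plain comparisons stand in for whatever hashing the generated code actually uses (C# 11 or later is assumed for `u8` literals):

``` c#
using System;

public enum Color { Unknown, Red, Green, Blue }

public static class ColorParser
{
    // Byte-based input (for example, RESP payloads); tokens are lower-case ASCII.
    public static bool TryParse(ReadOnlySpan<byte> value, out Color result, bool caseSensitive = true)
    {
        // Dispatch on length first, then confirm the candidate token.
        result = value.Length switch
        {
            3 when Matches(value, "red"u8, caseSensitive) => Color.Red,
            4 when Matches(value, "blue"u8, caseSensitive) => Color.Blue,
            5 when Matches(value, "green"u8, caseSensitive) => Color.Green,
            _ => Color.Unknown,
        };
        return result != Color.Unknown;
    }

    // Char-based input: same dispatch, using the BCL's ordinal span comparisons.
    public static bool TryParse(ReadOnlySpan<char> value, out Color result, bool caseSensitive = true)
    {
        StringComparison comparison = caseSensitive ? StringComparison.Ordinal : StringComparison.OrdinalIgnoreCase;
        result = value.Length switch
        {
            3 when MemoryExtensions.Equals(value, "red", comparison) => Color.Red,
            4 when MemoryExtensions.Equals(value, "blue", comparison) => Color.Blue,
            5 when MemoryExtensions.Equals(value, "green", comparison) => Color.Green,
            _ => Color.Unknown,
        };
        return result != Color.Unknown;
    }

    private static bool Matches(ReadOnlySpan<byte> value, ReadOnlySpan<byte> token, bool caseSensitive)
    {
        if (caseSensitive) return value.SequenceEqual(token);
        if (value.Length != token.Length) return false;
        for (int i = 0; i < value.Length; i++)
        {
            byte b = value[i];
            if (b >= (byte)'A' && b <= (byte)'Z') b = (byte)(b + 32); // fold ASCII to lower-case
            if (b != token[i]) return false;
        }
        return true;
    }
}
```

Here `TryParse("GREEN"u8, out var c)` fails with the default case-sensitive setting but succeeds with `caseSensitive: false`, and unmatched input leaves `result` as `Color.Unknown`, the `0`-valued member.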

Individual enum members can also be marked with `[AsciiHash("token value")]` to override the token payload. If
an enum member declares an empty explicit value (i.e. `[AsciiHash("")]`), then that member is ignored by the
tool; this is useful for marking "unknown" or "invalid" enum values (commonly the first member, which by
convention has the value `0`):

``` c#
public enum SomeEnum
{
    [AsciiHash("")]
    Unknown,
    SomeRealValue,
    [AsciiHash("another-real-value")]
    AnotherRealValue,
    // ...
}
```
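
The generator applies these rules at compile time. As a runtime analogue only, the member-to-token mapping can be sketched with reflection. `TokenAttribute` is a hypothetical stand-in for `[AsciiHash("...")]`, `TokenTable` is a hypothetical helper, and applying the `_`-to-`-` rule to enum members is an assumption carried over from the isolated-literal case:

``` c#
#nullable enable
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for [AsciiHash("...")] on enum members.
[AttributeUsage(AttributeTargets.Field)]
public sealed class TokenAttribute : Attribute
{
    public TokenAttribute(string value) => Value = value;
    public string Value { get; }
}

public enum SomeEnum
{
    [Token("")] Unknown,                            // empty explicit value: skipped
    SomeRealValue,                                  // token inferred from the member name
    [Token("another-real-value")] AnotherRealValue, // explicit token
}

public static class TokenTable
{
    // Explicit value wins; an empty value excludes the member;
    // otherwise use the member name with '_' replaced by '-'.
    public static Dictionary<string, TEnum> Build<TEnum>() where TEnum : struct, Enum
    {
        var map = new Dictionary<string, TEnum>(StringComparer.Ordinal);
        foreach (FieldInfo field in typeof(TEnum).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            string token = field.GetCustomAttribute<TokenAttribute>()?.Value
                           ?? field.Name.Replace('_', '-');
            if (token.Length == 0)
            {
                continue; // e.g. Unknown: never produced by parsing
            }
            map.Add(token, (TEnum)field.GetValue(null)!);
        }
        return map;
    }
}
```

For the enum above this yields two entries, `"SomeRealValue"` and `"another-real-value"`; `Unknown` has no token and so can never be the result of a successful parse.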

## Enum parsing (part 2)

The tool has an *additional* facility when it comes to enums: you generally don't want to hard-code
things like buffer lengths into your code, but when parsing an enum, you need to know how many bytes to read.

The tool can generate a `static partial class` that contains a count of enum members, the maximum length of any
token in characters, and the maximum length of any token in bytes (when encoded as UTF-8), along with a
recommended buffer size. For example:

``` c#
[AsciiHash("SomeTypeName")]
public enum SomeEnum
{
    // ...
}
```

This generates a class like the following:

``` c#
static partial class SomeTypeName
{
    public const int EnumCount = 48;
    public const int MaxChars = 11;
    public const int MaxBytes = 11; // as UTF-8
    public const int BufferBytes = 16;
}
```
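
The arithmetic behind these constants can be sketched as follows, assuming `MaxBytes` is the UTF-8 byte count of the longest token and `BufferBytes` is `MaxBytes + 1` rounded up to a multiple of 8 (consistent with the example: 11 + 1 = 12, rounded up to 16). `TokenLimits` is a hypothetical helper, and here `EnumCount` is simply the number of tokens passed in; whether the real constant also counts ignored members is not stated:

``` c#
using System;
using System.Linq;
using System.Text;

public static class TokenLimits
{
    // Assumed rounding unit; the example values (MaxBytes 11 -> BufferBytes 16) are consistent with 8.
    private const int WordSize = 8;

    public static (int EnumCount, int MaxChars, int MaxBytes, int BufferBytes) Compute(params string[] tokens)
    {
        if (tokens.Length == 0) throw new ArgumentException("At least one token is required.", nameof(tokens));

        int maxChars = tokens.Max(t => t.Length);
        int maxBytes = tokens.Max(t => Encoding.UTF8.GetByteCount(t));

        // One spare byte, then round up to the next multiple of WordSize.
        int bufferBytes = (maxBytes + 1 + WordSize - 1) / WordSize * WordSize;

        return (tokens.Length, maxChars, maxBytes, bufferBytes);
    }
}
```

For ASCII tokens `MaxChars` and `MaxBytes` coincide, as in the example above; they differ only when a token contains non-ASCII characters, which take more than one byte in UTF-8.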

The last of these is probably the most useful: it allows for one additional byte (to rule out false positives from
inputs longer than any known token) and rounds up to a multiple of the word size, which makes it convenient for
stack allocation, for example:

``` c#
var span = reader.TryGetSpan(out var tmp) ? tmp : reader.Buffer(stackalloc byte[SomeTypeName.BufferBytes]);
if (TryParse(span, out var value))
{
    // got a value
}
```

This allows for very efficient parsing of well-known tokens.
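
One way to read the "rule out false positives" remark: when input is copied into a fixed-size buffer before parsing, a buffer of exactly `MaxBytes` would silently truncate an over-long input, which could then match a real token. The spare byte keeps such inputs detectably too long. A sketch with hypothetical constants (a single token, `bin`) and helper names, assuming C# 11 or later:

``` c#
using System;

public static class SpareByteDemo
{
    private const int MaxBytes = 3;    // the only token here is "bin"
    private const int BufferBytes = 8; // MaxBytes + 1, rounded up to a multiple of 8

    // Copies at most bufferSize bytes of the input into a stack buffer (as a
    // bounded reader would), then tries to match what was copied.
    private static bool Parse(ReadOnlySpan<byte> input, int bufferSize)
    {
        Span<byte> buffer = stackalloc byte[bufferSize];
        int count = Math.Min(input.Length, bufferSize);
        input.Slice(0, count).CopyTo(buffer);
        return buffer.Slice(0, count).SequenceEqual("bin"u8);
    }

    // With the spare byte, an over-long input stays visibly too long.
    public static bool ParseWithSpareByte(ReadOnlySpan<byte> input) => Parse(input, BufferBytes);

    // Without it, "binary" is truncated to "bin" and wrongly matches.
    public static bool ParseWithoutSpareByte(ReadOnlySpan<byte> input) => Parse(input, MaxBytes);
}
```

Here `ParseWithSpareByte("binary"u8)` returns `false`, whereas `ParseWithoutSpareByte("binary"u8)` returns `true`: exactly the false positive the extra byte is there to prevent.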