Feature/csharp runtime perf optimizations #4938
- OptimizedToken: deferred text materialization + ValueTuple source (opt-in)
- OptimizedTokenFactory: creates OptimizedToken instances (opt-in)
- SpanInputStream: string-backed ICharStream, no char[] copy (opt-in)
- ValueStringBuilder: stack-allocated string building in BufferedTokenStream.GetText
- ObjectPool: reuse ATNConfigSet, HashSet<ATNConfig>, MergeCache in ParserATNSimulator
- ArrayPool: rent/return buffers in UnbufferedCharStream growth
- ATNDeserializer: ReadOnlySpan<int> overload for netstandard2.1+/net8.0+
- Multi-target: netstandard2.0 + netstandard2.1 + net8.0 (net45 preserved on Windows)
All existing public API is unchanged. New types are additive and opt-in.
Signed-off-by: Harry Cordewener <admin@twilightdays.org>
- Seal + singleton ConfigEqualityComparator/ObjectEqualityComparator (eliminates per-HashSet allocation, enables devirtualization)
- Fix FilterPrecedencePredicates: add missing return, single-pass rewrite (was a silent bug + 2x LINQ enumeration)
- ArrayList.Equals: index-based loop replaces enumerator allocation
- IntervalSet.intervals: IList<Interval> → List<Interval> (enables struct enumerator, bounds-check elimination)
- HashCode.Combine on net8.0+ for ATNConfig, LexerATNConfig, SemanticContext (xxHash64-based, better distribution, fewer instructions)
- Cache typeof().GetHashCode() in AND/OR SemanticContext
- BufferedTokenStream: capacity hints (100→1024 initial, sized sublists)
- Seal LexerATNConfig, ArrayPredictionContext (devirt + smaller vtables)
- Remove redundant Tuple allocation in Lexer.SetInputStream
- Remove 'is IWritableToken' type check in BufferedTokenStream.Fetch() (all tokens implement IWritableToken; avoids interface dispatch)
Signed-off-by: Harry Cordewener <admin@twilightdays.org>
…ry API
- Backing store changed from string to char[] for direct Span access
- New ctor: SpanInputStream(char[], int) for zero-copy scenarios
- AsSpan() / AsMemory() expose zero-copy slicing to callers
- net8.0+: bypasses BaseInputCharStream entirely, implements ICharStream directly with bounds-elided LA() via Unsafe.Add + MemoryMarshal
- net8.0+ Seek() is O(1) unconditionally (no Consume loop)
- netstandard2.0/2.1: still inherits BaseInputCharStream for compat
Signed-off-by: Harry Cordewener <admin@twilightdays.org>
- InputStreamBenchmarks: AntlrInputStream vs SpanInputStream (construct, LA, seek, GetText, lookback)
- TokenBenchmarks: CommonTokenFactory vs OptimizedTokenFactory (create, GetText)
- ATNConfigSetBenchmarks: Add/Hash/Equals with sealed+singleton comparators
- IntervalSetBenchmarks: Contains, Or, Complement with List<Interval> backing
- ArrayListBenchmarks: Equals with index-based loop
Signed-off-by: Harry Cordewener <admin@twilightdays.org>
…putStream
- CharSpanInputStream: char[] backed, zero-copy for char[]/stream/reader ctors
- StringSpanInputStream: string backed, zero-copy ctor from string (no ToCharArray)
- ToString() returns original string reference (zero alloc)
- GetText() uses Substring() (no char[] intermediary)
- Same ICharStream / BaseInputCharStream split on #if NET8_0_OR_GREATER
- Unicode: BMP (U+0000-FFFF), same as AntlrInputStream — surrogate pairs
counted as two chars; use CodePointCharStream for supplementary code points
- Tests: 149/149 passing (63 new StringSpanInputStream tests)
- Benchmarks: InputStreamBenchmarks updated with StringSpanInputStream column
Signed-off-by: Harry Cordewener <admin@twilightdays.org>
CHANGES.txt is no longer being used
Signed-off-by: Harry Cordewener <admin@twilightdays.org>

Update CSharp.stg code generator template and XPathLexer.cs to emit RVA-backed ReadOnlySpan<int> for the serialized ATN data on net8.0+, falling back to int[] on older targets via #if preprocessor directives. This eliminates a permanent Gen2 managed heap allocation for every generated parser and lexer. The data lives in the PE image's read-only .text section instead, mapped by the OS loader with zero GC pressure. The SerializedAtn property override uses .ToArray() on net8.0+ since the base class returns int[].
Signed-off-by: Harry Cordewener <admin@twilightdays.org>

The runner builds the runtime DLL without -f, so multi-TFM projects output the highest TFM (net8.0). The generated test project template must match or CS1705 fires on all ~359 grammar regression tests.
Signed-off-by: Harry Cordewener <admin@twilightdays.org>
- if (!collection.OfType<PrecedencePredicate>().Any())
-     Collections.EmptyList<PrecedencePredicate>();
-
- List<PrecedencePredicate> result = collection.OfType<PrecedencePredicate>().ToList();
+ List<PrecedencePredicate> result = null;
+ foreach (var item in collection)
+ {
+     if (item is PrecedencePredicate pp)
+         (result ??= new List<PrecedencePredicate>()).Add(pp);
+ }
+ if (result == null) return Collections.EmptyList<PrecedencePredicate>();
Replaces two LINQ passes + a missing return bug with a single foreach.
The original code called .OfType<PrecedencePredicate>().Any() then .OfType<PrecedencePredicate>().ToList() — two full linear scans of the collection, each constructing a new LINQ enumerator state machine on the heap. The early-exit was also missing its return keyword, so the short-circuit never fired: execution always fell through to the .ToList() call, allocated an empty List<PrecedencePredicate>, passed it to ExceptWith (a no-op), and returned it. Silently harmless, but wrong.
This replacement makes one pass. result is only allocated if at least one PrecedencePredicate is found — the common case (most grammars don't use precedence predicates) exits with zero heap allocations beyond the loop itself. The missing return bug is fixed as a side effect: a null result correctly returns the empty singleton.
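The single-pass, allocate-on-first-match shape described above can be sketched in isolation. This is a minimal illustration rather than the runtime's code: `SemanticContextSketch`, `PrecedencePredicateSketch`, `PredicateSketch`, and `PrecedenceFilterSketch` are hypothetical stand-ins, and `Array.Empty<T>()` stands in for `Collections.EmptyList<T>()`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the runtime's SemanticContext hierarchy.
public abstract class SemanticContextSketch { }

public sealed class PrecedencePredicateSketch : SemanticContextSketch
{
    public int Precedence { get; }
    public PrecedencePredicateSketch(int precedence) => Precedence = precedence;
}

public sealed class PredicateSketch : SemanticContextSketch { }

public static class PrecedenceFilterSketch
{
    // Shared empty result (assumption: plays the role of Collections.EmptyList<T>()).
    private static readonly IReadOnlyList<PrecedencePredicateSketch> Empty =
        Array.Empty<PrecedencePredicateSketch>();

    // One pass over the set; the result list is only allocated on the first match.
    public static IReadOnlyList<PrecedencePredicateSketch> Filter(
        HashSet<SemanticContextSketch> collection)
    {
        List<PrecedencePredicateSketch> result = null;
        foreach (var item in collection)
        {
            if (item is PrecedencePredicateSketch pp)
                (result ??= new List<PrecedencePredicateSketch>()).Add(pp);
        }

        // Explicit return: this is the early exit the original code was missing.
        if (result == null)
            return Empty;

        // Remove the matches from the source set, as the original method does.
        collection.ExceptWith(result);
        return result;
    }
}
```

When no precedence predicate is present, the method returns the shared empty instance and never touches `ExceptWith`.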
/// <c>int[]</c> on the heap. On net8.0+ the generated parser can expose
/// the serialized ATN as a <c>ReadOnlySpan<int></c> backed by RVA static data.
/// </summary>
public virtual ATN Deserialize(ReadOnlySpan<int> spanData)
ATNDeserializer.Deserialize(ReadOnlySpan<int>) — forward-looking RVA static data support
Background
Every ANTLR4-generated parser emits a static readonly int[] field containing the serialized ATN, the state machine that drives all parsing decisions. For large grammars this array can be hundreds of kilobytes. Because it's a managed int[], it is allocated on the GC heap at class load time and, being referenced from a static field, stays reachable (and therefore ends up in Gen2) for the lifetime of the process. It is never mutated, never collected, and contributes to GC root scanning overhead on every collection.
What net8.0+ enables
C# 12 / .NET 8 can emit ReadOnlySpan<T> properties backed by RVA static data — the compiler places the data in the PE image's read-only .text section rather than allocating a managed array:
// Today: managed heap, lives forever in Gen2
public static readonly int[] serializedATN = { 4, 1, 23, 456, ... };
// With RVA static data: no managed allocation, lives in mapped PE memory
private static ReadOnlySpan<int> SerializedATN => new int[] { 4, 1, 23, 456, ... };

The data is memory-mapped from the assembly by the OS loader. Zero GC pressure, zero Gen2 promotion, and the OS can page it out under memory pressure since it's backed by the file on disk. See Konrad Kokosa's write-up on RVA static fields and the .NET 8 performance improvements post.
What this overload does
public virtual ATN Deserialize(ReadOnlySpan<int> spanData)
- Rents a temporary int[] from ArrayPool<int>.Shared (no permanent allocation).
- Copies the span data in. This is a one-time startup cost when the parser class initializes.
- Deserializes the ATN using the existing internal methods (unchanged).
- Returns the rented array to the pool in a finally block.
The ATN object is the permanent output. The temporary array is short-lived and pooled. The serialized data stays in read-only PE memory, never touching the managed heap.
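The rent, copy, process, return sequence listed above can be sketched as follows. This is an assumption-laden illustration, not the PR's code: `PooledSpanSketch` and `ProcessArray` are hypothetical names, and summing the data stands in for the existing array-based deserialization.

```csharp
using System;
using System.Buffers;

public static class PooledSpanSketch
{
    // Hypothetical stand-in for the existing int[]-based logic.
    // It only reads the first `length` entries, because a rented
    // array may be longer than what was requested.
    private static long ProcessArray(int[] data, int length)
    {
        long sum = 0;
        for (int i = 0; i < length; i++)
            sum += data[i];
        return sum;
    }

    public static long Process(ReadOnlySpan<int> spanData)
    {
        // Rent a temporary buffer instead of allocating a permanent array.
        int[] rented = ArrayPool<int>.Shared.Rent(spanData.Length);
        try
        {
            // One-time copy from the span into the pooled array.
            spanData.CopyTo(rented);
            return ProcessArray(rented, spanData.Length);
        }
        finally
        {
            // Always hand the buffer back, even if processing throws.
            ArrayPool<int>.Shared.Return(rented);
        }
    }
}
```

The detail worth noting is that `Rent` may return a larger array than requested, so the consumer has to be told the logical length rather than relying on `rented.Length`.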
What's needed to complete this
This overload is the runtime-side preparation. To actually use it, the ANTLR4 code generator (the Java tool that emits C# parser/lexer classes) needs to be updated to emit the ReadOnlySpan<int> pattern instead of static readonly int[] when targeting net8.0+.
See changes to: CSharp.stg
Risk
Low. The overload is additive (the existing Deserialize(int[]) is untouched), guarded by #if NETSTANDARD2_1_OR_GREATER || NET8_0_OR_GREATER, and invisible to existing code. The ArrayPool rent/return is bounded by a try/finally so there's no leak path. The deserialization logic is identical — only the data source changes.
/// Not thread-safe — intended for use by a single parser/lexer instance
/// which already cannot be shared across threads during a parse.
/// </summary>
internal sealed class ObjectPool<T> where T : class
Object Pooling: Custom Pool vs Microsoft.Extensions.ObjectPool
The pooling implementation (Misc/ObjectPool.cs) uses a custom non-thread-safe Stack-based pool rather than Microsoft.Extensions.ObjectPool. This is intentional for two reasons:
1. Zero-dependency guarantee
Antlr4.Runtime.Standard currently has no opinionated NuGet dependencies (only System.Memory/System.Buffers polyfills for netstandard2.0, which are low-level Microsoft primitives). Adding Microsoft.Extensions.ObjectPool would be the first real transitive dependency imposed on every downstream consumer, with version conflict risk if their application already references a different version.
2. Thread-safety overhead is wasted here
DefaultObjectPool uses Interlocked operations across per-core slots to be lock-free thread-safe. ParserATNSimulator pools are instance-level fields — simulators are not shared across threads, so that overhead buys nothing. The custom pool is a ~20 line Stack with a cap, which is exactly as much machinery as the use case requires.
If Antlr4.Runtime.Standard ever acquires a dependency on Microsoft.Extensions.* for another reason, revisiting this would be reasonable. For now, the custom pool is the right tradeoff.
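For readers who have not seen this kind of pool, a capped, non-thread-safe Stack-based pool might look roughly like the sketch below. `SimplePoolSketch` is a hypothetical name, and the `new()` constraint and the absence of a reset hook are simplifying assumptions; the PR's actual `ObjectPool<T>` may differ.

```csharp
using System.Collections.Generic;

// Minimal single-threaded pool: a Stack<T> with a retention cap.
// No locks or Interlocked operations, so it must stay owned by one thread.
public sealed class SimplePoolSketch<T> where T : class, new()
{
    private readonly Stack<T> _items = new Stack<T>();
    private readonly int _maxRetained;

    public SimplePoolSketch(int maxRetained) => _maxRetained = maxRetained;

    // Number of instances currently held by the pool.
    public int Count => _items.Count;

    // Reuse a pooled instance if one exists, otherwise create a new one.
    public T Rent() => _items.Count > 0 ? _items.Pop() : new T();

    // Keep the instance only while under the cap; extras are left to the GC.
    // The caller is responsible for clearing the instance before returning it.
    public void Return(T item)
    {
        if (_items.Count < _maxRetained)
            _items.Push(item);
    }
}
```

The cap keeps a burst of returns from growing the pool without bound, which is the main safety property a pool of this size needs.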
/// <para>Falls back to <see cref="ArrayPool{T}"/> for strings that exceed
/// the initial stack-allocated buffer.</para>
/// </summary>
internal ref struct ValueStringBuilder
Custom Implementation vs BCL Alternatives
The Misc/ValueStringBuilder.cs implementation is a copy of the ref struct + stackalloc + ArrayPool pattern used internally by dotnet/runtime, rather than using StringBuilder or any public API. This is intentional.
Why not StringBuilder?
StringBuilder heap-allocates on construction and is a class — it cannot use stackalloc for its initial buffer. The entire point of ValueStringBuilder in BufferedTokenStream.GetText is to start on the stack (512 chars stackalloc) and only spill to a pooled heap array if the output exceeds that. StringBuilder cannot do this.
Why not a Microsoft package?
The dotnet/runtime team's own ValueStringBuilder is marked internal and has never been shipped as a public API. This is deliberate — ref struct limitations make it unsuitable as a general-purpose public type:
- Cannot be stored as a field
- Cannot cross async/await boundaries
- Cannot be boxed or used in generics
Because of these constraints, Microsoft has no plans to expose it publicly. The only way to use this pattern in a library is to copy the implementation, which is the standard approach across the .NET ecosystem (see also: System.Text.Json, System.Text.RegularExpressions, etc. all do the same internally).
Attribution
The implementation follows the pattern from dotnet/runtime, which is MIT licensed and consistent with ANTLR4's BSD-3 license.
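The ref struct + stackalloc + ArrayPool pattern discussed in this comment can be sketched in a few dozen lines. This is a reduced illustration under stated assumptions, not the file in the PR: `ValueStringBuilderSketch` is a hypothetical name, it only supports appending strings, and the growth policy (double or fit, whichever is larger) is an assumption.

```csharp
using System;
using System.Buffers;

public ref struct ValueStringBuilderSketch
{
    private char[] _rented;      // null while the caller-supplied buffer is still in use
    private Span<char> _chars;   // current backing storage (stack or pooled)
    private int _pos;            // number of chars written so far

    public ValueStringBuilderSketch(Span<char> initialBuffer)
    {
        _rented = null;
        _chars = initialBuffer;
        _pos = 0;
    }

    public int Length => _pos;

    public void Append(string s)
    {
        if (string.IsNullOrEmpty(s)) return;
        if (_pos > _chars.Length - s.Length) Grow(s.Length);
        s.AsSpan().CopyTo(_chars.Slice(_pos));
        _pos += s.Length;
    }

    // Spill to a pooled array once the current buffer is too small.
    private void Grow(int additional)
    {
        int newSize = Math.Max(_pos + additional, _chars.Length * 2);
        char[] bigger = ArrayPool<char>.Shared.Rent(newSize);
        _chars.Slice(0, _pos).CopyTo(bigger);
        char[] toReturn = _rented;
        _chars = _rented = bigger;
        if (toReturn != null) ArrayPool<char>.Shared.Return(toReturn);
    }

    // Materializes the string and releases any pooled buffer.
    public override string ToString()
    {
        string result = _chars.Slice(0, _pos).ToString();
        Dispose();
        return result;
    }

    public void Dispose()
    {
        char[] toReturn = _rented;
        this = default;
        if (toReturn != null) ArrayPool<char>.Shared.Return(toReturn);
    }
}
```

A typical call site would start on the stack, for example `Span<char> buffer = stackalloc char[512]; var sb = new ValueStringBuilderSketch(buffer);`, and only reach `ArrayPool<char>` when the output outgrows that buffer.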
Object pooling in
|
On sealing
|
Token benchmarks — why we touched
|
| Method | Mean | Alloc | vs baseline |
|---|---|---|---|
| CreateTokens_CommonTokenFactory (baseline) | 1,289.7 ns | 27,320 B | 1.0× |
| CreateTokens_OptimizedTokenFactory | 639.2 ns | 8,024 B | 2.0× faster, 70% less memory |
| GetText_CommonToken (1000 reads) | 13,304.0 ns | 244,096 B | 1.0× |
| GetText_OptimizedToken (1000 reads) | 211.0 ns | 224 B | 63× faster, 99.9% less memory |
The construction improvement (2×, 70% less memory) is entirely the ValueTuple — one allocation instead of two. The GetText improvement (63×, 99.9% less memory) is entirely the cache — 1,000 reads of CommonToken.Text allocate 1,000 strings; 1,000 reads of OptimizedToken.Text allocate one.
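The caching effect described above (many reads, one string allocation) can be illustrated with a small sketch. `CachedTextTokenSketch` is hypothetical and uses a plain string where the real token would hold an ICharStream; the `Materializations` counter exists only to make the behaviour observable.

```csharp
using System;

public sealed class CachedTextTokenSketch
{
    private readonly string _input;  // stand-in for the token's source char stream
    private readonly int _start;
    private readonly int _stop;      // inclusive, following ANTLR's start/stop convention
    private string _text;            // null until Text is first read

    // Counts how many times the text was actually extracted (illustration only).
    public int Materializations { get; private set; }

    public CachedTextTokenSketch(string input, int start, int stop)
    {
        _input = input;
        _start = start;
        _stop = stop;
    }

    public string Text
    {
        get
        {
            // Extract the substring once, then serve the cached reference.
            if (_text == null)
            {
                _text = _input.Substring(_start, _stop - _start + 1);
                Materializations++;
            }
            return _text;
        }
        // An explicit assignment overrides the lazily derived text.
        set => _text = value;
    }
}
```

Repeated reads of `Text` return the same string instance, which is where the difference in the GetText rows of the table comes from.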
Compatibility
OptimizedToken implements both IToken and IWritableToken, the same interfaces as CommonToken. OptimizedTokenFactory implements ITokenFactory. Adoption is a one-liner:
lexer.TokenFactory = OptimizedTokenFactory.Default;
No grammar changes, no parser changes. The copyText constructor overload preserves the CommonTokenFactory(true) contract needed for UnbufferedCharStream, where the input window may slide away before Text is accessed.
- MergeCache: replace (PredictionContext, PredictionContext) ValueTuple key with a private PredictionContextPair readonly struct; uses RuntimeHelpers.GetHashCode for reference-identity semantics
- OptimizedToken: replace (ITokenSource, ICharStream) ValueTuple field with a private TokenSourcePair struct; ValueTuple unavailable on net45
- ValueStringBuilder: guard entire file with #if NETSTANDARD2_0_OR_GREATER || NET8_0_OR_GREATER; ref struct and Span<T> are not available on net45
- BufferedTokenStream: guard System.Buffers using and ValueStringBuilder call site; net45 falls back to plain StringBuilder
- UnbufferedCharStream: guard System.Buffers using and all ArrayPool usage; net45 falls back to plain array allocation
No behaviour change on netstandard2.0+ or net8.0.
Signed-off-by: Harry Cordewener <admin@twilightdays.org>

…lasses and OptimizedToken
- SpanInputStream, CharSpanInputStream: remove ReadOnlySpan<char> Data property from #else (net45) fallback class (Span unavailable on net45)
- StringSpanInputStream: remove ReadOnlySpan<char> Data and ReadOnlyMemory<char> AsMemory() from #else (net45) fallback class
- OptimizedToken: guard EmptySource field and ValueTuple constructor with #if NETSTANDARD2_0_OR_GREATER || NET8_0_OR_GREATER; add flat ITokenSource/ICharStream overload for net45
- OptimizedTokenFactory: guard ValueTuple constructor call site with matching #if, using flat parameters on net45
Signed-off-by: Harry Cordewener <admin@twilightdays.org>

Add /p:UseSharedCompilation=false to disable VBCSCompiler server and /p:CopyRetryCount=3 /p:CopyRetryDelayMilliseconds=500 for retry logic on both dotnet build invocations. Addresses System.IO.IOException on Antlr4.Runtime.Standard.deps.json during parallel test execution.
Signed-off-by: Harry Cordewener <admin@twilightdays.org>
41d093d to 0fee362
The TypeScript build failures appear to be entirely unrelated to this change.

Using CharStreams.fromPath(), this PR appears to reduce performance for csharp/v8-spec by about 35%. There is a similar performance reduction using the

@kaby76 Thanks for that call-out. I'll take a look at that path and see what's going on there.
MergeCache.PredictionContextPair was using identity-based equality (ReferenceEquals / RuntimeHelpers.GetHashCode), which made the cache ineffective: structurally-equal but reference-different PredictionContext objects never hit the cache, forcing redundant merge computations.

Restore value-based semantics (object.Equals / PredictionContext.GetHashCode) while keeping the flat Dictionary<PredictionContextPair, PredictionContext> structure (fewer allocations than the prior nested-dict approach).

Benchmark (kaby76/a4-4938 repro, 23 C# files, CSharp grammar, net10.0):
Before fix: ~5.77s
After fix: ~3.29s (43% faster)

Add MergeCacheBenchmarks to validate cache hit effectiveness via BenchmarkDotNet.
Signed-off-by: Harry Cordewener <admin@twilightdays.org>
MergeCache Regression Fix
Root Cause
MergeCache.PredictionContextPair was using identity-based equality (ReferenceEquals + RuntimeHelpers.GetHashCode). During ATN prediction, the runtime frequently creates structurally identical PredictionContext objects at different heap addresses. With identity semantics the cache never hits for these, which effectively disables caching and forces full merge recomputation on every call.
Fix
Restore value-based equality in PredictionContextPair:
public bool Equals(PredictionContextPair other)
- => ReferenceEquals(A, other.A) && ReferenceEquals(B, other.B);
+ => Equals(A, other.A) && Equals(B, other.B);
public override int GetHashCode()
{
int h = 17;
- h = h * 31 + (A != null ? System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(A) : 0);
- h = h * 31 + (B != null ? System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(B) : 0);
+ h = h * 31 + (A != null ? A.GetHashCode() : 0);
+ h = h * 31 + (B != null ? B.GetHashCode() : 0);
return h;
}
PredictionContext already has proper value-based GetHashCode() (returns cachedHashCode) and structural Equals() overrides in its subclasses. The flat Dictionary<PredictionContextPair, PredictionContext> structure from the PR is preserved, with fewer allocations than the old nested-dict design.
Benchmark Results
End-to-end (kaby76/a4-4938 repro: 23 C# files, CSharp grammar, net10.0 Release):
BenchmarkDotNet micro-benchmark (MergeCacheBenchmarks, MergeCount=500):
Cache hits reduce both time and GC pressure significantly.
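The difference between the two key semantics is easy to reproduce outside the runtime. In the sketch below every type is a hypothetical stand-in: `ContextSketch` plays the role of a PredictionContext with structural equality, `IdentityPair` mirrors the regressed key, and `ValuePair` mirrors the fixed key.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Stand-in for PredictionContext: two instances with the same state are equal.
public sealed class ContextSketch : IEquatable<ContextSketch>
{
    public int ReturnState { get; }
    public ContextSketch(int returnState) => ReturnState = returnState;
    public bool Equals(ContextSketch other) => other != null && other.ReturnState == ReturnState;
    public override bool Equals(object obj) => Equals(obj as ContextSketch);
    public override int GetHashCode() => ReturnState;
}

// Key with reference-identity semantics (the behaviour that caused the regression).
public readonly struct IdentityPair : IEquatable<IdentityPair>
{
    public readonly ContextSketch A, B;
    public IdentityPair(ContextSketch a, ContextSketch b) { A = a; B = b; }
    public bool Equals(IdentityPair other) =>
        ReferenceEquals(A, other.A) && ReferenceEquals(B, other.B);
    public override bool Equals(object obj) => obj is IdentityPair p && Equals(p);
    public override int GetHashCode() =>
        unchecked(RuntimeHelpers.GetHashCode(A) * 31 + RuntimeHelpers.GetHashCode(B));
}

// Key with value semantics (the behaviour restored by the fix).
public readonly struct ValuePair : IEquatable<ValuePair>
{
    public readonly ContextSketch A, B;
    public ValuePair(ContextSketch a, ContextSketch b) { A = a; B = b; }
    public bool Equals(ValuePair other) =>
        object.Equals(A, other.A) && object.Equals(B, other.B);
    public override bool Equals(object obj) => obj is ValuePair p && Equals(p);
    public override int GetHashCode() =>
        unchecked((A != null ? A.GetHashCode() : 0) * 31 + (B != null ? B.GetHashCode() : 0));
}
```

With these types, storing a result under `(a1, b)` and looking it up with a structurally equal but distinct `a2` misses in a `Dictionary<IdentityPair, ContextSketch>` and hits in a `Dictionary<ValuePair, ContextSketch>`.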
@kaby76 - give that another go.

Much better: 30% faster. This is quite good.

@HarryCordewener Thanks for all the above proposals.

Can we get GitHub Copilot turned on in this repo? https://docs.github.com/en/copilot/how-tos/use-copilot-agents/request-a-code-review/use-code-review. While I don't trust Copilot's analysis, I trust human reviews less. In the meantime, I'll test this PR against the entire grammars-v4 repo. This is what Claude Code wrote up:

Could we get GitHub Copilot turned on for this repo? Here's a summary that Claude Code wrote up. In particular, I asked it to consider the other targets, as that was not usually considered. (NB: I thought that I posted this comment before, but perhaps I didn't, because when I reloaded this page the comment was gone.)

I'm with you @ericvergnaud on splitting it out. I will see what the safest and healthiest splits are. It's likely that the OptimizedToken optionals and the Streams will get cut into a separate branch while the rest stays here, as those two bring the lowest gains for a non-DSL grammar (and can be added by outside sources without a PR into this repo). But before I do, I will address @kaby76's Claude review, so that this branch remains linear in discussion.
Signed-off-by: Harry Cordewener <admin@twilightdays.org>
Verification of Claude Code's Review Claims

1. "ReadOnlySpan<int> _serializedATN => new int[] { … } does NOT produce RVA static data"
VERDICT: PARTIALLY CORRECT, but nuanced. Claude claims new int[] { … } in a ReadOnlySpan<int> property always allocates a heap array. This is the KEY issue and it's complicated:
The fix Claude suggests is also correct: change new int[] { ... } to a collection expression [...] to get actual RVA backing. Andrew Lock's article confirms that ReadOnlySpan<int> = [1, 2, 3, 4, 5] compiles to RuntimeHelpers.CreateSpan() which wraps a pointer directly into the assembly data. However, the practical impact is mitigated because _serializedATN is only accessed once (in the static field initializer for _ATN), so it's a one-time allocation. But it means the stated benefit ("eliminates a permanent Gen2 managed heap allocation") is wrong: it actually creates a short-lived heap array that gets GC'd after deserialization, while the old static int[] field was one allocation retained forever. The old way is arguably better for this access pattern (allocated once, retained, no re-creation).

2. "SerializedAtn property on net8.0+ allocates on every call"
VERDICT: CORRECT. Looking at line 296 of CSharp.stg:
On net8.0+, _serializedATN is a ReadOnlySpan<int> property that allocates new int[] each invocation, then .ToArray() copies it again. That's TWO allocations per access. Claude is right that this is worse than the legacy path if called more than once. Practically it's rarely called (diagnostic tools), so low risk, but the analysis is correct.

3. "Removed IWritableToken guard is a silent breaking change"
VERDICT: CORRECT in principle, LOW risk in practice. The ITokenFactory contract technically allows returning tokens that don't implement IWritableToken. Removing the guard means such tokens would throw InvalidCastException. Claude is right that this is technically a backward-incompatible change. However, in practice every token implementation in the ANTLR ecosystem implements IWritableToken, so the real-world risk is negligible. The original comment in the code ("all tokens implement IWritableToken") is the stated justification.

4. "OptimizedToken copy constructor tight-coupling to CommonToken.source"
VERDICT: MOSTLY CORRECT but overstated risk. Claude correctly identifies the white-box dependency on CommonToken's internal source field. The concern about UnbufferedCharStream is valid: if the stream has advanced past the token's indices, lazy re-fetch will return wrong text. However, this is the same hazard CommonToken already has (it also re-fetches lazily by default). The copyText parameter exists precisely for this case. So the risk exists but it's not new to OptimizedToken.

5. "Code duplication between SpanInputStream and CharSpanInputStream"
VERDICT: CORRECT. A straightforward code smell observation, valid.

6. "IntervalSet.intervals field type changed from IList<Interval> to List<Interval>"
VERDICT: CORRECT, technically breaking for subclasses. The field is protected internal, so any subclass assigning a non-List would break. Low real-world risk since nobody subclasses IntervalSet in practice.

7. "tokens = new List<IToken>(1024) initial capacity is aggressive"
VERDICT: REASONABLE CONCERN. 1024 * 8 bytes (reference) = 8KB per stream. For small inputs this wastes memory. Whether it matters depends on how many concurrent parsers exist. A moderate concern, not a bug.

8. "Pool management methods are internal instead of private"
VERDICT: VALID design concern but low practical risk since ANTLR internals are rarely subclassed.

9. "ATNDeserializer.Deserialize(ReadOnlySpan<int>) leaves this.data = null"
VERDICT: CORRECT but irrelevant. As Claude itself notes, the generated code always does new ATNDeserializer().Deserialize(...), so instance reuse doesn't happen.

Summary
The most important finding is Issue #1, the RVA static data claim. Claude Code is correct that new int[] { ... } does NOT produce RVA data for ReadOnlySpan<int>. The fix is to use a collection expression [...] in the template. This is the only issue that represents a genuine correctness problem (stated benefit doesn't match reality). The rest are valid but lower-priority design/compatibility observations.

Changes made:
Fix #1: RVA static data via collection expressions (3 files):
This means on NET8+, _serializedATN is now backed by RuntimeHelpers.CreateSpan() reading from RVA static data, with zero heap allocation.
Fix #2: Cache SerializedAtn (same 2 files + XPathLexer):
Fix #7: Token list initial capacity (1 file):
|
@kaby76 - give Claude another go at the current state, see if it has anything it believes must be changed. After that, I can fork out the OptimizedToken and Span stream classes. |
Again, we need to have Copilot turned on for this repo. I cross-check AI blather with alternate AI. |
Restores the 'is' pattern check before setting TokenIndex, preventing InvalidCastException for custom ITokenFactory implementations that return tokens not implementing IWritableToken. Signed-off-by: Harry Cordewener <admin@twilightdays.org>
Regarding the two remaining items Claude flagged:
IWritableToken guard
Agreed, restoring this is the right call. I'll put the is check back:
if (t is IWritableToken wt) wt.TokenIndex = tokens.Count;
The isinst IL instruction compiles to a single MT pointer comparison (~1 cycle). On a 100K-token file that is on the order of 100K cycles, i.e. tens of microseconds in total: effectively zero. The correctness argument is what matters: anyone with a custom ITokenFactory returning tokens that don't implement IWritableToken would get an InvalidCastException at runtime without the guard. That's a behavioral contract violation we shouldn't ship. I checked GitHub: most ecosystem consumers (TypeCobol, SharpMUSH, IS4Code/Sona, and even the vendored copies in microsoft-ui-xaml, Azure Cosmos, XSharp) use the if (t is IWritableToken ...) pattern. The few that direct-cast (sql-bi/Vpax-Obfuscator, dotnet/runtime ilasm) happen to use tokens that do implement the interface, but we shouldn't rely on that assumption for all consumers.
IntervalSet.intervals:
…asses SpanInputStream variants, OptimizedToken, and OptimizedTokenFactory (with their tests) have been extracted to the dedicated branch feature/csharp-extra-classes. Removing them here to keep this branch focused on modifications to existing runtime files. Signed-off-by: Harry Cordewener <admin@twilightdays.org>
Restructuring Note
I have extracted the purely additive classes (SpanInputStream variants, OptimizedToken, OptimizedTokenFactory, and their tests) into a separate PR: #4941. This PR now focuses exclusively on modifications to existing runtime types:
The two PRs are independent and can be reviewed/merged in either order. #4941 adds no modifications to existing files; this PR contains no new standalone classes.
Goal
Summary
This PR adds a suite of opt-in, additive performance optimizations to the C# runtime. No existing public APIs are removed or changed in a breaking way (with exception of sealed classes, see below). All new types implement the same ICharStream/IToken/ITokenSource interfaces the existing code uses, so they are drop-in replacements at the call site.
.NET 4.5 will not see much of these benefits.

Changes

New: CharSpanInputStream
A replacement for AntlrInputStream backed by char[] rather than copying into the base class's internal structure. Intended for inputs arriving from TextReader, Stream, or a pre-built char[].
- new CharSpanInputStream(char[], length): zero-copy construction, ~40 B allocated vs 2 KB for AntlrInputStream.
- Seek() is O(1), just _index = value. AntlrInputStream.Seek() loops forward.
- Data property exposes a ReadOnlySpan<char> slice for zero-alloc text access.
- GetTextSpan(Interval): zero-alloc ReadOnlySpan<char> alternative to GetText.
- net8.0 (direct ICharStream, no virtual dispatch) and netstandard2.x (extends BaseInputCharStream).

New: StringSpanInputStream
A replacement for AntlrInputStream backed directly by a string. Intended for the common case of parsing an in-memory string.
- new StringSpanInputStream(string): stores the reference, zero copy, 40 B allocated.
- _length field (JIT BCE patterns).
- ToString() returns the original string reference: zero alloc.
- GetText uses Substring() directly, which shares the same BCL path as new string(ReadOnlySpan<char>) (dotnet/runtime source).

New: OptimizedToken / OptimizedTokenFactory
An IToken implementation with cached Text and a [StructLayout(Sequential)] field order to reduce GC header overhead. OptimizedTokenFactory.Default is a singleton drop-in for CommonTokenFactory.Default.

Micro-optimizations on existing types
- ArrayList<T>.Equals: zero-alloc equality via span comparison.
- IntervalSet: constructor param narrowed from IList<Interval> to List<Interval> where the implementation already required it.
- ATNConfig hash: no change (ATNConfigSet caches at the set level; per-config caching was benchmarked and found to regress).

Benchmarks
A new tests/benchmarks/ project (BenchmarkDotNet) provides head-to-head comparisons. Run with:

Benchmark Results
Hardware: Intel Core Ultra 7 265F, .NET 10.0.8, RyuJIT AVX2. All ratios vs AntlrInputStream.

Construction
AntlrInputStream, CharSpanInputStream(char[]), StringSpanInputStream, CodePointCharStream
CharSpanInputStream and StringSpanInputStream construction cost is flat, independent of input length. AntlrInputStream and CodePointCharStream scale linearly because they copy the entire input at construction.

ConsumeAll (simulated lexer hot loop)
AntlrInputStream, CharSpanInputStream, StringSpanInputStream, CodePointCharStream
StringSpanInputStream is the standout: 3.6× faster than AntlrInputStream at 100 k chars, zero allocation. The zero-alloc result is because the string already lives on the heap: the stream object is the only allocation and the benchmark reuses it. CharSpanInputStream matches AntlrInputStream in the hot loop (same char[] indexer speed) but saves the construction allocation when built from a pre-built array.

Seek
AntlrInputStream, CharSpanInputStream, StringSpanInputStream, CodePointCharStream
StringSpanInputStream seek is O(1) and completely flat, purely _index = value.

GetText
AntlrInputStream, StringSpanInputStream, CodePointCharStream
StringSpanInputStream.GetText is 3.5× faster and 3.4× less memory at 1 k. At 100 k it is slower in time (but still uses 3× less memory) because the extracted text crosses the LOH threshold (~85,000 bytes), where .NET switches from bump-pointer to free-list allocation and batches memmove in 16 KB chunks with GC polls between them. This is a BCL constraint, not a code issue. The GetTextSpan() method avoids this entirely for callers that can consume a span.

LookBack
AntlrInputStream, CharSpanInputStream, StringSpanInputStream, CodePointCharStream
StringSpanInputStream lookback is flat regardless of input size.

Design Decisions

Why char[] for CharSpanInputStream, not string or ReadOnlyMemory<char>?
string doesn't support MemoryMarshal.GetArrayDataReference and can't be used as TextReader/Stream source. ReadOnlyMemory<char> has no indexer: hot-path reads require .Span property access, which constructs a new ReadOnlySpan<char> struct on every call. char[] gives a direct JIT intrinsic indexer with full span support. See Span<T> design notes.

Why string for StringSpanInputStream?
For string inputs, ToCharArray() (used by AntlrInputStream and CharSpanInputStream) copies every character. Storing the string reference directly is zero-copy. The string indexer is devirtualized by the JIT and compiles to the same ldelem instruction as array access. The immutable string length enables stronger bounds-check elimination than a mutable _length field. Substring() and new string(ReadOnlySpan<char>) share the same BCL code path (FastAllocateString + Buffer.Memmove) so there is no advantage to the span ctor for GetText.

Why not MemoryMarshal.GetArrayDataReference + Unsafe.Add?
The (uint)i < (uint)_length guard is the standard pattern the JIT is specifically trained to recognise. The branchless >> 31 sign-bit trick and the Unsafe pointer path were evaluated and benchmarked; both either matched or regressed vs the simple if/else + _data[pos]. The branch predictor handles LA(1) (always positive) perfectly, making branches essentially free. Unsafe adds maintenance cost with no measured gain.

Why not seal SingletonPredictionContext?
EmptyPredictionContext inherits from it. Sealing would break the existing class hierarchy.

Public API compatibility
All existing types (AntlrInputStream, CommonToken, CommonTokenFactory, etc.) are untouched. New types are purely additive. CharSpanInputStream and StringSpanInputStream implement ICharStream; OptimizedToken implements IToken; OptimizedTokenFactory implements ITokenFactory. Any call site that accepts an interface can adopt these with no other changes.

Testing
- tests/perf-optimizations/, covering all ICharStream operations, parity with AntlrInputStream, edge cases (empty input, EOF, unicode BMP, seek/reset), and integration scenarios.
- AntlrInputStream's behaviour. Supplementary code points (U+10000+) are two positions, same as AntlrInputStream. Use CodePointCharStream (via CharStreams.fromString) for grammars that target supplementary characters.
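
The unsigned-compare guard and O(1) seek discussed under Design Decisions can be sketched on a string-backed stream as follows. `StringStreamSketch` is a hypothetical, heavily reduced type that does not implement ICharStream; the `Eof` constant mirrors the runtime's EOF value of -1, and clamping in `Seek` is an assumption of this sketch.

```csharp
public sealed class StringStreamSketch
{
    public const int Eof = -1;      // assumption: mirrors IntStreamConstants.EOF
    private readonly string _data;
    private int _index;

    public StringStreamSketch(string data) => _data = data ?? "";

    public int Index => _index;
    public int Size => _data.Length;

    // O(1) seek: a clamped assignment, with no Consume() loop.
    public void Seek(int index) =>
        _index = index < 0 ? 0 : (index > _data.Length ? _data.Length : index);

    // LA(1) is the current char, LA(-1) the previous one, LA(0) is undefined (returns 0).
    public int LA(int i)
    {
        if (i == 0) return 0;
        int pos = i > 0 ? _index + i - 1 : _index + i;

        // A single unsigned comparison rejects both negative and past-the-end
        // positions, and is a shape the JIT can use to elide the bounds check.
        if ((uint)pos < (uint)_data.Length)
            return _data[pos];
        return Eof;
    }
}
```

Casting to `uint` turns any negative position into a very large value, so one comparison covers both out-of-range directions.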