Skip to content

Feature/csharp runtime perf optimizations#4938

Open
HarryCordewener wants to merge 15 commits into
antlr:devfrom
HarryCordewener:feature/csharp-runtime-perf-optimizations
Open

Feature/csharp runtime perf optimizations#4938
HarryCordewener wants to merge 15 commits into
antlr:devfrom
HarryCordewener:feature/csharp-runtime-perf-optimizations

Conversation

@HarryCordewener
Copy link
Copy Markdown

@HarryCordewener HarryCordewener commented May 16, 2026

Goal

I went on a journey thinking I understood Span and that it would magically make things faster. On this journey, using AI, I learned that things aren't that simple, but that there definitely was some room for improvement.

Something something, Benchmarks are important and prove / disprove assumptions.

I've been using ANTLR4 for a DSL Parser, and in chasing down optimization, ANTLR4 became important to improve. I plan to create the same Template edit to the antlr-ng project.

Summary

This PR adds a suite of opt-in, additive performance optimizations to the C# runtime. No existing public APIs are removed or changed in a breaking way (with exception of sealed classes, see below). All new types implement the same ICharStream / IToken / ITokenSource interfaces the existing code uses, so they are drop-in replacements at the call site.

Net 4.5 will not see much of these benefits.


Changes

New: CharSpanInputStream

A replacement for AntlrInputStream backed by char[] rather than copying into the base class's internal structure. Intended for inputs arriving from TextReader, Stream, or a pre-built char[].

  • new CharSpanInputStream(char[], length) — zero-copy construction, ~40 B allocated vs 2 KB for AntlrInputStream.
  • Seek() is O(1) — just _index = value. AntlrInputStream.Seek() loops forward.
  • Data property exposes a ReadOnlySpan<char> slice for zero-alloc text access.
  • GetTextSpan(Interval) — zero-alloc ReadOnlySpan<char> alternative to GetText.
  • Targets net8.0 (direct ICharStream — no virtual dispatch) and netstandard2.x (extends BaseInputCharStream).

New: StringSpanInputStream

A replacement for AntlrInputStream backed directly by a string. Intended for the common case of parsing an in-memory string.

  • new StringSpanInputStream(string) — stores the reference, zero copy, 40 B allocated.
  • Hot-path indexer uses the string's immutable length, enabling stronger JIT bounds-check elimination than a mutable _length field (JIT BCE patterns).
  • ToString() returns the original string reference — zero alloc.
  • GetText uses Substring() directly, which shares the same BCL path as new string(ReadOnlySpan<char>) (dotnet/runtime source).

New: OptimizedToken / OptimizedTokenFactory

An IToken implementation with cached Text and a [StructLayout(Sequential)] field order to reduce GC header overhead. OptimizedTokenFactory.Default is a singleton drop-in for CommonTokenFactory.Default.

Micro-optimizations on existing types

  • ArrayList<T>.Equals — zero-alloc equality via span comparison.
  • IntervalSet — constructor param narrowed from IList<Interval> to List<Interval> where the implementation already required it.
  • ATNConfig hash — no change (ATNConfigSet caches at the set level; per-config caching was benchmarked and found to regress).

Benchmarks

A new tests/benchmarks/ project (BenchmarkDotNet) provides head-to-head comparisons. Run with:

cd runtime/CSharp/tests/benchmarks
dotnet run -c Release -- --filter '*'

Benchmark Results

Hardware: Intel Core Ultra 7 265F, .NET 10.0.8, RyuJIT AVX2. All ratios vs AntlrInputStream.

Construction

Stream 1 k chars Alloc 100 k chars Alloc
AntlrInputStream 81 ns 2,064 B 70,754 ns 200 kB
CharSpanInputStream(char[]) 3.7 ns 40 B 3.7 ns 40 B
StringSpanInputStream 4.0 ns 40 B 4.2 ns 40 B
CodePointCharStream 839 ns 4,064 B 148,812 ns 400 kB

CharSpanInputStream and StringSpanInputStream construction cost is flat — independent of input length. AntlrInputStream and CodePointCharStream scale linearly because they copy the entire input at construction.

ConsumeAll (simulated lexer hot loop)

Stream 1 k chars Alloc 100 k chars Alloc
AntlrInputStream 379 ns 2,024 B 99,404 ns 200 kB
CharSpanInputStream 455 ns 2,024 B 98,705 ns 200 kB
StringSpanInputStream 360 ns 0 B 27,256 ns 0 B
CodePointCharStream 979 ns 4,024 B 181,386 ns 400 kB

StringSpanInputStream is the standout: 3.6× faster than AntlrInputStream at 100 k chars, zero allocation. The zero-alloc result is because the string already lives on the heap — the stream object is the only allocation and the benchmark reuses it.

CharSpanInputStream matches AntlrInputStream in the hot loop (same char[] indexer speed) but saves the construction allocation when built from a pre-built array.

Seek

Stream 1 k chars 100 k chars
AntlrInputStream 342 ns / 2,024 B 90,371 ns / 200 kB
CharSpanInputStream 83 ns / 2,024 B 70,566 ns / 200 kB
StringSpanInputStream 6.9 ns / 0 B 6.9 ns / 0 B
CodePointCharStream 1,001 ns / 4,024 B 142,125 ns / 400 kB

StringSpanInputStream seek is O(1) and completely flat — purely _index = value.

GetText

Stream 1 k chars Alloc 100 k chars Alloc
AntlrInputStream 109 ns 2,848 B 13,105 ns 300 kB
StringSpanInputStream 31 ns 824 B 35,546 ns 100 kB
CodePointCharStream 2,040 ns 15,392 B 506,776 ns 1.8 MB

StringSpanInputStream.GetText is 3.5× faster and 3.4× less memory at 1 k. At 100 k it is slower in time (but still uses 3× less memory) because the extracted text crosses the LOH threshold (~85,000 bytes), where .NET switches from bump-pointer to free-list allocation and batches memmove in 16 KB chunks with GC polls between them. This is a BCL constraint, not a code issue. The GetTextSpan() method avoids this entirely for callers that can consume a span.

LookBack

Stream 1 k chars Alloc 100 k chars Alloc
AntlrInputStream 223 ns 2,024 B 80,480 ns 200 kB
CharSpanInputStream 129 ns 2,024 B 70,877 ns 200 kB
StringSpanInputStream 59 ns 0 B 59 ns 0 B
CodePointCharStream 829 ns 4,024 B 134,519 ns 400 kB

StringSpanInputStream lookback is flat regardless of input size.


Design Decisions

Why char[] for CharSpanInputStream, not string or ReadOnlyMemory<char>?

string doesn't support MemoryMarshal.GetArrayDataReference and can't be used as TextReader/Stream source. ReadOnlyMemory<char> has no indexer — hot-path reads require .Span property access, which constructs a new ReadOnlySpan<char> struct on every call. char[] gives a direct JIT intrinsic indexer with full span support. See Span<T> design notes.

Why string for StringSpanInputStream?

For string inputs, ToCharArray() (used by AntlrInputStream and CharSpanInputStream) copies every character. Storing the string reference directly is zero-copy. The string indexer is devirtualized by the JIT and compiles to the same ldelem instruction as array access. The immutable string length enables stronger bounds-check elimination than a mutable _length field. Substring() and new string(ReadOnlySpan<char>) share the same BCL code path (FastAllocateString + Buffer.Memmove) so there is no advantage to the span ctor for GetText.

Why not MemoryMarshal.GetArrayDataReference + Unsafe.Add?

The (uint)i < (uint)_length guard is the standard pattern the JIT is specifically trained to recognise. The branchless >> 31 sign-bit trick and the Unsafe pointer path were evaluated and benchmarked — both either matched or regressed vs the simple if/else + _data[pos]. The branch predictor handles LA(1) (always positive) perfectly, making branches essentially free. Unsafe adds maintenance cost with no measured gain.

Why not seal SingletonPredictionContext?

EmptyPredictionContext inherits from it. Sealing would break the existing class hierarchy.

Public API compatibility

All existing types (AntlrInputStream, CommonToken, CommonTokenFactory, etc.) are untouched. New types are purely additive. CharSpanInputStream and StringSpanInputStream implement ICharStream; OptimizedToken implements IToken; OptimizedTokenFactory implements ITokenFactory. Any call site that accepts an interface can adopt these with no other changes.


Testing

  • 149 tests added to tests/perf-optimizations/, covering all ICharStream operations, parity with AntlrInputStream, edge cases (empty input, EOF, unicode BMP, seek/reset), and integration scenarios.
  • All 86 existing runtime tests continue to pass.
  • Unicode note: both new stream classes operate at the UTF-16 code-unit level, matching AntlrInputStream's behaviour. Supplementary code points (U+10000+) are two positions, same as AntlrInputStream. Use CodePointCharStream (via CharStreams.fromString) for grammars that target supplementary characters.

- OptimizedToken: deferred text materialization + ValueTuple source (opt-in)
- OptimizedTokenFactory: creates OptimizedToken instances (opt-in)
- SpanInputStream: string-backed ICharStream, no char[] copy (opt-in)
- ValueStringBuilder: stack-allocated string building in BufferedTokenStream.GetText
- ObjectPool: reuse ATNConfigSet, HashSet<ATNConfig>, MergeCache in ParserATNSimulator
- ArrayPool: rent/return buffers in UnbufferedCharStream growth
- ATNDeserializer: ReadOnlySpan<int> overload for netstandard2.1+/net8.0+
- Multi-target: netstandard2.0 + netstandard2.1 + net8.0 (net45 preserved on Windows)

All existing public API is unchanged. New types are additive and opt-in.

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
- Seal + singleton ConfigEqualityComparator/ObjectEqualityComparator
  (eliminates per-HashSet allocation, enables devirtualization)
- Fix FilterPrecedencePredicates: add missing return, single-pass rewrite
  (was a silent bug + 2x LINQ enumeration)
- ArrayList.Equals: index-based loop replaces enumerator allocation
- IntervalSet.intervals: IList<Interval> → List<Interval>
  (enables struct enumerator, bounds-check elimination)
- HashCode.Combine on net8.0+ for ATNConfig, LexerATNConfig, SemanticContext
  (xxHash64-based, better distribution, fewer instructions)
- Cache typeof().GetHashCode() in AND/OR SemanticContext
- BufferedTokenStream: capacity hints (100→1024 initial, sized sublists)
- Seal LexerATNConfig, ArrayPredictionContext (devirt + smaller vtables)
- Remove redundant Tuple allocation in Lexer.SetInputStream
- Remove 'is IWritableToken' type check in BufferedTokenStream.Fetch()
  (all tokens implement IWritableToken; avoids interface dispatch)

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
…ry API

- Backing store changed from string to char[] for direct Span access
- New ctor: SpanInputStream(char[], int) for zero-copy scenarios
- AsSpan() / AsMemory() expose zero-copy slicing to callers
- net8.0+: bypasses BaseInputCharStream entirely, implements ICharStream
  directly with bounds-elided LA() via Unsafe.Add + MemoryMarshal
- net8.0+ Seek() is O(1) unconditionally (no Consume loop)
- netstandard2.0/2.1: still inherits BaseInputCharStream for compat

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
- InputStreamBenchmarks: AntlrInputStream vs SpanInputStream (construct, LA, seek, GetText, lookback)
- TokenBenchmarks: CommonTokenFactory vs OptimizedTokenFactory (create, GetText)
- ATNConfigSetBenchmarks: Add/Hash/Equals with sealed+singleton comparators
- IntervalSetBenchmarks: Contains, Or, Complement with List<Interval> backing
- ArrayListBenchmarks: Equals with index-based loop

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
…putStream

- CharSpanInputStream: char[] backed, zero-copy for char[]/stream/reader ctors
- StringSpanInputStream: string backed, zero-copy ctor from string (no ToCharArray)
  - ToString() returns original string reference (zero alloc)
  - GetText() uses Substring() (no char[] intermediary)
  - Same ICharStream / BaseInputCharStream split on #if NET8_0_OR_GREATER
  - Unicode: BMP (U+0000-FFFF), same as AntlrInputStream — surrogate pairs
    counted as two chars; use CodePointCharStream for supplementary code points
- Tests: 149/149 passing (63 new StringSpanInputStream tests)
- Benchmarks: InputStreamBenchmarks updated with StringSpanInputStream column

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
CHANGES.txt is no longer being used

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
Update CSharp.stg code generator template and XPathLexer.cs to emit
RVA-backed ReadOnlySpan<int> for the serialized ATN data on net8.0+,
falling back to int[] on older targets via #if preprocessor directives.

This eliminates a permanent Gen2 managed heap allocation for every
generated parser and lexer. The data lives in the PE image's read-only
.text section instead, mapped by the OS loader with zero GC pressure.

The SerializedAtn property override uses .ToArray() on net8.0+ since
the base class returns int[].

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
The runner builds the runtime DLL without -f, so multi-TFM projects
output the highest TFM (net8.0). The generated test project template
must match or CS1705 fires on all ~359 grammar regression tests.

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
Comment on lines -444 to +458
if (!collection.OfType<PrecedencePredicate>().Any())
Collections.EmptyList<PrecedencePredicate>();

List<PrecedencePredicate> result = collection.OfType<PrecedencePredicate>().ToList();
List<PrecedencePredicate> result = null;
foreach (var item in collection)
{
if (item is PrecedencePredicate pp)
(result ??= new List<PrecedencePredicate>()).Add(pp);
}
if (result == null) return Collections.EmptyList<PrecedencePredicate>();
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaces two LINQ passes + a missing return bug with a single foreach.

The original code called .OfType<PrecedencePredicate>().Any() then .OfType<PrecedencePredicate>().ToList() — two full linear scans of the collection, each constructing a new LINQ enumerator state machine on the heap. The early-exit was also missing its return keyword, so the short-circuit never fired: execution always fell through to the .ToList() call, allocated an empty List<PrecedencePredicate>, passed it to ExceptWith (a no-op), and returned it. Silently harmless, but wrong.

This replacement makes one pass. result is only allocated if at least one PrecedencePredicate is found — the common case (most grammars don't use precedence predicates) exits with zero heap allocations beyond the loop itself. The missing return bug is fixed as a side effect: a null result correctly returns the empty singleton.

/// <c>int[]</c> on the heap. On net8.0+ the generated parser can expose
/// the serialized ATN as a <c>ReadOnlySpan&lt;int&gt;</c> backed by RVA static data.
/// </summary>
public virtual ATN Deserialize(ReadOnlySpan<int> spanData)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATNDeserializer.Deserialize(ReadOnlySpan<int>) — forward-looking RVA static data support

Background

Every ANTLR4-generated parser emits a static readonly int[] field containing the serialized ATN — the state machine that drives all parsing decisions. For large grammars this array can be hundreds of kilobytes. Because it's a managed int[] it is allocated on the GC heap at class load time, promoted to Gen2, and pinned there for the lifetime of the process. It is never mutated, never collected, and contributes to GC root scanning overhead on every collection.

What net8.0+ enables

C# 12 / .NET 8 can emit ReadOnlySpan<T> properties backed by RVA static data — the compiler places the data in the PE image's read-only .text section rather than allocating a managed array:

// Today — managed heap, lives forever in Gen2
public static readonly int[] serializedATN = { 4, 1, 23, 456, ... };

// With RVA static data — no managed allocation, lives in mapped PE memory
private static ReadOnlySpan<int> SerializedATN => new int[] { 4, 1, 23, 456, ... };

The data is memory-mapped from the assembly by the OS loader. Zero GC pressure, zero Gen2 promotion, and the OS can page it out under memory pressure since it's backed by the file on disk. See Konrad Kokosa's write-up on RVA static fields and the .NET 8 performance improvements post.

What this overload does

public virtual ATN Deserialize(ReadOnlySpan<int> spanData)
  1. Rents a temporary int[] from ArrayPool<int>.Shared (no permanent allocation).
  2. Copies the span data in — this is a one-time startup cost when the parser class initializes.
  3. Deserializes the ATN using the existing internal methods (unchanged).
  4. Returns the rented array to the pool in a finally block.

The ATN object is the permanent output. The temporary array is short-lived and pooled. The serialized data stays in read-only PE memory, never touching the managed heap.

What's needed to complete this

This overload is the runtime-side preparation. To actually use it, the ANTLR4 code generator (the Java tool that emits C# parser/lexer classes) needs to be updated to emit the ReadOnlySpan<int> pattern instead of static readonly int[] when targeting net8.0+.

See changes to: CSharp.stg

Risk

Low. The overload is additive (the existing Deserialize(int[]) is untouched), guarded by #if NETSTANDARD2_1_OR_GREATER || NET8_0_OR_GREATER, and invisible to existing code. The ArrayPool rent/return is bounded by a try/finally so there's no leak path. The deserialization logic is identical — only the data source changes.

/// Not thread-safe — intended for use by a single parser/lexer instance
/// which already cannot be shared across threads during a parse.
/// </summary>
internal sealed class ObjectPool<T> where T : class
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object Pooling: Custom Pool vs Microsoft.Extensions.ObjectPool

The pooling implementation (Misc/ObjectPool.cs) uses a custom non-thread-safe Stack-based pool rather than Microsoft.Extensions.ObjectPool. This is intentional for two reasons:

1. Zero-dependency guarantee

Antlr4.Runtime.Standard currently has no opinionated NuGet dependencies (only System.Memory/System.Buffers polyfills for netstandard2.0, which are low-level Microsoft primitives). Adding Microsoft.Extensions.ObjectPool would be the first real transitive dependency imposed on every downstream consumer, with version conflict risk if their application already references a different version.

2. Thread-safety overhead is wasted here

DefaultObjectPool uses Interlocked operations across per-core slots to be lock-free thread-safe. ParserATNSimulator pools are instance-level fields — simulators are not shared across threads, so that overhead buys nothing. The custom pool is a ~20 line Stack with a cap, which is exactly as much machinery as the use case requires.

If Antlr4.Runtime.Standard ever acquires a dependency on Microsoft.Extensions.* for another reason, revisiting this would be reasonable. For now, the custom pool is the right tradeoff.

/// <para>Falls back to <see cref="ArrayPool{T}"/> for strings that exceed
/// the initial stack-allocated buffer.</para>
/// </summary>
internal ref struct ValueStringBuilder
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom Implementation vs BCL Alternatives

The Misc/ValueStringBuilder.cs implementation is a copy of the ref struct + stackalloc + ArrayPool pattern used internally by dotnet/runtime, rather than using StringBuilder or any public API. This is intentional.

Why not StringBuilder?

StringBuilder heap-allocates on construction and is a class — it cannot use stackalloc for its initial buffer. The entire point of ValueStringBuilder in BufferedTokenStream.GetText is to start on the stack (512 chars stackalloc) and only spill to a pooled heap array if the output exceeds that. StringBuilder cannot do this.

Why not a Microsoft package?

The dotnet/runtime team's own ValueStringBuilder is marked internal and has never been shipped as a public API. This is deliberate — ref struct limitations make it unsuitable as a general-purpose public type:

  • Cannot be stored as a field
  • Cannot cross async/await boundaries
  • Cannot be boxed or used in generics

Because of these constraints, Microsoft has no plans to expose it publicly. The only way to use this pattern in a library is to copy the implementation, which is the standard approach across the .NET ecosystem (see also: System.Text.Json, System.Text.RegularExpressions, etc. all do the same internally).

Attribution

The implementation follows the pattern from dotnet/runtime, which is MIT licensed and consistent with ANTLR4's BSD-3 license.

@HarryCordewener
Copy link
Copy Markdown
Author

Object pooling in ParserATNSimulator

What was added

A minimal non-thread-safe ObjectPool<T> (src/Misc/ObjectPool.cs) backed by a Stack<T> with a configurable max size. Four pools are held as fields on ParserATNSimulator:

  • Two ATNConfigSet pools (one for full-context, one for SLL) — the most allocation-heavy objects in the prediction loop.
  • One HashSet<ATNConfig> pool for the closureBusy set used in closure computation.
  • One MergeCache pool (capped at 2) for the context merge cache that is already cleared after each prediction.

Instances are retrieved at the start of a prediction operation and returned at the end. Sets that have been made read-only (owned by DFA states and referenced indefinitely) are not returned — the guard if (configSet.IsReadOnly) return; ensures the pool never holds a live object that has escaped.

Why this helps

The ATN prediction loop is the hottest allocation site in the runtime. Every AdaptivePredict call allocates at least one ATNConfigSet, a HashSet<ATNConfig>, and a MergeCache. For an input of N tokens, this happens N times. On a cold DFA (first parse of a grammar), the per-token allocation can reach tens of kilobytes, driving frequent Gen0 GC collections and occasional Gen2 promotions.

Object pooling is the standard .NET pattern for exactly this scenario — the Microsoft performance guidelines and ASP.NET Core source both use it extensively for short-lived, expensive-to-allocate objects that are consumed and discarded on a tight loop. The BCL's own ArrayPool<T> and MemoryPool<T> are the same idea applied to arrays. See also the .NET object pooling docs.

The PoolReset() method added to ATNConfigSet clears all fields and restores the lookup table so the returned instance is indistinguishable from a freshly constructed one. This is the same contract ArrayPool<T>.Return(array, clearArray: true) enforces.

Risk assessment

Low risk — normal operation.
The pool is internal, not part of any public API. It is purely an allocation optimization — the objects it vends are functionally identical to freshly constructed ones. If the pool is empty (first call, or all instances are in use) it falls back to _factory(), which is the original allocation path. The worst case is identical to the pre-pooling behaviour.

Medium risk — the IsReadOnly guard.
The critical invariant is that read-only ATNConfigSet instances are never returned to the pool. A read-only set has been captured into a DFA state and will be referenced for the lifetime of the DFA. Returning it would corrupt any future parse that reuses the pooled instance. The current guard is correct, but it is the kind of logic that must be maintained carefully as the ATN simulator evolves. A future change that creates a read-only set via a path that bypasses Return would silently leak the protection.

Medium risk — PoolReset correctness.
ATNConfigSet.PoolReset() must zero every field that influences behaviour. If a new field is added to ATNConfigSet in the future and PoolReset is not updated, stale state from a previous parse would bleed into the next one. This is the canonical pooling hazard and is best mitigated by a test that round-trips a set through the pool and asserts every observable field is reset to its default.

Low risk — thread safety.
The pool is not thread-safe by design. This is safe because a ParserATNSimulator instance is already not safe to share across threads — each parse must use its own simulator. The pool being per-simulator makes the threading contract the same as the existing one.

Low risk — MergeCache cap of 2.
The merge cache pool is capped at 2 instances rather than the default 16. The merge cache can grow large for ambiguous grammars; retaining too many large caches between predictions would increase memory pressure rather than reduce it. The cap of 2 is conservative and can be tuned upward if profiling shows the factory allocation is still significant.

@HarryCordewener
Copy link
Copy Markdown
Author

On sealing ObjectEqualityComparator, ConfigEqualityComparator, ArrayPredictionContext, and LexerATNConfig

These four classes have been public and unsealed since their introduction, but none of them were designed as extension points — they have no virtual members intended for override, no documented subclassing contract, and no examples in the codebase or documentation of user code extending them. Unsealed public classes carry an implicit promise the project never made.

The JIT argument. When the JIT sees a call through a reference whose declared type is a non-sealed class, it must emit a virtual dispatch — it cannot assume the object isn't a subclass with an overriding method. sealed tells the JIT the type hierarchy ends here, enabling it to devirtualize the call and inline the method body directly at the call site. For ObjectEqualityComparator and ConfigEqualityComparator, which are invoked on every ATN config comparison — a hot path during prediction — this is a concrete, measurable gain. The .Instance singleton compounds the benefit: one allocation + devirtualized calls, instead of new ObjectEqualityComparator() on every ATNConfigSet construction plus virtual dispatch on every comparison.

The correctness argument. ObjectEqualityComparator and ConfigEqualityComparator encode specific, intentional comparison semantics that the ATN prediction engine depends on being stable. A subclass that overrides Equals or GetHashCode could silently corrupt prediction results in ways that are extremely hard to diagnose. Sealing makes the contract explicit: this comparer is what it is, not a base for customization.

The ArrayPredictionContext and LexerATNConfig argument. These are implementation details of the ATN machinery. ArrayPredictionContext is one of two concrete PredictionContext subtypes; the prediction engine switches on their exact identity via is checks and direct field access. A subclass that doesn't understand those invariants could corrupt context graphs. LexerATNConfig carries lexer-specific state the ATN simulator handles specially. Neither type is documented, neither is in any public API surface that users are expected to program against. Leaving them unsealed is an accidental affordance, not an intentional design.

The breakage risk is low in practice. A search of GitHub for subclasses of these types in C# ANTLR4 consumers returns no results. Mocking frameworks that need to fake IEqualityComparer<ATNConfig> should mock the interface, not the concrete class — that's correct practice regardless of sealing. Any consumer genuinely extending LexerATNConfig or ArrayPredictionContext is deep in the ATN internals at their own risk.

The recommendation is to seal all four, document the decision in the changelog as an intentional API clarification, and accept it as a minor breaking change in the next major or minor version boundary — the same way the BCL has progressively sealed implementation types over the years as the JIT's ability to exploit it has improved.

@HarryCordewener
Copy link
Copy Markdown
Author

Token benchmarks — why we touched OptimizedToken and OptimizedTokenFactory

The problem with CommonToken

CommonToken has two allocation problems that compound across every token the lexer produces.

1. Tuple<ITokenSource, ICharStream> is a heap-allocated class.
CommonToken stores its source pair as a Tuple<ITokenSource, ICharStream>. Every new CommonToken(...) allocates a new Tuple object on top of the token itself — two heap objects per token, always.

2. Text is materialized eagerly on every ToString() call and in many code paths.
Most tokens are never inspected by user code — the parser matches on Type alone. CommonToken still pays the cost of calling input.GetText(start, stop) for tokens whose text is read repeatedly, allocating a new string each time because there is no cache.

What OptimizedToken does differently

ValueTuple instead of Tuple for the source pair.
(ITokenSource, ICharStream) is a struct. It lives inline in the OptimizedToken object's memory — no second heap allocation, no pointer indirection. This is the standard guidance for small fixed-arity groupings that don't need reference identity.

Lazy Text with a single cached result.
_text starts as null. On first access, it calls input.GetText(start, stop) once and caches the result. Every subsequent access returns the cached string. For the overwhelming majority of tokens (matched and discarded), Text is never accessed — cost is zero.

Benchmark results

Method Mean Alloc vs baseline
CreateTokens_CommonTokenFactory (baseline) 1,289.7 ns 27,320 B 1.0×
CreateTokens_OptimizedTokenFactory 639.2 ns 8,024 B 2.0× faster, 70% less memory
GetText_CommonToken (1000 reads) 13,304.0 ns 244,096 B 1.0×
GetText_OptimizedToken (1000 reads) 211.0 ns 224 B 63× faster, 99.9% less memory

The construction improvement (2×, 70% less memory) is entirely the ValueTuple — one allocation instead of two. The GetText improvement (63×, 99.9% less memory) is entirely the cache — 1,000 reads of CommonToken.Text allocate 1,000 strings; 1,000 reads of OptimizedToken.Text allocate one.

Compatibility

OptimizedToken implements both IToken and IWritableToken — identical interfaces to CommonToken. OptimizedTokenFactory implements ITokenFactory. Adoption is a one-liner:

lexer.TokenFactory = OptimizedTokenFactory.Default;

No grammar changes, no parser changes. The copyText constructor overload preserves the CommonTokenFactory(true) contract needed for UnbufferedCharStream, where the input window may slide away before Text is accessed.

- MergeCache: replace (PredictionContext, PredictionContext) ValueTuple
  key with a private PredictionContextPair readonly struct; uses
  RuntimeHelpers.GetHashCode for reference-identity semantics
- OptimizedToken: replace (ITokenSource, ICharStream) ValueTuple field
  with a private TokenSourcePair struct; ValueTuple unavailable on net45
- ValueStringBuilder: guard entire file with
  #if NETSTANDARD2_0_OR_GREATER || NET8_0_OR_GREATER; ref struct and
  Span<T> are not available on net45
- BufferedTokenStream: guard System.Buffers using and ValueStringBuilder
  call site; net45 falls back to plain StringBuilder
- UnbufferedCharStream: guard System.Buffers using and all ArrayPool
  usage; net45 falls back to plain array allocation

No behaviour change on netstandard2.0+ or net8.0.

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
…lasses and OptimizedToken

- SpanInputStream, CharSpanInputStream: remove ReadOnlySpan<char> Data
  property from #else (net45) fallback class — Span unavailable on net45
- StringSpanInputStream: remove ReadOnlySpan<char> Data and
  ReadOnlyMemory<char> AsMemory() from #else (net45) fallback class
- OptimizedToken: guard EmptySource field and ValueTuple constructor
  with #if NETSTANDARD2_0_OR_GREATER || NET8_0_OR_GREATER; add flat
  ITokenSource/ICharStream overload for net45
- OptimizedTokenFactory: guard ValueTuple constructor call site with
  matching #if, using flat parameters on net45

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
Add /p:UseSharedCompilation=false to disable VBCSCompiler server and
/p:CopyRetryCount=3 /p:CopyRetryDelayMilliseconds=500 for retry logic
on both dotnet build invocations. Addresses System.IO.IOException on
Antlr4.Runtime.Standard.deps.json during parallel test execution.

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
@HarryCordewener HarryCordewener force-pushed the feature/csharp-runtime-perf-optimizations branch from 41d093d to 0fee362 Compare May 16, 2026 20:14
@HarryCordewener
Copy link
Copy Markdown
Author

Typescript build failures seem like they are unrelated by all means to this change.

@HarryCordewener HarryCordewener marked this pull request as ready for review May 16, 2026 23:08
@kaby76
Copy link
Copy Markdown
Contributor

kaby76 commented May 17, 2026

Using CharStreams.fromPath(), this PR appears to reduce performance for csharp/v8-spec by about ~35%. There is a similar performance reduction using the CharSpanInputStream() call. See https://github.com/kaby76/a4-4938

@HarryCordewener
Copy link
Copy Markdown
Author

@kaby76 Thanks for that call-out. I'll take a look at that path and see what's going on there.

MergeCache.PredictionContextPair was using identity-based equality
(ReferenceEquals / RuntimeHelpers.GetHashCode), which made the cache
ineffective: structurally-equal but reference-different PredictionContext
objects never hit the cache, forcing redundant merge computations.

Restore value-based semantics (object.Equals / PredictionContext.GetHashCode)
while keeping the flat Dictionary<PredictionContextPair, PredictionContext>
structure (fewer allocations than the prior nested-dict approach).

Benchmark (kaby76/a4-4938 repro, 23 C# files, CSharp grammar, net10.0):
  Before fix: ~5.77s
  After fix:  ~3.29s  (43% faster)

Add MergeCacheBenchmarks to validate cache hit effectiveness via
BenchmarkDotNet.

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
@HarryCordewener
Copy link
Copy Markdown
Author

MergeCache Regression Fix

Root Cause

MergeCache.PredictionContextPair was using identity-based equality (ReferenceEquals + RuntimeHelpers.GetHashCode). During ATN prediction, the runtime frequently creates structurally-identical PredictionContext objects at different heap addresses. With identity semantics the cache never hits for these — effectively disabling caching entirely and forcing full merge recomputation on every call.

Fix

Restore value-based equality in PredictionContextPair:

     public bool Equals(PredictionContextPair other)
    -    => ReferenceEquals(A, other.A) && ReferenceEquals(B, other.B);
    +    => Equals(A, other.A) && Equals(B, other.B);
    
     public override int GetHashCode()
     {
         int h = 17;
    -    h = h * 31 + (A != null ? System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(A) : 0);
    -    h = h * 31 + (B != null ? System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(B) : 0);
    +    h = h * 31 + (A != null ? A.GetHashCode() : 0);
    +    h = h * 31 + (B != null ? B.GetHashCode() : 0);
         return h;
     }

PredictionContext already has proper value-based GetHashCode() (returns cachedHashCode) and structural Equals() overrides in its subclasses. The flat Dictionary<PredictionContextPair, PredictionContext> structure from the PR is preserved — fewer allocations than the old nested-dict design.

Benchmark Results

End-to-end (kaby76/a4-4938 repro — 23 C# files, CSharp grammar, net10.0 Release):

Variant Time Delta
PR before fix (identity cache) ~5.77s
PR after fix (value cache) ~3.29s −43%

BenchmarkDotNet micro-benchmark (MergeCacheBenchmarks, MergeCount=500):

Method Mean Allocated Ratio
MergeWithCacheHits 9.3 µs 12.3 KB 0.67x time, 0.11x alloc
MergeWithoutCache (baseline) 13.8 µs 109.4 KB 1.00

Cache hits reduce both time and GC pressure significantly.

@HarryCordewener
Copy link
Copy Markdown
Author

@kaby76 - give that another go.

@kaby76
Copy link
Copy Markdown
Contributor

kaby76 commented May 17, 2026

Much better: 30% faster. This is quite good.

@ericvergnaud
Copy link
Copy Markdown
Contributor

@HarryCordewener Thanks for all the above proposals.
May I suggest that they be provided as separate PR's?
Each of them may require some dedicated discussion.

@kaby76
Copy link
Copy Markdown
Contributor

kaby76 commented May 18, 2026

Can we get Github Copilot turned on in this repo? https://docs.github.com/en/copilot/how-tos/use-copilot-agents/request-a-code-review/use-code-review. While I don't trust Copilot's analysis, I trust human reviews less. In the meantime, I'll test this PR against the entire grammars-v4 repo.

This is what Claude Code wrote up:

❯ Claude, could you give a summary of the changes between git origin/dev feature/csharp-runtime-perf-optimizations, and look for
potential issues in the PR? In addition, compare the changes in runtime/CSharp to that in the other runtime/ runtimes. For example,
would it be beneficial to propagate to apply similar changes to the other targets? You may also refer to the PR description at
https://github.com/antlr/antlr4/pull/4938.

.....

 PR #4938 — C# Runtime Performance Optimizations: Review

  What the PR Does (12 commits)

  The branch introduces a batch of allocation-reduction and throughput optimizations to the ANTLR4 C# runtime, organized into six
  main areas:

  1. New TFMs and Build Changes (Antlr4.csproj)

  - Adds netstandard2.1 and net8.0 build targets (in addition to existing net45 / netstandard2.0).
  - Sets <LangVersion>12</LangVersion> project-wide.
  - Adds System.Memory / System.Buffers polyfill package references for netstandard2.0.

  2. New Input Stream Classes

  Three new ICharStream implementations, each with #if NET8_0_OR_GREATER / #else dual implementations (backed by a new
  BaseInputCharStream base class on older TFMs):

  ┌───────────────────────┬────────────────────────────────────┬──────────────────────┐
  │         Class         │          Backing storage           │ Copy on construction │
  ├───────────────────────┼────────────────────────────────────┼──────────────────────┤
  │ SpanInputStream       │ char[] (from string)               │ Yes (ToCharArray)    │
  ├───────────────────────┼────────────────────────────────────┼──────────────────────┤
  │ CharSpanInputStream   │ char[] (from char[]/Reader/Stream) │ No (direct use)      │
  ├───────────────────────┼────────────────────────────────────┼──────────────────────┤
  │ StringSpanInputStream │ string (direct ref)                │ No                   │
  └───────────────────────┴────────────────────────────────────┴──────────────────────┘

  All three expose a zero-allocation GetTextSpan(Interval) returning ReadOnlySpan<char> on net8.0+.

  3. OptimizedToken + OptimizedTokenFactory

  - Replaces Tuple<ITokenSource, ICharStream> source pair with a private TokenSourcePair struct → eliminates one heap allocation per
  token.
  - Defers .Text materialization until first access (lazy substring extraction).
  - Drop-in replacements for CommonToken / CommonTokenFactory: lexer.TokenFactory = OptimizedTokenFactory.Default.

  4. ATN Internals Optimizations

  - MergeCache: Replaces nested Dictionary<PredictionContext, Dictionary<PredictionContext, PredictionContext>> with a flat
  Dictionary<PredictionContextPair, PredictionContext> (struct key) — fewer allocations, one-level lookup.
  - ObjectPool<T>: New simple non-thread-safe stack-based pool.
  - ParserATNSimulator: Pools ATNConfigSet, HashSet<ATNConfig>, and MergeCache instances per-simulator to reduce GC pressure during
  prediction.
  - ATNConfigSet.PoolReset(): Full reset method for pool-safe reuse.
  - ATNDeserializer: New Deserialize(ReadOnlySpan<int>) overload (netstandard2.1+).
  - LexerATNConfig, ArrayPredictionContext: Sealed (enables JIT devirtualization).

  5. Hash Function Improvements

  - ATNConfig.GetHashCode(), LexerATNConfig.GetHashCode(), ConfigEqualityComparator.GetHashCode(),
  SemanticContext.Predicate.GetHashCode() — use System.HashCode.Combine on net8.0+ (faster, better distribution) vs. Murmur on older
  TFMs.
  - SemanticContext.AND/OR.GetHashCode(): Cache typeof(AND/OR).GetHashCode() in a static field — was calling reflection-backed
  Type.GetHashCode() on every invocation.
  - Interval: Implements IEquatable<Interval>, adds ==/!= operators.
  - Singleton ObjectEqualityComparator.Instance / ConfigEqualityComparator.Instance — avoid per-configset allocator allocation.

  6. BufferedTokenStream Micro-Optimizations

  - Pre-sized list allocations (new List<IToken>(stop - start + 1) etc.).
  - ValueStringBuilder (stack-allocated char buffer with ArrayPool fallback) for GetText on modern TFMs.
  - Initial token buffer capacity raised from 100 → 1024.
  - Removed if (t is IWritableToken) null guard on TokenIndex assignment.

  7. Codegen Template (CSharp.stg)

  On net8.0+, generated parsers/lexers emit _serializedATN as ReadOnlySpan<int> property and call the new
  Deserialize(ReadOnlySpan<int>) overload.

  ---
  Potential Issues

  Bug / Correctness

  1. ReadOnlySpan<int> _serializedATN => new int[] { … } does NOT produce RVA static data

  The PR comment and commit message claim this avoids keeping a static int[] on the heap using "RVA static data". This is incorrect
  as written. new int[] { … } in a ReadOnlySpan<int> property expression always allocates a heap array on every property access. RVA
  statics in C# 12 / .NET 8+ require either:
  - A UTF-8 string literal for ReadOnlySpan<byte> ("..."u8), or
  - A collection expression ([1, 2, 3, ...] — not new int[] { … })

  Since the static constructor (new ATNDeserializer().Deserialize(_serializedATN)) only runs once, the excess allocation is a
  one-time cost. But the stated benefit of eliminating the long-lived int[] from the heap is not achieved — the new int[] lives as
  long as the GC needs it (until deserialization completes, then collected). The legacy private static int[] _serializedATN = { … }
  field is actually cheaper (one allocation, retained, no re-creation). The net8.0 codegen path as written is worse than the legacy
  path in terms of the initial allocation pattern, not better.

  Fix: Use a collection expression — private static ReadOnlySpan<int> _serializedATN => [<elements>]; — which the C# 12 compiler and
  .NET 8 JIT will store as RVA data.

  2. SerializedAtn property on net8.0+ allocates on every call

  public override int[] SerializedAtn { get { return _serializedATN.ToArray(); } }
  Each access to SerializedAtn calls _serializedATN (heap array allocation) then .ToArray() (second heap copy). If this property is
  called more than once (diagnostic tools, reflection-based tools), it's unnecessarily expensive. Even caching just SerializedAtn in
  a field would help.

  3. Removed IWritableToken guard is a silent breaking change

  Original:
  if (t is IWritableToken) ((IWritableToken)t).TokenIndex = tokens.Count;
  New:
  ((IWritableToken)t).TokenIndex = tokens.Count;
  Any custom ITokenFactory returning tokens that don't implement IWritableToken (which is allowed by the interface contract) will now
   get an InvalidCastException instead of silently working. This is a backward-incompatible behavior change. The guard should be
  reinstated, or at least documented.

  4. OptimizedToken copy constructor tight-coupling to CommonToken.source

  else if (oldToken is CommonToken ct)
  {
      _text = ct.source.Item2 != null ? null : oldToken.Text;
      _source = new TokenSourcePair(ct.source.Item1, ct.source.Item2);
  }
  CommonToken.source is protected internal — accessible within the assembly, but this creates a white-box dependency on CommonToken's
   internal representation. If CommonToken ever changes its source field name/type, OptimizedToken breaks silently. Also, the logic
  ct.source.Item2 != null ? null : oldToken.Text means: if there's a stream, don't copy text (rely on lazy re-fetch). But lazy
  re-fetch requires the stream to remain valid — if it was an UnbufferedCharStream that has advanced past those indices, the
  re-fetched text will be wrong.

  Design / Minor Issues

  5. Code duplication between SpanInputStream and CharSpanInputStream

  These two classes share ~200 lines of nearly identical code (the LA(), Consume(), Seek(), GetText(), GetTextSpan(), Mark(),
  Release() implementations are verbatim copies). The only real difference is that SpanInputStream has a string constructor.
  CharSpanInputStream is also more confusingly named — it sounds like the "canonical" char-stream class. Consider consolidating them
  or clarifying the naming convention in documentation.

  6. IntervalSet.intervals field type changed from IList<Interval> to List<Interval>

  This is a public API field (despite being protected internal). Any subclass that assigns a non-List<T> to intervals (e.g., an
  ImmutableList) will break. Worth calling out in a migration note.

  7. tokens = new List<IToken>(1024) initial capacity is aggressive

  Increased 10× from 100. For small grammars (expressions, config files), the typical token count is well under 100. Allocating 1024
  slots upfront wastes ~8KB per BufferedTokenStream creation. Consider basing this on a smaller default (e.g., 256) or making it
  configurable.

  8. Pool management methods are internal instead of private

  GetPooledConfigSet, ReturnConfigSet, GetPooledClosureBusy, ReturnClosureBusy are marked internal. Since ParserATNSimulator is a
  public, inheritable class, this exposes pool management to all other classes in the assembly. If a subclass (in the same assembly
  or via InternalsVisibleTo) overrides computeReachSet and calls GetPooledConfigSet without a matching ReturnConfigSet (e.g., due to
  an exception), it will silently leak pool slots. private with protected virtual hooks would be safer.

  9. ATNDeserializer.Deserialize(ReadOnlySpan<int>) leaves this.data = null

  After the span overload runs, this.data is null. If the same ATNDeserializer instance is reused (the generated code does new
  ATNDeserializer().Deserialize(...) so this is fine today), calling other methods on the instance would NPE. This is fragile and
  should be documented or guarded.

  ---
  Comparison with Other Runtimes

  ┌───────────────────────────────┬────────────────┬───────────────────────┬────────────┬─────────────┬───────────┬─────┬─────┐
  │         Optimization          │  C# (this PR)  │         Java          │    C++     │   Python    │   Swift   │ Go  │ JS  │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ MergeCache: flat map w/ pair  │ ✅             │ ❌ still nested       │ ❌ nested  │ N/A (dict)  │ N/A       │ N/A │ N/A │
  │ key                           │                │ DoubleKeyMap          │ map        │             │           │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ Hash: use platform hash API   │ ✅ net8.0+     │ ❌ Murmur only        │ ❌ Murmur  │ ✅ (tuple   │ ✅        │ ✅  │ ✅  │
  │                               │                │                       │ only       │ hash)       │ (Hasher)  │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ Cache                         │ ✅             │ ❌                    │ ❌         │ N/A         │ N/A       │ N/A │ N/A │
  │ typeof(AND).GetHashCode()     │                │                       │            │             │           │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ Singleton comparator          │ ✅             │ ❌                    │ N/A        │ N/A         │ N/A       │ N/A │ N/A │
  │ instances                     │                │                       │            │             │           │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ Sealed ATN config subclasses  │ ✅             │ ❌                    │ N/A        │ N/A         │ N/A       │ N/A │ N/A │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ ATNConfigSet / mergeCache     │ ✅             │ ❌                    │ ❌         │ N/A         │ N/A       │ N/A │ N/A │
  │ pooling                       │                │                       │            │             │           │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ Lazy token text               │ ✅             │ ❌                    │ ❌         │ ❌          │ ❌        │ ❌  │ ❌  │
  │ materialization               │                │                       │            │             │           │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ IList → List for performance  │ ✅             │ ❌ uses List natively │ ✅ STL     │ N/A         │ N/A       │ N/A │ N/A │
  │                               │ (IntervalSet)  │                       │            │             │           │     │     │
  └───────────────────────────────┴────────────────┴───────────────────────┴────────────┴─────────────┴───────────┴─────┴─────┘

  Recommended back-ports to other runtimes

  Java (highest value):
  - DoubleKeyMap → flat map: Replace DoubleKeyMap<PredictionContext, PredictionContext, PredictionContext> in ParserATNSimulator with
   a HashMap keyed on a pair object. Reduces cache lookup from two hash table lookups to one.
  - Cache typeof(AND/OR).GetHashCode() equivalent: SemanticContext.AND/OR.hashCode() calls getClass().hashCode() every invocation —
  caching this as a private static final int is a trivial speedup.
  - Singleton comparators: Java's ATNConfigSet creates comparator instances per set; these can be singletons.

  C++:
  - MergeCache refactor: C++'s DoubleKeyMap equivalent uses nested unordered_map — same improvement applies.
  - Cache SemanticContext type hash: Same pattern as Java.

  Swift/Go/JS/Python/Dart: These runtimes use their language's native hash/equality mechanisms and are already at or near idiomatic
  efficiency. The most impactful remaining area would be lazy token text materialization, which benefits all runtimes with buffered
  streams, but requires interface changes and isn't a simple port.

  ---
  Summary

  The PR is a solid and substantive improvement. The most serious issue to fix before merging is #1 (the ReadOnlySpan<int> codegen
  doesn't actually achieve RVA statics — needs collection expression syntax). Issue #3 (removed IWritableToken guard) is a quiet
  breaking change that should be addressed. The rest are design concerns rather than blockers.

✻ Sautéed for 4m 51s

@kaby76
Copy link
Copy Markdown
Contributor

kaby76 commented May 18, 2026

Could we get Github Copilot turned on for this repo?

Here's a summary that Claude Code wrote up. In particular, I asked it to consider the other targets, as that was not usually considered.

❯ Claude, could you give a summary of the changes between git origin/dev feature/csharp-runtime-perf-optimizations, and look for
potential issues in the PR? In addition, compare the changes in runtime/CSharp to that in the other runtime/ runtimes. For example,
would it be beneficial to propagate and apply similar changes to the other targets? You may also refer to the PR description at
https://github.com/antlr/antlr4/pull/4938.

....

  ---
  PR #4938 — C# Runtime Performance Optimizations: Review

  What the PR Does (12 commits)

  The branch introduces a batch of allocation-reduction and throughput optimizations to the ANTLR4 C# runtime, organized into six
  main areas:

  1. New TFMs and Build Changes (Antlr4.csproj)

  - Adds netstandard2.1 and net8.0 build targets (in addition to existing net45 / netstandard2.0).
  - Sets <LangVersion>12</LangVersion> project-wide.
  - Adds System.Memory / System.Buffers polyfill package references for netstandard2.0.

  2. New Input Stream Classes

  Three new ICharStream implementations, each with #if NET8_0_OR_GREATER / #else dual implementations (backed by a new
  BaseInputCharStream base class on older TFMs):

  ┌───────────────────────┬────────────────────────────────────┬──────────────────────┐
  │         Class         │          Backing storage           │ Copy on construction │
  ├───────────────────────┼────────────────────────────────────┼──────────────────────┤
  │ SpanInputStream       │ char[] (from string)               │ Yes (ToCharArray)    │
  ├───────────────────────┼────────────────────────────────────┼──────────────────────┤
  │ CharSpanInputStream   │ char[] (from char[]/Reader/Stream) │ No (direct use)      │
  ├───────────────────────┼────────────────────────────────────┼──────────────────────┤
  │ StringSpanInputStream │ string (direct ref)                │ No                   │
  └───────────────────────┴────────────────────────────────────┴──────────────────────┘

  All three expose a zero-allocation GetTextSpan(Interval) returning ReadOnlySpan<char> on net8.0+.

  3. OptimizedToken + OptimizedTokenFactory

  - Replaces Tuple<ITokenSource, ICharStream> source pair with a private TokenSourcePair struct → eliminates one heap allocation per
  token.
  - Defers .Text materialization until first access (lazy substring extraction).
  - Drop-in replacements for CommonToken / CommonTokenFactory: lexer.TokenFactory = OptimizedTokenFactory.Default.

  4. ATN Internals Optimizations

  - MergeCache: Replaces nested Dictionary<PredictionContext, Dictionary<PredictionContext, PredictionContext>> with a flat
  Dictionary<PredictionContextPair, PredictionContext> (struct key) — fewer allocations, one-level lookup.
  - ObjectPool<T>: New simple non-thread-safe stack-based pool.
  - ParserATNSimulator: Pools ATNConfigSet, HashSet<ATNConfig>, and MergeCache instances per-simulator to reduce GC pressure during
  prediction.
  - ATNConfigSet.PoolReset(): Full reset method for pool-safe reuse.
  - ATNDeserializer: New Deserialize(ReadOnlySpan<int>) overload (netstandard2.1+).
  - LexerATNConfig, ArrayPredictionContext: Sealed (enables JIT devirtualization).

  5. Hash Function Improvements

  - ATNConfig.GetHashCode(), LexerATNConfig.GetHashCode(), ConfigEqualityComparator.GetHashCode(),
  SemanticContext.Predicate.GetHashCode() — use System.HashCode.Combine on net8.0+ (faster, better distribution) vs. Murmur on older
  TFMs.
  - SemanticContext.AND/OR.GetHashCode(): Cache typeof(AND/OR).GetHashCode() in a static field — was calling reflection-backed
  Type.GetHashCode() on every invocation.
  - Interval: Implements IEquatable<Interval>, adds ==/!= operators.
  - Singleton ObjectEqualityComparator.Instance / ConfigEqualityComparator.Instance — avoid per-configset allocator allocation.

  6. BufferedTokenStream Micro-Optimizations

  - Pre-sized list allocations (new List<IToken>(stop - start + 1) etc.).
  - ValueStringBuilder (stack-allocated char buffer with ArrayPool fallback) for GetText on modern TFMs.
  - Initial token buffer capacity raised from 100 → 1024.
  - Removed if (t is IWritableToken) null guard on TokenIndex assignment.

  7. Codegen Template (CSharp.stg)

  On net8.0+, generated parsers/lexers emit _serializedATN as ReadOnlySpan<int> property and call the new
  Deserialize(ReadOnlySpan<int>) overload.

  ---
  Potential Issues

  Bug / Correctness

  1. ReadOnlySpan<int> _serializedATN => new int[] { … } does NOT produce RVA static data

  The PR comment and commit message claim this avoids keeping a static int[] on the heap using "RVA static data". This is incorrect
  as written. new int[] { … } in a ReadOnlySpan<int> property expression always allocates a heap array on every property access. RVA
  statics in C# 12 / .NET 8+ require either:
  - A UTF-8 string literal for ReadOnlySpan<byte> ("..."u8), or
  - A collection expression ([1, 2, 3, ...] — not new int[] { … })

  Since the static constructor (new ATNDeserializer().Deserialize(_serializedATN)) only runs once, the excess allocation is a
  one-time cost. But the stated benefit of eliminating the long-lived int[] from the heap is not achieved — the new int[] lives as
  long as the GC needs it (until deserialization completes, then collected). The legacy private static int[] _serializedATN = { … }
  field is actually cheaper (one allocation, retained, no re-creation). The net8.0 codegen path as written is worse than the legacy
  path in terms of the initial allocation pattern, not better.

  Fix: Use a collection expression — private static ReadOnlySpan<int> _serializedATN => [<elements>]; — which the C# 12 compiler and
  .NET 8 JIT will store as RVA data.

  2. SerializedAtn property on net8.0+ allocates on every call

  public override int[] SerializedAtn { get { return _serializedATN.ToArray(); } }
  Each access to SerializedAtn calls _serializedATN (heap array allocation) then .ToArray() (second heap copy). If this property is
  called more than once (diagnostic tools, reflection-based tools), it's unnecessarily expensive. Even caching just SerializedAtn in
  a field would help.

  3. Removed IWritableToken guard is a silent breaking change

  Original:
  if (t is IWritableToken) ((IWritableToken)t).TokenIndex = tokens.Count;
  New:
  ((IWritableToken)t).TokenIndex = tokens.Count;
  Any custom ITokenFactory returning tokens that don't implement IWritableToken (which is allowed by the interface contract) will now
   get an InvalidCastException instead of silently working. This is a backward-incompatible behavior change. The guard should be
  reinstated, or at least documented.

  4. OptimizedToken copy constructor tight-coupling to CommonToken.source

  else if (oldToken is CommonToken ct)
  {
      _text = ct.source.Item2 != null ? null : oldToken.Text;
      _source = new TokenSourcePair(ct.source.Item1, ct.source.Item2);
  }
  CommonToken.source is protected internal — accessible within the assembly, but this creates a white-box dependency on CommonToken's
   internal representation. If CommonToken ever changes its source field name/type, OptimizedToken breaks silently. Also, the logic
  ct.source.Item2 != null ? null : oldToken.Text means: if there's a stream, don't copy text (rely on lazy re-fetch). But lazy
  re-fetch requires the stream to remain valid — if it was an UnbufferedCharStream that has advanced past those indices, the
  re-fetched text will be wrong.

  Design / Minor Issues

  5. Code duplication between SpanInputStream and CharSpanInputStream

  These two classes share ~200 lines of nearly identical code (the LA(), Consume(), Seek(), GetText(), GetTextSpan(), Mark(),
  Release() implementations are verbatim copies). The only real difference is that SpanInputStream has a string constructor.
  CharSpanInputStream is also more confusingly named — it sounds like the "canonical" char-stream class. Consider consolidating them
  or clarifying the naming convention in documentation.

  6. IntervalSet.intervals field type changed from IList<Interval> to List<Interval>

  This is a public API field (despite being protected internal). Any subclass that assigns a non-List<T> to intervals (e.g., an
  ImmutableList) will break. Worth calling out in a migration note.

  7. tokens = new List<IToken>(1024) initial capacity is aggressive

  Increased 10× from 100. For small grammars (expressions, config files), the typical token count is well under 100. Allocating 1024
  slots upfront wastes ~8KB per BufferedTokenStream creation. Consider basing this on a smaller default (e.g., 256) or making it
  configurable.

  8. Pool management methods are internal instead of private

  GetPooledConfigSet, ReturnConfigSet, GetPooledClosureBusy, ReturnClosureBusy are marked internal. Since ParserATNSimulator is a
  public, inheritable class, this exposes pool management to all other classes in the assembly. If a subclass (in the same assembly
  or via InternalsVisibleTo) overrides computeReachSet and calls GetPooledConfigSet without a matching ReturnConfigSet (e.g., due to
  an exception), it will silently leak pool slots. private with protected virtual hooks would be safer.

  9. ATNDeserializer.Deserialize(ReadOnlySpan<int>) leaves this.data = null

  After the span overload runs, this.data is null. If the same ATNDeserializer instance is reused (the generated code does new
  ATNDeserializer().Deserialize(...) so this is fine today), calling other methods on the instance would NPE. This is fragile and
  should be documented or guarded.

  ---
  Comparison with Other Runtimes

  ┌───────────────────────────────┬────────────────┬───────────────────────┬────────────┬─────────────┬───────────┬─────┬─────┐
  │         Optimization          │  C# (this PR)  │         Java          │    C++     │   Python    │   Swift   │ Go  │ JS  │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ MergeCache: flat map w/ pair  │ ✅             │ ❌ still nested       │ ❌ nested  │ N/A (dict)  │ N/A       │ N/A │ N/A │
  │ key                           │                │ DoubleKeyMap          │ map        │             │           │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ Hash: use platform hash API   │ ✅ net8.0+     │ ❌ Murmur only        │ ❌ Murmur  │ ✅ (tuple   │ ✅        │ ✅  │ ✅  │
  │                               │                │                       │ only       │ hash)       │ (Hasher)  │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ Cache                         │ ✅             │ ❌                    │ ❌         │ N/A         │ N/A       │ N/A │ N/A │
  │ typeof(AND).GetHashCode()     │                │                       │            │             │           │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ Singleton comparator          │ ✅             │ ❌                    │ N/A        │ N/A         │ N/A       │ N/A │ N/A │
  │ instances                     │                │                       │            │             │           │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ Sealed ATN config subclasses  │ ✅             │ ❌                    │ N/A        │ N/A         │ N/A       │ N/A │ N/A │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ ATNConfigSet / mergeCache     │ ✅             │ ❌                    │ ❌         │ N/A         │ N/A       │ N/A │ N/A │
  │ pooling                       │                │                       │            │             │           │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ Lazy token text               │ ✅             │ ❌                    │ ❌         │ ❌          │ ❌        │ ❌  │ ❌  │
  │ materialization               │                │                       │            │             │           │     │     │
  ├───────────────────────────────┼────────────────┼───────────────────────┼────────────┼─────────────┼───────────┼─────┼─────┤
  │ IList → List for performance  │ ✅             │ ❌ uses List natively │ ✅ STL     │ N/A         │ N/A       │ N/A │ N/A │
  │                               │ (IntervalSet)  │                       │            │             │           │     │     │
  └───────────────────────────────┴────────────────┴───────────────────────┴────────────┴─────────────┴───────────┴─────┴─────┘

  Recommended back-ports to other runtimes

  Java (highest value):
  - DoubleKeyMap → flat map: Replace DoubleKeyMap<PredictionContext, PredictionContext, PredictionContext> in ParserATNSimulator with
   a HashMap keyed on a pair object. Reduces cache lookup from two hash table lookups to one.
  - Cache typeof(AND/OR).GetHashCode() equivalent: SemanticContext.AND/OR.hashCode() calls getClass().hashCode() every invocation —
  caching this as a private static final int is a trivial speedup.
  - Singleton comparators: Java's ATNConfigSet creates comparator instances per set; these can be singletons.

  C++:
  - MergeCache refactor: C++'s DoubleKeyMap equivalent uses nested unordered_map — same improvement applies.
  - Cache SemanticContext type hash: Same pattern as Java.

  Swift/Go/JS/Python/Dart: These runtimes use their language's native hash/equality mechanisms and are already at or near idiomatic
  efficiency. The most impactful remaining area would be lazy token text materialization, which benefits all runtimes with buffered
  streams, but requires interface changes and isn't a simple port.

  ---
  Summary

  The PR is a solid and substantive improvement. The most serious issue to fix before merging is #1 (the ReadOnlySpan<int> codegen
  doesn't actually achieve RVA statics — needs collection expression syntax). Issue #3 (removed IWritableToken guard) is a quiet
  breaking change that should be addressed. The rest are design concerns rather than blockers.

✻ Sautéed for 4m 51s

(NB: I thought that posted this comment before, but perhaps I didn't because when I reloaded this page, the comment was gone.)

@HarryCordewener
Copy link
Copy Markdown
Author

I'm with you @ericvergnaud on splitting it out. I will see what the safest and healthiest splits are.

It's likely that the OptimizedToken optionals and the Streams will get cut into a separate branch, but keep the rest - as those two are the lowest gains for a non-DSL grammar (and can be added by outside sources without a PR into this repo)

But before I do, I will address @kaby76's Claude review, so that this branch remains linear in discussion.

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
@HarryCordewener
Copy link
Copy Markdown
Author

HarryCordewener commented May 18, 2026

Verification of Claude Code's Review Claims

1. "ReadOnlySpan _serializedATN => new int[] { … } does NOT produce RVA static data"

VERDICT: PARTIALLY CORRECT, but nuanced.

Claude claims new int[] { … } in a ReadOnlySpan property always allocates a heap array. This is the KEY issue and it's complicated:

The fix Claude suggests is also correct: change new int[] { ... } to a collection expression [...] to get actual RVA backing. Andrew Lock's article confirms that ReadOnlySpan = [1, 2, 3, 4, 5] compiles to RuntimeHelpers.CreateSpan() which wraps a pointer directly into the assembly data.

However, the practical impact is mitigated because _serializedATN is only accessed once (in the static field initializer for _ATN), so it's a one-time allocation. But it means the stated benefit ("eliminates a permanent Gen2 managed heap allocation") is wrong — it actually creates a short-lived heap array that gets GC'd after deserialization, while the old static int[] field was one allocation retained forever. The old way is arguably better for this access pattern (allocated once, retained, no re-creation).

2. "SerializedAtn property on net8.0+ allocates on every call"

VERDICT: CORRECT.

Looking at line 296 of CSharp.stg:

public override int[] SerializedAtn { get { return _serializedATN.ToArray(); } }

On net8.0+, _serializedATN is a ReadOnlySpan property that allocates new int[] each invocation, then .ToArray() copies it again. That's TWO allocations per access. Claude is right that this is worse than the legacy path if called more than once. Practically it's rarely called (diagnostic tools), so low risk, but the analysis is correct.

3. "Removed IWritableToken guard is a silent breaking change"

VERDICT: CORRECT in principle, LOW risk in practice.

The ITokenFactory contract technically allows returning tokens that don't implement IWritableToken. Removing the guard means such tokens would throw InvalidCastException. Claude is right that this is technically a backward-incompatible change. However, in practice every token implementation in the ANTLR ecosystem implements IWritableToken, so the real-world risk is negligible. The original comment in the code ("all tokens implement IWritableToken") is the stated justification.

4. "OptimizedToken copy constructor tight-coupling to CommonToken.source"

VERDICT: MOSTLY CORRECT but overstated risk.

Claude correctly identifies the white-box dependency on CommonToken's internal source field. The concern about UnbufferedCharStream is valid — if the stream has advanced past the token's indices, lazy re-fetch will return wrong text. However, this is the same hazard CommonToken already has (it also re-fetches lazily by default). The copyText parameter exists precisely for this case. So the risk exists but it's not new to OptimizedToken.

5. "Code duplication between SpanInputStream and CharSpanInputStream"

VERDICT: CORRECT — straightforward code smell observation, valid.

6. "IntervalSet.intervals field type changed from IList to List"

VERDICT: CORRECT — technically breaking for subclasses. The field is protected internal so any subclass assigning a non-List would break. Low real-world risk since nobody subclasses IntervalSet in practice.

7. "tokens = new List(1024) initial capacity is aggressive"

VERDICT: REASONABLE CONCERN. 1024 * 8 bytes (reference) = 8KB per stream. For small inputs this wastes memory. Whether it matters depends on how many concurrent parsers exist. A moderate concern, not a bug.

8. "Pool management methods are internal instead of private"

VERDICT: VALID design concern but low practical risk since ANTLR internals are rarely subclassed.

9. "ATNDeserializer.Deserialize(ReadOnlySpan) leaves this.data = null"

VERDICT: CORRECT but irrelevant — as Claude itself notes, the generated code always does new ATNDeserializer().Deserialize(...) so instance reuse doesn't happen.

Summary

The most important finding is Issue #1 — the RVA static data claim. Claude Code is correct that new int[] { ... } does NOT produce RVA data for ReadOnlySpan. The fix is to use a collection expression [...] in the template. This is the only issue that represents a genuine correctness problem (stated benefit doesn't match reality). The rest are valid but lower-priority design/compatibility observations.

Changes made:

Fix #1 — RVA static data via collection expressions (3 files):

  • tool/resources/.../CSharp/CSharp.stg line 1058: new int[] { ... } → [ ... ]
  • runtime/CSharp/src/Tree/Xpath/XPathLexer.cs line 116: same change, plus split the closing }; into per-branch ]; / }; with #if

This means on NET8+, _serializedATN is now backed by RuntimeHelpers.CreateSpan() reading from RVA static data — zero heap allocation.

Fix #2 — Cache SerializedAtn (same 2 files + XPathLexer):

  • Added private int[] _cachedSerializedATN; field
  • Changed property to _cachedSerializedATN ??= _serializedATN.ToArray()
  • This ensures .ToArray() only runs once if external code (e.g. GetATNWithBypassAlts) calls the property multiple times.

Fix #7 — Token list initial capacity (1 file):

  • runtime/CSharp/src/BufferedTokenStream.cs line 57: new List(1024) → new List(256)
  • Middle ground: still pre-allocates to avoid early resizes for typical grammars, but doesn't over-allocate 8KB for small inputs. List will double as needed (256→512→1024) so large inputs still converge quickly.

@HarryCordewener
Copy link
Copy Markdown
Author

@kaby76 - give Claude another go at the current state, see if it has anything it believes must be changed. After that, I can fork out the OptimizedToken and Span stream classes.

@kaby76
Copy link
Copy Markdown
Contributor

kaby76 commented May 19, 2026

  • Tested this PR against grammars-v4. Looks okay. For some grammars that have speculative sub-parses (because Antlr4 got rid of syntactic predicates), these failed with assembly version conflicts. Overall, it was slightly faster: ~5m out of 2h. The small improvement is likely because most grammar are tested only slightly: people write a grammar, add a couple of test files, then declare "victory" if the two or three files parse even though less than 50% of the grammar is exercised. Fortunately, I taught Claude Code to interactively add to the test suite and retest with Trash trcover.

  • This is what claude outputed.

$ claude -r

...

❯ Claude, this PR (https://github.com/antlr/antlr4/pull/4938) was updated recently. I believe you reviewed commit hash
03c3f019728078c01d74ca0f35bfde35c52efd98; it has been updated with 7e85589b956dda565c48d5696406d92083ff53cb, and has comments
addressing yours at https://github.com/antlr/antlr4/pull/4938#issuecomment-4480289238. Stated in
https://github.com/antlr/antlr4/pull/4938#issuecomment-4484279827, describe what the recent commit(s) changed, and whether there is
"anything [further ...] that must be changed" before splitting this PR into two PRs.

...

● ---
  What the "PR Adjustments" Commit (7e85589b) Changed

  The single commit touches exactly 3 files and addresses 3 of my earlier findings:

  1. Fixed: _serializedATN collection expression → actual RVA statics (CSharp.stg, XPathLexer.cs)

  -private static ReadOnlySpan<int> _serializedATN => new int[] {
  +private static ReadOnlySpan<int> _serializedATN => [
       ...
  -};
  +];
  Both the generator template and the hand-maintained XPathLexer.cs are updated. Using […] (C# 12 collection expression) correctly
  lowers to RuntimeHelpers.CreateSpan<int>() backed by RVA data in the PE image, which is what the PR claimed from the start. This
  fix is correct and complete.

  2. Fixed: SerializedAtn allocated a new array on every call (CSharp.stg, XPathLexer.cs)

  -public override int[] SerializedAtn { get { return _serializedATN.ToArray(); } }
  +private int[] _cachedSerializedATN;
  +public override int[] SerializedAtn { get { return _cachedSerializedATN ??= _serializedATN.ToArray(); } }
  The ??= lazy cache means the ToArray() copy from the RVA span happens at most once per parser/lexer class. This fix is correct.

  3. Fixed: BufferedTokenStream initial capacity reduced (BufferedTokenStream.cs)

  -protected internal IList<IToken> tokens = new List<IToken>(1024);
  +protected internal IList<IToken> tokens = new List<IToken>(256);
  256 is a more reasonable default. Addressed.

  ---
  What Remains Open Before Splitting

  HarryCordewener responded to the review in https://github.com/antlr/antlr4/pull/4938#issuecomment-4480289238 with per-finding
  verdicts. Three issues he agreed with are now fixed. Three others he assessed as "correct but low risk" and left in place. Here is
  the current status of each:

  ┌──────────────────────────────────────────┬──────────────────┬──────────────────────────────────────────────┐
  │                  Issue                   │      Status      │             Verdict from author              │
  ├──────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────┤
  │ RVA static new int[] → […]               │ ✅ Fixed         │ Correct, now fixed                           │
  ├──────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────┤
  │ SerializedAtn double-alloc               │ ✅ Fixed         │ Correct, now fixed                           │
  ├──────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────┤
  │ Initial capacity 1024 → reasonable       │ ✅ Fixed (256)   │ Addressed                                    │
  ├──────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────┤
  │ IWritableToken guard removed             │ ❌ Still present │ "Correct in principle, low risk in practice" │
  ├──────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────┤
  │ IntervalSet.intervals: IList → List      │ ❌ Still present │ (not addressed in the comment)               │
  ├──────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────┤
  │ Pool methods internal instead of private │ ❌ Still present │ (not addressed)                              │
  ├──────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────┤
  │ ATNDeserializer sets this.data = null    │ ❌ Still present │ Low risk given usage                         │
  └──────────────────────────────────────────┴──────────────────┴──────────────────────────────────────────────┘

  My assessment of what must change before the split

  Must fix before splitting:

  1. Restore the IWritableToken guard (BufferedTokenStream.cs:244). This is the only change in the core PR (PR B — the non-opt-in
  part) that breaks existing custom ITokenFactory implementations. The OptimizedToken and stream classes going into PR A don't affect
   this. Restoring the guard is a one-liner:
  // Before:
  ((IWritableToken)t).TokenIndex = tokens.Count;
  // After:
  if (t is IWritableToken wt) wt.TokenIndex = tokens.Count;

  Should document before splitting:

  2. IntervalSet.intervals field type narrows to List<Interval>. Any subclass that assigned an IList<Interval> that isn't a List<T>
  will fail to compile after upgrade. Whether this matters in practice depends on what's in the wild — it's not directly visible from
   this repo. If no community subclasses of IntervalSet override intervals (likely), it can stay; but it should be called out in the
  PR description or CHANGES.txt as a minor breaking change.

  Can defer (fine as-is):

  3. Pool method visibility (internal on ParserATNSimulator) — pool misuse requires same-assembly code; no practical risk.
  4. this.data = null after span deserialize — the generated code always does new ATNDeserializer().Deserialize(...), so the instance
   is discarded immediately after.

  Bottom line: Fix the IWritableToken guard (trivial one-liner), note the IntervalSet narrowing, then the split is clean.

✻ Churned for 1m 55s

Again, we need to have Copilot turned on for this repo. I cross-check AI blather with alternate AI.

Restores the 'is' pattern check before setting TokenIndex, preventing
InvalidCastException for custom ITokenFactory implementations that
return tokens not implementing IWritableToken.

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
@HarryCordewener
Copy link
Copy Markdown
Author

HarryCordewener commented May 19, 2026

Regarding the two remaining items Claude flagged:

IWritableToken guard

Agreed — restoring this is the right call. I'll put the is check back:

if (t is IWritableToken wt) wt.TokenIndex = tokens.Count;

The isinst IL instruction compiles to a single MT pointer comparison (~1 cycle). On a 100K-token file that's sub-microsecond total overhead — effectively zero. The correctness argument is what matters: anyone with a custom ITokenFactory returning tokens that don't implement IWritableToken would get an InvalidCastException at runtime without the guard. That's a behavioral contract violation we shouldn't ship.

I checked GitHub — most ecosystem consumers (TypeCobol, SharpMUSH, IS4Code/Sona, and even the vendored copies in microsoft-ui-xaml, Azure Cosmos, XSharp) all use the if (t is IWritableToken ...) pattern. The few that direct-cast (sql-bi/Vpax-Obfuscator, dotnet/runtime ilasm) happen to use tokens that do implement the interface, but we shouldn't rely on that assumption for all consumers.

IntervalSet.intervals: List<Interval> (keeping as-is)

I'm keeping this as List. Here's why:

Performance: Using the concrete type allows the JIT to devirtualize Count, [], Add(), RemoveAt(), and Insert() — avoiding interface dispatch (~2-5ns per call). The Add method loops over intervals with indexed access and mutations, so this adds up during ATN construction for grammars with many token types. Not dramatic end-to-end (<1% of total parse time since IntervalSet ops aren't in the token-by-token loop), but it's free perf we'd be leaving on the table.

Ecosystem impact: I searched GitHub for any third-party code that subclasses IntervalSet or directly assigns the protected internal intervals field — found none. Every consumer I found (chequer-io/qsi, tunnelvisionlabs/LangSvcV2, etc.) calls GetIntervals() which still returns IList, so API compatibility is preserved. The narrowing only affects the field, not the public contract.

I'll document this as a minor breaking change in the PR description for completeness, since it's technically possible someone has a subclass assigning a non-List to the field — just not something that exists in any public repo I can find.

Will push the IWritableToken fix shortly.

…asses

SpanInputStream variants, OptimizedToken, and OptimizedTokenFactory
(with their tests) have been extracted to the dedicated branch
feature/csharp-extra-classes. Removing them here to keep this branch
focused on modifications to existing runtime files.

Signed-off-by: Harry Cordewener <admin@twilightdays.org>
@HarryCordewener
Copy link
Copy Markdown
Author

Restructuring Note

I have extracted the purely additive classes (SpanInputStream variants, OptimizedToken, OptimizedTokenFactory, and their tests) into a separate PR: #4941.

This PR now focuses exclusively on modifications to existing runtime types:

  • Sealed classes for devirtualization
  • Span-based equality in ArrayList<T>
  • IntervalSet constructor narrowing
  • ReadOnlySpan<int> ATN serialization on net8.0+
  • MergeCache value-based equality fix
  • BufferedTokenStream IWritableToken guard
  • ValueStringBuilder and ObjectPool utilities

The two PRs are independent and can be reviewed/merged in either order. #4941 adds no modifications to existing files; this PR contains no new standalone classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants