.NET Runtime in .NET 11 Preview 5 - Release Notes

.NET 11 Preview 5 includes new runtime features and performance work:

.NET Runtime updates in .NET 11:

Runtime-async suspension is faster

Runtime-async suspension and resumption continue to get faster in Preview 5. The biggest win is for async methods that are optimized by on-stack replacement (OSR). OSR is the JIT feature that lets a long-running method switch from initial code to optimized code while the method is still executing.

Runtime-async now resumes those methods directly into optimized code instead of taking the general-purpose OSR transition path. The PR reports that the transition overhead was around 10-20x, and the sample suspension-heavy benchmark improved from Took 6357.1 ms to Took 457.1 ms, roughly a 14x speedup (dotnet/runtime #127074).
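As a rough illustration of the kind of method this affects, the following sketch suspends inside a long-running loop, which is the shape OSR targets. It is not the benchmark from the PR: the method name, the iteration count, and the use of Task.Yield are assumptions chosen for illustration, and whether the method is compiled as runtime-async depends on how the project is built.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical suspension-heavy loop: Task.Yield() never completes
// synchronously, so the method suspends and resumes once per iteration.
static async Task<long> SumWithSuspensionsAsync(int iterations)
{
    long total = 0;
    for (int i = 0; i < iterations; i++)
    {
        await Task.Yield(); // suspension point inside the hot loop
        total += i;
    }
    return total;
}

var stopwatch = Stopwatch.StartNew();
long sum = await SumWithSuspensionsAsync(100_000);
Console.WriteLine($"Sum = {sum}, took {stopwatch.Elapsed.TotalMilliseconds:F1} ms");
```

Timings from a sketch like this vary by machine and build configuration, so treat them as a relative signal only.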

Other runtime-async changes reduce the cost and size of generated suspension code:

  • Common suspension paths now use smaller generated code. The PR reports an approximately 8% improvement on a suspension-heavy microbenchmark, and the generated code in the PR sample shrank from 766 bytes to 751 bytes (dotnet/runtime #126041).
  • Suspension and resumption now do less thread-local-storage work and avoid several write barriers on hot paths. The PR sample improved from Took 350.3 ms to Took 291.3 ms (dotnet/runtime #127336).
  • Runtime-async now reuses continuations when an IValueTaskSource-backed ValueTask suspends, removing an allocation on that path (dotnet/runtime #127973).

JIT optimizations

Several JIT optimizations landed this preview that benefit typical C# without source changes.

Redundant span and null checks

The JIT now removes more range checks from span loops that repeatedly slice off a fixed-width prefix. These loops commonly use a length check to ask, "is at least this much data left?". In the following example, the data.Length >= Vector128<int>.Count guard proves that the next Vector128.Create(data) and data.Slice(Vector128<int>.Count) are in range across the loop back edge:

int Sum(ReadOnlySpan<int> data)
{
    Vector128<int> sum = default;
    while (data.Length >= Vector128<int>.Count)
    {
        sum += Vector128.Create(data);
        data = data.Slice(Vector128<int>.Count);
    }

    int result = Vector128.Sum(sum);
    foreach (int t in data)
    {
        result += t;
    }

    return result;
}

The extra range-check block is gone, and the PR sample shrank from 113 bytes to 79 bytes (dotnet/runtime #127117):

 G_M38854_IG03:
-       cmp      ecx, 4
-       jl       SHORT G_M38854_IG09
        vpaddd   xmm0, xmm0, xmmword ptr [rax]
        add      rax, 16
        add      ecx, -4
        cmp      ecx, 4
        jge      SHORT G_M38854_IG03
...
-G_M38854_IG09:
-       mov      ecx, 6
-       call     [System.ThrowHelper:ThrowArgumentOutOfRangeException(int)]
-       int3
-; Total bytes of code 113
+; Total bytes of code 79
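The same loop shape also shows up outside vectorized code, for example when parsing fixed-width records. The sketch below is a hypothetical helper, not taken from the PR, and it has not been verified that the JIT removes the range checks for this exact method. It only shows the pattern of a length guard followed by a read and a fixed-width Slice.

```csharp
using System;
using System.Buffers.Binary;

// Consumes the span four bytes at a time. The Length guard in the loop
// condition covers both the read and the Slice in each iteration.
static int SumInt32LittleEndian(ReadOnlySpan<byte> data)
{
    int sum = 0;
    while (data.Length >= sizeof(int))
    {
        sum += BinaryPrimitives.ReadInt32LittleEndian(data);
        data = data.Slice(sizeof(int));
    }
    return sum; // any trailing 1-3 bytes are ignored in this sketch
}

byte[] payload = { 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0xFF };
Console.WriteLine(SumInt32LittleEndian(payload)); // 6
```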

A related value-numbering and range-check improvement removes redundant checks for span.Slice(span.Length - constant). This makes patterns such as reading the final four bytes of a span compile directly to the load after the initial length guard (dotnet/runtime #127488):

int Test(ReadOnlySpan<byte> span)
{
    if (span.Length >= sizeof(int))
    {
        return BinaryPrimitives.ReadInt32BigEndian(span.Slice(span.Length - sizeof(int)));
    }

    return -1;
}

+       mov      rax, bword ptr [rcx]
+       mov      ecx, dword ptr [rcx+0x08]
+       cmp      ecx, 4
+       jl       SHORT G_M6173_IG05
+       add      ecx, -4
+       add      rax, rcx
+       movbe    eax, dword ptr [rax]
+       ret
        mov      eax, -1
-       lea      eax, [rdx-0x04]
-       cmp      eax, edx
-       ja       SHORT G_M27777_IG07
-       ...
-       call     [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
-       int3

Null-check propagation also looks through PHIs that merge a newly-created value with an existing non-null value. For (_inner ??= new Inner()).Do(n), the explicit null check before the branch is removed (dotnet/runtime #127810):

 G_M*_IG04:
-       cmp      byte  ptr [rax], al
        test     ebx, ebx
        jg       SHORT G_M*_IG06

Smaller arithmetic and faster casts

The JIT can now transform CONST - x into x ^ CONST when range information proves the two expressions are equivalent. This works when the constant is an all-ones mask for the bits that x can use. For a byte, 255 is 1111_1111, so x ^ 255 flips exactly those eight bits and produces the same result as 255 - x. For 255 - byte, the generated code changes from neg + add to a single xor; for -1 - x, it changes to not (dotnet/runtime #126529). Thank you @BoyBaykiller for this contribution!

        movzx    rax, dl
-       neg      eax
-       add      eax, 255
+       xor      eax, 255
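The arithmetic behind the transformation can be checked directly. The following sketch verifies the identity for every byte value and shows a constant that is not an all-ones mask, where the two expressions differ. The helper name is hypothetical, and the snippet checks the math only, not what the JIT emits.

```csharp
using System;

// Exhaustively compare 255 - x with x ^ 255 over the full byte range.
static bool SubtractMatchesXorForAllBytes()
{
    for (int value = byte.MinValue; value <= byte.MaxValue; value++)
    {
        if (255 - value != (value ^ 255)) return false;
    }
    return true;
}

int sample = 12345;
Console.WriteLine(SubtractMatchesXorForAllBytes()); // True
Console.WriteLine(-1 - sample == ~sample);          // True: -1 - x equals ~x in two's complement
Console.WriteLine(100 - 1 == (1 ^ 100));            // False: 100 is not an all-ones mask (99 vs 101)
```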

On x86 processors with AVX-512 or AVX10.2, floating-point to long/ulong casts now use hardware conversion instructions for all non-overflow casts. The typical double to long diff replaces CORINFO_HELP_DBL2LNG with vcvttpd2qq and mask handling (dotnet/runtime #125180). Thank you @saucecontrol for this contribution!

-       call     CORINFO_HELP_DBL2LNG
+       vcmpordsd k1, xmm0, xmm0
+       vcmpge_oqsd k2, xmm0, qword ptr [@RWD00]
+       vcvttpd2qq xmm0 {k1}{z}, xmm0
+       vpblendmq xmm0 {k2}, xmm0, qword ptr [@RWD08] {1to2}
+       vmovd    eax, xmm0
+       vpextrd  edx, xmm0, 1
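No source change is needed to pick this up; an ordinary cast is enough. The sketch below uses a hypothetical helper to show the casts involved. .NET 9 and later document floating-point to integer conversions as saturating, which appears consistent with the NaN and upper-bound masks in the diff above. The out-of-range results are printed rather than asserted because they depend on the runtime version in use.

```csharp
using System;

// Hypothetical helper: a plain C# cast. The JIT chooses the instruction sequence.
static long ToInt64(double value) => (long)value;

Console.WriteLine(ToInt64(1234.9));  // 1234: the fraction is truncated toward zero
Console.WriteLine(ToInt64(-1234.9)); // -1234

// NaN and out-of-range inputs: inspect the output on your target runtime.
Console.WriteLine(ToInt64(double.NaN));
Console.WriteLine(ToInt64(double.PositiveInfinity));
Console.WriteLine(ToInt64(1e30));
```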

Arm intrinsics add native integer and SVE2 predicates

System.Runtime.Intrinsics.Arm now has native-integer overloads for several scalar Arm intrinsics. ArmBase.LeadingZeroCount, ArmBase.ReverseElementBits, Crc32.ComputeCrc32, and Crc32.ComputeCrc32C accept nint or nuint, and the JIT lowers the 64-bit form directly to the Arm64 instruction width (dotnet/runtime #127327).

if (Crc32.IsSupported)
{
    nuint value = (nuint)0x1234_5678;
    uint crc = Crc32.ComputeCrc32(0, value);
}

SVE2 now has CreateWhileGreaterThanMask* and CreateWhileReadAfterWriteMask* predicate-generation intrinsics. The new CreateWhileGreaterThanMask* methods cover byte, signed byte, 16-bit, 32-bit, 64-bit, double, and single element masks, while CreateWhileReadAfterWriteMask* methods create masks from pointer ranges (dotnet/runtime #127538).

GC trimming and compaction improvements

A new GC configuration switch, DOTNET_GCTrimYoungestKeepPercent, lets memory-footprint latency mode keep a configurable percentage of the youngest generation during trimming. This gives applications another way to balance memory trimming against startup cost when using DOTNET_GCLatencyLevel=0 (dotnet/runtime #109863). Thank you @ashaurtaev for this contribution!

$env:DOTNET_GCLatencyLevel = "0"
$env:DOTNET_GCTrimYoungestKeepPercent = "0xF"
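To see which GC settings a process actually picked up, one option is GC.GetConfigurationVariables(), available since .NET 7. The sketch below simply lists what the runtime reports. Whether the new switch appears in that list, and under which name, is an assumption that has not been checked against Preview 5.

```csharp
using System;

// Print every GC configuration value the runtime reports for this process.
foreach (var (name, value) in GC.GetConfigurationVariables())
{
    Console.WriteLine($"{name} = {value}");
}
```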

GC compaction now keeps the heap_segment_used watermark accurate after relocating objects into a region. The fix avoids a stale gap that could retain dirty data when large pages or never_decommit_p caused decommit_region to clear only up to used. In the large-pages repro from the PR, the optimized fix was compared with a safe clear-all baseline over five-minute runs on .NET 11 Release standalone GC (dotnet/runtime #128217). Thank you @cshung for this contribution!

| Metric | Clear-all | Optimized | Diff |
| --- | --- | --- | --- |
| Avg throughput (entries) | 3,322,843 | 3,391,784 | +2.1% |
| Peak throughput | 4,708,191 | 5,152,893 | +9.4% |
| OOM count | 52 | 34 | -35% |

Browser/WebAssembly CoreCLR enablement

Browser CoreCLR continues its bring-up in Preview 5:

  • JS initializer hooks. The browser CoreCLR loader now implements invokeLibraryInitializers, enabling the same library-initializer hook used by the Mono WebAssembly runtime (dotnet/runtime #127551).
  • Download retry is on by default. The browser loader now enables download retry by default and improves retry sequencing for framework assets (dotnet/runtime #127559).
  • Reverse P/Invoke and UnmanagedCallersOnly codegen. WASM RyuJIT can now generate code for reverse P/Invoke and UnmanagedCallersOnly paths (dotnet/runtime #127751).
  • Interpreter code resolution on portable-entrypoint platforms. Browser CoreCLR can resolve interpreter code when FEATURE_PORTABLE_ENTRYPOINTS is enabled, which supports diagnostics paths such as sampling and profiler lookups (dotnet/runtime #127370).

Try browser CoreCLR in a Blazor WebAssembly app

The browser CoreCLR runtime is opt-in this preview. To use it in a Blazor WebAssembly project that already targets net11.0, add the UseMonoRuntime property to the WebAssembly client project file:

<PropertyGroup>
  <TargetFramework>net11.0</TargetFramework>
  <UseMonoRuntime>false</UseMonoRuntime>
</PropertyGroup>

The same property (or /p:UseMonoRuntime=false on the command line) works for non-Blazor WebAssembly projects that use <Project Sdk="Microsoft.NET.Sdk.WebAssembly">.

To confirm the app is running on CoreCLR, open the browser developer console and run:

globalThis.getDotnetRuntime(0).INTERNAL.GetDotNetRuntimeHeap()

CoreCLR exposes this GetDotNetRuntimeHeap hook (returning a Uint8Array); the Mono WebAssembly runtime does not, so a successful call is itself the signal you're on CoreCLR. The returned buffer may be empty depending on runtime state.

A dedicated native WebAssembly toolchain/workload for browser CoreCLR isn't available yet, so AOT and the native build paths still require the Mono runtime in Preview 5.

Diagnostics and loader messages

Heap dumps are smaller and faster by default because createdump now uses HEAP2. HEAP2 uses a unified memory-region enumeration path for the runtime data that diagnostic tools need, and it skips several older enumeration paths that made dumps slower and larger. The old environment-variable workarounds for slow dumps are deprecated (dotnet/runtime #127321).

Assembly version conflicts now include details about the already-loaded assembly in FileLoadException messages. When a lower-versioned assembly is already loaded, the message can now include the loaded assembly identity and path (dotnet/runtime #123969):

A different copy of assembly 'LibA' is already loaded. Loaded assembly: 'LibA, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' from 'C:\repos\helloworld\bin\Debug\net10.0\LibA.dll'
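To surface the richer message in application logs, it is enough to catch FileLoadException around the load. The following is a minimal sketch: the helper name and the LibA.dll path are hypothetical, and which load call triggers the conflict in a given app is an assumption. The message text itself comes from the runtime.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper: tries to load an assembly by path and reports why it failed.
static string TryLoad(string path)
{
    try
    {
        Assembly assembly = Assembly.LoadFrom(path);
        return $"Loaded {assembly.FullName}";
    }
    catch (FileLoadException ex)
    {
        // Version conflicts land here; ex.Message carries the runtime's details.
        return $"Load conflict for '{ex.FileName}': {ex.Message}";
    }
    catch (FileNotFoundException ex)
    {
        return $"Not found: {ex.FileName}";
    }
}

Console.WriteLine(TryLoad(Path.Combine(AppContext.BaseDirectory, "LibA.dll")));
```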

Bug fixes

Community contributors

Thank you contributors! ❤️