[✨ Triage] dotnet/runtime#126925 by wtgodbe - IIS in-process hosting returns HTTP 500 after runtime-async enablement #197

@MihuBot

Triage for dotnet/runtime#126925.
Repo filter: All networking issues.
MihuBot version: 43c005.
Ping MihaZupan for any issues.

This is a test triage report generated by AI, aimed at helping the triage team quickly identify past issues/PRs that may be related.
Take any conclusions with a large grain of salt.

Tool logs
dotnet/runtime#126925: IIS in-process hosting returns HTTP 500 after runtime-async enablement by wtgodbe
Extracted 5 search queries: IIS in-process returns HTTP 500 after enabling runtime-async in System.Private.CoreLib, IHttpApplication.ProcessRequestAsync / IISHttpContext async continuations failing in in-process hosting, ANCM in-process native/managed async continuation issue after runtime-async enablement, DOTNET_JitEnableRuntimeAsync=0 fixes IIS in-process 500 (runtime-async related), ASP.NET Core in-process request pipeline silently fails to complete responses after runtime async changes
Found 25 candidate issues

Below are the potentially relevant existing issues/PRs from the search, with short summaries of their discussions / conclusions and why they matter for runtime-async + IIS in-process failures.

  • PR #125406 (Mar 10 2026) - Enable runtime-async for shared framework source projects in net11.0+ and remove RuntimeAsync config knob
    Summary: This infrastructure PR auto-enables the new "runtime-async" codegen for shared framework source projects targeting net11+, and removes the old DOTNET_RuntimeAsync config knob (the runtime now assumes runtime-async is always enabled on supported platforms). It also consolidates platform exclusions and includes an unrelated PEObjectWriter relocation fix. Relevance: this is the change that made runtime-async unconditional for many core assemblies (the commits you called out match this enablement path). The PR authors and reviewers flagged non-obvious platform/test impacts and noted that, with the runtime config knob removed, correctness has to be gated by builds and tests.

  • PR #125556 (Mar 14 2026) - Add runtime async support for saving and reusing continuation instances
    Summary: Large JIT/runtime change that implements a shared continuation layout per async method and continuation reuse. It reworked continuation Flags (encoding per-suspension slot indices), changed VM/JIT/CoreLib consumers, and added reuse logic. The change required many small fixes and drew a number of review comments about correctness (index encoding overflow, runtime checks that turn into silent corruption in release builds, interpreter/runtime alignment, clearing stale slots on reuse). Relevance: this PR changes continuation object layout, flag encoding, and resumption semantics. That is exactly the sort of low-level change that can break the continuation behavior ANCM relies on when crossing the native/managed boundary in IIS in-process hosting. The PR also caused a handful of test failures in CI (and some tests were deterministically broken until fixes were applied).

  • PR #125406 / follow-ups & test-enablement thread (Mar 2026) - gating/backport notes and follow-ups
    Summary: After enabling runtime-async, reviewers added follow-ups (disable crossgen runtime-async generation on untested archs, backports, and removing the old config knob everywhere). Relevance: shows the enablement was broad and needed immediate follow-up work to contain platform/test regressions, which is useful context when a newly enabled runtime feature suddenly breaks hosting scenarios.

  • PR #119432 (Sep 6 2025) - [RuntimeAsync] Enable runtime async in Libraries partition
    Summary: Earlier experiment enabling runtime-async for the Libraries partition. Many tests did run and pass, but the change also surfaced a set of JIT asserts and sync-context differences. Relevance: historical evidence that enabling runtime-async can surface subtle JIT/VM/behavior differences across many libraries and tests; not all such issues are obvious until enablement is broad.

  • PR #125556 follow-up: test regressions reported (Mar 20 2026)
    Summary: Commenters reported that this change broke specific tests (#125806, #125805) and noted the failures were reproducible locally; the author followed up with some fixes. Relevance: confirms that runtime-async changes have already caused test regressions in library code, which makes it more plausible that the IIS in-process 500s stem from similar low-level differences.

  • PR #126721 (Apr 9 2026) - Fix GC write barrier when writing async method return into continuation object
    Summary: Fixes a GC write-barrier bug where the runtime did not always add a write barrier when storing an async method return into the continuation object. The bug caused random GC crashes in System.Text.Json tests; this PR fixes it. Relevance: this is a concrete runtime-async–related correctness bug found after large runtime-async work landed — it demonstrates the sort of GC/interop correctness issues that are exposed by the new continuation model. ANCM/in-process hosting crosses native/managed boundaries where GC safety and write barriers are critical.

  • Issue #126018 (Mar 24 2026) - JIT: Runtime async enhancements (meta issue)
    Summary: Tracker for many JIT changes required by runtime-async (liveness analyses, codegen fixes, refactors). Relevance: shows ongoing JIT work and outstanding items; indicates the feature involved many incremental codegen changes that could affect async scheduling/continuation layout/stack handling — all relevant to in-process hosting which relies on continuations crossing native/managed boundaries.

  • Issue #121422 (Nov 6 2025) - [RuntimeAsync] Difference with the async1 baseline when dispatching a context-less continuation while running with a context
    Summary: A behavioral difference between runtime-async and the legacy async1 approach: who/where a continuation runs (inlining vs posting to threadpool / synchronization context) can differ in subtle, yet observable ways for certain awaiter patterns (and custom awaitables). The thread discusses whether to match baseline behavior. Relevance: IIS in-process hosting depends on continuations running in certain ways across native/managed boundaries; a scheduling difference (e.g., inlining or not, where the continuation runs) could make the ANCM + managed pipeline behave differently and fail silently.

  • PR #121460 (Nov 8 2025) - Do not hold to the last used Continuation in the runtime async tasks
    Summary: Fix to stop keeping a continuation referenced after it was consumed (nulling out references) to avoid observable leaks and finalization effects. Relevance: this is another low-level change to continuation lifetime/ownership that can affect interaction with native code and GC lifetimes — relevant because ANCM stores managed pointers/handles for callbacks.

  • Issue/PRs around NativeAOT and continuation types (PR #121398, Nov 5 2025 and related items)
    Summary: Several changes to support continuation types for NativeAOT and to emit methodtables/continuation metadata. Relevance: these show that the runtime had to add new VM-level handling for continuation types / MethodTable-like data — again indicating this is invasive VM work touching allocation, pointer maps, and runtime metadata.

  • aspnetcore issue #40498 (Mar 2 2022) - ANCM intermittently notifies the wrong HttpContext of disconnect
    Summary: Deep investigation found a race: ANCM can receive a disconnect after the managed IISHttpContext was disposed and its GCHandle freed; the freed GCHandle address can be reallocated and reused for a different managed context, so ANCM may call AbortIO on the wrong IISHttpContext. The root cause was a race between NotifyDisconnect and IndicateManagedRequestComplete (they used an SRW lock but notified outside the lock). The issue includes a working repro and a middleware workaround that delays disposal. Relevance: directly demonstrates that native ↔ managed pointer handling and GCHandle lifetimes in ANCM are delicate. Changes to allocation patterns (including shared continuation reuse, different object lifetimes, or changed GC behavior) can expose races that previously were not observed. The runtime-async work alters allocation and continuation lifetimes, so it could be triggering the same kind of GCHandle reuse race, which would be one possible explanation for the silent 500s.

  • aspnetcore issue #12946 (Aug 7 2019) - Response.OnCompleted exceptions not caught when hosted InProcess
    Summary: In-process hosting had ordering/exception-handling differences (OnCompleted invoked before response is fully sent and exceptions reported to the pipeline). The discussion pointed out ordering bugs and that in-process hosting requires special handling compared to out-of-process. Relevance: another example of subtle ordering/exception behavior differences in in-process hosting that can interact poorly with async scheduling changes.

  • Other ASP.NET issues about OnCompleted / OnStarting / in-process hosting (various older threads)
    Summary: Multiple issues (e.g., #31123, #4505, #6415) document that in-process hosting has unusual interaction points (order of OnCompleted, reading request body, disconnect handling) and that bugs in these areas can produce silent failures or different behavior vs out-of-process hosting. Relevance: useful background that in-process hosting is a fragile surface when runtime behavior changes.
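The handle reuse hazard described under aspnetcore issue #40498 above can be illustrated with a toy model. This is a minimal sketch, not the real GCHandle table or ANCM code: `HandleTable`, `simulate_stale_disconnect`, and the context names are hypothetical, and the only assumption taken from the issue summary is that a freed handle slot can be handed out again to a different context.

```python
class HandleTable:
    """Toy stand-in for a handle table whose slots are recycled after free()."""

    def __init__(self):
        self._slots = []  # slot index -> target object (None when free)
        self._free = []   # recycled slot indices, reused most-recent-first

    def alloc(self, obj):
        # Prefer a recycled slot; otherwise grow the table.
        if self._free:
            idx = self._free.pop()
            self._slots[idx] = obj
        else:
            idx = len(self._slots)
            self._slots.append(obj)
        return idx

    def free(self, idx):
        self._slots[idx] = None
        self._free.append(idx)

    def target(self, idx):
        return self._slots[idx]


def simulate_stale_disconnect():
    table = HandleTable()

    # Request A starts; the "native" side keeps the handle for later callbacks.
    handle_a = table.alloc("context-A")
    native_saved_handle = handle_a

    # Request A completes and the managed side frees its handle.
    table.free(handle_a)

    # Request B starts and receives the recycled slot.
    handle_b = table.alloc("context-B")

    # A late disconnect for request A resolves the stale handle.
    aborted = table.target(native_saved_handle)
    return handle_a, handle_b, aborted
```

In this model the late disconnect for request A resolves to `context-B`, because both requests ended up with the same handle value. Anything that shifts when handles are freed or allocated can change how often such a window is hit.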

Conclusions / takeaways for triage

  • The runtime changes that enabled runtime-async are large and intrusive: they change continuation layout, flag encoding, reuse, and codegen optimizations. Several follow-up fixes (GC write-barrier, JIT fixes, interpreter/VM updates) were needed after the enablement. The PRs above show that this work landed across the JIT, VM, and managed library layers.
  • IIS in-process hosting (ANCM + IISHttpContext) is sensitive to continuation lifetimes, allocation patterns, and GCHandle timing/reuse (see #40498). Changing continuation allocation, GC, or write-barrier behavior, which is exactly what runtime-async touches, can expose races that were previously rare or latent and can cause ANCM to observe a faulty/aborted request pipeline (manifesting as 500s with no managed exception).
  • The new runtime-async semantics also changed continuation dispatch/inlining rules (see #121422), so scheduling differences could cause continuations to run in different contexts/threads vs previous behavior; that may also break the native/managed crossing assumptions ANCM or the request pipeline relied upon.
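The dispatch difference mentioned above (inlining a continuation versus posting it) can be made concrete with a small ordering experiment. This is only a toy illustration in Python, not the .NET scheduler: `run_operation` and the event names are hypothetical, and the `deque` merely stands in for a thread pool or synchronization-context queue.

```python
from collections import deque


def run_operation(inline_continuation):
    """Record the order of events when a continuation is inlined or posted."""
    events = []
    work_queue = deque()  # stands in for a thread pool / sync-context queue

    def continuation():
        events.append("continuation")

    def complete_operation():
        events.append("complete:start")
        if inline_continuation:
            # Inlined: runs on the completing caller's stack, before it returns.
            continuation()
        else:
            # Posted: runs later, after the completing caller has returned.
            work_queue.append(continuation)
        events.append("complete:end")

    complete_operation()
    while work_queue:
        work_queue.popleft()()
    return events
```

With `inline_continuation=True` the continuation is recorded between `complete:start` and `complete:end`; with `False` it is recorded after `complete:end`. Code on either side of a native/managed transition that implicitly assumes one of these orderings would see different behavior under the other.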

Practical next steps to confirm / diagnose (echoing items suggested in the new issue)

  • Reproduce the failing IIS in-process test with runtime-async disabled (set DOTNET_JitEnableRuntimeAsync=0 or run a build that does not include the unconditional enablement). If the test passes with runtime-async disabled, that strongly implicates the runtime-async changes.
  • Capture a native + managed trace / dump from a failing Helix job to see whether ANCM is calling into the wrong IISHttpContext (compare GCHandle pointer values / connection IDs as done in #40498). The #40498 post shows concrete logging that identified duplicate pManagedHttpContext addresses — the same technique is likely helpful here.
  • Try the middleware workaround from #40498 (temporarily delay disposal / add an OnCompleted noop) to see if it reduces/avoids the 500s — if it does, it points toward a GCHandle/disposing race being at fault.
  • Look for recent fixes in the runtime tree (e.g., PR #126721) that address write-barrier / GC issues introduced by runtime-async; ensure your runtime build includes those fixes. If the failing build predates such fixes, test again with a runtime that includes them.
  • If repro is available, instrument/trace where continuations are allocated and when ANCM sees the managed context pointer to check for reuse of the same native->managed pointer (GCHandle) across requests.
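For the pointer-comparison steps above, a small log scanner can group context pointer values by connection ID once the trace has been captured. This is a sketch under stated assumptions: the line format matched by `LINE_RE` (`pManagedHttpContext=0x... connectionId=...`) is hypothetical and would need to be adapted to whatever the actual trace or added logging emits, and `find_reused_pointers` is an illustrative helper, not an existing tool.

```python
import re
from collections import defaultdict

# Hypothetical log line format; adjust to the real trace output.
LINE_RE = re.compile(r"pManagedHttpContext=(0x[0-9A-Fa-f]+)\s+connectionId=(\S+)")


def find_reused_pointers(lines):
    """Return {pointer: [connection ids]} for pointers seen with more than one connection."""
    seen = defaultdict(list)  # pointer -> distinct connection ids, in order of appearance
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that carry no pointer/connection pair
        pointer = match.group(1).lower()
        connection = match.group(2)
        if connection not in seen[pointer]:
            seen[pointer].append(connection)
    return {ptr: conns for ptr, conns in seen.items() if len(conns) > 1}
```

A pointer appearing with several connection IDs need not be a bug by itself, since freed handles can be handed out again. The flagged pointers are candidates for checking whether a callback for one connection arrived while the pointer already belonged to another.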

If you want, I can:

  • Link specific files/lines mentioned in the ASP.NET issue #40498 that you can add tracing around in your Helix run, or
  • Produce a concise checklist of which runtime PRs/commits to test/build variations against (e.g., test before/after PR #125556, test with/without PR #126721), or
  • Help draft a minimal repro or logging guidance to capture the same GCHandle pointer data ANCM logging used in #40498.

If you want only the most directly relevant items: focus on PR #125556 (continuation layout/reuse), PR #125406 (feature enablement), PR #126721 (GC write-barrier fix), and aspnetcore issue #40498 (ANCM GCHandle reuse race).
