Skip to content

Fix GroupBy.Select EmptyProjectionMember#38140

Merged
AndriySvyryd merged 4 commits into
dotnet:mainfrom
henriquewr:emptyProjectionMember
Jun 6, 2026
Merged

Fix GroupBy.Select EmptyProjectionMember#38140
AndriySvyryd merged 4 commits into
dotnet:mainfrom
henriquewr:emptyProjectionMember

Conversation

@henriquewr

@henriquewr henriquewr commented Apr 19, 2026

Copy link
Copy Markdown
Contributor

"partially" fixes #30052, #31209 and probably others

The issue was that the QueryExpression instance is the same reference as the external instance
So the inner projection was wrongly applied to the external too

This fixes the original EmptyProjectionMember error, now GroupBy(...).Select(...).Where(...) (or any method that follows the Select call, as long as it is not other Select call) works, but isn't the entire thing

If the Select call following a GroupBy falls back to _indexBasedBinding (RelationalProjectionBindingExpressionVisitor or InMemory) GroupBy(...).Select(SomethingThatFallsbackToIndexBasing)
it is no longer possible to apply any projection that doesn't also falls back to _indexBasedBinding
This occurs because the second projection cannot override the previous one (_clientProjection has a projection and _projectionMapping has other projection)

It seems that this entire thing is caused because the GroupBy version of NavigationExpandingExpressionVisitor.ProcessSelect which does not return a pending selector. Instead, it returns a direct Select call that is not extensible

I tested the approach returning a pending selector, which resolved the issue (allowing GroupBy(...).Select(SomethingThatFallsbackToIndexBasing).Select(...) to work)
However doing it generates lots of other problems, it would need a different approach to how pending selectors are handled

Certain methods would need to resolve the pending selector before the call
GroupBy().Select().Join(), Join would need to resolve the pending selector before the call

At least a workaround exists:

Instead of chaining Select calls:

GroupBy()
.Select()
.Select()

You can nest Select calls:

GroupBy()
.Select(x => x.Select(...))

About the skipped InMemory test:

It seems to be caused by the same issue

GroupBy()
.Select(SelectorFunc)
.Where(valueOfTheSelectCall => valueOfTheSelectCall == something)

Is supposed to be transformed into

GroupBy
.Where(SelectorFunc == something)
.Select(SelectorFunc)

And since GroupBy.Select does not return a pending selector, the Where call fails to retrieve the value

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a query translation failure involving GroupBy(...).Select(...) where applying an inner projection could inadvertently mutate a shared SelectExpression, leading to KeyNotFoundException: EmptyProjectionMember. It adds new regression coverage around GroupBy().Select(first).Where(...) and ordering patterns (and documents remaining gaps via skipped tests).

Changes:

  • Prevents projection application side-effects by cloning the subquery SelectExpression before replacing/applying projections.
  • Adds new Northwind GroupBy specification tests targeting the GroupBy(...).Select(...).Where(...) and ordering scenarios related to #31209.
  • Adds SQL Server baselines for the newly-enabled scenarios and skips the failing variants for InMemory.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/EFCore.Relational/Query/RelationalSqlTranslatingExpressionVisitor.cs Clones SelectExpression for subquery property binding to avoid mutating shared query state.
test/EFCore.Specification.Tests/Query/NorthwindGroupByQueryTestBase.cs Adds new GroupBy+Select+Where/Order spec tests (and keeps currently-failing shapes skipped).
test/EFCore.SqlServer.FunctionalTests/Query/NorthwindGroupByQuerySqlServerTest.cs Adds SQL Server baselines for the new spec tests.
test/EFCore.InMemory.FunctionalTests/Query/NorthwindGroupByQueryInMemoryTest.cs Skips the newly-added tests which currently fail for InMemory.

@AndriySvyryd AndriySvyryd merged commit ebc301e into dotnet:main Jun 6, 2026
13 checks passed
@AndriySvyryd

Copy link
Copy Markdown
Member

Thanks for your contribution

@dotnet-milestone-bot dotnet-milestone-bot Bot added this to the 11.0-preview6 milestone Jun 10, 2026
ajcvickers added a commit to ajcvickers/efcore that referenced this pull request Jun 16, 2026
…31209, dotnet#26748)

- Member-mode ReplaceProjection left stale _clientProjections, producing an invalid mixed projection state.
- Made the member-mode overload clear _clientProjections, mirroring the index-mode overload.
- Fixes member Select / second GroupBy composed on top of GroupBy(k).Select(g => g.First()).
- Enables 4 previously-skipped tests and adds 4 regression tests; InMemory's separate limitation stays skipped.
- Verified across ~27k query tests (SQLite + SqlServer) with zero regressions.

A SelectExpression holds its projection as either a projection-member mapping or a
client-projection (index) list, never both. The two ReplaceProjection overloads were
asymmetric: the index-mode overload cleared both representations, but the member-mode
overload cleared only the member mapping. So a member-mode Select following an index-mode
Select left stale client projections behind, and ApplyProjection then took the index path
over a member-based shaper and threw. This change makes the member-mode overload clear the
client-projection list too, restoring the invariant.

This is a partial follow-up to dotnet#38140. It covers the member-Select / second-GroupBy family
on top of the "entity per group" idiom. InMemory has an independent limitation in the same
area and remains skipped.

## The original issue

dotnet#31209 (and the older, more general dotnet#26748) collect a large family of GroupBy translation
failures, all sharing one shape: a GroupBy whose Select projects an entire entity per group
(the "latest row per group" idiom), with a further operator composed on top:

    db.Set<Order>()
      .GroupBy(o => o.CustomerID)
      .Select(g => g.OrderByDescending(x => x.Date).First())  // one whole entity per group
      .<further operator>                                       // the trigger

Reporters hit it through many surface forms - a second Select projecting a member, a final
Where/OrderBy, a result-selector GroupBy, a second GroupBy, FirstOrDefault instead of First.
The original repros ended in a second GroupBy (dotnet#31209) or in Count/Select/OrderBy over a
composite (anonymous-type) key (dotnet#26748). The symptom was a confusing
KeyNotFoundException: The given key 'EmptyProjectionMember' was not present in the dictionary,
thrown deep in projection post-processing rather than at the offending operator, which is why
the threads accumulated years of dead-end workarounds (FromSqlRaw round-trips, query-tag
interceptors, ToList().AsQueryable()).

dotnet#38140 fixed part of this by cloning the subquery SelectExpression in
RelationalSqlTranslatingExpressionVisitor.BindProperty, which resolved the Where/OrderBy/
anonymous-type and Count cases. It explicitly left the member-Select-after case unsolved, and
after that fix the remaining cases surfaced a different error,
InvalidOperationException: Nullable object must have a value.

## What is actually being fixed

Root cause is an invariant violation in SelectExpression. A projection is represented as
exactly one of:

  - _projectionMapping  - keyed by ProjectionMember (member mode), or
  - _clientProjections  - an index list (client/index mode)

GetProjection reads one or the other depending on whether the binding carries a ProjectionMember
or an Index; the two are never meant to coexist.

ReplaceProjection has two overloads, and they were asymmetric:

  - ReplaceProjection(IReadOnlyList<Expression>)   clears _projectionMapping AND _clientProjections
  - ReplaceProjection(IReadOnlyDictionary<...>)     cleared ONLY _projectionMapping   <-- bug

Sequence that breaks:

  1. .Select(g => g.First())  falls back to index-based binding -> populates _clientProjections.
  2. .Select(r => r.EmployeeID)  translates in member mode -> repopulates _projectionMapping via the
     Dictionary overload, but leaves _clientProjections stale and non-empty.
  3. ApplyProjection sees _clientProjections.Count > 0, takes the index path, and the
     ClientProjectionRemappingExpressionVisitor walks the member-based shaper (Index == null) ->
     "Nullable object must have a value" (was EmptyProjectionMember before dotnet#38140).

The fix makes the Dictionary overload also clear _clientProjections and _aliasForClientProjections,
making it symmetric with the List overload and restoring the one-representation invariant.

## Challenges

- Misleading symptom. The exception is thrown in post-processing (ApplyProjection), far from the
  Select that creates the bad state, and it had already mutated once (EmptyProjectionMember ->
  Nullable-must-have-a-value) across dotnet#38140, so the stack trace points away from the cause.

- Two plausible fix sites. The contributor on dotnet#38140 pursued the upstream
  NavigationExpandingExpressionVisitor.ProcessSelect path (making the GroupBy Select return a
  pending selector) and found it cascades into broad regressions requiring a redesign of pending-
  selector handling. The actual defect is downstream, in the relational SelectExpression, where a
  one-line symmetry restoration suffices.

- Shared, hot code path. ReplaceProjection(IReadOnlyDictionary) is called from ~7 sites across the
  relational, SqlServer, and Sqlite providers. The change had to be proven safe everywhere, not just
  for the reported query.

## Outcome

Production change: clear _clientProjections / _aliasForClientProjections in the member-mode
ReplaceProjection overload. The clear is a no-op whenever the list is already empty (the normal
case), so the only behavioral change is in the previously-invalid mixed state.

Tests:
  - Un-skipped GroupBy_Select_Entire_Entity_Select and _Where_Select.
  - Added regression tests: GroupBy_Select_Entire_Entity_FirstOrDefault_Where,
    GroupBy_ResultSelector_Entire_Entity_Where (result-selector entry point),
    GroupBy_Select_Entire_Entity_GroupBy (the literal dotnet#31209 repro),
    GroupBy_Select_Entire_Entity_composite_key_Select (the literal dotnet#26748 repro).
  - Added SqlServer AssertSql baselines for all of the above (clean correlated-subquery / nested
    GROUP BY SQL).
  - InMemory keeps these skipped - it has a separate limitation in the same area.

## Review and validation

Three independent reviewers analysed the change for blast radius, cross-shape consequences, and
coverage gaps; their proposed experiments were then run.

  - Blast radius: all 7 callers of the member-mode overload traced. Six provably have an empty
    _clientProjections at call time (fresh select, just-cleared, or member-mode by construction), so
    the clear is a literal no-op; the seventh is the bug's target.

  - One genuine behavioral delta - the set-operation guard
    (SetOperationsNotAllowedAfterClientEvaluation). Verified empirically:
      * GroupBy(k).Select(g => g.First()).Select(o => o.OrderID).Union(...) previously threw
        spuriously and now returns correct results - corrective.
      * GroupBy(k).Select(g => g.First()).Union(...) (no intervening member Select) still correctly
        throws - the fix does not suppress the legitimate guard.

  - Structural invariant confirmed: when a kept value depends on a collection / correlated subquery /
    JSON node / entity held in _clientProjections, member-mode translation bails to the index
    fallback (List overload), so the Dictionary clear never fires destructively. No silent drop of
    collections, includes, JSON, or split-query metadata.

  - Pre-existing limitation, not a regression: projecting a navigation off the picked entity
    (GroupBy(k).Select(g => g.First()).Select(r => r.Customer.City)) fails as
    "ProjectionBindingExpression: 0 could not be translated". Confirmed by reverting the fix that it
    fails identically with and without this change. Separate, deeper issue.

  - dotnet#26748 cross-check: its Count and OrderBy repros were already resolved by dotnet#38140; its remaining
    member-Select repro (composite/anonymous key) failed with the exact "Nullable object must have a
    value" error and is resolved by this change. Verified by removing the fix and re-running.

  - Regression suites: ~12.1k SQLite and ~14.9k SqlServer query tests pass with zero failures; the
    NorthwindGroupBy class is green on SQLite, SqlServer, and InMemory.

Fixes dotnet#31209. Fixes dotnet#26748. Follow-up to dotnet#38140.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Combination of GroupBy, FirstOrDefault and Select throws a KeyNotFoundException

4 participants