Commit 4c05d38

Add Cosmos Linux emulator failure analysis report
Co-authored-by: AndriySvyryd <6539701+AndriySvyryd@users.noreply.github.com>
1 parent 0c7814b commit 4c05d38

1 file changed

Lines changed: 293 additions & 0 deletions

# Cosmos Linux emulator (`vnext-latest`): remaining test failures

This document summarizes the test failures that remain after switching the test
suite to the Linux Cosmos emulator image
`mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-latest` and removing
all the `IsNotLinuxEmulator` / `SkipOnLinuxEmulator` skip code and annotations.

Each group below has a **non-EF repro** that uses only the `Microsoft.Azure.Cosmos`
SDK so the underlying emulator behavior can be reported to the Cosmos DB emulator
team without any EF Core involvement.

## How to run the repros

All repros connect to a running emulator with the well-known emulator credentials:

```csharp
using System.Net.Http;
using Microsoft.Azure.Cosmos;

const string endpoint = "https://localhost:8081";
const string key = "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";

using var client = new CosmosClient(
    endpoint,
    key,
    new CosmosClientOptions
    {
        ConnectionMode = ConnectionMode.Gateway,
        // Accept the emulator's self-signed certificate (local testing only).
        HttpClientFactory = () => new HttpClient(new HttpClientHandler
        {
            ServerCertificateCustomValidationCallback =
                HttpClientHandler.DangerousAcceptAnyServerCertificateValidator
        })
    });

var db = (await client.CreateDatabaseIfNotExistsAsync("repro")).Database;
```
29+
30+
(`ConnectionMode.Gateway` and disabling SSL validation are how the EF test
31+
infrastructure talks to the local emulator.)
32+
33+
## Methodology
34+
35+
1. Changed the image to `vnext-latest` in `CosmosTestEnvironment.cs` and
36+
`.github/workflows/copilot-setup-steps.yml`.
37+
2. Removed every Linux-emulator skip mechanism (`EmulatorType`, `IsLinuxEmulator`,
38+
`IsNotLinuxEmulator`, `SkipOnLinuxEmulator`, the `[ConditionalClass]` guard, and
39+
the `LinuxEmulatorSaveChangesInterceptor`).
40+
3. Ran the full `EFCore.Cosmos.FunctionalTests` suite, and ran the same suite at the
41+
previous commit (`HEAD~1`) as a baseline to separate pre-existing failures from
42+
the ones revealed by un-skipping.
43+
4. Re-ran every newly-failing test **class in isolation** to separate deterministic
44+
emulator bugs from failures that are only caused by load/contention when the whole
45+
suite hits a single local emulator at once.
46+
47+
After isolation, **97 tests fail deterministically** because of genuine emulator
48+
behavior differences. They fall into the six groups below.
49+
50+
| Group | Root cause | Failing tests |
51+
|-------|-----------|---------------|
52+
| 1 | Aggregate over an inline/array subquery returns the wrong shape | 37 |
53+
| 2 | Object / complex-type equality returns no rows | ~16 |
54+
| 3 | Partial hierarchical-partition-key query returns too many documents | 21 |
55+
| 4 | `ORDER BY` by expression / composite / shadow property is not rejected | 14 |
56+
| 5 | Full-text default-language validation is not enforced | 4 |
57+
| 6 | `MaxItemCount` / paging is not honored | 2 |
58+
59+
> Note: ~140 additional failures seen only in the full run (e.g. all of
60+
> `NorthwindWhereQueryCosmosTest`, `CosmosBulkEndToEndTest`) are **not** included
61+
> above — they pass 100% when their class is run in isolation. See
62+
> [Non-deterministic failures](#non-deterministic-failures-not-emulator-bugs).
63+
64+
---
## Group 1: Aggregate over an inline/array subquery returns the wrong shape (37)

**Symptom.** Aggregating (`MIN`/`MAX`/`SUM`/`AVG`/`COUNT`) over a subquery that
projects an inline array fails inside the Cosmos client with
`System.ArgumentException: RewrittenAggregateProjections was not an array for a value
aggregate query`, or, for an empty inline array, with a Newtonsoft
`JsonSerializationException: Deserialized JSON type 'JValue' is not compatible with
expected type 'JObject'`.

**Failing classes:** `PrimitiveCollectionsQueryCosmosTest` (22), the 13
`Cosmos*TypeTest` classes (1 each), `ComplexPropertiesCollectionCosmosTest` (1),
`OwnedQueryCosmosTest` (1).

**Exact failing SQL (captured from the test run):**

```sql
-- MIN over a two-element inline array
SELECT VALUE c
FROM root c
WHERE ((SELECT VALUE MIN(a) FROM a IN (SELECT VALUE [30, c["Int"]])) = 30)

-- COUNT over an empty inline array
SELECT VALUE c
FROM root c
WHERE ((SELECT VALUE COUNT(1) FROM a IN (SELECT VALUE []) WHERE (a > c["Id"])) = 1)
```

**Non-EF repro:**

```csharp
var container = (await db.CreateContainerIfNotExistsAsync("g1", "/pk")).Container;
await container.CreateItemAsync(new { id = "1", pk = "1", Int = 30 }, new PartitionKey("1"));

// Fails: "RewrittenAggregateProjections was not an array for a value aggregate query"
var min = container.GetItemQueryIterator<Dictionary<string, object>>(
    "SELECT VALUE c FROM root c " +
    "WHERE ((SELECT VALUE MIN(a) FROM a IN (SELECT VALUE [30, c[\"Int\"]])) = 30)");
await min.ReadNextAsync();

// Fails: JsonSerializationException JValue/JObject
var cnt = container.GetItemQueryIterator<Dictionary<string, object>>(
    "SELECT VALUE c FROM root c " +
    "WHERE ((SELECT VALUE COUNT(1) FROM a IN (SELECT VALUE []) WHERE (a > c[\"id\"])) = 1)");
await cnt.ReadNextAsync();
```

---
## Group 2: Object / complex-type equality returns no rows (~16)

**Symptom.** A predicate that compares a whole embedded object (complex/owned type)
to an object literal/parameter returns zero documents on the emulator even when a
matching document exists. Tests assert `Expected: 1, Actual: 0` (or
`Sequence contains no elements`).

**Failing classes:** `ComplexPropertiesStructuralEqualityCosmosTest` (8),
`ComplexTypeQueryCosmosTest` (4), `ComplexTypeToJsonPropertyQueryCosmosTest` (4),
plus single cases in `AdHocComplexTypeQueryCosmosTest` and `ConfigPatternsCosmosTest`.

**Exact failing SQL pattern (object compared inside `EXISTS`):**

```sql
SELECT VALUE c
FROM root c
WHERE EXISTS (
    SELECT 1
    FROM a IN c["AssociateCollection"]
    WHERE ((a["Id"] > @get_Item_Id) AND (a = @entity_equality_get_Item)))
```

**Non-EF repro:**

```csharp
var container = (await db.CreateContainerIfNotExistsAsync("g2", "/pk")).Container;
await container.CreateItemAsync(
    new { id = "1", pk = "1", obj = new { X = 1, Y = 2 } }, new PartitionKey("1"));

// Emulator returns 0 documents; a matching document exists.
var q = new QueryDefinition("SELECT VALUE c FROM root c WHERE (c[\"obj\"] = @v)")
    .WithParameter("@v", new { X = 1, Y = 2 });
var it = container.GetItemQueryIterator<Dictionary<string, object>>(q);
var page = await it.ReadNextAsync(); // page.Count == 0 (expected 1)
```
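One way to narrow this down before reporting is to run the whole-object comparison next to a member-wise scalar comparison on the same data. The following is a minimal sketch, assuming a container that holds the `obj = { X = 1, Y = 2 }` document from the repro; the class and method names are hypothetical, and no claim is made here about what the member-wise query returns on the emulator.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class Group2Diagnostics
{
    // Hypothetical helper: runs the whole-object comparison and a member-wise
    // comparison against the same container and prints both result counts.
    public static async Task CompareEqualityFormsAsync(Container container)
    {
        var wholeObject = new QueryDefinition(
                "SELECT VALUE c FROM root c WHERE (c[\"obj\"] = @v)")
            .WithParameter("@v", new { X = 1, Y = 2 });

        var memberWise = new QueryDefinition(
                "SELECT VALUE c FROM root c " +
                "WHERE ((c[\"obj\"][\"X\"] = @x) AND (c[\"obj\"][\"Y\"] = @y))")
            .WithParameter("@x", 1)
            .WithParameter("@y", 2);

        Console.WriteLine($"whole-object: {await CountAsync(container, wholeObject)}");
        Console.WriteLine($"member-wise:  {await CountAsync(container, memberWise)}");
    }

    // Drains the iterator and returns the total number of documents returned.
    private static async Task<int> CountAsync(Container container, QueryDefinition query)
    {
        var total = 0;
        using var iterator = container.GetItemQueryIterator<Dictionary<string, object>>(query);
        while (iterator.HasMoreResults)
        {
            total += (await iterator.ReadNextAsync()).Count;
        }
        return total;
    }
}
```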
---

## Group 3: Partial hierarchical-partition-key query returns too many documents (21)

**Symptom.** Querying with only a prefix of a hierarchical (multi-level) partition key
returns documents from outside the specified prefix. Tests assert e.g.
`Expected: 2, Actual: 4` or `Expected: 4, Actual: 8`.

**Failing classes:** `ReadItemPartitionKeyQueryNoDiscriminatorInIdTest` (6),
`ReadItemPartitionKeyQueryRootDiscriminatorInIdTest` (6),
`ReadItemPartitionKeyQueryDiscriminatorInIdTest` (6),
`ReadItemPartitionKeyQueryTest` (3).

**Non-EF repro:**

```csharp
var props = new ContainerProperties("g3", new List<string> { "/tenant", "/user" });
var container = (await db.CreateContainerIfNotExistsAsync(props)).Container;

PartitionKey Full(string t, string u) =>
    new PartitionKeyBuilder().Add(t).Add(u).Build();

await container.CreateItemAsync(new { id = "1", tenant = "t1", user = "u1" }, Full("t1", "u1"));
await container.CreateItemAsync(new { id = "2", tenant = "t1", user = "u2" }, Full("t1", "u2"));
await container.CreateItemAsync(new { id = "3", tenant = "t2", user = "u1" }, Full("t2", "u1"));

// Prefix partition key "t1" should match only the 2 t1 documents.
var prefix = new PartitionKeyBuilder().Add("t1").Build();
var it = container.GetItemQueryIterator<Dictionary<string, object>>(
    "SELECT VALUE c FROM root c",
    requestOptions: new QueryRequestOptions { PartitionKey = prefix });
int count = 0;
while (it.HasMoreResults) count += (await it.ReadNextAsync()).Count;
// Emulator returns 3 (all documents); expected 2.
```
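A raw count does not show which documents leaked. The sketch below, assuming a container whose first partition-key level is `/tenant` as in the repro, collects the `tenant` value of every returned document that falls outside the requested prefix. The class and method names are hypothetical. An empty list is the expected result; any entry identifies a document that should not have been returned.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class Group3Diagnostics
{
    // Hypothetical helper: queries with a prefix partition key and returns the
    // "tenant" values of any returned documents that fall outside that prefix.
    public static async Task<List<string>> FindTenantsOutsidePrefixAsync(
        Container container, string tenant)
    {
        var prefix = new PartitionKeyBuilder().Add(tenant).Build();
        var outside = new List<string>();

        using var iterator = container.GetItemQueryIterator<string>(
            "SELECT VALUE c[\"tenant\"] FROM root c",
            requestOptions: new QueryRequestOptions { PartitionKey = prefix });

        while (iterator.HasMoreResults)
        {
            foreach (var returnedTenant in await iterator.ReadNextAsync())
            {
                if (returnedTenant != tenant)
                {
                    outside.Add(returnedTenant);
                }
            }
        }
        return outside;
    }
}
```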
---

## Group 4: `ORDER BY` on an expression / composite / shadow property is not rejected (14)

**Symptom.** Real Azure Cosmos DB rejects `ORDER BY` over multiple properties without
a composite index (and over certain computed expressions) with a `CosmosException`.
The emulator accepts these queries, so tests that do
`await Assert.ThrowsAsync<CosmosException>(...)` fail with
`No exception was thrown`.

**Failing classes:** `NorthwindMiscellaneousQueryCosmosTest` (8),
`OwnedQueryCosmosTest` (5), `NorthwindSelectQueryCosmosTest` (1).

**Non-EF repro:**

```csharp
var container = (await db.CreateContainerIfNotExistsAsync("g4", "/pk")).Container;
await container.CreateItemAsync(new { id = "1", pk = "1", City = "B", Region = "X" }, new PartitionKey("1"));
await container.CreateItemAsync(new { id = "2", pk = "1", City = "A", Region = "Y" }, new PartitionKey("1"));

// Real Cosmos: BadRequest (composite index required). Emulator: succeeds.
var it = container.GetItemQueryIterator<Dictionary<string, object>>(
    "SELECT VALUE c FROM root c ORDER BY c[\"City\"], c[\"Region\"]");
await it.ReadNextAsync(); // no exception on the emulator
```
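To make the difference explicit when running the same query against both the service and the emulator, the check can be wrapped so that it reports whether the query was rejected, mirroring what the `Assert.ThrowsAsync<CosmosException>` tests expect. This is a minimal sketch with hypothetical class and method names; it assumes the rejection surfaces as `400 BadRequest` when the first page is read, as the repro comment describes.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class Group4Diagnostics
{
    // Hypothetical helper: returns true when the query is rejected with
    // 400 BadRequest, false when the first page is returned without error.
    public static async Task<bool> IsRejectedAsync(Container container, string sql)
    {
        try
        {
            using var iterator =
                container.GetItemQueryIterator<Dictionary<string, object>>(sql);
            await iterator.ReadNextAsync();
            return false;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.BadRequest)
        {
            // Log the sub-status and message so both environments can be compared.
            Console.WriteLine($"Rejected: {ex.SubStatusCode} {ex.Message}");
            return true;
        }
    }
}
```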
---

## Group 5: Full-text default-language validation is not enforced (4)

**Symptom.** Configuring an unsupported / mismatched full-text default language should
produce a `CosmosException` at container-create / index time. The emulator accepts the
configuration, so `Assert.ThrowsAsync<CosmosException>(...)` fails with
`No exception was thrown`.

**Failing class:** `AdHocFullTextSearchCosmosTest` (4).

**Non-EF repro:**

```csharp
// Real Cosmos rejects an unsupported full-text language; emulator accepts it.
var props = new ContainerProperties("g5", "/pk")
{
    FullTextPolicy = new FullTextPolicy
    {
        DefaultLanguage = "xx-XX", // unsupported language
        // FullTextPaths is typed as Collection<FullTextPath>, not List<FullTextPath>.
        FullTextPaths = new System.Collections.ObjectModel.Collection<FullTextPath>
        {
            new FullTextPath { Path = "/text", Language = "xx-XX" }
        }
    }
};
// Expected to throw on the service; succeeds on the emulator.
await db.CreateContainerIfNotExistsAsync(props);
```
---

## Group 6: `MaxItemCount` / paging is not honored (2)

**Symptom.** A query executed with `MaxItemCount = 1` (EF `ToPageAsync`) should return a
single-item page plus a continuation token. The emulator returns **all 91** matching
documents in the first page. Tests assert `Assert.Single` and fail with
`The collection contained 91 items`.

**Failing class:** `NorthwindMiscellaneousQueryCosmosTest` (2).

**Non-EF repro:**

```csharp
var container = (await db.CreateContainerIfNotExistsAsync("g6", "/pk")).Container;
for (var i = 0; i < 91; i++)
    await container.CreateItemAsync(new { id = $"c{i}", pk = "1" }, new PartitionKey("1"));

// The query projects string values, so the iterator is typed as string.
var it = container.GetItemQueryIterator<string>(
    "SELECT VALUE c[\"id\"] FROM root c ORDER BY c[\"id\"]",
    requestOptions: new QueryRequestOptions { MaxItemCount = 1 });
var page = await it.ReadNextAsync();
// Emulator: page.Count == 91 (expected 1, with a continuation token).
```
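Reading only the first page shows the symptom, but recording the size of every page makes the report easier to compare across environments. The sketch below uses hypothetical class and method names and assumes a query that projects string values, such as `SELECT VALUE c["id"]`. With `MaxItemCount = 1` honored, every recorded page size would be at most 1.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class Group6Diagnostics
{
    // Hypothetical helper: reads every page of the query and returns the
    // number of items in each page, in the order the pages were received.
    public static async Task<List<int>> GetPageSizesAsync(
        Container container, string sql, int maxItemCount)
    {
        var pageSizes = new List<int>();
        using var iterator = container.GetItemQueryIterator<string>(
            sql,
            requestOptions: new QueryRequestOptions { MaxItemCount = maxItemCount });

        while (iterator.HasMoreResults)
        {
            var page = await iterator.ReadNextAsync();
            pageSizes.Add(page.Count);
            Console.WriteLine(
                $"page {pageSizes.Count}: {page.Count} item(s), " +
                $"continuation present: {page.ContinuationToken != null}");
        }
        return pageSizes;
    }
}
```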
---

## Non-deterministic failures (not emulator bugs)

When the **entire** suite runs against one local emulator, ~140 extra tests fail with
`CosmosException: NotFound (404)`, reporting that collection `Customers` or `Employees`
was not found in database `Northwind` (122 in `NorthwindWhereQueryCosmosTest`), or with
transient write errors (`CosmosBulkEndToEndTest`, etc.). **Every one of these passes
when its class is run in isolation** (e.g. `NorthwindWhereQueryCosmosTest`: 421/421
passing). These are caused by shared-container contention / throttling under concurrent
load on a single emulator instance, not by a deterministic emulator behavior difference,
so they are excluded from the groups above.

## Pre-existing failures (present before this change)

Running the suite at `HEAD~1` produces 37 failures that are unrelated to the
Linux-emulator skips (they fail regardless of this change) because the emulator does
not implement these index features:

- **Full-text search (20):** `'FullTextScore' is not a recognized built-in function
  name (SC2005)`.
- **Vector search (14) and hybrid search (3):** container creation fails with
  `vectorIndexes[0]: only 'diskANN' type is currently supported` for `flat` /
  `quantizedFlat` vector index types.

These should be tracked against the emulator's full-text / vector feature support and
are out of scope for the un-skipping change.
