Commit fdc86ae

MPCoreDeveloper committed
docs: add ORM-vs-DB collation mismatch use case and expand EF Core integration
1 parent 8cec818 commit fdc86ae

1 file changed: +238 -1 lines changed

docs/COLLATE_SUPPORT_PLAN.md

Lines changed: 238 additions & 1 deletion
@@ -9,6 +9,7 @@

---

## 1. Executive Summary

Add SQL-standard `COLLATE` support to SharpCoreDB, enabling case-insensitive and
@@ -372,13 +373,22 @@ CREATE INDEX idx_name_de ON users (name COLLATE "de_DE");

## 5. EF Core Integration (Separate Deliverable)

**Goal:** Full collation support in the EF Core provider: DDL generation, query translation,
`EF.Functions.Collate()`, and `string.Equals(x, StringComparison)` translation.

See also **Section 12** for the ORM-vs-DB collation mismatch problem this solves.

#### Modified Files
| File | Change |
|---|---|
| `src/SharpCoreDB.EntityFrameworkCore/Migrations/SharpCoreDBMigrationsSqlGenerator.cs` → `ColumnDefinition()` | Emit `COLLATE <name>` after type and NOT NULL |
| `src/SharpCoreDB.EntityFrameworkCore/Storage/SharpCoreDBTypeMappingSource.cs` | Map `UseCollation()` to `CollationType` |
| `src/SharpCoreDB.EntityFrameworkCore/Query/SharpCoreDBStringMethodCallTranslator.cs` | Translate `string.Equals(string, StringComparison)` → `COLLATE` SQL |
| `src/SharpCoreDB.EntityFrameworkCore/Query/SharpCoreDBQuerySqlGenerator.cs` | Emit `COLLATE <name>` expression in SQL visitor |
| `src/SharpCoreDB.EntityFrameworkCore/Query/SharpCoreDBMethodCallTranslatorPlugin.cs` | Register collate translator |
| New: `src/SharpCoreDB.EntityFrameworkCore/Query/SharpCoreDBCollateTranslator.cs` | Translate `EF.Functions.Collate()` calls to SQL |

381-
#### EF Core Fluent API
391+
#### 5.1 EF Core Fluent API — DDL Generation

```csharp
modelBuilder.Entity<User>()
@@ -389,6 +399,78 @@ modelBuilder.Entity<User>()
// Name TEXT COLLATE NOCASE
```
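
The clause order named in the Modified Files table (type, then NOT NULL, then `COLLATE <name>`) can be sketched as a small string builder. This is only an illustration under that assumption: `ColumnDdlSketch` and `BuildColumnDefinition` are hypothetical names, not part of SharpCoreDB or EF Core, and the real `ColumnDefinition()` override may look quite different.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical helper (not a SharpCoreDB API): illustrates the assumed clause order
// <name> <type> [NOT NULL] [COLLATE <collation>].
public static class ColumnDdlSketch
{
    public static string BuildColumnDefinition(
        string name, string type, bool isNullable, string? collation)
    {
        var sb = new StringBuilder();
        sb.Append(name).Append(' ').Append(type);

        if (!isNullable)
        {
            sb.Append(" NOT NULL");
        }

        // COLLATE is emitted last; it is omitted when no collation is configured.
        if (!string.IsNullOrWhiteSpace(collation))
        {
            sb.Append(" COLLATE ").Append(collation);
        }

        return sb.ToString();
    }
}
```

For a nullable `Name` column of type `TEXT` with collation `NOCASE`, the helper returns `Name TEXT COLLATE NOCASE`, which matches the DDL comment in the block above.
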

#### 5.2 EF.Functions.Collate() — Query-Level Override

```csharp
// Explicit collation override (standard EF Core pattern)
var users = await context.Users
    .Where(u => EF.Functions.Collate(u.Name, "NOCASE") == "john")
    .ToListAsync();

// Generated SQL:
// SELECT * FROM Users WHERE Name COLLATE NOCASE = 'john'
```

#### 5.3 string.Equals(string, StringComparison) Translation (SharpCoreDB-Specific)

Other EF Core providers do not translate the `StringComparison` argument into SQL.
SharpCoreDB can do better because we control both sides:

```csharp
// C# idiomatic case-insensitive comparison
var users = db.Users
    .Where(u => u.Name.Equals("john", StringComparison.OrdinalIgnoreCase))
    .ToList();

// SharpCoreDB generates:
// SELECT * FROM Users WHERE Name COLLATE NOCASE = 'john'
//
// Other EF providers would generate:
// SELECT * FROM Users WHERE Name = 'john' ← WRONG if column is CS!
```

**StringComparison → SQL mapping:**

| C# `StringComparison` | Generated SQL |
|---|---|
| `Ordinal` | `WHERE Name = 'value'` (no COLLATE, uses column default) |
| `OrdinalIgnoreCase` | `WHERE Name COLLATE NOCASE = 'value'` |
| `CurrentCultureIgnoreCase` | `WHERE Name COLLATE UNICODE_CI = 'value'` (Phase 6) |
| `InvariantCultureIgnoreCase` | `WHERE Name COLLATE NOCASE = 'value'` |

**Implementation in `SharpCoreDBStringMethodCallTranslator.cs`:**
```csharp
private static readonly MethodInfo _equalsWithComparisonMethod =
    typeof(string).GetRuntimeMethod(nameof(string.Equals),
        [typeof(string), typeof(StringComparison)])!;

// In Translate():
if (method == _equalsWithComparisonMethod && instance is not null)
{
    var comparisonArg = arguments[1];
    if (comparisonArg is SqlConstantExpression { Value: StringComparison comparison })
    {
        var collation = comparison switch
        {
            StringComparison.OrdinalIgnoreCase => "NOCASE",
            StringComparison.InvariantCultureIgnoreCase => "NOCASE",
            StringComparison.CurrentCultureIgnoreCase => "UNICODE_CI",
            _ => null // No COLLATE for case-sensitive comparisons
        };

        if (collation is not null)
        {
            // Emit: column COLLATE NOCASE = @value
            return _sqlExpressionFactory.Equal(
                _sqlExpressionFactory.Collate(instance, collation),
                arguments[0]);
        }

        // Case-sensitive: standard equality
        return _sqlExpressionFactory.Equal(instance, arguments[0]);
    }
}
```

---

## 6. Test Plan
@@ -409,6 +491,11 @@ modelBuilder.Entity<User>()
| `LowerFunction_ShouldReturnLowercase` | 5 | `CollationQueryTests.cs` |
| `SaveMetadata_WithCollation_ShouldPersistAndReload` | 1 | `CollationPersistenceTests.cs` |
| `EFCore_UseCollation_ShouldEmitCollateDDL` | EF | `CollationEFCoreTests.cs` |
| `EFCore_StringEqualsIgnoreCase_ShouldEmitCollateNoCase` | EF | `CollationEFCoreTests.cs` |
| `EFCore_StringEqualsOrdinal_ShouldNotEmitCollate` | EF | `CollationEFCoreTests.cs` |
| `EFCore_EFFunctionsCollate_ShouldEmitCollateClause` | EF | `CollationEFCoreTests.cs` |
| `EFCore_NoCaseColumn_SimpleEquals_ShouldReturnBothCases` | EF | `CollationEFCoreTests.cs` |
| `EFCore_CSColumn_IgnoreCase_ShouldLogDiagnosticWarning` | EF | `CollationEFCoreTests.cs` |

### Integration Tests

@@ -417,6 +504,8 @@ modelBuilder.Entity<User>()
| Create table with NOCASE → insert mixed-case → SELECT with exact case → should match | 3 |
| Create table with NOCASE → create index → lookup with different case → should find via index | 4 |
| Roundtrip: create table → save metadata → reload → verify collation preserved | 1 |
| **ORM mismatch scenario:** CS column + `Equals(x, OrdinalIgnoreCase)` → returns both rows | EF |
| **ORM mismatch scenario:** NOCASE column + simple `== "john"` → returns both rows | EF |

---

@@ -498,6 +587,154 @@ modelBuilder.Entity<User>()
| `src/SharpCoreDB/Services/EnhancedSqlParser.*.cs` | 5 |
| `src/SharpCoreDB.EntityFrameworkCore/Migrations/SharpCoreDBMigrationsSqlGenerator.cs` | EF |
| `src/SharpCoreDB.EntityFrameworkCore/Storage/SharpCoreDBTypeMappingSource.cs` | EF |
| `src/SharpCoreDB.EntityFrameworkCore/Query/SharpCoreDBStringMethodCallTranslator.cs` | EF |
| `src/SharpCoreDB.EntityFrameworkCore/Query/SharpCoreDBQuerySqlGenerator.cs` | EF |
| `src/SharpCoreDB.EntityFrameworkCore/Query/SharpCoreDBMethodCallTranslatorPlugin.cs` | EF |
| New: `src/SharpCoreDB.EntityFrameworkCore/Query/SharpCoreDBCollateTranslator.cs` | EF |

---

## 12. Critical Use Case: ORM-vs-Database Collation Mismatch

> **Source:** LinkedIn discussion (Dave Callan / Dmitry Maslov / Shay Rojansky of the EF Core team)

### The Problem

There is a **fundamental semantic contradiction** between how C# LINQ and SQL handle
string comparisons when collation is involved:

```csharp
// Developer writes this C# LINQ query:
var users = db.Users
    .Where(u => u.Name.Equals("john", StringComparison.OrdinalIgnoreCase))
    .ToList();

// Developer EXPECTS: 2 records ("John" and "john")
// EF Core DEFAULT behavior: generates WHERE Name = 'john'
// If column is COLLATE CS (case-sensitive): returns ONLY "john" → 1 record!
```

The database was created with a case-sensitive collation:
```sql
CREATE TABLE Users (
    Id INT IDENTITY PRIMARY KEY,
    Name NVARCHAR(50) COLLATE Latin1_General_CS_AS -- case-sensitive!
);

INSERT INTO Users (Name) VALUES ('John'), ('john');
```

The C# code says "compare case-insensitively" but the database has a case-sensitive
collation on the column. **The ORM cannot resolve this contradiction silently** because:

1. EF Core translates `.Equals("john", OrdinalIgnoreCase)` to `WHERE Name = 'john'`
   by default, dropping the `StringComparison` parameter entirely
2. The SQL engine then applies the column's collation (`CS_AS`) → case-sensitive match
3. Result: only 1 record instead of the expected 2
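
The gap between the two result counts can be reproduced without any database. The sketch below is an in-memory illustration only: `CollationMismatchDemo` is a hypothetical name, and plain `string.Equals` stands in for what a case-sensitive column versus a case-insensitive comparison would match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory illustration only: no database or EF Core is involved.
public static class CollationMismatchDemo
{
    // Same two rows as the INSERT statement above.
    public static readonly string[] Names = { "John", "john" };

    // Counts the rows that the given comparison mode would treat as equal to 'value'.
    public static int CountMatches(
        IEnumerable<string> names, string value, StringComparison comparison)
        => names.Count(n => string.Equals(n, value, comparison));
}
```

`CountMatches(Names, "john", StringComparison.OrdinalIgnoreCase)` yields 2, the count the developer expects, while `StringComparison.Ordinal` yields 1, the count a case-sensitive column returns.
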

### Why This Is Hard (Industry-Wide)

As the EF Core team (Shay Rojansky) has noted, this is an unsolvable problem from
the ORM side alone:
- The ORM doesn't know the column's collation at query translation time
- `StringComparison` in C# doesn't map 1:1 to SQL collations
- Different databases have different collation systems
- Silently adding `COLLATE` to every string comparison would prevent the use of indexes built with the column's own collation
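
One way to see why `StringComparison` does not map 1:1 to a collation name is to compare two Unicode spellings of the same character under different modes. This is a small sketch, not taken from the source; `ComparisonSemanticsDemo` is a hypothetical name.

```csharp
using System;

// Illustration only: the same two strings compared under different StringComparison modes.
public static class ComparisonSemanticsDemo
{
    // "é" as one precomposed code point vs. "e" followed by a combining acute accent.
    public const string Precomposed = "\u00E9";
    public const string Decomposed = "e\u0301";

    public static bool AreEqual(StringComparison comparison)
        => string.Equals(Precomposed, Decomposed, comparison);
}
```

`AreEqual(StringComparison.OrdinalIgnoreCase)` is false because the two strings differ in length and code units. Culture-aware modes such as `InvariantCultureIgnoreCase` depend on the platform's globalization data and usually treat the two spellings as equivalent, so mapping both modes to a single `NOCASE` collation is an approximation.
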
643+
644+
### SharpCoreDB Advantage: We Control Both Sides
645+
646+
Unlike generic EF Core providers, **we own both the ORM provider AND the SQL engine**.
647+
This gives us three strategies that other databases can't offer:
648+
649+
#### Strategy A: `EF.Functions.Collate()` — Explicit Query-Level Override (Recommended)
650+
651+
The standard EF Core approach. Developer explicitly requests collation in the query:
652+
653+
```csharp
654+
// ✅ EXPLICIT: Developer knows what they want
655+
var users = await context.Users
656+
.Where(u => EF.Functions.Collate(u.Name, "NOCASE") == "john")
657+
.ToListAsync();
658+
659+
// Generated SQL:
660+
// SELECT * FROM Users WHERE Name COLLATE NOCASE = 'john'
661+
```
662+
663+
**Implementation:** Add `EF.Functions.Collate()` translation to the
664+
`SharpCoreDBStringMethodCallTranslator`.
665+
666+
#### Strategy B: `string.Equals(x, StringComparison)` → COLLATE Translation
667+
668+
SharpCoreDB-specific: we can translate the `StringComparison` overload since we
669+
know our collation system:
670+
671+
```csharp
672+
// ✅ C# idiomatic — SharpCoreDB translates the StringComparison
673+
var users = db.Users
674+
.Where(u => u.Name.Equals("john", StringComparison.OrdinalIgnoreCase))
675+
.ToList();
676+
677+
// Generated SQL (SharpCoreDB-specific):
678+
// SELECT * FROM Users WHERE Name COLLATE NOCASE = 'john'
679+
```
680+
681+
Mapping table:
682+
| `StringComparison` | SharpCoreDB SQL |
683+
|---|---|
684+
| `Ordinal` | `= 'value'` (no COLLATE, uses column default) |
685+
| `OrdinalIgnoreCase` | `COLLATE NOCASE = 'value'` |
686+
| `CurrentCultureIgnoreCase` | `COLLATE UNICODE_CI = 'value'` (Phase 6) |
687+
| `InvariantCultureIgnoreCase` | `COLLATE NOCASE = 'value'` |
688+
689+
**Implementation:** Add `string.Equals(string, StringComparison)` overload to
690+
`SharpCoreDBStringMethodCallTranslator.cs`.
691+
#### Strategy C: Column Collation Awareness at Translation Time

Since we control the provider, we can read column metadata during query translation
and emit a **warning** when the C# comparison semantics conflict with the column collation:

```
⚠️ SharpCoreDB Warning: Column 'Users.Name' has COLLATE BINARY (case-sensitive),
but query uses StringComparison.OrdinalIgnoreCase. Consider using
EF.Functions.Collate() or setting .UseCollation("NOCASE") on the property.
```
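
The conflict check behind such a warning could look roughly like the sketch below. It assumes that `BINARY` is the only case-sensitive collation name and that the column's collation is available as a plain string at translation time; `CollationMismatchDetector` and `Check` are hypothetical names, not SharpCoreDB APIs.

```csharp
#nullable enable
using System;

// Hypothetical sketch of the Strategy C check; not a SharpCoreDB API.
public static class CollationMismatchDetector
{
    // Returns a warning message, or null when the comparison and the column collation agree.
    public static string? Check(
        string table, string column, string columnCollation, StringComparison comparison)
    {
        // Assumption: BINARY is the only case-sensitive collation name.
        bool columnIsCaseSensitive =
            string.Equals(columnCollation, "BINARY", StringComparison.OrdinalIgnoreCase);

        bool queryIgnoresCase = comparison is StringComparison.OrdinalIgnoreCase
            or StringComparison.InvariantCultureIgnoreCase
            or StringComparison.CurrentCultureIgnoreCase;

        if (!(columnIsCaseSensitive && queryIgnoresCase))
        {
            return null;
        }

        return $"Column '{table}.{column}' has COLLATE {columnCollation} (case-sensitive), " +
               $"but query uses StringComparison.{comparison}. Consider using " +
               "EF.Functions.Collate() or setting .UseCollation(\"NOCASE\") on the property.";
    }
}
```

A provider would route the returned message to its diagnostics logger. The sketch returns the text rather than logging it, so the check stays testable in isolation.
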

### SharpCoreDB Resolution: The "No Surprise" Approach

For SharpCoreDB, we recommend the following behavior:

1. **Column defined with `COLLATE NOCASE`** → All comparisons on that column are
   case-insensitive by default. `WHERE Name = 'john'` matches both `'John'` and `'john'`.
   No mismatch possible.

2. **Column defined with `COLLATE BINARY` (default)** + C# `OrdinalIgnoreCase` →
   The EF Core provider emits `COLLATE NOCASE` in the generated SQL to honor the
   developer's intent. This is safe because SharpCoreDB's query engine evaluates
   `COLLATE` per-expression (Phase 5).

3. **`EF.Functions.Collate()`** → Always available as the explicit escape hatch,
   matching EF Core conventions.
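
The first two rules can be read as a small decision function: emit `COLLATE` only when the requested comparison asks for a collation that differs from the column's own. The sketch below works under that reading. `CollationResolutionSketch` and its members are hypothetical names, and skipping the override when the column already has the requested collation is an assumption, not something the plan states.

```csharp
#nullable enable
using System;

// Hypothetical sketch of the "No Surprise" rules; not a SharpCoreDB API.
public static class CollationResolutionSketch
{
    // Same mapping as the StringComparison tables in this plan.
    public static string? RequestedCollation(StringComparison comparison) => comparison switch
    {
        StringComparison.OrdinalIgnoreCase => "NOCASE",
        StringComparison.InvariantCultureIgnoreCase => "NOCASE",
        StringComparison.CurrentCultureIgnoreCase => "UNICODE_CI",
        _ => null // Case-sensitive modes: rely on the column's own collation.
    };

    // Builds the predicate text for an equality test against a parameter named @value.
    public static string BuildPredicate(
        string column, string columnCollation, StringComparison comparison)
    {
        string? requested = RequestedCollation(comparison);

        // Assumption: no override is needed when the column already uses that collation.
        bool needsOverride = requested is not null
            && !string.Equals(requested, columnCollation, StringComparison.OrdinalIgnoreCase);

        return needsOverride
            ? $"{column} COLLATE {requested} = @value"
            : $"{column} = @value";
    }
}
```

With a `BINARY` column and `OrdinalIgnoreCase`, the predicate is `Name COLLATE NOCASE = @value`. With a `NOCASE` column, or with `Ordinal`, it stays `Name = @value`.
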

### Test Cases for This Scenario

| Test | Expected Behavior |
|---|---|
| `CS_Column_EqualsIgnoreCase_ShouldEmitCollateNoCase` | `Name.Equals("john", OrdinalIgnoreCase)` → SQL contains `COLLATE NOCASE` |
| `NOCASE_Column_SimpleEquals_ShouldMatchBothCases` | Column is NOCASE → `WHERE Name = 'john'` returns both 'John' and 'john' |
| `EFCollateFunction_ShouldEmitCollateClause` | `EF.Functions.Collate(u.Name, "NOCASE")` → SQL contains `Name COLLATE NOCASE` |
| `CS_Column_OrdinalEquals_ShouldNotAddCollate` | `Name.Equals("john", Ordinal)` → no COLLATE in SQL (honor DB collation) |
| `MismatchWarning_CS_Column_IgnoreCase_ShouldLogWarning` | CS column + IgnoreCase → diagnostic warning logged |

### Files Impacted (Additional to existing plan)

| File | Change | Phase |
|---|---|---|
| `SharpCoreDBStringMethodCallTranslator.cs` | Add `string.Equals(string, StringComparison)` overload + `EF.Functions.Collate()` | EF Core |
| `SharpCoreDBQuerySqlGenerator.cs` | Emit `COLLATE <name>` expression in SQL output | EF Core |
| `SharpCoreDBMethodCallTranslatorPlugin.cs` | Register collate translator | EF Core |
| New: `SharpCoreDBCollateTranslator.cs` | Translate `EF.Functions.Collate()` calls | EF Core |
| `SqlAst.Nodes.cs` → `CollateExpressionNode` | Already in Phase 5 | 5 |

---
