# SIMD API Standards — SharpCoreDB

> **Mandatory for all SIMD code in SharpCoreDB.** Non-compliant code will be rejected in review.

## Required API: `System.Runtime.Intrinsics`

All SIMD code **MUST** use the explicit intrinsics from `System.Runtime.Intrinsics`:

```csharp
// ✅ REQUIRED — explicit multi-tier intrinsics
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

Vector512<float> v512 = Vector512.LoadUnsafe(ref data);
Vector256<float> v256 = Vector256.LoadUnsafe(ref data);
Vector128<float> v128 = Vector128.LoadUnsafe(ref data);
```

## Banned API: `System.Numerics.Vector<T>`

**DO NOT** use the old portable `Vector<T>` from `System.Numerics`:

```csharp
// ❌ BANNED — old portable SIMD (no explicit ISA control)
using System.Numerics;

Vector<float>.Count;                           // ❌
Vector.IsHardwareAccelerated;                  // ❌
MemoryMarshal.Cast<float, Vector<float>>(...); // ❌
Vector.Sum(...);                               // ❌
```
### Why?

As recommended by Tanner Gooding (.NET Runtime team), `System.Runtime.Intrinsics` is the
modern, preferred API for .NET 8+ / .NET 10:

| Feature | `System.Numerics.Vector<T>` (OLD) | `System.Runtime.Intrinsics` (NEW) |
|---|---|---|
| ISA control | JIT decides width | Explicit per-tier |
| AVX-512 | No explicit support | Full `Avx512F` access |
| FMA | Not accessible | `Fma.MultiplyAdd()` |
| NativeAOT | Width may vary | Deterministic codegen |
| Instruction selection | Opaque | You choose the instruction |
| Multi-tier dispatch | Not possible | AVX-512 → AVX2 → SSE → Scalar |
## Required Multi-Tier Dispatch Pattern

Every SIMD hot path must implement a tiered fallback chain:

```csharp
int i = 0;

if (Avx512F.IsSupported && len >= AVX512_THRESHOLD)
{
    // Vector512<T> path
}
else if (Avx2.IsSupported && len >= 8)  // 8 floats per Vector256<float>
{
    // Vector256<T> path
}
else if (Sse.IsSupported && len >= 4)   // Sse2.IsSupported for int/double
{
    // Vector128<T> path
}

// Scalar tail — ALWAYS required
for (; i < len; i++) { /* scalar */ }
```
| 68 | + |
| 69 | +### ISA Check Mapping |
| 70 | + |
| 71 | +| Data Type | 512-bit | 256-bit | 128-bit | |
| 72 | +|---|---|---|---| |
| 73 | +| `float` | `Avx512F.IsSupported` | `Avx2.IsSupported` | `Sse.IsSupported` | |
| 74 | +| `double` | `Avx512F.IsSupported` | `Avx2.IsSupported` | `Sse2.IsSupported` | |
| 75 | +| `int` | `Avx512F.IsSupported` | `Avx2.IsSupported` | `Sse2.IsSupported` | |
| 76 | +| `long` | `Avx512F.IsSupported` | `Avx2.IsSupported` | `Sse2.IsSupported` | |
| 77 | +| `byte` (XOR/popcount) | `Avx512BW.IsSupported` | `Avx2.IsSupported` | `Sse2.IsSupported` | |
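Putting the dispatch chain and the ISA mapping together, a complete tiered reduction looks roughly like the sketch below. The `TieredSum` class and the `Avx512Threshold` value are hypothetical, illustrative names, not part of SharpCoreDB:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class TieredSum
{
    private const int Avx512Threshold = 64; // hypothetical; see AVX-512 Thresholds

    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static float Sum(ReadOnlySpan<float> span)
    {
        ref float data = ref MemoryMarshal.GetReference(span);
        int len = span.Length;
        int i = 0;
        float sum = 0f;

        if (Avx512F.IsSupported && len >= Avx512Threshold)
        {
            var acc = Vector512<float>.Zero;
            for (; i <= len - Vector512<float>.Count; i += Vector512<float>.Count)
                acc += Vector512.LoadUnsafe(ref Unsafe.Add(ref data, i));
            sum = Vector512.Sum(acc);
        }
        else if (Avx2.IsSupported && len >= Vector256<float>.Count)
        {
            var acc = Vector256<float>.Zero;
            for (; i <= len - Vector256<float>.Count; i += Vector256<float>.Count)
                acc += Vector256.LoadUnsafe(ref Unsafe.Add(ref data, i));
            sum = Vector256.Sum(acc);
        }
        else if (Sse.IsSupported && len >= Vector128<float>.Count)
        {
            var acc = Vector128<float>.Zero;
            for (; i <= len - Vector128<float>.Count; i += Vector128<float>.Count)
                acc += Vector128.LoadUnsafe(ref Unsafe.Add(ref data, i));
            sum = Vector128.Sum(acc);
        }

        // Scalar tail — picks up whatever the vector loop left over.
        for (; i < len; i++)
            sum += Unsafe.Add(ref data, i);
        return sum;
    }
}
```

Note that each tier falls through to the scalar tail, so the method is correct on any hardware, including non-x86.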
## Required Patterns

### Loading Data (use `LoadUnsafe`, not pointer-based loads)

```csharp
// ✅ DO — safe ref-based loading (no 'fixed', no unsafe)
ref float refData = ref MemoryMarshal.GetReference(span);
var vec = Vector256.LoadUnsafe(ref Unsafe.Add(ref refData, i));

// ✅ ALSO OK — pointer-based when already in an unsafe context
var vec = Avx.LoadVector256(ptr + i);

// ❌ DON'T — old MemoryMarshal.Cast to Vector<T>
var vecs = MemoryMarshal.Cast<float, Vector<float>>(span);
```

### FMA (Fused Multiply-Add)

Always use FMA when available — better throughput and better precision (one rounding step instead of two):

```csharp
// ✅ DO — FMA with fallback
if (Fma.IsSupported)
    vSum = Fma.MultiplyAdd(va, vb, vSum); // a*b + c in one instruction
else
    vSum += va * vb;

// ✅ DO — AVX-512 always includes FMA
vSum = Avx512F.FusedMultiplyAdd(va, vb, vSum);
```
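In context, the FMA-with-fallback pattern typically sits in a dot-product inner loop. The `Dot` method below is a hypothetical sketch (not a SharpCoreDB API); the `Fma.IsSupported` branch inside the loop is free because the JIT treats it as a compile-time constant:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class DotProductExample
{
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        ref float ra = ref MemoryMarshal.GetReference(a);
        ref float rb = ref MemoryMarshal.GetReference(b);
        int len = Math.Min(a.Length, b.Length);
        int i = 0;
        float dot = 0f;

        if (Avx2.IsSupported && len >= Vector256<float>.Count)
        {
            var acc = Vector256<float>.Zero;
            for (; i <= len - Vector256<float>.Count; i += Vector256<float>.Count)
            {
                var va = Vector256.LoadUnsafe(ref Unsafe.Add(ref ra, i));
                var vb = Vector256.LoadUnsafe(ref Unsafe.Add(ref rb, i));
                // Fused a*b + acc with a single rounding when FMA is present.
                acc = Fma.IsSupported ? Fma.MultiplyAdd(va, vb, acc)
                                      : acc + va * vb;
            }
            dot = Vector256.Sum(acc);
        }

        // Scalar tail — ALWAYS required.
        for (; i < len; i++)
            dot += a[i] * b[i];
        return dot;
    }
}
```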
### Horizontal Reduction

```csharp
// ✅ DO — cross-platform Vector*.Sum()
float result = Vector256.Sum(vAccumulator);

// ❌ DON'T — old System.Numerics
float result = Vector.Sum(vAccumulator);
```

### Storing Results

```csharp
// ✅ DO
result.StoreUnsafe(ref Unsafe.Add(ref refDst, i));

// ❌ DON'T — old MemoryMarshal.Cast to write
var spanDst = MemoryMarshal.Cast<float, Vector<float>>(dst);
spanDst[j] = value;
```
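A load-compute-store round trip ties the patterns above together. The `Scale` method is a hypothetical sketch of this shape, assuming an AVX2 path with scalar fallback:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class ScaleExample
{
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static void Scale(Span<float> dst, ReadOnlySpan<float> src, float factor)
    {
        ref float rs = ref MemoryMarshal.GetReference(src);
        ref float rd = ref MemoryMarshal.GetReference(dst);
        int len = Math.Min(src.Length, dst.Length);
        int i = 0;

        if (Avx2.IsSupported && len >= Vector256<float>.Count)
        {
            var vf = Vector256.Create(factor); // broadcast the scalar to all lanes
            for (; i <= len - Vector256<float>.Count; i += Vector256<float>.Count)
            {
                var v = Vector256.LoadUnsafe(ref Unsafe.Add(ref rs, i));
                (v * vf).StoreUnsafe(ref Unsafe.Add(ref rd, i));
            }
        }

        // Scalar tail — ALWAYS required.
        for (; i < len; i++)
            dst[i] = src[i] * factor;
    }
}
```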
## AVX-512 Thresholds

AVX-512 incurs a frequency-throttling penalty on some CPUs, so only take the 512-bit path above a minimum input size:

| Use Case | Minimum Size |
|---|---|
| WHERE filtering (int/double) | 1024 elements |
| Distance metrics (float) | 64 elements |
| Batch XOR / Hamming | 32 bytes |
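Applied to the dispatch pattern, the threshold is just an extra length condition on the 512-bit branch — small and medium inputs fall through to AVX2, which carries no downclocking risk. The constant name here is hypothetical; the value mirrors the table above:

```csharp
const int Avx512DistanceThreshold = 64; // floats, per the thresholds table

if (Avx512F.IsSupported && len >= Avx512DistanceThreshold)
{
    // 512-bit path: input is large enough to amortize any frequency transition
}
else if (Avx2.IsSupported && len >= Vector256<float>.Count)
{
    // 256-bit path: handles small and medium inputs at full clock speed
}
```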
## Reference Implementations

- **WHERE filtering**: `src/SharpCoreDB/Optimizations/SimdWhereFilter.cs`
- **Distance metrics**: `src/SharpCoreDB.VectorSearch/Distance/DistanceMetrics.cs`
- **Hamming distance**: `src/SharpCoreDB.VectorSearch/Quantization/BinaryQuantizer.cs`

## Code Review Checklist

- [ ] No `System.Numerics.Vector<T>` usage
- [ ] No `Vector.IsHardwareAccelerated` checks
- [ ] Uses `Vector128<T>` / `Vector256<T>` / `Vector512<T>` from `System.Runtime.Intrinsics`
- [ ] Multi-tier dispatch: AVX-512 → AVX2 → SSE → Scalar
- [ ] Scalar tail for remainder elements
- [ ] FMA used where available (`Fma.MultiplyAdd` / `Avx512F.FusedMultiplyAdd`)
- [ ] `[MethodImpl(MethodImplOptions.AggressiveOptimization)]` on hot paths
- [ ] AVX-512 guarded by minimum element threshold

---

**Enforcement:** All new and modified SIMD code must comply. Existing violations should be migrated whenever the surrounding code is touched.
**Last Updated:** 2025-07-08