# SIMD API Standards — SharpCoreDB

> **Mandatory for all SIMD code in SharpCoreDB.** Non-compliant code will be rejected in review.

## Required API: `System.Runtime.Intrinsics`

All SIMD code **MUST** use the explicit intrinsics from `System.Runtime.Intrinsics`:

```csharp
// ✅ REQUIRED — explicit multi-tier intrinsics
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

Vector512<float> v512 = Vector512.LoadUnsafe(ref data);
Vector256<float> v256 = Vector256.LoadUnsafe(ref data);
Vector128<float> v128 = Vector128.LoadUnsafe(ref data);
```

## Banned API: `System.Numerics.Vector<T>`

**DO NOT** use the old portable `Vector<T>` from `System.Numerics`:

```csharp
// ❌ BANNED — old portable SIMD (no explicit ISA control)
using System.Numerics;

Vector<float>.Count;                           // ❌
Vector.IsHardwareAccelerated;                  // ❌
MemoryMarshal.Cast<float, Vector<float>>(...); // ❌
Vector.Sum(...);                               // ❌
```
### Why?

As recommended by Tanner Gooding (.NET Runtime team), `System.Runtime.Intrinsics` is the
modern, preferred API for .NET 8+ / .NET 10:

| Feature | `System.Numerics.Vector<T>` (OLD) | `System.Runtime.Intrinsics` (NEW) |
|---|---|---|
| ISA control | JIT decides width | Explicit per-tier |
| AVX-512 | No explicit support | Full `Avx512F` access |
| FMA | Not accessible | `Fma.MultiplyAdd()` |
| NativeAOT | Width may vary | Deterministic codegen |
| Instruction selection | Opaque | You choose the instruction |
| Multi-tier dispatch | Not possible | AVX-512 → AVX2 → SSE → Scalar |
## Required Multi-Tier Dispatch Pattern

Every SIMD hot path must implement a tiered fallback chain:

```csharp
int i = 0;

if (Avx512F.IsSupported && len >= AVX512_THRESHOLD)
{
    // Vector512<T> path
}
else if (Avx2.IsSupported && len >= 8)  // 8 floats per Vector256<float>
{
    // Vector256<T> path
}
else if (Sse.IsSupported && len >= 4)   // Sse2.IsSupported for int/double
{
    // Vector128<T> path
}

// Scalar tail — ALWAYS required
for (; i < len; i++) { /* scalar */ }
```
| 68 | + |
| 69 | +### ISA Check Mapping |
| 70 | + |
| 71 | +| Data Type | 512-bit | 256-bit | 128-bit | |
| 72 | +|---|---|---|---| |
| 73 | +| `float` | `Avx512F.IsSupported` | `Avx2.IsSupported` | `Sse.IsSupported` | |
| 74 | +| `double` | `Avx512F.IsSupported` | `Avx2.IsSupported` | `Sse2.IsSupported` | |
| 75 | +| `int` | `Avx512F.IsSupported` | `Avx2.IsSupported` | `Sse2.IsSupported` | |
| 76 | +| `long` | `Avx512F.IsSupported` | `Avx2.IsSupported` | `Sse2.IsSupported` | |
| 77 | +| `byte` (XOR/popcount) | `Avx512BW.IsSupported` | `Avx2.IsSupported` | `Sse2.IsSupported` | |
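Putting the dispatch chain and the ISA mapping together, a complete tiered reduction looks roughly like the sketch below. The `TieredSum` class and the `Avx512Threshold` value are hypothetical, illustrative names, not part of SharpCoreDB:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class TieredSum
{
    private const int Avx512Threshold = 64; // hypothetical; see AVX-512 Thresholds

    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static float Sum(ReadOnlySpan<float> span)
    {
        ref float data = ref MemoryMarshal.GetReference(span);
        int len = span.Length;
        int i = 0;
        float sum = 0f;

        if (Avx512F.IsSupported && len >= Avx512Threshold)
        {
            var acc = Vector512<float>.Zero;
            for (; i <= len - Vector512<float>.Count; i += Vector512<float>.Count)
                acc += Vector512.LoadUnsafe(ref Unsafe.Add(ref data, i));
            sum = Vector512.Sum(acc);
        }
        else if (Avx2.IsSupported && len >= Vector256<float>.Count)
        {
            var acc = Vector256<float>.Zero;
            for (; i <= len - Vector256<float>.Count; i += Vector256<float>.Count)
                acc += Vector256.LoadUnsafe(ref Unsafe.Add(ref data, i));
            sum = Vector256.Sum(acc);
        }
        else if (Sse.IsSupported && len >= Vector128<float>.Count)
        {
            var acc = Vector128<float>.Zero;
            for (; i <= len - Vector128<float>.Count; i += Vector128<float>.Count)
                acc += Vector128.LoadUnsafe(ref Unsafe.Add(ref data, i));
            sum = Vector128.Sum(acc);
        }

        // Scalar tail — picks up whatever the vector loop left over.
        for (; i < len; i++)
            sum += Unsafe.Add(ref data, i);
        return sum;
    }
}
```

Note that each tier falls through to the scalar tail, so the method is correct on any hardware, including non-x86.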
## Required Patterns

### Loading Data (use `LoadUnsafe`, not pointer-based loads)

```csharp
// ✅ DO — safe ref-based loading (no 'fixed', no unsafe)
ref float refData = ref MemoryMarshal.GetReference(span);
var vec = Vector256.LoadUnsafe(ref Unsafe.Add(ref refData, i));

// ✅ ALSO OK — pointer-based when already in an unsafe context
var vec = Avx.LoadVector256(ptr + i);

// ❌ DON'T — old MemoryMarshal.Cast to Vector<T>
var vecs = MemoryMarshal.Cast<float, Vector<float>>(span);
```

### FMA (Fused Multiply-Add)

Always use FMA when available — better throughput and better precision (one rounding step instead of two):

```csharp
// ✅ DO — FMA with fallback
if (Fma.IsSupported)
    vSum = Fma.MultiplyAdd(va, vb, vSum); // a*b + c in one instruction
else
    vSum += va * vb;

// ✅ DO — AVX-512 always includes FMA
vSum = Avx512F.FusedMultiplyAdd(va, vb, vSum);
```
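In context, the FMA-with-fallback pattern typically sits in a dot-product inner loop. The `Dot` method below is a hypothetical sketch (not a SharpCoreDB API); the `Fma.IsSupported` branch inside the loop is free because the JIT treats it as a compile-time constant:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class DotProductExample
{
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        ref float ra = ref MemoryMarshal.GetReference(a);
        ref float rb = ref MemoryMarshal.GetReference(b);
        int len = Math.Min(a.Length, b.Length);
        int i = 0;
        float dot = 0f;

        if (Avx2.IsSupported && len >= Vector256<float>.Count)
        {
            var acc = Vector256<float>.Zero;
            for (; i <= len - Vector256<float>.Count; i += Vector256<float>.Count)
            {
                var va = Vector256.LoadUnsafe(ref Unsafe.Add(ref ra, i));
                var vb = Vector256.LoadUnsafe(ref Unsafe.Add(ref rb, i));
                // Fused a*b + acc with a single rounding when FMA is present.
                acc = Fma.IsSupported ? Fma.MultiplyAdd(va, vb, acc)
                                      : acc + va * vb;
            }
            dot = Vector256.Sum(acc);
        }

        // Scalar tail — ALWAYS required.
        for (; i < len; i++)
            dot += a[i] * b[i];
        return dot;
    }
}
```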
### Horizontal Reduction

```csharp
// ✅ DO — cross-platform Vector*.Sum()
float result = Vector256.Sum(vAccumulator);

// ❌ DON'T — old System.Numerics
float result = Vector.Sum(vAccumulator);
```

### Storing Results

```csharp
// ✅ DO
result.StoreUnsafe(ref Unsafe.Add(ref refDst, i));

// ❌ DON'T — old MemoryMarshal.Cast to write
var spanDst = MemoryMarshal.Cast<float, Vector<float>>(dst);
spanDst[j] = value;
```
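A load-compute-store round trip ties the patterns above together. The `Scale` method is a hypothetical sketch of this shape, assuming an AVX2 path with scalar fallback:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class ScaleExample
{
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static void Scale(Span<float> dst, ReadOnlySpan<float> src, float factor)
    {
        ref float rs = ref MemoryMarshal.GetReference(src);
        ref float rd = ref MemoryMarshal.GetReference(dst);
        int len = Math.Min(src.Length, dst.Length);
        int i = 0;

        if (Avx2.IsSupported && len >= Vector256<float>.Count)
        {
            var vf = Vector256.Create(factor); // broadcast the scalar to all lanes
            for (; i <= len - Vector256<float>.Count; i += Vector256<float>.Count)
            {
                var v = Vector256.LoadUnsafe(ref Unsafe.Add(ref rs, i));
                (v * vf).StoreUnsafe(ref Unsafe.Add(ref rd, i));
            }
        }

        // Scalar tail — ALWAYS required.
        for (; i < len; i++)
            dst[i] = src[i] * factor;
    }
}
```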
## AVX-512 Thresholds

AVX-512 incurs a frequency-throttling penalty on some CPUs, so only take the 512-bit path above a minimum input size:

| Use Case | Minimum Size |
|---|---|
| WHERE filtering (int/double) | 1024 elements |
| Distance metrics (float) | 64 elements |
| Batch XOR / Hamming | 32 bytes |
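Applied to the dispatch pattern, the threshold is just an extra length condition on the 512-bit branch — small and medium inputs fall through to AVX2, which carries no downclocking risk. The constant name here is hypothetical; the value mirrors the table above:

```csharp
const int Avx512DistanceThreshold = 64; // floats, per the thresholds table

if (Avx512F.IsSupported && len >= Avx512DistanceThreshold)
{
    // 512-bit path: input is large enough to amortize any frequency transition
}
else if (Avx2.IsSupported && len >= Vector256<float>.Count)
{
    // 256-bit path: handles small and medium inputs at full clock speed
}
```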
## Reference Implementations

- **WHERE filtering**: `src/SharpCoreDB/Optimizations/SimdWhereFilter.cs`
- **Distance metrics**: `src/SharpCoreDB.VectorSearch/Distance/DistanceMetrics.cs`
- **Hamming distance**: `src/SharpCoreDB.VectorSearch/Quantization/BinaryQuantizer.cs`

## Code Review Checklist

- [ ] No `System.Numerics.Vector<T>` usage
- [ ] No `Vector.IsHardwareAccelerated` checks
- [ ] Uses `Vector128<T>` / `Vector256<T>` / `Vector512<T>` from `System.Runtime.Intrinsics`
- [ ] Multi-tier dispatch: AVX-512 → AVX2 → SSE → Scalar
- [ ] Scalar tail for remainder elements
- [ ] FMA used where available (`Fma.MultiplyAdd` / `Avx512F.FusedMultiplyAdd`)
- [ ] `[MethodImpl(MethodImplOptions.AggressiveOptimization)]` on hot paths
- [ ] AVX-512 guarded by minimum element threshold

---

**Enforcement:** All new and modified SIMD code must comply. Existing violations should be migrated whenever the surrounding code is touched.
**Last Updated:** 2025-07-08