| 1 | +# Complete Feature Comparison: rust-ndarray vs. AdaWorldAPI Fork |
| 2 | + |
| 3 | +> 80,131 lines of new code across 146 HPC modules, 6 SIMD files, 5 backend files, 20 burn ops, and 2 subcrates. |
| 4 | + |
| 5 | +## At a Glance |
| 6 | + |
| 7 | +| Metric | Upstream [rust-ndarray/ndarray](https://github.com/rust-ndarray/ndarray) | **[AdaWorldAPI/ndarray](https://github.com/AdaWorldAPI/ndarray)** | |
| 8 | +|--------|-----------|------------| |
| 9 | +| Base functionality | n-dimensional arrays, slicing, views | **Same** (full upstream preserved) | |
| 10 | +| New LOC added | — | **80,131** | |
| 11 | +| New files | — | **179** (146 HPC + 6 SIMD + 5 backend + 20 burn + 2 subcrates) | |
| 12 | +| Test count | ~300 | **~1,180** (300 upstream + 880 new) | |
| 13 | +| SIMD ISAs | SSE2 via matrixmultiply (external) | **7 ISAs**: AVX-512, AVX2, SSE2, AMX, VNNI, NEON (3 tiers), WASM | |
| 14 | +| Numeric types | f32, f64 | **+f16, BF16, i8, u8, i16** (all with SIMD paths) | |
| 15 | +| BLAS coverage | dot (via matrixmultiply) | **Full L1 + L2 + L3** (pure-Rust + MKL + OpenBLAS) | |
| 16 | +| Target platforms | x86_64 (via external BLAS), scalar everywhere else | **x86_64 (tiered), aarch64 (3-tier NEON), wasm (prepared)** | |
| 17 | +| Minimum Rust | 1.64 | **1.94 stable** (no nightly) | |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## SIMD Layer (6 files, ~5,700 LOC) |
| 22 | + |
| 23 | +| Component | Upstream | **Fork** | |
| 24 | +|-----------|----------|----------| |
| 25 | +| `simd.rs` — dispatch + re-exports | Not present | **LazyLock tier detection, PREFERRED_LANES, type re-exports** | |
| 26 | +| `simd_avx512.rs` — 512-bit types | Not present | **11 types: F32x16, F64x8, U8x64, I32x16, I64x8, U32x16, U64x8, F32x8, F64x4, BF16x16, BF16x8 + F16 IEEE 754** (2,700 LOC) | |
| 27 | +| `simd_avx2.rs` — 256-bit ops | Not present | **BLAS L1, Hamming, i8 dot, popcount, F16 precision toolkit** (1,600 LOC) | |
| 28 | +| `simd_neon.rs` — ARM 128-bit | Not present | **3-tier NEON: A53 baseline, A72 dual-pipe, A76 dotprod+fp16; codebook gather, Hamming, Base17 L1** (500 LOC) | |
| 29 | +| `simd_amx.rs` — Intel tile matrix | Not present | **AMX detection (CPUID+XCR0), VNNI 512/256, MatVec dispatch, quantize/dequantize** (350 LOC) | |
| 30 | +| `simd_wasm.rs` — WebAssembly | Not present | **Scaffolding for WASM SIMD128** | |
| 31 | + |
| 32 | +## Backend Layer (5 files, ~2,000 LOC) |
| 33 | + |
| 34 | +| Component | Upstream | **Fork** | |
| 35 | +|-----------|----------|----------| |
| 36 | +| `backend/mod.rs` — BlasFloat trait | Not present | **Trait-based dispatch: Native / MKL / OpenBLAS** | |
| 37 | +| `backend/native.rs` — pure-Rust GEMM | Not present | **Goto-algorithm 6x16/6x8 microkernels, cache-blocked (L1/L2/L3), AVX-512+AVX2 dispatch** | |
| 38 | +| `backend/kernels_avx512.rs` | Not present | **AVX-512 SIMD GEMM kernels** | |
| 39 | +| `backend/mkl.rs` | Not present | **Intel MKL FFI (feature = "intel-mkl")** | |
| 40 | +| `backend/openblas.rs` | Not present | **OpenBLAS FFI (feature = "openblas")** | |
| 41 | +| GEMM throughput (1024x1024) | ~13 GFLOPS (via matrixmultiply) | **139 GFLOPS** (roughly 10.5x faster) |
| 42 | + |
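To make the "cache-blocked" entry above concrete, here is a minimal scalar sketch of loop blocking for GEMM. It is not the fork's kernel: the function names `gemm_naive` and `gemm_blocked` and the block sizes `MC`, `KC`, `NC` are assumptions chosen for illustration, and a real Goto-style implementation would also pack panels and use SIMD microkernels.

```rust
/// Reference GEMM: C = A * B, all row-major. A is m x k, B is k x n.
pub fn gemm_naive(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/// Blocked GEMM: same result, but the loops walk small tiles so that the
/// working set of each tile can stay in cache. Block sizes are illustrative.
pub fn gemm_blocked(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    const MC: usize = 8; // rows of A per tile (hypothetical value)
    const KC: usize = 8; // shared dimension per tile (hypothetical value)
    const NC: usize = 8; // columns of B per tile (hypothetical value)
    assert!(a.len() >= m * k && b.len() >= k * n && c.len() >= m * n);
    c[..m * n].fill(0.0);
    for jc in (0..n).step_by(NC) {
        let j_end = (jc + NC).min(n);
        for pc in (0..k).step_by(KC) {
            let p_end = (pc + KC).min(k);
            for ic in (0..m).step_by(MC) {
                let i_end = (ic + MC).min(m);
                // Inner tile: a real implementation replaces this with a
                // register-blocked SIMD microkernel.
                for i in ic..i_end {
                    for p in pc..p_end {
                        let a_ip = a[i * k + p];
                        for j in jc..j_end {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Both functions should agree on any input whose dimensions are not multiples of the block size, which is the edge case blocking most often gets wrong.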
| 43 | +## HPC Module Library (146 files, ~70,000 LOC, 880 tests) |
| 44 | + |
| 45 | +### Linear Algebra (BLAS + LAPACK) |
| 46 | + |
| 47 | +| Module | Upstream | **Fork** | Operations | |
| 48 | +|--------|----------|----------|------------| |
| 49 | +| `blas_level1.rs` | dot only (external) | **Full** | dot, axpy, scal, nrm2, asum, iamax, Givens rotation | |
| 50 | +| `blas_level2.rs` | Not present | **Full** | gemv, ger, symv, trmv, trsv | |
| 51 | +| `blas_level3.rs` | dot→gemm (external) | **Goto GEMM** | gemm, syrk, trsm, symm (cache-blocked, multithreaded) | |
| 52 | +| `quantized.rs` | Not present | **New** | BF16 GEMM, INT8 GEMM, quantize/dequantize | |
| 53 | +| `lapack.rs` | Not present | **New** | LU, Cholesky, QR factorization | |
| 54 | + |
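The quantize/dequantize and INT8 entries above can be illustrated with a symmetric per-tensor scheme. This is a sketch under assumptions: the names `quantize_i8`, `dequantize_i8`, and `dot_i8` are hypothetical, and the fork's `quantized.rs` may use a different scaling rule (per-channel scales, zero points, or BF16 intermediates).

```rust
/// Symmetric per-tensor quantization: maps [-max_abs, max_abs] onto [-127, 127].
/// Returns the quantized values and the scale needed to undo the mapping.
pub fn quantize_i8(x: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // Avoid dividing by zero for an all-zero input.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = x
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Inverse mapping: each i8 value times the scale.
pub fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

/// Integer dot product with i32 accumulation, the scalar analogue of what
/// VNNI-style instructions compute in hardware.
pub fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}
```

The round-trip error per element is bounded by about half the scale, which is why the scale is derived from the largest magnitude in the tensor.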
| 55 | +### Signal Processing |
| 56 | + |
| 57 | +| Module | Upstream | **Fork** | Detail | |
| 58 | +|--------|----------|----------|--------| |
| 59 | +| `fft.rs` | Not present | **Cooley-Tukey** | Radix-2 FFT/IFFT, in-place | |
| 60 | +| `vml.rs` | Not present | **Vector Math** | exp, ln, sqrt, erf, cbrt, sin, cos (SIMD F32x16 paths) | |
| 61 | +| `statistics.rs` | Not present | **Statistics** | median, variance, std, percentile, top_k | |
| 62 | +| `activations.rs` | Not present | **Neural Net** | sigmoid, softmax, log_softmax, GELU, SiLU (fused SIMD) | |
| 63 | + |
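As an illustration of the "Radix-2 FFT/IFFT, in-place" entry above, the following is a minimal iterative Cooley-Tukey sketch. The function name `fft_in_place` and the split real/imaginary layout are assumptions for this example and may not match the signature in the fork's `fft.rs`.

```rust
use std::f64::consts::PI;

/// In-place iterative radix-2 Cooley-Tukey transform on split re/im buffers.
/// `inverse = true` computes the inverse transform, including the 1/n scaling.
pub fn fft_in_place(re: &mut [f64], im: &mut [f64], inverse: bool) {
    let n = re.len();
    assert_eq!(n, im.len());
    assert!(n.is_power_of_two(), "length must be a power of two");

    // Bit-reversal permutation so the butterflies can run in place.
    let mut j = 0usize;
    for i in 1..n {
        let mut bit = n >> 1;
        while j & bit != 0 {
            j ^= bit;
            bit >>= 1;
        }
        j ^= bit;
        if i < j {
            re.swap(i, j);
            im.swap(i, j);
        }
    }

    // Butterfly passes over sub-transforms of length 2, 4, 8, ...
    let sign = if inverse { 1.0 } else { -1.0 };
    let mut len: usize = 2;
    while len <= n {
        let ang = sign * 2.0 * PI / len as f64;
        let (w_re, w_im) = (ang.cos(), ang.sin());
        for start in (0..n).step_by(len) {
            let (mut cur_re, mut cur_im) = (1.0f64, 0.0f64);
            for k in 0..len / 2 {
                let (i, j) = (start + k, start + k + len / 2);
                let t_re = re[j] * cur_re - im[j] * cur_im;
                let t_im = re[j] * cur_im + im[j] * cur_re;
                re[j] = re[i] - t_re;
                im[j] = im[i] - t_im;
                re[i] += t_re;
                im[i] += t_im;
                // Advance the twiddle factor by one step.
                let next_re = cur_re * w_re - cur_im * w_im;
                cur_im = cur_re * w_im + cur_im * w_re;
                cur_re = next_re;
            }
        }
        len <<= 1;
    }

    if inverse {
        for x in re.iter_mut().chain(im.iter_mut()) {
            *x /= n as f64;
        }
    }
}
```

A forward transform followed by an inverse transform should return the original signal up to floating-point rounding, and bin 0 of the forward transform equals the sum of the input samples.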
| 64 | +### Hardware Detection + Dispatch |
| 65 | + |
| 66 | +| Module | Upstream | **Fork** | Detail | |
| 67 | +|--------|----------|----------|--------| |
| 68 | +| `simd_caps.rs` | Not present | **SimdCaps** | LazyLock detection: AVX-512/AVX2/SSE2/FMA/NEON/dotprod/fp16/aes/sha2/crc32 + **ArmProfile** (A53/A72/A76) | |
| 69 | +| `simd_dispatch.rs` | Not present | **SimdDispatch** | Frozen fn-pointer table: 0.3ns per call, no branch, no atomic | |
| 70 | +| `amx_matmul.rs` | Not present | **AMX MatMul** | Tile configuration, TDPBUSD via inline asm | |
| 71 | + |
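The detection-plus-dispatch pattern described above (detect CPU features once, then call through a function-pointer table) can be sketched as follows. The `Tier` enum, the `Dispatch` struct, and the kernels are hypothetical names for this example; `dot_lanes8` is a plain-Rust stand-in for an intrinsic kernel, not the fork's code. Only `std::sync::LazyLock` and `is_x86_feature_detected!` are real standard-library items.

```rust
use std::sync::LazyLock;

/// Hypothetical capability tiers, highest first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Avx512,
    Avx2,
    Scalar,
}

/// Runtime feature detection; falls through to Scalar on non-x86_64 targets.
fn detect_tier() -> Tier {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return Tier::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return Tier::Avx2;
        }
    }
    Tier::Scalar
}

/// Portable fallback kernel.
pub fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Stand-in for a wide kernel: eight independent accumulators, which the
/// compiler can map onto vector lanes. A real kernel would use intrinsics.
pub fn dot_lanes8(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    let chunks = n / 8;
    let mut acc = [0.0f32; 8];
    for c in 0..chunks {
        for l in 0..8 {
            acc[l] += a[c * 8 + l] * b[c * 8 + l];
        }
    }
    let mut sum: f32 = acc.iter().sum();
    for i in chunks * 8..n {
        sum += a[i] * b[i];
    }
    sum
}

/// Table of kernels selected for the detected tier.
pub struct Dispatch {
    pub tier: Tier,
    pub dot: fn(&[f32], &[f32]) -> f32,
}

/// Built on first access, then reused for the life of the process.
pub static DISPATCH: LazyLock<Dispatch> = LazyLock::new(|| {
    let tier = detect_tier();
    let dot: fn(&[f32], &[f32]) -> f32 = match tier {
        Tier::Avx512 | Tier::Avx2 => dot_lanes8,
        Tier::Scalar => dot_scalar,
    };
    Dispatch { tier, dot }
});
```

A call site would write `(DISPATCH.dot)(&a, &b)`. Each access to a `LazyLock` still checks whether initialization has finished, so a hot loop that wants a bare indirect call would typically copy the function pointer into a local first.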
| 72 | +### Encoding + Codec (Cognitive Computing) |
| 73 | + |
| 74 | +| Module | Upstream | **Fork** | Detail | |
| 75 | +|--------|----------|----------|--------| |
| 76 | +| `fingerprint.rs` | Not present | **Fingerprint\<256\>** | 256-bit VSA, XOR bind, Hamming distance (VPOPCNTDQ / vcntq_u8) | |
| 77 | +| `bgz17_bridge.rs` | Not present | **Base17** | 17-dim i16 vectors, L1 distance, sign agreement, xor_bind | |
| 78 | +| `cam_pq.rs` | Not present | **CAM-PQ** | Product quantization, compiled distance tables, IVF index | |
| 79 | +| `cam_index.rs` | Not present | **CAM Index** | Inverted file index for PQ search | |
| 80 | +| `palette_codec.rs` | Not present | **Palette Codec** | 4-bit palette encoding, Minecraft-style chunk compression | |
| 81 | +| `palette_distance.rs` | Not present | **Palette Distance** | 256x256 u8 distance tables, cosine emulation (611M/s) | |
| 82 | +| `zeck.rs` | Not present | **ZeckF64** | Fibonacci/Zeckendorf encoding for sparse representations | |
| 83 | +| `packed.rs` | Not present | **Packed DB** | 64-byte aligned packed storage for SIMD access | |
| 84 | +| `prefilter.rs` | Not present | **INT8 Prefilter** | Approximate statistics for cascade search pruning | |
| 85 | + |
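For the `fingerprint.rs` row above, XOR binding and Hamming distance on a 256-bit value reduce to a few word-level operations. The sketch below is a scalar illustration; the type name `Fingerprint256` and its four-`u64` layout are assumptions, and the fork's `Fingerprint<256>` may be laid out differently and use VPOPCNTDQ or NEON instead of `count_ones`.

```rust
/// 256-bit fingerprint stored as four 64-bit words (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fingerprint256(pub [u64; 4]);

impl Fingerprint256 {
    /// XOR bind: self-inverse, so binding twice with the same key
    /// recovers the original fingerprint.
    pub fn bind(self, other: Self) -> Self {
        let mut out = [0u64; 4];
        for i in 0..4 {
            out[i] = self.0[i] ^ other.0[i];
        }
        Self(out)
    }

    /// Hamming distance: number of bit positions where the two differ.
    pub fn hamming(self, other: Self) -> u32 {
        self.0
            .iter()
            .zip(other.0.iter())
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}
```

Because XOR is its own inverse, `a.bind(b).bind(b)` equals `a`, which is the property that makes XOR usable as a binding operator in vector symbolic architectures.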
| 86 | +### Byte-Level + Spatial Operations |
| 87 | + |
| 88 | +| Module | Upstream | **Fork** | Detail | |
| 89 | +|--------|----------|----------|--------| |
| 90 | +| `byte_scan.rs` | Not present | **Byte Scan** | AVX-512 byte_find_all/byte_count (VPCMPEQB + KMOV) | |
| 91 | +| `nibble.rs` | Not present | **Nibble Ops** | 4-bit unpack/threshold (AVX2 vpshufb) | |
| 92 | +| `distance.rs` | Not present | **3D Distance** | Squared distance (AVX2 batch) | |
| 93 | +| `spatial_hash.rs` | Not present | **Spatial Hash** | Batch radius query (AVX2 accelerated) | |
| 94 | +| `aabb.rs` | Not present | **AABB** | Axis-aligned bounding box intersection | |
| 95 | +| `bitwise.rs` | Not present | **Bitwise** | XOR, AND, OR, popcount on 8KB+ vectors | |
| 96 | + |
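The 4-bit unpack/threshold entry in the table above has a simple scalar reference form, shown here as a sketch. The function names and the low-nibble-first ordering are assumptions; the fork's `nibble.rs` is described as using AVX2 `vpshufb` and may order nibbles differently.

```rust
/// Unpack each byte into two 4-bit values, low nibble first (assumed order).
pub fn unpack_nibbles(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &b in packed {
        out.push(b & 0x0F);
        out.push(b >> 4);
    }
    out
}

/// Count how many 4-bit values are at or above the threshold.
pub fn count_at_least(packed: &[u8], threshold: u8) -> usize {
    unpack_nibbles(packed)
        .into_iter()
        .filter(|&v| v >= threshold)
        .count()
}
```

A scalar version like this is mainly useful as a correctness oracle when testing a vectorized implementation.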
| 97 | +### Search + Trees |
| 98 | + |
| 99 | +| Module | Upstream | **Fork** | Detail | |
| 100 | +|--------|----------|----------|--------| |
| 101 | +| `clam.rs` | Not present | **CLAM Tree** | Build + search + rho_nn (46 tests) | |
| 102 | +| `clam_search.rs` | Not present | **CLAM Search** | k-NN and range search on CLAM index | |
| 103 | +| `clam_compress.rs` | Not present | **CLAM Compress** | Index compression for storage | |
| 104 | +| `cascade.rs` | Not present | **HDR Cascade** | Sigma-band filtering, ranked hits, drift detection | |
| 105 | +| `parallel_search.rs` | Not present | **Parallel Search** | Multi-threaded CLAM search | |
| 106 | +| `dn_tree.rs` | Not present | **DN Tree** | Hierarchical path resolution | |
| 107 | +| `merkle_tree.rs` | Not present | **Merkle Tree** | Hash-based integrity verification | |
| 108 | + |
| 109 | +### Model Inference + AI |
| 110 | + |
| 111 | +| Module | Upstream | **Fork** | Detail | |
| 112 | +|--------|----------|----------|--------| |
| 113 | +| `gguf.rs` | Not present | **GGUF Reader** | GGUF format parser (LLaMA, Qwen, Gemma) | |
| 114 | +| `gguf_indexer.rs` | Not present | **GGUF Indexer** | Build bgz7 codebook index from GGUF weights | |
| 115 | +| `safetensors.rs` | Not present | **Safetensors** | HuggingFace safetensors reader | |
| 116 | +| `gpt2/` (4 files) | Not present | **GPT-2** | Inference engine (weights, layers, API) | |
| 117 | +| `openchat/` (4 files) | Not present | **OpenChat** | Inference engine for OpenChat models | |
| 118 | +| `stable_diffusion/` (7 files) | Not present | **Stable Diffusion** | CLIP, UNet, VAE, scheduler (image generation) | |
| 119 | +| `models/` (5 files) | Not present | **Model Router** | Multi-model router, safetensors loader, layer abstractions | |
| 120 | +| `jina/` (5 files) | Not present | **Jina v5** | Embedding cache, causal attention, codec, runtime | |
| 121 | + |
| 122 | +### Cognitive Primitives |
| 123 | + |
| 124 | +| Module | Upstream | **Fork** | Detail | |
| 125 | +|--------|----------|----------|--------| |
| 126 | +| `nars.rs` | Not present | **NARS** | Non-Axiomatic Reasoning System inference | |
| 127 | +| `qualia.rs` | Not present | **Qualia** | Felt-sense quality encoding | |
| 128 | +| `qualia_gate.rs` | Not present | **Qualia Gate** | Gated operations on quality values | |
| 129 | +| `hdc.rs` | Not present | **HDC** | Hyperdimensional Computing primitives | |
| 130 | +| `vsa.rs` | Not present | **VSA** | Vector Symbolic Architecture operations | |
| 131 | +| `spo_bundle.rs` | Not present | **SPO Bundle** | Subject-Predicate-Object triple encoding | |
| 132 | +| `causality.rs` | Not present | **Causality** | Causal graph operations | |
| 133 | +| `causal_diff.rs` | Not present | **CausalEdge64** | u64-packed causal edges, quality scoring | |
| 134 | +| `bf16_truth.rs` | Not present | **BF16 Truth** | Truth values in BF16 precision | |
| 135 | +| `styles/` (34 files) | Not present | **Thinking Styles** | 34 cognitive primitives, including: rte, htd, smad, tcp, irs, mcp, tca, cdt, mct, lsi, pso, cdi, cws, are, tcf, ssr, etd, amp, zcf, hpm, cur, mpc, ssam, idr, spp, icr, sdd, dtmf, hkf |
| 136 | +| `blackboard.rs` | Not present | **Blackboard** | Typed slot arena (zero-copy shared memory) | |
| 137 | +| `node.rs` | Not present | **Node** | Cognitive node representation | |
| 138 | +| `plane.rs` | Not present | **Plane** | 16Kbit representation plane | |
| 139 | +| `seal.rs` | Not present | **Seal** | Immutable snapshot encoding | |
| 140 | +| `substrate.rs` | Not present | **Substrate** | Cognitive substrate operations | |
| 141 | +| `binding_matrix.rs` | Not present | **Binding Matrix** | 3D permutation binding | |
| 142 | +| `cyclic_bundle.rs` | Not present | **Cyclic Bundle** | Cyclic vector bundling | |
| 143 | + |
| 144 | +### JIT Compilation |
| 145 | + |
| 146 | +| Module | Upstream | **Fork** | Detail | |
| 147 | +|--------|----------|----------|--------| |
| 148 | +| `jitson/` (8 files) | Not present | **JITSON** | JSON parser + validator + template + scan pipeline | |
| 149 | +| `jitson_cranelift/` (6 files) | Not present | **Cranelift JIT** | AVX-512 kernel compilation via Cranelift (feature-gated) | |
| 150 | + |
| 151 | +### Audio / OCR / Media |
| 152 | + |
| 153 | +| Module | Upstream | **Fork** | Detail | |
| 154 | +|--------|----------|----------|--------| |
| 155 | +| `holo.rs` | Not present | **Holographic** | Holographic reduced representations, cosine carriers | |
| 156 | +| `ocr_felt.rs` | Not present | **OCR** | Character recognition via felt-sense matching | |
| 157 | +| `ocr_simd.rs` | Not present | **OCR SIMD** | SIMD-accelerated binarization, Otsu threshold, density | |
| 158 | +| `surround_metadata.rs` | Not present | **Surround** | Spatial audio metadata | |
| 159 | +| `crystal_encoder.rs` | Not present | **Crystal** | Crystal symmetry encoding | |
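The Otsu threshold named in the `ocr_simd.rs` row picks the grey level that maximizes between-class variance of the image histogram. Below is a scalar sketch of that standard method; the names `otsu_threshold` and `binarize` are hypothetical, and the fork's SIMD version is not shown here.

```rust
/// Otsu's method on 8-bit greyscale pixels: returns the threshold t that
/// maximizes the between-class variance of {p <= t} versus {p > t}.
pub fn otsu_threshold(pixels: &[u8]) -> u8 {
    let mut hist = [0u64; 256];
    for &p in pixels {
        hist[p as usize] += 1;
    }
    let total = pixels.len() as f64;
    let sum_all: f64 = hist
        .iter()
        .enumerate()
        .map(|(i, &c)| i as f64 * c as f64)
        .sum();

    let (mut w_bg, mut sum_bg) = (0.0f64, 0.0f64);
    let (mut best_t, mut best_var) = (0u8, -1.0f64);
    for t in 0..256usize {
        w_bg += hist[t] as f64;
        sum_bg += t as f64 * hist[t] as f64;
        if w_bg == 0.0 {
            continue; // no background pixels yet
        }
        let w_fg = total - w_bg;
        if w_fg == 0.0 {
            break; // everything is background from here on
        }
        let mean_bg = sum_bg / w_bg;
        let mean_fg = (sum_all - sum_bg) / w_fg;
        let between = w_bg * w_fg * (mean_bg - mean_fg).powi(2);
        if between > best_var {
            best_var = between;
            best_t = t as u8;
        }
    }
    best_t
}

/// Binarize: true where the pixel is brighter than the threshold.
pub fn binarize(pixels: &[u8], threshold: u8) -> Vec<bool> {
    pixels.iter().map(|&p| p > threshold).collect()
}
```

For a cleanly bimodal image, any threshold between the two modes separates them, so the returned value lands somewhere in that gap.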
| 160 | + |
| 161 | +### Miscellaneous |
| 162 | + |
| 163 | +| Module | Upstream | **Fork** | Detail | |
| 164 | +|--------|----------|----------|--------| |
| 165 | +| `arrow_bridge.rs` | Not present | **Arrow** | Apache Arrow zero-copy bridge | |
| 166 | +| `bnn.rs` | Not present | **BNN** | Binary Neural Network operations | |
| 167 | +| `bnn_causal_trajectory.rs` | Not present | **BNN Causal** | Causal trajectory tracking | |
| 168 | +| `bnn_cross_plane.rs` | Not present | **BNN Cross-Plane** | Cross-plane BNN operations | |
| 169 | +| `cogrecord.rs` | Not present | **CogRecord** | 4×16KB cognitive record unit | |
| 170 | +| `compression_curves.rs` | Not present | **Compression** | Rate-distortion curve analysis | |
| 171 | +| `graph.rs` | Not present | **Graph** | Basic graph operations | |
| 172 | +| `heel_f64x8.rs` | Not present | **F64x8 Kernels** | SIMD dot product, cosine similarity | |
| 173 | +| `http_reader.rs` | Not present | **HTTP Reader** | Stream weights from HTTP | |
| 174 | +| `kernels.rs` | Not present | **SIMD Kernels** | Generic SIMD apply/map/reduce | |
| 175 | +| `layered_distance.rs` | Not present | **Layered Distance** | Multi-layer distance computation | |
| 176 | +| `organic.rs` | Not present | **Organic** | Organic growth patterns | |
| 177 | +| `p64_bridge.rs` | Not present | **P64 Bridge** | Palette64 convergence point (ndarray <-> lance-graph) | |
| 178 | +| `projection.rs` | Not present | **Projection** | Dimensionality reduction | |
| 179 | +| `property_mask.rs` | Not present | **Property Mask** | Bitwise property filtering | |
| 180 | +| `tekamolo.rs` | Not present | **Tekamolo** | Syntactic position encoding | |
| 181 | +| `udf_kernels.rs` | Not present | **UDF Kernels** | User-defined function dispatch | |
| 182 | +| `deepnsm.rs` | Not present | **DeepNSM** | Distributional semantic bridge | |
| 183 | + |
| 184 | +## Subcrates (2 crates) |
| 185 | + |
| 186 | +| Crate | Upstream | **Fork** | Detail | |
| 187 | +|-------|----------|----------|--------| |
| 188 | +| `crates/p64` | Not present | **P64** | Palette64 data structure — convergence highway between ndarray and lance-graph | |
| 189 | +| `crates/phyllotactic-manifold` | Not present | **Phyllotactic Manifold** | Golden-angle spiral geometry for uniform point distribution | |
| 190 | + |
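The golden-angle spiral mentioned for `crates/phyllotactic-manifold` is commonly generated with Vogel's model: point i sits at angle i times the golden angle and at a radius proportional to the square root of i. The sketch below applies that model to a unit disk. The function names are hypothetical, and whether the subcrate targets a disk, a sphere, or another manifold is not stated in the table.

```rust
use std::f64::consts::PI;

/// Golden angle in radians: pi * (3 - sqrt(5)), about 137.5 degrees.
pub fn golden_angle() -> f64 {
    PI * (3.0 - 5.0f64.sqrt())
}

/// n points spread over the unit disk using Vogel's spiral.
/// The sqrt radius keeps the point density roughly uniform in area.
pub fn phyllotaxis_disk(n: usize) -> Vec<(f64, f64)> {
    let ga = golden_angle();
    (0..n)
        .map(|i| {
            let r = ((i as f64 + 0.5) / n as f64).sqrt();
            let theta = i as f64 * ga;
            (r * theta.cos(), r * theta.sin())
        })
        .collect()
}
```

Because the golden angle is an irrational fraction of a full turn, successive points never line up along rays, which is what gives the even coverage.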
| 191 | +## Burn Backend (20 ops files) |
| 192 | + |
| 193 | +| Component | Upstream | **Fork** | Detail | |
| 194 | +|-----------|----------|----------|--------| |
| 195 | +| `crates/burn/` | Not present | **burn-ndarray** | SIMD-augmented burn backend (from tracel-ai/burn v0.21.0) | |
| 196 | +| `ops/tensor.rs` | — | **try_vml_unary** | Routes f32 unary ops through ndarray hpc::vml (F32x16 SIMD) | |
| 197 | +| `ops/activation.rs` | — | **Fused sigmoid** | SIMD-accelerated activation functions | |
| 198 | +| `ops/matmul.rs` | — | **GEMM dispatch** | Routes to our Goto-algorithm GEMM | |
| 199 | +| Remaining 17 ops files | — | **Standard burn ops** | conv, pooling, interpolate, quantization, etc. | |
| 200 | + |
| 201 | +## Summary |
| 202 | + |
| 203 | +| Category | Upstream Count | **Fork Count** | New | |
| 204 | +|----------|---------------|----------------|-----| |
| 205 | +| SIMD type files | 0 | 6 | +6 | |
| 206 | +| Backend files | 0 | 5 | +5 | |
| 207 | +| HPC modules | 0 | 146 | +146 | |
| 208 | +| Burn ops | 0 | 20 | +20 | |
| 209 | +| Subcrates | 0 | 2 | +2 | |
| 210 | +| **Total new files** | — | — | **179** | |
| 211 | +| **Total new LOC** | — | — | **80,131** | |
| 212 | +| **Total new tests** | — | — | **~880** | |