Commit 3520b70

Merge pull request #94 from AdaWorldAPI/claude/setup-rust-smart-home-SOPAY
Claude/setup rust smart home sopay
2 parents 1c6f8ef + 5c8b3f4 commit 3520b70

3 files changed

Lines changed: 509 additions & 151 deletions

COMPARISON.md

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
# Complete Feature Comparison: rust-ndarray vs. AdaWorldAPI Fork

> 80,131 lines of new code across 146 HPC modules, 6 SIMD files, 5 backend files, 20 burn ops, and 2 subcrates.

## At a Glance

| Metric | Upstream [rust-ndarray/ndarray](https://github.com/rust-ndarray/ndarray) | **[AdaWorldAPI/ndarray](https://github.com/AdaWorldAPI/ndarray)** |
|--------|-----------|------------|
| Base functionality | n-dimensional arrays, slicing, views | **Same** (full upstream preserved) |
| New LOC added || **80,131** |
| New files || **179** (146 HPC + 6 SIMD + 5 backend + 20 burn + 2 subcrates) |
| Test count | ~300 | **~1,180** (300 upstream + 880 new) |
| SIMD ISAs | SSE2 via matrixmultiply (external) | **7 ISAs**: AVX-512, AVX2, SSE2, AMX, VNNI, NEON (3 tiers), WASM |
| Numeric types | f32, f64 | **+f16, BF16, i8, u8, i16** (all with SIMD paths) |
| BLAS coverage | dot (via matrixmultiply) | **Full L1 + L2 + L3** (pure-Rust + MKL + OpenBLAS) |
| Target platforms | x86_64 (via external BLAS), scalar everywhere else | **x86_64 (tiered), aarch64 (3-tier NEON), wasm (prepared)** |
| Minimum Rust | 1.64 | **1.94 stable** (no nightly) |

---

## SIMD Layer (6 files, ~5,700 LOC)

| Component | Upstream | **Fork** |
|-----------|----------|----------|
| `simd.rs` — dispatch + re-exports | Not present | **LazyLock tier detection, PREFERRED_LANES, type re-exports** |
| `simd_avx512.rs` — 512-bit types | Not present | **11 types: F32x16, F64x8, U8x64, I32x16, I64x8, U32x16, U64x8, F32x8, F64x4, BF16x16, BF16x8 + F16 IEEE 754** (2,700 LOC) |
| `simd_avx2.rs` — 256-bit ops | Not present | **BLAS L1, Hamming, i8 dot, popcount, F16 precision toolkit** (1,600 LOC) |
| `simd_neon.rs` — ARM 128-bit | Not present | **3-tier NEON: A53 baseline, A72 dual-pipe, A76 dotprod+fp16; codebook gather, Hamming, Base17 L1** (500 LOC) |
| `simd_amx.rs` — Intel tile matrix | Not present | **AMX detection (CPUID+XCR0), VNNI 512/256, MatVec dispatch, quantize/dequantize** (350 LOC) |
| `simd_wasm.rs` — WebAssembly | Not present | **Scaffolding for WASM SIMD128** |
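
The "LazyLock tier detection" entry for `simd.rs` can be pictured with the following minimal sketch. It is not the fork's code: `SimdTier`, `detect_tier`, `TIER`, and `preferred_f32_lanes` are hypothetical names, and only x86_64 tiers are probed. It shows the general pattern of probing the CPU once and caching the result in a `std::sync::LazyLock`.

```rust
use std::sync::LazyLock;

/// Hypothetical tier enum; the fork's real type names are not shown in this table.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SimdTier {
    Scalar,
    Sse2,
    Avx2,
    Avx512,
}

/// Probe the CPU and return the widest tier it reports.
pub fn detect_tier() -> SimdTier {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            return SimdTier::Avx512;
        }
        if is_x86_feature_detected!("avx2") {
            return SimdTier::Avx2;
        }
        if is_x86_feature_detected!("sse2") {
            return SimdTier::Sse2;
        }
    }
    // Other architectures fall through to the scalar tier in this sketch.
    SimdTier::Scalar
}

/// Detection runs on first access; later reads reuse the cached value.
pub static TIER: LazyLock<SimdTier> = LazyLock::new(detect_tier);

/// Number of f32 lanes a kernel would process per vector at the detected tier.
pub fn preferred_f32_lanes() -> usize {
    match *TIER {
        SimdTier::Avx512 => 16,
        SimdTier::Avx2 => 8,
        SimdTier::Sse2 => 4,
        SimdTier::Scalar => 1,
    }
}
```

A caller would read `*TIER` (or `preferred_f32_lanes()`) when choosing a kernel or a chunk size; the probe itself runs at most once per process.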
31+
32+
## Backend Layer (5 files, ~2,000 LOC)
33+
34+
| Component | Upstream | **Fork** |
35+
|-----------|----------|----------|
36+
| `backend/mod.rs` — BlasFloat trait | Not present | **Trait-based dispatch: Native / MKL / OpenBLAS** |
37+
| `backend/native.rs` — pure-Rust GEMM | Not present | **Goto-algorithm 6x16/6x8 microkernels, cache-blocked (L1/L2/L3), AVX-512+AVX2 dispatch** |
38+
| `backend/kernels_avx512.rs` | Not present | **AVX-512 SIMD GEMM kernels** |
39+
| `backend/mkl.rs` | Not present | **Intel MKL FFI (feature = "intel-mkl")** |
40+
| `backend/openblas.rs` | Not present | **OpenBLAS FFI (feature = "openblas")** |
41+
| GEMM throughput (1024x1024) | ~13 GFLOPS (via matrixmultiply) | **139 GFLOPS** (10.5x improvement) |
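
To illustrate what "cache-blocked" means for the native GEMM, here is a deliberately simplified scalar sketch. It is an assumption-laden illustration, not the fork's Goto-algorithm kernel: there is no panel packing, no 6x16 microkernel, and no SIMD, and the function name `gemm_blocked` and the single `block` parameter are hypothetical. It only shows the loop tiling that keeps a working set of A, B, and C tiles hot in cache.

```rust
/// C += A * B for row-major matrices: A is m x k, B is k x n, C is m x n.
/// `block` is the tile edge length (a tuning parameter in this sketch).
pub fn gemm_blocked(
    m: usize,
    n: usize,
    k: usize,
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    block: usize,
) {
    assert!(block > 0);
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);

    // Walk the three dimensions tile by tile.
    for i0 in (0..m).step_by(block) {
        for p0 in (0..k).step_by(block) {
            for j0 in (0..n).step_by(block) {
                let i1 = (i0 + block).min(m);
                let p1 = (p0 + block).min(k);
                let j1 = (j0 + block).min(n);
                // Inner loops touch only one tile of each matrix.
                for i in i0..i1 {
                    for p in p0..p1 {
                        let a_ip = a[i * k + p];
                        for j in j0..j1 {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

A production kernel would additionally pack tiles into contiguous buffers and replace the innermost loops with a register-blocked SIMD microkernel.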
42+
43+
## HPC Module Library (146 files, ~70,000 LOC, 880 tests)
44+
45+
### Linear Algebra (BLAS + LAPACK)
46+
47+
| Module | Upstream | **Fork** | Operations |
48+
|--------|----------|----------|------------|
49+
| `blas_level1.rs` | dot only (external) | **Full** | dot, axpy, scal, nrm2, asum, iamax, Givens rotation |
50+
| `blas_level2.rs` | Not present | **Full** | gemv, ger, symv, trmv, trsv |
51+
| `blas_level3.rs` | dot→gemm (external) | **Goto GEMM** | gemm, syrk, trsm, symm (cache-blocked, multithreaded) |
52+
| `quantized.rs` | Not present | **New** | BF16 GEMM, INT8 GEMM, quantize/dequantize |
53+
| `lapack.rs` | Not present | **New** | LU, Cholesky, QR factorization |
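
For readers unfamiliar with the BLAS Level 1 names listed for `blas_level1.rs`, the scalar reference semantics of four of them are sketched below. These are plain-Rust illustrations of the standard BLAS definitions; the signatures are assumptions and do not claim to match the fork's API.

```rust
/// dot: sum of x[i] * y[i].
pub fn dot(x: &[f32], y: &[f32]) -> f32 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// axpy: y <- alpha * x + y, updated in place.
pub fn axpy(alpha: f32, x: &[f32], y: &mut [f32]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * xi;
    }
}

/// nrm2: Euclidean norm of x (naive form, without overflow scaling).
pub fn nrm2(x: &[f32]) -> f32 {
    x.iter().map(|v| v * v).sum::<f32>().sqrt()
}

/// iamax: index of the first element with the largest absolute value.
pub fn iamax(x: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, v) in x.iter().enumerate() {
        let a = v.abs();
        if best.map_or(true, |(_, b)| a > b) {
            best = Some((i, a));
        }
    }
    best.map(|(i, _)| i)
}
```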
54+
55+
### Signal Processing
56+
57+
| Module | Upstream | **Fork** | Detail |
58+
|--------|----------|----------|--------|
59+
| `fft.rs` | Not present | **Cooley-Tukey** | Radix-2 FFT/IFFT, in-place |
60+
| `vml.rs` | Not present | **Vector Math** | exp, ln, sqrt, erf, cbrt, sin, cos (SIMD F32x16 paths) |
61+
| `statistics.rs` | Not present | **Statistics** | median, variance, std, percentile, top_k |
62+
| `activations.rs` | Not present | **Neural Net** | sigmoid, softmax, log_softmax, GELU, SiLU (fused SIMD) |
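
The "Radix-2 FFT/IFFT, in-place" entry refers to the classic iterative Cooley-Tukey scheme. A minimal sketch of the forward transform follows, assuming split real/imaginary slices and a power-of-two length; the function name and data layout are hypothetical and the fork's `fft.rs` may differ.

```rust
use std::f64::consts::PI;

/// In-place forward radix-2 FFT on split real/imaginary buffers.
/// The length must be a power of two.
pub fn fft_in_place(re: &mut [f64], im: &mut [f64]) {
    let n = re.len();
    assert_eq!(n, im.len());
    assert!(n.is_power_of_two());

    // Step 1: bit-reversal permutation.
    let mut j = 0usize;
    for i in 1..n {
        let mut bit = n >> 1;
        while j & bit != 0 {
            j ^= bit;
            bit >>= 1;
        }
        j ^= bit;
        if i < j {
            re.swap(i, j);
            im.swap(i, j);
        }
    }

    // Step 2: butterfly passes over blocks of length 2, 4, 8, ...
    let mut len = 2;
    while len <= n {
        let ang = -2.0 * PI / len as f64;
        for start in (0..n).step_by(len) {
            for k in 0..len / 2 {
                let (wr, wi) = ((ang * k as f64).cos(), (ang * k as f64).sin());
                let (a, b) = (start + k, start + k + len / 2);
                // t = w * x[b]
                let tr = re[b] * wr - im[b] * wi;
                let ti = re[b] * wi + im[b] * wr;
                re[b] = re[a] - tr;
                im[b] = im[a] - ti;
                re[a] += tr;
                im[a] += ti;
            }
        }
        len <<= 1;
    }
}
```

An inverse transform can be obtained from the same routine by conjugating the input, running the forward pass, conjugating again, and dividing by the length.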
63+
64+
### Hardware Detection + Dispatch
65+
66+
| Module | Upstream | **Fork** | Detail |
67+
|--------|----------|----------|--------|
68+
| `simd_caps.rs` | Not present | **SimdCaps** | LazyLock detection: AVX-512/AVX2/SSE2/FMA/NEON/dotprod/fp16/aes/sha2/crc32 + **ArmProfile** (A53/A72/A76) |
69+
| `simd_dispatch.rs` | Not present | **SimdDispatch** | Frozen fn-pointer table: 0.3ns per call, no branch, no atomic |
70+
| `amx_matmul.rs` | Not present | **AMX MatMul** | Tile configuration, TDPBUSD via inline asm |
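
The "frozen fn-pointer table" idea can be sketched as a small `Copy` struct of function pointers that is filled in once and then called through directly. Everything below is a hypothetical illustration: `KernelTable`, `select`, and both kernels are invented names, the "wide" kernel is an unrolled scalar stand-in rather than real SIMD, and the sketch makes no attempt to reproduce the 0.3ns figure quoted in the table.

```rust
/// Hypothetical table of kernel entry points, chosen once at startup.
#[derive(Clone, Copy)]
pub struct KernelTable {
    pub dot: fn(&[f32], &[f32]) -> f32,
    pub name: &'static str,
}

/// Baseline kernel.
fn dot_scalar(x: &[f32], y: &[f32]) -> f32 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// Stand-in for a wide kernel: four independent accumulators plus a tail loop.
fn dot_unrolled(x: &[f32], y: &[f32]) -> f32 {
    let n = x.len().min(y.len());
    let chunks = n / 4;
    let mut acc = [0.0f32; 4];
    for c in 0..chunks {
        for l in 0..4 {
            acc[l] += x[4 * c + l] * y[4 * c + l];
        }
    }
    let mut sum: f32 = acc.iter().sum();
    for i in 4 * chunks..n {
        sum += x[i] * y[i];
    }
    sum
}

impl KernelTable {
    /// The capability check happens here, once; callers then invoke
    /// `(table.dot)(..)` without re-testing CPU features on every call.
    pub fn select(wide: bool) -> Self {
        if wide {
            KernelTable { dot: dot_unrolled, name: "unrolled" }
        } else {
            KernelTable { dot: dot_scalar, name: "scalar" }
        }
    }
}
```

The design trade-off is an indirect call per invocation in exchange for removing the feature test from the hot path.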
71+
72+
### Encoding + Codec (Cognitive Computing)
73+
74+
| Module | Upstream | **Fork** | Detail |
75+
|--------|----------|----------|--------|
76+
| `fingerprint.rs` | Not present | **Fingerprint\<256\>** | 256-bit VSA, XOR bind, Hamming distance (VPOPCNTDQ / vcntq_u8) |
77+
| `bgz17_bridge.rs` | Not present | **Base17** | 17-dim i16 vectors, L1 distance, sign agreement, xor_bind |
78+
| `cam_pq.rs` | Not present | **CAM-PQ** | Product quantization, compiled distance tables, IVF index |
79+
| `cam_index.rs` | Not present | **CAM Index** | Inverted file index for PQ search |
80+
| `palette_codec.rs` | Not present | **Palette Codec** | 4-bit palette encoding, Minecraft-style chunk compression |
81+
| `palette_distance.rs` | Not present | **Palette Distance** | 256x256 u8 distance tables, cosine emulation (611M/s) |
82+
| `zeck.rs` | Not present | **ZeckF64** | Fibonacci/Zeckendorf encoding for sparse representations |
83+
| `packed.rs` | Not present | **Packed DB** | 64-byte aligned packed storage for SIMD access |
84+
| `prefilter.rs` | Not present | **INT8 Prefilter** | Approximate statistics for cascade search pruning |
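
The XOR bind and Hamming distance operations named for `fingerprint.rs` are simple to state in scalar form. The sketch below assumes a 256-bit fingerprint stored as four `u64` words; the type name `Fingerprint256` and its methods are hypothetical, and the portable `count_ones` call stands in for the VPOPCNTDQ / `vcntq_u8` paths mentioned in the table.

```rust
/// Hypothetical 256-bit fingerprint stored as four 64-bit words.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Fingerprint256(pub [u64; 4]);

impl Fingerprint256 {
    /// XOR bind: word-wise XOR. Binding twice with the same key is the identity.
    pub fn bind(self, other: Self) -> Self {
        let mut out = [0u64; 4];
        for i in 0..4 {
            out[i] = self.0[i] ^ other.0[i];
        }
        Fingerprint256(out)
    }

    /// Hamming distance: number of bit positions where the two fingerprints differ.
    pub fn hamming(self, other: Self) -> u32 {
        self.0
            .iter()
            .zip(other.0.iter())
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}
```

Because XOR is its own inverse, `a.bind(key).bind(key)` recovers `a`, which is the property that makes XOR usable as a binding operator in vector symbolic architectures.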
85+
86+
### Byte-Level + Spatial Operations
87+
88+
| Module | Upstream | **Fork** | Detail |
89+
|--------|----------|----------|--------|
90+
| `byte_scan.rs` | Not present | **Byte Scan** | AVX-512 byte_find_all/byte_count (VPCMPEQB + KMOV) |
91+
| `nibble.rs` | Not present | **Nibble Ops** | 4-bit unpack/threshold (AVX2 vpshufb) |
92+
| `distance.rs` | Not present | **3D Distance** | Squared distance (AVX2 batch) |
93+
| `spatial_hash.rs` | Not present | **Spatial Hash** | Batch radius query (AVX2 accelerated) |
94+
| `aabb.rs` | Not present | **AABB** | Axis-aligned bounding box intersection |
95+
| `bitwise.rs` | Not present | **Bitwise** | XOR, AND, OR, popcount on 8KB+ vectors |
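
As a scalar reference for the "4-bit unpack/threshold" operations listed for `nibble.rs`, consider the sketch below. The function names are hypothetical, and the nibble order (low nibble first) is an assumption; the fork's AVX2 `vpshufb` path may use a different layout.

```rust
/// Expand each packed byte into two 4-bit values, low nibble first
/// (the ordering is an assumption of this sketch).
pub fn unpack_nibbles(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &b in packed {
        out.push(b & 0x0F);
        out.push(b >> 4);
    }
    out
}

/// Count the nibbles whose value is strictly greater than `threshold`.
pub fn nibbles_above(packed: &[u8], threshold: u8) -> usize {
    packed
        .iter()
        .map(|&b| ((b & 0x0F) > threshold) as usize + ((b >> 4) > threshold) as usize)
        .sum()
}
```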
96+
97+
### Search + Trees
98+
99+
| Module | Upstream | **Fork** | Detail |
100+
|--------|----------|----------|--------|
101+
| `clam.rs` | Not present | **CLAM Tree** | Build + search + rho_nn (46 tests) |
102+
| `clam_search.rs` | Not present | **CLAM Search** | k-NN and range search on CLAM index |
103+
| `clam_compress.rs` | Not present | **CLAM Compress** | Index compression for storage |
104+
| `cascade.rs` | Not present | **HDR Cascade** | Sigma-band filtering, ranked hits, drift detection |
105+
| `parallel_search.rs` | Not present | **Parallel Search** | Multi-threaded CLAM search |
106+
| `dn_tree.rs` | Not present | **DN Tree** | Hierarchical path resolution |
107+
| `merkle_tree.rs` | Not present | **Merkle Tree** | Hash-based integrity verification |
108+
109+
### Model Inference + AI
110+
111+
| Module | Upstream | **Fork** | Detail |
112+
|--------|----------|----------|--------|
113+
| `gguf.rs` | Not present | **GGUF Reader** | GGUF format parser (LLaMA, Qwen, Gemma) |
114+
| `gguf_indexer.rs` | Not present | **GGUF Indexer** | Build bgz7 codebook index from GGUF weights |
115+
| `safetensors.rs` | Not present | **Safetensors** | HuggingFace safetensors reader |
116+
| `gpt2/` (4 files) | Not present | **GPT-2** | Inference engine (weights, layers, API) |
117+
| `openchat/` (4 files) | Not present | **OpenChat** | Inference engine for OpenChat models |
118+
| `stable_diffusion/` (7 files) | Not present | **Stable Diffusion** | CLIP, UNet, VAE, scheduler (image generation) |
119+
| `models/` (5 files) | Not present | **Model Router** | Multi-model router, safetensors loader, layer abstractions |
120+
| `jina/` (5 files) | Not present | **Jina v5** | Embedding cache, causal attention, codec, runtime |
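
As a rough idea of what the first step of a GGUF parser looks like, the sketch below reads the fixed-size file header. It assumes the GGUF version 2 or later layout (4-byte magic `GGUF`, little-endian `u32` version, then `u64` tensor count and `u64` metadata key/value count); the struct and function names are hypothetical and are not taken from the fork's `gguf.rs`.

```rust
/// Fixed-size GGUF header fields (layout assumed: GGUF v2 or later).
#[derive(Debug, PartialEq, Eq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Read a little-endian u64 from the first 8 bytes of `b`.
fn read_u64_le(b: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[..8]);
    u64::from_le_bytes(buf)
}

/// Parse the 24-byte header at the start of a GGUF file.
pub fn parse_gguf_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < 24 {
        return Err("header too short".to_string());
    }
    if !bytes.starts_with(b"GGUF") {
        return Err("bad magic: expected GGUF".to_string());
    }
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    Ok(GgufHeader {
        version,
        tensor_count: read_u64_le(&bytes[8..16]),
        metadata_kv_count: read_u64_le(&bytes[16..24]),
    })
}
```

A full reader would continue with the metadata key/value pairs and tensor descriptors that follow the header; those are out of scope for this sketch.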
121+
122+
### Cognitive Primitives
123+
124+
| Module | Upstream | **Fork** | Detail |
125+
|--------|----------|----------|--------|
126+
| `nars.rs` | Not present | **NARS** | Non-Axiomatic Reasoning System inference |
127+
| `qualia.rs` | Not present | **Qualia** | Felt-sense quality encoding |
128+
| `qualia_gate.rs` | Not present | **Qualia Gate** | Gated operations on quality values |
129+
| `hdc.rs` | Not present | **HDC** | Hyperdimensional Computing primitives |
130+
| `vsa.rs` | Not present | **VSA** | Vector Symbolic Architecture operations |
131+
| `spo_bundle.rs` | Not present | **SPO Bundle** | Subject-Predicate-Object triple encoding |
132+
| `causality.rs` | Not present | **Causality** | Causal graph operations |
133+
| `causal_diff.rs` | Not present | **CausalEdge64** | u64-packed causal edges, quality scoring |
134+
| `bf16_truth.rs` | Not present | **BF16 Truth** | Truth values in BF16 precision |
135+
| `styles/` (34 files) | Not present | **Thinking Styles** | 34 cognitive primitives: rte, htd, smad, tcp, irs, mcp, tca, cdt, mct, lsi, pso, cdi, cws, are, tcf, ssr, etd, amp, zcf, hpm, cur, mpc, ssam, idr, spp, icr, sdd, dtmf, hkf |
136+
| `blackboard.rs` | Not present | **Blackboard** | Typed slot arena (zero-copy shared memory) |
137+
| `node.rs` | Not present | **Node** | Cognitive node representation |
138+
| `plane.rs` | Not present | **Plane** | 16Kbit representation plane |
139+
| `seal.rs` | Not present | **Seal** | Immutable snapshot encoding |
140+
| `substrate.rs` | Not present | **Substrate** | Cognitive substrate operations |
141+
| `binding_matrix.rs` | Not present | **Binding Matrix** | 3D permutation binding |
142+
| `cyclic_bundle.rs` | Not present | **Cyclic Bundle** | Cyclic vector bundling |
143+
144+
### JIT Compilation
145+
146+
| Module | Upstream | **Fork** | Detail |
147+
|--------|----------|----------|--------|
148+
| `jitson/` (8 files) | Not present | **JITSON** | JSON parser + validator + template + scan pipeline |
149+
| `jitson_cranelift/` (6 files) | Not present | **Cranelift JIT** | AVX-512 kernel compilation via Cranelift (feature-gated) |
150+
151+
### Audio / OCR / Media
152+
153+
| Module | Upstream | **Fork** | Detail |
154+
|--------|----------|----------|--------|
155+
| `holo.rs` | Not present | **Holographic** | Holographic reduced representations, cosine carriers |
156+
| `ocr_felt.rs` | Not present | **OCR** | Character recognition via felt-sense matching |
157+
| `ocr_simd.rs` | Not present | **OCR SIMD** | SIMD-accelerated binarization, Otsu threshold, density |
158+
| `surround_metadata.rs` | Not present | **Surround** | Spatial audio metadata |
159+
| `crystal_encoder.rs` | Not present | **Crystal** | Crystal symmetry encoding |
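
The Otsu threshold mentioned for `ocr_simd.rs` picks the gray level that maximizes between-class variance of a 256-bin histogram. The scalar sketch below illustrates the textbook method under assumed names (`otsu_threshold`, `binarize`); it says nothing about how the fork vectorizes the computation.

```rust
/// Return the gray level t that maximizes between-class variance, where
/// pixels <= t form the background class and pixels > t the foreground.
pub fn otsu_threshold(hist: &[u32; 256]) -> u8 {
    let total: f64 = hist.iter().map(|&c| c as f64).sum();
    let sum_all: f64 = hist
        .iter()
        .enumerate()
        .map(|(i, &c)| (i as f64) * (c as f64))
        .sum();

    let mut w_bg = 0.0f64; // pixel count in the background class
    let mut sum_bg = 0.0f64; // intensity sum of the background class
    let mut best_t = 0u8;
    let mut best_var = -1.0f64;

    for t in 0..256usize {
        w_bg += hist[t] as f64;
        sum_bg += (t as f64) * (hist[t] as f64);
        let w_fg = total - w_bg;
        if w_bg == 0.0 || w_fg == 0.0 {
            continue; // one class is empty, variance undefined
        }
        let mean_bg = sum_bg / w_bg;
        let mean_fg = (sum_all - sum_bg) / w_fg;
        let var = w_bg * w_fg * (mean_bg - mean_fg).powi(2);
        if var > best_var {
            best_var = var;
            best_t = t as u8;
        }
    }
    best_t
}

/// Binarize pixels: true means foreground (strictly above the threshold).
pub fn binarize(pixels: &[u8], threshold: u8) -> Vec<bool> {
    pixels.iter().map(|&p| p > threshold).collect()
}
```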
160+
161+
### Miscellaneous
162+
163+
| Module | Upstream | **Fork** | Detail |
164+
|--------|----------|----------|--------|
165+
| `arrow_bridge.rs` | Not present | **Arrow** | Apache Arrow zero-copy bridge |
166+
| `bnn.rs` | Not present | **BNN** | Binary Neural Network operations |
167+
| `bnn_causal_trajectory.rs` | Not present | **BNN Causal** | Causal trajectory tracking |
168+
| `bnn_cross_plane.rs` | Not present | **BNN Cross-Plane** | Cross-plane BNN operations |
169+
| `cogrecord.rs` | Not present | **CogRecord** | 4×16KB cognitive record unit |
170+
| `compression_curves.rs` | Not present | **Compression** | Rate-distortion curve analysis |
171+
| `graph.rs` | Not present | **Graph** | Basic graph operations |
172+
| `heel_f64x8.rs` | Not present | **F64x8 Kernels** | SIMD dot product, cosine similarity |
173+
| `http_reader.rs` | Not present | **HTTP Reader** | Stream weights from HTTP |
174+
| `kernels.rs` | Not present | **SIMD Kernels** | Generic SIMD apply/map/reduce |
175+
| `layered_distance.rs` | Not present | **Layered Distance** | Multi-layer distance computation |
176+
| `organic.rs` | Not present | **Organic** | Organic growth patterns |
177+
| `p64_bridge.rs` | Not present | **P64 Bridge** | Palette64 convergence point (ndarray <-> lance-graph) |
178+
| `projection.rs` | Not present | **Projection** | Dimensionality reduction |
179+
| `property_mask.rs` | Not present | **Property Mask** | Bitwise property filtering |
180+
| `tekamolo.rs` | Not present | **Tekamolo** | Syntactic position encoding |
181+
| `udf_kernels.rs` | Not present | **UDF Kernels** | User-defined function dispatch |
182+
| `deepnsm.rs` | Not present | **DeepNSM** | Distributional semantic bridge |

## Subcrates (2 crates)

| Crate | Upstream | **Fork** | Detail |
|-------|----------|----------|--------|
| `crates/p64` | Not present | **P64** | Palette64 data structure — convergence highway between ndarray and lance-graph |
| `crates/phyllotactic-manifold` | Not present | **Phyllotactic Manifold** | Golden-angle spiral geometry for uniform point distribution |
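
"Golden-angle spiral geometry for uniform point distribution" most commonly refers to a Vogel-style spiral: point `i` is placed at angle `i` times the golden angle and at a radius proportional to the square root of its index. The sketch below assumes that construction on the unit disk; the function name is hypothetical and the actual `phyllotactic-manifold` crate may target a different surface or parameterization.

```rust
use std::f64::consts::PI;

/// Place `n` points on the unit disk along a golden-angle (Vogel) spiral.
pub fn phyllotactic_points(n: usize) -> Vec<(f64, f64)> {
    // Golden angle in radians: pi * (3 - sqrt(5)), roughly 2.39996.
    let golden_angle = PI * (3.0 - 5.0f64.sqrt());
    (0..n)
        .map(|i| {
            // The square root keeps the point density roughly uniform in area.
            let r = ((i as f64 + 0.5) / n as f64).sqrt();
            let theta = i as f64 * golden_angle;
            (r * theta.cos(), r * theta.sin())
        })
        .collect()
}
```

Because the golden angle is an irrational fraction of a full turn, successive points never line up along a small number of rays, which is what gives the pattern its even coverage.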
190+
191+
## Burn Backend (20 ops files)
192+
193+
| Component | Upstream | **Fork** | Detail |
194+
|-----------|----------|----------|--------|
195+
| `crates/burn/` | Not present | **burn-ndarray** | SIMD-augmented burn backend (from tracel-ai/burn v0.21.0) |
196+
| `ops/tensor.rs` || **try_vml_unary** | Routes f32 unary ops through ndarray hpc::vml (F32x16 SIMD) |
197+
| `ops/activation.rs` || **Fused sigmoid** | SIMD-accelerated activation functions |
198+
| `ops/matmul.rs` || **GEMM dispatch** | Routes to our Goto-algorithm GEMM |
199+
| Remaining 17 ops files || **Standard burn ops** | conv, pooling, interpolate, quantization, etc. |

## Summary

| Category | Upstream Count | **Fork Count** | New |
|----------|---------------|----------------|-----|
| SIMD type files | 0 | 6 | +6 |
| Backend files | 0 | 5 | +5 |
| HPC modules | 0 | 146 | +146 |
| Burn ops | 0 | 20 | +20 |
| Subcrates | 0 | 2 | +2 |
| **Total new files** ||| **179** |
| **Total new LOC** ||| **80,131** |
| **Total new tests** ||| **~880** |
