Commit 94660bd

Merge pull request #270 from AdaWorldAPI/claude/remove-typos-ci
ci: remove typos spell-check job (too many false positives)
2 parents 6311b52 + d419ef6 commit 94660bd

124 files changed

Lines changed: 1387 additions & 442 deletions

.claude/board/EPIPHANIES.md

Lines changed: 580 additions & 0 deletions
Large diffs are not rendered by default.

.claude/board/TECH_DEBT.md

Lines changed: 62 additions & 0 deletions
@@ -1124,3 +1124,65 @@ Estimated 100× speedup for encoding (O(1) table lookup vs O(256) L1 per query).
 - **TD-DIST-3** (Palette distance table): `Palette::build_distance_table()`
   `PaletteDistanceTable` with O(1) `distance(a, b)` and `edge_distance(a, b)`.
   128 KB table, L2-resident. Status: **PAID**.
+
+## 2026-04-26 — TD-PALETTE-SENTINEL: 257th sentinel slot in palette distance/compose tables
+
+**Status:** Open (low priority — historical aspirational design, no current need)
+
+The 2026-04-20 resolution-hierarchy epiphany described the bgz17 HIP layer
+as `256×257` (256 archetypes + 1 sentinel). Implementation shipped `k×k`
+without the sentinel. See EPIPHANIES.md 2026-04-26 CORRECTION for full
+context.
+
+**Why deferred:**
+- Adding a 257th index requires widening palette indices from `u8` to `u16`
+- `PaletteEdge` wire format doubles from 3 bytes to 6 bytes per edge
+- `MAX_PALETTE_SIZE = 256` is a deliberate u8-ceiling design choice
+- The three sentinel roles (unknown/null/identity) are already covered by
+  existing mechanisms: `Palette::nearest()` clamps unknowns, `identity()`
+  returns the closest-to-zero archetype.
+
+**Revisit when:** a real "absent edge" code path materializes (e.g., a
+sparse mxm that needs to distinguish "no relation" from "relation = 0
+distance"), or when the palette grows beyond 256 entries (which would
+also force u16 indices).
+
+## 2026-04-26 — TD-AWARENESS-INLINE-1: awareness should be BF16-mantissa-inline, not driver-global
+
+**Status:** Open (P-0 architectural, scope: substrate-wide)
+
+Per EPIPHANIES.md 2026-04-26 "awareness should be BF16-mantissa-inline":
+the current `ShaderDriver.awareness: RwLock<Vec<GrammarStyleAwareness>>`
+is driver-global and separate from the stream. This wastes the CPU's
+20-200 ns random-access advantage and recreates the parser/processor
+split that AGI is supposed to dissolve.
+
+**The correct shape:** every stream operation returns `(value, awareness)`,
+where awareness (7-8 bits, BF16-mantissa-equivalent) is derived inline
+from operation properties (bit-purity, distribution shape, residual norm,
+match strength). Awareness composes through the cascade the same way
+values compose.
+
+**Wedge for the smallest viable adoption:**
+1. Extend `contract::distance::Distance` with
+   `distance_with_awareness(&self, other) -> (u32, u8)`. 8 bits per
+   measurement; 11% overhead vs raw distance.
+2. Add `Aware` trait and `Annotated<T>` to contract.
+3. Implement awareness derivation for the four primary operations:
+   `vsa_bind`, `vsa_bundle`, `hamming`, `cosine`.
+4. Update `ShaderDriver::dispatch` to compose inline awareness over
+   the cascade. The driver-global `GrammarStyleAwareness` becomes a
+   bootstrap seed, not the per-cycle source of truth.
+
+**Size budget:** 11-12% overhead on stream payloads (vs 43.75% for
+BF16 mantissa as a fraction of value), because the value plane is
+much wider here than in floating-point.
+
+**Why deferred:** scope is substrate-wide. Touches the contract
+Distance trait (just shipped TD-DIST-1), every SIMD operation in
+ndarray::hpc, the shader driver's cascade, and the BindSpace SoA.
+Should be designed as one coherent commit, not piecemeal.
+
+**Revisit when:** the next architectural sweep covers the awareness
+dimension. Until then, awareness stays driver-global. The epiphany
+documents the correct direction so future work doesn't re-derive it.
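
Step 1 of the wedge in TD-AWARENESS-INLINE-1 could look roughly like the sketch below. Everything in it is hypothetical: the `Distance` trait is a stand-in and not the real `contract::distance::Distance`, `Fingerprint` is an invented 64-bit type, and the awareness derivation (how far a Hamming distance lies from the chance level, scaled to 8 bits) is an assumption chosen only to illustrate "derived inline from operation properties".

```rust
/// Hypothetical stand-in for the contract's distance trait (not the real API).
pub trait Distance {
    /// Raw distance between two values.
    fn distance(&self, other: &Self) -> u32;

    /// Distance plus an 8-bit awareness value computed in the same call.
    fn distance_with_awareness(&self, other: &Self) -> (u32, u8);
}

/// Illustrative 64-bit fingerprint (invented for this sketch).
pub struct Fingerprint(pub u64);

impl Distance for Fingerprint {
    fn distance(&self, other: &Self) -> u32 {
        // Hamming distance: number of differing bits.
        (self.0 ^ other.0).count_ones()
    }

    fn distance_with_awareness(&self, other: &Self) -> (u32, u8) {
        let d = self.distance(other);
        // Assumed derivation: match strength is how far `d` lies from the
        // chance level of 32 differing bits, scaled to 0..=255.
        let strength = d.abs_diff(32); // 0..=32
        let awareness = (strength * 255 / 32) as u8;
        (d, awareness)
    }
}
```

In this sketch an exact match or an exact complement yields awareness 255, and a distance of 32 bits (indistinguishable from chance) yields 0. No driver-global state is consulted.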

.github/workflows/style.yml

Lines changed: 65 additions & 14 deletions
@@ -19,29 +19,80 @@ env:
   RUSTFLAGS: "-C debuginfo=1 -C target-cpu=x86-64-v3"

 jobs:
+  # Clippy runs FIRST and is mandatory — logical soundness before syntax.
+  # Discipline:
+  #   - NEVER use `clippy --fix` for unused-import warnings; they signal
+  #     missing wiring, not dead code. Fix the wiring or add `#[allow]`
+  #     with a comment explaining why.
+  #   - Each clippy violation is owned by the author of the code that
+  #     introduced it; resolve manually.
+  #   - Run clippy in batches (per-feature combo), not after every file edit.
+  clippy:
+    runs-on: ubuntu-24.04
+    timeout-minutes: 25
+    defaults:
+      run:
+        working-directory: lance-graph
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          path: lance-graph
+      - name: Checkout AdaWorldAPI/ndarray (sibling dependency)
+        uses: actions/checkout@v4
+        with:
+          repository: AdaWorldAPI/ndarray
+          path: ndarray
+      - name: Setup rust toolchain
+        run: |
+          rustup toolchain install stable
+          rustup default stable
+          rustup component add clippy
+      - uses: Swatinem/rust-cache@v2
+        with:
+          shared-key: "lance-graph-deps"
+          workspaces: lance-graph/crates/lance-graph
+      - name: Install dependencies
+        run: |
+          sudo apt update
+          sudo apt install -y protobuf-compiler
+      # Clippy is gated tier-by-tier as the codebase incrementally adopts it.
+      # PRs that touch a new crate own that crate's clippy debt before merging.
+      #
+      # Tier A (mandatory, gating): zero-dep contract crate
+      - name: Clippy contract (zero-dep, mandatory)
+        run: cargo clippy --manifest-path crates/lance-graph-contract/Cargo.toml --lib --tests -- -D warnings
+      # Tier B (advisory until incrementally cleaned, non-gating):
+      # lance-graph core has ~91 pre-existing clippy violations to be paid down
+      # in subsequent PRs (TD-CLIPPY-LG-1). Don't auto-fix — each violation
+      # is a wiring/refactor decision owned by the introducing author.
+      - name: Clippy lance-graph (advisory)
+        continue-on-error: true
+        run: cargo clippy --manifest-path crates/lance-graph/Cargo.toml --lib --tests -- -D warnings
+
   format:
     runs-on: ubuntu-24.04
     timeout-minutes: 15
+    needs: clippy
+    defaults:
+      run:
+        working-directory: lance-graph
     steps:
       - uses: actions/checkout@v4
+        with:
+          path: lance-graph
+      - name: Checkout AdaWorldAPI/ndarray (sibling dependency for cargo metadata)
+        uses: actions/checkout@v4
+        with:
+          repository: AdaWorldAPI/ndarray
+          path: ndarray
       - uses: actions-rust-lang/setup-rust-toolchain@v1
         with:
           components: rustfmt
       - name: Check formatting
         run: cargo fmt --manifest-path crates/lance-graph/Cargo.toml -- --check

-  # clippy: runs LOCALLY as our internal pre-check, not on GitHub CI.
-  # GitHub CI focuses on compile + test + format + typos.
-  # Clippy discipline documented in CODING_PRACTICES.md:
-  #
-  #   cargo clippy --features lab -- -D warnings
-  #   cargo clippy --features serve -- -D warnings
+  # typos / spell-check removed 2026-04-26: too many false positives on
+  # technical jargon (NARS terms, codec acronyms, German loanwords used in
+  # the cognitive stack). Spelling discipline is a code-review concern,
+  # not a CI gate.

-  typos:
-    name: Spell Check
-    runs-on: ubuntu-24.04
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-      - name: Check spelling
-        uses: crate-ci/typos@v1.26.0

crates/bgz-tensor/src/adaptive_codec.rs

Lines changed: 8 additions & 4 deletions
@@ -9,6 +9,8 @@
 //! After quantization, GPTQ-style Hessian compensation adjusts remaining
 //! weights to minimize output error (not weight error).

+// Cluster used by future per-cluster anomaly reporting
+#[allow(unused_imports)]
 use ndarray::hpc::clam::{ClamTree, Cluster};
 use ndarray::hpc::fft::wht_f32;
 use ndarray::hpc::quantized::{
@@ -18,6 +20,8 @@ use ndarray::hpc::quantized::{
     QuantParams,
 };
 use ndarray::hpc::cam_pq::kmeans;
+// cosine_f32_to_f64_simd used by tests and future GPTQ compensation
+#[allow(unused_imports)]
 use ndarray::hpc::heel_f64x8::cosine_f32_to_f64_simd;
 use crate::stacked_n::{bf16_to_f32, f32_to_bf16};

@@ -75,7 +79,7 @@ fn hadamard_rotate(v: &[f32], dim: usize) -> Vec<f32> {
 fn rows_to_fingerprint_bytes(rows: &[Vec<f32>]) -> (Vec<u8>, usize) {
     if rows.is_empty() { return (vec![], 0); }
     let dim = rows[0].len();
-    let fp_bytes = (dim + 7) / 8;
+    let fp_bytes = dim.div_ceil(8);
     let mut flat = vec![0u8; rows.len() * fp_bytes];
     for (ri, row) in rows.iter().enumerate() {
         for (i, &v) in row.iter().enumerate() {
@@ -107,8 +111,8 @@ fn classify_rows_by_lfd(tree: &ClamTree) -> Vec<RowPrecision> {
     // Bottom 70% → i4+i2 (regular, well-clustered)
     let mut sorted_lfd: Vec<f64> = row_lfd.clone();
     sorted_lfd.sort_by(|a, b| a.partial_cmp(b).unwrap());
-    let p70 = sorted_lfd[n * 70 / 100.max(1)];
-    let p90 = sorted_lfd[n * 90 / 100.max(1)];
+    let p70 = sorted_lfd[n * 70 / 100];
+    let p90 = sorted_lfd[n * 90 / 100];

     row_lfd.iter().map(|&lfd| {
         if lfd > p90 { RowPrecision::Passthrough }
@@ -123,7 +127,7 @@ impl AdaptiveCodecTensor {
         rows: &[Vec<f32>],
         k: usize,
         is_kv_proj: bool,
-        calibration_inputs: Option<&[Vec<f32>]>,
+        _calibration_inputs: Option<&[Vec<f32>]>,
     ) -> Self {
         let n = rows.len();
         let n_cols = if n > 0 { rows[0].len() } else { 0 };
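
The two arithmetic changes in this file look behavior-preserving, and the sketch below checks that under stated assumptions. The function names are invented for illustration. The literal is written as `100usize` to keep its type explicit. Because a method call binds tighter than `/`, the removed `.max(1)` applied to the constant 100 rather than to the quotient, so it never guarded anything.

```rust
/// Bytes needed to hold `dim` bits, as in the pre-change code.
fn fp_bytes_old(dim: usize) -> usize {
    (dim + 7) / 8
}

/// Same computation using `usize::div_ceil` (stable since Rust 1.73).
fn fp_bytes_new(dim: usize) -> usize {
    dim.div_ceil(8)
}

/// Pre-change percentile index. `.max(1)` is evaluated on the constant
/// 100 first, so this is just `n * 70 / 100`.
fn p70_index_old(n: usize) -> usize {
    n * 70 / 100usize.max(1)
}

/// Post-change percentile index.
fn p70_index_new(n: usize) -> usize {
    n * 70 / 100
}
```

Both index forms give 0 for `n = 0`, so indexing an empty `sorted_lfd` would panic before and after the change unless the caller rules that case out; the hunk does not show whether it does.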

crates/bgz-tensor/src/attention.rs

Lines changed: 6 additions & 6 deletions
@@ -126,9 +126,9 @@ impl AttentionTable {
         let n_k = k_indices.len();
         let mut scores = vec![0u16; n_q * n_k];

-        for i in 0..n_q {
-            for j in 0..n_k {
-                scores[i * n_k + j] = self.distance(q_indices[i], k_indices[j]);
+        for (i, &qi) in q_indices.iter().enumerate().take(n_q) {
+            for (j, &kj) in k_indices.iter().enumerate().take(n_k) {
+                scores[i * n_k + j] = self.distance(qi, kj);
             }
         }

@@ -152,9 +152,9 @@ impl AttentionTable {
         let n_k = k_indices.len();
         let mut sparse = Vec::new();

-        for i in 0..n_q {
-            for j in 0..n_k {
-                let d = self.distance(q_indices[i], k_indices[j]);
+        for (i, &qi) in q_indices.iter().enumerate().take(n_q) {
+            for (j, &kj) in k_indices.iter().enumerate().take(n_k) {
+                let d = self.distance(qi, kj);
                 if d < threshold {
                     sparse.push((i, j, d));
                 }
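
The loop rewrite in this file can be checked with a small stand-alone sketch. It assumes `n_q` is `q_indices.len()` (the hunk shows only `n_k = k_indices.len()`), in which case the `.take(..)` calls are no-ops and are left out here. The `distance` function and the `u8` index type are hypothetical stand-ins for `AttentionTable::distance` and the real palette index type.

```rust
/// Hypothetical stand-in for `AttentionTable::distance`.
fn distance(a: u8, b: u8) -> u16 {
    u16::from(a.abs_diff(b))
}

/// Index-based form, as before the change.
fn scores_indexed(q: &[u8], k: &[u8]) -> Vec<u16> {
    let (n_q, n_k) = (q.len(), k.len());
    let mut scores = vec![0u16; n_q * n_k];
    for i in 0..n_q {
        for j in 0..n_k {
            scores[i * n_k + j] = distance(q[i], k[j]);
        }
    }
    scores
}

/// Iterator form, as after the change, without the `.take(..)` calls.
fn scores_iter(q: &[u8], k: &[u8]) -> Vec<u16> {
    let n_k = k.len();
    let mut scores = vec![0u16; q.len() * n_k];
    for (i, &qi) in q.iter().enumerate() {
        for (j, &kj) in k.iter().enumerate() {
            scores[i * n_k + j] = distance(qi, kj);
        }
    }
    scores
}
```

Both forms fill the same row-major `n_q × n_k` buffer. The iterator form only drops the bounds-checked reads of `q[i]` and `k[j]`.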

crates/bgz-tensor/src/belichtungsmesser.rs

Lines changed: 6 additions & 6 deletions
@@ -59,10 +59,10 @@ impl Belichtungsmesser {
         // 12 quarter-sigma bands centered on mean
         // Band edges: μ - 3σ, μ - 2.5σ, μ - 2σ, ..., μ + 2.5σ, μ + 3σ
         let mut edges = [0u32; N_BANDS + 1];
-        for i in 0..=N_BANDS {
+        for (i, edge) in edges.iter_mut().enumerate().take(N_BANDS + 1) {
             let offset = -3.0 + i as f64 * 0.5; // -3σ to +3σ in 0.5σ steps
             let val = (mean + offset * sigma).max(0.0);
-            edges[i] = val as u32;
+            *edge = val as u32;
         }
         // Last edge extends to max
         edges[N_BANDS] = u32::MAX;
@@ -79,8 +79,8 @@ impl Belichtungsmesser {
         }

         let mut bands = [Band { lo: 0, hi: 0, density: 0.0 }; N_BANDS];
-        for b in 0..N_BANDS {
-            bands[b] = Band {
+        for (b, band) in bands.iter_mut().enumerate().take(N_BANDS) {
+            *band = Band {
                 lo: edges[b],
                 hi: edges[b + 1],
                 density: counts[b] as f32 / n as f32,
@@ -93,8 +93,8 @@ impl Belichtungsmesser {
     /// Default bands when no calibration data is available.
     fn default_bands() -> Self {
         let mut bands = [Band { lo: 0, hi: 0, density: 0.0 }; N_BANDS];
-        for b in 0..N_BANDS {
-            bands[b] = Band {
+        for (b, band) in bands.iter_mut().enumerate().take(N_BANDS) {
+            *band = Band {
                 lo: b as u32 * 1000,
                 hi: (b as u32 + 1) * 1000,
                 density: 1.0 / N_BANDS as f32,
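
The edge computation in the first hunk can be isolated as the sketch below. It assumes `N_BANDS = 12`, which is inferred from the hunk's comment and from the offsets running from -3σ to +3σ in 0.5σ steps; the free function `band_edges` is invented for illustration and is not part of the crate.

```rust
/// Assumed value; the hunk's comment mentions 12 bands.
const N_BANDS: usize = 12;

/// Band edges from mean - 3 sigma to mean + 3 sigma in 0.5 sigma steps,
/// clamped at zero, with the last edge widened to `u32::MAX`.
fn band_edges(mean: f64, sigma: f64) -> [u32; N_BANDS + 1] {
    let mut edges = [0u32; N_BANDS + 1];
    for (i, edge) in edges.iter_mut().enumerate() {
        let offset = -3.0 + i as f64 * 0.5;
        let val = (mean + offset * sigma).max(0.0);
        *edge = val as u32;
    }
    // The last edge extends to the maximum so the top band is open-ended.
    edges[N_BANDS] = u32::MAX;
    edges
}
```

With `mean = 1000.0` and `sigma = 100.0`, the first edge is 700 and the middle edge (index 6) is 1000. When `mean - 3 * sigma` is negative, the clamp pins the lowest edges to 0.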

crates/bgz-tensor/src/codebook4096.rs

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@
 //! ```

 use crate::stacked::StackedBF16x4;
+// BASE_DIM and Base17 reserved for future PCDVQ-weighted distance
+#[allow(unused_imports)]
 use crate::projection::{BASE_DIM, Base17};

 /// A 12-bit codebook index: cluster(6) + entry(6) = 4096 entries.
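
The `cluster(6) + entry(6)` layout from the doc comment can be illustrated with a small packing sketch. The type `Index12`, its methods, and the choice to put the cluster in the high bits are all assumptions; the diff does not show how the crate actually lays out the index.

```rust
/// Hypothetical 12-bit codebook index: 6-bit cluster in the high bits,
/// 6-bit entry in the low bits (the bit order is an assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Index12(u16);

impl Index12 {
    /// Packs the two fields; returns `None` if either exceeds 6 bits.
    pub fn new(cluster: u8, entry: u8) -> Option<Self> {
        if cluster < 64 && entry < 64 {
            Some(Self((u16::from(cluster) << 6) | u16::from(entry)))
        } else {
            None
        }
    }

    /// High 6 bits.
    pub fn cluster(self) -> u8 {
        (self.0 >> 6) as u8
    }

    /// Low 6 bits.
    pub fn entry(self) -> u8 {
        (self.0 & 0x3F) as u8
    }

    /// Raw packed value in 0..4096.
    pub fn raw(self) -> u16 {
        self.0
    }
}
```

Two 6-bit fields give 64 × 64 = 4096 distinct values, which matches the entry count in the comment.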

crates/bgz-tensor/src/codebook_calibrated.rs

Lines changed: 5 additions & 1 deletion
@@ -12,7 +12,11 @@
 //! - Highlight compression for large-magnitude roles (Gate)
 //! - 28 bytes metadata per model for exact decode

+// StackedN reserved for future stacked-resolution codebook path
+#[allow(unused_imports)]
 use crate::stacked_n::{StackedN, cosine_f32_slice};
+// gamma_phi_encode/decode reserved for future per-codebook calibration path
+#[allow(unused_imports)]
 use crate::gamma_phi::{GammaProfile, calibrate_gamma, gamma_phi_encode, gamma_phi_decode};
 use std::f64::consts::GOLDEN_RATIO;

@@ -191,7 +195,7 @@ fn gamma_phi_cosine_to_u8(
     min_cos: f64,
     max_cos: f64,
     role_gamma: f32,
-    phi_scale: f32,
+    _phi_scale: f32,
 ) -> u8 {
     // Normalize cosine to [0, 1]
     let range = (max_cos - min_cos).max(1e-10);

crates/bgz-tensor/src/euler_fold.rs

Lines changed: 5 additions & 3 deletions
@@ -12,6 +12,8 @@
 //! Recovery quality depends on SNR = √(d×SPD / N_members).
 //! At SPD=32, d=17: SNR(N=6) ≈ 9.5 → expected Pearson ~0.96

+// cosine_f32_slice reserved for future fold quality measurement
+#[allow(unused_imports)]
 use crate::stacked_n::{StackedN, bf16_to_f32, f32_to_bf16, cosine_f32_slice};

 /// Euler-Mascheroni constant γ ≈ 0.5772156649...
@@ -219,7 +221,7 @@ pub fn euler_gamma_fold(members: &[Vec<f32>], spd: usize) -> FoldedFamily {
         .map(|&v| f32_to_bf16(v as f32))
         .collect();

-    let mut folded = StackedN {
+    let folded = StackedN {
         samples_per_dim: spd,
         data: folded_bf16,
     };
@@ -307,12 +309,12 @@ pub fn gate_test(members: &[Vec<f32>], spd: usize) -> FoldResult {
     let family = euler_gamma_fold(members, spd);

     let mut pearsons = Vec::with_capacity(n);
-    for j in 0..n {
+    for (j, member) in members.iter().enumerate().take(n) {
         let recovered = euler_gamma_unfold(&family, j);

         // Compute Pearson between original and recovered
         // (on the hydrated StackedN representation, not raw f32)
-        let orig_enc = StackedN::from_f32(&members[j], spd);
+        let orig_enc = StackedN::from_f32(member, spd);
         let orig_f32 = orig_enc.hydrate_f32();

         let r = crate::quality::pearson(
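
The SNR figure quoted in this file's module docs can be reproduced directly from the stated formula, SNR = √(d×SPD / N_members). The helper name `fold_snr` is invented for this check. Only the SNR arithmetic is verified here; the mapping from SNR to the expected Pearson value is not derivable from the diff.

```rust
/// SNR = sqrt(d * SPD / N_members), as stated in the module docs.
fn fold_snr(d: usize, spd: usize, n_members: usize) -> f64 {
    ((d * spd) as f64 / n_members as f64).sqrt()
}
```

For `d = 17`, `spd = 32`, `n_members = 6`, this is √(544 / 6) ≈ 9.52, consistent with the "≈ 9.5" in the comment.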

crates/bgz-tensor/src/fisher_z.rs

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@
 //! Storage: k×k i8 table (64 KB at k=256) + 8 bytes family gamma.

 use crate::palette::WeightPalette;
+// Base17 reserved for future Base17-direct Fisher z table path
+#[allow(unused_imports)]
 use crate::projection::Base17;

 /// Per-family gamma for Fisher z encoding.
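
The storage figure in the module docs follows from the table shape, as the short check below shows. The generic helper `table_bytes` is invented for illustration. The `u16` case is an assumption about the element type behind the "128 KB table" mentioned in TECH_DEBT.md, not something the diff confirms.

```rust
/// Bytes for a k-by-k table with elements of type `T`.
fn table_bytes<T>(k: usize) -> usize {
    k * k * std::mem::size_of::<T>()
}
```

`table_bytes::<i8>(256)` is 65,536 bytes (64 KB), and `table_bytes::<u16>(256)` is 131,072 bytes (128 KB).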
