| 1 | +# LANCE_GRAPH_UPSTREAM_SESSIONS.md |
| 2 | + |
| 3 | +## Fix PR #146. Split. Rebase. Contribute BlasGraph. |
| 4 | + |
| 5 | +**Repo:** AdaWorldAPI/lance-graph (fork of lance-format/lance-graph) |
| 6 | +**Upstream:** lance-format/lance-graph |
| 7 | +**PR #146:** open, CI failing, merge conflicts, maintainer asked us to fix |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## SITUATION |
| 12 | + |
| 13 | +``` |
| 14 | +PR #146 is a kitchen sink: dep bumps + error macros + graph/spo/ module + docs |
| 15 | +11 commits, 2577 additions, mergeable: false, state: dirty |
| 16 | +
|
| 17 | +Upstream merged 3 PRs since we opened #146:
| 18 | + #145 Unity catalog/delta lake (Mar 3) |
| 19 | + #147 Remove Simple executor (Mar 4) ← CONFLICT |
| 20 | + #148 Move benchmarks to separate crate (Mar 5) ← CONFLICT |
| 21 | +
|
| 22 | +Our fork: 10 ahead, 3 behind upstream main. |
| 23 | +``` |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +## SESSION 1: Sync Fork + Close #146 |
| 28 | + |
| 29 | +```bash |
| 30 | +cd adaworld/lance-graph |
| 31 | + |
| 32 | +# Sync fork to upstream (skip `remote add` if the upstream remote already exists)
| 33 | +git remote add upstream https://github.com/lance-format/lance-graph.git |
| 34 | +git fetch upstream |
| 35 | +git checkout main |
| 36 | +git merge upstream/main |
| 37 | +# Resolve conflicts (Cargo.lock, workspace layout changes) |
| 38 | +git push origin main |
| 39 | + |
| 40 | +# Close #146 with the comment below (GitHub UI, or: gh pr close 146 --repo lance-format/lance-graph --comment "...")
| 41 | +``` |
| 42 | + |
| 43 | +Comment on PR #146: |
| 44 | +``` |
| 45 | +Closing this PR to split into focused contributions: |
| 46 | +- PR A: dep bumps (arrow 57, datafusion 51, lance 2) |
| 47 | +- PR B: graph/spo/ module (SPO triple store, TruthGate, semiring) |
| 48 | +- PR C: BlasGraph algebra (from holograph — 7 semirings, matrix ops) |
| 49 | +
|
| 50 | +Sorry for the kitchen sink. Splitting for clean review. |
| 51 | +``` |
| 52 | + |
| 53 | +**Exit gate:** Fork synced to upstream main. #146 closed. |
| 54 | + |
| 55 | +--- |
| 56 | + |
| 57 | +## SESSION 2: PR A — Dep Bumps Only |
| 58 | + |
| 59 | +```bash |
| 60 | +git checkout -b feat/bump-arrow-57-datafusion-51-lance-2 |
| 61 | +``` |
| 62 | + |
| 63 | +**What to do:** |
| 64 | +``` |
| 65 | +1. ONLY change Cargo.toml dep versions: |
| 66 | + crates/lance-graph/Cargo.toml |
| 67 | + crates/lance-graph-catalog/Cargo.toml |
| 68 | + crates/lance-graph-python/Cargo.toml |
| 69 | + |
| 70 | + arrow: current → 57 |
| 71 | + datafusion: current → 51 |
| 72 | + lance: current → 2.0 |
| 73 | +
|
| 74 | +2. cargo update (regenerate Cargo.lock; check the lockfile diff for unrelated bumps)
| 75 | +
|
| 76 | +3. Fix any API breakages from dep bumps: |
| 77 | + - arrow 57: check RecordBatch API changes |
| 78 | + - datafusion 51: check SessionContext, LogicalPlan changes |
| 79 | + - lance 2: check table API, write params |
| 80 | +
|
| 81 | +4. cargo test --workspace --exclude lance-graph-python (package name assumed from the crate directory)
| 82 | + Fix ALL test failures. |
| 83 | +
|
| 84 | +5. Check upstream CI requirements: |
| 85 | + - cargo fmt --check |
| 86 | + - cargo clippy -- -D warnings |
| 87 | + - cargo test |
| 88 | +
|
| 89 | +6. Push, open PR to lance-format/lance-graph |
| 90 | +``` |
| 91 | + |
| 92 | +PR title: `feat: bump arrow 57, datafusion 51, lance 2` |
| 93 | +PR body: |
| 94 | +``` |
| 95 | +Align dependency matrix: |
| 96 | + arrow → 57 |
| 97 | + datafusion → 51 |
| 98 | + lance → 2.0 |
| 99 | +
|
| 100 | +All tests pass. No API breakages. |
| 101 | +Follows up on closed #146 (split into focused PRs). |
| 102 | +``` |
| 103 | + |
| 104 | +**Exit gate:** Clean PR with ONLY dep bumps. CI green. No extra files. |
| 105 | + |
| 106 | +--- |
| 107 | + |
| 108 | +## SESSION 3: PR B — graph/spo/ Module |
| 109 | + |
| 110 | +```bash |
| 111 | +git checkout main |
| 112 | +git pull upstream main # includes PR A if it has merged; otherwise branch from current main
| 113 | +git checkout -b feat/spo-triple-store |
| 114 | +``` |
| 115 | + |
| 116 | +**What to do:** |
| 117 | +``` |
| 118 | +1. Add ONLY the graph/spo/ module: |
| 119 | + crates/lance-graph/src/graph/mod.rs |
| 120 | + crates/lance-graph/src/graph/fingerprint.rs |
| 121 | + crates/lance-graph/src/graph/sparse.rs |
| 122 | + crates/lance-graph/src/graph/spo/mod.rs |
| 123 | + crates/lance-graph/src/graph/spo/builder.rs |
| 124 | + crates/lance-graph/src/graph/spo/merkle.rs |
| 125 | + crates/lance-graph/src/graph/spo/semiring.rs |
| 126 | + crates/lance-graph/src/graph/spo/store.rs |
| 127 | + crates/lance-graph/src/graph/spo/truth.rs |
| 128 | +
|
| 129 | +2. Add test: |
| 130 | + crates/lance-graph/tests/spo_ground_truth.rs |
| 131 | +
|
| 132 | +3. Add `pub mod graph;` to lib.rs |
| 133 | +
|
| 134 | +4. REMOVE anything not relevant to upstream: |
| 135 | + - No SPARE_PARTS_SUMMARY.md |
| 136 | + - No #[track_caller] error macros (separate PR if wanted) |
| 137 | + - No ladybug-rs specific imports |
| 138 | + - No references to BindSpace, CogRedis, etc |
| 139 | + |
| 140 | +5. Make sure graph/spo/ is SELF-CONTAINED: |
| 141 | + - Uses lance-graph's own error types (not ladybug's QueryError) |
| 142 | + - Uses standard blake3 crate (add to Cargo.toml) |
| 143 | + - No dependency on rustynum or ladybug-rs |
| 144 | + |
| 145 | +6. Clean the code for upstream standards: |
| 146 | + - cargo fmt |
| 147 | + - cargo clippy -- -D warnings |
| 148 | + - Doc comments on all pub types and methods |
| 149 | + - Examples in doc comments where useful |
| 150 | +
|
| 151 | +7. cargo test (including spo_ground_truth.rs) |
| 152 | +
|
| 153 | +8. Push, open PR |
| 154 | +``` |
| 155 | + |
| 156 | +PR title: `feat(graph): add SPO triple store with Merkle integrity, TruthGate, and semiring traversal` |
| 157 | +PR body: |
| 158 | +``` |
| 159 | +Add a content-addressable SPO (Subject-Predicate-Object) triple store: |
| 160 | +
|
| 161 | +- **SpoStore**: insert, query_forward, query_reverse, query_relation |
| 162 | +- **SpoMerkle**: Blake3-based integrity with MerkleEpoch and inclusion proofs |
| 163 | +- **TruthGate**: NARS-inspired confidence gating (MinFreq/MinConf/MinBoth) |
| 164 | +- **SpoSemiring**: Algebraic traversal operations for graph algorithms |
| 165 | +- **SpoBuilder**: Builder pattern for constructing stores |
| 166 | +- **Fingerprint**: 16384-bit binary fingerprint with Hamming operations |
| 167 | +- **SparseContainer**: Memory-efficient sparse vector storage |
| 168 | +
|
| 169 | +Ground truth test included (357 lines). |
| 170 | +
|
| 171 | +This enables knowledge-graph style operations on LanceDB with |
| 172 | +content-addressed nodes and confidence-weighted edges. |
| 173 | +``` |
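
The `Fingerprint` entry in the PR body above (16384-bit, Hamming operations) could look roughly like the following. This is a minimal sketch, not the code in `fingerprint.rs`: the `[u64; 256]` layout and the method names `zero`, `set_bit`, `popcount`, `xor`, and `hamming` are assumptions made for illustration.

```rust
/// Number of 64-bit words in a 16384-bit fingerprint (16384 / 64 = 256).
const WORDS: usize = 256;

/// Hypothetical stand-in for the `Fingerprint` type; the real layout may differ.
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct Fingerprint {
    words: [u64; WORDS],
}

impl Fingerprint {
    /// All-zero fingerprint.
    pub fn zero() -> Self {
        Self { words: [0; WORDS] }
    }

    /// Set bit `i` (0-based). Panics if `i >= 16384`.
    pub fn set_bit(&mut self, i: usize) {
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    /// Number of set bits.
    pub fn popcount(&self) -> u32 {
        self.words.iter().map(|w| w.count_ones()).sum()
    }

    /// Bitwise XOR of two fingerprints.
    pub fn xor(&self, other: &Self) -> Self {
        let mut out = Self::zero();
        for (o, (a, b)) in out
            .words
            .iter_mut()
            .zip(self.words.iter().zip(other.words.iter()))
        {
            *o = a ^ b;
        }
        out
    }

    /// Hamming distance: the number of bit positions where the two differ.
    pub fn hamming(&self, other: &Self) -> u32 {
        self.xor(other).popcount()
    }
}
```

Because XOR is its own inverse, `a.xor(&b).xor(&b)` returns `a`, which is a cheap property to assert in a doc test.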
| 174 | + |
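
For reviewers unfamiliar with NARS-style truth values, the `TruthGate` variants named in the PR body (MinFreq/MinConf/MinBoth) can be read as simple threshold filters. The sketch below assumes a `Truth` struct with `frequency` and `confidence` fields and a `passes` method; those names, and the `filter_edges` helper, are hypothetical and may not match `truth.rs`.

```rust
/// NARS-style truth value: frequency and confidence, both expected in [0, 1].
/// Field names are assumptions for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Truth {
    pub frequency: f32,
    pub confidence: f32,
}

/// Hypothetical gate mirroring the three modes listed in the PR body.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TruthGate {
    /// Pass if frequency is at least the threshold.
    MinFreq(f32),
    /// Pass if confidence is at least the threshold.
    MinConf(f32),
    /// Pass only if both thresholds are met.
    MinBoth { freq: f32, conf: f32 },
}

impl TruthGate {
    /// Returns true when the truth value clears the gate.
    pub fn passes(&self, t: Truth) -> bool {
        match *self {
            TruthGate::MinFreq(f) => t.frequency >= f,
            TruthGate::MinConf(c) => t.confidence >= c,
            TruthGate::MinBoth { freq, conf } => t.frequency >= freq && t.confidence >= conf,
        }
    }
}

/// Keep the ids of edges whose truth value passes the gate.
pub fn filter_edges(edges: &[(u64, Truth)], gate: TruthGate) -> Vec<u64> {
    edges
        .iter()
        .filter(|(_, t)| gate.passes(*t))
        .map(|(id, _)| *id)
        .collect()
}
```

A test in this shape is short enough to use as a doc-comment example on the public type.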
| 175 | +**Exit gate:** Clean PR, no ladybug-rs deps, all tests pass, upstream CI green. |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +## SESSION 4: PR C — BlasGraph Semiring Algebra (FROM holograph) |
| 180 | + |
| 181 | +```bash |
| 182 | +git checkout main |
| 183 | +git pull upstream main |
| 184 | +git checkout -b feat/blasgraph-semiring-algebra |
| 185 | +``` |
| 186 | + |
| 187 | +**What to do:** |
| 188 | +``` |
| 189 | +1. Port from holograph/src/graphblas/ to lance-graph: |
| 190 | + |
| 191 | + Create: crates/lance-graph/src/graph/blasgraph/ |
| 192 | + mod.rs ← from holograph graphblas/mod.rs (94 lines) |
| 193 | + semiring.rs ← from holograph graphblas/semiring.rs (535 lines) |
| 194 | + matrix.rs ← from holograph graphblas/matrix.rs (596 lines) |
| 195 | + vector.rs ← from holograph graphblas/vector.rs (506 lines) |
| 196 | + ops.rs ← from holograph graphblas/ops.rs (717 lines) |
| 197 | + sparse.rs ← from holograph graphblas/sparse.rs (546 lines) |
| 198 | + types.rs ← from holograph graphblas/types.rs (330 lines) |
| 199 | + descriptor.rs ← from holograph graphblas/descriptor.rs (186 lines) |
| 200 | +
|
| 201 | +2. CLEAN for upstream: |
| 202 | + - Remove holograph-specific imports |
| 203 | + - Remove any reference to ladybug-rs types |
| 204 | + - Use lance-graph error types |
| 205 | + - All pub types and methods get doc comments |
| 206 | + - cargo fmt + clippy clean |
| 207 | + |
| 208 | +3. The 7 semirings to include: |
| 209 | + - XOR Bundle (bind/superpose) |
| 210 | + - Bind First (key-value association) |
| 211 | + - Hamming Min (nearest neighbor) |
| 212 | + - Similarity Max (most similar) |
| 213 | + - Resonance with threshold (sigma-gated) |
| 214 | + - Boolean (standard graph traversal) |
| 215 | + - XOR Field (algebraic field operations) |
| 216 | + |
| 217 | +4. Matrix operations: |
| 218 | + - mxm (matrix × matrix — graph composition) |
| 219 | + - mxv (matrix × vector — graph query) |
| 220 | + - vxm (vector × matrix — reverse query) |
| 221 | + - element-wise add/mult |
| 222 | +
|
| 223 | +5. Write tests: |
| 224 | + - One test per semiring showing expected behavior |
| 225 | + - Matrix multiplication with at least 2 semirings |
| 226 | + - Sparse matrix efficiency test |
| 227 | +
|
| 228 | +6. Update graph/mod.rs: pub mod blasgraph; |
| 229 | +
|
| 230 | +7. Push, open PR |
| 231 | +``` |
| 232 | + |
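
To make steps 3 and 4 concrete, here is a minimal sketch of how one `mxv` routine can serve several semirings. The `Semiring` trait, the `Boolean` and `XorField` structs, and the dense `mxv` signature are all hypothetical, not the holograph API. Reading "XOR Field" as GF(2) (add = XOR, multiply = AND) is our assumption, and dense rows are used only to keep the example short.

```rust
/// Hypothetical semiring abstraction: an additive identity plus add and mul.
pub trait Semiring {
    type Elem: Copy + PartialEq;
    /// Additive identity.
    fn zero() -> Self::Elem;
    /// Combines contributions from different edges.
    fn add(a: Self::Elem, b: Self::Elem) -> Self::Elem;
    /// Combines an edge value with a vector entry.
    fn mul(a: Self::Elem, b: Self::Elem) -> Self::Elem;
}

/// Boolean semiring (OR, AND): standard reachability.
pub struct Boolean;
impl Semiring for Boolean {
    type Elem = bool;
    fn zero() -> bool {
        false
    }
    fn add(a: bool, b: bool) -> bool {
        a || b
    }
    fn mul(a: bool, b: bool) -> bool {
        a && b
    }
}

/// GF(2) reading of "XOR Field" (XOR, AND): parity of the number of paths.
pub struct XorField;
impl Semiring for XorField {
    type Elem = bool;
    fn zero() -> bool {
        false
    }
    fn add(a: bool, b: bool) -> bool {
        a ^ b
    }
    fn mul(a: bool, b: bool) -> bool {
        a && b
    }
}

/// Dense matrix-vector product over an arbitrary semiring.
pub fn mxv<S: Semiring>(m: &[Vec<S::Elem>], v: &[S::Elem]) -> Vec<S::Elem> {
    m.iter()
        .map(|row| {
            row.iter()
                .zip(v)
                .fold(S::zero(), |acc, (&a, &x)| S::add(acc, S::mul(a, x)))
        })
        .collect()
}
```

With the same matrix and vector, `mxv::<Boolean>` and `mxv::<XorField>` give different answers whenever a row has an even number of matching entries, which makes this pair a convenient choice for the "matrix multiplication with at least 2 semirings" test in step 5.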
| 233 | +PR title: `feat(graph): add BlasGraph semiring algebra — 7 semirings, sparse matrix ops` |
| 234 | +PR body: |
| 235 | +``` |
| 236 | +Port GraphBLAS-inspired sparse matrix algebra to lance-graph. |
| 237 | +
|
| 238 | +7 semiring algebras for different graph computation modes: |
| 239 | +- XOR Bundle, Bind First, Hamming Min, Similarity Max |
| 240 | +- Resonance (threshold-gated), Boolean, XOR Field |
| 241 | +
|
| 242 | +Matrix operations: mxm, mxv, vxm, element-wise. |
| 243 | +CSR sparse format for memory-efficient large graphs. |
| 244 | +
|
| 245 | +This lets graph algorithms (PageRank, community detection, shortest path,
| 246 | +given suitable semirings) run as matrix operations on LanceDB-backed
| 247 | +graphs, as an alternative to Pregel-style message passing.
| 248 | +
|
| 249 | +Based on the GraphBLAS approach used by RedisGraph, adapted for LanceDB
| 250 | +and binary Hamming distance vectors.
| 251 | +``` |
| 252 | + |
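
The PR body mentions CSR storage and `mxv` as a graph query. As a rough illustration, the sketch below builds a CSR matrix from triplets and runs one traversal hop with the Boolean (OR, AND) operations passed as closures. The `Csr` struct, `from_triplets`, and the closure-based `mxv` signature are assumptions for this example, as is the convention that rows are edge targets and columns are edge sources.

```rust
/// Hypothetical CSR (compressed sparse row) matrix.
pub struct Csr<T> {
    pub nrows: usize,
    /// row_ptr[r]..row_ptr[r + 1] indexes the entries of row r.
    pub row_ptr: Vec<usize>,
    pub col_idx: Vec<usize>,
    pub values: Vec<T>,
}

impl<T: Copy> Csr<T> {
    /// Build from (row, col, value) triplets; rows must be < nrows.
    pub fn from_triplets(nrows: usize, mut triplets: Vec<(usize, usize, T)>) -> Self {
        triplets.sort_by_key(|&(r, c, _)| (r, c));
        // Count entries per row, then prefix-sum into row offsets.
        let mut row_ptr = vec![0usize; nrows + 1];
        for &(r, _, _) in &triplets {
            row_ptr[r + 1] += 1;
        }
        for i in 0..nrows {
            row_ptr[i + 1] += row_ptr[i];
        }
        let col_idx = triplets.iter().map(|&(_, c, _)| c).collect();
        let values = triplets.iter().map(|&(_, _, v)| v).collect();
        Self { nrows, row_ptr, col_idx, values }
    }

    /// Matrix-vector product with caller-supplied semiring operations.
    pub fn mxv(
        &self,
        v: &[T],
        zero: T,
        add: impl Fn(T, T) -> T,
        mul: impl Fn(T, T) -> T,
    ) -> Vec<T> {
        (0..self.nrows)
            .map(|r| {
                // Only stored entries are visited, so cost scales with nnz.
                (self.row_ptr[r]..self.row_ptr[r + 1]).fold(zero, |acc, k| {
                    add(acc, mul(self.values[k], v[self.col_idx[k]]))
                })
            })
            .collect()
    }
}
```

For a graph with edges 0→1, 0→2, and 1→2 stored as `(dst, src, true)` triplets, multiplying by the frontier `[true, false, false]` marks nodes 1 and 2, that is, the one-hop neighbors of node 0.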
| 253 | +**Exit gate:** Clean PR, holograph code adapted, all tests pass, no ladybug-rs deps. |
| 254 | + |
| 255 | +--- |
| 256 | + |
| 257 | +## SUMMARY |
| 258 | + |
| 259 | +``` |
| 260 | +SESSION PR WHAT LINES DEPENDS ON |
| 261 | +1 - Sync fork, close #146 0 nothing |
| 262 | +2 A Dep bumps only ~50 session 1 |
| 263 | +3 B graph/spo/ module ~1600 session 1 (or session 2 merged) |
| 264 | +4 C BlasGraph semiring algebra ~3500 session 1 (or session 2 merged) |
| 265 | +``` |
| 266 | + |
| 267 | +Sessions 3 and 4 can target main even if PR A isn't merged yet. |
| 268 | +PRs B and C are independent of each other.
| 269 | + |
| 270 | +**What beinan and the lance-format team get:** |
| 271 | +- Arrow 57 / DataFusion 51 / Lance 2 alignment (PR A) |
| 272 | +- SPO triple store with Merkle integrity and NARS confidence (PR B) |
| 273 | +- BlasGraph algebra that, to our knowledge, no other Rust graph database offers (PR C)
| 274 | + |
| 275 | +**What we get:** |
| 276 | +- Clean upstream relationship (not a kitchen sink PR) |
| 277 | +- Our SPO and BlasGraph contributions in the official repo |
| 278 | +- Upstream CI validates our code |
| 279 | +- Community review catches bugs we missed |