Commit da3ca40

Add upstream PR session plan — split #146 into 3 clean PRs
1 parent 91cb83a commit da3ca40

1 file changed

Lines changed: 279 additions & 0 deletions

.claude/UPSTREAM_PR_SESSIONS.md

@@ -0,0 +1,279 @@
1+
# LANCE_GRAPH_UPSTREAM_SESSIONS.md
2+
3+
## Fix PR #146. Split. Rebase. Contribute BlasGraph.
4+
5+
**Repo:** AdaWorldAPI/lance-graph (fork of lance-format/lance-graph)
6+
**Upstream:** lance-format/lance-graph
7+
**PR #146:** open, CI failing, merge conflicts, maintainer asked us to fix
8+
9+
---
10+
11+
## SITUATION
12+
13+
```
14+
PR #146 is a kitchen sink: dep bumps + error macros + graph/spo/ module + docs
15+
11 commits, 2577 additions, mergeable: false, state: dirty
16+
17+
Upstream merged 3 PRs since we opened:
18+
#145 Unity catalog/delta lake (Mar 3)
19+
#147 Remove Simple executor (Mar 4) ← CONFLICT
20+
#148 Move benchmarks to separate crate (Mar 5) ← CONFLICT
21+
22+
Our fork: 10 ahead, 3 behind upstream main.
23+
```
24+
25+
---
26+
27+
## SESSION 1: Sync Fork + Close #146
28+
29+
```bash
cd adaworld/lance-graph

# Sync fork to upstream
git remote add upstream https://github.com/lance-format/lance-graph.git
git fetch upstream
git checkout main
git merge upstream/main
# Resolve conflicts (Cargo.lock, workspace layout changes)
git push origin main

# Close #146 with comment
```
42+
43+
Comment on PR #146:
44+
```
45+
Closing this PR to split into focused contributions:
46+
- PR A: dep bumps (arrow 57, datafusion 51, lance 2)
47+
- PR B: graph/spo/ module (SPO triple store, TruthGate, semiring)
48+
- PR C: BlasGraph algebra (from holograph — 7 semirings, matrix ops)
49+
50+
Sorry for the kitchen sink. Splitting for clean review.
51+
```
52+
53+
**Exit gate:** Fork synced to upstream main. #146 closed.
54+
55+
---
56+
57+
## SESSION 2: PR A — Dep Bumps Only
58+
59+
```bash
git checkout -b feat/bump-arrow-57-datafusion-51-lance-2
```
62+
63+
**What to do:**
64+
```
65+
1. ONLY change Cargo.toml dep versions:
66+
crates/lance-graph/Cargo.toml
67+
crates/lance-graph-catalog/Cargo.toml
68+
crates/lance-graph-python/Cargo.toml
69+
70+
arrow: current → 57
71+
datafusion: current → 51
72+
lance: current → 2.0
73+
74+
2. cargo update (regenerate Cargo.lock)
75+
76+
3. Fix any API breakages from dep bumps:
77+
- arrow 57: check RecordBatch API changes
78+
- datafusion 51: check SessionContext, LogicalPlan changes
79+
- lance 2: check table API, write params
80+
81+
4. cargo test --workspace (exclude python crate)
82+
Fix ALL test failures.
83+
84+
5. Check upstream CI requirements:
85+
- cargo fmt --check
86+
- cargo clippy -- -D warnings
87+
- cargo test
88+
89+
6. Push, open PR to lance-format/lance-graph
90+
```
91+
92+
PR title: `feat: bump arrow 57, datafusion 51, lance 2`
93+
PR body:
94+
```
95+
Align dependency matrix:
96+
arrow → 57
97+
datafusion → 51
98+
lance → 2.0
99+
100+
All tests pass. Any API breakages caused by the bumps are fixed in this PR.
101+
Follows up on closed #146 (split into focused PRs).
102+
```
103+
104+
**Exit gate:** Clean PR with ONLY dep bumps. CI green. No extra files.
105+
106+
---
107+
108+
## SESSION 3: PR B — graph/spo/ Module
109+
110+
```bash
git checkout main
git pull upstream main   # picks up PR A if it has merged; otherwise branch from current upstream main
git checkout -b feat/spo-triple-store
```
115+
116+
**What to do:**
117+
```
118+
1. Add ONLY the graph/spo/ module:
119+
crates/lance-graph/src/graph/mod.rs
120+
crates/lance-graph/src/graph/fingerprint.rs
121+
crates/lance-graph/src/graph/sparse.rs
122+
crates/lance-graph/src/graph/spo/mod.rs
123+
crates/lance-graph/src/graph/spo/builder.rs
124+
crates/lance-graph/src/graph/spo/merkle.rs
125+
crates/lance-graph/src/graph/spo/semiring.rs
126+
crates/lance-graph/src/graph/spo/store.rs
127+
crates/lance-graph/src/graph/spo/truth.rs
128+
129+
2. Add test:
130+
crates/lance-graph/tests/spo_ground_truth.rs
131+
132+
3. Add `pub mod graph;` to lib.rs
133+
134+
4. REMOVE anything not relevant to upstream:
135+
- No SPARE_PARTS_SUMMARY.md
136+
- No #[track_caller] error macros (separate PR if wanted)
137+
- No ladybug-rs specific imports
138+
- No references to BindSpace, CogRedis, etc
139+
140+
5. Make sure graph/spo/ is SELF-CONTAINED:
141+
- Uses lance-graph's own error types (not ladybug's QueryError)
142+
- Uses standard blake3 crate (add to Cargo.toml)
143+
- No dependency on rustynum or ladybug-rs
144+
145+
6. Clean the code for upstream standards:
146+
- cargo fmt
147+
- cargo clippy -- -D warnings
148+
- Doc comments on all pub types and methods
149+
- Examples in doc comments where useful
150+
151+
7. cargo test (including spo_ground_truth.rs)
152+
153+
8. Push, open PR
154+
```
155+
156+
PR title: `feat(graph): add SPO triple store with Merkle integrity, TruthGate, and semiring traversal`
157+
PR body:
158+
```
159+
Add a content-addressable SPO (Subject-Predicate-Object) triple store:
160+
161+
- **SpoStore**: insert, query_forward, query_reverse, query_relation
162+
- **SpoMerkle**: Blake3-based integrity with MerkleEpoch and inclusion proofs
163+
- **TruthGate**: NARS-inspired confidence gating (MinFreq/MinConf/MinBoth)
164+
- **SpoSemiring**: Algebraic traversal operations for graph algorithms
165+
- **SpoBuilder**: Builder pattern for constructing stores
166+
- **Fingerprint**: 16384-bit binary fingerprint with Hamming operations
167+
- **SparseContainer**: Memory-efficient sparse vector storage
168+
169+
Ground truth test included (357 lines).
170+
171+
This enables knowledge-graph style operations on LanceDB with
172+
content-addressed nodes and confidence-weighted edges.
173+
```
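
As a rough illustration of the confidence gating described in the PR body, the sketch below models a NARS-style truth value and a gate with the three modes named there. Only the names `TruthGate`, `MinFreq`, `MinConf`, and `MinBoth` come from this plan; `TruthValue`, `passes`, `filter_edges`, and the threshold payloads are assumptions for illustration, not the actual `truth.rs` API.

```rust
/// NARS-style truth value: `frequency` and `confidence`, each in [0.0, 1.0].
/// Hypothetical type; the real module may represent this differently.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TruthValue {
    pub frequency: f32,
    pub confidence: f32,
}

/// Gate that accepts or rejects an edge based on its truth value.
/// Variant names follow the PR body; the payloads are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TruthGate {
    /// Pass if frequency >= threshold.
    MinFreq(f32),
    /// Pass if confidence >= threshold.
    MinConf(f32),
    /// Pass only if both thresholds are met: (min_frequency, min_confidence).
    MinBoth(f32, f32),
}

impl TruthGate {
    /// Returns true when `t` satisfies this gate.
    pub fn passes(&self, t: TruthValue) -> bool {
        match *self {
            TruthGate::MinFreq(f) => t.frequency >= f,
            TruthGate::MinConf(c) => t.confidence >= c,
            TruthGate::MinBoth(f, c) => t.frequency >= f && t.confidence >= c,
        }
    }
}

/// Keep only the edges whose truth value passes the gate (hypothetical helper).
pub fn filter_edges<E: Copy>(edges: &[(E, TruthValue)], gate: TruthGate) -> Vec<E> {
    edges
        .iter()
        .filter(|(_, t)| gate.passes(*t))
        .map(|(e, _)| *e)
        .collect()
}
```

With a layout like this, a query would call `filter_edges(&edges, TruthGate::MinBoth(0.5, 0.5))` to drop edges that are either rarely true or weakly supported. Whether the real store gates at query time or at insert time is not stated in this plan.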
174+
175+
**Exit gate:** Clean PR, no ladybug-rs deps, all tests pass, upstream CI green.
176+
177+
---
178+
179+
## SESSION 4: PR C — BlasGraph Semiring Algebra (FROM holograph)
180+
181+
```bash
git checkout main
git pull upstream main
git checkout -b feat/blasgraph-semiring-algebra
```
186+
187+
**What to do:**
188+
```
189+
1. Port from holograph/src/graphblas/ to lance-graph:
190+
191+
Create: crates/lance-graph/src/graph/blasgraph/
192+
mod.rs ← from holograph graphblas/mod.rs (94 lines)
193+
semiring.rs ← from holograph graphblas/semiring.rs (535 lines)
194+
matrix.rs ← from holograph graphblas/matrix.rs (596 lines)
195+
vector.rs ← from holograph graphblas/vector.rs (506 lines)
196+
ops.rs ← from holograph graphblas/ops.rs (717 lines)
197+
sparse.rs ← from holograph graphblas/sparse.rs (546 lines)
198+
types.rs ← from holograph graphblas/types.rs (330 lines)
199+
descriptor.rs ← from holograph graphblas/descriptor.rs (186 lines)
200+
201+
2. CLEAN for upstream:
202+
- Remove holograph-specific imports
203+
- Remove any reference to ladybug-rs types
204+
- Use lance-graph error types
205+
- All pub types and methods get doc comments
206+
- cargo fmt + clippy clean
207+
208+
3. The 7 semirings to include:
209+
- XOR Bundle (bind/superpose)
210+
- Bind First (key-value association)
211+
- Hamming Min (nearest neighbor)
212+
- Similarity Max (most similar)
213+
- Resonance with threshold (sigma-gated)
214+
- Boolean (standard graph traversal)
215+
- XOR Field (algebraic field operations)
216+
217+
4. Matrix operations:
218+
- mxm (matrix × matrix — graph composition)
219+
- mxv (matrix × vector — graph query)
220+
- vxm (vector × matrix — reverse query)
221+
- element-wise add/mult
222+
223+
5. Write tests:
224+
- One test per semiring showing expected behavior
225+
- Matrix multiplication with at least 2 semirings
226+
- Sparse matrix efficiency test
227+
228+
6. Update graph/mod.rs: pub mod blasgraph;
229+
230+
7. Push, open PR
231+
```
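
To make the port target concrete, here is a minimal sketch of how a semiring can parameterize `mxv` over a CSR matrix. It is not the holograph code: the `Semiring` trait, `CsrMatrix`, `from_triples`, and the function signature are all assumptions. `Boolean` corresponds to the Boolean semiring in the list above, while `MinPlus` is a textbook semiring that is not among the seven and appears only to show that the same kernel serves a different algebra.

```rust
/// A semiring: an "add" with identity `zero`, and a "multiply".
/// Hypothetical trait for illustration only.
pub trait Semiring {
    type Elem: Copy + PartialEq;
    fn zero() -> Self::Elem;
    fn add(a: Self::Elem, b: Self::Elem) -> Self::Elem;
    fn mul(a: Self::Elem, b: Self::Elem) -> Self::Elem;
}

/// Boolean semiring (OR, AND): plain reachability.
pub struct Boolean;
impl Semiring for Boolean {
    type Elem = bool;
    fn zero() -> bool { false }
    fn add(a: bool, b: bool) -> bool { a || b }
    fn mul(a: bool, b: bool) -> bool { a && b }
}

/// Min-plus semiring (min, saturating +): path lengths. Not one of the seven.
pub struct MinPlus;
impl Semiring for MinPlus {
    type Elem = u32;
    fn zero() -> u32 { u32::MAX }
    fn add(a: u32, b: u32) -> u32 { a.min(b) }
    fn mul(a: u32, b: u32) -> u32 { a.saturating_add(b) }
}

/// Compressed sparse row matrix; only stored entries take memory.
pub struct CsrMatrix<T> {
    pub nrows: usize,
    pub row_ptr: Vec<usize>, // length nrows + 1
    pub col_idx: Vec<usize>,
    pub values: Vec<T>,
}

impl<T: Copy> CsrMatrix<T> {
    /// Build from (row, col, value) triples. Assumes every row index < nrows.
    pub fn from_triples(nrows: usize, mut triples: Vec<(usize, usize, T)>) -> Self {
        triples.sort_by_key(|&(r, c, _)| (r, c));
        let mut row_ptr = vec![0usize; nrows + 1];
        for &(r, _, _) in &triples {
            row_ptr[r + 1] += 1; // count entries per row
        }
        for i in 0..nrows {
            row_ptr[i + 1] += row_ptr[i]; // prefix sum gives row offsets
        }
        let col_idx = triples.iter().map(|&(_, c, _)| c).collect();
        let values = triples.iter().map(|&(_, _, v)| v).collect();
        CsrMatrix { nrows, row_ptr, col_idx, values }
    }
}

/// y[i] = add over stored (i, j) of mul(A[i][j], x[j]).
/// Assumes every column index is < x.len().
pub fn mxv<S: Semiring>(m: &CsrMatrix<S::Elem>, x: &[S::Elem]) -> Vec<S::Elem> {
    (0..m.nrows)
        .map(|i| {
            let mut acc = S::zero();
            for k in m.row_ptr[i]..m.row_ptr[i + 1] {
                acc = S::add(acc, S::mul(m.values[k], x[m.col_idx[k]]));
            }
            acc
        })
        .collect()
}
```

If `A[i][j]` stores the edge `i -> j`, then `mxv::<Boolean>(&a, &x)` marks every node with an edge into the set `x`, and `mxv::<MinPlus>` performs one relaxation step of distances toward that set. A per-semiring test in the PR could follow the same shape: a tiny matrix, one input vector, and a hand-computed expected output.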
232+
233+
PR title: `feat(graph): add BlasGraph semiring algebra — 7 semirings, sparse matrix ops`
234+
PR body:
235+
```
236+
Port GraphBLAS-inspired sparse matrix algebra to lance-graph.
237+
238+
7 semiring algebras for different graph computation modes:
239+
- XOR Bundle, Bind First, Hamming Min, Similarity Max
240+
- Resonance (threshold-gated), Boolean, XOR Field
241+
242+
Matrix operations: mxm, mxv, vxm, element-wise.
243+
CSR sparse format for memory-efficient large graphs.
244+
245+
This enables algebraic graph algorithms (PageRank, community detection,
246+
shortest path) as matrix operations on LanceDB-backed graphs,
247+
replacing Pregel-style message passing with linear algebra.
248+
249+
Based on RedisGraph's GraphBLAS approach, adapted for LanceDB
250+
and binary Hamming distance vectors.
251+
```
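
The "binary Hamming distance vectors" mentioned in the PR body can be pictured with the sketch below: a 16384-bit fingerprint stored as 256 `u64` words, XOR binding, Hamming distance, and a nearest-neighbor scan in the spirit of the Hamming Min semiring. Every name here (`Fingerprint`, `WORDS`, `from_seed`, `bind`, `hamming`, `nearest`) is an assumption for illustration and may not match `fingerprint.rs` or the holograph types. `from_seed` uses a splitmix64 generator purely to produce test data without external crates.

```rust
/// Number of 64-bit words in a 16384-bit fingerprint.
pub const WORDS: usize = 16384 / 64;

/// Hypothetical 16384-bit binary fingerprint.
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct Fingerprint(pub [u64; WORDS]);

impl Fingerprint {
    /// All-zero fingerprint: the identity element for XOR binding.
    pub fn zero() -> Self {
        Fingerprint([0; WORDS])
    }

    /// Deterministic pseudo-random fingerprint from a seed (splitmix64).
    /// Test-data helper only; not part of any real API.
    pub fn from_seed(seed: u64) -> Self {
        let mut state = seed;
        let mut words = [0u64; WORDS];
        for w in words.iter_mut() {
            state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
            let mut z = state;
            z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
            *w = z ^ (z >> 31);
        }
        Fingerprint(words)
    }

    /// XOR binding. Self-inverse: binding twice with the same key restores the input.
    pub fn bind(&self, other: &Fingerprint) -> Fingerprint {
        let mut out = [0u64; WORDS];
        for i in 0..WORDS {
            out[i] = self.0[i] ^ other.0[i];
        }
        Fingerprint(out)
    }

    /// Number of differing bits between two fingerprints.
    pub fn hamming(&self, other: &Fingerprint) -> u32 {
        self.0
            .iter()
            .zip(other.0.iter())
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}

/// Index and distance of the candidate closest to `query`, or None if empty.
pub fn nearest(query: &Fingerprint, candidates: &[Fingerprint]) -> Option<(usize, u32)> {
    candidates
        .iter()
        .enumerate()
        .map(|(i, c)| (i, query.hamming(c)))
        .min_by_key(|&(_, d)| d)
}
```

Two properties are worth asserting in the semiring tests: binding with the same key twice returns the original fingerprint, and binding both operands with the same key leaves their Hamming distance unchanged. Both follow directly from XOR and do not depend on how the real types are laid out.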
252+
253+
**Exit gate:** Clean PR, holograph code adapted, all tests pass, no ladybug-rs deps.
254+
255+
---
256+
257+
## SUMMARY
258+
259+
```
260+
SESSION PR WHAT LINES DEPENDS ON
261+
1 - Sync fork, close #146 0 nothing
262+
2 A Dep bumps only ~50 session 1
263+
3 B graph/spo/ module ~1600 session 1 (or session 2 merged)
264+
4 C BlasGraph semiring algebra ~3500 session 1 (or session 2 merged)
265+
```
266+
267+
Sessions 3 and 4 can target main even if PR A isn't merged yet.
268+
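PRs B and C are independent in content, but PR C's `pub mod blasgraph;` goes in graph/mod.rs, which PR B adds. If PR C lands first, it must create graph/mod.rs and add `pub mod graph;` to lib.rs itself; whichever lands second rebases.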
269+
270+
**What beinan and the lance-format team get:**
271+
- Arrow 57 / DataFusion 51 / Lance 2 alignment (PR A)
272+
- SPO triple store with Merkle integrity and NARS confidence (PR B)
273+
- BlasGraph: GraphBLAS-style semiring algebra implemented natively in Rust (PR C)
274+
275+
**What we get:**
276+
- Clean upstream relationship (not a kitchen sink PR)
277+
- Our SPO and BlasGraph contributions in the official repo
278+
- Upstream CI validates our code
279+
- Community review catches bugs we missed
