Skip to content

Commit f1682c4

Browse files
committed
Merge pull request JanKaul#288 from JanKaul/claudemd
Claudemd
2 parents 40b1f96 + 3fc1971 commit f1682c4

File tree

4 files changed

+240
-7
lines changed

4 files changed

+240
-7
lines changed

.github/workflows/rust.yml

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -20,12 +20,13 @@ jobs:
2020
android: true
2121
dotnet: true
2222
haskell: true
23-
large-packages: true
23+
large-packages: false
2424
docker-images: false
2525
swap-storage: false
2626
- uses: actions/checkout@v3
27+
- uses: Swatinem/rust-cache@v2
2728
- name: Build
28-
run: cargo build --verbose
29+
run: cargo build
2930
- name: Run iceberg-rust-spec tests
3031
run: make test-iceberg-rust-spec
3132
- name: Run iceberg-rust tests
@@ -43,6 +44,7 @@ jobs:
4344
runs-on: ubuntu-latest
4445
steps:
4546
- uses: actions/checkout@v3
47+
- uses: Swatinem/rust-cache@v2
4648
- name: Install Clippy
4749
run: rustup component add clippy
4850
- name: Run Clippy

.gitignore

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
/target
22
Cargo.lock
33
Cargo.lock
4-
.aider*
4+
.mcp.json
55
.idea

CLAUDE.md

Lines changed: 231 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,231 @@
1+
# iceberg-rust: Design Principles & Patterns
2+
3+
Actionable guidelines for the iceberg-rust project, optimized for AI coding assistants.
4+
5+
## Project Architecture
6+
7+
**Layered Design:**
8+
```
9+
datafusion_iceberg (query engine integration)
10+
11+
iceberg-rust (table operations, catalogs)
12+
13+
iceberg-rust-spec (pure specification types)
14+
```
15+
16+
**Core Philosophy:** Deep modules with simple interfaces (John Ousterhout's "A Philosophy of Software Design")
17+
18+
## Deep vs Shallow Modules
19+
20+
**Deep Modules** = Powerful functionality + Simple interface
21+
- **Best modules** hide significant complexity behind clean APIs
22+
- **Goal:** Minimize interface size relative to implementation size (1:10+ ratio ideal)
23+
- **Example:** Catalog trait has ~20 methods hiding 6 implementations with 5000+ lines (1:12 ratio)
24+
25+
**Shallow Modules to Avoid:**
26+
- Many small methods that just wrap other calls
27+
- Interfaces that expose internal complexity
28+
- Documentation longer than implementation
29+
30+
## LSP-Based Codebase Navigation
31+
32+
**IMPORTANT:** When an LSP (Language Server Protocol) MCP server is available (such as `rust-analyzer`), **ALWAYS prefer LSP tools over text-based search** for code navigation and analysis.
33+
34+
### When to Use LSP Tools
35+
36+
Use LSP tools for:
37+
- **Finding definitions:** `get_symbol_definitions` instead of grepping for function/type names
38+
- **Finding references:** `get_symbol_references` instead of searching for usage
39+
- **Type information:** `get_hover` for accurate type and documentation
40+
- **Code structure:** `get_symbols` for understanding module organization
41+
- **Implementations:** `get_implementations` for finding trait implementations
42+
- **Call hierarchy:** `get_call_hierarchy` for understanding call relationships
43+
- **Diagnostics:** `get_diagnostics` for compiler errors and warnings
44+
- **Completions:** `get_completions` for valid code suggestions
45+
46+
### Decision Tree
47+
48+
```
49+
Need to understand code structure? → get_symbols
50+
Need to find where something is defined? → get_symbol_definitions
51+
Need to find all usages? → get_symbol_references
52+
Need to understand types? → get_hover
53+
Need to find trait impls? → get_implementations
54+
Searching for text/patterns? → Grep/text search
55+
```
56+
57+
## Functional Programming Patterns
58+
59+
### Guidelines
60+
61+
1. **Use Iterator Methods:** `map`, `filter`, `flat_map`, `fold` over `for` loops
62+
2. **Lazy When Possible:** Return `impl Iterator` for large transformations
63+
3. **Combinators:** `ok_or`, `and_then`, `unwrap_or_default` for `Option`/`Result`
64+
4. **Strategic collect():** Only use `.collect::<Vec<_>>()` when needed
65+
5. **Chain Iterators:** Use `.chain()` instead of extending vecs
66+
67+
### When NOT to Use Iterators
68+
69+
- Complex state machines (use explicit loops)
70+
- Performance-critical hot paths needing specific optimizations
71+
- When mutation in place is clearer
72+
73+
## Trait Design Patterns
74+
75+
### When to Create Traits
76+
77+
**Decision Tree:**
78+
```
79+
Is it used by 3+ types? → YES → Consider trait
80+
↓ NO
81+
Does it hide significant complexity? → YES → Consider trait
82+
↓ NO
83+
Would From/Into/standard trait work? → YES → Use standard trait
84+
↓ NO
85+
→ Don't create trait, use generic functions or enum
86+
```
87+
88+
### Guidelines
89+
90+
1. **Prefer Standard Traits:** Use `From`, `TryFrom`, `Into`, `Display`, `Debug`, `Error` over custom traits
91+
2. **Interface/Implementation Ratio:** Aim for 1:10+ lines (interface:implementation)
92+
3. **Required Bounds:** Always include `Send + Sync + Debug` for shared types
93+
4. **Async Traits:** Use `#[async_trait]` for I/O operations
94+
5. **Arc Receivers:** Use `Arc<Self>` for async trait methods needing shared ownership
95+
96+
### Documentation Standards
97+
98+
Every public trait method must document:
99+
1. **Summary:** One-line behavior description
100+
2. **Arguments:** Each parameter explained
101+
3. **Returns:** Success case
102+
4. **Errors:** All failure modes
103+
104+
## Builder Pattern & Configuration
105+
106+
### When to Use Builders
107+
108+
**Decision Tree:**
109+
```
110+
5+ fields OR complex optional config OR needs async setup
111+
→ derive_builder
112+
Otherwise
113+
→ Regular struct with new()
114+
```
115+
116+
### Best Practices
117+
118+
1. **Use derive_builder:** Don't hand-roll builders
119+
2. **Ergonomics:** Use `setter(into)` for `String` and common types
120+
3. **Optional is Explicit:** Use `Option<T>` + `strip_option` for clarity
121+
4. **Required Fields:** No defaults - force user to provide
122+
5. **Validation Separate:** Use `TryInto` for validation logic
123+
6. **Custom build():** Add async `build()` taking external dependencies
124+
125+
## Error Handling
126+
127+
### Guidelines
128+
129+
1. **Use thiserror:** Always derive `Error` trait, never hand-roll
130+
2. **Contextual Messages:** Include what failed and why (column name + schema name)
131+
3. **Transparent Wrapping:** Use `#[from]` for automatic conversion from external errors
132+
4. **Box Large Errors:** `Box<T>` for errors >128 bytes to reduce enum size
133+
5. **Bidirectional When Needed:** Implement `From<Error>` for external types (e.g., `ArrowError`)
134+
6. **Group Related Failures:** One variant with parameter (e.g., `NotFound(String)`) vs many variants
135+
136+
## Module Organization
137+
138+
### Layering
139+
140+
**Dependency Graph:**
141+
```
142+
datafusion_iceberg → iceberg-rust → iceberg-rust-spec
143+
(DataFusion) (async, I/O) (pure data, serde)
144+
```
145+
146+
- **Spec layer:** No external dependencies except serde/uuid
147+
- **Implementation layer:** Adds object_store, async, catalogs
148+
- **Integration layer:** Adds datafusion-specific code
149+
150+
### Information Hiding Checklist
151+
152+
- Is this type part of public API? → Re-export
153+
- Is this complexity internal? → `pub(crate)` or private mod
154+
- Can spec types be separate? → Move to iceberg-rust-spec
155+
- Does this add dependencies? → Check layer appropriateness
156+
157+
## Complexity Management
158+
159+
### Before Adding Complexity
160+
161+
Ask these questions:
162+
1. Can it be hidden behind existing interface?
163+
2. Does it reduce complexity elsewhere?
164+
3. Is the interface/implementation ratio maintained?
165+
4. Can derive macros handle it? (Builder, Getters, Error)
166+
167+
## Quick Reference: Decision Trees
168+
169+
### Adding a New Trait?
170+
```
171+
Is it used by 3+ types? → YES → Consider trait
172+
↓ NO
173+
Does it hide significant complexity? → YES → Consider trait
174+
↓ NO
175+
Would From/Into/standard trait work? → YES → Use standard trait
176+
↓ NO
177+
→ Use generic functions or enum
178+
```
179+
180+
### Choosing Error Handling?
181+
```
182+
External library error? → #[error(transparent)] + #[from]
183+
Domain-specific error? → Custom variant with context
184+
Multiple failure modes? → Single variant with String parameter
185+
Size > 128 bytes? → Box<T>
186+
```
187+
188+
### Async or Sync?
189+
```
190+
I/O operation (network, disk)? → async
191+
CPU-bound computation? → sync
192+
Called from async context? → async
193+
Part of Catalog/trait? → async
194+
```
195+
196+
### Builder or Not?
197+
```
198+
5+ fields OR complex optional config OR needs async setup
199+
→ derive_builder
200+
Otherwise
201+
→ Regular struct with new()
202+
```
203+
204+
## Critical Metrics
205+
206+
These metrics indicate deep modules (desirable):
207+
208+
- **Catalog trait:** ~410 lines interface → ~5000+ lines across implementations (1:12 ratio) ✓
209+
- **Transaction:** ~100 public lines → ~500 implementation lines (1:5 ratio) ✓
210+
- **Error handling:** 107 lines covering all codebase errors (centralized) ✓
211+
212+
**Target:** Interface/Implementation ratio of 1:10+ for major abstractions
213+
214+
## Key Takeaways
215+
216+
1. **Deep Modules:** Hide complexity behind simple interfaces (Catalog trait: 20 methods, 6 implementations)
217+
2. **Standard Traits First:** Prefer `From`/`TryFrom`/`Error` over custom traits
218+
3. **Builder Pattern:** Use `derive_builder` for 5+ fields or complex config
219+
4. **Functional Style:** Iterator chains over loops, combinators over match
220+
5. **Error Context:** Use `thiserror` with contextual messages
221+
6. **Async for I/O:** All catalog/storage operations async with `Arc<Self>`
222+
7. **Layered Architecture:** Keep spec pure, implementation separate, integration isolated
223+
8. **Document Everything:** Public APIs need comprehensive docs
224+
9. **Pull Complexity Down:** Make user's life easy, hide complexity in implementation
225+
226+
## Design Philosophy Summary
227+
228+
When adding features, ask: **"Am I making the interface simpler or more complex?"**
229+
230+
The best additions hide complexity from users while maintaining clear, well-documented interfaces.
231+

Makefile

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -7,16 +7,16 @@ test-iceberg-rust:
77
cargo test -p iceberg-rust --lib
88

99
test-datafusion_iceberg:
10-
cargo test -p datafusion_iceberg --tests -j 2 && cargo clean -p datafusion_iceberg
10+
cargo test -p datafusion_iceberg --tests -j 2
1111

1212
test-rest-catalog:
13-
cargo test -p iceberg-rest-catalog --lib && cargo clean -p iceberg-rest-catalog
13+
cargo test -p iceberg-rest-catalog --lib
1414

1515
test-file-catalog:
16-
cargo test -p iceberg-file-catalog --lib && cargo clean -p iceberg-file-catalog
16+
cargo test -p iceberg-file-catalog --lib
1717

1818
test-sql-catalog:
19-
cargo test -p iceberg-sql-catalog --lib && cargo clean -p iceberg-sql-catalog
19+
cargo test -p iceberg-sql-catalog --lib
2020
clippy:
2121
cargo clippy --all-targets --all-features -- -D warnings
2222
fmt:

0 commit comments

Comments
 (0)