Commit 3fc1971

compress CLAUDE.md
1 parent 473361b commit 3fc1971

1 file changed

Lines changed: 19 additions & 259 deletions

CLAUDE.md

@@ -1,6 +1,6 @@
11
# iceberg-rust: Design Principles & Patterns
22

3-
This document provides design principles and coding patterns for the iceberg-rust project, optimized for AI coding assistants and human contributors. It documents existing patterns while suggesting improvements for future development.
3+
Actionable guidelines for the iceberg-rust project, optimized for AI coding assistants.
44

55
## Project Architecture
66

@@ -15,6 +15,18 @@ iceberg-rust-spec (pure specification types)
1515

1616
**Core Philosophy:** Deep modules with simple interfaces (John Ousterhout's "A Philosophy of Software Design")
1717

18+
## Deep vs Shallow Modules
19+
20+
**Deep Modules** = Powerful functionality + Simple interface
21+
- **Best modules** hide significant complexity behind clean APIs
22+
- **Goal:** Minimize interface size relative to implementation size (1:10+ ratio ideal)
23+
- **Example:** Catalog trait has ~20 methods hiding 6 implementations with 5000+ lines (1:12 ratio)
24+
25+
**Shallow Modules to Avoid:**
26+
- Many small methods that just wrap other calls
27+
- Interfaces that expose internal complexity
28+
- Documentation longer than implementation
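
A minimal, dependency-free sketch of the deep-module idea above. `MetadataStore`, `load`, and `backend_reads` are hypothetical names invented for illustration, not project APIs. The point is that one small public method can hide caching and backend access.

```rust
use std::collections::HashMap;

/// Deep module: a single `load` method hides caching and backend reads.
/// All names here are hypothetical and exist only for this sketch.
pub struct MetadataStore {
    cache: HashMap<String, String>,
    loads: usize, // counts simulated backend reads
}

impl MetadataStore {
    pub fn new() -> Self {
        Self { cache: HashMap::new(), loads: 0 }
    }

    /// The public interface: callers never see the cache or the backend.
    pub fn load(&mut self, table: &str) -> Result<String, String> {
        if let Some(hit) = self.cache.get(table) {
            return Ok(hit.clone());
        }
        let metadata = self.read_backend(table)?;
        self.cache.insert(table.to_string(), metadata.clone());
        Ok(metadata)
    }

    // Private: stands in for object store access and format detection.
    fn read_backend(&mut self, table: &str) -> Result<String, String> {
        if table.is_empty() {
            return Err("empty table name".to_string());
        }
        self.loads += 1;
        Ok(format!("metadata for {}", table))
    }

    /// Exposed only so the sketch can show that the cache is used.
    pub fn backend_reads(&self) -> usize {
        self.loads
    }
}

fn main() {
    let mut store = MetadataStore::new();
    let first = store.load("orders").unwrap();
    let second = store.load("orders").unwrap();
    assert_eq!(first, second);
    // The second call was served from the cache.
    assert_eq!(store.backend_reads(), 1);
}
```

A shallow version of the same module would expose `cache` and `read_backend` directly and leave the caller to combine them.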
29+
1830
## LSP-Based Codebase Navigation
1931

2032
**IMPORTANT:** When an LSP (Language Server Protocol) MCP server is available (such as `rust-analyzer`), **ALWAYS prefer LSP tools over text-based search** for code navigation and analysis.
@@ -31,15 +43,8 @@ Use LSP tools for:
3143
- **Diagnostics:** `get_diagnostics` for compiler errors and warnings
3244
- **Completions:** `get_completions` for valid code suggestions
3345

34-
### Integration with Development
35-
36-
Before modifying code:
37-
1. Use `get_symbol_definitions` to find the target
38-
2. Use `get_hover` to understand types and contracts
39-
3. Use `get_symbol_references` to assess impact
40-
4. Make changes with full context
46+
### Decision Tree
4147

42-
**Example Decision Tree:**
4348
```
4449
Need to understand code structure? → get_symbols
4550
Need to find where something is defined? → get_symbol_definitions
@@ -49,74 +54,8 @@ Need to find trait impls? → get_implementations
4954
Searching for text/patterns? → Grep/text search
5055
```
5156

52-
## Deep vs Shallow Modules
53-
54-
**Deep Modules** = Powerful functionality + Simple interface
55-
- **Best modules** hide significant complexity behind clean APIs
56-
- **Goal:** Minimize interface size relative to implementation size (1:10+ ratio ideal)
57-
58-
**Example - Catalog Trait** (`iceberg-rust/src/catalog/mod.rs:46-64`):
59-
```rust
60-
#[async_trait::async_trait]
61-
pub trait Catalog: Send + Sync + Debug {
62-
async fn load_tabular(self: Arc<Self>, identifier: &Identifier)
63-
-> Result<Tabular, Error>;
64-
// ~20 methods hiding 6 implementations with 5000+ lines total
65-
// Interface/Implementation ratio: 1:12+ ✓
66-
}
67-
```
68-
69-
**What this hides:** Connection pooling, metadata loading, object store interaction, format detection (table/view/materialized_view), caching, retry logic, error translation
70-
71-
**Shallow Modules to Avoid:**
72-
- Many small methods that just wrap other calls
73-
- Interfaces that expose internal complexity
74-
- Documentation longer than implementation
75-
7657
## Functional Programming Patterns
7758

78-
### Prefer Iterators Over Loops
79-
80-
**Pattern 1: Iterator Chains** (`iceberg-rust/src/table/transaction/operation.rs:188-196`):
81-
```rust
82-
let new_datafile_iter = data_files.into_iter().map(|data_file| {
83-
ManifestEntry::builder()
84-
.with_format_version(table_metadata.format_version)
85-
.with_status(Status::Added)
86-
.with_data_file(data_file)
87-
.with_sequence_number(table_metadata.last_sequence_number + dsn_offset)
88-
.build()
89-
.map_err(Error::from)
90-
});
91-
```
92-
93-
**Benefits:**
94-
- Lazy evaluation (only creates when consumed)
95-
- Clear transformation pipeline
96-
- Error handling inline with `map_err`
97-
- No intermediate allocations until `collect()`
98-
99-
**Pattern 2: flat_map for Flattening** (`iceberg-rust/src/table/transaction/operation.rs:149-153`):
100-
```rust
101-
let all_files: Vec<DataFile> = sequence_groups
102-
.iter()
103-
.flat_map(|d| d.delete_files.iter().chain(d.data_files.iter()))
104-
.cloned()
105-
.collect();
106-
```
107-
108-
**Pattern 3: Option/Result Combinators** (`iceberg-rust/src/catalog/create.rs:131-132`):
109-
```rust
110-
// Prefer this:
111-
self.location.ok_or(Error::NotFound(format!("Location for table {}", self.name)))?
112-
113-
// Over this:
114-
let location = match self.location {
115-
Some(loc) => loc,
116-
None => return Err(Error::NotFound(...)),
117-
};
118-
```
119-
12059
### Guidelines
12160

12261
1. **Use Iterator Methods:** `map`, `filter`, `flat_map`, `fold` over `for` loops
@@ -154,20 +93,9 @@ Would From/Into/standard trait work? → YES → Use standard trait
15493
4. **Async Traits:** Use `#[async_trait]` for I/O operations
15594
5. **Arc Receivers:** Use `Arc<Self>` for async trait methods needing shared ownership
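
A rough sketch of why an `Arc<Self>` receiver helps when work outlives the caller's borrow. To stay dependency-free it uses `std::thread` in place of async tasks and `#[async_trait]`, so it is an analogy, not the project's actual trait. The methods `describe` and `describe_in_background` and the type `InMemoryCatalog` are hypothetical.

```rust
use std::fmt::Debug;
use std::sync::Arc;
use std::thread;

/// Illustrative trait; only the name and bounds mirror the guidelines.
pub trait Catalog: Send + Sync + Debug {
    fn describe(&self, table: &str) -> String;

    /// Arc receiver: the background work owns its own handle to the
    /// catalog, so no lifetime ties it to the caller's stack frame.
    fn describe_in_background(self: Arc<Self>, table: String) -> thread::JoinHandle<String>;
}

/// Hypothetical implementation used only in this sketch.
#[derive(Debug)]
pub struct InMemoryCatalog {
    pub prefix: String,
}

impl Catalog for InMemoryCatalog {
    fn describe(&self, table: &str) -> String {
        format!("{}.{}", self.prefix, table)
    }

    fn describe_in_background(self: Arc<Self>, table: String) -> thread::JoinHandle<String> {
        // `self` is moved into the thread; the Arc keeps the catalog alive.
        thread::spawn(move || self.describe(&table))
    }
}

fn main() {
    let catalog: Arc<dyn Catalog> = Arc::new(InMemoryCatalog {
        prefix: "warehouse".to_string(),
    });
    // Clone the Arc for the call; the original handle stays usable.
    let handle = catalog.clone().describe_in_background("orders".to_string());
    assert_eq!(handle.join().unwrap(), "warehouse.orders");
    assert_eq!(catalog.describe("orders"), "warehouse.orders");
}
```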
15695

157-
**Example - Standard Traits in Action** (`iceberg-rust/src/catalog/create.rs:116`):
158-
```rust
159-
impl TryInto<TableMetadata> for CreateTable {
160-
type Error = Error;
161-
fn try_into(self) -> Result<TableMetadata, Self::Error> {
162-
// Validation and conversion logic (30+ lines)
163-
// Separates validation from construction
164-
}
165-
}
166-
```
167-
168-
### Documentation Standards for Traits
96+
### Documentation Standards
16997

170-
Every public trait method must document (`iceberg-rust/src/catalog/mod.rs:65-98`):
98+
Every public trait method must document:
17199
1. **Summary:** One-line behavior description
172100
2. **Arguments:** Each parameter explained
173101
3. **Returns:** Success case
@@ -185,50 +113,7 @@ Otherwise
185113
→ Regular struct with new()
186114
```
187115

188-
### Pattern: derive_builder + async build()
189-
190-
**Example** (`iceberg-rust/src/catalog/create.rs:54-83`):
191-
```rust
192-
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize, Builder)]
193-
#[builder(build_fn(name = "create", error = "Error"), setter(prefix = "with"))]
194-
pub struct CreateTable {
195-
#[builder(setter(into))]
196-
pub name: String, // Required, no default
197-
198-
#[serde(skip_serializing_if = "Option::is_none")]
199-
#[builder(setter(into, strip_option), default)]
200-
pub location: Option<String>, // Optional, explicit
201-
202-
pub schema: Schema, // Required
203-
204-
#[builder(setter(strip_option), default)]
205-
pub partition_spec: Option<PartitionSpec>, // Optional
206-
}
207-
```
208-
209-
### Builder Extension for Integration
210-
211-
**Pattern:** Custom async `build()` for catalog integration (`iceberg-rust/src/catalog/create.rs:85-114`):
212-
```rust
213-
impl CreateTableBuilder {
214-
pub async fn build(
215-
&mut self,
216-
namespace: &[String],
217-
catalog: Arc<dyn Catalog>,
218-
) -> Result<Table, Error> {
219-
let identifier = Identifier::new(namespace, &self.name);
220-
catalog.clone().create_table(identifier, self.create()?).await
221-
}
222-
}
223-
```
224-
225-
**Why this works:**
226-
- Builder handles field collection
227-
- Extension handles integration (identifier creation, catalog call)
228-
- Validation separated in `TryInto<TableMetadata>`
229-
- Users see: `CreateTable::builder().with_name("x").with_schema(s).build(&ns, cat).await?`
230-
231-
### Builder Best Practices
116+
### Best Practices
232117

233118
1. **Use derive_builder:** Don't hand-roll builders
234119
2. **Ergonomics:** Use `setter(into)` for `String` and common types
@@ -239,32 +124,6 @@ impl CreateTableBuilder {
239124

240125
## Error Handling
241126

242-
### Pattern: Centralized Error Enum with thiserror
243-
244-
**Example** (`iceberg-rust/src/error.rs:8-95`):
245-
```rust
246-
#[derive(Error, Debug)]
247-
pub enum Error {
248-
// Domain errors with context
249-
#[error("Column {0} not in schema {1}.")]
250-
Schema(String, String), // What failed + why
251-
252-
#[error("Feature {0} is not supported.")]
253-
NotSupported(String),
254-
255-
// Wrapped external errors
256-
#[error(transparent)]
257-
Arrow(#[from] arrow::error::ArrowError),
258-
259-
#[error(transparent)]
260-
ObjectStore(#[from] object_store::Error),
261-
262-
// Boxed large errors
263-
#[error(transparent)]
264-
Avro(Box<apache_avro::Error>),
265-
}
266-
```
267-
268127
### Guidelines
269128

270129
1. **Use thiserror:** Always derive `Error` trait, never hand-roll
@@ -274,123 +133,28 @@ pub enum Error {
274133
5. **Bidirectional When Needed:** Implement `From<Error>` for external types (e.g., `ArrowError`)
275134
6. **Group Related Failures:** One variant with parameter (e.g., `NotFound(String)`) vs many variants
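
A std-only sketch of the shape these guidelines describe: one parameterized `NotFound(String)` variant, a wrapped external error, and a `From` impl so `?` converts automatically. The `Display`, `source`, and `From` impls are written by hand here only so the example compiles without dependencies. In the project, the `thiserror` derive would generate roughly the same code. The `snapshot_id` helper and the `ParseInt` variant are hypothetical.

```rust
use std::fmt;
use std::num::ParseIntError;

#[derive(Debug)]
pub enum Error {
    /// One parameterized variant instead of TableNotFound, ViewNotFound, ...
    NotFound(String),
    /// Wrapped external error (hypothetical variant for this sketch).
    ParseInt(ParseIntError),
}

// Hand-written only for the sketch; thiserror would derive these impls.
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NotFound(what) => write!(f, "{} not found.", what),
            Error::ParseInt(e) => write!(f, "{}", e),
        }
    }
}

impl std::error::Error for Error {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Error::ParseInt(e) => Some(e),
            Error::NotFound(_) => None,
        }
    }
}

impl From<ParseIntError> for Error {
    fn from(e: ParseIntError) -> Self {
        Error::ParseInt(e)
    }
}

/// Hypothetical helper: looks up a property and parses it as an id.
pub fn snapshot_id(properties: &[(&str, &str)]) -> Result<i64, Error> {
    let raw = properties
        .iter()
        .find(|(key, _)| *key == "snapshot-id")
        .map(|(_, value)| *value)
        // Domain error with context about what was missing.
        .ok_or_else(|| Error::NotFound("Property snapshot-id".to_string()))?;
    // `?` converts ParseIntError into Error through the From impl.
    Ok(raw.parse::<i64>()?)
}

fn main() {
    assert_eq!(snapshot_id(&[("snapshot-id", "42")]).unwrap(), 42);
    assert!(matches!(snapshot_id(&[]), Err(Error::NotFound(_))));
}
```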
276135

277-
### Error Flow Best Practice
278-
279-
```
280-
Operation → Result<T, Error> with context
281-
282-
Domain layer adds context
283-
284-
Infrastructure errors wrapped transparently
285-
```
286-
287-
## Async Patterns
288-
289-
### Pattern: Instrumentation
290-
291-
**All performance-critical paths** (`iceberg-rust/src/table/transaction/mod.rs`):
292-
```rust
293-
#[instrument(
294-
name = "iceberg_rust::table::transaction::commit",
295-
level = "debug",
296-
skip(self),
297-
fields(table_identifier = %self.table.identifier)
298-
)]
299-
pub async fn commit(self) -> Result<(), Error> { ... }
300-
```
301-
302-
### Guidelines
303-
304-
1. **async_trait Required:** For async methods in traits
305-
2. **Send + Sync Bounds:** All async types crossing await points
306-
3. **Arc for Shared State:** Prefer `Arc` over lifetimes in async
307-
4. **Instrument Hot Paths:** Use `#[instrument]` on catalog/I/O operations
308-
6. **Tokio Runtime:** All integration tests use `#[tokio::main]`
309-
310136
## Module Organization
311137

312-
### Principle: Minimize Dependencies Through Layering
138+
### Layering
313139

314140
**Dependency Graph:**
315141
```
316142
datafusion_iceberg → iceberg-rust → iceberg-rust-spec
317143
(DataFusion) (async, I/O) (pure data, serde)
318144
```
319145

320-
**Why This Works:**
321146
- **Spec layer:** No external dependencies except serde/uuid
322147
- **Implementation layer:** Adds object_store, async, catalogs
323148
- **Integration layer:** Adds datafusion-specific code
324149

325-
### Pattern: Re-export for Clean APIs
326-
327-
**Example** (`iceberg-rust/src/catalog/mod.rs`):
328-
```rust
329-
pub mod identifier {
330-
pub use iceberg_rust_spec::identifier::Identifier;
331-
}
332-
```
333-
334-
**Benefits:**
335-
- Users import from one crate: `use iceberg_rust::catalog::identifier::Identifier`
336-
- Spec types remain separate internally
337-
- Clear public API surface
338-
339150
### Information Hiding Checklist
340151

341152
- Is this type part of public API? → Re-export
342153
- Is this complexity internal? → `pub(crate)` or private mod
343154
- Can spec types be separate? → Move to iceberg-rust-spec
344155
- Does this add dependencies? → Check layer appropriateness
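
The first two checklist items can be sketched in a single file. The private `spec` module below stands in for the separate `iceberg-rust-spec` crate, and `normalize` and `table_identifier` are hypothetical helpers. Only the re-export path `catalog::identifier::Identifier` mirrors the project.

```rust
// Stand-in for the spec crate: pure data types, no I/O.
mod spec {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Identifier {
        parts: Vec<String>,
    }

    impl Identifier {
        pub fn new(namespace: &[&str], name: &str) -> Self {
            let parts = namespace
                .iter()
                .map(|s| s.to_string())
                .chain(std::iter::once(name.to_string()))
                .collect();
            Self { parts }
        }

        pub fn name(&self) -> &str {
            self.parts.last().map(String::as_str).unwrap_or("")
        }
    }
}

pub mod catalog {
    // Part of the public API: re-export so users import from one place.
    pub mod identifier {
        pub use crate::spec::Identifier;
    }

    // Internal complexity: visible inside the crate only.
    pub(crate) fn normalize(name: &str) -> String {
        name.trim().to_lowercase()
    }

    /// Hypothetical public helper built on the internal function.
    pub fn table_identifier(namespace: &[&str], name: &str) -> identifier::Identifier {
        identifier::Identifier::new(namespace, &normalize(name))
    }
}

fn main() {
    let id = catalog::table_identifier(&["db"], "  Orders ");
    assert_eq!(id.name(), "orders");
}
```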
345156

346-
## Documentation Standards
347-
348-
### Pattern: Comprehensive Module Documentation
349-
350-
**Every `mod.rs` starts with** (`iceberg-rust/src/catalog/mod.rs:1-23`):
351-
```rust
352-
//! Catalog module providing interfaces for managing Iceberg tables and metadata.
353-
//!
354-
//! The catalog system manages:
355-
//! - Table metadata and schemas
356-
//! - Namespace organization
357-
//! - Storage locations
358-
//!
359-
//! # Key Components
360-
//! - [`Catalog`]: Core trait...
361-
//!
362-
//! # Common Operations
363-
//! - Creating and managing tables
364-
```
365-
366-
### Public API Documentation Structure
367-
368-
1. **One-line summary** (what it does)
369-
2. **Arguments section** (each parameter)
370-
3. **Returns section** (success case)
371-
4. **Errors section** (all failure modes)
372-
5. **Examples section** (when helpful)
373-
374-
### Guidelines
375-
376-
1. **Enforce Missing Docs:** Consider `#![deny(missing_docs)]` for public APIs
377-
2. **Document Complex Private Functions:** When logic is non-obvious
378-
3. **Module-Level Docs:** Every `mod.rs` has `//!` header
379-
4. **Examples in Docs:** For builder patterns and complex APIs
380-
5. **Errors Are Contract:** Always document failure modes
381-
382-
## Complexity Trade-offs
383-
384-
### Complexity Management
385-
386-
**Good Complexity (Hidden):**
387-
- Transaction system: 516 lines implementation, simple fluent interface
388-
- Manifest writing: Complex partitioning hidden from users
389-
390-
**Bad Complexity (Avoid):**
391-
- Exposing transaction operations publicly
392-
- Forcing users to understand manifest structure
393-
- Leaking object store details to catalog users
157+
## Complexity Management
394158

395159
### Before Adding Complexity
396160

@@ -465,7 +229,3 @@ When adding features, ask: **"Am I making the interface simpler or more complex?
465229

466230
The best additions hide complexity from users while maintaining clear, well-documented interfaces.
467231

468-
---
469-
470-
**Based on:** John Ousterhout's "A Philosophy of Software Design"
471-
**References:** [Philosophy of Software Design Review](https://blog.pragmaticengineer.com/a-philosophy-of-software-design-review/), [Summary by Carsten Behrens](https://carstenbehrens.com/a-philosophy-of-software-design-summary/)
