11# iceberg-rust: Design Principles & Patterns
22
3- This document provides design principles and coding patterns for the iceberg-rust project, optimized for AI coding assistants and human contributors. It documents existing patterns while suggesting improvements for future development.
3+ Actionable guidelines for the iceberg-rust project, optimized for AI coding assistants.
44
55## Project Architecture
66
@@ -15,6 +15,18 @@ iceberg-rust-spec (pure specification types)
1515
1616**Core Philosophy:** Deep modules with simple interfaces (John Ousterhout's "A Philosophy of Software Design")
1717
18+ ## Deep vs Shallow Modules
19+
20+ **Deep Modules** = Powerful functionality + Simple interface
21+ - **Best modules** hide significant complexity behind clean APIs
22+ - **Goal:** Minimize interface size relative to implementation size (1:10+ ratio ideal)
23+ - **Example:** Catalog trait has ~20 methods hiding 6 implementations with 5000+ lines (1:12 ratio)
24+
25+ **Shallow Modules to Avoid:**
26+ - Many small methods that just wrap other calls
27+ - Interfaces that expose internal complexity
28+ - Documentation longer than implementation
29+
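As a minimal sketch of the deep-module idea above: the type below exposes one public method while keeping fetching, parsing, and caching private. `MetadataStore`, `snapshot_id`, and the `snapshot=` format are hypothetical names invented for illustration and are not part of iceberg-rust.

```rust
use std::collections::HashMap;

/// Hypothetical deep module: a single public method hides fetching,
/// parsing, and caching behind a small interface.
pub struct MetadataStore {
    cache: HashMap<String, u64>, // internal detail, never exposed
}

impl MetadataStore {
    pub fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// The whole public surface: look up a snapshot id by table name.
    pub fn snapshot_id(&mut self, table: &str) -> Result<u64, String> {
        if let Some(id) = self.cache.get(table) {
            return Ok(*id);
        }
        let id = Self::parse(&Self::fetch(table)?)?;
        self.cache.insert(table.to_string(), id);
        Ok(id)
    }

    // Private helpers: callers never see storage or parsing concerns.
    fn fetch(table: &str) -> Result<String, String> {
        match table {
            "orders" => Ok("snapshot=42".to_string()),
            other => Err(format!("table {other} not found")),
        }
    }

    fn parse(raw: &str) -> Result<u64, String> {
        raw.strip_prefix("snapshot=")
            .ok_or_else(|| format!("malformed metadata: {raw}"))?
            .parse::<u64>()
            .map_err(|e| e.to_string())
    }
}

fn main() {
    let mut store = MetadataStore::new();
    println!("{:?}", store.snapshot_id("orders"));
    println!("{:?}", store.snapshot_id("missing"));
}
```

A shallow version of the same module would make `fetch`, `parse`, and the cache public and leave callers to sequence them, which is the "interfaces that expose internal complexity" case listed above.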
1830## LSP-Based Codebase Navigation
1931
2032**IMPORTANT:** When an LSP (Language Server Protocol) MCP server is available (such as `rust-analyzer`), **ALWAYS prefer LSP tools over text-based search** for code navigation and analysis.
@@ -31,15 +43,8 @@ Use LSP tools for:
3143- **Diagnostics:** `get_diagnostics` for compiler errors and warnings
3244- **Completions:** `get_completions` for valid code suggestions
3345
34- ### Integration with Development
35-
36- Before modifying code:
37- 1 . Use ` get_symbol_definitions ` to find the target
38- 2 . Use ` get_hover ` to understand types and contracts
39- 3 . Use ` get_symbol_references ` to assess impact
40- 4 . Make changes with full context
46+ ### Decision Tree
4147
42- ** Example Decision Tree:**
4348```
4449Need to understand code structure? → get_symbols
4550Need to find where something is defined? → get_symbol_definitions
@@ -49,74 +54,8 @@ Need to find trait impls? → get_implementations
4954Searching for text/patterns? → Grep/text search
5055```
5156
52- ## Deep vs Shallow Modules
53-
54- ** Deep Modules** = Powerful functionality + Simple interface
55- - ** Best modules** hide significant complexity behind clean APIs
56- - ** Goal:** Minimize interface size relative to implementation size (1:10+ ratio ideal)
57-
58- ** Example - Catalog Trait** (` iceberg-rust/src/catalog/mod.rs:46-64 ` ):
59- ``` rust
60- #[async_trait:: async_trait]
61- pub trait Catalog : Send + Sync + Debug {
62- async fn load_tabular (self : Arc <Self >, identifier : & Identifier )
63- -> Result <Tabular , Error >;
64- // ~20 methods hiding 6 implementations with 5000+ lines total
65- // Interface/Implementation ratio: 1:12+ ✓
66- }
67- ```
68-
69- ** What this hides:** Connection pooling, metadata loading, object store interaction, format detection (table/view/materialized_view), caching, retry logic, error translation
70-
71- ** Shallow Modules to Avoid:**
72- - Many small methods that just wrap other calls
73- - Interfaces that expose internal complexity
74- - Documentation longer than implementation
75-
7657## Functional Programming Patterns
7758
78- ### Prefer Iterators Over Loops
79-
80- ** Pattern 1: Iterator Chains** (` iceberg-rust/src/table/transaction/operation.rs:188-196 ` ):
81- ``` rust
82- let new_datafile_iter = data_files . into_iter (). map (| data_file | {
83- ManifestEntry :: builder ()
84- . with_format_version (table_metadata . format_version)
85- . with_status (Status :: Added )
86- . with_data_file (data_file )
87- . with_sequence_number (table_metadata . last_sequence_number + dsn_offset )
88- . build ()
89- . map_err (Error :: from )
90- });
91- ```
92-
93- ** Benefits:**
94- - Lazy evaluation (only creates when consumed)
95- - Clear transformation pipeline
96- - Error handling inline with ` map_err `
97- - No intermediate allocations until ` collect() `
98-
99- ** Pattern 2: flat_map for Flattening** (` iceberg-rust/src/table/transaction/operation.rs:149-153 ` ):
100- ``` rust
101- let all_files : Vec <DataFile > = sequence_groups
102- . iter ()
103- . flat_map (| d | d . delete_files. iter (). chain (d . data_files. iter ()))
104- . cloned ()
105- . collect ();
106- ```
107-
108- ** Pattern 3: Option/Result Combinators** (` iceberg-rust/src/catalog/create.rs:131-132 ` ):
109- ``` rust
110- // Prefer this:
111- self . location. ok_or (Error :: NotFound (format! (" Location for table {}" , self . name)))?
112-
113- // Over this:
114- let location = match self . location {
115- Some (loc ) => loc ,
116- None => return Err (Error :: NotFound (... )),
117- };
118- ```
119-
12059### Guidelines
12160
122611 . ** Use Iterator Methods:** ` map ` , ` filter ` , ` flat_map ` , ` fold ` over ` for ` loops
@@ -154,20 +93,9 @@ Would From/Into/standard trait work? → YES → Use standard trait
154934 . ** Async Traits:** Use ` #[async_trait] ` for I/O operations
155945 . ** Arc Receivers:** Use ` Arc<Self> ` for async trait methods needing shared ownership
15695
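The following sketch illustrates two of the points above: using a standard trait (`From`) instead of a custom conversion trait, and an `Arc<Self>` receiver so the implementor can be moved into spawned work. To stay self-contained it uses a thread in place of an async task and omits `#[async_trait]`. `TableName`, `Lookup`, `InMemory`, and `describe_table` are hypothetical names, not project types.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical identifier type (not the project's `Identifier`).
#[derive(Debug, Clone, PartialEq)]
pub struct TableName(String);

// Standard trait instead of a custom conversion trait.
impl From<&str> for TableName {
    fn from(s: &str) -> Self {
        TableName(s.to_string())
    }
}

/// Hypothetical trait whose method takes `Arc<Self>`, so the receiver
/// can be moved into spawned work without lifetime parameters.
pub trait Lookup: Send + Sync {
    fn describe(self: Arc<Self>, name: TableName) -> String;
}

pub struct InMemory {
    prefix: String,
}

impl Lookup for InMemory {
    fn describe(self: Arc<Self>, name: TableName) -> String {
        // Moving the Arc into a thread stands in for moving it into a task.
        thread::spawn(move || format!("{}.{}", self.prefix, name.0))
            .join()
            .expect("worker thread panicked")
    }
}

/// Callers hold `Arc<dyn Lookup>` and convert with `.into()`.
fn describe_table(catalog: Arc<dyn Lookup>, name: &str) -> String {
    catalog.describe(name.into())
}

fn main() {
    let catalog: Arc<dyn Lookup> = Arc::new(InMemory { prefix: "warehouse".to_string() });
    println!("{}", describe_table(catalog.clone(), "orders"));
}
```

The trait stays usable as a trait object because `self: Arc<Self>` is a dispatchable receiver. Callers clone the `Arc` when they need to keep their own handle.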
157- ** Example - Standard Traits in Action** (` iceberg-rust/src/catalog/create.rs:116 ` ):
158- ``` rust
159- impl TryInto <TableMetadata > for CreateTable {
160- type Error = Error ;
161- fn try_into (self ) -> Result <TableMetadata , Self :: Error > {
162- // Validation and conversion logic (30+ lines)
163- // Separates validation from construction
164- }
165- }
166- ```
167-
168- ### Documentation Standards for Traits
96+ ### Documentation Standards
16997
170- Every public trait method must document ( ` iceberg-rust/src/catalog/mod.rs:65-98 ` ) :
98+ Every public trait method must document:
171991 . ** Summary:** One-line behavior description
1721002 . ** Arguments:** Each parameter explained
1731013 . ** Returns:** Success case
@@ -185,50 +113,7 @@ Otherwise
185113 → Regular struct with new()
186114```
187115
188- ### Pattern: derive_builder + async build()
189-
190- ** Example** (` iceberg-rust/src/catalog/create.rs:54-83 ` ):
191- ``` rust
192- #[derive(Clone , Debug , PartialEq , Serialize , Deserialize , Builder )]
193- #[builder(build_fn(name = " create" , error = " Error" ), setter(prefix = " with" ))]
194- pub struct CreateTable {
195- #[builder(setter(into))]
196- pub name : String , // Required, no default
197-
198- #[serde(skip_serializing_if = " Option::is_none" )]
199- #[builder(setter(into, strip_option), default)]
200- pub location : Option <String >, // Optional, explicit
201-
202- pub schema : Schema , // Required
203-
204- #[builder(setter(strip_option), default)]
205- pub partition_spec : Option <PartitionSpec >, // Optional
206- }
207- ```
208-
209- ### Builder Extension for Integration
210-
211- ** Pattern:** Custom async ` build() ` for catalog integration (` iceberg-rust/src/catalog/create.rs:85-114 ` ):
212- ``` rust
213- impl CreateTableBuilder {
214- pub async fn build (
215- & mut self ,
216- namespace : & [String ],
217- catalog : Arc <dyn Catalog >,
218- ) -> Result <Table , Error > {
219- let identifier = Identifier :: new (namespace , & self . name);
220- catalog . clone (). create_table (identifier , self . create ()? ). await
221- }
222- }
223- ```
224-
225- ** Why this works:**
226- - Builder handles field collection
227- - Extension handles integration (identifier creation, catalog call)
228- - Validation separated in ` TryInto<TableMetadata> `
229- - Users see: ` CreateTable::builder().with_name("x").with_schema(s).build(&ns, cat).await? `
230-
231- ### Builder Best Practices
116+ ### Best Practices
232117
2331181 . ** Use derive_builder:** Don't hand-roll builders
2341192 . ** Ergonomics:** Use ` setter(into) ` for ` String ` and common types
@@ -239,32 +124,6 @@ impl CreateTableBuilder {
239124
240125## Error Handling
241126
242- ### Pattern: Centralized Error Enum with thiserror
243-
244- ** Example** (` iceberg-rust/src/error.rs:8-95 ` ):
245- ``` rust
246- #[derive(Error , Debug )]
247- pub enum Error {
248- // Domain errors with context
249- #[error(" Column {0} not in schema {1}." )]
250- Schema (String , String ), // What failed + why
251-
252- #[error(" Feature {0} is not supported." )]
253- NotSupported (String ),
254-
255- // Wrapped external errors
256- #[error(transparent)]
257- Arrow (#[from] arrow :: error :: ArrowError ),
258-
259- #[error(transparent)]
260- ObjectStore (#[from] object_store :: Error ),
261-
262- // Boxed large errors
263- #[error(transparent)]
264- Avro (Box <apache_avro :: Error >),
265- }
266- ```
267-
268127### Guidelines
269128
2701291 . ** Use thiserror:** Always derive ` Error ` trait, never hand-roll
@@ -274,123 +133,28 @@ pub enum Error {
2741335 . ** Bidirectional When Needed:** Implement ` From<Error> ` for external types (e.g., ` ArrowError ` )
2751346 . ** Group Related Failures:** One variant with parameter (e.g., ` NotFound(String) ` ) vs many variants
276135
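To make guidelines 5 and 6 concrete, here is a std-only sketch of roughly what a `thiserror` derive with `#[error(...)]` and `#[from]` provides, written by hand only so the example has no external dependencies. In project code the derive should be used instead. The variants shown and the `read_len` helper are illustrative assumptions, not the project's actual `Error` enum.

```rust
use std::fmt;
use std::io;

#[derive(Debug)]
pub enum Error {
    /// One parameterized variant instead of many `XNotFound` variants.
    NotFound(String),
    /// Wrapped external error.
    Io(io::Error),
}

// With thiserror this comes from `#[error("...")]` attributes.
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NotFound(what) => write!(f, "{what} not found."),
            Error::Io(e) => write!(f, "{e}"),
        }
    }
}

impl std::error::Error for Error {}

// With thiserror this comes from `#[from]`; it enables `?` on io results.
impl From<io::Error> for Error {
    fn from(e: io::Error) -> Self {
        Error::Io(e)
    }
}

// The reverse direction, for APIs that must return the external type.
impl From<Error> for io::Error {
    fn from(e: Error) -> Self {
        match e {
            Error::Io(inner) => inner,
            other => io::Error::new(io::ErrorKind::Other, other.to_string()),
        }
    }
}

/// Hypothetical helper: domain check first, then an I/O call using `?`.
fn read_len(path: &str) -> Result<usize, Error> {
    if path.is_empty() {
        return Err(Error::NotFound("Location for table".to_string()));
    }
    Ok(std::fs::read(path)?.len())
}

fn main() {
    println!("{}", read_len("").unwrap_err());
    println!("{}", read_len("/definitely/not/a/real/path").unwrap_err());
}
```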
277- ### Error Flow Best Practice
278-
279- ```
280- Operation → Result<T, Error> with context
281- ↓
282- Domain layer adds context
283- ↓
284- Infrastructure errors wrapped transparently
285- ```
286-
287- ## Async Patterns
288-
289- ### Pattern: Instrumentation
290-
291- ** All performance-critical paths** (` iceberg-rust/src/table/transaction/mod.rs ` ):
292- ``` rust
293- #[instrument(
294- name = " iceberg_rust::table::transaction::commit" ,
295- level = " debug" ,
296- skip(self),
297- fields(table_identifier = % self. table. identifier)
298- )]
299- pub async fn commit (self ) -> Result <(), Error > { ... }
300- ```
301-
302- ### Guidelines
303-
304- 1 . ** async_trait Required:** For async methods in traits
305- 2 . ** Send + Sync Bounds:** All async types crossing await points
306- 3 . ** Arc for Shared State:** Prefer ` Arc ` over lifetimes in async
307- 4 . ** Instrument Hot Paths:** Use ` #[instrument] ` on catalog/I/O operations
308- 6 . ** Tokio Runtime:** All integration tests use ` #[tokio::main] `
309-
310136## Module Organization
311137
312- ### Principle: Minimize Dependencies Through Layering
138+ ### Layering
313139
314140**Dependency Graph:**
315141```
316142datafusion_iceberg → iceberg-rust → iceberg-rust-spec
317143 (DataFusion) (async, I/O) (pure data, serde)
318144```
319145
320- ** Why This Works:**
321146- **Spec layer:** No external dependencies except serde/uuid
322147- **Implementation layer:** Adds object_store, async, catalogs
323148- **Integration layer:** Adds datafusion-specific code
324149
325- ### Pattern: Re-export for Clean APIs
326-
327- ** Example** (` iceberg-rust/src/catalog/mod.rs ` ):
328- ``` rust
329- pub mod identifier {
330- pub use iceberg_rust_spec :: identifier :: Identifier ;
331- }
332- ```
333-
334- ** Benefits:**
335- - Users import from one crate: ` use iceberg_rust::catalog::identifier::Identifier `
336- - Spec types remain separate internally
337- - Clear public API surface
338-
339150### Information Hiding Checklist
340151
341152- Is this type part of public API? → Re-export
342153- Is this complexity internal? → `pub(crate)` or private mod
343154- Can spec types be separate? → Move to iceberg-rust-spec
344155- Does this add dependencies? → Check layer appropriateness
345156
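The first two checklist items can be sketched in a single file, with nested modules standing in for the separate crates. The `spec` and `catalog` modules, `normalize`, and `table_identifier` are hypothetical stand-ins; only the re-export idea itself comes from this document.

```rust
/// Stand-in for the spec layer: pure data, no I/O.
mod spec {
    pub mod identifier {
        #[derive(Debug, Clone, PartialEq)]
        pub struct Identifier {
            namespace: Vec<String>,
            name: String,
        }

        impl Identifier {
            pub fn new(namespace: &[String], name: &str) -> Self {
                Self { namespace: namespace.to_vec(), name: name.to_string() }
            }

            pub fn namespace(&self) -> &[String] {
                &self.namespace
            }

            pub fn name(&self) -> &str {
                &self.name
            }
        }
    }
}

/// Stand-in for the implementation layer.
mod catalog {
    // Part of the public API: re-export so users import from one place.
    pub use super::spec::identifier::Identifier;

    // Internal complexity: visible inside the crate only.
    pub(crate) fn normalize(name: &str) -> String {
        name.trim().to_lowercase()
    }

    pub fn table_identifier(namespace: &[String], name: &str) -> Identifier {
        Identifier::new(namespace, &normalize(name))
    }
}

fn main() {
    let ns = vec!["sales".to_string()];
    let id: catalog::Identifier = catalog::table_identifier(&ns, "  Orders ");
    println!("{:?}.{}", id.namespace(), id.name());
}
```

Users of the implementation layer name the type as `catalog::Identifier` and never need to know it lives in the spec layer, while `normalize` stays out of the public surface.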
346- ## Documentation Standards
347-
348- ### Pattern: Comprehensive Module Documentation
349-
350- ** Every ` mod.rs ` starts with** (` iceberg-rust/src/catalog/mod.rs:1-23 ` ):
351- ``` rust
352- // ! Catalog module providing interfaces for managing Iceberg tables and metadata.
353- // !
354- // ! The catalog system manages:
355- // ! - Table metadata and schemas
356- // ! - Namespace organization
357- // ! - Storage locations
358- // !
359- // ! # Key Components
360- // ! - [`Catalog`]: Core trait...
361- // !
362- // ! # Common Operations
363- // ! - Creating and managing tables
364- ```
365-
366- ### Public API Documentation Structure
367-
368- 1 . ** One-line summary** (what it does)
369- 2 . ** Arguments section** (each parameter)
370- 3 . ** Returns section** (success case)
371- 4 . ** Errors section** (all failure modes)
372- 5 . ** Examples section** (when helpful)
373-
374- ### Guidelines
375-
376- 1 . ** Enforce Missing Docs:** Consider ` #![deny(missing_docs)] ` for public APIs
377- 2 . ** Document Complex Private Functions:** When logic is non-obvious
378- 3 . ** Module-Level Docs:** Every ` mod.rs ` has ` //! ` header
379- 4 . ** Examples in Docs:** For builder patterns and complex APIs
380- 5 . ** Errors Are Contract:** Always document failure modes
381-
382- ## Complexity Trade-offs
383-
384- ### Complexity Management
385-
386- ** Good Complexity (Hidden):**
387- - Transaction system: 516 lines implementation, simple fluent interface
388- - Manifest writing: Complex partitioning hidden from users
389-
390- ** Bad Complexity (Avoid):**
391- - Exposing transaction operations publicly
392- - Forcing users to understand manifest structure
393- - Leaking object store details to catalog users
157+ ## Complexity Management
394158
395159### Before Adding Complexity
396160
@@ -465,7 +229,3 @@ When adding features, ask: **"Am I making the interface simpler or more complex?
465229
466230The best additions hide complexity from users while maintaining clear, well-documented interfaces.
467231
468- ---
469-
470- ** Based on:** John Ousterhout's "A Philosophy of Software Design"
471- ** References:** [ Philosophy of Software Design Review] ( https://blog.pragmaticengineer.com/a-philosophy-of-software-design-review/ ) , [ Summary by Carsten Behrens] ( https://carstenbehrens.com/a-philosophy-of-software-design-summary/ )