Research on using Iceberg instead of ClickHouse/SQLite  #932

@amritghimire

Current Implementation

SQLite (Local/CLI Usage)

The project uses SQLite for local operations through two main classes:

  1. SQLiteDatabaseEngine

https://github.com/iterative/datachain/blob/ed973c82f4ab4674a5a24816eaf689498117aef6/src/datachain/data_storage/sqlite.py#L98

  2. SQLiteWarehouse

https://github.com/iterative/datachain/blob/ed973c82f4ab4674a5a24816eaf689498117aef6/src/datachain/data_storage/sqlite.py#L406C7-L406C22

ClickHouse Implementation

ClickHouse is currently used in the SaaS version.

Why Consider Iceberg?

Current Pain Points

  1. Dual Implementation Overhead:
    • Maintaining separate SQLite and ClickHouse implementations
    • Different transaction and concurrency models
    • Separate optimization strategies
  2. Performance
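
To make the dual-implementation overhead in point 1 concrete, here is a minimal sketch of what keeping two backends behind one interface tends to look like. The names `WarehouseBackend`, `SQLiteBackend`, and `ClickHouseBackend` are hypothetical and are not the classes in `sqlite.py` linked above. Only the SQLite side is implemented, using the standard-library `sqlite3` module. The ClickHouse side is left as a stub, because it would need its own DDL, write path, and concurrency handling.

```python
import sqlite3
from abc import ABC, abstractmethod


class WarehouseBackend(ABC):
    """Hypothetical shared interface; not DataChain's actual class hierarchy."""

    @abstractmethod
    def create_table(self, name: str, columns: dict[str, str]) -> None: ...

    @abstractmethod
    def insert_rows(self, name: str, rows: list[tuple]) -> None: ...

    @abstractmethod
    def count_rows(self, name: str) -> int: ...


class SQLiteBackend(WarehouseBackend):
    """Local backend built on the stdlib sqlite3 module."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)

    def create_table(self, name: str, columns: dict[str, str]) -> None:
        cols = ", ".join(f'"{col}" {typ}' for col, typ in columns.items())
        self.conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" ({cols})')

    def insert_rows(self, name: str, rows: list[tuple]) -> None:
        if not rows:
            return
        placeholders = ", ".join("?" for _ in rows[0])
        # The connection context manager commits on success, rolls back on error.
        with self.conn:
            self.conn.executemany(
                f'INSERT INTO "{name}" VALUES ({placeholders})', rows
            )

    def count_rows(self, name: str) -> int:
        return self.conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]


class ClickHouseBackend(WarehouseBackend):
    """Stub only: every method here would need a ClickHouse-specific version."""

    def create_table(self, name: str, columns: dict[str, str]) -> None:
        raise NotImplementedError("ClickHouse-specific DDL goes here")

    def insert_rows(self, name: str, rows: list[tuple]) -> None:
        raise NotImplementedError("ClickHouse-specific write path goes here")

    def count_rows(self, name: str) -> int:
        raise NotImplementedError("ClickHouse-specific query goes here")


wh = SQLiteBackend()
wh.create_table("files", {"path": "TEXT", "size": "INTEGER"})
wh.insert_rows("files", [("a.txt", 1), ("b.txt", 2)])
row_count = wh.count_rows("files")
```

Every method on the interface has to be written, tested, and tuned twice, which is the maintenance cost this issue is weighing against a single table format.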

Labels: enhancement (New feature or request)