Architecture & Design

This document describes the internal architecture of CH.Toolkit, its package layering, data flow, and core design patterns.

Package Dependency Diagram

                    CH.Toolkit.Types (no deps)
                         |
                    CH.Toolkit.Sql (-> Types)
                         |
                    CH.Toolkit.Schema (-> Types, Sql)
                    /         |           \
    CH.Toolkit.Query   CH.Toolkit.Modeling   CH.Toolkit.Introspection
       (-> Sql, Types)     (-> Schema, Query, Types)   (-> Schema, Types, Sql + ClickHouse.Driver)
                    \         |           /
               CH.Toolkit.Migrations
               (-> Schema, Sql, Introspection, Types + Microsoft.CodeAnalysis.CSharp)
                    /                   \
    CH.Toolkit.Cli          CH.Toolkit.DependencyInjection
    (-> Introspection,           (-> Migrations + MS.Extensions.DI)
     Migrations +
     System.CommandLine)

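Dependencies point upward in the diagram: each package references only packages drawn above it (or beside it), so foundational packages such as Types and Sql never reference the higher-level packages built on top of them.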

Data Flow

Schema to SQL

The pipeline for turning a C# model into DDL follows this path:

C# class (e.g., Event)
    |  SchemaBuilder.Table<T>() + fluent config
    v
TableSchema / DatabaseSchema          (Schema package -- immutable records)
    |  DdlCompiler.CompileCreateTable()
    v
SqlNode AST (CreateTableNode, etc.)   (Sql package -- immutable records)
    |  ClickHouseSqlRenderer.Render()
    v
SQL string                            (e.g., CREATE TABLE IF NOT EXISTS ...)

DdlCompiler converts schema model objects into SqlNode AST nodes. It lives in the Schema package (not Sql) to avoid a circular dependency -- Schema depends on Sql for the AST types, and moving the compiler into Sql would require Sql to depend on Schema.

ClickHouseSqlRenderer is a SqlVisitor<string> that walks the AST and produces ClickHouse-dialect SQL. It handles identifier quoting, ON CLUSTER clauses, engine configuration, TTL, codecs, indexes, projections, and all query constructs.
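The three stages can be illustrated with a deliberately tiny model. Every type below (MiniColumn, MiniTable, MiniCreateTableNode, MiniCompiler, MiniRenderer) is hypothetical and much simpler than the toolkit's real TableSchema, CreateTableNode, DdlCompiler, and ClickHouseSqlRenderer. The sketch only shows the shape of the pipeline: a schema record is compiled into an AST node, which is then rendered to a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the Schema-package records.
public sealed record MiniColumn(string Name, string Type);
public sealed record MiniTable(
    string Database, string Name,
    IReadOnlyList<MiniColumn> Columns, IReadOnlyList<string> OrderBy);

// Hypothetical stand-in for an Sql-package AST node.
public sealed record MiniCreateTableNode(
    string Database, string Table,
    IReadOnlyList<(string Name, string Type)> Columns,
    string Engine, IReadOnlyList<string> OrderBy, bool IfNotExists);

// Stage 1: schema model -> AST node (the role DdlCompiler plays).
public static class MiniCompiler
{
    public static MiniCreateTableNode CompileCreateTable(MiniTable table) =>
        new(table.Database, table.Name,
            table.Columns.Select(c => (c.Name, c.Type)).ToList(),
            "MergeTree()", table.OrderBy, IfNotExists: true);
}

// Stage 2: AST node -> SQL text (the role ClickHouseSqlRenderer plays).
public static class MiniRenderer
{
    // Assumption: identifiers containing backticks are rejected rather than escaped.
    private static string Quote(string identifier) =>
        identifier.Contains('`')
            ? throw new ArgumentException("Backticks are not supported in this sketch.")
            : "`" + identifier + "`";

    public static string Render(MiniCreateTableNode node)
    {
        var columns = string.Join(", ", node.Columns.Select(c => $"{Quote(c.Name)} {c.Type}"));
        var orderBy = string.Join(", ", node.OrderBy.Select(c => Quote(c)));
        var ifNotExists = node.IfNotExists ? "IF NOT EXISTS " : "";
        return $"CREATE TABLE {ifNotExists}{Quote(node.Database)}.{Quote(node.Table)} " +
               $"({columns}) ENGINE = {node.Engine} ORDER BY ({orderBy})";
    }
}
```

Because the compiler only produces AST records and the renderer only consumes them, either stage can be tested in isolation by comparing records or strings.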

Migration Flow

The migration pipeline chains schema diffing, code generation, and DDL execution:

SchemaBuilder (desired schema)     SchemaIntrospector (current DB)
         \                                /
          v                              v
      DatabaseSchema (desired)    DatabaseSchema (current)
               \                     /
                SchemaDiffer.Diff()
                       |
                       v
              SchemaOperation[]  (AddColumnOp, CreateTableOp, etc.)
                       |
                       v  MigrationCodeGenerator.GenerateMigration()
              C# migration files (MigrationBase subclasses)
                       |
                       v  MigrationRunner.MigrateAsync()
              DdlCompiler -> SqlNode AST -> ClickHouseSqlRenderer -> DDL execution

The runner also manages locking (distributed TTL-based lock table), checksum validation, history tracking, and safety policy enforcement.
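The diffing step can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's SchemaDiffer: the MiniOperation records and MiniDiffer class are hypothetical, a schema is reduced to a map of table name to (column name to type), and only missing tables, added columns, and removed columns are detected. Type changes and dropped tables are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical operation records, loosely modeled on SchemaOperation subtypes.
public abstract record MiniOperation;
public sealed record MiniCreateTableOp(string Table) : MiniOperation;
public sealed record MiniAddColumnOp(string Table, string Column, string Type) : MiniOperation;
public sealed record MiniDropColumnOp(string Table, string Column) : MiniOperation;

public static class MiniDiffer
{
    // Each schema maps table name -> (column name -> ClickHouse type).
    public static IReadOnlyList<MiniOperation> Diff(
        IReadOnlyDictionary<string, IReadOnlyDictionary<string, string>> desired,
        IReadOnlyDictionary<string, IReadOnlyDictionary<string, string>> current)
    {
        var ops = new List<MiniOperation>();

        // Ordinal ordering keeps the output deterministic across runs.
        foreach (var (table, desiredColumns) in desired.OrderBy(t => t.Key, StringComparer.Ordinal))
        {
            if (!current.TryGetValue(table, out var currentColumns))
            {
                ops.Add(new MiniCreateTableOp(table));
                continue;
            }

            // Columns present in the desired schema but missing from the database.
            foreach (var (column, type) in desiredColumns.OrderBy(c => c.Key, StringComparer.Ordinal))
                if (!currentColumns.ContainsKey(column))
                    ops.Add(new MiniAddColumnOp(table, column, type));

            // Columns present in the database but no longer desired.
            foreach (var column in currentColumns.Keys.OrderBy(c => c, StringComparer.Ordinal))
                if (!desiredColumns.ContainsKey(column))
                    ops.Add(new MiniDropColumnOp(table, column));
        }

        return ops;
    }
}
```

Returning plain operation records keeps the diff independent of both code generation and execution, which matches the pipeline's separation of stages.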

Core Design Patterns

Immutable Records

All domain types are sealed records: ClickHouseType, SqlNode subtypes, ColumnSchema, TableSchema, SchemaOperation, and more. This provides value equality, immutability, and non-destructive updates through with expressions.

Records that contain IReadOnlyList fields override Equals and GetHashCode using SequenceEqual for correct value comparison. These overrides use bool Equals(T? other) (not virtual bool Equals) on sealed records to avoid the CS8851 warning.
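A minimal sketch of such an override, using a hypothetical MiniTableSchema record rather than the toolkit's real types. Without the override, the compiler-generated equality would compare the two lists by reference, so two records with identical column names held in different list instances would not be equal.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record; the toolkit's real TableSchema has more members.
public sealed record MiniTableSchema(string Name, IReadOnlyList<string> Columns)
{
    // Declared without 'virtual': the record is sealed, so nothing can override it.
    public bool Equals(MiniTableSchema? other) =>
        other is not null
        && Name == other.Name
        && Columns.SequenceEqual(other.Columns);

    // Must stay consistent with Equals: hash the elements, not the list reference.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(Name);
        foreach (var column in Columns)
            hash.Add(column);
        return hash.ToHashCode();
    }
}
```

The == and != operators generated for the record call this Equals method, so they pick up the element-wise comparison as well.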

Visitor Pattern

SqlVisitor<T> is the abstract base for walking the SqlNode AST:

public abstract class SqlVisitor<T>
{
    public T Visit(SqlNode node) => node switch { ... };

    protected abstract T VisitCreateTable(CreateTableNode node);
    protected abstract T VisitAlterTable(AlterTableNode node);
    protected abstract T VisitDropTable(DropTableNode node);
    protected abstract T VisitSelectQuery(SelectQueryNode node);
    protected abstract T VisitSetOperationQuery(SetOperationQueryNode node);
    protected abstract T VisitInsertSelect(InsertSelectNode node);
    // ... one method per node type
}

ClickHouseSqlRenderer is the primary visitor implementation. New SQL features are added by extending SqlNode with a new record type, adding a Visit* method to SqlVisitor<T>, and implementing rendering in ClickHouseSqlRenderer.
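The three-step extension recipe can be sketched on a reduced visitor. All names here (MiniNode, MiniDropTableNode, MiniTruncateTableNode, MiniVisitor, MiniSqlRenderer) are hypothetical; the truncate node stands in for "a new SQL feature" and is not claimed to exist in the toolkit.

```csharp
using System;

// Step 1: extend the node hierarchy with a new record type.
public abstract record MiniNode;
public sealed record MiniDropTableNode(string Table, bool IfExists) : MiniNode;
public sealed record MiniTruncateTableNode(string Table) : MiniNode; // the new feature

public abstract class MiniVisitor<T>
{
    // Central dispatch: one switch arm per node type.
    public T Visit(MiniNode node) => node switch
    {
        MiniDropTableNode n => VisitDropTable(n),
        MiniTruncateTableNode n => VisitTruncateTable(n),
        _ => throw new NotSupportedException($"Unhandled node type: {node.GetType().Name}")
    };

    protected abstract T VisitDropTable(MiniDropTableNode node);

    // Step 2: add a Visit* method for the new node. Because it is abstract,
    // every existing visitor fails to compile until it handles the node.
    protected abstract T VisitTruncateTable(MiniTruncateTableNode node);
}

// Step 3: implement rendering in the concrete visitor.
public sealed class MiniSqlRenderer : MiniVisitor<string>
{
    protected override string VisitDropTable(MiniDropTableNode node) =>
        $"DROP TABLE {(node.IfExists ? "IF EXISTS " : "")}`{node.Table}`";

    protected override string VisitTruncateTable(MiniTruncateTableNode node) =>
        $"TRUNCATE TABLE `{node.Table}`";
}
```

Making the Visit* methods abstract trades some boilerplate for a compile-time guarantee that no visitor silently ignores a node type.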

Fluent Builders

The modeling API uses a builder pattern with back-references for uninterrupted fluent chains:

var schema = new SchemaBuilder()
    .Table<Event>()
        .MergeTree()
        .OrderBy(e => e.Timestamp, e => e.UserId)
        .Column(e => e.EventType).LowCardinalityString().Table  // .Table returns to TableBuilder
        .Ttl("timestamp + INTERVAL 90 DAY", "DELETE")
    .Build("analytics");

TableBuilder<T> holds a reference back to SchemaBuilder, so .Build(database) delegates to the root builder. ColumnBuilder<T> holds a reference to TableBuilder<T>, so .Table chains back.
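The back-reference wiring can be sketched with non-generic, string-keyed builders. The Mini* classes below are hypothetical and only show how each builder keeps a reference to its parent so that a chain can climb back up without being interrupted; the real builders are generic and expression-based.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class MiniSchemaBuilder
{
    private readonly List<MiniTableBuilder> _tables = new();

    public MiniTableBuilder Table(string name)
    {
        var table = new MiniTableBuilder(this, name);
        _tables.Add(table);
        return table;
    }

    // Produces a plain description per table; a real builder would return schema records.
    public IReadOnlyList<string> Build(string database) =>
        _tables.Select(t => t.Describe(database)).ToList();
}

public sealed class MiniTableBuilder
{
    private readonly MiniSchemaBuilder _root; // back-reference to the root builder
    private readonly string _name;
    // Overrides are keyed by the C# property name, as described above.
    private readonly SortedDictionary<string, string> _overrides = new(StringComparer.Ordinal);

    internal MiniTableBuilder(MiniSchemaBuilder root, string name)
    {
        _root = root;
        _name = name;
    }

    public MiniColumnBuilder Column(string propertyName) => new(this, propertyName);

    internal void SetOverride(string propertyName, string type) => _overrides[propertyName] = type;

    // Delegates to the root so the chain can end on a table builder.
    public IReadOnlyList<string> Build(string database) => _root.Build(database);

    internal string Describe(string database) =>
        $"{database}.{_name} [{string.Join(", ", _overrides.Select(o => $"{o.Key}={o.Value}"))}]";
}

public sealed class MiniColumnBuilder
{
    private readonly string _propertyName;

    // Back-reference that lets a chain return to the owning table builder.
    public MiniTableBuilder Table { get; }

    internal MiniColumnBuilder(MiniTableBuilder table, string propertyName)
    {
        Table = table;
        _propertyName = propertyName;
    }

    public MiniColumnBuilder LowCardinalityString()
    {
        Table.SetOverride(_propertyName, "LowCardinality(String)");
        return this;
    }
}
```

With these classes, new MiniSchemaBuilder().Table("Event").Column("EventType").LowCardinalityString().Table.Build("analytics") runs as a single expression, mirroring the chain shown above.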

Column overrides are keyed by PascalCase property name (the raw C# member name), not the snake_case database column name.

Expression-Based Column Selection

TableBuilder<T> uses Expression<Func<T, object>> for compile-time-safe column references:

.OrderBy(e => e.Timestamp)           // extracts property name "Timestamp"
.Column(e => e.EventType)            // creates override keyed by "EventType"
.PartitionByMonth(e => e.CreatedAt)  // generates toYYYYMM(created_at)

Property names are converted to snake_case for database column names via ColumnNamingConvention.SnakeCase.
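A sketch of how a property name might be extracted from such an expression and converted to snake_case. MiniColumnNames and MiniEvent are hypothetical, and the snake_case routine is a naive version; how the toolkit's ColumnNamingConvention.SnakeCase treats acronyms or digits is not specified here and may differ.

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

public static class MiniColumnNames
{
    public static string PropertyName<T>(Expression<Func<T, object>> selector)
    {
        // Value-type properties are boxed to object, which wraps the
        // member access in a Convert node that has to be unwrapped first.
        var body = selector.Body is UnaryExpression { NodeType: ExpressionType.Convert } unary
            ? unary.Operand
            : selector.Body;

        return body is MemberExpression member
            ? member.Member.Name
            : throw new ArgumentException(
                "Expected a simple property access such as e => e.Timestamp.", nameof(selector));
    }

    // Naive conversion: inserts '_' before an uppercase letter that follows a
    // non-uppercase character, then lowercases everything.
    public static string ToSnakeCase(string name)
    {
        var sb = new StringBuilder(name.Length + 4);
        for (var i = 0; i < name.Length; i++)
        {
            var c = name[i];
            if (char.IsUpper(c) && i > 0 && !char.IsUpper(name[i - 1]))
                sb.Append('_');
            sb.Append(char.ToLowerInvariant(c));
        }
        return sb.ToString();
    }
}

// Hypothetical model class used only for the example.
public sealed class MiniEvent
{
    public DateTime CreatedAt { get; set; }
    public string EventType { get; set; } = "";
    public long UserId { get; set; }
}
```

For example, MiniColumnNames.PropertyName<MiniEvent>(e => e.CreatedAt) yields "CreatedAt", and passing that to ToSnakeCase yields "created_at". An expression that is not a plain property access, such as e => e.EventType.Length + 1, is rejected with an ArgumentException.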

Automatic Type Mapping

ClrTypeMapper converts C# types to ClickHouse types automatically when building table schemas from generic types. The mapping is extensible via TypeMappingOptions (see Type System).
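An extensible mapper of this kind could look roughly like the sketch below. MiniTypeMapper and its default table are assumptions made for illustration; the actual defaults and the shape of TypeMappingOptions are documented in Type System and may differ. The ClickHouse type names used (String, Int32, Int64, Float64, Bool, DateTime, UUID, Nullable) are real ClickHouse types.

```csharp
using System;
using System.Collections.Generic;

public sealed class MiniTypeMapper
{
    // Assumed defaults for the sketch, not the toolkit's actual table.
    private readonly Dictionary<Type, string> _map = new()
    {
        [typeof(string)] = "String",
        [typeof(int)] = "Int32",
        [typeof(long)] = "Int64",
        [typeof(double)] = "Float64",
        [typeof(bool)] = "Bool",
        [typeof(DateTime)] = "DateTime",
        [typeof(Guid)] = "UUID",
    };

    // Extension point: add or replace a mapping.
    public MiniTypeMapper Register(Type clrType, string clickHouseType)
    {
        _map[clrType] = clickHouseType;
        return this;
    }

    public string Map(Type clrType)
    {
        // int?, DateTime?, etc. map to Nullable(<inner type>).
        var underlying = Nullable.GetUnderlyingType(clrType);
        if (underlying is not null)
            return $"Nullable({Map(underlying)})";

        return _map.TryGetValue(clrType, out var mapped)
            ? mapped
            : throw new NotSupportedException($"No ClickHouse mapping registered for {clrType.Name}.");
    }
}
```

Failing loudly on an unmapped type, rather than falling back to String, surfaces modeling mistakes when the schema is built instead of when data is inserted.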

Why DdlCompiler Lives in Schema

DdlCompiler converts Schema records into Sql AST nodes. Its natural placement would be in the Sql package, but that would create a circular dependency:

  • Sql would need to reference Schema (for TableSchema, ColumnSchema, etc.)
  • Schema already references Sql (for SqlNode, ColumnDefNode, etc.)

By keeping DdlCompiler in the Schema package, the dependency direction remains clean: Schema -> Sql, never the reverse.