Skip to content

Latest commit

 

History

History
476 lines (371 loc) · 13.4 KB

File metadata and controls

476 lines (371 loc) · 13.4 KB

Query Builder Guide

CH.Toolkit provides two query APIs: a LINQ provider (IClickHouseQueryable<T>) for type-safe queries, and a fluent builder (FluentQueryBuilder) for string-based composition. Both render to the same SQL AST and support all ClickHouse-specific query features.

LINQ vs Fluent: Quick Comparison

// LINQ -- type-safe, expression-based
var sql = Query.From<Event>()
    .Where(e => e.UserId == 5)
    .OrderByDescending(e => e.CreatedAt)
    .Take(100)
    .ToSql();

// Fluent -- string-based, more flexible
var sql = FluentQueryBuilder
    .From("analytics", "event")
    .Where("user_id = 5")
    .OrderByDesc("created_at")
    .Limit(100)
    .ToSql();

// Both produce:
// SELECT * FROM event WHERE (user_id = 5) ORDER BY created_at DESC LIMIT 100

LINQ Query Builder

Entry Points

Query.From<T>()                                  // Infers table name (snake_case of type name)
Query.From<T>("custom_table")                    // Explicit table name
Query.From<T>("mydb", "custom_table")            // Database + table
Query.FromTableFunction<T>("remote", "'host:9000'", "'db'", "'table'")

Standard LINQ Operators

// Where (multiple conditions supported)
.Where(e => e.UserId == 5 && e.EventType == "click")

// Select with projection
.Select(e => new { e.UserId, e.EventName })

// Ordering
.OrderBy(e => e.UserId)
.ThenByDescending(e => e.CreatedAt)

// Paging
.Skip(50).Take(100)

// Distinct
.Distinct()

// GroupBy with aggregates
.GroupBy(e => e.UserId)
.Select(g => new { UserId = g.Key, Count = g.Count() })

// Join
.Join(Query.From<User>(), e => e.UserId, u => u.Id, (e, u) => new { e.EventId, u.Name })

String Method Translation

.Where(e => e.Name.Contains("login"))     // LIKE '%login%'
.Where(e => e.Name.StartsWith("ev"))      // LIKE 'ev%'
.Where(e => e.Name.EndsWith("ed"))        // LIKE '%ed'

DateTime Property Translation

.Where(e => e.CreatedAt.Year == 2024)     // toYear(created_at) = 2024

Collection Contains (IN)

var ids = new[] { 1, 2, 3 };
.Where(e => ids.Contains(e.UserId))       // user_id IN (1, 2, 3)

ClickHouse-Specific LINQ Extensions

PREWHERE

Pre-filters data before reading non-key columns. More efficient than WHERE for MergeTree tables when the filter is highly selective.

var sql = Query.From<Event>()
    .Prewhere(e => e.EventType == "click")
    .Where(e => e.UserId > 0)
    .ToSql();
// SELECT * FROM event PREWHERE (event_type = 'click') WHERE (user_id > 0)

FINAL

Forces deduplication for ReplacingMergeTree or collapsing for CollapsingMergeTree.

var sql = Query.From<Event>()
    .Final()
    .ToSql();
// SELECT * FROM event FINAL

SAMPLE

Random sampling from MergeTree tables (requires SAMPLE BY in table definition).

.Sample(0.1)              // 10% sample
.Sample(0.1, 0.5)         // 10% sample with offset

ARRAY JOIN

Unnests array columns into rows.

var sql = Query.From<Event>()
    .ArrayJoin(e => e.Tags)
    .Select(e => new { e.UserId, Tag = e.Tags })
    .ToSql();
// SELECT user_id, tags FROM event ARRAY JOIN tags

// LEFT ARRAY JOIN preserves rows with empty arrays
.LeftArrayJoin(e => e.Tags)

LIMIT BY

Limits rows per group (different from LIMIT).

var sql = Query.From<Event>()
    .OrderBy(e => e.CreatedAt)
    .LimitBy(5, e => e.UserId)
    .ToSql();
// SELECT * FROM event ORDER BY created_at ASC LIMIT 5 BY user_id

GROUP BY Modifiers

.WithTotals()    // GROUP BY ... WITH TOTALS
.WithRollup()    // GROUP BY ... WITH ROLLUP
.WithCube()      // GROUP BY ... WITH CUBE

DISTINCT ON

.DistinctOn(e => e.UserId)

Per-Query SETTINGS

.Setting("max_threads", "4")
.Setting("max_memory_usage", "1000000")
// SETTINGS max_threads = 4, max_memory_usage = 1000000

Ch.* Function Stubs

All Ch.* methods are compile-time only -- they exist solely for LINQ expression tree capture and throw at runtime if called directly. They map to ClickHouse SQL functions.

Aggregate Functions

Ch.Uniq(e.UserId)                    // uniq(user_id)
Ch.UniqExact(e.UserId)               // uniqExact(user_id)
Ch.ArgMin(e.Name, e.CreatedAt)       // argMin(name, created_at)
Ch.ArgMax(e.Name, e.CreatedAt)       // argMax(name, created_at)
Ch.Any(e.Name)                       // any(name)
Ch.AnyLast(e.Name)                   // anyLast(name)
Ch.GroupArray(e.UserId)              // groupArray(user_id)
Ch.GroupUniqArray(e.UserId)          // groupUniqArray(user_id)
Ch.Quantile(0.95, e.Duration)       // quantile(0.95)(duration)
Ch.CountIf(e.EventType == "click")  // countIf(event_type = 'click')
Ch.SumIf(e.Value, e.Active)         // sumIf(value, active)

Aggregate Combinators

-If variants:

Ch.AvgIf(e.Value, e.Active)         // avgIf(value, active)
Ch.MinIf(e.Value, e.Active)         // minIf(value, active)
Ch.MaxIf(e.Value, e.Active)         // maxIf(value, active)
Ch.UniqIf(e.UserId, e.Active)       // uniqIf(user_id, active)

-OrDefault variants:

Ch.SumOrDefault(e.Value)             // sumOrDefault(value)
Ch.AvgOrDefault(e.Value)             // avgOrDefault(value)

-State / -Merge (for materialized views):

Ch.UniqState(e.UserId)               // uniqState(user_id)
Ch.SumState(e.Value)                 // sumState(value)
Ch.CountState()                      // countState()
Ch.UniqMerge(e.State)                // uniqMerge(state)
Ch.SumMerge(e.State)                 // sumMerge(state)
Ch.CountMerge(e.State)               // countMerge(state)

Date/Time Functions

Ch.ToDate(e.Timestamp)               // toDate(timestamp)
Ch.ToDateTime(e.Timestamp)           // toDateTime(timestamp)
Ch.ToYYYYMM(e.Timestamp)             // toYYYYMM(timestamp)
Ch.ToStartOfDay(e.Timestamp)         // toStartOfDay(timestamp)
Ch.ToStartOfMonth(e.Timestamp)       // toStartOfMonth(timestamp)
Ch.ToStartOfWeek(e.Timestamp)        // toStartOfWeek(timestamp)
Ch.ToStartOfHour(e.Timestamp)        // toStartOfHour(timestamp)
Ch.ToStartOfMinute(e.Timestamp)      // toStartOfMinute(timestamp)
Ch.ToStartOfInterval(e.Timestamp, "1 HOUR")  // toStartOfInterval(timestamp, INTERVAL 1 HOUR)
Ch.DateAdd("day", 7, e.Timestamp)    // date_add(day, 7, timestamp)
Ch.DateSub("month", 1, e.Timestamp)  // date_sub(month, 1, timestamp)
Ch.Now()                             // now()
Ch.Today()                           // today()
Ch.Yesterday()                       // yesterday()

Type Conversion

Ch.Cast<int>(e.Value)                // CAST(value AS Int32)
Ch.ToString(e.Value)                 // toString(value)
Ch.ToInt32(e.Value)                  // toInt32(value)
Ch.ToInt64(e.Value)                  // toInt64(value)
Ch.ToFloat64(e.Value)                // toFloat64(value)

String & Array Functions

Ch.Has(e.Tags, "click")             // has(tags, 'click')
Ch.Position(e.Name, "admin")        // position(name, 'admin')
Ch.Match(e.Name, "^user_\\d+$")     // match(name, '^user_\\d+$')
Ch.Concat(e.First, " ", e.Last)     // concat(first, ' ', last)

Hash Functions

Ch.CityHash64(e.UserId, e.Name)     // cityHash64(user_id, name)
Ch.SipHash64(e.UserId)              // sipHash64(user_id)

Conditional Functions

Ch.If(e.Value > 0, "positive", "non-positive")
    // if(value > 0, 'positive', 'non-positive')
Ch.MultiIf(e.Value > 100, "high", e.Value > 50, "medium", "low")
    // multiIf(value > 100, 'high', value > 50, 'medium', 'low')

Dictionary Functions

Ch.DictGet<string>("users", "name", e.UserId)
    // dictGet('users', 'name', user_id)
Ch.DictGetOrDefault<string>("users", "name", e.UserId, "unknown")
    // dictGetOrDefault('users', 'name', user_id, 'unknown')
Ch.DictHas("users", e.UserId)
    // dictHas('users', user_id)

Parameterized Queries

.Where(e => e.UserId == Ch.Param<int>("userId"))
// WHERE (user_id = {userId:Int32})

Escape Hatches

Ch.Aggregate<long>("topK", 10, e.EventType)  // topK(10)(event_type)
Ch.Raw<int>("toHour(now())")                  // toHour(now())
Ch.RawCondition("user_id GLOBAL IN subquery") // raw SQL in WHERE

Window Functions

LINQ Window Functions

// Row number with partition
var sql = Query.From<Event>()
    .Select(e => new
    {
        e.UserId,
        RowNum = Ch.RowNumber().OverPartitionByOrderByDesc(e.UserId, e.CreatedAt)
    })
    .ToSql();
// SELECT user_id, row_number() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS row_num

Available window functions:

Ch.RowNumber()           Ch.Rank()              Ch.DenseRank()
Ch.Ntile(4)              Ch.Lag(e.Value)        Ch.Lag(e.Value, 2)
Ch.Lead(e.Value)         Ch.Lead(e.Value, 2)    Ch.FirstValue(e.Value)
Ch.LastValue(e.Value)    Ch.NthValue(e.Value, 3)

Window specification extensions:

.Over()                                          // OVER ()
.OverPartitionBy(e.UserId)                       // OVER (PARTITION BY user_id)
.OverOrderBy(e.CreatedAt)                        // OVER (ORDER BY created_at)
.OverOrderByDesc(e.CreatedAt)                    // OVER (ORDER BY created_at DESC)
.OverPartitionByOrderBy(e.UserId, e.CreatedAt)   // OVER (PARTITION BY user_id ORDER BY created_at)
.OverPartitionByOrderByDesc(e.UserId, e.CreatedAt)

Fluent Window Functions

For more control (frame specifications):

var sql = FluentQueryBuilder.From("events")
    .Select("user_id")
    .SelectWindow("sum", [new ColumnRefNode("amount")],
        w => w
            .PartitionBy("user_id")
            .OrderBy("created_at")
            .RowsBetween(FrameBounds.UnboundedPreceding(), FrameBounds.CurrentRow()),
        "running_total")
    .ToSql();
// SELECT user_id,
//   sum(amount) OVER (PARTITION BY user_id ORDER BY created_at ASC
//     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
// FROM events

Frame bound options:

FrameBounds.UnboundedPreceding()     FrameBounds.Preceding(5)
FrameBounds.CurrentRow()             FrameBounds.Following(5)
FrameBounds.UnboundedFollowing()

Frame types: RowsBetween(start, end) or RangeBetween(start, end).

Fluent Query Builder

Entry Points

FluentQueryBuilder.From("table")
FluentQueryBuilder.From("database", "table")
FluentQueryBuilder.FromSubquery(otherBuilder, "alias")
FluentQueryBuilder.FromTableFunction("remote", "'host'", "'db'", "'table'")
FluentQueryBuilder.With("cte_name", cteBuilder)

JOINs

.Join("users", "events.user_id = users.id")          // INNER JOIN
.LeftJoin("users", "events.user_id = users.id")      // LEFT JOIN
.RightJoin("users", "events.user_id = users.id")     // RIGHT JOIN
.FullJoin("users", "events.user_id = users.id")      // FULL JOIN
.CrossJoin("users")                                   // CROSS JOIN
.GlobalLeftJoin("users", "events.user_id = users.id") // GLOBAL LEFT JOIN
.AnyLeftJoin("users", "events.user_id = users.id")   // ANY LEFT JOIN
.AnyRightJoin("users", "events.user_id = users.id")  // ANY RIGHT JOIN
.AsofLeftJoin("users", "events.ts = users.ts")        // ASOF LEFT JOIN

GROUP BY Modifiers

.GroupByWithRollup("year", "month")    // GROUP BY year, month WITH ROLLUP
.GroupByWithCube("region", "product")  // GROUP BY region, product WITH CUBE
.WithTotals()                          // ... WITH TOTALS

ORDER BY WITH FILL

.OrderByWithFill("dt", "'2024-01-01'", "'2024-12-31'", "1 DAY")
// ORDER BY dt ASC WITH FILL FROM '2024-01-01' TO '2024-12-31' STEP 1 DAY

CTEs (Common Table Expressions)

var cte = FluentQueryBuilder.From("events")
    .Select("user_id", "count() as cnt")
    .GroupBy("user_id");

var sql = FluentQueryBuilder.With("user_counts", cte)
    .SetFrom("user_counts")
    .Select("*")
    .Where("cnt > 10")
    .ToSql();
// WITH user_counts AS (SELECT user_id, count() as cnt FROM events GROUP BY user_id)
// SELECT * FROM user_counts WHERE cnt > 10

Set Operations

var q1 = FluentQueryBuilder.From("table_a").Select("id", "name");
var q2 = FluentQueryBuilder.From("table_b").Select("id", "name");

q1.UnionAll(q2)              // UNION ALL
q1.UnionDistinct(q2)         // UNION DISTINCT
q1.IntersectAll(q2)          // INTERSECT ALL (since CH 21.11)
q1.IntersectDistinct(q2)     // INTERSECT DISTINCT
q1.ExceptAll(q2)             // EXCEPT ALL
q1.ExceptDistinct(q2)        // EXCEPT DISTINCT

// Chaining
q1.UnionAll(q2).UnionAll(q3)

INSERT ... SELECT

var sql = FluentQueryBuilder.From("source")
    .Select("id", "name")
    .InsertInto("target", "id", "name")
    .ToSql();
// INSERT INTO target (id, name) SELECT id, name FROM source

FluentQueryBuilder.From("source")
    .Select("id")
    .InsertIntoDb("other_db", "target", "id")
    .ToSql();
// INSERT INTO other_db.target (id) SELECT id FROM source

Per-Query SETTINGS

.Setting("max_threads", "4")
.Setting("max_memory_usage", "1000000")
// SETTINGS max_threads = 4, max_memory_usage = 1000000

Column Naming Conventions

By default, PascalCase C# property names are converted to snake_case column names:

// "UserId" -> "user_id", "EventType" -> "event_type"
ColumnNamingConvention.SnakeCase   // Default
ColumnNamingConvention.Exact       // Use property names as-is

Override with attributes:

[Table("custom_table_name")]
[Table("custom_table", Database = "mydb")]
public class MyClass
{
    [Column("custom_column_name")]
    public string MyProperty { get; set; }
}