Commit 34048c2

Merge pull request #6 from utopia-php/feat-clickhouse-skip-index-and-settings
feat(schema): ClickHouse index algorithms and engine SETTINGS
2 parents ec79613 + bf3a8f9 commit 34048c2

7 files changed

Lines changed: 555 additions & 11 deletions

README.md

Lines changed: 44 additions & 0 deletions
@@ -2086,6 +2086,50 @@ $schema->table('events')

 TTL expressions are emitted verbatim; they must not be empty or contain semicolons. Dialects other than ClickHouse throw `UnsupportedException`.

+**Skip-index algorithms**: every index declared with `Table::index()` on ClickHouse is a data-skipping index, which speeds up `WHERE` filtering by letting the engine skip whole granules. Pick the algorithm that matches the column shape via the `algorithm` argument on `Table::index()`:
+
+```php
+use Utopia\Query\Schema\ClickHouse\IndexAlgorithm;
+
+$schema->table('events')
+    ->bigInteger('id')->primary()
+    ->string('user_id')
+    ->string('country')
+    ->string('text')
+    // BloomFilter: high-cardinality strings with `=` / `IN` predicates
+    ->index(['user_id'], algorithm: IndexAlgorithm::BloomFilter)
+    // Set(N): small fixed value sets, custom granularity
+    ->index(['country'], algorithm: IndexAlgorithm::Set, algorithmArgs: [100], granularity: 4)
+    // NgramBloomFilter(n, size_bytes, hashes, seed): text search on `LIKE` / `match`
+    ->index(['text'], algorithm: IndexAlgorithm::NgramBloomFilter, algorithmArgs: [4, 1024, 3, 0])
+    // No algorithm specified → defaults to `TYPE minmax GRANULARITY 3`
+    ->index(['id'])
+    ->create();
+
+// CREATE TABLE `events` (..., INDEX `idx_user_id` `user_id` TYPE bloom_filter GRANULARITY 1, ...)
+```
+
+The six algorithms are `MinMax`, `Set`, `BloomFilter`, `NgramBloomFilter`, `TokenBloomFilter`, and `Inverted`. Algorithm-specific arguments are passed via `algorithmArgs` and rendered as literals (strings are single-quoted with embedded quotes doubled; floats are written in plain decimal notation), so supply them from a trusted (developer-controlled) source. Other dialects ignore the ClickHouse-only `algorithm` / `algorithmArgs` / `granularity` arguments.
+
+`MinMax` and `Inverted` take no parenthesised arguments in ClickHouse DDL, so passing `algorithmArgs` for them throws `ValidationException`. Skip indexes can also be added via `ALTER TABLE … ADD INDEX` by calling `alter()` on the builder.
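As a rough illustration of how `algorithmArgs` end up inside the `TYPE ...` clause, the following standalone sketch mirrors the argument rendering in this commit's `compileSkipIndex()`. The function name `renderSkipIndexType()` is hypothetical and not part of the library; it takes the raw ClickHouse type string (the enum's backing value) rather than an `IndexAlgorithm` case so that it runs without the package installed.

```php
<?php

// Standalone sketch; renderSkipIndexType() is a hypothetical helper, not library API.

/**
 * @param list<string|int|float> $args
 */
function renderSkipIndexType(string $type, array $args = []): string
{
    // No arguments: emit the bare type name (e.g. `minmax`).
    if ($args === []) {
        return $type;
    }

    $rendered = \array_map(
        fn (string|int|float $arg): string => match (true) {
            // Strings are single-quoted, with embedded quotes doubled.
            \is_string($arg) => "'" . \str_replace("'", "''", $arg) . "'",
            // %F avoids scientific notation; trailing zeros and a dangling "." are trimmed.
            \is_float($arg) => \rtrim(\rtrim(\sprintf('%F', $arg), '0'), '.'),
            // Integers are emitted as-is.
            default => (string) $arg,
        },
        $args,
    );

    return $type . '(' . \implode(', ', $rendered) . ')';
}

echo renderSkipIndexType('bloom_filter', [0.01]), PHP_EOL;        // bloom_filter(0.01)
echo renderSkipIndexType('ngrambf_v1', [4, 1024, 3, 0]), PHP_EOL; // ngrambf_v1(4, 1024, 3, 0)
echo renderSkipIndexType('minmax'), PHP_EOL;                      // minmax
```

Because `%F` prints six decimal places, very small floats (below roughly `0.000001`) would collapse to `0` in this sketch; passing such values as strings or integers may be the safer choice.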
+
+**Engine SETTINGS**: `settings()` emits a `SETTINGS k = v, ...` clause after the TTL clause (if any):
+
+```php
+$schema->table('events')
+    ->bigInteger('id')->primary()
+    ->settings([
+        'index_granularity' => 8192,
+        'allow_nullable_key' => true, // booleans become 1/0
+    ])
+    ->create();
+
+// CREATE TABLE `events` (...) ENGINE = MergeTree() ORDER BY (`id`)
+// SETTINGS index_granularity = 8192, allow_nullable_key = 1
+```
+
+Setting names must match `[A-Za-z_][A-Za-z0-9_]*`; string values are restricted to `[A-Za-z0-9_.\-+/]*`. Use integers, floats, or booleans for everything else. Other dialects ignore the call.
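The naming and value rules above can be sketched as a small standalone validator and renderer. This is only an illustration under stated assumptions: `renderSettings()` is a hypothetical helper, it throws PHP's built-in `InvalidArgumentException` instead of the library's own exception, and single-quoting string values is an assumption of this sketch, since how the library stores or quotes strings is not visible in this diff.

```php
<?php

// Standalone sketch; renderSettings() is a hypothetical helper, not library API.

/**
 * @param array<string, string|int|float|bool> $settings
 */
function renderSettings(array $settings): string
{
    $pairs = [];

    foreach ($settings as $name => $value) {
        // Setting names: identifier characters only, per the README rule.
        if (! \preg_match('/\A[A-Za-z_][A-Za-z0-9_]*\z/', (string) $name)) {
            throw new \InvalidArgumentException('Invalid setting name: ' . $name);
        }

        if (\is_bool($value)) {
            // Map explicitly: (string) false is '', which would emit `name = `.
            $rendered = $value ? '1' : '0';
        } elseif (\is_int($value)) {
            $rendered = (string) $value;
        } elseif (\is_float($value)) {
            // Plain decimal notation with trailing zeros trimmed.
            $rendered = \rtrim(\rtrim(\sprintf('%F', $value), '0'), '.');
        } elseif (\is_string($value) && \preg_match('#\A[A-Za-z0-9_.\-+/]*\z#', $value)) {
            // Assumption: strings are single-quoted; the allow-list excludes quotes.
            $rendered = "'" . $value . "'";
        } else {
            throw new \InvalidArgumentException('Invalid value for setting ' . $name);
        }

        $pairs[] = $name . ' = ' . $rendered;
    }

    return $pairs === [] ? '' : 'SETTINGS ' . \implode(', ', $pairs);
}

echo renderSettings(['index_granularity' => 8192, 'allow_nullable_key' => true]), PHP_EOL;
// SETTINGS index_granularity = 8192, allow_nullable_key = 1
```

Using an allow-list for names and string values keeps the rendered clause free of quotes, semicolons, and whitespace, which is what makes plain string concatenation acceptable here.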
+
 ### SQLite Schema

 ```php

src/Query/Schema/ClickHouse.php

Lines changed: 69 additions & 5 deletions
@@ -124,6 +124,15 @@ public function compileAlter(Table $table): Statement
             $alterations[] = 'DROP INDEX ' . $this->quote($name);
         }

+        foreach ($table->indexes as $index) {
+            if ($index->type !== IndexType::Index) {
+                throw new UnsupportedException(
+                    'Only data-skipping indexes (index()) are supported in ClickHouse ALTER TABLE.'
+                );
+            }
+            $alterations[] = 'ADD ' . $this->compileSkipIndex($index);
+        }
+
         if (! empty($table->foreignKeys)) {
             throw new UnsupportedException('Foreign keys are not supported in ClickHouse.');
         }
@@ -132,6 +141,12 @@ public function compileAlter(Table $table): Statement
             throw new UnsupportedException('Foreign keys are not supported in ClickHouse.');
         }

+        if (! empty($table->settings)) {
+            throw new UnsupportedException(
+                'Table SETTINGS can only be set on CREATE TABLE; emit `ALTER TABLE ... MODIFY SETTING` directly to change them.'
+            );
+        }
+
         if (empty($alterations)) {
             throw new ValidationException('ALTER TABLE requires at least one alteration.');
         }
@@ -165,12 +180,13 @@ public function compileCreate(Table $table, bool $ifNotExists = false): Statemen
             $primaryKeys = \array_map(fn (string $c): string => $this->quote($c), $table->compositePrimaryKey);
         }

-        // Indexes (ClickHouse uses INDEX ... TYPE ... GRANULARITY ...)
         foreach ($table->indexes as $index) {
-            $cols = \array_map(fn (string $c): string => $this->quote($c), $index->columns);
-            $expr = \count($cols) === 1 ? $cols[0] : '(' . \implode(', ', $cols) . ')';
-            $columnDefs[] = 'INDEX ' . $this->quote($index->name)
-                . ' ' . $expr . ' TYPE minmax GRANULARITY 3';
+            if ($index->type !== IndexType::Index) {
+                throw new UnsupportedException(
+                    'Only data-skipping indexes (index()) are supported in ClickHouse CREATE TABLE.'
+                );
+            }
+            $columnDefs[] = $this->compileSkipIndex($index);
         }

         if (! empty($table->foreignKeys)) {
@@ -205,9 +221,57 @@ public function compileCreate(Table $table, bool $ifNotExists = false): Statemen
             $sql .= ' TTL ' . $table->ttl;
         }

+        if (! empty($table->settings)) {
+            $kv = [];
+            foreach ($table->settings as $k => $v) {
+                $kv[] = $k . ' = ' . $v;
+            }
+            $sql .= ' SETTINGS ' . \implode(', ', $kv);
+        }
+
         return new Statement($sql, [], executor: $this->executor);
     }

+    /**
+     * Render a full `INDEX <name> <columns> TYPE <algorithm>[(args)] GRANULARITY <n>`
+     * fragment, used by both CREATE TABLE and ALTER TABLE ADD INDEX.
+     *
+     * Falls back to `TYPE minmax GRANULARITY 3` when no algorithm is set on the
+     * index, preserving the previously hard-coded output for callers using the
+     * generic `Table::index()` without picking an algorithm.
+     */
+    private function compileSkipIndex(Index $index): string
+    {
+        $cols = \array_map(fn (string $c): string => $this->quote($c), $index->columns);
+        $expr = \count($cols) === 1 ? $cols[0] : '(' . \implode(', ', $cols) . ')';
+
+        if ($index->algorithm === null) {
+            return 'INDEX ' . $this->quote($index->name) . ' ' . $expr
+                . ' TYPE minmax GRANULARITY ' . ($index->granularity ?? 3);
+        }
+
+        $type = $index->algorithm->value;
+
+        if ($index->algorithmArgs !== []) {
+            $args = \array_map(
+                fn (string|int|float $arg): string => match (true) {
+                    \is_string($arg) => "'" . \str_replace("'", "''", $arg) . "'",
+                    // sprintf('%F', ...) avoids scientific notation (e.g. 1.0E-5),
+                    // which ClickHouse rejects in index type arguments. Trailing
+                    // zeros are trimmed so 0.01 renders as "0.01", not "0.010000".
+                    \is_float($arg) => \rtrim(\rtrim(\sprintf('%F', $arg), '0'), '.'),
+                    default => (string) $arg,
+                },
+                $index->algorithmArgs,
+            );
+
+            $type .= '(' . \implode(', ', $args) . ')';
+        }
+
+        return 'INDEX ' . $this->quote($index->name) . ' ' . $expr
+            . ' TYPE ' . $type . ' GRANULARITY ' . ($index->granularity ?? 1);
+    }
+
     /**
      * Compile an engine declaration: `<Name>` or `<Name>(<args...>)`.
      *
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+<?php
+
+namespace Utopia\Query\Schema\ClickHouse;
+
+enum IndexAlgorithm: string
+{
+    case MinMax = 'minmax';
+    case Set = 'set';
+    case BloomFilter = 'bloom_filter';
+    case NgramBloomFilter = 'ngrambf_v1';
+    case TokenBloomFilter = 'tokenbf_v1';
+    case Inverted = 'inverted';
+}

src/Query/Schema/Column.php

Lines changed: 25 additions & 1 deletion
@@ -5,6 +5,7 @@
 use Utopia\Query\Builder\Statement;
 use Utopia\Query\Exception\ValidationException;
 use Utopia\Query\Schema\ClickHouse\Engine;
+use Utopia\Query\Schema\ClickHouse\IndexAlgorithm;

 class Column
 {
@@ -392,6 +393,7 @@ public function dropColumn(string $name): Table
      * @param array<string, int> $lengths
      * @param array<string, string> $orders
      * @param array<string, string> $collations
+     * @param list<string|int|float> $algorithmArgs ClickHouse skip-index algorithm args
      */
     public function index(
         array $columns,
@@ -401,8 +403,22 @@ public function index(
         array $lengths = [],
         array $orders = [],
         array $collations = [],
+        ?IndexAlgorithm $algorithm = null,
+        array $algorithmArgs = [],
+        ?int $granularity = null,
     ): Table {
-        return $this->table->index($columns, $name, $method, $operatorClass, $lengths, $orders, $collations);
+        return $this->table->index(
+            $columns,
+            $name,
+            $method,
+            $operatorClass,
+            $lengths,
+            $orders,
+            $collations,
+            $algorithm,
+            $algorithmArgs,
+            $granularity,
+        );
     }

     /**
@@ -508,6 +524,14 @@ public function engine(Engine $engine, string ...$args): Table
         return $this->table->engine($engine, ...$args);
     }

+    /**
+     * @param array<string, string|int|float|bool> $settings
+     */
+    public function settings(array $settings): Table
+    {
+        return $this->table->settings($settings);
+    }
+
     /**
      * @param list<string> $columns
      */

src/Query/Schema/Index.php

Lines changed: 38 additions & 0 deletions
@@ -3,6 +3,7 @@
 namespace Utopia\Query\Schema;

 use Utopia\Query\Exception\ValidationException;
+use Utopia\Query\Schema\ClickHouse\IndexAlgorithm;

 readonly class Index
 {
@@ -12,6 +13,10 @@
      * @param array<string, string> $orders
      * @param array<string, string> $collations Column-specific collations (column name => collation)
      * @param list<string> $rawColumns Raw SQL expressions appended to the column list (bypass quoting)
+     * @param list<string|int|float> $algorithmArgs ClickHouse skip-index algorithm args
+     *     (e.g. [3] for set(3),
+     *     [0.01] for bloom_filter(0.01),
+     *     [4, 1024, 3, 0] for ngrambf_v1(n, size_bytes, hashes, seed))
      */
     public function __construct(
         public string $name,
@@ -23,7 +28,19 @@ public function __construct(
         public string $operatorClass = '',
         public array $collations = [],
         public array $rawColumns = [],
+        public ?IndexAlgorithm $algorithm = null,
+        public array $algorithmArgs = [],
+        public ?int $granularity = null,
     ) {
+        // Only indexes with a ClickHouse skip-index algorithm set are limited
+        // to a plain identifier as their name; other dialects quote the name,
+        // so hyphens, dots, and other characters are valid there.
+        if ($algorithm !== null && ! \preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name)) {
+            throw new ValidationException('Invalid index name: ' . $name);
+        }
+        if ($columns === [] && $rawColumns === []) {
+            throw new ValidationException('Index requires at least one column.');
+        }
         if ($method !== '' && ! \preg_match('/^[A-Za-z0-9_]+$/', $method)) {
             throw new ValidationException('Invalid index method: ' . $method);
         }
@@ -35,5 +52,26 @@ public function __construct(
                 throw new ValidationException('Invalid collation: ' . $collation);
             }
         }
+        if ($granularity !== null && $granularity < 1) {
+            throw new ValidationException('Index granularity must be >= 1.');
+        }
+        if ($algorithm !== null && $algorithmArgs !== [] && ! self::algorithmAcceptsArgs($algorithm)) {
+            throw new ValidationException(
+                $algorithm->value . ' does not accept algorithm arguments.'
+            );
+        }
+    }
+
+    /**
+     * MinMax and Inverted are emitted without parentheses in ClickHouse DDL;
+     * passing args to them would produce invalid SQL.
+     */
+    private static function algorithmAcceptsArgs(IndexAlgorithm $algorithm): bool
+    {
+        return match ($algorithm) {
+            IndexAlgorithm::MinMax,
+            IndexAlgorithm::Inverted => false,
+            default => true,
+        };
     }
 }
