Skip to content
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -1520,6 +1520,47 @@ $result->namedBindings;

Unregistered columns fall through to value-based inference: `int → Int64`, `float → Float64`, `bool → UInt8`, `null → Nullable(String)`, `DateTimeInterface → DateTime64(3)`, everything else → `String`. Register types via `withParamType($column, $type)` or `withParamTypes($map)` whenever the inference rule doesn't match the column's ClickHouse declaration. The positional `$bindings` array is still exposed on the resulting `Statement` for callers that prefer it.

**Bulk insert** — emit the canonical `INSERT INTO <table> FORMAT <name>` envelope together with the serialized row payload in a single typed call. The returned `FormattedInsertStatement` exposes `->query` (the envelope) and `->body` (the format-specific payload) so the caller can ship both to ClickHouse's HTTP interface without hand-assembling either side:

```php
use Utopia\Query\Builder\ClickHouse as Builder;
use Utopia\Query\Builder\ClickHouse\Format;

$statement = (new Builder())
->into('events')
->bulkInsert(Format::JSONEachRow, [
['id' => 1, 'event' => 'login', 'time' => '2024-01-01 00:00:00'],
['id' => 2, 'event' => 'logout', 'time' => '2024-01-01 00:00:05'],
]);

// $statement->query
// INSERT INTO `events` (`id`, `event`, `time`) FORMAT JSONEachRow
//
// $statement->body
// {"id":1,"event":"login","time":"2024-01-01 00:00:00"}
// {"id":2,"event":"logout","time":"2024-01-01 00:00:05"}
```

Ship the result over the HTTP interface by passing `$statement->query` as the `?query=` parameter and `$statement->body` as the POST body. Columns are derived from the first row's keys; pass an explicit third argument to pin the order or fill missing keys with `null`:

```php
$statement = (new Builder())
->into('events')
->bulkInsert(Format::JSONEachRow, $rows, ['id', 'event', 'time']);
```

The `Format` enum currently supports `Format::JSONEachRow` and `Format::TabSeparated`. JSONEachRow rows are encoded with `JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE` (slashes and non-ASCII are preserved verbatim); TabSeparated escapes `\\`, `\t`, `\n`, `\r` and emits `\N` for `null`. An empty row iterable produces an empty body, which ClickHouse accepts as a zero-row ingest. The iterable is consumed eagerly — pass a generator if you want to defer row construction, but the serialized body is materialized in full before the statement is returned.

For envelopes only (no body — e.g. when streaming the payload from elsewhere), the lower-level `insertFormat()` setter remains available and pairs with `insert()` as before:

```php
$statement = (new Builder())
->into('events')
->insertFormat('JSONEachRow', ['id', 'event', 'time'])
->insert();
// $statement->body is null; assemble the payload separately.
```

**UPDATE** — compiles to `ALTER TABLE ... UPDATE` with mandatory WHERE:

```php
Expand Down
67 changes: 67 additions & 0 deletions src/Query/Builder/ClickHouse.php
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
namespace Utopia\Query\Builder;

use Utopia\Query\Builder as BaseBuilder;
use Utopia\Query\Builder\ClickHouse\Format;
use Utopia\Query\Builder\ClickHouse\FormattedInsertStatement;
use Utopia\Query\Builder\Feature\BitwiseAggregates;
use Utopia\Query\Builder\Feature\ClickHouse\ApproximateAggregates;
Expand Down Expand Up @@ -163,6 +164,72 @@ public function insertFormat(string $format, array $columns = []): static
return $this;
}

/**
* Build a single statement that carries both the `INSERT INTO <table>
* FORMAT <name>` envelope and the serialized row payload for a
* ClickHouse bulk ingest. Returns a `FormattedInsertStatement` whose
* `->query` is the envelope and whose `->body` is the formatted
* payload to send as the HTTP request body.
*
* The target table must be set via `into()` first. Columns are derived
* from the keys of the first row when `$columns` is omitted; pass
* `$columns` explicitly to pin the order when row shapes vary or when
* an empty iterable is passed. An empty iterable produces an empty
* body — ClickHouse accepts this as a zero-row ingest.
*
* @param iterable<array<string, mixed>> $rows
* @param list<string> $columns Optional explicit column ordering.
*/
public function bulkInsert(Format $format, iterable $rows, array $columns = []): FormattedInsertStatement
{
$this->bindings = [];
$this->validateTable();

$materialized = [];
foreach ($rows as $row) {
/** @phpstan-ignore function.alreadyNarrowedType */
if (!\is_array($row)) {
throw new ValidationException('bulkInsert() rows must be associative arrays.');
}
$materialized[] = $row;
}

if (empty($columns) && !empty($materialized)) {
$columns = \array_keys($materialized[0]);
}

foreach ($columns as $col) {
if ($col === '') {
throw new ValidationException('Column names for bulkInsert() must be non-empty strings.');
}
}

$wrappedColumns = empty($columns)
? ''
: ' (' . \implode(', ', \array_map(
fn (string $col): string => $this->resolveAndWrap($col),
$columns
)) . ')';
Comment thread
greptile-apps[bot] marked this conversation as resolved.

$sql = 'INSERT INTO ' . $this->quote($this->table)
. $wrappedColumns
. ' FORMAT ' . $format->value;

$body = $format->serialize($materialized, empty($columns) ? null : $columns);

$this->insertFormat = $format->value;
$this->insertFormatColumns = $columns;

return new FormattedInsertStatement(
$sql,
[],
$columns,
$format->value,
$body,
executor: $this->executor,
);
}

/**
* @param array<string, string> $settings
*/
Expand Down
117 changes: 117 additions & 0 deletions src/Query/Builder/ClickHouse/Format.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
<?php

namespace Utopia\Query\Builder\ClickHouse;

use Utopia\Query\Exception\ValidationException;

/**
* ClickHouse bulk-ingest format identifiers.
*
* The values map 1:1 to the names ClickHouse accepts after the `FORMAT`
* keyword in an `INSERT INTO <table> FORMAT <name>` envelope. Each case
* knows how to serialize a row iterable into the request body that
* ClickHouse expects for that format.
*/
enum Format: string
{
case JSONEachRow = 'JSONEachRow';
case TabSeparated = 'TabSeparated';

/**
* Serialize an iterable of associative rows into the body payload for
* this format. Columns are derived from the first row; subsequent rows
* use the same column ordering. An empty iterable yields an empty
* string — ClickHouse accepts an empty body as a zero-row insert.
*
* @param iterable<array<string, mixed>> $rows
* @param list<string>|null $columns Optional explicit column ordering. When null, derived from the first row.
*/
public function serialize(iterable $rows, ?array $columns = null): string
{
return match ($this) {
self::JSONEachRow => $this->serializeJsonEachRow($rows, $columns),
self::TabSeparated => $this->serializeTabSeparated($rows, $columns),
};
}

/**
* @param iterable<array<string, mixed>> $rows
* @param list<string>|null $columns
*/
private function serializeJsonEachRow(iterable $rows, ?array $columns): string
{
$lines = [];
foreach ($rows as $row) {
if ($columns !== null) {
$ordered = [];
foreach ($columns as $col) {
$ordered[$col] = $row[$col] ?? null;
}
$row = $ordered;
}

$lines[] = \json_encode(
(object) $row,
JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE,
);
}

return \implode("\n", $lines);
}

/**
* @param iterable<array<string, mixed>> $rows
* @param list<string>|null $columns
*/
private function serializeTabSeparated(iterable $rows, ?array $columns): string
{
$lines = [];
foreach ($rows as $row) {
$values = [];

if ($columns === null) {
foreach ($row as $value) {
$values[] = $this->escapeTabSeparatedValue($value);
}
} else {
foreach ($columns as $col) {
$values[] = $this->escapeTabSeparatedValue($row[$col] ?? null);
}
}

$lines[] = \implode("\t", $values);
}

return \implode("\n", $lines);
}

private function escapeTabSeparatedValue(mixed $value): string
{
if ($value === null) {
return '\\N';
}

if (\is_bool($value)) {
return $value ? '1' : '0';
}

if (\is_int($value) || \is_float($value)) {
return (string) $value;
}

if (! \is_string($value)) {
if (\is_object($value) && \method_exists($value, '__toString')) {
$value = (string) $value;
} else {
throw new ValidationException('TabSeparated values must be scalar, null, or stringable. Received: ' . \get_debug_type($value));
}
}

return \strtr($value, [
'\\' => '\\\\',
"\t" => '\\t',
"\n" => '\\n',
"\r" => '\\r',
]);
}
}
3 changes: 3 additions & 0 deletions src/Query/Builder/ClickHouse/FormattedInsertStatement.php
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
* @param list<mixed> $bindings
* @param list<string> $columns
* @param string $format
* @param ?string $body Serialized payload to ship as the HTTP request body alongside `$query`. Null when only the envelope query was produced (the caller assembles the body separately).
* @param bool $readOnly
* @param (Closure(Statement): (array<mixed>|int))|null $executor
*/
Expand All @@ -20,6 +21,7 @@ public function __construct(
array $bindings,
public array $columns,
public string $format,
public ?string $body = null,
bool $readOnly = false,
?Closure $executor = null,
) {
Expand All @@ -34,6 +36,7 @@ public function withExecutor(Closure $executor): self
$this->bindings,
$this->columns,
$this->format,
$this->body,
$this->readOnly,
$executor,
);
Expand Down
24 changes: 24 additions & 0 deletions src/Query/QuotesIdentifiers.php
Original file line number Diff line number Diff line change
Expand Up @@ -41,4 +41,28 @@ protected function quote(string $identifier): string

return \implode('.', $wrapped);
}

/**
* Quote a single identifier without treating dots as qualifier separators.
*
* Use when the identifier is known to be atomic — e.g. a column name in a
* CREATE TABLE definition where the dot is a literal part of the name
* rather than a `schema.table.column` separator. The canonical case is
* ClickHouse's nested-array convention (`meta.key Array(String)`) where
* `meta.key` is a single top-level column whose name contains a dot.
*/
protected function quoteLiteral(string $identifier): string
{
if ($identifier === '*') {
return '*';
}

if (\preg_match('/[\x00-\x1f\x7f]/', $identifier) === 1) {
throw new ValidationException('Identifier contains control character');
}

return $this->wrapChar
. \str_replace($this->wrapChar, $this->wrapChar . $this->wrapChar, $identifier)
. $this->wrapChar;
}
}
28 changes: 15 additions & 13 deletions src/Query/Schema.php
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@ public function setExecutor(Closure $executor): static

abstract protected function quote(string $identifier): string;

abstract protected function quoteLiteral(string $identifier): string;

abstract protected function compileColumnType(Column $column): string;

abstract protected function compileAutoIncrement(): string;
Expand All @@ -51,7 +53,7 @@ public function compileCreate(Table $table, bool $ifNotExists = false): Statemen
$columnDefs[] = $def;

if ($column->isPrimary) {
$primaryKeys[] = $this->quote($column->name);
$primaryKeys[] = $this->quoteLiteral($column->name);
}
if ($column->isUnique) {
$uniqueColumns[] = $column->name;
Expand All @@ -72,13 +74,13 @@ public function compileCreate(Table $table, bool $ifNotExists = false): Statemen
$columnDefs[] = 'PRIMARY KEY (' . \implode(', ', $primaryKeys) . ')';
} elseif (! empty($table->compositePrimaryKey)) {
$columnDefs[] = 'PRIMARY KEY ('
. \implode(', ', \array_map(fn (string $c): string => $this->quote($c), $table->compositePrimaryKey))
. \implode(', ', \array_map(fn (string $c): string => $this->quoteLiteral($c), $table->compositePrimaryKey))
. ')';
}

// Inline UNIQUE constraints for columns marked unique
foreach ($uniqueColumns as $col) {
$columnDefs[] = 'UNIQUE (' . $this->quote($col) . ')';
$columnDefs[] = 'UNIQUE (' . $this->quoteLiteral($col) . ')';
}

// Table-level CHECK constraints
Expand All @@ -105,9 +107,9 @@ public function compileCreate(Table $table, bool $ifNotExists = false): Statemen

// Foreign keys
foreach ($table->foreignKeys as $fk) {
$def = 'FOREIGN KEY (' . $this->quote($fk->column) . ')'
$def = 'FOREIGN KEY (' . $this->quoteLiteral($fk->column) . ')'
. ' REFERENCES ' . $this->quote($fk->refTable)
. ' (' . $this->quote($fk->refColumn) . ')';
. ' (' . $this->quoteLiteral($fk->refColumn) . ')';
if ($fk->onDelete !== null) {
$def .= ' ON DELETE ' . $fk->onDelete->toSql();
}
Expand Down Expand Up @@ -138,18 +140,18 @@ public function compileAlter(Table $table): Statement
$keyword = $column->isModify ? 'MODIFY COLUMN' : 'ADD COLUMN';
$def = $keyword . ' ' . $this->compileColumnDefinition($column);
if ($column->after !== null) {
$def .= ' AFTER ' . $this->quote($column->after);
$def .= ' AFTER ' . $this->quoteLiteral($column->after);
}
$alterations[] = $def;
}

foreach ($table->renameColumns as $rename) {
$alterations[] = 'RENAME COLUMN ' . $this->quote($rename->from)
. ' TO ' . $this->quote($rename->to);
$alterations[] = 'RENAME COLUMN ' . $this->quoteLiteral($rename->from)
. ' TO ' . $this->quoteLiteral($rename->to);
}

foreach ($table->dropColumns as $col) {
$alterations[] = 'DROP COLUMN ' . $this->quote($col);
$alterations[] = 'DROP COLUMN ' . $this->quoteLiteral($col);
}

foreach ($table->indexes as $index) {
Expand All @@ -168,9 +170,9 @@ public function compileAlter(Table $table): Statement
}

foreach ($table->foreignKeys as $fk) {
$def = 'ADD FOREIGN KEY (' . $this->quote($fk->column) . ')'
$def = 'ADD FOREIGN KEY (' . $this->quoteLiteral($fk->column) . ')'
. ' REFERENCES ' . $this->quote($fk->refTable)
. ' (' . $this->quote($fk->refColumn) . ')';
. ' (' . $this->quoteLiteral($fk->refColumn) . ')';
if ($fk->onDelete !== null) {
$def .= ' ON DELETE ' . $fk->onDelete->toSql();
}
Expand Down Expand Up @@ -267,7 +269,7 @@ public function dropIndex(string $table, string $name): Statement
protected function compileColumnDefinition(Column $column): string
{
$parts = [
$this->quote($column->name),
$this->quoteLiteral($column->name),
$this->compileColumnType($column),
];

Expand Down Expand Up @@ -369,7 +371,7 @@ protected function compileIndexColumns(Schema\Index $index): string
$parts = [];

foreach ($index->columns as $col) {
$part = $this->quote($col);
$part = $this->quoteLiteral($col);

if (isset($index->collations[$col])) {
$collation = $index->collations[$col];
Expand Down
Loading
Loading