# KQL: Kibana Query Language support

> ⚠️ **Disambiguation.** "KQL" is overloaded in the industry: it refers
> to two unrelated query languages.
>
> - **Kibana Query Language** (this module): a single-expression
>   predicate grammar, e.g. `level:error and status:>=500`, used by the
>   Kibana UI for log search. Public grammar reference:
>   <https://www.elastic.co/docs/explore-analyze/query-filter/languages/kql>.
> - **Kusto Query Language** (Microsoft): a pipeline language, e.g.
>   `Table | where x > 5 | summarize count() by foo | top 10 by ts`,
>   used by Azure Data Explorer, Log Analytics, Sentinel, and Defender.
>   **Not implemented here.** If you want Kusto support, propose it
>   under a different name (e.g. `kusto`, `kustoql`) to avoid colliding
>   with this module.
>
> Throughout this codebase, `KQL` / `kql` / `?kql=` always means the
> Kibana variant.

End-user query language for Quickwit, based on the public Kibana KQL
grammar referenced above.

This module owns the parser, the AST, the lowering pass to Quickwit's
internal `QueryAst`, and the Prometheus metrics emitted from the parse
path.

## Wire surface

There are two ways to send KQL to a running Quickwit cluster.

### 1. Native REST parameter

```bash
# -G sends a GET request and appends the URL-encoded parameter to the URL,
# so spaces and `>=` in the query need no manual escaping.
curl -G 'http://<host>:7280/api/v1/<index>/search' \
  --data-urlencode 'kql=level:error and status:>=500'
```

The `kql` query parameter is mutually exclusive with the existing `query`
parameter (Tantivy/Lucene-like grammar). Exactly one must be supplied;
supplying both or neither returns HTTP 400.

### 2. POST request body

```bash
curl -X POST 'http://<host>:7280/api/v1/<index>/search' \
  -H 'Content-Type: application/json' \
  -d '{"kql": "level:error and service:api", "max_hits": 20}'
```

**Note.** KQL is intentionally **not** exposed via the
`/api/v1/_elastic/<index>/_search` endpoint. That namespace mirrors the
Elasticsearch query DSL, which has no `kql` variant: a real
Elasticsearch cluster rejects `{"query": {"kql": ...}}` with a
`parsing_exception`. Keeping the `_elastic/` surface honest means KQL
lives only on the two native paths above.
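
The "exactly one of `query` or `kql`" rule can be sketched as a small std-only Rust function. `QuerySource` and `select_query_source` are hypothetical names, not the REST handler's actual API. An `Err` stands in for the HTTP 400 response, and the both-supplied message text is illustrative.

```rust
/// Which grammar a request selected (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
pub enum QuerySource<'a> {
    /// Tantivy/Lucene-like grammar from the `query` parameter.
    Tantivy(&'a str),
    /// Kibana Query Language from the `kql` parameter.
    Kql(&'a str),
}

/// Enforces "exactly one of `query` or `kql`". An `Err` stands in for the
/// HTTP 400 returned by the REST handler.
pub fn select_query_source<'a>(
    query: Option<&'a str>,
    kql: Option<&'a str>,
) -> Result<QuerySource<'a>, String> {
    match (query, kql) {
        (Some(q), None) => Ok(QuerySource::Tantivy(q)),
        (None, Some(k)) => Ok(QuerySource::Kql(k)),
        // Illustrative message; the real handler's wording may differ.
        (Some(_), Some(_)) => Err("`query` and `kql` are mutually exclusive".to_string()),
        (None, None) => Err("either `query` or `kql` must be supplied".to_string()),
    }
}
```

Matching on the `(query, kql)` pair makes all four combinations explicit in one place.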

## Supported grammar

Every form documented in the Kibana KQL reference is supported, modulo the
divergences called out below. The conformance corpus
[`kibana_conformance.rs`](kibana_conformance.rs) pins the expected AST for
each idiom and fails CI on drift.

| Form | Example |
|---|---|
| Field-value match | `level:error` |
| Phrase match | `message:"connection refused"` |
| Bare term against default fields | `refused` |
| Bare phrase against default fields | `"connection refused"` |
| Wildcard value | `service:work*` |
| Field-exists check | `level:*` |
| Match-all | `*` (lowers to `QueryAst::MatchAll`, no automaton work) |
| Boolean AND (explicit) | `level:error and service:api` |
| Boolean AND (juxtaposition) | `level:error service:api` |
| Boolean OR | `level:error or level:warn` |
| Boolean NOT | `not level:error` |
| Parens | `(level:error or level:warn) and service:api` |
| Value group OR | `level:(error or warn)` |
| Value group AND | `tags:(prod and critical)` |
| Range `>=` / `>` / `<=` / `<` | `status:>=500`, `latency_ms:<0.1` |
| Compound range | `status:>=200 and status:<500` |
| Quoted ISO timestamp in range | `@timestamp:<"2025-01-01T00:00:00Z"` |
| Escaped colon in field name | `metric\:count:value` |
| Escaped keyword as field name | `\and:value` |
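
The two escape rows at the end of the table can be illustrated with a std-only helper that splits a `field:value` term at the first unescaped colon and unescapes the field name. `split_field_value` is hypothetical: the real `lexer.rs` produces a token stream, and the keyword handling that makes `\and` matter is not shown here.

```rust
/// Splits one `field:value` term at the first *unescaped* colon and
/// unescapes the field name. A backslash makes the next character literal,
/// so `metric\:count:value` yields field `metric:count` and `\and:value`
/// yields field `and`. Returns `None` for a bare term (no unescaped colon).
pub fn split_field_value(term: &str) -> Option<(String, &str)> {
    let mut field = String::new();
    let mut chars = term.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            '\\' => match chars.next() {
                Some((_, escaped)) => field.push(escaped),
                None => field.push('\\'), // trailing lone backslash kept as-is
            },
            // ':' is one byte, so `i + 1` is always a char boundary.
            ':' => return Some((field, &term[i + 1..])),
            other => field.push(other),
        }
    }
    None
}
```

Only the first unescaped colon splits, so a value such as `>=500` or a quoted timestamp is passed through untouched.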

Precedence: `not` binds tightest, then `and`, then `or` (loosest).
Parentheses override.
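
A minimal recursive-descent sketch of these precedence rules follows. It treats juxtaposition as an implicit `and` and enforces a depth cap of 64 that mirrors `MAX_KQL_DEPTH`. The tokenizer is a toy that handles only whitespace and parentheses. Keywords are matched in lowercase, as in the examples above. Quoted phrases and value groups such as `level:(error or warn)` are out of scope. Counting `not` toward the depth limit is an assumption of this sketch, not a documented property of `parser.rs`.

```rust
/// Boolean skeleton of a KQL query (hypothetical, simplified AST).
#[derive(Debug, PartialEq)]
pub enum Expr {
    Term(String),
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Mirrors the documented nesting cap of 64; the constant name is illustrative.
const MAX_DEPTH: usize = 64;

struct Parser<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
    depth: usize,
}

/// Parses with `not` > `and` (explicit or juxtaposed) > `or`.
pub fn parse(input: &str) -> Result<Expr, String> {
    // Toy tokenizer: parentheses become standalone tokens, everything else
    // splits on whitespace, so `level:error` stays one opaque term.
    let spaced = input.replace('(', " ( ").replace(')', " ) ");
    let mut p = Parser { tokens: spaced.split_whitespace().collect(), pos: 0, depth: 0 };
    let expr = p.or_expr()?;
    match p.peek() {
        None => Ok(expr),
        Some(tok) => Err(format!("unexpected token `{}`", tok)),
    }
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    // or_expr := and_expr ("or" and_expr)*      (loosest)
    fn or_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.and_expr()?;
        while self.peek() == Some("or") {
            self.pos += 1;
            let rhs = self.and_expr()?;
            lhs = Expr::Or(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }

    // and_expr := not_expr (["and"] not_expr)*  (juxtaposition = implicit AND)
    fn and_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.not_expr()?;
        loop {
            match self.peek() {
                Some("and") => self.pos += 1,
                Some(tok) if tok != "or" && tok != ")" => {} // implicit AND
                _ => break,
            }
            let rhs = self.not_expr()?;
            lhs = Expr::And(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }

    // not_expr := "not" not_expr | primary      (tightest)
    fn not_expr(&mut self) -> Result<Expr, String> {
        if self.peek() != Some("not") {
            return self.primary();
        }
        self.pos += 1;
        self.enter()?;
        let inner = self.not_expr()?;
        self.depth -= 1;
        Ok(Expr::Not(Box::new(inner)))
    }

    // primary := "(" or_expr ")" | term
    fn primary(&mut self) -> Result<Expr, String> {
        match self.peek() {
            Some("(") => {
                self.pos += 1;
                self.enter()?;
                let inner = self.or_expr()?;
                self.depth -= 1;
                if self.peek() != Some(")") {
                    return Err("expected `)`".to_string());
                }
                self.pos += 1;
                Ok(inner)
            }
            Some(tok) if !matches!(tok, ")" | "and" | "or") => {
                self.pos += 1;
                Ok(Expr::Term(tok.to_string()))
            }
            other => Err(format!("expected a term, found {:?}", other)),
        }
    }

    /// Counts one nesting level and rejects input past the cap.
    fn enter(&mut self) -> Result<(), String> {
        self.depth += 1;
        if self.depth > MAX_DEPTH {
            return Err(format!("nesting deeper than {} levels", MAX_DEPTH));
        }
        Ok(())
    }
}
```

Because parentheses do not produce their own node, `a or b and c` and `a or (b and c)` yield the same tree, which is a convenient way to test precedence.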

## Intentional divergences from Kibana

| Kibana behavior | Quickwit behavior | Reason |
|---|---|---|
| Unquoted ISO timestamps in range values (`@timestamp:>=2025-01-01T00:00:00Z`) | Requires quotes (`@timestamp:>="2025-01-01T00:00:00Z"`) | Our lexer tokenizes on `:`, which also appears inside the time part of the timestamp. The requirement is documented in the error message. |
| Nested-field object syntax (`nested:{ name:foo }`) | Rejected with a clear error pointing to flat dotted paths | Quickwit has no nested-field type. |
| `field:(other:value)`: a nested field qualifier inside a value group | Rejected | Silently rebinding the value to another field would return wrong data. Kibana also errors. |
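
A value group distributes its field over each value, so `level:(error or warn)` means `level:error or level:warn`. The std-only sketch below shows that expansion and the rejection of a nested qualifier such as `field:(other:value)`. `ValueGroup` and `FieldQuery` are hypothetical types. The sketch assumes the parser keeps the inner qualifier as a distinct node rather than folding it into the value string.

```rust
/// A value group as a parser might represent `field:( ... )` (hypothetical).
#[derive(Debug, PartialEq)]
pub enum ValueGroup {
    Value(String),
    /// `other:value` written inside the group; kept so lowering can reject it.
    Qualified { field: String, value: String },
    Or(Vec<ValueGroup>),
    And(Vec<ValueGroup>),
}

/// Field-level query produced after distributing the outer field.
#[derive(Debug, PartialEq)]
pub enum FieldQuery {
    Match { field: String, value: String },
    Or(Vec<FieldQuery>),
    And(Vec<FieldQuery>),
}

/// Distributes `field` over every value in the group. A nested qualifier is
/// an error rather than a silent rebinding to the inner field.
pub fn expand_value_group(field: &str, group: &ValueGroup) -> Result<FieldQuery, String> {
    match group {
        ValueGroup::Value(v) => Ok(FieldQuery::Match { field: field.to_string(), value: v.clone() }),
        ValueGroup::Qualified { field: inner, .. } => Err(format!(
            "field qualifier `{}:` is not allowed inside `{}:(...)`",
            inner, field
        )),
        ValueGroup::Or(items) => expand_all(field, items).map(FieldQuery::Or),
        ValueGroup::And(items) => expand_all(field, items).map(FieldQuery::And),
    }
}

fn expand_all(field: &str, items: &[ValueGroup]) -> Result<Vec<FieldQuery>, String> {
    // `collect` into `Result<Vec<_>, _>` stops at the first error.
    items.iter().map(|g| expand_value_group(field, g)).collect()
}
```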

## Safety rails

All limits are hard caps: exceeding one returns HTTP 400 with a specific
error message.

| Limit | Value | Where |
|---|---|---|
| Max KQL string length (REST) | 16,384 bytes | [`rest_handler.rs:MAX_KQL_INPUT_LEN`](../../../quickwit-serve/src/search_api/rest_handler.rs) |
| Max parser nesting depth | 64 levels | [`parser.rs:MAX_KQL_DEPTH`](parser.rs) |
| Max bare-token length | 1,024 bytes | [`lexer.rs:MAX_BARE_TOKEN_LEN`](lexer.rs) |
| Max quoted-phrase length | 4,096 bytes | [`lexer.rs:MAX_PHRASE_LEN`](lexer.rs) |

Together these close the obvious DoS angles: oversized inputs,
pathological nesting, and single-token memory bombs.
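
The lexer-level caps can be sketched as a tokenizer that checks byte lengths while it accumulates each token. An oversized token is rejected as soon as it crosses the limit instead of being copied in full. This is a std-only illustration with a hypothetical `lex_with_caps` function. The constants reuse the values from the table. The input-length check, which actually lives in the REST handler, is folded in here for brevity. Escapes and the `:`/parenthesis splitting of the real lexer are omitted.

```rust
// Values copied from the safety-rail table; all lengths are in bytes.
const MAX_KQL_INPUT_LEN: usize = 16_384;
const MAX_BARE_TOKEN_LEN: usize = 1_024;
const MAX_PHRASE_LEN: usize = 4_096;

#[derive(Debug, PartialEq)]
pub enum Token {
    Bare(String),
    Phrase(String),
}

/// Splits input into bare tokens and double-quoted phrases, failing fast
/// as soon as any cap is exceeded.
pub fn lex_with_caps(input: &str) -> Result<Vec<Token>, String> {
    if input.len() > MAX_KQL_INPUT_LEN {
        return Err(format!("input is {} bytes; limit is {}", input.len(), MAX_KQL_INPUT_LEN));
    }
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '"' {
            chars.next(); // consume the opening quote
            let mut phrase = String::new();
            loop {
                match chars.next() {
                    Some('"') => break,
                    Some(ch) => {
                        phrase.push(ch);
                        if phrase.len() > MAX_PHRASE_LEN {
                            return Err(format!("quoted phrase exceeds {} bytes", MAX_PHRASE_LEN));
                        }
                    }
                    None => return Err("unterminated quoted phrase".to_string()),
                }
            }
            tokens.push(Token::Phrase(phrase));
        } else {
            let mut bare = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() || ch == '"' {
                    break;
                }
                bare.push(ch);
                chars.next();
                if bare.len() > MAX_BARE_TOKEN_LEN {
                    return Err(format!("bare token exceeds {} bytes", MAX_BARE_TOKEN_LEN));
                }
            }
            tokens.push(Token::Bare(bare));
        }
    }
    Ok(tokens)
}
```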

## Observability

Prometheus metrics emitted from the parse path under the `quickwit_kql_*`
namespace:

| Metric | Type | Meaning |
|---|---|---|
| `quickwit_kql_parse_total` | counter | Every parse attempt that reaches `KqlQuery::parse_user_query` |
| `quickwit_kql_parse_failures_total` | counter | The subset of those attempts that returned an error |
| `quickwit_kql_parse_duration_seconds` | histogram | Wall-clock time from parse start to AST or error |

Every search log line carries the structured tracing fields `kql=true/false`
and `tantivy_grammar=true/false`. These let you split KQL from Lucene-style
traffic in Splunk or Elastic without parsing raw query strings.
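
The counting semantics in the metrics table can be sketched with std atomics: every attempt counts toward the total, failures are a subset, and the duration covers both outcomes. `KqlParseMetrics` is a hypothetical stand-in. The real module emits the Prometheus metrics listed above, and a histogram buckets each observation rather than keeping only a running sum.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Hypothetical stand-in for the three `quickwit_kql_*` metrics.
#[derive(Default)]
pub struct KqlParseMetrics {
    /// Like `quickwit_kql_parse_total`: every attempt.
    pub parse_total: AtomicU64,
    /// Like `quickwit_kql_parse_failures_total`: attempts that returned `Err`.
    pub parse_failures_total: AtomicU64,
    /// Running sum of parse durations in nanoseconds. A real histogram would
    /// also record which latency bucket each observation fell into.
    pub parse_duration_nanos_sum: AtomicU64,
}

impl KqlParseMetrics {
    /// Runs `parse` and records one attempt, the wall-clock time from start
    /// to AST-or-error, and a failure if the result is `Err`.
    pub fn observe<T, E>(&self, parse: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
        self.parse_total.fetch_add(1, Ordering::Relaxed);
        let start = Instant::now();
        let result = parse();
        let nanos = u64::try_from(start.elapsed().as_nanos()).unwrap_or(u64::MAX);
        self.parse_duration_nanos_sum.fetch_add(nanos, Ordering::Relaxed);
        if result.is_err() {
            self.parse_failures_total.fetch_add(1, Ordering::Relaxed);
        }
        result
    }
}
```

Because the attempt counter is incremented before parsing, a failed parse still counts toward `parse_total`. This is what makes the failures counter a subset of the total.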

## Architecture

KQL is translated eagerly at the REST entry point. There is **no new
variant on `QueryAst`**: the output is built from existing variants
(`BoolQuery`, `FullTextQuery`, `RangeQuery`, `FieldPresenceQuery`,
`WildcardQuery`, `MatchAll`, `UserInputQuery`). Bare default-field values
are wrapped in `UserInputQuery` so the existing search root resolves
them against each index's `default_search_fields`. This is the same
deferred-resolution mechanism the Tantivy-grammar `?query=` path already
uses.

```
  ┌──────────────────────────────────────────────────────────┐
  │  REST handler                                            │
  │  quickwit-serve/src/search_api/rest_handler.rs           │
  │  • SearchRequestQueryString { query, kql, ... }          │
  │  • build_query_ast(kql_text) → kql_to_query_ast(...)     │
  └────────────────────────────┬─────────────────────────────┘
                               │
                               ▼
  ┌──────────────────────────────────────────────────────────┐
  │  kql/                                ◀──── you are here  │
  │  lexer.rs   → Token stream, size caps                    │
  │  parser.rs  → KqlAst, depth cap                          │
  │  lower.rs   → KqlAst → existing QueryAst variants        │
  │               (Bool / FullText / Range / FieldPresence   │
  │               / Wildcard / MatchAll / UserInputQuery)    │
  │  metrics.rs → counters + histogram                       │
  └────────────────────────────┬─────────────────────────────┘
                               │ QueryAst (no new variant)
                               ▼
  ┌──────────────────────────────────────────────────────────┐
  │  Existing search pipeline (UNCHANGED)                    │
  │  quickwit-search/src/root.rs                             │
  │  • UserInputQuery nodes resolve via the existing         │
  │    deferred-default-field path                           │
  │  • All other variants flow through as-is                 │
  └──────────────────────────────────────────────────────────┘
```
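
The lowering step in the middle box can be sketched as a single `match`. `Kql` and `Lowered` are hypothetical, flattened stand-ins for the real `KqlAst` and `QueryAst` types, which carry more configuration than shown here. The mapping follows the variants named in this section: `*` goes to `MatchAll`, `field:*` to field presence, and bare values to `UserInputQuery` for deferred default-field resolution.

```rust
#[derive(Debug, PartialEq)]
pub enum RangeOp { Gt, Gte, Lt, Lte }

/// Simplified KQL AST (hypothetical; not the real `KqlAst`).
#[derive(Debug, PartialEq)]
pub enum Kql {
    MatchAll,                                            // `*`
    Bare(String),                                        // `refused`, `"connection refused"`
    FieldValue { field: String, value: String },         // `level:error`, `service:work*`, `level:*`
    Range { field: String, op: RangeOp, bound: String }, // `status:>=500`
    And(Vec<Kql>),
    Or(Vec<Kql>),
    Not(Box<Kql>),
}

/// Flattened stand-ins for the existing `QueryAst` variants (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Lowered {
    MatchAll,
    UserInput { text: String }, // resolved later against `default_search_fields`
    FullText { field: String, text: String },
    Wildcard { field: String, pattern: String },
    FieldPresence { field: String },
    Range { field: String, op: RangeOp, bound: String },
    Bool { must: Vec<Lowered>, should: Vec<Lowered>, must_not: Vec<Lowered> },
}

pub fn lower(ast: Kql) -> Lowered {
    match ast {
        Kql::MatchAll => Lowered::MatchAll,
        // No field named: defer field selection to the search root, per index.
        Kql::Bare(text) => Lowered::UserInput { text },
        Kql::FieldValue { field, value } if value == "*" => Lowered::FieldPresence { field },
        Kql::FieldValue { field, value } if value.contains('*') => {
            Lowered::Wildcard { field, pattern: value }
        }
        Kql::FieldValue { field, value } => Lowered::FullText { field, text: value },
        Kql::Range { field, op, bound } => Lowered::Range { field, op, bound },
        Kql::And(items) => Lowered::Bool {
            must: items.into_iter().map(lower).collect(),
            should: Vec::new(),
            must_not: Vec::new(),
        },
        Kql::Or(items) => Lowered::Bool {
            must: Vec::new(),
            should: items.into_iter().map(lower).collect(),
            must_not: Vec::new(),
        },
        // Pair the negation with MatchAll so the sketch does not depend on
        // how a negation-only bool is evaluated.
        Kql::Not(inner) => Lowered::Bool {
            must: vec![Lowered::MatchAll],
            should: Vec::new(),
            must_not: vec![lower(*inner)],
        },
    }
}
```

Bare values are the only case that does not name a field. That is why they are the only case routed through `UserInput`: the field choice is left to each index's `default_search_fields`.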

## Testing layers

| Layer | Where | What it proves |
|---|---|---|
| Unit | this crate's `#[cfg(test)]` blocks | Per-function correctness for the lexer, parser, lowering, and metrics wire-up |
| Conformance | [`kibana_conformance.rs`](kibana_conformance.rs) | Documented Kibana grammar idioms produce the expected `KqlAst` |
| Proptest fuzz | inside `parser.rs::tests::proptest_*` | The parser never panics on arbitrary ASCII or Unicode input (≈6k cases per run) |
| Integration | [`../../../rest-api-tests/scenarii/kql_search/`](../../../rest-api-tests/scenarii/kql_search/) | End-to-end through the HTTP stack against a real index, asserting exact `num_hits` per query |
| Load | [`../../../rest-api-tests/scenarii/kql_search/load_test.py`](../../../rest-api-tests/scenarii/kql_search/load_test.py) | Throughput, p50/p95/p99 latency, and safety-rail behavior under concurrency |
| Multi-node | [`../../../rest-api-tests/scenarii/kql_search/docker-compose.cluster.yml`](../../../rest-api-tests/scenarii/kql_search/docker-compose.cluster.yml) | Distributed root→leaf search with a PostgreSQL metastore and LocalStack S3 |

Run the integration scenarios:

```bash
cd quickwit/rest-api-tests
python3 run_tests.py --engine quickwit \
  --binary <path>/target/debug/quickwit \
  --test scenarii/kql_search
```

## Isolation audit: what this feature touches in the rest of the codebase

KQL is implemented as a **thin translation layer at the REST entry
point**, not as a new query AST node. The `QueryAst` enum, the visitor
traits, tag pruning, and root search are all unchanged.

### New files (pure isolation)

| Path | Purpose |
|---|---|
| `quickwit-query/src/kql/` (this directory) | Lexer, parser, AST, lowering, metrics, conformance corpus |
| `rest-api-tests/scenarii/kql_search/` | YAML scenarios, load test, multi-node compose |

### Existing files modified

| File | Change | Risk to non-KQL traffic |
|---|---|---|
| `quickwit-query/src/lib.rs` | `mod kql;` + `pub use kql::kql_to_query_ast` | None: adds a module and one public function |
| `quickwit-query/Cargo.toml` | Added `quickwit-metrics` dep | None: already a workspace member |
| `quickwit-serve/src/search_api/rest_handler.rs` | Added `kql` field to `SearchRequestQueryString`; new `build_query_ast` helper that calls `kql_to_query_ast`; structured log fields | **One wire-contract change**: `query` was required and is now `#[serde(default)]`. Requests with `{}` previously failed at JSON deserialization; they now fail at validation with HTTP 400 "either `query` or `kql` must be supplied". The OpenAPI schema correctly reports both fields as optional/nullable. |
| `quickwit-cli/src/tool.rs` | Added `kql: None` to one struct literal that didn't use `..Default::default()` | None |

### Files I did NOT touch

- `quickwit-query/src/query_ast/mod.rs`: **the core `QueryAst` enum is unchanged.** No new variant, no new match arms.
- `quickwit-query/src/query_ast/visitor.rs`: **the `QueryAstVisitor` and `QueryAstTransformer` traits are unchanged.** External visitors keep compiling without source changes.
- `quickwit-query/src/elastic_query_dsl/mod.rs`: **the ES query DSL enum is unchanged.** KQL is deliberately not exposed under `_elastic/` because real Elasticsearch has no `kql` variant.
- `quickwit-doc-mapper/src/tag_pruning.rs`: unchanged.
- `quickwit-search/src/root.rs`: unchanged.
- All ES DSL variants (`term`, `match`, `range`, `bool`, ...): unchanged.
- Indexing pipeline, metastore, storage, control plane, cluster, actors: unchanged.
- The Tantivy-grammar `UserInputQuery` lowering path: unchanged. KQL reuses it to carry deferred default-field resolution, with no code change to that path.
- On-disk data formats: zero impact.

### What "no effect on main code" actually means

- **End users of the existing `query=` parameter or of the ES DSL endpoints**: no behavior change.
- **Operators / SREs**: three new metrics under `quickwit_kql_*`, no removed metrics, no changes to existing dashboards.
- **Data at rest**: zero impact. KQL translates to the same `QueryAst` types Quickwit already executes.
- **Rust callers of `QueryAst`, `QueryAstVisitor`, `QueryAstTransformer`**: zero source-level breakage. These types are exactly as they were before this feature.
- **Rust callers of `SearchRequestQueryString`**: one new field (`kql: Option<String>`). Callers using `..Default::default()` keep working, and the one in-workspace site that listed every field (`quickwit-cli/src/tool.rs`) was updated.

This is the wrapper architecture: KQL is added without growing the core
query system's surface. The full-integration alternative (a new
`QueryAst::Kql` variant with deferred parsing at root) is also viable and
would have been more ergonomic for cross-index queries with differing
defaults. However, it required ~377 lines across 9 files, including
visitor trait extensions. The wrapper approach trades a little fidelity
(the multi-index error message stays the existing generic Tantivy-grammar
one) for a substantially smaller, more reviewable change.

## Performance reference

These numbers come from a 30-second load test against a debug build on a
single node (a MacBook). Treat them as a floor: a release build driven by a
dedicated load generator should go substantially higher.

- Sustained throughput: **~1,560 req/s** across 14 happy-path shapes and 5
  adversarial shapes
- p99 latency under load: **< 30 ms** for every happy-path shape
- p99 latency for adversarial rejects: **< 18 ms** (rejection happens
  before any search work)
- Parse-stage cost: **93% of parses < 100 µs**, 100% < 1 ms
- Errors during the ~47k-request sweep: **0 unexpected statuses**

## Known limitations (not yet implemented)

- A real Kibana frontend has not yet been pointed at this server. The
  grammar matches the public Kibana docs, and the conformance corpus pins
  each idiom. A standing Kibana → Quickwit smoke test is the next layer.
- The KQL parser is hand-rolled, while the Tantivy-grammar path uses
  `tantivy::query_grammar`. Two parsers mean two maintenance surfaces.
  Consolidating them, either upstream or behind a single grammar, is
  deferred.
- Authenticated and multi-tenant deployments are not exercised by these
  tests.