Commit c2b6403

Merge latest main into metrics migration
2 parents cd333ab + 7597552 commit c2b6403

33 files changed

Lines changed: 71 additions & 199 deletions

.github/CODEOWNERS

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# CODEOWNERS — see https://docs.github.com/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
+#
+# Last matching rule per file wins. Approval from any listed owner satisfies
+# the requirement. quickwit-dev is listed on every rule so it can always
+# approve; an additional team is listed on the metrics Parquet pipeline paths
+# so PRs scoped to those paths can be approved by either team.
+
+# Default: quickwit-core owns everything
+* @quickwit-oss/quickwit-core
+
+# byoc-metrics paths — owned by byoc-metrics
+/quickwit/quickwit-parquet-engine/ @quickwit-oss/byoc-metrics
+/quickwit/quickwit-datafusion/ @quickwit-oss/byoc-metrics
+/quickwit/quickwit-df-core/ @quickwit-oss/byoc-metrics
+/quickwit/quickwit-dst/ @quickwit-oss/byoc-metrics
+/quickwit/quickwit-indexing/src/actors/metrics_pipeline/ @quickwit-oss/byoc-metrics
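
The "last matching rule per file wins" behavior described in the CODEOWNERS header can be sketched roughly as follows. This is only an illustration: `Rule`, `matches`, and `owner_for` are hypothetical names, and the matcher handles just the two pattern shapes used in this file (`*` and root-anchored directories), not GitHub's full gitignore-style pattern syntax.

```rust
/// A single CODEOWNERS rule: a pattern and the owner listed for it.
/// (Hypothetical type for illustration only.)
struct Rule {
    pattern: &'static str,
    owner: &'static str,
}

/// Simplified matcher (an assumption, not GitHub's implementation):
/// `*` matches every path, and a pattern of the form `/dir/` matches any
/// path under that directory, relative to the repository root.
fn matches(pattern: &str, path: &str) -> bool {
    if pattern == "*" {
        return true;
    }
    match pattern.strip_prefix('/') {
        Some(dir) if dir.ends_with('/') => path.starts_with(dir),
        _ => false,
    }
}

/// Returns the owner of the last rule that matches `path`, scanning the
/// rules bottom-up so that later rules override earlier ones.
fn owner_for<'a>(rules: &'a [Rule], path: &str) -> Option<&'a str> {
    rules
        .iter()
        .rev()
        .find(|rule| matches(rule.pattern, path))
        .map(|rule| rule.owner)
}

fn main() {
    let rules = [
        Rule { pattern: "*", owner: "@quickwit-oss/quickwit-core" },
        Rule {
            pattern: "/quickwit/quickwit-parquet-engine/",
            owner: "@quickwit-oss/byoc-metrics",
        },
    ];
    // The more specific rule is listed last, so it wins for paths under it.
    println!("{:?}", owner_for(&rules, "quickwit/quickwit-parquet-engine/src/lib.rs"));
    // Everything else falls back to the default `*` rule.
    println!("{:?}", owner_for(&rules, "README.md"));
}
```

Under this model, a path beneath one of the listed directories resolves only to the owner on that later rule; the default owner applies to such a path only if it is repeated on the more specific line.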

LICENSE-3rdparty.csv

Lines changed: 0 additions & 4 deletions
@@ -284,7 +284,6 @@ embedded-io,https://github.com/rust-embedded/embedded-hal,MIT OR Apache-2.0,The
 ena,https://github.com/rust-lang/ena,MIT OR Apache-2.0,Niko Matsakis <niko@alum.mit.edu>
 encode_unicode,https://github.com/tormol/encode_unicode,Apache-2.0 OR MIT,Torbjørn Birch Moltu <t.b.moltu@lyse.net>
 encoding_rs,https://github.com/hsivonen/encoding_rs,(Apache-2.0 OR MIT) AND BSD-3-Clause,Henri Sivonen <hsivonen@hsivonen.fi>
-endian-type,https://github.com/Lolirofle/endian-type,MIT,Lolirofle <lolipopple@hotmail.com>
 enum-iterator,https://github.com/stephaneyfx/enum-iterator,0BSD,Stephane Raux <stephaneyfx@gmail.com>
 enum-iterator-derive,https://github.com/stephaneyfx/enum-iterator,0BSD,Stephane Raux <stephaneyfx@gmail.com>
 env_filter,https://github.com/rust-cli/env_logger,MIT OR Apache-2.0,The env_filter Authors
@@ -469,7 +468,6 @@ measure_time,https://github.com/PSeitz/rust_measure_time,MIT,Pascal Seitz <pasca
 memchr,https://github.com/BurntSushi/memchr,Unlicense OR MIT,"Andrew Gallant <jamslam@gmail.com>, bluss"
 memmap2,https://github.com/RazrFalcon/memmap2-rs,MIT OR Apache-2.0,"Dan Burkert <dan@danburkert.com>, Yevhenii Reizner <razrfalcon@gmail.com>, The Contributors"
 metrics,https://github.com/metrics-rs/metrics,MIT,Toby Lawrence <toby@nuclearfurnace.com>
-metrics-exporter-dogstatsd,https://github.com/metrics-rs/metrics,MIT,Toby Lawrence <toby@nuclearfurnace.com>
 metrics-exporter-otel,https://github.com/palindrom615/metrics,MIT,Whoemoon Jang <palindrom615@gmail.com>
 metrics-exporter-prometheus,https://github.com/metrics-rs/metrics,MIT AND Apache-2.0,Toby Lawrence <toby@nuclearfurnace.com>
 metrics-util,https://github.com/metrics-rs/metrics,MIT,Toby Lawrence <toby@nuclearfurnace.com>
@@ -491,7 +489,6 @@ murmurhash32,https://github.com/quickwit-inc/murmurhash32,MIT,Paul Masurel <paul
 native-tls,https://github.com/rust-native-tls/rust-native-tls,MIT OR Apache-2.0,Steven Fackler <sfackler@gmail.com>
 new_debug_unreachable,https://github.com/mbrubeck/rust-debug-unreachable,MIT,"Matt Brubeck <mbrubeck@limpet.net>, Jonathan Reem <jonathan.reem@gmail.com>"
 new_string_template,https://github.com/hasezoey/new_string_template,MIT,hasezoey <hasezoey@gmail.com>
-nibble_vec,https://github.com/michaelsproul/rust_nibble_vec,MIT,Michael Sproul <micsproul@gmail.com>
 nix,https://github.com/nix-rust/nix,MIT,The nix-rust Project Developers
 no-std-net,https://github.com/dunmatt/no-std-net,MIT,M@ Dunlap <mattdunlap@gmail.com>
 nohash-hasher,https://github.com/paritytech/nohash-hasher,Apache-2.0 OR MIT,Parity Technologies <admin@parity.io>
@@ -653,7 +650,6 @@ quinn-udp,https://github.com/quinn-rs/quinn,MIT OR Apache-2.0,The quinn-udp Auth
 quote,https://github.com/dtolnay/quote,MIT OR Apache-2.0,David Tolnay <dtolnay@gmail.com>
 quoted_printable,https://github.com/staktrace/quoted-printable,0BSD,Kartikaya Gupta <kats@seldon.staktrace.com>
 r-efi,https://github.com/r-efi/r-efi,MIT OR Apache-2.0 OR LGPL-2.1-or-later,The r-efi Authors
-radix_trie,https://github.com/michaelsproul/rust_radix_trie,MIT,Michael Sproul <micsproul@gmail.com>
 rand,https://github.com/rust-random/rand,MIT OR Apache-2.0,"The Rand Project Developers, The Rust Project Developers"
 rand_chacha,https://github.com/rust-random/rand,MIT OR Apache-2.0,"The Rand Project Developers, The Rust Project Developers, The CryptoCorrosion Contributors"
 rand_core,https://github.com/rust-random/rand,MIT OR Apache-2.0,"The Rand Project Developers, The Rust Project Developers"

docs/internals/UPSTREAM-CANDIDATES.md

Lines changed: 0 additions & 30 deletions
This file was deleted.

docs/internals/adr/001-parquet-data-model.md

Lines changed: 5 additions & 7 deletions
@@ -93,11 +93,9 @@ This is a **schema-on-read** approach: the storage layer stores data in whatever
 
 **Transition.** The current OTel map-based ingestion format is the starting point. The indexing pipeline can extract attributes into columns at write time, presenting the original OTel map interface at the API boundary while storing columnar data internally. This is transparent to ingest clients — they continue sending OTel-format data. Queries can access attributes either by the original map path (for compatibility) or by direct column access (for performance). The storage representation is an internal optimization, not a change to the external data model.
 
-### 6. RLE/Dictionary Encoding and the Flurry Project
+### 6. RLE/Dictionary Encoding Preservation
 
-The point-per-row model's performance depends on columnar encodings being preserved through the query pipeline. Currently, RLE and dictionary encoding are decoded to plain arrays early in DataFusion's execution. There is significant ongoing investment in **Flurry** (the metrics equivalent of Bolt) to preserve these encodings through more operators.
-
-As Flurry matures, the performance benefits of sorted point-per-row data increase: longer runs in sorted columns translate directly to better RLE compression ratios that are maintained through query execution. This makes point-per-row a bet that improves over time rather than a static trade-off.
+The point-per-row model's performance depends on columnar encodings being preserved through the query pipeline. Currently, RLE and dictionary encoding are decoded to plain arrays early in DataFusion's execution. As DataFusion grows operator-level support for these encodings, the performance benefits of sorted point-per-row data increase: longer runs in sorted columns translate directly to better RLE compression ratios that are maintained through query execution. This makes point-per-row a bet that improves over time rather than a static trade-off.
 
 ## Invariants
 
@@ -123,14 +121,14 @@ These invariants must hold across all code paths (ingestion, compaction, query).
 
 ### Negative
 
-- **Tag redundancy.** Every row for the same timeseries repeats all tag values. In timeseries-per-row, tags are stored once per series. With good columnar encoding on sorted data, this redundancy compresses away, but it is still present in the uncompressed representation and affects memory usage during query execution until Flurry-style encoding preservation is complete.
+- **Tag redundancy.** Every row for the same timeseries repeats all tag values. In timeseries-per-row, tags are stored once per series. With good columnar encoding on sorted data, this redundancy compresses away, but it is still present in the uncompressed representation and affects memory usage during query execution until DataFusion preserves dictionary/RLE encoding through more operators.
 - **OTel map attributes defeat columnar benefits.** The current OTel ingest schema stores attributes as key-value maps. Until schema-on-read column extraction is implemented, attributes cannot participate in sorting, page-level pruning, or efficient columnar compression. This is the most significant near-term limitation of the data model.
 - **No intra-series locality guarantee.** Without `timeseries_id` in the sort schema, points from the same series may be interleaved with points from other series that share the same sort-column values. This is a configuration choice, not an inherent limitation.
 - **Duplicate points are stored.** Without LWW or per-point dedup, retried ingestion or overlapping sources can produce duplicate points. Existing batch-level dedup (WAL checkpoints, file-level tracking) prevents most duplicates, but cross-request duplicates are possible. See [GAP-005](./gaps/005-no-per-point-deduplication.md).
 
 ### Risks
 
-- **Flurry dependency for performance parity.** Until RLE/dictionary encoding is preserved through DataFusion, point-per-row may scan more data than timeseries-per-row for series-centric queries (e.g., "plot CPU for host X"). The magnitude depends on the encoding preservation timeline.
+- **Encoding-preservation dependency for performance parity.** Until RLE/dictionary encoding is preserved through DataFusion, point-per-row may scan more data than timeseries-per-row for series-centric queries (e.g., "plot CPU for host X"). The magnitude depends on the encoding-preservation timeline.
 - **Wide tables (future research).** Metrics from the same source share nearly identical tags. Multiple metric names could be stored as separate value columns in a single wide row (e.g., `k8s.cpu.usage`, `k8s.cpu.limit`, `k8s.mem.usage` as columns sharing one tag set). This is the approach taken by TimescaleDB's hypertables. It would amortize tag storage further but requires significant compactor changes. Worth investigating as future research; it is compatible with point-per-row as an evolution, not a replacement.
 
 ## Signal Generalization
@@ -147,7 +145,7 @@ The no-LWW and no-storage-interpolation decisions are universal across signals.
 | Date | Decision | Rationale |
 |------|----------|-----------|
 | 2026-02-19 | Initial ADR created | Establish foundational data model for Parquet metrics pipeline |
-| 2026-02-19 | Point-per-row chosen over timeseries-per-row | Simpler compaction, no LWW, standard DataFusion operators. Performance parity via columnar encoding + Flurry |
+| 2026-02-19 | Point-per-row chosen over timeseries-per-row | Simpler compaction, no LWW, standard DataFusion operators. Performance parity via columnar encoding and dictionary/RLE preservation through more operators |
 | 2026-02-19 | No LWW semantics | Eliminates sticky routing and series-level dedup. Simplifies ingestion and compaction |
 | 2026-02-19 | Dedup clarified: batch-level exists, per-point does not | WAL checkpoints provide exactly-once at the batch level. File-level dedup for queue sources. Per-point dedup not implemented; identified as GAP-005 if needed |
 | 2026-02-19 | timeseries_id defined as optional synthetic column | Provides intra-group locality tiebreaker without adding complexity to the core data model |
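
The ADR's claim that "longer runs in sorted columns translate directly to better RLE compression ratios" can be illustrated with a toy run-length encoder. This is a minimal sketch under stated assumptions: the `rle` helper and the `web-1`/`web-2` host values are made up for the example, and the code is not how Parquet or DataFusion encode data internally.

```rust
/// Run-length encodes a column into (value, run_length) pairs.
/// Toy encoder for illustration; not Parquet's RLE/bit-packing hybrid.
fn rle<T: PartialEq + Clone>(column: &[T]) -> Vec<(T, usize)> {
    let mut runs: Vec<(T, usize)> = Vec::new();
    for value in column {
        match runs.last_mut() {
            // Same value as the current run: extend it.
            Some((last, count)) if *last == *value => *count += 1,
            // Different value (or first element): start a new run.
            _ => runs.push((value.clone(), 1)),
        }
    }
    runs
}

fn main() {
    // A tag column as it might arrive at ingestion: series interleaved.
    let unsorted = ["web-1", "web-2", "web-1", "web-2", "web-1", "web-2"];

    // The same column after sorting rows by this tag.
    let mut sorted = unsorted;
    sorted.sort();

    println!("unsorted runs: {}", rle(&unsorted).len()); // prints 6
    println!("sorted runs:   {}", rle(&sorted).len()); // prints 2
}
```

Sorting turns six single-element runs into two runs of three. The benefit discussed in the ADR is keeping that compact form through query operators instead of decoding it to plain arrays early.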

docs/internals/adr/002-sort-schema-parquet-splits.md

Lines changed: 3 additions & 4 deletions
@@ -20,7 +20,7 @@ Sorting rows within each split by a schema aligned with common query predicates
 1. **Compression improvement.** Columnar formats like Parquet compress data by encoding runs of similar values. When rows are sorted by metric name and tags, the columns for those fields contain long runs of identical or similar values, benefiting RLE, dictionary encoding, and general-purpose compression (ZSTD). In Husky Phase 1, this yielded ~33% size reduction for APM data and ~25% for Logs data.
 2. **Query efficiency.** Parquet's column index (format v2) stores min/max statistics per page within each column chunk. When data is sorted, pages within each column naturally have non-overlapping value ranges for the sort columns. DataFusion supports page index pruning, allowing it to skip pages that cannot match a query predicate.
 
-Matthew Kim's implementation added a fixed sort on `(MetricName, TagService, TagEnv, TagDatacenter, TagRegion, TagHost, TimestampSecs)` in the Parquet writer (`quickwit-parquet-engine/src/storage/writer.rs`), demonstrating that sorting is feasible and inexpensive. However, this sort order is hardcoded in `ParquetField::sort_order()` and cannot be customized per index or deployment. Different workloads have different high-value columns; a metrics index tracking Kubernetes containers benefits from sorting by `pod` and `namespace`, while an infrastructure metrics index benefits from `host` and `datacenter`.
+An initial implementation added a fixed sort on `(MetricName, TagService, TagEnv, TagDatacenter, TagRegion, TagHost, TimestampSecs)` in the Parquet writer (`quickwit-parquet-engine/src/storage/writer.rs`), demonstrating that sorting is feasible and inexpensive. However, this sort order is hardcoded in `ParquetField::sort_order()` and cannot be customized per index or deployment. Different workloads have different high-value columns; a metrics index tracking Kubernetes containers benefits from sorting by `pod` and `namespace`, while an infrastructure metrics index benefits from `host` and `datacenter`.
 
 This ADR formalizes the sort schema as a configurable, per-index property stored in the metastore.
 
@@ -169,7 +169,7 @@ Phase 4 of the locality compaction roadmap extends sorting to the Tantivy pipeli
 
 | Component | Location | Status |
 |-----------|----------|--------|
-| Fixed sort at ingestion | `quickwit-parquet-engine/src/storage/writer.rs` | Done (Matthew Kim). Replaced by configurable sort in PR #6287 |
+| Fixed sort at ingestion | `quickwit-parquet-engine/src/storage/writer.rs` | Done. Replaced by configurable sort in PR #6287 |
 | Configurable sort schema | `quickwit-parquet-engine/src/table_config.rs` | Done (PR #6287). `TableConfig` with `effective_sort_fields()` override; `ParquetWriter` resolves sort fields dynamically |
 | Sort schema parser | `quickwit-parquet-engine/src/sort_fields/parser.rs` | Done (PR #6290). Parses `column\|...\|&metadata\|timestamp/V2` with directions, LSM cutoff, version |
 | Per-column sort direction | `sort_fields/parser.rs` + `storage/writer.rs` | Done (PR #6290 + #6287). Parser extracts `+`/`-` suffix; writer respects `descending` flag |
@@ -197,5 +197,4 @@ Phase 4 of the locality compaction roadmap extends sorting to the Tantivy pipeli
 - [Compaction Architecture](../compaction-architecture.md) — current compaction system description
 - [ADR-001: Parquet Data Model](./001-parquet-data-model.md) — point-per-row data model and timeseries_id
 - [ADR-003: Time-Windowed Sorted Compaction](./003-time-windowed-sorted-compaction.md) — compaction that depends on sort schema
-- [Husky Phase 1: Locality of Reference](https://docs.google.com/document/d/1x9BO1muCTo1TmfhPYBdIxZ-59aU0ECSiEaGPUcDZkPs/edit) — prior art
-- [Husky Storage Compaction Blog Post](https://www.datadoghq.com/blog/engineering/husky-storage-compaction/)
+- [Husky Storage Compaction Blog Post](https://www.datadoghq.com/blog/engineering/husky-storage-compaction/) — prior art
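
The move from a hardcoded sort order to a configurable list of sort fields with per-column direction, as described in ADR-002, might look roughly like the sketch below. All names here (`SortField`, `Row`, `compare_rows`, `sort_rows`, `row`) are hypothetical and chosen for illustration. This is not the actual `TableConfig` or `ParquetWriter` API, and rows are modeled as string maps rather than Arrow record batches.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Hypothetical sort field: a column name plus a direction flag.
struct SortField {
    column: &'static str,
    descending: bool,
}

/// A row modeled as column name -> string value (for brevity only).
type Row = BTreeMap<&'static str, &'static str>;

/// Compares two rows field by field, in the configured order. A missing
/// column compares as less than any present value (an assumption of this
/// sketch, not a statement about Quickwit's null ordering).
fn compare_rows(a: &Row, b: &Row, fields: &[SortField]) -> Ordering {
    for field in fields {
        let ord = a.get(field.column).cmp(&b.get(field.column));
        let ord = if field.descending { ord.reverse() } else { ord };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

/// Sorts rows in place according to the configured sort fields.
fn sort_rows(rows: &mut [Row], fields: &[SortField]) {
    rows.sort_by(|a, b| compare_rows(a, b, fields));
}

/// Small helper to build a two-column row.
fn row(metric: &'static str, host: &'static str) -> Row {
    BTreeMap::from([("metric_name", metric), ("host", host)])
}

fn main() {
    // The sort order is data, not code: a different index could pass
    // `pod` and `namespace` here instead of `host`.
    let fields = [
        SortField { column: "metric_name", descending: false },
        SortField { column: "host", descending: true },
    ];
    let mut rows = vec![row("mem", "a"), row("cpu", "a"), row("cpu", "b")];
    sort_rows(&mut rows, &fields);
    for r in &rows {
        println!("{} {}", r["metric_name"], r["host"]);
    }
}
```

Because the comparator is driven by a runtime list of fields, the same writer code path can serve indexes with different high-value columns, which is the flexibility the hardcoded `ParquetField::sort_order()` lacked.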

docs/internals/adr/003-time-windowed-sorted-compaction.md

Lines changed: 0 additions & 1 deletion
@@ -302,4 +302,3 @@ Phase 4 of the locality compaction roadmap extends time-windowed sorted compacti
 - [StableLogMergePolicy](../../quickwit/quickwit-indexing/src/merge_policy/stable_log_merge_policy.rs) — existing merge policy
 - [Merge Planner](../../quickwit/quickwit-indexing/src/actors/merge_planner.rs) — existing merge planner (Tantivy)
 - [Husky Storage Compaction Blog Post](https://www.datadoghq.com/blog/engineering/husky-storage-compaction/)
-- [Husky Phase 2: Locality of Reference](https://docs.google.com/document/d/1vax-vv0wbhfddo4n5obhlVJxsmUa9N_62tKs5ZmYC6k/edit)

docs/internals/adr/README.md

Lines changed: 1 addition & 2 deletions
@@ -24,7 +24,6 @@ ADRs will be created here as we implement new systems. Start with the metrics pi
 | [001](./001-parquet-data-model.md) | Parquet Metrics Data Model | Proposed | `storage`, `metrics`, `parquet`, `data-model` | quickwit-parquet-engine |
 | [002](./002-sort-schema-parquet-splits.md) | Configurable Sort Schema for Parquet Splits | Proposed | `storage`, `metrics`, `compaction`, `parquet`, `sorting` | quickwit-parquet-engine, quickwit-indexing |
 | [003](./003-time-windowed-sorted-compaction.md) | Time-Windowed Sorted Compaction for Parquet | Proposed | `storage`, `metrics`, `compaction`, `parquet`, `time-windowing` | quickwit-parquet-engine, quickwit-indexing, quickwit-metastore |
-| [004](./004-cloud-native-storage-characteristics.md) | Cloud-Native Storage Characteristics | Proposed | `architecture`, `storage`, `cloud-native`, `observability` | all |
 
 ## Supplements & Roadmaps
 
@@ -49,7 +48,7 @@ Quickwit tracks architectural change through three lenses. See **[EVOLUTION.md](
 
 ### Characteristics (What we need)
 
-Product requirements and capabilities we must have. See [ADR-004](./004-cloud-native-storage-characteristics.md) for the full characteristic status matrix.
+Product requirements and capabilities we must have.
 
 ### Gaps (What we learned)
 

docs/internals/adr/gaps/002-fixed-sort-schema.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 **Status**: Partially resolved
 **Discovered**: 2026-02-19
-**Context**: Codebase analysis during Phase 1 locality compaction design. Sort implementation by Matthew Kim provides the foundation but is not configurable.
+**Context**: Codebase analysis during Phase 1 locality compaction design. The initial sort implementation provides the foundation but is not configurable.
 **Resolution**: PRs #6287#6292 replaced the hardcoded sort with a configurable `TableConfig` + sort schema parser. Remaining: per-index metastore storage, pipeline propagation, null ordering fix.
 
 ## Problem

docs/internals/adr/gaps/006-no-independent-auto-scaling.md

Lines changed: 1 addition & 5 deletions
@@ -2,7 +2,7 @@
 
 **Status**: Open
 **Discovered**: 2026-02-19
-**Context**: Cloud-native storage characteristics analysis ([ADR-004](../004-cloud-native-storage-characteristics.md), characteristics C1, C17)
+**Context**: Cloud-native storage characteristics analysis (independent scaling, burst handling)
 
 ## Problem
 
@@ -45,7 +45,3 @@ All signals equally affected. Independent scaling is signal-agnostic.
 - [ ] Evaluate separating the merge pipeline into a standalone compactor service
 - [ ] Design auto-scaling policies for each workload type (ingest QPS, query QPS, file backlog)
 - [ ] Investigate burst handling for ingest (overflow buffer, backpressure, burst lane)
-
-## References
-
-- [ADR-004: Cloud-Native Storage Characteristics](../004-cloud-native-storage-characteristics.md)
