Commit 777a338
feat(31): shared invariants module in quickwit-dst (#6246)
* feat: replace fixed MetricDataPoint fields with dynamic tag HashMap
* feat: replace ParquetField enum with constants and dynamic validation
* feat: derive sort order and bloom filters from batch schema
* feat: union schema accumulation and schema-agnostic ingest validation
* feat: dynamic column lookup in split writer
* feat: remove ParquetSchema dependency from indexing actors
* refactor: deduplicate test batch helpers
* lint
* feat(31): sort schema foundation — proto, parser, display, validation, window, TableConfig
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: rustdoc link errors — use backticks for private items
* feat(31): compaction metadata types — extend split metadata, postgres model, field lookup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(31): wire TableConfig into sort path, add compaction KV metadata
Wire TableConfig-driven sort order into ParquetWriter and add
self-describing Parquet file metadata for compaction:
- ParquetWriter::new() takes &TableConfig, resolves sort fields at
construction via parse_sort_fields() + ParquetField::from_name()
- sort_batch() uses resolved fields with per-column direction (ASC/DESC)
- SS-1 debug_assert verification: re-sort and check identity permutation
- build_compaction_key_value_metadata(): embeds sort_fields, window_start,
window_duration, num_merge_ops, row_keys (base64) in Parquet kv_metadata
- SS-5 verify_ss5_kv_consistency(): kv_metadata matches source struct
- write_to_file_with_metadata() replaces write_to_file()
- prepare_write() shared method for bytes and file paths
- ParquetWriterConfig gains to_writer_properties_with_metadata()
- ParquetSplitWriter passes TableConfig through
- All callers in quickwit-indexing updated with TableConfig::default()
- 23 storage tests pass including META-07 self-describing roundtrip
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
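The SS-1 check described above (re-sort and confirm the identity permutation) can be sketched roughly as follows. This is a minimal illustration over plain `i64` rows, not the ParquetWriter code: `Direction`, `compare_rows`, `sort_permutation` and `is_sorted` are hypothetical names, and the real implementation operates on Arrow batches.

```rust
use std::cmp::Ordering;

/// Per-column sort direction (hypothetical stand-in for the ASC/DESC flag).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Direction {
    Asc,
    Desc,
}

/// Compare two rows column by column, honoring each column's direction.
pub fn compare_rows(a: &[i64], b: &[i64], dirs: &[Direction]) -> Ordering {
    for ((x, y), dir) in a.iter().zip(b.iter()).zip(dirs.iter()) {
        let ord = match dir {
            Direction::Asc => x.cmp(y),
            Direction::Desc => y.cmp(x),
        };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

/// Return the permutation of row indices that sorts `rows`.
/// `sort_by` is stable, so rows that compare equal keep their order.
pub fn sort_permutation(rows: &[Vec<i64>], dirs: &[Direction]) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..rows.len()).collect();
    indices.sort_by(|&i, &j| compare_rows(&rows[i], &rows[j], dirs));
    indices
}

/// SS-1 style check: already-sorted input re-sorts to the identity permutation.
pub fn is_sorted(rows: &[Vec<i64>], dirs: &[Direction]) -> bool {
    sort_permutation(rows, dirs)
        .iter()
        .enumerate()
        .all(|(position, &index)| position == index)
}
```

Because the sort is stable, ties never move, so a correctly sorted batch always yields the identity permutation and the check does not produce false violations.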
* feat(31): PostgreSQL migration 27 + compaction columns in stage/list/publish
Add compaction metadata to the PostgreSQL metastore:
Migration 27:
- 6 new columns: window_start, window_duration_secs, sort_fields,
num_merge_ops, row_keys, zonemap_regexes
- Partial index idx_metrics_splits_compaction_scope on
(index_uid, sort_fields, window_start) WHERE split_state = 'Published'
stage_metrics_splits:
- INSERT extended from 15 to 21 bind parameters for compaction columns
- ON CONFLICT SET updates all compaction columns
list_metrics_splits:
- PgMetricsSplit construction includes compaction fields (defaults from JSON)
Also fixes pre-existing compilation errors on upstream-10b-parquet-actors:
- Missing StageMetricsSplitsRequestExt import
- index_id vs index_uid type mismatches in publish/mark/delete
- IndexUid binding (to_string() for sqlx)
- ListMetricsSplitsResponseExt trait disambiguation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(31): close port gaps — split_writer metadata, compaction scope, publish validation
Close critical gaps identified during port review:
split_writer.rs:
- Store table_config on ParquetSplitWriter (not just pass-through)
- Compute window_start from batch time range using table_config.window_duration_secs
- Populate sort_fields, window_duration_secs, parquet_files on metadata before write
- Call write_to_file_with_metadata(Some(&metadata)) to embed KV metadata in Parquet
- Update size_bytes after write completes
metastore/mod.rs:
- Add window_start and sort_fields fields to ListMetricsSplitsQuery
- Add with_compaction_scope() builder method
metastore/postgres/metastore.rs:
- Add compaction scope filters (AND window_start = $N, AND sort_fields = $N) to list query
- Add replaced_split_ids count verification in publish_metrics_splits
- Bind compaction scope query parameters
ingest/config.rs:
- Add table_config: TableConfig field to ParquetIngestConfig
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
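Computing window_start from a batch timestamp and window_duration_secs presumably amounts to aligning the timestamp down to a window boundary. A minimal sketch under that assumption follows; `align_to_window` is a hypothetical name, and the rule it uses for rejecting durations (non-positive only) is an assumption, not the validation in this commit.

```rust
/// Align `timestamp_secs` down to the start of its window.
/// Returns `None` for a non-positive duration (assumed invalid here).
pub fn align_to_window(timestamp_secs: i64, window_duration_secs: i64) -> Option<i64> {
    if window_duration_secs <= 0 {
        return None;
    }
    // rem_euclid keeps the remainder non-negative, so timestamps before the
    // epoch also align downward rather than toward zero.
    Some(timestamp_secs - timestamp_secs.rem_euclid(window_duration_secs))
}
```

With a 3600 second window, a timestamp of 3725 falls in the window starting at 3600.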
* fix(31): final gap fixes — file-backed scope filter, META-07 test, dead code removal
- file_backed_index/mod.rs: Add window_start and sort_fields filtering
to metrics_split_matches_query() for compaction scope queries
- writer.rs: Add test_meta07_self_describing_parquet_roundtrip test
(writes compaction metadata to Parquet, reads back from cold file,
verifies all fields roundtrip correctly)
- fields.rs: Remove dead sort_order() method (replaced by TableConfig)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(31): correct postgres types for window_duration_secs and zonemap_regexes
Gap 1: Change window_duration_secs from i32 to Option<i32> in both
PgMetricsSplit and InsertableMetricsSplit. Pre-Phase-31 splits now
correctly map 0 → NULL in PostgreSQL, enabling Phase 32 compaction
queries to use `WHERE window_duration_secs IS NOT NULL` instead of
the fragile `WHERE window_duration_secs > 0`.
Gap 2: Change zonemap_regexes from String to serde_json::Value in
both structs. This maps directly to JSONB in sqlx, avoiding ambiguity
when PostgreSQL JSONB operators are used in Phase 34/35 zonemap pruning.
Gap 3: Add two missing tests:
- test_insertable_from_metadata_with_compaction_fields: verifies all 6
compaction fields round-trip through InsertableMetricsSplit
- test_insertable_from_metadata_pre_phase31_defaults: verifies pre-Phase-31
metadata produces window_duration_secs: None, zonemap_regexes: json!({})
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: rustfmt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(31): add metrics split test suite to shared metastore_test_suite! macro
11 tests covering the full metrics split lifecycle:
- stage (happy path + non-existent index error)
- stage upsert (ON CONFLICT update)
- list by state, time range, metric name, compaction scope
- publish (happy path + non-existent split error)
- mark for deletion
- delete (happy path + idempotent non-existent)
Tests are generic and run against both file-backed and PostgreSQL backends.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(31): read compaction columns in list_metrics_splits, fix cleanup_index FK
* fix(31): correct error types for non-existent metrics splits
- publish_metrics_splits: return NotFound (not FailedPrecondition) when
staged splits don't exist
- delete_metrics_splits: succeed silently (idempotent) for non-existent
splits instead of returning FailedPrecondition
- Tests now assert the correct error types on both backends
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: rustfmt metastore tests and postgres
* fix(31): address PR review — align metrics_splits with splits table
- Migration 27: add maturity_timestamp, delete_opstamp, node_id columns
and publish_timestamp trigger to match the splits table (Paul's review)
- ListMetricsSplitsQuery: adopt FilterRange<i64> for time_range (matching
log-side pattern), single time_range field for both read and compaction
paths, add node_id/delete_opstamp/update_timestamp/create_timestamp/
mature filters to close gaps with ListSplitsQuery
- Use SplitState enum instead of stringly-typed Vec<String> for split_states
- StoredMetricsSplit: add create_timestamp, node_id, delete_opstamp,
maturity_timestamp so file-backed metastore can filter on them locally
- File-backed filter: use FilterRange::overlaps_with() for time range and
window intersection, apply all new filters matching log-side predicate
- Postgres: intersection semantics for window queries, FilterRange-based
SQL generation for all range filters
- Fix InsertableMetricsSplit.window_duration_secs from Option<i32> to i32
- Rename two-letter variables (ws, sf, dt) throughout
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
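The intersection semantics for window queries can be illustrated with a simplified range type. This is a sketch only: `TimeFilter` and `window_overlaps` are hypothetical, and the half-open bounds are an assumption. The actual code uses `FilterRange<i64>` and `FilterRange::overlaps_with()`, whose exact signatures are not shown in this commit message.

```rust
/// Simplified time filter: inclusive start, exclusive end, either bound optional.
pub struct TimeFilter {
    pub start: Option<i64>,
    pub end: Option<i64>,
}

/// A window [window_start, window_start + duration) matches when it
/// intersects the filter range, not only when it is fully contained in it.
pub fn window_overlaps(filter: &TimeFilter, window_start: i64, window_duration_secs: i64) -> bool {
    let window_end = window_start + window_duration_secs;
    let starts_before_filter_end = filter.end.map_or(true, |end| window_start < end);
    let ends_after_filter_start = filter.start.map_or(true, |start| window_end > start);
    starts_before_filter_end && ends_after_filter_start
}
```

A window that only partially covers the queried range still matches, which is what a compaction planner needs when it collects every split touching a time span.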
* style: fix rustfmt nightly formatting
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(31): add shared invariants module to quickwit-dst
Extract duplicated invariant logic into a shared `invariants/` module
within `quickwit-dst`. This is the "single source of truth" layer in
the verification pyramid — used by stateright models, production
debug_assert checks, and (future) Datadog metrics emission.
Key changes:
- `invariants/registry.rs`: InvariantId enum (20 variants) with Display
- `invariants/window.rs`: shared window_start_secs(), is_valid_window_duration()
- `invariants/sort.rs`: generic compare_with_null_ordering() for SS-2
- `invariants/check.rs`: check_invariant! macro wrapping debug_assert
- stateright gated behind `model-checking` feature (optional dep)
- quickwit-parquet-engine uses shared functions and check_invariant!
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
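A generic null-aware comparison of the kind `invariants/sort.rs` provides might look like the sketch below. The name `compare_nullable`, the `Option`-based signature and the `nulls_first` flag are assumptions for illustration. The commit message does not state the null placement that SS-2 requires or the signature of compare_with_null_ordering().

```rust
use std::cmp::Ordering;

/// Compare two optional values, placing nulls (None) first or last.
pub fn compare_nullable<T: Ord>(a: Option<&T>, b: Option<&T>, nulls_first: bool) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        // A null on the left sorts before a value only when nulls go first.
        (None, Some(_)) => {
            if nulls_first {
                Ordering::Less
            } else {
                Ordering::Greater
            }
        }
        (Some(_), None) => {
            if nulls_first {
                Ordering::Greater
            } else {
                Ordering::Less
            }
        }
        (Some(x), Some(y)) => x.cmp(y),
    }
}
```

Keeping this in one shared function means the model, the debug checks and any future metrics path all apply the same ordering rule.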
* feat(31): check invariants in release builds, add pluggable recorder
The check_invariant! macro now always evaluates the condition — not just
in debug builds. This implements Layer 4 (Production) of the verification
stack: invariant checks run in release, with results forwarded to a
pluggable InvariantRecorder for Datadog metrics emission.
- Debug builds: panic on violation (debug_assert, Layer 3)
- All builds: evaluate condition, call recorder (Layer 4)
- set_invariant_recorder() wires up statsd at process startup
- No recorder registered = no-op (single OnceLock load)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
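The behaviour described above (always evaluate, forward to an optional recorder, panic only in debug builds) can be sketched with `std::sync::OnceLock`. The names mirror the commit message, but the signatures are assumptions: this version identifies an invariant by a string rather than by InvariantId, and `counting_recorder` is a hypothetical stand-in for the statsd-backed recorder.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Callback invoked for every check: invariant name and whether it held.
pub type InvariantRecorder = fn(&'static str, bool);

static RECORDER: OnceLock<InvariantRecorder> = OnceLock::new();

/// Register the recorder once at process startup; later calls are ignored.
pub fn set_invariant_recorder(recorder: InvariantRecorder) {
    let _ = RECORDER.set(recorder);
}

/// Forward a result to the recorder, or do nothing if none is registered.
pub fn record_check(name: &'static str, passed: bool) {
    if let Some(recorder) = RECORDER.get() {
        recorder(name, passed);
    }
}

/// Evaluate the condition in all builds, record it, and panic on a
/// violation in debug builds only (debug_assert! compiles out in release).
macro_rules! check_invariant {
    ($name:expr, $cond:expr) => {{
        let passed: bool = $cond;
        record_check($name, passed);
        debug_assert!(passed, "invariant {} violated", $name);
        passed
    }};
}

// Hypothetical recorder that counts checks and violations in memory.
pub static CHECKED: AtomicUsize = AtomicUsize::new(0);
pub static VIOLATED: AtomicUsize = AtomicUsize::new(0);

pub fn counting_recorder(_name: &'static str, passed: bool) {
    CHECKED.fetch_add(1, Ordering::SeqCst);
    if !passed {
        VIOLATED.fetch_add(1, Ordering::SeqCst);
    }
}
```

After `set_invariant_recorder(counting_recorder)`, each `check_invariant!("SS-1", condition)` call increments the checked counter and, on failure, the violated counter. Without a registered recorder the only cost is the `OnceLock` load.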
* feat(31): wire invariant recorder to DogStatsD metrics
Emit cloudprem.pomsky.invariant.checked and .violated counters with
invariant label via the metrics crate / DogStatsD exporter at process
startup, completing Layer 4 of the verification stack.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: license headers + cfg(not(test)) for quickwit-dst and quickwit-cli
* chore: regenerate third-party license file
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: fix rustfmt nightly formatting for quickwit-dst and quickwit-parquet-engine
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update quickwit/quickwit-parquet-engine/src/table_config.rs
Co-authored-by: Matthew Kim <matthew.kim@datadoghq.com>
* Update quickwit/quickwit-parquet-engine/src/table_config.rs
Co-authored-by: Matthew Kim <matthew.kim@datadoghq.com>
* style: rustfmt long match arm in default_sort_fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: make parquet_file field backward-compatible in MetricsSplitMetadata
Pre-existing splits were serialized before the parquet_file field was
added, so their JSON doesn't contain it. Adding #[serde(default)]
makes deserialization fall back to empty string for old splits.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: handle empty-column batches in accumulator flush
When the commit timeout fires and the accumulator contains only
zero-column batches, union_fields is empty and concat_batches fails
with "must either specify a row count or at least one column".
Now flush_internal treats empty union_fields the same as empty
pending_batches — resets state and returns None.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
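The guard described above can be sketched with plain types in place of Arrow record batches. `Batch`, `Accumulator` and the shape of the flush result are hypothetical. The real flush_internal concatenates batches, which is the step that fails when no columns are present.

```rust
/// Stand-in for a record batch: column names plus a row count.
pub struct Batch {
    pub columns: Vec<String>,
    pub num_rows: usize,
}

/// Accumulates batches and the union of their column names.
#[derive(Default)]
pub struct Accumulator {
    pending_batches: Vec<Batch>,
    union_fields: Vec<String>,
}

impl Accumulator {
    pub fn push(&mut self, batch: Batch) {
        for column in &batch.columns {
            if !self.union_fields.contains(column) {
                self.union_fields.push(column.clone());
            }
        }
        self.pending_batches.push(batch);
    }

    /// Reset state and return the union schema and total rows, or `None`
    /// when there is nothing to write (no batches, or only zero-column ones).
    pub fn flush(&mut self) -> Option<(Vec<String>, usize)> {
        let batches = std::mem::take(&mut self.pending_batches);
        let fields = std::mem::take(&mut self.union_fields);
        if batches.is_empty() || fields.is_empty() {
            return None;
        }
        let total_rows = batches.iter().map(|batch| batch.num_rows).sum();
        Some((fields, total_rows))
    }
}
```

Taking both fields before the early return means the accumulator is reset on every path, so a timeout-triggered flush of zero-column batches leaves no stale state behind.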
* style: rustfmt check_invariant macro argument
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Matthew Kim <matthew.kim@datadoghq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 469c606 commit 777a338
24 files changed
Lines changed: 2772 additions & 33 deletions
File tree
- quickwit
- quickwit-cli
- src
- quickwit-dst
- src
- invariants
- models
- tests
- quickwit-parquet-engine
- src
- sort_fields
- split
- storage