diff --git a/docs/configuration/index-config.md b/docs/configuration/index-config.md index 24ce8677902..067f16c7fc8 100644 --- a/docs/configuration/index-config.md +++ b/docs/configuration/index-config.md @@ -594,7 +594,9 @@ This section describes indexing settings for a given index. | ------------- | ------------- | ------------- | | `commit_timeout_secs` | Maximum number of seconds before committing a split since its creation. | `60` | | `split_num_docs_target` | Target number of docs per split. | `10000000` | -| `merge_policy` | Describes the strategy used to trigger split merge operations (see [Merge policies](#merge-policies) section below). | +| `merge_policy` | Describes the strategy used to trigger split merge operations for logs/traces (see [Merge policies](#merge-policies) section below). | +| `parquet_merge_policy` | Describes the merge policy for Parquet (metrics/sketches) splits (see [Parquet merge policy](#parquet-merge-policy) section below). | +| `parquet_indexing` | Parquet-specific indexing settings: sort schema, window duration (see [Parquet indexing settings](#parquet-indexing-settings) section below). | | `resources.heap_size` | Indexer heap size per source per index. | `2000000000` | | `docstore_compression_level` | Level of compression used by zstd for the docstore. Lower values may increase ingest speed, at the cost of index size | `8` | | `docstore_blocksize` | Size of blocks in the docstore, in bytes. Lower values may improve doc retrieval speed, at the cost of index size | `1000000` | @@ -687,6 +689,86 @@ indexing_settings: type: "no_merge" ``` +### Parquet indexing settings + +*For indexes using the Parquet indexing pipeline (metrics, sketches).* + +These settings control how the Parquet pipeline sorts, windows, and writes incoming data. They affect both ingest-time performance and downstream query/compaction efficiency. + +```yaml +version: 0.7 +index_id: "my-metrics-index" +# ... 
+indexing_settings: + parquet_indexing: + sort_fields: "metric_name|service|env|host|timeseries_id|timestamp_secs/V2" + window_duration_secs: 900 +``` + +| Variable | Description | Default value | +| ------------- | ------------- | ------------- | +| `sort_fields` | Sort schema for row ordering in Parquet files (see syntax below). When omitted, the product-type default is used. | `metric_name\|service\|env\|datacenter\|region\|host\|timeseries_id\|timestamp_secs/V2` | +| `window_duration_secs` | Time window duration in seconds for split partitioning. Must evenly divide 3600. Larger values = fewer splits but coarser time pruning. | `900` (15 minutes) | + +#### Sort schema syntax + +The sort schema uses pipe-delimited column names with a `/V2` version suffix: + +```text +column1|column2|...|timestamp_secs/V2 +``` + +**Column types** are inferred from name suffixes: +- `__s` → string (e.g., `custom_tag__s`) +- `__i` → int64 (e.g., `priority__i`) +- Well-known names like `metric_name`, `service`, `env`, `host`, `timestamp_secs`, and `timeseries_id` have built-in type mappings and don't need suffixes. + +**Sort direction** defaults to ascending for most columns and descending for timestamp columns. Override with `+` (ascending) or `-` (descending) as a prefix or suffix on the column name: + +```text +# Explicit descending timestamp +metric_name|host|-timestamp_secs/V2 + +# Ascending host (default), descending timestamp (default) +metric_name|host|timestamp_secs/V2 +``` + +**How the sort schema affects behavior:** +- **Query pruning**: queries filtering on leading columns (e.g., `metric_name`) can skip entire splits whose row key ranges don't match. +- **Compression**: grouping similar values together (e.g., all rows for the same metric name) improves columnar compression ratios. +- **Compaction scope**: splits with different sort schemas are never merged together. 
Changing the sort schema on an existing index creates a new compaction scope; old splits are not re-sorted. + +**The `&` marker** (advanced) sets the LSM comparison cutoff: columns after `&` are used for sort order but not for compaction locality decisions. For example, `metric_name|&host|timestamp_secs/V2` sorts by metric_name then host, but only metric_name determines which splits can be merged. + +### Parquet merge policy + +*For indexes using the Parquet indexing pipeline (metrics, sketches).* + +The Parquet merge policy controls how Parquet splits within a compaction scope (same time window, partition, and sort schema) are merged. It uses a constant write amplification strategy: splits at the same merge level are greedily accumulated until reaching `max_merge_factor` or `target_split_size_bytes`. + +```yaml +version: 0.7 +index_id: "my-metrics-index" +# ... +indexing_settings: + parquet_merge_policy: + merge_factor: 10 + max_merge_factor: 12 + max_merge_ops: 4 + target_split_size_bytes: 268435456 + maturation_period: 48h + max_finalize_merge_operations: 3 +``` + + +| Variable | Description | Default value | +| ------------- | ------------- | ------------- | +| `merge_factor` | Minimum number of splits to trigger a merge. | `10` | +| `max_merge_factor` | Maximum number of splits in a single merge operation. | `12` | +| `max_merge_ops` | Maximum number of merges a split can undergo before becoming mature. Bounds total write amplification. | `4` | +| `target_split_size_bytes` | Target size for merged output splits in bytes. Merges trigger when accumulated bytes reach this threshold, even if `merge_factor` is not reached. | `268435456` (256 MiB) | +| `maturation_period` | Duration after creation when a split becomes mature (never merged again). | `48h` | +| `max_finalize_merge_operations` | *(advanced)* Maximum number of merge operations emitted during cold-window finalization at pipeline shutdown. Set to `0` to disable.
| `3` | ### Indexer memory usage diff --git a/quickwit/quickwit-config/Cargo.toml b/quickwit/quickwit-config/Cargo.toml index 93a1fdf8de1..aebea5e9c31 100644 --- a/quickwit/quickwit-config/Cargo.toml +++ b/quickwit/quickwit-config/Cargo.toml @@ -45,5 +45,6 @@ quickwit-common = { workspace = true, features = ["testsuite"] } quickwit-proto = { workspace = true, features = ["testsuite"] } [features] +metrics = [] testsuite = [] vrl = ["dep:vrl"] diff --git a/quickwit/quickwit-config/src/index_config/mod.rs b/quickwit/quickwit-config/src/index_config/mod.rs index 286f2d695be..d536e4edab5 100644 --- a/quickwit/quickwit-config/src/index_config/mod.rs +++ b/quickwit/quickwit-config/src/index_config/mod.rs @@ -37,6 +37,8 @@ use tracing::warn; use crate::index_config::serialize::VersionedIndexConfig; use crate::merge_policy_config::MergePolicyConfig; +#[cfg(feature = "metrics")] +use crate::merge_policy_config::ParquetMergePolicyConfig; #[derive(Clone, Debug, Serialize, Deserialize, utoipa::ToSchema)] #[serde(deny_unknown_fields)] @@ -118,15 +120,108 @@ pub struct IndexingSettings { pub split_num_docs_target: usize, #[serde(default)] pub merge_policy: MergePolicyConfig, + /// Merge policy for Parquet (metrics/sketches) splits. + #[cfg(feature = "metrics")] + #[serde(default, skip_serializing_if = "Option::is_none")] + pub parquet_merge_policy: Option<ParquetMergePolicyConfig>, + /// Parquet-specific indexing settings (sort schema, window duration). + #[cfg(feature = "metrics")] + #[serde(default, skip_serializing_if = "Option::is_none")] + pub parquet_indexing: Option<ParquetIndexingConfig>, #[serde(default)] pub resources: IndexingResources, } +/// Configuration for the Parquet indexing pipeline (metrics, sketches). +/// +/// Controls how incoming data is sorted, windowed, and compressed before +/// writing to Parquet split files. These settings affect both ingest-time +/// performance and downstream query/compaction efficiency.
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Hash, utoipa::ToSchema)] +#[serde(deny_unknown_fields)] +pub struct ParquetIndexingConfig { + /// Sort schema defining the physical sort order of rows in Parquet files. + /// + /// Uses Husky-style pipe-delimited syntax with a `/V2` version suffix. + /// Each column is sorted ascending by default; use `+` or `-` prefix/suffix + /// to override. Column types are inferred from well-known suffixes + /// (`__s` = string, `__i` = int64, `_secs` = uint64 timestamp). + /// + /// The sort order determines: + /// - **Query pruning**: queries that filter on leading sort columns can skip entire splits + /// whose row key ranges don't match. + /// - **Compression**: columns with good locality (e.g., metric_name first) compress better in + /// Parquet's columnar format. + /// - **Compaction scope**: splits with different sort schemas are never merged together. + /// + /// When `None`, the product-type default is used (see below). + /// + /// # Default (metrics/sketches) + /// ```text + /// metric_name|service|env|datacenter|region|host|timeseries_id|timestamp_secs/V2 + /// ``` + /// + /// # Examples + /// ```text + /// # Minimal: just metric name and timestamp + /// metric_name|timestamp_secs/V2 + /// + /// # Custom tags in sort order + /// metric_name|service|cluster|host|timestamp_secs/V2 + /// + /// # Explicit descending timestamp + /// metric_name|host|-timestamp_secs/V2 + /// ``` + #[serde(default, skip_serializing_if = "Option::is_none")] + pub sort_fields: Option<String>, + + /// Time window duration in seconds for split partitioning. + /// + /// Incoming data is partitioned into time windows of this duration. + /// Splits within the same window may be compacted together; splits in + /// different windows are never merged. Must evenly divide 3600 (one hour). + /// + /// Larger values produce fewer, larger splits (better for bulk queries) + /// but coarser time-based pruning.
Smaller values give finer pruning + /// but more splits to manage. + #[serde(default = "ParquetIndexingConfig::default_window_duration_secs")] + pub window_duration_secs: u32, +} + +impl ParquetIndexingConfig { + fn default_window_duration_secs() -> u32 { + 900 + } +} + +impl Default for ParquetIndexingConfig { + fn default() -> Self { + Self { + sort_fields: None, + window_duration_secs: Self::default_window_duration_secs(), + } + } +} + impl IndexingSettings { pub fn commit_timeout(&self) -> Duration { Duration::from_secs(self.commit_timeout_secs as u64) } + /// Returns the Parquet merge policy config, using defaults if not + /// explicitly configured. + #[cfg(feature = "metrics")] + pub fn parquet_merge_policy(&self) -> ParquetMergePolicyConfig { + self.parquet_merge_policy.clone().unwrap_or_default() + } + + /// Returns the Parquet indexing config, using defaults if not + /// explicitly configured. + #[cfg(feature = "metrics")] + pub fn parquet_indexing(&self) -> ParquetIndexingConfig { + self.parquet_indexing.clone().unwrap_or_default() + } + fn default_commit_timeout_secs() -> usize { 60 } @@ -160,6 +255,10 @@ impl Default for IndexingSettings { docstore_compression_level: Self::default_docstore_compression_level(), split_num_docs_target: Self::default_split_num_docs_target(), merge_policy: MergePolicyConfig::default(), + #[cfg(feature = "metrics")] + parquet_merge_policy: None, + #[cfg(feature = "metrics")] + parquet_indexing: None, resources: IndexingResources::default(), } } diff --git a/quickwit/quickwit-config/src/lib.rs b/quickwit/quickwit-config/src/lib.rs index 2abaaef79f3..b10afbc8b0b 100644 --- a/quickwit/quickwit-config/src/lib.rs +++ b/quickwit/quickwit-config/src/lib.rs @@ -45,9 +45,9 @@ pub use cluster_config::ClusterConfig; // See #2048 use index_config::serialize::{IndexConfigV0_8, VersionedIndexConfig}; pub use index_config::{ - IndexConfig, IndexingResources, IndexingSettings, IngestSettings, RetentionPolicy, - SearchSettings, 
build_doc_mapper, load_index_config_from_user_config, load_index_config_update, - prepare_doc_mapping_update, + IndexConfig, IndexingResources, IndexingSettings, IngestSettings, ParquetIndexingConfig, + RetentionPolicy, SearchSettings, build_doc_mapper, load_index_config_from_user_config, + load_index_config_update, prepare_doc_mapping_update, }; pub use quickwit_doc_mapper::DocMapping; use serde::Serialize; @@ -67,7 +67,8 @@ use tracing::warn; use crate::index_template::IndexTemplateV0_8; pub use crate::index_template::{IndexTemplate, IndexTemplateId, VersionedIndexTemplate}; use crate::merge_policy_config::{ - ConstWriteAmplificationMergePolicyConfig, MergePolicyConfig, StableLogMergePolicyConfig, + ConstWriteAmplificationMergePolicyConfig, MergePolicyConfig, ParquetMergePolicyConfig, + StableLogMergePolicyConfig, }; pub use crate::metastore_config::{ MetastoreBackend, MetastoreConfig, MetastoreConfigs, PostgresMetastoreConfig, @@ -113,6 +114,8 @@ pub fn disable_ingest_v1() -> bool { KafkaSourceParams, KinesisSourceParams, MergePolicyConfig, + ParquetIndexingConfig, + ParquetMergePolicyConfig, PubSubSourceParams, PulsarSourceAuth, PulsarSourceParams, diff --git a/quickwit/quickwit-config/src/merge_policy_config.rs b/quickwit/quickwit-config/src/merge_policy_config.rs index 3e4e5dad0ce..d3c996891cf 100644 --- a/quickwit/quickwit-config/src/merge_policy_config.rs +++ b/quickwit/quickwit-config/src/merge_policy_config.rs @@ -119,6 +119,74 @@ impl Default for StableLogMergePolicyConfig { } } +// --- Parquet merge policy config --- +// +// The types are always available (for OpenAPI schema generation in +// quickwit-serve). The IndexingSettings fields that use them are +// gated behind cfg(feature = "metrics"). + +fn default_target_split_size_bytes() -> u64 { + 256 * 1024 * 1024 // 256 MiB +} + +fn default_max_finalize_merge_operations() -> usize { + 3 +} + +/// Configuration for the Parquet (metrics/sketches) merge policy. 
+/// +/// Controls how Parquet splits within a compaction scope are merged. +/// Splits at the same `num_merge_ops` level are greedily accumulated +/// until reaching `max_merge_factor` or `target_split_size_bytes`. +#[derive(Debug, Clone, Serialize, Deserialize, Eq, PartialEq, Hash, utoipa::ToSchema)] +#[serde(deny_unknown_fields)] +pub struct ParquetMergePolicyConfig { + /// Minimum number of splits to trigger a merge. + #[serde(default = "default_merge_factor")] + pub merge_factor: usize, + /// Maximum number of splits in a single merge operation. + #[serde(default = "default_max_merge_factor")] + pub max_merge_factor: usize, + /// Maximum number of merges a split can undergo before becoming mature. + /// Bounds total write amplification. + #[serde(default = "default_parquet_max_merge_ops")] + pub max_merge_ops: u32, + /// Target size for merged output splits in bytes. Merges are triggered + /// when accumulated bytes reach this threshold, even if `merge_factor` + /// is not reached. + #[serde(default = "default_target_split_size_bytes")] + pub target_split_size_bytes: u64, + /// Duration after creation when a split becomes mature regardless of + /// size or merge count. Mature splits are never merged. + #[schema(value_type = String)] + #[serde(default = "default_maturation_period")] + #[serde(deserialize_with = "parse_human_duration")] + #[serde(serialize_with = "serialize_duration")] + pub maturation_period: Duration, + /// Maximum number of merge operations emitted during cold-window + /// finalization at shutdown. Set to 0 to disable. 
+ #[serde(default = "default_max_finalize_merge_operations")] + #[serde(skip_serializing_if = "is_zero")] + pub max_finalize_merge_operations: usize, +} + +fn default_parquet_max_merge_ops() -> u32 { + 4 +} + +impl Default for ParquetMergePolicyConfig { + fn default() -> Self { + Self { + merge_factor: default_merge_factor(), + max_merge_factor: default_max_merge_factor(), + max_merge_ops: default_parquet_max_merge_ops(), + target_split_size_bytes: default_target_split_size_bytes(), + maturation_period: default_maturation_period(), + max_finalize_merge_operations: default_max_finalize_merge_operations(), + } + } +} + fn parse_human_duration<'de, D>(deserializer: D) -> Result<Duration, D::Error> where D: Deserializer<'de> { let value: String = Deserialize::deserialize(deserializer)?; diff --git a/quickwit/quickwit-indexing/Cargo.toml b/quickwit/quickwit-indexing/Cargo.toml index 21827a4fc02..797b59a5e97 100644 --- a/quickwit/quickwit-indexing/Cargo.toml +++ b/quickwit/quickwit-indexing/Cargo.toml @@ -105,7 +105,7 @@ testsuite = [ "quickwit-proto/testsuite", "quickwit-storage/testsuite" ] -metrics = ["dep:arrow", "dep:quickwit-parquet-engine", "quickwit-doc-mapper/metrics"] +metrics = ["dep:arrow", "dep:quickwit-parquet-engine", "quickwit-doc-mapper/metrics", "quickwit-config/metrics"] vrl = ["dep:vrl", "quickwit-config/vrl"] postgres = ["quickwit-metastore/postgres"] ci-test = [] diff --git a/quickwit/quickwit-indexing/src/actors/indexing_service.rs b/quickwit/quickwit-indexing/src/actors/indexing_service.rs index a8197d0058c..960ec903bac 100644 --- a/quickwit/quickwit-indexing/src/actors/indexing_service.rs +++ b/quickwit/quickwit-indexing/src/actors/indexing_service.rs @@ -1250,9 +1250,13 @@ mod tests { #[tokio::test] async fn test_indexing_service_apply_plan() { - const PARAMS_FINGERPRINT_INGEST_API: u64 = 1637744865450232394; - const PARAMS_FINGERPRINT_SOURCE_1: u64 = 1705211905504908791; - const PARAMS_FINGERPRINT_SOURCE_2: u64 = 8706667372658059428; + // These fingerprints are
hashes of IndexConfig + SourceConfig. They + // change whenever IndexingSettings fields are added/removed. Recompute + // by temporarily adding a test that prints + // `indexing_pipeline_params_fingerprint(&index_config, &source_config)`. + const PARAMS_FINGERPRINT_INGEST_API: u64 = 7973087274884969148; + const PARAMS_FINGERPRINT_SOURCE_1: u64 = 9420938500552890840; + const PARAMS_FINGERPRINT_SOURCE_2: u64 = 16199199787360162635; quickwit_common::setup_logging_for_tests(); let transport = ChannelTransport::default(); diff --git a/quickwit/quickwit-metastore/test-data/file-backed-index/v0.7.expected.json b/quickwit/quickwit-metastore/test-data/file-backed-index/v0.7.expected.json index cb00de2fbd8..ef890a7d291 100644 --- a/quickwit/quickwit-metastore/test-data/file-backed-index/v0.7.expected.json +++ b/quickwit/quickwit-metastore/test-data/file-backed-index/v0.7.expected.json @@ -1,200 +1,200 @@ { - "version": "0.9", + "delete_tasks": [ + { + "create_timestamp": 0, + "delete_query": { + "index_uid": "my-index:00000000000000000000000000", + "query_ast": "{\"type\":\"bool\",\"must\":[{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Harry\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}}},{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Potter\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}}}]}" + }, + "opstamp": 10 + } + ], "index": { - "version": "0.9", - "index_uid": "my-index:00000000000000000000000000", + "checkpoint": { + "kafka-source": { + "00000000000000000000": "00000000000000000042" + } + }, + "create_timestamp": 1789, "index_config": { - "version": "0.9", - "index_id": "my-index", - "index_uri": "s3://quickwit-indexes/my-index", "doc_mapping": { "doc_mapping_uid": "00000000000000000000000000", - "mode": "dynamic", "dynamic_mapping": { - "indexed": true, - "tokenizer": "raw", - "record": "basic", - "stored": true, "expand_dots": true, "fast": { "normalizer": "raw" - } + }, + "indexed": true, + "record": 
"basic", + "stored": true, + "tokenizer": "raw" }, "field_mappings": [ { + "coerce": true, + "fast": true, + "indexed": true, "name": "tenant_id", - "type": "u64", + "output_format": "number", "stored": true, - "indexed": true, - "fast": true, - "coerce": true, - "output_format": "number" + "type": "u64" }, { - "name": "timestamp", - "type": "datetime", + "fast": true, + "fast_precision": "seconds", + "indexed": true, "input_formats": [ "rfc3339", "unix_timestamp" ], + "name": "timestamp", "output_format": "rfc3339", - "fast_precision": "seconds", - "indexed": true, "stored": true, - "fast": true + "type": "datetime" }, { - "name": "log_level", - "type": "text", + "fast": false, + "fieldnorms": false, "indexed": true, - "tokenizer": "raw", + "name": "log_level", "record": "basic", - "fieldnorms": false, "stored": true, - "fast": false + "tokenizer": "raw", + "type": "text" }, { - "name": "message", - "type": "text", + "fast": false, + "fieldnorms": false, "indexed": true, - "tokenizer": "default", + "name": "message", "record": "position", - "fieldnorms": false, "stored": true, - "fast": false + "tokenizer": "default", + "type": "text" } ], - "timestamp_field": "timestamp", + "index_field_presence": true, + "max_num_partitions": 100, + "mode": "dynamic", + "partition_key": "tenant_id", + "store_document_size": false, + "store_source": true, "tag_fields": [ "log_level", "tenant_id" ], - "partition_key": "tenant_id", - "max_num_partitions": 100, - "index_field_presence": true, - "store_document_size": false, - "store_source": true, + "timestamp_field": "timestamp", "tokenizers": [ { + "filters": [], "name": "custom_tokenizer", - "type": "regex", "pattern": "[^\\p{L}\\p{N}]+", - "filters": [] + "type": "regex" } ] }, + "index_id": "my-index", + "index_uri": "s3://quickwit-indexes/my-index", "indexing_settings": { "commit_timeout_secs": 301, - "docstore_compression_level": 8, "docstore_blocksize": 1000000, - "split_num_docs_target": 10000001, + 
"docstore_compression_level": 8, "merge_policy": { - "type": "stable_log", - "min_level_num_docs": 100000, - "merge_factor": 9, + "maturation_period": "2days", "max_merge_factor": 11, - "maturation_period": "2days" + "merge_factor": 9, + "min_level_num_docs": 100000, + "type": "stable_log" }, "resources": { "heap_size": 50000000 - } + }, + "split_num_docs_target": 10000001 }, "ingest_settings": { "min_shards": 1 }, + "retention": { + "period": "90 days", + "schedule": "daily" + }, "search_settings": { "default_search_fields": [ "message" ] }, - "retention": { - "period": "90 days", - "schedule": "daily" - } - }, - "checkpoint": { - "kafka-source": { - "00000000000000000000": "00000000000000000042" - } + "version": "0.9" }, - "create_timestamp": 1789, + "index_uid": "my-index:00000000000000000000000000", "sources": [ { - "version": "0.9", - "source_id": "kafka-source", - "num_pipelines": 2, "enabled": true, - "source_type": "kafka", + "input_format": "json", + "num_pipelines": 2, "params": { - "topic": "kafka-topic", - "client_params": {} + "client_params": {}, + "topic": "kafka-topic" }, + "source_id": "kafka-source", + "source_type": "kafka", "transform": { "script": ".message = downcase(string!(.message))", "timezone": "UTC" }, - "input_format": "json" + "version": "0.9" + } + ], + "version": "0.9" + }, + "shards": { + "_ingest-source": [ + { + "doc_mapping_uid": "00000000000000000000000000", + "follower_id": "follower-ingester", + "index_uid": "my-index:00000000000000000000000000", + "leader_id": "leader-ingester", + "publish_position_inclusive": "", + "shard_id": "00000000000000000001", + "shard_state": 1, + "source_id": "_ingest-source", + "update_timestamp": 1704067200 } ] }, "splits": [ { - "split_state": "Published", - "update_timestamp": 1789, - "publish_timestamp": 1789, - "version": "0.9", - "split_id": "split", - "index_uid": "my-index:00000000000000000000000000", - "partition_id": 7, - "source_id": "source", - "node_id": "node", - "num_docs": 12303, - 
"uncompressed_docs_size_in_bytes": 234234, - "time_range": { - "start": 121000, - "end": 130198 - }, "create_timestamp": 3, + "delete_opstamp": 10, + "doc_mapping_uid": "00000000000000000000000000", + "footer_offsets": { + "end": 2000, + "start": 1000 + }, + "index_uid": "my-index:00000000000000000000000000", "maturity": { - "type": "immature", - "maturation_period_millis": 4000 + "maturation_period_millis": 4000, + "type": "immature" }, + "node_id": "node", + "num_docs": 12303, + "num_merge_ops": 3, + "partition_id": 7, + "publish_timestamp": 1789, + "source_id": "source", + "split_id": "split", + "split_state": "Published", "tags": [ "234", "aaa" ], - "footer_offsets": { - "start": 1000, - "end": 2000 + "time_range": { + "end": 130198, + "start": 121000 }, - "delete_opstamp": 10, - "num_merge_ops": 3, - "doc_mapping_uid": "00000000000000000000000000" + "uncompressed_docs_size_in_bytes": 234234, + "update_timestamp": 1789, + "version": "0.9" } ], - "shards": { - "_ingest-source": [ - { - "index_uid": "my-index:00000000000000000000000000", - "source_id": "_ingest-source", - "shard_id": "00000000000000000001", - "leader_id": "leader-ingester", - "follower_id": "follower-ingester", - "shard_state": 1, - "publish_position_inclusive": "", - "doc_mapping_uid": "00000000000000000000000000", - "update_timestamp": 1704067200 - } - ] - }, - "delete_tasks": [ - { - "create_timestamp": 0, - "opstamp": 10, - "delete_query": { - "index_uid": "my-index:00000000000000000000000000", - "query_ast": "{\"type\":\"bool\",\"must\":[{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Harry\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}}},{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Potter\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}}}]}" - } - } - ] + "version": "0.9" } diff --git a/quickwit/quickwit-metastore/test-data/file-backed-index/v0.8.expected.json 
b/quickwit/quickwit-metastore/test-data/file-backed-index/v0.8.expected.json index cb00de2fbd8..ef890a7d291 100644 --- a/quickwit/quickwit-metastore/test-data/file-backed-index/v0.8.expected.json +++ b/quickwit/quickwit-metastore/test-data/file-backed-index/v0.8.expected.json @@ -1,200 +1,200 @@ { - "version": "0.9", + "delete_tasks": [ + { + "create_timestamp": 0, + "delete_query": { + "index_uid": "my-index:00000000000000000000000000", + "query_ast": "{\"type\":\"bool\",\"must\":[{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Harry\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}}},{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Potter\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}}}]}" + }, + "opstamp": 10 + } + ], "index": { - "version": "0.9", - "index_uid": "my-index:00000000000000000000000000", + "checkpoint": { + "kafka-source": { + "00000000000000000000": "00000000000000000042" + } + }, + "create_timestamp": 1789, "index_config": { - "version": "0.9", - "index_id": "my-index", - "index_uri": "s3://quickwit-indexes/my-index", "doc_mapping": { "doc_mapping_uid": "00000000000000000000000000", - "mode": "dynamic", "dynamic_mapping": { - "indexed": true, - "tokenizer": "raw", - "record": "basic", - "stored": true, "expand_dots": true, "fast": { "normalizer": "raw" - } + }, + "indexed": true, + "record": "basic", + "stored": true, + "tokenizer": "raw" }, "field_mappings": [ { + "coerce": true, + "fast": true, + "indexed": true, "name": "tenant_id", - "type": "u64", + "output_format": "number", "stored": true, - "indexed": true, - "fast": true, - "coerce": true, - "output_format": "number" + "type": "u64" }, { - "name": "timestamp", - "type": "datetime", + "fast": true, + "fast_precision": "seconds", + "indexed": true, "input_formats": [ "rfc3339", "unix_timestamp" ], + "name": "timestamp", "output_format": "rfc3339", - "fast_precision": "seconds", - "indexed": true, "stored": true, - "fast": true + "type": 
"datetime" }, { - "name": "log_level", - "type": "text", + "fast": false, + "fieldnorms": false, "indexed": true, - "tokenizer": "raw", + "name": "log_level", "record": "basic", - "fieldnorms": false, "stored": true, - "fast": false + "tokenizer": "raw", + "type": "text" }, { - "name": "message", - "type": "text", + "fast": false, + "fieldnorms": false, "indexed": true, - "tokenizer": "default", + "name": "message", "record": "position", - "fieldnorms": false, "stored": true, - "fast": false + "tokenizer": "default", + "type": "text" } ], - "timestamp_field": "timestamp", + "index_field_presence": true, + "max_num_partitions": 100, + "mode": "dynamic", + "partition_key": "tenant_id", + "store_document_size": false, + "store_source": true, "tag_fields": [ "log_level", "tenant_id" ], - "partition_key": "tenant_id", - "max_num_partitions": 100, - "index_field_presence": true, - "store_document_size": false, - "store_source": true, + "timestamp_field": "timestamp", "tokenizers": [ { + "filters": [], "name": "custom_tokenizer", - "type": "regex", "pattern": "[^\\p{L}\\p{N}]+", - "filters": [] + "type": "regex" } ] }, + "index_id": "my-index", + "index_uri": "s3://quickwit-indexes/my-index", "indexing_settings": { "commit_timeout_secs": 301, - "docstore_compression_level": 8, "docstore_blocksize": 1000000, - "split_num_docs_target": 10000001, + "docstore_compression_level": 8, "merge_policy": { - "type": "stable_log", - "min_level_num_docs": 100000, - "merge_factor": 9, + "maturation_period": "2days", "max_merge_factor": 11, - "maturation_period": "2days" + "merge_factor": 9, + "min_level_num_docs": 100000, + "type": "stable_log" }, "resources": { "heap_size": 50000000 - } + }, + "split_num_docs_target": 10000001 }, "ingest_settings": { "min_shards": 1 }, + "retention": { + "period": "90 days", + "schedule": "daily" + }, "search_settings": { "default_search_fields": [ "message" ] }, - "retention": { - "period": "90 days", - "schedule": "daily" - } - }, - "checkpoint": { 
- "kafka-source": { - "00000000000000000000": "00000000000000000042" - } + "version": "0.9" }, - "create_timestamp": 1789, + "index_uid": "my-index:00000000000000000000000000", "sources": [ { - "version": "0.9", - "source_id": "kafka-source", - "num_pipelines": 2, "enabled": true, - "source_type": "kafka", + "input_format": "json", + "num_pipelines": 2, "params": { - "topic": "kafka-topic", - "client_params": {} + "client_params": {}, + "topic": "kafka-topic" }, + "source_id": "kafka-source", + "source_type": "kafka", "transform": { "script": ".message = downcase(string!(.message))", "timezone": "UTC" }, - "input_format": "json" + "version": "0.9" + } + ], + "version": "0.9" + }, + "shards": { + "_ingest-source": [ + { + "doc_mapping_uid": "00000000000000000000000000", + "follower_id": "follower-ingester", + "index_uid": "my-index:00000000000000000000000000", + "leader_id": "leader-ingester", + "publish_position_inclusive": "", + "shard_id": "00000000000000000001", + "shard_state": 1, + "source_id": "_ingest-source", + "update_timestamp": 1704067200 } ] }, "splits": [ { - "split_state": "Published", - "update_timestamp": 1789, - "publish_timestamp": 1789, - "version": "0.9", - "split_id": "split", - "index_uid": "my-index:00000000000000000000000000", - "partition_id": 7, - "source_id": "source", - "node_id": "node", - "num_docs": 12303, - "uncompressed_docs_size_in_bytes": 234234, - "time_range": { - "start": 121000, - "end": 130198 - }, "create_timestamp": 3, + "delete_opstamp": 10, + "doc_mapping_uid": "00000000000000000000000000", + "footer_offsets": { + "end": 2000, + "start": 1000 + }, + "index_uid": "my-index:00000000000000000000000000", "maturity": { - "type": "immature", - "maturation_period_millis": 4000 + "maturation_period_millis": 4000, + "type": "immature" }, + "node_id": "node", + "num_docs": 12303, + "num_merge_ops": 3, + "partition_id": 7, + "publish_timestamp": 1789, + "source_id": "source", + "split_id": "split", + "split_state": "Published", 
"tags": [ "234", "aaa" ], - "footer_offsets": { - "start": 1000, - "end": 2000 + "time_range": { + "end": 130198, + "start": 121000 }, - "delete_opstamp": 10, - "num_merge_ops": 3, - "doc_mapping_uid": "00000000000000000000000000" + "uncompressed_docs_size_in_bytes": 234234, + "update_timestamp": 1789, + "version": "0.9" } ], - "shards": { - "_ingest-source": [ - { - "index_uid": "my-index:00000000000000000000000000", - "source_id": "_ingest-source", - "shard_id": "00000000000000000001", - "leader_id": "leader-ingester", - "follower_id": "follower-ingester", - "shard_state": 1, - "publish_position_inclusive": "", - "doc_mapping_uid": "00000000000000000000000000", - "update_timestamp": 1704067200 - } - ] - }, - "delete_tasks": [ - { - "create_timestamp": 0, - "opstamp": 10, - "delete_query": { - "index_uid": "my-index:00000000000000000000000000", - "query_ast": "{\"type\":\"bool\",\"must\":[{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Harry\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}}},{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Potter\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}}}]}" - } - } - ] + "version": "0.9" } diff --git a/quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.expected.json b/quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.expected.json index cf23e2349e5..0d576bbc777 100644 --- a/quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.expected.json +++ b/quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.expected.json @@ -1,200 +1,200 @@ { - "version": "0.9", + "delete_tasks": [ + { + "create_timestamp": 0, + "delete_query": { + "index_uid": "my-index:00000000000000000000000001", + "query_ast": 
"{\"type\":\"bool\",\"must\":[{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Harry\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}},\"lenient\":false},{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Potter\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}},\"lenient\":false}]}" + }, + "opstamp": 10 + } + ], "index": { - "version": "0.9", - "index_uid": "my-index:00000000000000000000000001", + "checkpoint": { + "kafka-source": { + "00000000000000000000": "00000000000000000042" + } + }, + "create_timestamp": 1789, "index_config": { - "version": "0.9", - "index_id": "my-index", - "index_uri": "s3://quickwit-indexes/my-index", "doc_mapping": { "doc_mapping_uid": "00000000000000000000000001", - "mode": "dynamic", "dynamic_mapping": { - "indexed": true, - "tokenizer": "raw", - "record": "basic", - "stored": true, "expand_dots": true, "fast": { "normalizer": "raw" - } + }, + "indexed": true, + "record": "basic", + "stored": true, + "tokenizer": "raw" }, "field_mappings": [ { + "coerce": true, + "fast": true, + "indexed": true, "name": "tenant_id", - "type": "u64", + "output_format": "number", "stored": true, - "indexed": true, - "fast": true, - "coerce": true, - "output_format": "number" + "type": "u64" }, { - "name": "timestamp", - "type": "datetime", + "fast": true, + "fast_precision": "seconds", + "indexed": true, "input_formats": [ "rfc3339", "unix_timestamp" ], + "name": "timestamp", "output_format": "rfc3339", - "fast_precision": "seconds", - "indexed": true, "stored": true, - "fast": true + "type": "datetime" }, { - "name": "log_level", - "type": "text", + "fast": false, + "fieldnorms": false, "indexed": true, - "tokenizer": "raw", + "name": "log_level", "record": "basic", - "fieldnorms": false, "stored": true, - "fast": false + "tokenizer": "raw", + "type": "text" }, { - "name": "message", - "type": "text", + "fast": false, + "fieldnorms": false, "indexed": true, - "tokenizer": "default", + "name": "message", 
"record": "position", - "fieldnorms": false, "stored": true, - "fast": false + "tokenizer": "default", + "type": "text" } ], - "timestamp_field": "timestamp", + "index_field_presence": true, + "max_num_partitions": 100, + "mode": "dynamic", + "partition_key": "tenant_id", + "store_document_size": false, + "store_source": true, "tag_fields": [ "log_level", "tenant_id" ], - "partition_key": "tenant_id", - "max_num_partitions": 100, - "index_field_presence": true, - "store_document_size": false, - "store_source": true, + "timestamp_field": "timestamp", "tokenizers": [ { + "filters": [], "name": "custom_tokenizer", - "type": "regex", "pattern": "[^\\p{L}\\p{N}]+", - "filters": [] + "type": "regex" } ] }, + "index_id": "my-index", + "index_uri": "s3://quickwit-indexes/my-index", "indexing_settings": { "commit_timeout_secs": 301, - "docstore_compression_level": 8, "docstore_blocksize": 1000000, - "split_num_docs_target": 10000001, + "docstore_compression_level": 8, "merge_policy": { - "type": "stable_log", - "min_level_num_docs": 100000, - "merge_factor": 9, + "maturation_period": "2days", "max_merge_factor": 11, - "maturation_period": "2days" + "merge_factor": 9, + "min_level_num_docs": 100000, + "type": "stable_log" }, "resources": { "heap_size": 50000000 - } + }, + "split_num_docs_target": 10000001 }, "ingest_settings": { "min_shards": 12 }, + "retention": { + "period": "90 days", + "schedule": "daily" + }, "search_settings": { "default_search_fields": [ "message" ] }, - "retention": { - "period": "90 days", - "schedule": "daily" - } - }, - "checkpoint": { - "kafka-source": { - "00000000000000000000": "00000000000000000042" - } + "version": "0.9" }, - "create_timestamp": 1789, + "index_uid": "my-index:00000000000000000000000001", "sources": [ { - "version": "0.9", - "source_id": "kafka-source", - "num_pipelines": 2, "enabled": true, - "source_type": "kafka", + "input_format": "json", + "num_pipelines": 2, "params": { - "topic": "kafka-topic", - "client_params": {} + 
"client_params": {}, + "topic": "kafka-topic" }, + "source_id": "kafka-source", + "source_type": "kafka", "transform": { "script": ".message = downcase(string!(.message))", "timezone": "UTC" }, - "input_format": "json" + "version": "0.9" + } + ], + "version": "0.9" + }, + "shards": { + "_ingest-source": [ + { + "doc_mapping_uid": "00000000000000000000000001", + "follower_id": "follower-ingester", + "index_uid": "my-index:00000000000000000000000001", + "leader_id": "leader-ingester", + "publish_position_inclusive": "", + "shard_id": "00000000000000000001", + "shard_state": 1, + "source_id": "_ingest-source", + "update_timestamp": 1724240908 } ] }, "splits": [ { - "split_state": "Published", - "update_timestamp": 1789, - "publish_timestamp": 1789, - "version": "0.9", - "split_id": "split", - "index_uid": "my-index:00000000000000000000000001", - "partition_id": 7, - "source_id": "source", - "node_id": "node", - "num_docs": 12303, - "uncompressed_docs_size_in_bytes": 234234, - "time_range": { - "start": 121000, - "end": 130198 - }, "create_timestamp": 3, + "delete_opstamp": 10, + "doc_mapping_uid": "00000000000000000000000000", + "footer_offsets": { + "end": 2000, + "start": 1000 + }, + "index_uid": "my-index:00000000000000000000000001", "maturity": { - "type": "immature", - "maturation_period_millis": 4000 + "maturation_period_millis": 4000, + "type": "immature" }, + "node_id": "node", + "num_docs": 12303, + "num_merge_ops": 3, + "partition_id": 7, + "publish_timestamp": 1789, + "source_id": "source", + "split_id": "split", + "split_state": "Published", "tags": [ "234", "aaa" ], - "footer_offsets": { - "start": 1000, - "end": 2000 + "time_range": { + "end": 130198, + "start": 121000 }, - "delete_opstamp": 10, - "num_merge_ops": 3, - "doc_mapping_uid": "00000000000000000000000000" + "uncompressed_docs_size_in_bytes": 234234, + "update_timestamp": 1789, + "version": "0.9" } ], - "shards": { - "_ingest-source": [ - { - "index_uid": "my-index:00000000000000000000000001", 
- "source_id": "_ingest-source", - "shard_id": "00000000000000000001", - "leader_id": "leader-ingester", - "follower_id": "follower-ingester", - "shard_state": 1, - "publish_position_inclusive": "", - "doc_mapping_uid": "00000000000000000000000001", - "update_timestamp": 1724240908 - } - ] - }, - "delete_tasks": [ - { - "create_timestamp": 0, - "opstamp": 10, - "delete_query": { - "index_uid": "my-index:00000000000000000000000001", - "query_ast": "{\"type\":\"bool\",\"must\":[{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Harry\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}},\"lenient\":false},{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Potter\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}},\"lenient\":false}]}" - } - } - ] + "version": "0.9" } diff --git a/quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.json b/quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.json index cf23e2349e5..0d576bbc777 100644 --- a/quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.json +++ b/quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.json @@ -1,200 +1,200 @@ { - "version": "0.9", + "delete_tasks": [ + { + "create_timestamp": 0, + "delete_query": { + "index_uid": "my-index:00000000000000000000000001", + "query_ast": "{\"type\":\"bool\",\"must\":[{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Harry\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}},\"lenient\":false},{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Potter\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}},\"lenient\":false}]}" + }, + "opstamp": 10 + } + ], "index": { - "version": "0.9", - "index_uid": "my-index:00000000000000000000000001", + "checkpoint": { + "kafka-source": { + "00000000000000000000": "00000000000000000042" + } + }, + "create_timestamp": 1789, "index_config": { - "version": "0.9", - "index_id": "my-index", - "index_uri": 
"s3://quickwit-indexes/my-index", "doc_mapping": { "doc_mapping_uid": "00000000000000000000000001", - "mode": "dynamic", "dynamic_mapping": { - "indexed": true, - "tokenizer": "raw", - "record": "basic", - "stored": true, "expand_dots": true, "fast": { "normalizer": "raw" - } + }, + "indexed": true, + "record": "basic", + "stored": true, + "tokenizer": "raw" }, "field_mappings": [ { + "coerce": true, + "fast": true, + "indexed": true, "name": "tenant_id", - "type": "u64", + "output_format": "number", "stored": true, - "indexed": true, - "fast": true, - "coerce": true, - "output_format": "number" + "type": "u64" }, { - "name": "timestamp", - "type": "datetime", + "fast": true, + "fast_precision": "seconds", + "indexed": true, "input_formats": [ "rfc3339", "unix_timestamp" ], + "name": "timestamp", "output_format": "rfc3339", - "fast_precision": "seconds", - "indexed": true, "stored": true, - "fast": true + "type": "datetime" }, { - "name": "log_level", - "type": "text", + "fast": false, + "fieldnorms": false, "indexed": true, - "tokenizer": "raw", + "name": "log_level", "record": "basic", - "fieldnorms": false, "stored": true, - "fast": false + "tokenizer": "raw", + "type": "text" }, { - "name": "message", - "type": "text", + "fast": false, + "fieldnorms": false, "indexed": true, - "tokenizer": "default", + "name": "message", "record": "position", - "fieldnorms": false, "stored": true, - "fast": false + "tokenizer": "default", + "type": "text" } ], - "timestamp_field": "timestamp", + "index_field_presence": true, + "max_num_partitions": 100, + "mode": "dynamic", + "partition_key": "tenant_id", + "store_document_size": false, + "store_source": true, "tag_fields": [ "log_level", "tenant_id" ], - "partition_key": "tenant_id", - "max_num_partitions": 100, - "index_field_presence": true, - "store_document_size": false, - "store_source": true, + "timestamp_field": "timestamp", "tokenizers": [ { + "filters": [], "name": "custom_tokenizer", - "type": "regex", "pattern": 
"[^\\p{L}\\p{N}]+", - "filters": [] + "type": "regex" } ] }, + "index_id": "my-index", + "index_uri": "s3://quickwit-indexes/my-index", "indexing_settings": { "commit_timeout_secs": 301, - "docstore_compression_level": 8, "docstore_blocksize": 1000000, - "split_num_docs_target": 10000001, + "docstore_compression_level": 8, "merge_policy": { - "type": "stable_log", - "min_level_num_docs": 100000, - "merge_factor": 9, + "maturation_period": "2days", "max_merge_factor": 11, - "maturation_period": "2days" + "merge_factor": 9, + "min_level_num_docs": 100000, + "type": "stable_log" }, "resources": { "heap_size": 50000000 - } + }, + "split_num_docs_target": 10000001 }, "ingest_settings": { "min_shards": 12 }, + "retention": { + "period": "90 days", + "schedule": "daily" + }, "search_settings": { "default_search_fields": [ "message" ] }, - "retention": { - "period": "90 days", - "schedule": "daily" - } - }, - "checkpoint": { - "kafka-source": { - "00000000000000000000": "00000000000000000042" - } + "version": "0.9" }, - "create_timestamp": 1789, + "index_uid": "my-index:00000000000000000000000001", "sources": [ { - "version": "0.9", - "source_id": "kafka-source", - "num_pipelines": 2, "enabled": true, - "source_type": "kafka", + "input_format": "json", + "num_pipelines": 2, "params": { - "topic": "kafka-topic", - "client_params": {} + "client_params": {}, + "topic": "kafka-topic" }, + "source_id": "kafka-source", + "source_type": "kafka", "transform": { "script": ".message = downcase(string!(.message))", "timezone": "UTC" }, - "input_format": "json" + "version": "0.9" + } + ], + "version": "0.9" + }, + "shards": { + "_ingest-source": [ + { + "doc_mapping_uid": "00000000000000000000000001", + "follower_id": "follower-ingester", + "index_uid": "my-index:00000000000000000000000001", + "leader_id": "leader-ingester", + "publish_position_inclusive": "", + "shard_id": "00000000000000000001", + "shard_state": 1, + "source_id": "_ingest-source", + "update_timestamp": 1724240908 } 
] }, "splits": [ { - "split_state": "Published", - "update_timestamp": 1789, - "publish_timestamp": 1789, - "version": "0.9", - "split_id": "split", - "index_uid": "my-index:00000000000000000000000001", - "partition_id": 7, - "source_id": "source", - "node_id": "node", - "num_docs": 12303, - "uncompressed_docs_size_in_bytes": 234234, - "time_range": { - "start": 121000, - "end": 130198 - }, "create_timestamp": 3, + "delete_opstamp": 10, + "doc_mapping_uid": "00000000000000000000000000", + "footer_offsets": { + "end": 2000, + "start": 1000 + }, + "index_uid": "my-index:00000000000000000000000001", "maturity": { - "type": "immature", - "maturation_period_millis": 4000 + "maturation_period_millis": 4000, + "type": "immature" }, + "node_id": "node", + "num_docs": 12303, + "num_merge_ops": 3, + "partition_id": 7, + "publish_timestamp": 1789, + "source_id": "source", + "split_id": "split", + "split_state": "Published", "tags": [ "234", "aaa" ], - "footer_offsets": { - "start": 1000, - "end": 2000 + "time_range": { + "end": 130198, + "start": 121000 }, - "delete_opstamp": 10, - "num_merge_ops": 3, - "doc_mapping_uid": "00000000000000000000000000" + "uncompressed_docs_size_in_bytes": 234234, + "update_timestamp": 1789, + "version": "0.9" } ], - "shards": { - "_ingest-source": [ - { - "index_uid": "my-index:00000000000000000000000001", - "source_id": "_ingest-source", - "shard_id": "00000000000000000001", - "leader_id": "leader-ingester", - "follower_id": "follower-ingester", - "shard_state": 1, - "publish_position_inclusive": "", - "doc_mapping_uid": "00000000000000000000000001", - "update_timestamp": 1724240908 - } - ] - }, - "delete_tasks": [ - { - "create_timestamp": 0, - "opstamp": 10, - "delete_query": { - "index_uid": "my-index:00000000000000000000000001", - "query_ast": 
"{\"type\":\"bool\",\"must\":[{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Harry\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}},\"lenient\":false},{\"type\":\"full_text\",\"field\":\"body\",\"text\":\"Potter\",\"params\":{\"mode\":{\"type\":\"phrase_fallback_to_intersection\"}},\"lenient\":false}]}" - } - } - ] + "version": "0.9" } diff --git a/quickwit/quickwit-metastore/test-data/index-metadata/v0.9.expected.json b/quickwit/quickwit-metastore/test-data/index-metadata/v0.9.expected.json index 8fca7405352..d8b0ec2dc8b 100644 --- a/quickwit/quickwit-metastore/test-data/index-metadata/v0.9.expected.json +++ b/quickwit/quickwit-metastore/test-data/index-metadata/v0.9.expected.json @@ -100,13 +100,13 @@ }, "split_num_docs_target": 10000001 }, + "ingest_settings": { + "min_shards": 12 + }, "retention": { "period": "90 days", "schedule": "daily" }, - "ingest_settings": { - "min_shards": 12 - }, "search_settings": { "default_search_fields": [ "message" diff --git a/quickwit/quickwit-metastore/test-data/index-metadata/v0.9.json b/quickwit/quickwit-metastore/test-data/index-metadata/v0.9.json index 8fca7405352..d8b0ec2dc8b 100644 --- a/quickwit/quickwit-metastore/test-data/index-metadata/v0.9.json +++ b/quickwit/quickwit-metastore/test-data/index-metadata/v0.9.json @@ -100,13 +100,13 @@ }, "split_num_docs_target": 10000001 }, + "ingest_settings": { + "min_shards": 12 + }, "retention": { "period": "90 days", "schedule": "daily" }, - "ingest_settings": { - "min_shards": 12 - }, "search_settings": { "default_search_fields": [ "message" diff --git a/quickwit/quickwit-metastore/test-data/manifest/v0.7.expected.json b/quickwit/quickwit-metastore/test-data/manifest/v0.7.expected.json index 674e583d56f..303dcc533c6 100644 --- a/quickwit/quickwit-metastore/test-data/manifest/v0.7.expected.json +++ b/quickwit/quickwit-metastore/test-data/manifest/v0.7.expected.json @@ -74,14 +74,14 @@ }, "split_num_docs_target": 10000000 }, + "ingest_settings": { + 
"min_shards": 1 + }, "priority": 100, "retention": { "period": "42 days", "schedule": "daily" }, - "ingest_settings": { - "min_shards": 1 - }, "search_settings": { "default_search_fields": [] }, diff --git a/quickwit/quickwit-metastore/test-data/manifest/v0.8.expected.json b/quickwit/quickwit-metastore/test-data/manifest/v0.8.expected.json index 2b6819d6bbb..303dcc533c6 100644 --- a/quickwit/quickwit-metastore/test-data/manifest/v0.8.expected.json +++ b/quickwit/quickwit-metastore/test-data/manifest/v0.8.expected.json @@ -74,10 +74,10 @@ }, "split_num_docs_target": 10000000 }, - "priority": 100, "ingest_settings": { "min_shards": 1 }, + "priority": 100, "retention": { "period": "42 days", "schedule": "daily" diff --git a/quickwit/quickwit-metastore/test-data/manifest/v0.9.json b/quickwit/quickwit-metastore/test-data/manifest/v0.9.json index 914047f5421..11f3c3287a0 100644 --- a/quickwit/quickwit-metastore/test-data/manifest/v0.9.json +++ b/quickwit/quickwit-metastore/test-data/manifest/v0.9.json @@ -74,14 +74,14 @@ }, "split_num_docs_target": 10000000 }, + "ingest_settings": { + "min_shards": 1 + }, "priority": 100, "retention": { "period": "42 days", "schedule": "daily" }, - "ingest_settings": { - "min_shards": 1 - }, "search_settings": { "default_search_fields": [] },