feat(query): add page index #20079
Draft
zhang2014 wants to merge 8 commits into
…uning

Introduce a ClickHouse-style sparse page index that maps cluster-key granules to physical page byte ranges, enabling sub-block pruning on the read path.

Write path:
- Add PageIndexWriter that builds a columnar parquet sidecar file with per-granule cluster-key min values and leaf-column page offsets.
- Integrate granule splitting into the stream block builder based on the new index_granularity table option.
- Extend WriteSettings, BlockWriter, and location helpers to produce and persist the page index file.

Read path:
- Add SparsePageIndexPruner that evaluates predicates against granule cluster-key ranges and narrows column reads to relevant byte ranges.
- Extend FusePart and ReadState to carry page-level byte ranges.
- Wire page-index-aware merge IO into block_reader_merge_io_async.

Pruning:
- Remove the old page_pruner module from common/pruner (superseded).
- Integrate sparse page index pruning into the fuse block pruner.

Validation & tests:
- Add index_granularity to table option validation.
- Add sqllogictest covering end-to-end write and read with the index.
- Update existing tests to accommodate new BlockMeta/parquet fields.
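
For reviewers less familiar with the ClickHouse-style scheme, the read-path idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `Granule`, `prune_granules`, the `i64` key type, and the single-column byte range are all hypothetical stand-ins. It assumes granules are sorted by cluster key and that only the per-granule min value is stored, so a granule's upper bound is taken from the next granule's min (inclusive, since duplicate keys may straddle a boundary) and the last granule is unbounded.

```rust
use std::ops::Range;

/// Hypothetical index entry: the minimum cluster key of one granule and the
/// byte range of that granule's pages for a single leaf column.
#[derive(Debug, Clone, PartialEq)]
pub struct Granule {
    pub min_key: i64,
    pub bytes: Range<u64>,
}

/// Returns the byte ranges of all granules that may contain a key in
/// the inclusive range [lo, hi]. Adjacent ranges are merged so that the
/// caller can issue fewer, larger reads.
///
/// Assumes `granules` is sorted by `min_key` in ascending order.
pub fn prune_granules(granules: &[Granule], lo: i64, hi: i64) -> Vec<Range<u64>> {
    let mut out: Vec<Range<u64>> = Vec::new();
    if lo > hi {
        return out;
    }
    for (i, g) in granules.iter().enumerate() {
        // Every later granule starts above `hi`, so none of them can match.
        if g.min_key > hi {
            break;
        }
        // Only min values are stored: bound this granule from above by the
        // next granule's min. The last granule has no known upper bound.
        let may_overlap = match granules.get(i + 1) {
            Some(next) => next.min_key >= lo,
            None => true,
        };
        if !may_overlap {
            continue;
        }
        // Merge with the previous range when the two are contiguous on disk.
        let merged = match out.last_mut() {
            Some(last) if last.end == g.bytes.start => {
                last.end = g.bytes.end;
                true
            }
            _ => false,
        };
        if !merged {
            out.push(g.bytes.clone());
        }
    }
    out
}
```

Note that with a min-only index a predicate above every stored min still selects the last granule, because nothing in the index can rule it out; excluding it would need a block-level max or similar, which this sketch does not model.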
Add fuse compatibility test cases to ensure:
- Forward compat: current writer with page-index sidecar files and new BlockMeta fields is readable by older readers that ignore the unknown sidecar and fall back to whole-block reads.
- Backward compat: old writer with inline ClusterStatistics.pages (row_per_page) is still readable by the current reader after removal of the legacy page_pruner module.

Both suites cover point lookups, range predicates, out-of-range predicates, and full scans over single and multi-column cluster keys.
CI lint (check_macos / check) failed on test_cases.yaml:40 with "too many blank lines (1 > 0)". Strip the extra trailing blank.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The sqllogictest column-type markers (e.g. `query IIT`) are flagged by typos as misspellings of `IT`. tests/sqllogictests is already excluded for the same reason; apply the same exemption to the compat-logictest suites.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CI `cargo clippy -D warnings` failed with clippy::type_complexity on ArrowParquetWriter::finish's 3-tuple return. Introduce a named FinishedRowGroup struct for the row-group result.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
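
The shape of that clippy fix is roughly the following. This is only a sketch: the commit names `FinishedRowGroup` and `ArrowParquetWriter::finish`, but the field names, field types, and the `SketchWriter` stand-in below are assumptions made for illustration and do not reflect the actual struct in this PR.

```rust
/// Named result of finishing a row group, replacing an anonymous 3-tuple.
/// All three fields are hypothetical; the real struct may carry different data.
#[derive(Debug, Default, PartialEq)]
pub struct FinishedRowGroup {
    /// Encoded bytes of the row group.
    pub data: Vec<u8>,
    /// (offset, length) of each column chunk inside `data`.
    pub column_offsets: Vec<(u64, u64)>,
    /// Number of rows in the row group.
    pub num_rows: usize,
}

/// Hypothetical stand-in for a writer; it only concatenates column bytes.
pub struct SketchWriter {
    buf: Vec<u8>,
    offsets: Vec<(u64, u64)>,
    num_rows: usize,
}

impl SketchWriter {
    pub fn new(num_rows: usize) -> Self {
        SketchWriter { buf: Vec::new(), offsets: Vec::new(), num_rows }
    }

    /// Appends one column chunk and records where it starts and how long it is.
    pub fn write_column(&mut self, bytes: &[u8]) {
        let start = self.buf.len() as u64;
        self.buf.extend_from_slice(bytes);
        self.offsets.push((start, bytes.len() as u64));
    }

    /// Returns a named struct, so call sites read `rg.num_rows` instead of `rg.2`
    /// and the signature no longer spells out a nested tuple type.
    pub fn finish(self) -> FinishedRowGroup {
        FinishedRowGroup {
            data: self.buf,
            column_offsets: self.offsets,
            num_rows: self.num_rows,
        }
    }
}
```

Besides silencing the lint, a named struct lets later commits add a field without touching every destructuring call site.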
# Conflicts:
#   src/query/service/tests/it/indexes/spatial_index/runtime_filter.rs
#   src/query/service/tests/it/storages/fuse/bloom_index_meta_size.rs
#   src/query/service/tests/it/storages/fuse/operations/prewhere.rs
#   src/query/storages/common/cache/src/manager.rs
#   src/query/storages/common/index/src/page_index.rs
#   src/query/storages/common/pruner/src/page_pruner.rs
#   src/query/storages/common/table_meta/src/meta/v2/segment.rs
#   src/query/storages/common/table_meta/src/meta/v3/frozen/block_meta.rs
#   src/query/storages/fuse/src/fuse_part.rs
#   src/query/storages/fuse/src/io/write/block_writer.rs
#   src/query/storages/fuse/src/io/write/stream/block_builder.rs
#   src/query/storages/fuse/src/operations/read/read_block_context.rs
#   src/query/storages/fuse/src/operations/read_partitions.rs
#   src/query/storages/fuse/src/pruning/expr_runtime_pruner.rs
#   src/query/storages/fuse/src/pruning_pipeline/column_oriented_block_prune.rs
#   src/query/storages/fuse/src/statistics/cluster_statistics.rs
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
feat(query): add page index
Tests
Type of change