feat(query): add page index#20079

Draft
zhang2014 wants to merge 8 commits into databendlabs:main from zhang2014:refactor/page_index

@zhang2014 zhang2014 commented Jun 29, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

feat(query): add page index

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

…uning

Introduce a ClickHouse-style sparse page index that maps cluster-key
granules to physical page byte ranges, enabling sub-block pruning on
the read path.

Write path:
- Add PageIndexWriter that builds a columnar parquet sidecar file with
  per-granule cluster-key min values and leaf-column page offsets.
- Integrate granule splitting into the stream block builder based on
  the new index_granularity table option.
- Extend WriteSettings, BlockWriter, and location helpers to produce
  and persist the page index file.
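
A minimal sketch of the granule-splitting idea on the write path, assuming rows arrive already sorted by a single integer cluster key. `GranuleMark` and `build_marks` are hypothetical names for illustration only and are not the PR's `PageIndexWriter` API; the real sidecar also records leaf-column page offsets, which this sketch leaves out.

```rust
/// One entry of a hypothetical sparse index: the smallest cluster-key value
/// in a granule, plus where the granule starts and how many rows it holds.
#[derive(Debug, Clone, PartialEq)]
pub struct GranuleMark {
    pub min_key: i64,
    pub first_row: usize,
    pub num_rows: usize,
}

/// Split keys (assumed sorted ascending) into granules of at most
/// `index_granularity` rows and record one mark per granule.
pub fn build_marks(sorted_keys: &[i64], index_granularity: usize) -> Vec<GranuleMark> {
    assert!(index_granularity > 0, "index_granularity must be positive");
    sorted_keys
        .chunks(index_granularity)
        .enumerate()
        .map(|(i, chunk)| GranuleMark {
            // Keys are sorted, so the first key of the chunk is its minimum.
            min_key: chunk[0],
            first_row: i * index_granularity,
            num_rows: chunk.len(),
        })
        .collect()
}
```

With seven sorted keys and a granularity of 3, this yields three marks, the last one covering a single row. Storing only the minimum per granule keeps the index small; the upper bound of a granule is implied by the next granule's minimum.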

Read path:
- Add SparsePageIndexPruner that evaluates predicates against granule
  cluster-key ranges and narrows column reads to relevant byte ranges.
- Extend FusePart and ReadState to carry page-level byte ranges.
- Wire page-index-aware merge IO into block_reader_merge_io_async.
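
The read-path pruning described above can be sketched roughly as follows, assuming one integer cluster key, an inclusive range predicate, and one byte range per granule for a single column. `Granule` and `prune_byte_ranges` are hypothetical names, not the PR's `SparsePageIndexPruner` interface.

```rust
use std::ops::Range;

/// Hypothetical per-granule index entry for one column.
pub struct Granule {
    /// Smallest cluster-key value in the granule.
    pub min_key: i64,
    /// Byte range of the pages that hold this granule's rows.
    pub bytes: Range<u64>,
}

/// Return the byte ranges that may contain keys in `[lo, hi]`.
/// Granules must be ordered by `min_key`. A granule's upper bound is taken
/// to be the next granule's `min_key` (inclusive, to stay safe with
/// duplicate keys); the last granule is treated as unbounded above.
pub fn prune_byte_ranges(granules: &[Granule], lo: i64, hi: i64) -> Vec<Range<u64>> {
    let mut out: Vec<Range<u64>> = Vec::new();
    for (i, g) in granules.iter().enumerate() {
        let upper = granules.get(i + 1).map(|next| next.min_key);
        let may_match = g.min_key <= hi && upper.map_or(true, |u| u >= lo);
        if !may_match {
            continue;
        }
        match out.last_mut() {
            // Coalesce with the previous range when the bytes are contiguous,
            // so adjacent granules are fetched in a single read.
            Some(last) if last.end == g.bytes.start => last.end = g.bytes.end,
            _ => out.push(g.bytes.clone()),
        }
    }
    out
}
```

The check is deliberately conservative: a granule is kept whenever it cannot be ruled out, so pruning may read extra bytes but should not drop matching rows.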

Pruning:
- Remove the old page_pruner module from common/pruner (superseded).
- Integrate sparse page index pruning into the fuse block pruner.

Validation & tests:
- Add index_granularity to table option validation.
- Add sqllogictest covering end-to-end write and read with the index.
- Update existing tests to accommodate new BlockMeta/parquet fields.

Add fuse compatibility test cases to ensure:
- Forward compat: current writer with page-index sidecar files and new
  BlockMeta fields is readable by older readers that ignore the unknown
  sidecar and fall back to whole-block reads.
- Backward compat: old writer with inline ClusterStatistics.pages
  (row_per_page) is still readable by the current reader after removal
  of the legacy page_pruner module.

Both suites cover point lookups, range predicates, out-of-range
predicates, and full scans over single and multi-column cluster keys.
@github-actions github-actions Bot added the pr-feature this PR introduces a new feature to the codebase label Jun 29, 2026
zhang2014 and others added 3 commits June 29, 2026 12:00
CI lint (check_macos / check) failed on test_cases.yaml:40 with
"too many blank lines (1 > 0)". Strip the extra trailing blank.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The sqllogictest column-type markers (e.g. `query IIT`) are flagged by
typos as misspellings of `IT`. tests/sqllogictests is already excluded
for the same reason; apply the same exemption to the compat-logictest
suites.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CI `cargo clippy -D warnings` failed with clippy::type_complexity on
ArrowParquetWriter::finish's 3-tuple return. Introduce a named
FinishedRowGroup struct for the row-group result.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
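
As an illustration of the clippy fix in the last commit above, here is a sketch of replacing a tuple return with a named struct. The field names and the `finish_row_group` helper are assumptions made for this example; the PR text does not list the members of the original 3-tuple or the real signature of `ArrowParquetWriter::finish`.

```rust
/// Named result for a finished row group. The fields are hypothetical;
/// they stand in for whatever the original 3-tuple carried.
#[derive(Debug, PartialEq)]
pub struct FinishedRowGroup {
    pub num_rows: usize,
    /// Start offset of each page, relative to the row group start.
    pub page_offsets: Vec<u64>,
    pub bytes_written: u64,
}

/// Hypothetical helper: derive page offsets and total size from page sizes.
/// Returning a struct instead of `(usize, Vec<u64>, u64)` lets call sites
/// read fields by name instead of by tuple position.
pub fn finish_row_group(page_sizes: &[u64], num_rows: usize) -> FinishedRowGroup {
    let mut page_offsets = Vec::with_capacity(page_sizes.len());
    let mut cursor = 0u64;
    for size in page_sizes {
        page_offsets.push(cursor);
        cursor += size;
    }
    FinishedRowGroup {
        num_rows,
        page_offsets,
        bytes_written: cursor,
    }
}
```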
@zhang2014 zhang2014 added the ci-cloud Build docker image for cloud test label Jun 29, 2026
@github-actions

Docker Image for PR

  • tag: pr-20079-db56cca-1782716858

note: this image tag is only available for internal use.

zhang2014 added 3 commits July 3, 2026 17:50
# Conflicts:
#	src/query/service/tests/it/indexes/spatial_index/runtime_filter.rs
#	src/query/service/tests/it/storages/fuse/bloom_index_meta_size.rs
#	src/query/service/tests/it/storages/fuse/operations/prewhere.rs
#	src/query/storages/common/cache/src/manager.rs
#	src/query/storages/common/index/src/page_index.rs
#	src/query/storages/common/pruner/src/page_pruner.rs
#	src/query/storages/common/table_meta/src/meta/v2/segment.rs
#	src/query/storages/common/table_meta/src/meta/v3/frozen/block_meta.rs
#	src/query/storages/fuse/src/fuse_part.rs
#	src/query/storages/fuse/src/io/write/block_writer.rs
#	src/query/storages/fuse/src/io/write/stream/block_builder.rs
#	src/query/storages/fuse/src/operations/read/read_block_context.rs
#	src/query/storages/fuse/src/operations/read_partitions.rs
#	src/query/storages/fuse/src/pruning/expr_runtime_pruner.rs
#	src/query/storages/fuse/src/pruning_pipeline/column_oriented_block_prune.rs
#	src/query/storages/fuse/src/statistics/cluster_statistics.rs