feat(parquet): initial support prune parts at runtime when do topk query #19903

Draft
ariesdevil wants to merge 1 commit into databendlabs:main from ariesdevil:codex/zyj

ariesdevil (Contributor) commented May 21, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Add a Parquet/Fuse progressive TopK scan path for ordered LIMIT queries, inspired by langsmith.

For eligible single-column ORDER BY ... LIMIT K scans, Fuse Parquet partitions are ordered by block-level sort min/max stats. During execution, the Parquet reader feeds back filtered rows into a TopK state after deserialize_and_filter, allowing the partition stream to stop scheduling older blocks once they can no longer enter the current TopK result.
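
To make the pruning idea above concrete, here is a minimal, self-contained sketch for the `ORDER BY key ASC LIMIT k` case. It is not the code in this PR: `Block`, `TopKState`, and `progressive_topk` are hypothetical names, the sort key is assumed to be `i64`, and blocks are read one at a time in order of their `min` stat. For a `DESC` order the same logic would presumably use `max` and a min-heap instead.

```rust
use std::collections::BinaryHeap;

/// Hypothetical block descriptor: min/max stats of the sort key plus the
/// key values of the rows that survive filtering (an assumption for this sketch).
#[derive(Debug, Clone)]
pub struct Block {
    pub min: i64,
    pub max: i64, // unused for ASC; a DESC variant would order and prune by this
    pub rows: Vec<i64>,
}

/// Bounded TopK state for `ORDER BY key ASC LIMIT k`.
pub struct TopKState {
    k: usize,
    // Max-heap holding the k smallest keys seen so far; its top is the threshold.
    heap: BinaryHeap<i64>,
}

impl TopKState {
    pub fn new(k: usize) -> Self {
        Self { k, heap: BinaryHeap::with_capacity(k + 1) }
    }

    /// Feed back filtered rows, keeping only the k smallest keys.
    pub fn feed(&mut self, rows: &[i64]) {
        for &v in rows {
            self.heap.push(v);
            if self.heap.len() > self.k {
                self.heap.pop();
            }
        }
    }

    /// A threshold exists only once k rows have been collected.
    pub fn threshold(&self) -> Option<i64> {
        if self.k > 0 && self.heap.len() == self.k {
            self.heap.peek().copied()
        } else {
            None
        }
    }

    /// A block whose smallest key exceeds the threshold cannot change the result.
    pub fn can_skip(&self, block_min: i64) -> bool {
        matches!(self.threshold(), Some(t) if block_min > t)
    }

    pub fn into_sorted(self) -> Vec<i64> {
        self.heap.into_sorted_vec()
    }
}

/// Returns the top-k keys in ascending order and the number of blocks read.
pub fn progressive_topk(mut blocks: Vec<Block>, k: usize) -> (Vec<i64>, usize) {
    // Order partitions by block-level min so the most promising blocks come first.
    blocks.sort_by_key(|b| b.min);
    let mut state = TopKState::new(k);
    let mut read = 0;
    for b in &blocks {
        // Blocks are sorted by min, so once one is skippable all later ones are too.
        if state.can_skip(b.min) {
            break;
        }
        state.feed(&b.rows);
        read += 1;
    }
    (state.into_sorted(), read)
}
```

Because the stop condition compares with a strict `>`, blocks whose `min` ties with the threshold are still read; that is conservative and keeps the sketch independent of tie-breaking rules.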

Current limitations:

  • Parquet/Fuse only
  • block-level partitions only
  • supported sort keys: number, date, timestamp
  • LIMIT <= 1000
  • one block is scheduled at a time
  • unsupported complex paths, such as runtime filters, lazy/receiver pruning, virtual columns, agg indexes, samples, and secure filters, fall back to the existing scan behavior
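
The limitations above amount to an eligibility check that decides between the progressive TopK path and the existing scan. A rough sketch of such a check follows; `ScanPlan`, `KeyType`, `MAX_TOPK_LIMIT`, and `eligible_for_progressive_topk` are illustrative names invented here, not identifiers from this PR or from Databend.

```rust
/// Hypothetical classification of the ORDER BY key type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyType {
    Number,
    Date,
    Timestamp,
    String,
    Other,
}

/// Hypothetical summary of the scan properties listed in the limitations.
#[derive(Debug, Clone)]
pub struct ScanPlan {
    pub is_fuse_parquet: bool,
    pub block_level_partitions: bool,
    pub order_by: Vec<KeyType>,
    pub limit: Option<usize>,
    pub has_runtime_filter: bool,
    pub has_lazy_or_receiver_pruning: bool,
    pub has_virtual_columns: bool,
    pub has_agg_index: bool,
    pub has_sample: bool,
    pub has_secure_filter: bool,
}

/// Upper bound on LIMIT taken from the PR description.
pub const MAX_TOPK_LIMIT: usize = 1000;

/// Returns true when the scan may use the progressive TopK path;
/// otherwise the caller falls back to the existing scan behavior.
pub fn eligible_for_progressive_topk(p: &ScanPlan) -> bool {
    // Exactly one sort key, and it must be a number, date, or timestamp.
    let single_supported_key = matches!(
        p.order_by.as_slice(),
        [KeyType::Number | KeyType::Date | KeyType::Timestamp]
    );
    // A LIMIT must be present and no larger than the bound.
    let small_limit = matches!(p.limit, Some(k) if k <= MAX_TOPK_LIMIT);
    // Any of the unsupported complex paths forces the fallback.
    let unsupported = p.has_runtime_filter
        || p.has_lazy_or_receiver_pruning
        || p.has_virtual_columns
        || p.has_agg_index
        || p.has_sample
        || p.has_secure_filter;

    p.is_fuse_parquet
        && p.block_level_partitions
        && single_supported_key
        && small_limit
        && !unsupported
}
```

Keeping the check as a single pure function makes the fallback explicit: anything it rejects simply takes the existing scan path.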

The final Sort/Limit is still preserved for correctness. A follow-up can add bounded batch or in-flight block scheduling to improve object-storage throughput while still limiting overscan.

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

github-actions (bot) added the pr-feature label (this PR introduces a new feature to the codebase) on May 21, 2026
ariesdevil (Contributor, Author) commented

Feel free to take over or close this PR.
