use a shadow table for faster rowgroup pruning

See discussion in #34.

ATM we prune row groups by doing a linear scan over the row group statistics. For ~10-20K row groups, this takes ~20ms. Where possible, it'd be nice to do this pruning via a faster mechanism. This could make some parquet-backed tables almost competitive with SQLite itself for needle in a haystack queries. eg I think this is possible:

- sqlite: 0.1ms
- parquet w/fast rowgroup pruning: 1-2ms
- parquet w/linear rowgroup pruning: 20ms

### Option 1: SQLite shadow table
Perhaps on table connect we could populate a shadow table with the statistics, then use that at query time to generate an array of row group IDs to traverse.

I'm not as sure that this is a slamdunk. SQLite doesn't support using more than 1 index in a query, so it will often end up doing a table scan.

### Option 2: our own indexing

We could do the indexing ourself in C. I'm not enthused about this.

I guess we could sort the ranges by their lower bound into some array arr.

Then binary search arr for lower bound < our lowest value (order by lo desc limit 1) and lower bound > our highest value (order by low asc limit 1), which gives us the bounds to search.

Actually, I guess we could translate that into a series of SQLite calls.

### Option 3: constrain the problem

We could support it only for monotonically increasing values -- then we could do 2 binary searches to determine the lower and upper bounds of the row groups to search. We could either do the search in C, or delegate it to SQLite.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use a shadow table for faster rowgroup pruning #36

Option 1: SQLite shadow table

Option 2: our own indexing

Option 3: constrain the problem

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

use a shadow table for faster rowgroup pruning #36

Description

Option 1: SQLite shadow table

Option 2: our own indexing

Option 3: constrain the problem

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions