Add file_row_index UDF to query file-level row indexes from Parquet files (#22604)
## Which issue does this PR close?
- Part of #20135
## Rationale for this change
This PR includes the "front end" side of @mbutrovich's #22026, bridging
the last mile to allow users to query file row indexes.
## What changes are included in this PR?
1. A new Scalar UDF `file_row_index`, following #20071's example. The
function returns 0-based row indexes for Parquet scans.
2. Expands the row-filter PushdownChecker to also check whether the
predicate contains the new function, and prevents the predicate from
being pushed down if it does.
3. I've added a couple of utilities to find or rewrite ScalarUDF
instances in physical expression trees. I've seen @alamb point this
mistake out in multiple PRs (including
[here](#20071 (comment))).
They can also be used in #20071. They are currently in
`schema_rewriter.rs`, which was the best place I could think of, but
maybe they should be moved elsewhere.
4. A dedicated rewrite function for `file_row_index` that turns it
into a `Cast(Column(...))`; the cast is required to return Int64 values.
5. In `ParquetSource::try_pushdown_projection`, we look for
`FileRowIndexFunc`, and if it exists we rewrite it and the source's
table schema.
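
The find-and-rewrite idea from items 3 and 4 can be sketched on a toy expression tree. This is a minimal illustration, not the code in this PR: the `Expr` enum, `references_udf`, `rewrite_file_row_index`, and the `row_index` column name are all hypothetical stand-ins, and the real utilities operate on DataFusion physical expressions rather than on this enum.

```rust
// Toy expression tree for illustration only; NOT DataFusion's physical
// expression API. All names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    ScalarUdf { name: String, args: Vec<Expr> },
    Cast { expr: Box<Expr>, to: String },
    Binary { left: Box<Expr>, op: String, right: Box<Expr> },
}

/// Returns true if any node in the tree is a scalar UDF called `name`.
fn references_udf(expr: &Expr, name: &str) -> bool {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => false,
        Expr::ScalarUdf { name: n, args } => {
            n == name || args.iter().any(|a| references_udf(a, name))
        }
        Expr::Cast { expr, .. } => references_udf(expr, name),
        Expr::Binary { left, right, .. } => {
            references_udf(left, name) || references_udf(right, name)
        }
    }
}

/// Replaces every `file_row_index` call with `Cast(Column(column))` to Int64,
/// recursing into children so nested occurrences are rewritten as well.
fn rewrite_file_row_index(expr: Expr, column: &str) -> Expr {
    match expr {
        Expr::ScalarUdf { name, .. } if name == "file_row_index" => Expr::Cast {
            expr: Box::new(Expr::Column(column.to_string())),
            to: "Int64".to_string(),
        },
        Expr::ScalarUdf { name, args } => Expr::ScalarUdf {
            name,
            args: args
                .into_iter()
                .map(|a| rewrite_file_row_index(a, column))
                .collect(),
        },
        Expr::Cast { expr, to } => Expr::Cast {
            expr: Box::new(rewrite_file_row_index(*expr, column)),
            to,
        },
        Expr::Binary { left, op, right } => Expr::Binary {
            left: Box::new(rewrite_file_row_index(*left, column)),
            op,
            right: Box::new(rewrite_file_row_index(*right, column)),
        },
        // Columns and literals have no children and are returned unchanged.
        leaf => leaf,
    }
}
```

In this sketch, a filter such as `file_row_index() < 10` is first detected with `references_udf` (the kind of check that would block pushdown) and then rewritten so that no UDF call remains in the tree.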
## Are these changes tested?
In addition to individual unit tests, I've added a new SLT file
(`file_row_index.slt`) that tests for the following cases:
1. Querying `file_row_index` from a table backed by multiple files
2. Filtering on `file_row_index` when it's part of the projection
3. Filtering on `file_row_index` when it's **not** part of the projection,
when filter pushdown is either enabled or disabled (this part didn't
work in a previous iteration, but I figured it out today).
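
As a rough model of what these cases check, the sketch below assigns 0-based indexes to rows and then filters on them. It assumes that indexes restart at 0 in every file, which is my reading of "file-level row indexes", and `with_file_row_index` is a hypothetical helper written for this illustration, not part of the PR.

```rust
/// Pairs every row with its 0-based position inside its own file.
/// Assumption for this sketch: the index restarts at 0 for each file.
fn with_file_row_index<T: Clone>(files: &[Vec<T>]) -> Vec<(i64, T)> {
    files
        .iter()
        .flat_map(|rows| {
            rows.iter()
                .cloned()
                .enumerate()
                .map(|(i, row)| (i as i64, row))
        })
        .collect()
}
```

Filtering on the index after it has been assigned keeps each row's original position, so under this model a predicate like `file_row_index >= 1` drops the first row of every file rather than only the first row of the table.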
## Are there any user-facing changes?
1. New scalar function type - `FileRowIndexFunc`/`file_row_index`.
2. Rewrite logic in `physical-expr-adapter` -
`rewrite_file_row_index_expr` specifically for the new UDF,
`rewrite_file_row_index_projection` to rewrite the `ProjectionExprs`, and
two utility functions that should make it clearer how to find and
manipulate ScalarUDFs in physical expressions - `expr_references_scalar_udf`
and `rewrite_scalar_udf`.
---------
Signed-off-by: Adam Gutglick <adamgsal@gmail.com>