import: support computed columns with UDF references#165129
import: support computed columns with UDF references#165129trunk-io[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
|
😎 Merged successfully - details. |
ecb294e to
c79e1a8
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
Nice! I have a few questions.
@yuzefovich reviewed 6 files and all commit messages, and made 6 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and mw5h).
pkg/sql/catalog/schemaexpr/computed_column.go line 326 at r2 (raw file):
} // Collect all column IDs that are referenced in the partial index
nit: (not your code, but) this comment seems stale / copied incorrectly.
pkg/sql/catalog/schemaexpr/computed_column.go line 383 at r2 (raw file):
ctx context.Context, expr tree.TypedExpr, semaCtx *tree.SemaContext, ) (tree.TypedExpr, error) { newExpr, err := tree.SimpleVisit(expr, func(e tree.Expr) (bool, tree.Expr, error) {
nit: it's helpful to have named return arguments
(recurse bool, newExpr tree.Expr, err error)pkg/sql/catalog/schemaexpr/computed_column.go line 433 at r2 (raw file):
// already-typed argument expressions (e.g. IndexedVar). paramMap := make(map[string]tree.Expr, len(fn.RoutineParams)) for i, p := range fn.RoutineParams {
Should we be ignoring OUT parameters here? Is it possible to have a computed expression invoke a routine with OUT / INOUT parameters?
pkg/sql/catalog/schemaexpr/computed_column.go line 439 at r2 (raw file):
} // Substitute parameter references in the body.
Should we do another walk over bodyExpr after the substitution to check that all parameter references have been replaced and returning an error if not?
pkg/sql/logictest/testdata/logic_test/udf_unsupported line 9 at r2 (raw file):
CREATE TABLE test_tbl_t (a INT PRIMARY KEY, b INT); # Insert rows to verify that backfills that use UDFs produce correct values.
Should we introduce some new test cases for the errors with inlining the UDF and then move test cases that now work to another file (since this one is udf_unsupported)?
DrewKimball
left a comment
There was a problem hiding this comment.
@DrewKimball reviewed 3 files and all commit messages, and made 3 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on mw5h).
-- commits line 51 at r2:
I don't think this is true - on a demo cluster, I was able to create a computed column expression calling a UDF with multiple statements as well as one with a PL/pgSQL routine.
pkg/sql/catalog/schemaexpr/computed_column.go line 440 at r2 (raw file):
// Substitute parameter references in the body. inlined, err := tree.SimpleVisit(
I'm not sure this pattern will actually visit everything that could be in a UDF body. We should test parameter references in:
- Subquery
- The inputs + ON condition of a Join
- CTE
- GroupBy
We should also test:
- Nested routine calls
- Ordinal parameter references like
$1
pkg/sql/importer/import_processor.go line 313 at r1 (raw file):
// Release leases on any accessed descriptors now that the // resolver is configured. We do this so that leases are not // held for the entire import process.
Does this actually do anything, since the function resolver hasn't been used yet at this point?
Also a more general question: I know we do this for backfill, so not blocking. But is it correct to release leases while we're using these descriptors for routine evaluation or type casts?
c79e1a8 to
9b5632d
Compare
f17c5aa to
0c1beaf
Compare
mw5h
left a comment
There was a problem hiding this comment.
@mw5h made 4 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and yuzefovich).
-- commits line 22 at r4:
I think DrewK meant that this is not true. UDF-based computed column expressions succeed but are not backfill/import-able.
pkg/sql/importer/read_import_base.go line 90 at r4 (raw file):
evalCtx := flowCtx.NewEvalCtx() evalCtx.Regions = makeImportRegionOperator(spec.DatabasePrimaryRegion)
Delete this blank line.
pkg/sql/importer/read_import_base.go line 92 at r4 (raw file):
semaCtx := tree.MakeSemaContext(importResolver) conv, convErr := makeInputConverter(
I'm not seeing a reason to change the name of the error here.
pkg/sql/importer/read_import_base.go line 129 at r4 (raw file):
// at the end is one row containing an encoded BulkOpSummary. var summary *kvpb.BulkOpSummary var err error
We shouldn't capture the error here, just return the error and let it propagate via the group.Wait() call.
0c1beaf to
dd6ad24
Compare
|
This should be RFAL. |
mw5h
left a comment
There was a problem hiding this comment.
@mw5h made 6 comments and resolved 2 discussions.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and yuzefovich).
Previously, DrewKimball (Drew Kimball) wrote…
I don't think this is true - on a demo cluster, I was able to create a computed column expression calling a UDF with multiple statements as well as one with a PL/pgSQL routine.
Done.
pkg/sql/catalog/schemaexpr/computed_column.go line 433 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Should we be ignoring OUT parameters here? Is it possible to have a computed expression invoke a routine with OUT / INOUT parameters?
Done.
pkg/sql/catalog/schemaexpr/computed_column.go line 439 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Should we do another walk over
bodyExprafter the substitution to check that all parameter references have been replaced and returning an error if not?
Done.
pkg/sql/catalog/schemaexpr/computed_column.go line 440 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I'm not sure this pattern will actually visit everything that could be in a UDF body. We should test parameter references in:
- Subquery
- The inputs + ON condition of a Join
- CTE
- GroupBy
We should also test:
- Nested routine calls
- Ordinal parameter references like
$1
Done.
pkg/sql/logictest/testdata/logic_test/udf_unsupported line 9 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Should we introduce some new test cases for the errors with inlining the UDF and then move test cases that now work to another file (since this one is
udf_unsupported)?
Done.
pkg/sql/importer/import_processor.go line 313 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Does this actually do anything, since the function resolver hasn't been used yet at this point?
Also a more general question: I know we do this for backfill, so not blocking. But is it correct to release leases while we're using these descriptors for routine evaluation or type casts?
Done.
yuzefovich
left a comment
There was a problem hiding this comment.
@yuzefovich reviewed 9 files and all commit messages, made 3 comments, and resolved 2 discussions.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and mw5h).
pkg/sql/catalog/schemaexpr/computed_column.go line 439 at r2 (raw file):
Previously, mw5h (Matt White) wrote…
Done.
I don't see the check that all parameters have been replaced.
pkg/sql/catalog/schemaexpr/computed_column.go line 456 at r6 (raw file):
} if i < len(funcExpr.Exprs) { paramMap[string(p.Name)] = funcExpr.Exprs[i]
It's allowed for a UDF to have unnamed parameters (i.e. p.Name == ""). Such unnamed IN parameters cannot be referenced in the body by name (only via $i notation which we currently don't support (#114701)), and there can be any number of unnamed parameters. The current logic probably doesn't handle the case of multiple unnamed IN parameters.
pkg/sql/catalog/schemaexpr/computed_column.go line 459 at r6 (raw file):
} } if len(paramMap) != len(funcExpr.Exprs) {
nit: it feels off to have the length check done after the loop given that in the loop we have bounds check on i, meaning that by construction len(paramMap) will never exceed len(funcExpr.Exprs) even if we have too many routine params.
754e131 to
a3822cf
Compare
mw5h
left a comment
There was a problem hiding this comment.
@mw5h made 3 comments and resolved 1 discussion.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and yuzefovich).
pkg/sql/catalog/schemaexpr/computed_column.go line 439 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I don't see the check that all parameters have been replaced.
Yeah, sorry, in the update I lost what this was referring to.
pkg/sql/catalog/schemaexpr/computed_column.go line 456 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
It's allowed for a UDF to have unnamed parameters (i.e.
p.Name == ""). Such unnamed IN parameters cannot be referenced in the body by name (only via $i notation which we currently don't support (#114701)), and there can be any number of unnamed parameters. The current logic probably doesn't handle the case of multiple unnamed IN parameters.
Looks like $i notation works fine for SQL UDFs, just not PL/pgSQL.
pkg/sql/catalog/schemaexpr/computed_column.go line 459 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: it feels off to have the length check done after the loop given that in the loop we have bounds check on
i, meaning that by constructionlen(paramMap)will never exceedlen(funcExpr.Exprs)even if we have too many routine params.
Agreed.
|
I believe this is RFAL. |
yuzefovich
left a comment
There was a problem hiding this comment.
Nice work! but might be worth waiting for Drew to take another look as well.
@yuzefovich reviewed 4 files and all commit messages, made 2 comments, and resolved 3 discussions.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on DrewKimball and mw5h).
pkg/sql/importer/read_import_base.go line 92 at r4 (raw file):
Previously, mw5h (Matt White) wrote…
I'm not seeing a reason to change the name of the error here.
nit: maybe revert the change here (i.e. keep everything on a single line).
IMPORT INTO with computed columns that reference user-defined functions fails during type-checking because the SemaContext has no FunctionResolver. The UDF references (stored as OIDs in the computed column expression) cannot be resolved, producing: function 100107 not found Fix this by setting up a per-instance DistSQLFunctionResolver in NewDatumRowConverter, following the same pattern already used for per-instance sequence resolution. Each converter creates its own bare-bones descs.Collection inside a short-lived txn to resolve UDF descriptors by OID. This avoids sharing a single descs.Collection across parallel import workers, which would cause data races. Since bare-bones collections lack a lease manager, we use ByIDWithoutLeased via the new NewDistSQLFunctionResolverFromGetter constructor. Note: this fix alone is necessary but not sufficient — evaluation of UDF-based computed column expressions will still fail because eval.Expr cannot execute functions with SQL bodies (issue cockroachdb#147472). The next commit addresses that. Informs: cockroachdb#157195 Informs: cockroachdb#147472 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
a3822cf to
564b2c6
Compare
DrewKimball
left a comment
There was a problem hiding this comment.
I have a few more suggestios for things to test/disallow, but beyond that LGTM.
| if fn == nil || fn.Body == "" { | ||
| return true, e, nil | ||
| } | ||
| if fn.Language != tree.RoutineLangSQL { |
There was a problem hiding this comment.
We should probably also check for generator functions if that isn't already done elsewhere.
There was a problem hiding this comment.
I'm also wondering if we're handling RECORD-returning routines correctly. We should test, or maybe just disallow for now.
There was a problem hiding this comment.
Done — generators inside UDF bodies are now caught via sanitizeExpr (which uses RejectSpecial, including RejectGenerators). Additionally, we now validate at DDL time (CREATE TABLE / ALTER TABLE) by running inlineUDFCalls during expression validation, so all non-inlineable patterns (generators, multi-statement, PL/pgSQL, CTEs, subqueries, nested UDFs) are rejected upfront rather than failing during backfill.
There was a problem hiding this comment.
RECORD-returning routines are handled — they return tuple type which never matches a valid column type, so they're rejected with a type mismatch error. Also, RECORD can't be used as a column type directly (cannot use anonymous record type as table column). Added a test for both cases.
There was a problem hiding this comment.
This is RFAL when you get back.
There was a problem hiding this comment.
Friendly ping @DrewKimball just in case you haven't seen this. I don't plan on backporting to 26.2, so no huge rush.
There was a problem hiding this comment.
Sorry about that, this one slipped by me.
31fbffc to
ed7c182
Compare
ed7c182 to
8611bc0
Compare
| # Strict UDFs (RETURNS NULL ON NULL INPUT) on a nullable column. | ||
| # Rows with NULL input should produce NULL output. | ||
| statement ok | ||
| CREATE FUNCTION cc_udf_strict(x INT) RETURNS INT IMMUTABLE RETURNS NULL ON NULL INPUT LANGUAGE SQL AS $$ SELECT x + 100 $$; |
There was a problem hiding this comment.
nit: would be nice to use an expression that doesn't already return NULL on NULL input, like CASE WHEN x IS NULL THEN -1 ELSE x + 100 END
Contexts that bypass the optimizer — IMPORT and schema change backfill — cannot evaluate user-defined functions with SQL bodies. The eval.Expr path returns an unimplemented error for functions where fn.Body != "" because it lacks the optimizer and execbuilder infrastructure needed to produce RoutineExpr plan generators. Fix this by inlining UDF calls within MakeComputedExprs, the shared entry point for both IMPORT and backfill computed column evaluation. For each FuncExpr that references a UDF (fn.Body != ""), inlineUDFCalls: 1. Parses the SQL body and extracts the result expression. 2. Substitutes parameter references (named, ColumnItem, and ordinal $1/$2 placeholders) with the actual call-site argument expressions. 3. Wraps the result in a CASE expression for strict (RETURNS NULL ON NULL INPUT) functions to preserve null-handling semantics. 4. Type-checks the inlined expression against the function's return type. This provides parity with non-UDF computed column expressions for the constructs allowed in immutable UDF bodies: arithmetic, string operations, type casts, CASE/COALESCE, immutable builtin calls, and array constructors. Immutable UDFs cannot reference relations, so their bodies are restricted to the same scalar expression language available in non-UDF computed columns. Non-inlineable UDF patterns are now rejected at DDL time (CREATE TABLE, ALTER TABLE ADD COLUMN) rather than failing later during backfill. This prevents creating tables with computed columns that cannot be backfilled. The following patterns are rejected: - Multi-statement SQL UDFs: cannot be inlined to a single expression - PL/pgSQL UDFs: cannot be inlined (different language) - UDFs with OUT/INOUT parameters: not supported by the inliner - Nested UDF calls: the inliner does not recurse into inlined bodies - UDF bodies containing subqueries, CTEs, or FROM clauses: cannot be evaluated outside the optimizer - UDF bodies containing set-returning functions (generators) Fixes: cockroachdb#157195 Informs: cockroachdb#147472 Release note (bug fix): Fixed a bug where IMPORT INTO and schema change backfills would fail when the destination table had computed columns defined using user-defined functions, reporting "function <oid> not found". Computed columns referencing immutable single-expression SQL UDFs now work correctly with IMPORT INTO and ALTER TABLE ADD COLUMN. Release note (backward-incompatible change): Computed columns using non-inlineable user-defined functions — including PL/pgSQL UDFs, multi-statement SQL UDFs, UDFs with OUT/INOUT parameters, nested UDF calls, and UDFs whose bodies contain subqueries, CTEs, FROM clauses, or set-returning functions — are now rejected at DDL time (CREATE TABLE, ALTER TABLE ADD COLUMN). Previously, these were accepted but would fail during schema change backfill, leaving the table in a state where schema changes could not be performed. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
8611bc0 to
f6e6b74
Compare
Summary
IMPORT INTO fails when the destination table has computed columns that
reference user-defined functions, with error
function <oid> not found.This PR fixes the issue in two commits:
importer: set up FunctionResolver for computed column type-checking —
Sets up a
DistSQLFunctionResolveron theSemaContextduring import sothat UDF references (stored as OIDs) in computed column expressions can be
resolved. Follows the existing pattern from
ColumnBackfiller.InitForDistributedUse.schemaexpr: inline UDF calls in computed column expressions —
Adds UDF inlining in
MakeComputedExprs(the shared entry point for bothIMPORT and backfill) so that
eval.Exprcan evaluate the resultingexpressions without requiring the full optimizer/execbuilder infrastructure.
Only single-statement immutable SQL UDFs are supported, which is the only
kind allowed in computed columns.
Fixes: #157195
Informs: #147472
Test plan
TestImportIntoComputedColumnWithUDFregression testTestImportIntoUserDefinedTypesstill passesschemaexprunit tests still pass