Skip to content

import: support computed columns with UDF references#165129

Merged
trunk-io[bot] merged 2 commits intocockroachdb:masterfrom
mw5h:import-udf-computed-columns
Apr 7, 2026
Merged

import: support computed columns with UDF references#165129
trunk-io[bot] merged 2 commits intocockroachdb:masterfrom
mw5h:import-udf-computed-columns

Conversation

@mw5h
Copy link
Copy Markdown
Contributor

@mw5h mw5h commented Mar 6, 2026

Summary

IMPORT INTO fails when the destination table has computed columns that
reference user-defined functions, with error function <oid> not found.

This PR fixes the issue in two commits:

  1. importer: set up FunctionResolver for computed column type-checking
    Sets up a DistSQLFunctionResolver on the SemaContext during import so
    that UDF references (stored as OIDs) in computed column expressions can be
    resolved. Follows the existing pattern from ColumnBackfiller.InitForDistributedUse.

  2. schemaexpr: inline UDF calls in computed column expressions
    Adds UDF inlining in MakeComputedExprs (the shared entry point for both
    IMPORT and backfill) so that eval.Expr can evaluate the resulting
    expressions without requiring the full optimizer/execbuilder infrastructure.
    Only single-statement immutable SQL UDFs are supported, which is the only
    kind allowed in computed columns.

Fixes: #157195
Informs: #147472

Test plan

  • Added TestImportIntoComputedColumnWithUDF regression test
  • Verified existing TestImportIntoUserDefinedTypes still passes
  • Verified schemaexpr unit tests still pass

@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented Mar 6, 2026

😎 Merged successfully - details.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@mw5h mw5h force-pushed the import-udf-computed-columns branch 6 times, most recently from ecb294e to c79e1a8 Compare March 10, 2026 14:48
@mw5h mw5h requested a review from DrewKimball March 11, 2026 20:26
@mw5h mw5h marked this pull request as ready for review March 11, 2026 20:26
@mw5h mw5h requested review from a team as code owners March 11, 2026 20:26
Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I have a few questions.

@yuzefovich reviewed 6 files and all commit messages, and made 6 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and mw5h).


pkg/sql/catalog/schemaexpr/computed_column.go line 326 at r2 (raw file):

		}

		// Collect all column IDs that are referenced in the partial index

nit: (not your code, but) this comment seems stale / copied incorrectly.


pkg/sql/catalog/schemaexpr/computed_column.go line 383 at r2 (raw file):

	ctx context.Context, expr tree.TypedExpr, semaCtx *tree.SemaContext,
) (tree.TypedExpr, error) {
	newExpr, err := tree.SimpleVisit(expr, func(e tree.Expr) (bool, tree.Expr, error) {

nit: it's helpful to have named return arguments

(recurse bool, newExpr tree.Expr, err error)

pkg/sql/catalog/schemaexpr/computed_column.go line 433 at r2 (raw file):

		// already-typed argument expressions (e.g. IndexedVar).
		paramMap := make(map[string]tree.Expr, len(fn.RoutineParams))
		for i, p := range fn.RoutineParams {

Should we be ignoring OUT parameters here? Is it possible to have a computed expression invoke a routine with OUT / INOUT parameters?


pkg/sql/catalog/schemaexpr/computed_column.go line 439 at r2 (raw file):

		}

		// Substitute parameter references in the body.

Should we do another walk over bodyExpr after the substitution to check that all parameter references have been replaced and returning an error if not?


pkg/sql/logictest/testdata/logic_test/udf_unsupported line 9 at r2 (raw file):

CREATE TABLE test_tbl_t (a INT PRIMARY KEY, b INT);

# Insert rows to verify that backfills that use UDFs produce correct values.

Should we introduce some new test cases for the errors with inlining the UDF and then move test cases that now work to another file (since this one is udf_unsupported)?

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrewKimball reviewed 3 files and all commit messages, and made 3 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on mw5h).


-- commits line 51 at r2:
I don't think this is true - on a demo cluster, I was able to create a computed column expression calling a UDF with multiple statements as well as one with a PL/pgSQL routine.


pkg/sql/catalog/schemaexpr/computed_column.go line 440 at r2 (raw file):

		// Substitute parameter references in the body.
		inlined, err := tree.SimpleVisit(

I'm not sure this pattern will actually visit everything that could be in a UDF body. We should test parameter references in:

  • Subquery
  • The inputs + ON condition of a Join
  • CTE
  • GroupBy

We should also test:

  • Nested routine calls
  • Ordinal parameter references like $1

pkg/sql/importer/import_processor.go line 313 at r1 (raw file):

		// Release leases on any accessed descriptors now that the
		// resolver is configured. We do this so that leases are not
		// held for the entire import process.

Does this actually do anything, since the function resolver hasn't been used yet at this point?

Also a more general question: I know we do this for backfill, so not blocking. But is it correct to release leases while we're using these descriptors for routine evaluation or type casts?

@mw5h mw5h force-pushed the import-udf-computed-columns branch from c79e1a8 to 9b5632d Compare March 12, 2026 18:17
@mw5h mw5h marked this pull request as draft March 12, 2026 18:18
@mw5h mw5h force-pushed the import-udf-computed-columns branch 2 times, most recently from f17c5aa to 0c1beaf Compare March 13, 2026 17:39
Copy link
Copy Markdown
Contributor Author

@mw5h mw5h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mw5h made 4 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and yuzefovich).


-- commits line 22 at r4:
I think DrewK meant that this is not true. UDF-based computed column expressions succeed but are not backfill/import-able.


pkg/sql/importer/read_import_base.go line 90 at r4 (raw file):

	evalCtx := flowCtx.NewEvalCtx()
	evalCtx.Regions = makeImportRegionOperator(spec.DatabasePrimaryRegion)

Delete this blank line.


pkg/sql/importer/read_import_base.go line 92 at r4 (raw file):

	semaCtx := tree.MakeSemaContext(importResolver)
	conv, convErr := makeInputConverter(

I'm not seeing a reason to change the name of the error here.


pkg/sql/importer/read_import_base.go line 129 at r4 (raw file):

	// at the end is one row containing an encoded BulkOpSummary.
	var summary *kvpb.BulkOpSummary
	var err error

We shouldn't capture the error here, just return the error and let it propagate via the group.Wait() call.

@mw5h mw5h force-pushed the import-udf-computed-columns branch from 0c1beaf to dd6ad24 Compare March 13, 2026 18:15
@mw5h mw5h marked this pull request as ready for review March 13, 2026 19:15
@mw5h
Copy link
Copy Markdown
Contributor Author

mw5h commented Mar 13, 2026

This should be RFAL.

Copy link
Copy Markdown
Contributor Author

@mw5h mw5h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mw5h made 6 comments and resolved 2 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and yuzefovich).


-- commits line 51 at r2:

Previously, DrewKimball (Drew Kimball) wrote…

I don't think this is true - on a demo cluster, I was able to create a computed column expression calling a UDF with multiple statements as well as one with a PL/pgSQL routine.

Done.


pkg/sql/catalog/schemaexpr/computed_column.go line 433 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Should we be ignoring OUT parameters here? Is it possible to have a computed expression invoke a routine with OUT / INOUT parameters?

Done.


pkg/sql/catalog/schemaexpr/computed_column.go line 439 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Should we do another walk over bodyExpr after the substitution to check that all parameter references have been replaced and returning an error if not?

Done.


pkg/sql/catalog/schemaexpr/computed_column.go line 440 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I'm not sure this pattern will actually visit everything that could be in a UDF body. We should test parameter references in:

  • Subquery
  • The inputs + ON condition of a Join
  • CTE
  • GroupBy

We should also test:

  • Nested routine calls
  • Ordinal parameter references like $1

Done.


pkg/sql/logictest/testdata/logic_test/udf_unsupported line 9 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Should we introduce some new test cases for the errors with inlining the UDF and then move test cases that now work to another file (since this one is udf_unsupported)?

Done.


pkg/sql/importer/import_processor.go line 313 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Does this actually do anything, since the function resolver hasn't been used yet at this point?

Also a more general question: I know we do this for backfill, so not blocking. But is it correct to release leases while we're using these descriptors for routine evaluation or type casts?

Done.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuzefovich reviewed 9 files and all commit messages, made 3 comments, and resolved 2 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and mw5h).


pkg/sql/catalog/schemaexpr/computed_column.go line 439 at r2 (raw file):

Previously, mw5h (Matt White) wrote…

Done.

I don't see the check that all parameters have been replaced.


pkg/sql/catalog/schemaexpr/computed_column.go line 456 at r6 (raw file):

			}
			if i < len(funcExpr.Exprs) {
				paramMap[string(p.Name)] = funcExpr.Exprs[i]

It's allowed for a UDF to have unnamed parameters (i.e. p.Name == ""). Such unnamed IN parameters cannot be referenced in the body by name (only via $i notation which we currently don't support (#114701)), and there can be any number of unnamed parameters. The current logic probably doesn't handle the case of multiple unnamed IN parameters.


pkg/sql/catalog/schemaexpr/computed_column.go line 459 at r6 (raw file):

			}
		}
		if len(paramMap) != len(funcExpr.Exprs) {

nit: it feels off to have the length check done after the loop given that in the loop we have bounds check on i, meaning that by construction len(paramMap) will never exceed len(funcExpr.Exprs) even if we have too many routine params.

@mw5h mw5h force-pushed the import-udf-computed-columns branch 2 times, most recently from 754e131 to a3822cf Compare March 13, 2026 23:22
Copy link
Copy Markdown
Contributor Author

@mw5h mw5h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mw5h made 3 comments and resolved 1 discussion.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and yuzefovich).


pkg/sql/catalog/schemaexpr/computed_column.go line 439 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I don't see the check that all parameters have been replaced.

Yeah, sorry, in the update I lost what this was referring to.


pkg/sql/catalog/schemaexpr/computed_column.go line 456 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It's allowed for a UDF to have unnamed parameters (i.e. p.Name == ""). Such unnamed IN parameters cannot be referenced in the body by name (only via $i notation which we currently don't support (#114701)), and there can be any number of unnamed parameters. The current logic probably doesn't handle the case of multiple unnamed IN parameters.

Looks like $i notation works fine for SQL UDFs, just not PL/pgSQL.


pkg/sql/catalog/schemaexpr/computed_column.go line 459 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: it feels off to have the length check done after the loop given that in the loop we have bounds check on i, meaning that by construction len(paramMap) will never exceed len(funcExpr.Exprs) even if we have too many routine params.

Agreed.

@mw5h
Copy link
Copy Markdown
Contributor Author

mw5h commented Mar 17, 2026

I believe this is RFAL.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! :lgtm: but might be worth waiting for Drew to take another look as well.

@yuzefovich reviewed 4 files and all commit messages, made 2 comments, and resolved 3 discussions.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on DrewKimball and mw5h).


pkg/sql/importer/read_import_base.go line 92 at r4 (raw file):

Previously, mw5h (Matt White) wrote…

I'm not seeing a reason to change the name of the error here.

nit: maybe revert the change here (i.e. keep everything on a single line).

@mw5h
Copy link
Copy Markdown
Contributor Author

mw5h commented Mar 18, 2026

Nice work! :lgtm: but might be worth waiting for Drew to take another look as well.

Yeah, will do.

IMPORT INTO with computed columns that reference user-defined functions
fails during type-checking because the SemaContext has no
FunctionResolver. The UDF references (stored as OIDs in the computed
column expression) cannot be resolved, producing:

  function 100107 not found

Fix this by setting up a per-instance DistSQLFunctionResolver in
NewDatumRowConverter, following the same pattern already used for
per-instance sequence resolution. Each converter creates its own
bare-bones descs.Collection inside a short-lived txn to resolve UDF
descriptors by OID. This avoids sharing a single descs.Collection
across parallel import workers, which would cause data races.

Since bare-bones collections lack a lease manager, we use
ByIDWithoutLeased via the new NewDistSQLFunctionResolverFromGetter
constructor.

Note: this fix alone is necessary but not sufficient — evaluation of
UDF-based computed column expressions will still fail because
eval.Expr cannot execute functions with SQL bodies (issue cockroachdb#147472).
The next commit addresses that.

Informs: cockroachdb#157195
Informs: cockroachdb#147472

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@mw5h mw5h force-pushed the import-udf-computed-columns branch from a3822cf to 564b2c6 Compare March 18, 2026 15:59
Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few more suggestios for things to test/disallow, but beyond that LGTM.

if fn == nil || fn.Body == "" {
return true, e, nil
}
if fn.Language != tree.RoutineLangSQL {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also check for generator functions if that isn't already done elsewhere.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also wondering if we're handling RECORD-returning routines correctly. We should test, or maybe just disallow for now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — generators inside UDF bodies are now caught via sanitizeExpr (which uses RejectSpecial, including RejectGenerators). Additionally, we now validate at DDL time (CREATE TABLE / ALTER TABLE) by running inlineUDFCalls during expression validation, so all non-inlineable patterns (generators, multi-statement, PL/pgSQL, CTEs, subqueries, nested UDFs) are rejected upfront rather than failing during backfill.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RECORD-returning routines are handled — they return tuple type which never matches a valid column type, so they're rejected with a type mismatch error. Also, RECORD can't be used as a column type directly (cannot use anonymous record type as table column). Added a test for both cases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is RFAL when you get back.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly ping @DrewKimball just in case you haven't seen this. I don't plan on backporting to 26.2, so no huge rush.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, this one slipped by me.

@mw5h mw5h force-pushed the import-udf-computed-columns branch 2 times, most recently from 31fbffc to ed7c182 Compare March 26, 2026 00:20
@mw5h mw5h requested a review from a team as a code owner March 26, 2026 00:20
@mw5h mw5h requested review from msbutler and removed request for a team March 26, 2026 00:20
@mw5h mw5h force-pushed the import-udf-computed-columns branch from ed7c182 to 8611bc0 Compare March 26, 2026 15:06
Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! LGTM

# Strict UDFs (RETURNS NULL ON NULL INPUT) on a nullable column.
# Rows with NULL input should produce NULL output.
statement ok
CREATE FUNCTION cc_udf_strict(x INT) RETURNS INT IMMUTABLE RETURNS NULL ON NULL INPUT LANGUAGE SQL AS $$ SELECT x + 100 $$;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would be nice to use an expression that doesn't already return NULL on NULL input, like CASE WHEN x IS NULL THEN -1 ELSE x + 100 END

Contexts that bypass the optimizer — IMPORT and schema change backfill —
cannot evaluate user-defined functions with SQL bodies. The eval.Expr
path returns an unimplemented error for functions where fn.Body != ""
because it lacks the optimizer and execbuilder infrastructure needed to
produce RoutineExpr plan generators.

Fix this by inlining UDF calls within MakeComputedExprs, the shared
entry point for both IMPORT and backfill computed column evaluation. For
each FuncExpr that references a UDF (fn.Body != ""), inlineUDFCalls:

1. Parses the SQL body and extracts the result expression.
2. Substitutes parameter references (named, ColumnItem, and ordinal
   $1/$2 placeholders) with the actual call-site argument expressions.
3. Wraps the result in a CASE expression for strict (RETURNS NULL ON
   NULL INPUT) functions to preserve null-handling semantics.
4. Type-checks the inlined expression against the function's return
   type.

This provides parity with non-UDF computed column expressions for the
constructs allowed in immutable UDF bodies: arithmetic, string
operations, type casts, CASE/COALESCE, immutable builtin calls, and
array constructors. Immutable UDFs cannot reference relations, so their
bodies are restricted to the same scalar expression language available
in non-UDF computed columns.

Non-inlineable UDF patterns are now rejected at DDL time (CREATE TABLE,
ALTER TABLE ADD COLUMN) rather than failing later during backfill. This
prevents creating tables with computed columns that cannot be
backfilled. The following patterns are rejected:

- Multi-statement SQL UDFs: cannot be inlined to a single expression
- PL/pgSQL UDFs: cannot be inlined (different language)
- UDFs with OUT/INOUT parameters: not supported by the inliner
- Nested UDF calls: the inliner does not recurse into inlined bodies
- UDF bodies containing subqueries, CTEs, or FROM clauses: cannot be
  evaluated outside the optimizer
- UDF bodies containing set-returning functions (generators)

Fixes: cockroachdb#157195
Informs: cockroachdb#147472

Release note (bug fix): Fixed a bug where IMPORT INTO and schema change
backfills would fail when the destination table had computed columns
defined using user-defined functions, reporting "function <oid> not
found". Computed columns referencing immutable single-expression SQL
UDFs now work correctly with IMPORT INTO and ALTER TABLE ADD COLUMN.

Release note (backward-incompatible change): Computed columns using
non-inlineable user-defined functions — including PL/pgSQL UDFs,
multi-statement SQL UDFs, UDFs with OUT/INOUT parameters, nested UDF
calls, and UDFs whose bodies contain subqueries, CTEs, FROM clauses, or
set-returning functions — are now rejected at DDL time (CREATE TABLE,
ALTER TABLE ADD COLUMN). Previously, these were accepted but would fail
during schema change backfill, leaving the table in a state where
schema changes could not be performed.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@mw5h mw5h force-pushed the import-udf-computed-columns branch from 8611bc0 to f6e6b74 Compare April 6, 2026 21:53
@trunk-io trunk-io Bot merged commit 7e48932 into cockroachdb:master Apr 7, 2026
34 of 36 checks passed
@mw5h mw5h deleted the import-udf-computed-columns branch April 9, 2026 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

import: computed columns defined with udf not allowed in the import destination table

4 participants