
Commit 8a26532

[SPARK-57758][SQL] Restore O(1) built-in function resolution in the analyzer
### What changes were proposed in this pull request?

SPARK-54807 added qualified function names and a configurable resolution search path (`spark.sql.functionResolution.sessionOrder`). As a side effect, every `UnresolvedFunction` now makes `FunctionResolution.resolveFunction` / `resolveTableFunction` build an ordered candidate search path, allocate `Seq`s, and iterate candidates (each doing a name-kind parse plus a registry lookup). None of this was memoized, so it was recomputed for every function node, on every analysis pass.

This PR restores the previous fast path for the dominant built-in case, without changing resolution precedence:

1. **Per-analysis-pass memoization.** A lazily-filled memo on `AnalysisContext` stores the computed resolution search-path entries. The path is stable within a single analysis pass (`SET PATH` / `USE` / conf changes happen between passes, and each pass / view / SQL-function body runs under a fresh `AnalysisContext` object), and the memo shares the context's per-pass lifetime: a fresh context (`reset` / `withNewAnalysisContext` / the `copy` or construction for a view or SQL-function body) automatically starts with an empty memo, and the memo is collected with the context. It therefore needs no thread-local, identity key, or weak reference, and is stale-free because the memoized value derives only from the context's immutable fields (`resolutionPathEntries`, `catalogAndNamespace`). `sqlResolutionPathEntriesForAnalysis` (and hence `resolutionCandidates`, the `builtinFastPathSafe` gate, and the `UNRESOLVED_ROUTINE` error path) now read from this memo.
2. **Built-in-only fast-path.** In `resolveFunction` / `resolveTableFunction`, for a single-part, non-internal name, when `system.builtin` is the **first** entry of the effective path, resolve directly against the in-memory built-in registry (`resolveScalarFunctionByIdentifier` / `resolveTableFunctionByIdentifier` with `FunctionRegistry.builtinFunctionIdentifier`) and return on hit. A miss falls through to the unchanged candidate loop.

Correctness: the fast-path fires only when `system.builtin` is the **first** path entry, so no earlier entry can shadow a built-in hit -- neither a `system.session` entry (a temporary/session function, as under mode `first`) nor a catalog/schema placed before `system.builtin` by a custom `SET PATH`. The default `sessionOrder` modes `second` / `last` keep `system.builtin` first (fast-path on); mode `first` puts `system.session` first (fast-path off); only a custom `SET PATH` can place another entry before `system.builtin` (fast-path off). In every case the fast-path matches the slow candidate loop, so resolution precedence is unchanged. The gate predicate is `CatalogManager.isBuiltinFirstOnPath`.

The optional `FORBIDDEN_OPERATION` masking noted in the JIRA is tracked separately in [SPARK-57759](https://issues.apache.org/jira/browse/SPARK-57759) and is intentionally left unchanged here.

### Why are the changes needed?

For built-in-heavy plans the per-function overhead is paid for every function node. Under Spark Connect, which re-analyzes the entire (growing) plan on every `AnalyzePlan` call, the cost scales roughly with plan size x number of analyze calls, producing a multi-fold regression in analysis time versus a pre-SPARK-54807 build. Execution time is unaffected; the regression is isolated to the analysis phase. This restores O(1) built-in resolution for the common case while preserving the qualified-name and configurable-order semantics SPARK-54807 introduced.

### Does this PR introduce _any_ user-facing change?

No. This is a performance fix; resolution results and error behavior are unchanged.

### How was this patch tested?

- New cases in `FunctionQualificationSuite`:
  - `SECTION 17a`: the built-in fast-path returns the built-in under the default order while the temp is reachable via `session.`, and switching `sessionOrder` to `first` in the same session correctly bypasses the fast-path so the temp shadows the built-in (exercises the per-pass recompute and the gate).
  - `SECTION 17b`: built-in and extension table-function fast-path.
  - `SECTION 17c`: a persistent scalar function placed before `system.builtin` via custom `SET PATH` correctly wins over the built-in (fast-path bypassed); built-in wins when `system.builtin` leads.
  - `SECTION 17d`: the same regression guard for the table-function fast-path.
  - `SECTION 17e`: the fast-path raises the built-in's argument error rather than falling through to a same-named session function (the fast-path and the slow candidate loop fail on the same candidate).
  - `SECTION 17f`: the per-pass memo recomputes for a SQL-function body whose pinned path differs from the caller's, so a single statement yields both resolutions (neither context reuses the other's memo).
- New unit test in `CatalogManagerSuite`: `isBuiltinFirstOnPath` over representative path shapes, guarding the gate predicate directly (the fast-path has no behavioral signature).
- Existing suites pass: `FunctionQualificationSuite` + `SetPathSuite` (137) and `LookupFunctionsSuite` (3), which cover the single-pass resolver, dynamic `SET PATH` ordering, and the `COUNT(*)` rewrite gate that depend on resolution order.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

Closes #56869 from MaxGekk/fix-fun-resolution.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
1 parent f520a2f commit 8a26532

5 files changed

Lines changed: 275 additions & 4 deletions


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Lines changed: 30 additions & 0 deletions
@@ -182,6 +182,36 @@ case class AnalysisContext(
 
   def getSinglePassResolverBridgeState: Option[AnalyzerBridgeState] =
     singlePassResolverBridgeState
+
+  /**
+   * Per-pass memo of the SQL resolution search path (SPARK-57758). Function resolution computes
+   * the ordered path once per analysis pass and reuses it for every [[UnresolvedFunction]],
+   * instead of rebuilding and re-iterating it per node -- the cost that, under Spark Connect's
+   * repeated re-analysis of the growing plan, scaled with plan size x analyze calls.
+   *
+   * The memo lives on the context so it shares the context's per-pass lifetime. `SET PATH` /
+   * `USE` / conf changes all produce a fresh context ([[reset]], [[withNewAnalysisContext]], or
+   * the `copy` / construction for a view or SQL-function body), and a body-level field is not
+   * carried over by `copy`, so a new pass automatically starts with an empty memo and the memo is
+   * collected with the context. It is therefore safe without an identity key or weak reference,
+   * but only for values derived from this context's immutable fields (the path derives from
+   * `resolutionPathEntries` / `catalogAndNamespace`); never memoize anything derived from the
+   * mutable fields above (`relationCache`, `referredTempFunctionNames`, ...).
+   *
+   * INVARIANT: keep this a body `var`, never a constructor parameter. `.copy()` (used by
+   * `withAnalysisContext(function)` and `withOuterPlan`) deliberately does not carry a body
+   * field, which is what gives a SQL-function-body / outer-plan context a fresh memo. Promoting
+   * it to a parameter would copy a stale path across that boundary and silently mis-resolve
+   * (SECTION 17f of `FunctionQualificationSuite` is the regression guard).
+   */
+  private var resolutionPathMemo: Seq[Seq[String]] = _
+
+  def memoizedResolutionPath(compute: => Seq[Seq[String]]): Seq[Seq[String]] = {
+    if (resolutionPathMemo == null) {
+      resolutionPathMemo = compute
+    }
+    resolutionPathMemo
+  }
 }
 
 object AnalysisContext {
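
The INVARIANT in the doc comment above rests on a general property of Scala case classes: the generated `copy` re-runs the constructor, so body fields are re-initialized rather than carried over. The following is a minimal standalone sketch of that property, not Spark code. `Ctx`, `memoizedPath`, `effectivePath`, and the `computations` counter are hypothetical stand-ins for `AnalysisContext`, `memoizedResolutionPath`, and the path computation; the appended `spark_catalog.default` entry is an arbitrary placeholder.

```scala
object MemoLifetimeSketch {
  // Hypothetical stand-in for AnalysisContext: one immutable constructor field plus a
  // memo kept as a body field (not a constructor parameter).
  final case class Ctx(pathEntries: Seq[Seq[String]]) {
    private var memo: Seq[Seq[String]] = null

    def memoizedPath(compute: => Seq[Seq[String]]): Seq[Seq[String]] = {
      if (memo == null) memo = compute // the by-name argument runs only on the first call
      memo
    }
  }

  // Counts how often the (pretend-expensive) path computation actually runs.
  var computations: Int = 0

  def effectivePath(ctx: Ctx): Seq[Seq[String]] = ctx.memoizedPath {
    computations += 1
    ctx.pathEntries :+ Seq("spark_catalog", "default") // placeholder computation
  }

  def main(args: Array[String]): Unit = {
    val outer = Ctx(Seq(Seq("system", "builtin")))
    (1 to 1000).foreach(_ => effectivePath(outer)) // e.g. 1000 function nodes in one pass
    println(computations)                          // computed once for the whole pass

    // copy() builds a new instance, so the body field starts out empty again.
    val body = outer.copy(pathEntries = Seq(Seq("spark_catalog", "path_body")))
    println(effectivePath(body).head)
    println(computations)
  }
}
```

If `memo` were promoted to a constructor parameter in this sketch, `copy` would carry the outer context's path into `body`, which is the stale-memo hazard the doc comment warns about.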

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala

Lines changed: 64 additions & 4 deletions
@@ -110,10 +110,37 @@ class FunctionResolution(
    * directly, matching [[RelationResolution.relationResolutionEntries]] so routine order stays
    * aligned with relation order.
    */
-  private[analysis] def sqlResolutionPathEntriesForAnalysis: Seq[Seq[String]] =
-    catalogManager.resolutionPathEntriesForAnalysis(
-      AnalysisContext.get.resolutionPathEntries,
-      AnalysisContext.get.catalogAndNamespace)
+  private[analysis] def sqlResolutionPathEntriesForAnalysis: Seq[Seq[String]] = {
+    // Per-analysis-pass memo (SPARK-57758): computing the path (reading the live [[CatalogManager]]
+    // and several confs, then allocating `Seq`s) used to run once per [[UnresolvedFunction]], and
+    // under Spark Connect once per node on every re-analysis of the growing plan. The path is
+    // stable within a pass (`SET PATH` / `USE` / conf changes happen between passes, each under a
+    // fresh [[AnalysisContext]]), so it is memoized on the current context, which shares the pass's
+    // lifetime. See [[AnalysisContext.memoizedResolutionPath]] for why this needs no identity key.
+    val context = AnalysisContext.get
+    context.memoizedResolutionPath {
+      catalogManager.resolutionPathEntriesForAnalysis(
+        context.resolutionPathEntries, context.catalogAndNamespace)
+    }
+  }
+
+  /**
+   * True when `system.builtin` is the first entry of the effective resolution path. In that case a
+   * single-part name that resolves to a built-in cannot be shadowed by any earlier entry -- neither
+   * a `system.session` entry (a temporary/session function) nor a catalog/schema entry placed
+   * before `system.builtin` by a custom `SET PATH` -- so the built-in fast-path in
+   * [[resolveFunction]] / [[resolveTableFunction]] cannot change resolution precedence. A miss
+   * still falls through to the full candidate loop, so non-built-in names are unaffected.
+   *
+   * The default `spark.sql.functionResolution.sessionOrder` modes `second` and `last` put
+   * `system.builtin` first; only `first` puts `system.session` before it, where the fast-path is
+   * correctly disabled. Only a custom `SET PATH` can place another entry before `system.builtin`.
+   *
+   * Reads the per-pass memoized path ([[sqlResolutionPathEntriesForAnalysis]]), so the check is
+   * O(1) per [[UnresolvedFunction]].
+   */
+  private def builtinFastPathSafe: Boolean =
+    CatalogManager.isBuiltinFirstOnPath(sqlResolutionPathEntriesForAnalysis)
 
   private def resolutionCandidates(nameParts: Seq[String]): Seq[Seq[String]] = {
     if (nameParts.size == 1) {
@@ -134,6 +161,12 @@ class FunctionResolution(
   private def resolveFunctionCandidate(
       nameParts: Seq[String],
       unresolvedFunc: UnresolvedFunction): Option[Expression] = {
+    // NOTE: the `system.builtin.<name>` case here is the same registry lookup the built-in
+    // fast-path in `resolveFunction` performs directly (both go through
+    // `identifierFromSystemNameParts` / `builtinFunctionIdentifier` ->
+    // `resolveScalarFunctionByIdentifier`). The two must stay equivalent; a change to built-in
+    // scalar resolution has to touch both. `resolveTableFunctionCandidate` / `resolveTableFunction`
+    // mirror this for table functions.
     if (isSystemCatalogQualified(nameParts)) {
       v1SessionCatalog.identifierFromSystemNameParts(nameParts).flatMap { ident =>
         val expr = v1SessionCatalog.resolveScalarFunctionByIdentifier(
@@ -189,6 +222,23 @@ class FunctionResolution(
       }
     }
 
+    // Fast-path (SPARK-57758): an unqualified, non-internal name that resolves to a built-in
+    // is by far the common case. When `system.builtin` is the first entry of the effective path,
+    // a built-in hit cannot be shadowed by any earlier entry (a session/temporary function, or a
+    // catalog/schema placed before `system.builtin` via `SET PATH`), so it can be resolved with a
+    // single registry lookup instead of building and iterating the candidate search path. A miss
+    // falls through to the full candidate resolution below. This lookup is equivalent to the
+    // `system.builtin.<name>` branch of `resolveFunctionCandidate`; keep the two in sync.
+    if (unresolvedFunc.nameParts.size == 1 && !unresolvedFunc.isInternal &&
+        builtinFastPathSafe) {
+      val builtin = v1SessionCatalog.resolveScalarFunctionByIdentifier(
+        FunctionRegistry.builtinFunctionIdentifier(unresolvedFunc.nameParts.head),
+        unresolvedFunc.arguments)
+      if (builtin.isDefined) {
+        return validateFunction(builtin.get, unresolvedFunc.arguments.length, unresolvedFunc)
+      }
+    }
+
     val candidates = resolutionCandidates(unresolvedFunc.nameParts)
     for (nameParts <- candidates) {
       resolveFunctionCandidate(nameParts, unresolvedFunc) match {
@@ -263,6 +313,16 @@ class FunctionResolution(
   def resolveTableFunction(
       nameParts: Seq[String],
       arguments: Seq[Expression]): Option[LogicalPlan] = {
+    // Fast-path (SPARK-57758): see `resolveFunction`. Short-circuit a single-part name to a
+    // built-in table function when `system.builtin` is the first entry of the path; a miss
+    // (including a built-in scalar of the same name) falls through to the candidate loop, which
+    // preserves the NOT_A_TABLE_FUNCTION semantics.
+    if (nameParts.size == 1 && builtinFastPathSafe) {
+      val builtin = v1SessionCatalog.resolveTableFunctionByIdentifier(
+        FunctionRegistry.builtinFunctionIdentifier(nameParts.head), arguments)
+      if (builtin.isDefined) return builtin
+    }
+
     val candidates = resolutionCandidates(nameParts)
     for (nameParts <- candidates) {
       resolveTableFunctionCandidate(nameParts, arguments) match {
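
The claim that the fast-path cannot change precedence can be checked on a toy model. The sketch below is a simplified standalone illustration, not the Spark implementation: the `registry` map, the `lookup` helper, the sample `paths`, and the string labels are all hypothetical, and only the gate mirrors the shape of `CatalogManager.isBuiltinFirstOnPath`. It compares a direct built-in lookup guarded by the gate against an ordered walk over the path.

```scala
object FastPathSketch {
  type Path = Seq[Seq[String]]

  val Builtin: Seq[String] = Seq("system", "builtin")
  val Session: Seq[String] = Seq("system", "session")
  val Catalog: Seq[String] = Seq("spark_catalog", "some_schema")

  // Hypothetical registry: namespace -> (function name -> label of the implementation).
  val registry: Map[Seq[String], Map[String, String]] = Map(
    Builtin -> Map("abs" -> "builtin.abs", "range" -> "builtin.range"),
    Session -> Map("abs" -> "session.abs", "my_udf" -> "session.my_udf"),
    Catalog -> Map("abs" -> "catalog.abs"))

  // Namespace names are matched case-insensitively by lower-casing before the map lookup.
  private def lookup(ns: Seq[String], name: String): Option[String] =
    registry.get(ns.map(_.toLowerCase(java.util.Locale.ROOT))).flatMap(_.get(name))

  // Same shape as the gate in the diff: only the head entry of the path matters.
  def isBuiltinFirstOnPath(path: Path): Boolean =
    path.headOption.exists { e =>
      e.length == 2 && e(0).equalsIgnoreCase("system") && e(1).equalsIgnoreCase("builtin")
    }

  // Slow path: try every path entry in order; the first hit wins.
  def resolveSlow(path: Path, name: String): Option[String] =
    path.iterator.map(lookup(_, name)).collectFirst { case Some(hit) => hit }

  // Fast path: one direct built-in lookup when the gate holds; a miss falls through.
  def resolveFast(path: Path, name: String): Option[String] = {
    val direct = if (isBuiltinFirstOnPath(path)) lookup(Builtin, name) else None
    direct.orElse(resolveSlow(path, name))
  }

  val paths: Seq[Path] = Seq(
    Seq(Builtin, Session), Seq(Session, Builtin), Seq(Catalog, Builtin),
    Seq(Builtin, Catalog, Session), Seq(Seq("System", "Builtin"), Session), Seq.empty)

  // True when both strategies agree for every sample path and name.
  def strategiesAgree: Boolean =
    paths.forall { p =>
      Seq("abs", "range", "my_udf", "missing").forall { n =>
        resolveFast(p, n) == resolveSlow(p, n)
      }
    }
}
```

In this model the two strategies agree because, whenever the gate holds, the built-in namespace is also the first entry the slow walk would try. Weakening the gate to "built-in is anywhere on the path" would break the agreement for the catalog-first and session-first sample paths.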

sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala

Lines changed: 9 additions & 0 deletions
@@ -588,6 +588,15 @@ private[sql] object CatalogManager extends Logging {
     parts.head.equalsIgnoreCase(SYSTEM_CATALOG_NAME) &&
     parts(1).equalsIgnoreCase(BUILTIN_NAMESPACE)
 
+  /**
+   * True when `system.builtin` is the first entry of `pathEntries`. This is the path-shape
+   * condition under which a built-in function found by an unqualified single-part name cannot be
+   * shadowed by any earlier path entry -- the precise property the function-resolution built-in
+   * fast-path relies on. Pure predicate over the path shape; callers decide how to use it.
+   */
+  def isBuiltinFirstOnPath(pathEntries: Seq[Seq[String]]): Boolean =
+    pathEntries.headOption.exists(isSystemBuiltinPathEntry)
+
   /**
    * Extract `system.builtin` / `system.session` entries from a resolved PATH, mapped to
    * [[SessionCatalog.SessionFunctionKind]] in path order. Pure data conversion -- callers

sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogManagerSuite.scala

Lines changed: 24 additions & 0 deletions
@@ -188,6 +188,30 @@ class CatalogManagerSuite extends SparkFunSuite with SQLHelper {
     assert(e.getMessage.contains("default.v_broken"))
   }
 
+  test("isBuiltinFirstOnPath: SPARK-57758 gate for the built-in function fast-path") {
+    // The function-resolution built-in fast-path is safe (cannot be shadowed) only when
+    // `system.builtin` is the FIRST path entry. The fast-path has no behavioral signature -- the
+    // slow candidate loop yields identical results -- so this pure-predicate test guards the gate
+    // directly: a regression that silently stopped the fast-path from firing (re-introducing the
+    // SPARK-57758 perf regression) or fired it when a catalog precedes `system.builtin`
+    // (re-introducing the precedence bug) would leave the behavioral SQL tests green.
+    val builtin = Seq("system", "builtin")
+    val session = Seq("system", "session")
+    val catalog = Seq("spark_catalog", "some_schema")
+    // Default `sessionOrder` modes `second` / `last` keep `system.builtin` first -> safe.
+    assert(CatalogManager.isBuiltinFirstOnPath(Seq(builtin, session)))
+    assert(CatalogManager.isBuiltinFirstOnPath(Seq(builtin)))
+    assert(CatalogManager.isBuiltinFirstOnPath(Seq(builtin, catalog, session)))
+    // Case-insensitive match on the well-known entry.
+    assert(CatalogManager.isBuiltinFirstOnPath(Seq(Seq("System", "Builtin"), session)))
+    // `first` mode (session before builtin) and custom `SET PATH` shapes that place any entry
+    // before `system.builtin` -> not safe.
+    assert(!CatalogManager.isBuiltinFirstOnPath(Seq(session, builtin)))
+    assert(!CatalogManager.isBuiltinFirstOnPath(Seq(catalog, builtin)))
+    assert(!CatalogManager.isBuiltinFirstOnPath(Seq(catalog)))
+    assert(!CatalogManager.isBuiltinFirstOnPath(Seq.empty))
+  }
+
   // ---------------------------------------------------------------------------
   // Direct unit tests for [[PathElement.validateNoStaticDuplicates]]. The end-to-end
   // `SetPathSuite` exercises this via SQL, but the duplicate-detection rules

sql/core/src/test/scala/org/apache/spark/sql/FunctionQualificationSuite.scala

Lines changed: 148 additions & 0 deletions
@@ -1316,6 +1316,154 @@ class FunctionQualificationSuite extends SharedSparkSession {
       checkAnswer(sql("SELECT system.builtin.abs(-5)"), Row(5))
     }
   }
+
+  test("SECTION 17a: SPARK-57758 built-in fast-path respects dynamic session order") {
+    withTempView("t") {
+      sql("CREATE TEMPORARY VIEW t AS SELECT 1 AS id")
+      // Register a temp function shadowing builtin `abs` (overrideIfExists allows any order).
+      spark.udf.register("abs", (x: Int) => x + 100)
+      try {
+        // Default order (second): builtin precedes session, so the fast-path returns the builtin
+        // and the temp is reachable only via `session.` qualification.
+        withSQLConf("spark.sql.functionResolution.sessionOrder" -> "second") {
+          checkAnswer(sql("SELECT abs(-5) FROM t"), Row(5))
+          checkAnswer(sql("SELECT session.abs(-5) FROM t"), Row(95))
+        }
+        // first: session precedes builtin, so the fast-path is bypassed and the temp shadows the
+        // builtin for unqualified names. Same session, so this also exercises per-pass recompute.
+        withSQLConf("spark.sql.functionResolution.sessionOrder" -> "first") {
+          checkAnswer(sql("SELECT abs(-5) FROM t"), Row(95))
+          checkAnswer(sql("SELECT builtin.abs(-5) FROM t"), Row(5))
+        }
+      } finally {
+        spark.sessionState.catalog.dropTempFunction("abs", ignoreIfNotExists = true)
+      }
+    }
+  }
+
+  test("SECTION 17b: SPARK-57758 built-in table-function fast-path") {
+    // Smoke test that built-in / extension table functions resolve unqualified. Under the default
+    // path `system.builtin` leads, so the slow loop's first candidate is already the built-in --
+    // this yields the same rows whether or not the fast-path fires, and so does not by itself
+    // distinguish the gate's on/off behavior. That signal is asserted by
+    // CatalogManagerSuite.isBuiltinFirstOnPath and by SECTION 17c/17d/17e.
+    checkAnswer(sql("SELECT * FROM range(3)"), Seq(Row(0), Row(1), Row(2)))
+    // Extension table functions (stored as builtins) also resolve unqualified.
+    checkAnswer(sql("SELECT * FROM test_ext_table_func()"), Seq(Row(0L), Row(1L), Row(2L)))
+  }
+
+  test("SECTION 17c: SPARK-57758 fast-path is bypassed when a catalog precedes system.builtin") {
+    // SPARK-57758 regression guard: the built-in fast-path is safe only when `system.builtin` is
+    // the FIRST path entry. A custom `SET PATH` can place a catalog/schema before `system.builtin`,
+    // and an unqualified name found there must win over the built-in -- the fast-path must not
+    // short-circuit to the built-in. (The default `sessionOrder` modes always keep catalogs after
+    // `system.builtin`, so only a custom SET PATH exercises this.)
+    withSQLConf(SQLConf.PATH_ENABLED.key -> "true") {
+      withDatabase("path_abs_shadow") {
+        sql("CREATE DATABASE path_abs_shadow")
+        // Persistent function shadowing the built-in `abs`.
+        sql("CREATE FUNCTION path_abs_shadow.abs(x INT) RETURNS INT RETURN x * 10")
+        try {
+          // Catalog before system.builtin: the persistent `abs` must win (50), not the
+          // built-in (5). Pre-fix, the fast-path returned the built-in here.
+          sql("SET PATH = spark_catalog.path_abs_shadow, system.builtin")
+          checkAnswer(sql("SELECT abs(5)"), Row(50))
+          // system.builtin first: the fast-path is enabled and resolves the built-in (abs(-5) = 5),
+          // confirming the optimization still applies when builtin leads the path.
+          sql("SET PATH = system.builtin, spark_catalog.path_abs_shadow")
+          checkAnswer(sql("SELECT abs(-5)"), Row(5))
+        } finally {
+          sql("SET PATH = DEFAULT_PATH")
+          sql("DROP FUNCTION IF EXISTS path_abs_shadow.abs")
+        }
+      }
+    }
+  }
+
+  test("SECTION 17d: SPARK-57758 table-function fast-path is bypassed when a catalog precedes " +
+    "system.builtin") {
+    // Table-function counterpart of SECTION 17c: the table-function fast-path shares the same
+    // `builtinFastPathSafe` gate, so a persistent table function in a schema placed before
+    // `system.builtin` via `SET PATH` must win over the built-in TVF of the same name.
+    withSQLConf(SQLConf.PATH_ENABLED.key -> "true") {
+      withDatabase("path_range_shadow") {
+        sql("CREATE DATABASE path_range_shadow")
+        // Persistent table function shadowing the built-in `range` (ignores its argument).
+        sql("CREATE FUNCTION path_range_shadow.range(n INT) RETURNS TABLE(id INT) RETURN SELECT 99")
+        try {
+          // Catalog before system.builtin: the persistent `range` must win (one row [99]), not the
+          // built-in `range(1)` (one row [0]). Pre-fix, the fast-path returned the built-in here.
+          sql("SET PATH = spark_catalog.path_range_shadow, system.builtin")
+          checkAnswer(sql("SELECT * FROM range(1)"), Row(99))
+          // system.builtin first: the fast-path resolves the built-in `range(1)` (one row [0]).
+          sql("SET PATH = system.builtin, spark_catalog.path_range_shadow")
+          checkAnswer(sql("SELECT * FROM range(1)"), Row(0))
+        } finally {
+          sql("SET PATH = DEFAULT_PATH")
+          sql("DROP FUNCTION IF EXISTS path_range_shadow.range")
+        }
+      }
+    }
+  }
+
+  test("SECTION 17e: SPARK-57758 built-in fast-path raises the built-in's argument error rather " +
+    "than falling through to a same-named session function") {
+    // Invariant: the fast-path and the slow candidate loop must fail on the SAME candidate.
+    // `resolveScalarFunctionByIdentifier` returns None only when the built-in is absent; an
+    // existing built-in invoked with bad arity throws. So with `system.builtin` first, an
+    // unqualified `abs(1, 2)` must hit the built-in `abs` (1-arg) and raise its argument error --
+    // it must NOT silently fall through to a compatible 2-arg session `abs`. The slow loop would
+    // also hit `system.builtin.abs` first and throw, so the two paths stay equivalent.
+    withTempView("t") {
+      sql("CREATE TEMPORARY VIEW t AS SELECT 1 AS id")
+      spark.udf.register("abs", (x: Int, y: Int) => x + y)
+      try {
+        withSQLConf("spark.sql.functionResolution.sessionOrder" -> "second") {
+          // The 2-arg session function is reachable via explicit `session.` qualification...
+          checkAnswer(sql("SELECT session.abs(1, 2) FROM t"), Row(3))
+          // ...but the unqualified name resolves to the built-in `abs` first and raises its
+          // argument error instead of resolving the 2-arg session function.
+          intercept[AnalysisException](sql("SELECT abs(1, 2) FROM t").collect())
+        }
+      } finally {
+        spark.sessionState.catalog.dropTempFunction("abs", ignoreIfNotExists = true)
+      }
+    }
+  }
+
+  test("SECTION 17f: SPARK-57758 per-pass memo recomputes for a SQL-function body whose pinned " +
+    "path differs from the caller") {
+    // The subtlest recompute trigger: a SQL-function body runs under a FRESH AnalysisContext
+    // carrying the function's own stored resolution path (not the caller's). The memo lives on the
+    // context, so the body computes its own path within the same analysis pass as the outer query.
+    // Here the body's `abs` is pinned to the built-in (created under a builtin-first path) while
+    // the caller resolves the unqualified `abs` to a shadowing persistent function (catalog-first)
+    // -- so a single statement must yield both resolutions, proving neither context reuses the
+    // other's memo.
+    withSQLConf(SQLConf.PATH_ENABLED.key -> "true") {
+      withDatabase("path_body") {
+        sql("CREATE DATABASE path_body")
+        // Persistent `abs` shadowing the built-in for the caller's catalog-first path.
+        sql("CREATE FUNCTION path_body.abs(x INT) RETURNS INT RETURN x * 10")
+        try {
+          // Create the function body while `system.builtin` leads -> the body's unqualified `abs`
+          // is pinned to the built-in.
+          sql("SET PATH = system.builtin, spark_catalog.path_body")
+          sql("CREATE FUNCTION path_body.use_abs(x INT) RETURNS INT RETURN abs(x)")
+          // Caller path puts the catalog first -> a top-level unqualified `abs` resolves to the
+          // persistent `abs` (x * 10), while the body keeps resolving its `abs` to the built-in.
+          sql("SET PATH = spark_catalog.path_body, system.builtin")
+          checkAnswer(
+            sql("SELECT abs(-5) AS top, path_body.use_abs(-5) AS body"),
+            Row(-50, 5))
+        } finally {
+          sql("SET PATH = DEFAULT_PATH")
+          sql("DROP FUNCTION IF EXISTS path_body.use_abs")
+          sql("DROP FUNCTION IF EXISTS path_body.abs")
+        }
+      }
+    }
+  }
 }
 
 /**
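
The invariant exercised by SECTION 17e (a lookup yields `None` only when the name is absent, while an existing function called with the wrong argument count throws) can be illustrated in isolation. This is a hypothetical standalone model, not Spark code: `Fn`, `ArgumentError`, `lookup`, and the two small registries are invented for the sketch, and the builtin-first order is hard-coded.

```scala
object ArityErrorSketch {
  // Hypothetical function descriptor: expected argument count plus a label for the result.
  final case class Fn(arity: Int, label: String)
  final class ArgumentError(msg: String) extends RuntimeException(msg)

  // Returns None only when the name is absent; a present function with a mismatched
  // argument count throws instead of reporting a miss.
  def lookup(registry: Map[String, Fn], name: String, nArgs: Int): Option[String] =
    registry.get(name).map { fn =>
      if (fn.arity != nArgs) {
        throw new ArgumentError(s"$name expects ${fn.arity} argument(s), got $nArgs")
      }
      fn.label
    }

  val builtin: Map[String, Fn] = Map("abs" -> Fn(1, "builtin.abs"))
  val session: Map[String, Fn] = Map("abs" -> Fn(2, "session.abs"), "my_udf" -> Fn(1, "session.my_udf"))

  // Builtin-first order: the session registry is consulted only on a genuine miss.
  def resolve(name: String, nArgs: Int): Option[String] =
    lookup(builtin, name, nArgs).orElse(lookup(session, name, nArgs))
}
```

Under this model `resolve("abs", 2)` throws even though a compatible two-argument `abs` exists in the session registry, because the exception is raised before `orElse` is reached. A lookup that reported arity mismatches as `None` would instead fall through and silently pick the session function.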
