[SPARK-56984][DOCS] Document the SQL PATH feature#56040
Conversation
### What changes were proposed in this pull request?
This change adds user-facing documentation for the SQL Standard PATH feature
introduced in Spark 4.2 (SPARK-56939 and related): the `SET PATH` statement,
the `current_path()` function, path-based resolution of unqualified routines,
tables, views, and session variables, and the supporting infrastructure
(`system.builtin` / `system.session` namespaces with `builtin.` / `session.`
shortcuts, `spark.sql.path.enabled`, `spark.sql.defaultPath`).
New pages:
- `docs/sql-ref-syntax-aux-conf-mgmt-set-path.md` - reference for the
`SET PATH` statement, including a dedicated subsection on how
`DEFAULT_PATH` is derived and how to change it.
- `docs/sql-ref-function-current-path.md` - reference for the
`current_path()` builtin.
Modified pages:
- `docs/sql-ref-name-resolution.md` - new "SQL Path" section that introduces
the concept, the `system.builtin` / `system.session` namespaces and their
2-part shortcuts, the path-walk for unqualified DML / queries vs the
current-schema rule for DDL, the frozen-path behavior for persistent views
and SQL UDFs, and the "Reserved names and collisions" subsection. Table /
view and function resolution sections rewritten accordingly.
- `docs/sql-ref-identifier.md` - new "Reserved system names" table linking
back to "Reserved names and collisions".
- `docs/sql-ref-syntax-aux-describe-function.md` - examples for SQL UDF
`Function / Type / Input / Returns` output, qualified builtin lookup
(`system.builtin.abs`), and the `SQL Path:` row in
`DESCRIBE FUNCTION EXTENDED`.
- `docs/sql-ref-syntax-aux-describe-table.md` - example for the `SQL Path`
row in `DESCRIBE EXTENDED` on a view.
- `docs/sql-ref-syntax-ddl-create-view.md`, `create-sql-function.md`,
`create-function.md` - allow `session` / `system.session` qualifier on
temporary objects (`INVALID_TEMP_OBJ_QUALIFIER` otherwise), and add
frozen-path examples.
- `docs/sql-ref-syntax-ddl-drop-view.md`, `drop-function.md` - clarify
`DROP TEMPORARY FUNCTION` vs `DROP VIEW` semantics and the qualifiers
accepted in each.
- `docs/sql-ref-syntax-ddl-create-database.md` - note discouraging the
schema names `session` and `builtin`, with a link to the canonical
description.
- `docs/sql-ref-syntax-aux-conf-mgmt-set.md`,
`docs/sql-ref-syntax-aux-conf-mgmt.md`, `docs/sql-ref-syntax.md` -
cross-link `SET PATH`.
- `docs/sql-migration-guide.md` - 4.1 -> 4.2 entries for the
`builtin.x` / `session.x` resolution change, the new temp-object
qualifiers, and the opt-in PATH feature.
### Why are the changes needed?
The PATH feature (SPARK-56939 and friends) shipped without external
documentation. Users have no published place to learn about `SET PATH`,
`current_path()`, the `system.builtin` / `system.session` namespaces, or the
path-walking resolution rules; the behavior change for partially qualified
`builtin.x` / `session.x` references is also a 4.1 -> 4.2 migration concern
that needs to be called out.
### Does this PR introduce _any_ user-facing change?
No. This change is documentation-only.
### How was this patch tested?
- Markdown lint clean on every touched file.
- Spot-checked for non-ASCII typographic characters; none introduced.
- Cross-checked every behavioral claim against the relevant test suites:
`sql-tests/inputs/sql-path.sql`, `SetPathSuite`, `FunctionQualificationSuite`,
`RelationQualificationSuite`, `SQLFunctionSuite`, and `DescribeTableSuite`
("DESCRIBE EXTENDED AS JSON for view shows SQL Path when PATH is enabled").
Local Jekyll build was attempted but blocked on Ruby 3 / Bundler 2.4.22
which were not installed in the local environment; relying on the GitHub
docs CI to validate the HTML build.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.7)
Consolidate the per-object resolution sections in `sql-ref-name-resolution.md` into a single `Object name resolution` section with the three subsections `Fully qualified`, `Partially qualified`, and `Unqualified`. `Table and view resolution` and `Function resolution` are now thin sections that list what each kind of object can resolve to, carry their kind-specific notes (common table expressions for relations) and errors (`TABLE_OR_VIEW_NOT_FOUND` / `UNRESOLVED_ROUTINE`), and keep their existing examples. Move the conceptual material on the SQL Path (what it is, the `system.builtin` / `system.session` namespaces, DML-vs-DDL, `spark.sql.path.enabled` gating, the initial-value-of-PATH rule, and the frozen-path semantics for persistent views and SQL UDFs) into the Description section of `SET PATH`. Make the `Reserved system names` section on the Identifiers page the canonical reference for `system` / `session` / `builtin`, with the mini-path table and the 3-part-bypass rule. Update cross-page links to point at these new homes. Tighten the prose pass-wide on the rewritten sections: drop "worked examples", "in particular", "as special cases", "small two-step", "straightforward", and similar filler; lead the 2-part case with the common rule (`current_catalog`-prepend) and treat the `session` / `builtin` mini-path as the exception; remove the bogus `current_catalog.builtin.x` "special case" bullet from the 3-part case; make `Frozen SQL Path` an inline note rather than a heading. No behavior changes; documentation only.
Iterative copy-edit pass on the documentation introduced for the SQL
Path feature, plus a handful of small accuracy fixes uncovered during
review:
- `sql-ref-name-resolution.md`: the page intro and the
per-object-kind sections are slimmed down. The DML/queries vs DDL
rule is folded into the `Unqualified (1 part)` paragraph.
Per-object error references (`TABLE_OR_VIEW_NOT_FOUND` /
`UNRESOLVED_ROUTINE`) live with the corresponding sections.
- `sql-ref-syntax-aux-conf-mgmt-set-path.md`: the Description is
reworked into prose ("the initial value of PATH is DEFAULT_PATH ...");
the Syntax block now follows the rest of the Spark SQL reference
(square brackets, alternation with braces, ellipsis for repetition,
production name on its own line); the grammar commits only to
two-level `catalog.schema` references; the
`spark.sql.functionResolution.sessionOrder` configuration is
de-emphasized; `RESET PATH` wording is dropped in favor of describing
the actual revert mechanism (`SET PATH = DEFAULT_PATH`); two applied
examples are added (appending a shared-UDF schema; dropping
`system.session` to force explicit qualification).
- `sql-ref-function-current-path.md`: the Syntax block matches the
auto-generated style (`current_path()`), with the no-parens form
noted briefly under Arguments. The "still works when disabled"
disclaimer and per-page `spark.sql.path.enabled` toggle are removed.
- `sql-ref-identifier.md`: the *Reserved system names* section is
rewritten to describe the actual tie-breaker behavior (per-object
name collisions, not whole-schema hiding) and to use real catalog
names in examples instead of the meta-syntactic `current_catalog.x`.
- `sql-ref-functions-builtin.md`: a one-paragraph intro explains that
built-in functions live in `system.builtin` and can be referenced
unambiguously via the fully qualified name.
- `sql-ref-syntax-aux-describe-table.md`: the JSON schema gains a
`sql_path` entry; the worked view example is generic (the SQL Path
row appears in the output but is no longer the headline); the
`DESC FORMATTED ... AS JSON` outputs are pretty-printed.
- `sql-ref-syntax-ddl-create-view.md` /
`sql-ref-syntax-ddl-create-sql-function.md`: the frozen-path note
moves from the object-name parameter (where it didn't belong) to the
query/expression parameter. Both pages now state that a persistent
view / SQL UDF cannot reference temporary views, temporary functions,
or session variables.
- `sql-ref-syntax-ddl-drop-function.md` /
`sql-ref-syntax-ddl-drop-view.md`: the parameter prose is shortened;
DROP VIEW gains a worked example of using `session.v` to drop a
temporary view that shadows a persistent one. The stale
AnalysisException output is replaced with `Error: TABLE_OR_VIEW_NOT_FOUND`.
- `sql-ref-syntax-ddl-create-database.md`: the discouraged-name note
shrinks to a one-line pointer to *Reserved system names*.
- `sql-ref-syntax.md`: TOC links `CREATE FUNCTION (SQL)` explicitly.
- `sql-migration-guide.md`: the 4.1 -> 4.2 entry uses
`spark_catalog.session.x` (a real catalog name) instead of the
meta-syntactic `current_catalog.session.x`.
No new behavior; documentation copy and accuracy only.
Four small accuracy nits caught in a self-review pass: 1. SET PATH grammar: removed the self-referential `schema_name` production by inlining `catalog_name . schema_name` directly into `path_element`. The previous form defined `schema_name` recursively with itself, which is hard to read literally. 2. `SYSTEM_PATH` parameter: dropped the explicit ordering "expands to `system.builtin, system.session`". The actual order depends on `spark.sql.functionResolution.sessionOrder`, which the rest of the page de-emphasizes. Now reads "Expands to the two system namespaces, `system.builtin` and `system.session`." 3. SET PATH Description: "To revert mid-session, run `SET PATH = DEFAULT_PATH`" overstated the operation. The statement stores a snapshot of `DEFAULT_PATH` into `_sessionPath` rather than restoring the "never-set" state, so a later change to `spark.sql.defaultPath` is not picked up. Reworded to "re-apply the current default path mid-session" with a brief parenthetical that names the snapshot behavior. 4. Reserved system names: "spark.sql.catalog.system = ... is unsupported" was correct but suggested a clean rejection. In fact the v2 catalog loader does not special-case `system` and registering it gives undefined results, per the CatalogManager comment. Now reads "is unsupported and may yield undefined results". Documentation only; no behavior changes.
Plain-English copy edit on the SQL Path documentation. No content changes; word choices favor common terms over technical jargon and remove minor stylistic inconsistencies. - `synthetic` -> `virtual` for the `system` catalog and namespaces (matches usage elsewhere in the doc set). - `is gated by` -> `is controlled by` for `spark.sql.path.enabled`. - `how it is gated` -> `how to enable it`. - `A SET PATH is scoped` -> `The effect of SET PATH is scoped` (avoids the awkward indefinite-article noun). - `(cycle break) rather than recursing` -> `to avoid a cycle, rather than recursing`. - `live marker` -> `re-evaluated each time` in a code comment. - `spelled out` -> `qualified explicitly` in a code comment. - `flip the preference` -> `reverse the preference`. - `may yield undefined results` -> `produces undefined results`. - `literally named X` -> `named X` (drop the redundant adverb). - `extension-injected functions` -> `functions injected through SparkSessionExtensions` in the migration guide. Documentation only; no behavior changes.
cloud-fan
left a comment
There was a problem hiding this comment.
Summary
Thanks for the comprehensive SQL Path docs — the new SET PATH and current_path() pages plus the restructured name-resolution page give the feature a coherent reference surface. A few accuracy and consistency items below; the one I'd most like to see addressed is the broken bullet list in create-sql-function.md (item 1) — the new paragraphs land in the middle of the disallowed-expressions list and orphan the Row producing functions bullet from its siblings, which will render as two unrelated lists.
Other items are accuracy gaps (the SET PATH schema_name parameter under-documents 3+ part namespaces that the grammar actually accepts; migration guide omits session variables from the PATH feature description; the abs shadowing example doesn't quite demonstrate the rule it captions; same cust_id column shows two different type spellings in describe-table) plus small consistency fixes.
Seven items from the PR review:
1. `sql-ref-syntax-ddl-create-sql-function.md`: the new frozen-path
paragraphs were inserted inside the bulleted list of disallowed
expression types, orphaning the `Row producing functions such as
explode` bullet (Kramdown rendered it as two separate lists with
body paragraphs in between). Moved the paragraphs after the list.
2. `sql-ref-syntax-aux-conf-mgmt-set-path.md`: the `schema_name`
parameter previously said "Both parts are required" (i.e. exactly
two), but the implementation accepts multi-level namespaces
(`SetPathSuite` test "multi-level namespace (3+ parts) is
accepted", and the `INVALID_SQL_PATH_SCHEMA_REFERENCE` error
message itself documents the allowance). Updated to "`catalog.schema`
or, for catalogs with multi-level namespaces, `catalog.ns1.ns2...`.
At least two parts are required." The grammar block now reads
`catalog_name . namespace [ . namespace ... ]`.
3. `sql-migration-guide.md`: the PATH-feature bullet omitted session
variables (a documented PATH consumer with a dedicated test) and
opened with "Spark 4.2 introduces..." while every other bullet in
the section opens with "Since Spark 4.2,". Both fixed.
4. `sql-ref-function-current-path.md`: a stray "persisted view" in a
code comment; the rest of the PR uses "persistent view". Fixed.
5. `sql-ref-identifier.md`: the canonical Reserved system names
section now introduces the term "mini-path" in prose so that
cross-page link text from `name-resolution.md` lands somewhere
that defines it.
6. `sql-ref-syntax-aux-describe-table.md`: the same `cust_id` column
appeared as `{"name": "int"}` in the view example and
`{"name": "integer"}` in the legacy `customer` example. The
doc's own JSON schema block specifies `int` for `IntegerType`,
so the legacy example was wrong; aligned both to `int`.
7. `sql-ref-name-resolution.md`: the `abs` shadowing example created
a 0-arg temp `abs()` and then called `abs(-5)` (one arg), which
was a signature mismatch rather than a shadow. Rewrote with a
matching `abs(x INT)` temp and an explicit `SET PATH =
system.session, system.builtin, spark_catalog.default` so the
unqualified `abs(-5)` actually resolves to the temp; the example
then demonstrates `system.builtin.abs(-5)` reaching around the
shadow.
Documentation only; no behavior changes.
cloud-fan
left a comment
There was a problem hiding this comment.
Thanks for the prompt turnaround — all 7 prior findings addressed cleanly, no remaining issues.
Two new nits below, both terminology/labeling: (1) the grammar update in set-path.md left the schema_name parameter heading mismatched with the new namespace grammar token (newly introduced by d04ce7a), and (2) identifier.md's "unqualified 2-part reference" wording conflicts with the "Partially qualified (2 parts)" taxonomy this PR sets up in name-resolution.md (late catch — I should have flagged this last round).
| `USE SCHEMA` statements are picked up without re-issuing `SET PATH`. | ||
| `CURRENT_DATABASE` is a synonym for `CURRENT_SCHEMA`. | ||
|
|
||
| * **`schema_name`** |
There was a problem hiding this comment.
The grammar update in this commit (catalog_name . namespace [ . namespace ... ] on line 74) left this parameter heading orphaned: every other heading in this block — DEFAULT_PATH, SYSTEM_PATH, PATH, CURRENT_SCHEMA, CURRENT_DATABASE — appears verbatim as a token in the grammar above, but schema_name no longer does. The pre-follow-up grammar said catalog_name . schema_name, so the alignment used to hold.
Suggest renaming this heading (and its label) to match the new grammar, e.g.:
* **`catalog_name . namespace [ . namespace ... ]`**
An explicit catalog-qualified namespace reference. At least two parts are required.
The catalog and namespace do not need to exist at the time of `SET PATH`; non-existent
entries are silently skipped during name resolution. ...
Alternatively, revert the grammar to catalog_name . schema_name and convey "multi-level namespaces allowed" only in the body — but the current grammar form is more accurate, so realigning the heading seems better.
| | `builtin` | schema | A persistent schema named `builtin` is allowed but discouraged because it collides with `system.builtin`. | | ||
| | `session` | schema | A persistent schema named `session` is allowed but discouraged because it collides with `system.session`. | | ||
|
|
||
| An unqualified 2-part reference like `builtin.x` or `session.x` walks a small **mini-path** to |
There was a problem hiding this comment.
"An unqualified 2-part reference" is in tension with the taxonomy this PR establishes in sql-ref-name-resolution.md:273, which heads exactly this case as ### Partially qualified (2 parts) — schema.object. describe-function.md:47 (also new in this PR) just says "2-part names". A 2-part reference like builtin.x is partially qualified — it carries one level of qualifier (the schema), so calling it "unqualified" reads as self-contradictory.
(Late catch — this wording was already in the prior review's snapshot and I should have flagged it then. Apologies for the second pass.)
| An unqualified 2-part reference like `builtin.x` or `session.x` walks a small **mini-path** to | |
| A partially qualified 2-part reference like `builtin.x` or `session.x` walks a small **mini-path** to |
What changes were proposed in this pull request?
Adds user-facing documentation for the SQL Standard PATH feature
introduced in Spark 4.2 (SPARK-56939 and related): the
SET PATHstatement, the
current_path()function, path-based resolution ofunqualified routines, tables, views, and session variables, and the
supporting infrastructure (
system.builtin/system.sessionnamespaceswith
builtin./session.shortcuts,spark.sql.path.enabled,spark.sql.defaultPath).New pages:
docs/sql-ref-syntax-aux-conf-mgmt-set-path.md— reference for theSET PATHstatement, including a dedicated subsection on howDEFAULT_PATHis derived and how to change it.docs/sql-ref-function-current-path.md— reference for thecurrent_path()builtin.Modified pages:
docs/sql-ref-name-resolution.md— new SQL Path section thatintroduces the concept, the
system.builtin/system.sessionnamespaces and their 2-part shortcuts, the path-walk for unqualified
DML / queries vs the current-schema rule for DDL, the frozen-path
behavior for persistent views and SQL UDFs, and the Reserved names and
collisions subsection. Table / view and function resolution sections
rewritten accordingly.
docs/sql-ref-identifier.md— Reserved system names tablelinking back to the canonical description.
docs/sql-ref-syntax-aux-describe-function.md— examples for SQLUDF
Function / Type / Input / Returnsoutput, qualified builtinlookup (
system.builtin.abs), and theSQL Path:row inDESCRIBE FUNCTION EXTENDED.docs/sql-ref-syntax-aux-describe-table.md— example for theSQL Pathrow inDESCRIBE EXTENDEDon a view.docs/sql-ref-syntax-ddl-create-view.md,create-sql-function.md,create-function.md— document thesession/system.sessionqualifier on temporary objects(
INVALID_TEMP_OBJ_QUALIFIERotherwise), and add frozen-path examples.docs/sql-ref-syntax-ddl-drop-view.md,drop-function.md—clarify
DROP TEMPORARY FUNCTIONvsDROP VIEWsemantics and thequalifiers accepted in each.
docs/sql-ref-syntax-ddl-create-database.md— note discouragingthe schema names
sessionandbuiltin.docs/sql-ref-syntax-aux-conf-mgmt-set.md,docs/sql-ref-syntax-aux-conf-mgmt.md,docs/sql-ref-syntax.md— cross-link
SET PATH.docs/sql-migration-guide.md— 4.1 → 4.2 entries for thebuiltin.x/session.xresolution change, the new temp-objectqualifiers, and the opt-in PATH feature.
Why are the changes needed?
The PATH feature (SPARK-56939 and friends) shipped without external
documentation. Users have no published place to learn about
SET PATH,current_path(), thesystem.builtin/system.sessionnamespaces, orthe path-walking resolution rules; the behavior change for partially
qualified
builtin.x/session.xreferences is also a 4.1 → 4.2migration concern that needs to be called out.
Does this PR introduce any user-facing change?
No. This change is documentation-only.
How was this patch tested?
sql-tests/inputs/sql-path.sql,SetPathSuite,FunctionQualificationSuite,RelationQualificationSuite,SQLFunctionSuite, andDescribeTableSuite(DESCRIBE EXTENDED AS JSON for view shows SQL Path when PATH is
enabled).
Local Jekyll build was attempted but blocked on Ruby 3 / Bundler 2.4.22
which were not installed in the local environment; relying on the GitHub
docs CI to validate the HTML build.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.7)