Skip to content

Spark: enforce Hive layout for LOCATION and disable iceberg table rename (1.11.0)#2

Open
anandnalya wants to merge 11 commits into
1.11.0from
feat/spark-location-allowlist-1.11.0
Open

Spark: enforce Hive layout for LOCATION and disable iceberg table rename (1.11.0)#2
anandnalya wants to merge 11 commits into
1.11.0from
feat/spark-location-allowlist-1.11.0

Conversation

@anandnalya

@anandnalya anandnalya commented Jun 9, 2026

Copy link
Copy Markdown

Summary

Location layout enforcement and rename guard

  • Validate user-supplied LOCATION on CREATE TABLE, CTAS, RTAS, stage variants, and ALTER TABLE SET LOCATION: for table <db>.<table> the path must end with <db>.db/<base> or <db>.db/<base>_new, where <base> is the identifier name with any trailing _new suffix stripped. Null/empty pass through so the catalog default applies.
  • The _new allowance supports swap-via-rename rebuild workflows in both directions: a sibling <table>_new staging directory next to <table>, or an Iceberg table named <table>_new that already points at the canonical <table> location and is later renamed in the catalog (the pattern used by Proteus's IcebergFullReplacePartitioned.handleSchemaEvolution).
  • Disable iceberg table renames in SparkSessionCatalog since renames would invalidate the layout invariant.

DROP TABLE HDFS permission pre-check (merged via #3)

  • DROP TABLE and DROP TABLE PURGE fail up front with a ValidationException when the current user lacks the HDFS permissions needed to delete the table's files, instead of removing the metastore entry and leaving the data behind as unreachable orphans. Hooked into SparkCatalog#dropTableWithoutPurging, which both dropTable and purgeTable route through, so plain drops, purges, and SparkSessionCatalog delegation are all covered.
  • The check runs a NameNode-side checkAccess RPC (group and ACL aware) requiring WRITE+EXECUTE on the table directory and its parent, and mirrors the NameNode's sticky-bit owner rules: the parent-directory rule for unlinking the table directory, and the table directory's immediate children when the table dir itself is sticky and not owned by the caller.
  • Only hdfs:// locations are enforced (other filesystems fall back to non-authoritative client-side checks); null/empty/missing locations and tables with unreadable metadata (NoSuchTableException/NotFoundException/RuntimeIOException) pass through so dangling metastore entries can still be cleaned up; service accounts with usernames starting with bot- are exempt entirely.
  • Known limitations: only the top level is inspected (not the full tree); purgeTable still drops the HMS entry before deleting files, so the pre-check shrinks but does not eliminate the orphaning window; HDFS superusers bypass sticky checks server-side but cannot be detected client-side, so a non-bot- superuser could be spuriously blocked.

Applied to spark/v3.4, v3.5, v4.0, and v4.1. Based on apache-iceberg-1.11.0. This is the 1.11.0 port of #1 (which targeted 1.10.1 and covered v3.4/v3.5/v4.0); the new spark/v4.1 module introduced in 1.11.0 is now covered as well.

Test plan

  • TestLocationLayoutValidator (17 unit tests) passes on v3.4/v3.5/v4.0/v4.1
  • TestSparkCatalogLocationLayout (6 integration tests covering CREATE/CTAS/RTAS/ALTER reject paths, the _new-identifier reject message, and null pass-through) passes on v3.4/v3.5/v4.0/v4.1
  • TestSparkSessionCatalogRename (1 integration test for rename rejection) passes on v3.4/v3.5/v4.0/v4.1
  • TestDropTablePermissionValidator (15 unit tests: allow, deny on table dir, deny on parent, non-HDFS skip, missing location, concurrent-delete race, RPC failure, null/empty location, six sticky-bit scenarios, bot- exemption via UserGroupInformation.doAs) passes on v3.4/v3.5/v4.0/v4.1
  • v3.4 integration tests run on JDK 17 (the Spark 3.4 + JDK 21 DirectByteBuffer.<init>(long, int) incompatibility affects every TestBase-derived test in v3.4 regardless of these changes; v3.5/v4.0/v4.1 run on JDK 21).
  • Drop permission pre-check manually validated on hiyu (2026-07-03) through Kyuubi: deny/purge-deny/ACL-allow/sticky-deny/dangling-allow/bot-exemption all behaved as designed — full procedure and results in hadoop-cluster#2699.
  • Smoke test the location/rename enforcement in a real Spark + HMS deployment before promoting.

For tables of the form `<db>.<table>`, validate that any user-supplied LOCATION ends with `<db>.db/<table>`. Validation runs on CREATE TABLE, CTAS, RTAS, stage variants, and ALTER TABLE SET LOCATION; null/empty pass through so the catalog default applies. Also disable iceberg table renames in SparkSessionCatalog since renames invalidate the layout invariant. Applied to spark/v3.4, v3.5, v4.0.
Accept '<db>.db/<table>_new' alongside '<db>.db/<table>' to support rebuild/swap workflows where a replacement table is staged under a sibling directory before being promoted. Applied to spark/v3.4, v3.5, v4.0.
…nLayoutValidator

Strip a trailing _new suffix from the identifier name to derive a canonical base, then accept locations ending in <db>.db/<base> or <db>.db/<base>_new. This supports swap-via-rename rebuild workflows where the staging table is named <table>_new but already points at the canonical <table> location before being renamed in the catalog. Existing Hive layout and sibling-directory rebuild allowances are unchanged.
Mirror the LOCATION layout validation and disabled-rename behavior from spark/v3.4, v3.5, and v4.0 onto the new spark/v4.1 module introduced in Iceberg 1.11.0. Adds LocationLayoutValidator plus its unit and integration tests, wraps the four named-identifier create/stage withLocation calls and the ALTER SET LOCATION path with the validator, and disables iceberg table rename in SparkSessionCatalog.
@github-actions github-actions Bot added the SPARK label Jun 9, 2026
Wrap the LocationLayoutValidator.validateAndReturn call sites in SparkCatalog and the assertThatThrownBy lambda in TestSparkSessionCatalogRename to satisfy google-java-format's 100-column limit, so spotlessJavaCheck (run as part of :check in CI) passes across spark/v3.4, v3.5, v4.0, and v4.1.
…ests

Two checkstyle rules flagged the carried files: the JavadocStyle check read the literal <table> placeholder in LocationLayoutValidator's Javadoc as an unclosed HTML element (table is a real HTML tag requiring a close, unlike the <db>/<base> placeholders), and the AssertThatThrownByWithMessageCheck rule required the four reject-path assertThatThrownBy calls in TestLocationLayoutValidator to assert on the exception message. Rename the <table> placeholder to <tbl> and add .hasMessageContaining("must end with") to the four assertions, across spark/v3.4, v3.5, v4.0, and v4.1.
… HDFS

With a Hive catalog, DROP TABLE removes the metastore entry without deleting data, and DROP TABLE PURGE removes the metastore entry before deleting files client-side, so a user without HDFS delete permission on the table location ends up with orphaned data that is no longer reachable through any catalog. DropTablePermissionValidator runs before both drop paths (hooked in dropTableWithoutPurging, which purgeTable also routes through) and calls FileSystem.access, a NameNode-side checkAccess RPC evaluated against the caller's UGI, requiring WRITE and EXECUTE on the table directory and its parent. Only hdfs locations are enforced since other filesystems fall back to non-authoritative client-side checks; null, empty, or missing locations pass through so dangling metastore entries can still be dropped. Applied to spark/v3.4, v3.5, v4.0, v4.1.
anandnalya and others added 4 commits July 3, 2026 13:37
Mirror the NameNode's FSPermissionChecker sticky-bit rule on delete: when a directory has the sticky bit set, an entry can only be removed by the entry's owner or the directory's owner. The rule is applied to the parent directory (unlinking the table directory) and, when the table directory itself is sticky and not owned by the caller, to each of its immediate children via listStatus. The existence probe switches from exists() to getFileStatus() since the check now needs owner and permission bits anyway. HDFS superusers bypass sticky checks server-side but cannot be detected client-side, so they may be blocked spuriously; this is documented on the validator.
Usernames starting with bot- are pipeline service accounts that manage table lifecycles; skip the HDFS delete-permission validation entirely for them, before any filesystem RPC is issued.
Reuse the sparkSession local in initialize() instead of re-resolving the active session, and let DROP TABLE proceed when loadTable fails with RuntimeIOException so that tables with unreadable metadata (wrapped IO failures during refresh) can still be cleaned up, matching the documented fail-open behavior for NotFoundException.
…k-1.11.0

Spark: block DROP TABLE when the user cannot delete the table data on HDFS (1.11.0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant