Spark: enforce Hive layout for LOCATION and disable iceberg table rename (1.11.0)#2
Open
anandnalya wants to merge 11 commits into
Open
Spark: enforce Hive layout for LOCATION and disable iceberg table rename (1.11.0)#2anandnalya wants to merge 11 commits into
anandnalya wants to merge 11 commits into
Conversation
For tables of the form `<db>.<table>`, validate that any user-supplied LOCATION ends with `<db>.db/<table>`. Validation runs on CREATE TABLE, CTAS, RTAS, stage variants, and ALTER TABLE SET LOCATION; null/empty pass through so the catalog default applies. Also disable iceberg table renames in SparkSessionCatalog since renames invalidate the layout invariant. Applied to spark/v3.4, v3.5, v4.0.
Accept '<db>.db/<table>_new' alongside '<db>.db/<table>' to support rebuild/swap workflows where a replacement table is staged under a sibling directory before being promoted. Applied to spark/v3.4, v3.5, v4.0.
…nLayoutValidator Strip a trailing _new suffix from the identifier name to derive a canonical base, then accept locations ending in <db>.db/<base> or <db>.db/<base>_new. This supports swap-via-rename rebuild workflows where the staging table is named <table>_new but already points at the canonical <table> location before being renamed in the catalog. Existing Hive layout and sibling-directory rebuild allowances are unchanged.
Mirror the LOCATION layout validation and disabled-rename behavior from spark/v3.4, v3.5, and v4.0 onto the new spark/v4.1 module introduced in Iceberg 1.11.0. Adds LocationLayoutValidator plus its unit and integration tests, wraps the four named-identifier create/stage withLocation calls and the ALTER SET LOCATION path with the validator, and disables iceberg table rename in SparkSessionCatalog.
Wrap the LocationLayoutValidator.validateAndReturn call sites in SparkCatalog and the assertThatThrownBy lambda in TestSparkSessionCatalogRename to satisfy google-java-format's 100-column limit, so spotlessJavaCheck (run as part of :check in CI) passes across spark/v3.4, v3.5, v4.0, and v4.1.
…ests
Two checkstyle rules flagged the carried files: the JavadocStyle check read the literal <table> placeholder in LocationLayoutValidator's Javadoc as an unclosed HTML element (table is a real HTML tag requiring a close, unlike the <db>/<base> placeholders), and the AssertThatThrownByWithMessageCheck rule required the four reject-path assertThatThrownBy calls in TestLocationLayoutValidator to assert on the exception message. Rename the <table> placeholder to <tbl> and add .hasMessageContaining("must end with") to the four assertions, across spark/v3.4, v3.5, v4.0, and v4.1.
… HDFS With a Hive catalog, DROP TABLE removes the metastore entry without deleting data, and DROP TABLE PURGE removes the metastore entry before deleting files client-side, so a user without HDFS delete permission on the table location ends up with orphaned data that is no longer reachable through any catalog. DropTablePermissionValidator runs before both drop paths (hooked in dropTableWithoutPurging, which purgeTable also routes through) and calls FileSystem.access, a NameNode-side checkAccess RPC evaluated against the caller's UGI, requiring WRITE and EXECUTE on the table directory and its parent. Only hdfs locations are enforced since other filesystems fall back to non-authoritative client-side checks; null, empty, or missing locations pass through so dangling metastore entries can still be dropped. Applied to spark/v3.4, v3.5, v4.0, v4.1.
Mirror the NameNode's FSPermissionChecker sticky-bit rule on delete: when a directory has the sticky bit set, an entry can only be removed by the entry's owner or the directory's owner. The rule is applied to the parent directory (unlinking the table directory) and, when the table directory itself is sticky and not owned by the caller, to each of its immediate children via listStatus. The existence probe switches from exists() to getFileStatus() since the check now needs owner and permission bits anyway. HDFS superusers bypass sticky checks server-side but cannot be detected client-side, so they may be blocked spuriously; this is documented on the validator.
Usernames starting with bot- are pipeline service accounts that manage table lifecycles; skip the HDFS delete-permission validation entirely for them, before any filesystem RPC is issued.
Reuse the sparkSession local in initialize() instead of re-resolving the active session, and let DROP TABLE proceed when loadTable fails with RuntimeIOException so that tables with unreadable metadata (wrapped IO failures during refresh) can still be cleaned up, matching the documented fail-open behavior for NotFoundException.
…k-1.11.0 Spark: block DROP TABLE when the user cannot delete the table data on HDFS (1.11.0)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Location layout enforcement and rename guard
LOCATIONonCREATE TABLE, CTAS, RTAS, stage variants, andALTER TABLE SET LOCATION: for table<db>.<table>the path must end with<db>.db/<base>or<db>.db/<base>_new, where<base>is the identifier name with any trailing_newsuffix stripped. Null/empty pass through so the catalog default applies._newallowance supports swap-via-rename rebuild workflows in both directions: a sibling<table>_newstaging directory next to<table>, or an Iceberg table named<table>_newthat already points at the canonical<table>location and is later renamed in the catalog (the pattern used by Proteus'sIcebergFullReplacePartitioned.handleSchemaEvolution).SparkSessionCatalogsince renames would invalidate the layout invariant.DROP TABLE HDFS permission pre-check (merged via #3)
DROP TABLEandDROP TABLE PURGEfail up front with aValidationExceptionwhen the current user lacks the HDFS permissions needed to delete the table's files, instead of removing the metastore entry and leaving the data behind as unreachable orphans. Hooked intoSparkCatalog#dropTableWithoutPurging, which bothdropTableandpurgeTableroute through, so plain drops, purges, andSparkSessionCatalogdelegation are all covered.checkAccessRPC (group and ACL aware) requiring WRITE+EXECUTE on the table directory and its parent, and mirrors the NameNode's sticky-bit owner rules: the parent-directory rule for unlinking the table directory, and the table directory's immediate children when the table dir itself is sticky and not owned by the caller.hdfs://locations are enforced (other filesystems fall back to non-authoritative client-side checks); null/empty/missing locations and tables with unreadable metadata (NoSuchTableException/NotFoundException/RuntimeIOException) pass through so dangling metastore entries can still be cleaned up; service accounts with usernames starting withbot-are exempt entirely.purgeTablestill drops the HMS entry before deleting files, so the pre-check shrinks but does not eliminate the orphaning window; HDFS superusers bypass sticky checks server-side but cannot be detected client-side, so a non-bot-superuser could be spuriously blocked.Applied to
spark/v3.4,v3.5,v4.0, andv4.1. Based on apache-iceberg-1.11.0. This is the 1.11.0 port of #1 (which targeted 1.10.1 and covered v3.4/v3.5/v4.0); the newspark/v4.1module introduced in 1.11.0 is now covered as well.Test plan
TestLocationLayoutValidator(17 unit tests) passes on v3.4/v3.5/v4.0/v4.1TestSparkCatalogLocationLayout(6 integration tests covering CREATE/CTAS/RTAS/ALTER reject paths, the_new-identifier reject message, and null pass-through) passes on v3.4/v3.5/v4.0/v4.1TestSparkSessionCatalogRename(1 integration test for rename rejection) passes on v3.4/v3.5/v4.0/v4.1TestDropTablePermissionValidator(15 unit tests: allow, deny on table dir, deny on parent, non-HDFS skip, missing location, concurrent-delete race, RPC failure, null/empty location, six sticky-bit scenarios,bot-exemption viaUserGroupInformation.doAs) passes on v3.4/v3.5/v4.0/v4.1DirectByteBuffer.<init>(long, int)incompatibility affects everyTestBase-derived test in v3.4 regardless of these changes; v3.5/v4.0/v4.1 run on JDK 21).