Skip to content

Commit a08cb4e

Browse files
authored
feat: vendor-pluggable S3 credentials for native scans (#4309)
1 parent a95a2d4 commit a08cb4e

39 files changed

Lines changed: 2808 additions & 140 deletions

dev/ci/check-suites.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ def file_to_class_name(path: Path) -> str | None:
3535
"org.apache.comet.parquet.ParquetReadSuite", # abstract
3636
"org.apache.comet.parquet.ParquetReadFromS3Suite", # manual test suite
3737
"org.apache.comet.IcebergReadFromS3Suite", # manual test suite
38+
"org.apache.comet.cloud.s3.CometS3CredentialBridgeSuite", # manual test suite
3839
"org.apache.spark.sql.comet.CometPlanStabilitySuite", # abstract
3940
"org.apache.spark.sql.comet.ParquetDatetimeRebaseSuite", # abstract
4041
"org.apache.comet.exec.CometColumnarShuffleSuite" # abstract

docs/source/contributor-guide/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@ Arrow FFI <ffi>
4949
JVM Shuffle <jvm_shuffle>
5050
Native Shuffle <native_shuffle>
5151
ANSI Error Propagation <sql_error_propagation>
52+
S3 Credential Provider Design <s3-credential-provider-design>
5253
```
5354

5455
```{toctree}
Lines changed: 106 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,106 @@
1+
<!--
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# S3 Credential Provider SPI: Design Notes
21+
22+
This page captures why the `org.apache.comet.cloud.s3.CometS3CredentialProvider` SPI is shaped the way it is. The user-facing contract and operator setup live in the user guide page on S3 credential providers; this page is for maintainers and reviewers who want the design rationale.
23+
24+
## The gap the SPI fills
25+
26+
Comet's native scan paths (`object_store` for raw Parquet, `opendal` via `iceberg-rust` for Iceberg) bypass Spark's Hadoop S3A code path. That means credentials cannot flow through any of the contracts that vendors typically wire into for S3A:
27+
28+
- `org.apache.spark.deploy.security.cloud.CloudCredentialsProvider` yields a single JWT per service name. No path argument, no AWS credential.
29+
- Hadoop S3A custom signers hide path-aware logic inside `Signer.sign(request, credentials)`. The credential never leaves the signing pipeline, and the underlying secret is an HMAC key that is not present in the signed output, so running the signer against a synthesized request cannot recover it.
30+
- `AWSCredentialsProvider.getCredentials()` (AWS SDK v1) and `AwsCredentialsProvider.resolveCredentials()` (v2) are parameterless. They cannot vend per-path credentials.
31+
- Reflecting into vendor singletons would encode per-vendor class names and lifecycles in Comet and would silently break on vendor upgrades.
32+
33+
A Comet-specific SPI is the narrowest fit: a single Java method that takes a `CometS3CredentialContext` (today wrapping `bucket`, `path`, and access `mode`; new fields can be added without breaking vendors compiled against earlier versions) and returns `CometS3Credentials`.
34+
35+
## Why config-driven activation, not `META-INF/services`
36+
37+
An earlier iteration used `ServiceLoader` discovery. That was rejected because:
38+
39+
- Peer SPIs in the same space (Hadoop `AWSCredentialsProvider`, AWS SDK v2 `AwsCredentialsProvider`, Iceberg `AwsClientFactory`, S3A custom signers) are all class-name-in-config. Vendors are already familiar with that model.
40+
- ServiceLoader makes activation implicit on classpath presence. A vendor JAR drifting onto a cluster could silently change S3 auth behavior. The config key makes activation explicit.
41+
- The activation key (`fs.s3a.comet.credential.provider.class`, with per-bucket override) follows the same shape as `fs.s3a.bucket.<name>.aws.credentials.provider`, so operators do not learn a new pattern.
42+
43+
Activation is modeled on `parquet.crypto.factory.class` (Parquet Modular Encryption KMS, see Comet #2447): the user names a single vendor class and the vendor dispatches across multiple credential backends inside that class if they need to. This mirrors how Iceberg's `DecryptionPropertiesFactory` already behaves for Parquet keys.
44+
45+
## Why `(FQCN, dispatchKey, catalogProperties)` keying
46+
47+
Comet caches one provider instance per `(FQCN, dispatchKey, catalogProperties)` triple. The dispatch key is the Spark V2 catalog name on the Iceberg path and the bucket on the Parquet path.
48+
49+
- Two catalogs that share one provider class (typical in multi-tenant deployments) need isolated `initialize` maps because their `catalogProperties` differ. Without `dispatchKey`, the second `initialize` would either overwrite the first or be silently skipped.
50+
- The bucket as `dispatchKey` for Parquet gives vendors per-bucket isolation when the same provider is named under several `fs.s3a.bucket.<name>.comet.credential.provider.class` keys.
51+
- `catalogProperties` enters the key to handle multi-tenant JVMs (Spark Connect, Thrift Server, `SparkSession.newSession()`) where two sessions can configure the same provider class against the same `dispatchKey` but with different REST endpoints, OAuth tokens, or vendor keys. Without it the second session would silently use the first session's credentials.
52+
- Keying solely by FQCN would force vendors to encode multi-tenant routing in static state. The triple-key keeps each call site independent.
53+
54+
`ensureInitialized` returns a `long` handle that the native bridge stashes and replays on every per-request call. Routing per-request lookups by handle avoids re-sending the property bag across JNI on the hot path and unambiguously selects the right provider when the same `(FQCN, dispatchKey)` pair maps to multiple instances.
55+
56+
## Why fresh construction in `initialize`, not probing a JVM-wide static
57+
58+
A provider implementation might be tempted to probe an existing static populated elsewhere (e.g. by a Hadoop S3A signer's `registerStore` callback) and reuse the credential cache that the Hadoop path uses. That fails on Comet-only executors:
59+
60+
- The driver JVM hits `S3AFileSystem.initialize` during analysis (raw `s3a://` paths) or during Hadoop catalog manifest reads (Iceberg with Hadoop catalog), so the static is populated there.
61+
- The driver may not hit `S3AFileSystem` at all under Iceberg with REST catalog plus `S3FileIO`, because `S3FileIO` calls AWS SDK directly without going through the Hadoop layer. The static stays null.
62+
- Executors with Comet-only reads never instantiate `S3AFileSystem`. The data path is `object_store` (raw Parquet) or `opendal` via `iceberg-rust` (Iceberg native scan). Neither touches Hadoop S3A. The static stays null on every executor.
63+
64+
Constructing a fresh provider from `catalogProperties` plus `SparkEnv` is the only strategy that works across all four cases. The trade-off is that on the driver (and any JVM where Hadoop S3A is also active), two credential caches now exist for the same identity: one inside the Hadoop signer's provider, one inside the SPI implementation's. The vendor pays for this with a small number of extra AS round-trips on cold starts and TTL boundaries. A future optional optimization could probe the static first and reuse if non-null, falling back to fresh construction otherwise.
65+
66+
## Why no Comet-side cache
67+
68+
Comet's bridge does not maintain a TTL cache, schedule refresh, or broadcast catalog state. All of that is the vendor's responsibility:
69+
70+
- Iceberg vendors get `software.amazon.awssdk.utils.cache.CachedSupplier` for free inside `org.apache.iceberg.aws.s3.VendedCredentialsProvider`.
71+
- Custom-STS vendors write whatever cache fits their refresh model.
72+
- Driver-only state is distributed via `initialize`'s `catalogProperties` (Iceberg path) or read from Hadoop conf via `SparkEnv` (Parquet path). Both are plan-time snapshots: Comet does not re-execute the catalog or push fresh values to running scans. Vendors that need a refreshing bearer compose with Spark's `HadoopDelegationTokenProvider`, which mints and renews on the driver and propagates to executors via `UserGroupInformation`. The two SPIs are orthogonal: Spark covers bearer lifecycle, this SPI covers path-aware AWS credential minting.
73+
74+
A Comet-side cache would have to either expose a tuning knob (TTL, max size, eviction policy) and grow over time, or be hardcoded and surprise vendors whose policies disagree. The bridge intentionally has neither and forwards every call.
75+
76+
## Path-specific behavior
77+
78+
`object_store::CredentialProvider` and `reqsign_core::ProvideCredential` differ in what they consume:
79+
80+
| Concern | Parquet (`object_store`) | Iceberg (`opendal` via `reqsign-core`) |
81+
| ----------------------- | -------------------------------------------------- | ------------------------------------------------------------ |
82+
| Trait method | `get_credential() -> AwsCredential` | `provide_credential(...) -> Option<IcebergAwsCredential>` |
83+
| Returns expiry? | No (only key/secret/token) | Yes (`expires_in: Option<Timestamp>`) |
84+
| Comet-side TTL wrapper? | None. Bridge passed straight to `with_credentials` | None. `opendal` schedules the next refresh from `expires_in` |
85+
| When SPI is called | Every `get_credential()` call | When `expires_in` is exceeded |
86+
| Vendor returns 0 expiry | Field has no use | Bridge substitutes 5 minutes to bound staleness |
87+
88+
The 5-minute fallback is a safety net so a vendor that omits expiry cannot leave `opendal` caching a stale token indefinitely. It is intentionally not a configuration knob.
89+
90+
## Property-bag handling on the Iceberg path
91+
92+
The full unfiltered FileIO property bag crosses JNI as `catalog_properties`. The storage-prefix filter (`s3.`/`gcs.`/`adls.`/`client.`) is applied native-side in `iceberg_scan.rs::load_file_io` immediately before `FileIOBuilder.with_prop`. This means the bridge sees `credentials.uri`, OAuth tokens, and any vendor-custom keys with no parallel field on the operator and no driver-side broadcast. Vendors set their own keys on the catalog config and read them back inside `initialize(Map)`.
93+
94+
`IcebergScanExec` derives a redacting `Debug` so plan dumps and tracing do not leak the property bag.
95+
96+
## Returns or throws, not a fall-through value
97+
98+
The SPI returns a `CometS3Credentials` or throws. There is no sentinel "I do not know" return. Vendors that are only authoritative for some paths resolve the default AWS chain themselves for the rest and return the result. This matches the contract on every other AWS credential SPI in the JVM ecosystem (AWS SDK v1/v2, Hadoop S3A, Iceberg `VendedCredentialsProvider`).
99+
100+
## Lifecycle: `AutoCloseable` plus a JVM shutdown hook
101+
102+
`CometS3CredentialProvider` extends `AutoCloseable` with a default no-op `close()`. The dispatcher installs a JVM shutdown hook that iterates every cached instance and calls `close()`, swallowing per-provider exceptions so a slow or buggy vendor cannot block other providers from cleaning up. Stateless providers ignore this entirely; vendors that hold long-lived resources (HTTP clients, scheduled-refresh executors, STS connection pools) override `close()` to release them. Shutdown hooks are best-effort, so a `SIGKILL` or abrupt JVM termination skips them; vendors must not rely on `close()` for correctness, only for resource hygiene.
103+
104+
## Iceberg path: error message fidelity caveat
105+
106+
When the bridge is wired into `iceberg-rust`, the outer `reqsign-core::ProvideCredentialChain` currently swallows thrown exceptions into "no credential" before the request reaches `opendal`. The credential is still not issued and the request still fails, but the message is degraded to an opaque anonymous-request failure. No Comet change fixes this; it is resolved upstream when `iceberg-rust` stops wrapping custom loaders in its outer chain or moves its S3 backend to `object_store`.

docs/source/user-guide/latest/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -79,6 +79,7 @@ to read more.
7979
:hidden:
8080

8181
Iceberg Guide <iceberg>
82+
S3 Credential Providers <s3-credential-providers>
8283
Kubernetes Guide <kubernetes>
8384

8485
.. toctree::

0 commit comments

Comments
 (0)