Skip to content

feat: credential vending for REST catalog#6

Open
smaheshwar-pltr wants to merge 2 commits into
mainfrom
credential-vending-tests
Open

feat: credential vending for REST catalog#6
smaheshwar-pltr wants to merge 2 commits into
mainfrom
credential-vending-tests

Conversation

@smaheshwar-pltr
Copy link
Copy Markdown
Owner

@smaheshwar-pltr smaheshwar-pltr commented Apr 17, 2026

Issue

Closes #10

iceberg-cpp's REST catalog has no support for per-table credential vending. When a REST catalog server returns vended credentials (temporary S3 access keys) in the LoadTableResponse, iceberg-cpp ignores them and uses the catalog-level FileIO for all tables.

Consumers using iceberg-cpp to read tables from a credential-vending REST catalog get authentication failures when trying to read data files:

HTTP 403 Forbidden
AccessDenied: Access Denied
Authentication Failure - this is usually caused by invalid or missing credentials.
* No credentials are provided.

The catalog discovery and table metadata loading succeed (those use the catalog-level credentials), but reading the actual data files fails because the per-table S3 credentials from the REST catalog response are never extracted or applied.

PR

Implement per-table credential vending in RestCatalog, following the pattern from iceberg-java's RESTSessionCatalog and PyIceberg's REST catalog.

Credential vending pipeline

  • RestCatalog sends X-Iceberg-Access-Delegation: vended-credentials header on all table-returning operations (LoadTable, CreateTable, RegisterTable, StageCreateTable)
  • ResolveTableFileIO merges catalog properties < table config < storage credentials (longest prefix match), then creates a per-table FileIO via FileIORegistry
  • Vended credentials are accessible to engines via FileIO::properties(), matching Python's table.io.properties pattern

API design: FileIO::properties()

We initially exposed io_properties() on the Table class, but after reviewing how engines actually consume vended credentials across the ecosystem:

How each implementation surfaces credentials to engines:

  • Java: FileIO.properties() — Trino's IcebergRestCatalogFileSystemFactory calls baseTable.io().properties() directly to extract S3/GCS/Azure credentials and inject them into its own filesystem layer. It does NOT use the SupportsStorageCredentials interface.
  • Python: table.io.properties — PyIceberg's DataFusion integration passes self.io.properties to configure its native filesystem
  • Rust: table.file_io().config().props() — credentials flow through FileIO's config

No implementation exposes credentials on the Table object. FileIO::properties() is the standard surface. So we:

  • Added properties() accessor to the FileIO base class
  • FileIORegistry::Load() automatically populates FileIO::properties_ with the merged config
  • Removed io_properties from Table/StagedTable entirely
  • Removed the ResolvedTableIO struct (no longer needed since properties live on FileIO)

Integration tests

The REST fixture is configured with CATALOG_INCLUDE__CREDENTIALS=true and dummy S3 credentials. The fixture's RESTServerCatalogAdapter injects these into LoadTableResponse.config when credential vending is enabled. Integration tests verify that vended credentials flow through ResolveTableFileIO and are accessible via table->io()->properties() on all table-returning operations.

A full end-to-end test (reading from MinIO with vended credentials via Arrow S3 FileIO) would require ICEBERG_S3=ON in CI — left as a follow-up.

Known limitations

  • No credential refresh (Java has VendedCredentialsProvider with expiry tracking)
  • No remote-signing access delegation mode (only vended-credentials)
  • Storage credential longest-prefix match uses metadata_location (like Python); Java defers matching to the FileIO per file access

@smaheshwar-pltr smaheshwar-pltr force-pushed the credential-vending-tests branch from 7ff854f to 317cb43 Compare April 17, 2026 00:11
@smaheshwar-pltr smaheshwar-pltr changed the title test: add unit tests for credential vending feat: S3 credential vending for REST catalog Apr 17, 2026
@smaheshwar-pltr smaheshwar-pltr changed the base branch from s3-credential-vending to main April 17, 2026 00:16
@smaheshwar-pltr smaheshwar-pltr changed the title feat: S3 credential vending for REST catalog feat: credential vending for REST catalog Apr 17, 2026
@smaheshwar-pltr smaheshwar-pltr force-pushed the credential-vending-tests branch 7 times, most recently from f919c85 to a218f7e Compare April 17, 2026 05:25
- CATALOG_WAREHOUSE=file:///tmp/iceberg_warehouse
- CATALOG_INCLUDE__CREDENTIALS=true
- CATALOG_S3_ACCESS__KEY__ID=dummy-access-key
- CATALOG_S3_SECRET__ACCESS__KEY=dummy-secret-key
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design note: credential vending in the test fixture

We enable include-credentials with dummy S3 keys so the fixture's RESTServerCatalogAdapter injects them into LoadTableResponse.config. This lets us verify the full vending pipeline (header → server response → ResolveTableFileIOFileIO::properties()) without needing MinIO.

Follow-up: For a true end-to-end test (reading data from MinIO using only vended credentials), we'd need to add MinIO here + build with ICEBERG_S3=ON in CI so the Arrow S3 FileIO can actually connect. Left as a follow-up since it requires CI config changes.


// Merge order: catalog props < table config < storage credentials (highest priority)
auto merged = config_.configs();
for (const auto& [k, v] : result.config) {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge ordering: catalog props < table config < storage credentials

This matches all three reference implementations:

  • Java (RESTSessionCatalog.java): merges catalog properties → table config → storage credentials
  • Python (pyiceberg/catalog/rest/__init__.py): {**table_response.metadata.properties, **table_response.config, **credential_config}
  • Rust (catalog/rest/src/catalog.rs): merges response.config into self.user_config.props (though Rust currently discards storage_credentials)

best = &cred;
}
}
}
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longest prefix match on metadata_location

This follows Python's approach (_resolve_storage_credentials() in pyiceberg/catalog/rest/__init__.py), which matches against metadata_location.

Java takes a different approach: it passes ALL credentials to the FileIO via SupportsStorageCredentials.setCredentials(), and the FileIO itself (e.g., S3FileIO) does per-file longest-prefix matching at read time. This is more correct for tables whose data files span multiple prefixes, but more complex.

We chose the Python pattern for simplicity. The spec's StorageCredential schema defines prefix + config but doesn't mandate the matching algorithm.

Comment thread src/iceberg/file_io.h
/// Engines that need to configure their own storage access (e.g., for credential
/// vending) can read these properties to obtain the resolved credentials.
const std::unordered_map<std::string, std::string>& properties() const {
return properties_;
Copy link
Copy Markdown
Owner Author

@smaheshwar-pltr smaheshwar-pltr Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why FileIO::properties() instead of Table::io_properties()?

We initially had io_properties() on the Table class, but after reviewing all three reference implementations:

  • Java: engines use table.io() directly; credentials live on FileIO via SupportsStorageCredentials
  • Python: engines read table.io.properties (a dict on FileIO) — e.g., the DataFusion integration explicitly passes self.io.properties
  • Rust: credentials flow through table.file_io().config().props()

No implementation exposes credentials on the Table object itself. FileIO::properties() matches the Python pattern and is the standard surface for engines that need to configure their own storage access.

}
return factory(properties);
ICEBERG_ASSIGN_OR_RAISE(auto io, factory(properties));
io->properties_ = properties;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileIORegistry::Load() automatically populates properties_ on every FileIO it creates. This means any FileIO from the registry (including per-table FileIOs from ResolveTableFileIO) will have the merged properties available via properties() without each factory needing to set them manually.

Comment thread src/iceberg/file_io.h
/// \brief Returns the configuration properties used to initialize this FileIO.
///
/// Engines that need to configure their own storage access (e.g., for credential
/// vending) can read these properties to obtain the resolved credentials.
Copy link
Copy Markdown
Owner Author

@smaheshwar-pltr smaheshwar-pltr Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real-world engine validation: Trino

Trino (a major Java query engine) validates this exact pattern. Its IcebergRestCatalogFileSystemFactory extracts vended credentials by calling baseTable.io().properties() directly — it does NOT use Java's SupportsStorageCredentials interface.

The flow: BaseTable.io().properties()IcebergTableCredentialsIcebergRestCatalogFileSystemFactory.create(identity, fileIoProperties) → extracts S3/GCS/Azure keys → injects them into Trino's ConnectorIdentity.extraCredentials → Trino's filesystem factories use the vended creds.

See: trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/IcebergRestCatalogFileSystemFactory.java

@smaheshwar-pltr smaheshwar-pltr force-pushed the credential-vending-tests branch 10 times, most recently from 048ad43 to bcbdc9c Compare April 17, 2026 07:26
auto merged =
MergeTableProperties(catalog_props, result.config, cred ? cred->config : kEmpty);

// Detect FileIO type: explicit io-impl > warehouse scheme > metadata_location scheme.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches pyiceberg?

Comment on lines +75 to +77
// No io-impl or warehouse configured. Fall back to a local FileIO as a
// default — the per-table ResolveTableFileIO will resolve the real FileIO
// when a table is loaded (using the table's location to detect S3, etc.).
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev:

Suggested change
// No io-impl or warehouse configured. Fall back to a local FileIO as a
// default — the per-table ResolveTableFileIO will resolve the real FileIO
// when a table is loaded (using the table's location to detect S3, etc.).
// No io-impl or warehouse configured. Fall back to a local FileIO as a
// default — enabling per-table ResolveTableFileIO (vending).

Comment on lines +139 to +141
// TODO: Python also merges result.metadata->properties into the chain here
// (metadata props < catalog < table config < credentials). Java does not.
// Consider adding this for parity with Python if needed.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev: remove python comment and just mention our chain
then, PR comment the python thing as a TODO to flag please

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — removed the Python comment from code.

TODO for follow-up: Python also merges result.metadata->properties into the chain (metadata props < catalog < table config < credentials). Java and Rust do not. We match Java/Rust. Worth revisiting if we find a case where table-metadata-level io-impl is needed.

#include "iceberg/result.h"

/// \file iceberg/catalog/rest/rest_file_io_internal.h
/// Internal helpers for per-table FileIO resolution in the REST catalog.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev: this file isn't written for just this it happens to be that. internal helpers for REST file IO construction or smth pls

///
/// Merges catalog properties, table config, and the best-matching storage
/// credential (longest prefix match on metadata_location), then creates a
/// per-table FileIO via FileIORegistry. Falls back to \p catalog_io when
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev: spaces between Fa and \p intentional? fine if so

ICEBERG_REST_EXPORT Result<std::shared_ptr<FileIO>> ResolveTableFileIO(
const std::shared_ptr<FileIO>& catalog_io,
const std::unordered_map<std::string, std::string>& catalog_props,
const std::string& warehouse, const LoadTableResult& result);
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev: should warehouse be stringview?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine if not

Comment thread src/iceberg/file_io.h
}

private:
friend class FileIORegistry;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev: PR comment on this please. friend is controversial

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friend class FileIORegistry is intentional: properties_ should only be populated by the registry (which creates FileIO instances with merged config). Making it protected would let any subclass set arbitrary properties, breaking the invariant that properties reflect the config used to create the FileIO. The friend constrains this to the single correct code path.

If a second factory path emerges later (e.g., custom FileIO outside the registry), we can add a protected setter at that point.

namespace iceberg {

// ---------------------------------------------------------------------------
// DetectBuiltinFileIO tests
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev:

wait... is DetectBuiltinFileIO basically just a helper function too that's public I guess?

I don't see it in src/iceberg/catalog/rest/rest_file_io_internal.h so I guess it's on the main class? has it always been public?

then, let's maybe ditch the whole internal model and new file and just put resolve in that class too. I was concerned about visibility, but if there's precedence then fine. if we do that, then after comment on this PR please saying that we did it for this reason (otherwise you ask "why is this not internal"?)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ofc i'm talking about ResolveTableFileIO function

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — DetectBuiltinFileIO, BuiltinFileIOName, and MakeCatalogFileIO are all already public in rest_file_io.h. Moved ResolveTableFileIO there too for consistency. Deleted rest_file_io_internal.h.

.tv_sec = 1,
.tv_usec = 0,
struct timeval timeout {
.tv_sec = 1, .tv_usec = 0,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev: this change is unrelated to the PR, did linter really flag it? wouldn't have thought so

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if linter let's us, let's try and not change things unrelated to the PR please

Comment on lines +489 to +494
//
// The REST fixture is configured with CATALOG_INCLUDE__CREDENTIALS=true and dummy
// S3 credentials (CATALOG_S3_ACCESS__KEY__ID / CATALOG_S3_SECRET__ACCESS__KEY).
// The fixture's RESTServerCatalogAdapter injects these into LoadTableResponse.config.
// These tests verify that the vended credentials flow through ResolveTableFileIO
// and are accessible on the per-table FileIO via FileIO::properties().
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev: this comment doesn't add much, remove

Implement per-table credential vending in RestCatalog, following the pattern
from iceberg-java's RESTSessionCatalog and PyIceberg's REST catalog.

- Set X-Iceberg-Access-Delegation header at session level on HttpClient
- ResolveTableFileIO merges catalog properties with per-table config and
  storage credentials (longest-prefix match), creating per-table FileIO
  via FileIORegistry when credentials are vended
- StorageCredential struct with JSON serde and prefix-based resolution
- Applied consistently to LoadTable, CreateTable, RegisterTable, StageCreateTable
- Table exposes io_properties() for consumers that need vended configuration
- Table::Refresh() updates io_properties_ to prevent stale credentials
- Comprehensive unit tests (property merging, storage credential resolution,
  io_properties lifecycle) and REST integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@smaheshwar-pltr smaheshwar-pltr force-pushed the credential-vending-tests branch from bcbdc9c to 620041b Compare April 17, 2026 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No per-table credential vending in REST catalog

1 participant