Snowflake managed Open Catalog and Azure ADLS2 #1606

@martinseifertprojuventute

Description

Apache Iceberg version

0.8.1 (latest release)

Please describe the bug 🐞

I have an external volume in Snowflake that points to Azure Data Lake Storage Gen2 (ADLS Gen2):

create or replace external volume ev_iceberg_tables
storage_locations =
    ((
        name = 'iceberg_snowflake_managed'
        storage_provider = 'AZURE'
        storage_base_url = 'azure://[storage_account].blob.core.windows.net/catalog/snowflake_managed/'
        azure_tenant_id = '[tenant]'
    ))
;

The container is called “catalog”, and the Open Catalog catalog I want to sync to is called “snowflake_managed”. This is my catalog integration:

create or replace catalog integration i_iceberg_catalog
catalog_source = polaris
table_format = iceberg
catalog_namespace= 'default'
rest_config = (
    catalog_uri = 'https://[locator].snowflakecomputing.com/polaris/api/catalog'
    warehouse = 'snowflake_managed'
)
rest_authentication = (
    type = oauth
    oauth_client_id = '[client_id]'
    oauth_client_secret = '[client_secret]'
    oauth_allowed_scopes = ( 'PRINCIPAL_ROLE:ALL' )
)
enabled = true
;

With this I create a table in the catalog:

create or replace iceberg table iceberg.jira.roadmap (
    id int
    , [...]
)
external_volume = 'ev_iceberg_tables'
catalog = 'SNOWFLAKE'
base_location = 'jira/roadmap/'
catalog_sync = 'i_iceberg_catalog'
;

This creates the table in Open Catalog, and I can populate it just fine. But when I try to read from the table using PyIceberg or Polars, I get this error:

ValueError: No registered filesystem for scheme: wasbs

So I checked the table's metadata:

from pyiceberg.catalog import load_catalog

# Connect to Open Catalog through the Iceberg REST API and request vended credentials.
catalog = load_catalog(
    **{
        "type": "rest",
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
        "uri": "https://[locator].snowflakecomputing.com/polaris/api/catalog",
        "credential": "[open_catalog_client_id]:[open_catalog_client_secret]",
        "scope": "PRINCIPAL_ROLE:pyIceberg",
        "warehouse": "snowflake_managed",
        "token-refresh-enabled": "true",
        # Use the fsspec-based FileIO implementation.
        "py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
    }
)

table = catalog.load_table('ICEBERG.JIRA.ROADMAP')

table.metadata

TableMetadataV2(location='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap', table_uuid=UUID('35b[…]'), last_updated_ms=1738578925967, last_column_id=19, schemas=[Schema(NestedField(field_id=1, name='ID', […], schema_id=0, identifier_field_ids=)], current_schema_id=0, partition_specs=[PartitionSpec(spec_id=0)], default_spec_id=0, last_partition_id=999, properties={'format-version': '2'}, current_snapshot_id=78408874928435018, snapshots=[Snapshot(snapshot_id=3032990014606473543, parent_snapshot_id=None, sequence_number=1, timestamp_ms=1738578919582, manifest_list='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-1738578919582000000-5714c4a4-11e8-4c0a-b89b-cab4ea909f97.avro', summary=None, schema_id=0), Snapshot(snapshot_id=78408874928435018, parent_snapshot_id=None, sequence_number=2, timestamp_ms=1738578925967, manifest_list='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-1738578925967000000-fbf8b14b-e0ba-4bf5-bfde-5c6cf88251ad.avro', summary=Summary(Operation.APPEND, **{'manifests-kept': '0', 'added-files-size': '112128', 'total-records': '708', 'manifests-created': '1', 'total-data-files': '8', 'manifests-replaced': '0', 'added-data-files': '8', 'added-records': '708', 'total-files-size': '112128'}), schema_id=0)], snapshot_log=[SnapshotLogEntry(snapshot_id=3032990014606473543, timestamp_ms=1738578919582), SnapshotLogEntry(snapshot_id=78408874928435018, timestamp_ms=1738578925967)], metadata_log=, sort_orders=[SortOrder(order_id=0)], default_sort_order_id=0, refs={'main': SnapshotRef(snapshot_id=78408874928435018, snapshot_ref_type=SnapshotRefType.BRANCH, min_snapshots_to_keep=None, max_snapshot_age_ms=None, max_ref_age_ms=None)}, format_version=2, last_sequence_number=2)

Apparently either Open Catalog or Snowflake wrote the wasbs scheme into the table metadata (both the table location and the manifest lists), even though the metadata file itself is addressed with the abfss scheme:

table.metadata_location

abfss://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/[…].metadata.json
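To make the mismatch easier to spot, here is a minimal sketch that compares the URI scheme of the metadata location with the schemes of the locations referenced inside the metadata. The helper names `uri_scheme` and `find_scheme_mismatches` and the account name `myaccount` are hypothetical and only serve as illustration; nothing here is part of PyIceberg.

```python
def uri_scheme(uri: str) -> str:
    """Return the lower-cased scheme of a URI, or "" if it has none."""
    scheme, sep, _ = uri.partition("://")
    return scheme.lower() if sep else ""


def find_scheme_mismatches(metadata_location: str, referenced: list[str]) -> list[str]:
    """Return the referenced URIs whose scheme differs from the metadata file's scheme."""
    expected = uri_scheme(metadata_location)
    return [uri for uri in referenced if uri_scheme(uri) != expected]


if __name__ == "__main__":
    # Hypothetical values modelled on the output above ("myaccount" is a placeholder).
    metadata_location = (
        "abfss://catalog@myaccount.blob.core.windows.net/"
        "snowflake_managed/jira/roadmap/metadata/example.metadata.json"
    )
    referenced = [
        "wasbs://catalog@myaccount.blob.core.windows.net/snowflake_managed/jira/roadmap",
    ]
    for uri in find_scheme_mismatches(metadata_location, referenced):
        print("scheme mismatch:", uri)
```

With a loaded table, the inputs would presumably be `table.metadata_location` on one side, and `table.metadata.location` plus the `manifest_list` of each entry in `table.metadata.snapshots` on the other.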

So the URI schemes in table.metadata (wasbs) and table.metadata_location (abfss) disagree, and as a result table.scan().to_arrow() fails:

File ~\AppData\Roaming\Python\Python313\site-packages\pyiceberg\io\pyarrow.py:1354, in _fs_from_file_path(file_path, io)
...
408 if scheme not in self._scheme_to_fs:
--> 409 raise ValueError(f"No registered filesystem for scheme: {scheme}")
410 return self._scheme_to_fs[scheme]
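As an illustration of what a client-side normalization could look like, the sketch below rewrites the scheme of a URI according to a mapping, for example wasbs to abfss. This is an assumption about a possible mitigation, not a confirmed fix and not PyIceberg's behaviour: `rewrite_scheme` is a hypothetical helper, `myaccount` is a placeholder, and whether the rewritten URI is actually reachable depends on the storage account and its endpoints.

```python
def rewrite_scheme(uri: str, mapping: dict[str, str]) -> str:
    """Replace the URI scheme if it appears in `mapping`; otherwise return the URI unchanged."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        # Not a URI with an explicit scheme; leave it untouched.
        return uri
    return mapping.get(scheme.lower(), scheme) + sep + rest


if __name__ == "__main__":
    # Hypothetical location modelled on the table metadata in this report.
    location = "wasbs://catalog@myaccount.blob.core.windows.net/snowflake_managed/jira/roadmap"
    print(rewrite_scheme(location, {"wasbs": "abfss"}))
```

Only the scheme is changed here; the host part is left as it is, since the abfss metadata location in this report also uses the blob.core.windows.net host.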

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
