-
Notifications
You must be signed in to change notification settings - Fork 469
Add support for write.metadata.path
#1642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
39928c6
f4c075c
6f88749
1c9f177
76c0480
b36df41
8b309d2
577bbdd
5b0b85b
f75642b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,6 +16,7 @@ | |
| # under the License. | ||
| import importlib | ||
| import logging | ||
| import uuid | ||
| from abc import ABC, abstractmethod | ||
| from typing import Optional | ||
|
|
||
|
|
@@ -29,7 +30,7 @@ | |
|
|
||
|
|
||
| class LocationProvider(ABC): | ||
| """A base class for location providers, that provide data file locations for a table's write tasks. | ||
| """A base class for location providers, that provide file locations for a table's write tasks. | ||
|
|
||
| Args: | ||
| table_location (str): The table's base storage location. | ||
|
|
@@ -40,6 +41,7 @@ class LocationProvider(ABC): | |
| table_properties: Properties | ||
|
|
||
| data_path: str | ||
| metadata_path: str | ||
|
|
||
| def __init__(self, table_location: str, table_properties: Properties): | ||
| self.table_location = table_location | ||
|
|
@@ -52,6 +54,11 @@ def __init__(self, table_location: str, table_properties: Properties): | |
| else: | ||
| self.data_path = f"{self.table_location.rstrip('/')}/data" | ||
|
|
||
| if path := table_properties.get(TableProperties.WRITE_METADATA_PATH): | ||
| self.metadata_path = path.rstrip("/") | ||
| else: | ||
| self.metadata_path = f"{self.table_location.rstrip('/')}/metadata" | ||
|
|
||
| @abstractmethod | ||
| def new_data_location(self, data_file_name: str, partition_key: Optional[PartitionKey] = None) -> str: | ||
| """Return a fully-qualified data file location for the given filename. | ||
|
|
@@ -64,6 +71,35 @@ def new_data_location(self, data_file_name: str, partition_key: Optional[Partiti | |
| str: A fully-qualified location URI for the data file. | ||
| """ | ||
|
|
||
| def new_table_metadata_file_location(self, new_version: int = 0) -> str: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This generally looks fine to me (and configuring metadata locations on
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for 1, we haven't released a version with LocationProvider. so any references going forward should also have the new methods. We probably want to add a note in the location provider docs about the new metadata location feature
does "under a table" mean path relative to the table's base location? The SimpleLocation Provider uses absolute path. The ObjectStoreLocationProvider is special in that it injects some hash after the base location. But i think a custom metadata location provider can do the same.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah exactly, I was just curious about user demand for a custom metadata location provider. It exceeds the capability of Java custom location providers and posing the question of whether it makes sense for other implementations. To clarify, no objections from my end. Updating the docs sounds good!
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @smaheshwar-pltr Regarding point 1, I had the same question while working on this. Centralizing locations/paths would definitely make future changes easier, especially with the ongoing discussions around relative path work. I found a related issue in the main repo here: apache/iceberg#6809. Which looks like it didn't get any traction but still holds some value.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh i see the semantic difference.
we can do the same for we can then mention in the LocationProvider documentation that Does this make sense? @smaheshwar-pltr is this what you were pointing out?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is interesting because for data file path, we can use We're trying recreate the same behavior for metadata path.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, brushing up on my python...
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Thanks for digging into this - I wasn't aware of that discussion. Quoting from it, "Object storage mode to not just data files but also metadata files" is what I was getting at here. It's interesting |
||
| """Return a fully-qualified metadata file location for a new table version. | ||
|
|
||
| Args: | ||
| new_version (int): Version number of the metadata file. | ||
|
|
||
| Returns: | ||
| str: fully-qualified URI for the new table metadata file. | ||
|
|
||
| Raises: | ||
| ValueError: If the version is negative. | ||
| """ | ||
| if new_version < 0: | ||
| raise ValueError(f"Table metadata version: `{new_version}` must be a non-negative integer") | ||
|
|
||
| file_name = f"{new_version:05d}-{uuid.uuid4()}.metadata.json" | ||
| return self.new_metadata_location(file_name) | ||
|
|
||
| def new_metadata_location(self, metadata_file_name: str) -> str: | ||
| """Return a fully-qualified metadata file location for the given filename. | ||
|
|
||
| Args: | ||
| metadata_file_name (str): Name of the metadata file. | ||
|
|
||
| Returns: | ||
| str: A fully-qualified location URI for the metadata file. | ||
| """ | ||
| return f"{self.metadata_path}/{metadata_file_name}" | ||
|
|
||
|
|
||
| class SimpleLocationProvider(LocationProvider): | ||
| def __init__(self, table_location: str, table_properties: Properties): | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.