[core] Introduce file resource management #8179
This is a very surprising PR. Is there a relevant PIP that provides more background information?
Thank you for the suggestion. I will submit the PIP as soon as possible.
Since this is the first PR, I think you can focus on introducing Resource.
Force-pushed from 57a260d to 81f5235.
Sure, let me revise it.
Force-pushed from 81f5235 to 25c4603.
```java
private final Identifier identifier;
@Nullable private final String comment;
private final String uri;
```
How is this URI used? How do we get a REST token for this file?
The current design delegates URI handling to the file systems integrated with the engine. The metastore service is only responsible for permission management of the resource entity.
For example, in Daft, we assign the URI to the resources field of the PyFileResourceFunction instance.
```python
def _get_function(self, ident: Identifier) -> Function:
    ...
    paimon_daft_func: FunctionDefinition = self._inner.get_function(str(ident)).definitions()["daft"]
    ...
    # file_resources may be a list attribute or a callable method
    raw_resources = paimon_daft_func.file_resources
    resources = raw_resources() if callable(raw_resources) else raw_resources
    # Only the URIs are handed to Daft; fetching them is left to the engine's file systems.
    return PyFileResourceFunction(
        identifier=ident,
        module_name=paimon_daft_func.class_name,
        binding_name=paimon_daft_func.function_name,
        resources=[item.uri for item in resources],
    )
```
During execution, the engine resolves these URIs by fetching the resources through the corresponding file system, and file permissions are also handled by that file system.
```python
async def run_plan(
    self,
    plan: LocalPhysicalPlan,
    exec_cfg: PyDaftExecutionConfig,
    context: dict[str, str] | None,
    added_resources: dict[str, int] | None = None,
    # PyMicroPartitions are separated from Inputs because they are Ray ObjectRefs, which will be resolved by Ray.
    **inputs: Input | list[ray.ObjectRef],
) -> AsyncGenerator[MicroPartition | FlightPartitions | SwordfishTaskMetadata, None]:
    """Run a plan on swordfish and yield partitions."""
    if added_resources:
        # Resolve (fetch) the added file resources before the plan runs.
        file_resource_manager.resolve(added_resources)
```
More straightforwardly, we could also use Paimon FileIO to handle this; in fact, the engine's behavior is similar.
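To make the delegation described in this reply concrete, here is a minimal Python sketch of engine-side resolution: each resource URI is dispatched by its scheme to a file-system fetcher, so the metastore only hands out URIs and the file system performing the read decides whether access is allowed. `FileResourceResolver`, `fetch_local`, and the scheme registry are hypothetical names for illustration, not Daft's `file_resource_manager` API; only local files are handled here, and other schemes (object stores, or Paimon FileIO) would be plugged in through `register`.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict
from urllib.parse import urlparse
from urllib.request import url2pathname

# Hypothetical fetcher signature: copy the resource at `uri` into `dest_dir`, return the local copy.
Fetcher = Callable[[str, Path], Path]


def fetch_local(uri: str, dest_dir: Path) -> Path:
    """Fetcher for the local file system; accepts file:// URIs and bare paths."""
    parsed = urlparse(uri)
    src = Path(url2pathname(parsed.path)) if parsed.scheme == "file" else Path(uri)
    target = dest_dir / src.name
    shutil.copyfile(src, target)
    return target


class FileResourceResolver:
    """Hypothetical resolver: dispatches each URI to the file system registered for its scheme.

    Whether a read is permitted is decided by the fetcher (file system) that performs it,
    not by this resolver or by the metastore.
    """

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self._fetchers: Dict[str, Fetcher] = {"": fetch_local, "file": fetch_local}
        self._resolved: Dict[str, Path] = {}

    def register(self, scheme: str, fetcher: Fetcher) -> None:
        """Plug in another file system (e.g. an object-store client) for `scheme`."""
        self._fetchers[scheme] = fetcher

    def resolve(self, uri: str) -> Path:
        if uri in self._resolved:  # fetch each URI at most once
            return self._resolved[uri]
        scheme = urlparse(uri).scheme
        fetcher = self._fetchers.get(scheme)
        if fetcher is None:
            raise ValueError(f"no file system registered for scheme {scheme!r}")
        local = fetcher(uri, self.cache_dir)
        self._resolved[uri] = local
        return local
```

Keying fetchers on the URI scheme keeps the catalog out of the data path; the trade-off, raised in the next comment, is that permission checks then live wherever each file system is configured.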
For our REST Catalog, file-system permissions should be managed by the Catalog, so I feel Resource also needs to expose a FileIO.
Your suggestion is absolutely right. I moved Resource to the paimon-core module and implemented the toBytes and newInputStream methods for it.
However, this introduces a small side effect: if Resource needs to be used in Function in the future, then Function would also need to be refactored into the paimon-core module.
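A minimal Python sketch of the shape described in this reply: a resource whose `to_bytes` and `new_input_stream` read through a catalog-provided FileIO, so the catalog (not the engine) governs access. Only the two method names come from the discussion (as `toBytes`/`newInputStream` in Java); the `FileIO` interface, the `InMemoryFileIO` test double, and the constructor signature are assumptions for illustration, not Paimon's actual classes.

```python
import io
from abc import ABC, abstractmethod
from typing import BinaryIO, Dict


class FileIO(ABC):
    """Hypothetical stand-in for a catalog-provided FileIO; it enforces permissions."""

    @abstractmethod
    def new_input_stream(self, uri: str) -> BinaryIO: ...


class InMemoryFileIO(FileIO):
    """Toy FileIO backed by a dict, used only to exercise the sketch."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def new_input_stream(self, uri: str) -> BinaryIO:
        if uri not in self.files:
            raise FileNotFoundError(uri)
        return io.BytesIO(self.files[uri])


class Resource:
    """Illustrative resource that reads its content through the catalog's FileIO."""

    def __init__(self, name: str, uri: str, file_io: FileIO) -> None:
        self.name = name
        self.uri = uri
        self._file_io = file_io

    def new_input_stream(self) -> BinaryIO:
        # Streaming access, suited to large artifacts such as model files.
        return self._file_io.new_input_stream(self.uri)

    def to_bytes(self) -> bytes:
        # Convenience for small resources such as Python scripts; closes the stream.
        with self.new_input_stream() as stream:
            return stream.read()
```

Holding a FileIO inside the resource is what couples it to the module that provides FileIO, which matches the side effect noted above.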
Purpose
Introduce resource management capabilities to the REST Catalog, providing a unified way to manage file resources (FILE, JAR, PY, ARCHIVE) associated with databases. This lays the foundation for upcoming ML model and function features, where users will need to reference and manage external file resources such as model artifacts, UDF JARs, and Python scripts.
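This PR's resource model is in Java: a `Resource` interface, concrete `FileResource`, `JarResource`, `PyResource`, and `ArchiveResource` classes, a `ResourceType` enum, `ResourceChange` operations, and a Jackson `ResourceDeserializer`. The Python sketch here mirrors that shape to show how a `type` discriminator drives polymorphic decoding and how the four change operations apply. The JSON field names (`name`, `type`, `uri`, `description`, `properties`), the copy-on-alter style, and ignoring removal of an absent key are assumptions, not the PR's wire format or semantics.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import ClassVar, Dict, List, Optional, Type, Union


class ResourceType(Enum):
    FILE = "FILE"
    JAR = "JAR"
    PY = "PY"
    ARCHIVE = "ARCHIVE"


@dataclass
class Resource:
    """Common fields; each subclass fixes its TYPE discriminator."""
    TYPE: ClassVar[ResourceType]
    name: str
    uri: str
    description: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)


class FileResource(Resource):
    TYPE = ResourceType.FILE

class JarResource(Resource):
    TYPE = ResourceType.JAR

class PyResource(Resource):
    TYPE = ResourceType.PY

class ArchiveResource(Resource):
    TYPE = ResourceType.ARCHIVE


_BY_TYPE: Dict[ResourceType, Type[Resource]] = {
    cls.TYPE: cls for cls in (FileResource, JarResource, PyResource, ArchiveResource)
}


def to_json(resource: Resource) -> str:
    return json.dumps({"name": resource.name, "type": resource.TYPE.value, "uri": resource.uri,
                       "description": resource.description, "properties": resource.properties})


def from_json(payload: str) -> Resource:
    """Polymorphic decode: the 'type' field selects the concrete class."""
    data = json.loads(payload)
    cls = _BY_TYPE[ResourceType(data["type"])]  # ValueError on an unknown type
    return cls(name=data["name"], uri=data["uri"], description=data.get("description"),
               properties=dict(data.get("properties") or {}))


@dataclass(frozen=True)
class SetProperty:
    key: str
    value: str

@dataclass(frozen=True)
class RemoveProperty:
    key: str

@dataclass(frozen=True)
class SetDescription:
    description: Optional[str]

@dataclass(frozen=True)
class SetUri:
    uri: str

ResourceChange = Union[SetProperty, RemoveProperty, SetDescription, SetUri]


def apply_changes(resource: Resource, changes: List[ResourceChange]) -> Resource:
    """Return an altered copy of the same concrete type; the input is left untouched."""
    props, desc, uri = dict(resource.properties), resource.description, resource.uri
    for change in changes:
        if isinstance(change, SetProperty):
            props[change.key] = change.value
        elif isinstance(change, RemoveProperty):
            props.pop(change.key, None)  # absent key ignored (an assumption)
        elif isinstance(change, SetDescription):
            desc = change.description
        elif isinstance(change, SetUri):
            uri = change.uri
        else:
            raise TypeError(f"unsupported change: {change!r}")
    return type(resource)(name=resource.name, uri=uri, description=desc, properties=props)
```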
Changes
Resource Model (paimon-api)
- Resource interface and AbstractResource base class: define the resource abstraction with properties such as name, type, description, URI, and custom properties
- FileResource, JarResource, PyResource, ArchiveResource: concrete resource types for FILE/JAR/PY/ARCHIVE
- ResourceType enum: the four supported resource types
- ResourceChange: change operations for altering resources (setProperty, removeProperty, setDescription, setUri)
- ResourceDeserializer: Jackson deserializer for polymorphic resource deserialization
REST API (paimon-api)
- ResourcePaths: URL path builders for resource endpoints (/resources, /resource-details, /resources/{name})
- RESTApi: 8 new resource management API methods: listResources, listResourcesPaged, listResourceDetailsPaged, getResource, createResource, dropResource, alterResource, listResourcesPagedGlobally
- CreateResourceRequest, AlterResourceRequest, GetResourceResponse, ListResourcesResponse, ListResourceDetailsResponse, ListResourcesGloballyResponse
Catalog Interface (paimon-core)
- Catalog: 8 new interface methods for resource CRUD, plus the ResourceAlreadyExistException and ResourceNotExistException inner exception classes
- AbstractCatalog: default UnsupportedOperationException implementations
- DelegateCatalog: delegation implementations
- RESTCatalog: full REST-backed implementations
Tests (paimon-core)
- RESTApiJsonTest: JSON serialization/deserialization tests for resource request/response classes
- RESTCatalogTest: integration tests for resource CRUD operations
- RESTCatalogServer: mock REST server with resource management route handlers
- MockRESTMessage: test helper methods for constructing resource test data
API Summary
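All resource endpoints in this summary hang off `/v1/{prefix}`: six are scoped to a database and one lists resources globally. A minimal Python sketch of path builders in the spirit of `ResourcePaths` follows; the function names, and the choice to percent-encode `db` and `name` while passing `prefix` through unchanged, are assumptions for illustration rather than the PR's Java implementation.

```python
from urllib.parse import quote

API_VERSION = "v1"


def _seg(value: str) -> str:
    # Percent-encode one path segment, including '/', so a name cannot split the path.
    return quote(value, safe="")


def resources_path(prefix: str, db: str) -> str:
    """/v1/{prefix}/databases/{db}/resources"""
    return f"/{API_VERSION}/{prefix}/databases/{_seg(db)}/resources"


def resource_details_path(prefix: str, db: str) -> str:
    """/v1/{prefix}/databases/{db}/resource-details"""
    return f"/{API_VERSION}/{prefix}/databases/{_seg(db)}/resource-details"


def resource_path(prefix: str, db: str, name: str) -> str:
    """/v1/{prefix}/databases/{db}/resources/{name}"""
    return f"{resources_path(prefix, db)}/{_seg(name)}"


def global_resources_path(prefix: str) -> str:
    """/v1/{prefix}/resources (not scoped to a database)"""
    return f"/{API_VERSION}/{prefix}/resources"
```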
- /v1/{prefix}/databases/{db}/resources
- /v1/{prefix}/databases/{db}/resource-details
- /v1/{prefix}/databases/{db}/resources/{name}
- /v1/{prefix}/databases/{db}/resources
- /v1/{prefix}/databases/{db}/resources/{name}
- /v1/{prefix}/databases/{db}/resources/{name}
- /v1/{prefix}/resources
Tests