Skip to content

[#11103] feat(core): Add the hierarchy convention layer#11074

Open
roryqi wants to merge 44 commits into
apache:mainfrom
qqqttt123:split/pr-nested-ns-backend
Open

[#11103] feat(core): Add the hierarchy convention layer#11074
roryqi wants to merge 44 commits into
apache:mainfrom
qqqttt123:split/pr-nested-ns-backend

Conversation

@roryqi
Copy link
Copy Markdown
Contributor

@roryqi roryqi commented May 13, 2026

What changes were proposed in this pull request?

Add abstract class BasePOStorageOps to let all the PO logic to one class

Add SchemaPOStorageOps,TablePOStorageOps,FunctionPOStorageOps,ViewPOStorageOps.

Add Hierachical convention class

Why are the changes needed?

Fix: #11103

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added tests.

- RelationalSchemaNamingBridge and JDBCBackend entity/relation conversions
- SupportsRelationOperations.batchInsertRelations; RelationalEntityStore cache invalidation
- RelationalBackend; backend-focused tests

Co-authored-by: Cursor <cursoragent@cursor.com>
@roryqi roryqi marked this pull request as draft May 13, 2026 08:14
roryqi and others added 7 commits May 13, 2026 08:37
- Replace Preconditions.checkNotNull with Objects.requireNonNull in
  JDBCBackend.batchInsertRelations for consistency with the interface
- Add @SuppressWarnings explanatory comments in unchecked-cast methods
- Document FILESET/TOPIC/MODEL/MODEL_VERSION passthrough in default
  switch branches of nameIdentifierForStorage/nameIdentifierForApi
- Expand Javadoc on embeddedNamespaceForStorage/Api to warn about the
  index-2 layout assumption
- Guard statisticEntityForApi/Storage with Preconditions.checkArgument
  to reject non-TableStatisticEntity subtypes at the call site
- Add tests: roleEntityForStorage/Api round-trips, batchInsertRelations
  cache invalidation in RelationalEntityStore, and non-OWNER_REL
  rejection in JDBCBackend.batchInsertRelations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract mapRoleSecurableObjects helper to eliminate the duplicate
  builder block shared by roleEntityForStorage and roleEntityForApi
- Extract statisticEntityWithNamespace helper to consolidate the type
  guard and builder shared by statisticEntityForApi and
  statisticEntityForStorage
- Replace repeated Lists.newArrayList(Privileges.UseSchema.allow()) and
  SecurableObjects.ofCatalog("catalog", ...) constructions in tests with
  private static final constants USE_SCHEMA_PRIVS and CATALOG_OBJ
- Remove inline WHAT-comments that restate what the assertions already
  express

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…yStore

batchDelete and batchPut were the only write paths that bypassed the
cache — callers that relied on them would silently observe stale reads.
Apply the same pattern as their single-item counterparts:
- batchDelete: invalidate each (ident, entityType) pair after backend
- batchPut: put each entity into cache after backend

Add unit tests covering the backend-then-cache ordering for both methods.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Identifier-based approach

Use the existing nameIdentifierForStorage/Api infrastructure via a sentinel
metalake prefix instead of bespoke string-splitting logic. Removes the
package-private convertMetadataObjectDottedFullName and convertSchemaSegmentAt
helpers; updates securableObjectForStorage/Api and
genericEntityMetadataFullNameForApi accordingly. Tests are replaced with
equivalent coverage through the public securableObject API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dataObject entity types

Entity types without a MetadataObject.Type equivalent (e.g. TABLE_STATISTIC,
MODEL_VERSION) must not be schema-converted. Re-add the MetadataObject.Type.valueOf
guard that was dropped in the previous refactor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roryqi roryqi self-assigned this May 13, 2026
@roryqi roryqi requested a review from jerryshao May 13, 2026 12:27
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Code Coverage Report

Overall Project 66.17% +0.17% 🟢
Files changed 76.64% 🟢

Module Coverage
aliyun 1.72% 🔴
api 46.83% 🟢
authorization-common 85.96% 🟢
aws 1.08% 🔴
azure 2.47% 🔴
catalog-common 10.2% 🔴
catalog-fileset 80.02% 🟢
catalog-glue 83.41% 🟢
catalog-hive 81.83% 🟢
catalog-jdbc-clickhouse 80.02% 🟢
catalog-jdbc-common 44.46% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.05% 🟢
catalog-jdbc-starrocks 78.27% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 44.89% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 85.87% 🟢
catalog-lakehouse-paimon 77.25% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.92% 🟢
common 50.0% 🟢
core 82.24% -0.32% 🟢
filesystem-hadoop3 76.97% 🟢
flink 0.0% 🔴
flink-common 43.17% 🟢
flink-runtime 0.0% 🔴
gcp 14.12% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 46.83% 🟢
iceberg-common 55.46% 🟢
iceberg-rest-server 69.88% 🟢
idp-basic 88.82% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 20.9% 🔴
lance-rest-server 62.78% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.75% 🟢
server-common 71.28% 🟢
spark 32.79% 🔴
spark-common 39.09% 🔴
trino-connector 35.14% 🔴
Files
Module File Coverage
core FunctionMetaService.java 100.0% 🟢
POStorageReadRouting.java 100.0% 🟢
TableMetaService.java 100.0% 🟢
ViewMetaService.java 100.0% 🟢
SchemaMetaService.java 98.04% 🟢
HierarchicalConventionPOStorageOps.java 97.22% 🟢
JDBCBackend.java 78.66% 🟢
MetadataObjectService.java 72.34% 🟢
SupportsRelationOperations.java 61.54% 🟢
SchemaPOStorageOps.java 58.06% 🔴
RelationalEntityStore.java 54.88% 🔴
FunctionPOStorageOps.java 38.46% 🔴
TablePOStorageOps.java 33.33% 🔴
ViewPOStorageOps.java 26.32% 🔴
BasePOStorageOps.java 5.0% 🔴

roryqi and others added 6 commits May 14, 2026 16:52
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@roryqi roryqi removed the request for review from jerryshao May 14, 2026 10:39
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@roryqi roryqi changed the title feat(core): JDBC backend and entity store for nested namespace naming feat(core): Add the hierarchy convention layer May 14, 2026
@roryqi roryqi changed the title feat(core): Add the hierarchy convention layer [#11103] feat(core): Add the hierarchy convention layer May 14, 2026
@roryqi roryqi marked this pull request as ready for review May 14, 2026 18:08
roryqi and others added 3 commits May 15, 2026 01:48
Throw NoSuchEntityException with the correct entity type by checking
catalogId/schemaId on the joined row, so callers see "catalog not
found" or "schema not found" instead of a misleading parent error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Use the same pattern as Table: when the join returns no row or a
placeholder row, throw NoSuchEntityException with the correct entity
type by inspecting catalogId/schemaId on the result. Also switch the
function-by-full-name select from INNER JOIN to LEFT JOIN on the
version table, matching the list query and the Table/View shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Keep the INNER JOIN on the version table for the function-by-full-name
select. Revert the Java getPOByFullName checks accordingly so the
caller continues to handle the null PO case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@roryqi
Copy link
Copy Markdown
Contributor Author

roryqi commented May 15, 2026

@jerryshao @yuqi1129 Could u help me review this pull request?

@roryqi roryqi requested review from jerryshao and yuqi1129 May 15, 2026 04:51
@jerryshao
Copy link
Copy Markdown
Contributor

Is it ready to review?

@roryqi
Copy link
Copy Markdown
Contributor Author

roryqi commented May 15, 2026

Is it ready to review?

Yes, it is ready to review.

@jerryshao
Copy link
Copy Markdown
Contributor

The overall direction here is good — extracting per-entity PO logic into dedicated classes and using a decorator for hierarchical name translation is a clean separation. A few observations on the design that might be worth considering.

The core tension

BasePOStorageOps.getPO/listPOs currently combines two orthogonal concerns:

  1. Name translation (hierarchical logical ↔ physical) — handled well by HierarchicalConventionPOStorageOps
  2. Query routing (cache-enabled → ID-based; no cache → full-name JOIN) — currently embedded via GravitinoEnv.getInstance().cacheEnabled() in BasePOStorageOps

Mixing these in the base class causes the Capability enum + runtime UnsupportedOperationException pattern, the GravitinoEnv static dependency leaking into a storage layer class, and the issue in MetadataObjectService.TYPE_TO_STORAGE_OPS_MAP where raw *POStorageOps instances bypass the naming convention.

An alternative: expose two explicit SQL methods, route in the service

// BasePOStorageOps: pure SQL delegation, no routing logic
abstract class BasePOStorageOps<PO, Mapper> {
    // ID-based (cache-enabled path)
    public PO getPOByParentId(Mapper mapper, Long parentId, String name) { ... }
    public List<PO> listPOsByParentId(Mapper mapper, Long parentId) { ... }

    // Name-based (full join, no-cache path)
    public PO getPOByFullName(Mapper mapper, NameIdentifier ident) { ... }
    public List<PO> listPOsByNSFullName(Mapper mapper, Namespace ns) { ... }
}

Routing moves back to the service as a single private helper per operation — not repeated per-method:

// SchemaMetaService
private SchemaPO getSchemaPO(NameIdentifier ident) {
    return SessionUtils.getWithoutCommit(SchemaMetaMapper.class, mapper ->
        cacheEnabled()
            ? ops.getPOByParentId(mapper, getCatalogId(ident), ident.name())
            : ops.getPOByFullName(mapper, ident));
}

Since ops is still a HierarchicalConventionPOStorageOps, both paths get name translation automatically. MetadataObjectService can store wrapped instances in its map instead of raw ops, fixing the naming convention bypass.

This eliminates the Capability enum, the GravitinoEnv dependency in the base class, and the dispatcher getPO(Mapper, NameIdentifier). Each layer has one job: name translation in the decorator, SQL in the ops, routing in the service.

What stays the same

HierarchicalConventionPOStorageOps as a decorator and the *POStorageOps concrete classes are the right abstraction — they just shouldn't know about cache state.

Comment on lines +44 to +45
private final UnaryOperator<PO> readRewriter;
private final UnaryOperator<PO> writeRewriter;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two variable names are confusing. What is the meaning of writeWriter? You'd better pick a better name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will rename them to LogicalToPhyiscalRewriter and PhysicalToLogicalRewriter.

return HierarchicalSchemaUtil.logicalToPhysical(name, sep);
}

private static NameIdentifier apiIdentifierToStorage(NameIdentifier apiIdentifier) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is apiIdentifier, can you figure out a better name ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should use the keywords physical and logical, too. Thanks for input.

cache.invalidate(ident, srcType, relType);
}
cache.invalidate(dstIdentifier, dstType, relType);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to add this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will set owners for multiple metadata objects. So I need to insert multiple relations. Although we already have the update multiple relations, it can't cover the cases when we need to insert multiple records.

@roryqi
Copy link
Copy Markdown
Contributor Author

roryqi commented May 15, 2026

The overall direction here is good — extracting per-entity PO logic into dedicated classes and using a decorator for hierarchical name translation is a clean separation. A few observations on the design that might be worth considering.

The core tension

BasePOStorageOps.getPO/listPOs currently combines two orthogonal concerns:

  1. Name translation (hierarchical logical ↔ physical) — handled well by HierarchicalConventionPOStorageOps
  2. Query routing (cache-enabled → ID-based; no cache → full-name JOIN) — currently embedded via GravitinoEnv.getInstance().cacheEnabled() in BasePOStorageOps

Mixing these in the base class causes the Capability enum + runtime UnsupportedOperationException pattern, the GravitinoEnv static dependency leaking into a storage layer class, and the issue in MetadataObjectService.TYPE_TO_STORAGE_OPS_MAP where raw *POStorageOps instances bypass the naming convention.

An alternative: expose two explicit SQL methods, route in the service

// BasePOStorageOps: pure SQL delegation, no routing logic
abstract class BasePOStorageOps<PO, Mapper> {
    // ID-based (cache-enabled path)
    public PO getPOByParentId(Mapper mapper, Long parentId, String name) { ... }
    public List<PO> listPOsByParentId(Mapper mapper, Long parentId) { ... }

    // Name-based (full join, no-cache path)
    public PO getPOByFullName(Mapper mapper, NameIdentifier ident) { ... }
    public List<PO> listPOsByNSFullName(Mapper mapper, Namespace ns) { ... }
}

Routing moves back to the service as a single private helper per operation — not repeated per-method:

// SchemaMetaService
private SchemaPO getSchemaPO(NameIdentifier ident) {
    return SessionUtils.getWithoutCommit(SchemaMetaMapper.class, mapper ->
        cacheEnabled()
            ? ops.getPOByParentId(mapper, getCatalogId(ident), ident.name())
            : ops.getPOByFullName(mapper, ident));
}

Since ops is still a HierarchicalConventionPOStorageOps, both paths get name translation automatically. MetadataObjectService can store wrapped instances in its map instead of raw ops, fixing the naming convention bypass.

This eliminates the Capability enum, the GravitinoEnv dependency in the base class, and the dispatcher getPO(Mapper, NameIdentifier). Each layer has one job: name translation in the decorator, SQL in the ops, routing in the service.

What stays the same

HierarchicalConventionPOStorageOps as a decorator and the *POStorageOps concrete classes are the right abstraction — they just shouldn't know about cache state.

I got your point.
I will try solve this issue.
I want to extract a pattern that resolve the duplicated code for cached operations. Maybe I should extract a helper class for them.

@roryqi roryqi requested a review from jerryshao May 15, 2026 10:31
Comment on lines +344 to +347
for (NameIdentifier ident : srcIdentifiers) {
cache.invalidate(ident, srcType, relType);
}
cache.invalidate(dstIdentifier, dstType, relType);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more proper to invalidate the cache first and then operate the database?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take a look. Other relation ops follows this pattern, too. Do we need to modify them together?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diqiu50
Please take a look, and I recall that you have modified this point.


@Override
public SchemaPO getPOByFullName(SchemaMetaMapper mapper, NameIdentifier identifier) {
Namespace namespace = identifier.namespace();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be performance degradation if you are using the full qualified name when cache is enabled, we can utilize the entity ID to retrieve entities.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added POStorageReadRouting to handle this issue.

"listPOs by namespace and names is not supported by " + getClass().getSimpleName());
}

public List<PO> listPOs(Mapper mapper, List<Long> uuids) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uuids ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I should use entityIds. It may be readable.


static final Map<MetadataObject.Type, BasePOStorageOps<?, ?>> TYPE_TO_STORAGE_OPS_MAP =
ImmutableMap.<MetadataObject.Type, BasePOStorageOps<?, ?>>builder()
.put(MetadataObject.Type.SCHEMA, new SchemaPOStorageOps())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about other types?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StorageOps only cover the schema related POs.

roryqi and others added 3 commits May 15, 2026 21:17
SchemaMetaService.insertSchema previously split the entity name on the
physical separator (HierarchicalSchemaUtil.physicalSeparator()), so the
ancestor-row creation path was only reachable when the caller already
handed a storage-form name. Production callers (SchemaOperationDispatcher
.importSchema and ManagedSchemaOperations.createSchema) pass the external
API form, so the leaf was inserted with no ancestor rows.

Split on the logical separator and let HierarchicalConventionPOStorageOps
.batchInsertPOs apply its write rewriter to translate each PO to storage
form before SQL execution. For the direct batchSelectSchemaByIdentifier
lookup that bypasses the rewriter, convert logical -> physical at the
call site.

Update TestSchemaMetaService accordingly so both hierarchical tests build
SchemaEntities with logical-form names.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Subtask] Add the hierarchy convention layer

3 participants