[#11103] feat(core): Add the hierarchy convention layer by roryqi · Pull Request #11074 · apache/gravitino

roryqi · 2026-05-13T08:09:41Z

What changes were proposed in this pull request?

Add the abstract class BasePOStorageOps to consolidate all the PO logic in one class.

Add SchemaPOStorageOps, TablePOStorageOps, FunctionPOStorageOps, and ViewPOStorageOps.

Add the hierarchical convention class HierarchicalConventionPOStorageOps.

Why are the changes needed?

Fix: #11103

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added tests.

- RelationalSchemaNamingBridge and JDBCBackend entity/relation conversions
- SupportsRelationOperations.batchInsertRelations; RelationalEntityStore cache invalidation
- RelationalBackend; backend-focused tests

Co-authored-by: Cursor <cursoragent@cursor.com>

@SuppressWarnings

- Replace Preconditions.checkNotNull with Objects.requireNonNull in JDBCBackend.batchInsertRelations for consistency with the interface
- Add @SuppressWarnings explanatory comments in unchecked-cast methods
- Document FILESET/TOPIC/MODEL/MODEL_VERSION passthrough in default switch branches of nameIdentifierForStorage/nameIdentifierForApi
- Expand Javadoc on embeddedNamespaceForStorage/Api to warn about the index-2 layout assumption
- Guard statisticEntityForApi/Storage with Preconditions.checkArgument to reject non-TableStatisticEntity subtypes at the call site
- Add tests: roleEntityForStorage/Api round-trips, batchInsertRelations cache invalidation in RelationalEntityStore, and non-OWNER_REL rejection in JDBCBackend.batchInsertRelations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Extract mapRoleSecurableObjects helper to eliminate the duplicate builder block shared by roleEntityForStorage and roleEntityForApi
- Extract statisticEntityWithNamespace helper to consolidate the type guard and builder shared by statisticEntityForApi and statisticEntityForStorage
- Replace repeated Lists.newArrayList(Privileges.UseSchema.allow()) and SecurableObjects.ofCatalog("catalog", ...) constructions in tests with private static final constants USE_SCHEMA_PRIVS and CATALOG_OBJ
- Remove inline WHAT-comments that restate what the assertions already express

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…yStore

batchDelete and batchPut were the only write paths that bypassed the cache; callers that relied on them would silently observe stale reads. Apply the same pattern as their single-item counterparts:

- batchDelete: invalidate each (ident, entityType) pair after backend
- batchPut: put each entity into cache after backend

Add unit tests covering the backend-then-cache ordering for both methods.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nalEntityStore" This reverts commit cb21732.

…Identifier-based approach Use the existing nameIdentifierForStorage/Api infrastructure via a sentinel metalake prefix instead of bespoke string-splitting logic. Removes the package-private convertMetadataObjectDottedFullName and convertSchemaSegmentAt helpers; updates securableObjectForStorage/Api and genericEntityMetadataFullNameForApi accordingly. Tests are replaced with equivalent coverage through the public securableObject API. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…dataObject entity types Entity types without a MetadataObject.Type equivalent (e.g. TABLE_STATISTIC, MODEL_VERSION) must not be schema-converted. Re-add the MetadataObject.Type.valueOf guard that was dropped in the previous refactor. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-05-13T18:55:52Z

Code Coverage Report

Overall Project	66.17% `+0.17%`	🟢
Files changed	76.64%	🟢

Module	Coverage
aliyun	1.72%	🔴
api	46.83%	🟢
authorization-common	85.96%	🟢
aws	1.08%	🔴
azure	2.47%	🔴
catalog-common	10.2%	🔴
catalog-fileset	80.02%	🟢
catalog-glue	83.41%	🟢
catalog-hive	81.83%	🟢
catalog-jdbc-clickhouse	80.02%	🟢
catalog-jdbc-common	44.46%	🟢
catalog-jdbc-doris	80.28%	🟢
catalog-jdbc-hologres	54.03%	🟢
catalog-jdbc-mysql	79.23%	🟢
catalog-jdbc-oceanbase	78.38%	🟢
catalog-jdbc-postgresql	82.05%	🟢
catalog-jdbc-starrocks	78.27%	🟢
catalog-kafka	77.01%	🟢
catalog-lakehouse-generic	44.89%	🟢
catalog-lakehouse-hudi	79.1%	🟢
catalog-lakehouse-iceberg	85.87%	🟢
catalog-lakehouse-paimon	77.25%	🟢
catalog-model	77.72%	🟢
cli	44.51%	🟢
client-java	77.92%	🟢
common	50.0%	🟢
core	82.24% `-0.32%`	🟢
filesystem-hadoop3	76.97%	🟢
flink	0.0%	🔴
flink-common	43.17%	🟢
flink-runtime	0.0%	🔴
gcp	14.12%	🔴
hadoop-common	10.39%	🔴
hive-metastore-common	46.83%	🟢
iceberg-common	55.46%	🟢
iceberg-rest-server	69.88%	🟢
idp-basic	88.82%	🟢
integration-test-common	0.0%	🔴
jobs	66.17%	🟢
lance-common	20.9%	🔴
lance-rest-server	62.78%	🟢
lineage	53.02%	🟢
optimizer	82.95%	🟢
optimizer-api	21.95%	🔴
server	85.75%	🟢
server-common	71.28%	🟢
spark	32.79%	🔴
spark-common	39.09%	🔴
trino-connector	35.14%	🔴

Files

Module	File	Coverage
core	FunctionMetaService.java	100.0%	🟢
	POStorageReadRouting.java	100.0%	🟢
	TableMetaService.java	100.0%	🟢
	ViewMetaService.java	100.0%	🟢
	SchemaMetaService.java	98.04%	🟢
	HierarchicalConventionPOStorageOps.java	97.22%	🟢
	JDBCBackend.java	78.66%	🟢
	MetadataObjectService.java	72.34%	🟢
	SupportsRelationOperations.java	61.54%	🟢
	SchemaPOStorageOps.java	58.06%	🔴
	RelationalEntityStore.java	54.88%	🔴
	FunctionPOStorageOps.java	38.46%	🔴
	TablePOStorageOps.java	33.33%	🔴
	ViewPOStorageOps.java	26.32%	🔴
	BasePOStorageOps.java	5.0%	🔴

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Throw NoSuchEntityException with the correct entity type by checking catalogId/schemaId on the joined row, so callers see "catalog not found" or "schema not found" instead of a misleading parent error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
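The check described in this commit message can be sketched roughly as follows. This is only an illustration, assuming the joined row carries nullable catalogId/schemaId columns that stay null when the corresponding parent did not match. JoinedTableRow, MissingEntitySketch, and the Missing enum are hypothetical names, and the enum result stands in for throwing NoSuchEntityException with the matching entity type.

```java
/** Hypothetical shape of one row returned by the full-name join query. */
final class JoinedTableRow {
  final Long catalogId; // assumed null when the catalog did not match
  final Long schemaId;  // assumed null when the schema did not match
  final Long tableId;   // assumed null when only the parents matched

  JoinedTableRow(Long catalogId, Long schemaId, Long tableId) {
    this.catalogId = catalogId;
    this.schemaId = schemaId;
    this.tableId = tableId;
  }
}

final class MissingEntitySketch {
  /** Which level of the hierarchy is missing; NONE means the table row was found. */
  enum Missing { NO_ROW, CATALOG, SCHEMA, TABLE, NONE }

  /** Inspects the IDs from the outermost parent inwards and reports the first gap. */
  static Missing classify(JoinedTableRow row) {
    if (row == null) {
      return Missing.NO_ROW; // the join returned nothing at all
    }
    if (row.catalogId == null) {
      return Missing.CATALOG;
    }
    if (row.schemaId == null) {
      return Missing.SCHEMA;
    }
    if (row.tableId == null) {
      return Missing.TABLE;
    }
    return Missing.NONE;
  }
}
```

A caller would map CATALOG, SCHEMA, and TABLE to an exception naming that entity type, so the error points at the level that is actually missing.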

Use the same pattern as Table: when the join returns no row or a placeholder row, throw NoSuchEntityException with the correct entity type by inspecting catalogId/schemaId on the result. Also switch the function-by-full-name select from INNER JOIN to LEFT JOIN on the version table, matching the list query and the Table/View shape. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Keep the INNER JOIN on the version table for the function-by-full-name select. Revert the Java getPOByFullName checks accordingly so the caller continues to handle the null PO case. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

roryqi · 2026-05-15T02:07:34Z

@jerryshao @yuqi1129 Could you help review this pull request?

jerryshao · 2026-05-15T06:33:41Z

Is it ready to review?

roryqi · 2026-05-15T06:54:27Z

Is it ready to review?

Yes, it is ready to review.

jerryshao · 2026-05-15T07:12:16Z

The overall direction here is good — extracting per-entity PO logic into dedicated classes and using a decorator for hierarchical name translation is a clean separation. A few observations on the design that might be worth considering.

The core tension

BasePOStorageOps.getPO/listPOs currently combines two orthogonal concerns:

Name translation (hierarchical logical ↔ physical) — handled well by HierarchicalConventionPOStorageOps
Query routing (cache-enabled → ID-based; no cache → full-name JOIN) — currently embedded via GravitinoEnv.getInstance().cacheEnabled() in BasePOStorageOps

Mixing these in the base class causes the Capability enum + runtime UnsupportedOperationException pattern, the GravitinoEnv static dependency leaking into a storage layer class, and the issue in MetadataObjectService.TYPE_TO_STORAGE_OPS_MAP where raw *POStorageOps instances bypass the naming convention.

An alternative: expose two explicit SQL methods, route in the service

// BasePOStorageOps: pure SQL delegation, no routing logic
abstract class BasePOStorageOps<PO, Mapper> {
    // ID-based (cache-enabled path)
    public PO getPOByParentId(Mapper mapper, Long parentId, String name) { ... }
    public List<PO> listPOsByParentId(Mapper mapper, Long parentId) { ... }

    // Name-based (full join, no-cache path)
    public PO getPOByFullName(Mapper mapper, NameIdentifier ident) { ... }
    public List<PO> listPOsByNSFullName(Mapper mapper, Namespace ns) { ... }
}

Routing moves back to the service as a single private helper per operation — not repeated per-method:

// SchemaMetaService
private SchemaPO getSchemaPO(NameIdentifier ident) {
    return SessionUtils.getWithoutCommit(SchemaMetaMapper.class, mapper ->
        cacheEnabled()
            ? ops.getPOByParentId(mapper, getCatalogId(ident), ident.name())
            : ops.getPOByFullName(mapper, ident));
}

Since ops is still a HierarchicalConventionPOStorageOps, both paths get name translation automatically. MetadataObjectService can store wrapped instances in its map instead of raw ops, fixing the naming convention bypass.

This eliminates the Capability enum, the GravitinoEnv dependency in the base class, and the dispatcher getPO(Mapper, NameIdentifier). Each layer has one job: name translation in the decorator, SQL in the ops, routing in the service.

What stays the same

HierarchicalConventionPOStorageOps as a decorator and the *POStorageOps concrete classes are the right abstraction — they just shouldn't know about cache state.
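To make the decorator idea concrete, here is a minimal, self-contained sketch. It is not the PR's code: SchemaOps, RawSchemaOps, and HierarchicalSchemaOps are hypothetical stand-ins, the in-memory map replaces the SQL mapper, and the "." and ":" separators are assumptions picked only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a storage-ops class keyed by schema name. */
interface SchemaOps {
  void insert(String schemaName, long id);

  Long getId(String schemaName);
}

/** Pure storage: no name translation, names are stored exactly as received. */
final class RawSchemaOps implements SchemaOps {
  final Map<String, Long> rows = new HashMap<>();

  @Override
  public void insert(String schemaName, long id) {
    rows.put(schemaName, id);
  }

  @Override
  public Long getId(String schemaName) {
    return rows.get(schemaName);
  }
}

/** Decorator: converts the logical name to its physical form, then delegates. */
final class HierarchicalSchemaOps implements SchemaOps {
  static final String LOGICAL_SEP = ".";  // assumed separator, illustration only
  static final String PHYSICAL_SEP = ":"; // assumed separator, illustration only

  private final SchemaOps delegate;

  HierarchicalSchemaOps(SchemaOps delegate) {
    this.delegate = delegate;
  }

  static String logicalToPhysical(String name) {
    return name.replace(LOGICAL_SEP, PHYSICAL_SEP);
  }

  @Override
  public void insert(String schemaName, long id) {
    delegate.insert(logicalToPhysical(schemaName), id);
  }

  @Override
  public Long getId(String schemaName) {
    return delegate.getId(logicalToPhysical(schemaName));
  }
}
```

With this shape, a caller that holds the wrapped instance always gets name translation, while a caller that holds the raw instance bypasses it. That is the situation described for TYPE_TO_STORAGE_OPS_MAP.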

jerryshao · 2026-05-15T07:21:21Z

+  private final UnaryOperator<PO> readRewriter;
+  private final UnaryOperator<PO> writeRewriter;


These two variable names are confusing. What does writeRewriter mean? Please pick better names.

OK, I will rename them to LogicalToPhysicalRewriter and PhysicalToLogicalRewriter.
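For readers following the naming discussion, a small sketch of how two such rewriters might be applied, assuming one runs on the way into storage and the other on the way out. NamePO and RewritingStore are hypothetical, and the in-memory list stands in for the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical minimal PO carrying only a name. */
final class NamePO {
  final String name;

  NamePO(String name) {
    this.name = name;
  }
}

/** Applies one rewriter before storing a PO and the other after loading it. */
final class RewritingStore {
  private final UnaryOperator<NamePO> logicalToPhysicalRewriter;
  private final UnaryOperator<NamePO> physicalToLogicalRewriter;
  final List<NamePO> stored = new ArrayList<>(); // stand-in for the table

  RewritingStore(
      UnaryOperator<NamePO> logicalToPhysicalRewriter,
      UnaryOperator<NamePO> physicalToLogicalRewriter) {
    this.logicalToPhysicalRewriter = logicalToPhysicalRewriter;
    this.physicalToLogicalRewriter = physicalToLogicalRewriter;
  }

  /** Write path: the caller passes the logical form, storage keeps the physical form. */
  void write(NamePO logicalPO) {
    stored.add(logicalToPhysicalRewriter.apply(logicalPO));
  }

  /** Read path: every stored PO is translated back to the logical form. */
  List<NamePO> readAll() {
    List<NamePO> result = new ArrayList<>();
    for (NamePO po : stored) {
      result.add(physicalToLogicalRewriter.apply(po));
    }
    return result;
  }
}
```

Naming the fields after the direction of the translation, rather than after read or write, makes it clear at the call site which form goes in and which comes out.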

jerryshao · 2026-05-15T07:24:57Z

+    return HierarchicalSchemaUtil.logicalToPhysical(name, sep);
+  }
+
+  private static NameIdentifier apiIdentifierToStorage(NameIdentifier apiIdentifier) {


What is apiIdentifier? Can you come up with a better name?

Maybe I should use the keywords physical and logical here, too. Thanks for the input.

jerryshao · 2026-05-15T07:26:55Z

+      cache.invalidate(ident, srcType, relType);
+    }
+    cache.invalidate(dstIdentifier, dstType, relType);
+  }


Why do we need to add this?

I will set owners for multiple metadata objects, so I need to insert multiple relations. Although we already have an operation that updates multiple relations, it does not cover the case where we need to insert multiple records.
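A rough sketch of the pattern in the diff above, assuming the backend write happens first and the cache entries for every source identifier and the destination identifier are invalidated afterwards. The class and its string-based identifiers are hypothetical. The event list exists only to make the call order visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Records calls so the order of the backend write and cache invalidations is visible. */
final class BatchInsertRelationsSketch {
  final List<String> events = new ArrayList<>();

  // Hypothetical stand-in for the backend batch insert.
  private void backendInsert(List<String> srcIdents, String dstIdent) {
    events.add("backend:insert " + srcIdents + " -> " + dstIdent);
  }

  // Hypothetical stand-in for cache.invalidate(ident, type, relType).
  private void invalidate(String ident) {
    events.add("cache:invalidate " + ident);
  }

  /** Inserts all relations in one backend call, then drops the affected cache entries. */
  void batchInsertRelations(List<String> srcIdents, String dstIdent) {
    backendInsert(srcIdents, dstIdent);
    for (String ident : srcIdents) {
      invalidate(ident);
    }
    invalidate(dstIdent);
  }
}
```

Swapping the first statement of batchInsertRelations with the invalidation calls gives the invalidate-first variant, which makes it easy to compare the two orderings in a test.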

roryqi · 2026-05-15T07:28:16Z

I see your point.
I will try to solve this issue.
I want to extract a pattern that removes the duplicated code for cached operations. Maybe I should extract a helper class for them.

yuqi1129 · 2026-05-15T12:44:53Z

+    for (NameIdentifier ident : srcIdentifiers) {
+      cache.invalidate(ident, srcType, relType);
+    }
+    cache.invalidate(dstIdentifier, dstType, relType);


Would it be more appropriate to invalidate the cache first and then operate on the database?

I took a look. Other relation ops follow this pattern, too. Do we need to modify them together?

@diqiu50
Please take a look; I recall that you modified this part before.

yuqi1129 · 2026-05-15T12:51:19Z

+
+  @Override
+  public SchemaPO getPOByFullName(SchemaMetaMapper mapper, NameIdentifier identifier) {
+    Namespace namespace = identifier.namespace();


There will be performance degradation if you use the fully qualified name when the cache is enabled; we can use the entity ID to retrieve entities instead.

I have added POStorageReadRouting to handle this issue.
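The thread does not show POStorageReadRouting itself, so the following is only a guess at the general idea: a single helper picks the ID-based lookup when the cache is enabled and the full-name lookup otherwise. ReadRoutingSketch, route, and the two supplier parameters are hypothetical names.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Chooses between two read strategies based on whether the cache is enabled. */
final class ReadRoutingSketch {
  private final BooleanSupplier cacheEnabled;

  ReadRoutingSketch(BooleanSupplier cacheEnabled) {
    this.cacheEnabled = cacheEnabled;
  }

  /**
   * Runs the ID-based read when the cache is enabled (parent IDs are assumed to be cheap
   * to resolve then), and the full-name join read otherwise. Only one supplier is invoked.
   */
  <T> T route(Supplier<T> byParentId, Supplier<T> byFullName) {
    return cacheEnabled.getAsBoolean() ? byParentId.get() : byFullName.get();
  }
}
```

Keeping the decision in one helper means the storage-ops classes only expose the two query shapes and never need to read the cache setting themselves.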

yuqi1129 · 2026-05-15T12:53:23Z

+        "listPOs by namespace and names is not supported by " + getClass().getSimpleName());
+  }
+
+  public List<PO> listPOs(Mapper mapper, List<Long> uuids) {


Yes, I should use entityIds. It would be more readable.

yuqi1129 · 2026-05-15T12:58:11Z


+  static final Map<MetadataObject.Type, BasePOStorageOps<?, ?>> TYPE_TO_STORAGE_OPS_MAP =
+      ImmutableMap.<MetadataObject.Type, BasePOStorageOps<?, ?>>builder()
+          .put(MetadataObject.Type.SCHEMA, new SchemaPOStorageOps())


What about other types?

The StorageOps classes only cover the schema-related POs.

SchemaMetaService.insertSchema previously split the entity name on the physical separator (HierarchicalSchemaUtil.physicalSeparator()), so the ancestor-row creation path was only reachable when the caller already passed a storage-form name. Production callers (SchemaOperationDispatcher.importSchema and ManagedSchemaOperations.createSchema) pass the external API form, so the leaf was inserted with no ancestor rows.

Split on the logical separator and let HierarchicalConventionPOStorageOps.batchInsertPOs apply its write rewriter to translate each PO to storage form before SQL execution. For the direct batchSelectSchemaByIdentifier lookup that bypasses the rewriter, convert logical -> physical at the call site.

Update TestSchemaMetaService accordingly so both hierarchical tests build SchemaEntities with logical-form names.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
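The fix described above can be illustrated with a small sketch: split the logical name on the logical separator to build the ancestor chain, then translate each entry to the physical form just before storage. The separators "." and ":" and the class AncestorNamesSketch are assumptions for illustration; the real values come from HierarchicalSchemaUtil.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class AncestorNamesSketch {
  static final String LOGICAL_SEP = ".";  // assumed separator, illustration only
  static final String PHYSICAL_SEP = ":"; // assumed separator, illustration only

  /** Returns each ancestor prefix followed by the leaf, all in logical form. */
  static List<String> logicalChain(String logicalName) {
    List<String> chain = new ArrayList<>();
    StringBuilder prefix = new StringBuilder();
    // Pattern.quote keeps the separator from being read as a regex metacharacter.
    for (String part : logicalName.split(Pattern.quote(LOGICAL_SEP))) {
      if (prefix.length() > 0) {
        prefix.append(LOGICAL_SEP);
      }
      prefix.append(part);
      chain.add(prefix.toString());
    }
    return chain;
  }

  /** Plays the role of the write rewriter: translates every name to physical form. */
  static List<String> toPhysical(List<String> logicalNames) {
    List<String> physical = new ArrayList<>();
    for (String name : logicalNames) {
      physical.add(name.replace(LOGICAL_SEP, PHYSICAL_SEP));
    }
    return physical;
  }
}
```

Splitting on the physical separator instead would leave a logical-form name in one piece, so no ancestor entries would be produced. That matches the symptom this commit describes.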

roryqi marked this pull request as draft May 13, 2026 08:14

roryqi and others added 7 commits May 13, 2026 08:37

Revert "fix(core): complete cache write-through convention in Relatio…

caf3b96

…nalEntityStore" This reverts commit cb21732.

refactor

3c4f7db

roryqi self-assigned this May 13, 2026

roryqi added 4 commits May 13, 2026 19:58

revert unused changes

6d23138

address id resolver

cbbdb8a

revert

6fa7da4

refactor

42d4809

roryqi requested a review from jerryshao May 13, 2026 12:27

roryqi added 6 commits May 13, 2026 20:49

remove convention

3d3f4dd

revert change

993e482

revert

ae98cf3

fix

7cfe982

cache

26319b7

cache

dbcf096

roryqi and others added 6 commits May 14, 2026 16:52

refactor

29cb2ac

fix

7b0c466

refactor

74a0a7d

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

refactor

e9134fd

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

refactor

8cf90f7

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

refactor

bac3a3c

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

roryqi removed the request for review from jerryshao May 14, 2026 10:39

refactor

060815a

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

roryqi changed the title ~~feat(core): JDBC backend and entity store for nested namespace naming~~ feat(core): Add the hierarchy convention layer May 14, 2026

roryqi changed the title ~~feat(core): Add the hierarchy convention layer~~ [#11103] feat(core): Add the hierarchy convention layer May 14, 2026

roryqi and others added 8 commits May 14, 2026 12:01

test: add unit tests for HierarchicalConventionPOStorageOps

a960062

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix

3d652c8

fix

e72918b

fix ut

6148465

fix ut

f31fe4a

revert

60a12ec

revert

617ba19

fix ut

525f2c6

roryqi marked this pull request as ready for review May 14, 2026 18:08

roryqi and others added 3 commits May 15, 2026 01:48

roryqi requested review from jerryshao and yuqi1129 May 15, 2026 04:51

jerryshao reviewed May 15, 2026

Address the comments

4e40b48

roryqi requested a review from jerryshao May 15, 2026 10:31

yuqi1129 reviewed May 15, 2026

roryqi and others added 3 commits May 15, 2026 21:17

rename

9be45fc

Fix schema

cb3f473

3 participants
