[opt](memory) Replace TabletMeta object map with SoA CompactTabletMetaStore #62086
dataroaring wants to merge 3 commits into apache:master from
…aStore

Replace Long2ObjectOpenHashMap<TabletMeta> with CompactTabletMetaStore that stores fields in parallel primitive arrays (Structure-of-Arrays layout). This eliminates the 16-byte Java object header per tablet entry, reducing per-tablet overhead from ~96 bytes to ~69 bytes (~28% savings, ~270 MB at 10M tablets).

CompactTabletMetaStore uses:
- Long2IntOpenHashMap for tabletId -> slot mapping
- 7 parallel arrays (dbIds, tableIds, partitionIds, indexIds, oldSchemaHashes, newSchemaHashes, storageMediumOrdinals)
- Free list embedded in dbIds[] for slot reuse after deletion
- On-demand TabletMeta construction for backward compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
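The headline figures are consistent with each other. A quick check, taking the ~96 and ~69 bytes/tablet estimates as given and using decimal megabytes:

```latex
\begin{aligned}
16\ \text{B} \times 10^{7} &= 1.6 \times 10^{8}\ \text{B} \approx 160\ \text{MB} && \text{(object headers alone)} \\
96\ \text{B} - 69\ \text{B} &= 27\ \text{B per tablet}, \quad 27 / 96 \approx 0.28 && \text{(about 28\% smaller)} \\
27\ \text{B} \times 10^{7} &= 2.7 \times 10^{8}\ \text{B} \approx 270\ \text{MB} && \text{(total saving at 10M tablets)}
\end{aligned}
```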
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Pull request overview
This PR introduces a more memory-efficient representation of per-tablet metadata in FE by replacing the tabletId -> TabletMeta object map with a Structure-of-Arrays (SoA) CompactTabletMetaStore, and migrates TabletInvertedIndex implementations to use it.
Changes:
- Added CompactTabletMetaStore (SoA + Long2IntOpenHashMap + free-list reuse) and a dedicated unit test suite.
- Migrated TabletInvertedIndex to store tablet metadata in tabletMetaStore and keep API compatibility by constructing TabletMeta on demand.
- Updated LocalTabletInvertedIndex and CloudTabletInvertedIndex to use tabletMetaStore instead of tabletMetaMap.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/catalog/CompactTabletMetaStore.java | New SoA-backed metadata store with slot reuse and on-demand TabletMeta construction. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/TabletInvertedIndex.java | Switches core inverted index metadata storage to CompactTabletMetaStore; UT map view via toMap(). |
| fe/fe-core/src/main/java/org/apache/doris/catalog/LocalTabletInvertedIndex.java | Replaces metadata map access/mutation with tabletMetaStore equivalents. |
| fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudTabletInvertedIndex.java | Replaces metadata existence checks and deletion with tabletMetaStore. |
| fe/fe-core/src/test/java/org/apache/doris/catalog/CompactTabletMetaStoreTest.java | Adds unit tests for add/get/remove, reuse, growth, clear, and medium updates. |
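To make the SoA layout in the first row concrete, here is a minimal sketch of a slot-indexed store whose free list is threaded through dbIds[]. It is not the PR's CompactTabletMetaStore: the class name SoaTabletStoreSketch and its methods are hypothetical, only one schema-hash array is kept, and java.util.HashMap<Long, Integer> stands in for fastutil's Long2IntOpenHashMap (so, unlike the primitive map, it boxes keys and values).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical SoA store sketch; HashMap stands in for fastutil's Long2IntOpenHashMap.
class SoaTabletStoreSketch {
    static final long NOT_EXIST_VALUE = -1L;
    private static final int END_OF_FREE_LIST = -1;

    private final Map<Long, Integer> slotOf = new HashMap<>(); // tabletId -> slot
    // One primitive array per field, all indexed by the same slot.
    private long[] dbIds;
    private long[] tableIds;
    private long[] partitionIds;
    private long[] indexIds;
    private int[] schemaHashes;
    private byte[] mediumOrdinals;
    private int highWater = 0;               // slots [0, highWater) have been handed out
    private int freeHead = END_OF_FREE_LIST; // freed slots are chained through dbIds[]

    SoaTabletStoreSketch(int initialCapacity) {
        int cap = Math.max(1, initialCapacity);
        dbIds = new long[cap];
        tableIds = new long[cap];
        partitionIds = new long[cap];
        indexIds = new long[cap];
        schemaHashes = new int[cap];
        mediumOrdinals = new byte[cap];
    }

    /** Returns false, changing nothing, if tabletId is already present. */
    boolean add(long tabletId, long dbId, long tableId, long partitionId,
                long indexId, int schemaHash, byte mediumOrdinal) {
        boolean fromFreeList = freeHead != END_OF_FREE_LIST;
        int slot = fromFreeList ? freeHead : highWater;
        if (slotOf.putIfAbsent(tabletId, slot) != null) {
            return false; // duplicate detected by the same lookup that inserts
        }
        if (fromFreeList) {
            freeHead = (int) dbIds[slot]; // pop: dbIds[slot] held the next free slot
        } else {
            highWater++;
            ensureCapacity(highWater);
        }
        dbIds[slot] = dbId;
        tableIds[slot] = tableId;
        partitionIds[slot] = partitionId;
        indexIds[slot] = indexId;
        schemaHashes[slot] = schemaHash;
        mediumOrdinals[slot] = mediumOrdinal;
        return true;
    }

    /** Frees the tablet's slot by pushing it onto the free list. */
    boolean remove(long tabletId) {
        Integer slot = slotOf.remove(tabletId);
        if (slot == null) {
            return false;
        }
        dbIds[slot] = freeHead; // dbIds[] doubles as the "next free slot" link
        freeHead = slot;
        return true;
    }

    long getDbId(long tabletId) {
        Integer slot = slotOf.get(tabletId);
        return slot == null ? NOT_EXIST_VALUE : dbIds[slot];
    }

    long getPartitionId(long tabletId) {
        Integer slot = slotOf.get(tabletId);
        return slot == null ? NOT_EXIST_VALUE : partitionIds[slot];
    }

    int size() { return slotOf.size(); }

    int capacity() { return dbIds.length; }

    // Grows every parallel array together so slots stay aligned.
    private void ensureCapacity(int required) {
        if (required <= dbIds.length) {
            return;
        }
        int newCap = Math.max(required, dbIds.length * 2);
        dbIds = Arrays.copyOf(dbIds, newCap);
        tableIds = Arrays.copyOf(tableIds, newCap);
        partitionIds = Arrays.copyOf(partitionIds, newCap);
        indexIds = Arrays.copyOf(indexIds, newCap);
        schemaHashes = Arrays.copyOf(schemaHashes, newCap);
        mediumOrdinals = Arrays.copyOf(mediumOrdinals, newCap);
    }
}
```

Because a freed slot's dbIds[] cell stores the index of the next free slot, reuse needs no extra array. The trade-off is that a freed slot must never be read through a getter; dropping the key from the tabletId -> slot map on removal is what guarantees that.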
 public TabletMeta getTabletMeta(long tabletId) {
     long stamp = readLock();
     try {
-        return tabletMetaMap.get(tabletId);
+        return tabletMetaStore.getTabletMeta(tabletId);
     } finally {
         readUnlock(stamp);
     }
getTabletMeta() now constructs a new TabletMeta object on every call via tabletMetaStore.getTabletMeta(). TabletInvertedIndex.getTabletMeta() is used broadly across FE (rebalancers, report handler, proc nodes, etc.), so this can introduce significant allocation/GC overhead and may offset some of the memory win.
Consider adding allocation-free accessors on TabletInvertedIndex/CompactTabletMetaStore (e.g., getDbId/getTableId/getPartitionId/getIndexId/getOldSchemaHash/getStorageMedium) and migrating hot call sites to those, or providing a cached/flyweight TabletMeta for read-only access.
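The flyweight option could look roughly like this: one reusable, read-only view is repositioned onto a slot per tablet and reads fields straight from the arrays, so a loop over many tablets creates no TabletMeta objects. MiniStore and TabletMetaView are hypothetical names, the store is trimmed to two fields, and java.util.HashMap stands in for Long2IntOpenHashMap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-field SoA store, trimmed for brevity.
class MiniStore {
    final Map<Long, Integer> slotOf = new HashMap<>(); // stand-in for Long2IntOpenHashMap
    long[] dbIds = new long[8];
    long[] partitionIds = new long[8];
    int size = 0;

    void add(long tabletId, long dbId, long partitionId) {
        if (slotOf.putIfAbsent(tabletId, size) != null) {
            return; // ignore duplicates
        }
        if (size == dbIds.length) {
            dbIds = Arrays.copyOf(dbIds, size * 2);
            partitionIds = Arrays.copyOf(partitionIds, size * 2);
        }
        dbIds[size] = dbId;
        partitionIds[size] = partitionId;
        size++;
    }
}

// Reusable read-only cursor: one instance serves every tablet in a loop.
final class TabletMetaView {
    private final MiniStore store;
    private int slot = -1;

    TabletMetaView(MiniStore store) {
        this.store = store;
    }

    /** Points the view at a tablet; returns false if the tablet is unknown. */
    boolean moveTo(long tabletId) {
        Integer s = store.slotOf.get(tabletId);
        slot = (s == null) ? -1 : s.intValue();
        return s != null;
    }

    long getDbId() {
        return store.dbIds[slot];
    }

    long getPartitionId() {
        return store.partitionIds[slot];
    }
}
```

A view like this is only meaningful while the caller holds the index's read lock, and it is not safe to share across threads; if a slot is freed and reused, the view silently reads another tablet's data. Those caveats are why plain per-field getters are often the simpler choice.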
    if (tabletMetaStore.containsKey(tabletId)) {
        return;
    }
addTablet() checks tabletMetaStore.containsKey(tabletId) and then calls tabletMetaStore.add(tabletId, tabletMeta), but CompactTabletMetaStore.add() also checks for duplicates. This does two hash lookups for every add.
Consider removing the outer containsKey() check and rely on tabletMetaStore.add() to no-op on duplicates, or change CompactTabletMetaStore.add() to return a boolean and avoid duplicate lookups.
Suggested change:
-    if (tabletMetaStore.containsKey(tabletId)) {
-        return;
-    }
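The double lookup can be made visible with a hypothetical HashMap subclass that counts hashing operations. The sketch contrasts a check-then-insert add (containsKey() followed by put()) with an add() that reports duplicates through its return value via Map.putIfAbsent(); CountingMap and AddStyles are illustrative names, not the PR's classes.

```java
import java.util.HashMap;

// Hypothetical map that counts hashing operations (containsKey, put, putIfAbsent).
class CountingMap extends HashMap<Long, Integer> {
    int lookups = 0;

    @Override
    public boolean containsKey(Object key) {
        lookups++;
        return super.containsKey(key);
    }

    @Override
    public Integer put(Long key, Integer value) {
        lookups++;
        return super.put(key, value);
    }

    @Override
    public Integer putIfAbsent(Long key, Integer value) {
        lookups++;
        return super.putIfAbsent(key, value);
    }
}

class AddStyles {
    // Check-then-insert: the pattern the review flags (two hash operations on success).
    static boolean addWithPreCheck(CountingMap slots, long tabletId, int slot) {
        if (slots.containsKey(tabletId)) {
            return false;
        }
        slots.put(tabletId, slot);
        return true;
    }

    // add() reports duplicates through its return value: one hash operation per call.
    static boolean addReturningBoolean(CountingMap slots, long tabletId, int slot) {
        return slots.putIfAbsent(tabletId, slot) == null;
    }
}
```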
    private long[] indexIds;
    private int[] oldSchemaHashes;
    private int[] newSchemaHashes;
    private byte[] storageMediumOrdinals;
newSchemaHashes is allocated and grown alongside the other arrays, but it is never read or exposed, and TabletMeta currently has no getter for newSchemaHash. This adds per-tablet memory overhead and undermines the compaction goal.
Either remove newSchemaHashes entirely for now, or fully wire it through (capture the value on add and expose a getter / include it when constructing TabletMeta).
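A rough tally of the array payload per slot shows what is at stake: four long arrays (8 bytes each), the int schema-hash arrays (4 bytes each), and one byte array. This counts only the parallel arrays, not the Long2IntOpenHashMap entries or spare capacity left by array growth, so it is a lower bound rather than a reproduction of the PR's ~69 bytes/tablet figure:

```latex
\begin{aligned}
\text{with newSchemaHashes:}\quad & 4 \times 8 + 2 \times 4 + 1 = 41\ \text{B per slot} \\
\text{without newSchemaHashes:}\quad & 4 \times 8 + 1 \times 4 + 1 = 37\ \text{B per slot} \\
\text{difference at } 10^{7} \text{ tablets:}\quad & 4\ \text{B} \times 10^{7} = 4 \times 10^{7}\ \text{B} \approx 40\ \text{MB}
\end{aligned}
```

The 4-byte difference matches the -4 bytes/tablet stated in the follow-up commit message.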
 if (replicaMetaWithBackend != null) {
     for (long tabletId : replicaMetaWithBackend.keySet()) {
-        if (tabletMetaMap.get(tabletId).getStorageMedium() == TStorageMedium.HDD) {
+        if (tabletMetaStore.getTabletMeta(tabletId).getStorageMedium() == TStorageMedium.HDD) {
In getReplicaNumByBeIdAndStorageMedium(), tabletMetaStore.getTabletMeta(tabletId) is called inside the loop solely to read storageMedium, which allocates a new TabletMeta per tablet. This is a hot path (iterating all tablets on a backend) and can add significant GC pressure.
Use tabletMetaStore.getStorageMedium(tabletId) instead (as is already done in getTabletSizeByBackendIdAndStorageMedium) to avoid per-iteration object allocation.
Suggested change:
-        if (tabletMetaStore.getTabletMeta(tabletId).getStorageMedium() == TStorageMedium.HDD) {
+        if (tabletMetaStore.getStorageMedium(tabletId) == TStorageMedium.HDD) {
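A sketch of the suggested pattern: the hot loop reads the storage medium straight from a byte[] of ordinals instead of materializing a TabletMeta per tablet. Medium is a hypothetical stand-in for the Thrift TStorageMedium enum, and MediumStore is illustrative. With java.util.HashMap standing in for Long2IntOpenHashMap, key boxing can still allocate; the primitive map avoids that.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MediumStore {
    // Hypothetical stand-in for Thrift's TStorageMedium; constants are illustrative.
    enum Medium { HDD, SSD }

    // Medium.values() returns a fresh array copy on every call; cache it for the hot path.
    private static final Medium[] MEDIA = Medium.values();

    private final Map<Long, Integer> slotOf = new HashMap<>(); // stand-in for Long2IntOpenHashMap
    private byte[] mediumOrdinals = new byte[8];
    private int size = 0;

    void add(long tabletId, Medium medium) {
        if (slotOf.putIfAbsent(tabletId, size) != null) {
            return; // duplicate tablet: keep the existing entry
        }
        if (size == mediumOrdinals.length) {
            mediumOrdinals = Arrays.copyOf(mediumOrdinals, size * 2);
        }
        mediumOrdinals[size++] = (byte) medium.ordinal();
    }

    /** Reads one field straight from the array; returns null for unknown tablets. */
    Medium getStorageMedium(long tabletId) {
        Integer slot = slotOf.get(tabletId);
        return slot == null ? null : MEDIA[mediumOrdinals[slot]];
    }

    /** Counts HDD tablets without building a TabletMeta-like object per tablet. */
    int countHdd(List<Long> tabletIds) {
        int n = 0;
        for (long id : tabletIds) {
            if (getStorageMedium(id) == Medium.HDD) {
                n++;
            }
        }
        return n;
    }
}
```

Caching the values() array matters in a loop like this, because each call to an enum's values() method returns a new copy.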
…locations

- Remove unused newSchemaHashes array (-4 bytes/tablet)
- Change add() to return boolean, eliminating the double hash lookup in addTablet()
- Use getStorageMedium() directly in getReplicaNumByBeIdAndStorageMedium() to avoid unnecessary TabletMeta object allocation in the hot loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
run buildall
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
… hot loop

Replace tabletMetaStore.getTabletMeta(tabletId) with individual field accessors (getDbId, getTableId, etc.) in buildPartitionInfoBySkew() to eliminate per-tablet object allocation when iterating over millions of tablets. This reduces GC pressure during partition balance computation.

Also fixes a latent bug where Preconditions.checkNotNull(tabletMeta) ran after tabletMeta had already been dereferenced, so a null tabletMeta would throw a NullPointerException before the check could ever fire. The check is replaced with a NOT_EXIST_VALUE check on the first accessor call.

Generated by ThinkOps
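The latent bug and its fix can be sketched as follows. SkewSketch and partitionKey() are hypothetical, a row map stands in for the SoA arrays for brevity, and NOT_EXIST_VALUE = -1 assumes real IDs are never -1. The commit does not say whether the real code throws or skips on a missing tablet; this sketch fails fast, which is what the unreachable checkNotNull() was meant to do.

```java
import java.util.HashMap;
import java.util.Map;

class SkewSketch {
    static final long NOT_EXIST_VALUE = -1L; // assumes real IDs are never -1

    // Hypothetical stand-in for the SoA store: tabletId -> {dbId, tableId, partitionId}.
    private final Map<Long, long[]> rows = new HashMap<>();

    void add(long tabletId, long dbId, long tableId, long partitionId) {
        rows.put(tabletId, new long[] {dbId, tableId, partitionId});
    }

    long getDbId(long tabletId) { return field(tabletId, 0); }
    long getTableId(long tabletId) { return field(tabletId, 1); }
    long getPartitionId(long tabletId) { return field(tabletId, 2); }

    private long field(long tabletId, int i) {
        long[] r = rows.get(tabletId);
        return r == null ? NOT_EXIST_VALUE : r[i];
    }

    /*
     * Old shape (the null check can never fire):
     *   TabletMeta meta = index.getTabletMeta(tabletId);
     *   long dbId = meta.getDbId();         // NullPointerException thrown here first
     *   Preconditions.checkNotNull(meta);   // unreachable for a null meta
     */
    String partitionKey(long tabletId) {
        long dbId = getDbId(tabletId);
        if (dbId == NOT_EXIST_VALUE) { // sentinel checked on the first accessor call
            throw new IllegalStateException("tablet " + tabletId + " not found");
        }
        return dbId + "/" + getTableId(tabletId) + "/" + getPartitionId(tabletId);
    }
}
```

Reading several fields through separate accessors is only consistent if no removal can interleave between the calls, for example while the caller holds the index's read lock.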
run buildall |
Addressed the review feedback in commit b50ad34:

- Comment #1 (getTabletMeta() GC concern): Fixed the main hot-path usage
- Comment #2 (redundant containsKey in addTablet): Already addressed in a prior revision
- Comment #3 (unused newSchemaHashes): Already addressed in a prior revision; the
- Comment #4 (getReplicaNumByBeIdAndStorageMedium): Already addressed in a prior revision; the method already uses
- Latent bug fix: Also fixed a latent bug where

ThinkOps 🤖
run buildall |
FE Regression Coverage Report: Increment line coverage
run buildall |
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
What problem does this PR solve?
Memory optimization for FE metadata storage at scale.
In Doris FE, TabletInvertedIndex stores one TabletMeta Java object per tablet in a Long2ObjectOpenHashMap<TabletMeta>. At 10M+ tablets, the 16-byte Java object header alone adds up to ~160 MB of pure overhead. Combined with hash map reference overhead, this wastes ~270 MB per 10M tablets.

How is the problem solved?
Replace per-object TabletMeta storage with a Structure-of-Arrays (SoA) layout via a new CompactTabletMetaStore class:

| Storage | Layout and per-tablet cost |
|---|---|
| Long2ObjectOpenHashMap<TabletMeta> (before) | One TabletMeta object per tablet (~96 bytes/tablet including object header, field padding, and map entry overhead) |
| CompactTabletMetaStore (after) | 7 parallel primitive arrays indexed by slot (long[] dbIds/tableIds/partitionIds/indexIds, int[] oldSchemaHashes/newSchemaHashes, byte[] storageMediumOrdinals) with a Long2IntOpenHashMap for tabletId→slot mapping (~69 bytes/tablet, ~28% reduction) |

Key design choices:
- Free list embedded in dbIds[] for O(1) slot reuse after deletion
- On-demand TabletMeta construction: external callers are unchanged; TabletMeta objects are only created when needed via getTabletMeta()
- Field accessors (getStorageMedium(), setStorageMedium()) avoid object allocation for hot paths like storage medium checks and mutations
- Concurrency control stays with the existing StampedLock in TabletInvertedIndex

What are the changes?
| File | Change |
|---|---|
| CompactTabletMetaStore.java | New SoA store with slot reuse and on-demand TabletMeta construction |
| TabletInvertedIndex.java | Replace Long2ObjectOpenHashMap<TabletMeta> tabletMetaMap field with CompactTabletMetaStore tabletMetaStore; update 5 methods |
| LocalTabletInvertedIndex.java | Replace tabletMetaMap references with tabletMetaStore calls |
| CloudTabletInvertedIndex.java | Replace tabletMetaMap references with tabletMetaStore calls |
| CompactTabletMetaStoreTest.java | New unit tests |

All 25+ external callers of TabletInvertedIndex are unchanged, preserving full backward compatibility.

Test plan
- CompactTabletMetaStoreTest: covers add/get/remove, duplicate adds, storageMedium mutation, free list reuse, array growth, toMap(), clear(), and all storage medium values
- TabletInvertedIndexTest and ClusterLoadStatisticsTest pass

🤖 Generated with Claude Code