Commit 17c3b8b

sonika-shah and claude authored
fix(profiler): N+1 / missing-index regression on /tables/.../columns?fields=profile (#3488) (#27746)
* fix(profiler): N+1 / missing-index regression on /tables/.../columns?fields=profile (#3488)

  Root cause
  ----------

  The 1.9.9 migration introduced two separate index regressions on `profiler_data_time_series`:

  1. **PostgreSQL**: `schemaChanges.sql` explicitly dropped the unique constraint `profiler_data_time_series_unique_hash_extension_ts` (entityFQNHash, extension, operation, timestamp) to allow altering the generated `operation` column expression, but never recreated it. After the migration the table kept only the `(extension, timestamp)` index, which is useless for queries filtering by `entityFQNHash`.
  2. **MySQL/both**: `postDataMigrationSQLScript.sql` created temporary indexes (idx_pdts_entityFQNHash, idx_pdts_composite, etc.) for its bulk UPDATE pass and then dropped **all** of them, including the only index covering `entityFQNHash`.

  The batch query issued by `getLatestExtensionsBatch()` when `fields=profile` is requested:

      SELECT entityFQNHash, MAX(timestamp)
      FROM profiler_data_time_series
      WHERE entityFQNHash IN (...N hashes...)
        AND extension = 'table.columnProfile'
      GROUP BY entityFQNHash

  required an `(entityFQNHash, extension, timestamp)` index. Without it the database performs a full table scan. On production deployments with millions of profiler rows this caused 100+ second response times (Grafana: 106 770 ms; 99 % in DB; 93 dbOps). Without `profile` in the fields param the same endpoint returned in ~150-220 ms.

  A secondary N+1 bug existed independently of the index: `customMetrics` in fields called `getCustomMetrics(table, column)` once per paginated column, issuing up to N identical queries against `entity_extension` and then filtering in Java.

  Fix
  ---

  * **migration 2.0.2** (MySQL + PostgreSQL): `CREATE INDEX IF NOT EXISTS idx_pdts_fqnhash_ext_ts ON profiler_data_time_series(entityFQNHash, extension, timestamp)`. The `IF NOT EXISTS` guard makes the migration safe to re-run and handles both upgrade and fresh-install paths.
  * **`getTableColumnsInternal`**, `customMetrics` block: fetch all column custom metrics for the table in one query, group by column name in Java, then distribute. Reduces N queries to 1.
  * **`getTableColumnsInternal`**, `profile` block: skip the duplicate `populateEntityFieldTags` call when `tags` was already fetched earlier in the same request, saving one prefix-scan on `tag_usage` per request.

  Related: PR #26855 (fixed N+1 tag queries on the list-tables path but left the profiler-index and customMetrics N+1 untouched on the columns sub-path).

* fix(profiler): restore unique constraint on profiler_data_time_series + batch column extension/customMetrics fetch

  Move the migration from 2.0.2/ to 1.12.8/ and switch from a non-unique covering index to restoring the original unique constraint dropped in 1.9.9. The two-phase CREATE UNIQUE INDEX CONCURRENTLY + ADD CONSTRAINT USING INDEX pattern avoids the ACCESS EXCLUSIVE lock on the hot profiler_data_time_series table during the upgrade.

  Closes the 1.9.9 regression and brings Postgres back in line with MySQL (which never lost the constraint). The leading (entityFQNHash, extension) prefix serves the column-profile batch query, the same shape MySQL has been running without 504s. MySQL needs no migration.

  Java side, eliminates two more N+1 patterns that compound the latency at customer scale:

  * getTableColumnsInternal extension block: replaced per-column getColumnExtension() loop with a single getExtensionsByJsonSchema() call, grouped by column FQN-hash in Java.
  * searchTableColumnsInternal customMetrics block: applied the same batch-fetch pattern already used in getTableColumnsInternal, replacing per-column getCustomMetrics() with one getExtensions() call.

  New DAO method on EntityExtensionDAO: getExtensionsByJsonSchema(id, jsonSchema), which selects extensions for a table id filtered by the jsonschema discriminator. Required because column extensions are stored with MD5-hashed extension keys and have no shared prefix the existing getExtensions(id, prefix) could use.

* chore(profiler): address review feedback: empty-list literal + accurate test comments

  * Replace `new ArrayList<>()` default in `metricsByColumn.getOrDefault(...)` with `List.of()` at both call sites in `TableRepository` (getTableColumnsInternal and searchTableColumnsInternal). `getOrDefault` evaluates its default eagerly, so the new ArrayList allocates per-column even when the key is found, which is unnecessary work on a hot path.
  * Reword two stale test comments in `test_getColumnsWithProfileField_correctnessAndNoBatchRegression`:
    - "all four field combinations" → "the three field combinations exercised below"
    - "(c) duplicate populateEntityFieldTags must not run twice" → describe the observable contract the assertions actually verify (tags + profile both present), not the internal call count.

* fix(profiler): force outer index scan in getLatestExtensionsBatch by pushing IN list to the join

  The getLatestExtensionsBatch query was the right shape for correctness but the planner (on Postgres at customer scale, with the new unique constraint in place) was still choosing a parallel sequential scan over the full profiler_data_time_series table for the outer side of the JOIN, rather than a merge join with index scan on both sides.

  Inner subquery: filtered by `entityFQNHash IN (...)`, used the index.
  Outer: only filtered by `p.extension = :extension`, no IN list, planner couldn't infer the transitive constraint that p.entityFQNHash must equal one of the inner hashes (because it's enforced through the JOIN ON clause, not a WHERE predicate).
  Result: full table scan reading 6.7M+ rows even when the actual answer is 23 rows.

  Adding the redundant `AND p.entityFQNHash IN (<entityFQNHashes>)` to the outer WHERE makes the constraint explicit. The result set is unchanged (implied by the join condition), but the planner can now use the unique index for the outer access too.

  Verified on the AUT dump (6.94M-row pdts):

  * EXPLAIN of the batch query: 7,234ms → 79ms (Hash Join + Parallel Seq Scan → Merge Join + Index Only Scan).
  * Live API /columns?fields=profile&include=all: 6-36 seconds → 22-28ms (warm) / 1.9s (very first call). 250-1000x improvement, depending on cache state.

  Same SQL works on both engines; no @ConnectionAwareSqlQuery split needed.

* test(profiler): shorten classification/tag fixture names in IT to fit varchar(256)

  The IT fixture for test_getColumnsWithProfileField_correctnessAndNoBatchRegression was building a tagFQN of `<classification>.<tag>` where each part went through TestNamespace.prefix(). With the descriptive method name (62 chars) + class name (15 chars) + namespace UUID (32 chars) plus the `profile_test_cls` / `profile_test_tag` base names (16 chars each), the resulting tagFQN was 263 characters, over the tag_usage.tagFQN VARCHAR(256) limit:

      ERROR: value too long for type character varying(256)

  Shorten the fixture base names from `profile_test_cls`/`profile_test_tag` to `cls`/`tag`. The namespace prefix already encodes test isolation (class + method + UUID), so the base name doesn't need to repeat that context.

  New tagFQN length: 237 chars (cls__<32>__TableResourceIT__<62>.tag__<32>__TableResourceIT__<62>), comfortably under 256.

* fix(table): include extensionKey in column-extension deserialize warn log

  Addresses gitar-bot review on PR #27746: the warning log on failed column-extension deserialization only had table.getId(), so operators could not pinpoint which row was bad. Add record.extensionName() (the entity_extension row key) to the log. No extra iteration: record is already in scope inside the catch.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(migration): move profiler unique-constraint migration to 1.12.9

  1.12.8 was already published with the PII classification fix from #27910. Move the profiler_data_time_series unique-constraint restore (this PR's postgres migration) to 1.12.9 so customers upgrading past the published 1.12.8 still pick it up.

  Add a MySQL placeholder schemaChanges.sql for 1.12.9 consistent with the 1.12.7 convention. MySQL was unaffected by the 1.9.9 regression (MODIFY COLUMN re-evaluates generated expressions in place without touching the constraint, so MySQL still has the constraint from 1.1.5).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(table): extract batchFetchCustomMetricsByColumn helper

  Addresses PR #27746 Copilot review:

  - Dedupe custom-metric batch logic between getTableColumnsInternal and searchTableColumnsInternal.
  - Reword IT inline comment to reflect what the test actually validates (completes within timeout + correct profiles) instead of claiming it inspects query plans.

---------

Co-authored-by: Claude <noreply@anthropic.com>
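
The "N queries to 1" change described in the commit message can be illustrated with a small, self-contained sketch. It is not the OpenMetadata code: `Metric` and `CountingStore` are hypothetical stand-ins for the custom-metric rows and the DAO, and the "query" is just an in-memory list read that increments a counter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchFetchSketch {
  /** Hypothetical stand-in for a custom-metric row; not the real CustomMetric class. */
  record Metric(String columnName, String name) {}

  /** Hypothetical in-memory "DAO" that counts how many queries it serves. */
  static class CountingStore {
    private final List<Metric> rows;
    int queryCount = 0;

    CountingStore(List<Metric> rows) {
      this.rows = rows;
    }

    List<Metric> fetchAll() {
      queryCount++;
      return rows;
    }
  }

  /** N+1 shape: one fetch per column, then filter in Java. */
  static Map<String, List<Metric>> perColumn(CountingStore store, List<String> columns) {
    Map<String, List<Metric>> out = new HashMap<>();
    for (String column : columns) {
      List<Metric> matches = new ArrayList<>();
      for (Metric m : store.fetchAll()) {
        if (m.columnName().equals(column)) {
          matches.add(m);
        }
      }
      out.put(column, matches);
    }
    return out;
  }

  /** Batched shape: one fetch, group by column name, then distribute. */
  static Map<String, List<Metric>> batched(CountingStore store, List<String> columns) {
    Map<String, List<Metric>> grouped = new HashMap<>();
    for (Metric m : store.fetchAll()) {
      grouped.computeIfAbsent(m.columnName(), k -> new ArrayList<>()).add(m);
    }
    Map<String, List<Metric>> out = new HashMap<>();
    for (String column : columns) {
      out.put(column, grouped.getOrDefault(column, List.of()));
    }
    return out;
  }

  public static void main(String[] args) {
    List<Metric> rows =
        List.of(new Metric("id", "m1"), new Metric("email", "m2"), new Metric("id", "m3"));
    List<String> columns = List.of("id", "email", "name");
    CountingStore perColumnStore = new CountingStore(rows);
    CountingStore batchedStore = new CountingStore(rows);
    perColumn(perColumnStore, columns);
    batched(batchedStore, columns);
    // prints "3 vs 1": three fetches for three columns versus a single fetch
    System.out.println(perColumnStore.queryCount + " vs " + batchedStore.queryCount);
  }
}
```

Both shapes return the same per-column lists; only the number of round trips differs, which is the part that matters once each fetch is a real database query.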
1 parent 0fa86e1 commit 17c3b8b

5 files changed

Lines changed: 210 additions & 5 deletions

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+-- Placeholder for 1.12.9 MySQL schema changes
+-- The Postgres-side fix for collate#3488 has no MySQL counterpart: MySQL's
+-- 1.1.5 unique constraint on profiler_data_time_series was never dropped
+-- (MODIFY COLUMN re-evaluates generated expressions in place), so the
+-- regression that hit Postgres did not affect MySQL.

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+-- Restore the unique constraint dropped in 1.9.9. Closes the 1.9.9 regression that caused
+-- /columns?fields=profile 504s, and brings Postgres back in line with MySQL (which never
+-- lost it). The leading (entityFQNHash, extension) prefix serves the column-profile batch query.
+-- Two-phase: CONCURRENTLY build avoids ACCESS EXCLUSIVE lock; ADD CONSTRAINT USING INDEX
+-- promotes the built index without re-scanning.
+CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS
+    profiler_data_time_series_unique_hash_extension_ts
+    ON profiler_data_time_series (entityFQNHash, extension, operation, timestamp);
+
+ALTER TABLE profiler_data_time_series
+    ADD CONSTRAINT profiler_data_time_series_unique_hash_extension_ts
+    UNIQUE USING INDEX profiler_data_time_series_unique_hash_extension_ts;
+
+ANALYZE profiler_data_time_series;

openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TableResourceIT.java

Lines changed: 137 additions & 0 deletions
@@ -5861,4 +5861,141 @@ void test_listTablesWithColumnTags_performance(TestNamespace ns) {
       assertFalse(table.getTags().isEmpty(), "Table tags should not be empty");
     }
   }
+
+  // ===================================================================
+  // REGRESSION TEST - columns API with fields=profile (collate#3488)
+  // ===================================================================
+
+  @Test
+  @Execution(ExecutionMode.SAME_THREAD)
+  void test_getColumnsWithProfileField_correctnessAndNoBatchRegression(TestNamespace ns) {
+    OpenMetadataClient client = SdkClients.adminClient();
+
+    DatabaseService service = DatabaseServiceTestFactory.createPostgres(ns);
+    DatabaseSchema schema = DatabaseSchemaTestFactory.createSimple(ns, service);
+
+    CreateClassification createClassification =
+        new CreateClassification()
+            .withName(ns.prefix("cls"))
+            .withDescription("Classification for profile regression test");
+    Classification cls = client.classifications().create(createClassification);
+
+    CreateTag createTag =
+        new CreateTag()
+            .withName(ns.prefix("tag"))
+            .withDescription("Tag for profile regression test")
+            .withClassification(cls.getName());
+    Tag tag = client.tags().create(createTag);
+
+    TagLabel tagLabel =
+        new TagLabel()
+            .withTagFQN(tag.getFullyQualifiedName())
+            .withSource(TagLabel.TagSource.CLASSIFICATION);
+
+    Column idCol = ColumnBuilder.of("id", "BIGINT").primaryKey().notNull().build();
+    idCol.setTags(List.of(tagLabel));
+    Column emailCol = ColumnBuilder.of("email", "VARCHAR").dataLength(255).build();
+    emailCol.setTags(List.of(tagLabel));
+    Column nameCol = ColumnBuilder.of("name", "VARCHAR").dataLength(255).build();
+
+    CreateTable createRequest = createRequest(ns.prefix("profile_regression_table"), ns);
+    createRequest.setDatabaseSchema(schema.getFullyQualifiedName());
+    createRequest.setColumns(List.of(idCol, emailCol, nameCol));
+    Table table = client.tables().create(createRequest);
+
+    Long timestamp = System.currentTimeMillis();
+    ColumnProfile idProfile =
+        new ColumnProfile()
+            .withName("id")
+            .withMin(1.0)
+            .withMax(999.0)
+            .withUniqueCount(100.0)
+            .withTimestamp(timestamp);
+    ColumnProfile emailProfile =
+        new ColumnProfile()
+            .withName("email")
+            .withNullCount(5.0)
+            .withNullProportion(0.05)
+            .withTimestamp(timestamp);
+
+    TableProfile tableProfile =
+        new TableProfile().withRowCount(100.0).withColumnCount(3.0).withTimestamp(timestamp);
+
+    CreateTableProfile createProfile =
+        new CreateTableProfile()
+            .withTableProfile(tableProfile)
+            .withColumnProfile(List.of(idProfile, emailProfile));
+    client.tables().updateTableProfile(table.getId(), createProfile);
+
+    // Verify the three field combinations exercised below don't regress:
+    // (a) fields=profile — completes within 30s and returns the expected column profiles
+    TableColumnList withProfile =
+        assertTimeout(
+            Duration.ofSeconds(30),
+            () -> client.tables().getColumns(table.getId(), "profile"),
+            "columns?fields=profile should complete within 30s");
+
+    assertEquals(3, withProfile.getData().size());
+    Column returnedId =
+        withProfile.getData().stream()
+            .filter(c -> "id".equals(c.getName()))
+            .findFirst()
+            .orElse(null);
+    Column returnedName =
+        withProfile.getData().stream()
+            .filter(c -> "name".equals(c.getName()))
+            .findFirst()
+            .orElse(null);
+    assertNotNull(returnedId, "id column should be present");
+    assertNotNull(returnedId.getProfile(), "id column should have profile data");
+    assertEquals(1.0, returnedId.getProfile().getMin(), "id column min should match");
+    assertEquals(999.0, returnedId.getProfile().getMax(), "id column max should match");
+    assertNotNull(returnedName, "name column should be present");
+    assertNull(returnedName.getProfile(), "name column has no profile, should be null");
+
+    // (b) fields=tags,customMetrics,extension,profile — the exact production query
+    TableColumnList withAllFields =
+        assertTimeout(
+            Duration.ofSeconds(30),
+            () -> client.tables().getColumns(table.getId(), "tags,customMetrics,extension,profile"),
+            "columns?fields=tags,customMetrics,extension,profile should complete within 30s");
+
+    assertEquals(3, withAllFields.getData().size());
+
+    Column idResult =
+        withAllFields.getData().stream()
+            .filter(c -> "id".equals(c.getName()))
+            .findFirst()
+            .orElse(null);
+    assertNotNull(idResult, "id column must be present");
+    assertNotNull(idResult.getProfile(), "id column must have profile");
+    assertNotNull(idResult.getTags(), "id column must have tags");
+    assertFalse(idResult.getTags().isEmpty(), "id column tags must not be empty");
+    assertTrue(
+        idResult.getTags().stream()
+            .anyMatch(t -> tag.getFullyQualifiedName().equals(t.getTagFQN())),
+        "id column should carry the test tag");
+
+    // (c) fields=tags,profile — both tags and profile are populated correctly when requested
+    // together (the dedup of populateEntityFieldTags is exercised here, but this test
+    // verifies the observable contract — tags + profile both present on the result —
+    // not the internal call count)
+    TableColumnList withTagsAndProfile =
+        assertTimeout(
+            Duration.ofSeconds(30),
+            () -> client.tables().getColumns(table.getId(), "tags,profile"),
+            "columns?fields=tags,profile should complete within 30s");
+
+    assertEquals(3, withTagsAndProfile.getData().size());
+    Column idTagsProfile =
+        withTagsAndProfile.getData().stream()
+            .filter(c -> "id".equals(c.getName()))
+            .findFirst()
+            .orElse(null);
+    assertNotNull(idTagsProfile);
+    assertNotNull(idTagsProfile.getTags());
+    assertFalse(
+        idTagsProfile.getTags().isEmpty(), "Tags must be present even when profile requested");
+    assertNotNull(idTagsProfile.getProfile(), "Profile must be present when profile requested");
+  }
 }
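
The short `cls` / `tag` fixture names come from the `tag_usage.tagFQN` VARCHAR(256) limit described in the commit message. The arithmetic can be checked with a small sketch that takes the component lengths quoted there as given (62-char method name, 15-char class name, 32-char UUID, `__` separators); these constants are assumptions copied from the message, not measured from `TestNamespace.prefix()`.

```java
class TagFqnLengthSketch {
  // Lengths as stated in the commit message; treated as given, not re-measured.
  static final int METHOD_NAME = 62;
  static final int CLASS_NAME = 15;
  static final int UUID_HEX = 32;
  static final int SEPARATOR = 2; // "__" between parts, per the pattern in the message
  static final int TAG_FQN_LIMIT = 256;

  /** Length of <base>__<uuid>__<class>__<method>. */
  static int prefixedLength(int baseNameLength) {
    return baseNameLength + SEPARATOR + UUID_HEX + SEPARATOR + CLASS_NAME + SEPARATOR + METHOD_NAME;
  }

  /** Length of <classification>.<tag>, each part namespace-prefixed. */
  static int tagFqnLength(int classificationBase, int tagBase) {
    return prefixedLength(classificationBase) + 1 + prefixedLength(tagBase);
  }

  public static void main(String[] args) {
    int before = tagFqnLength("profile_test_cls".length(), "profile_test_tag".length());
    int after = tagFqnLength("cls".length(), "tag".length());
    // prints "263 > 256, 237 <= 256"
    System.out.println(before + " > " + TAG_FQN_LIMIT + ", " + after + " <= " + TAG_FQN_LIMIT);
  }
}
```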

openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java

Lines changed: 9 additions & 1 deletion
@@ -1545,6 +1545,13 @@ void bulkUpsertExtensions(
     List<ExtensionRecord> getExtensions(
         @BindUUID("id") UUID id, @Bind("extensionPrefix") String extensionPrefix);

+    @RegisterRowMapper(ExtensionMapper.class)
+    @SqlQuery(
+        "SELECT extension, json FROM entity_extension WHERE id = :id AND jsonschema = :jsonSchema "
+            + "ORDER BY extension")
+    List<ExtensionRecord> getExtensionsByJsonSchema(
+        @BindUUID("id") UUID id, @Bind("jsonSchema") String jsonSchema);
+
     @ConnectionAwareSqlQuery(
         value =
             "SELECT json FROM ("
@@ -9550,7 +9557,8 @@ default String getTimeSeriesTableName() {
             + "  GROUP BY entityFQNHash"
             + ") latest "
             + "ON p.entityFQNHash = latest.entityFQNHash AND p.timestamp = latest.latestTs "
-            + "WHERE p.extension = :extension")
+            + "WHERE p.extension = :extension "
+            + "AND p.entityFQNHash IN (<entityFQNHashes>)")
     @RegisterRowMapper(LatestExtensionRecordMapper.class)
     List<LatestExtensionRecord> getLatestExtensionsBatch(
         @Define("table") String table,
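
The commit message states that the extra `p.entityFQNHash IN (<entityFQNHashes>)` predicate leaves the result set unchanged because the join condition already implies it. A minimal in-memory sketch of that claim follows; `Row` and both methods are hypothetical, the "table" is a Java list, and nothing here models the query planner, which is where the real benefit comes from.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class RedundantPredicateSketch {
  /** Hypothetical stand-in for a profiler_data_time_series row. */
  record Row(String hash, String extension, long ts) {}

  /** Inner subquery: MAX(timestamp) per hash, restricted to the IN list and extension. */
  static Map<String, Long> latestTs(List<Row> table, Set<String> hashes, String extension) {
    return table.stream()
        .filter(r -> hashes.contains(r.hash()) && r.extension().equals(extension))
        .collect(Collectors.toMap(Row::hash, Row::ts, Long::max));
  }

  /** Outer query joined to the inner one; pushInList adds the redundant outer filter. */
  static List<Row> latestRows(
      List<Row> table, Set<String> hashes, String extension, boolean pushInList) {
    Map<String, Long> latest = latestTs(table, hashes, extension);
    return table.stream()
        .filter(p -> p.extension().equals(extension))
        .filter(p -> !pushInList || hashes.contains(p.hash()))
        .filter(
            p -> {
              Long latestForHash = latest.get(p.hash()); // join on hash and timestamp
              return latestForHash != null && latestForHash == p.ts();
            })
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    String ext = "table.columnProfile";
    List<Row> table =
        List.of(
            new Row("h1", ext, 10L),
            new Row("h1", ext, 20L),
            new Row("h2", ext, 5L),
            new Row("h3", ext, 99L),
            new Row("h1", "other", 30L));
    Set<String> hashes = Set.of("h1", "h2");
    // Both variants select the same rows: (h1, 20) and (h2, 5).
    System.out.println(
        latestRows(table, hashes, ext, false).equals(latestRows(table, hashes, ext, true)));
  }
}
```

In the sketch the extra filter only discards rows the join would have rejected anyway. In the database, per the commit message, stating it in the WHERE clause is what lets the planner pick an index scan for the outer side.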

openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java

Lines changed: 45 additions & 4 deletions
@@ -154,6 +154,7 @@ public class TableRepository extends EntityRepository<Table> {
   public static final String TABLE_COLUMN_EXTENSION = "table.column";
   public static final String TABLE_EXTENSION = "table.table";
   public static final String CUSTOM_METRICS_EXTENSION = "customMetrics.";
+  public static final String COLUMN_EXTENSION_JSON_SCHEMA = "columnExtension";
   public static final String TABLE_PROFILER_CONFIG = "tableProfilerConfig";
   private static final ReadPrefetchKey PREFETCH_DEFAULT_FIELDS =
       ReadPrefetchKey.TABLE_DEFAULT_FIELDS;
@@ -2194,6 +2195,21 @@ private Object getColumnExtension(UUID tableId, String columnFQN) {
     return null;
   }

+  private Map<String, List<CustomMetric>> batchFetchCustomMetricsByColumn(UUID tableId) {
+    List<ExtensionRecord> records =
+        daoCollection
+            .entityExtensionDAO()
+            .getExtensions(tableId, CUSTOM_METRICS_EXTENSION + TABLE_COLUMN_EXTENSION);
+    Map<String, List<CustomMetric>> metricsByColumn = new HashMap<>();
+    for (ExtensionRecord record : records) {
+      CustomMetric metric = JsonUtils.readValue(record.extensionJson(), CustomMetric.class);
+      if (metric != null && metric.getColumnName() != null) {
+        metricsByColumn.computeIfAbsent(metric.getColumnName(), k -> new ArrayList<>()).add(metric);
+      }
+    }
+    return metricsByColumn;
+  }
+
   private List<CustomMetric> getCustomMetrics(Table table, String columnName) {
     String extension = columnName != null ? TABLE_COLUMN_EXTENSION : TABLE_EXTENSION;
     extension = CUSTOM_METRICS_EXTENSION + extension;
@@ -2891,20 +2907,43 @@ private ResultList<Column> getTableColumnsInternal(
     }

     if (fieldsParam != null && fieldsParam.contains("customMetrics")) {
+      Map<String, List<CustomMetric>> metricsByColumn =
+          batchFetchCustomMetricsByColumn(table.getId());
       for (Column column : paginatedColumns) {
-        column.setCustomMetrics(getCustomMetrics(table, column.getName()));
+        column.setCustomMetrics(metricsByColumn.getOrDefault(column.getName(), List.of()));
       }
     }

     if (fieldsParam != null && fieldsParam.contains("extension")) {
+      List<ExtensionRecord> allColumnExtensions =
+          daoCollection
+              .entityExtensionDAO()
+              .getExtensionsByJsonSchema(table.getId(), COLUMN_EXTENSION_JSON_SCHEMA);
+      Map<String, Object> extensionByColumnHash = new HashMap<>();
+      for (ExtensionRecord record : allColumnExtensions) {
+        try {
+          extensionByColumnHash.put(
+              record.extensionName(), JsonUtils.readValue(record.extensionJson(), Object.class));
+        } catch (Exception e) {
+          LOG.warn(
+              "Failed to deserialize column extension for table {} extensionKey {}: {}",
+              table.getId(),
+              record.extensionName(),
+              e.getMessage());
+        }
+      }
       for (Column column : paginatedColumns) {
-        column.setExtension(getColumnExtension(table.getId(), column.getFullyQualifiedName()));
+        column.setExtension(
+            extensionByColumnHash.get(
+                FullyQualifiedName.buildHash(column.getFullyQualifiedName())));
       }
     }

     if (fieldsParam != null && fieldsParam.contains("profile")) {
       setColumnProfile(paginatedColumns);
-      populateEntityFieldTags(entityType, paginatedColumns, table.getFullyQualifiedName(), true);
+      if (!fieldsParam.contains("tags")) {
+        populateEntityFieldTags(entityType, paginatedColumns, table.getFullyQualifiedName(), true);
+      }
       paginatedColumns =
           piiOwners != null
               ? PIIMasker.getTableProfile(piiOwners, paginatedColumns, authorizer, securityContext)
@@ -3227,8 +3266,10 @@ private ResultList<Column> searchTableColumnsInternal(

     Fields fields = getFields(fieldsParam);
     if (fields.contains("customMetrics") || fields.contains("*")) {
+      Map<String, List<CustomMetric>> metricsByColumn =
+          batchFetchCustomMetricsByColumn(table.getId());
       for (Column column : paginatedResults) {
-        column.setCustomMetrics(getCustomMetrics(table, column.getName()));
+        column.setCustomMetrics(metricsByColumn.getOrDefault(column.getName(), List.of()));
       }
     }
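
The `List.of()` default in `metricsByColumn.getOrDefault(...)` follows from how Java evaluates method arguments: the default expression is built before `getOrDefault` runs, whether or not the key is present. The sketch below makes that observable with a counter; `freshList` and `allocationsForLookups` are hypothetical helper names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EagerDefaultSketch {
  static int allocations = 0;

  /** Allocates a new list and counts the call so eager evaluation is visible. */
  static List<String> freshList() {
    allocations++;
    return new ArrayList<>();
  }

  /** Looks up every key with an allocating default and reports how many lists were built. */
  static int allocationsForLookups(Map<String, List<String>> map, List<String> keys) {
    allocations = 0;
    for (String key : keys) {
      // The argument is evaluated before getOrDefault checks the map.
      map.getOrDefault(key, freshList());
    }
    return allocations;
  }

  public static void main(String[] args) {
    Map<String, List<String>> metricsByColumn = new HashMap<>();
    metricsByColumn.put("id", List.of("m1"));
    metricsByColumn.put("email", List.of("m2"));
    // Both keys are present, yet two throwaway lists are still allocated: prints 2
    System.out.println(allocationsForLookups(metricsByColumn, List.of("id", "email")));
  }
}
```

One trade-off to keep in mind: `List.of()` returns an unmodifiable list, so a caller that later tries to add to a column's custom metrics would get an `UnsupportedOperationException` for columns that had none.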
