Make Selective Field Query during Reindexing#27723
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a “selective fields” strategy for search reindex reads to avoid fetching large fan-out relationships (OOM risk), by letting each SearchIndex declare the minimal EntityRepository.setFields set required to build a correct ES document.
Changes:
- Added
SearchIndex#getRequiredReindexFields()and a sharedCOMMON_REINDEX_FIELDSbaseline. - Implemented per-index required reindex field overrides and added factory probing (
SearchIndexFactory#getReindexFieldsFor). - Updated the search-index reindex reader to request selective fields and added regression/contract tests around excluded fan-out behavior.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchIndexReindexFieldsParityTest.java | Adds contract tests ensuring excluded fan-out fields remain stripped and docs match expectations. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchIndexFactoryTest.java | Adds tests for required reindex fields coverage/overrides and fan-out omissions. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/UserIndex.java | Declares user-specific required reindex fields (teams/roles/inheritedRoles). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TestCaseIndex.java | Declares test-case required reindex fields (suite/definition references). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TeamIndex.java | Declares team required reindex fields (parents). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TableIndex.java | Adds required reindex fields for table columns (but currently drops serviceType indexing). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SpreadsheetIndex.java | Declares spreadsheet required reindex fields (worksheets). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SearchIndex.java | Adds COMMON_REINDEX_FIELDS and default getRequiredReindexFields() contract. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PipelineIndex.java | Declares pipeline required reindex fields (tasks). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/IngestionPipelineIndex.java | Declares ingestion pipeline required reindex fields (pipelineStatuses). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/GlossaryTermIndex.java | Declares glossary term required reindex fields (relatedTerms). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DatabaseIndex.java | Declares database required reindex fields (usageSummary). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DashboardIndex.java | Declares dashboard required reindex fields (charts). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ContainerIndex.java | Declares container required reindex fields (dataModel). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexFactory.java | Adds probing method to retrieve per-entity required reindex fields; makes pipeline execution index null-probe safe. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/EntityReader.java | Switches reindex reads from * to selective required fields returned by the factory. |
🟡 Playwright Results — all passed (13 flaky)✅ 3962 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 86 skipped
🟡 13 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
| } catch (Exception e) { | ||
| LOG.warn( | ||
| "Failed to probe reindex fields for entity type {}; falling back to common set: {}", | ||
| entityType, | ||
| e.getMessage()); | ||
| } | ||
| return SearchIndex.COMMON_REINDEX_FIELDS; |
There was a problem hiding this comment.
getReindexFieldsFor uses LOG.warn(...), but this class is annotated with Lombok @Slf4j (which generates a log field) and there is no LOG logger defined here. This will not compile; switch to log.warn(...) (or add an explicit private static final Logger LOG = ...).
| "Failed to probe reindex fields for entity type {}; falling back to common set: {}", | ||
| entityType, | ||
| e.getMessage()); |
There was a problem hiding this comment.
The catch block logs only e.getMessage() and drops the exception/stack trace, which can make it hard to diagnose why probing failed (especially if getMessage() is null). Consider logging the exception itself (e.g., as the last argument) so failures in field probing are actionable.
| "Failed to probe reindex fields for entity type {}; falling back to common set: {}", | |
| entityType, | |
| e.getMessage()); | |
| "Failed to probe reindex fields for entity type {}; falling back to common set", | |
| entityType, | |
| e); |
Code Review ✅ ApprovedImplements selective field querying during the reindexing process to improve efficiency. No issues found. OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
| /** | ||
| * Returns the minimal set of fields the reindex path must request from | ||
| * {@code EntityRepository.setFields} for the given entity type. Probes the corresponding | ||
| * index class via {@link #buildIndex(String, Object)} with a {@code null} entity and calls | ||
| * {@link SearchIndex#getRequiredReindexFields()}. Index constructors must be safe with a null | ||
| * entity for this probe to work — they are today because field declarations are static. | ||
| */ | ||
| public java.util.Set<String> getReindexFieldsFor(String entityType) { | ||
| try { | ||
| SearchIndex probe = buildIndex(entityType, null); | ||
| if (probe != null) { | ||
| return probe.getRequiredReindexFields(); | ||
| } | ||
| } catch (Exception e) { | ||
| LOG.warn( | ||
| "Failed to probe reindex fields for entity type {}; falling back to common set: {}", | ||
| entityType, | ||
| e.getMessage()); | ||
| } | ||
| return SearchIndex.COMMON_REINDEX_FIELDS; | ||
| } |
There was a problem hiding this comment.
Selective reindex fields are introduced via getReindexFieldsFor(...), but the primary single-server reindex path (SearchIndexExecutor) still hardcodes List.of("*") for entity reads. This means the performance/OOM mitigation won’t apply in the default reindex strategy unless that path is updated to use the factory-derived field set (similar to EntityReader).
|
* Make Selective Field Query * Minor nit * Fix Failing Tests (cherry picked from commit 385d589)
* Make Selective Field Query * Minor nit * Fix Failing Tests (cherry picked from commit 385d589) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>



Fixes https://github.com/open-metadata/openmetadata-collate/issues/3848
Summary by Gitar
getRequiredReindexFields()to theSearchIndexinterface to define per-entity field requirements.EntityReaderto prune database queries based on the minimal fields required for specific indexes.SearchIndexReindexFieldsParityTestto enforce contracts for field exclusions and prevent OOM risks.SearchIndexFactoryTestto validate parity between index requirements and factory probes.This will update automatically on new commits.