Skip to content

Make Selective Field Query during Reindexing#27723

Merged
pmbrull merged 8 commits intomainfrom
reduce-index
Apr 28, 2026
Merged

Make Selective Field Query during Reindexing#27723
pmbrull merged 8 commits intomainfrom
reduce-index

Conversation

@mohityadav766
Copy link
Copy Markdown
Member

@mohityadav766 mohityadav766 commented Apr 24, 2026

Fixes https://github.com/open-metadata/openmetadata-collate/issues/3848


Summary by Gitar

  • Performance Optimization:
    • Implemented selective field fetching during reindexing to prevent loading heavy, unneeded entity relationships.
    • Added getRequiredReindexFields() to the SearchIndex interface to define per-entity field requirements.
    • Updated EntityReader to prune database queries based on the minimal fields required for specific indexes.
  • Testing:
    • Added SearchIndexReindexFieldsParityTest to enforce contracts for field exclusions and prevent OOM risks.
    • Expanded SearchIndexFactoryTest to validate parity between index requirements and factory probes.

This will update automatically on new commits.

@mohityadav766 mohityadav766 self-assigned this Apr 24, 2026
Copilot AI review requested due to automatic review settings April 24, 2026 18:05
@mohityadav766 mohityadav766 changed the title Make Selective Field Query Make Selective Field Query during Reindexing Apr 24, 2026
@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Apr 24, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a “selective fields” strategy for search reindex reads to avoid fetching large fan-out relationships (OOM risk), by letting each SearchIndex declare the minimal EntityRepository.setFields set required to build a correct ES document.

Changes:

  • Added SearchIndex#getRequiredReindexFields() and a shared COMMON_REINDEX_FIELDS baseline.
  • Implemented per-index required reindex field overrides and added factory probing (SearchIndexFactory#getReindexFieldsFor).
  • Updated the search-index reindex reader to request selective fields and added regression/contract tests around excluded fan-out behavior.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
openmetadata-service/src/test/java/org/openmetadata/service/search/SearchIndexReindexFieldsParityTest.java Adds contract tests ensuring excluded fan-out fields remain stripped and docs match expectations.
openmetadata-service/src/test/java/org/openmetadata/service/search/SearchIndexFactoryTest.java Adds tests for required reindex fields coverage/overrides and fan-out omissions.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/UserIndex.java Declares user-specific required reindex fields (teams/roles/inheritedRoles).
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TestCaseIndex.java Declares test-case required reindex fields (suite/definition references).
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TeamIndex.java Declares team required reindex fields (parents).
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TableIndex.java Adds required reindex fields for table columns (but currently drops serviceType indexing).
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SpreadsheetIndex.java Declares spreadsheet required reindex fields (worksheets).
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SearchIndex.java Adds COMMON_REINDEX_FIELDS and default getRequiredReindexFields() contract.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PipelineIndex.java Declares pipeline required reindex fields (tasks).
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/IngestionPipelineIndex.java Declares ingestion pipeline required reindex fields (pipelineStatuses).
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/GlossaryTermIndex.java Declares glossary term required reindex fields (relatedTerms).
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DatabaseIndex.java Declares database required reindex fields (usageSummary).
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DashboardIndex.java Declares dashboard required reindex fields (charts).
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ContainerIndex.java Declares container required reindex fields (dataModel).
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexFactory.java Adds probing method to retrieve per-entity required reindex fields; makes pipeline execution index null-probe safe.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/EntityReader.java Switches reindex reads from * to selective required fields returned by the factory.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 24, 2026

🟡 Playwright Results — all passed (13 flaky)

✅ 3962 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 299 0 0 4
🟡 Shard 2 741 0 4 8
🟡 Shard 3 745 0 3 7
🟡 Shard 4 757 0 2 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 733 0 4 8
🟡 13 flaky test(s) (passed on retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 1 retry)
  • Features/LandingPageWidgets/DomainDataProductsWidgets.spec.ts › Domain asset count should update when assets are added (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Features/Table.spec.ts › Tags term should be consistent for search (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Remove users in persona should work properly (shard 3, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for SearchIndex (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Not_Set (shard 4, 1 retry)
  • Pages/Glossary.spec.ts › Glossary Term Update in Glossary Page should persist tree (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Admin user can get all the roles hierarchy and edit roles (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot AI review requested due to automatic review settings April 27, 2026 08:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

Comment on lines +123 to +129
} catch (Exception e) {
LOG.warn(
"Failed to probe reindex fields for entity type {}; falling back to common set: {}",
entityType,
e.getMessage());
}
return SearchIndex.COMMON_REINDEX_FIELDS;
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getReindexFieldsFor uses LOG.warn(...), but this class is annotated with Lombok @Slf4j (which generates a log field) and there is no LOG logger defined here. This will not compile; switch to log.warn(...) (or add an explicit private static final Logger LOG = ...).

Copilot uses AI. Check for mistakes.
Comment on lines +125 to +127
"Failed to probe reindex fields for entity type {}; falling back to common set: {}",
entityType,
e.getMessage());
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catch block logs only e.getMessage() and drops the exception/stack trace, which can make it hard to diagnose why probing failed (especially if getMessage() is null). Consider logging the exception itself (e.g., as the last argument) so failures in field probing are actionable.

Suggested change
"Failed to probe reindex fields for entity type {}; falling back to common set: {}",
entityType,
e.getMessage());
"Failed to probe reindex fields for entity type {}; falling back to common set",
entityType,
e);

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings April 28, 2026 06:25
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 28, 2026

Code Review ✅ Approved

Implements selective field querying during the reindexing process to improve efficiency. No issues found.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.

Comment on lines +110 to +130
/**
* Returns the minimal set of fields the reindex path must request from
* {@code EntityRepository.setFields} for the given entity type. Probes the corresponding
* index class via {@link #buildIndex(String, Object)} with a {@code null} entity and calls
* {@link SearchIndex#getRequiredReindexFields()}. Index constructors must be safe with a null
* entity for this probe to work — they are today because field declarations are static.
*/
public java.util.Set<String> getReindexFieldsFor(String entityType) {
try {
SearchIndex probe = buildIndex(entityType, null);
if (probe != null) {
return probe.getRequiredReindexFields();
}
} catch (Exception e) {
LOG.warn(
"Failed to probe reindex fields for entity type {}; falling back to common set: {}",
entityType,
e.getMessage());
}
return SearchIndex.COMMON_REINDEX_FIELDS;
}
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Selective reindex fields are introduced via getReindexFieldsFor(...), but the primary single-server reindex path (SearchIndexExecutor) still hardcodes List.of("*") for entity reads. This means the performance/OOM mitigation won’t apply in the default reindex strategy unless that path is updated to use the factory-derived field set (similar to EntityReader).

Copilot uses AI. Check for mistakes.
@sonarqubecloud
Copy link
Copy Markdown

@pmbrull pmbrull merged commit 385d589 into main Apr 28, 2026
59 of 61 checks passed
@pmbrull pmbrull deleted the reduce-index branch April 28, 2026 09:07
mohityadav766 added a commit that referenced this pull request Apr 29, 2026
* Make Selective Field Query

* Minor nit

* Fix Failing Tests

(cherry picked from commit 385d589)
mohityadav766 added a commit that referenced this pull request Apr 29, 2026
* Make Selective Field Query
* Minor nit
* Fix Failing Tests

(cherry picked from commit 385d589)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants