feat(search): scope index resolution with fetchParentsAliases / fetchChildAliases flags#27762
feat(search): scope index resolution with fetchParentsAliases / fetchChildAliases flags#27762mohityadav766 wants to merge 26 commits intomainfrom
Conversation
…ope index resolution Stop relying on Elasticsearch alias expansion to resolve UI-supplied indexes. The UI now sends the alias `table` (instead of `table_search_index`), and ES matches every index whose alias list contains `table` -- including `column_search_index`, because tableColumn declares `table` as a parent. That bleeds column hits into table searches. The server now resolves the requested index against the indexMapping graph into an explicit list of actual `*_search_index` names, gated by two new flags: - `fetchParentsAliases` (default false): include indexes of entities listed as parents in this entity's parentAliases. - `fetchChildAliases` (default true): include indexes of entities that list this alias in their parentAliases. Default `true` preserves the legacy ES expansion behavior so existing callers don't change; the UI fixes the bug by passing `fetchChildAliases=false`. Wired through `/search/query`, `/search/export`, `/search/preview`, `/search/nlq/query`, `/search/fieldQuery`, `/search/aggregate` (GET + POST), and `/search/entityTypeCounts`. The resolver also handles compound aliases (`all`, `dataAsset`, `dataAssetEmbeddings`) by expanding them through the reverse parent/child map. Cluster-alias prefixing is preserved. Closes #27761
✅ TypeScript Types Auto-UpdatedThe generated TypeScript types have been automatically updated based on JSON schema changes in this PR. |
There was a problem hiding this comment.
Pull request overview
This PR fixes unintended cross-entity search results caused by Elasticsearch alias expansion by having the server resolve requested index aliases into an explicit list of concrete *_search_index names, controlled via two new flags: fetchParentsAliases and fetchChildAliases.
Changes:
- Adds
fetchParentsAliases/fetchChildAliasesrequest parameters (defaulting tofalse/true) to control parent/child alias expansion. - Implements server-side index resolution based on the
indexMapping.jsonparent/child alias graph, including reverse-map construction for O(1) child lookup. - Adds unit + integration tests covering alias resolution behavior and the table-vs-column regression.
Reviewed changes
Copilot reviewed 7 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-spec/src/main/resources/json/schema/search/searchRequest.json | Adds fetchParentsAliases / fetchChildAliases to the search request schema. |
| openmetadata-spec/src/main/resources/json/schema/search/aggregationRequest.json | Adds fetchParentsAliases / fetchChildAliases to the aggregation request schema. |
| openmetadata-spec/src/main/resources/json/schema/api/search/previewSearchRequest.json | Adds fetchParentsAliases / fetchChildAliases to the preview search request schema. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java | Builds reverse alias→children map and adds stateless resolver to expand aliases into explicit concrete indexes. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java | Wires new flags through search endpoints and uses repository resolver to compute the index list. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchRepositoryAliasResolutionTest.java | New unit tests validating resolver behavior, compound aliases, prefixing, and reverse-map consistency. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/SearchResourceIT.java | New integration tests validating the regression and flag propagation across endpoints. |
| String resolvedIndex = | ||
| searchRepository.getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases); | ||
| return searchRepository.searchByField(fieldName, fieldValue, resolvedIndex, deleted); |
There was a problem hiding this comment.
getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases) returns cluster-prefixed concrete index names. searchByField(...) is later executed by the search manager, which prefixes again via Entity.getSearchRepository().getIndexOrAliasName(index) (e.g., ElasticSearchSearchManager.searchByField / OpenSearchSearchManager.searchByField). With a non-empty clusterAlias, this will double-prefix (e.g., tenant_tenant_table_search_index) and break queries. Consider making SearchRepository.getIndexOrAliasName(String) idempotent (skip prefixing when already prefixed) or stop re-prefixing in the search managers for this call path, so the index is only prefixed once end-to-end.
| String resolvedIndex = | |
| searchRepository.getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases); | |
| return searchRepository.searchByField(fieldName, fieldValue, resolvedIndex, deleted); | |
| return searchRepository.searchByField(fieldName, fieldValue, index, deleted); |
| .withIndex( | ||
| searchRepository.getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases)) | ||
| .withFieldName(fieldName) | ||
| .withFieldValue(value) | ||
| .withSourceFields(SearchUtils.sourceFields(sourceFieldsParam)) | ||
| .withDeleted(deleted) | ||
| .withQueryText(queryText); | ||
| .withQueryText(queryText) | ||
| .withFetchParentsAliases(fetchParentsAliases) | ||
| .withFetchChildAliases(fetchChildAliases); |
There was a problem hiding this comment.
AggregationRequest.withIndex(searchRepository.getIndexOrAliasName(...)) sets a cluster-prefixed, resolved index list. The aggregation managers then call Entity.getSearchRepository().getIndexOrAliasName(request.getIndex()) again (see ElasticSearchAggregationManager.aggregate / OpenSearchAggregationManager.aggregate), which will double-prefix when clusterAlias is set and cause 404/missing-index failures. Either avoid prefixing here (let managers prefix once), or make SearchRepository.getIndexOrAliasName(String) idempotent / update managers to treat the request index as already resolved+prefixed.
| String resolvedIndex = | ||
| searchRepository.getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases); | ||
| return searchRepository.getEntityTypeCounts(request, resolvedIndex); |
There was a problem hiding this comment.
resolvedIndex is computed using searchRepository.getIndexOrAliasName(index, ...), which returns cluster-prefixed, resolved concrete indexes. getEntityTypeCounts then passes this to the aggregation manager, which again prefixes via Entity.getSearchRepository().getIndexOrAliasName(index) (see ElasticSearchAggregationManager.getEntityTypeCounts / OpenSearchAggregationManager.getEntityTypeCounts). With a non-empty clusterAlias this will double-prefix and break the request. Recommend ensuring the cluster alias is applied exactly once (idempotent prefixing or remove the second prefixing in managers).
| String resolvedIndex = | |
| searchRepository.getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases); | |
| return searchRepository.getEntityTypeCounts(request, resolvedIndex); | |
| // Do not pre-resolve the index here. The downstream entity type count path resolves | |
| // the index again, and passing an already cluster-prefixed concrete index can lead to | |
| // double-prefixing when a cluster alias is configured. | |
| return searchRepository.getEntityTypeCounts(request, index); |
| if (mapping != null) { | ||
| resolved.add(mapping.getIndexName(clusterAlias)); | ||
| if (fetchParents) { | ||
| addParentIndexes(mapping, entityIndexMap, clusterAlias, resolved); | ||
| } | ||
| if (fetchChildren) { | ||
| addChildIndexes(token, entityIndexMap, aliasToChildEntityTypes, clusterAlias, resolved); | ||
| } | ||
| return; | ||
| } | ||
| if (fetchChildren) { | ||
| addChildIndexes(token, entityIndexMap, aliasToChildEntityTypes, clusterAlias, resolved); | ||
| } | ||
| boolean alreadyExpanded = | ||
| fetchChildren | ||
| && aliasToChildEntityTypes != null | ||
| && aliasToChildEntityTypes.containsKey(token); | ||
| if (!alreadyExpanded) { | ||
| resolved.add(prefixWithClusterAlias(token, clusterAlias)); | ||
| } |
There was a problem hiding this comment.
resolveIndexes(...) applies the cluster prefix via mapping.getIndexName(clusterAlias) / prefixWithClusterAlias(...). Several downstream code paths (notably aggregation and fieldQuery) already call SearchRepository.getIndexOrAliasName(...) to apply the cluster alias, which will lead to double-prefixing unless prefixing is made idempotent. Consider either (1) returning unprefixed resolved names from resolveIndexes and applying the cluster alias in one consistent layer, or (2) updating getIndexOrAliasName(String) to detect and skip already-prefixed tokens.
| @Test | ||
| void defaultFlagsLeaveTokenUnchangedWhenAliasHasNoExpansion() { | ||
| String resolved = | ||
| SearchRepository.resolveIndexes( | ||
| "table", false, false, entityIndexMap, aliasToChildEntityTypes, ""); | ||
| assertEquals("table_search_index", resolved); | ||
| } |
There was a problem hiding this comment.
Test name defaultFlagsLeaveTokenUnchangedWhenAliasHasNoExpansion is misleading: it calls resolveIndexes(..., fetchParents=false, fetchChildren=false, ...), but the API default for fetchChildAliases is true. Consider renaming the test to reflect the actual flag values under test (e.g., explicitly “fetchChildrenFalse…”), or update it to exercise the real default behavior.
Flip the default so the requested index is honored as-is; callers must opt in to child-alias expansion explicitly. This makes the bug-fix path the default behavior instead of opt-in, which matches what the UI now relies on for index=table not to bleed in column hits. Also flips the integration test that asserted the old default to exercise the explicit fetchChildAliases=true opt-in path.
✅ TypeScript Types Auto-UpdatedThe generated TypeScript types have been automatically updated based on JSON schema changes in this PR. |
| @@ -512,7 +561,9 @@ public Response searchWithNLQ( | |||
| new SearchRequest() | |||
| .withQuery(nlqQuery) | |||
There was a problem hiding this comment.
@mohityadav766 not sure about this NLQ endpoint, but search bar + askCollate are currently using the one defined on collate at HybridSearchResource.java - Will need to update that one too
# Conflicts: # openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java
…etadata into fix-cluster-aliasing
🟡 Playwright Results — all passed (15 flaky)✅ 3967 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 86 skipped
🟡 15 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
…ctive string filters
Replace the boolean flags with comma-separated string filters so callers
can pin expansion to specific entity types instead of "all or nothing":
* / all / true — expand every parent or child in the alias graph
none / "" / false — no expansion (just the requested index)
table,topic — expand only the listed entity types
Defaults at the OSS API surface flip to ("none", "none") so a query for
{index=table} is scoped to table_search_index and doesn't bleed in
column docs. The Collate-side hybrid endpoints keep ("none", "*") as
their default since the chat / askCollate path scores across the whole
alias graph and that's what callers expect there.
Plumbed end-to-end:
- searchRequest / aggregationRequest / previewSearchRequest schemas
now type the fields as String with the new defaults.
- SearchRepository.getIndexOrAliasName(name, parents, children) and
SearchRepository.resolveIndexes take String filters; AliasFilter is
a small parsed-filter helper with NONE / ALL sentinels and a
Set-backed accepts() check, so the comma-list path stays O(1).
- SearchManagementClient default-method overload, ES/OS Search and
Aggregation managers, and ES/OS clients all carry the strings.
- SearchResource @QueryParam types switch to String with @DefaultValue
annotations matching the schema defaults.
- HybridSearchResource / HybridSearchService / NLQHybridSearchService
on the Collate side use Strings; their internal default delegations
pass ("none", "*") to honor the hybrid contract.
Legacy "true" / "false" tokens are still accepted (aliased to "*" / "none")
so any client still on the old boolean syntax doesn't break.
Tests:
- Unit: parameterised tests for none, *, named filter, mixed list,
legacy-boolean alias, comma-separated input.
- Integration: tests rewritten as comparison tests anchored on
explicit fetchChildAliases=* (must surface tableColumn) and the
new default (must drop tableColumn). Adds a named-filter IT that
pins the comma-list semantics: =tableColumn includes columns,
=topic does not.
…etadata into fix-cluster-aliasing
The 'true' / 'false' tokens were originally added thinking older clients might still send the boolean form, but those clients never existed — boolean fetchParentsAliases / fetchChildAliases only lived in earlier commits of this PR, never in a release. Strip the dead branches in AliasFilter.parse, the matching unit test, and the "Legacy true / false are aliased to */none" sentence from the schema and @parameter / Javadoc descriptions so readers don't infer there's external compatibility surface to preserve.
…etadata into fix-cluster-aliasing
| * reproduces under the legacy default (children expanded) and disappears under the explicit | ||
| * opt-out — a comparison test, not a vacuous "no columns" check that would pass on an empty | ||
| * response. |
There was a problem hiding this comment.
This test’s docstring says the bug “reproduces under the legacy default (children expanded)”, but the test actually opts into that behavior via fetchChildAliases=*. Consider updating the wording to avoid implying that the current default is “children expanded” (especially since the assertion below expects the default request to exclude tableColumn hits).
| * reproduces under the legacy default (children expanded) and disappears under the explicit | |
| * opt-out — a comparison test, not a vacuous "no columns" check that would pass on an empty | |
| * response. | |
| * reproduces when child alias expansion is explicitly enabled via | |
| * {@code fetchChildAliases=*}, and that the default request excludes those child hits — a | |
| * comparison test, not a vacuous "no columns" check that would pass on an empty response. |
| /** | ||
| * Reverse map: alias name -> entity types whose IndexMapping declares this alias as a parent. | ||
| * Lets us answer "what are the children of alias X?" in O(1). Built from indexMapping.json on | ||
| * load and used by {@link #getIndexOrAliasName(String, boolean, boolean)}. |
There was a problem hiding this comment.
The Javadoc references getIndexOrAliasName(String, boolean, boolean), but the added overload takes (String, String, String) (string filter tokens). This is a broken Javadoc link and misleading documentation. Update the reference to the correct overload/signature.
| * load and used by {@link #getIndexOrAliasName(String, boolean, boolean)}. | |
| * load and used by {@link #getIndexOrAliasName(String, String, String)}. |
| * parent / child entity indexes are also included — see {@link #aliasFilter(String)} for | ||
| * the accepted syntax ({@code *}, {@code none}, comma-separated entity types). |
There was a problem hiding this comment.
This Javadoc points to aliasFilter(String), but no such method exists (the parsing logic lives in AliasFilter.parse). Please fix the Javadoc reference so it resolves correctly and matches the implementation.
| * parent / child entity indexes are also included — see {@link #aliasFilter(String)} for | |
| * the accepted syntax ({@code *}, {@code none}, comma-separated entity types). | |
| * parent / child entity indexes are also included — see {@link AliasFilter#parse(String)} | |
| * for the accepted syntax ({@code *}, {@code none}, comma-separated entity types). |
| /** | ||
| * Variant that honors {@code fetchParentsAliases} / {@code fetchChildAliases} when resolving | ||
| * the supplied {@code index} alias. Lets the alias-graph traversal happen exactly once, at | ||
| * the manager boundary, instead of the resource pre-resolving and the manager re-prefixing. | ||
| * Filter syntax: {@code "*"} (or {@code "all"}) for every alias, {@code "none"} (or empty) for | ||
| * none, or a comma-separated list of specific entity types. Default implementation delegates | ||
| * to the legacy 4-arg signature so existing implementations pick up the conservative behavior | ||
| * (no expansion) until they implement the filter-aware path. | ||
| */ | ||
| default Response searchByField( | ||
| String fieldName, | ||
| String fieldValue, | ||
| String index, | ||
| Boolean deleted, | ||
| String fetchParentsAliases, | ||
| String fetchChildAliases) | ||
| throws IOException { | ||
| return searchByField(fieldName, fieldValue, index, deleted); | ||
| } | ||
|
|
There was a problem hiding this comment.
The Javadoc says the default implementation delegates to the legacy 4-arg searchByField so existing implementations keep the “children expanded, parents not” behavior. However, the updated managers route the legacy 4-arg signature to the new overload with "none"/"none", which disables expansion. Either update the Javadoc to reflect the actual default behavior or change the delegation/defaults if preserving legacy expansion is required.
✅ TypeScript Types Auto-UpdatedThe generated TypeScript types have been automatically updated based on JSON schema changes in this PR. |
# Conflicts: # openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java # openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java # openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java
…etadata into fix-cluster-aliasing
Code Review ✅ Approved 5 resolved / 5 findingsScoped index resolution via fetchParentsAliases and fetchChildAliases flags is now implemented. Addressed findings related to cluster-alias prefixing, documentation accuracy, idempotency, and null-coalescing logic. ✅ 5 resolved✅ Bug: Double cluster-alias prefixing in aggregate endpoints
✅ Quality: Javadoc claims method returns parsed JSON but signature is void
✅ Bug: Removing cluster-alias idempotency guard makes prefixing fragile
✅ Edge Case: getIndexOrAliasName can return empty string for all-empty input
✅ Quality: Asymmetric null-coalescing for fetchParentsAliases vs fetchChildAliases
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
| if (expandChildren) { | ||
| addChildIndexes( | ||
| token, childFilter, entityIndexMap, aliasToChildEntityTypes, clusterAlias, resolved); | ||
| } | ||
| boolean alreadyExpanded = | ||
| expandChildren | ||
| && aliasToChildEntityTypes != null | ||
| && aliasToChildEntityTypes.containsKey(token); | ||
| if (!alreadyExpanded) { |
There was a problem hiding this comment.
In expandSingleAlias, a compound alias token (mapping == null) is treated as “alreadyExpanded” solely based on aliasToChildEntityTypes.containsKey(token). If fetchChildAliases is a named filter that matches none of that alias’s children, addChildIndexes(...) adds nothing, alreadyExpanded is still true, and the original token is dropped from the final index list whenever some other token resolves successfully. This can silently remove an explicitly requested token (e.g. index=dataAsset,table&fetchChildAliases=tableColumn would omit dataAsset). Consider tracking whether any child index was actually added for this token (e.g. compare resolved.size() before/after) and only suppress the fallback token when at least one child index was emitted.
| if (expandChildren) { | |
| addChildIndexes( | |
| token, childFilter, entityIndexMap, aliasToChildEntityTypes, clusterAlias, resolved); | |
| } | |
| boolean alreadyExpanded = | |
| expandChildren | |
| && aliasToChildEntityTypes != null | |
| && aliasToChildEntityTypes.containsKey(token); | |
| if (!alreadyExpanded) { | |
| boolean childIndexesAdded = false; | |
| if (expandChildren) { | |
| int resolvedSizeBeforeChildren = resolved.size(); | |
| addChildIndexes( | |
| token, childFilter, entityIndexMap, aliasToChildEntityTypes, clusterAlias, resolved); | |
| childIndexesAdded = resolved.size() > resolvedSizeBeforeChildren; | |
| } | |
| if (!childIndexesAdded) { |
| * The flag must also be honored on the streaming export endpoint. Comparison test: default | ||
| * export carries column rows; explicit fetchChildAliases=false drops them. |
There was a problem hiding this comment.
This Javadoc still refers to fetchChildAliases=false, but the implementation and tests use the new string-based syntax (none / * / comma-separated entity types). Update the comment to match the actual supported values to avoid confusing future maintainers and reviewers.
| * The flag must also be honored on the streaming export endpoint. Comparison test: default | |
| * export carries column rows; explicit fetchChildAliases=false drops them. | |
| * The child-alias filter must also be honored on the streaming export endpoint. Comparison | |
| * test: default export drops child rows, while explicit string-based filters such as | |
| * fetchChildAliases=* include them. Supported values use the current syntax: none, *, or a | |
| * comma-separated list of entity types. |
|
|
||
| /** | ||
| * The flag must propagate to /aggregate. Comparison test: default aggregation has a column | ||
| * bucket; explicit fetchChildAliases=false drops it. |
There was a problem hiding this comment.
This Javadoc mentions fetchChildAliases=false, but the endpoint uses the new string-based filter syntax (e.g. fetchChildAliases=none). Please update the comment so the documented behavior matches the actual API contract.
| * bucket; explicit fetchChildAliases=false drops it. | |
| * bucket; explicit fetchChildAliases=none drops it. |
|
|
|
did a small change via #27813 , this can be closed |



Summary
Closes #27761.
The UI now sends the alias
table(instead oftable_search_index), and Elasticsearch's alias expansion matches every index whose alias list containstable— includingcolumn_search_index, becausetableColumn.parentAliasescontainstableinindexMapping.json. Result: a UI search for "table" bleeds column hits back into the response. The same shape exists for every alias whose entities also act as a parent for some other entity (testCase,testSuite,database, …).This change stops relying on ES alias expansion. The server resolves the requested index against the
indexMapping.jsongraph into an explicit list of actual*_search_indexnames, gated by two new flags:fetchParentsAliasesfalsetrue, also include the indexes of entities listed in this entity'sparentAliases.fetchChildAliasestruetrue, also include the indexes of entities that list this alias in theirparentAliases. Defaulttruepreserves the legacy ES expansion behavior so existing callers don't break; the UI fixes the bug by passingfetchChildAliases=false.Wired through every search endpoint that takes an index/alias:
GET /v1/search/queryGET /v1/search/exportPOST /v1/search/previewGET /v1/search/nlq/queryGET /v1/search/fieldQueryGET /v1/search/aggregateandPOST /v1/search/aggregateGET /v1/search/entityTypeCountsThe resolver also handles compound aliases (
all,dataAsset,dataAssetEmbeddings) by expanding them through the reverse parent/child map (built once atloadIndexMappings()). Cluster-alias prefixing is preserved.Implementation notes
IndexMapping.jsonitself is unchanged — the existingparentAliases/childAliasesdeclarations drive the resolver.SearchRepository.loadIndexMappings()now also builds analiasToChildEntityTypesreverse map (alias→[entityType, …]for every entity that listsaliasas a parent), so child lookup is O(1).SearchRepository.resolveIndexes(...)so it's exercised directly in unit tests without bootstrapping a full repository.Test plan
mvn test -pl openmetadata-service -Dtest=SearchRepositoryAliasResolutionTest— new unit tests cover the default flags, child expansion, parent expansion, compound aliases (dataAsset), unknown tokens, cluster-alias prefixing, comma-separated tokens, and reverse-map consistency.mvn test -pl openmetadata-integration-tests -Dtest='SearchResourceIT#testFetchChildAliasesFalseExcludesColumns+testFetchChildAliasesDefaultIncludesColumns+testFetchParentsAliasesTrueIncludesParents+testFetchChildAliasesFalseOnExportEndpoint+testFetchChildAliasesFalseOnAggregate'— five new integration tests, including the regression for the table-vs-column bleed.🤖 Generated with Claude Code
Summary by Gitar
true/false) forfetchParentsAliasesandfetchChildAliasesacross documentation, API specs, and implementation code.SearchRepository.parseto strictly enforcenone/*/allsyntax for alias resolution by removing legacy boolean aliases.SearchManagementClientinterface documentation to reflect the removal of legacy boolean-token aliasing inAliasFilter.This will update automatically on new commits.