Skip to content

feat(search): scope index resolution with fetchParentsAliases / fetchChildAliases flags#27762

Closed
mohityadav766 wants to merge 26 commits intomainfrom
fix-cluster-aliasing
Closed

feat(search): scope index resolution with fetchParentsAliases / fetchChildAliases flags#27762
mohityadav766 wants to merge 26 commits intomainfrom
fix-cluster-aliasing

Conversation

@mohityadav766
Copy link
Copy Markdown
Member

@mohityadav766 mohityadav766 commented Apr 27, 2026

Summary

Closes #27761.

The UI now sends the alias table (instead of table_search_index), and Elasticsearch's alias expansion matches every index whose alias list contains table — including column_search_index, because tableColumn.parentAliases contains table in indexMapping.json. Result: a UI search for "table" bleeds column hits back into the response. The same shape exists for every alias whose entities also act as a parent for some other entity (testCase, testSuite, database, …).

This change stops relying on ES alias expansion. The server resolves the requested index against the indexMapping.json graph into an explicit list of actual *_search_index names, gated by two new flags:

Param Default Meaning
fetchParentsAliases false If true, also include the indexes of entities listed in this entity's parentAliases.
fetchChildAliases true If true, also include the indexes of entities that list this alias in their parentAliases. Default true preserves the legacy ES expansion behavior so existing callers don't break; the UI fixes the bug by passing fetchChildAliases=false.

Wired through every search endpoint that takes an index/alias:

  • GET /v1/search/query
  • GET /v1/search/export
  • POST /v1/search/preview
  • GET /v1/search/nlq/query
  • GET /v1/search/fieldQuery
  • GET /v1/search/aggregate and POST /v1/search/aggregate
  • GET /v1/search/entityTypeCounts

The resolver also handles compound aliases (all, dataAsset, dataAssetEmbeddings) by expanding them through the reverse parent/child map (built once at loadIndexMappings()). Cluster-alias prefixing is preserved.

Implementation notes

  • IndexMapping.json itself is unchanged — the existing parentAliases / childAliases declarations drive the resolver.
  • SearchRepository.loadIndexMappings() now also builds an aliasToChildEntityTypes reverse map (alias[entityType, …] for every entity that lists alias as a parent), so child lookup is O(1).
  • The actual resolution lives in a static helper SearchRepository.resolveIndexes(...) so it's exercised directly in unit tests without bootstrapping a full repository.

Test plan

  • mvn test -pl openmetadata-service -Dtest=SearchRepositoryAliasResolutionTest — new unit tests cover the default flags, child expansion, parent expansion, compound aliases (dataAsset), unknown tokens, cluster-alias prefixing, comma-separated tokens, and reverse-map consistency.
  • mvn test -pl openmetadata-integration-tests -Dtest='SearchResourceIT#testFetchChildAliasesFalseExcludesColumns+testFetchChildAliasesDefaultIncludesColumns+testFetchParentsAliasesTrueIncludesParents+testFetchChildAliasesFalseOnExportEndpoint+testFetchChildAliasesFalseOnAggregate' — five new integration tests, including the regression for the table-vs-column bleed.
  • Smoke against a running server:
    curl '…/v1/search/query?q=*&index=table&fetchChildAliases=false&size=20' \
      | jq '[.hits.hits[]._source.entityType] | unique'
    # Expect: ["table"] — no "column" entries

🤖 Generated with Claude Code


Summary by Gitar

  • Cleanup:
    • Removed legacy boolean support (true/false) for fetchParentsAliases and fetchChildAliases across documentation, API specs, and implementation code.
  • Refactoring:
    • Updated SearchRepository.parse to strictly enforce none/*/all syntax for alias resolution by removing legacy boolean aliases.
    • Simplified SearchManagementClient interface documentation to reflect the removal of legacy boolean-token aliasing in AliasFilter.

This will update automatically on new commits.

…ope index resolution

Stop relying on Elasticsearch alias expansion to resolve UI-supplied indexes.
The UI now sends the alias `table` (instead of `table_search_index`), and ES
matches every index whose alias list contains `table` -- including
`column_search_index`, because tableColumn declares `table` as a parent. That
bleeds column hits into table searches.

The server now resolves the requested index against the indexMapping graph
into an explicit list of actual `*_search_index` names, gated by two new
flags:

- `fetchParentsAliases` (default false): include indexes of entities listed
  as parents in this entity's parentAliases.
- `fetchChildAliases` (default true): include indexes of entities that list
  this alias in their parentAliases. Default `true` preserves the legacy ES
  expansion behavior so existing callers don't change; the UI fixes the bug
  by passing `fetchChildAliases=false`.

Wired through `/search/query`, `/search/export`, `/search/preview`,
`/search/nlq/query`, `/search/fieldQuery`, `/search/aggregate` (GET + POST),
and `/search/entityTypeCounts`. The resolver also handles compound aliases
(`all`, `dataAsset`, `dataAssetEmbeddings`) by expanding them through the
reverse parent/child map. Cluster-alias prefixing is preserved.

Closes #27761
Copilot AI review requested due to automatic review settings April 27, 2026 11:35
@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Apr 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

@github-actions github-actions Bot requested a review from a team as a code owner April 27, 2026 11:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes unintended cross-entity search results caused by Elasticsearch alias expansion by having the server resolve requested index aliases into an explicit list of concrete *_search_index names, controlled via two new flags: fetchParentsAliases and fetchChildAliases.

Changes:

  • Adds fetchParentsAliases / fetchChildAliases request parameters (defaulting to false / true) to control parent/child alias expansion.
  • Implements server-side index resolution based on the indexMapping.json parent/child alias graph, including reverse-map construction for O(1) child lookup.
  • Adds unit + integration tests covering alias resolution behavior and the table-vs-column regression.

Reviewed changes

Copilot reviewed 7 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
openmetadata-spec/src/main/resources/json/schema/search/searchRequest.json Adds fetchParentsAliases / fetchChildAliases to the search request schema.
openmetadata-spec/src/main/resources/json/schema/search/aggregationRequest.json Adds fetchParentsAliases / fetchChildAliases to the aggregation request schema.
openmetadata-spec/src/main/resources/json/schema/api/search/previewSearchRequest.json Adds fetchParentsAliases / fetchChildAliases to the preview search request schema.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java Builds reverse alias→children map and adds stateless resolver to expand aliases into explicit concrete indexes.
openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java Wires new flags through search endpoints and uses repository resolver to compute the index list.
openmetadata-service/src/test/java/org/openmetadata/service/search/SearchRepositoryAliasResolutionTest.java New unit tests validating resolver behavior, compound aliases, prefixing, and reverse-map consistency.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/SearchResourceIT.java New integration tests validating the regression and flag propagation across endpoints.

Comment on lines +654 to +656
String resolvedIndex =
searchRepository.getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases);
return searchRepository.searchByField(fieldName, fieldValue, resolvedIndex, deleted);
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases) returns cluster-prefixed concrete index names. searchByField(...) is later executed by the search manager, which prefixes again via Entity.getSearchRepository().getIndexOrAliasName(index) (e.g., ElasticSearchSearchManager.searchByField / OpenSearchSearchManager.searchByField). With a non-empty clusterAlias, this will double-prefix (e.g., tenant_tenant_table_search_index) and break queries. Consider making SearchRepository.getIndexOrAliasName(String) idempotent (skip prefixing when already prefixed) or stop re-prefixing in the search managers for this call path, so the index is only prefixed once end-to-end.

Suggested change
String resolvedIndex =
searchRepository.getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases);
return searchRepository.searchByField(fieldName, fieldValue, resolvedIndex, deleted);
return searchRepository.searchByField(fieldName, fieldValue, index, deleted);

Copilot uses AI. Check for mistakes.
Comment on lines +752 to +760
.withIndex(
searchRepository.getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases))
.withFieldName(fieldName)
.withFieldValue(value)
.withSourceFields(SearchUtils.sourceFields(sourceFieldsParam))
.withDeleted(deleted)
.withQueryText(queryText);
.withQueryText(queryText)
.withFetchParentsAliases(fetchParentsAliases)
.withFetchChildAliases(fetchChildAliases);
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AggregationRequest.withIndex(searchRepository.getIndexOrAliasName(...)) sets a cluster-prefixed, resolved index list. The aggregation managers then call Entity.getSearchRepository().getIndexOrAliasName(request.getIndex()) again (see ElasticSearchAggregationManager.aggregate / OpenSearchAggregationManager.aggregate), which will double-prefix when clusterAlias is set and cause 404/missing-index failures. Either avoid prefixing here (let managers prefix once), or make SearchRepository.getIndexOrAliasName(String) idempotent / update managers to treat the request index as already resolved+prefixed.

Copilot uses AI. Check for mistakes.
Comment on lines +866 to +868
String resolvedIndex =
searchRepository.getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases);
return searchRepository.getEntityTypeCounts(request, resolvedIndex);
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolvedIndex is computed using searchRepository.getIndexOrAliasName(index, ...), which returns cluster-prefixed, resolved concrete indexes. getEntityTypeCounts then passes this to the aggregation manager, which again prefixes via Entity.getSearchRepository().getIndexOrAliasName(index) (see ElasticSearchAggregationManager.getEntityTypeCounts / OpenSearchAggregationManager.getEntityTypeCounts). With a non-empty clusterAlias this will double-prefix and break the request. Recommend ensuring the cluster alias is applied exactly once (idempotent prefixing or remove the second prefixing in managers).

Suggested change
String resolvedIndex =
searchRepository.getIndexOrAliasName(index, fetchParentsAliases, fetchChildAliases);
return searchRepository.getEntityTypeCounts(request, resolvedIndex);
// Do not pre-resolve the index here. The downstream entity type count path resolves
// the index again, and passing an already cluster-prefixed concrete index can lead to
// double-prefixing when a cluster alias is configured.
return searchRepository.getEntityTypeCounts(request, index);

Copilot uses AI. Check for mistakes.
Comment on lines +591 to +610
if (mapping != null) {
resolved.add(mapping.getIndexName(clusterAlias));
if (fetchParents) {
addParentIndexes(mapping, entityIndexMap, clusterAlias, resolved);
}
if (fetchChildren) {
addChildIndexes(token, entityIndexMap, aliasToChildEntityTypes, clusterAlias, resolved);
}
return;
}
if (fetchChildren) {
addChildIndexes(token, entityIndexMap, aliasToChildEntityTypes, clusterAlias, resolved);
}
boolean alreadyExpanded =
fetchChildren
&& aliasToChildEntityTypes != null
&& aliasToChildEntityTypes.containsKey(token);
if (!alreadyExpanded) {
resolved.add(prefixWithClusterAlias(token, clusterAlias));
}
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolveIndexes(...) applies the cluster prefix via mapping.getIndexName(clusterAlias) / prefixWithClusterAlias(...). Several downstream code paths (notably aggregation and fieldQuery) already call SearchRepository.getIndexOrAliasName(...) to apply the cluster alias, which will lead to double-prefixing unless prefixing is made idempotent. Consider either (1) returning unprefixed resolved names from resolveIndexes and applying the cluster alias in one consistent layer, or (2) updating getIndexOrAliasName(String) to detect and skip already-prefixed tokens.

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +46
@Test
void defaultFlagsLeaveTokenUnchangedWhenAliasHasNoExpansion() {
String resolved =
SearchRepository.resolveIndexes(
"table", false, false, entityIndexMap, aliasToChildEntityTypes, "");
assertEquals("table_search_index", resolved);
}
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test name defaultFlagsLeaveTokenUnchangedWhenAliasHasNoExpansion is misleading: it calls resolveIndexes(..., fetchParents=false, fetchChildren=false, ...), but the API default for fetchChildAliases is true. Consider renaming the test to reflect the actual flag values under test (e.g., explicitly “fetchChildrenFalse…”), or update it to exercise the real default behavior.

Copilot uses AI. Check for mistakes.
mohityadav766 and others added 2 commits April 27, 2026 17:20
Flip the default so the requested index is honored as-is; callers must
opt in to child-alias expansion explicitly. This makes the bug-fix path
the default behavior instead of opt-in, which matches what the UI now
relies on for index=table not to bleed in column hits.

Also flips the integration test that asserted the old default to
exercise the explicit fetchChildAliases=true opt-in path.
@github-actions
Copy link
Copy Markdown
Contributor

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

Copilot AI review requested due to automatic review settings April 27, 2026 11:56
@mohityadav766 mohityadav766 review requested due to automatic review settings April 27, 2026 11:56
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 61%
62.06% (62281/100356) 42.24% (33375/79007) 45.29% (9896/21850)

@@ -512,7 +561,9 @@ public Response searchWithNLQ(
new SearchRequest()
.withQuery(nlqQuery)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohityadav766 not sure about this NLQ endpoint, but search bar + askCollate are currently using the one defined on collate at HybridSearchResource.java - Will need to update that one too

# Conflicts:
#	openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java
Copilot AI review requested due to automatic review settings April 27, 2026 16:30
@mohityadav766 mohityadav766 self-assigned this Apr 27, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 10 changed files in this pull request and generated 6 comments.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

🟡 Playwright Results — all passed (15 flaky)

✅ 3967 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 740 0 5 8
🟡 Shard 3 752 0 3 7
✅ Shard 4 759 0 0 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 731 0 6 8
🟡 15 flaky test(s) (passed on retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/BulkEditEntity.spec.ts › Table (shard 2, 1 retry)
  • Features/BulkImport.spec.ts › Keyboard Delete selection (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Features/UserProfileOnlineStatus.spec.ts › Should not show online status for inactive users (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/Glossary.spec.ts › Create glossary with all optional fields (tags, owners, reviewers, domain) (shard 6, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for dashboard -> searchIndex (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

mohityadav766 and others added 3 commits April 29, 2026 11:29
…ctive string filters

Replace the boolean flags with comma-separated string filters so callers
can pin expansion to specific entity types instead of "all or nothing":

  *  / all  / true   — expand every parent or child in the alias graph
  none / "" / false  — no expansion (just the requested index)
  table,topic        — expand only the listed entity types

Defaults at the OSS API surface flip to ("none", "none") so a query for
{index=table} is scoped to table_search_index and doesn't bleed in
column docs. The Collate-side hybrid endpoints keep ("none", "*") as
their default since the chat / askCollate path scores across the whole
alias graph and that's what callers expect there.

Plumbed end-to-end:
  - searchRequest / aggregationRequest / previewSearchRequest schemas
    now type the fields as String with the new defaults.
  - SearchRepository.getIndexOrAliasName(name, parents, children) and
    SearchRepository.resolveIndexes take String filters; AliasFilter is
    a small parsed-filter helper with NONE / ALL sentinels and a
    Set-backed accepts() check, so the comma-list path stays O(1).
  - SearchManagementClient default-method overload, ES/OS Search and
    Aggregation managers, and ES/OS clients all carry the strings.
  - SearchResource @QueryParam types switch to String with @DefaultValue
    annotations matching the schema defaults.
  - HybridSearchResource / HybridSearchService / NLQHybridSearchService
    on the Collate side use Strings; their internal default delegations
    pass ("none", "*") to honor the hybrid contract.

Legacy "true" / "false" tokens are still accepted (aliased to "*" / "none")
so any client still on the old boolean syntax doesn't break.

Tests:
  - Unit: parameterised tests for none, *, named filter, mixed list,
    legacy-boolean alias, comma-separated input.
  - Integration: tests rewritten as comparison tests anchored on
    explicit fetchChildAliases=* (must surface tableColumn) and the
    new default (must drop tableColumn). Adds a named-filter IT that
    pins the comma-list semantics: =tableColumn includes columns,
    =topic does not.
Copilot AI review requested due to automatic review settings April 29, 2026 06:03
The 'true' / 'false' tokens were originally added thinking older clients
might still send the boolean form, but those clients never existed —
boolean fetchParentsAliases / fetchChildAliases only lived in earlier
commits of this PR, never in a release. Strip the dead branches in
AliasFilter.parse, the matching unit test, and the "Legacy true / false
are aliased to */none" sentence from the schema and @parameter / Javadoc
descriptions so readers don't infer there's external compatibility
surface to preserve.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 17 changed files in this pull request and generated 4 comments.

Comment on lines +1983 to +1985
* reproduces under the legacy default (children expanded) and disappears under the explicit
* opt-out — a comparison test, not a vacuous "no columns" check that would pass on an empty
* response.
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test’s docstring says the bug “reproduces under the legacy default (children expanded)”, but the test actually opts into that behavior via fetchChildAliases=*. Consider updating the wording to avoid implying that the current default is “children expanded” (especially since the assertion below expects the default request to exclude tableColumn hits).

Suggested change
* reproduces under the legacy default (children expanded) and disappears under the explicit
* opt-outa comparison test, not a vacuous "no columns" check that would pass on an empty
* response.
* reproduces when child alias expansion is explicitly enabled via
* {@code fetchChildAliases=*}, and that the default request excludes those child hitsa
* comparison test, not a vacuous "no columns" check that would pass on an empty response.

Copilot uses AI. Check for mistakes.
/**
* Reverse map: alias name -> entity types whose IndexMapping declares this alias as a parent.
* Lets us answer "what are the children of alias X?" in O(1). Built from indexMapping.json on
* load and used by {@link #getIndexOrAliasName(String, boolean, boolean)}.
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Javadoc references getIndexOrAliasName(String, boolean, boolean), but the added overload takes (String, String, String) (string filter tokens). This is a broken Javadoc link and misleading documentation. Update the reference to the correct overload/signature.

Suggested change
* load and used by {@link #getIndexOrAliasName(String, boolean, boolean)}.
* load and used by {@link #getIndexOrAliasName(String, String, String)}.

Copilot uses AI. Check for mistakes.
Comment on lines +702 to +703
* parent / child entity indexes are also included — see {@link #aliasFilter(String)} for
* the accepted syntax ({@code *}, {@code none}, comma-separated entity types).
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Javadoc points to aliasFilter(String), but no such method exists (the parsing logic lives in AliasFilter.parse). Please fix the Javadoc reference so it resolves correctly and matches the implementation.

Suggested change
* parent / child entity indexes are also includedsee {@link #aliasFilter(String)} for
* the accepted syntax ({@code *}, {@code none}, comma-separated entity types).
* parent / child entity indexes are also includedsee {@link AliasFilter#parse(String)}
* for the accepted syntax ({@code *}, {@code none}, comma-separated entity types).

Copilot uses AI. Check for mistakes.
Comment on lines +76 to +95
/**
* Variant that honors {@code fetchParentsAliases} / {@code fetchChildAliases} when resolving
* the supplied {@code index} alias. Lets the alias-graph traversal happen exactly once, at
* the manager boundary, instead of the resource pre-resolving and the manager re-prefixing.
* Filter syntax: {@code "*"} (or {@code "all"}) for every alias, {@code "none"} (or empty) for
* none, or a comma-separated list of specific entity types. Default implementation delegates
* to the legacy 4-arg signature so existing implementations pick up the conservative behavior
* (no expansion) until they implement the filter-aware path.
*/
default Response searchByField(
String fieldName,
String fieldValue,
String index,
Boolean deleted,
String fetchParentsAliases,
String fetchChildAliases)
throws IOException {
return searchByField(fieldName, fieldValue, index, deleted);
}

Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Javadoc says the default implementation delegates to the legacy 4-arg searchByField so existing implementations keep the “children expanded, parents not” behavior. However, the updated managers route the legacy 4-arg signature to the new overload with "none"/"none", which disables expansion. Either update the Javadoc to reflect the actual default behavior or change the delegation/defaults if preserving legacy expansion is required.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

# Conflicts:
#	openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java
#	openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java
#	openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java
Copilot AI review requested due to automatic review settings April 29, 2026 07:02
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 29, 2026

Code Review ✅ Approved 5 resolved / 5 findings

Scoped index resolution via fetchParentsAliases and fetchChildAliases flags is now implemented. Addressed findings related to cluster-alias prefixing, documentation accuracy, idempotency, and null-coalescing logic.

✅ 5 resolved
Bug: Double cluster-alias prefixing in aggregate endpoints

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java:752-753 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java:789-791
Both ElasticSearchAggregationManager.aggregate() and OpenSearchAggregationManager.aggregate() internally call the old single-argument getIndexOrAliasName(request.getIndex()), which prepends clusterAlias_ to every token. However, SearchResource already resolves the index via the new 3-arg getIndexOrAliasName(index, fetchParents, fetchChildren), which also applies the cluster prefix (via IndexMapping.getIndexName(clusterAlias)). When clusterAlias is non-empty (multi-tenant deployments), the final index name becomes tenant_tenant_table_search_index — a non-existent index, causing every aggregate query to fail.

The same pattern applies to both the GET /v1/search/aggregate (line 753) and the POST /v1/search/aggregate (line 789-791) code paths in SearchResource.

Quality: Javadoc claims method returns parsed JSON but signature is void

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/SearchResourceIT.java:1951-1955
The Javadoc for awaitTableAndColumnIndexed states "Returns the parsed JSON of the satisfying response so callers can reuse it", but the method signature is private void. Either update the doc to match the void return type, or change the method to return the JsonNode if callers would benefit from reuse.

Bug: Removing cluster-alias idempotency guard makes prefixing fragile

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:672-674 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:806-809 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:816-818
The previous code in getIndexOrAliasName(String), prefixWithClusterAlias, and prefixCommaList guarded against double-prefixing with token.startsWith(prefix) ? token : prefix + token. This commit removes that guard, unconditionally prepending clusterAlias + ":". While current call sites appear safe (the 1-arg method is only called on raw constants like GLOBAL_SEARCH_ALIAS, and the 3-arg version is called exactly once at the manager boundary), this makes the API fragile: any future caller that accidentally passes an already-resolved value will silently produce tenant:tenant:index. The guard was cheap (one startsWith check) and the commit message itself mentions preventing double-prefixing as a goal.

Edge Case: getIndexOrAliasName can return empty string for all-empty input

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:679-683 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:836
When clusterAlias is set and the input is nothing but commas/whitespace (e.g. "," or ", ,"), getIndexOrAliasName (line 679-683) will return an empty string after filtering. In contrast, prefixCommaList (line 836) explicitly handles this case by returning the original input unchanged. While this is an unlikely edge case, the inconsistency could surface a confusing empty-index ES error if the degenerate input ever reaches this code path.

Quality: Asymmetric null-coalescing for fetchParentsAliases vs fetchChildAliases

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java:448-451 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchAggregationManager.java:95-96 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchAggregationManager.java:630-631 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchAggregationManager.java:94-95 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchAggregationManager.java:519-520
Throughout the aggregation managers and previewSearch, fetchChildAliases is defensively null-coalesced (== null ? "none" : ...) but fetchParentsAliases is passed directly to getIndexOrAliasName. While AliasFilter.parse(null) does return NONE (so there's no functional bug), the inconsistency makes the intent unclear and is a maintenance hazard — a future refactor of parse() could break the null-is-safe assumption.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 17 changed files in this pull request and generated 3 comments.

Comment on lines +826 to +834
if (expandChildren) {
addChildIndexes(
token, childFilter, entityIndexMap, aliasToChildEntityTypes, clusterAlias, resolved);
}
boolean alreadyExpanded =
expandChildren
&& aliasToChildEntityTypes != null
&& aliasToChildEntityTypes.containsKey(token);
if (!alreadyExpanded) {
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In expandSingleAlias, a compound alias token (mapping == null) is treated as “alreadyExpanded” solely based on aliasToChildEntityTypes.containsKey(token). If fetchChildAliases is a named filter that matches none of that alias’s children, addChildIndexes(...) adds nothing, alreadyExpanded is still true, and the original token is dropped from the final index list whenever some other token resolves successfully. This can silently remove an explicitly requested token (e.g. index=dataAsset,table&fetchChildAliases=tableColumn would omit dataAsset). Consider tracking whether any child index was actually added for this token (e.g. compare resolved.size() before/after) and only suppress the fallback token when at least one child index was emitted.

Suggested change
if (expandChildren) {
addChildIndexes(
token, childFilter, entityIndexMap, aliasToChildEntityTypes, clusterAlias, resolved);
}
boolean alreadyExpanded =
expandChildren
&& aliasToChildEntityTypes != null
&& aliasToChildEntityTypes.containsKey(token);
if (!alreadyExpanded) {
boolean childIndexesAdded = false;
if (expandChildren) {
int resolvedSizeBeforeChildren = resolved.size();
addChildIndexes(
token, childFilter, entityIndexMap, aliasToChildEntityTypes, clusterAlias, resolved);
childIndexesAdded = resolved.size() > resolvedSizeBeforeChildren;
}
if (!childIndexesAdded) {

Copilot uses AI. Check for mistakes.
Comment on lines +2094 to +2095
* The flag must also be honored on the streaming export endpoint. Comparison test: default
* export carries column rows; explicit fetchChildAliases=false drops them.
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Javadoc still refers to fetchChildAliases=false, but the implementation and tests use the new string-based syntax (none / * / comma-separated entity types). Update the comment to match the actual supported values to avoid confusing future maintainers and reviewers.

Suggested change
* The flag must also be honored on the streaming export endpoint. Comparison test: default
* export carries column rows; explicit fetchChildAliases=false drops them.
* The child-alias filter must also be honored on the streaming export endpoint. Comparison
* test: default export drops child rows, while explicit string-based filters such as
* fetchChildAliases=* include them. Supported values use the current syntax: none, *, or a
* comma-separated list of entity types.

Copilot uses AI. Check for mistakes.

/**
* The flag must propagate to /aggregate. Comparison test: default aggregation has a column
* bucket; explicit fetchChildAliases=false drops it.
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Javadoc mentions fetchChildAliases=false, but the endpoint uses the new string-based filter syntax (e.g. fetchChildAliases=none). Please update the comment so the documented behavior matches the actual API contract.

Suggested change
* bucket; explicit fetchChildAliases=false drops it.
* bucket; explicit fetchChildAliases=none drops it.

Copilot uses AI. Check for mistakes.
@sonarqubecloud
Copy link
Copy Markdown

@sonarqubecloud
Copy link
Copy Markdown

@mohityadav766
Copy link
Copy Markdown
Member Author

did a small change via #27813 , this can be closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue with Aliases in Search

3 participants