fix(search): enforce ContextMemory visibility at index time, not per-query by harshach · Pull Request #29541 · open-metadata/OpenMetadata

harshach · 2026-06-26T19:04:06Z

Describe your changes:

This addresses a security finding from the review of #29384: Private/Shared ContextMemory could still leak through search paths that the per-query applyContextMemoryVisibility filter didn't cover. Instead of filtering on every search path (fragile — the review found three uncovered ones), the rule is enforced once at index time: only org-wide (ENTITY) memories are written to the search index. ContextMemoryRepository.isSearchIndexable gates the live create/update/embedding paths and ContextMemoryRepository.getReindexFilter() applies the same visibility == ENTITY rule (a cross-DB JSON ListFilter condition) to the bulk-reindex reader and its entity counts, so partition totals stay balanced and the staged index promotes. The applyContextMemoryVisibility apparatus is removed from both the Elasticsearch and OpenSearch managers; the REST /contextCenter/memories visibility checks are unchanged.

Type of change:

Bug fix

High-level design:

Two indexing pipelines need the rule, so it lives entirely in ContextMemoryRepository (overriding trivial EntityRepository base hooks): isSearchIndexable(entity) for the live single-entity path (SearchRepository.createEntityIndex/updateEntityIndex, VectorEmbeddingHandler), and getReindexFilter() (a shareConfig.visibility = 'Entity' JSON condition on ListFilter) for the bulk reindex — applied at the reader (PaginatedEntitiesSource) and every entity-count site (PartitionCalculator / ReindexingOrchestrator / DistributedIndexingStrategy). Count and read must share the same filter because reindex promotion compares reader-total vs sink-success (StatsReconciler / EntityPromotionContext); excluding from the read but not the count would collapse the success ratio and block index promotion. No data migration is needed — ContextMemory search indexing is unreleased (added post-1.13, only on 2.0.0-SNAPSHOT), so the feature ships with this fix.

Tests:

Use cases covered

Creating a PRIVATE or SHARED memory: never written to the search index (live path).
An ENTITY (org-wide) memory: searchable.
Flipping a memory ENTITY→PRIVATE removes it from the index live; PRIVATE→ENTITY re-indexes it.
A full reindex rebuilds only ENTITY memories, with balanced counts so the staged index promotes.

Unit tests

I added unit tests for the new/changed logic.
ContextMemoryRepositoryTest — isSearchIndexable across ENTITY/PRIVATE/SHARED/missing-shareConfig + getReindexFilter.
ListFilterTest — the generated memorySearchVisibility SQL condition (MySQL + Postgres forms).

Backend integration tests

I added integration tests in openmetadata-integration-tests/.
ContextMemoryIT.privateAndSharedMemoriesAreExcludedFromSearchIndex and ContextMemoryIT.memoryVisibilityFlipAddsAndRemovesFromSearchIndex.

Ingestion integration tests

Not applicable (no ingestion changes).

Playwright (UI) tests

Not applicable (no UI changes).

Manual testing performed

mvn compile + spotless:check on openmetadata-service and openmetadata-integration-tests: pass.
Unit tests green: ContextMemoryRepositoryTest, ListFilterTest, ReindexingUtilTest. The two ContextMemory ITs require the Docker stack and were not run locally.

UI screen recording / screenshots:

Not applicable.

Checklist:

I have read the CONTRIBUTING document.
I have commented on my code, particularly in hard-to-understand areas.
I have added tests (unit / integration as applicable) and listed them above.

🤖 Generated with Claude Code

Greptile Summary

This PR moves ContextMemory visibility enforcement from per-query search filters to index time: only org-wide (ENTITY) memories are written to the search index, while PRIVATE and SHARED memories are excluded at create/update and during bulk reindex. The fragile per-query applyContextMemoryVisibility apparatus in both ES and OS managers is removed, replaced by two repository overrides (isSearchIndexable and getReindexFilter) and a shared ListFilter condition that keeps reader counts and partition totals aligned so staged-index promotion succeeds.

ContextMemoryRepository overrides isSearchIndexable (live single-entity gate) and getReindexFilter (DB-level filter for bulk reindex), with SearchRepository.updateEntityIndex calling deleteEntityIndex on a visibility flip to PRIVATE/SHARED.
All entity-count sites (PartitionCalculator, ReindexingOrchestrator, DistributedIndexingStrategy) and the bulk reader (PaginatedEntitiesSource) now use repository.getReindexFilter() so counts match what is actually indexed and partition promotion ratios stay correct.
ContextMemorySearchVisibility and its ES/OS wiring are deleted; the REST /contextCenter/memories visibility checks are unchanged.

Confidence Score: 5/5

Safe to merge; the privacy boundary is correctly enforced at index time across all live and bulk paths, and the old per-query filter is cleanly removed.

All four live indexing entry points (createEntityIndex, updateEntityIndex, the streaming-buffer batch path, and VectorEmbeddingHandler) now gate on isSearchIndexable, and every entity-count site shares the same getReindexFilter() so promotion ratios stay correct. The one gap found is PartitionWorker.initializeKeysetCursor fallback using new ListFilter(Include.ALL) rather than getReindexFilter(); this can cause ENTITY memories near a failure boundary to be skipped in that specific recovery scenario, but since entity reads in that same path do use getReindexFilter(), no PRIVATE/SHARED memory will ever reach the index. ContextMemory search indexing is also unreleased at the time of this change, limiting blast radius.

PartitionWorker.java line 655 — the fallback cursor in initializeKeysetCursor should use getReindexFilter() to stay consistent with the primary path.

Important Files Changed

Filename	Overview
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContextMemoryRepository.java	Adds isSearchIndexable (ENTITY-only) and getReindexFilter (visibility=Entity SQL param) as the single source of truth for memory search-visibility. Logic is correct; null shareConfig and null visibility both default to non-indexable.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java	Adds default isSearchIndexable (true) and getReindexFilter (Include.ALL) hooks for the base class; override points are clean and well-documented.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ListFilter.java	Adds MEMORY_SEARCH_VISIBILITY_PARAM constant and getMemorySearchVisibilityCondition(); both MySQL and PostgreSQL JSON paths look correct. Returns empty string when param is absent, so no impact on other entity types.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java	isSearchIndexable checks added to createEntityIndex, updateEntityIndex, bulkUpdateEntities loop, and the streaming-buffer bulk path. The ENTITY→PRIVATE flip correctly calls deleteEntityIndex on all four paths.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java	Removes applyContextMemoryVisibility calls and the ContextMemorySearchVisibility field from all three search entry points; field and factory import also cleaned up. Removal is complete and consistent.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java	Same as ElasticSearchSearchManager: applyContextMemoryVisibility removed from all four call sites, field and factory import cleaned up. Mirrored correctly.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionWorker.java	readEntitiesKeyset now uses PaginatedEntitiesSource which gets getReindexFilter() correctly. However, the initializeKeysetCursor fallback path (line 655) still uses new ListFilter(Include.ALL); misaligned cursors after batch failures could skip ENTITY memories.
openmetadata-service/src/main/java/org/openmetadata/service/workflows/searchIndex/PaginatedEntitiesSource.java	Both no-total and knownTotal constructors now call getReindexFilter(); the ListFilter and knownTotal constructors that accept an external filter are unchanged (correct, used by non-search-index callers).
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionCalculator.java	getRegularEntityCount now uses repository.getReindexFilter() instead of new ListFilter(Include.ALL); partition totals now match the filtered reader.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingOrchestrator.java	Entity total computation for index-promotion check now uses getReindexFilter(); count is consistent with what the reader indexes.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedIndexingStrategy.java	getEntityTypeCount now uses getReindexFilter() for the non-time-series branch; consistent with PartitionCalculator and ReindexingOrchestrator.
openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorEmbeddingHandler.java	Early-return guard added for non-indexable entities (PRIVATE/SHARED memories). Safe because entity-doc deletion in updateEntityIndex implicitly removes embeddings stored as document fields.
openmetadata-service/src/main/java/org/openmetadata/service/Entity.java	New static isSearchIndexable dispatches to the repository, defaulting to true for repository-less types. Logic and javadoc are correct.
openmetadata-service/src/test/java/org/openmetadata/service/jdbi3/ContextMemoryRepositoryTest.java	New unit tests cover ENTITY/PRIVATE/SHARED/null shareConfig for isSearchIndexable, the reindex filter param value, and the Entity facade dispatch including the repository-less default. Coverage is comprehensive.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ContextMemoryIT.java	Added privateAndSharedMemoriesAreExcludedFromSearchIndex and memoryVisibilityFlipAddsAndRemovesFromSearchIndex tests. The ENTITY-memory awaitility fence is sound; bare assertFalse for PRIVATE/SHARED absence is correct given isSearchIndexable blocks them synchronously.
openmetadata-service/src/main/java/org/openmetadata/service/search/security/ContextMemorySearchVisibility.java	Deleted — the per-query visibility filter is superseded by index-time enforcement. Removal is complete with no remaining references in the codebase.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ContextMemory create/update] --> B{isSearchIndexable?}
    B -- ENTITY visibility --> C[SearchRepository.createEntityIndex / updateEntityIndex]
    B -- PRIVATE or SHARED --> D[SearchRepository.deleteEntityIndex\n remove stale doc]
    B -- PRIVATE or SHARED --> E[VectorEmbeddingHandler\n early return]

    F[Bulk Reindex trigger] --> G[PaginatedEntitiesSource\n filter = getReindexFilter\n visibility = ENTITY only]
    G --> H[PartitionCalculator / ReindexingOrchestrator\n DistributedIndexingStrategy counts\n same getReindexFilter]
    H --> I{Count matches reader}
    I -- yes --> J[StatsReconciler ratio OK\n staged index promotes]
    G --> L[BulkSink writes ENTITY memories]
    L --> M[Search Index\n only ENTITY memories]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[ContextMemory create/update] --> B{isSearchIndexable?}
    B -- ENTITY visibility --> C[SearchRepository.createEntityIndex / updateEntityIndex]
    B -- PRIVATE or SHARED --> D[SearchRepository.deleteEntityIndex\n remove stale doc]
    B -- PRIVATE or SHARED --> E[VectorEmbeddingHandler\n early return]

    F[Bulk Reindex trigger] --> G[PaginatedEntitiesSource\n filter = getReindexFilter\n visibility = ENTITY only]
    G --> H[PartitionCalculator / ReindexingOrchestrator\n DistributedIndexingStrategy counts\n same getReindexFilter]
    H --> I{Count matches reader}
    I -- yes --> J[StatsReconciler ratio OK\n staged index promotes]
    G --> L[BulkSink writes ENTITY memories]
    L --> M[Search Index\n only ENTITY memories]

Comments Outside Diff (2)

openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ContextMemoryIT.java, line 77-84 (link)

Bare assertions for PRIVATE/SHARED absence may yield false positives under parallel consumers — The test fences on the ENTITY memory being indexed first (via Awaitility) and then asserts PRIVATE/SHARED absence without retrying. This is sound when the async indexer processes events strictly in creation order, but if the indexer is partitioned or uses a thread pool, the ENTITY memory could reach the index through a fast lane while PRIVATE/SHARED events are still queued. In that window the bare assertFalse would pass even if PRIVATE/SHARED are later indexed (and the test would not re-run). Since isSearchIndexable blocks them synchronously at createEntityIndex, they can never actually land in the index, so the test will never catch a real regression on that code path either way. Consider wrapping the absence checks in awaitSearchPresence(id, false, …) to make the test symmetric and resilient to any future changes in event ordering.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ListFilter.java, line 303-313 (link)

Condition is evaluated for every entity type on every getCondition() call — getMemorySearchVisibilityCondition() is wired into the global condition chain and the queryParams.get(...) lookup plus nullOrEmpty check runs on every ListFilter instance, not just those created for ContextMemory. The method returns "" for all other entities (safe, consistent with the existing pattern), but a typo that accidentally sets MEMORY_SEARCH_VISIBILITY_PARAM on a non-ContextMemory filter would silently filter out all rows for that entity type because most entity tables don't have a JSON shareConfig.visibility field. The risk is low given only ContextMemoryRepository.getReindexFilter() sets this param today, but a short guard comment on the constant noting it is exclusively for that repository would help prevent misuse.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

_{Reviews (5): Last reviewed commit: "docs(search): document reindex-only scop..." | Re-trigger Greptile}

…query Private/Shared ContextMemory could leak through search paths the per-query applyContextMemoryVisibility filter (#29384) did not cover. Instead of filtering on every search path, only org-wide (ENTITY) memories are written to the index: ContextMemoryRepository.isSearchIndexable gates the live create/update/embedding paths and getReindexFilter applies the same rule to the bulk reindex reader and counts (so partition totals stay balanced and the staged index promotes). Removes the applyContextMemoryVisibility apparatus from both search managers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-26T19:04:15Z

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

No GitHub issue is linked. Link an issue in the Development section of the PR (or add Fixes #12345 to the description). For a same-org cross-repo issue, add Fixes open-metadata/<repo>#123 to the description.

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

greptile-apps · 2026-06-26T19:13:33Z

+    if (!Entity.getEntityRepository(entityType).isSearchIndexable(entity)) {
+      return;
+    }


Silent early-return on update gives no vector-side cleanup — When a memory is flipped ENTITY→PRIVATE, SearchRepository.updateEntityIndex calls deleteEntityIndex to remove the entity document. Because vector embeddings are written via partialUpdateEntity as fields on that same document, the entity-doc deletion implicitly cleans up the embedding too. The VectorEmbeddingHandler returning early here is therefore safe today. However, the asymmetry vs. SearchRepository (which explicitly deletes) is easy to miss: if the embedding is ever moved to its own child documents (the chunkIndex/chunkCount/parentId fields already hint at chunking), this silent return would become a real leak with no compiler or test warning. A brief inline comment explaining the implicit-cleanup assumption would protect future readers.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

createEntitiesIndex and updateEntitiesIndex (the bulk live paths reached via SearchIndexHandler.onEntitiesCreated/onEntitiesUpdated) bypassed the isSearchIndexable gate, so a PRIVATE/SHARED ContextMemory created or updated through a bulk lifecycle dispatch (import/batch extraction) could still be indexed. Apply the same gate: skip non-indexable entities on bulk create; on bulk update drop them and delete their stale docs, mirroring updateEntityIndex's ENTITY->PRIVATE flip handling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-26T22:30:33Z

🔴 Playwright Results — 2 failure(s), 54 flaky

✅ 4420 passed · ❌ 2 failed · 🟡 54 flaky · ⏭️ 38 skipped

Shard	Passed	Failed	Flaky	Skipped
🟡 Shard 1	321	0	4	4
🔴 Shard 2	822	1	6	9
🟡 Shard 3	823	0	6	7
🟡 Shard 4	827	0	4	10
🔴 Shard 5	854	1	19	0
🟡 Shard 6	773	0	15	8

Genuine Failures (failed on all attempts)

❌ Features/GlobalPageSize.spec.ts › Page size should persist across different pages (shard 2)

�[31mTest timeout of 60000ms exceeded.�[39m

❌ Pages/ExploreBrowse.spec.ts › service type drill-down disables unrelated roots and query-panel Clear resets it (shard 5)

�[31mTest timeout of 180000ms exceeded.�[39m

🟡 54 flaky test(s) (passed on retry)

Features/DataAssetRulesEnabled.spec.ts › Verify the Dashboard Entity Action items after rules is Enabled (shard 1, 1 retry)
Features/DataAssetRulesDisabled.spec.ts › Verify the ApiEndpoint entity item action after rules disabled (shard 1, 1 retry)
Flow/Metric.spec.ts › Verify Granularity Update (shard 1, 1 retry)
Flow/SearchRBAC.spec.ts › the browse tree only shows the asset-type categories a user can access (shard 1, 1 retry)
Features/BulkEditEntity.spec.ts › Glossary (shard 2, 2 retries)
Features/BulkEditOperationBadges.spec.ts › Glossary bulk edit search filters rows and clear restores them (shard 2, 1 retry)
Features/BulkImport.spec.ts › Database service (shard 2, 2 retries)
Features/Container.spec.ts › expand / collapse should not appear after updating nested fields for container (shard 2, 1 retry)
Features/Container.spec.ts › Copy column link button should copy the column URL to clipboard (shard 2, 1 retry)
Features/DataQuality/CertificationFilter.spec.ts › Certification filter narrows both table- and testCase-index queries via the flat field path (shard 2, 1 retry)
Features/LandingPageWidgets/DomainDataProductsWidgets.spec.ts › Domain asset count should update when assets are added (shard 3, 1 retry)
Features/NestedColumnsExpandCollapse.spec.ts › should not duplicate rows when expanding and collapsing nested columns with same names in Version History (shard 3, 1 retry)
Features/Permissions/EntityPermissions.spec.ts › Dashboard deny common operations permissions (shard 3, 1 retry)
Features/SearchExport.spec.ts › Export queues a background job and downloads from the jobs tray (shard 3, 1 retry)
Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
Features/TestSuiteMultiPipeline.spec.ts › TestSuite multi pipeline support (shard 3, 1 retry)
Pages/CustomProperties.spec.ts › Number (shard 4, 1 retry)
Pages/CustomProperties.spec.ts › Set & Update all CP types on apiCollection (shard 4, 1 retry)
Pages/DataContracts.spec.ts › Create Data Contract and validate for ApiEndpoint (shard 4, 1 retry)
Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
Pages/Entity.spec.ts › User as Owner Add, Update and Remove (shard 5, 1 retry)
Pages/Entity.spec.ts › User should be denied access to edit description when deny policy rule is applied on an entity (shard 5, 1 retry)
Pages/Entity.spec.ts › Announcement create, edit & delete (shard 5, 1 retry)
Pages/Entity.spec.ts › Tag and Glossary Selector should close vice versa (shard 5, 1 retry)
Pages/Entity.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
Pages/Entity.spec.ts › User as Owner with unsorted list (shard 5, 1 retry)
Pages/Entity.spec.ts › Team as Owner Add, Update and Remove (shard 5, 1 retry)
Pages/Entity.spec.ts › User as Owner with unsorted list (shard 5, 1 retry)
Pages/Entity.spec.ts › User should be denied access to edit description when deny policy rule is applied on an entity (shard 5, 1 retry)
Pages/Entity.spec.ts › User should be denied access to edit description when deny policy rule is applied on an entity (shard 5, 1 retry)
... and 24 more

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

checkIfIndexingIsSupported gates on the index mapping, not repository existence, so an indexable but repository-less type (e.g. an index-only/time-series sub-entity) reaching createEntityIndex/updateEntityIndex would throw EntityNotFoundException at the getEntityRepository().isSearchIndexable call added in the prior commits. Route all live index-path gates through a new Entity.isSearchIndexable facade that defaults to indexable when no repository is registered, so the rule still lives in ContextMemoryRepository but repository-less types index exactly as before. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…_PARAM Warn that the filter param is set only by ContextMemoryRepository.getReindexFilter and must not be applied to the REST /contextCenter/memories listing (owners must still read their own PRIVATE/SHARED memories there). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-06-28T15:50:16Z

Code Review 👍 Approved with suggestions 1 resolved / 2 findings

Enforces ContextMemory visibility at index time by gating search indexing to ENTITY-only records, resolving the risk of data leakage via uncovered search paths. Consider refining integration test assertions and adding documentation to the visibility parameter to prevent future misuse.

💡 Edge Case: Deterministic tiebreak now sorts on name.keyword/id.keyword for all queries

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java:1114-1122 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java:1254-1268 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java:1073-1081 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java:1211-1225

In both OpenSearchSearchManager.buildSearchRequestBuilder (~lines 1254-1278) and ElasticSearchSearchManager.buildSearchRequestBuilder (~lines 1211-1235), the deterministic tiebreaker (name.keyword + id.keyword) is now appended on EVERY search path. Previously it was only appended when the sort field was _score or when isExport=true. For explicit non-_score sort fields on non-export requests, no id.keyword/name.keyword sort was added before; it now always is.

This is a deliberate determinism improvement, but it broadens the set of queries that sort on name.keyword/id.keyword. If any searchable index lacks those keyword sub-fields in its mapping, the sort clause can fail with "No mapping found for [name.keyword]". Risk is low because OpenMetadata entity indexes universally map name.keyword and id.keyword, and the _score/export paths already relied on this. Worth a quick confirmation that no specialized index (e.g. data-quality/relationship-only indexes) is queried through this builder without those fields. If such an index exists, set unmapped_type on the tiebreaker sorts to keep them harmless.

✅ 1 resolved

✅ Security: Bulk live-index paths bypass isSearchIndexable visibility gate

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:1193-1207 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:1630-1644
The new visibility-at-index-time enforcement only gates the single-entity live paths: SearchRepository.createEntityIndex (SearchRepository.java:961) and updateEntityIndex (SearchRepository.java:1399), plus VectorEmbeddingHandler. However the bulk live-index paths do NOT call isSearchIndexable:

createEntitiesIndex(List) builds and writes docs directly (SearchRepository.java:1209-1227) with no indexability check. It is reached on the live (non-reindex) path via SearchIndexHandler.onEntitiesCreated -> createEntitiesIndex (SearchIndexHandler.java:170).

updateEntitiesIndex(List) writes through bulkSink.write(typeEntities, ...) (SearchRepository.java:1692) without an indexability filter; it only falls back to the gated updateEntityIndex on bulk failure (line 1698). It is reached via SearchIndexHandler.onEntitiesUpdated (SearchIndexHandler.java:197).

The EntityLifecycleEventDispatcher invokes these bulk handler methods for bulk create/update operations. If any ContextMemory create/update ever flows through a bulk lifecycle dispatch (e.g. a batch extraction/import), a PRIVATE or SHARED memory would be written to the search index, re-introducing exactly the leak this PR set out to eliminate at index time. This is the same fragility (uncovered paths) the PR cites for moving the rule to index time, so the bulk live paths should be covered too.

Suggested fix: apply isSearchIndexable filtering inside createEntitiesIndex and updateEntitiesIndex (drop non-indexable entities before building/writing docs), so all live write paths share the gate. Note the asymmetry vs. updateEntityIndex, which deletes the doc when not indexable — the bulk update path should similarly remove now-non-indexable docs to handle a bulk ENTITY->PRIVATE flip.

🤖 Prompt for agents

Code Review: Enforces ContextMemory visibility at index time by gating search indexing to ENTITY-only records, resolving the risk of data leakage via uncovered search paths. Consider refining integration test assertions and adding documentation to the visibility parameter to prevent future misuse.

1. 💡 Edge Case: Deterministic tiebreak now sorts on name.keyword/id.keyword for all queries
   Files: openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java:1114-1122, openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java:1254-1268, openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java:1073-1081, openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java:1211-1225

   In both OpenSearchSearchManager.buildSearchRequestBuilder (~lines 1254-1278) and ElasticSearchSearchManager.buildSearchRequestBuilder (~lines 1211-1235), the deterministic tiebreaker (`name.keyword` + `id.keyword`) is now appended on EVERY search path. Previously it was only appended when the sort field was `_score` or when `isExport=true`. For explicit non-`_score` sort fields on non-export requests, no `id.keyword`/`name.keyword` sort was added before; it now always is.
   
   This is a deliberate determinism improvement, but it broadens the set of queries that sort on `name.keyword`/`id.keyword`. If any searchable index lacks those keyword sub-fields in its mapping, the sort clause can fail with "No mapping found for [name.keyword]". Risk is low because OpenMetadata entity indexes universally map `name.keyword` and `id.keyword`, and the `_score`/export paths already relied on this. Worth a quick confirmation that no specialized index (e.g. data-quality/relationship-only indexes) is queried through this builder without those fields. If such an index exists, set `unmapped_type` on the tiebreaker sorts to keep them harmless.

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

sonarqubecloud · 2026-06-28T17:36:07Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Jun 26, 2026

greptile-apps Bot reviewed Jun 26, 2026

View reviewed changes

harshach had a problem deploying to test June 26, 2026 19:18 — with GitHub Actions Error

harshach and others added 2 commits June 26, 2026 12:37

Merge branch 'main' into harshach/contextmemory-search-visibility-gap

f37202e

harshach had a problem deploying to test June 26, 2026 20:09 — with GitHub Actions Failure

harshach temporarily deployed to test June 26, 2026 20:09 — with GitHub Actions Inactive

harshach had a problem deploying to test June 28, 2026 14:26 — with GitHub Actions Error

harshach had a problem deploying to test June 28, 2026 14:26 — with GitHub Actions Failure

harshach had a problem deploying to test June 28, 2026 14:26 — with GitHub Actions Error

harshach temporarily deployed to test June 28, 2026 16:03 — with GitHub Actions Inactive

harshach had a problem deploying to test June 28, 2026 16:03 — with GitHub Actions Failure

harshach temporarily deployed to test June 28, 2026 16:03 — with GitHub Actions Inactive

harshach had a problem deploying to test June 28, 2026 16:03 — with GitHub Actions Failure

harshach temporarily deployed to test June 28, 2026 16:03 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(search): enforce ContextMemory visibility at index time, not per-query#29541

fix(search): enforce ContextMemory visibility at index time, not per-query#29541
harshach wants to merge 5 commits into
mainfrom
harshach/contextmemory-search-visibility-gap

harshach commented Jun 26, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

github-actions Bot commented Jun 26, 2026

Uh oh!

greptile-apps Bot Jun 26, 2026

Uh oh!

github-actions Bot commented Jun 26, 2026 •

edited

Loading

Uh oh!

gitar-bot Bot commented Jun 28, 2026

Uh oh!

sonarqubecloud Bot commented Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

harshach commented Jun 26, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes:

Type of change:

High-level design:

Tests:

Use cases covered

Unit tests

Backend integration tests

Ingestion integration tests

Playwright (UI) tests

Manual testing performed

UI screen recording / screenshots:

Checklist:

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Comments Outside Diff (2)

Uh oh!

github-actions Bot commented Jun 26, 2026

❌ PR checklist incomplete

Uh oh!

greptile-apps Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔴 Playwright Results — 2 failure(s), 54 flaky

Genuine Failures (failed on all attempts)

Uh oh!

gitar-bot Bot commented Jun 28, 2026

Uh oh!

sonarqubecloud Bot commented Jun 28, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

harshach commented Jun 26, 2026 •

edited by greptile-apps Bot

Loading

github-actions Bot commented Jun 26, 2026 •

edited

Loading