Skip to content

fix(search): enforce ContextMemory visibility at index time, not per-query#29541

Open
harshach wants to merge 5 commits into
mainfrom
harshach/contextmemory-search-visibility-gap
Open

fix(search): enforce ContextMemory visibility at index time, not per-query#29541
harshach wants to merge 5 commits into
mainfrom
harshach/contextmemory-search-visibility-gap

Conversation

@harshach

@harshach harshach commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Describe your changes:

This addresses a security finding from the review of #29384: Private/Shared ContextMemory could still leak through search paths that the per-query applyContextMemoryVisibility filter didn't cover. Instead of filtering on every search path (fragile — the review found three uncovered ones), the rule is enforced once at index time: only org-wide (ENTITY) memories are written to the search index. ContextMemoryRepository.isSearchIndexable gates the live create/update/embedding paths and ContextMemoryRepository.getReindexFilter() applies the same visibility == ENTITY rule (a cross-DB JSON ListFilter condition) to the bulk-reindex reader and its entity counts, so partition totals stay balanced and the staged index promotes. The applyContextMemoryVisibility apparatus is removed from both the Elasticsearch and OpenSearch managers; the REST /contextCenter/memories visibility checks are unchanged.

Type of change:

  • Bug fix

High-level design:

Two indexing pipelines need the rule, so it lives entirely in ContextMemoryRepository (overriding trivial EntityRepository base hooks): isSearchIndexable(entity) for the live single-entity path (SearchRepository.createEntityIndex/updateEntityIndex, VectorEmbeddingHandler), and getReindexFilter() (a shareConfig.visibility = 'Entity' JSON condition on ListFilter) for the bulk reindex — applied at the reader (PaginatedEntitiesSource) and every entity-count site (PartitionCalculator / ReindexingOrchestrator / DistributedIndexingStrategy). Count and read must share the same filter because reindex promotion compares reader-total vs sink-success (StatsReconciler / EntityPromotionContext); excluding from the read but not the count would collapse the success ratio and block index promotion. No data migration is needed — ContextMemory search indexing is unreleased (added post-1.13, only on 2.0.0-SNAPSHOT), so the feature ships with this fix.

Tests:

Use cases covered

  • Creating a PRIVATE or SHARED memory: never written to the search index (live path).
  • An ENTITY (org-wide) memory: searchable.
  • Flipping a memory ENTITY→PRIVATE removes it from the index live; PRIVATE→ENTITY re-indexes it.
  • A full reindex rebuilds only ENTITY memories, with balanced counts so the staged index promotes.

Unit tests

  • I added unit tests for the new/changed logic.
  • ContextMemoryRepositoryTestisSearchIndexable across ENTITY/PRIVATE/SHARED/missing-shareConfig + getReindexFilter.
  • ListFilterTest — the generated memorySearchVisibility SQL condition (MySQL + Postgres forms).

Backend integration tests

  • I added integration tests in openmetadata-integration-tests/.
  • ContextMemoryIT.privateAndSharedMemoriesAreExcludedFromSearchIndex and ContextMemoryIT.memoryVisibilityFlipAddsAndRemovesFromSearchIndex.

Ingestion integration tests

  • Not applicable (no ingestion changes).

Playwright (UI) tests

  • Not applicable (no UI changes).

Manual testing performed

  1. mvn compile + spotless:check on openmetadata-service and openmetadata-integration-tests: pass.
  2. Unit tests green: ContextMemoryRepositoryTest, ListFilterTest, ReindexingUtilTest. The two ContextMemory ITs require the Docker stack and were not run locally.

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have added tests (unit / integration as applicable) and listed them above.

🤖 Generated with Claude Code

Greptile Summary

This PR moves ContextMemory visibility enforcement from per-query search filters to index time: only org-wide (ENTITY) memories are written to the search index, while PRIVATE and SHARED memories are excluded at create/update and during bulk reindex. The fragile per-query applyContextMemoryVisibility apparatus in both ES and OS managers is removed, replaced by two repository overrides (isSearchIndexable and getReindexFilter) and a shared ListFilter condition that keeps reader counts and partition totals aligned so staged-index promotion succeeds.

  • ContextMemoryRepository overrides isSearchIndexable (live single-entity gate) and getReindexFilter (DB-level filter for bulk reindex), with SearchRepository.updateEntityIndex calling deleteEntityIndex on a visibility flip to PRIVATE/SHARED.
  • All entity-count sites (PartitionCalculator, ReindexingOrchestrator, DistributedIndexingStrategy) and the bulk reader (PaginatedEntitiesSource) now use repository.getReindexFilter() so counts match what is actually indexed and partition promotion ratios stay correct.
  • ContextMemorySearchVisibility and its ES/OS wiring are deleted; the REST /contextCenter/memories visibility checks are unchanged.

Confidence Score: 5/5

Safe to merge; the privacy boundary is correctly enforced at index time across all live and bulk paths, and the old per-query filter is cleanly removed.

All four live indexing entry points (createEntityIndex, updateEntityIndex, the streaming-buffer batch path, and VectorEmbeddingHandler) now gate on isSearchIndexable, and every entity-count site shares the same getReindexFilter() so promotion ratios stay correct. The one gap found is PartitionWorker.initializeKeysetCursor fallback using new ListFilter(Include.ALL) rather than getReindexFilter(); this can cause ENTITY memories near a failure boundary to be skipped in that specific recovery scenario, but since entity reads in that same path do use getReindexFilter(), no PRIVATE/SHARED memory will ever reach the index. ContextMemory search indexing is also unreleased at the time of this change, limiting blast radius.

PartitionWorker.java line 655 — the fallback cursor in initializeKeysetCursor should use getReindexFilter() to stay consistent with the primary path.

Important Files Changed

Filename Overview
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContextMemoryRepository.java Adds isSearchIndexable (ENTITY-only) and getReindexFilter (visibility=Entity SQL param) as the single source of truth for memory search-visibility. Logic is correct; null shareConfig and null visibility both default to non-indexable.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java Adds default isSearchIndexable (true) and getReindexFilter (Include.ALL) hooks for the base class; override points are clean and well-documented.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ListFilter.java Adds MEMORY_SEARCH_VISIBILITY_PARAM constant and getMemorySearchVisibilityCondition(); both MySQL and PostgreSQL JSON paths look correct. Returns empty string when param is absent, so no impact on other entity types.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java isSearchIndexable checks added to createEntityIndex, updateEntityIndex, bulkUpdateEntities loop, and the streaming-buffer bulk path. The ENTITY→PRIVATE flip correctly calls deleteEntityIndex on all four paths.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java Removes applyContextMemoryVisibility calls and the ContextMemorySearchVisibility field from all three search entry points; field and factory import also cleaned up. Removal is complete and consistent.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java Same as ElasticSearchSearchManager: applyContextMemoryVisibility removed from all four call sites, field and factory import cleaned up. Mirrored correctly.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionWorker.java readEntitiesKeyset now uses PaginatedEntitiesSource which gets getReindexFilter() correctly. However, the initializeKeysetCursor fallback path (line 655) still uses new ListFilter(Include.ALL); misaligned cursors after batch failures could skip ENTITY memories.
openmetadata-service/src/main/java/org/openmetadata/service/workflows/searchIndex/PaginatedEntitiesSource.java Both no-total and knownTotal constructors now call getReindexFilter(); the ListFilter and knownTotal constructors that accept an external filter are unchanged (correct, used by non-search-index callers).
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/PartitionCalculator.java getRegularEntityCount now uses repository.getReindexFilter() instead of new ListFilter(Include.ALL); partition totals now match the filtered reader.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingOrchestrator.java Entity total computation for index-promotion check now uses getReindexFilter(); count is consistent with what the reader indexes.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/DistributedIndexingStrategy.java getEntityTypeCount now uses getReindexFilter() for the non-time-series branch; consistent with PartitionCalculator and ReindexingOrchestrator.
openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorEmbeddingHandler.java Early-return guard added for non-indexable entities (PRIVATE/SHARED memories). Safe because entity-doc deletion in updateEntityIndex implicitly removes embeddings stored as document fields.
openmetadata-service/src/main/java/org/openmetadata/service/Entity.java New static isSearchIndexable dispatches to the repository, defaulting to true for repository-less types. Logic and javadoc are correct.
openmetadata-service/src/test/java/org/openmetadata/service/jdbi3/ContextMemoryRepositoryTest.java New unit tests cover ENTITY/PRIVATE/SHARED/null shareConfig for isSearchIndexable, the reindex filter param value, and the Entity facade dispatch including the repository-less default. Coverage is comprehensive.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ContextMemoryIT.java Added privateAndSharedMemoriesAreExcludedFromSearchIndex and memoryVisibilityFlipAddsAndRemovesFromSearchIndex tests. The ENTITY-memory awaitility fence is sound; bare assertFalse for PRIVATE/SHARED absence is correct given isSearchIndexable blocks them synchronously.
openmetadata-service/src/main/java/org/openmetadata/service/search/security/ContextMemorySearchVisibility.java Deleted — the per-query visibility filter is superseded by index-time enforcement. Removal is complete with no remaining references in the codebase.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ContextMemory create/update] --> B{isSearchIndexable?}
    B -- ENTITY visibility --> C[SearchRepository.createEntityIndex / updateEntityIndex]
    B -- PRIVATE or SHARED --> D[SearchRepository.deleteEntityIndex\n remove stale doc]
    B -- PRIVATE or SHARED --> E[VectorEmbeddingHandler\n early return]

    F[Bulk Reindex trigger] --> G[PaginatedEntitiesSource\n filter = getReindexFilter\n visibility = ENTITY only]
    G --> H[PartitionCalculator / ReindexingOrchestrator\n DistributedIndexingStrategy counts\n same getReindexFilter]
    H --> I{Count matches reader}
    I -- yes --> J[StatsReconciler ratio OK\n staged index promotes]
    G --> L[BulkSink writes ENTITY memories]
    L --> M[Search Index\n only ENTITY memories]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[ContextMemory create/update] --> B{isSearchIndexable?}
    B -- ENTITY visibility --> C[SearchRepository.createEntityIndex / updateEntityIndex]
    B -- PRIVATE or SHARED --> D[SearchRepository.deleteEntityIndex\n remove stale doc]
    B -- PRIVATE or SHARED --> E[VectorEmbeddingHandler\n early return]

    F[Bulk Reindex trigger] --> G[PaginatedEntitiesSource\n filter = getReindexFilter\n visibility = ENTITY only]
    G --> H[PartitionCalculator / ReindexingOrchestrator\n DistributedIndexingStrategy counts\n same getReindexFilter]
    H --> I{Count matches reader}
    I -- yes --> J[StatsReconciler ratio OK\n staged index promotes]
    G --> L[BulkSink writes ENTITY memories]
    L --> M[Search Index\n only ENTITY memories]
Loading

Comments Outside Diff (2)

  1. openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ContextMemoryIT.java, line 77-84 (link)

    P2 Bare assertions for PRIVATE/SHARED absence may yield false positives under parallel consumers — The test fences on the ENTITY memory being indexed first (via Awaitility) and then asserts PRIVATE/SHARED absence without retrying. This is sound when the async indexer processes events strictly in creation order, but if the indexer is partitioned or uses a thread pool, the ENTITY memory could reach the index through a fast lane while PRIVATE/SHARED events are still queued. In that window the bare assertFalse would pass even if PRIVATE/SHARED are later indexed (and the test would not re-run). Since isSearchIndexable blocks them synchronously at createEntityIndex, they can never actually land in the index, so the test will never catch a real regression on that code path either way. Consider wrapping the absence checks in awaitSearchPresence(id, false, …) to make the test symmetric and resilient to any future changes in event ordering.

  2. openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ListFilter.java, line 303-313 (link)

    P2 Condition is evaluated for every entity type on every getCondition() callgetMemorySearchVisibilityCondition() is wired into the global condition chain and the queryParams.get(...) lookup plus nullOrEmpty check runs on every ListFilter instance, not just those created for ContextMemory. The method returns "" for all other entities (safe, consistent with the existing pattern), but a typo that accidentally sets MEMORY_SEARCH_VISIBILITY_PARAM on a non-ContextMemory filter would silently filter out all rows for that entity type because most entity tables don't have a JSON shareConfig.visibility field. The risk is low given only ContextMemoryRepository.getReindexFilter() sets this param today, but a short guard comment on the constant noting it is exclusively for that repository would help prevent misuse.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (5): Last reviewed commit: "docs(search): document reindex-only scop..." | Re-trigger Greptile

…query

Private/Shared ContextMemory could leak through search paths the per-query applyContextMemoryVisibility filter (#29384) did not cover. Instead of filtering on every search path, only org-wide (ENTITY) memories are written to the index: ContextMemoryRepository.isSearchIndexable gates the live create/update/embedding paths and getReindexFilter applies the same rule to the bulk reindex reader and counts (so partition totals stay balanced and the staged index promotes). Removes the applyContextMemoryVisibility apparatus from both search managers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

  • No GitHub issue is linked. Link an issue in the Development section of the PR (or add Fixes #12345 to the description). For a same-org cross-repo issue, add Fixes open-metadata/<repo>#123 to the description.

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Jun 26, 2026
Comment on lines +69 to +71
if (!Entity.getEntityRepository(entityType).isSearchIndexable(entity)) {
return;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Silent early-return on update gives no vector-side cleanup — When a memory is flipped ENTITY→PRIVATE, SearchRepository.updateEntityIndex calls deleteEntityIndex to remove the entity document. Because vector embeddings are written via partialUpdateEntity as fields on that same document, the entity-doc deletion implicitly cleans up the embedding too. The VectorEmbeddingHandler returning early here is therefore safe today. However, the asymmetry vs. SearchRepository (which explicitly deletes) is easy to miss: if the embedding is ever moved to its own child documents (the chunkIndex/chunkCount/parentId fields already hint at chunking), this silent return would become a real leak with no compiler or test warning. A brief inline comment explaining the implicit-cleanup assumption would protect future readers.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

harshach and others added 2 commits June 26, 2026 12:37
createEntitiesIndex and updateEntitiesIndex (the bulk live paths reached via SearchIndexHandler.onEntitiesCreated/onEntitiesUpdated) bypassed the isSearchIndexable gate, so a PRIVATE/SHARED ContextMemory created or updated through a bulk lifecycle dispatch (import/batch extraction) could still be indexed. Apply the same gate: skip non-indexable entities on bulk create; on bulk update drop them and delete their stale docs, mirroring updateEntityIndex's ENTITY->PRIVATE flip handling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

🔴 Playwright Results — 2 failure(s), 54 flaky

✅ 4420 passed · ❌ 2 failed · 🟡 54 flaky · ⏭️ 38 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 321 0 4 4
🔴 Shard 2 822 1 6 9
🟡 Shard 3 823 0 6 7
🟡 Shard 4 827 0 4 10
🔴 Shard 5 854 1 19 0
🟡 Shard 6 773 0 15 8

Genuine Failures (failed on all attempts)

Features/GlobalPageSize.spec.ts › Page size should persist across different pages (shard 2)
�[31mTest timeout of 60000ms exceeded.�[39m
Pages/ExploreBrowse.spec.ts › service type drill-down disables unrelated roots and query-panel Clear resets it (shard 5)
�[31mTest timeout of 180000ms exceeded.�[39m
🟡 54 flaky test(s) (passed on retry)
  • Features/DataAssetRulesEnabled.spec.ts › Verify the Dashboard Entity Action items after rules is Enabled (shard 1, 1 retry)
  • Features/DataAssetRulesDisabled.spec.ts › Verify the ApiEndpoint entity item action after rules disabled (shard 1, 1 retry)
  • Flow/Metric.spec.ts › Verify Granularity Update (shard 1, 1 retry)
  • Flow/SearchRBAC.spec.ts › the browse tree only shows the asset-type categories a user can access (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 2 retries)
  • Features/BulkEditOperationBadges.spec.ts › Glossary bulk edit search filters rows and clear restores them (shard 2, 1 retry)
  • Features/BulkImport.spec.ts › Database service (shard 2, 2 retries)
  • Features/Container.spec.ts › expand / collapse should not appear after updating nested fields for container (shard 2, 1 retry)
  • Features/Container.spec.ts › Copy column link button should copy the column URL to clipboard (shard 2, 1 retry)
  • Features/DataQuality/CertificationFilter.spec.ts › Certification filter narrows both table- and testCase-index queries via the flat field path (shard 2, 1 retry)
  • Features/LandingPageWidgets/DomainDataProductsWidgets.spec.ts › Domain asset count should update when assets are added (shard 3, 1 retry)
  • Features/NestedColumnsExpandCollapse.spec.ts › should not duplicate rows when expanding and collapsing nested columns with same names in Version History (shard 3, 1 retry)
  • Features/Permissions/EntityPermissions.spec.ts › Dashboard deny common operations permissions (shard 3, 1 retry)
  • Features/SearchExport.spec.ts › Export queues a background job and downloads from the jobs tray (shard 3, 1 retry)
  • Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
  • Features/TestSuiteMultiPipeline.spec.ts › TestSuite multi pipeline support (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Number (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Set & Update all CP types on apiCollection (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for ApiEndpoint (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
  • Pages/Entity.spec.ts › User as Owner Add, Update and Remove (shard 5, 1 retry)
  • Pages/Entity.spec.ts › User should be denied access to edit description when deny policy rule is applied on an entity (shard 5, 1 retry)
  • Pages/Entity.spec.ts › Announcement create, edit & delete (shard 5, 1 retry)
  • Pages/Entity.spec.ts › Tag and Glossary Selector should close vice versa (shard 5, 1 retry)
  • Pages/Entity.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Pages/Entity.spec.ts › User as Owner with unsorted list (shard 5, 1 retry)
  • Pages/Entity.spec.ts › Team as Owner Add, Update and Remove (shard 5, 1 retry)
  • Pages/Entity.spec.ts › User as Owner with unsorted list (shard 5, 1 retry)
  • Pages/Entity.spec.ts › User should be denied access to edit description when deny policy rule is applied on an entity (shard 5, 1 retry)
  • Pages/Entity.spec.ts › User should be denied access to edit description when deny policy rule is applied on an entity (shard 5, 1 retry)
  • ... and 24 more

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

checkIfIndexingIsSupported gates on the index mapping, not repository existence, so an indexable but repository-less type (e.g. an index-only/time-series sub-entity) reaching createEntityIndex/updateEntityIndex would throw EntityNotFoundException at the getEntityRepository().isSearchIndexable call added in the prior commits. Route all live index-path gates through a new Entity.isSearchIndexable facade that defaults to indexable when no repository is registered, so the rule still lives in ContextMemoryRepository but repository-less types index exactly as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_PARAM

Warn that the filter param is set only by ContextMemoryRepository.getReindexFilter and must not be applied to the REST /contextCenter/memories listing (owners must still read their own PRIVATE/SHARED memories there).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gitar-bot

gitar-bot Bot commented Jun 28, 2026

Copy link
Copy Markdown
Code Review 👍 Approved with suggestions 1 resolved / 2 findings

Enforces ContextMemory visibility at index time by gating search indexing to ENTITY-only records, resolving the risk of data leakage via uncovered search paths. Consider refining integration test assertions and adding documentation to the visibility parameter to prevent future misuse.

💡 Edge Case: Deterministic tiebreak now sorts on name.keyword/id.keyword for all queries

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java:1114-1122 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java:1254-1268 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java:1073-1081 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java:1211-1225

In both OpenSearchSearchManager.buildSearchRequestBuilder (~lines 1254-1278) and ElasticSearchSearchManager.buildSearchRequestBuilder (~lines 1211-1235), the deterministic tiebreaker (name.keyword + id.keyword) is now appended on EVERY search path. Previously it was only appended when the sort field was _score or when isExport=true. For explicit non-_score sort fields on non-export requests, no id.keyword/name.keyword sort was added before; it now always is.

This is a deliberate determinism improvement, but it broadens the set of queries that sort on name.keyword/id.keyword. If any searchable index lacks those keyword sub-fields in its mapping, the sort clause can fail with "No mapping found for [name.keyword]". Risk is low because OpenMetadata entity indexes universally map name.keyword and id.keyword, and the _score/export paths already relied on this. Worth a quick confirmation that no specialized index (e.g. data-quality/relationship-only indexes) is queried through this builder without those fields. If such an index exists, set unmapped_type on the tiebreaker sorts to keep them harmless.

✅ 1 resolved
Security: Bulk live-index paths bypass isSearchIndexable visibility gate

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:1193-1207 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:1630-1644
The new visibility-at-index-time enforcement only gates the single-entity live paths: SearchRepository.createEntityIndex (SearchRepository.java:961) and updateEntityIndex (SearchRepository.java:1399), plus VectorEmbeddingHandler. However the bulk live-index paths do NOT call isSearchIndexable:

  • createEntitiesIndex(List) builds and writes docs directly (SearchRepository.java:1209-1227) with no indexability check. It is reached on the live (non-reindex) path via SearchIndexHandler.onEntitiesCreated -> createEntitiesIndex (SearchIndexHandler.java:170).
  • updateEntitiesIndex(List) writes through bulkSink.write(typeEntities, ...) (SearchRepository.java:1692) without an indexability filter; it only falls back to the gated updateEntityIndex on bulk failure (line 1698). It is reached via SearchIndexHandler.onEntitiesUpdated (SearchIndexHandler.java:197).

The EntityLifecycleEventDispatcher invokes these bulk handler methods for bulk create/update operations. If any ContextMemory create/update ever flows through a bulk lifecycle dispatch (e.g. a batch extraction/import), a PRIVATE or SHARED memory would be written to the search index, re-introducing exactly the leak this PR set out to eliminate at index time. This is the same fragility (uncovered paths) the PR cites for moving the rule to index time, so the bulk live paths should be covered too.

Suggested fix: apply isSearchIndexable filtering inside createEntitiesIndex and updateEntitiesIndex (drop non-indexable entities before building/writing docs), so all live write paths share the gate. Note the asymmetry vs. updateEntityIndex, which deletes the doc when not indexable — the bulk update path should similarly remove now-non-indexable docs to handle a bulk ENTITY->PRIVATE flip.

🤖 Prompt for agents
Code Review: Enforces ContextMemory visibility at index time by gating search indexing to ENTITY-only records, resolving the risk of data leakage via uncovered search paths. Consider refining integration test assertions and adding documentation to the visibility parameter to prevent future misuse.

1. 💡 Edge Case: Deterministic tiebreak now sorts on name.keyword/id.keyword for all queries
   Files: openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java:1114-1122, openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java:1254-1268, openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java:1073-1081, openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java:1211-1225

   In both OpenSearchSearchManager.buildSearchRequestBuilder (~lines 1254-1278) and ElasticSearchSearchManager.buildSearchRequestBuilder (~lines 1211-1235), the deterministic tiebreaker (`name.keyword` + `id.keyword`) is now appended on EVERY search path. Previously it was only appended when the sort field was `_score` or when `isExport=true`. For explicit non-`_score` sort fields on non-export requests, no `id.keyword`/`name.keyword` sort was added before; it now always is.
   
   This is a deliberate determinism improvement, but it broadens the set of queries that sort on `name.keyword`/`id.keyword`. If any searchable index lacks those keyword sub-fields in its mapping, the sort clause can fail with "No mapping found for [name.keyword]". Risk is low because OpenMetadata entity indexes universally map `name.keyword` and `id.keyword`, and the `_score`/export paths already relied on this. Worth a quick confirmation that no specialized index (e.g. data-quality/relationship-only indexes) is queried through this builder without those fields. If such an index exists, set `unmapped_type` on the tiebreaker sorts to keep them harmless.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant