fix(search): prevent ES/OS document rejections via engine-native mapping hardening by mohityadav766 · Pull Request #28671 · open-metadata/OpenMetadata

mohityadav766 · 2026-06-03T15:25:40Z

Problem

Search documents were being silently rejected by Elasticsearch/OpenSearch and dead-lettered by SearchIndexRetryWorker (4xx marked non-retryable). Root causes found in the mappings:

66% (2,728 / 4,079) of keyword field defs have no ignore_above → a value > 32,766 bytes throws an immense-term IllegalArgumentException and the whole document is rejected.
No numeric/date guard → out-of-range integer, NaN/Infinity float, or malformed date rejects the document.
Unbounded recursive flattening of columns and schemaFields → deep/wide structures explode the document.
Free-form per-run blobs on pipelineStatuses (config, metadata) → arbitrary keys dynamically mapped, causing string-then-object type conflicts and field-count blowups at reindex.

Approach — engine-native hardening (zero per-document cost)

Harden the mapping once at index creation and let the engine enforce the bounds, instead of walking every document at index time (doesn't scale to large docs).

SearchIndexSettings.harden(content, limits):

ignore_above (= byte-safe 8,191 = keywordMaxBytes/4) on keyword fields + keyword multi-fields + flattened
ignore_malformed on numeric/date/boolean
depth_limit on flattened
tunable index.mapping.{depth,nested_objects,total_fields}.limit
never overwrites existing values

Applied on both the direct createIndex and the index-template paths, for ES and OS. OsUtils strips the ES-only ignore_above/depth_limit (and ignore_malformed on boolean) when converting flattened → flat_object for OpenSearch.

Structural explosion (the one thing engines can't truncate gracefully) is capped at the source:

ColumnIndex / ColumnSearchIndex — depth + column-count caps (global, not per-level)
new SchemaFieldFlattener — shared depth + field-count cap for Topic & APIEndpoint schemaFields (also dedupes two identical copies)
IngestionPipelineIndex — drops the free-form config and metadata maps from each run status before indexing (searchable fields — state, runId, timestamps — are preserved)

Limits are config-tunable via ElasticSearchConfiguration.searchIndexingLimits (enableMappingHardening, keywordMaxBytes, mappingDepthLimit, nestedObjectsLimit, totalFieldsLimit, maxColumns).

Admin-editable, persisted mappings (DB-backed + UI)

The hardened mapping is no longer a read-only classpath resource — it is seeded into settings and is the effective mapping used at index creation:

New searchIndexMappings setting (SEARCH_INDEX_MAPPINGS) stores mappings keyed by language → entity type, hardened once at seed time (SearchIndexMappingsSeeder, insert-if-absent so admin edits are never clobbered). Seeded on fresh install and via the v2.0.0 upgrade migration.
SearchRepository.readIndexMapping() prefers the stored slice and falls back to the hardened resource (fresh-install first boot / newly added entity types).
IndexMappingVersionTracker computes drift against the effective mapping, so an admin edit (or a code-default change) surfaces as a reindex-required drift.
Admin REST endpoints on SystemResource: list, get (?fallback), update (PUT, re-hardened before storage), reset-to-default (PUT).
New admin Search Index Mappings settings page (Preferences) — pick language + entity type, edit the JSON mapping, save or reset to default. Saving persists only; a banner tells the admin the change applies on the next reindex.

Coverage (exhaustive)

Recursive flatten-into-one-doc exists in exactly two places (columns, schemaFields) — both capped.
All flattened fields hardened; extension (and columns.extension, dataModel.columns.extension) is switched to a non-indexed object (enabled:false) across all four languages — arbitrary custom-property values can no longer reject a document. Behaviour change: extension.* contents are no longer searchable; see Edge cases.
Non-recursive nested arrays (mcp_*, testSuites, …) bounded by entity size + nested_objects.limit + leaf ignore_above.

Edge cases analysed

A focused audit of the hardening, persistence, and OS-transform paths confirmed the design holds; verdicts:

harden() is idempotent / non-destructive — never overwrites an existing ignore_above/ignore_malformed/limit; recurses into nested properties and keyword multi-fields; no-ops safely when a mapping has no top-level mappings.
OpenSearch transform runs on the stored mapping — the OS flattened → flat_object + boolean-ignore_malformed strip is applied to the effective (stored) mapping at both create and update, so ES-only params never reach OS.
Column / schema-field caps are global (shared accumulator across recursion), depth and count guards can't loop or off-by-one, and at most maxColumns items are kept.
extension → enabled:false is consistent across en/jp/ru/zh. The tradeoff: extension.<prop> fields configured as searchable in Search Settings will return no matches after reindex. This is intentional (arbitrary custom-property values were the prime immense-term/explosion source) — called out here for release notes.
pipelineStatuses was only half-guarded — the original change stripped config but left metadata, an identically free-form per-run map. It carried the same dynamic type-conflict exposure and could still push the index past total_fields.limit (a hard rejection on ES7/OS). Now both are stripped.
searchIndexMappings limits are boot-time config (not a runtime-editable setting), so the cached SearchFieldLimits.active() cannot go stale without a restart.

Tests

SearchIndexSettingsTest (25) — per-type hardening across all 16 field types, multi-fields, flattened + column-level extension, no-override, settings-limit injection
OsUtilsTest (+1) — flat_object transform strips ES-only params
ColumnIndexLimitTest (4), SchemaFieldFlattenerTest (2) — depth + count caps
IngestionPipelineIndexTest — asserts both config and metadata are stripped while searchable status fields are kept
SearchIndexMappingsSeederTest, IndexMappingVersionTrackerTest — seeding + effective-mapping drift
IndexingLimitsIT — against the real engine (both ES and OS via the two CI profiles): over-limit keyword and malformed numbers are rejected on a raw mapping and accepted on a hardened one

Out of scope (follow-ups)

UI-editable searchIndexingLimits (they are config-only today)
One-click reindex trigger from the mappings page (today the banner instructs the admin to reindex)
dynamic: strict (one dynamic:true hole: test_case_resolution_status:testCaseResolutionStatusDetails)
completion suggester reserved-char stripping (9 suggest fields)

🤖 Generated with Claude Code

…ing hardening Documents were being silently rejected by Elasticsearch/OpenSearch (immense-term on keyword > 32766 bytes, malformed numbers/dates, nested/depth explosion) and dead-lettered by the retry worker. Root cause: 66% of keyword field defs had no ignore_above, no numeric/date guards, and unbounded recursive flattening. Harden mappings once at index creation (zero per-document cost; the engine enforces the bounds): - SearchIndexSettings.harden injects ignore_above (keyword + multi-fields + flattened), ignore_malformed (numeric/date/boolean), depth_limit (flattened), and tunable index.mapping.*.limit guardrails; never overwrites existing values. - Applied on both the direct createIndex and the index-template paths (ES + OS). - OsUtils strips ES-only ignore_above/depth_limit when converting flattened to flat_object for OpenSearch. Cap structural explosion at the source (the one thing engines cannot truncate): - ColumnIndex/ColumnSearchIndex: depth + column-count caps. - New SchemaFieldFlattener: shared depth + field-count cap for Topic and APIEndpoint schemaFields (dedupes two identical copies). Limits are config-tunable via ElasticSearchConfiguration.searchIndexingLimits (enableMappingHardening, keywordMaxBytes, mappingDepthLimit, nestedObjectsLimit, totalFieldsLimit, maxColumns). Tests: SearchIndexSettingsTest (per-type hardening incl. all 16 field types + flattened/extension), OsUtilsTest (flat_object strip), ColumnIndexLimitTest, SchemaFieldFlattenerTest; IndexingLimitsIT proves raw mappings reject and hardened mappings accept against the real engine (per ES/OS profile). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR hardens Elasticsearch/OpenSearch index mappings at creation time (to prevent document rejections) and adds configurable caps to recursive search-document flattening (columns and schemaFields) to avoid structural explosions during indexing.

Changes:

Introduces searchIndexingLimits configuration (mapping hardening + engine limit settings + max columns cap).
Adds SearchIndexSettings + SearchFieldLimits to inject ignore_above, ignore_malformed, depth_limit, and index.mapping.*.limit settings into mappings on index/template creation (ES + OS).
Caps recursive flattening for columns and schemaFields (Topic/APIEndpoint) and adds unit + integration coverage.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
openmetadata-spec/src/main/resources/json/schema/configuration/elasticSearchConfiguration.json	Adds `searchIndexingLimits` config block and defaults for mapping/indexing limits.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexSettings.java	Implements create-time mapping hardening (ignore_above / ignore_malformed / mapping limits).
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchFieldLimits.java	Resolves limits from config (with defaults) and exposes derived thresholds/caps.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OsUtils.java	Ensures ES-only flattened params are removed when converting to OpenSearch `flat_object`.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchIndexManager.java	Applies mapping hardening prior to OpenSearch mapping transformation on index creation.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchGenericManager.java	Applies mapping hardening prior to OpenSearch mapping transformation on template creation.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java	Applies mapping hardening on Elasticsearch index creation.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchGenericManager.java	Applies mapping hardening on Elasticsearch index-template creation.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SchemaFieldFlattener.java	New shared, bounded flattener for schemaFields used by Topic/APIEndpoint indexing.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TopicIndex.java	Switches schemaFields flattening to the shared bounded flattener.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/APIEndpointIndex.java	Switches schemaFields flattening to the shared bounded flattener.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ColumnSearchIndex.java	Adds depth + max-columns caps to static column flattening.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ColumnIndex.java	Adds depth + max-columns caps to column flattening during index-doc building.
openmetadata-service/src/test/java/org/openmetadata/service/search/SearchIndexSettingsTest.java	Unit tests for mapping hardening behavior across field types/settings.
openmetadata-service/src/test/java/org/openmetadata/service/search/opensearch/OsUtilsTest.java	Verifies ES-only flattened params are stripped for `flat_object`.
openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/SchemaFieldFlattenerTest.java	Verifies schemaFields flattening stops at depth + count caps.
openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/ColumnIndexLimitTest.java	Verifies column flattening stops at depth + count caps (interface + static).
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/IndexingLimitsIT.java	Integration test proving hardened mappings accept docs that raw mappings reject.

+  private static int clampKeywordBytes(Integer value) {
+    int resolved = orDefault(value, LUCENE_KEYWORD_MAX_BYTES);
+    return Math.min(resolved, LUCENE_KEYWORD_MAX_BYTES);
+  }


+@Execution(ExecutionMode.CONCURRENT)
+public class IndexingLimitsIT {
+
+  private static final List<String> CREATED_INDICES = new ArrayList<>();
+


github-actions · 2026-06-03T15:31:34Z

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

@execution

- SearchFieldLimits.loadActive: don't permanently cache defaults when IndexMappingLoader isn't initialized yet (would silently ignore configured limits for the JVM lifetime); only cache once the config is resolvable. - SearchFieldLimits.clampKeywordBytes: floor keywordMaxBytes at 4 so ignore_above (= value/4) can never be 0 (which would disable keyword indexing); schema minimum bumped to 4. - ColumnIndex / SchemaFieldFlattener: pass the fully-qualified name (not the local name) into the recursion so deeply nested (>2 levels) columns/fields get correct dotted paths (a.b.c, not b.c). - IndexingLimitsIT: CopyOnWriteArrayList for CREATED_INDICES (was a plain ArrayList mutated under @execution(CONCURRENT)). - Schema docs: clarify nested_objects.limit rejects (not truncates) and that maxColumns also caps schema-field flattening. - Tests: FQN-path assertions for columns and schema fields; tiny keywordMaxBytes ignore_above >= 1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 18 out of 21 changed files in this pull request and generated 2 comments.

+        Column col = columns.get(index);
+        if (col.getTags() != null) {
+          tags = col.getTags();
+        }
+        String columnName = addFlattenColumn(col, optParentColumn, tags, flattenColumns);


+        Field field = fields.get(index);
+        if (field.getTags() != null) {
+          tags = field.getTags();
+        }
+        String fieldName = addFlattenField(field, optParentField, tags, flattenSchemaFields);


github-actions · 2026-06-03T16:17:19Z

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

github-actions · 2026-06-03T16:34:07Z

Jest test Coverage

UI tests summary

Lines	Statements	Branches	Functions
	63% (70457/111823)	45.87% (40630/88560)	47.27% (12572/26595)

github-actions · 2026-06-03T18:46:57Z

🔴 Playwright Results — 12 failure(s), 48 flaky

✅ 3597 passed · ❌ 12 failed · 🟡 48 flaky · ⏭️ 39 skipped

Shard	Passed	Failed	Flaky	Skipped
🔴 Shard 1	371	9	6	11
🟡 Shard 3	817	0	3	7
🔴 Shard 4	796	2	5	20
🔴 Shard 5	844	1	19	0
🟡 Shard 6	769	0	15	1

Genuine Failures (failed on all attempts)

❌ Pages/Lineage/LineageInteraction.spec.ts › Verify edge click opens edge drawer (shard 1)

Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoBeVisible�[2m(�[22m�[2m)�[22m failed

Locator: getByTestId('edge-pw-database-service-de51c224.pw-database-c0450725.pw-database-schema-315c7a9d.pw-table-de3fa3ca-f3a2-43a4-94b5-a9e279dceda4-undefined')
Expected: visible
Timeout: 15000ms
Error: element(s) not found

Call log:
�[2m  - Expect "toBeVisible" with timeout 15000ms�[22m
�[2m  - waiting for getByTestId('edge-pw-database-service-de51c224.pw-database-c0450725.pw-database-schema-315c7a9d.pw-table-de3fa3ca-f3a2-43a4-94b5-a9e279dceda4-undefined')�[22m

❌ Pages/Lineage/LineageInteraction.spec.ts › Verify edge delete button in drawer (shard 1)

Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoBeVisible�[2m(�[22m�[2m)�[22m failed

Locator: getByTestId('edge-pw-database-service-bc9a7590.pw-database-c1555126.pw-database-schema-febcd97b.pw-table-55a5a92d-adce-4bf9-989b-a146e07bfbcd-undefined')
Expected: visible
Timeout: 15000ms
Error: element(s) not found

Call log:
�[2m  - Expect "toBeVisible" with timeout 15000ms�[22m
�[2m  - waiting for getByTestId('edge-pw-database-service-bc9a7590.pw-database-c1555126.pw-database-schema-febcd97b.pw-table-55a5a92d-adce-4bf9-989b-a146e07bfbcd-undefined')�[22m

❌ Pages/Lineage/LineageInteraction.spec.ts › Verify node panel opens on click (shard 1)

�[31mTest timeout of 60000ms exceeded.�[39m

❌ Pages/Lineage/LineageInteraction.spec.ts › Verify edit mode with edge operations (shard 1)

Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoBeVisible�[2m(�[22m�[2m)�[22m failed

Locator: getByTestId('edge-pw-database-service-3327e29c.pw-database-f18e8d5d.pw-database-schema-20425e00.pw-table-7ae6c7db-c5f9-4107-890b-8f1aa4f525c2-undefined')
Expected: visible
Timeout: 15000ms
Error: element(s) not found

Call log:
�[2m  - Expect "toBeVisible" with timeout 15000ms�[22m
�[2m  - waiting for getByTestId('edge-pw-database-service-3327e29c.pw-database-f18e8d5d.pw-database-schema-20425e00.pw-table-7ae6c7db-c5f9-4107-890b-8f1aa4f525c2-undefined')�[22m

❌ Pages/Lineage/PlatformLineage.spec.ts › Verify table search with special characters as handled (shard 1)

TimeoutError: page.waitForResponse: Timeout 30000ms exceeded while waiting for event "response"

❌ Pages/Lineage/PlatformLineage.spec.ts › Verify service platform view (shard 1)

TimeoutError: page.waitForResponse: Timeout 30000ms exceeded while waiting for event "response"

❌ Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 1)

TimeoutError: page.waitForResponse: Timeout 30000ms exceeded while waiting for event "response"

❌ Pages/Lineage/PlatformLineage.spec.ts › Verify platform view switching (shard 1)

TimeoutError: page.waitForResponse: Timeout 30000ms exceeded while waiting for event "response"

❌ Flow/SearchRBAC.spec.ts › a dashboard-scoped user sees dashboards but never tables (shard 1)

Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoHaveCount�[2m(�[22m�[32mexpected�[39m�[2m)�[22m failed

Locator:  getByTestId('search-container').locator('[data-testid="entity-header-display-name"]').filter({ hasText: 'pw table 97432872-a432-46e2-8802-e6af0905696f' })
Expected: �[32m0�[39m
Received: �[31m1�[39m
Timeout:  15000ms

Call log:
�[2m  - Expect "toHaveCount" with timeout 15000ms�[22m
�[2m  - waiting for getByTestId('search-container').locator('[data-testid="entity-header-display-name"]').filter({ hasText: 'pw table 97432872-a432-46e2-8802-e6af0905696f' })�[22m
�[2m    19 × locator resolved to 1 element�[22m
�[2m       - unexpected value "1"�[22m

❌ Pages/CustomProperties.spec.ts › Create custom property and configure search for Dashboard (shard 4)

�[31mTest timeout of 180000ms exceeded.�[39m

❌ Pages/CustomProperties.spec.ts › Create custom property and configure search for Pipeline (shard 4)

�[31mTest timeout of 180000ms exceeded.�[39m

❌ Pages/ExploreBrowse.spec.ts › service type drill-down disables unrelated roots and query-panel Clear resets it (shard 5)

�[31mTest timeout of 180000ms exceeded.�[39m

🟡 48 flaky test(s) (passed on retry)

Features/DataAssetRulesDisabled.spec.ts › Verify the Spreadsheet entity item action after rules disabled (shard 1, 1 retry)
Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Topic (shard 1, 1 retry)
Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Dashboard (shard 1, 1 retry)
Pages/Login.spec.ts › Refresh should work (shard 1, 1 retry)
Flow/SearchRBAC.spec.ts › User without permission (shard 1, 1 retry)
Flow/SearchRBAC.spec.ts › User without permission (shard 1, 1 retry)
Features/SearchExport.spec.ts › Export queues a background job and downloads from the jobs tray (shard 3, 1 retry)
Features/Tasks/TaskNavigation.spec.ts › navigating to /table/TASK-XXXXX should show 404 (invalid URL pattern) (shard 3, 1 retry)
Features/UserProfileOnlineStatus.spec.ts › Should not show online status for inactive users (shard 3, 1 retry)
Flow/CustomizeWidgets.spec.ts › My Data Widget (shard 4, 1 retry)
Flow/PlatformLineage.spec.ts › Verify Platform Lineage View (shard 4, 2 retries)
Pages/CustomProperties.spec.ts › Email (shard 4, 2 retries)
Pages/CustomProperties.spec.ts › Set & Update all CP types on dashboardDataModel (shard 4, 1 retry)
Pages/DataContracts.spec.ts › Create Data Contract and validate for Worksheet (shard 4, 1 retry)
Pages/Entity.spec.ts › Domain Add, Update and Remove (shard 5, 1 retry)
Pages/Entity.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
Pages/Entity.spec.ts › Glossary Term Add, Update and Remove for child entities (shard 5, 1 retry)
Pages/Entity.spec.ts › Inactive Announcement create & delete (shard 5, 1 retry)
Pages/Entity.spec.ts › Inactive Announcement create & delete (shard 5, 1 retry)
Pages/Entity.spec.ts › Update displayName (shard 5, 1 retry)
Pages/Entity.spec.ts › Domain Propagation (shard 5, 1 retry)
Pages/Entity.spec.ts › User as Owner with unsorted list (shard 5, 1 retry)
Pages/Entity.spec.ts › Update description (shard 5, 1 retry)
Pages/Entity.spec.ts › Glossary Term Add, Update and Remove (shard 5, 1 retry)
Pages/Entity.spec.ts › User should be denied access to edit description when deny policy rule is applied on an entity (shard 5, 1 retry)
Pages/EntityDataConsumer.spec.ts › Follow & Un-follow entity (shard 5, 1 retry)
Pages/EntityDataConsumer.spec.ts › UpVote & DownVote entity (shard 5, 1 retry)
Pages/EntityDataConsumer.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
Pages/EntityDataSteward.spec.ts › Update description (shard 5, 1 retry)
Pages/EntityDataSteward.spec.ts › Glossary Term Add, Update and Remove for child entities (shard 5, 1 retry)
... and 18 more

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot

Pull request overview

Copilot reviewed 18 out of 21 changed files in this pull request and generated 2 comments.

+    Optional<String> optParentField =
+        Optional.ofNullable(parentField).filter(Predicate.not(String::isEmpty));
+    List<TagLabel> tags = new ArrayList<>();
+    int index = 0;
+    boolean capReached = false;
+    while (index < fields.size() && !capReached) {
+      if (flattenSchemaFields.size() >= limits.getMaxColumns()) {
+        LOG.warn(
+            "Reached max indexed schema fields {}; dropping remaining under '{}'",
+            limits.getMaxColumns(),
+            parentField);
+        capReached = true;
+      } else {
+        Field field = fields.get(index);
+        if (field.getTags() != null) {
+          tags = field.getTags();
+        }
+        String fieldName = addFlattenField(field, optParentField, tags, flattenSchemaFields);


    Optional<String> optParentColumn =
        Optional.ofNullable(parentColumn).filter(Predicate.not(String::isEmpty));
    List<TagLabel> tags = new ArrayList<>();
-    for (Column col : columns) {
-      String columnName = col.getName();
-      if (optParentColumn.isPresent()) {
-        columnName = FullyQualifiedName.add(optParentColumn.get(), columnName);
-      }
-      if (col.getTags() != null) {
-        tags = col.getTags();
+    int index = 0;
+    boolean capReached = false;
+    while (index < columns.size() && !capReached) {
+      if (flattenColumns.size() >= limits.getMaxColumns()) {
+        LOG.warn(
+            "Reached max indexed columns {}; dropping remaining columns under '{}'",
+            limits.getMaxColumns(),
+            parentColumn);
+        capReached = true;
+      } else {
+        Column col = columns.get(index);
+        if (col.getTags() != null) {
+          tags = col.getTags();
+        }
+        String columnName = addFlattenColumn(col, optParentColumn, tags, flattenColumns);
+        if (col.getChildren() != null) {


…o feat/schema-indexing-safety

github-actions · 2026-06-29T13:26:31Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

The status indexer stripped only the free-form `config` map from each pipeline run status; `metadata` is the same arbitrary per-run map and carried the identical exposure — its keys were dynamically mapped, risking string-then-object type conflicts and pushing the index past total_fields.limit (a hard document rejection). Strip both via a single NON_SEARCHABLE_STATUS_FIELDS set; searchable fields (state, runId, timestamps) are preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Restructure the page to match the established settings-page conventions: a header with right-aligned Save/Reset actions, an info banner, a toolbar card with labelled responsive language + (searchable) entity-type selectors, and a bordered editor card showing the active entity/language with a Format action, loader, and empty state. Heights and layout move to a dedicated .less using design tokens. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tsIT The rest5 low-level client surfaces a rejected write as the Response carrying the 4xx rather than always throwing, so rejects() — which relied on catching an exception — always returned false, failing every "raw must reject" assertion on both engines. Inspect the status code (and still unwrap a thrown ResponseException), mirroring how ElasticSearchClient reads e.getResponse(); accepts() already used the status code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… mapping The recursive column/struct flatten that builds columnNamesFuzzy was capped at the global depth limit (20), so a column nested deeper than 20 was never searchable and could not be recovered. Resolve the depth per entity type from the stored, admin-editable index mapping (settings.index.mapping.depth.limit) via SearchFieldLimits.forEntity(...), memoized and invalidated when the mapping setting changes. Raising the depth through the search-index-mappings API and reindexing now makes deeper column names searchable; the default 20 still protects against explosion. SearchIndexImmenseTermIT is reworked to prove this end to end (default depth drops a 25-level column, raising it via the API restores searchability), and the Playwright counterpart asserts searchability via an in-limit column while still guarding that the oversized deeply nested column never breaks indexing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

buildEntityMapping fed the request-supplied `language` path parameter straight into a classpath resource path (getIndexMappingFile -> getResource AsStream), which CodeQL flagged as path injection. Validate it against the trusted IndexMappingLanguage allowlist first; an unknown language now yields no mapping (404 on the endpoint) instead of an arbitrary resource read. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-06-30T07:21:23Z

Code Review 👍 Approved with suggestions 5 resolved / 7 findings

Hardens search index mappings with engine-native limits and introduces persistent, UI-editable mapping management to prevent document rejection. Ensure the read-modify-write cycle in spliceSearchIndexMapping is protected and add entity type validation to updateSearchIndexMapping to prevent configuration drifts.

💡 Bug: Read-modify-write race can drop concurrent mapping edits

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:393-407

spliceSearchIndexMapping reads the whole blob (getOrBuildSearchIndexMappings), mutates one (language, entityType) slice in memory, then writes the entire blob back via createOrUpdate. Two concurrent admin saves (or a save racing a reset) for different entities/languages will each read the pre-update blob and the last writer wins, silently discarding the other's change since the full document is overwritten rather than patched.

This is admin-only and low frequency, so impact is limited, but the lost-update is silent. Consider serializing these updates (e.g. optimistic version check on the settings row, or a DB-side JSON merge) so concurrent edits to different slices don't clobber each other.

💡 Quality: updateSearchIndexMapping does not validate entityType exists

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/system/SystemResource.java:367-381 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:378-381

validateSearchIndexMappingRequest validates the language and that the body contains a mappings.properties object, but never validates that entityType is a known entity/index type. upsertSearchIndexMapping then stores the slice under that arbitrary key. Because SearchRepository.resolveEntityType() derives the entityType key from entityIndexMap (the IndexMappingLoader registry), a mapping persisted under a typo'd or non-existent entityType will never be matched at read time — it becomes silent dead config that the admin believes is active, and there is no feedback that the edit will never take effect.

Suggested fix: in validateSearchIndexMappingRequest, reject (BadRequestException / NotFoundException) when entityType is not present in IndexMappingLoader.getInstance().getIndexMapping().

Validate entityType against the known index registry before persisting.

private void validateSearchIndexMappingRequest(
    String language, String entityType, Map<String, Object> mapping) {
  if (!SearchIndexMappingsSeeder.supportedLanguages()
      .contains(language.toLowerCase(Locale.ROOT))) {
    throw new BadRequestException("Unsupported search index mapping language: " + language);
  }
  if (!IndexMappingLoader.getInstance().getIndexMapping().containsKey(entityType)) {
    throw new BadRequestException("Unknown search index entity type: " + entityType);
  }
  if (!hasMappingProperties(mapping)) {
    throw new BadRequestException("Index mapping must contain a 'mappings.properties' object");
  }
}

✅ 5 resolved

✅ Bug: SearchFieldLimits caches defaults permanently if loaded pre-init

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchFieldLimits.java:89-103
loadActive() unconditionally assigns active = result even when IndexMappingLoader.getInstance() throws IllegalStateException (not yet initialized). Because active() returns the cached value forever after first invocation, any call to SearchFieldLimits.active() that happens before IndexMappingLoader.init() permanently caches the hardcoded defaults(), after which the operator-configured searchIndexingLimits (keywordMaxBytes, depth/nested/total limits, maxColumns) are silently ignored for the rest of the JVM's lifetime. There is no production code that calls setActive(...) with the resolved configuration, so correctness relies entirely on active() never being invoked before the loader is initialized. Consider not caching when the loader is uninitialized (so a later call can pick up the real configuration), or explicitly calling setActive(SearchFieldLimits.from(config)) during search startup.

✅ Bug: updateIndex (PutMapping) path bypasses mapping hardening

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:83-97 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:150-164
createIndexInternal now runs SearchIndexSettings.harden(...) and the index-template path is hardened too, but updateIndex(IndexMapping, indexMappingContent) sends the raw indexMappingContent straight to PutMappingRequest without hardening. updateIndex is reached at runtime via SearchRepository.updateIndex → ElasticSearchClient.updateIndex (and the OpenSearch equivalent), which is exactly the path that adds new fields to existing indexes during schema/version changes. New keyword fields added through this path receive no ignore_above, and new numeric/date fields receive no ignore_malformed, so the very immense-term / malformed-value document rejections this PR sets out to prevent can be reintroduced on any index that is updated rather than freshly created. This is an exhaustiveness gap relative to the PR's stated coverage. Recommend hardening the field definitions in the update path as well (a field-only variant, since PutMapping accepts mapping properties rather than index settings).

✅ Edge Case: Removing mustExist(false) reintroduces TOCTOU failure in swapAliases

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchIndexManager.java:446-457 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:365-375
In both OpenSearchIndexManager.swapAliases and ElasticSearchIndexManager.swapAliases, the removeIndex(...) action no longer sets mustExist(false). The justification is that resolveCanonicalRemoval only forwards indices already confirmed to exist via indexExists(). However, indexExists() runs earlier than the _aliases request, so there is a check-then-act (TOCTOU) window: if a concurrent reindex/delete removes that concrete index in between, the remove_index action defaults to must_exist=true semantics and the engine throws index_not_found_exception, which fails the entire atomic request — meaning the alias add for newIndex is also rolled back and swapAliases returns false. Previously mustExist(false) made the removal a tolerant no-op precisely to avoid this. The change is acceptable for OpenSearch (whose parser rejects must_exist), but for Elasticsearch it was removed only for "byte-for-byte alignment," trading away race tolerance the ES client could have kept. Impact is limited (the failure is caught, logged, and returns false rather than corrupting state), so severity is minor, but callers relying on the swap succeeding may see intermittent failures under concurrent index churn. Consider keeping mustExist(false) on the ES path, or documenting/handling the false return at the call site.

✅ Bug: Stored mapping shadows bundled resource mappings on upgrade

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:929-943 📄 openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v203/MigrationUtil.java:31-45 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/settings/SettingsCache.java:531-542
SearchRepository.readIndexMapping now prefers the persisted searchIndexMappings setting (getStoredMapping) over the classpath resource, and only falls back to the resource when no stored slice exists. The blob is seeded once — at fresh install (SettingsCache.createDefaultConfiguration) or via the v203 migration (MigrationUtil.seedSearchIndexMappings), and both paths are insert-if-absent (getConfigWithKey(...) == null).

Consequence: after the row is seeded, any future OpenMetadata upgrade that ships changes to the bundled mapping resource JSON (new explicit fields, analyzer/type changes, additional ignore_above hardening) will be silently ignored for every already-seeded (language, entityType) pair, because the stale stored blob always wins and is never re-seeded or merged. Previously getIndexMapping read the resource directly, so mapping improvements took effect on the next reindex automatically. Admins now have to manually reset each entity/language to pick up shipped mapping changes — and they have no signal that a reset is needed. This can cause subtle search regressions after upgrades (e.g. a new field indexed only via dynamic mapping, or a lost analyzer change).

Consider re-seeding/merging the stored blob when the bundled mapping version changes (there is already an IndexMappingVersionTracker for drift), or layering stored overrides on top of the current resource rather than fully replacing it.

✅ Edge Case: enableMappingHardening=false has no effect once mappings are seeded

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:929-943 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexSettings.java:70-78 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/settings/SettingsCache.java:531-542
The enableMappingHardening flag is only consulted inside SearchIndexSettings.harden(), which guards the resource-fallback path (getHardenedResourceMapping) and the upsert/reset paths. However, the default mapping blob is seeded once into the searchIndexMappings setting at first boot / migration with hardening applied (flag defaults to true). On every subsequent reindex, SearchRepository.readIndexMapping() prefers getStoredMapping(), which returns the already-hardened stored slice WITHOUT re-consulting the flag.

Consequence: an operator who later sets enableMappingHardening=false (e.g. because ignore_above is dropping legitimately long keyword values from search/aggregations) sees no change — the stored blob remains hardened. The only way to truly disable hardening is to resetSearchIndexMapping every entity/language slice while the flag is false. This asymmetry is surprising and undocumented.

Suggested fix: either re-apply/strip hardening based on the current flag when reading stored mappings, or document clearly that the flag only governs initial seed + reset, and provide a global re-seed action that respects the flag.

🤖 Prompt for agents

Code Review: Hardens search index mappings with engine-native limits and introduces persistent, UI-editable mapping management to prevent document rejection. Ensure the read-modify-write cycle in `spliceSearchIndexMapping` is protected and add entity type validation to `updateSearchIndexMapping` to prevent configuration drifts.

1. 💡 Bug: Read-modify-write race can drop concurrent mapping edits
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:393-407

   `spliceSearchIndexMapping` reads the whole blob (`getOrBuildSearchIndexMappings`), mutates one (language, entityType) slice in memory, then writes the entire blob back via `createOrUpdate`. Two concurrent admin saves (or a save racing a reset) for different entities/languages will each read the pre-update blob and the last writer wins, silently discarding the other's change since the full document is overwritten rather than patched.
   
   This is admin-only and low frequency, so impact is limited, but the lost-update is silent. Consider serializing these updates (e.g. optimistic version check on the settings row, or a DB-side JSON merge) so concurrent edits to different slices don't clobber each other.

2. 💡 Quality: updateSearchIndexMapping does not validate entityType exists
   Files: openmetadata-service/src/main/java/org/openmetadata/service/resources/system/SystemResource.java:367-381, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:378-381

   `validateSearchIndexMappingRequest` validates the language and that the body contains a `mappings.properties` object, but never validates that `entityType` is a known entity/index type. `upsertSearchIndexMapping` then stores the slice under that arbitrary key. Because `SearchRepository.resolveEntityType()` derives the entityType key from `entityIndexMap` (the IndexMappingLoader registry), a mapping persisted under a typo'd or non-existent entityType will never be matched at read time — it becomes silent dead config that the admin believes is active, and there is no feedback that the edit will never take effect.
   
   Suggested fix: in `validateSearchIndexMappingRequest`, reject (BadRequestException / NotFoundException) when `entityType` is not present in `IndexMappingLoader.getInstance().getIndexMapping()`.

   Fix (Validate entityType against the known index registry before persisting.):
   private void validateSearchIndexMappingRequest(
       String language, String entityType, Map<String, Object> mapping) {
     if (!SearchIndexMappingsSeeder.supportedLanguages()
         .contains(language.toLowerCase(Locale.ROOT))) {
       throw new BadRequestException("Unsupported search index mapping language: " + language);
     }
     if (!IndexMappingLoader.getInstance().getIndexMapping().containsKey(entityType)) {
       throw new BadRequestException("Unknown search index entity type: " + entityType);
     }
     if (!hasMappingProperties(mapping)) {
       throw new BadRequestException("Index mapping must contain a 'mappings.properties' object");
     }
   }

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

…ply on upgrade Addresses review feedback that the searchIndexMappings setting, once seeded with the full default blob, permanently shadowed the bundled resource: after seeding, readIndexMapping always preferred the stored (now stale) slice, so a later release shipping mapping improvements was silently ignored until an admin manually reset each entity, with no signal one was needed. The enableMappingHardening flag was likewise a no-op once seeded. The setting now holds only entities an admin has explicitly edited. Un-edited entities resolve from the current bundled hardened resource (the fallback that already existed), so upgrades apply automatically on the next reindex and drift is detected by IndexMappingVersionTracker. seedIfAbsent initializes an empty override store; listSearchIndexMappings now enumerates editable entities from the bundled registry rather than the stored keys (same response shape, so the UI is unaffected). Hardening now runs at index-create time for un-edited entities — cheap (per index, not per doc). Also validates the entityType on update against the known index-mapping registry (was previously unchecked, persisting dead data for typos). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-06-30T07:25:22Z

Code Review 👍 Approved with suggestions 6 resolved / 7 findings

Hardens search index mappings with engine-native bounds and adds an admin-managed persistence layer to prevent document rejections. Address the identified read-modify-write race condition in spliceSearchIndexMapping to ensure concurrent updates are not lost.

💡 Bug: Read-modify-write race can drop concurrent mapping edits

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:393-407

spliceSearchIndexMapping reads the whole blob (getOrBuildSearchIndexMappings), mutates one (language, entityType) slice in memory, then writes the entire blob back via createOrUpdate. Two concurrent admin saves (or a save racing a reset) for different entities/languages will each read the pre-update blob and the last writer wins, silently discarding the other's change since the full document is overwritten rather than patched.

This is admin-only and low frequency, so impact is limited, but the lost-update is silent. Consider serializing these updates (e.g. optimistic version check on the settings row, or a DB-side JSON merge) so concurrent edits to different slices don't clobber each other.

✅ 6 resolved

✅ Bug: SearchFieldLimits caches defaults permanently if loaded pre-init

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchFieldLimits.java:89-103
loadActive() unconditionally assigns active = result even when IndexMappingLoader.getInstance() throws IllegalStateException (not yet initialized). Because active() returns the cached value forever after first invocation, any call to SearchFieldLimits.active() that happens before IndexMappingLoader.init() permanently caches the hardcoded defaults(), after which the operator-configured searchIndexingLimits (keywordMaxBytes, depth/nested/total limits, maxColumns) are silently ignored for the rest of the JVM's lifetime. There is no production code that calls setActive(...) with the resolved configuration, so correctness relies entirely on active() never being invoked before the loader is initialized. Consider not caching when the loader is uninitialized (so a later call can pick up the real configuration), or explicitly calling setActive(SearchFieldLimits.from(config)) during search startup.

✅ Bug: updateIndex (PutMapping) path bypasses mapping hardening

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:83-97 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:150-164
createIndexInternal now runs SearchIndexSettings.harden(...) and the index-template path is hardened too, but updateIndex(IndexMapping, indexMappingContent) sends the raw indexMappingContent straight to PutMappingRequest without hardening. updateIndex is reached at runtime via SearchRepository.updateIndex → ElasticSearchClient.updateIndex (and the OpenSearch equivalent), which is exactly the path that adds new fields to existing indexes during schema/version changes. New keyword fields added through this path receive no ignore_above, and new numeric/date fields receive no ignore_malformed, so the very immense-term / malformed-value document rejections this PR sets out to prevent can be reintroduced on any index that is updated rather than freshly created. This is an exhaustiveness gap relative to the PR's stated coverage. Recommend hardening the field definitions in the update path as well (a field-only variant, since PutMapping accepts mapping properties rather than index settings).

✅ Edge Case: Removing mustExist(false) reintroduces TOCTOU failure in swapAliases

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchIndexManager.java:446-457 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:365-375
In both OpenSearchIndexManager.swapAliases and ElasticSearchIndexManager.swapAliases, the removeIndex(...) action no longer sets mustExist(false). The justification is that resolveCanonicalRemoval only forwards indices already confirmed to exist via indexExists(). However, indexExists() runs earlier than the _aliases request, so there is a check-then-act (TOCTOU) window: if a concurrent reindex/delete removes that concrete index in between, the remove_index action defaults to must_exist=true semantics and the engine throws index_not_found_exception, which fails the entire atomic request — meaning the alias add for newIndex is also rolled back and swapAliases returns false. Previously mustExist(false) made the removal a tolerant no-op precisely to avoid this. The change is acceptable for OpenSearch (whose parser rejects must_exist), but for Elasticsearch it was removed only for "byte-for-byte alignment," trading away race tolerance the ES client could have kept. Impact is limited (the failure is caught, logged, and returns false rather than corrupting state), so severity is minor, but callers relying on the swap succeeding may see intermittent failures under concurrent index churn. Consider keeping mustExist(false) on the ES path, or documenting/handling the false return at the call site.

✅ Bug: Stored mapping shadows bundled resource mappings on upgrade

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:929-943 📄 openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v203/MigrationUtil.java:31-45 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/settings/SettingsCache.java:531-542
SearchRepository.readIndexMapping now prefers the persisted searchIndexMappings setting (getStoredMapping) over the classpath resource, and only falls back to the resource when no stored slice exists. The blob is seeded once — at fresh install (SettingsCache.createDefaultConfiguration) or via the v203 migration (MigrationUtil.seedSearchIndexMappings), and both paths are insert-if-absent (getConfigWithKey(...) == null).

Consequence: after the row is seeded, any future OpenMetadata upgrade that ships changes to the bundled mapping resource JSON (new explicit fields, analyzer/type changes, additional ignore_above hardening) will be silently ignored for every already-seeded (language, entityType) pair, because the stale stored blob always wins and is never re-seeded or merged. Previously getIndexMapping read the resource directly, so mapping improvements took effect on the next reindex automatically. Admins now have to manually reset each entity/language to pick up shipped mapping changes — and they have no signal that a reset is needed. This can cause subtle search regressions after upgrades (e.g. a new field indexed only via dynamic mapping, or a lost analyzer change).

Consider re-seeding/merging the stored blob when the bundled mapping version changes (there is already an IndexMappingVersionTracker for drift), or layering stored overrides on top of the current resource rather than fully replacing it.

✅ Edge Case: enableMappingHardening=false has no effect once mappings are seeded

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:929-943 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexSettings.java:70-78 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/settings/SettingsCache.java:531-542
The enableMappingHardening flag is only consulted inside SearchIndexSettings.harden(), which guards the resource-fallback path (getHardenedResourceMapping) and the upsert/reset paths. However, the default mapping blob is seeded once into the searchIndexMappings setting at first boot / migration with hardening applied (flag defaults to true). On every subsequent reindex, SearchRepository.readIndexMapping() prefers getStoredMapping(), which returns the already-hardened stored slice WITHOUT re-consulting the flag.

Consequence: an operator who later sets enableMappingHardening=false (e.g. because ignore_above is dropping legitimately long keyword values from search/aggregations) sees no change — the stored blob remains hardened. The only way to truly disable hardening is to resetSearchIndexMapping every entity/language slice while the flag is false. This asymmetry is surprising and undocumented.

Suggested fix: either re-apply/strip hardening based on the current flag when reading stored mappings, or document clearly that the flag only governs initial seed + reset, and provide a global re-seed action that respects the flag.

...and 1 more resolved from earlier reviews

🤖 Prompt for agents

Code Review: Hardens search index mappings with engine-native bounds and adds an admin-managed persistence layer to prevent document rejections. Address the identified read-modify-write race condition in `spliceSearchIndexMapping` to ensure concurrent updates are not lost.

1. 💡 Bug: Read-modify-write race can drop concurrent mapping edits
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:393-407

   `spliceSearchIndexMapping` reads the whole blob (`getOrBuildSearchIndexMappings`), mutates one (language, entityType) slice in memory, then writes the entire blob back via `createOrUpdate`. Two concurrent admin saves (or a save racing a reset) for different entities/languages will each read the pre-update blob and the last writer wins, silently discarding the other's change since the full document is overwritten rather than patched.
   
   This is admin-only and low frequency, so impact is limited, but the lost-update is silent. Consider serializing these updates (e.g. optimistic version check on the settings row, or a DB-side JSON merge) so concurrent edits to different slices don't clobber each other.

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

…QL path-injection The prior allowlist guard (supportedLanguages().contains(language)) did not clear CodeQL alert #2213 (java/path-injection): a List.contains() check is not recognized as a sanitizer, and the raw request-supplied language was still passed into the classpath resource path. Resolve the language to the matching IndexMappingLanguage constant's own value and build the path from that, so the value reaching getResourceAsStream originates from the enum rather than the request. Unknown languages still yield no mapping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ngLimitsIT Once the rejects() fix let the test reach the hardened-accept assertion, the boolean case failed on OpenSearch: OsUtils strips ignore_malformed from boolean (OpenSearch does not support it), so a hardened boolean index there still rejects a malformed value. Assert acceptance conditionally — every guarded type accepts on both engines except boolean on OpenSearch, which still rejects — instead of assuming universal acceptance. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…zzy match The deep-column-searchability assertion was flaky: a 25-level leaf buried in a large concatenated columnNamesFuzzy is reliably *indexed* once the container is built at the raised depth, but the fuzzy query matching it is subject to ES relevance/refresh timing that is unrelated to the depth this test exercises (one run matched instantly, another never did within 120s). Assert the deterministic signal instead: the container indexed at the default depth (20) drops the 25-level leaf from columnNamesFuzzy, and one indexed after raising the depth via the search-index-mappings API carries it. The leaf name is checked against the indexed _source.columnNamesFuzzy value (located by the container's unique description), and a short letters-only token keeps the field small. Runs in ~5s, deterministically. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Addresses review feedback to use @openmetadata/ui-core-components instead of Ant Design: Button/Select/Card/Alert/Badge replace the AntD primitives (with Tailwind tw: utilities for layout), TitleBreadcrumb is replaced by the core-components-backed HeaderBreadcrumb, and the custom SelectOption interface is dropped in favour of the library's SelectItemType. The .less now imports variables with (reference) so it does not duplicate their output. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The "indexed before delete" precondition in recursiveHardDelete_serviceSubtree_leavesNoOrphansAndSearchClean used a tight 60s, while the post-delete checks in the same method already allow 90s/180s. Secondary docs (e.g. column_search_index) are written on the async per-entity indexing lane and can sit briefly behind a concurrent full-reindex alias swap under full IT load, flaking the 60s bound. Raise it to 120s. Robustness-only: it cannot turn a real regression into a false pass, since the asserted post-delete cleanup is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-06-30T13:10:25Z

Code Review 👍 Approved with suggestions 6 resolved / 7 findings

Hardens search index mappings with engine-native bounds and adds an admin-managed persistence layer to prevent document rejections. Address the identified read-modify-write race condition in spliceSearchIndexMapping to ensure concurrent updates are not lost.

💡 Bug: Read-modify-write race can drop concurrent mapping edits

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:393-407

spliceSearchIndexMapping reads the whole blob (getOrBuildSearchIndexMappings), mutates one (language, entityType) slice in memory, then writes the entire blob back via createOrUpdate. Two concurrent admin saves (or a save racing a reset) for different entities/languages will each read the pre-update blob and the last writer wins, silently discarding the other's change since the full document is overwritten rather than patched.

This is admin-only and low frequency, so impact is limited, but the lost-update is silent. Consider serializing these updates (e.g. optimistic version check on the settings row, or a DB-side JSON merge) so concurrent edits to different slices don't clobber each other.

✅ 6 resolved

✅ Bug: SearchFieldLimits caches defaults permanently if loaded pre-init

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchFieldLimits.java:89-103
loadActive() unconditionally assigns active = result even when IndexMappingLoader.getInstance() throws IllegalStateException (not yet initialized). Because active() returns the cached value forever after first invocation, any call to SearchFieldLimits.active() that happens before IndexMappingLoader.init() permanently caches the hardcoded defaults(), after which the operator-configured searchIndexingLimits (keywordMaxBytes, depth/nested/total limits, maxColumns) are silently ignored for the rest of the JVM's lifetime. There is no production code that calls setActive(...) with the resolved configuration, so correctness relies entirely on active() never being invoked before the loader is initialized. Consider not caching when the loader is uninitialized (so a later call can pick up the real configuration), or explicitly calling setActive(SearchFieldLimits.from(config)) during search startup.

✅ Bug: updateIndex (PutMapping) path bypasses mapping hardening

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:83-97 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:150-164
createIndexInternal now runs SearchIndexSettings.harden(...) and the index-template path is hardened too, but updateIndex(IndexMapping, indexMappingContent) sends the raw indexMappingContent straight to PutMappingRequest without hardening. updateIndex is reached at runtime via SearchRepository.updateIndex → ElasticSearchClient.updateIndex (and the OpenSearch equivalent), which is exactly the path that adds new fields to existing indexes during schema/version changes. New keyword fields added through this path receive no ignore_above, and new numeric/date fields receive no ignore_malformed, so the very immense-term / malformed-value document rejections this PR sets out to prevent can be reintroduced on any index that is updated rather than freshly created. This is an exhaustiveness gap relative to the PR's stated coverage. Recommend hardening the field definitions in the update path as well (a field-only variant, since PutMapping accepts mapping properties rather than index settings).

✅ Edge Case: Removing mustExist(false) reintroduces TOCTOU failure in swapAliases

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchIndexManager.java:446-457 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:365-375
In both OpenSearchIndexManager.swapAliases and ElasticSearchIndexManager.swapAliases, the removeIndex(...) action no longer sets mustExist(false). The justification is that resolveCanonicalRemoval only forwards indices already confirmed to exist via indexExists(). However, indexExists() runs earlier than the _aliases request, so there is a check-then-act (TOCTOU) window: if a concurrent reindex/delete removes that concrete index in between, the remove_index action defaults to must_exist=true semantics and the engine throws index_not_found_exception, which fails the entire atomic request — meaning the alias add for newIndex is also rolled back and swapAliases returns false. Previously mustExist(false) made the removal a tolerant no-op precisely to avoid this. The change is acceptable for OpenSearch (whose parser rejects must_exist), but for Elasticsearch it was removed only for "byte-for-byte alignment," trading away race tolerance the ES client could have kept. Impact is limited (the failure is caught, logged, and returns false rather than corrupting state), so severity is minor, but callers relying on the swap succeeding may see intermittent failures under concurrent index churn. Consider keeping mustExist(false) on the ES path, or documenting/handling the false return at the call site.

✅ Bug: Stored mapping shadows bundled resource mappings on upgrade

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:929-943 📄 openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v203/MigrationUtil.java:31-45 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/settings/SettingsCache.java:531-542
SearchRepository.readIndexMapping now prefers the persisted searchIndexMappings setting (getStoredMapping) over the classpath resource, and only falls back to the resource when no stored slice exists. The blob is seeded once — at fresh install (SettingsCache.createDefaultConfiguration) or via the v203 migration (MigrationUtil.seedSearchIndexMappings), and both paths are insert-if-absent (getConfigWithKey(...) == null).

Consequence: after the row is seeded, any future OpenMetadata upgrade that ships changes to the bundled mapping resource JSON (new explicit fields, analyzer/type changes, additional ignore_above hardening) will be silently ignored for every already-seeded (language, entityType) pair, because the stale stored blob always wins and is never re-seeded or merged. Previously getIndexMapping read the resource directly, so mapping improvements took effect on the next reindex automatically. Admins now have to manually reset each entity/language to pick up shipped mapping changes — and they have no signal that a reset is needed. This can cause subtle search regressions after upgrades (e.g. a new field indexed only via dynamic mapping, or a lost analyzer change).

Consider re-seeding/merging the stored blob when the bundled mapping version changes (there is already an IndexMappingVersionTracker for drift), or layering stored overrides on top of the current resource rather than fully replacing it.

✅ Edge Case: enableMappingHardening=false has no effect once mappings are seeded

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:929-943 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexSettings.java:70-78 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/settings/SettingsCache.java:531-542
The enableMappingHardening flag is only consulted inside SearchIndexSettings.harden(), which guards the resource-fallback path (getHardenedResourceMapping) and the upsert/reset paths. However, the default mapping blob is seeded once into the searchIndexMappings setting at first boot / migration with hardening applied (flag defaults to true). On every subsequent reindex, SearchRepository.readIndexMapping() prefers getStoredMapping(), which returns the already-hardened stored slice WITHOUT re-consulting the flag.

Consequence: an operator who later sets enableMappingHardening=false (e.g. because ignore_above is dropping legitimately long keyword values from search/aggregations) sees no change — the stored blob remains hardened. The only way to truly disable hardening is to resetSearchIndexMapping every entity/language slice while the flag is false. This asymmetry is surprising and undocumented.

Suggested fix: either re-apply/strip hardening based on the current flag when reading stored mappings, or document clearly that the flag only governs initial seed + reset, and provide a global re-seed action that respects the flag.

...and 1 more resolved from earlier reviews

🤖 Prompt for agents

Code Review: Hardens search index mappings with engine-native bounds and adds an admin-managed persistence layer to prevent document rejections. Address the identified read-modify-write race condition in `spliceSearchIndexMapping` to ensure concurrent updates are not lost.

1. 💡 Bug: Read-modify-write race can drop concurrent mapping edits
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:393-407

   `spliceSearchIndexMapping` reads the whole blob (`getOrBuildSearchIndexMappings`), mutates one (language, entityType) slice in memory, then writes the entire blob back via `createOrUpdate`. Two concurrent admin saves (or a save racing a reset) for different entities/languages will each read the pre-update blob and the last writer wins, silently discarding the other's change since the full document is overwritten rather than patched.
   
   This is admin-only and low frequency, so impact is limited, but the lost-update is silent. Consider serializing these updates (e.g. optimistic version check on the settings row, or a DB-side JSON merge) so concurrent edits to different slices don't clobber each other.

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

sonarqubecloud · 2026-06-30T13:28:30Z

Quality Gate passed for 'open-metadata-ui'

Issues
1276 New issues
0 Accepted issues

Measures
13 Security Hotspots
0.0% Coverage on New Code
3.4% Duplication on New Code

See analysis details on SonarQube Cloud

sonarqubecloud · 2026-06-30T14:28:34Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Copilot AI review requested due to automatic review settings June 3, 2026 15:25

Copilot started reviewing on behalf of mohityadav766 June 3, 2026 15:25 View session

github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Jun 3, 2026

gitar-bot Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/search/SearchFieldLimits.java

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Update generated TypeScript types

ae11b47

github-actions Bot requested a review from a team as a code owner June 3, 2026 15:31

Merge branch 'main' into feat/schema-indexing-safety

4a1afce

mohityadav766 had a problem deploying to test June 3, 2026 16:01 — with GitHub Actions Error

Copilot AI review requested due to automatic review settings June 3, 2026 16:11

Copilot started reviewing on behalf of mohityadav766 June 3, 2026 16:11 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Update generated TypeScript types

ec8a077

Merge branch 'main' into feat/schema-indexing-safety

5c79b0b

Copilot AI review requested due to automatic review settings June 4, 2026 05:28

Copilot started reviewing on behalf of mohityadav766 June 4, 2026 05:29 View session

Copilot AI reviewed Jun 4, 2026

View reviewed changes

mohityadav766 temporarily deployed to test June 4, 2026 05:40 — with GitHub Actions Inactive

mohityadav766 and others added 4 commits June 25, 2026 23:58

Merge remote-tracking branch 'origin/feat/schema-indexing-safety' int…

a196ddd

…o feat/schema-indexing-safety

Update migration

d7cfdaa

Merge branch 'main' into feat/schema-indexing-safety

7c57cf1

Merge branch 'main' into feat/schema-indexing-safety

6604b93

mohityadav766 and others added 10 commits June 29, 2026 20:20

Fix CheckStyle

c2704a6

Merge branch 'main' into feat/schema-indexing-safety

9c5dbb9

Fix Failure

dfbc456

Fix Failure

c1485f7

Merge branch 'main' into feat/schema-indexing-safety

2dda25c

mohityadav766 and others added 4 commits June 30, 2026 13:03

Merge branch 'main' into feat/schema-indexing-safety

f7cf80e

chirag-madlani requested changes Jun 30, 2026

View reviewed changes

mohityadav766 and others added 2 commits June 30, 2026 17:41

Merge branch 'main' into feat/schema-indexing-safety

aba8307

chirag-madlani previously approved these changes Jun 30, 2026

View reviewed changes

Uh oh!

Conversation

mohityadav766 commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Approach — engine-native hardening (zero per-document cost)

Admin-editable, persisted mappings (DB-backed + UI)

Coverage (exhaustive)

Edge cases analysed

Tests

Out of scope (follow-ups)

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 3, 2026

✅ TypeScript Types Auto-Updated

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

github-actions Bot commented Jun 3, 2026

✅ TypeScript Types Auto-Updated

Uh oh!

github-actions Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Jest test Coverage

UI tests summary

Uh oh!

github-actions Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔴 Playwright Results — 12 failure(s), 48 flaky

Genuine Failures (failed on all attempts)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

gitar-bot Bot commented Jun 30, 2026

Uh oh!

gitar-bot Bot commented Jun 30, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gitar-bot Bot commented Jun 30, 2026

Uh oh!

sonarqubecloud Bot commented Jun 30, 2026

Quality Gate passed for 'open-metadata-ui'

Uh oh!

sonarqubecloud Bot commented Jun 30, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mohityadav766 commented Jun 3, 2026 •

edited

Loading

github-actions Bot commented Jun 3, 2026 •

edited

Loading

github-actions Bot commented Jun 3, 2026 •

edited

Loading