Commit e23dff8
* fix(it): stop two flaky integration tests (#27649)
GlossaryOntologyExportIT: mark @Isolated. @BeforeAll flips RdfUpdater (a
JVM-wide singleton) on, which makes every concurrent test class start
doing synchronous Fuseki writes on entity create, saturating the
Dropwizard thread pool and causing 60s request timeouts.
@Execution(SAME_THREAD) alone only serialises within this class.
WorkflowDefinitionResourceIT#triggerWorkflow_SDK: drop the redundant
waitForWorkflowDeployment call — the create path already waits. Add
descriptive aliases to the two await() polls so the next flake tells
us which FQN or workflow name actually timed out instead of an
anonymous lambda.
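
A minimal sketch of why an alias on a poll helps. This is a plain-Java stand-in, not the Awaitility API the tests use; NamedPoll and its signature are hypothetical and only illustrate how the alias ends up in the timeout message.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for an aliased await() poll; not the Awaitility API. */
final class NamedPoll {
  private NamedPoll() {}

  /** Polls until the condition holds, or fails with a message that names the alias. */
  static void until(
      String alias, Duration timeout, Duration interval, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return;
      }
      if (System.nanoTime() >= deadline) {
        // The alias is what makes the failure actionable in a CI log.
        throw new IllegalStateException(
            "Condition '" + alias + "' not met within " + timeout.toMillis() + " ms");
      }
      try {
        Thread.sleep(interval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for '" + alias + "'", e);
      }
    }
  }
}
```

A caller would pass something like "workflow <name> deployed" as the alias so the failing FQN or workflow name shows up directly in the exception.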
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): never skip live indexing during reindex (#27649)
Live search indexing was silently skipped whenever a reindex job was in
RUNNING/READY/STOPPING state. SearchRepository.createEntityIndex() and
six sibling methods consulted SearchIndexRetryQueue.isEntityTypeSuspended()
and returned early with nothing written, nothing enqueued — entities
vanished from search until a future reindex happened to cover them.
The retry worker doubled down: when the scope refresh observed an active
job, it purged the retry queue; and processRecord() deleted records
whose type was suspended. So even manually enqueued retries were wiped.
This is how the #27649 flake surfaced: AppsResourceIT triggers
SearchIndexingApplication runs and its best-effort 30s wait silently
swallows timeouts. If a run was still RUNNING when AppsResourceIT
finished, the next class in the sequential fork
(WorkflowDefinitionResourceIT) inherited the suspension and its
freshly-created tables were never indexed — waitForEntityIndexedInSearch
then timed out at 120s. Same mechanism bites real users mid-reindex in
production.
Remove the suspension mechanism entirely:
* SearchRepository — drop the 8 isEntityTypeSuspended() early-returns;
the client-availability path already enqueues for retry on its own.
* SearchIndexRetryWorker — drop refreshReindexSuspensionScopeIfNeeded()
and the suspension branches in processRecord(); remove the retry-queue
purge on suspendAll.
* SearchIndexRetryQueue — delete the updateSuspension / clearSuspension
/ isEntityTypeSuspended / isStreamingSuspended / isSuspendAllStreaming
/ getSuspendedEntityTypes API and the static AtomicBoolean /
AtomicReference they backed.
* Drop the two IT cases that asserted the removed behaviour.
Live writes now always reach the search client; reindex and live
writes both target the same indices as before. Version conflicts
between the two paths (stale reindex batch overwriting a newer live
write) remain possible as they did before suspension was introduced —
that is the race suspension was meant to dodge, but dropping writes
altogether was worse than the race.
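
The difference between the removed early-return and the new behaviour can be sketched as follows. SuspensionSketch and its two methods are hypothetical simplifications; the real SearchRepository methods take entities and index mappings, not strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a live-write path before and after the change. */
final class SuspensionSketch {
  final List<String> indexed = new ArrayList<>();
  final List<String> retryQueue = new ArrayList<>();

  /** Old behaviour: a suspended type returns early, nothing written and nothing enqueued. */
  void indexWithSuspension(String entityType, String doc, Set<String> suspendedTypes) {
    if (suspendedTypes.contains(entityType)) {
      return; // the write is silently lost
    }
    indexed.add(doc);
  }

  /** New behaviour: always attempt the write; only an unavailable client defers to retry. */
  void indexAlways(String doc, boolean clientAvailable) {
    if (clientAvailable) {
      indexed.add(doc);
    } else {
      retryQueue.add(doc);
    }
  }
}
```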
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): route live writes to staged index during reindex (#27649)
The distributed reindex has a TOCTOU: partitions read from a DB
snapshot at T0 and write to a staged index, then at T1 (seconds
later) the alias is atomically swapped from the old index to the
staged one and the old index is deleted. Any entity that live-writers
create between T0 and T1 goes via the alias → old index, and is
destroyed when that old index is deleted post-swap.
The CI log for #27649 shows this directly:
10:13:35 staged table_search_index_rebuild_…_215646 built from snapshot
10:13:40 POST /v1/tables table1_gold → written to alias target
(old index _179670)
10:13:40 table2_silver, table3_bronze, table4_brass all written to
old index _179670
10:13:42 Atomically swapped aliases from [_179670] to _215646
10:13:42 Successfully deleted index _179670
10:13:43+ waitForEntityIndexedInSearch polls, finds nothing, times
out at 2 min
Removing the silent-skip suspension mechanism in the previous commit
exposed this race (it had been hidden by dropping the writes outright,
which was strictly worse).
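
The race in the log above can be reproduced with a small in-memory model. AliasSwapRace is a hypothetical illustration with sets standing in for indices; the index and document names are placeholders, not the real ones.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the alias-swap TOCTOU. */
final class AliasSwapRace {
  final Map<String, Set<String>> indices = new HashMap<>();
  String aliasTarget;

  AliasSwapRace(String oldIndex, Set<String> initialDocs) {
    indices.put(oldIndex, new HashSet<>(initialDocs));
    aliasTarget = oldIndex;
  }

  /** T0: the staged index is built from a DB snapshot. */
  void buildStaged(String stagedIndex, Set<String> dbSnapshot) {
    indices.put(stagedIndex, new HashSet<>(dbSnapshot));
  }

  /** Between T0 and T1: a live write goes through the alias to the old index. */
  void liveWrite(String doc) {
    indices.get(aliasTarget).add(doc);
  }

  /** T1: the alias is swapped to the staged index and the old index is deleted. */
  void swapAndDeleteOld(String stagedIndex) {
    String old = aliasTarget;
    aliasTarget = stagedIndex;
    indices.remove(old);
  }

  boolean searchable(String doc) {
    return indices.get(aliasTarget).contains(doc);
  }
}
```

A document written between buildStaged and swapAndDeleteOld is searchable before the swap and gone after it, which matches the timeline in the CI log.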
Route live writes to the staged index during the reindex window:
* SearchRepository gains an activeStagedIndices map (entityType →
stagedIndex) plus register/unregister/resolveWriteIndex. Writes
resolve to the staged index when one is registered for the type,
otherwise to the canonical alias — the existing behaviour.
* DefaultRecreateHandler.recreateIndexFromMapping registers the
staged index as soon as it is created; finalizeReindex and
promoteEntityIndex unregister it on every exit path (successful
swap, swap failure, failed-reindex delete, exception).
* Every live-write path in SearchRepository — createEntityIndex,
createEntitiesIndex, indexTableColumns, indexColumnsForTables,
updateEntityIndex, createTimeSeriesEntity, updateTimeSeriesEntity,
deleteEntityIndex, deleteEntityByFQNPrefix, deleteTimeSeriesEntityById
— goes through resolveWriteIndex instead of reading the canonical
alias directly.
During a reindex, live writes land in the index that the alias will
promote to; after the swap the alias points to that same index and
subsequent writes continue to reach the same place. Old-index deletion
no longer discards fresh data.
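
A minimal sketch of the routing described above, assuming string-based signatures. StagedIndexRouter is a hypothetical class; in the commit these members live on SearchRepository and take an IndexMapping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified version of the entityType -> stagedIndex routing. */
final class StagedIndexRouter {
  private final Map<String, String> activeStagedIndices = new ConcurrentHashMap<>();

  /** Called when the reindex creates the staged index. */
  void registerStagedIndex(String entityType, String stagedIndex) {
    activeStagedIndices.put(entityType, stagedIndex);
  }

  /** Called on every exit path of the promote or cleanup step. */
  void unregisterStagedIndex(String entityType) {
    activeStagedIndices.remove(entityType);
  }

  /** Staged index while a reindex is active for the type, else the canonical alias. */
  String resolveWriteIndex(String entityType, String canonicalAlias) {
    return activeStagedIndices.getOrDefault(entityType, canonicalAlias);
  }
}
```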
Note: searches through the alias during the brief reindex window
(seconds in the CI log) can miss a write until the swap lands. That is
an acceptable trade compared to silently dropping the write or losing
it on deletion. The #27649 test tolerates this because its 120s poll
spans many swap cycles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): re-register SearchIndexHandler on every SearchRepository init (#27649)
The previous commit routed live writes through resolveWriteIndex so they
land in the staged index during reindex. The CI log for the next run
showed the register/unregister fire correctly, but the live writes to
tables still went to the canonical alias — as if activeStagedIndices
was empty for the entity type.
Root cause: stale handler pointing at a stale SearchRepository.
TestSuiteBootstrap creates SearchRepository three times (migration,
createIndices, and finally the embedded OpenMetadataApplication). Each
constructor calls registerSearchIndexHandler → new SearchIndexHandler(this)
→ dispatcher.registerHandler(…). EntityLifecycleEventDispatcher.
registerHandler silently SKIPS if a handler with the same name already
exists (see EntityLifecycleEventDispatcher.java:80-86), so the dispatcher
keeps the FIRST SearchIndexHandler forever — bound to the migration-time
SearchRepository.
Meanwhile DefaultRecreateHandler.registerStagedIndex writes into
Entity.getSearchRepository(), which by then is the third (current)
instance. Live writes flowing through the stale handler never see that
entry; resolveWriteIndex falls through to the canonical alias; the alias
swap at the end of the reindex drops the writes, same as before.
Fix: unregister any existing SearchIndexHandler by name before
registering the new one. The latest-constructed SearchRepository always
owns the handler delivered through the dispatcher, so its
activeStagedIndices is the one consulted on every live write.
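
The skip-if-present behaviour and the fix can be sketched with a small registry. HandlerRegistrySketch is hypothetical; the method names mirror the description above but the real EntityLifecycleEventDispatcher signatures are not shown in this commit message, and plain objects stand in for handlers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical registry that mimics "register is a no-op if the name exists". */
final class HandlerRegistrySketch {
  private final Map<String, Object> handlers = new LinkedHashMap<>();

  /** Returns false and keeps the existing handler when the name is already taken. */
  boolean registerHandler(String name, Object handler) {
    if (handlers.containsKey(name)) {
      return false; // the first registration wins, which is the stale-handler bug
    }
    handlers.put(name, handler);
    return true;
  }

  boolean unregisterHandler(String name) {
    return handlers.remove(name) != null;
  }

  /** The fix: drop any handler with the same name, then register the new one. */
  void replaceHandler(String name, Object handler) {
    unregisterHandler(name);
    registerHandler(name, handler);
  }

  Object get(String name) {
    return handlers.get(name);
  }
}
```

With three successive registrations under one name, registerHandler alone keeps the first object; replaceHandler leaves the last one in place.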
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): centralize staged-index routing on canonical name (#27649)
Re-key activeStagedIndices by canonical index name (e.g.
openmetadata_table_search_index) instead of entity type, and route
every live-write site through a single getWriteIndexName(IndexMapping)
helper.
Why
- Previous routing went through resolveWriteIndex(entityType, mapping)
but only at hand-picked call sites. Several write paths still
resolved indexMapping.getIndexName(clusterAlias) directly and
bypassed routing — bulkIndexPipelineExecutions, deleteByScript,
softDeleteOrRestoreEntity, propagateToDomainChildren,
updateEntityCertificationInSearch, propagateToRelatedEntities (PAGE),
deleteTableColumns, updateTableColumnsInheritedFields. Any reindex
in flight could lose those writes on the alias swap.
- Keying by canonical index name lets any write site resolve correctly
even without entity type in scope (FQN-prefix deletes, child
propagation, script updates).
What
- activeStagedIndices: Map<canonicalIndexName, stagedIndexName>.
- registerStagedIndex(entityType, stagedIndex) now resolves the
canonical name from the IndexMapping before storing.
- New getWriteIndexName(IndexMapping) is the single point of
resolution; routeToStagedIfActive(String) handles raw alias names
(e.g. pipeline_status_search_index resolved via
getIndexOrAliasName).
- Replaced every direct indexMapping.getIndexName(clusterAlias) for
writes with getWriteIndexName(indexMapping). Admin/setup paths
(createIndex/updateIndex/deleteIndex/createOrUpdateIndexTemplate)
intentionally keep canonical names — they manage the alias itself.
- Cascade ops on shared aliases (GLOBAL_SEARCH_ALIAS,
DATA_ASSET_SEARCH_ALIAS, child aliases) are not entity-scoped and
cannot route to a single staged index; left untouched.
- resolveWriteIndex(entityType, mapping) preserved as a thin wrapper
for binary compatibility.
Also runs spotless:apply on the file.
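
A sketch of the re-keyed map, under the assumption that canonical names can be modelled as plain strings. CanonicalRouter is hypothetical, the entityType lookup map stands in for the IndexMapping resolution, and getWriteIndexName here takes a string where the real helper takes an IndexMapping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical router keyed by canonical index name instead of entity type. */
final class CanonicalRouter {
  private final Map<String, String> activeStagedIndices = new ConcurrentHashMap<>();
  private final Map<String, String> canonicalByEntityType; // stand-in for IndexMapping lookup

  CanonicalRouter(Map<String, String> canonicalByEntityType) {
    this.canonicalByEntityType = Map.copyOf(canonicalByEntityType);
  }

  /** Resolves the canonical name first, then stores canonical -> staged. */
  void registerStagedIndex(String entityType, String stagedIndex) {
    String canonical = canonicalByEntityType.get(entityType);
    if (canonical == null) {
      throw new IllegalArgumentException("No index mapping for " + entityType);
    }
    activeStagedIndices.put(canonical, stagedIndex);
  }

  /** Names that are not registered canonical names pass through unchanged. */
  String routeToStagedIfActive(String indexName) {
    return activeStagedIndices.getOrDefault(indexName, indexName);
  }

  /** Single resolution point for writes; the canonical name is passed in directly here. */
  String getWriteIndexName(String canonicalIndexName) {
    return routeToStagedIfActive(canonicalIndexName);
  }
}
```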
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(it): bump DB sort memory + search-wait ceilings (#27649)
Two CI failures observed under the parallel-tests fork load on the
post-centralization run:
1. TagResourceIT line 161 (listEntities) — the server returned 500
wrapping "java.sql.SQLException: Out of sort memory, consider
increasing server sort buffer size" from TagDAO.listAfter. The
query joins tag → entity_relationship → classification and orders
by tag.name,tag.id; with the tag table accumulating across many
parallel test classes (reuseForks=true), MySQL's default 256KB
sort_buffer_size overflows. Bump it to 8MB. Add a parallel
work_mem=32MB bump to the postgres command for the same query.
2. TagResourceIT line 1 — Awaitility timeout at 1m30s waiting for a
freshly created tag to appear in search index. Five inherited
waits in BaseEntityIT had a 90s ceiling while the sibling
checkCreatedEntity already used 180s. Standardise on 180s — under
tag-scale data the alias swap that the staged-index routing
depends on can take longer than 90s in slow CI workers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): address Copilot/Gitar review on staged-index routing (#27649)
* DefaultRecreateHandler.finalizeReindex / promoteEntityIndex — wrap
the entire promote block in try/finally so unregisterStagedIndex
always runs, including on swap failure, empty aliasesToAttach, and
exceptions. Without this the routing map could be left pointing at
a staged index nobody reads from, silently diverging live writes
from search results until the next reindex (Copilot, multiple
comments).
* SearchRepository.resolveWriteIndex — deprecate. The entityType
argument is unused; getWriteIndexName(IndexMapping) is the single
resolution point now (Copilot + Gitar).
* SearchRepository.routeToStagedIfActive — tighten the Javadoc to
state explicitly that it expects a canonical index name and that
short/parent aliases are passed through unchanged (Copilot).
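
The try/finally shape from the first bullet can be sketched like this. PromoteSketch and its promote signature are hypothetical; the BooleanSupplier stands in for the alias swap, which may succeed, fail, or throw.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

/** Hypothetical promote step that always clears its routing entry. */
final class PromoteSketch {
  final Map<String, String> activeStagedIndices = new ConcurrentHashMap<>();

  boolean promote(String canonicalIndex, String stagedIndex, BooleanSupplier swapAliases) {
    try {
      return swapAliases.getAsBoolean();
    } finally {
      // Runs on success, on a failed swap, and on exceptions alike, so live
      // writes never keep targeting a staged index that nobody reads from.
      activeStagedIndices.remove(canonicalIndex, stagedIndex);
    }
  }
}
```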
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): fan out cross-alias update-by-query to staged indices (#27649)
The four bulk update-by-query operations rooted on shared aliases —
updateAssetDomainsForDataProduct, updateAssetDomainsByIds,
updateDomainFqnByPrefix, updateAssetDomainFqnByPrefix — hardcoded their
target to GLOBAL_SEARCH_ALIAS / Entity.DOMAIN. During an in-flight
reindex those updates landed on the about-to-be-discarded active
index only; on alias swap, the new staged index (built from a DB
snapshot taken before the script ran) replaced it and the script's
effect was lost. Copilot called this out four times.
Add SearchRepository.getWriteFanoutTargets(aliasOrIndex) — returns the
caller's alias plus every currently-staged index. Pass that list to
req.index(...) on all four methods in both OpenSearchEntityManager and
ElasticSearchEntityManager. The OS/ES update-by-query API natively
takes a list, so the fan-out is one request per call.
The scripts these methods run are idempotent (UPDATE_ASSET_DOMAIN_SCRIPT
checks `exists` before adding a domain; UPDATE_DOMAIN_FQN_BY_PREFIX_SCRIPT
walks the array and rewrites in place), so applying them again to the
staged index — even if the staged copy of the document already reflects
the latest DB state — converges to the same result.
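
Why re-applying the add-domain update is safe can be shown with a plain-Java analogue of the exists check. IdempotentDomainUpdate is hypothetical and is not the script itself; it only models the check-before-add step on a list of domain FQNs.

```java
import java.util.List;

/** Hypothetical analogue of an "add only if absent" script step. */
final class IdempotentDomainUpdate {
  private IdempotentDomainUpdate() {}

  /** Adds the domain only when it is not already present, so repeats are no-ops. */
  static void addDomainIfAbsent(List<String> domains, String domainFqn) {
    if (!domains.contains(domainFqn)) {
      domains.add(domainFqn);
    }
  }
}
```

Applying the update once to the active index and once to the staged copy therefore leaves both with a single entry for the domain, whether or not the staged copy already had it.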
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): scope fan-out to canonical input vs multi-entity alias (#27649)
The previous getWriteFanoutTargets always appended every staged index,
which made entity-scoped update-by-query calls (e.g.
updateDomainFqnByPrefix targeting only the domain canonical index)
fan out onto unrelated staged indices. That added avoidable load on
every currently-reindexing entity type for an update that should touch
one index.
Branch the implementation on whether the input is a known canonical
entity index name. If yes, only the matching staged index is added.
If no — i.e. the caller is hitting a multi-entity alias such as
GLOBAL_SEARCH_ALIAS — every staged index is added because the
update's match query can hit documents from any reindexing type.
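
The branching can be sketched as below. FanoutTargets is hypothetical: the set of known canonical names stands in for however the real code recognises a canonical entity index, and all index names in the usage are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical scoped fan-out: one staged index for canonical input, all for aliases. */
final class FanoutTargets {
  private final Set<String> canonicalIndexNames;
  private final Map<String, String> activeStagedIndices = new LinkedHashMap<>();

  FanoutTargets(Set<String> canonicalIndexNames) {
    this.canonicalIndexNames = Set.copyOf(canonicalIndexNames);
  }

  void registerStagedIndex(String canonicalIndex, String stagedIndex) {
    activeStagedIndices.put(canonicalIndex, stagedIndex);
  }

  List<String> getWriteFanoutTargets(String aliasOrIndex) {
    List<String> targets = new ArrayList<>();
    targets.add(aliasOrIndex); // the caller's own target always comes first
    if (canonicalIndexNames.contains(aliasOrIndex)) {
      // Entity-scoped update: add only the matching staged index, if any.
      String staged = activeStagedIndices.get(aliasOrIndex);
      if (staged != null) {
        targets.add(staged);
      }
    } else {
      // Multi-entity alias: the query may match documents of any reindexing type.
      targets.addAll(activeStagedIndices.values());
    }
    return targets;
  }
}
```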
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 84ed278 commit e23dff8
11 files changed
Lines changed: 244 additions & 322 deletions
File tree
- openmetadata-integration-tests/src/test/java/org/openmetadata/it
- bootstrap
- tests
- openmetadata-service/src/main/java/org/openmetadata/service/search
- elasticsearch
- opensearch
openmetadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java
Lines changed: 14 additions & 2 deletions
Lines changed: 6 additions & 6 deletions
Lines changed: 7 additions & 3 deletions
Lines changed: 0 additions & 58 deletions
Lines changed: 2 additions & 3 deletions
Lines changed: 19 additions & 0 deletions
Lines changed: 0 additions & 49 deletions