Skip to content

fix(it): Stabilize Flaky integration tests#27546

Merged
yan-3005 merged 27 commits intomainfrom
ram/fix-flaky-it-tests
May 6, 2026
Merged

fix(it): Stabilize Flaky integration tests#27546
yan-3005 merged 27 commits intomainfrom
ram/fix-flaky-it-tests

Conversation

@yan-3005
Copy link
Copy Markdown
Contributor

@yan-3005 yan-3005 commented Apr 20, 2026

Summary

Fixes seven flaky integration tests:

  1. TagResourceIT (`test_searchTagByClassificationDisplayName`) — Awaitility timeout extended 30s → 90s to tolerate slow search index propagation in CI
  2. GlossaryOntologyExportIT (`testExportGlossaryAsRdfXml`) — HTTP client timeout increased 60s → 150s. Jena's legacy `model.write()` for RDF/XML fetches the w3.org DTD on first use; in network-isolated CI this stalls ~100s before falling back.
  3. SearchResourceIT (`testExportWithFromAndSizeForPagination`) — Add stable `id.keyword` tiebreaker sort in both ES/OS managers. Strengthen pagination assertions to require exactly 2 lines (header + 1 data row) per page before comparing.
  4. TagResourceIT (`checkCreatedEntity`) — `IndexTemplateIT.testDocUpdateOnDeletedIndexUsesTemplateNotAutoInference` was wiping the production `openmetadata_tag_search_index`. Rewritten to use a test-scoped index name so the production index is never touched.
  5. SearchIndexRetryQueueIT (`testClaimPendingIncludesRetryStatuses`) — The production `SearchIndexRetryWorker` (4 daemon threads, 5s poll) races the test via the same global `claimPending` SQL. Replaced size assertion with Awaitility loop checking `claimedAt != null || retryCount > 0` per record — proves `claimPending`'s SQL filter accepted all three statuses regardless of which thread won the race.
  6. GlossaryTermResourceIT (`patch_addDeleteReviewers`) — `GlossaryTermApprovalWorkflow` fires asynchronously when reviewers are added, setting `entityStatus=IN_REVIEW`. Stale `entityStatus=APPROVED` from the previous response caused a spurious `IN_REVIEW→APPROVED` status transition in the patch diff, rejected because admin is not a reviewer. Fixed by re-fetching the entity before the final patch.
  7. RecognizerFeedbackRepositoryTest (`testApplyFeedback_withRecognizerMetadata_shouldTargetSpecificRecognizer`) — `repository.create()` publishes a ChangeEvent that triggers `ApplyRecognizerFeedbackImpl` asynchronously. That workflow call races with the direct `applyFeedback`: by the time the workflow runs, the GENERATED tag is already removed from the entity, so `getRecognizerIdFromTagLabel` returns null and the workflow falls back to ALL recognizers, contaminating `recognizer2`. Fixed by inserting directly to DAO (bypassing `publishChangeEvent`) so the governance workflow is never triggered for this unit-level test.

Changes

File Change
`TagResourceIT.java` Awaitility timeout 30s → 90s
`GlossaryOntologyExportIT.java` HTTP request timeout 60s → 150s (Jena DTD stall)
`ElasticSearchSearchManager.java` `id.keyword` tiebreaker sort for export/score queries
`OpenSearchSearchManager.java` `id.keyword` tiebreaker sort for export/score queries
`SearchResourceIT.java` Hard assertions on page line count + Awaitility retry
`IndexTemplateIT.java` Rewrite tag template test to use test-scoped index, not production index
`SearchIndexRetryQueueIT.java` Awaitility + `claimedAt || retryCount > 0` invariant instead of race-prone size assertion
`GlossaryTermResourceIT.java` Re-fetch entity before reviewer removal to avoid stale entityStatus in diff
`RecognizerFeedbackRepositoryTest.java` Direct DAO insert to bypass ChangeEvent publication and avoid governance workflow race

Test plan

  • `integration-tests-mysql-elasticsearch` — `TagResourceIT`, `GlossaryOntologyExportIT`, `SearchIndexRetryQueueIT`
  • `integration-tests-postgres-opensearch` — `SearchResourceIT`, `GlossaryTermResourceIT`, `RecognizerFeedbackRepositoryTest`

- TagResourceIT.test_searchTagByClassificationDisplayName: raise Awaitility
  timeout from 30s to 90s — under full-suite concurrent load the tag search
  index can lag well past 30s before the tag is discoverable by classification
  display name
- GlossaryOntologyExportIT.testExportGlossaryAsRdfXml: replace legacy
  model.write("RDF/XML") with RDFDataMgr.write(RDFXML_PLAIN) — the legacy
  Jena API attempts external DTD/entity resolution from w3.org, hanging ~104s
  in network-isolated CI before the client times out at 60s; RIOT writes
  purely in-memory with no network I/O
- SearchResourceIT.testExportWithFromAndSizeForPagination: add _id as a
  final tiebreaker sort on export requests in both ElasticSearch and
  OpenSearch managers; from/size pagination without a unique tiebreaker
  produces duplicate rows across pages when concurrent CI tests mutate the
  same index between requests; also deduplicate the redundant name.keyword
  secondary sort when the caller already sorts by name.keyword
Copilot AI review requested due to automatic review settings April 20, 2026 11:43
@yan-3005 yan-3005 added safe to test Add this label to run secure Github workflows on PRs backend labels Apr 20, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets flaky integration tests by hardening two eventual-consistency/pagination behaviors in search and by switching RDF/XML serialization to a non-networking Jena writer.

Changes:

  • Add a deterministic tiebreaker sort for export searches and avoid redundant name.keyword secondary sort when already sorting by name.keyword.
  • Switch glossary ontology export serialization to Jena RIOT (RDFDataMgr) with explicit RDFFormat selection.
  • Increase Awaitility timeout for tag search-by-classification-display-name test to tolerate slower indexing in CI.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java Adds export sort stabilization (secondary sort + tiebreaker).
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java Mirrors export sort stabilization for Elasticsearch.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfRepository.java Uses RIOT writer + RDFFormat to avoid RDF/XML hangs from legacy writer behavior.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TagResourceIT.java Extends Awaitility window for search indexing lag.

if (!sortField.equalsIgnoreCase("name.keyword")) {
requestBuilder.sort("name.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
}
requestBuilder.sort("_id", SortOrder.Asc, null);
if (!sortField.equalsIgnoreCase("name.keyword")) {
requestBuilder.sort("name.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
}
requestBuilder.sort("_id", SortOrder.Asc, null);
_id is an Elasticsearch meta-field that requires fielddata to sort on,
disabled by default. Use the indexed id.keyword sub-field instead, which
is a proper keyword field with doc values and is sortable without any
cluster setting changes.
Copilot AI review requested due to automatic review settings April 20, 2026 14:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Stabilizes several flaky integration tests by addressing eventual-consistency timing, RDF/XML export serialization hangs in network-isolated CI, and non-deterministic export pagination caused by non-unique sorting.

Changes:

  • Add a deterministic id.keyword tiebreaker sort (and avoid duplicating name.keyword) for export/score-based searches in both Elasticsearch and OpenSearch managers.
  • Switch glossary ontology export serialization to Jena RIOT (RDFDataMgr.write) and map requested formats to RDFFormat constants.
  • Increase Awaitility timeout in TagResourceIT to accommodate slower indexing under parallel CI load.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java Stabilizes export pagination by adding deterministic secondary sorts and avoiding redundant name.keyword sort.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java Mirrors the same deterministic export sorting behavior for Elasticsearch.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfRepository.java Uses RIOT serialization to prevent RDF/XML export hangs caused by external resolution.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TagResourceIT.java Raises Awaitility timeout to reduce flakiness from search indexing lag.

if (!sortField.equalsIgnoreCase("name.keyword")) {
requestBuilder.sort("name.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
}
requestBuilder.sort("id.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
Comment on lines 1206 to 1211
if (sortField.equalsIgnoreCase(SORT_FIELD_SCORE) || isExport) {
requestBuilder.sort("name.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
if (!sortField.equalsIgnoreCase("name.keyword")) {
requestBuilder.sort("name.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
}
requestBuilder.sort("id.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
}
if (!sortField.equalsIgnoreCase("name.keyword")) {
requestBuilder.sort("name.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
}
requestBuilder.sort("id.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
Comment on lines 2592 to 2601
RDFFormat rdfFormat =
switch (format.toLowerCase()) {
case "rdfxml", "xml" -> "RDF/XML";
case "ntriples", "nt" -> "N-TRIPLES";
case "jsonld", "json-ld" -> "JSON-LD";
default -> "TURTLE";
case "rdfxml", "xml" -> RDFFormat.RDFXML_PLAIN;
case "ntriples", "nt" -> RDFFormat.NTRIPLES;
case "jsonld", "json-ld" -> RDFFormat.JSONLD;
default -> RDFFormat.TURTLE_FLAT;
};

model.write(writer, rdfFormat);
RDFDataMgr.write(writer, model, rdfFormat);
return writer.toString();
Copilot AI review requested due to automatic review settings May 5, 2026 03:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment on lines 257 to 261
} finally {
deleteIndexIfExists(searchClient, canonicalIndex);
Request recreateRequest = new Request("PUT", "/" + canonicalIndex);
searchClient.performRequest(recreateRequest);
Entity.getSearchRepository()
.createIndex(Entity.getSearchRepository().getEntityIndexMap().get("tag"));
}
Request recreateRequest = new Request("PUT", "/" + canonicalIndex);
searchClient.performRequest(recreateRequest);
Entity.getSearchRepository()
.createIndex(Entity.getSearchRepository().getEntityIndexMap().get("tag"));
Comment on lines +1898 to +1902
Awaitility.await("Page 1 and page 2 must return different rows")
.atMost(15, TimeUnit.SECONDS)
.pollInterval(500, TimeUnit.MILLISECONDS)
.untilAsserted(
() -> {
yan-3005 and others added 2 commits May 5, 2026 15:00
…search index

testDocUpdateOnDeletedIndexUsesTemplateNotAutoInference was deleting the
production openmetadata_tag_search_index backing index, racing with
TagResourceIT.test_searchTagByClassificationDisplayName which polls that
index for 90s. Use a test-scoped index name matching the template pattern
instead, consistent with the other tests in this class.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment on lines 182 to +186
void testDocUpdateOnDeletedIndexUsesTemplateNotAutoInference(TestNamespace ns) throws Exception {
Rest5Client searchClient = TestSuiteBootstrap.createSearchClient();
String canonicalIndex = CLUSTER_ALIAS + "_tag_search_index";
String testIndexName = CLUSTER_ALIAS + "_tag_search_index_rebuild_it_doc_update";

assertNotNull(
getMappingsForIndex(searchClient, canonicalIndex), "Original index should have mappings");

String realIndexName = resolveActualIndexName(searchClient, canonicalIndex);
deleteIndexIfExists(searchClient, realIndexName);

try {
Request existsRequest = new Request("HEAD", "/" + canonicalIndex);
Response existsResponse = searchClient.performRequest(existsRequest);
assertEquals(404, existsResponse.getStatusCode(), "Index should not exist after deletion");
} catch (Exception e) {
assertTrue(e.getMessage().contains("404"), "Index/alias should not exist after deletion");
}
deleteIndexIfExists(searchClient, testIndexName);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — the PR description now reflects the actual approach: the test was rewritten to use a test-scoped index name (openmetadata_tag_search_index_rebuild_it_doc_update) so the production tag index is never touched. No SearchRepository.createIndex() call is involved.

yan-3005 and others added 3 commits May 5, 2026 17:05
The production SearchIndexRetryWorker (4 daemon threads, 5s poll) races
the test by calling the same global claimPending SQL. Replace the brittle
size-based assertion with an Awaitility loop that checks claimedAt != null
for each inserted record — proving claimPending's SQL filter accepted the
record's status regardless of which thread won the race.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment on lines 182 to +186
void testDocUpdateOnDeletedIndexUsesTemplateNotAutoInference(TestNamespace ns) throws Exception {
Rest5Client searchClient = TestSuiteBootstrap.createSearchClient();
String canonicalIndex = CLUSTER_ALIAS + "_tag_search_index";
String testIndexName = CLUSTER_ALIAS + "_tag_search_index_rebuild_it_doc_update";

assertNotNull(
getMappingsForIndex(searchClient, canonicalIndex), "Original index should have mappings");

String realIndexName = resolveActualIndexName(searchClient, canonicalIndex);
deleteIndexIfExists(searchClient, realIndexName);

try {
Request existsRequest = new Request("HEAD", "/" + canonicalIndex);
Response existsResponse = searchClient.performRequest(existsRequest);
assertEquals(404, existsResponse.getStatusCode(), "Index should not exist after deletion");
} catch (Exception e) {
assertTrue(e.getMessage().contains("404"), "Index/alias should not exist after deletion");
}
deleteIndexIfExists(searchClient, testIndexName);
Comment on lines 268 to +304
void testClaimPendingIncludesRetryStatuses(TestNamespace ns) {
String id1 = UUID.randomUUID().toString();
String id2 = UUID.randomUUID().toString();
String id3 = UUID.randomUUID().toString();
String fqn1 = ns.prefix("rq") + ".a";
String fqn2 = ns.prefix("rq") + ".b";
String fqn3 = ns.prefix("rq") + ".c";

retryQueueDAO.upsert(id1, fqn1, "f", SearchIndexRetryQueue.STATUS_PENDING, "");
retryQueueDAO.upsert(id2, fqn2, "f", SearchIndexRetryQueue.STATUS_PENDING_RETRY_1, "");
retryQueueDAO.upsert(id3, fqn3, "f", SearchIndexRetryQueue.STATUS_PENDING_RETRY_2, "");

List<SearchIndexRetryRecord> claimed = retryQueueDAO.claimPending(10);
assertTrue(claimed.size() >= 3);
assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id1)));
assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id2)));
assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id3)));
Set<String> ourIds = Set.of(id1, id2, id3);

Awaitility.await()
.atMost(Duration.ofSeconds(15))
.pollInterval(Duration.ofMillis(200))
.untilAsserted(
() -> {
retryQueueDAO.claimPending(50);
Set<String> claimed = new HashSet<>();
for (String status :
List.of(
SearchIndexRetryQueue.STATUS_PENDING,
SearchIndexRetryQueue.STATUS_PENDING_RETRY_1,
SearchIndexRetryQueue.STATUS_PENDING_RETRY_2,
SearchIndexRetryQueue.STATUS_IN_PROGRESS,
SearchIndexRetryQueue.STATUS_FAILED)) {
retryQueueDAO.findByStatus(status, 5000).stream()
.filter(r -> ourIds.contains(r.getEntityId()) && r.getClaimedAt() != null)
.map(SearchIndexRetryRecord::getEntityId)
.forEach(claimed::add);
}
assertTrue(claimed.contains(id1), "id1 (PENDING) was never claimed");
assertTrue(claimed.contains(id2), "id2 (PENDING_RETRY_1) was never claimed");
assertTrue(claimed.contains(id3), "id3 (PENDING_RETRY_2) was never claimed");
});
yan-3005 added 2 commits May 5, 2026 19:46
The GlossaryTermApprovalWorkflow fires asynchronously when reviewers are
added, setting entityStatus=IN_REVIEW. The final patch sent the stale
entityStatus=APPROVED from the previous response, causing a spurious
IN_REVIEW→APPROVED transition in the diff which requires the caller to
be a reviewer — admin is not. Re-fetch the entity before the reviewer
removal so the diff contains only the reviewer change.
updateFailureAndRetryCount sets claimedAt=NULL after the worker processes
a record. Add retryCount > 0 as a secondary proof-of-claim signal so
records that were claimed, processed, and had claimedAt reset are still
counted — covers the FAILED exhaustion path and intermediate PENDING_RETRY_*
states where claimedAt is temporarily null.
@yan-3005
Copy link
Copy Markdown
Contributor Author

yan-3005 commented May 5, 2026

Fixed in 929f2b2 — added || r.getRetryCount() > 0 as a secondary proof-of-claim signal. retryCount is only incremented by updateFailureAndRetryCount, which is only called after a successful claim, so retryCount > 0 proves the record was claimed regardless of whether claimedAt was subsequently reset.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment on lines 268 to +307
void testClaimPendingIncludesRetryStatuses(TestNamespace ns) {
String id1 = UUID.randomUUID().toString();
String id2 = UUID.randomUUID().toString();
String id3 = UUID.randomUUID().toString();
String fqn1 = ns.prefix("rq") + ".a";
String fqn2 = ns.prefix("rq") + ".b";
String fqn3 = ns.prefix("rq") + ".c";

retryQueueDAO.upsert(id1, fqn1, "f", SearchIndexRetryQueue.STATUS_PENDING, "");
retryQueueDAO.upsert(id2, fqn2, "f", SearchIndexRetryQueue.STATUS_PENDING_RETRY_1, "");
retryQueueDAO.upsert(id3, fqn3, "f", SearchIndexRetryQueue.STATUS_PENDING_RETRY_2, "");

List<SearchIndexRetryRecord> claimed = retryQueueDAO.claimPending(10);
assertTrue(claimed.size() >= 3);
assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id1)));
assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id2)));
assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id3)));
Set<String> ourIds = Set.of(id1, id2, id3);

Awaitility.await()
.atMost(Duration.ofSeconds(15))
.pollInterval(Duration.ofMillis(200))
.untilAsserted(
() -> {
retryQueueDAO.claimPending(50);
Set<String> claimed = new HashSet<>();
for (String status :
List.of(
SearchIndexRetryQueue.STATUS_PENDING,
SearchIndexRetryQueue.STATUS_PENDING_RETRY_1,
SearchIndexRetryQueue.STATUS_PENDING_RETRY_2,
SearchIndexRetryQueue.STATUS_IN_PROGRESS,
SearchIndexRetryQueue.STATUS_FAILED)) {
retryQueueDAO.findByStatus(status, 5000).stream()
.filter(
r ->
ourIds.contains(r.getEntityId())
&& (r.getClaimedAt() != null || r.getRetryCount() > 0))
.map(SearchIndexRetryRecord::getEntityId)
.forEach(claimed::add);
}
assertTrue(claimed.contains(id1), "id1 (PENDING) was never claimed");
assertTrue(claimed.contains(id2), "id2 (PENDING_RETRY_1) was never claimed");
assertTrue(claimed.contains(id3), "id3 (PENDING_RETRY_2) was never claimed");
});
yan-3005 and others added 2 commits May 5, 2026 20:52
…gnizerMetadata

repository.create() publishes a ChangeEvent that triggers
ApplyRecognizerFeedbackImpl asynchronously. That workflow call races
with the direct applyFeedback below: by the time the workflow runs, the
GENERATED tag is already removed by the direct call, so
getRecognizerIdFromTagLabel returns null and the workflow falls back to
ALL recognizers, contaminating recognizer2.

Fix: insert directly to DAO (bypassing publishChangeEvent) so the
governance workflow is never triggered for this unit-level test.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment on lines +282 to +306
Awaitility.await()
.atMost(Duration.ofSeconds(15))
.pollInterval(Duration.ofMillis(200))
.untilAsserted(
() -> {
retryQueueDAO.claimPending(50);
Set<String> claimed = new HashSet<>();
for (String status :
List.of(
SearchIndexRetryQueue.STATUS_PENDING,
SearchIndexRetryQueue.STATUS_PENDING_RETRY_1,
SearchIndexRetryQueue.STATUS_PENDING_RETRY_2,
SearchIndexRetryQueue.STATUS_IN_PROGRESS,
SearchIndexRetryQueue.STATUS_FAILED)) {
retryQueueDAO.findByStatus(status, 5000).stream()
.filter(
r ->
ourIds.contains(r.getEntityId())
&& (r.getClaimedAt() != null || r.getRetryCount() > 0))
.map(SearchIndexRetryRecord::getEntityId)
.forEach(claimed::add);
}
assertTrue(claimed.contains(id1), "id1 (PENDING) was never claimed");
assertTrue(claimed.contains(id2), "id2 (PENDING_RETRY_1) was never claimed");
assertTrue(claimed.contains(id3), "id3 (PENDING_RETRY_2) was never claimed");
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7a4ec4c. Now tracking which IDs are still visible in any status after each poll. An ID absent from all statuses was deleted by the worker via deleteByEntity — that path is only reached after claimPending accepted the record, so absence is equally valid proof that the SQL filter worked. The absent IDs are added to claimed before the final assertions.

yan-3005 added 3 commits May 5, 2026 21:44
…sRetryStatuses

The worker's processRecord takes the delete path (removeStaleEntityById +
deleteByEntity) when resolveEntityReference returns null but entityId is
non-empty — which applies to our fake UUID test records. If the worker
wins and deletes a record, findByStatus finds nothing and the assertion
fails.

Fix: track which IDs are still visible in any status. An ID absent from
all statuses was deleted by the worker after a successful claim —
deleteByEntity is only reached after claimPending accepted the record,
so absence is equally valid proof that claimPending's SQL filter worked.
…rsMultipleUpdates

Same root cause as patch_addDeleteReviewers: GlossaryTermApprovalWorkflow
fires asynchronously after reviewers are added, setting entityStatus=IN_REVIEW.
Subsequent patches using the stale APPROVED status from the previous response
trigger a spurious IN_REVIEW→APPROVED transition, rejected because admin is
not a reviewer. Re-fetch before each subsequent patch to avoid the stale status.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment on lines +1898 to +1917
Awaitility.await("Page 1 and page 2 must return different rows")
.atMost(15, TimeUnit.SECONDS)
.pollInterval(500, TimeUnit.MILLISECONDS)
.untilAsserted(
() -> {
HttpResponse<String> page1 =
httpGetExport(
"/v1/search/export?q=export_page_test&index=table_search_index"
+ "&sort_field=name.keyword&sort_order=asc&from=0&size=1");
assertEquals(200, page1.statusCode());
String[] page1Lines = page1.body().split("\n");
assertEquals(2, page1Lines.length, "from=0&size=1 should return header + 1 data row");

HttpResponse<String> page2 =
httpGetExport(
"/v1/search/export?q=export_page_test&index=table_search_index"
+ "&sort_field=name.keyword&sort_order=asc&from=1&size=1");
assertEquals(200, page2.statusCode());
String[] page2Lines = page2.body().split("\n");
assertEquals(2, page2Lines.length, "from=1&size=1 should return header + 1 data row");
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 6, 2026

Code Review ✅ Approved 1 resolved / 1 findings

Stabilizes seven flaky integration tests by addressing race conditions, increasing timeouts for slow CI operations, and ensuring proper test isolation. No issues found.

✅ 1 resolved
Edge Case: claimedAt check may miss records already processed by worker

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/SearchIndexRetryQueueIT.java:288-297
The updateFailureAndRetryCount SQL sets claimedAt = NULL after the worker processes a record. If the background SearchIndexRetryWorker claims a record, fails to resolve the fake entity, and calls updateFailureAndRetryCount (resetting claimedAt to NULL and bumping the status to e.g. PENDING_RETRY_1), the test's filter r.getClaimedAt() != null won't match it on that iteration.

The test partially mitigates this because it calls claimPending(50) each poll, which would re-claim a record that cycled back to a PENDING_RETRY_* status. However, if the worker exhausts retries and marks a record FAILED (with claimedAt = NULL) before the test catches it in a claimed state, the assertion would fail.

In practice this is unlikely within 15s (retry exhaustion requires multiple cycles), but it leaves a narrow theoretical race window. A more robust assertion would check that each record either has claimedAt != null OR has been moved to a post-claim status (IN_PROGRESS or beyond), proving the claim SQL accepted it.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 6, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants