fix(it): Stabilize flaky integration tests #27546
Conversation
- TagResourceIT.test_searchTagByClassificationDisplayName: raise Awaitility
timeout from 30s to 90s — under full-suite concurrent load the tag search
index can lag well past 30s before the tag is discoverable by classification
display name
- GlossaryOntologyExportIT.testExportGlossaryAsRdfXml: replace legacy
model.write("RDF/XML") with RDFDataMgr.write(RDFXML_PLAIN) — the legacy
Jena API attempts external DTD/entity resolution from w3.org, hanging ~104s
in network-isolated CI before the client times out at 60s; RIOT writes
purely in-memory with no network I/O
- SearchResourceIT.testExportWithFromAndSizeForPagination: add _id as a
final tiebreaker sort on export requests in both ElasticSearch and
OpenSearch managers; from/size pagination without a unique tiebreaker
produces duplicate rows across pages when concurrent CI tests mutate the
same index between requests; also deduplicate the redundant name.keyword
secondary sort when the caller already sorts by name.keyword
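The pagination failure mode in the last bullet can be reproduced without a search cluster. Below is a minimal pure-Java sketch — the Hit record, page helper, and the two "index states" are hypothetical stand-ins for search hits and a from/size request, not OpenMetadata code. A stable sort over a non-unique key leaves tied rows in arrival order, so two requests observing different index states can hand out the same row on different pages; a unique-id tiebreaker fixes the total order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class StablePagination {
  // Hypothetical stand-in for a search hit: name is the (non-unique) sort key,
  // id is unique per document.
  record Hit(String name, String id) {}

  // Models a from/size search request: sort the index contents, then slice
  // out the single row at offset `from`.
  static Hit page(List<Hit> indexState, Comparator<Hit> order, int from) {
    return indexState.stream().sorted(order).skip(from).findFirst().orElseThrow();
  }

  // Two requests observing the index with different internal document order,
  // as happens when concurrent CI tests mutate the index between requests.
  static final List<Hit> STATE_1 =
      List.of(new Hit("users", "a1"), new Hit("users", "b2"), new Hit("orders", "c3"));
  static final List<Hit> STATE_2 =
      List.of(new Hit("users", "b2"), new Hit("users", "a1"), new Hit("orders", "c3"));

  // Sorting by name alone: the two "users" rows tie, keep arrival order, and
  // page offset 1 of request 1 equals page offset 2 of request 2 -> duplicate.
  static String[] demoWithoutTiebreaker() {
    Comparator<Hit> byNameOnly = Comparator.comparing(Hit::name);
    return new String[] {page(STATE_1, byNameOnly, 1).id(), page(STATE_2, byNameOnly, 2).id()};
  }

  // Adding the unique id as a tiebreaker makes the total order identical for
  // both requests, so consecutive pages can never overlap.
  static String[] demoWithTiebreaker() {
    Comparator<Hit> stable = Comparator.comparing(Hit::name).thenComparing(Hit::id);
    return new String[] {page(STATE_1, stable, 1).id(), page(STATE_2, stable, 2).id()};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(demoWithoutTiebreaker())); // [a1, a1] — duplicate row
    System.out.println(Arrays.toString(demoWithTiebreaker()));    // [a1, b2] — disjoint pages
  }
}
```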
Pull request overview
This PR targets flaky integration tests by hardening two eventual-consistency/pagination behaviors in search and by switching RDF/XML serialization to a non-networking Jena writer.
Changes:
- Add a deterministic tiebreaker sort for export searches and avoid a redundant name.keyword secondary sort when already sorting by name.keyword.
- Switch glossary ontology export serialization to Jena RIOT (RDFDataMgr) with explicit RDFFormat selection.
- Increase the Awaitility timeout for the tag search-by-classification-display-name test to tolerate slower indexing in CI.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java | Adds export sort stabilization (secondary sort + tiebreaker). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java | Mirrors export sort stabilization for Elasticsearch. |
| openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfRepository.java | Uses RIOT writer + RDFFormat to avoid RDF/XML hangs from legacy writer behavior. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TagResourceIT.java | Extends Awaitility window for search indexing lag. |
```java
if (!sortField.equalsIgnoreCase("name.keyword")) {
  requestBuilder.sort("name.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
}
requestBuilder.sort("_id", SortOrder.Asc, null);
```
_id is an Elasticsearch meta-field that requires fielddata to sort on, disabled by default. Use the indexed id.keyword sub-field instead, which is a proper keyword field with doc values and is sortable without any cluster setting changes.
Pull request overview
Stabilizes several flaky integration tests by addressing eventual-consistency timing, RDF/XML export serialization hangs in network-isolated CI, and non-deterministic export pagination caused by non-unique sorting.
Changes:
- Add a deterministic id.keyword tiebreaker sort (and avoid duplicating name.keyword) for export/score-based searches in both Elasticsearch and OpenSearch managers.
- Switch glossary ontology export serialization to Jena RIOT (RDFDataMgr.write) and map requested formats to RDFFormat constants.
- Increase the Awaitility timeout in TagResourceIT to accommodate slower indexing under parallel CI load.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java | Stabilizes export pagination by adding deterministic secondary sorts and avoiding redundant name.keyword sort. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java | Mirrors the same deterministic export sorting behavior for Elasticsearch. |
| openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfRepository.java | Uses RIOT serialization to prevent RDF/XML export hangs caused by external resolution. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TagResourceIT.java | Raises Awaitility timeout to reduce flakiness from search indexing lag. |
```diff
 if (sortField.equalsIgnoreCase(SORT_FIELD_SCORE) || isExport) {
-  requestBuilder.sort("name.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
+  if (!sortField.equalsIgnoreCase("name.keyword")) {
+    requestBuilder.sort("name.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
+  }
+  requestBuilder.sort("id.keyword", SortOrder.Asc, SORT_TYPE_KEYWORD);
 }
```
```diff
 RDFFormat rdfFormat =
     switch (format.toLowerCase()) {
-      case "rdfxml", "xml" -> "RDF/XML";
-      case "ntriples", "nt" -> "N-TRIPLES";
-      case "jsonld", "json-ld" -> "JSON-LD";
-      default -> "TURTLE";
+      case "rdfxml", "xml" -> RDFFormat.RDFXML_PLAIN;
+      case "ntriples", "nt" -> RDFFormat.NTRIPLES;
+      case "jsonld", "json-ld" -> RDFFormat.JSONLD;
+      default -> RDFFormat.TURTLE_FLAT;
     };

-model.write(writer, rdfFormat);
+RDFDataMgr.write(writer, model, rdfFormat);
 return writer.toString();
```
```java
} finally {
  deleteIndexIfExists(searchClient, canonicalIndex);
  Request recreateRequest = new Request("PUT", "/" + canonicalIndex);
  searchClient.performRequest(recreateRequest);
  Entity.getSearchRepository()
      .createIndex(Entity.getSearchRepository().getEntityIndexMap().get("tag"));
}
```
```java
Awaitility.await("Page 1 and page 2 must return different rows")
    .atMost(15, TimeUnit.SECONDS)
    .pollInterval(500, TimeUnit.MILLISECONDS)
    .untilAsserted(
        () -> {
```
…search index testDocUpdateOnDeletedIndexUsesTemplateNotAutoInference was deleting the production openmetadata_tag_search_index backing index, racing with TagResourceIT.test_searchTagByClassificationDisplayName which polls that index for 90s. Use a test-scoped index name matching the template pattern instead, consistent with the other tests in this class.
```java
void testDocUpdateOnDeletedIndexUsesTemplateNotAutoInference(TestNamespace ns) throws Exception {
  Rest5Client searchClient = TestSuiteBootstrap.createSearchClient();
  String canonicalIndex = CLUSTER_ALIAS + "_tag_search_index";
  String testIndexName = CLUSTER_ALIAS + "_tag_search_index_rebuild_it_doc_update";

  assertNotNull(
      getMappingsForIndex(searchClient, canonicalIndex), "Original index should have mappings");

  String realIndexName = resolveActualIndexName(searchClient, canonicalIndex);
  deleteIndexIfExists(searchClient, realIndexName);

  try {
    Request existsRequest = new Request("HEAD", "/" + canonicalIndex);
    Response existsResponse = searchClient.performRequest(existsRequest);
    assertEquals(404, existsResponse.getStatusCode(), "Index should not exist after deletion");
  } catch (Exception e) {
    assertTrue(e.getMessage().contains("404"), "Index/alias should not exist after deletion");
  }
  deleteIndexIfExists(searchClient, testIndexName);
```
Fixed — the PR description now reflects the actual approach: the test was rewritten to use a test-scoped index name (openmetadata_tag_search_index_rebuild_it_doc_update) so the production tag index is never touched. No SearchRepository.createIndex() call is involved.
The production SearchIndexRetryWorker (4 daemon threads, 5s poll) races the test by calling the same global claimPending SQL. Replace the brittle size-based assertion with an Awaitility loop that checks claimedAt != null for each inserted record — proving claimPending's SQL filter accepted the record's status regardless of which thread won the race.
```diff
 void testClaimPendingIncludesRetryStatuses(TestNamespace ns) {
   String id1 = UUID.randomUUID().toString();
   String id2 = UUID.randomUUID().toString();
   String id3 = UUID.randomUUID().toString();
   String fqn1 = ns.prefix("rq") + ".a";
   String fqn2 = ns.prefix("rq") + ".b";
   String fqn3 = ns.prefix("rq") + ".c";

   retryQueueDAO.upsert(id1, fqn1, "f", SearchIndexRetryQueue.STATUS_PENDING, "");
   retryQueueDAO.upsert(id2, fqn2, "f", SearchIndexRetryQueue.STATUS_PENDING_RETRY_1, "");
   retryQueueDAO.upsert(id3, fqn3, "f", SearchIndexRetryQueue.STATUS_PENDING_RETRY_2, "");

-  List<SearchIndexRetryRecord> claimed = retryQueueDAO.claimPending(10);
-  assertTrue(claimed.size() >= 3);
-  assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id1)));
-  assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id2)));
-  assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id3)));
+  Set<String> ourIds = Set.of(id1, id2, id3);
+
+  Awaitility.await()
+      .atMost(Duration.ofSeconds(15))
+      .pollInterval(Duration.ofMillis(200))
+      .untilAsserted(
+          () -> {
+            retryQueueDAO.claimPending(50);
+            Set<String> claimed = new HashSet<>();
+            for (String status :
+                List.of(
+                    SearchIndexRetryQueue.STATUS_PENDING,
+                    SearchIndexRetryQueue.STATUS_PENDING_RETRY_1,
+                    SearchIndexRetryQueue.STATUS_PENDING_RETRY_2,
+                    SearchIndexRetryQueue.STATUS_IN_PROGRESS,
+                    SearchIndexRetryQueue.STATUS_FAILED)) {
+              retryQueueDAO.findByStatus(status, 5000).stream()
+                  .filter(r -> ourIds.contains(r.getEntityId()) && r.getClaimedAt() != null)
+                  .map(SearchIndexRetryRecord::getEntityId)
+                  .forEach(claimed::add);
+            }
+            assertTrue(claimed.contains(id1), "id1 (PENDING) was never claimed");
+            assertTrue(claimed.contains(id2), "id2 (PENDING_RETRY_1) was never claimed");
+            assertTrue(claimed.contains(id3), "id3 (PENDING_RETRY_2) was never claimed");
+          });
```
The GlossaryTermApprovalWorkflow fires asynchronously when reviewers are added, setting entityStatus=IN_REVIEW. The final patch sent the stale entityStatus=APPROVED from the previous response, causing a spurious IN_REVIEW→APPROVED transition in the diff which requires the caller to be a reviewer — admin is not. Re-fetch the entity before the reviewer removal so the diff contains only the reviewer change.
updateFailureAndRetryCount sets claimedAt=NULL after the worker processes a record. Add retryCount > 0 as a secondary proof-of-claim signal so records that were claimed, processed, and had claimedAt reset are still counted — covers the FAILED exhaustion path and intermediate PENDING_RETRY_* states where claimedAt is temporarily null.
Fixed in 929f2b2 — added retryCount > 0 as a secondary proof-of-claim signal.
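The two-signal check discussed above can be expressed as a single predicate. This is a hedged pure-Java sketch — the RetryRecord shape is hypothetical and mirrors only the fields the discussion names (claimedAt and retryCount), not the real DAO model:

```java
public class ProofOfClaim {
  // Hypothetical minimal shape of a retry-queue row: claimedAt is null when the
  // record is unclaimed, or after updateFailureAndRetryCount resets it post-processing.
  record RetryRecord(String entityId, Long claimedAt, int retryCount) {}

  // A record proves it passed claimPending's SQL filter if it is currently
  // claimed (claimedAt set) OR was claimed at least once and then processed
  // (retryCount bumped, claimedAt reset to null by the worker).
  static boolean wasClaimed(RetryRecord r) {
    return r.claimedAt() != null || r.retryCount() > 0;
  }

  public static void main(String[] args) {
    System.out.println(wasClaimed(new RetryRecord("a", 1700000000L, 0))); // true: claim in flight
    System.out.println(wasClaimed(new RetryRecord("b", null, 2)));        // true: already processed
    System.out.println(wasClaimed(new RetryRecord("c", null, 0)));        // false: never claimed
  }
}
```

Either signal alone misses a window — claimedAt is transiently null between worker passes, and retryCount stays 0 until the first failure — so the disjunction covers both the in-flight and already-processed states.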
```diff
 void testClaimPendingIncludesRetryStatuses(TestNamespace ns) {
   String id1 = UUID.randomUUID().toString();
   String id2 = UUID.randomUUID().toString();
   String id3 = UUID.randomUUID().toString();
   String fqn1 = ns.prefix("rq") + ".a";
   String fqn2 = ns.prefix("rq") + ".b";
   String fqn3 = ns.prefix("rq") + ".c";

   retryQueueDAO.upsert(id1, fqn1, "f", SearchIndexRetryQueue.STATUS_PENDING, "");
   retryQueueDAO.upsert(id2, fqn2, "f", SearchIndexRetryQueue.STATUS_PENDING_RETRY_1, "");
   retryQueueDAO.upsert(id3, fqn3, "f", SearchIndexRetryQueue.STATUS_PENDING_RETRY_2, "");

-  List<SearchIndexRetryRecord> claimed = retryQueueDAO.claimPending(10);
-  assertTrue(claimed.size() >= 3);
-  assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id1)));
-  assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id2)));
-  assertTrue(claimed.stream().anyMatch(r -> r.getEntityId().equals(id3)));
+  Set<String> ourIds = Set.of(id1, id2, id3);
+
+  Awaitility.await()
+      .atMost(Duration.ofSeconds(15))
+      .pollInterval(Duration.ofMillis(200))
+      .untilAsserted(
+          () -> {
+            retryQueueDAO.claimPending(50);
+            Set<String> claimed = new HashSet<>();
+            for (String status :
+                List.of(
+                    SearchIndexRetryQueue.STATUS_PENDING,
+                    SearchIndexRetryQueue.STATUS_PENDING_RETRY_1,
+                    SearchIndexRetryQueue.STATUS_PENDING_RETRY_2,
+                    SearchIndexRetryQueue.STATUS_IN_PROGRESS,
+                    SearchIndexRetryQueue.STATUS_FAILED)) {
+              retryQueueDAO.findByStatus(status, 5000).stream()
+                  .filter(
+                      r ->
+                          ourIds.contains(r.getEntityId())
+                              && (r.getClaimedAt() != null || r.getRetryCount() > 0))
+                  .map(SearchIndexRetryRecord::getEntityId)
+                  .forEach(claimed::add);
+            }
+            assertTrue(claimed.contains(id1), "id1 (PENDING) was never claimed");
+            assertTrue(claimed.contains(id2), "id2 (PENDING_RETRY_1) was never claimed");
+            assertTrue(claimed.contains(id3), "id3 (PENDING_RETRY_2) was never claimed");
+          });
```
…gnizerMetadata repository.create() publishes a ChangeEvent that triggers ApplyRecognizerFeedbackImpl asynchronously. That workflow call races with the direct applyFeedback below: by the time the workflow runs, the GENERATED tag is already removed by the direct call, so getRecognizerIdFromTagLabel returns null and the workflow falls back to ALL recognizers, contaminating recognizer2. Fix: insert directly to DAO (bypassing publishChangeEvent) so the governance workflow is never triggered for this unit-level test.
Fixed in 7a4ec4c. Now tracking which IDs are still visible in any status after each poll. An ID absent from all statuses was deleted by the worker via deleteByEntity — that path is only reached after claimPending accepted the record, so absence is equally valid proof that the SQL filter worked. The absent IDs are added to claimed before the final assertions.
…sRetryStatuses The worker's processRecord takes the delete path (removeStaleEntityById + deleteByEntity) when resolveEntityReference returns null but entityId is non-empty — which applies to our fake UUID test records. If the worker wins and deletes a record, findByStatus finds nothing and the assertion fails. Fix: track which IDs are still visible in any status. An ID absent from all statuses was deleted by the worker after a successful claim — deleteByEntity is only reached after claimPending accepted the record, so absence is equally valid proof that claimPending's SQL filter worked.
…rsMultipleUpdates Same root cause as patch_addDeleteReviewers: GlossaryTermApprovalWorkflow fires asynchronously after reviewers are added, setting entityStatus=IN_REVIEW. Subsequent patches using the stale APPROVED status from the previous response trigger a spurious IN_REVIEW→APPROVED transition, rejected because admin is not a reviewer. Re-fetch before each subsequent patch to avoid the stale status.
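The stale-snapshot failure described in these two commits can be modeled without the workflow engine. Below is a hedged pure-Java sketch — entities are reduced to field maps and diff is a stand-in for JSON-patch generation; all names are hypothetical. A patch built from the previous response carries a spurious entityStatus transition, while re-fetching first yields a diff containing only the intended reviewer change.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StalePatchDemo {
  // Stand-in for JSON-patch generation: every field in `desired` that differs
  // from the server's current `base` becomes part of the patch.
  static Map<String, String> diff(Map<String, String> base, Map<String, String> desired) {
    Map<String, String> changes = new LinkedHashMap<>();
    desired.forEach((k, v) -> { if (!v.equals(base.get(k))) changes.put(k, v); });
    return changes;
  }

  // Server state after the async approval workflow ran and set IN_REVIEW.
  static final Map<String, String> SERVER =
      Map.of("entityStatus", "IN_REVIEW", "reviewers", "alice");
  // Stale client snapshot from the previous response, taken before the workflow fired.
  static final Map<String, String> STALE =
      Map.of("entityStatus", "APPROVED", "reviewers", "alice");

  // Build a patch from a snapshot in which the caller only removes the reviewer.
  static Map<String, String> patchFrom(Map<String, String> snapshot) {
    Map<String, String> desired = new LinkedHashMap<>(snapshot);
    desired.put("reviewers", "");  // the only change the caller intends
    return diff(SERVER, desired);
  }

  public static void main(String[] args) {
    // From the stale snapshot: the patch also carries a spurious
    // IN_REVIEW -> APPROVED transition, which only a reviewer may make.
    System.out.println(patchFrom(STALE));
    // After re-fetching (snapshot == server state): only the reviewer change.
    System.out.println(patchFrom(SERVER));
  }
}
```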
```java
Awaitility.await("Page 1 and page 2 must return different rows")
    .atMost(15, TimeUnit.SECONDS)
    .pollInterval(500, TimeUnit.MILLISECONDS)
    .untilAsserted(
        () -> {
          HttpResponse<String> page1 =
              httpGetExport(
                  "/v1/search/export?q=export_page_test&index=table_search_index"
                      + "&sort_field=name.keyword&sort_order=asc&from=0&size=1");
          assertEquals(200, page1.statusCode());
          String[] page1Lines = page1.body().split("\n");
          assertEquals(2, page1Lines.length, "from=0&size=1 should return header + 1 data row");

          HttpResponse<String> page2 =
              httpGetExport(
                  "/v1/search/export?q=export_page_test&index=table_search_index"
                      + "&sort_field=name.keyword&sort_order=asc&from=1&size=1");
          assertEquals(200, page2.statusCode());
          String[] page2Lines = page2.body().split("\n");
          assertEquals(2, page2Lines.length, "from=1&size=1 should return header + 1 data row");
```
Gitar Code Review — ✅ Approved, 1 resolved / 1 finding

Stabilizes seven flaky integration tests by addressing race conditions, increasing timeouts for slow CI operations, and ensuring proper test isolation. No issues found.

✅ 1 resolved — Edge Case: claimedAt check may miss records already processed by worker
Summary
Fixes seven flaky integration tests:
Changes
Test plan