Fix Flowable workflow termination message lookup and async-job race on deletion by yan-3005 · Pull Request #27117 · open-metadata/OpenMetadata

yan-3005 · 2026-04-07T06:22:24Z

Summary

Production bug fix: WorkflowHandler.findTerminationMessageName was splitting task definition keys on _ and searching for messages named <key>_terminateProcess. Workflow.getFlowableElementId joins with a ., so actual task keys are <subProcess>.approvalTask and messages are <subProcess>.terminateProcess. Every conflict-resolution path (a newer workflow instance replacing an older one for the same entity) silently failed since the original Custom Workflows commit — the old subprocess was never terminated via message, left dangling until force-delete. Fixed by splitting on the last . and looking up <subProcessId>.terminateProcess directly.
Test fix: waitForWorkflowIdle now waits for async jobs on the trigger process definition IDs rather than waiting for process instances to drain. The async-after jobs on PeriodicBatchEntityTrigger's fetch task are ephemeral (<100ms); the old approach waited for the entire batch loop to finish (potentially minutes with many entities), causing test suite time to blow up.
Removed waitForWorkflowIdle from cleanupCreatedWorkflows @AfterEach — it ran a Flowable query per workflow in every test regardless of whether the workflow was ever triggered.

Test plan

WorkflowDefinitionResourceIT#test_DataCompletenessWorkflow_SDK — zero NullPointerException / JOB_EXECUTION_FAILURE lines
WorkflowDefinitionResourceIT#test_TagApprovalWorkflow_SDK — zero No termination message found warnings; no 60s ConditionTimeoutException timeout
Full WorkflowDefinitionResourceIT suite — no regressions, test time back to baseline (~2m30s)

…n deletion WorkflowHandler.findTerminationMessageName was splitting task definition keys on underscore and looking for messages named '<key>_terminateProcess'. But Workflow.getFlowableElementId joins with a dot, so actual task keys are '<subProcess>.approvalTask' and termination messages are '<subProcess>.terminateProcess'. The bug has been latent since the original Custom Workflows commit - every conflict-resolution path silently failed to terminate the old subprocess, leaving it dangling until force-delete. Fix: split on the last dot, derive the subprocess ID, and check for the '<subProcessId>.terminateProcess' subscription directly. Remove the dead 'terminateChangeReviewProcess' pattern and the debug scaffolding loops. Also add getManagementService/getRepositoryService getters to WorkflowHandler for use in integration tests. In WorkflowDefinitionResourceIT, replace waitForWorkflowIdle to check pending async jobs on the trigger process definition IDs instead of waiting for process instances to drain. Async-after jobs on the fetch task are ephemeral (<100ms) - waiting for them has near-zero cost, whereas the old approach waited for the entire batch loop (potentially minutes with many entities). Also remove the waitForWorkflowIdle call from cleanupCreatedWorkflows @AfterEach since it ran a query per workflow in every test unnecessarily.

Copilot

Pull request overview

Fixes a long-standing Flowable custom-workflow termination bug by aligning termination message lookup with the actual element/message naming scheme, and hardens integration-test cleanup by waiting for short-lived async jobs to drain before deleting workflow deployments.

Changes:

Update WorkflowHandler.findTerminationMessageName(...) to derive <subProcessId>.terminateProcess from task definition keys using the last . separator and validate via message subscription query.
Add ManagementService / RepositoryService accessors on WorkflowHandler to support test-side Flowable queries.
Update WorkflowDefinitionResourceIT cleanup paths to wait for trigger process async jobs to drain (instead of waiting for process instances to finish), and avoid per-test unconditional idle-waits.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java	Correct termination message name resolution to match Flowable element IDs; add Flowable service getters for test usage.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/WorkflowDefinitionResourceIT.java	Add `waitForWorkflowIdle` to avoid async-job vs deletion race during workflow cleanup and reduce test suite latency.

...-integration-tests/src/test/java/org/openmetadata/it/tests/WorkflowDefinitionResourceIT.java

github-actions · 2026-04-07T08:16:35Z

🟡 Playwright Results — all passed (23 flaky)

✅ 3595 passed · ❌ 0 failed · 🟡 23 flaky · ⏭️ 207 skipped

Shard	Passed	Flaky	Skipped
🟡 Shard 1	456	1	2
🟡 Shard 2	640	2	32
🟡 Shard 3	647	4	26
🟡 Shard 4	615	7	47
🟡 Shard 5	605	2	67
🟡 Shard 6	632	7	33

🟡 23 flaky test(s) (passed on retry)

Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
Features/DataQuality/TableLevelTests.spec.ts › Table Column Count To Be Between (shard 2, 1 retry)
Features/Permissions/EntityPermissions.spec.ts › DashboardDataModel allow common operations permissions (shard 3, 1 retry)
Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
Features/UserProfileOnlineStatus.spec.ts › Should show "Active recently" for users active within last hour (shard 3, 1 retry)
Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
Pages/DataContracts.spec.ts › Create Data Contract and validate for Store Procedure (shard 4, 1 retry)
Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
Pages/Domains.spec.ts › Rename domain with tags and glossary terms preserves associations (shard 4, 1 retry)
Pages/Domains.spec.ts › Multiple consecutive domain renames preserve all associations (shard 4, 1 retry)
Pages/Entity.spec.ts › Certification Add Remove (shard 4, 1 retry)
Pages/Entity.spec.ts › Glossary Term Add, Update and Remove (shard 4, 1 retry)
Pages/ExploreTree.spec.ts › Verify Database and Database Schema available in explore tree (shard 5, 1 retry)
Pages/Glossary.spec.ts › Add and Remove Assets (shard 5, 2 retries)
Pages/HyperlinkCustomProperty.spec.ts › should display URL when no display text is provided (shard 6, 1 retry)
Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Container (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
Pages/ProfilerConfigurationPage.spec.ts › Non admin user (shard 6, 1 retry)
Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

…ocesses WorkflowInstanceListener and WorkflowInstanceStageListener were firing on Flowable process instances with numeric keys (e.g. 239376, 239373) that have no corresponding OM workflow definition, causing EntityNotFoundException noise in AUT runs. - WorkflowInstanceListener: skip addWorkflowInstance/updateWorkflowInstance when getMainWorkflowDefinitionNameFromTrigger returns the raw process key unchanged (no 'Trigger' suffix to strip = not an OM trigger workflow) - WorkflowInstanceStageListener: skip addNewStage/updateStage when the process key is purely numeric (isUnknownFlowableProcess guard) - WorkflowDefinitionResourceIT: fix waitForWorkflowIdle to use TriggerFactory.getTriggerWorkflowId(fqn) instead of name+"Trigger%" so it correctly queries by FQN-based process definition key

...-integration-tests/src/test/java/org/openmetadata/it/tests/WorkflowDefinitionResourceIT.java

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Copilot · 2026-04-07T14:11:18Z

...c/main/java/org/openmetadata/service/governance/workflows/WorkflowInstanceStageListener.java

+  private boolean isUnknownFlowableProcess(String processKey) {
+    return processKey.matches("\\d+");
+  }


isUnknownFlowableProcess treats any numeric-only process key as “not an OM workflow”. WorkflowDefinition name/fullyQualifiedName uses entityName, which allows numeric-only values, so a workflow named e.g. "123" would have stage tracking silently disabled. Consider detecting non-OM processes via presence/format of the expected businessKey/variables (or validating/disallowing numeric-only workflow names) rather than processKey.matches("\\d+").

Copilot · 2026-04-07T14:11:19Z

...-integration-tests/src/test/java/org/openmetadata/it/tests/WorkflowDefinitionResourceIT.java

+   * picked up and executed. These jobs are ephemeral (complete in &lt;100ms) but if the deployment
+   * is deleted while one is in flight the Flowable job executor NPEs when it tries to resolve the
+   * process definition. Returns immediately for event-based workflows that have no
+   * "&lt;name&gt;Trigger-*" process definitions deployed, or for periodic-batch workflows whose


The Javadoc says this returns immediately for event-based workflows that have no "Trigger-*" process definitions, but the code queries processDefinitionKeyLike(... + "%"), which matches both "Trigger" and "Trigger-...". Update the comment to reflect the actual matching criteria ("Trigger%") so future readers don’t infer the wrong behavior.

Suggested change

* "<name>Trigger-*" process definitions deployed, or for periodic-batch workflows whose

* "<name>Trigger%" process definitions deployed, or for periodic-batch workflows whose

Copilot · 2026-04-07T14:11:19Z

...-integration-tests/src/test/java/org/openmetadata/it/tests/WorkflowDefinitionResourceIT.java

+    List<ProcessDefinition> triggerPDs =
+        repositoryService
+            .createProcessDefinitionQuery()
+            .processDefinitionKeyLike(TriggerFactory.getTriggerWorkflowId(workflowFqn) + "%")
+            .list();


processDefinitionKeyLike(...) uses SQL LIKE semantics where _ and % are wildcards. Since workflow names/FQNs in tests can include _ (e.g., via ns.prefix(...)), this query may pick up unrelated process definitions and make the wait slower/flaky. Consider filtering the returned list by pd.getKey().startsWith(baseKey) (or otherwise escaping LIKE wildcards) before collecting PD IDs.

Suggested change

List<ProcessDefinition> triggerPDs =

repositoryService

.createProcessDefinitionQuery()

.processDefinitionKeyLike(TriggerFactory.getTriggerWorkflowId(workflowFqn) + "%")

.list();

String triggerWorkflowId = TriggerFactory.getTriggerWorkflowId(workflowFqn);

List<ProcessDefinition> triggerPDs =

repositoryService

.createProcessDefinitionQuery()

.processDefinitionKeyLike(triggerWorkflowId + "%")

.list()

.stream()

.filter(pd -> pd.getKey() != null && pd.getKey().startsWith(triggerWorkflowId))

.toList();

Copilot · 2026-04-07T14:11:20Z

...ata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java

+                  .createExecutionQuery()
+                  .processInstanceId(task.getProcessInstanceId())
+                  .messageEventSubscriptionName(messageName)
+                  .count()
+              > 0;


findTerminationMessageName queries Flowable for the message subscription via .count(), and terminateTaskProcessInstance immediately runs a second query (.singleResult()) for the same messageEventSubscriptionName. You can avoid the extra round-trip by having one of these queries serve both purposes (e.g., fetch the Execution once and branch on null).

- WorkflowInstanceStageListener: replace fragile numeric-key guard with businessKey-based guard — skip stage tracking when no business key is present (non-OM-managed process instance) instead of using processKey format heuristic that could misfire on numeric-named workflows - WorkflowDefinitionResourceIT: fix Javadoc comment ("Trigger-*" -> "Trigger%") and add startsWith filter after processDefinitionKeyLike query to prevent SQL LIKE wildcard '_' from matching unrelated PDs - WorkflowHandler: eliminate double Flowable round-trip in termination logic by merging findTerminationMessageName into terminateTask helper that does a single singleResult() query per task

github-actions · 2026-04-07T16:23:01Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

...c/main/java/org/openmetadata/service/governance/workflows/WorkflowInstanceStageListener.java

yan-3005 · 2026-04-07T16:39:19Z

@gitar-bot Fix the Edge Case: Silent skip on null businessKey uses DEBUG; WARN would aid diagnosis and do mvn spotless:apply on the root of the repository and push it

…ceStageListener Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>

gitar-bot · 2026-04-07T16:44:57Z

@gitar-bot Fix the Edge Case: Silent skip on null businessKey uses DEBUG; WARN would aid diagnosis and do mvn spotless:apply on the root of the repository and push it

Done. Changed LOG.debug to LOG.warn in WorkflowInstanceStageListener.java for the null business key skip case, then ran mvn spotless:apply on the repository root. Changes committed.

gitar-bot · 2026-04-07T16:45:14Z

Code Review ✅ Approved 2 resolved / 2 findings

Fixes Flowable workflow termination message lookup and async-job race on deletion by correcting variable references and improving logging verbosity. Two issues were addressed: waitForWorkflowIdle now uses the correct workflowFqn parameter, and null businessKey skips now log at WARN level for better diagnostics.

✅ 2 resolved

✅ Bug: waitForWorkflowIdle uses undefined workflowName instead of workflowFqn

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/WorkflowDefinitionResourceIT.java:1982
In waitForWorkflowIdle(String workflowFqn), line 1982 references workflowName in the await() description string, but the method parameter is named workflowFqn. There is no workflowName field on the class. This should either fail to compile or resolve to a variable from an unexpected scope. The intended variable is clearly workflowFqn.

✅ Edge Case: Silent skip on null businessKey uses DEBUG; WARN would aid diagnosis

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowInstanceStageListener.java:140-144 📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowInstanceListener.java:100-106 📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowInstanceListener.java:126-132
In both WorkflowInstanceListener.java and WorkflowInstanceStageListener.java, when a non-OM-managed process instance is detected (null business key or non-trigger process key), the code logs at DEBUG level and silently returns. While this is correct for known orphaned Flowable processes, if a legitimate OM-managed workflow ever reached these paths without a business key due to a bug elsewhere, the issue would be invisible at typical log levels.

The WorkflowInstanceStageListener change (line 140) is the more impactful one — it previously threw IllegalStateException, providing a hard failure signal. Now it silently skips stage tracking entirely. Consider logging at WARN level for the WorkflowInstanceStageListener case specifically, since missing stage tracking for a real workflow would be a user-visible problem.

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

sonarqubecloud · 2026-04-07T17:48:48Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

…n deletion (open-metadata#27117) * Fix Flowable workflow termination message lookup and async-job race on deletion WorkflowHandler.findTerminationMessageName was splitting task definition keys on underscore and looking for messages named '<key>_terminateProcess'. But Workflow.getFlowableElementId joins with a dot, so actual task keys are '<subProcess>.approvalTask' and termination messages are '<subProcess>.terminateProcess'. The bug has been latent since the original Custom Workflows commit - every conflict-resolution path silently failed to terminate the old subprocess, leaving it dangling until force-delete. Fix: split on the last dot, derive the subprocess ID, and check for the '<subProcessId>.terminateProcess' subscription directly. Remove the dead 'terminateChangeReviewProcess' pattern and the debug scaffolding loops. Also add getManagementService/getRepositoryService getters to WorkflowHandler for use in integration tests. In WorkflowDefinitionResourceIT, replace waitForWorkflowIdle to check pending async jobs on the trigger process definition IDs instead of waiting for process instances to drain. Async-after jobs on the fetch task are ephemeral (<100ms) - waiting for them has near-zero cost, whereas the old approach waited for the entire batch loop (potentially minutes with many entities). Also remove the waitForWorkflowIdle call from cleanupCreatedWorkflows @AfterEach since it ran a query per workflow in every test unnecessarily. * Fix spurious EntityNotFoundException errors from orphaned Flowable processes WorkflowInstanceListener and WorkflowInstanceStageListener were firing on Flowable process instances with numeric keys (e.g. 239376, 239373) that have no corresponding OM workflow definition, causing EntityNotFoundException noise in AUT runs. - WorkflowInstanceListener: skip addWorkflowInstance/updateWorkflowInstance when getMainWorkflowDefinitionNameFromTrigger returns the raw process key unchanged (no 'Trigger' suffix to strip = not an OM trigger workflow) - WorkflowInstanceStageListener: skip addNewStage/updateStage when the process key is purely numeric (isUnknownFlowableProcess guard) - WorkflowDefinitionResourceIT: fix waitForWorkflowIdle to use TriggerFactory.getTriggerWorkflowId(fqn) instead of name+"Trigger%" so it correctly queries by FQN-based process definition key * Fix compilation issues * Address Copilot review feedback on PR open-metadata#27117 - WorkflowInstanceStageListener: replace fragile numeric-key guard with businessKey-based guard — skip stage tracking when no business key is present (non-OM-managed process instance) instead of using processKey format heuristic that could misfire on numeric-named workflows - WorkflowDefinitionResourceIT: fix Javadoc comment ("Trigger-*" -> "Trigger%") and add startsWith filter after processDefinitionKeyLike query to prevent SQL LIKE wildcard '_' from matching unrelated PDs - WorkflowHandler: eliminate double Flowable round-trip in termination logic by merging findTerminationMessageName into terminateTask helper that does a single singleResult() query per task * fix: change DEBUG to WARN for null businessKey skip in WorkflowInstanceStageListener Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com> --------- Co-authored-by: Gitar <noreply@gitar.ai>

Copilot AI review requested due to automatic review settings April 7, 2026 06:22

yan-3005 added safe to test Add this label to run secure Github workflows on PRs backend labels Apr 7, 2026

yan-3005 self-assigned this Apr 7, 2026

Copilot started reviewing on behalf of yan-3005 April 7, 2026 06:23 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

...-integration-tests/src/test/java/org/openmetadata/it/tests/WorkflowDefinitionResourceIT.java Outdated Show resolved Hide resolved

yan-3005 temporarily deployed to test April 7, 2026 06:32 — with GitHub Actions Inactive

gitar-bot bot reviewed Apr 7, 2026

View reviewed changes

...-integration-tests/src/test/java/org/openmetadata/it/tests/WorkflowDefinitionResourceIT.java Outdated Show resolved Hide resolved

Fix compilation issues

6f09caa

Copilot AI review requested due to automatic review settings April 7, 2026 14:04

Copilot started reviewing on behalf of yan-3005 April 7, 2026 14:05 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

yan-3005 temporarily deployed to test April 7, 2026 14:16 — with GitHub Actions Inactive

gitar-bot bot reviewed Apr 7, 2026

View reviewed changes

...c/main/java/org/openmetadata/service/governance/workflows/WorkflowInstanceStageListener.java Outdated Show resolved Hide resolved

yan-3005 had a problem deploying to test April 7, 2026 16:32 — with GitHub Actions Error

fix: change DEBUG to WARN for null businessKey skip in WorkflowInstan…

b8f7142

…ceStageListener Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 7, 2026 16:44

yan-3005 review requested due to automatic review settings April 7, 2026 16:44

gitar-bot bot temporarily deployed to test April 7, 2026 16:55 Inactive

sonika-shah approved these changes Apr 8, 2026

View reviewed changes

yan-3005 merged commit 804f551 into main Apr 8, 2026
50 checks passed

yan-3005 deleted the ram/fix-flowable-termination-message branch April 8, 2026 13:46

	* "<name>Trigger-*" process definitions deployed, or for periodic-batch workflows whose
	* "<name>Trigger%" process definitions deployed, or for periodic-batch workflows whose

Conversation

yan-3005 commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

github-actions bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🟡 Playwright Results — all passed (23 flaky)

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

Uh oh!

yan-3005 commented Apr 7, 2026

Uh oh!

gitar-bot bot commented Apr 7, 2026

Uh oh!

gitar-bot bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud bot commented Apr 7, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

yan-3005 commented Apr 7, 2026 •

edited

Loading

github-actions bot commented Apr 7, 2026 •

edited

Loading

gitar-bot bot commented Apr 7, 2026 •

edited

Loading