Skip to content

feat(1.12.9): backfill task domains + stamp domains on approval task creation#28286

Open
yan-3005 wants to merge 1 commit into
1.12.9from
ram/task-domain-backfill-1.12.9
Open

feat(1.12.9): backfill task domains + stamp domains on approval task creation#28286
yan-3005 wants to merge 1 commit into
1.12.9from
ram/task-domain-backfill-1.12.9

Conversation

@yan-3005
Copy link
Copy Markdown
Contributor

Summary

Ports the task-domain backfill from PR #28013 (main) onto the 1.12.9 release branch. Direct cherry-pick was not viable because main has moved to the new Task entity (CreateTask), while 1.12.9 still uses the Thread-based CreateApprovalTaskImpl.

1. v1129 migration: backfill domains on existing approval tasks

MigrationUtil (v1129) now runs migrateTaskDomains() alongside the existing policy migrations.

  • thread_entity path (1.12.x): batches rows where JSON_EXTRACT(json,'$.domains') IS NULL / json->'domains' IS NULL (Postgres), resolves domains from the linked entity via MessageParser.EntityLink, and patches $.domains via JSON_SET / jsonb_set. Caches (entityType, entityFQN) -> domainIds so each unique target entity is resolved only once.
  • task_entity path (2.x): guarded by tableExists() so it is a no-op on this release branch but ports cleanly forward.
  • Loop-termination guard: any row whose target entity can't be resolved is marked migrated with $.domains = [] (and counted in markedDoneOnError) so the batch always advances — no more infinite loop on persistent failures.
  • Stall detector: if a non-empty batch produces zero updates, the loop aborts with an ERROR log rather than spinning.

2. Migration runners: non-fatal policy + task backfill

mysql/v1129/Migration.java and postgres/v1129/Migration.java wrap the policy migrations and the task-domain backfill in separate try/catch blocks so a failure in either step no longer blocks server startup. @SneakyThrows was removed.

3. CreateApprovalTaskImpl: stamp domains at creation

CreateApprovalTaskImpl now sets Thread.domains from entity.getDomains() (mapped to UUIDs) at thread-creation time, so newly-created approval tasks carry the correct domain inline — no further backfill needed for new rows.

Test plan

  • MigrationUtilTest (v1129) — 17 unit tests covering both DB dialects, deleted = FALSE filters, JSON-NULL WHERE-clause shape, loop termination, malformed-JSON-marked-migrated, empty-batch termination, INSERT IGNORE vs ON CONFLICT shape, full-batch loop continuation. All passing.
  • Existing-install upgrade: 1.12.x install with domain-tagged glossary terms / tables that have open approval tasks → after upgrade, domain-scoped users see those tasks in the activity feed.
  • New approval task creation: trigger a glossary-term approval workflow on a domain-tagged entity → confirm Thread.domains is set to the entity's domains in the persisted row.
  • Migration failure path: induce a DB error during policy or task-domain backfill → server still starts; error is logged.

Related: main-branch PR #28013.

…n approval tasks

Ports the task-domain backfill work from the main-branch PR (#28013) onto the
1.12.9 release branch. Direct cherry-pick is not viable because main has moved
to the new Task entity (CreateTask), while 1.12.9 still uses the Thread-based
CreateApprovalTaskImpl.

1. v1129 MigrationUtil now also runs migrateTaskDomains alongside the existing
   policy migrations. The thread_entity path patches \$.domains based on the
   linked entity. The task_entity path is guarded by tableExists() so it is a
   no-op on this release branch but ports cleanly forward.

2. Both Migration.java runners wrap the policy and task-domain calls in their
   own try/catch so a backfill failure no longer blocks server startup. The
   prior @SneakyThrows is removed.

3. CreateApprovalTaskImpl now stamps Thread.domains from entity.getDomains()
   at creation time so newly-created approval tasks carry the correct domain
   from the start (no further backfill required for new rows).
@yan-3005 yan-3005 added the safe to test Add this label to run secure Github workflows on PRs label May 19, 2026
Comment on lines +407 to +409
+ "LIMIT "
+ BATCH_SIZE
+ " ON CONFLICT (fromId, toId, relation) DO NOTHING";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Edge Case: PostgreSQL LIMIT in INSERT...SELECT before ON CONFLICT

In buildPostgresInsertTaskDomainSql(), the LIMIT clause is placed outside parentheses directly before ON CONFLICT DO NOTHING. While this is syntactically valid in modern PostgreSQL (LIMIT is parsed as part of the SELECT, ON CONFLICT as part of the INSERT), it is an unusual pattern that can confuse readers and has historically caused issues with some PostgreSQL parser edge cases. Wrapping the SELECT in parentheses makes the intent unambiguous and is the commonly recommended pattern for INSERT ... (SELECT ... LIMIT n) ON CONFLICT ....

The migration is wrapped in try/catch so a failure wouldn't block startup, but it would silently skip the task_entity domain backfill on PostgreSQL deployments.

Wrap the SELECT in parentheses so LIMIT is unambiguously part of the subquery. Also add an opening paren after the column list: change the first line to "INSERT INTO entity_relationship " + " (fromId, toId, fromEntity, toEntity, relation) " + "(SELECT ...".:

+ "  ) "
+ "LIMIT "
+ BATCH_SIZE
+ ") ON CONFLICT (fromId, toId, relation) DO NOTHING";
  • Apply fix

Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 20, 2026

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

Backfills domains on existing approval tasks and updates creation logic to stamp domains inline for the 1.12.9 release. Ensure the PostgreSQL LIMIT clause in the migration script is correctly positioned to avoid syntax issues with ON CONFLICT.

💡 Edge Case: PostgreSQL LIMIT in INSERT...SELECT before ON CONFLICT

📄 openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v1129/MigrationUtil.java:407-409

In buildPostgresInsertTaskDomainSql(), the LIMIT clause is placed outside parentheses directly before ON CONFLICT DO NOTHING. While this is syntactically valid in modern PostgreSQL (LIMIT is parsed as part of the SELECT, ON CONFLICT as part of the INSERT), it is an unusual pattern that can confuse readers and has historically caused issues with some PostgreSQL parser edge cases. Wrapping the SELECT in parentheses makes the intent unambiguous and is the commonly recommended pattern for INSERT ... (SELECT ... LIMIT n) ON CONFLICT ....

The migration is wrapped in try/catch so a failure wouldn't block startup, but it would silently skip the task_entity domain backfill on PostgreSQL deployments.

Wrap the SELECT in parentheses so LIMIT is unambiguously part of the subquery. Also add an opening paren after the column list: change the first line to `"INSERT INTO entity_relationship " + " (fromId, toId, fromEntity, toEntity, relation) " + "(SELECT ..."`.
+ "  ) "
+ "LIMIT "
+ BATCH_SIZE
+ ") ON CONFLICT (fromId, toId, relation) DO NOTHING";
🤖 Prompt for agents
Code Review: Backfills domains on existing approval tasks and updates creation logic to stamp domains inline for the 1.12.9 release. Ensure the PostgreSQL LIMIT clause in the migration script is correctly positioned to avoid syntax issues with ON CONFLICT.

1. 💡 Edge Case: PostgreSQL LIMIT in INSERT...SELECT before ON CONFLICT
   Files: openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v1129/MigrationUtil.java:407-409

   In `buildPostgresInsertTaskDomainSql()`, the `LIMIT` clause is placed outside parentheses directly before `ON CONFLICT DO NOTHING`. While this is syntactically valid in modern PostgreSQL (LIMIT is parsed as part of the SELECT, ON CONFLICT as part of the INSERT), it is an unusual pattern that can confuse readers and has historically caused issues with some PostgreSQL parser edge cases. Wrapping the SELECT in parentheses makes the intent unambiguous and is the commonly recommended pattern for `INSERT ... (SELECT ... LIMIT n) ON CONFLICT ...`.
   
   The migration is wrapped in try/catch so a failure wouldn't block startup, but it would silently skip the task_entity domain backfill on PostgreSQL deployments.

   Fix (Wrap the SELECT in parentheses so LIMIT is unambiguously part of the subquery. Also add an opening paren after the column list: change the first line to `"INSERT INTO entity_relationship " + "  (fromId, toId, fromEntity, toEntity, relation) " + "(SELECT ..."`.):
   + "  ) "
   + "LIMIT "
   + BATCH_SIZE
   + ") ON CONFLICT (fromId, toId, relation) DO NOTHING";

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@github-actions
Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions
Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (16 flaky)

✅ 2087 passed · ❌ 0 failed · 🟡 16 flaky · ⏭️ 52 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 3 709 0 10 5
🟡 Shard 4 729 0 1 19
🟡 Shard 6 649 0 5 28
🟡 16 flaky test(s) (passed on retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Features/Permissions/ServiceEntityPermissions.spec.ts › SearchIndex Service allow common operations permissions (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/TestSuiteMultiPipeline.spec.ts › TestSuite multi pipeline support (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
  • Flow/Metric.spec.ts › verify metric expression update (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Flow/ServiceForm.spec.ts › Verify SSL cert upload with long filename and UI overflow handling (shard 3, 1 retry)
  • Pages/AppStopRunModal.spec.ts › should open stop modal when stop button is clicked and call stop API with runId (shard 3, 1 retry)
  • Pages/CustomThemeConfig.spec.ts › Update Hover and selected Color (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify LineageSearchSelect in lineage mode (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: container (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab is NOT visible for storageService in platform lineage (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › Delete Storage Service (shard 6, 1 retry)
  • Pages/Tag.spec.ts › Add and Remove Assets and Check Restricted Entity (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant