Skip to content

fix(dbt): remove Automated tags absent from incoming dbt schema (#26737)#27249

Closed
aniruddhaadak80 wants to merge 5 commits intoopen-metadata:mainfrom
aniruddhaadak80:fix-26737
Closed

fix(dbt): remove Automated tags absent from incoming dbt schema (#26737)#27249
aniruddhaadak80 wants to merge 5 commits intoopen-metadata:mainfrom
aniruddhaadak80:fix-26737

Conversation

@aniruddhaadak80
Copy link
Copy Markdown

Fixes #26737. Adds a reconciliation step in addDataModel to remove tags that were automatically applied by the dbt ingestion bot but are now absent from the incoming dataModel payload, resolving the indefinitely persisting tags bug.

Copilot AI review requested due to automatic review settings April 10, 2026 14:38
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds reconciliation logic in the backend TableRepository.addDataModel flow intended to remove dbt-ingestion-applied tags that are no longer present in the incoming DataModel payload (addressing the “automated tags persist forever” bug described in #26737).

Changes:

  • After merging tags with mergeTagsWithIncomingPrecedence, removes existing LabelType.AUTOMATED table tags that are absent from the incoming dataModel.tags.
  • Applies the same “remove absent automated tags” reconciliation to column tags from dataModel.columns[*].tags.

Comment on lines +1410 to +1414
if (table.getTags() != null) {
java.util.List<String> incomingTags = dataModel.getTags() != null
? dataModel.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList())
: java.util.Collections.emptyList();
mergedTableTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingTags.contains(t.getTagFQN()));
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reconciliation removes all existing LabelType.AUTOMATED tags that are not present in the incoming dataModel payload. Since addDataModel is a general Table API (not dbt-only), this can unintentionally delete automated tags applied by other systems (e.g., auto-classification), causing data loss on any dataModel update. Please scope the removal to dbt-ingestion-owned tags (e.g., t.getAppliedBy() equals Entity.INGESTION_BOT_NAME or another dbt bot identity) and ideally gate it on dataModel.getModelType() == DBT so non-dbt updates don’t reconcile bot-applied tags.

Copilot uses AI. Check for mistakes.
Comment on lines +1433 to +1437
if (stored.getTags() != null) {
java.util.List<String> incomingColTags = modelColumn.getTags() != null
? modelColumn.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList())
: java.util.Collections.emptyList();
mergedColumnTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingColTags.contains(t.getTagFQN()));
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue for column tags: this removes every existing LabelType.AUTOMATED column tag not present in the incoming model, which can wipe automated tags applied by other bots/recognizers whenever a data model is updated. Please restrict the reconciliation to tags applied by the ingestion/dbt bot (e.g., Entity.INGESTION_BOT_NAME via t.getAppliedBy()) and consider only applying this logic for DBT dataModel.getModelType().

Copilot uses AI. Check for mistakes.
@aniruddhaadak80
Copy link
Copy Markdown
Author

Hello! I am participating in the WeMakeDevs hackathon. Could a maintainer please assign the safe to test label so the GitHub Actions workflows can validate my fixes? Thank you!

@aniruddhaadak80
Copy link
Copy Markdown
Author

aniruddhaadak80 commented Apr 13, 2026

Hello team, could we get the safe to test label added to this PR? Let me know if any updates are needed before that.

@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot AI review requested due to automatic review settings April 13, 2026 14:09
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Comment on lines +1410 to +1415
if (table.getTags() != null) {
java.util.List<String> incomingTags = dataModel.getTags() != null
? dataModel.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList())
: java.util.Collections.emptyList();
mergedTableTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingTags.contains(t.getTagFQN()));
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reconciliation removes all existing AUTOMATED tags that are not present in the incoming dataModel.getTags(). Since AUTOMATED is also used for non-dbt automation (e.g., tier/certification/other workflows), this can unintentionally delete tags managed by other subsystems whenever /tables/{id}/dataModel is called. Consider restricting deletions to tags that are known to be dbt-managed (e.g., those present in the previous stored table.getDataModel().getTags() or another provenance signal), rather than all LabelType.AUTOMATED tags.

Copilot uses AI. Check for mistakes.
Comment on lines +1433 to +1438
if (stored.getTags() != null) {
java.util.List<String> incomingColTags = modelColumn.getTags() != null
? modelColumn.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList())
: java.util.Collections.emptyList();
mergedColumnTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingColTags.contains(t.getTagFQN()));
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern as table-level tags: this removes every existing column tag with LabelType.AUTOMATED if it’s missing from the incoming dbt column payload. Column-level automated tags can also be produced by other automation (e.g., classification workflows), so this risks deleting non-dbt tags on every dbt run. Narrow the deletion scope to dbt-managed column tags (e.g., compare against the previous stored table.getDataModel().getColumns() tags for the same column) instead of all automated tags.

Copilot uses AI. Check for mistakes.
Comment on lines 6361 to 6363
return listCount(
getTableName(), mySqlCondition, postgresCondition, team, Relationship.HAS.ordinal());
getTableName(), mySqlCondition, postgresCondition, team, Relationship.HAS.ordinal(), filter.getQueryParams());
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing filter.getQueryParams() as a @BindMap can bind a raw team value to :team, which conflicts with the explicit @BindFQN("team") binding (expected to hash the team). This can lead to incorrect/unstable binding for :team (and potentially wrong results) when the team filter is used. Pass a copy of the params map with team removed (or rename one of the bind parameters) to avoid double-binding :team.

Copilot uses AI. Check for mistakes.
Comment on lines 6436 to +6445
return listBefore(
getTableName(),
mySqlCondition,
postgresCondition,
team,
limit,
beforeName,
beforeId,
Relationship.HAS.ordinal());
Relationship.HAS.ordinal(),
filter.getQueryParams());
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same :team double-binding risk as in listCount: @BindFQN("team") and @BindMap(filter.getQueryParams()) can both bind :team when a team filter is present. Use a params map with team removed (or use a different bind name) before passing it into the DAO method.

Copilot uses AI. Check for mistakes.
Comment on lines 6518 to +6527
return listAfter(
getTableName(),
mySqlCondition,
postgresCondition,
team,
limit,
afterName,
afterId,
Relationship.HAS.ordinal());
Relationship.HAS.ordinal(),
filter.getQueryParams());
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same :team double-binding risk as in listCount/listBefore: @BindFQN("team") can be overridden/conflicted by @BindMap(filter.getQueryParams()) if the query params include team. Remove team from the bind map (or rename binds) before passing params to the SQL method.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 13, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Removes Automated tags absent from incoming dbt schema, addressing UTF-16LE encoding in DummyBackendTest.java and cleaning up unnecessary fully-qualified class names. No issues remain.

✅ 2 resolved
Bug: DummyBackendTest.java is UTF-16LE encoded, will break builds

The new test file DummyBackendTest.java is encoded in UTF-16LE (with BOM FF FE), which is why Git treats it as a binary file. Java source files must be UTF-8 (or ASCII). Most Java compilers and build tools (Maven/Gradle) default to UTF-8 and will fail to parse this file correctly, causing compilation errors or the test being silently skipped.

This file needs to be re-saved as UTF-8 without BOM.

Quality: Unnecessary fully-qualified class names; imports already exist

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java:1411-1414 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java:1434-1437 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java:1412 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java:1434
The new code uses fully-qualified names (java.util.List, java.util.stream.Collectors, java.util.Collections, org.openmetadata.schema.type.TagLabel.LabelType, org.openmetadata.schema.type.TagLabel) even though all of these types are already imported at the top of the file (lines 53, 58, 65, 107). This hurts readability and is inconsistent with the rest of the file.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@PubChimps
Copy link
Copy Markdown
Contributor

Hi @aniruddhaadak80, thank you for your pr here, but we have another pr that is being developed here, please make sure that you are assigned to an issue before adding a fix to it, we will reopen this pr if necessary.

@PubChimps PubChimps closed this Apr 16, 2026
@aniruddhaadak80
Copy link
Copy Markdown
Author

Hi @aniruddhaadak80, thank you for your pr here, but we have another pr that is being developed here, please make sure that you are assigned to an issue before adding a fix to it, we will reopen this pr if necessary.

ok , thank you for transparency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dbt ingestion cannot remove previously applied non-mutually-exclusive tags (follow-up to #26054)

3 participants