fix(dbt): remove Automated tags absent from incoming dbt schema (#26737)#27249
fix(dbt): remove Automated tags absent from incoming dbt schema (#26737)#27249aniruddhaadak80 wants to merge 5 commits intoopen-metadata:mainfrom
Conversation
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
There was a problem hiding this comment.
Pull request overview
Adds reconciliation logic in the backend TableRepository.addDataModel flow intended to remove dbt-ingestion-applied tags that are no longer present in the incoming DataModel payload (addressing the “automated tags persist forever” bug described in #26737).
Changes:
- After merging tags with
mergeTagsWithIncomingPrecedence, removes existingLabelType.AUTOMATEDtable tags that are absent from the incomingdataModel.tags. - Applies the same “remove absent automated tags” reconciliation to column tags from
dataModel.columns[*].tags.
| if (table.getTags() != null) { | ||
| java.util.List<String> incomingTags = dataModel.getTags() != null | ||
| ? dataModel.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList()) | ||
| : java.util.Collections.emptyList(); | ||
| mergedTableTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingTags.contains(t.getTagFQN())); |
There was a problem hiding this comment.
This reconciliation removes all existing LabelType.AUTOMATED tags that are not present in the incoming dataModel payload. Since addDataModel is a general Table API (not dbt-only), this can unintentionally delete automated tags applied by other systems (e.g., auto-classification), causing data loss on any dataModel update. Please scope the removal to dbt-ingestion-owned tags (e.g., t.getAppliedBy() equals Entity.INGESTION_BOT_NAME or another dbt bot identity) and ideally gate it on dataModel.getModelType() == DBT so non-dbt updates don’t reconcile bot-applied tags.
| if (stored.getTags() != null) { | ||
| java.util.List<String> incomingColTags = modelColumn.getTags() != null | ||
| ? modelColumn.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList()) | ||
| : java.util.Collections.emptyList(); | ||
| mergedColumnTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingColTags.contains(t.getTagFQN())); |
There was a problem hiding this comment.
Same issue for column tags: this removes every existing LabelType.AUTOMATED column tag not present in the incoming model, which can wipe automated tags applied by other bots/recognizers whenever a data model is updated. Please restrict the reconciliation to tags applied by the ingestion/dbt bot (e.g., Entity.INGESTION_BOT_NAME via t.getAppliedBy()) and consider only applying this logic for DBT dataModel.getModelType().
|
Hello! I am participating in the WeMakeDevs hackathon. Could a maintainer please assign the |
|
Hello team, could we get the |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
| if (table.getTags() != null) { | ||
| java.util.List<String> incomingTags = dataModel.getTags() != null | ||
| ? dataModel.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList()) | ||
| : java.util.Collections.emptyList(); | ||
| mergedTableTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingTags.contains(t.getTagFQN())); | ||
| } |
There was a problem hiding this comment.
The reconciliation removes all existing AUTOMATED tags that are not present in the incoming dataModel.getTags(). Since AUTOMATED is also used for non-dbt automation (e.g., tier/certification/other workflows), this can unintentionally delete tags managed by other subsystems whenever /tables/{id}/dataModel is called. Consider restricting deletions to tags that are known to be dbt-managed (e.g., those present in the previous stored table.getDataModel().getTags() or another provenance signal), rather than all LabelType.AUTOMATED tags.
| if (stored.getTags() != null) { | ||
| java.util.List<String> incomingColTags = modelColumn.getTags() != null | ||
| ? modelColumn.getTags().stream().map(org.openmetadata.schema.type.TagLabel::getTagFQN).collect(java.util.stream.Collectors.toList()) | ||
| : java.util.Collections.emptyList(); | ||
| mergedColumnTags.removeIf(t -> t.getLabelType() == org.openmetadata.schema.type.TagLabel.LabelType.AUTOMATED && !incomingColTags.contains(t.getTagFQN())); | ||
| } |
There was a problem hiding this comment.
Same concern as table-level tags: this removes every existing column tag with LabelType.AUTOMATED if it’s missing from the incoming dbt column payload. Column-level automated tags can also be produced by other automation (e.g., classification workflows), so this risks deleting non-dbt tags on every dbt run. Narrow the deletion scope to dbt-managed column tags (e.g., compare against the previous stored table.getDataModel().getColumns() tags for the same column) instead of all automated tags.
| return listCount( | ||
| getTableName(), mySqlCondition, postgresCondition, team, Relationship.HAS.ordinal()); | ||
| getTableName(), mySqlCondition, postgresCondition, team, Relationship.HAS.ordinal(), filter.getQueryParams()); | ||
| } |
There was a problem hiding this comment.
Passing filter.getQueryParams() as a @BindMap can bind a raw team value to :team, which conflicts with the explicit @BindFQN("team") binding (expected to hash the team). This can lead to incorrect/unstable binding for :team (and potentially wrong results) when the team filter is used. Pass a copy of the params map with team removed (or rename one of the bind parameters) to avoid double-binding :team.
| return listBefore( | ||
| getTableName(), | ||
| mySqlCondition, | ||
| postgresCondition, | ||
| team, | ||
| limit, | ||
| beforeName, | ||
| beforeId, | ||
| Relationship.HAS.ordinal()); | ||
| Relationship.HAS.ordinal(), | ||
| filter.getQueryParams()); |
There was a problem hiding this comment.
Same :team double-binding risk as in listCount: @BindFQN("team") and @BindMap(filter.getQueryParams()) can both bind :team when a team filter is present. Use a params map with team removed (or use a different bind name) before passing it into the DAO method.
| return listAfter( | ||
| getTableName(), | ||
| mySqlCondition, | ||
| postgresCondition, | ||
| team, | ||
| limit, | ||
| afterName, | ||
| afterId, | ||
| Relationship.HAS.ordinal()); | ||
| Relationship.HAS.ordinal(), | ||
| filter.getQueryParams()); |
There was a problem hiding this comment.
Same :team double-binding risk as in listCount/listBefore: @BindFQN("team") can be overridden/conflicted by @BindMap(filter.getQueryParams()) if the query params include team. Remove team from the bind map (or rename binds) before passing params to the SQL method.
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
Code Review ✅ Approved 2 resolved / 2 findingsRemoves Automated tags absent from incoming dbt schema, addressing UTF-16LE encoding in DummyBackendTest.java and cleaning up unnecessary fully-qualified class names. No issues remain. ✅ 2 resolved✅ Bug: DummyBackendTest.java is UTF-16LE encoded, will break builds
✅ Quality: Unnecessary fully-qualified class names; imports already exist
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
Hi @aniruddhaadak80, thank you for your pr here, but we have another pr that is being developed here, please make sure that you are assigned to an issue before adding a fix to it, we will reopen this pr if necessary. |
ok , thank you for transparency. |
Fixes #26737. Adds a reconciliation step in addDataModel to remove tags that were automatically applied by the dbt ingestion bot but are now absent from the incoming dataModel payload, resolving the indefinitely persisting tags bug.