Fix MSSQL ingestion failure when procedure names contain % by escapin…#16864
Fix MSSQL ingestion failure when procedure names contain % by escapin…#16864ritika-jha239 wants to merge 1 commit into
Conversation
|
Linear: ING-2140 Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned. |
|
Hii, this was my first PR. I'm open to feedback and guidance on the same. |
|
Hi @ritika-jha239 - thanks a lot for the contribution here, and great work on adding tests as well 🙌 Apologies for the delay on this, let me help get the right eyes on it and move this forward. |
…g before report_dropped
820c2f6 to
a4d23cd
Compare
|
Your PR has been assigned to @maggiehays (maggie) for review (ING-2140). |
kyungsoo-datahub
left a comment
There was a problem hiding this comment.
Before iterating further, I think the diagnosis may need a second look.
Blockers:
-
The fix appears to be in the wrong place. report_dropped at sql_report.py:77 is just:
def report_dropped(self, ent_name: str) -> None:
self.filtered.append(ent_name)
No % formatting happens here, so .replace("%", "%%") isn't preventing any crash — it just stores a corrupted name. A procedure called get%Invoices would appear in report.filtered as get%%Invoices, which would confuse anyone debugging dropped procedures later.
-
The tests don't exercise the production code. They never import source.py or call report_dropped; they build their own
msg = "Dropped: %s" % qualifiedline and assert against that. So the tests pass, but they don't verify the actual change. -
The PR description shows a different code change than the diff (escape inside .allowed() vs. inside report_dropped()). Worth aligning so the intended fix is clear.
To unblock: could you share the original traceback from the crash? The crash is almost certainly real, but it's likely somewhere else — a logger call using old-style % formatting, pyodbc parameter handling, or a LIKE-pattern query. With the stack trace we can pinpoint the actual line. The instinct (escape % when it breaks format strings) is right; we just need to apply it at the source of the crash.
I might be missing context from the original report. If there's a linked issue or error log, that'd help a lot.
What’s Fixed
MSSQL ingestion was failing whenever a stored procedure had a
%in its name. Python was interpreting%in the string incorrectly, causing the pipeline to crash. This PR ensures such procedure names are properly escaped before reporting.Compliance:
This PR conforms to DataHub’s contributing guidelines. Tests have been added for edge cases with
%in procedure names. No documentation or breaking changes are affected.Changes Made
%characters in procedure names before passing toreport_dropped.test_mssql_percent_procedure.pyto cover edge cases:%in procedure name%characters.gitignoretweaks to exclude temporary test filesFiles Modified
metadata-ingestion/src/datahub/ingestion/source/sql/mssql/source.pymetadata-ingestion/tests/unit/test_mssql_percent_procedure.py.gitignoreCode Snippet of Consideration
How to Test
Due to local environment/setup limitations, tests were verified by running the test file directly as a Python script: