fix(ingestion): replace MetaData.reflect() with direct SQL in Greenplum DDL extraction #27445
…m DDL

Fixes open-metadata#27405

- Replace `MetaData.reflect()` in `get_all_table_ddls()` with a single `pg_catalog` SQL query, reducing ~16,000 catalog queries per schema to 1
- Use `pg_get_expr(adbin, adrelid)` for column defaults, since `adsrc` was removed in PG12+
- Add `GREENPLUM_TABLE_DDLS` in `queries.py`, with `DISTRIBUTED BY` via `gp_distribution_policy` and `WITH (reloptions)` for Greenplum-specific DDL
- Wire the query through `utils.py`, following the `get_view_definition` pattern
- Remove dead `CreateTable` and `MetaData` imports from `sqlalchemy_utils.py`
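The bullets above describe producing Greenplum DDL from one catalog query instead of reflection. Below is a minimal Python sketch of only the assembly step, assuming the query has already returned, per table, column type strings (e.g. from `format_type`), defaults from `pg_get_expr(adbin, adrelid)`, `pg_class.reloptions`, and the table's `gp_distribution_policy` row. `ColumnInfo`, `distribution_clause`, and `build_create_table` are hypothetical names, not the code in this PR, and the policy mapping assumes Greenplum 6 `policytype` values (`'p'` hash/random, `'r'` replicated).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ColumnInfo:
    """One column as a pg_catalog query might return it (hypothetical shape)."""

    name: str
    type_sql: str  # e.g. format_type(a.atttypid, a.atttypmod)
    not_null: bool = False
    default_sql: Optional[str] = None  # e.g. pg_get_expr(d.adbin, d.adrelid)


def distribution_clause(policy_type: Optional[str], dist_columns: Sequence[str]) -> str:
    """Map a gp_distribution_policy row to a DISTRIBUTED clause.

    Assumes Greenplum 6 semantics: policytype 'r' is replicated; 'p' is
    hash-distributed on dist_columns, or random when no key columns are set.
    """
    if policy_type == "r":
        return "DISTRIBUTED REPLICATED"
    if policy_type == "p":
        if dist_columns:
            return f"DISTRIBUTED BY ({', '.join(dist_columns)})"
        return "DISTRIBUTED RANDOMLY"
    return ""  # no policy row (e.g. plain PostgreSQL): emit no clause


def build_create_table(
    schema: str,
    table: str,
    columns: List[ColumnInfo],
    reloptions: Optional[Sequence[str]] = None,
    policy_type: Optional[str] = None,
    dist_columns: Sequence[str] = (),
) -> str:
    """Assemble CREATE TABLE DDL from pre-fetched catalog values.

    Identifiers are not quoted in this sketch; real code should quote them
    (e.g. with quote_ident in SQL) to handle mixed case and reserved words.
    """
    col_lines = []
    for col in columns:
        line = f"    {col.name} {col.type_sql}"
        if col.default_sql is not None:
            line += f" DEFAULT {col.default_sql}"
        if col.not_null:
            line += " NOT NULL"
        col_lines.append(line)

    ddl = f"CREATE TABLE {schema}.{table} (\n" + ",\n".join(col_lines) + "\n)"
    if reloptions:  # pg_class.reloptions entries look like "appendonly=true"
        ddl += f"\nWITH ({', '.join(reloptions)})"
    dist = distribution_clause(policy_type, dist_columns)
    if dist:
        ddl += f"\n{dist}"
    return ddl + ";"


# Usage with made-up catalog values for an append-optimized, column-oriented table.
example = build_create_table(
    "sales",
    "orders",
    [
        ColumnInfo("id", "bigint", not_null=True),
        ColumnInfo("created_at", "timestamp without time zone", default_sql="now()"),
    ],
    reloptions=["appendonly=true", "orientation=column"],
    policy_type="p",
    dist_columns=["id"],
)
print(example)
```

With these inputs the printed DDL ends with `WITH (appendonly=true, orientation=column)` followed by `DISTRIBUTED BY (id);`. Keeping the string assembly separate from the query means a single round trip can feed every table in the schema.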
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as `safe to test`, CI will start running. Let us know if you need any help!
Athena, MySQL, and Exasol pass `query=None` through the default `get_table_ddl` in `sqlalchemy_utils.py`. Without this guard, `connection.execute(None)` crashes. The early return preserves the existing behaviour for connectors that don't define a custom DDL query.
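The guard described above can be sketched as follows. This is a minimal illustration, not the PR's code: `get_all_table_ddls` here is a simplified stand-in, `reflect_all_table_ddls` is a hypothetical placeholder for the existing `MetaData.reflect()` path, and `FakeConnection` replaces a real SQLAlchemy connection (where the SQL string would be wrapped in `sqlalchemy.text()`). The query is assumed to return `(table_name, ddl)` rows for a whole schema.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def reflect_all_table_ddls(connection: Any, schema: str) -> Dict[str, str]:
    """Hypothetical stand-in for the existing MetaData.reflect()-based path."""
    return {"<reflected>": f"-- DDL produced by reflection for schema {schema}"}


def get_all_table_ddls(
    connection: Any, schema: str, query: Optional[str] = None
) -> Dict[str, str]:
    # Connectors without a custom DDL query (athena, mysql, exasol) arrive
    # with query=None. Return early through the old path instead of calling
    # connection.execute(None), which would fail.
    if query is None:
        return reflect_all_table_ddls(connection, schema)
    # One round trip per schema; rows are assumed to be (table_name, ddl).
    rows = connection.execute(query, {"schema": schema})
    return {table_name: ddl for table_name, ddl in rows}


class FakeConnection:
    """Minimal connection stand-in that records every executed query."""

    def __init__(self, rows: Iterable[Tuple[str, str]]):
        self.rows = list(rows)
        self.executed: List[Tuple[str, Dict[str, str]]] = []

    def execute(self, query: Optional[str], params: Dict[str, str]):
        if query is None:
            raise TypeError("cannot execute None")
        self.executed.append((query, params))
        return iter(self.rows)


conn = FakeConnection([("orders", "CREATE TABLE sales.orders (...);")])
fallback = get_all_table_ddls(conn, "sales")  # no query: reflection path
direct = get_all_table_ddls(conn, "sales", query="SELECT ...  -- hypothetical")
```

Only the second call reaches `execute`, so connectors without a query never hit `execute(None)`, while Greenplum gets a single query per schema.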
Code Review (Gitar) ✅ Approved: 1 resolved / 1 finding
Refactors Greenplum DDL extraction to use direct SQL instead of MetaData.reflect(), resolving the regression that previously broke DDL generation for other connectors.
✅ 1 resolved
✅ Bug: Removing MetaData.reflect() fallback breaks DDL for other connectors
PR #27445 is up and addresses the `MetaData.reflect()` regression and the `query=None` fallback for other connectors (athena, mysql, exasol). Ready for review when you get a chance, @PubChimps.
Hey @PubChimps, I've addressed the `query=None` regression flagged by the bot (fallback added for athena/mysql/exasol). Could you add the `safe to test` label when you get a chance so CI can run? Thanks!
Fixes #27405
Describe your changes:
`get_table_ddl` in `sqlalchemy_utils.py` was passing `query=None` into `get_table_ddl_wrapper`, which fell through to `MetaData.reflect()`. That fires ~8 catalog queries per table, so ~16,000 per schema with 2,000 tables. With 24+ schemas and the connection never committed (locks are held until the transaction ends), `AccessShareLock`s pile up for hours and block other users whose statements need a conflicting lock, such as `DROP TABLE` or `TRUNCATE`, which take `ACCESS EXCLUSIVE`.

Fixed by wiring up the `query` param that was already in the signature but swallowed by `**kw`. Added `GREENPLUM_TABLE_DDLS` in `queries.py` and passed it through `utils.py`, following the same pattern `get_view_definition` uses.

Used `pg_get_expr(d.adbin, d.adrelid)` instead of `d.adsrc` (removed in PG12). `GREENPLUM_TABLE_DDLS` includes `WITH (reloptions)` and `DISTRIBUTED BY` via `gp_distribution_policy`. Removed the dead `CreateTable` and `MetaData` imports.

Tested against a local PostgreSQL 14. @KrasnovidKE confirmed `pg_get_expr` works on their live Greenplum 6 instance.

Type of change:
Checklist:
Fixes #27405: Fix Greenplum DDL extraction causing thousands of catalog queries per schema
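For scale, the figures quoted in the description (about 8 catalog queries per table, 2,000 tables per schema, 24+ schemas) can be multiplied out. This is back-of-the-envelope arithmetic on those approximations, not a measurement; 24 schemas is treated as a lower bound.

```python
# Approximate figures taken from the PR description.
QUERIES_PER_TABLE_WITH_REFLECT = 8  # "~8 catalog queries per table"
TABLES_PER_SCHEMA = 2_000
SCHEMAS = 24  # "24+ schemas", so totals below are lower bounds


def reflect_queries(schemas: int, tables_per_schema: int, per_table: int) -> int:
    """MetaData.reflect(): catalog queries grow with the number of tables."""
    return schemas * tables_per_schema * per_table


def single_query_queries(schemas: int) -> int:
    """One pg_catalog query per schema, independent of table count."""
    return schemas


per_schema = reflect_queries(1, TABLES_PER_SCHEMA, QUERIES_PER_TABLE_WITH_REFLECT)
total_reflect = reflect_queries(SCHEMAS, TABLES_PER_SCHEMA, QUERIES_PER_TABLE_WITH_REFLECT)
total_single = single_query_queries(SCHEMAS)
print(per_schema, total_reflect, total_single)  # 16000 384000 24
```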