fix(starrocks): ingest table comments from StarRocks (#26692) #27030
fix(starrocks): ingest table comments from StarRocks (#26692) #27030SaaiAravindhRaja wants to merge 4 commits intoopen-metadata:mainfrom
Conversation
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
There was a problem hiding this comment.
Pull request overview
Updates the StarRocks ingestion logic to reliably ingest table-level COMMENT values by bypassing SQLAlchemy MySQL dialect reflection and querying information_schema.tables directly (fixing #26692).
Changes:
- Add a StarRocks-specific query (
STARROCKS_TABLE_COMMENTS) usage to retrieveTABLE_COMMENT. - Replace
inspector.get_table_comment(...)-based table description extraction with a directinformation_schemaquery.
| @staticmethod | ||
| def get_table_description( | ||
| schema_name: str, table_name: str, inspector: Inspector | ||
| self, schema_name: str, table_name: str, inspector: Inspector |
There was a problem hiding this comment.
inspector parameter is now unused, which will trigger unused-argument linting and makes the signature misleading. Consider renaming it to _inspector or explicitly disabling the unused-argument warning for this method while keeping the parameter to satisfy the base interface.
| self, schema_name: str, table_name: str, inspector: Inspector | |
| self, schema_name: str, table_name: str, _inspector: Inspector |
| with self.engine.connect() as conn: | ||
| rows = conn.execute( | ||
| sql.text(STARROCKS_TABLE_COMMENTS), | ||
| {"table_name": table_name, "schema": schema_name}, | ||
| ) | ||
| for row in rows: | ||
| description = row[0] | ||
| break |
There was a problem hiding this comment.
This opens a new SQLAlchemy connection for every table description lookup (self.engine.connect()), which can add significant overhead during large ingestions and bypasses the per-thread cached connection provided by CommonDbSourceService.connection (common_db_source.py:754-764). Prefer executing the query via self.connection (or inspector.bind) and fetching a single scalar row (e.g., fetchone/scalar) to avoid iterating the full result set.
| with self.engine.connect() as conn: | |
| rows = conn.execute( | |
| sql.text(STARROCKS_TABLE_COMMENTS), | |
| {"table_name": table_name, "schema": schema_name}, | |
| ) | |
| for row in rows: | |
| description = row[0] | |
| break | |
| row = self.connection.execute( | |
| sql.text(STARROCKS_TABLE_COMMENTS), | |
| {"table_name": table_name, "schema": schema_name}, | |
| ).fetchone() | |
| if row: | |
| description = row[0] |
| def get_table_description( | ||
| schema_name: str, table_name: str, inspector: Inspector | ||
| self, schema_name: str, table_name: str, inspector: Inspector | ||
| ) -> Optional[str]: | ||
| description = None | ||
| try: | ||
| table_info: dict = inspector.get_table_comment(table_name, schema_name) | ||
| with self.engine.connect() as conn: | ||
| rows = conn.execute( | ||
| sql.text(STARROCKS_TABLE_COMMENTS), | ||
| {"table_name": table_name, "schema": schema_name}, | ||
| ) | ||
| for row in rows: | ||
| description = row[0] | ||
| break | ||
| except Exception as exc: # pylint: disable=broad-except | ||
| logger.debug(traceback.format_exc()) | ||
| logger.warning( | ||
| f"Table description error for table [{schema_name}.{table_name}]: {exc}" | ||
| ) | ||
| else: | ||
| description = table_info.get("text") | ||
|
|
||
| if description is None: | ||
| return None | ||
| if isinstance(description, (list, tuple)) and len(description) > 0: | ||
| return description[0] | ||
| return description | ||
| return description or None |
There was a problem hiding this comment.
New logic for fetching StarRocks table comments is untested. There are existing unit tests for the StarRocks connector (e.g., ingestion/tests/unit/topology/database/test_starrocks.py), but none validating that get_table_description returns the TABLE_COMMENT from information_schema.tables. Adding a unit test that mocks the connection execute result would help prevent regressions for issue #26692.
|
@SaaiAravindhRaja can you please check copilot comments |
b6ab03d to
8e78525
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
@ulixius9 Worked on the copilot comments, thanks for the heads up 😄 |
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
Code Review ✅ Approved 1 resolved / 1 findingsAdds table comment ingestion support for StarRocks with a trailing newline correction. No issues found. ✅ 1 resolved✅ Bug: PR contains no functional change — only a trailing newline added
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
Hello @SaaiAravindhRaja, thanks for your pr here, but we have assigned this issue to another pr in progress. Please make sure that you are assigned to an issue but opening a PR to fix it, we will reopen this if needed |
Fixes #26692
Fixes table-level
COMMENTnot being ingested from StarRocks due to SQLAlchemy MySQL dialect reflection not parsing comments from multi-lineCREATE TABLEstatements.