fix(starrocks): query INFORMATION_SCHEMA for table comments#27035
fix(starrocks): query INFORMATION_SCHEMA for table comments#27035yash-agarwa-l wants to merge 12 commits intoopen-metadata:mainfrom
Conversation
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
There was a problem hiding this comment.
Pull request overview
Fixes StarRocks table description ingestion by fetching table comments directly from INFORMATION_SCHEMA.TABLES, avoiding SQLAlchemy/MySQL DDL reflection issues where SHOW CREATE TABLE drops multi-line COMMENT output.
Changes:
- Updated StarRocks
get_table_description()to queryINFORMATION_SCHEMA.TABLESforTABLE_COMMENT. - Added StarRocks unit tests covering comment present, empty comment, missing table row, and execution exception cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ingestion/src/metadata/ingestion/source/database/starrocks/metadata.py | Switches table description retrieval to an INFORMATION_SCHEMA query using STARROCKS_TABLE_COMMENTS. |
| ingestion/tests/unit/topology/database/test_starrocks.py | Adds unit tests validating the new get_table_description() behavior across key scenarios. |
| def get_table_description( | ||
| schema_name: str, table_name: str, inspector: Inspector | ||
| self, schema_name: str, table_name: str, inspector: Inspector | ||
| ) -> Optional[str]: | ||
| description = None | ||
| # Fixes #26692: StarRocks formats SHOW CREATE TABLE with COMMENT on a | ||
| # separate line, which SQLAlchemy's MySQL dialect parser cannot parse. | ||
| # Query INFORMATION_SCHEMA.TABLES directly to get the table comment. |
There was a problem hiding this comment.
get_table_description now overrides CommonDbSourceService.get_table_description (a @staticmethod) with an instance method and also no longer uses the inspector parameter. This will likely trigger pylint arguments-differ and unused-argument on this module. Consider either keeping it as a @staticmethod and using inspector, or add an explicit # pylint: disable=arguments-differ,unused-argument on the method (as done in other connectors, e.g. athena/metadata.py).
| try: | ||
| table_info: dict = inspector.get_table_comment(table_name, schema_name) | ||
| result = self.connection.execute( | ||
| sql.text(STARROCKS_TABLE_COMMENTS), | ||
| {"schema": schema_name, "table_name": table_name}, | ||
| ) | ||
| row = result.first() | ||
| if row: | ||
| return row[0] or None |
There was a problem hiding this comment.
This implementation duplicates the StarRocks table-comment query logic that already exists in metadata.ingestion.source.database.starrocks.utils.get_table_comment (which is @reflection.cache-decorated). To avoid divergence and keep comment-fetching logic in one place, consider reusing the existing helper (or extracting a single shared function) instead of re-implementing the query here.
There was a problem hiding this comment.
@yash-agarwa-l please check this. We may need to make the changes in metadata.ingestion.source.database.starrocks.utils.get_table_comment
…adata#26692) StarRocks DDL places COMMENT on a separate line, causing SQLAlchemy's MySQL parser to silently drop table descriptions. Override get_table_description() to fetch TABLE_COMMENT directly from INFORMATION_SCHEMA.TABLES instead.
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
|
Hi team, this PR is ready for review. Could a maintainer please add the safe to test label so the CI workflows can execute? Thank you! |
| try: | ||
| table_info: dict = inspector.get_table_comment(table_name, schema_name) | ||
| result = self.connection.execute( | ||
| sql.text(STARROCKS_TABLE_COMMENTS), | ||
| {"schema": schema_name, "table_name": table_name}, | ||
| ) | ||
| row = result.first() | ||
| if row: | ||
| return row[0] or None |
There was a problem hiding this comment.
@yash-agarwa-l please check this. We may need to make the changes in metadata.ingestion.source.database.starrocks.utils.get_table_comment
| result = self.connection.execute( | ||
| sql.text(STARROCKS_TABLE_COMMENTS), | ||
| {"schema": schema_name, "table_name": table_name}, | ||
| ) | ||
| row = result.first() | ||
| if row: | ||
| return row[0] or None |
There was a problem hiding this comment.
This re-implements the StarRocks table-comment query even though metadata.ingestion.source.database.starrocks.utils.get_table_comment already exists and uses the same STARROCKS_TABLE_COMMENTS query. To avoid the two implementations diverging, consider fixing/reusing the helper (e.g., have it return the actual comment string in {"text": ...}) and call it from here, rather than duplicating the SQL + result parsing in multiple places.
🟡 Playwright Results — all passed (10 flaky)✅ 3220 passed · ❌ 0 failed · 🟡 10 flaky · ⏭️ 78 skipped
🟡 10 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
Addresses review feedback. utils.get_table_comment was returning the raw SQLAlchemy Row tuple as the comment value instead of the string. Fixed it to extract row[0], and simplified get_table_description in metadata.py to delegate cleanly to the util — single source of truth for comment extraction. Tests updated accordingly.
|
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
|
Hi @TeddyCr , Is there a possibility to rerun only the failed checks from your side? Or is a dummy commit preferred to retrigger the ci from my side? |
|
Hi @TeddyCr, Could you please look into this? |
|
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
| @@ -42,7 +42,7 @@ def get_table_comment(_, connection, table_name, schema=None, **kw): | |||
| {"table_name": table_name, "schema": schema}, | |||
| **kw, | |||
There was a problem hiding this comment.
get_table_comment forwards **kw to connection.execute(...). If this helper is ever called through SQLAlchemy reflection (it’s @reflection.cache-decorated), kw commonly contains reflection-only keys like info_cache, which Connection.execute does not accept and would raise TypeError. Consider not forwarding **kw to execute (or explicitly popping/ignoring non-execution kwargs like info_cache).
| **kw, |
| # pylint: disable=arguments-differ,unused-argument | ||
| def get_table_description(self, schema_name: str, table_name: str, inspector: Inspector) -> Optional[str]: | ||
| description = None | ||
| try: | ||
| table_info: dict = inspector.get_table_comment(table_name, schema_name) | ||
| table_info = get_table_comment(None, self.connection, table_name, schema=schema_name) | ||
| except Exception as exc: # pylint: disable=broad-except | ||
| logger.debug(traceback.format_exc()) | ||
| logger.warning(f"Table description error for table [{schema_name}.{table_name}]: {exc}") | ||
| else: | ||
| description = table_info.get("text") | ||
|
|
||
| if description is None: | ||
| return None | ||
| if isinstance(description, (list, tuple)) and len(description) > 0: | ||
| return description[0] | ||
| return description |
There was a problem hiding this comment.
⚠️ Bug: Override changes @staticmethod to instance method, breaking contract
The parent class CommonDbSourceService.get_table_description is declared as @staticmethod (and @abstractmethod in the base SqlAlchemySource). The override in StarRocksSource changes it to an instance method (self as first parameter). While Python allows this at runtime (calling self.get_table_description(...) will work because self is simply consumed as the first positional arg), it violates the method contract, triggers the arguments-differ pylint warning (hence the suppression comment), and loses the @calculate_execution_time() decorator that the parent applies.
This also means any code that calls the method as a static method (e.g., StarRocksSource.get_table_description(schema, table, inspector)) would fail because self is now expected.
A cleaner approach: keep it as a @staticmethod and pass connection explicitly, or access self.connection but properly override without suppressing arguments-differ.
Suggested fix:
# Keep the @staticmethod contract and pass connection:
@staticmethod
@calculate_execution_time()
def get_table_description(
schema_name: str, table_name: str, inspector: Inspector
) -> Optional[str]:
description = None
try:
table_info = inspector.get_table_comment(table_name, schema_name)
except Exception as exc:
...
else:
description = table_info.get("text")
return description
# And patch inspector.get_table_comment via the dialect instead.
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
| assert RELKIND_MAP["ICEBERG"] == TableType.Iceberg | ||
|
|
||
|
|
||
| class TestStarRocksGetTableDescription(TestCase): |
There was a problem hiding this comment.
💡 Quality: Tests use unittest.TestCase; project prefers plain pytest
Per the project's testing guidelines, new tests should use pytest with plain assert statements and fixtures for setup, rather than inheriting from unittest.TestCase. The new TestStarRocksGetTableDescription and TestGetTableComment classes both extend TestCase. While functional, converting them to plain pytest classes (using @pytest.fixture instead of setUp) would be more consistent with the project's conventions.
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
| # pylint: disable=arguments-differ,unused-argument | ||
| def get_table_description(self, schema_name: str, table_name: str, inspector: Inspector) -> Optional[str]: |
There was a problem hiding this comment.
StarRocksSource.get_table_description now returns Optional[str], but the base CommonDbSourceService.get_table_description is annotated to return str. This override widens the return type and can trigger basedpyright/pyright incompatible-override errors. Consider keeping the return annotation as str to match the base signature, or updating the base method annotation to Optional[str] (and any other overrides) if None is a valid return everywhere.
Code Review
|
| Compact |
|
Was this helpful? React with 👍 / 👎 | Gitar
| connection = MagicMock() | ||
| result = MagicMock() | ||
| result.mappings.return_value = iter(rows) | ||
| connection.execute.return_value = result | ||
| return connection |
There was a problem hiding this comment.
get_table_comment is @sqlalchemy.engine.reflection.cache-decorated and the decorator expects connection.info to be a real dict (it uses connection.info.setdefault(...) and membership checks). In _make_connection the connection is a bare MagicMock, so connection.info becomes another MagicMock and the cache wrapper can incorrectly treat it as a populated cache and return a MagicMock instead of executing the function. Set connection.info = {} (or pass info_cache={} when calling get_table_comment) to make these unit tests exercise the real row-parsing logic reliably.
|


Describe your changes:
Fixes #26692
I worked on fixing missing table descriptions for the StarRocks connector because
StarRocks places
COMMENT "..."on a separate line inSHOW CREATE TABLEoutput,which SQLAlchemy's MySQL parser silently drops.
Changes:
get_table_commenthelper instarrocks/utils.pythat queriesINFORMATION_SCHEMA.TABLESdirectlyget_table_description()instarrocks/metadata.pyto delegate tothe helper, bypassing broken DDL reflection
Type of change:
Checklist:
Fixes #26692: Fetch StarRocks table comments from INFORMATION_SCHEMASummary by Gitar
ingestion/.basedpyright/baseline.jsonfollowing a merge withmainto align static analysis baselines.This will update automatically on new commits.