Skip to content

fix(starrocks): query INFORMATION_SCHEMA for table comments#27035

Closed
yash-agarwa-l wants to merge 12 commits intoopen-metadata:mainfrom
yash-agarwa-l:fix
Closed

fix(starrocks): query INFORMATION_SCHEMA for table comments#27035
yash-agarwa-l wants to merge 12 commits intoopen-metadata:mainfrom
yash-agarwa-l:fix

Conversation

@yash-agarwa-l
Copy link
Copy Markdown

@yash-agarwa-l yash-agarwa-l commented Apr 4, 2026

Describe your changes:

Fixes #26692

I worked on fixing missing table descriptions for the StarRocks connector because
StarRocks places COMMENT "..." on a separate line in SHOW CREATE TABLE output,
which SQLAlchemy's MySQL parser silently drops.

Changes:

  • Added get_table_comment helper in starrocks/utils.py that queries
    INFORMATION_SCHEMA.TABLES directly
  • Overrode get_table_description() in starrocks/metadata.py to delegate to
    the helper, bypassing broken DDL reflection
  • Added 3 unit tests covering success, empty comment, and exception paths

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes #26692: Fetch StarRocks table comments from INFORMATION_SCHEMA
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: N/A — Python-only change.
  • I have added a test that covers the exact scenario we are fixing.

Summary by Gitar

  • Maintenance:
    • Regenerated ingestion/.basedpyright/baseline.json following a merge with main to align static analysis baselines.

This will update automatically on new commits.

@yash-agarwa-l yash-agarwa-l requested a review from a team as a code owner April 4, 2026 08:50
Copilot AI review requested due to automatic review settings April 4, 2026 08:50
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes StarRocks table description ingestion by fetching table comments directly from INFORMATION_SCHEMA.TABLES, avoiding SQLAlchemy/MySQL DDL reflection issues where SHOW CREATE TABLE drops multi-line COMMENT output.

Changes:

  • Updated StarRocks get_table_description() to query INFORMATION_SCHEMA.TABLES for TABLE_COMMENT.
  • Added StarRocks unit tests covering comment present, empty comment, missing table row, and execution exception cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
ingestion/src/metadata/ingestion/source/database/starrocks/metadata.py Switches table description retrieval to an INFORMATION_SCHEMA query using STARROCKS_TABLE_COMMENTS.
ingestion/tests/unit/topology/database/test_starrocks.py Adds unit tests validating the new get_table_description() behavior across key scenarios.

Comment on lines +273 to +278
def get_table_description(
schema_name: str, table_name: str, inspector: Inspector
self, schema_name: str, table_name: str, inspector: Inspector
) -> Optional[str]:
description = None
# Fixes #26692: StarRocks formats SHOW CREATE TABLE with COMMENT on a
# separate line, which SQLAlchemy's MySQL dialect parser cannot parse.
# Query INFORMATION_SCHEMA.TABLES directly to get the table comment.
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_table_description now overrides CommonDbSourceService.get_table_description (a @staticmethod) with an instance method and also no longer uses the inspector parameter. This will likely trigger pylint arguments-differ and unused-argument on this module. Consider either keeping it as a @staticmethod and using inspector, or add an explicit # pylint: disable=arguments-differ,unused-argument on the method (as done in other connectors, e.g. athena/metadata.py).

Copilot uses AI. Check for mistakes.
Comment on lines +279 to +286
try:
table_info: dict = inspector.get_table_comment(table_name, schema_name)
result = self.connection.execute(
sql.text(STARROCKS_TABLE_COMMENTS),
{"schema": schema_name, "table_name": table_name},
)
row = result.first()
if row:
return row[0] or None
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation duplicates the StarRocks table-comment query logic that already exists in metadata.ingestion.source.database.starrocks.utils.get_table_comment (which is @reflection.cache-decorated). To avoid divergence and keep comment-fetching logic in one place, consider reusing the existing helper (or extracting a single shared function) instead of re-implementing the query here.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yash-agarwa-l please check this. We may need to make the changes in metadata.ingestion.source.database.starrocks.utils.get_table_comment

…adata#26692)

StarRocks DDL places COMMENT on a separate line, causing SQLAlchemy's
MySQL parser to silently drop table descriptions. Override
get_table_description() to fetch TABLE_COMMENT directly from
INFORMATION_SCHEMA.TABLES instead.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@yash-agarwa-l yash-agarwa-l changed the title fix(starrocks): query INFORMATION_SCHEMA for table comments (#26692) fix(starrocks): query INFORMATION_SCHEMA for table comments Apr 4, 2026
@yash-agarwa-l
Copy link
Copy Markdown
Author

Hi team, this PR is ready for review. Could a maintainer please add the safe to test label so the CI workflows can execute? Thank you!

Comment on lines +279 to +286
try:
table_info: dict = inspector.get_table_comment(table_name, schema_name)
result = self.connection.execute(
sql.text(STARROCKS_TABLE_COMMENTS),
{"schema": schema_name, "table_name": table_name},
)
row = result.first()
if row:
return row[0] or None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yash-agarwa-l please check this. We may need to make the changes in metadata.ingestion.source.database.starrocks.utils.get_table_comment

@TeddyCr TeddyCr added the safe to test Add this label to run secure Github workflows on PRs label Apr 7, 2026
Copilot AI review requested due to automatic review settings April 7, 2026 18:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +288 to +294
result = self.connection.execute(
sql.text(STARROCKS_TABLE_COMMENTS),
{"schema": schema_name, "table_name": table_name},
)
row = result.first()
if row:
return row[0] or None
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This re-implements the StarRocks table-comment query even though metadata.ingestion.source.database.starrocks.utils.get_table_comment already exists and uses the same STARROCKS_TABLE_COMMENTS query. To avoid the two implementations diverging, consider fixing/reusing the helper (e.g., have it return the actual comment string in {"text": ...}) and call it from here, rather than duplicating the SQL + result parsing in multiple places.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 7, 2026

🟡 Playwright Results — all passed (10 flaky)

✅ 3220 passed · ❌ 0 failed · 🟡 10 flaky · ⏭️ 78 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 3 747 0 1 7
🟡 Shard 4 757 0 2 18
🟡 Shard 5 686 0 1 41
🟡 Shard 6 732 0 5 8
🟡 10 flaky test(s) (passed on retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Set & Update all CP types on apiCollection (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Database Schema (shard 4, 1 retry)
  • Pages/EntityDataSteward.spec.ts › User as Owner Add, Update and Remove (shard 5, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for mlModel -> table (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › Inactive Announcement create & delete (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Addresses review feedback. utils.get_table_comment was returning the
raw SQLAlchemy Row tuple as the comment value instead of the string.
Fixed it to extract row[0], and simplified get_table_description in
metadata.py to delegate cleanly to the util — single source of truth
for comment extraction. Tests updated accordingly.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

TeddyCr
TeddyCr previously approved these changes Apr 14, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@yash-agarwa-l
Copy link
Copy Markdown
Author

yash-agarwa-l commented Apr 16, 2026

Hi @TeddyCr , Is there a possibility to rerun only the failed checks from your side? Or is a dummy commit preferred to retrigger the ci from my side?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@yash-agarwa-l
Copy link
Copy Markdown
Author

Hi @TeddyCr, Could you please look into this?

@github-actions
Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

@@ -42,7 +42,7 @@ def get_table_comment(_, connection, table_name, schema=None, **kw):
{"table_name": table_name, "schema": schema},
**kw,
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_table_comment forwards **kw to connection.execute(...). If this helper is ever called through SQLAlchemy reflection (it’s @reflection.cache-decorated), kw commonly contains reflection-only keys like info_cache, which Connection.execute does not accept and would raise TypeError. Consider not forwarding **kw to execute (or explicitly popping/ignoring non-execution kwargs like info_cache).

Suggested change
**kw,

Copilot uses AI. Check for mistakes.
Comment on lines +262 to 272
# pylint: disable=arguments-differ,unused-argument
def get_table_description(self, schema_name: str, table_name: str, inspector: Inspector) -> Optional[str]:
description = None
try:
table_info: dict = inspector.get_table_comment(table_name, schema_name)
table_info = get_table_comment(None, self.connection, table_name, schema=schema_name)
except Exception as exc: # pylint: disable=broad-except
logger.debug(traceback.format_exc())
logger.warning(f"Table description error for table [{schema_name}.{table_name}]: {exc}")
else:
description = table_info.get("text")

if description is None:
return None
if isinstance(description, (list, tuple)) and len(description) > 0:
return description[0]
return description
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: Override changes @staticmethod to instance method, breaking contract

The parent class CommonDbSourceService.get_table_description is declared as @staticmethod (and @abstractmethod in the base SqlAlchemySource). The override in StarRocksSource changes it to an instance method (self as first parameter). While Python allows this at runtime (calling self.get_table_description(...) will work because self is simply consumed as the first positional arg), it violates the method contract, triggers the arguments-differ pylint warning (hence the suppression comment), and loses the @calculate_execution_time() decorator that the parent applies.

This also means any code that calls the method as a static method (e.g., StarRocksSource.get_table_description(schema, table, inspector)) would fail because self is now expected.

A cleaner approach: keep it as a @staticmethod and pass connection explicitly, or access self.connection but properly override without suppressing arguments-differ.

Suggested fix:

# Keep the @staticmethod contract and pass connection:
@staticmethod
@calculate_execution_time()
def get_table_description(
    schema_name: str, table_name: str, inspector: Inspector
) -> Optional[str]:
    description = None
    try:
        table_info = inspector.get_table_comment(table_name, schema_name)
    except Exception as exc:
        ...
    else:
        description = table_info.get("text")
    return description
# And patch inspector.get_table_comment via the dialect instead.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

assert RELKIND_MAP["ICEBERG"] == TableType.Iceberg


class TestStarRocksGetTableDescription(TestCase):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: Tests use unittest.TestCase; project prefers plain pytest

Per the project's testing guidelines, new tests should use pytest with plain assert statements and fixtures for setup, rather than inheriting from unittest.TestCase. The new TestStarRocksGetTableDescription and TestGetTableComment classes both extend TestCase. While functional, converting them to plain pytest classes (using @pytest.fixture instead of setUp) would be more consistent with the project's conventions.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

Comment on lines +262 to +263
# pylint: disable=arguments-differ,unused-argument
def get_table_description(self, schema_name: str, table_name: str, inspector: Inspector) -> Optional[str]:
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StarRocksSource.get_table_description now returns Optional[str], but the base CommonDbSourceService.get_table_description is annotated to return str. This override widens the return type and can trigger basedpyright/pyright incompatible-override errors. Consider keeping the return annotation as str to match the base signature, or updating the base method annotation to Optional[str] (and any other overrides) if None is a valid return everywhere.

Copilot uses AI. Check for mistakes.
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 27, 2026

Code Review ⚠️ Changes requested 0 resolved / 2 findings

Querying INFORMATION_SCHEMA for table comments introduces a breaking contract change by converting a static method to an instance method. The implementation also misses an execution time decorator and uses outdated unittest patterns instead of required pytest conventions.

⚠️ Bug: Override changes @staticmethod to instance method, breaking contract

📄 ingestion/src/metadata/ingestion/source/database/starrocks/metadata.py:262-272 📄 ingestion/src/metadata/ingestion/source/database/starrocks/metadata.py:263

The parent class CommonDbSourceService.get_table_description is declared as @staticmethod (and @abstractmethod in the base SqlAlchemySource). The override in StarRocksSource changes it to an instance method (self as first parameter). While Python allows this at runtime (calling self.get_table_description(...) will work because self is simply consumed as the first positional arg), it violates the method contract, triggers the arguments-differ pylint warning (hence the suppression comment), and loses the @calculate_execution_time() decorator that the parent applies.

This also means any code that calls the method as a static method (e.g., StarRocksSource.get_table_description(schema, table, inspector)) would fail because self is now expected.

A cleaner approach: keep it as a @staticmethod and pass connection explicitly, or access self.connection but properly override without suppressing arguments-differ.

Suggested fix
# Keep the @staticmethod contract and pass connection:
@staticmethod
@calculate_execution_time()
def get_table_description(
    schema_name: str, table_name: str, inspector: Inspector
) -> Optional[str]:
    description = None
    try:
        table_info = inspector.get_table_comment(table_name, schema_name)
    except Exception as exc:
        ...
    else:
        description = table_info.get("text")
    return description
# And patch inspector.get_table_comment via the dialect instead.
💡 Quality: Tests use unittest.TestCase; project prefers plain pytest

📄 ingestion/tests/unit/topology/database/test_starrocks.py:177 📄 ingestion/tests/unit/topology/database/test_starrocks.py:226

Per the project's testing guidelines, new tests should use pytest with plain assert statements and fixtures for setup, rather than inheriting from unittest.TestCase. The new TestStarRocksGetTableDescription and TestGetTableComment classes both extend TestCase. While functional, converting them to plain pytest classes (using @pytest.fixture instead of setUp) would be more consistent with the project's conventions.

🤖 Prompt for agents
Code Review: Querying INFORMATION_SCHEMA for table comments introduces a breaking contract change by converting a static method to an instance method. The implementation also misses an execution time decorator and uses outdated unittest patterns instead of required pytest conventions.

1. ⚠️ Bug: Override changes @staticmethod to instance method, breaking contract
   Files: ingestion/src/metadata/ingestion/source/database/starrocks/metadata.py:262-272, ingestion/src/metadata/ingestion/source/database/starrocks/metadata.py:263

   The parent class `CommonDbSourceService.get_table_description` is declared as `@staticmethod` (and `@abstractmethod` in the base `SqlAlchemySource`). The override in `StarRocksSource` changes it to an instance method (`self` as first parameter). While Python allows this at runtime (calling `self.get_table_description(...)` will work because `self` is simply consumed as the first positional arg), it violates the method contract, triggers the `arguments-differ` pylint warning (hence the suppression comment), and loses the `@calculate_execution_time()` decorator that the parent applies.
   
   This also means any code that calls the method as a static method (e.g., `StarRocksSource.get_table_description(schema, table, inspector)`) would fail because `self` is now expected.
   
   A cleaner approach: keep it as a `@staticmethod` and pass `connection` explicitly, or access `self.connection` but properly override without suppressing `arguments-differ`.

   Suggested fix:
   # Keep the @staticmethod contract and pass connection:
   @staticmethod
   @calculate_execution_time()
   def get_table_description(
       schema_name: str, table_name: str, inspector: Inspector
   ) -> Optional[str]:
       description = None
       try:
           table_info = inspector.get_table_comment(table_name, schema_name)
       except Exception as exc:
           ...
       else:
           description = table_info.get("text")
       return description
   # And patch inspector.get_table_comment via the dialect instead.

2. 💡 Quality: Tests use unittest.TestCase; project prefers plain pytest
   Files: ingestion/tests/unit/topology/database/test_starrocks.py:177, ingestion/tests/unit/topology/database/test_starrocks.py:226

   Per the project's testing guidelines, new tests should use `pytest` with plain `assert` statements and fixtures for setup, rather than inheriting from `unittest.TestCase`. The new `TestStarRocksGetTableDescription` and `TestGetTableComment` classes both extend `TestCase`. While functional, converting them to plain pytest classes (using `@pytest.fixture` instead of `setUp`) would be more consistent with the project's conventions.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment on lines +230 to +234
connection = MagicMock()
result = MagicMock()
result.mappings.return_value = iter(rows)
connection.execute.return_value = result
return connection
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_table_comment is @sqlalchemy.engine.reflection.cache-decorated and the decorator expects connection.info to be a real dict (it uses connection.info.setdefault(...) and membership checks). In _make_connection the connection is a bare MagicMock, so connection.info becomes another MagicMock and the cache wrapper can incorrectly treat it as a populated cache and return a MagicMock instead of executing the function. Set connection.info = {} (or pass info_cache={} when calling get_table_comment) to make these unit tests exercise the real row-parsing logic reliably.

Copilot uses AI. Check for mistakes.
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed for 'open-metadata-ingestion'

Failed conditions
0.0% Coverage on New Code (required ≥ 20%)

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[StarRocks] Table comments not ingested / not shown

3 participants