Feat: Add starrocks catalog support by chris-celerdata · Pull Request #8856 · marimo-team/marimo

chris-celerdata · 2026-03-24T18:18:00Z

📝 Summary

Implements #8765.

🔍 Description of Changes

This PR introduces a new engine which extends SQLAlchemyEngine. When possible (e.g. working with default_catalog), the inherited methods are used; otherwise special "external" methods are used.

Primary overridden functions:

get_databases
get_tables_in_schema
get_table_details

Before

After

Notes

Integration testing against a real StarRocks instance is not present. There is no "in-memory" equivalent like what DuckDB and SQLite have.

Arrays of variable types default greedily to their inner types. There is currently no support for showing complex types with different icons from a standard string (i.e. ARRAY<INT> as not simply an integer). I'm not sure if this is relevant, but I thought I would mention it anyway.

📋 Checklist

I have read the contributor guidelines.
For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
Tests have been added for the changes made.
Documentation has been updated where applicable, including docstrings for API changes.
Pull request title is a good summary of the changes - it will be used in the release notes.

vercel · 2026-03-24T18:18:08Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
marimo-docs	Ready	Preview, Comment	Apr 1, 2026 7:42pm

github-actions · 2026-03-24T18:18:13Z

All contributors have signed the CLA ✍️ ✅
_{Posted by the CLA Assistant Lite bot.}

chris-celerdata · 2026-03-24T18:19:38Z

I have read the CLA Document and I hereby sign the CLA

Copilot

Pull request overview

Adds first-class StarRocks support to marimo’s SQL datasource discovery by introducing a StarRocksEngine that can enumerate catalogs and introspect external-catalog objects via SHOW .../DESC SQL when SQLAlchemy’s inspector can’t.

Changes:

Implement StarRocksEngine(SQLAlchemyEngine) with multi-catalog discovery and external-catalog fallbacks.
Extend SQL identifier quoting to support the starrocks dialect (backtick style) and add related tests.
Register the new engine in engine detection and add StarRocks optional test dependency / dependency manager entry.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`marimo/_sql/engines/starrocks.py`	New engine implementing StarRocks catalog/database/table discovery with SQL fallbacks for external catalogs.
`marimo/_sql/get_engines.py`	Adds `StarRocksEngine` to `SUPPORTED_ENGINES` so it’s selected before generic `SQLAlchemyEngine`.
`marimo/_sql/sql_quoting.py`	Adds `starrocks` to backtick-quoting dialects.
`marimo/_sql/sql.py`	Updates unsupported-engine error message to mention StarRocks.
`marimo/_dependencies/dependencies.py`	Adds `DependencyManager.starrocks`.
`pyproject.toml`	Adds `starrocks` to `test-optional` and bans module-level `starrocks` imports.
`tests/_sql/test_starrocks.py`	New unit tests for StarRocks engine behavior using mocks.
`tests/_sql/test_sql_quoting.py`	Adds StarRocks quoting/roundtrip cases and qualified-name quoting case.
`tests/_sql/test_get_engines.py`	Adds StarRocks engine selection test and datasource connection mapping assertion.

codecov · 2026-03-24T20:13:29Z

Bundle Report

Changes will increase total bundle size by 559 bytes (0.0%) ⬆️. This is within the configured threshold ✅

Detailed changes

Bundle name	Size	Change
marimo-esm	24.81MB	559 bytes (0.0%) ⬆️

Affected Assets, Files, and Routes:

view changes for bundle: marimo-esm

Assets Changed:

Asset Name	Size Change	Total Size	Change (%)
`assets/ai-*.js`	555 bytes	248.9kB	0.22%
`assets/markdown-*.js`	4 bytes	5.12kB	0.08%

Light2Dark · 2026-03-25T03:54:18Z

hi @chris-celerdata, this sounds similar to an issue with other multi-db engines like Snowflake. There is a start of the fix here #8824

Would you mind having a look and seeing if that direction would fix this problem universally, rather than custom support for Starrocks? If not, we can also have a look (wish there was Starrocks cloud haha)

chris-celerdata · 2026-03-25T23:13:30Z

There is a StarRocks cloud. It can also be deployed on Docker. If you want to work with external catalogs, the documentation can be found here. However, the default_catalog being separated in the sidebar should be sufficient for testing.

I took a look at the other PR and it does not fix the problem; SQLAlchemyEngine is essentially untouched and the primary problem is that get_schemas and get_databases expect a singular database. The current implementation of get_databases in sqlalchemy.pywill only return one or zero databases. There is not an easy fix because SQLAlchemy does not have a standardized method for multi-database introspection (i.e. get_database_names or get_catalog_names). Any ideas for a more general approach would be appreciated!

Note: I considered doing something similar to get_default_database. In this function, there is a block that defines what SQL to call.

dialect_queries = {
    "postgresql": "SELECT current_database()",
    "mssql": "SELECT DB_NAME()",
    "timeplus": "SELECT current_database()",
}

This approach could be duplicated into get_databases for the relevant dialects. Would this be a better approach than a new engine?

Light2Dark · 2026-03-26T04:58:04Z

I think that's the eventual direction. That PR decouples get_schema, so eventually we can call get_databases and get all databases connected.

I'm okay to have this too, but there'll be some code drift once we merge that. In the meantime, would you want to create the frontend UI changes for adding a starrocks connection?

chris-celerdata · 2026-03-26T16:03:14Z

I'd love to add StarRocks to the UI. I was referencing #8169, which talked about keeping the UI uncluttered. Has that changed?
Also, to be clear, you are talking about adding StarRocks to this dialog?

Light2Dark · 2026-03-27T01:54:59Z

@chris-celerdata , yeah. But I'm checking with the team on if we're okay to add to the UI.

The PR referenced #8824 has been merged, does this change this PR? Sorry for the back and forth on this

chris-celerdata · 2026-03-30T17:46:50Z

I have pushed a commit with changes based on the merge. Any comments would be appreciated!

Light2Dark · 2026-03-31T01:15:25Z

Thank you! This is likely to get merged in soon, which is closer to global solution we talked about.
#8920

Light2Dark · 2026-04-01T17:01:49Z

Hey @chris-celerdata! We're good to add StarRocks in the UI if you're still down.

mscolnick

sorry approved too early. i think it makes sense to pull in #8920 to simplify some logic.

we can add the UI changes in another PR / followup

for more information, see https://pre-commit.ci

chris-celerdata · 2026-04-01T19:36:44Z

I have merged the main branch into this branch and made the changes to sqlalchemy.py in order for StarRocks to work. #8920 does simplify the logic. It should be ready for review!

mscolnick

looks great!

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

marimo/_sql/engines/sqlalchemy.py:170

_get_inspector executes SET CATALOG ... on a pooled connection but never restores the previous catalog. That session-level state can leak to later operations that reuse the same DBAPI connection (e.g., queries/introspection happening after discovery may run against the last catalog visited). Consider capturing the current catalog at entry and restoring it in a finally block, or otherwise resetting connection state before returning it to the pool.

        from sqlalchemy import inspect, text

        _use_database_dialect_command: dict[str, str] = {
            "snowflake": f"USE DATABASE {self._quote_identifier(database)}",
            "starrocks": f"SET CATALOG {self._quote_identifier(database)}",
        }
        dialect_command = _use_database_dialect_command.get(self.dialect)

        if dialect_command is not None:
            with self._connection.connect() as connection:
                connection.execute(text(dialect_command))

Copilot · 2026-04-01T20:12:13Z

        database_name: Optional[str] = None
        dialect_queries = {
            "postgresql": "SELECT current_database()",
            "mssql": "SELECT DB_NAME()",
            "timeplus": "SELECT current_database()",
+            "starrocks": "SELECT CATALOG()",
        }


For StarRocks, get_default_database() adds a SELECT CATALOG() fallback, but earlier in the method it returns self._connection.url.database when present. On StarRocks URLs, that value is typically the database/schema, not the catalog, which can make default_database inconsistent with _get_database_names() (catalog list) and break default highlighting/qualification in the UI. Consider skipping the URL-database fast path when dialect == 'starrocks' and always resolving the default catalog via CATALOG() (leaving the URL database to default_schema).

Copilot · 2026-04-01T20:12:14Z

+            return [str(row[0]) for row in result.fetchall()]
+


_get_starrocks_database_names() assumes the catalog name is always in row[0]. To make this more robust across dialect/driver variations, consider selecting the column by name (using result.keys() like the Snowflake implementation) and raising a clear error if the expected column is missing.

Suggested change

return [str(row[0]) for row in result.fetchall()]

columns = list(result.keys())

# StarRocks generally returns a "Catalog" column for SHOW CATALOGS,

# but we defensively support a few plausible variants.

candidate_columns = ("Catalog", "catalog_name", "name")

name_col_index: Optional[int] = None

for col_name in candidate_columns:

if col_name in columns:

name_col_index = columns.index(col_name)

break

if name_col_index is None:

raise RuntimeError(

"Unexpected SHOW CATALOGS result: expected one of "

f"{candidate_columns!r} columns, but got {columns!r}"

)

return [str(row[name_col_index]) for row in result.fetchall()]

Copilot · 2026-04-01T20:12:14Z

@@ -361,8 +376,11 @@
        Returns a single-element list with the default database when
        the dialect has no dedicated discovery mechanism.
        """
-        if self.dialect.lower() == "snowflake":
+        dialect = self.dialect.lower()
+        if dialect == "snowflake":
            return self._get_snowflake_database_names()
+        if dialect == "starrocks":
+            return self._get_starrocks_database_names()


The StarRocks-specific paths (SET CATALOG in _get_inspector, SHOW CATALOGS discovery, and the starrocks branch in _get_database_names) aren’t covered by existing SQLAlchemyEngine tests. Since this file already has unit tests (including Snowflake mocking), consider adding a mocked StarRocks test to assert the emitted SQL and that the discovery branch is exercised.

Copilot · 2026-04-01T20:12:15Z

+    def _get_starrocks_database_names(self) -> list[str]:
+        """Get catalog names for StarRocks via 'SHOW CATALOGS'.
+
+        StarRocks uses a three-level hierarchy (Catalog → Database → Table)
+        which maps to marimo's (Database → Schema → Table).
+        """


PR description says it “introduces a new engine which extends SQLAlchemyEngine”, but there doesn’t appear to be a StarRocksEngine implementation in the codebase; instead the StarRocks behavior is implemented directly in SQLAlchemyEngine. Please either update the PR description to match the implementation or add the intended engine subclass + registration (if that’s still the plan).

Copilot · 2026-04-01T20:12:15Z

    boto3 = Dependency("boto3")

    redshift_connector = Dependency("redshift_connector")
+    starrocks = Dependency("starrocks")


DependencyManager.starrocks is introduced but appears unused (no references in the repo). If StarRocks isn’t being explicitly dependency-gated anywhere, consider removing this entry; otherwise, add the corresponding require_many/feature checks so this dependency record has an effect.

Suggested change

starrocks = Dependency("starrocks")

Copilot · 2026-04-01T20:12:15Z

    "clickhouse-connect>=0.8.18",
    "redshift-connector[full]>=2.1.7",
+    # For testing starrocks
+    "starrocks>=1.3.0",


starrocks>=1.3.0 is added to the test-optional dependency group, but the current test changes don’t import or require the StarRocks package (quoting tests are pure string logic). If there aren’t additional StarRocks tests elsewhere in the PR, consider removing this optional dependency until it’s needed, to keep the optional test environment lighter.

Suggested change

"starrocks>=1.3.0",

Copilot · 2026-04-01T20:12:16Z

        else:
            raise ValueError(
-                "Unsupported engine. Must be a SQLAlchemy, Ibis, Clickhouse, DuckDB, Redshift or DBAPI 2.0 compatible engine."
+                "Unsupported engine. Must be a SQLAlchemy, Ibis, Clickhouse, DuckDB, Redshift, StarRocks or DBAPI 2.0 compatible engine."


The updated error message lists StarRocks as a separate supported engine type, but StarRocks connections are still surfaced via the generic SQLAlchemy engine in SUPPORTED_ENGINES. Consider rewording to avoid implying a distinct engine class (e.g., “SQLAlchemy (including StarRocks dialect)” or similar).

Suggested change

"Unsupported engine. Must be a SQLAlchemy, Ibis, Clickhouse, DuckDB, Redshift, StarRocks or DBAPI 2.0 compatible engine."

"Unsupported engine. Must be a SQLAlchemy (including StarRocks dialect), Ibis, Clickhouse, DuckDB, Redshift, or DBAPI 2.0 compatible engine."

github-actions · 2026-04-01T21:19:52Z

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.22.1-dev22

chris-celerdata added 6 commits March 12, 2026 10:41

initial commit

81c7672

refactored inheritance

f5da6fc

remove comments

34f70e7

added generic tests

227f305

resolve pycheck

f8310c0

Merge branch 'main' into add-starrocks

0c95ead

chris-celerdata requested review from Light2Dark and akshayka as code owners March 24, 2026 18:18

vercel Bot deployed to Preview March 24, 2026 18:19 View deployment

mscolnick requested a review from Copilot March 24, 2026 19:40

Copilot started reviewing on behalf of mscolnick March 24, 2026 19:40 View session

Copilot AI reviewed Mar 24, 2026

View reviewed changes

Comment thread marimo/_sql/sql_quoting.py

Comment thread marimo/_sql/engines/starrocks.py Outdated

Comment thread marimo/_sql/engines/starrocks.py Outdated

Comment thread tests/_sql/test_starrocks.py Outdated

dmadisetti added the enhancement New feature or request label Mar 24, 2026

fix copilot review

aeead7b

vercel Bot deployed to Preview March 24, 2026 21:18 View deployment

dmadisetti linked an issue Mar 25, 2026 that may be closed by this pull request

New Database: StarRocks #8765

Closed

1 task

chris-celerdata added 2 commits March 27, 2026 09:01

Merge remote-tracking branch 'upstream/main' into add-starrocks

d87ce6d

merge main branch

a4b11d2

vercel Bot deployed to Preview March 30, 2026 17:45 View deployment

fix py-check

fe0226c

vercel Bot deployed to Preview March 30, 2026 18:05 View deployment

mscolnick approved these changes Apr 1, 2026

View reviewed changes

mscolnick requested changes Apr 1, 2026

View reviewed changes

merge main and add changes

9de8ec3

vercel Bot deployed to Preview April 1, 2026 19:22 View deployment

merge main

e6d52ba

vercel Bot deployed to Preview April 1, 2026 19:30 View deployment

[pre-commit.ci] auto fixes from pre-commit.com hooks

df32d78

for more information, see https://pre-commit.ci

vercel Bot deployed to Preview April 1, 2026 19:32 View deployment

remove import

2884c49

vercel Bot deployed to Preview April 1, 2026 19:42 View deployment

mscolnick approved these changes Apr 1, 2026

View reviewed changes

mscolnick requested a review from Copilot April 1, 2026 20:06

Copilot started reviewing on behalf of mscolnick April 1, 2026 20:07 View session

Copilot AI reviewed Apr 1, 2026

View reviewed changes

mscolnick merged commit 14e0e13 into marimo-team:main Apr 1, 2026
47 of 51 checks passed

-            return [str(row[0]) for row in result.fetchall()]
+            columns = list(result.keys())
+            # StarRocks generally returns a "Catalog" column for SHOW CATALOGS,
+            # but we defensively support a few plausible variants.
+            candidate_columns = ("Catalog", "catalog_name", "name")
+            name_col_index: Optional[int] = None
+            for col_name in candidate_columns:
+                if col_name in columns:
+                    name_col_index = columns.index(col_name)
+                    break
+            if name_col_index is None:
+                raise RuntimeError(
+                    "Unexpected SHOW CATALOGS result: expected one of "
+                    f"{candidate_columns!r} columns, but got {columns!r}"
+                )
+            return [str(row[name_col_index]) for row in result.fetchall()]

	"Unsupported engine. Must be a SQLAlchemy, Ibis, Clickhouse, DuckDB, Redshift, StarRocks or DBAPI 2.0 compatible engine."
	"Unsupported engine. Must be a SQLAlchemy (including StarRocks dialect), Ibis, Clickhouse, DuckDB, Redshift, or DBAPI 2.0 compatible engine."

Conversation

chris-celerdata commented Mar 24, 2026

📝 Summary

🔍 Description of Changes

Notes

📋 Checklist

Uh oh!

vercel Bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chris-celerdata commented Mar 24, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bundle Report

Affected Assets, Files, and Routes:

Assets Changed:

Uh oh!

Light2Dark commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chris-celerdata commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Light2Dark commented Mar 26, 2026

Uh oh!

chris-celerdata commented Mar 26, 2026

Uh oh!

Light2Dark commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chris-celerdata commented Mar 30, 2026

Uh oh!

Light2Dark commented Mar 31, 2026

Uh oh!

Light2Dark commented Apr 1, 2026

Uh oh!

mscolnick left a comment

Choose a reason for hiding this comment

Uh oh!

chris-celerdata commented Apr 1, 2026

Uh oh!

mscolnick left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

vercel Bot commented Mar 24, 2026 •

edited

Loading

github-actions Bot commented Mar 24, 2026 •

edited

Loading

codecov Bot commented Mar 24, 2026 •

edited

Loading

Light2Dark commented Mar 25, 2026 •

edited

Loading

chris-celerdata commented Mar 25, 2026 •

edited

Loading

Light2Dark commented Mar 27, 2026 •

edited

Loading