**Walkthrough**

Adds NLSQL: model and database dataclasses, LLM configuration and embedding suppression, a DataQuery adapter using LlamaIndex, a read-only SQL gateway, and a new …
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant User as User/CLI
    participant CLI as nlsql CLI
    participant DataQuery as DataQuery
    participant Engine as NLSQLTableQueryEngine
    participant LLM as LLM Adapter
    participant DB as CrateDB
    User->>CLI: provide question (arg or stdin)
    CLI->>CLI: parse args, build DatabaseInfo & ModelInfo
    CLI->>DataQuery: instantiate DataQuery(db, model)
    DataQuery->>LLM: configure_llm(ModelInfo)
    DataQuery->>Engine: create NLSQLTableQueryEngine(LLM, SQLDatabase)
    User->>DataQuery: ask(question)
    DataQuery->>Engine: query(question)
    Engine->>LLM: request NL→SQL transformation
    LLM-->>Engine: return SQL string/plan
    Engine->>DB: execute SQL (via protected run_sql)
    DB-->>Engine: return rows
    Engine-->>DataQuery: structured response (answer + metadata)
    DataQuery-->>CLI: return response
    CLI-->>User: print JSON (question, answer, metadata)
```
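The sequence above can be condensed into a stand-alone sketch. The classes below are simplified stand-ins for the real `DataQuery`/`NLSQLTableQueryEngine` objects; the SQL string and row values are invented for illustration, not taken from the PR:

```python
import json


class FakeEngine:
    """Stand-in for NLSQLTableQueryEngine: pretends to turn NL into SQL and rows."""

    def query(self, question: str) -> dict:
        sql = "SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1"
        rows = [(17.03,)]  # pretend result set from the database
        return {
            "answer": f"The average value is {rows[0][0]}.",
            "metadata": {"sql_query": sql},
        }


class DataQuery:
    """Adapter that forwards a question to the engine and shapes the response."""

    def __init__(self, engine):
        self.engine = engine

    def ask(self, question: str) -> dict:
        response = self.engine.query(question)
        return {"question": question, **response}


result = DataQuery(FakeEngine()).ask("What is the average value for sensor 1?")
print(json.dumps(result, indent=2))  # question, answer, and metadata, as in the CLI
```

The real adapter routes `query()` through LlamaIndex and a live LLM; the response shape mirrors what the CLI prints as JSON.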
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
- Force-pushed b231b23 to 51706ec (Compare)
- Force-pushed 39957e5 to 6e3de91 (Compare)
- Force-pushed 7daf8ff to 9e52c37 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/query/test_nlsql.py (1)
**29-60**: ⚠️ Potential issue | 🟡 Minor

`provision_db` fixture is not idempotent; reset the table before seeding.

The DDL uses plain `CREATE TABLE` (no `IF NOT EXISTS`) while the fixture is function-scoped and `test_query_nlsql_openrouter_permitted` intentionally drops the table. If tests run out of declared order (e.g., with `-k`, xdist, or reruns), or if a previous run left state behind, `provision_db` will fail or insert on top of stale rows. Drop first to make setup deterministic.

📝 Proposed fix
```diff
+    sql_drop = f'DROP TABLE IF EXISTS "{TESTDRIVE_DATA_SCHEMA}".time_series_data;'
+    cratedb.database.run_sql(sql_drop)
     cratedb.database.run_sql(sql_ddl)
     cratedb.database.run_sql(sql_dml)
     cratedb.database.run_sql(sql_refresh)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/query/test_nlsql.py` around lines 29 - 60, The provision_db fixture is not idempotent because it CREATEs the table unconditionally and then inserts data; modify provision_db to reset state first by executing a DROP TABLE IF EXISTS "{TESTDRIVE_DATA_SCHEMA}".time_series_data (or TRUNCATE TABLE if you prefer) before running the CREATE TABLE DDL and inserts; update the calls around cratedb.database.run_sql in provision_db so the DROP/TRUNCATE runs first, then CREATE TABLE, then the INSERT DML and the REFRESH to ensure deterministic, idempotent setup for tests like test_query_nlsql_openrouter_permitted.
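The idempotent-setup pattern the comment asks for can be sketched independently of the live fixture. `FakeDatabase` records statements instead of hitting CrateDB, and all names here are illustrative rather than taken from the test suite:

```python
class FakeDatabase:
    """Stand-in for the toolkit's database adapter: records SQL instead of executing it."""

    def __init__(self):
        self.statements = []

    def run_sql(self, sql: str):
        self.statements.append(sql.strip())


def provision_db(database, schema="testdrive"):
    """Idempotent provisioning: reset first, then create, seed, and refresh."""
    table = f'"{schema}".time_series_data'
    database.run_sql(f"DROP TABLE IF EXISTS {table};")  # makes re-runs deterministic
    database.run_sql(f"CREATE TABLE {table} (ts TIMESTAMP, value DOUBLE);")
    database.run_sql(f"INSERT INTO {table} (ts, value) VALUES (0, 17.03);")
    database.run_sql(f"REFRESH TABLE {table};")


db = FakeDatabase()
provision_db(db)
provision_db(db)  # safe to run twice: DROP IF EXISTS guards the re-create
print(db.statements[0].split()[0])  # DROP
```

Because the reset runs first, the fixture no longer depends on test ordering or on leftover state from earlier runs.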
🧹 Nitpick comments (1)
cratedb_toolkit/query/nlsql/util.py (1)
**8-21**: Redundant `utils` import under both `TYPE_CHECKING` and runtime.

`from llama_index.core.embeddings import utils` appears at line 11 (TYPE_CHECKING block) and again at line 19 (runtime try block). The first is redundant as a type-only import, since the real module reference is needed at runtime for the `utils.resolve_embed_model` assignments inside `disable_embeddings()`. You can drop the line 11 import without affecting type checking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cratedb_toolkit/query/nlsql/util.py` around lines 8 - 21, Remove the redundant type-only import of "from llama_index.core.embeddings import utils" inside the TYPE_CHECKING block; keep the runtime import in the try/except so that utils.resolve_embed_model used in disable_embeddings() has a real module at runtime, and ensure only the TYPE_CHECKING block contains necessary type-only names (e.g., EmbedType, BaseEmbedding, CallbackManager, LLM) while "utils" is imported only at runtime where needed.
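The `TYPE_CHECKING`-versus-runtime import split in question follows a common optional-dependency pattern, which can be sketched generically. `heavylib` below is a hypothetical placeholder for the optional package (such as the `nlsql` extra), not a real dependency:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Type-only import: never executed at runtime, so it is safe even
    # when the optional extra is not installed.
    from heavylib import HeavyClient  # hypothetical optional dependency

try:
    import heavylib  # noqa: F811
except ImportError:
    heavylib = None  # minimal install: runtime features must guard on this


def make_client(url: str) -> "HeavyClient":
    # The quoted annotation is evaluated lazily; the missing dependency
    # surfaces here, with a clear message, only when the feature is used.
    if heavylib is None:
        raise ImportError("Install the optional extra to use this feature.")
    return heavylib.HeavyClient(url)


try:
    make_client("localhost")
except ImportError as ex:
    print(ex)  # module import itself succeeded; only feature use fails
```

With this split, names that exist only for annotations belong in the `TYPE_CHECKING` block, while anything dereferenced at runtime must come from the guarded runtime import — duplicating a name in both places is what the nitpick flags.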
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@doc/query/nlsql/index.md`:
- Around line 33-40: The docs show a CLI flag mismatch: the nlsql CLI exposes
--llm-name but the example uses --llm-model; update the Synopsis example to use
--llm-name (or alternatively, add support for --llm-model in llm_cli in
cratedb_toolkit/query/nlsql/cli.py) so the flag name matches the implemented
option; specifically, change the shell snippet's --llm-model to --llm-name (or
add an alias/option named --llm-model to llm_cli) and ensure the example text
matches the llm_cli flag name.
---
Duplicate comments:
In `@tests/query/test_nlsql.py`:
- Around line 29-60: The provision_db fixture is not idempotent because it
CREATEs the table unconditionally and then inserts data; modify provision_db to
reset state first by executing a DROP TABLE IF EXISTS
"{TESTDRIVE_DATA_SCHEMA}".time_series_data (or TRUNCATE TABLE if you prefer)
before running the CREATE TABLE DDL and inserts; update the calls around
cratedb.database.run_sql in provision_db so the DROP/TRUNCATE runs first, then
CREATE TABLE, then the INSERT DML and the REFRESH to ensure deterministic,
idempotent setup for tests like test_query_nlsql_openrouter_permitted.
---
Nitpick comments:
In `@cratedb_toolkit/query/nlsql/util.py`:
- Around line 8-21: Remove the redundant type-only import of "from
llama_index.core.embeddings import utils" inside the TYPE_CHECKING block; keep
the runtime import in the try/except so that utils.resolve_embed_model used in
disable_embeddings() has a real module at runtime, and ensure only the
TYPE_CHECKING block contains necessary type-only names (e.g., EmbedType,
BaseEmbedding, CallbackManager, LLM) while "utils" is imported only at runtime
where needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 70c6926f-375d-4890-af16-c9a5334a94a6
📒 Files selected for processing (17)
- .github/workflows/main.yml
- .github/workflows/nlsql.yml
- CHANGES.md
- README.md
- cratedb_toolkit/query/cli.py
- cratedb_toolkit/query/nlsql/__init__.py
- cratedb_toolkit/query/nlsql/api.py
- cratedb_toolkit/query/nlsql/cli.py
- cratedb_toolkit/query/nlsql/model.py
- cratedb_toolkit/query/nlsql/sqlgate.py
- cratedb_toolkit/query/nlsql/util.py
- doc/query/index.md
- doc/query/nlsql/backlog.md
- doc/query/nlsql/index.md
- pyproject.toml
- tests/query/test_convert.py
- tests/query/test_nlsql.py
💤 Files with no reviewable changes (1)
- .github/workflows/main.yml
✅ Files skipped from review due to trivial changes (4)
- CHANGES.md
- README.md
- doc/query/index.md
- .github/workflows/nlsql.yml
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/query/test_convert.py
- cratedb_toolkit/query/nlsql/cli.py
- pyproject.toml
```python
elif info.provider is ModelProvider.ANTHROPIC:
    from llama_index.llms.anthropic import Anthropic
    from llama_index.llms.anthropic.utils import CLAUDE_MODELS

    # TODO: Add new model types to upstream `llama-index-llms-anthropic`.
    CLAUDE_MODELS.update({"claude-opus-4-7": 200000, "claude-sonnet-4-6": 1000000, "claude-haiku-4-5": 200000})

    llm = Anthropic(
        model=completion_model,
        temperature=0.0,
        base_url=info.endpoint,
        api_key=info.api_key,
    )
```
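As a side note on the snippet above: a slightly more defensive variant of the same monkey patch would avoid clobbering entries that a newer `llama-index-llms-anthropic` release may already ship with corrected values. A sketch, using a stand-in dict in place of the real `CLAUDE_MODELS` (the context sizes mirror the snippet and should be validated upstream):

```python
# Stand-in for llama_index.llms.anthropic.utils.CLAUDE_MODELS.
CLAUDE_MODELS = {"claude-3-opus-20240229": 200000}

# setdefault() only adds entries upstream does not know about yet, so a
# future package upgrade with authoritative context sizes takes precedence.
for model, context_size in {
    "claude-opus-4-7": 200000,
    "claude-sonnet-4-6": 1000000,
    "claude-haiku-4-5": 200000,
}.items():
    CLAUDE_MODELS.setdefault(model, context_size)

print(sorted(CLAUDE_MODELS))
```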
Hi. We discovered a few models to be missing from `CLAUDE_MODELS` in `llama-index-llms-anthropic`, so we needed to add them via a monkey patch here. We also found this patch by @fitoria that applies similar maintenance work in another area, by updating the list of Bedrock Converse models.
As we don't feel comfortable enough to submit a patch because we don't know too much about intrinsic details of models and their parameters, shall we carry this over into a request on the LlamaIndex issue tracker, or do you know anyone else we could notify? 🙏
I could probably create a PR to have this fixed on the llama index side :)
Are the missing models only the ones reported here?
Hi. Thanks for your quick reply.
> Are the missing models only the ones reported here?
I am not sure, because I didn't do any deeper research. I think I just picked those up from their current models overview page, but I can't assess any backlog beyond that, apologies. 🙏
NB: I also don't know if the context sizes I've used are actually correct, so please validate them instead of picking them 1:1. 🙇
Yeah sure, I'll take a look at the models we do not support and add them.
Just as a note, we are on a bit of a maintenance freeze on the llamaindex framework, so problems like this one might occur more and more in the future
> I'll take a look at the models we do not support and add them.
Thank you!
> Just as a note, we are on a bit of a maintenance freeze on the llamaindex framework [...]
Oh, thanks for letting us know. What happened? This is probably not the right spot to discuss it, but can you point to any upstream discussion about future maintenance, or other plans and decisions about what lies ahead?
Unfortunately, we also can't provide many capacities in maintenance, but maybe a quick contribution here and there can make a positive difference to support you.
NB: We also thought about contributing a llama-index-vector-stores-cratedb adapter, but if maintenance is frozen, we should probably not invest too much here?
Yeah, for now we won't be accepting any new integration with the LlamaIndex framework.
There are no specific comms around this (yet), for now you can refer to this blog post from our CEO.
We're hoping, in the near future, to come up with some more mature form of the LlamaIndex framework, btw, so stay tuned!
Thank you for sharing this blog post.
> Yeah, for now we won't be accepting any new integration with the LlamaIndex framework.
I see, thanks. In this case, it probably doesn't make sense to ramp up a PR to contribute a vector store adapter for CrateDB.
> Our North Star is automating knowledge work over documents, and we're excited about what's ahead.
We've already picked up on your future focus around document processing and OCR topics with LlamaParse and beyond. In this spirit, I expect the NLSQL topic to gradually fade out of your main focus. Good luck on your journey; we will certainly find other integration capabilities with CrateDB and your technologies beyond NLSQL. 🍀
> We're hoping, in the near future, to come up with some more mature form of the LlamaIndex framework, btw, so stay tuned!
NB: Are you planning any possibility for the community to take over maintaining the existing LlamaIndex code base until a more mature form is published?
> There are no specific comms around this (yet) [...]
I understand you don't have answers yourself, so I will not push too much on this topic; we certainly respect any future directions of your technologies, and we are similarly excited about your next focus area. We are just asking ourselves how we would gradually bring such minor improvements to mainline, or into any other package we can depend on properly, or whether we are stuck with the status quo and need to work around certain obstacles using monkey patches. ✨
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/query/test_nlsql.py (2)
**29-60**: ⚠️ Potential issue | 🟠 Major

`provision_db` will fail on the 2nd-and-later test if the `cratedb` fixture doesn't reset state between tests.

The DDL uses plain `CREATE TABLE` (no `IF NOT EXISTS`) and there's no teardown that drops the table. If `cratedb` is session-scoped (which it usually is for the cratedb-toolkit testcontainers-backed fixture), the second NLSQL test invocation will hit `Table 'testdrive.time_series_data' already exists`. Even with `IF NOT EXISTS`, the unconditional `INSERT` would also accumulate duplicate rows across tests, skewing the AVG asserts (e.g., the "17.03" checks at lines 120, 149).

Recommend either dropping the table at the top of `provision_db` (matching the previously raised determinism concern) or making the fixture both create and drop in setup/teardown:

📝 Proposed fix
```diff
 @pytest.fixture
 def provision_db(cratedb):
+    cratedb.database.run_sql(
+        f'DROP TABLE IF EXISTS "{TESTDRIVE_DATA_SCHEMA}".time_series_data'
+    )
     sql_ddl = f"""
         CREATE TABLE "{TESTDRIVE_DATA_SCHEMA}".time_series_data (
             timestamp TIMESTAMP,
```

```bash
#!/bin/bash
# Determine the scope and reset behavior of the `cratedb` fixture used here.
fd conftest.py | xargs rg -nP -B2 -A8 '^def cratedb\b|@pytest\.fixture.*\n[^\n]*\bcratedb\b' 2>/dev/null
```
**87-93**: ⚠️ Potential issue | 🟡 Minor

Live-LLM exact-string assertions are still brittle.

This and the analogous assertions at lines 120-121, 149-150, 238-239 require the model to produce exact answer strings and exact SQL formatting. Live providers (OpenAI, Anthropic, MythoMax via OpenRouter) routinely vary wording, alias casing, semicolons, whitespace, and numeric rounding while still returning a correct result, which will produce flaky CI runs whenever these tests do execute (the API-key skip currently masks this in most pipelines).

Consider switching to semantic checks: numeric tolerance via `pytest.approx`, and lower-cased substring/regex matching for the SQL. This was raised before; the asserts on lines 120 and 149 already use `in`, which is fine, but lines 87, 121, 150, 239 are still strict equality.
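A semantic-check variant of such an assertion could look like the sketch below. The answer and SQL strings are invented for illustration; in a real pytest run, `pytest.approx` would replace the manual tolerance check:

```python
import re

# Hypothetical live-LLM outputs; wording and formatting vary per provider.
answer = "The average value for sensor 1 is approximately 17.03."
sql = "SELECT AVG(value) AS avg_value\nFROM testdrive.time_series_data\nWHERE sensor_id = 1;"

# Numeric tolerance instead of exact string equality on the answer.
numbers = [float(n) for n in re.findall(r"\d+\.\d+", answer)]
assert any(abs(n - 17.03) < 0.05 for n in numbers)

# Case/whitespace-insensitive structural checks instead of byte-equality on the SQL.
normalized = re.sub(r"\s+", " ", sql.lower()).rstrip(";")
assert "avg(value)" in normalized
assert re.search(r"where\s+sensor_id\s*=\s*1", normalized)
```

Checks of this shape pass regardless of alias casing, line breaks, or trailing semicolons, so a correct but differently formatted model response no longer fails the test.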
🧹 Nitpick comments (4)
doc/query/nlsql/index.md (1)
**162**: Misleading `LLM_INSTANCE` example value for Azure.

`info.instance` is forwarded to `AzureOpenAI(engine=...)` in `cratedb_toolkit/query/nlsql/util.py` (line 240), i.e. the chat deployment name, not an embedding model. Naming the example `my_embedding-model` (here and in the Python API tab on line 313) is likely to confuse new users. Consider something like `my-gpt4-deployment`.

cratedb_toolkit/query/nlsql/util.py (1)
**14-25**: Module-level guard relies on `try/except ImportError`, but `BaseEmbedding`/`CallbackManager`/`EmbedType`/`LLM` are imported only under `TYPE_CHECKING`.

Looks correctly handled: quoted annotations on lines 32-34 and 180 are evaluated lazily, so the import works without the `nlsql` extra. Just noting that a future refactor that turns any of those quoted annotations into a runtime use (e.g. `isinstance` checks, default factories) would silently re-introduce the import failure on minimal installs. Worth a unit test that imports this module without `cratedb-toolkit[nlsql]`.

cratedb_toolkit/query/nlsql/sqlgate.py (1)
**29-42**: Process-wide monkey-patch leaks protection to unrelated `SQLDatabase` users.

`enable_sql_gateway()` replaces the unbound `SQLDatabase.run_sql` for the entire process during the `with` block. While `_protection_lock` (RLock) serializes contexts within the same thread, other threads running unrelated LlamaIndex `SQLDatabase` workloads concurrently with an NLSQL `ask()` will execute through the protected wrapper too, getting their non-DQL statements rejected without ever opting in. The existing FIXME notes the implementation may be too naive; please at least document this caveat at module level (or, longer term, gate at the per-instance level by subclassing or wrapping the database).

cratedb_toolkit/query/nlsql/model.py (1)
**68-70**: Optional: raise a clear error when `engine` is unset.

`cast()` is a type-checker hint only; if a caller invokes `get_engine()` before `setup()` (or without supplying an `engine`/`dburi`), this returns `None` typed as `Engine`, deferring failure to a less obvious `AttributeError` later. A small explicit guard makes the contract clearer.

♻️ Proposed refactor

```diff
 def get_engine(self) -> sa.engine.Engine:
     """Return SQLAlchemy engine object."""
+    if self.engine is None:
+        raise RuntimeError("Engine is not configured. Call setup() first.")
     return cast(sa.engine.Engine, self.engine)
```
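The process-wide patching concern raised for `sqlgate.py` above can be illustrated with a minimal stand-in. This is a sketch, not the toolkit's actual code; the `SQLDatabase` class below merely mimics LlamaIndex's:

```python
import threading
from contextlib import contextmanager


class SQLDatabase:
    """Minimal stand-in for llama_index's SQLDatabase."""

    def run_sql(self, sql: str) -> str:
        return f"executed: {sql}"


_protection_lock = threading.RLock()


@contextmanager
def enable_sql_gateway():
    """Patch run_sql process-wide so only SELECT statements pass through."""
    with _protection_lock:
        original = SQLDatabase.run_sql

        def guarded(self, sql: str) -> str:
            if not sql.lstrip().lower().startswith("select"):
                raise PermissionError(f"Non-DQL statement rejected: {sql}")
            return original(self, sql)

        SQLDatabase.run_sql = guarded
        try:
            yield
        finally:
            SQLDatabase.run_sql = original  # restore on exit


# The leak: ANY instance, even ones unrelated to NLSQL, is affected
# while the context manager is active.
unrelated = SQLDatabase()
with enable_sql_gateway():
    try:
        unrelated.run_sql("DROP TABLE t")
    except PermissionError:
        print("rejected")  # the unrelated instance was gated too
assert unrelated.run_sql("DROP TABLE t").startswith("executed")  # restored afterwards
```

Gating at the per-instance level (subclassing or wrapping a single database object) would confine the protection to the NLSQL code path instead of the whole process.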
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ef19b6f3-df28-43b5-85f3-af5f64879a31
📒 Files selected for processing (21)
- .github/workflows/main.yml
- .github/workflows/nlsql.yml
- CHANGES.md
- README.md
- cratedb_toolkit/query/cli.py
- cratedb_toolkit/query/nlsql/__init__.py
- cratedb_toolkit/query/nlsql/api.py
- cratedb_toolkit/query/nlsql/cli.py
- cratedb_toolkit/query/nlsql/model.py
- cratedb_toolkit/query/nlsql/sqlgate.py
- cratedb_toolkit/query/nlsql/util.py
- doc/query/index.md
- doc/query/nlsql/backlog.md
- doc/query/nlsql/example-employee.md
- doc/query/nlsql/example-product.md
- doc/query/nlsql/example-sensor.md
- doc/query/nlsql/example-weather.md
- doc/query/nlsql/index.md
- pyproject.toml
- tests/query/test_convert.py
- tests/query/test_nlsql.py
💤 Files with no reviewable changes (1)
- .github/workflows/main.yml
✅ Files skipped from review due to trivial changes (8)
- doc/query/index.md
- CHANGES.md
- doc/query/nlsql/example-employee.md
- doc/query/nlsql/example-weather.md
- README.md
- .github/workflows/nlsql.yml
- cratedb_toolkit/query/nlsql/api.py
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/query/test_convert.py
- cratedb_toolkit/query/nlsql/cli.py
About
Exploring Text-to-SQL with LlamaIndex and frontier and non-frontier models.
What's inside
DataQuery is the little sister of Vanna AI and Google's QueryData product, tailored to CrateDB.
Documentation
Preview: https://cratedb-toolkit--762.org.readthedocs.build/query/nlsql/
Preview
ask "What is the average value for sensor 1?"

References