Merged
40 commits
f648a33
Text-to-SQL: Help agents turn natural language into SQL queries
amotl Apr 15, 2026
a31852c
Text-to-SQL: Remove matters about text embeddings, NLSQL doesn't need it
amotl Apr 18, 2026
53c242e
Text-to-SQL: Make LLM instance name configurable, for Azure
amotl Apr 18, 2026
31981cd
Text-to-SQL: CLI improvements
amotl Apr 18, 2026
e7e5ab6
Text-to-SQL: Software tests
amotl Apr 18, 2026
831bb83
Text-to-SQL: Naming things
amotl Apr 18, 2026
af3a342
Text-to-SQL: Implement suggestions by CodeRabbit
amotl Apr 18, 2026
05c9561
Text-to-SQL: Add Anthropic provider
amotl Apr 19, 2026
99344cd
Text-to-SQL: Copy editing. Suggestions by CodeRabbit.
amotl Apr 19, 2026
b2f881e
Text-to-SQL: Add Mistral provider
amotl Apr 19, 2026
aa5d5e3
Text-to-SQL: Copy editing. Suggestions by CodeRabbit.
amotl Apr 19, 2026
4f3b412
Text-to-SQL: Add Hugging Face API provider
amotl Apr 19, 2026
f668641
Text-to-SQL: Improve disabling embeddings per context manager
amotl Apr 19, 2026
57d5896
Text-to-SQL: Add Google API provider
amotl Apr 19, 2026
7561394
Text-to-SQL: Improve logging and documentation
amotl Apr 20, 2026
5c3f196
Text-to-SQL: Add llamafile provider
amotl Apr 20, 2026
7999a6e
Text-to-SQL: Add Amazon Bedrock (Converse) provider
amotl Apr 20, 2026
9463166
Text-to-SQL: Add rungpt provider (experimental)
amotl Apr 20, 2026
414dc2d
Text-to-SQL: Refactoring
amotl Apr 20, 2026
0124ca8
Text-to-SQL: Add Runpod serverless provider
amotl Apr 20, 2026
92998d3
Text-to-SQL: Add OpenRouter provider
amotl Apr 20, 2026
5dc27bb
Text-to-SQL: Implement suggestions by CodeRabbit
amotl Apr 20, 2026
a53fcfd
Text-to-SQL: Migrate to `llama-index-llms-google-genai` package
amotl Apr 20, 2026
f7a6149
Text-to-SQL: Pin models for integration tests. Use `gpt-4o-mini`.
amotl Apr 20, 2026
d2c8489
Text-to-SQL: Software tests with OpenRouter
amotl Apr 20, 2026
ee8e19b
Text-to-SQL: Separate software tests into different CI workflow
amotl Apr 20, 2026
85f3a53
Text-to-SQL: Only permit SELECT statements by default (sqlgate)
amotl Apr 20, 2026
aaa4cd2
Text-to-SQL: Implement suggestions by CodeRabbit
amotl Apr 21, 2026
e56a719
Text-to-SQL: Copy editing. This and that.
amotl Apr 21, 2026
b8c0325
Text-to-SQL: Improve documentation
amotl Apr 21, 2026
fc0d32d
Text-to-SQL: This and that
amotl Apr 21, 2026
dcf7b18
Text-to-SQL: Implement suggestions by CodeRabbit
amotl Apr 21, 2026
76f67da
Text-to-SQL: Remove RunGPT
amotl Apr 25, 2026
4aea728
Text-to-SQL: Rename Hugging Face provider identifier
amotl Apr 25, 2026
71467ef
Text-to-SQL: Improve documentation
amotl Apr 25, 2026
01c2661
Text-to-SQL: Add more examples
amotl Apr 25, 2026
8c7a9f7
Text-to-SQL: Implement suggestions by CodeRabbit
amotl Apr 25, 2026
78f9697
Text-to-SQL: Implement suggestions by CodeRabbit
amotl Apr 25, 2026
a064cef
Text-to-SQL: Record Amazon Nova Lite problem in backlog
amotl Apr 25, 2026
01c179a
Text-to-SQL: Fix YAML front matter
amotl Apr 26, 2026
1 change: 0 additions & 1 deletion .github/workflows/main.yml
@@ -37,7 +37,6 @@ jobs:
      PYTHON: ${{ matrix.python-version }}
      # Do not tear down Testcontainers
      TC_KEEPALIVE: true

    # https://docs.github.com/en/actions/using-containerized-services/about-service-containers
    services:
      cratedb:
83 changes: 83 additions & 0 deletions .github/workflows/nlsql.yml
@@ -0,0 +1,83 @@
---
name: "Tests: NLSQL"

on:
  pull_request:
    paths:
      - '.github/workflows/nlsql.yml'
      - 'cratedb_toolkit/query/nlsql/**'
      - 'tests/query/*nlsql*'
      - 'pyproject.toml'
  push:
    branches: [ main ]
    paths:
      - '.github/workflows/nlsql.yml'
      - 'cratedb_toolkit/query/nlsql/**'
      - 'tests/query/*nlsql*'
      - 'pyproject.toml'

  # Allow job to be triggered manually.
  workflow_dispatch:

  # Run the job each night after CrateDB nightly has been published.
  schedule:
    - cron: '0 3 * * *'

# Cancel in-progress jobs when pushing to the same branch.
concurrency:
  cancel-in-progress: true
  group: ${{ github.workflow }}-${{ github.ref }}

jobs:

  tests:

    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: ["ubuntu-latest"]
        python-version: [
          "3.10",
          "3.14",
        ]

    env:
      OS: ${{ matrix.os }}
      PYTHON: ${{ matrix.python-version }}
      # Do not tear down Testcontainers
      TC_KEEPALIVE: true

    name: Python ${{ matrix.python-version }} on OS ${{ matrix.os }}
    steps:

      - name: Acquire sources
        uses: actions/checkout@v6

      - name: Install uv
        uses: astral-sh/setup-uv@v7
        with:
          activate-environment: 'true'
          cache-suffix: ${{ matrix.python-version }}
          enable-cache: true
          python-version: ${{ matrix.python-version }}

      - name: Set up project
        run: |
          # Install package in editable mode.
          uv pip install --editable='.[nlsql,test]'

      - name: Run software tests
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
        run: |
          pytest -m nlsql

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v6
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
        with:
          fail_ci_if_error: true
1 change: 1 addition & 0 deletions CHANGES.md
@@ -5,6 +5,7 @@
- Kinesis: Added `ctk kinesis` CLI group with `list-checkpoints` and
  `prune-checkpoints` commands for checkpoint table maintenance
- Dependencies: Permitted installation of click 8.3
- DataQuery: Help agents turn natural language into SQL queries

## 2026/03/16 v0.0.46
- I/O: API improvements: `ctk {load,save} table` became `ctk {load,save}`
5 changes: 4 additions & 1 deletion README.md
@@ -19,9 +19,10 @@

[![ci-main][ci-main-badge]][ci-main-workflow]
[![ci-cloud][ci-cloud-badge]][ci-cloud-workflow]
[![ci-nlsql][ci-nlsql-badge]][ci-nlsql-workflow]

[![ci-dynamodb][ci-dynamodb-badge]][ci-dynamodb-workflow]
[![ci-influxdb][ci-influxdb-badge]][ci-influxdb-workflow]

[![ci-kinesis][ci-kinesis-badge]][ci-kinesis-workflow]
[![ci-mongodb][ci-mongodb-badge]][ci-mongodb-workflow]
[![ci-postgresql][ci-postgresql-badge]][ci-postgresql-workflow]
@@ -99,6 +100,8 @@ pip install 'cratedb-toolkit[full]==0.0.38'
[ci-kinesis-workflow]: https://github.com/crate/cratedb-toolkit/actions/workflows/kinesis.yml
[ci-mongodb-badge]: https://github.com/crate/cratedb-toolkit/actions/workflows/mongodb.yml/badge.svg
[ci-mongodb-workflow]: https://github.com/crate/cratedb-toolkit/actions/workflows/mongodb.yml
[ci-nlsql-badge]: https://github.com/crate/cratedb-toolkit/actions/workflows/nlsql.yml/badge.svg
[ci-nlsql-workflow]: https://github.com/crate/cratedb-toolkit/actions/workflows/nlsql.yml
[ci-postgresql-badge]: https://github.com/crate/cratedb-toolkit/actions/workflows/postgresql.yml/badge.svg
[ci-postgresql-workflow]: https://github.com/crate/cratedb-toolkit/actions/workflows/postgresql.yml
[ci-pymongo-badge]: https://github.com/crate/cratedb-toolkit/actions/workflows/pymongo.yml/badge.svg
25 changes: 4 additions & 21 deletions cratedb_toolkit/query/cli.py
@@ -1,26 +1,9 @@
-import logging
-
-import click
-from click_aliases import ClickAliasedGroup
-
-from ..util.cli import boot_click
+from ..util.app import make_cli
 from .convert.cli import convert_query
 from .mcp.cli import cli as mcp_cli
+from .nlsql.cli import llm_cli

-logger = logging.getLogger(__name__)
-
-
-@click.group(cls=ClickAliasedGroup)
-@click.option("--verbose", is_flag=True, required=False, help="Turn on logging")
-@click.option("--debug", is_flag=True, required=False, help="Turn on logging with debug level")
-@click.version_option()
-@click.pass_context
-def cli(ctx: click.Context, verbose: bool, debug: bool):
-    """
-    Query utilities.
-    """
-    return boot_click(ctx, verbose, debug)
-
-
+cli = make_cli()
 cli.add_command(convert_query, name="convert")
+cli.add_command(llm_cli, name="nlsql")
 cli.add_command(mcp_cli, name="mcp")
Empty file.
99 changes: 99 additions & 0 deletions cratedb_toolkit/query/nlsql/api.py
@@ -0,0 +1,99 @@
"""
Use an LLM to query a database in human language using LlamaIndex' NLSQLTableQueryEngine.
"""

import contextlib
import dataclasses
import logging
from typing import Optional

from cratedb_toolkit.query.nlsql.model import DatabaseInfo, ModelInfo

logger = logging.getLogger(__name__)

llama_index_import_error: Optional[ImportError] = None

try:
    from llama_index.core.base.response.schema import RESPONSE_TYPE
    from llama_index.core.llms import LLM
    from llama_index.core.query_engine import NLSQLTableQueryEngine
    from llama_index.core.utilities.sql_wrapper import SQLDatabase
except ImportError as exc:
    llama_index_import_error = exc


@dataclasses.dataclass
class DataQuery:
    """
    DataQuery helps agents turn natural language into SQL queries.
    It's the little sister of Google's QueryData product. [1]

    We recommend evaluating the Text-to-SQL interface using the Gemma models if you are
    looking at non-frontier variants that need fewer resources for inference. However,
    depending on the complexity of your problem, you may also want to use cutting-edge
    models with your provider of choice at the cost of higher resource usage.

    Attention: Any natural language SQL table query engine and Text-to-SQL application
    should be aware that executing arbitrary SQL queries can be a security risk.
    It is recommended to take precautions as needed, such as using restricted roles,
    read-only databases, sandboxing, etc.

    [1] https://cloud.google.com/blog/products/databases/introducing-querydata-for-near-100-percent-accurate-data-agents
    [2] https://github.com/kupp0/multi-db-property-search-data-agents
    """

    db: DatabaseInfo
    model: ModelInfo
    query_engine: Optional["NLSQLTableQueryEngine"] = None
    permit_all_statements: bool = False

    def __post_init__(self):
        """Initialize query engine."""
        if self.query_engine is None:
            self.setup()

    def setup(self):
        """Configure database connection and query engine."""
        if llama_index_import_error:
            raise ImportError(
                "NLSQL support requires installing `cratedb-toolkit[nlsql]`"
            ) from llama_index_import_error

        from cratedb_toolkit.query.nlsql.util import configure_llm, disable_embeddings

        # Configure model.
        logger.info("Configuring LLM: provider=%s, name=%s", self.model.provider.name, self.model.name)
        llm: LLM = configure_llm(self.model)
        logger.info("Selected LLM: %s", llm.metadata.model_dump_json())

        # Configure database.
        self.db.setup()

        # schema = quote_relation_name(self.db.schema) if self.db.schema else None  # noqa: ERA001

        # Configure NLSQL query engine.
        logger.info("Creating query engine")
        sql_database = SQLDatabase(
            self.db.get_engine(),
            schema=self.db.schema,
            ignore_tables=self.db.ignore_tables,
            include_tables=self.db.include_tables,
        )
        with disable_embeddings():
            self.query_engine = NLSQLTableQueryEngine(
                sql_database=sql_database,
                llm=llm,
            )

    def ask(self, question: str) -> "RESPONSE_TYPE":
        """Invoke an inquiry to the LLM."""
        from cratedb_toolkit.query.nlsql.sqlgate import enable_sql_gateway

        if not self.query_engine:
            raise ValueError("Query engine not configured")
        if self.permit_all_statements:
            sql_gateway = contextlib.nullcontext
        else:
            sql_gateway = enable_sql_gateway
        with sql_gateway():
            return self.query_engine.query(question)
112 changes: 112 additions & 0 deletions cratedb_toolkit/query/nlsql/cli.py
@@ -0,0 +1,112 @@
import json
import logging
import os
import sys
from typing import Optional

import click
from dotenv import load_dotenv

from cratedb_toolkit.option import (
    option_cluster_id,
    option_cluster_name,
    option_cluster_url,
    option_password,
    option_schema,
    option_username,
)
from cratedb_toolkit.query.nlsql.api import DataQuery
from cratedb_toolkit.query.nlsql.model import DatabaseInfo
from cratedb_toolkit.util.common import setup_logging
from cratedb_toolkit.util.data import asbool

logger = logging.getLogger(__name__)


def help_llm():
    """
    Use an LLM to query the database in human language.

    Synopsis
    ========

    export CRATEDB_CLUSTER_URL=crate://localhost/
    ctk query nlsql "What is the average value for sensor 1?"

    """  # noqa: E501


@click.command()
@click.argument("question")
@option_cluster_id
@option_cluster_name
@option_cluster_url
@option_username
@option_password
@option_schema
@click.option("--llm-provider", type=str, required=False, help="LLM provider name")
@click.option("--llm-endpoint", type=str, required=False, help="LLM endpoint URL")
@click.option(
    "--llm-instance", type=str, required=False, help="LLM model resource name, e.g. with Azure OpenAI service"
)
@click.option("--llm-name", type=str, required=False, help="LLM model name for completions")
@click.option("--llm-api-key", type=str, required=False, help="LLM API key")
@click.option("--llm-api-version", type=str, required=False, help="LLM API version")
@click.pass_context
def llm_cli(
    ctx: click.Context,
    question: str,
    cluster_id: str,
    cluster_name: str,
    cluster_url: str,
    username: str,
    password: str,
    schema: str,
    llm_provider: Optional[str],
    llm_endpoint: Optional[str],
    llm_instance: Optional[str],
    llm_name: Optional[str],
    llm_api_key: Optional[str],
    llm_api_version: Optional[str],
):
    """
    Use an LLM to query a database in human language.
    """
    from cratedb_toolkit.query.nlsql.util import read_llm_options

    setup_logging()
    load_dotenv()

    # Read question.
    if question == "-":
        question = sys.stdin.read().strip()

    schema = schema or "doc"
    permit_all_statements = asbool(os.getenv("NLSQL_PERMIT_ALL_STATEMENTS"))

    # Connect to database and configure LLM.
    dburi = ctx.meta["address"].cluster_url

    # Configure natural language query machinery.
    dataquery = DataQuery(
        db=DatabaseInfo(
            dburi=dburi,
            schema=schema,
        ),
        model=read_llm_options(
            llm_provider=llm_provider,
            llm_name=llm_name,
            llm_endpoint=llm_endpoint,
            llm_instance=llm_instance,
            llm_api_key=llm_api_key,
            llm_api_version=llm_api_version,
        ),
        permit_all_statements=permit_all_statements,
    )

    # Submit query.
    response = dataquery.ask(question)
    output = {"question": question, "answer": str(response)}
    if response.metadata:
        output.update(next(iter(response.metadata.values())))
    print(json.dumps(output, indent=2, default=str), file=sys.stdout)  # noqa: T201
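Review note: the output assembly at the end of `llm_cli` merges the first metadata entry into the result dictionary. The sketch below reproduces that step with a hypothetical `FakeResponse` stand-in instead of a real query engine response; the metadata shape (a mapping whose first value is itself a mapping) and the sample keys are assumptions inferred from the `output.update(...)` call, not a statement about what LlamaIndex returns.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the query engine's response object."""

    text: str
    metadata: Optional[Dict[str, Any]] = None

    def __str__(self) -> str:
        return self.text


def build_output(question: str, response: FakeResponse) -> Dict[str, Any]:
    """Combine question and answer, then merge the first metadata entry if any."""
    output: Dict[str, Any] = {"question": question, "answer": str(response)}
    if response.metadata:
        # Assumes the first value is a mapping; any other shape raises here.
        output.update(next(iter(response.metadata.values())))
    return output


if __name__ == "__main__":
    demo = FakeResponse(
        text="The average value is 42.",
        metadata={"node-1": {"sql_query": "SELECT AVG(value) FROM sensors"}},
    )
    print(json.dumps(build_output("What is the average value?", demo), indent=2, default=str))
```

Only the first metadata value is merged, so any further entries are silently dropped. If the engine ever returns more than one entry, the CLI output would need a different merge strategy.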