Text-to-SQL: Improve documentation

amotl · amotl · commit 76a386d4e174 · 2026-04-25T14:47:41.000+02:00
diff --git a/doc/query/nlsql/backlog.md b/doc/query/nlsql/backlog.md
@@ -6,24 +6,62 @@ orphan: true
 
 ## Iteration +1
 
+- More examples
+  - https://huggingface.co/PipableAI/pip-sql-1.3b
+  - https://motherduck.com/blog/duckdb-text2sql-llm/
+  - https://huggingface.co/Ellbendls/Qwen-2.5-3b-Text_to_SQL-GGUF
+  - https://github.com/distil-labs/distil-text2sql#usage-examples
+  - https://app.readytensor.ai/publications/generating-sql-from-natural-language-using-llama-32-jOImvIBGCfwt
+
+## Iteration +2
+
 - Document `--include-tables`.
-- Use as agentic tool?
+- Use as agentic tool? SKILLS.md? AGENTS.md?
 - Exercise example that draws a table from database results.
 - Exercise example that draws a graph from database results.
 - Exercise example that uses time ranges.
 - Exercise example that needs SQL JOINs.
 - Exercise example that uses vector database features.
 - Is the machinery using pgvector-specific prompt instructions
   that should be adjusted for CrateDB?
+- Demonstrate Gemma3 on Bedrock
+  - https://aws.amazon.com/bedrock/pricing/
+  - https://github.com/run-llama/llama_index/pull/21380
 
-## Iteration +2
+## Iteration +3
 
 - Add providers: anyscale,openllm,vllm
-- Validate providers: Azure, Google, Hugging Face, Mistral, RunGPT
+- Validate providers: Azure, Google, Hugging Face, Mistral
 - Tests: When using the vanilla schema `testdrive-data` with `from tests.conftest import TESTDRIVE_DATA_SCHEMA`,
   the LLM gets confused, and thinks the table is called `sensor_data`. The error message is:
   » The error indicates that the specified table, "sensor_data," is not recognized in the "testdrive-data" schema.
 - How to prevent queries like `Who is Shakespeare?`?
+- Maintain chat memory/context.
+  https://github.com/run-llama/llama_index/discussions/11424
+- https://unsloth.ai/docs/models/qwen3.5
+  ```shell
+  ollama run hf.co/unsloth/Qwen3.5-0.8B-GGUF:UD-Q4_K_XL
+  ```
+
+### Fine-tuning
+- Text2SQirreL 🐿️ : Query your data in plain English
+  https://github.com/distil-labs/distil-text2sql
+- https://yia333.medium.com/enhancing-text-to-sql-with-a-fine-tuned-7b-llm-for-database-interactions-fa754dc2e992
+- https://www.promptlayer.com/models/pip-sql-13b-gguf/
+  https://huggingface.co/PipableAI/pip-sql-1.3b
+- https://huggingface.co/QuantFactory/Meta-Llama-3.1-8B-Text-to-SQL-GGUF
+- https://motherduck.com/blog/duckdb-text2sql-llm/
+  https://github.com/NumbersStationAI/DuckDB-NSQL
+- https://huggingface.co/srujanamadiraju/nl-sql-gemma2b
+- https://github.com/raghujhts13/text-to-sql
+  https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF
+- https://app.readytensor.ai/publications/generating-sql-from-natural-language-using-llama-32-jOImvIBGCfwt
+  https://huggingface.co/sai-santhosh/text-2-sql-gguf
+- https://huggingface.co/Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF
+- https://huggingface.co/Ellbendls/Qwen-2.5-3b-Text_to_SQL-GGUF/blob/main/Qwen-2.5-3b-Text_to_SQL.gguf
+- https://www.jan.ai/docs/desktop/jan-models/lucy
+- More runtimes
+  https://docs.docker.com/ai/model-runner/
 
 ## Notes
 
@@ -37,3 +75,25 @@ We've unlocked a few popular ones, but there are certainly many more.
 - Router: cloudflare-ai-gateway,featherlessai,modelscope,nano-gpt,neutrino,ovhcloud
 - More I: Dolly, Pythia, Nano-GPT (litellm), DuckDB-NSQL, nsql-llama-2-7B, pip-sql-1.3b-GGUF, SQLCoder-7B, Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF
 - More II: kwaipilot/kat-coder-pro-v2, undi95/remm-slerp-l2-13b
+
+## llamafile
+
+```shell
+export LLM_PROVIDER="llamafile"
+export LLM_ENDPOINT="http://localhost:8080/"
+export LLM_NAME="n/a"
+```
+```shell
+wget https://huggingface.co/mozilla-ai/Llama-3.2-1B-Instruct-llamafile/resolve/main/Llama-3.2-1B-Instruct-Q6_K.llamafile
+wget https://huggingface.co/mozilla-ai/llamafile_0.10.0/resolve/main/Qwen3.5-0.8B-Q8_0.llamafile
+./Llama-3.2-1B-Instruct-Q6_K.llamafile
+./Qwen3.5-0.8B-Q8_0.llamafile
+```
+```shell
+wget "https://github.com/mozilla-ai/llamafile/releases/download/0.10.0/llamafile-0.10.0"
+wget "https://huggingface.co/Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF/resolve/main/Qwen-3-4b-Text_to_SQL-q2_k.gguf?download=true"
+```
+
+## Security
+
+- https://github.com/rodrigo-pedro/P2SQL
diff --git a/doc/query/nlsql/example-sensor.md b/doc/query/nlsql/example-sensor.md
@@ -0,0 +1,83 @@
+(nlsql-example-sensor)=
+
+# NLSQL sensor data example
+
+:::{rubric} Provision
+:::
+
+Add data to the database. This example uses a very basic table schema and
+just a few records' worth of time series data.
+
+```sql
+CREATE TABLE IF NOT EXISTS time_series_data (
+    timestamp TIMESTAMP,
+    value DOUBLE,
+    location STRING,
+    sensor_id INT
+);
+
+INSERT INTO time_series_data (timestamp, value, location, sensor_id)
+VALUES
+    ('2023-09-14T00:00:00', 10.5, 'Sensor A', 1),
+    ('2023-09-14T01:00:00', 15.2, 'Sensor A', 1),
+    ('2023-09-14T02:00:00', 18.9, 'Sensor A', 1),
+    ('2023-09-14T03:00:00', 12.7, 'Sensor B', 2),
+    ('2023-09-14T04:00:00', 17.3, 'Sensor B', 2),
+    ('2023-09-14T05:00:00', 20.1, 'Sensor B', 2),
+    ('2023-09-14T06:00:00', 22.5, 'Sensor A', 1),
+    ('2023-09-14T07:00:00', 18.3, 'Sensor A', 1),
+    ('2023-09-14T08:00:00', 16.8, 'Sensor A', 1),
+    ('2023-09-14T09:00:00', 14.6, 'Sensor B', 2),
+    ('2023-09-14T10:00:00', 13.2, 'Sensor B', 2),
+    ('2023-09-14T11:00:00', 11.7, 'Sensor B', 2);
+
+REFRESH TABLE time_series_data;
+```
+
+:::{rubric} Query
+:::
+
+Submit a typical query in human language.
+
+```shell
+ctk query nlsql "What is the average value for sensor 1?"
+```
+
+:::{rubric} Response
+:::
+
+The model generates the SQL statement; the engine runs it and then uses
+the model again to phrase the result as an answer in human language.
+
+```sql
+SQL:    SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1;
+```
+
+```text
+Answer: The average value for sensor 1 is approximately 17.03.
+```
+
+:::{rubric} Multiple languages
+:::
+
+The NLSQL conversation works in multiple languages, with answers returned in the language of the question.
+
+> Q: ¿Cuál es el valor medio del sensor 1?
+>
+> A: El valor medio del sensor 1 es 17.0333.
+
+> Q: Quelle est la valeur moyenne du capteur 1 ?
+>
+> A: La valeur moyenne du capteur 1 est de 17,0333.
+
+> Q: What is the average value for sensor 1?
+>
+> A: The average value for sensor 1 is approximately 17.03.
+
+> Q: Wie lautet der Durchschnittswert für Sensor 1?
+>
+> A: Der Durchschnittswert für Sensor 1 beträgt 17,0333.
+
+> Q: Qual è il valore medio del sensore 1?
+>
+> A: Il valore medio del sensore 1 è pari a 17,0333.
diff --git a/doc/query/nlsql/index.md b/doc/query/nlsql/index.md