Commit 76a386d

Text-to-SQL: Improve documentation
1 parent d9fafc2 commit 76a386d

3 files changed

Lines changed: 367 additions & 141 deletions

doc/query/nlsql/backlog.md

Lines changed: 64 additions & 3 deletions
@@ -6,24 +6,62 @@ orphan: true
 
 ## Iteration +1
 
+- More examples
+  - https://huggingface.co/PipableAI/pip-sql-1.3b
+  - https://motherduck.com/blog/duckdb-text2sql-llm/
+  - https://huggingface.co/Ellbendls/Qwen-2.5-3b-Text_to_SQL-GGUF
+  - https://github.com/distil-labs/distil-text2sql#usage-examples
+  - https://app.readytensor.ai/publications/generating-sql-from-natural-language-using-llama-32-jOImvIBGCfwt
+
+## Iteration +2
+
 - Document `--include-tables`.
-- Use as agentic tool?
+- Use as agentic tool? SKILLS.md? AGENTS.md?
 - Exercise example that draws a table from database results.
 - Exercise example that draws a graph from database results.
 - Exercise example that uses time ranges.
 - Exercise example that needs SQL JOINs.
 - Exercise example that uses vector database features.
 - Is the machinery using pgvector-specific prompt instructions
   that should be adjusted for CrateDB?
+- Demonstrate Gemma3 on Bedrock
+  - https://aws.amazon.com/bedrock/pricing/
+  - https://github.com/run-llama/llama_index/pull/21380
 
-## Iteration +2
+## Iteration +3
 
 - Add providers: anyscale,openllm,vllm
-- Validate providers: Azure, Google, Hugging Face, Mistral, RunGPT
+- Validate providers: Azure, Google, Hugging Face, Mistral
 - Tests: When using the vanilla schema `testdrive-data` with `from tests.conftest import TESTDRIVE_DATA_SCHEMA`,
   the LLM gets confused, and thinks the table is called `sensor_data`. The error message is:
   » The error indicates that the specified table, "sensor_data," is not recognized in the "testdrive-data" schema.
 - How to prevent queries like `Who is Shakespeare?`?
+- Maintain chat memory/context.
+  https://github.com/run-llama/llama_index/discussions/11424
+- https://unsloth.ai/docs/models/qwen3.5
+  ```shell
+  ollama run hf.co/unsloth/Qwen3.5-0.8B-GGUF:UD-Q4_K_XL
+  ```
+
+### Fine tuning
+- Text2SQirreL 🐿️ : Query your data in plain English
+  https://github.com/distil-labs/distil-text2sql
+- https://yia333.medium.com/enhancing-text-to-sql-with-a-fine-tuned-7b-llm-for-database-interactions-fa754dc2e992
+- https://www.promptlayer.com/models/pip-sql-13b-gguf/
+  https://huggingface.co/PipableAI/pip-sql-1.3b
+- https://huggingface.co/QuantFactory/Meta-Llama-3.1-8B-Text-to-SQL-GGUF
+- https://motherduck.com/blog/duckdb-text2sql-llm/
+  https://github.com/NumbersStationAI/DuckDB-NSQL
+- https://huggingface.co/srujanamadiraju/nl-sql-gemma2b
+- https://github.com/raghujhts13/text-to-sql
+  https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF
+- https://app.readytensor.ai/publications/generating-sql-from-natural-language-using-llama-32-jOImvIBGCfwt
+  https://huggingface.co/sai-santhosh/text-2-sql-gguf
+- https://huggingface.co/Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF
+- https://huggingface.co/Ellbendls/Qwen-2.5-3b-Text_to_SQL-GGUF/blob/main/Qwen-2.5-3b-Text_to_SQL.gguf
+- https://www.jan.ai/docs/desktop/jan-models/lucy
+- More runtimes
+  https://docs.docker.com/ai/model-runner/
 
 ## Notes
 
@@ -37,3 +75,26 @@ We've unlocked a few popular ones, but there are certainly many more.
 - Router: cloudflare-ai-gateway,featherlessai,modelscope,nano-gpt,neutrino,ovhcloud
 - More I: Dolly, Pythia, Nano-GPT (litellm), DuckDB-NSQL, nsql-llama-2-7B, pip-sql-1.3b-GGUF, SQLCoder-7B, Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF
 - More II: kwaipilot/kat-coder-pro-v2, undi95/remm-slerp-l2-13b
+
+## llamafile
+
+```shell
+export LLM_PROVIDER="llamafile"
+export LLM_ENDPOINT="http://localhost:8080/"
+export LLM_NAME="n/a"
+export LLM_ENDPOINT="http://localhost:8080/"
+```
+```shell
+wget https://huggingface.co/mozilla-ai/Llama-3.2-1B-Instruct-llamafile/resolve/main/Llama-3.2-1B-Instruct-Q6_K.llamafile
+wget https://huggingface.co/mozilla-ai/llamafile_0.10.0/resolve/main/Qwen3.5-0.8B-Q8_0.llamafile
+./Llama-3.2-1B-Instruct-Q6_K.llamafile
+./Qwen3.5-0.8B-Q8_0.llamafile
+```
+```shell
+wget "https://github.com/mozilla-ai/llamafile/releases/download/0.10.0/llamafile-0.10.0"
+wget "https://huggingface.co/Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF/resolve/main/Qwen-3-4b-Text_to_SQL-q2_k.gguf?download=true"
+```
+
+## Security
+
+- https://github.com/rodrigo-pedro/P2SQL

doc/query/nlsql/example-sensor.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+(nlsql-example-sensor)=
+
+# NLSQL sensor data example
+
+:::{rubric} Provision
+:::
+
+Add data to the database. Let's use a very basic table schema and
+just a few records worth of time series data.
+
+```sql
+CREATE TABLE IF NOT EXISTS time_series_data (
+    timestamp TIMESTAMP,
+    value DOUBLE,
+    location STRING,
+    sensor_id INT
+);
+
+INSERT INTO time_series_data (timestamp, value, location, sensor_id)
+VALUES
+    ('2023-09-14T00:00:00', 10.5, 'Sensor A', 1),
+    ('2023-09-14T01:00:00', 15.2, 'Sensor A', 1),
+    ('2023-09-14T02:00:00', 18.9, 'Sensor A', 1),
+    ('2023-09-14T03:00:00', 12.7, 'Sensor B', 2),
+    ('2023-09-14T04:00:00', 17.3, 'Sensor B', 2),
+    ('2023-09-14T05:00:00', 20.1, 'Sensor B', 2),
+    ('2023-09-14T06:00:00', 22.5, 'Sensor A', 1),
+    ('2023-09-14T07:00:00', 18.3, 'Sensor A', 1),
+    ('2023-09-14T08:00:00', 16.8, 'Sensor A', 1),
+    ('2023-09-14T09:00:00', 14.6, 'Sensor B', 2),
+    ('2023-09-14T10:00:00', 13.2, 'Sensor B', 2),
+    ('2023-09-14T11:00:00', 11.7, 'Sensor B', 2);
+
+REFRESH TABLE time_series_data;
+```
+
+:::{rubric} Query
+:::
+
+Submit a typical query in human language.
+
+```shell
+ctk query nlsql "What is the average value for sensor 1?"
+```
+
+:::{rubric} Response
+:::
+
+The model figures out the SQL statement, the engine runs it, and
+uses the model again to come back with an answer in human language.
+
+```sql
+SQL: SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1;
+```
+
+```text
+Answer: The average value for sensor 1 is approximately 17.03.
+```
+
+:::{rubric} Multiple languages
+:::
+
+The NLSQL conversation works well in multiple languages.
+
+> Q: ¿Cuál es el valor medio del sensor 1?
+>
+> A: El valor medio del sensor 1 es 17.0333.
+
+> Q: Quelle est la valeur moyenne du capteur 1 ?
+>
+> A: La valeur moyenne du capteur 1 est de 17,0333.
+
+> Q: What is the average value for sensor 1?
+>
+> A: The average value for sensor 1 is approximately 17.03.
+
+> Q: Wie lautet der Durchschnittswert für Sensor 1?
+>
+> A: Der Durchschnittswert für Sensor 1 beträgt 17,0333.
+
+> Q: Qual è il valore medio del sensore 1?
+>
+> A: Il valore medio del sensore 1 è pari a 17,0333.
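
Reviewer note: the average quoted in the responses above can be cross-checked without a running cluster. The sketch below is not part of the commit. It loads the same twelve sample rows into an in-memory SQLite database (a stand-in for CrateDB, so the column types are SQLite's, not the ones in the example) and runs the generated `AVG` query. The helper name `average_for_sensor` is hypothetical and exists only for this illustration.

```python
import sqlite3

# Sample rows copied from the INSERT statement in example-sensor.md.
ROWS = [
    ("2023-09-14T00:00:00", 10.5, "Sensor A", 1),
    ("2023-09-14T01:00:00", 15.2, "Sensor A", 1),
    ("2023-09-14T02:00:00", 18.9, "Sensor A", 1),
    ("2023-09-14T03:00:00", 12.7, "Sensor B", 2),
    ("2023-09-14T04:00:00", 17.3, "Sensor B", 2),
    ("2023-09-14T05:00:00", 20.1, "Sensor B", 2),
    ("2023-09-14T06:00:00", 22.5, "Sensor A", 1),
    ("2023-09-14T07:00:00", 18.3, "Sensor A", 1),
    ("2023-09-14T08:00:00", 16.8, "Sensor A", 1),
    ("2023-09-14T09:00:00", 14.6, "Sensor B", 2),
    ("2023-09-14T10:00:00", 13.2, "Sensor B", 2),
    ("2023-09-14T11:00:00", 11.7, "Sensor B", 2),
]


def average_for_sensor(sensor_id: int) -> float:
    """Run the generated AVG query against an in-memory SQLite copy of the data.

    Hypothetical helper: SQLite stands in for CrateDB here.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE time_series_data ("
            "timestamp TEXT, value REAL, location TEXT, sensor_id INTEGER)"
        )
        conn.executemany(
            "INSERT INTO time_series_data VALUES (?, ?, ?, ?)", ROWS
        )
        (avg,) = conn.execute(
            "SELECT AVG(value) FROM time_series_data WHERE sensor_id = ?",
            (sensor_id,),
        ).fetchone()
        return avg
    finally:
        conn.close()


if __name__ == "__main__":
    # Six readings for sensor 1 sum to 102.2, so the mean is about 17.0333.
    print(f"{average_for_sensor(1):.4f}")
```

The result agrees with the 17.03 and 17.0333 figures in the answers, which suggests the model's reply reflects the query result and not a hallucinated number.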
