Commit 7c345d9

Text-to-SQL: Add more examples
1 parent 76a386d commit 7c345d9

5 files changed

Lines changed: 327 additions & 9 deletions

Lines changed: 54 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,54 @@
1+
(nlsql-example-employee)=
2+
3+
# NLSQL with employee data
4+
5+
Let's use a single `employees` database table
6+
and populate it with a few records' worth of data.
7+
8+
:::{rubric} Provision
9+
:::
10+
11+
Create table and insert data.
12+
13+
```sql
14+
CREATE TABLE employees (id INT, name TEXT, department TEXT, hire_date TIMESTAMP);
15+
16+
INSERT INTO employees (id, name, department, hire_date) VALUES
17+
(1, 'Alice Johnson', 'Engineering', '2022-03-15'),
18+
(2, 'Bob Smith', 'Marketing', '2021-07-01'),
19+
(3, 'Carol Lee', 'Human Resources', '2020-11-23'),
20+
(4, 'David Brown', 'Finance', '2019-05-30'),
21+
(5, 'Eva Green', 'Engineering', '2023-01-10'),
22+
(6, 'Frank Miller', 'Sales', '2019-08-12'),
23+
(7, 'Grace Kim', 'Sales', '2021-02-18'),
24+
(8, 'Henry Davis', 'Sales', '2022-06-25'),
25+
(9, 'Isabella Martinez', 'Sales', '2020-12-05'),
26+
(10, 'Jack Wilson', 'Sales', '2023-09-14');
27+
```
28+
29+
:::{rubric} Query
30+
:::
31+
32+
Submit a typical query in human language.
33+
34+
```shell
35+
ctk query nlsql "List all employees in the 'Sales' department hired after 2022."
36+
```
37+
38+
:::{rubric} Response
39+
:::
40+
41+
The model figures out the SQL statement, the engine runs it, and
42+
uses the model again to come back with an answer in human language:
43+
```text
44+
The employees in the Sales department hired after 2022 are Henry Davis and Jack Wilson.
45+
```
46+
47+
The SQL statement was:
48+
```sql
49+
SELECT
50+
name FROM employees
51+
WHERE
52+
department = 'Sales' AND
53+
hire_date > '2022-01-01';
54+
```
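
To check the generated statement against the sample data, the following is a minimal sketch that loads the same records into an in-memory SQLite database. SQLite is only an assumed stand-in here; the document does not say which engine `ctk query nlsql` talks to. The sketch also runs a stricter reading of "after 2022" (hired in 2023 or later), which is one possible interpretation and not what the model chose.

```python
import sqlite3

# Assumption: SQLite stands in for the real database engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (id INT, name TEXT, department TEXT, hire_date TIMESTAMP);
    INSERT INTO employees (id, name, department, hire_date) VALUES
      (1, 'Alice Johnson', 'Engineering', '2022-03-15'),
      (2, 'Bob Smith', 'Marketing', '2021-07-01'),
      (3, 'Carol Lee', 'Human Resources', '2020-11-23'),
      (4, 'David Brown', 'Finance', '2019-05-30'),
      (5, 'Eva Green', 'Engineering', '2023-01-10'),
      (6, 'Frank Miller', 'Sales', '2019-08-12'),
      (7, 'Grace Kim', 'Sales', '2021-02-18'),
      (8, 'Henry Davis', 'Sales', '2022-06-25'),
      (9, 'Isabella Martinez', 'Sales', '2020-12-05'),
      (10, 'Jack Wilson', 'Sales', '2023-09-14');
    """
)

# The statement the model generated, as shown above.
generated = conn.execute(
    "SELECT name FROM employees "
    "WHERE department = 'Sales' AND hire_date > '2022-01-01'"
).fetchall()

# A stricter reading of "after 2022": hired in 2023 or later.
strict = conn.execute(
    "SELECT name FROM employees "
    "WHERE department = 'Sales' AND hire_date >= '2023-01-01'"
).fetchall()

generated_names = sorted(name for (name,) in generated)
strict_names = sorted(name for (name,) in strict)
print(generated_names)  # ['Henry Davis', 'Jack Wilson']
print(strict_names)     # ['Jack Wilson']
```

The generated statement treats "after 2022" as "after the start of 2022", which is why Henry Davis (hired 2022-06-25) appears in the answer. A more precise prompt, such as "hired in 2023 or later", would remove the ambiguity.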

doc/query/nlsql/example-product.md

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
1+
(nlsql-example-product)=
2+
3+
# NLSQL with product orders
4+
5+
Let's use a basic products / orders / customers database.
6+
7+
## Basic JOINs and filtering
8+
9+
:::{rubric} Provision
10+
:::
11+
12+
Create tables and insert data.
13+
Populate the tables with a few records' worth of example data.
14+
15+
```sql
16+
CREATE TABLE customers (customer_id INTEGER, name VARCHAR, city VARCHAR);
17+
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);
18+
CREATE TABLE products (product_id INTEGER, name VARCHAR);
19+
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
20+
21+
-- customers
22+
INSERT INTO customers (customer_id, name, city) VALUES
23+
(1, 'Alice', 'Berlin'),
24+
(2, 'Bob', 'Munich'),
25+
(3, 'Charlie', 'Hamburg');
26+
27+
-- products
28+
INSERT INTO products (product_id, name) VALUES
29+
(1, 'Laptop'),
30+
(2, 'Phone'),
31+
(3, 'Headphones');
32+
33+
-- orders
34+
INSERT INTO orders (order_id, customer_id, amount) VALUES
35+
(101, 1, 1200),
36+
(102, 2, 800),
37+
(103, 1, 200),
38+
(104, 3, 150);
39+
40+
-- order_items
41+
-- Alice bought Laptop, Bob bought Phone, Alice bought Headphones,
42+
-- Charlie bought Headphones, Charlie also bought Phone.
43+
INSERT INTO order_items (order_id, product_id) VALUES
44+
(101, 1),
45+
(102, 2),
46+
(103, 3),
47+
(104, 3),
48+
(104, 2);
49+
```
50+
51+
:::{rubric} Query
52+
:::
53+
54+
Submit a typical query in human language.
55+
56+
```shell
57+
ctk query nlsql "List all customers with orders over €500."
58+
```
59+
60+
:::{rubric} Response
61+
:::
62+
63+
The model figures out the SQL statement, the engine runs it, and
64+
uses the model again to come back with an answer in human language:
65+
66+
> The query results show that the customers 'Alice' from Berlin
67+
> and 'Bob' from Munich have placed orders over €500.
68+
69+
The SQL statement was:
70+
```sql
71+
SELECT customers.name, customers.city
72+
FROM customers JOIN orders ON customers.customer_id = orders.customer_id
73+
WHERE orders.amount > 500;
74+
```
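
The generated JOIN can be replayed against the sample data with a short sketch like the one below. It assumes an in-memory SQLite database as a stand-in engine and only creates the two tables the statement touches.

```python
import sqlite3

# Assumption: SQLite stands in for the real database engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name VARCHAR, city VARCHAR);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);

    INSERT INTO customers (customer_id, name, city) VALUES
      (1, 'Alice', 'Berlin'),
      (2, 'Bob', 'Munich'),
      (3, 'Charlie', 'Hamburg');

    INSERT INTO orders (order_id, customer_id, amount) VALUES
      (101, 1, 1200),
      (102, 2, 800),
      (103, 1, 200),
      (104, 3, 150);
    """
)

# The statement the model generated, as shown above.
rows = conn.execute(
    "SELECT customers.name, customers.city "
    "FROM customers JOIN orders ON customers.customer_id = orders.customer_id "
    "WHERE orders.amount > 500"
).fetchall()

big_spenders = sorted(rows)
print(big_spenders)  # [('Alice', 'Berlin'), ('Bob', 'Munich')]
```

Only orders 101 and 102 exceed 500, so the result matches the answer quoted above. A customer with several orders over 500 would appear once per order; adding `DISTINCT` would avoid that.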
75+
76+
## Advanced JOINs and filtering
77+
78+
:::{rubric} Provision
79+
:::
80+
81+
Create tables and insert data.
82+
Add a few customers in New York and others elsewhere.
83+
Synthesize orders with amounts both above and below the average.
84+
85+
```sql
86+
CREATE TABLE customers (customer_id INTEGER, name VARCHAR, city VARCHAR);
87+
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);
88+
CREATE TABLE products (product_id INTEGER, name VARCHAR);
89+
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
90+
91+
INSERT INTO customers (customer_id, name, city) VALUES
92+
(1, 'Alice Johnson', 'New York'),
93+
(2, 'Bob Smith', 'Los Angeles'),
94+
(3, 'Carol Lee', 'New York'),
95+
(4, 'David Brown', 'Chicago');
96+
97+
INSERT INTO orders (order_id, customer_id, amount) VALUES
98+
(101, 1, 500), -- NY, high
99+
(102, 1, 150), -- NY, low
100+
(103, 2, 300), -- non-NY
101+
(104, 3, 700), -- NY, high
102+
(105, 4, 200); -- non-NY
103+
104+
INSERT INTO products (product_id, name) VALUES
105+
(1001, 'Laptop'),
106+
(1002, 'Phone'),
107+
(1003, 'Tablet'),
108+
(1004, 'Headphones');
109+
110+
INSERT INTO order_items (order_id, product_id) VALUES
111+
(101, 1001),
112+
(101, 1004),
113+
(102, 1002),
114+
(103, 1003),
115+
(104, 1001),
116+
(104, 1002),
117+
(105, 1004);
118+
```
119+
120+
:::{rubric} Query
121+
:::
122+
123+
Submit a typical query in human language.
124+
125+
```shell
126+
ctk query nlsql "Get the names of products that were ordered by customers in New York who spent more than the average amount."
127+
```
128+
129+
:::{rubric} Response
130+
:::
131+
132+
The model figures out the SQL statement, the engine runs it, and
133+
uses the model again to come back with a synthesized response
134+
based on the provided SQL query and its result:
135+
136+
> The query identifies the top 10 product names ordered by customers in New York
137+
> who spent more than the average order amount.
138+
> The results show that "Laptop", "Phone", and "Headphones" were among the most
139+
> popular products purchased by New York customers with high spending.
140+
141+
The SQL statement was:
142+
```sql
143+
SELECT
144+
p.name FROM products AS p
145+
JOIN order_items AS oi ON p.product_id = oi.product_id
146+
JOIN orders AS o ON oi.order_id = o.order_id
147+
JOIN customers AS c ON o.customer_id = c.customer_id
148+
WHERE
149+
c.city = 'New York'
150+
ORDER BY
151+
o.amount DESC LIMIT 10;
152+
```
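
Note that the generated statement above filters by city only. It does not compare amounts against the average, and its `ORDER BY ... LIMIT 10` is where the "top 10" wording in the response comes from. The sketch below replays it next to a variant that applies the average filter. It assumes SQLite as a stand-in engine and reads "spent more than the average amount" as "the order amount exceeds the average over all orders"; a per-customer total would be another valid reading.

```python
import sqlite3

# Assumption: SQLite stands in for the real database engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name VARCHAR, city VARCHAR);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);
    CREATE TABLE products (product_id INTEGER, name VARCHAR);
    CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);

    INSERT INTO customers VALUES
      (1, 'Alice Johnson', 'New York'), (2, 'Bob Smith', 'Los Angeles'),
      (3, 'Carol Lee', 'New York'), (4, 'David Brown', 'Chicago');
    INSERT INTO orders VALUES
      (101, 1, 500), (102, 1, 150), (103, 2, 300), (104, 3, 700), (105, 4, 200);
    INSERT INTO products VALUES
      (1001, 'Laptop'), (1002, 'Phone'), (1003, 'Tablet'), (1004, 'Headphones');
    INSERT INTO order_items VALUES
      (101, 1001), (101, 1004), (102, 1002), (103, 1003),
      (104, 1001), (104, 1002), (105, 1004);
    """
)

joins = (
    "FROM products AS p "
    "JOIN order_items AS oi ON p.product_id = oi.product_id "
    "JOIN orders AS o ON oi.order_id = o.order_id "
    "JOIN customers AS c ON o.customer_id = c.customer_id "
)

# The statement the model generated: city filter only, duplicates possible.
generated = conn.execute(
    "SELECT p.name " + joins +
    "WHERE c.city = 'New York' ORDER BY o.amount DESC LIMIT 10"
).fetchall()

# Hypothetical variant that also applies the per-order average filter.
filtered = conn.execute(
    "SELECT DISTINCT p.name " + joins +
    "WHERE c.city = 'New York' "
    "AND o.amount > (SELECT AVG(amount) FROM orders)"
).fetchall()

generated_count = len(generated)
filtered_names = sorted(name for (name,) in filtered)
print(generated_count)  # 5
print(filtered_names)   # ['Headphones', 'Laptop', 'Phone']
```

With this sample data the average order amount is 370, so only orders 101 and 104 qualify. Both statements happen to name the same three products, but the generated one returns five rows including duplicates and would also include the low-value order 102.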
153+
154+
## JOINs and grouping
155+
156+
:::{rubric} Provision
157+
:::
158+
159+
```sql
160+
CREATE TABLE customers (customer_id INTEGER, name VARCHAR, city VARCHAR, email_address VARCHAR, gender_code VARCHAR);
161+
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);
162+
CREATE TABLE products (product_id INTEGER, name VARCHAR, price NUMERIC(2), size VARCHAR);
163+
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
164+
165+
INSERT INTO customers (customer_id, name, city, email_address, gender_code) VALUES
166+
(1, 'Alice Johnson', 'New York', 'alice@example.com', 'F'),
167+
(2, 'Bob Smith', 'Los Angeles', 'bob@example.com', 'M'),
168+
(3, 'Carol Lee', 'Chicago', 'carol@example.com', 'F'),
169+
(4, 'David Brown', 'Houston', 'david@example.com', 'M'),
170+
(5, 'Eva Green', 'Phoenix', 'eva@example.com', 'F'),
171+
(6, 'Frank Miller', 'Miami', 'frank@example.com', 'M'),
172+
(7, 'Grace Kim', 'Seattle', 'grace@example.com', 'F'),
173+
(8, 'Henry Davis', 'Boston', 'henry@example.com', 'O'); -- least common gender
174+
175+
INSERT INTO orders (order_id, customer_id, amount) VALUES
176+
(101, 1, 120),
177+
(102, 2, 200),
178+
(103, 3, 150),
179+
(104, 4, 300),
180+
(105, 6, 80);
181+
182+
INSERT INTO products (product_id, name, price, size) VALUES
183+
(1001, 'T-Shirt', 20, 'M'),
184+
(1002, 'Jeans', 50, 'L'),
185+
(1003, 'Jacket', 80, 'XL'),
186+
(1004, 'Sneakers', 60, '42'),
187+
(1005, 'Hat', 15, 'S');
188+
189+
INSERT INTO order_items (order_id, product_id) VALUES
190+
(101, 1001),
191+
(101, 1005),
192+
(102, 1002),
193+
(103, 1003),
194+
(104, 1004),
195+
(105, 1001);
196+
```
197+
198+
:::{rubric} Q & A
199+
:::
200+
201+
- Q: What are the email address and town of the customers who are of the least common gender?
202+
SQL: `SELECT email_address, city FROM customers GROUP BY gender_code ORDER BY count(*) ASC LIMIT 1`
203+
- Q: What are the product price and the product size of the products whose price is above average?
204+
SQL: `SELECT products.price, products.size FROM products WHERE products.price > (SELECT AVG(price) FROM products)`
205+
- Q: Which customers did not make any orders?
206+
SQL: `SELECT c.name FROM customers AS c LEFT JOIN orders AS o ON c.customer_id = o.customer_id WHERE o.order_id IS NULL;`
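
The first statement in the list selects `email_address` and `city` while grouping by `gender_code` only. Stricter SQL engines typically reject that, and lenient ones return an arbitrary row per group. The sketch below, assuming SQLite as a stand-in engine, uses a subquery variant for that question and replays the other two statements unchanged. The subquery form is an illustration, not output from the model.

```python
import sqlite3

# Assumption: SQLite stands in for the real database engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name VARCHAR, city VARCHAR,
                            email_address VARCHAR, gender_code VARCHAR);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);
    CREATE TABLE products (product_id INTEGER, name VARCHAR, price NUMERIC(2), size VARCHAR);

    INSERT INTO customers VALUES
      (1, 'Alice Johnson', 'New York', 'alice@example.com', 'F'),
      (2, 'Bob Smith', 'Los Angeles', 'bob@example.com', 'M'),
      (3, 'Carol Lee', 'Chicago', 'carol@example.com', 'F'),
      (4, 'David Brown', 'Houston', 'david@example.com', 'M'),
      (5, 'Eva Green', 'Phoenix', 'eva@example.com', 'F'),
      (6, 'Frank Miller', 'Miami', 'frank@example.com', 'M'),
      (7, 'Grace Kim', 'Seattle', 'grace@example.com', 'F'),
      (8, 'Henry Davis', 'Boston', 'henry@example.com', 'O');
    INSERT INTO orders VALUES
      (101, 1, 120), (102, 2, 200), (103, 3, 150), (104, 4, 300), (105, 6, 80);
    INSERT INTO products VALUES
      (1001, 'T-Shirt', 20, 'M'), (1002, 'Jeans', 50, 'L'),
      (1003, 'Jacket', 80, 'XL'), (1004, 'Sneakers', 60, '42'),
      (1005, 'Hat', 15, 'S');
    """
)

# Hypothetical stricter variant: find the rarest gender_code first,
# then select the matching customers.
least_common = conn.execute(
    "SELECT email_address, city FROM customers WHERE gender_code = ("
    "  SELECT gender_code FROM customers"
    "  GROUP BY gender_code ORDER BY COUNT(*) ASC LIMIT 1)"
).fetchall()

# The remaining two statements, as listed above.
above_average = sorted(conn.execute(
    "SELECT products.price, products.size FROM products "
    "WHERE products.price > (SELECT AVG(price) FROM products)"
).fetchall())

no_orders = sorted(name for (name,) in conn.execute(
    "SELECT c.name FROM customers AS c "
    "LEFT JOIN orders AS o ON c.customer_id = o.customer_id "
    "WHERE o.order_id IS NULL"
))

print(least_common)   # [('henry@example.com', 'Boston')]
print(above_average)  # [(50, 'L'), (60, '42'), (80, 'XL')]
print(no_orders)      # ['Eva Green', 'Grace Kim', 'Henry Davis']
```

The average price in the sample data is 45, and customers 5, 7, and 8 have no rows in `orders`, which is what the last two results reflect.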

doc/query/nlsql/example-sensor.md

Lines changed: 11 additions & 9 deletions
@@ -1,12 +1,14 @@
11
(nlsql-example-sensor)=
22

3-
# NLSQL sensor data example
3+
# NLSQL with sensor data
4+
5+
Let's use a single `time_series_data` database table
6+
and populate it with a few records' worth of time series data.
47

58
:::{rubric} Provision
69
:::
710

8-
Add data to the database. Let's use a very basic table schema and
9-
just a few records worth of time series data.
11+
Create table and insert data.
1012

1113
```sql
1214
CREATE TABLE IF NOT EXISTS time_series_data (
@@ -47,14 +49,14 @@ ctk query nlsql "What is the average value for sensor 1?"
4749
:::
4850

4951
The model figures out the SQL statement, the engine runs it, and
50-
uses the model again to come back with an answer in human language.
51-
52-
```sql
53-
SQL: SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1;
52+
uses the model again to come back with an answer in human language:
53+
```text
54+
The average value for sensor 1 is approximately 17.03.
5455
```
5556

56-
```text
57-
Answer: The average value for sensor 1 is approximately 17.03.
57+
The SQL statement was:
58+
```sql
59+
SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1;
5860
```
5961

6062
:::{rubric} Multiple languages

doc/query/nlsql/example-weather.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
(nlsql-example-weather)=
2+
3+
# NLSQL with weather data
4+
5+
Let's use a basic database including weather observations.
6+
7+
:::{rubric} Provision
8+
:::
9+
10+
Create table and insert data.
11+
12+
```sql
13+
CREATE TABLE weather (zip_code VARCHAR, city VARCHAR, temperature_fahrenheit INTEGER, mean_visibility_miles INTEGER);
14+
15+
INSERT INTO weather (zip_code, city, temperature_fahrenheit, mean_visibility_miles) VALUES
16+
('10001', 'New York', 85, 8), -- visibility < 10
17+
('90001', 'Los Angeles', 95, 12), -- temp > 90
18+
('60601', 'Chicago', 88, 9), -- visibility < 10
19+
('73301', 'Austin', 102, 15), -- temp > 90
20+
('94102', 'San Francisco', 65, 7), -- visibility < 10
21+
('85001', 'Phoenix', 110, 20), -- temp > 90
22+
('33101', 'Miami', 91, 11); -- temp > 90
23+
```
24+
25+
:::{rubric} Query
26+
:::
27+
28+
Submit typical queries in human language.
29+
30+
```shell
31+
ctk query nlsql "Find the zip code where the mean visibility is lower than 10."
32+
ctk query nlsql "Find all cities with temperatures above 90°F."
33+
```
34+
35+
:::{rubric} Response
36+
:::
37+
38+
The model figures out the SQL statements, the engine runs them, and
39+
uses the model again to come back with answers in human language:
40+
```text
41+
The zip codes with a mean visibility of less than 10 miles are 94102, 10001, and 60601.
42+
```
43+
```text
44+
The cities with temperatures above 90°F are Miami, Austin, Phoenix, and Los Angeles.
45+
```
46+
47+
The SQL statements were:
48+
```sql
49+
SELECT zip_code FROM weather WHERE mean_visibility_miles < 10;
50+
```
51+
```sql
52+
SELECT city FROM weather WHERE temperature_fahrenheit > 90;
53+
```
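
Both generated statements can be replayed against the sample observations with a sketch like this one. It assumes an in-memory SQLite database as a stand-in engine and sorts the results, because neither statement specifies an order.

```python
import sqlite3

# Assumption: SQLite stands in for the real database engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE weather (zip_code VARCHAR, city VARCHAR,
                          temperature_fahrenheit INTEGER, mean_visibility_miles INTEGER);
    INSERT INTO weather (zip_code, city, temperature_fahrenheit, mean_visibility_miles) VALUES
      ('10001', 'New York', 85, 8),
      ('90001', 'Los Angeles', 95, 12),
      ('60601', 'Chicago', 88, 9),
      ('73301', 'Austin', 102, 15),
      ('94102', 'San Francisco', 65, 7),
      ('85001', 'Phoenix', 110, 20),
      ('33101', 'Miami', 91, 11);
    """
)

# First generated statement: zip codes with low visibility.
low_visibility = sorted(zip_code for (zip_code,) in conn.execute(
    "SELECT zip_code FROM weather WHERE mean_visibility_miles < 10"
))

# Second generated statement: cities hotter than 90 degrees Fahrenheit.
hot_cities = sorted(city for (city,) in conn.execute(
    "SELECT city FROM weather WHERE temperature_fahrenheit > 90"
))

print(low_visibility)  # ['10001', '60601', '94102']
print(hot_cities)      # ['Austin', 'Los Angeles', 'Miami', 'Phoenix']
```

The sets match the answers quoted above; only the ordering differs, since the answers were produced without an `ORDER BY` clause.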

doc/query/nlsql/index.md

Lines changed: 3 additions & 0 deletions
@@ -398,7 +398,10 @@ information from a single table.
398398
:maxdepth: 1
399399
:hidden:
400400
401+
Employee data example <example-employee>
402+
Product orders example <example-product>
401403
Sensor data example <example-sensor>
404+
Weather data example <example-weather>
402405
```
403406

404407
