Commit 3906685

feat(demo): Enhance demo data generation and documentation
- Added CLI instructions for generating deterministic demo data and configurations.
- Introduced new demo data architecture and entity relationships in the documentation.
- Updated demo data generation to include additional factories, employee roles, customer segments, and maintenance logs.
- Enhanced sample questions to cover a broader range of queries for the demo.
- Refactored demo data structure to support new entities and relationships, improving realism in scenarios.
- Updated README.md to include demo data usage instructions and configuration details.
1 parent 18cbddf commit 3906685

27 files changed

Lines changed: 1061 additions & 399 deletions

README.md

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -137,6 +137,28 @@ result = run_with_graph(ctx, "Top 5 customers by revenue last quarter?")
137137
print(result.get("final_answer"))
138138
```
139139

140+
## 🧪 Demo data (CLI-only)
141+
142+
Use the CLI to generate deterministic demo data and configs, then point the API at the generated files.
143+
144+
1. Generate demo data + configs:
145+
146+
```bash
147+
nl2sql setup --demo --lite
148+
```
149+
150+
2. Start the API with demo settings:
151+
152+
```bash
153+
# Option A: load .env.demo via ENV
154+
ENV=demo uvicorn nl2sql_api.main:app
155+
156+
# Option B: load a specific env file
157+
ENV_FILE_PATH=.env.demo uvicorn nl2sql_api.main:app
158+
```
159+
160+
The demo datasource file uses relative paths (e.g. `data/demo_lite/*.db`), so start the API from the repo root.
161+
140162
## 🔖 Versioning Policy
141163

142164
NL2SQL uses unified versioning across the monorepo. Core, adapters, API, and CLI

configs/llm.demo.yaml

Lines changed: 5 additions & 8 deletions
@@ -3,12 +3,9 @@
33
version: 1
44
default:
55
provider: openai
6-
model: gpt-5.2
6+
model: gpt-4o
77
temperature: 0.0
8-
api_key: ${env:OPENAI_API_KEY}
9-
agents:
10-
indexing_enrichment:
11-
provider: openai
12-
model: gpt-5.2
13-
temperature: 0.0
14-
api_key: ${env:OPENAI_API_KEY}
8+
api_key: !!python/object:pydantic.types.SecretStr
9+
_secret_value: ${env:OPENAI_API_KEY}
10+
name: default
11+
agents: {}

configs/sample_questions.demo.yaml

Lines changed: 15 additions & 0 deletions
@@ -3,20 +3,35 @@ manufacturing_ref:
33
- Show me the capacity of Berlin Plant
44
- What shifts are available?
55
- List all machine types produced by TechCorp
6+
- Which factories have capacity greater than 4000?
7+
- Show all employee roles in the Maintenance department
8+
- List customer segments available for reporting
69
manufacturing_ops:
710
- Show me active employees in the Austin Gigafactory
811
- Which machines have error logs in the last 7 days?
912
- Who is the operator for machine 5?
1013
- Count the number of active machines per factory
1114
- List maintenance logs for Vibration sensor alerts
15+
- Which machines are overdue for maintenance based on last_maintenance_date?
16+
- Show employees hired in the last 12 months
17+
- Compare machine status counts by factory
18+
- List maintenance technicians with the most downtime hours logged
1219
manufacturing_supply:
1320
- Total sales amount for 'Industrial Controller'
1421
- Find suppliers for high value components
1522
- Check inventory levels for 'Bolt M5' in Berlin
1623
- List products with base cost greater than 500
1724
- Show me suppliers from Germany
25+
- List products with low inventory across all factories
26+
- Which suppliers provide EV Battery Pack Long Range?
27+
- Show inventory last updated more than 7 days ago
28+
- Compare inventory levels for Hardware category products
1829
manufacturing_history:
1930
- Show total sales orders in Q4
2031
- Calculate average production output per run
2132
- Summarize sales by customer for last year
2233
- List the top 5 largest orders
34+
- Show sales orders by status for the last 30 days
35+
- Compare production output by factory over the last quarter
36+
- List highest revenue products by month
37+
- Show average discount percentage by customer segment

docs/getting_started/demo.md

Lines changed: 234 additions & 0 deletions
@@ -0,0 +1,234 @@
1+
# Demo Data (CLI-first)
2+
3+
Use the CLI to generate deterministic demo data and configs, then query them from
4+
the CLI. This keeps data generation out of the API runtime and gives you a realistic
5+
multi-database scenario with cross-database relationships.
6+
7+
## 1. Install the CLI
8+
9+
```bash
10+
# Install from PyPI
11+
pip install nl2sql-cli
12+
13+
# Or install from source (dev)
14+
pip install -e packages/cli
15+
```
16+
17+
## 2. Generate demo data with the CLI
18+
19+
```bash
20+
nl2sql setup --demo
21+
```
22+
23+
This writes:
24+
25+
- SQLite databases in `data/demo_lite/`
26+
- `configs/datasources.demo.yaml`
27+
- `configs/llm.demo.yaml`
28+
- `configs/policies.demo.json`
29+
- `configs/sample_questions.demo.yaml`
30+
- `.env.demo`
31+
32+
For the lite demo, setup also runs schema indexing once automatically.
33+
34+
## 3. Use demo data with the CLI
35+
36+
```bash
37+
# Run a query against demo data
38+
ENV=demo nl2sql run "Show me broken machines in Austin"
39+
40+
# Index schemas if you need to re-index after regenerating demo data
41+
ENV=demo nl2sql index
42+
```
43+
44+
Note: the demo datasource config uses relative database paths (e.g. `data/demo_lite/*.db`),
45+
so run the CLI from the repo root.
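To confirm that the working directory is right before querying, a small script can list the generated SQLite files and their tables. This is a minimal sketch using only the Python standard library; the helper name `list_demo_tables` is hypothetical, and the only detail taken from this page is the `data/demo_lite/*.db` location.

```python
import glob
import os
import sqlite3


def list_demo_tables(data_dir="data/demo_lite"):
    """Map each SQLite file in data_dir to the sorted table names it contains."""
    tables = {}
    # The relative default only resolves when the script runs from the repo root.
    for path in sorted(glob.glob(os.path.join(data_dir, "*.db"))):
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            ).fetchall()
        finally:
            conn.close()
        tables[os.path.basename(path)] = [name for (name,) in rows]
    return tables


if __name__ == "__main__":
    found = list_demo_tables()
    if not found:
        print("No demo databases found; run this from the repo root after setup.")
    for db_name, names in found.items():
        print(db_name, names)
```

An empty result usually means the script was started outside the repo root or that `nl2sql setup --demo` has not been run yet.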
46+
47+
## Demo data architecture
48+
49+
The demo models a manufacturing organization with multiple databases and vendors:
50+
51+
- `manufacturing_ref` (Postgres/SQLite): shared reference data (factories, roles, shifts)
52+
- `manufacturing_ops` (Postgres/SQLite): operational data (employees, machines, maintenance)
53+
- `manufacturing_supply` (MySQL/SQLite): supply chain data (products, suppliers, inventory)
54+
- `manufacturing_history` (MSSQL/SQLite): historical data (sales orders, production runs)
55+
56+
Cross-database relationships are logical (not enforced by DB constraints), so they
57+
mirror real-world enterprise setups where data is distributed across systems.
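Because nothing enforces these logical relationships, a row can reference an id that does not exist in the other database. The sketch below illustrates the idea with two in-memory SQLite databases standing in for `manufacturing_ops` and `manufacturing_ref`. The schema alias `ref_db`, the helper names, and the sample rows are made up for illustration, and the use of `ATTACH DATABASE` is just one convenient way to check references in SQLite, not a description of how NL2SQL resolves cross-database queries.

```python
import sqlite3


def find_orphan_employees(conn):
    """Return (employee_id, factory_id) pairs whose factory_id matches no factory."""
    return conn.execute(
        """
        SELECT e.id, e.factory_id
        FROM main.employees AS e
        LEFT JOIN ref_db.factories AS f ON f.id = e.factory_id
        WHERE f.id IS NULL
        ORDER BY e.id
        """
    ).fetchall()


def build_example():
    """Build two in-memory databases standing in for the ops and ref databases."""
    conn = sqlite3.connect(":memory:")  # stand-in for manufacturing_ops
    conn.execute("ATTACH DATABASE ':memory:' AS ref_db")  # stand-in for manufacturing_ref
    conn.execute("CREATE TABLE ref_db.factories (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE main.employees (id INTEGER PRIMARY KEY, name TEXT, factory_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ref_db.factories VALUES (?, ?)",
        [(1, "Berlin Plant"), (2, "Austin Gigafactory")],
    )
    # Employee 3 points at a factory that does not exist; no constraint rejects the row.
    conn.executemany(
        "INSERT INTO main.employees VALUES (?, ?, ?)",
        [(1, "Employee A", 1), (2, "Employee B", 2), (3, "Employee C", 99)],
    )
    return conn


if __name__ == "__main__":
    print(find_orphan_employees(build_example()))  # [(3, 99)]
```

A foreign key declared inside a single database would reject the third row (when enforcement is enabled), whereas a logical cross-database reference has to be validated by a query like this one.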
58+
59+
### Entity relationships
60+
61+
```mermaid
62+
erDiagram
63+
FACTORIES {
64+
int id PK
65+
text name
66+
text region
67+
int capacity
68+
}
69+
MACHINE_TYPES {
70+
int id PK
71+
text model
72+
text producer
73+
int maintenance_interval_days
74+
}
75+
SHIFTS {
76+
int id PK
77+
text name
78+
text start_time
79+
text end_time
80+
}
81+
DEPARTMENTS {
82+
int id PK
83+
text name
84+
}
85+
EMPLOYEE_ROLES {
86+
int id PK
87+
text title
88+
int department_id
89+
}
90+
CUSTOMER_SEGMENTS {
91+
int id PK
92+
text name
93+
}
94+
EMPLOYEES {
95+
int id PK
96+
text name
97+
int factory_id
98+
int shift_id
99+
date hire_date
100+
int role_id
101+
int department_id
102+
text status
103+
}
104+
MACHINES {
105+
int id PK
106+
int factory_id
107+
int type_id
108+
text status
109+
date installation_date
110+
date last_maintenance_date
111+
}
112+
MAINTENANCE_LOGS {
113+
int id PK
114+
int machine_id
115+
date date
116+
text description
117+
int technician_id
118+
text severity
119+
int downtime_hours
120+
}
121+
PRODUCTS {
122+
int id PK
123+
text sku
124+
text name
125+
decimal base_cost
126+
text category
127+
}
128+
SUPPLIERS {
129+
int id PK
130+
text name
131+
text country
132+
}
133+
INVENTORY {
134+
int product_id
135+
int factory_id
136+
int quantity
137+
date last_updated
138+
}
139+
SUPPLIER_PRODUCTS {
140+
int supplier_id
141+
int product_id
142+
}
143+
SALES_ORDERS {
144+
int id PK
145+
text customer_name
146+
date order_date
147+
decimal total_amount
148+
text status
149+
int customer_segment_id
150+
int factory_id
151+
}
152+
SALES_ITEMS {
153+
int id PK
154+
int order_id
155+
int product_id
156+
int quantity
157+
decimal unit_price
158+
decimal discount_pct
159+
}
160+
PRODUCTION_RUNS {
161+
int id PK
162+
int factory_id
163+
date date
164+
int output_quantity
165+
int shift_id
166+
text status
167+
}
168+
169+
EMPLOYEES }o--|| FACTORIES : "works_at"
170+
EMPLOYEES }o--|| SHIFTS : "assigned_to"
171+
EMPLOYEES }o--|| EMPLOYEE_ROLES : "has_role"
172+
EMPLOYEE_ROLES }o--|| DEPARTMENTS : "in_department"
173+
MACHINES }o--|| FACTORIES : "located_at"
174+
MACHINES }o--|| MACHINE_TYPES : "is_type"
175+
MAINTENANCE_LOGS }o--|| MACHINES : "logs_for"
176+
MAINTENANCE_LOGS }o--|| EMPLOYEES : "performed_by"
177+
INVENTORY }o--|| PRODUCTS : "tracks"
178+
INVENTORY }o--|| FACTORIES : "stored_at"
179+
SUPPLIER_PRODUCTS }o--|| PRODUCTS : "supplies"
180+
SUPPLIER_PRODUCTS }o--|| SUPPLIERS : "sourced_from"
181+
SALES_ITEMS }o--|| SALES_ORDERS : "belongs_to"
182+
SALES_ITEMS }o--|| PRODUCTS : "sells"
183+
SALES_ORDERS }o--|| CUSTOMER_SEGMENTS : "segment"
184+
SALES_ORDERS }o--|| FACTORIES : "fulfilled_by"
185+
PRODUCTION_RUNS }o--|| FACTORIES : "produced_at"
186+
PRODUCTION_RUNS }o--|| SHIFTS : "run_shift"
187+
```
188+
189+
## Data scenarios and volumes
190+
191+
- Employees: ~500 across five factories, with roles, departments, and hire dates
192+
- Machines: ~150 with maintenance intervals and last maintenance dates
193+
- Maintenance logs: ~250 with severity and downtime hours
194+
- Inventory: all products across factories with last updated timestamps
195+
- Sales orders: ~5,000 with seasonal spikes in Q4
196+
- Production runs: daily runs per factory over the last year
197+
198+
Embedded scenarios:
199+
- Low-stock alerts for specific products and factories
200+
- Maintenance backlogs for older machines
201+
- Seasonal sales spikes and production variability
202+
- Data skew across factories to mimic regional load
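The generator itself is not shown on this page, so the following is only a sketch of one common way to get deterministic data with a seasonal Q4 spike: draw from a locally seeded random number generator with heavier weights on Q4 days. The function name, year, weight, and seed are all hypothetical.

```python
import random
from datetime import date, timedelta


def generate_order_dates(n, year=2024, q4_weight=3.0, seed=42):
    """Return n order dates in `year`; each Q4 day is q4_weight times as likely."""
    rng = random.Random(seed)  # local, seeded RNG: the same seed yields the same data
    start = date(year, 1, 1)
    day_count = (date(year + 1, 1, 1) - start).days
    days = [start + timedelta(days=i) for i in range(day_count)]
    weights = [q4_weight if d.month >= 10 else 1.0 for d in days]
    return rng.choices(days, weights=weights, k=n)


if __name__ == "__main__":
    dates = generate_order_dates(5000)
    q4_share = sum(d.month >= 10 for d in dates) / len(dates)
    print(f"Q4 share of orders: {q4_share:.2f}")
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the output reproducible even if other code also draws random numbers.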
203+
204+
## Sample queries
205+
206+
Single-database examples:
207+
- "Which machines are overdue for maintenance based on last_maintenance_date?"
208+
- "Show sales orders by status for the last 30 days"
209+
- "List products with low inventory across all factories"
210+
211+
Cross-database examples (require multi-datasource querying):
212+
- "Which factories have the highest sales for EV Battery Pack Long Range in Q4?"
213+
- "Show inventory levels for products with pending orders this month"
214+
- "Compare production output vs sales orders by factory for the last quarter"
215+
- "List maintenance technicians assigned to machines with recent error logs"
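For orientation, here is a sketch of the kind of SQL the first single-database question might resolve to, run against an in-memory SQLite table shaped like `MACHINES` in the entity diagram. The fixed 90-day threshold is an assumption made to keep the example in one database; in the demo the per-type interval lives in `machine_types.maintenance_interval_days`, which is reached through a cross-database join path. The query and helper name are illustrative, not the SQL that NL2SQL actually generates.

```python
import sqlite3

# Dates are stored as ISO-8601 text, which SQLite's julianday() can parse.
OVERDUE_SQL = """
    SELECT id, factory_id, last_maintenance_date
    FROM machines
    WHERE julianday(:today) - julianday(last_maintenance_date) > :max_days
    ORDER BY last_maintenance_date
"""


def overdue_machines(conn, today, max_days=90):
    """Return machines whose last maintenance is more than max_days before today."""
    return conn.execute(OVERDUE_SQL, {"today": today, "max_days": max_days}).fetchall()


def build_example():
    """Create a tiny in-memory machines table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE machines (id INTEGER PRIMARY KEY, factory_id INTEGER, "
        "type_id INTEGER, status TEXT, installation_date TEXT, last_maintenance_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO machines VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 1, "active", "2020-03-01", "2025-01-15"),  # 137 days before today
            (2, 1, 2, "active", "2021-07-01", "2025-05-01"),  # 31 days before today
        ],
    )
    return conn


if __name__ == "__main__":
    print(overdue_machines(build_example(), today="2025-06-01"))
    # [(1, 1, '2025-01-15')]
```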
216+
217+
## Relationship guide
218+
219+
Common join paths:
220+
- `manufacturing_ops.employees.factory_id` -> `manufacturing_ref.factories.id`
221+
- `manufacturing_ops.machines.type_id` -> `manufacturing_ref.machine_types.id`
222+
- `manufacturing_supply.inventory.product_id` -> `manufacturing_supply.products.id`
223+
- `manufacturing_history.sales_items.product_id` -> `manufacturing_supply.products.id`
224+
- `manufacturing_history.sales_orders.factory_id` -> `manufacturing_ref.factories.id`
225+
226+
## Refreshing demo data
227+
228+
Regenerate data at any time with:
229+
230+
```bash
231+
nl2sql setup --demo
232+
```
233+
234+
This overwrites all demo databases and regenerates sample questions and configs.

docs/getting_started/index.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ to use the platform:
1212
- [PyPI (Python API)](pypi.md)
1313
- [Docker (REST API)](docker.md)
1414
- [From Source (Development)](source.md)
15+
- [Demo Data (CLI-first)](demo.md)
1516

1617
## Configuration prerequisites
1718
