Commit 39e9f56

feat: Introduce Query, Result, and Settings APIs for NL2SQL
- Added QueryAPI for executing natural language queries against databases.
- Implemented ResultAPI for managing and storing query results.
- Created SettingsAPI for configuration and settings management.
- Removed obsolete ResultStore and SandboxManager classes.
- Refactored PhysicalValidatorNode to streamline validation processes.
- Established a public API for NL2SQL core functionalities, enhancing modular access.
- Updated tests to reflect changes in the API structure and removed deprecated tests.
1 parent 61846b4 commit 39e9f56

46 files changed: 1803 additions and 1040 deletions.
Lines changed: 0 additions & 160 deletions
@@ -1,160 +0,0 @@
-# PhysicalValidatorNode
-
-## Overview
-
-- Performs dry‑run and cost‑estimate checks on generated SQL using sandboxed execution.
-- Exists to validate executability and performance limits before execution.
-- Not wired in the default SQL agent graph.
-- Class: `PhysicalValidatorNode`
-- Source: `packages/core/src/nl2sql/pipeline/nodes/validator/physical_node.py`
-
----
-
-## Responsibilities
-
-- Run dry‑run validation via adapter `dry_run()`.
-- Run cost estimation via adapter `cost_estimate()`.
-- Use sandboxed execution pool and `DB_BREAKER` for isolation and resilience.
-
----
-
-## Position in Execution Graph
-
-Upstream:
-- None (not connected in current graph).
-
-Downstream:
-- None (not connected in current graph).
-
-Trigger conditions:
-- Not executed in current pipeline; requires graph wiring.
-
-```mermaid
-flowchart LR
-    PhysicalValidator[PhysicalValidatorNode]
-```
-
----
-
-## Inputs
-
-From `SubgraphExecutionState`:
-
-- `generator_response.sql_draft` (required)
-- `sub_query.datasource_id` (required)
-
-From `NL2SQLContext`:
-
-- `ds_registry` (adapter resolution)
-
-Validation performed:
-
-- If `sql` is missing, returns empty response.
-- If `datasource_id` missing, emits `MISSING_DATASOURCE_ID`.
-
----
-
-## Outputs
-
-Mutations to `SubgraphExecutionState`:
-
-- `physical_validator_response` (`PhysicalValidatorResponse`)
-- `errors` and `reasoning`
-
-Side effects:
-
-- Sandbox execution pool usage via `execute_in_sandbox`.
-- Adapter calls to `dry_run()` and `cost_estimate()`.
-
----
-
-## Internal Flow (Step-by-Step)
-
-1. If SQL is missing, return empty response.
-2. Resolve datasource adapter via registry.
-3. `_validate_semantic()` runs dry‑run in sandbox under `DB_BREAKER`.
-4. `_validate_performance()` runs cost estimate in sandbox under `DB_BREAKER`.
-5. Collect errors/warnings and return `PhysicalValidatorResponse`.
-6. On exceptions, emit `PHYSICAL_VALIDATOR_FAILED`.
-
----
-
-## Contracts & Interfaces
-
-Implements a LangGraph node callable:
-
-```
-def __call__(self, state: SubgraphExecutionState) -> Dict[str, Any]
-```
-
-Key contracts:
-
-- `PhysicalValidatorResponse`
-- `ExecutionRequest` / `ExecutionResult`
-
----
-
-## Determinism Guarantees
-
-- Deterministic for a fixed SQL and adapter behavior.
-- External DB behavior can vary across runs.
-
----
-
-## Error Handling
-
-Emits `PipelineError` with:
-
-- `MISSING_DATASOURCE_ID`
-- `EXECUTION_ERROR`
-- `EXECUTOR_CRASH`
-- `PERFORMANCE_WARNING`
-- `SERVICE_UNAVAILABLE`
-- `PHYSICAL_VALIDATOR_FAILED`
-
----
-
-## Retry + Idempotency
-
-- No internal retry logic beyond circuit breaker behavior.
-- Idempotency depends on adapter dry‑run semantics.
-
----
-
-## Performance Characteristics
-
-- Uses process pool execution (sandbox).
-- Cost estimation and dry‑run are external DB calls.
-
----
-
-## Observability
-
-- Logger: `physical_validator`
-- Uses `DB_BREAKER` which logs breaker state changes.
-
----
-
-## Configuration
-
-- None directly; uses sandbox pool sizing from settings and adapter config.
-
----
-
-## Extension Points
-
-- Wire into `build_sql_agent_graph()` between generator and executor.
-- Extend dry‑run and cost estimation logic per adapter.
-
----
-
-## Known Limitations
-
-- Not connected to the default SQL agent graph.
-- Behavior depends on adapter support for dry‑run and cost estimation.
-
----
-
-## Related Code
-
-- `packages/core/src/nl2sql/pipeline/nodes/validator/physical_node.py`
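
Reviewer note: the internal flow listed in the deleted PhysicalValidatorNode doc above can be illustrated roughly as follows. This is a minimal sketch, not the removed implementation: `ValidatorResponse`, `FakeAdapter`, `validate`, and the `max_cost` threshold are hypothetical names introduced here, and the sandbox pool and `DB_BREAKER` wrapping are left out to keep the control flow visible.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ValidatorResponse:
    """Hypothetical stand-in for PhysicalValidatorResponse."""
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


class FakeAdapter:
    """Hypothetical adapter exposing the two calls the doc names."""

    def dry_run(self, sql: str) -> bool:
        # Toy rule: only statements starting with SELECT are executable.
        return sql.strip().upper().startswith("SELECT")

    def cost_estimate(self, sql: str) -> int:
        # Toy rule: cost is the statement length in characters.
        return len(sql)


def validate(sql: Optional[str], datasource_id: Optional[str],
             registry: Dict[str, Any], max_cost: int = 50) -> ValidatorResponse:
    response = ValidatorResponse()
    if not sql:                      # step 1: nothing to validate
        return response
    if not datasource_id:
        response.errors.append("MISSING_DATASOURCE_ID")
        return response
    try:
        adapter = registry[datasource_id]            # step 2: resolve adapter
        if not adapter.dry_run(sql):                 # step 3: dry run
            response.errors.append("EXECUTION_ERROR")
        if adapter.cost_estimate(sql) > max_cost:    # step 4: cost estimate
            response.warnings.append("PERFORMANCE_WARNING")
    except Exception:                                # step 6: catch-all
        response.errors.append("PHYSICAL_VALIDATOR_FAILED")
    return response                                  # step 5: collected result
```

Keeping the error codes as plain strings is a simplification; the doc describes them as codes carried by `PipelineError`.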

docs/execution/sandbox.md

Lines changed: 0 additions & 53 deletions
@@ -1,53 +0,0 @@
-# Execution Sandbox Architecture
-
-NL2SQL includes a sandbox subsystem that provides **process-level isolation** for execution and indexing tasks. The sandbox is implemented via `ProcessPoolExecutor` pools managed by `SandboxManager`.
-
-## Sandbox structure (available)
-
-```mermaid
-flowchart TD
-    Caller[Execution/Indexing Task] --> Sandbox[get_execution_pool()/get_indexing_pool()]
-    Sandbox --> Worker[ProcessPool Worker]
-    Worker --> Adapter[DatasourceAdapterProtocol.execute()]
-    Adapter --> Database[(Database)]
-```
-
-## Current wiring (as implemented)
-
-- `SandboxManager` and `execute_in_sandbox()` exist and are production-ready.
-- The default `SqlExecutorService` currently executes **in-process** and does not call `execute_in_sandbox()`.
-- Circuit breakers are available in `nl2sql.common.resilience`, but `SqlExecutorService` does not wrap calls with `DB_BREAKER` today.
-
-This means execution isolation is **available but not enforced** by default in SQL execution.
-
-## Concurrency model
-
-- `run_with_graph()` executes the control graph within a `ThreadPoolExecutor`.
-- Sandbox pools are **process-based** and designed for isolation and crash containment.
-- Execution and indexing pools are configured via `Settings.sandbox_exec_workers` and `Settings.sandbox_index_workers`.
-
-## Sandbox APIs
-
-- `SandboxManager.get_execution_pool()` for latency-sensitive execution.
-- `SandboxManager.get_indexing_pool()` for throughput-heavy indexing.
-- `execute_in_sandbox()` for timeouts and crash handling.
-
-## Failure handling semantics
-
-`execute_in_sandbox()` returns an `ExecutionResult` that captures:
-
-- timeouts (worker hung)
-- worker crashes (segfault/OOM)
-- serialization or runtime errors
-
-## Retry behavior
-
-Retry behavior is not defined at the sandbox layer. Retries are controlled by:
-
-- SQL agent retry loop (`sql_agent_max_retries`)
-- Circuit breaker configuration (fail-fast)
-
-## Source references
-
-- Sandbox manager: `packages/core/src/nl2sql/common/sandbox.py`
-- SQL executor: `packages/core/src/nl2sql/execution/executor/sql_executor.py`
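
Reviewer note: the failure-handling semantics described in the deleted sandbox doc (timeouts, worker crashes, runtime errors folded into a result object) can be sketched with the standard library alone. `SandboxResult`, `run_in_pool`, and `slow_echo` are hypothetical names, not the removed `ExecutionResult` or `execute_in_sandbox()`. The demo uses a thread pool only to stay portable; the doc describes process pools, and `BrokenProcessPool` is only raised by those.

```python
import time
from concurrent.futures import Executor, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from concurrent.futures.process import BrokenProcessPool
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SandboxResult:
    """Hypothetical result wrapper: either data or an error label."""
    success: bool
    data: Any = None
    error: Optional[str] = None


def run_in_pool(pool: Executor, fn: Callable[..., Any], *args: Any,
                timeout: float = 5.0) -> SandboxResult:
    """Submit fn to the pool and translate failures into a SandboxResult."""
    try:
        future = pool.submit(fn, *args)
        return SandboxResult(success=True, data=future.result(timeout=timeout))
    except FutureTimeout:
        # The worker did not answer in time; the task itself keeps running.
        return SandboxResult(success=False, error="timeout")
    except BrokenProcessPool:
        # A process-pool worker died abruptly (for example OOM or segfault).
        return SandboxResult(success=False, error="worker_crash")
    except Exception as exc:
        # Anything raised inside fn, or a pickling failure on submit.
        return SandboxResult(success=False, error=f"runtime_error: {exc}")


def slow_echo(value: Any, delay: float) -> Any:
    """Toy task that sleeps and then returns its input."""
    time.sleep(delay)
    return value


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(run_in_pool(pool, slow_echo, "ok", 0.0))
```

Catching `BrokenProcessPool` before the generic `Exception` matters because it is a subclass of `RuntimeError` and would otherwise be reported as an ordinary runtime error.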

packages/adapter-sdk/src/nl2sql_adapter_sdk/protocols.py

Lines changed: 5 additions & 0 deletions
@@ -34,3 +34,8 @@ def execute(self, request: AdapterRequest) -> ResultFrame:
     def get_dialect(self) -> str:
         """Return the normalized dialect string (SQL adapters)."""
         ...
+
+
+    def test_connection(self) -> bool:
+        """Test if the connection to the datasource can be established."""
+        ...
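
Reviewer note: adding `test_connection()` to a protocol means existing adapters that lack the method no longer satisfy it structurally. The sketch below shows one way a caller could detect that at runtime. `SupportsConnectionTest`, `AlwaysUpAdapter`, `LegacyAdapter`, and `can_probe` are hypothetical names; whether the real protocol in `protocols.py` is decorated with `runtime_checkable` is not visible in this hunk.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsConnectionTest(Protocol):
    """Hypothetical slice of the adapter protocol: only the new method."""

    def test_connection(self) -> bool:
        ...


class AlwaysUpAdapter:
    """Toy adapter; it does not inherit from the protocol."""

    def test_connection(self) -> bool:
        return True


class LegacyAdapter:
    """Toy adapter written before the method was added."""


def can_probe(adapter: object) -> bool:
    # isinstance() against a runtime_checkable Protocol only checks that the
    # method exists, not its signature or return type.
    return isinstance(adapter, SupportsConnectionTest)
```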

packages/adapter-sqlalchemy/src/nl2sql_sqlalchemy_adapter/adapter.py

Lines changed: 13 additions & 0 deletions
@@ -460,3 +460,16 @@ def get_dialect(self) -> str:
 
     def cost_estimate(self, sql: str) -> CostEstimate:
         raise NotImplementedError(f"Adapter {self.__class__.__name__} must implement cost_estimate")
+
+
+    def test_connection(self) -> bool:
+        """
+        Tests the database connection by executing a simple query.
+        """
+        try:
+            with self.engine.connect() as conn:
+                conn.execute(text("SELECT 1"))
+            return True
+        except Exception as e:
+            logger.error(f"Connection test failed for {self}: {e}")
+            return False
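
Reviewer note: the same "run a trivial query and report a boolean" pattern can be tried without SQLAlchemy. The sketch below swaps in the standard-library `sqlite3` module purely for illustration; `probe_connection` and the logger name are hypothetical and not part of the adapter.

```python
import logging
import sqlite3

logger = logging.getLogger("connection_probe")


def probe_connection(database: str) -> bool:
    """Return True if a trivial query succeeds against the database."""
    try:
        conn = sqlite3.connect(database)
        try:
            conn.execute("SELECT 1")
        finally:
            # Always release the handle, even if the query fails.
            conn.close()
        return True
    except sqlite3.Error as exc:
        # Lazy %-formatting defers string building until the record is emitted.
        logger.error("Connection test failed for %s: %s", database, exc)
        return False
```

Catching the driver's own error type instead of a bare `Exception` is a design choice in this sketch: it keeps programming errors visible while still reporting connectivity problems as `False`.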

packages/api/API_DOCS.md

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
+# NL2SQL API Documentation
+
+The NL2SQL API provides a REST interface to the NL2SQL engine, allowing external clients (such as the TypeScript CLI) to interact with the system.
+
+## Available Endpoints
+
+### Query Endpoint
+- **URL**: `/api/v1/query`
+- **Method**: `POST`
+- **Description**: Execute a natural language query against a database
+- **Request Body**:
+```json
+{
+  "natural_language": "Show top 10 customers by revenue",
+  "datasource_id": "optional_datasource_id",
+  "execute": true,
+  "user_context": {
+    "user_id": "user123",
+    "permissions": ["read_customers", "read_orders"]
+  }
+}
+```
+- **Response**:
+```json
+{
+  "sql": "SELECT customer_name, revenue FROM customers ORDER BY revenue DESC LIMIT 10",
+  "results": [...],
+  "final_answer": "Here are the top 10 customers by revenue...",
+  "errors": [],
+  "trace_id": "unique_trace_id",
+  "reasoning": [...],
+  "warnings": [...]
+}
+```
+
+### Schema Endpoints
+- **URL**: `/api/v1/schema/{datasource_id}`
+- **Method**: `GET`
+- **Description**: Get schema information for a specific datasource
+- **Response**:
+```json
+{
+  "datasource_id": "my_database",
+  "tables": [...],
+  "relationships": [...],
+  "metadata": {...}
+}
+```
+
+- **URL**: `/api/v1/schema`
+- **Method**: `GET`
+- **Description**: List all available datasources
+- **Response**: `["datasource1", "datasource2", ...]`
+
+### Health Check Endpoints
+- **URL**: `/api/v1/health`
+- **Method**: `GET`
+- **Description**: Check if the API is running
+- **Response**:
+```json
+{
+  "success": true,
+  "message": "NL2SQL API is running"
+}
+```
+
+- **URL**: `/api/v1/ready`
+- **Method**: `GET`
+- **Description**: Check if the API is ready to serve requests
+- **Response**:
+```json
+{
+  "success": true,
+  "message": "NL2SQL API is ready"
+}
+```
+
+## Running the API Server
+
+To start the API server:
+
+```bash
+nl2sql-api --host 0.0.0.0 --port 8000 --reload
+```
+
+Or using uvicorn directly:
+
+```bash
+uvicorn nl2sql_api.main:app --host 0.0.0.0 --port 8000 --reload
+```
+
+## Configuration
+
+The API relies on the same configuration files as the core NL2SQL engine:
+- `configs/datasources.yaml` - Database connection configurations
+- `configs/llm.yaml` - LLM provider configurations
+- `configs/secrets.yaml` - Secret management configurations
+
+Make sure these files are properly configured before starting the API server.
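
Reviewer note: a client for the documented query endpoint might look like the sketch below, using only the standard library. The path and field names come from API_DOCS.md; `build_query_request`, `run_query`, the base URL, and the 30-second timeout are assumptions. The `user_context` field is omitted because the docs do not say whether it is required.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_query_request(base_url: str, natural_language: str,
                        datasource_id: Optional[str] = None,
                        execute: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/query."""
    payload: Dict[str, Any] = {
        "natural_language": natural_language,
        "execute": execute,
    }
    if datasource_id is not None:
        payload["datasource_id"] = datasource_id
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload logic checkable without a running server, for example `run_query(build_query_request("http://localhost:8000", "Show top 10 customers by revenue"))` once the API is up.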

packages/api/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+# NL2SQL API
+
+API layer for the NL2SQL engine that provides a REST interface to the core functionality.
+
+## Overview
+
+This package provides a FastAPI-based REST API that uses the NL2SQL core's public API to interact with the NL2SQL engine over HTTP. It serves as a bridge between external clients (such as the TypeScript CLI) and the core engine functionality.
+
+## Architecture
+
+The API package leverages the NL2SQL core's public API layer (`NL2SQL` class), ensuring clean separation between the API service and the core engine implementation. The service layer uses the core's public methods like `run_query()`, `list_datasources()`, and schema access through `engine.context.schema_store`.
+
+## Features
+
+- RESTful API endpoints for natural language to SQL conversion
+- Schema introspection capabilities
+- Health and readiness checks
+- Proper error handling and response formatting
+- Lazy initialization to avoid configuration issues during import
+- Integration with the core's public API layer
+
+## Endpoints
+
+- `POST /api/v1/query` - Execute a natural language query
+- `GET /api/v1/schema/{datasource_id}` - Get schema for a specific datasource
+- `GET /api/v1/schema` - List available datasources
+- `GET /api/v1/health` - Health check endpoint
+- `GET /api/v1/ready` - Readiness check endpoint
+
+## Running the API
+
+```bash
+pip install -e .
+nl2sql-api --host 0.0.0.0 --port 8000 --reload
+```
+
+Or using uvicorn directly:
+
+```bash
+uvicorn nl2sql_api.main:app --host 0.0.0.0 --port 8000 --reload
+```
+
+## Development
+
+Install in development mode:
+
+```bash
+pip install -e .
+```
+
+For detailed API documentation, see [API_DOCS.md](API_DOCS.md).
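
Reviewer note: the README lists "lazy initialization to avoid configuration issues during import" without showing how it is done. One common way to get that behavior is a cached factory function, sketched below. `Engine` and `get_engine` are hypothetical stand-ins; the package's actual mechanism is not visible in this diff.

```python
from functools import lru_cache


class Engine:
    """Hypothetical stand-in for the core engine; counts constructions."""

    instances = 0

    def __init__(self) -> None:
        # In a real service this is where configuration files would be read.
        Engine.instances += 1


@lru_cache(maxsize=1)
def get_engine() -> Engine:
    # Built on the first call rather than at import time, so importing the
    # module never touches configuration; later calls reuse the same object.
    return Engine()
```

With this shape, request handlers call `get_engine()` when they need the engine, and a misconfigured environment fails on the first request instead of at import.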
