Commit 4238067

Merge pull request #28 from nadeem4/feat/deterministic-aggregation
Feat/deterministic aggregation
2 parents 2b9bb1a + a3a5c22 commit 4238067

247 files changed

Lines changed: 11719 additions & 6514 deletions

.gitignore

Lines changed: 3 additions & 1 deletion
```diff
@@ -26,4 +26,6 @@ site
 
 data
 
-last_reasoning.json
+last_reasoning.json
+
+artifacts/
```

audit/remediation_plan.md

Lines changed: 0 additions & 92 deletions
This file was deleted.

audit/remediation_plan_observability.md

Lines changed: 0 additions & 64 deletions
This file was deleted.

configs/llm.demo.yaml

Lines changed: 6 additions & 0 deletions
```diff
@@ -6,3 +6,9 @@ default:
   model: gpt-5.2
   temperature: 0.0
   api_key: ${env:OPENAI_API_KEY}
+agents:
+  indexing_enrichment:
+    provider: openai
+    model: gpt-5.2
+    temperature: 0.0
+    api_key: ${env:OPENAI_API_KEY}
```
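
The `${env:OPENAI_API_KEY}` value suggests the config loader substitutes environment variables when it reads these files. The loader is not part of this diff, so the following is only a minimal sketch under that assumption: `expand_env_refs` and `expand_config` are hypothetical names, the accepted placeholder grammar is guessed, and failing loudly on a missing variable is a design choice rather than documented behavior.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches placeholders such as ${env:OPENAI_API_KEY} (assumed grammar).
_ENV_REF = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_refs(value: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Replace every ${env:NAME} in `value` with that variable's value."""
    env = os.environ if environ is None else environ

    def _lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Fail fast instead of silently sending an empty API key.
            raise KeyError(f"environment variable {name!r} is not set")
        return env[name]

    return _ENV_REF.sub(_lookup, value)


def expand_config(node: Any, environ: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively expand placeholders in a parsed YAML tree (dicts, lists, scalars)."""
    if isinstance(node, dict):
        return {key: expand_config(val, environ) for key, val in node.items()}
    if isinstance(node, list):
        return [expand_config(item, environ) for item in node]
    if isinstance(node, str):
        return expand_env_refs(node, environ)
    return node  # non-strings such as temperature: 0.0 pass through unchanged
```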

configs/llm.yaml

Lines changed: 2 additions & 2 deletions
```diff
@@ -3,8 +3,8 @@ default:
   provider: openai
   model: gpt-5.2
 agents:
-  intent_validator:
+  indexing_enrichment:
     provider: openai
-    model: gpt-4o-mini
+    model: gpt-5.2
     temperature: 0.0
 
```
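
Both YAML files now key a per-agent block under `agents` by agent name (`indexing_enrichment`). How the project combines that block with `default` is not shown in this diff; the sketch below assumes agent-specific keys are layered over `default`, and `resolve_agent_llm` is a hypothetical helper.

```python
from typing import Any, Mapping


def resolve_agent_llm(config: Mapping[str, Any], agent: str) -> dict:
    """Return LLM settings for `agent`: `default` first, then the agent's overrides."""
    settings = dict(config.get("default") or {})
    agents = config.get("agents") or {}
    # Keys set on the agent (e.g. model) win; anything missing falls back to default.
    settings.update(agents.get(agent) or {})
    return settings
```

Under this sketch, an agent with no entry (for example, the removed `intent_validator`) simply inherits `default`.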

docs/adapters/development.md

Lines changed: 17 additions & 13 deletions
````diff
@@ -9,7 +9,7 @@ There are two primary ways to build an adapter. Choose the one that fits your ta
 | If you are checking... | Use... | Reference |
 | :--- | :--- | :--- |
 | A standard SQL Database (Postgres, Oracle, Snowflake) | `nl2sql-adapter-sqlalchemy` | **[SQLAlchemy Adapter Reference](sqlalchemy.md)** |
-| A NoSQL DB, REST API, or custom driver | `nl2sql-adapter-sdk` | **[Adapter SDK Reference](sdk.md)** |
+| A NoSQL DB, REST API, or custom driver | Core adapter protocol | **[Adapter Interface Reference](sdk.md)** |
 
 ## Option 1: The "Fast Lane" (SQLAlchemy)
 
@@ -35,11 +35,11 @@ class PostgresAdapter(BaseSQLAlchemyAdapter):
 
 > See the **[SQLAlchemy Adapter Reference](sqlalchemy.md)** for full API details.
 
-## Option 2: The "Custom" Path (SDK)
+## Option 2: The "Custom" Path (Protocol)
 
-If you need to connect to something else (e.g., ElasticSearch, a CRM API, or a raw SQL driver), you must implement the raw interface.
+If you need to connect to something else (e.g., ElasticSearch, a CRM API, or a raw SQL driver), implement the core adapter protocol.
 
-**Implement `DatasourceAdapter`**. You must manually handle:
+**Implement `DatasourceAdapterProtocol`**. You must manually handle:
 
 * Fetching and normalizing schema metadata.
 * Executing queries and formatting results.
@@ -48,25 +48,29 @@ If you need to connect to something else (e.g., ElasticSearch, a CRM API, or a r
 ### Example
 
 ```python
-from nl2sql_adapter_sdk import DatasourceAdapter
+from nl2sql.datasources.protocols import DatasourceAdapterProtocol
+from nl2sql_adapter_sdk.contracts import AdapterRequest, ResultFrame
+from nl2sql_adapter_sdk.capabilities import DatasourceCapability
 
-class MyRestAdapter(DatasourceAdapter):
-    def fetch_schema(self) -> SchemaMetadata:
+class MyRestAdapter(DatasourceAdapterProtocol):
+    def capabilities(self):
+        return {DatasourceCapability.SUPPORTS_REST}
+
+    def fetch_schema_snapshot(self):
         # call API, return schema
         pass
 
-    def execute(self, query: str) -> QueryResult:
-        # run query, return rows
+    def execute(self, request: AdapterRequest) -> ResultFrame:
+        # run request, return rows
         pass
 ```
 
-> See the **[Adapter SDK Reference](sdk.md)** for the mandatory method signatures and compliance testing guide.
+> See the **[Adapter Interface Reference](sdk.md)** for the method signatures and compliance guide.
 
 ## Compliance Testing
 
-Regardless of which path you choose, your adapter **MUST** pass the compliance test suite to ensuring it handles types and errors correctly.
+Regardless of which path you choose, your adapter **MUST** pass the compliance test suite to ensure it handles types and errors correctly.
 
 ```python
-from nl2sql_adapter_sdk.testing import BaseAdapterTest
-# ... see SDK Reference for test setup
+# See Adapter Interface Reference for test setup
 ```
````
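
The updated example subclasses `DatasourceAdapterProtocol` explicitly. If that name is a `typing.Protocol` (an assumption based on its name and module), an adapter can also satisfy it structurally, without inheriting. The self-contained sketch below uses hypothetical stand-ins for `AdapterRequest`, `ResultFrame`, and the capability values, since the real classes in `nl2sql_adapter_sdk` are not shown here; `InMemoryRestAdapter` plays the role of `MyRestAdapter`.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol, runtime_checkable


@dataclass
class AdapterRequest:
    """Hypothetical stand-in for nl2sql_adapter_sdk.contracts.AdapterRequest."""
    plan_type: str
    payload: dict[str, Any] = field(default_factory=dict)


@dataclass
class ResultFrame:
    """Hypothetical stand-in for nl2sql_adapter_sdk.contracts.ResultFrame."""
    columns: list[str]
    rows: list[list[Any]]
    row_count: int
    error: Optional[str] = None


@runtime_checkable
class AdapterLike(Protocol):
    """Structural mirror of the capabilities()/execute() pair from the docs."""
    def capabilities(self) -> set[str]: ...
    def execute(self, request: AdapterRequest) -> ResultFrame: ...


class InMemoryRestAdapter:
    """Serves records from a list; note that it does not inherit from AdapterLike."""

    def __init__(self, records: list[dict[str, Any]]) -> None:
        self._records = records

    def capabilities(self) -> set[str]:
        return {"supports_rest"}

    def execute(self, request: AdapterRequest) -> ResultFrame:
        if request.plan_type != "rest":
            # Design choice: report unsupported plans through the frame, not an exception.
            return ResultFrame(columns=[], rows=[], row_count=0,
                               error=f"unsupported plan_type: {request.plan_type}")
        limit = request.payload.get("limit", len(self._records))
        selected = self._records[:limit]
        columns = list(selected[0]) if selected else []
        rows = [[record[col] for col in columns] for record in selected]
        return ResultFrame(columns=columns, rows=rows, row_count=len(rows))
```

Raising on unsupported plans would be equally valid; returning the error inside the frame keeps caller handling uniform, which fits the "error metadata" the interface reference says `ResultFrame` carries.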

docs/adapters/sdk.md

Lines changed: 27 additions & 45 deletions
````diff
@@ -1,82 +1,64 @@
-# Adapter SDK Reference
+# Adapter Interface Reference
 
-The **Adapter SDK** (`nl2sql-adapter-sdk`) defines the core contract that all datasources must implement.
+The adapter contract lives in the adapter SDK:
 
-## Interface: `DatasourceAdapter`
+- `nl2sql.datasources.protocols.DatasourceAdapterProtocol`
+- `nl2sql_adapter_sdk.contracts.AdapterRequest`
+- `nl2sql_adapter_sdk.contracts.ResultFrame`
 
-All adapters must inherit from `nl2sql_adapter_sdk.interfaces.DatasourceAdapter`.
+## Interface: `DatasourceAdapterProtocol`
+
+Adapters expose a capability-driven interface:
 
 ```python
-from nl2sql_adapter_sdk import DatasourceAdapter
+from nl2sql.datasources.protocols import DatasourceAdapterProtocol
+from nl2sql_adapter_sdk.contracts import AdapterRequest, ResultFrame
+from nl2sql_adapter_sdk.capabilities import DatasourceCapability
 ```
 
 ### Mandatory Properties
 
 | Property | Type | Description |
 | :--- | :--- | :--- |
 | `datasource_id` | `str` | Unique identifier (e.g., "production_db"). |
-| `row_limit` | `int` | **Safety Breaker**. Must return a safe limit (e.g., 1000) to prevent OOM errors. |
-| `max_bytes` | `int` | **Safety Breaker**. Recommended limit for network payloads. |
+| `datasource_engine_type` | `str` | Engine type string (e.g., `postgres`, `rest`). |
+| `row_limit` | `int` | Safety breaker (limit returned rows). |
+| `max_bytes` | `int` | Safety breaker (limit payload size). |
 
 ### Mandatory Methods
 
-#### `fetch_schema()`
+#### `capabilities() -> set[DatasourceCapability]`
 
-Returns `SchemaMetadata`.
+Declares supported capabilities (e.g., `supports_sql`, `supports_rest`).
 
-* **Returns**: `SchemaMetadata` containing tables, columns, PKs, FKs.
-* **Requirement**: Must populate `col.statistics` (samples, min/max) for the validation logic to work effectively.
+#### `execute(request: AdapterRequest) -> ResultFrame`
 
-#### `execute(sql: str)`
+Executes a plan-specific request and returns a normalized `ResultFrame`.
 
-Executes a query and returns results.
+* **Args**: `AdapterRequest` with `plan_type` and `payload`
+* **Returns**: `ResultFrame` with `columns`, `rows`, `row_count`, and error metadata
 
-* **Args**: `sql` (str) - The SQL query to run.
-* **Returns**: `QueryResult` with `rows` (list of dicts) and `columns` (list of names).
+#### `fetch_schema_snapshot()`
 
-#### `dry_run(sql: str)`
+Required only if `supports_schema_introspection` is advertised.
 
-Validates SQL without executing it (or safely rolling back).
+### Optional Methods (SQL adapters)
 
-* **Args**: `sql` (str)
-* **Returns**: `DryRunResult(is_valid=bool, error_message=str)`
+#### `dry_run(sql: str)`
 
-### Optional Methods
+Validates SQL without executing it (or safely rolling back).
 
 #### `explain(sql: str)`
 
 Returns the execution plan.
 
-* **Returns**: `QueryPlan(plan_text=str)`
-
 #### `cost_estimate(sql: str)`
 
 Returns cost/row estimates for the Physical Validator.
 
-* **Returns**: `CostEstimate(estimated_cost=float, estimated_rows=int)`
-
 ---
 
 ## Compliance Testing
 
-The SDK provides a compliance test suite. **All Adapters MUST pass this suite.**
-
-It verifies:
-
-* Schema Introspection (PKs/FKs detected?)
-* Type Mapping (Date -> Python Date, Numeric -> Python Float)
-* Error Handling (Bad SQL -> AdapterError)
-
-### Running Tests
-
-```python
-# tests/test_my_adapter.py
-from nl2sql_adapter_sdk.testing import BaseAdapterTest
-from my_adapter import MyAdapter
-import pytest
-
-class TestMyAdapter(BaseAdapterTest):
-    @pytest.fixture
-    def adapter(self):
-        return MyAdapter(...)
-```
+All adapters should pass the compliance test suite (schema introspection, type mapping,
+error handling, and result contract validation).
````
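
The rewritten reference makes `fetch_schema_snapshot()` conditional on the `supports_schema_introspection` capability, lists `row_limit` as a safety breaker, and names "result contract validation" as part of compliance. A caller or a test could enforce those rules roughly as follows. This is a sketch only: the capability is modeled as a plain string, `ResultFrame` is a hypothetical stand-in, `maybe_fetch_schema` and `check_result_contract` are hypothetical helpers, and the one-value-per-column check assumes rows are positional lists. The real suite's rules are not part of this diff.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Plain-string stand-in; the reference types capabilities as DatasourceCapability members.
SUPPORTS_SCHEMA_INTROSPECTION = "supports_schema_introspection"


@dataclass
class ResultFrame:
    """Hypothetical stand-in for nl2sql_adapter_sdk.contracts.ResultFrame."""
    columns: list[str]
    rows: list[list[Any]]
    row_count: int
    error: Optional[str] = None


def maybe_fetch_schema(adapter: Any) -> Optional[Any]:
    """Call fetch_schema_snapshot() only if the adapter advertises introspection."""
    if SUPPORTS_SCHEMA_INTROSPECTION not in adapter.capabilities():
        return None  # adapters without introspection are never asked for a schema
    return adapter.fetch_schema_snapshot()


def check_result_contract(frame: ResultFrame, row_limit: int) -> list[str]:
    """Return contract violations for `frame`; an empty list means it passes."""
    problems: list[str] = []
    if frame.row_count != len(frame.rows):
        problems.append(f"row_count={frame.row_count} but {len(frame.rows)} rows returned")
    if len(frame.rows) > row_limit:
        problems.append(f"{len(frame.rows)} rows exceed row_limit={row_limit}")
    width = len(frame.columns)
    for index, row in enumerate(frame.rows):
        if len(row) != width:
            problems.append(f"row {index} has {len(row)} values, expected {width}")
    return problems
```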

docs/architecture/decisions/ADR-001_sandboxed_execution.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -4,7 +4,7 @@
 
 ### The Core Issue: "Blast Radius" & Reliability
 
-Currently, the NL2SQL Agent executes SQL queries via `ExecutorNode`, performs dry-runs via `PhysicalValidatorNode`, and indexes schemas via `OrchestratorVectorStore` **in-process**.
+Currently, the NL2SQL Agent executes SQL queries via `ExecutorNode`, performs dry-runs via `PhysicalValidatorNode`, and indexes schemas via `VectorStore` **in-process**.
 
 This couples the stability of the Agent to the stability of the underlying SQL Drivers and the Database itself.
 
@@ -34,7 +34,7 @@ Spawn worker processes on the same machine (Process Boundary Isolation).
 
 * **Mechanism**:
 * **Execution Pool**: A low-latency pool for `ExecutorNode` and `PhysicalValidatorNode` (User-facing queries).
-* **Indexing Pool**: A separate, lower-priority pool for `OrchestratorVectorStore` (Background tasks).
+* **Indexing Pool**: A separate, lower-priority pool for `VectorStore` (Background tasks).
 * **Controls**:
 * `max_workers`: Hard cap on concurrency to prevent OOM.
 * `initializer`: Setup persistent global `SQLAlchemy Engine` for connection pooling.
```
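
The ADR's two pools and their `max_workers` and `initializer` controls map onto `concurrent.futures`. The sketch below illustrates that mapping and is not the project's implementation: `sqlite3` stands in for the persistent SQLAlchemy Engine, `make_pools`, `run_query`, and `_init_worker` are hypothetical names, and "lower priority" is approximated by giving the indexing pool fewer workers, because `ProcessPoolExecutor` has no priority setting.

```python
import sqlite3
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional

# Per-worker global set by the pool initializer; stands in for the ADR's
# persistent SQLAlchemy Engine.
_WORKER_CONN: Optional[sqlite3.Connection] = None


def _init_worker(db_path: str) -> None:
    """Runs once per worker before its first task; opens the reusable connection."""
    global _WORKER_CONN
    _WORKER_CONN = sqlite3.connect(db_path, check_same_thread=False)


def run_query(sql: str) -> list:
    """Task body: executes `sql` on the worker's already-open connection."""
    if _WORKER_CONN is None:
        raise RuntimeError("worker was not initialized")
    return _WORKER_CONN.execute(sql).fetchall()


def make_pools(db_path: str,
               executor_cls: type[Executor] = ProcessPoolExecutor,
               execution_workers: int = 4,
               indexing_workers: int = 1) -> tuple[Executor, Executor]:
    """Create the user-facing execution pool and the background indexing pool."""
    # max_workers is the hard concurrency cap listed under Controls.
    execution = executor_cls(max_workers=execution_workers,
                             initializer=_init_worker, initargs=(db_path,))
    indexing = executor_cls(max_workers=indexing_workers,
                            initializer=_init_worker, initargs=(db_path,))
    return execution, indexing
```

With the default `ProcessPoolExecutor`, a driver crash kills a worker process and marks that pool as broken instead of taking down the agent process. Under the spawn or forkserver start methods, create the pools inside an `if __name__ == "__main__":` guard, since workers re-import the main module.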
