Commit 7f3c8d4

Merge pull request #24 from nadeem4/doc
feat: enhance docs
2 parents 72042e8 + 5a138a8 commit 7f3c8d4

28 files changed

+714
-287
lines changed

docs/adapters/development.md

Lines changed: 44 additions & 92 deletions
Original file line number | Diff line number | Diff line change
@@ -1,120 +1,72 @@
1-
# Building Adapters
1+
# Building Adapters Guide
22

3-
The **Adapter SDK** (`nl2sql-adapter-sdk`) allows you to extend the platform to support new databases or APIs.
3+
The NL2SQL Platform is designed to be extensible. You can build adapters for any datasource, from SQL databases to REST APIs.
44

5-
## Implementing an Adapter
5+
## Implementation Path
66

7-
You must implement the `DatasourceAdapter` interface.
7+
There are two primary ways to build an adapter. Choose the one that fits your target:
88

9-
### Mandatory Properties
10-
11-
* `datasource_id`: Unique identifier (e.g. "postgres_prod").
12-
* `row_limit`: **Safety Breaker**. Must return `1000` (or config value) to prevent massive result sets.
13-
* `max_bytes`: **Safety Breaker**. limit result size at the network/driver level if possible.
14-
15-
### Mandatory Methods
16-
17-
* `fetch_schema()`: Must return `SchemaMetadata` with `tables`, `columns`, `pks`, `fks`. *Crucially, it should also populate `col.statistics` (samples, min/max) for Indexing.*
18-
* `execute(sql)`: Returns `QueryResult`.
19-
* `dry_run(sql)`: Returns validity checks.
20-
21-
### Optional Optimization
22-
23-
* `explain(sql)`: Returns query plan.
24-
* `cost_estimate(sql)`: Returns estimated rows/time. used by PhysicalValidator.
9+
| If you are connecting to... | Use... | Reference |
10+
| :--- | :--- | :--- |
11+
| A standard SQL Database (Postgres, Oracle, Snowflake) | `nl2sql-adapter-sqlalchemy` | **[SQLAlchemy Adapter Reference](sqlalchemy.md)** |
12+
| A NoSQL DB, REST API, or custom driver | `nl2sql-adapter-sdk` | **[Adapter SDK Reference](sdk.md)** |
2513

26-
::: nl2sql_adapter_sdk.interfaces.DatasourceAdapter
14+
## Option 1: The "Fast Lane" (SQLAlchemy)
2715

28-
## Compliance Testing
16+
For 95% of use cases, you are connecting to a SQL database that already has a Python SQLAlchemy dialect.
2917

30-
The SDK provides a compliance test suite. **All Adapters MUST pass this suite.**
18+
**Use `BaseSQLAlchemyAdapter`**. It handles:
3119

32-
It verifies:
20+
* Automatic Schema Introspection (Tables, PKs, FKs)
21+
* Connection Pooling
22+
* Statistic Gathering
23+
* Transaction-based Dry Runs
3324

34-
* Schema Introspection (PKs/FKs detected?)
35-
* Type Mapping (Date -> Python Date, Numeric -> Python Float)
36-
* Error Handling (Bad SQL -> AdapterError)
25+
### Example
3726

3827
```python
39-
# tests/test_my_adapter.py
40-
from nl2sql_adapter_sdk.testing import BaseAdapterTest
41-
from my_adapter import MyAdapter
28+
from typing import Any, Dict

from nl2sql_sqlalchemy_adapter import BaseSQLAlchemyAdapter
4229

43-
class TestMyAdapter(BaseAdapterTest):
44-
@pytest.fixture
45-
def adapter(self):
46-
return MyAdapter(...)
30+
class PostgresAdapter(BaseSQLAlchemyAdapter):
31+
def construct_uri(self, args: Dict[str, Any]) -> str:
32+
# Convert args to connection string
33+
return f"postgresql://{args['user']}:{args['password']}@{args['host']}/{args['database']}"
4734
```
4835

49-
## Choosing a Base Class
50-
51-
The platform provides two ways to build adapters. Choose the one that fits your target datasource.
52-
53-
| Feature | `DatasourceAdapter` (Base Interface) | `BaseSQLAlchemyAdapter` (Helper Class) |
54-
| :--- | :--- | :--- |
55-
| **Package** | `nl2sql-adapter-sdk` | `nl2sql-adapter-sqlalchemy` |
56-
| **Best For** | REST APIs, NoSQL, GraphQL, Manual SQL Drivers. | SQL Databases with SQLAlchemy dialects (Postgres, Oracle, Snowflake). |
57-
| **Schema Fetching** | **Manual Implementation Required**. You must map metadata to `SchemaMetadata`. | **Automatic**. Uses `sqlalchemy.inspect` to reflect tables/FKs. |
58-
| **Execution** | **Manual Implementation Required**. You handle connections, cursors, and types. | **Automatic**. Handles pooling, transactions, and result formatting. |
59-
| **Stats Gathering** | **Manual**. You write queries to fetch min/max/nulls. | **Automatic**. Runs optimized generic queries for stats. |
60-
| **Dry Run** | **Manual**. | **Automatic**. Uses transaction rollback pattern. |
61-
62-
### When to use `DatasourceAdapter`?
36+
> See the **[SQLAlchemy Adapter Reference](sqlalchemy.md)** for full API details.
6337
64-
Use the raw interface when:
38+
## Option 2: The "Custom" Path (SDK)
6539

66-
1. You are connecting to a non-SQL source (e.g., Elasticsearch, HubSpot API).
67-
2. You are using a customized internal SQL driver that is not compatible with SQLAlchemy.
68-
3. You need complete control over the execution lifecycle (e.g. async-only drivers).
40+
If you need to connect to anything else (e.g., Elasticsearch, a CRM API, or a raw SQL driver), you must implement the raw interface yourself.
6941

70-
### When to use `BaseSQLAlchemyAdapter`?
42+
**Implement `DatasourceAdapter`**. You must manually handle:
7143

72-
Use this helper class when:
44+
* Fetching and normalizing schema metadata.
45+
* Executing queries and formatting results.
46+
* Implementing safety breakers (`row_limit`).
7347

74-
1. There is an existing SQLAlchemy dialect for your database (this covers 95% of SQL databases).
75-
2. You want to save time on boilerplate (connection pooling, schema reflection).
76-
3. You want consistent behavior with the core supported adapters.
48+
### Example
7749

78-
## Building SQL Adapters (The Fast Way)
79-
80-
For SQL databases supported by SQLAlchemy, you should use the `nl2sql-adapter-sqlalchemy` package as described in the comparison above.
50+
```python
51+
from nl2sql_adapter_sdk import DatasourceAdapter
8152

82-
### `BaseSQLAlchemyAdapter` Features
53+
class MyRestAdapter(DatasourceAdapter):
54+
def fetch_schema(self) -> SchemaMetadata:
55+
# call API, return schema
56+
pass
8357

84-
This base class implements ~90% of the required functionality for you:
58+
def execute(self, query: str) -> QueryResult:
59+
# run query, return rows
60+
pass
61+
```
8562

86-
* **Automatic Schema Fetching**: Uses `sqlalchemy.inspect` to get tables, columns, PKs.
87-
* **Automatic Statistics**: Runs optimized queries to fetch `min/max`, `null_percentage`, `distinct_count`, and `sample_values` for text columns.
88-
* **Generic Execution**: Handles connection pooling and result formatting.
89-
* **Safety**: Built-in generic `dry_run` using transaction rollbacks.
63+
> See the **[Adapter SDK Reference](sdk.md)** for the mandatory method signatures and compliance testing guide.
9064
91-
### Example Implementation
65+
## Compliance Testing
9266

93-
See `packages/adapters/postgres` for a reference implementation.
67+
Regardless of which path you choose, your adapter **MUST** pass the compliance test suite to ensure it handles types and errors correctly.
9468

9569
```python
96-
from nl2sql_sqlalchemy_adapter import BaseSQLAlchemyAdapter
97-
98-
class PostgresAdapter(BaseSQLAlchemyAdapter):
99-
def construct_uri(self, args: Dict[str, Any]) -> str:
100-
return f"postgresql://{args.get('user')}:{args.get('password')}@{args.get('host')}/{args.get('database')}"
101-
102-
# Optional: Override dry_run for better performance using EXPLAIN
103-
def dry_run(self, sql: str):
104-
self.execute(f"EXPLAIN {sql}")
105-
return DryRunResult(is_valid=True)
70+
from nl2sql_adapter_sdk.testing import BaseAdapterTest
71+
# ... see SDK Reference for test setup
10672
```
107-
108-
## Reference Adapters
109-
110-
For detailed usage configurations of our supported adapters, please see the **[Supported Adapters](index.md)** section.
111-
112-
Explore the `packages/adapters/` directory for examples:
113-
114-
* `postgres`: Standard implementation using `sqlalchemy`.
115-
* `sqlite`: Simple, file-based.
116-
* `mssql` / `mysql`: Standard enterprise drivers.
117-
118-
## Next Steps
119-
120-
Check out the [Postgres Adapter Source Code](https://github.com/nadeem4/nl2sql/tree/main/packages/adapters/postgres) for a complete, production-grade example.

docs/adapters/index.md

Lines changed: 7 additions & 0 deletions
@@ -13,6 +13,13 @@ We provide first-class support for the following SQL databases via SQLAlchemy.
1313
| **[Microsoft SQL Server](mssql.md)** | Enterprise support via `pyodbc` and `T-SQL` dialect. | 🟡 Beta |
1414
| **[SQLite](sqlite.md)** | File-based local development. | 🟢 Stable |
1515

16+
## Core Libraries
17+
18+
For developers building their own adapters, we provide detailed reference documentation for our core SDKs.
19+
20+
* **[Adapter SDK Reference](sdk.md)**: The core interface (`DatasourceAdapter`) that all adapters must implement.
21+
* **[SQLAlchemy Adapter Reference](sqlalchemy.md)**: The helper base class (`BaseSQLAlchemyAdapter`) for building SQL-based adapters.
22+
1623
## Missing your database?
1724

1825
Can't find what you need? Check out the **[Building Adapters](development.md)** guide to see how to implement your own.

docs/adapters/mssql.md

Lines changed: 7 additions & 2 deletions
@@ -2,8 +2,7 @@
22

33
Support for SQL Server 2017+ and Azure SQL.
44

5-
!!! info "Implementation"
6-
This adapter extends `BaseSQLAlchemyAdapter` but provides specialized `dry_run` logic using `SET NOEXEC ON` to safely validate T-SQL.
5+
This adapter extends `BaseSQLAlchemyAdapter` but provides specialized `dry_run` logic using `SET NOEXEC ON` to safely validate T-SQL.
76

87
## Configuration
98

@@ -35,6 +34,12 @@ connection:
3534
| **Dry Run** | `SET NOEXEC ON` | Validates syntax without execution. |
3635
| **Costing** | `SET SHOWPLAN_XML ON` | Parses XML for `StatementSubTreeCost`. |
3736

37+
### Optimization Details
38+
39+
* **Dry Run**: Uses `SET NOEXEC ON`. This is a native T-SQL session setting that compiles the query but ensures it is **not executed**. This is extremely safe and accurate for validation.
40+
* **Explain**: Uses `SET SHOWPLAN_XML ON` to retrieve the execution plan in XML format.
41+
* **Cost Estimate**: Parses the XML plan to find `StatementSubTreeCost` (estimated cost) and `StatementEstRows` (estimated rows).
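The cost-estimate step above can be sketched roughly as follows. This is a minimal illustration rather than the adapter's actual code: the function name `parse_showplan` and the trimmed sample plan are assumptions made for the example, while the XML namespace and the `StatementSubTreeCost` / `StatementEstRows` attributes are those SQL Server emits in showplan XML.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Namespace used by SQL Server showplan XML documents.
SHOWPLAN_NS = {"sp": "http://schemas.microsoft.com/sqlserver/2004/07/showplan"}

# Trimmed, illustrative plan; real output contains many more elements.
SAMPLE_PLAN = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.5">
  <BatchSequence><Batch><Statements>
    <StmtSimple StatementText="SELECT * FROM orders"
                StatementSubTreeCost="0.0032831"
                StatementEstRows="120" />
  </Statements></Batch></BatchSequence>
</ShowPlanXML>"""


def parse_showplan(plan_xml: str) -> Tuple[float, int]:
    """Hypothetical helper: extract (estimated_cost, estimated_rows)."""
    root = ET.fromstring(plan_xml)
    # The statement-level estimates live on the first StmtSimple element.
    stmt = root.find(".//sp:StmtSimple", SHOWPLAN_NS)
    if stmt is None:
        raise ValueError("no StmtSimple element found in plan")
    cost = float(stmt.get("StatementSubTreeCost", "0"))
    # Row estimates may be fractional, so parse as float before truncating.
    rows = int(float(stmt.get("StatementEstRows", "0")))
    return cost, rows


cost, rows = parse_showplan(SAMPLE_PLAN)
```

Parsing with an explicit namespace map avoids the common mistake of searching for a bare `StmtSimple` tag, which matches nothing in a namespaced document.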
42+
3843
## Requirements
3944

4045
You must have the MS ODBC Driver installed in your Docker image or local environment.

docs/adapters/mysql.md

Lines changed: 6 additions & 0 deletions
@@ -36,6 +36,12 @@ connection:
3636
| **Costing** | `EXPLAIN FORMAT=JSON` | Extracts `query_cost`. |
3737
| **Stats** | `SELECT count(*), min(), max()` | Standard aggregation. |
3838

39+
### Optimization Details
40+
41+
* **Dry Run**: Uses a transaction rollback strategy. It starts a transaction (`BEGIN`), executes the query, and immediately rolls back (`ROLLBACK`). **Note**: The query *is* executed, and only its effects are reversed. Statements that cause an implicit commit in MySQL (such as DDL) cannot be undone this way.
42+
* **Explain**: Uses `EXPLAIN FORMAT=JSON {sql}` to get the execution plan.
43+
* **Cost Estimate**: Parses the JSON output to extract `query_cost`. MySQL does not reliably provide a global "estimated rows" count for complex queries, so this is often returned as 0.
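As a rough sketch of the cost extraction described above, the snippet below parses an `EXPLAIN FORMAT=JSON` document and reads `query_cost`. The helper name `parse_mysql_cost` and the sample payload are illustrative assumptions, not the adapter's real code. MySQL reports `query_cost` as a string, so it is converted explicitly.

```python
import json
from typing import Tuple

# Trimmed, illustrative EXPLAIN FORMAT=JSON output.
SAMPLE_EXPLAIN = """{
  "query_block": {
    "select_id": 1,
    "cost_info": {"query_cost": "12.50"},
    "table": {"table_name": "orders", "access_type": "ALL"}
  }
}"""


def parse_mysql_cost(explain_json: str) -> Tuple[float, int]:
    """Hypothetical helper: return (estimated_cost, estimated_rows)."""
    doc = json.loads(explain_json)
    cost_info = doc.get("query_block", {}).get("cost_info", {})
    # query_cost arrives as a string such as "12.50".
    cost = float(cost_info.get("query_cost", 0.0))
    # No reliable global row estimate is available, so 0 is returned,
    # matching the behaviour described in the bullet above.
    return cost, 0


estimate = parse_mysql_cost(SAMPLE_EXPLAIN)
```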
44+
3945
## Limitations
4046

4147
* **Row Estimation**: MySQL's `EXPLAIN` does not always provide a reliable "Total Rows" estimate for complex joins compared to Postgres.

docs/adapters/postgres.md

Lines changed: 11 additions & 2 deletions
@@ -2,8 +2,7 @@
22

33
The Postgres adapter is the **Gold Standard** adapter for the platform. It supports the full set of optimization features including `EXPLAIN`-based dry runs and cost estimation.
44

5-
!!! info "Implementation"
6-
This adapter extends `BaseSQLAlchemyAdapter`, leveraging automatic schema reflection and statistics gathering.
5+
This adapter extends `BaseSQLAlchemyAdapter`, leveraging automatic schema reflection and statistics gathering.
76

87
## Configuration
98

@@ -36,6 +35,16 @@ connection:
3635
| **Costing** | `EXPLAIN (FORMAT JSON) {sql}` | Returns "Total Cost" and "Plan Rows". |
3736
| **Stats** | Optimized Queries | Fetches `null_perc`, `distinct`, `min/max`. |
3837

38+
### Optimization Details
39+
40+
The Postgres adapter leverages native `EXPLAIN` capabilities for robust validation and estimation:
41+
42+
* **Dry Run**: Implemented via `EXPLAIN {sql}`. This validates the SQL syntax and ensures that all tables/columns exist without actually executing the query.
43+
* **Explain**: Uses `EXPLAIN (FORMAT JSON) {sql}` to retrieve the full query execution plan in structured JSON format.
44+
* **Cost Estimate**: Uses the same `EXPLAIN (FORMAT JSON) {sql}` command. It parses the root `Plan` object to extract:
45+
* `Total Cost`: Used as the query cost proxy.
46+
* `Plan Rows`: Used as the estimated result size.
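A minimal sketch of the parsing step described above, assuming the plan arrives as a JSON string (some drivers hand back an already-decoded list instead). The function name `parse_postgres_plan` and the sample plan are hypothetical. The top-level array and the `Plan`, `Total Cost`, and `Plan Rows` keys follow the shape of Postgres's JSON explain output.

```python
import json
from typing import Tuple

# Illustrative output of EXPLAIN (FORMAT JSON) for a simple sequential scan.
SAMPLE_PLAN = """[
  {
    "Plan": {
      "Node Type": "Seq Scan",
      "Relation Name": "orders",
      "Startup Cost": 0.00,
      "Total Cost": 22.70,
      "Plan Rows": 1270,
      "Plan Width": 36
    }
  }
]"""


def parse_postgres_plan(explain_json: str) -> Tuple[float, int]:
    """Hypothetical helper: return (estimated_cost, estimated_rows)."""
    doc = json.loads(explain_json)
    # The output is a one-element array whose object holds the root Plan node.
    root_plan = doc[0]["Plan"]
    return float(root_plan["Total Cost"]), int(root_plan["Plan Rows"])


estimate = parse_postgres_plan(SAMPLE_PLAN)
```

Only the root node is read because its `Total Cost` already includes the cost of all child nodes.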
47+
3948
## Troubleshooting
4049

4150
### SSL Verification

docs/adapters/sdk.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
# Adapter SDK Reference
2+
3+
The **Adapter SDK** (`nl2sql-adapter-sdk`) defines the core contract that all datasources must implement.
4+
5+
## Interface: `DatasourceAdapter`
6+
7+
All adapters must inherit from `nl2sql_adapter_sdk.interfaces.DatasourceAdapter`.
8+
9+
```python
10+
from nl2sql_adapter_sdk import DatasourceAdapter
11+
```
12+
13+
### Mandatory Properties
14+
15+
| Property | Type | Description |
16+
| :--- | :--- | :--- |
17+
| `datasource_id` | `str` | Unique identifier (e.g., "production_db"). |
18+
| `row_limit` | `int` | **Safety Breaker**. Must return a safe limit (e.g., 1000) to prevent OOM errors. |
19+
| `max_bytes` | `int` | **Safety Breaker**. Recommended limit for network payloads. |
20+
21+
### Mandatory Methods
22+
23+
#### `fetch_schema()`
24+
25+
Introspects the datasource and returns its schema.
26+
27+
* **Returns**: `SchemaMetadata` containing tables, columns, PKs, FKs.
28+
* **Requirement**: Must populate `col.statistics` (samples, min/max) for the validation logic to work effectively.
29+
30+
#### `execute(sql: str)`
31+
32+
Executes a query and returns results.
33+
34+
* **Args**: `sql` (str) - The SQL query to run.
35+
* **Returns**: `QueryResult` with `rows` (list of dicts) and `columns` (list of names).
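To illustrate the result shape described above together with the `row_limit` safety breaker, here is a hedged sketch using the standard-library `sqlite3` driver as a stand-in datasource. The `QueryResult` dataclass below is a local placeholder with the two documented fields, not the SDK's real class, and `run_query` is a hypothetical helper name.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class QueryResult:
    """Local stand-in for the SDK type; only the documented fields."""
    columns: List[str] = field(default_factory=list)
    rows: List[Dict[str, Any]] = field(default_factory=list)


def run_query(conn: sqlite3.Connection, sql: str, row_limit: int = 1000) -> QueryResult:
    cursor = conn.execute(sql)
    # Column names come from the DB-API cursor description.
    columns = [col[0] for col in cursor.description]
    # fetchmany caps how many rows are pulled into memory.
    fetched = cursor.fetchmany(row_limit)
    rows = [dict(zip(columns, row)) for row in fetched]
    return QueryResult(columns=columns, rows=rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"n{i}") for i in range(5)])
result = run_query(conn, "SELECT id, name FROM t ORDER BY id", row_limit=3)
```

Capping at the cursor level means an unbounded `SELECT` cannot exhaust memory even if the generated SQL lacks a `LIMIT` clause.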
36+
37+
#### `dry_run(sql: str)`
38+
39+
Validates SQL without executing it, or by executing it inside a transaction that is always rolled back.
40+
41+
* **Args**: `sql` (str)
42+
* **Returns**: `DryRunResult(is_valid=bool, error_message=str)`
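The rollback variant of a dry run can be sketched as below. This uses `sqlite3` purely as a convenient stand-in database. The `DryRunResult` dataclass is a local placeholder mirroring the documented fields, and the standalone `dry_run` function is an assumption for illustration, not the SDK's implementation.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class DryRunResult:
    """Local stand-in mirroring the documented fields."""
    is_valid: bool
    error_message: Optional[str] = None


def dry_run(conn: sqlite3.Connection, sql: str) -> DryRunResult:
    conn.execute("BEGIN")
    try:
        conn.execute(sql)
        return DryRunResult(is_valid=True)
    except sqlite3.Error as exc:
        return DryRunResult(is_valid=False, error_message=str(exc))
    finally:
        # Always undo the statement, whether it succeeded or failed.
        if conn.in_transaction:
            conn.execute("ROLLBACK")


# isolation_level=None lets us manage BEGIN/ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER)")
ok = dry_run(conn, "INSERT INTO t VALUES (1)")
bad = dry_run(conn, "SELECT * FROM missing_table")
remaining = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

The `finally` block is the important part of the design: the rollback must run on every path so that a validated statement never leaves changes behind.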
43+
44+
### Optional Methods
45+
46+
#### `explain(sql: str)`
47+
48+
Returns the execution plan.
49+
50+
* **Returns**: `QueryPlan(plan_text=str)`
51+
52+
#### `cost_estimate(sql: str)`
53+
54+
Returns cost/row estimates for the Physical Validator.
55+
56+
* **Returns**: `CostEstimate(estimated_cost=float, estimated_rows=int)`
57+
58+
---
59+
60+
## Compliance Testing
61+
62+
The SDK provides a compliance test suite. **All Adapters MUST pass this suite.**
63+
64+
It verifies:
65+
66+
* Schema Introspection (PKs/FKs detected?)
67+
* Type Mapping (Date -> Python Date, Numeric -> Python Float)
68+
* Error Handling (Bad SQL -> AdapterError)
69+
70+
### Running Tests
71+
72+
```python
73+
# tests/test_my_adapter.py
74+
from nl2sql_adapter_sdk.testing import BaseAdapterTest
75+
from my_adapter import MyAdapter
76+
import pytest
77+
78+
class TestMyAdapter(BaseAdapterTest):
79+
@pytest.fixture
80+
def adapter(self):
81+
return MyAdapter(...)
82+
```

docs/adapters/sqlalchemy.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
# SQLAlchemy Adapter Reference
2+
3+
The **SQLAlchemy Adapter** (`nl2sql-adapter-sqlalchemy`) provides a helper base class for building adapters for any SQL database supported by SQLAlchemy.
4+
5+
## Base Class: `BaseSQLAlchemyAdapter`
6+
7+
Constructs a robust adapter by wrapping standard SQLAlchemy components.
8+
9+
```python
10+
from nl2sql_sqlalchemy_adapter import BaseSQLAlchemyAdapter
11+
```
12+
13+
### Features
14+
15+
| Feature | Description |
16+
| :--- | :--- |
17+
| **Automatic Schema** | Uses `sqlalchemy.inspect` to reflect tables, columns, and foreign keys automatically. |
18+
| **Automatic Stats** | Runs optimized generic SQL queries to fetch `min`, `max`, `null_percentage`, and `distinct_count`. |
19+
| **Connection Pooling** | Manages engine lifecycle and connection pools. |
20+
| **Transaction Safety** | Implements generic `dry_run` using transaction rollbacks. |
21+
22+
### Required Overrides
23+
24+
#### `construct_uri(args: Dict[str, Any]) -> str`
25+
26+
Converts a configuration dictionary into a SQLAlchemy connection string.
27+
28+
* **Args**: `args` - The `connection` dictionary from `datasources.yaml`.
29+
* **Returns**: A valid URL (e.g., `postgresql://...`).
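One detail worth sketching: credentials containing characters such as `@` or `/` will corrupt a URL built by plain string formatting. The standalone function below shows one way to escape them with the standard library. It is an illustration under assumptions: the `port` key and its default are hypothetical, and a real override would be a method on your adapter subclass.

```python
from typing import Any, Dict
from urllib.parse import quote_plus


def construct_uri(args: Dict[str, Any]) -> str:
    # Percent-encode credentials so reserved characters survive URL parsing.
    user = quote_plus(str(args["user"]))
    password = quote_plus(str(args["password"]))
    host = args["host"]
    port = args.get("port", 5432)  # assumed optional key for this sketch
    database = args["database"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


uri = construct_uri(
    {"user": "app", "password": "p@ss/word", "host": "db.internal", "database": "sales"}
)
```

Indexing required keys with `args["..."]` rather than `args.get(...)` makes a missing setting fail fast with a `KeyError` instead of producing a URL containing the literal text `None`.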
30+
31+
### Optional Overrides
32+
33+
#### `connect()`
34+
35+
Override to provide custom connection arguments (e.g., timeouts, isolation levels).
36+
37+
#### `get_dialect() -> str`
38+
39+
Returns the logical dialect name. Defaults to the engine driver name.
40+
41+
#### `explain(sql: str)` / `cost_estimate(sql: str)`
42+
43+
The base class provides stubs. Override these to implement database-specific optimization logic (e.g., `EXPLAIN (FORMAT JSON)` on Postgres). Avoid `EXPLAIN ANALYZE` here, because it actually executes the query.

docs/adapters/sqlite.md

Lines changed: 6 additions & 0 deletions
@@ -32,6 +32,12 @@ connection:
3232
| **Dry Run** | `EXPLAIN QUERY PLAN` | Validates parsing (rudimentary). |
3333
| **Costing** | Stubbed | Returns default cost=1.0. |
3434

35+
### Optimization Details
36+
37+
* **Dry Run**: Uses `EXPLAIN QUERY PLAN {sql}`. If this command succeeds, the SQL syntax is valid.
38+
* **Explain**: Currently stubbed (returns a simple message) as SQLite's explain output is not in a standardized, easily parsable format like JSON or XML.
39+
* **Cost Estimate**: Stubbed. Returns a fixed cost of `1.0` and `10` estimated rows, as SQLite does not expose cost metrics in a readily usable form.
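The dry-run behaviour above is easy to reproduce with the standard-library `sqlite3` module. The sketch below is illustrative only: `sqlite_dry_run` is a hypothetical helper name, and the tuple it returns is a simplification of whatever result type the adapter actually uses.

```python
import sqlite3
from typing import Optional, Tuple


def sqlite_dry_run(conn: sqlite3.Connection, sql: str) -> Tuple[bool, Optional[str]]:
    """Return (is_valid, error_message) without running the statement."""
    try:
        # EXPLAIN QUERY PLAN compiles the statement but does not execute it.
        conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
        return True, None
    except sqlite3.Error as exc:
        return False, str(exc)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
valid = sqlite_dry_run(conn, "SELECT name FROM users WHERE id = 1")
invalid = sqlite_dry_run(conn, "SELEC name FROM users")
```

Because the statement is compiled against the live schema, this also catches references to missing tables or columns, not just syntax errors.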
40+
3541
## Hints
3642

3743
* **Concurrency**: SQLite allows only one writer at a time, so it performs poorly under concurrent load. Use for **Lite Mode** or single-user testing only.
