feat: expand documentation and architecture details for NL2SQL
- Updated mkdocs.yml to enhance navigation, adding new sections for Indexing and Extensions, and reorganizing existing content for clarity.
- Introduced a new glossary.md file to define core concepts and terminology used throughout the documentation.
- Enhanced getting-started.md with instructions for indexing datasource schemas before query execution.
- Added detailed architecture documents for pipeline, indexing, and various nodes, improving understanding of system components and their interactions.
- Included multiple ADRs to capture architectural decisions related to chunking strategy, schema store design, adapter abstraction, deterministic planning, and artifact storage.
docs/adapters/architecture.md
Lines changed: 30 additions & 8 deletions
@@ -1,6 +1,6 @@
 # Plugin / Adapter Architecture
 
-Adapters are discovered via Python entry points (`nl2sql.adapters`) and registered in `DatasourceRegistry`. All adapters implement the `DatasourceAdapterProtocol` and return a standardized `ResultFrame`.
+Adapters integrate NL2SQL with external datasources. Each adapter implements a **protocol contract**, is discovered via **Python entry points**, and is registered in the `DatasourceRegistry`.
Execution nodes resolve the executor via `ExecutorRegistry`, which maps datasource capabilities to executor implementations (e.g., `SqlExecutorService` for SQL).
+Adapters expose capabilities (e.g., `supports_sql`, `supports_schema_introspection`). These capabilities drive:
+
+- **Subgraph selection** (`resolve_subgraph()` in routing).
The control graph can resolve multiple datasources for a single user query. `DecomposerNode` produces sub-queries scoped to individual datasources. Each sub-query is then routed to a subgraph that matches its adapter capabilities.
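
A minimal sketch of how capability flags could drive subgraph selection for decomposed sub-queries. Only the flag names `supports_sql` and `supports_schema_introspection` and the function name `resolve_subgraph` come from the text above; the `Capabilities` dataclass, the subgraph name `sql_agent`, the datasource IDs, and the function signature are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical capability flags; only the two field names come from the docs."""
    supports_sql: bool = False
    supports_schema_introspection: bool = False


# Hypothetical mapping from a required capability flag to a subgraph name.
SUBGRAPHS = {
    "supports_sql": "sql_agent",
}


def resolve_subgraph(caps: Capabilities) -> str:
    """Return the first subgraph whose required capability the adapter exposes."""
    for flag, subgraph in SUBGRAPHS.items():
        if getattr(caps, flag):
            return subgraph
    raise LookupError("no subgraph matches the adapter's capabilities")


# One adapter per datasource, each with its own capabilities (illustrative IDs).
adapters = {
    "sales_db": Capabilities(supports_sql=True, supports_schema_introspection=True),
    "docs_api": Capabilities(),
}

# Sub-queries as a decomposer might scope them: one datasource each.
sub_queries = [("sales_db", "total revenue by region"), ("docs_api", "refund policy")]

routes = {}
for datasource_id, _question in sub_queries:
    try:
        routes[datasource_id] = resolve_subgraph(adapters[datasource_id])
    except LookupError:
        routes[datasource_id] = None  # no capable subgraph; the caller decides how to fail
```

Keeping the lookup keyed on capability flags rather than on adapter class names means a new adapter can reuse an existing subgraph without any routing change, which appears to be the intent of the capability model described here.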
+
+## Extensibility model
+
+To add a new adapter:
+
+1. Implement `DatasourceAdapterProtocol` (or extend a base adapter).
+2. Publish the adapter class as an `nl2sql.adapters` entry point.
+3. Configure the datasource in `configs/datasources.yaml`.
 
 ## Source references
 
-- Adapter protocol and contracts: `packages/adapter-sdk/src/nl2sql_adapter_sdk/protocols.py`, `packages/adapter-sdk/src/nl2sql_adapter_sdk/contracts.py`
Accepted (implemented in `SqliteSchemaStore` and `InMemorySchemaStore`).
+
+## Context
+
+The system needs an authoritative, versioned view of each datasource schema. Vector indexes may drift or be stale, so planning must reference a canonical schema snapshot.
+
+## Decision
+
+Store schema snapshots with **deterministic fingerprints**:
+
+- `SchemaContract` content is hashed to produce a stable fingerprint.
+- Snapshots are versioned using timestamp + fingerprint prefix.
+- Older versions are evicted beyond a configurable maximum.
+
+Persistent storage is provided by a SQLite-backed schema store, with an in-memory alternative for testing.
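
The three rules in this decision can be sketched with the standard library. The ADR does not say which hash, canonical form, or version format is used, so SHA-256 over sorted-key JSON, the `YYYYMMDDHHMMSS_<8 hex chars>` version string, and the `InMemorySnapshots` class below are all assumptions made for illustration; a plain `dict` stands in for `SchemaContract`.

```python
import hashlib
import json
from collections import OrderedDict
from datetime import datetime, timezone
from typing import Optional


def fingerprint(schema: dict) -> str:
    """Hash a canonical JSON form so key order does not change the result."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemorySnapshots:
    """Toy store for one datasource: dedupes by fingerprint, evicts oldest versions."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self._versions = OrderedDict()  # version id -> schema, oldest first
        self._by_fingerprint = {}       # fingerprint -> version id

    def register(self, schema: dict, now: Optional[datetime] = None) -> str:
        fp = fingerprint(schema)
        if fp in self._by_fingerprint:  # identical content: reuse the version
            return self._by_fingerprint[fp]
        now = now or datetime.now(timezone.utc)
        version = f"{now:%Y%m%d%H%M%S}_{fp[:8]}"  # timestamp + fingerprint prefix
        self._versions[version] = schema
        self._by_fingerprint[fp] = version
        while len(self._versions) > self.max_versions:  # evict oldest first
            _old_version, old_schema = self._versions.popitem(last=False)
            del self._by_fingerprint[fingerprint(old_schema)]
        return version

    def versions(self) -> list:
        return list(self._versions)


store = InMemorySnapshots(max_versions=2)
v1 = store.register({"tables": {"users": ["id", "email"]}})
v1_again = store.register({"tables": {"users": ["id", "email"]}})  # same content
store.register({"tables": {"users": ["id", "email", "name"]}})
store.register({"tables": {"orders": ["id"]}})  # third distinct schema evicts v1
```

Registering identical content twice returns the same version, which is what makes versions "deduplicated"; the eviction loop keeps at most `max_versions` snapshots.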
+
+## Consequences
+
+- Schema versions are stable and deduplicated.
+- Retrieval uses authoritative snapshots even if vector chunks drift.
+- The system can enforce version mismatch policies.
Accepted (implemented in `ArtifactStore` and executor services).
+
+## Context
+
+Query execution results need to be persisted for aggregation and downstream usage. Persisting raw results in memory would be expensive and non-durable for multi-step DAGs.
+
+## Decision
+
+Persist execution results as Parquet artifacts:
+
+- Adapters return `ResultFrame` objects.
+- `SqlExecutorService` writes results to an `ArtifactStore`.
+- Aggregation reads artifacts and applies combine/post operations.
+
+Backends are pluggable (`local`, `s3`, `adls`).
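
A rough sketch of the pluggable-backend idea, assuming a small write/read interface selected by name. The class names `ArtifactStoreSketch` and `LocalStoreSketch`, the `make_store` helper, and the method signatures are hypothetical and are not taken from the real `ArtifactStore`. Parquet encoding is deliberately left out (it would need a library such as `pyarrow`), so the payload below is placeholder bytes rather than a real Parquet file.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ArtifactStoreSketch(ABC):
    """Hypothetical interface; the real base class may differ."""

    @abstractmethod
    def write(self, key: str, payload: bytes) -> str:
        """Persist the payload and return a URI for downstream stages."""

    @abstractmethod
    def read(self, uri: str) -> bytes:
        """Load a previously written payload."""


class LocalStoreSketch(ArtifactStoreSketch):
    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, key: str, payload: bytes) -> str:
        path = self.root / f"{key}.parquet"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(payload)
        return str(path)

    def read(self, uri: str) -> bytes:
        return Path(uri).read_bytes()


# Backend name -> class; "s3" and "adls" stores would be registered the same way.
BACKENDS = {"local": LocalStoreSketch}


def make_store(backend: str, **kwargs) -> ArtifactStoreSketch:
    factory = BACKENDS.get(backend)
    if factory is None:
        raise ValueError(f"unknown artifact backend: {backend!r}")
    return factory(**kwargs)


# Executor side: persist the bytes and hand only the URI to aggregation.
payload = b"placeholder bytes, not real Parquet"
with tempfile.TemporaryDirectory() as tmp:
    store = make_store("local", root=Path(tmp))
    uri = store.write("run1/subquery_0", payload)
    round_trip = store.read(uri)
```

Passing a URI between stages instead of the result itself is what lets aggregation work from persisted artifacts, and it keeps the executor unaware of which backend is configured.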
+
+## Consequences
+
+- Results are durable across pipeline stages.
+- Aggregation operates on persisted artifacts, reducing memory pressure.
+- Backends can be swapped without changing executor logic.
+
+## Source references
+
+- Artifact store base: `packages/core/src/nl2sql/execution/artifacts/base.py`
+- Local store: `packages/core/src/nl2sql/execution/artifacts/local_store.py`