refactor: reorganize documentation structure and enhance navigation
- Updated mkdocs.yml to streamline the navigation structure, consolidating sections for clarity and ease of access.
- Introduced new documentation files for Getting Started, Architecture, and API Overview, providing comprehensive guides for users.
- Enhanced README.md to better reflect the system's capabilities and architecture, including a clearer description of the NL2SQL engine's features.
- Added detailed specifications for pipeline nodes and agent architecture, improving the overall documentation quality and usability.
README.md: 63 additions & 32 deletions
@@ -1,58 +1,88 @@
1
-
# Enterprise NL2SQL Engine
1
+
# NL2SQL Engine
2
2
3
-
> **A Production-Grade Natural Language to SQL Engine built on the principles of Zero Trust and Deterministic Execution.**
3
+
> **Production-grade Natural Language → SQL runtime with deterministic orchestration.**
4
4
5
-
This platform treats "Text-to-SQL" not as a prompt engineering problem, but as a **Distributed Systems** problem. It replaces fragile one-shot generation with a robust, compiled pipeline that bridges the gap between Unstructured Intention (User Language) and Structured Execution (SQL Databases).
5
+
NL2SQL treats text-to-SQL as a **distributed systems** problem. The engine compiles a user query into a validated plan, executes via adapters, and aggregates results through a graph-based pipeline.
6
6
7
7
---
8
8
9
+
## 🧭 What you get
10
+
11
+
- Graph-based orchestration (`LangGraph`) with explicit state (`GraphState`)
12
+
- Deterministic planning and validation before SQL generation
The architecture is composed of three distinct planes, ensuring separation of concerns and failure isolation.
18
+
The runtime is organized around a LangGraph orchestration pipeline and supporting registries. It is designed for fault isolation and deterministic execution.
**Responsibility**: Reasoning, Planning, and Orchestration.
16
47
17
-
* **Agentic Graph**: Implemented as a Directed Cyclic Graph (LangGraph) to enable "Refinement Loops". If a plan fails validation, the system self-corrects.
18
-
* **State Management**: Deterministic state transitions ensure auditability and reproducibility of every decision.
48
+
* **Agentic Graph**: Implemented as a Directed Cyclic Graph (LangGraph) to enable refinement loops. If a plan fails validation, the system self-corrects.
49
+
* **State Management**: Shared `GraphState` ensures auditability and reproducibility of every decision.
19
50
20
51
### 2. The Security Plane (The Firewall)
21
52
22
53
**Responsibility**: Invariants Enforcement.
23
54
24
-
* **Valid-by-Construction**: The LLM *never* executes SQL directly. It generates an **Abstract Syntax Tree (AST)**.
25
-
* **Static Analysis**: The [Validator Node](docs/core/nodes.md#4-logical-validator) enforces **Row-Level Security (RLS)** and type safety on the AST *before* compilation.
26
-
* **Intent Classification**: Upstream detection of adversarial prompts (Jailbreaks/Injections).
55
+
* **Valid-by-Construction**: The LLM generates an **Abstract Syntax Tree (AST)** rather than executing SQL.
56
+
* **Static Analysis**: The [Logical Validator](docs/agents/nodes.md) enforces RBAC and schema constraints before SQL generation.
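
To make the "validate the plan, not the SQL string" idea concrete, here is a minimal sketch. Everything in it (`SelectPlan`, `SCHEMA`, `ROLE_TABLES`, `validate_plan`) is a hypothetical stand-in invented for illustration; it is not the project's `LogicalValidator`, whose real plan model and policy sources are described in `docs/agents/nodes.md`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class SelectPlan:
    """Hypothetical stand-in for the plan AST emitted by the planner."""
    table: str
    columns: Tuple[str, ...]


@dataclass
class ValidationResult:
    ok: bool
    errors: List[str] = field(default_factory=list)


# Hypothetical schema and role policy, for illustration only.
SCHEMA = {
    "orders": {"id", "customer_id", "total"},
    "salaries": {"employee_id", "amount"},
}
ROLE_TABLES = {"analyst": {"orders"}, "hr": {"orders", "salaries"}}


def validate_plan(plan: SelectPlan, role: str) -> ValidationResult:
    """Check schema and role constraints before any SQL text is generated."""
    errors: List[str] = []
    if plan.table not in SCHEMA:
        errors.append(f"unknown table: {plan.table}")
    else:
        # Role check: the role must be allowed to read the table.
        if plan.table not in ROLE_TABLES.get(role, set()):
            errors.append(f"role {role!r} may not read {plan.table}")
        # Schema check: every referenced column must exist.
        for col in plan.columns:
            if col not in SCHEMA[plan.table]:
                errors.append(f"unknown column: {plan.table}.{col}")
    return ValidationResult(ok=not errors, errors=errors)
```

Because the check runs on a structured plan, a rejected plan never reaches SQL generation, and the error list can feed a refinement loop.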
27
57
28
58
### 3. The Data Plane (The Sandbox)
29
59
30
60
**Responsibility**: Semantic Search and Execution.
31
61
32
-
* **Blast Radius Isolation**: SQL Drivers (ODBC/C-Ext) run in a dedicated **[Sandboxed Process Pool](docs/architecture/decisions/ADR-001_sandboxed_execution.md)**. A segfault in a driver kills a disposable worker, not the Agent.
33
-
* **Partitioned Retrieval**: The [Orchestrator](docs/core/indexing.md) uses Partitioned MMR to inject only relevant schema context, preventing context window overflow.
62
+
* **Blast Radius Isolation**: SQL drivers run in a dedicated **[Sandboxed Process Pool](docs/adr/adr-001-sandboxed-execution.md)**. A segfault in a driver kills a disposable worker, not the Agent.
63
+
* **Partitioned Retrieval**: The [Schema Store + Retrieval](docs/schema/store.md) flow injects relevant schema context, preventing context window overflow.
34
64
35
65
### 4. The Reliability Plane (The Guard)
36
66
37
67
**Responsibility**: Fault Tolerance and Stability.
38
68
39
-
* **Layered Defense**: A combination of **[Retries, Circuit Breakers, and Sandboxing](docs/core/reliability.md)** ensures the system stays up even when LLMs or Databases go down.
69
+
* **Layered Defense**: A combination of **[Circuit Breakers](docs/observability/error-handling.md)** and **[Sandboxing](docs/execution/sandbox.md)** keeps the system stable during outages.
40
70
* **Fail-Fast**: We stop processing immediately if a dependency is unresponsive, preserving resources.
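
The fail-fast behavior of a circuit breaker can be sketched as follows. This is a generic illustration of the pattern, not the engine's implementation; the class name, thresholds, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock  # injectable so the cool-down can be tested
        self._failures = 0
        self._opened_at = None

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Fail fast: do not touch the unhealthy dependency at all.
                raise CircuitOpenError("dependency unavailable; failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the breaker fully
        return result
```

A caller would wrap each LLM or database call in `breaker.call(...)` and treat `CircuitOpenError` as an immediate, cheap failure rather than waiting on a timeout.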
41
71
42
72
### 5. The Observability Plane (The Watchtower)
43
73
44
74
**Responsibility**: Visibility, Forensics, and Compliance.
45
75
46
-
* **Full-Stack Telemetry**: Native [OpenTelemetry](docs/ops/observability.md) integration provides distributed tracing (Jaeger) and metrics (Prometheus) for every node execution.
47
-
* **Forensic Audit Logs**: A tamper-evident, persistent [Audit Log](docs/ops/observability.md#3-persistent-audit-log) records every AI decision (Prompt/Response/Reasoning) for compliance and debugging.
76
+
* **Full-Stack Telemetry**: Native [OpenTelemetry](docs/observability/stack.md) integration provides distributed tracing (Jaeger) and metrics (Prometheus) for every node execution.
77
+
* **Forensic Audit Logs**: A persistent [Audit Log](docs/observability/stack.md) records AI decisions for compliance and debugging.
48
78
49
79
---
50
80
51
81
## 📐 Architectural Invariants
52
82
53
83
| Invariant | Rationale | Mechanism |
54
84
| :--- | :--- | :--- |
55
-
|**No Unvalidated SQL**| Prevent Hullucinations & Data Leaks| All plans pass through `LogicalValidator` (AST) + `PhysicalValidator`(Dry Run) before execution. |
85
+
|**No Unvalidated SQL**| Prevent hallucinations & data leaks | All plans pass through `LogicalValidator` (AST). `PhysicalValidator` exists but is not wired into the default SQL subgraph. |
56
86
|**Zero Shared State**| Crash Safety | Execution happens in isolated processes; no shared memory with the Control Plane. |
Adapters are discovered via Python entry points (`nl2sql.adapters`) and registered in `DatasourceRegistry`. All adapters implement the `DatasourceAdapterProtocol` and return a standardized `ResultFrame`.
Execution nodes resolve the executor via `ExecutorRegistry`, which maps datasource capabilities to executor implementations (e.g., `SqlExecutorService` for SQL).
46
+
47
+
## Source references
48
+
49
+
- Adapter protocol and contracts: `packages/adapter-sdk/src/nl2sql_adapter_sdk/protocols.py`, `packages/adapter-sdk/src/nl2sql_adapter_sdk/contracts.py`
NL2SQL implements agent behavior as **LangGraph subgraphs**. The primary subgraph today is the SQL agent, built by `build_sql_agent_graph()`. Each node is a class with a `__call__` method that consumes and returns Pydantic state models (`SubgraphExecutionState`).
4
+
5
+
## SQL Agent subgraph
6
+
7
+
The SQL agent subgraph is built in `nl2sql.pipeline.subgraphs.sql_agent.build_sql_agent_graph`. It orchestrates schema retrieval, planning, validation, generation, execution, and optional refinement.
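
The "node as a callable class over shared state" pattern can be sketched with the standard library alone. This is an assumption-laden illustration: `SketchState` is a frozen dataclass standing in for the Pydantic `SubgraphExecutionState`, the three node classes are invented for the example, and `run_pipeline` is a plain loop standing in for the LangGraph wiring done by `build_sql_agent_graph()`.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchState:
    """Stand-in for SubgraphExecutionState (the real model is Pydantic)."""
    question: str
    plan: Optional[str] = None
    sql: Optional[str] = None
    errors: Tuple[str, ...] = ()


class PlannerNode:
    def __call__(self, state: SketchState) -> SketchState:
        # Return a new state rather than mutating the input.
        return replace(state, plan=f"plan for: {state.question}")


class ValidatorNode:
    def __call__(self, state: SketchState) -> SketchState:
        if not state.plan:
            return replace(state, errors=state.errors + ("missing plan",))
        return state


class GeneratorNode:
    def __call__(self, state: SketchState) -> SketchState:
        if state.errors:
            return state  # never generate SQL from an invalid plan
        return replace(state, sql="SELECT 1  -- placeholder")


def run_pipeline(state: SketchState, nodes) -> SketchState:
    """Linear stand-in for graph execution: apply each node in order."""
    for node in nodes:
        state = node(state)
    return state
```

For example, `run_pipeline(SketchState("top customers"), [PlannerNode(), ValidatorNode(), GeneratorNode()])` yields a state with `sql` set, while skipping the planner leaves `sql` as `None` and records a validation error.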