```diff
-# NL2SQL Platform
+# Enterprise NL2SQL Engine

-An enterprise-grade **Natural Language to SQL** engine built on an Agentic Graph Architecture.
+> **A Production-Grade Natural Language to SQL Engine built on the principles of Zero Trust and Deterministic Execution.**

-## 🚀 Overview
+This platform treats "Text-to-SQL" not as a prompt-engineering problem but as a **Distributed Systems** problem. It replaces fragile one-shot generation with a robust, compiled pipeline that bridges the gap between Unstructured Intention (User Language) and Structured Execution (SQL Databases).

-This platform transforms complex natural language questions into safe, optimized, and executable SQL queries across multiple database engines (PostgreSQL, MySQL, MSSQL, SQLite). It uses a **Directed Cyclic Graph** (LangGraph) to orchestrate planning, validation, generation, and self-correction.
+---
```

```diff
-### Key Features
+## 🏗️ System Topology

-* **🛡️ Security First**: Strict AST Validation, **Intent Analysis** (Jailbreak Detection), RBAC Policies, and Read-Only enforcement.
-* **🧠 Agentic Reasoning**: Self-correcting nodes that fix SQL errors automatically.
-* **🔌 Polyglot**: First-class support for Postgres, MySQL, MSSQL, and SQLite.
-* **⚡ Smart Routing**: Decomposes complex queries into sub-queries for multi-datasource environments.
-* **🔄 Reliability**: Built-in **Exponential Backoff** and **Circuit Breakers** to handle transient failures gracefully.
+The architecture is composed of three distinct planes, ensuring separation of concerns and failure isolation.

-## 🏁 Quick Demo
+### 1. The Control Plane (The Graph)

-Explore the platform's capabilities with our interactive setup wizard. You can choose between **Lite Mode** (in-memory, no deps) or **Docker Mode** (real databases).
+**Responsibility**: Reasoning, Planning, and Orchestration.

-### 1. Lite Mode (Fastest) uses SQLite
+* **Agentic Graph**: Implemented as a Directed Cyclic Graph (LangGraph) to enable "Refinement Loops". If a plan fails validation, the system self-corrects.
+* **State Management**: Deterministic state transitions ensure auditability and reproducibility of every decision.
```
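
The Control Plane's "Refinement Loop" can be illustrated without LangGraph itself. Below is a minimal plain-Python sketch, not the project's graph code: `GraphState`, `run_refinement_loop`, and the `plan`/`validate` callables are hypothetical names. A failed validation follows the cycle edge back to planning with the errors kept in state, every node visit is appended to an audit trail, and the attempt bound keeps the cycle from spinning forever.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GraphState:
    question: str
    plan: dict | None = None
    errors: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)  # audit trail of node visits


def run_refinement_loop(
    state: GraphState,
    plan: Callable[[GraphState], dict],
    validate: Callable[[dict], list[str]],
    max_attempts: int = 3,
) -> GraphState:
    """Hypothetical plan -> validate -> (loop back) cycle with a hard attempt bound."""
    for attempt in range(1, max_attempts + 1):
        # Planner node: can read state.errors left by the previous attempt.
        state.plan = plan(state)
        state.history.append(f"plan:{attempt}")
        # Validator node: an empty error list means the plan may proceed.
        state.errors = validate(state.plan)
        state.history.append(f"validate:{attempt}")
        if not state.errors:
            return state
        # Otherwise take the cycle edge back to the planner.
    raise RuntimeError(f"no valid plan after {max_attempts} attempts: {state.errors}")
```

Usage: a toy planner that misspells a table on its first try and fixes it once it sees the validator's feedback.

```python
def toy_plan(s: GraphState) -> dict:
    return {"table": "orders"} if s.errors else {"table": "ordrs"}

def toy_validate(p: dict) -> list[str]:
    return [] if p["table"] == "orders" else [f"unknown table {p['table']}"]

state = run_refinement_loop(GraphState("total sales"), toy_plan, toy_validate)
print(state.plan, state.history)
```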

```diff
-Perfect for a standardized, local environment without needing Docker.
+### 2. The Security Plane (The Firewall)

-```bash
-nl2sql setup --demo
-```
+**Responsibility**: Invariant Enforcement.

-### 2. Docker Mode (Full Fidelity) uses Postgres
+* **Valid-by-Construction**: The LLM *never* executes SQL directly. It generates an **Abstract Syntax Tree (AST)**.
+* **Static Analysis**: The [Validator Node](docs/core/nodes.md#4-logical-validator) enforces **Row-Level Security (RLS)** and type safety on the AST *before* compilation.
+* **Intent Classification**: Upstream detection of adversarial prompts (Jailbreaks/Injections).
```
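
To make "valid-by-construction" concrete, here is a minimal sketch of static analysis over a plan AST before anything is compiled to SQL. The AST shape (`SelectPlan`, `Filter`), the in-code `SCHEMA`, and the `tenant_id` row-level-security rule are assumptions for illustration only; the real Logical Validator's rules are documented in the node reference.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    column: str
    op: str
    value: object


@dataclass(frozen=True)
class SelectPlan:
    """Hypothetical minimal plan AST: one table, projected columns, AND-ed filters."""
    table: str
    columns: tuple[str, ...]
    filters: tuple[Filter, ...] = ()


# Hypothetical schema (column -> Python type) and operator allowlist.
SCHEMA: dict[str, dict[str, type]] = {"orders": {"id": int, "tenant_id": int, "amount": float}}
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def validate(plan: SelectPlan, tenant_id: int) -> list[str]:
    columns = SCHEMA.get(plan.table)
    if columns is None:
        return [f"unknown table: {plan.table}"]
    errors = [f"unknown column: {c}" for c in plan.columns if c not in columns]
    for f in plan.filters:
        if f.column not in columns:
            errors.append(f"unknown column: {f.column}")
        elif f.op not in ALLOWED_OPS:
            errors.append(f"operator not allowed: {f.op}")
        elif not isinstance(f.value, columns[f.column]):
            errors.append(f"type mismatch on {f.column}")
    # Row-level security (assumed rule): every plan must pin the caller's tenant.
    if Filter("tenant_id", "=", tenant_id) not in plan.filters:
        errors.append("missing RLS predicate: tenant_id")
    return errors


def compile_sql(plan: SelectPlan, tenant_id: int) -> tuple[str, list[object]]:
    errors = validate(plan, tenant_id)
    if errors:
        raise ValueError("; ".join(errors))
    # Identifiers and operators are interpolated only because validate() checked
    # them against SCHEMA and ALLOWED_OPS; values stay as bound parameters.
    sql = f"SELECT {', '.join(plan.columns)} FROM {plan.table}"
    if plan.filters:
        sql += " WHERE " + " AND ".join(f"{f.column} {f.op} ?" for f in plan.filters)
    return sql, [f.value for f in plan.filters]
```

The design point is ordering: SQL text only exists after the AST passes every check, so a plan that forgets the tenant predicate (or invents a column) is rejected rather than executed.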

```diff
-Spins up real orchestrator and database containers for a production-like test.
+### 3. The Data Plane (The Sandbox)

-```bash
-nl2sql setup --demo --docker
-```
+**Responsibility**: Semantic Search and Execution.

-## 🛠️ Installation
+* **Blast Radius Isolation**: SQL Drivers (ODBC/C-Ext) run in a dedicated **[Sandboxed Process Pool](docs/architecture/decisions/ADR-001_sandboxed_execution.md)**. A segfault in a driver kills a disposable worker, not the Agent.
+* **Partitioned Retrieval**: The [Orchestrator](docs/core/indexing.md) uses Partitioned MMR (Maximal Marginal Relevance) to inject only relevant schema context, preventing context window overflow.
```
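
The blast-radius idea can be sketched with the standard library's `ProcessPoolExecutor`: when a worker process dies abruptly, the parent receives `BrokenProcessPool` instead of crashing. This is an illustrative sketch, not the project's implementation (ADR-001 records that design). `SandboxedExecutor`, `SandboxCrashError`, and `simulate_driver_crash` are hypothetical, and `os._exit` stands in for a driver segfault.

```python
import multiprocessing
import os
import sqlite3
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


class SandboxCrashError(RuntimeError):
    """Raised when a worker process dies mid-query (e.g. a driver segfault)."""


def execute_sql(sql: str) -> list:
    # Runs inside the worker: the driver (here stdlib sqlite3) is only
    # loaded and called in this disposable process.
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def simulate_driver_crash(_sql: str) -> list:
    # Stand-in for a C-extension segfault: the process vanishes without
    # raising a Python exception.
    os._exit(1)


class SandboxedExecutor:
    def __init__(self, workers: int = 2) -> None:
        self._workers = workers
        # "fork" keeps the sketch runnable as a plain script on Linux; other
        # start methods need an `if __name__ == "__main__":` guard.
        self._ctx = multiprocessing.get_context("fork")
        self._pool = self._new_pool()

    def _new_pool(self) -> ProcessPoolExecutor:
        return ProcessPoolExecutor(max_workers=self._workers, mp_context=self._ctx)

    def run(self, fn, sql: str) -> list:
        try:
            return self._pool.submit(fn, sql).result()
        except BrokenProcessPool as exc:
            # The crash killed a worker, not this (agent) process. Replace the
            # broken pool so subsequent queries keep working.
            self._pool.shutdown(wait=False, cancel_futures=True)
            self._pool = self._new_pool()
            raise SandboxCrashError("worker died during execution") from exc

    def close(self) -> None:
        self._pool.shutdown()
```

After a simulated crash, the caller gets a normal Python exception and the next query runs on a fresh pool.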

```diff
-This is a monorepo. To develop or run the platform from source:
+---
```
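
The README does not spell out the Partitioned MMR used for schema retrieval. The sketch below assumes the standard Maximal Marginal Relevance score, `lam * sim(query, d) - (1 - lam) * max sim(d, s)` over already-selected items `s`, applied independently within each partition (for example, one partition per datasource) so every partition gets its own budget. The function names, the toy 2-D embeddings, and that partitioning scheme are assumptions.

```python
from __future__ import annotations

import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_score(vec: Vector, query: Vector, selected: list[tuple[str, Vector]], lam: float) -> float:
    relevance = cosine(query, vec)
    # Redundancy = similarity to the closest item already chosen.
    redundancy = max((cosine(vec, s) for _, s in selected), default=0.0)
    return lam * relevance - (1 - lam) * redundancy


def mmr(query: Vector, candidates: list[tuple[str, Vector]], k: int, lam: float = 0.5) -> list[str]:
    """Greedy MMR: repeatedly pick the candidate with the best score."""
    selected: list[tuple[str, Vector]] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda item: mmr_score(item[1], query, selected, lam))
        selected.append(best)
        remaining.remove(best)
    return [name for name, _ in selected]


def partitioned_mmr(
    query: Vector,
    partitions: dict[str, list[tuple[str, Vector]]],
    k_per_partition: int,
    lam: float = 0.5,
) -> dict[str, list[str]]:
    # Run MMR separately per partition so one large schema cannot crowd out another.
    return {name: mmr(query, cands, k_per_partition, lam) for name, cands in partitions.items()}
```

With a query of `[1.0, 0.8]` and a `sales_db` partition holding `orders` `[1.0, 0.05]`, a near-duplicate `orders_archive` `[1.0, 0.0]`, and `customers` `[0.0, 1.0]`, pure relevance (`lam=1.0`) picks `orders` and `orders_archive`, while `lam=0.5` swaps the duplicate for `customers`.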

```diff
-### Prerequisites
-
-* Python 3.10+
-* Docker & Docker Compose (optional, for Integration Tests)
+## 📐 Architectural Invariants

-### Setup
+| Invariant | Rationale | Mechanism |
+| :--- | :--- | :--- |
+| **No Unvalidated SQL** | Prevent Hallucinations & Data Leaks | All plans pass through `LogicalValidator` (AST) + `PhysicalValidator` (Dry Run) before execution. |
+| **Zero Shared State** | Crash Safety | Execution happens in isolated processes; no shared memory with the Control Plane. |
+| **Fail-Fast** | Reliability | Circuit Breakers and Strict Timeouts prevent cascading failures (Retry Storms). |
+| **Determinism** | Debuggability | Temperature-0 generation + Strict Typing (Pydantic) for all LLM outputs. |
```
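
The Fail-Fast invariant relies on circuit breakers. The project's own breaker is not shown in this README, so the following is a generic, single-threaded sketch of the pattern (closed, open, half-open). `CircuitBreaker`, `CircuitOpenError`, and the default thresholds are hypothetical; the clock is injectable so the state transitions are easy to test.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"  # cool-down elapsed: let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            # Fail fast: no request reaches the struggling dependency.
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Trip after too many consecutive failures, or re-trip if the trial call fails.
            if state == "half_open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While open, callers get an immediate `CircuitOpenError` rather than piling retries onto a failing database, which is the retry-storm scenario the table calls out.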

```diff
-1. **Clone and Install**:
+---

-   ```bash
-   git clone https://github.com/nadeem4/nl2sql.git
-   cd nl2sql
-
-   # Create virtual environment
-   python -m venv venv
-   source venv/bin/activate  # or .\venv\Scripts\activate on Windows
-
-   # Install Core and CLI
-   pip install -e packages/core
-   pip install -e packages/adapter-sdk
-   pip install -e packages/cli
-   pip install -e packages/adapters/postgres  # Install specific adapters as needed
-   ```
+## 🚀 Quick Start

-2. **Verify Installation**:
+### Prerequisites

-   ```bash
-   nl2sql --help
-   ```
+* Python 3.10+
+* Docker (Optional, for full integration environment)

-## 🏗️ Architecture
+### 1. Installation

-The system is composed of specialized Neural Nodes:
+```bash
+git clone https://github.com/nadeem4/nl2sql.git
+cd nl2sql

-1. **Semantic Analysis**: Intent classification and entity extraction.
-2. **Decomposer (Router)**: Splits complex queries and routes them to the correct datasource.
-3. **Planner**: Generates a database-agnostic Abstract Syntax Tree (AST).
-4. **Validator**: Enforces security policies and logical correctness on the AST.
-5. **Generator**: Compiles AST to dialect-specific SQL.
-6. **Executor**: Runs the query in a sandboxed environment.
-7. **Refiner**: Self-corrects errors by analyzing stack traces and feedback.
-8. **Aggregator**: Synthesizes results from multiple sub-queries.
+# Set up environment
+python -m venv venv
+source venv/bin/activate

-See [Architecture Documentation](docs/core/architecture.md) for details.
+# Install Core Engine & CLI
+pip install -e packages/core
+pip install -e packages/cli
+pip install -e packages/adapter-sdk
+```
```

````diff
-## 📚 Documentation
+### 2. Run Demo (Lite Mode)

-Full documentation is available in the `docs/` directory.
+Boot the engine with an in-memory SQLite database (No Docker required).

 ```bash
-pip install -r requirements-docs.txt
-mkdocs serve
+nl2sql setup --demo
 ```
````

```diff
-## 📂 Repository Structure
+---
+
+## 📚 Technical Documentation

-* `packages/core`: The core graph engine, nodes, and state management.
-* `packages/cli`: Command-line interface tool.
-* `packages/adapter-sdk`: SDK for building custom database adapters.
-* `configs/`: Configuration files (Policies, Datasources).
-* `docs/`: MkDocs source files.
+* **[System Architecture](docs/core/architecture.md)**: Deep dive into the Control, Security, and Data planes.
+* **[Component Reference](docs/core/nodes.md)**: Detailed specs for Planner, Validator, Executor, etc.
+* **[Security Model](docs/safety/security.md)**: Defense-in-depth strategy against prompt injection and unauthorized access.
+* **[ADR-001: Sandboxed Execution](docs/architecture/decisions/ADR-001_sandboxed_execution.md)**: Decision record for the Process Pool architecture.
+
+---
+
+## 📦 Repository Structure
+
+```text
+packages/
+├── core/          # The Engine (Graph, State, Logic)
+├── cli/           # Terminal Interface & Ops Tools
+├── adapter-sdk/   # Interface Contract for new Databases
+└── adapters/      # Official Dialects (Postgres, MSSQL, MySQL)
+configs/           # Runtime Configuration (Policies, Prompts)
+docs/              # Architecture & Operations Manual
+```
```
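
`adapter-sdk` is described as the interface contract for new databases, but its actual API is not shown in this README. The following is therefore a hypothetical contract expressed as a `typing.Protocol`, plus an implementation backed by the stdlib `sqlite3` driver. `DatasourceAdapter`, `SQLiteAdapter`, `fetch_schema`, and `execute` are illustrative names, not the SDK's.

```python
from __future__ import annotations

import sqlite3
from typing import Any, Protocol, Sequence, runtime_checkable


@runtime_checkable
class DatasourceAdapter(Protocol):
    """Hypothetical adapter contract; not the actual adapter-sdk API."""

    dialect: str

    def fetch_schema(self) -> dict[str, list[str]]:
        """Return {table: [column, ...]} for schema retrieval."""
        ...

    def execute(self, sql: str, params: Sequence[Any] = ()) -> list[tuple]:
        """Run one parameterized statement and return all rows."""
        ...


class SQLiteAdapter:
    """Example implementation backed by the stdlib sqlite3 driver."""

    dialect = "sqlite"

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)

    def fetch_schema(self) -> dict[str, list[str]]:
        tables = [r[0] for r in self._conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # pragma_table_info is SQLite's table-valued form of PRAGMA table_info.
        return {
            t: [r[0] for r in self._conn.execute(
                "SELECT name FROM pragma_table_info(?) ORDER BY cid", (t,))]
            for t in tables
        }

    def execute(self, sql: str, params: Sequence[Any] = ()) -> list[tuple]:
        return self._conn.execute(sql, params).fetchall()
```

A structural `Protocol` means a new dialect only has to provide the right attributes and methods; it does not need to inherit from an SDK base class.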