nadeem4
diff --git a/README.md b/README.md
Lines changed: 63 additions & 32 deletions
diff --git a/docs/adapters/architecture.md b/docs/adapters/architecture.md
Lines changed: 52 additions & 0 deletions
diff --git a/docs/adr/adr-001-sandboxed-execution.md b/docs/adr/adr-001-sandboxed-execution.md
Lines changed: 34 additions & 0 deletions
diff --git a/docs/adr/adr-002-circuit-breakers.md b/docs/adr/adr-002-circuit-breakers.md
Lines changed: 35 additions & 0 deletions
diff --git a/docs/adr/index.md b/docs/adr/index.md
Lines changed: 6 additions & 0 deletions
diff --git a/docs/agents/architecture.md b/docs/agents/architecture.md
Lines changed: 38 additions & 0 deletions
@@ -1,58 +1,88 @@
-# Enterprise NL2SQL Engine
+# NL2SQL Engine
 
-> **A Production-Grade Natural Language to SQL Engine built on the principles of Zero Trust and Deterministic Execution.**
+> **Production-grade Natural Language → SQL runtime with deterministic orchestration.**
 
-This platform treats "Text-to-SQL" not as a prompt engineering problem, but as a **Distributed Systems** problem. It replaces fragile one-shot generation with a robust, compiled pipeline that bridges the gap between Unstructured Intention (User Language) and Structured Execution (SQL Databases).
+NL2SQL treats text-to-SQL as a **distributed systems** problem. The engine compiles a user query into a validated plan, executes via adapters, and aggregates results through a graph-based pipeline.
 
 ---
 
+## 🧭 What you get
+
+- Graph-based orchestration (`LangGraph`) with explicit state (`GraphState`)
+- Deterministic planning and validation before SQL generation
+- Adapter-based execution with sandbox isolation
+- Observability hooks (metrics, logs, audit events)
+
 ## 🏗️ System Topology
 
-The architecture is composed of three distinct planes, ensuring separation of concerns and failure isolation.
+The runtime is organized around a LangGraph orchestration pipeline and supporting registries. It is designed for fault isolation and deterministic execution.
+
+```mermaid
+flowchart TD
+    User[User Query] --> Resolver[DatasourceResolverNode]
+    Resolver --> Decomposer[DecomposerNode]
+    Decomposer --> Planner[GlobalPlannerNode]
+    Planner --> Router[Layer Router]
+
+    subgraph SQLAgent["SQL Agent Subgraph"]
+        Schema[SchemaRetrieverNode] --> AST[ASTPlannerNode]
+        AST -->|ok| Logical[LogicalValidatorNode]
+        AST -->|retry| Retry[retry_node]
+        Logical -->|ok| Generator[GeneratorNode]
+        Logical -->|retry| Retry
+        Generator --> Executor[ExecutorNode]
+        Retry --> Refiner[RefinerNode]
+        Refiner --> AST
+    end
+
+    Router --> Schema
+    Executor --> Router
+    Router --> Aggregator[EngineAggregatorNode]
+    Aggregator --> Synthesizer[AnswerSynthesizerNode]
+```
 
 ### 1. The Control Plane (The Graph)
 
 **Responsibility**: Reasoning, Planning, and Orchestration.
 
-* **Agentic Graph**: Implemented as a Directed Cyclic Graph (LangGraph) to enable "Refinement Loops". If a plan fails validation, the system self-corrects.
-* **State Management**: Deterministic state transitions ensure auditability and reproducibility of every decision.
+* **Agentic Graph**: Implemented as a Directed Cyclic Graph (LangGraph) to enable refinement loops. If a plan fails validation, the system self-corrects.
+* **State Management**: Shared `GraphState` ensures auditability and reproducibility of every decision.
 
 ### 2. The Security Plane (The Firewall)
 
 **Responsibility**: Invariants Enforcement.
 
-* **Valid-by-Construction**: The LLM *never* executes SQL directly. It generates an **Abstract Syntax Tree (AST)**.
-* **Static Analysis**: The [Validator Node](docs/core/nodes.md#4-logical-validator) enforces **Row-Level Security (RLS)** and type safety on the AST *before* compilation.
-* **Intent Classification**: Upstream detection of adversarial prompts (Jailbreaks/Injections).
+* **Valid-by-Construction**: The LLM generates an **Abstract Syntax Tree (AST)** rather than executing SQL.
+* **Static Analysis**: The [Logical Validator](docs/agents/nodes.md) enforces RBAC and schema constraints before SQL generation.
 
 ### 3. The Data Plane (The Sandbox)
 
 **Responsibility**: Semantic Search and Execution.
 
-* **Blast Radius Isolation**: SQL Drivers (ODBC/C-Ext) run in a dedicated **[Sandboxed Process Pool](docs/architecture/decisions/ADR-001_sandboxed_execution.md)**. A segfault in a driver kills a disposable worker, not the Agent.
-* **Partitioned Retrieval**: The [Orchestrator](docs/core/indexing.md) uses Partitioned MMR to inject only relevant schema context, preventing context window overflow.
+* **Blast Radius Isolation**: SQL drivers run in a dedicated **[Sandboxed Process Pool](docs/adr/adr-001-sandboxed-execution.md)**. A segfault in a driver kills a disposable worker, not the Agent.
+* **Partitioned Retrieval**: The [Schema Store + Retrieval](docs/schema/store.md) flow injects relevant schema context, preventing context window overflow.
 
 ### 4. The Reliability Plane (The Guard)
 
 **Responsibility**: Fault Tolerance and Stability.
 
-* **Layered Defense**: A combination of **[Retries, Circuit Breakers, and Sandboxing](docs/core/reliability.md)** ensures the system stays up even when LLMs or Databases go down.
+* **Layered Defense**: A combination of **[Circuit Breakers](docs/observability/error-handling.md)** and **[Sandboxing](docs/execution/sandbox.md)** keeps the system stable during outages.
 * **Fail-Fast**: We stop processing immediately if a dependency is unresponsive, preserving resources.
 
 ### 5. The Observability Plane (The Watchtower)
 
 **Responsibility**: Visibility, Forensics, and Compliance.
 
-* **Full-Stack Telemetry**: Native [OpenTelemetry](docs/ops/observability.md) integration provides distributed tracing (Jaeger) and metrics (Prometheus) for every node execution.
-* **Forensic Audit Logs**: A tamper-evident, persistent [Audit Log](docs/ops/observability.md#3-persistent-audit-log) records every AI decision (Prompt/Response/Reasoning) for compliance and debugging.
+* **Full-Stack Telemetry**: Native [OpenTelemetry](docs/observability/stack.md) integration provides distributed tracing (Jaeger) and metrics (Prometheus) for every node execution.
+* **Forensic Audit Logs**: A persistent [Audit Log](docs/observability/stack.md) records AI decisions for compliance and debugging.
 
 ---
 
 ## 📐 Architectural Invariants
 
 | Invariant | Rationale | Mechanism |
 | :--- | :--- | :--- |
-| **No Unvalidated SQL** | Prevent Hullucinations & Data Leaks | All plans pass through `LogicalValidator` (AST) + `PhysicalValidator` (Dry Run) before execution. |
+| **No Unvalidated SQL** | Prevent hallucinations & data leaks | All plans pass through `LogicalValidator` (AST). `PhysicalValidator` exists but is not wired into the default SQL subgraph. |
 | **Zero Shared State** | Crash Safety | Execution happens in isolated processes; no shared memory with the Control Plane. |
 | **Fail-Fast** | Reliability | Circuit Breakers and Strict Timeouts prevent cascading failures (Retry Storms). |
 | **Determinism** | Debuggability | Temperature-0 generation + Strict Typing (Pydantic) for all LLM outputs. |
@@ -64,7 +94,8 @@ The architecture is composed of three distinct planes, ensuring separation of co
 ### Prerequisites
 
 * Python 3.10+
-* Docker (Optional, for full integration environment)
+* A configured datasource (`configs/datasources.yaml`)
+* A configured LLM (`configs/llm.yaml`)
 
 ### 1. Installation
 
@@ -76,30 +107,31 @@ cd nl2sql
 python -m venv venv
 source venv/bin/activate
 
-# Install Core Engine & CLI
+# Install core engine and adapter SDK
 pip install -e packages/core
-pip install -e packages/cli
 pip install -e packages/adapter-sdk
 ```
 
-### 2. Run Demo (Lite Mode)
+### 2. Run a query (Python API)
 
-Boot the engine with an in-memory SQLite database (No Docker required).
+```python
+from nl2sql.context import NL2SQLContext
+from nl2sql.pipeline.runtime import run_with_graph
 
-```bash
-nl2sql setup --demo
-```
+ctx = NL2SQLContext()
+result = run_with_graph(ctx, "Top 5 customers by revenue last quarter?")
 
----
+print(result.get("final_answer"))
+```
 
-## 📚 Technical Documentation
+## 📚 Documentation
 
-* **[System Architecture](docs/core/architecture.md)**: Deep dive into the Control, Security, and Data planes.
-* **[Component Reference](docs/core/nodes.md)**: Detailed specs for Planner, Validator, Executor, etc.
-* **[Security Model](docs/safety/security.md)**: Defense-in-depth strategy against prompt injection and unauthorized access.
-* **[Security Model](docs/safety/security.md)**: Defense-in-depth strategy against prompt injection and unauthorized access.
-* **[Reliability & Fault Tolerance](docs/core/reliability.md)**: Guide to Circuit Breakers, Sandbox isolation, and Recovery strategies.
-* **[Observability & Operations](docs/ops/observability.md)**: Configuring OpenTelemetry, Logging, and Audit Trails.
+- **[System Architecture](docs/architecture/high-level.md)**: runtime topology and core flows
+- **[Agent Nodes](docs/agents/nodes.md)**: node-by-node specs and responsibilities
+- **[Schema Store + Retrieval](docs/schema/store.md)**: schema snapshots and vector retrieval
+- **[Execution Sandbox](docs/execution/sandbox.md)**: process isolation and failures
+- **[Observability](docs/observability/stack.md)**: metrics, logging, audit events
+  
 
 ---
 
@@ -108,7 +140,6 @@ nl2sql setup --demo
 ```text
 packages/
 ├── core/               # The Engine (Graph, State, Logic)
-├── cli/                # Terminal Interface & Ops Tools
 ├── adapter-sdk/        # Interface Contract for new Databases
 └── adapters/           # Official Dialects (Postgres, MSSQL, MySQL)
 configs/                # Runtime Configuration (Policies, Prompts)
 
@@ -0,0 +1,52 @@
+# Plugin / Adapter Architecture
+
+Adapters are discovered via Python entry points (`nl2sql.adapters`) and registered in `DatasourceRegistry`. All adapters implement the `DatasourceAdapterProtocol` and return a standardized `ResultFrame`.
+
+## Discovery and registration
+
+```mermaid
+flowchart TD
+    Config[configs/datasources.yaml] --> Registry[DatasourceRegistry]
+    Registry --> Discovery["discover_adapters()"]
+    Discovery --> EntryPoints["entry_points('nl2sql.adapters')"]
+    EntryPoints --> AdapterClass[Adapter Class]
+    AdapterClass --> AdapterInstance[DatasourceAdapterProtocol instance]
+```
+
+## Core contracts
+
+```mermaid
+classDiagram
+    class DatasourceAdapterProtocol {
+        +capabilities() Set
+        +connect()
+        +fetch_schema_snapshot()
+        +execute(AdapterRequest) ResultFrame
+        +get_dialect() str
+    }
+    class AdapterRequest {
+        +plan_type
+        +payload
+        +parameters
+        +limits
+        +trace_id
+    }
+    class ResultFrame {
+        +success
+        +columns
+        +rows
+        +row_count
+        +error
+    }
+```
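
The class diagram above lists member names but not types. The sketch below is one way the contracts might look in Python; every type annotation, default value, and the toy `StaticAdapter` are assumptions made for illustration, not the definitions in `nl2sql_adapter_sdk`.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol, Set, runtime_checkable


@dataclass
class AdapterRequest:
    # Field names follow the diagram; types and defaults are assumed.
    plan_type: str
    payload: Any
    parameters: dict = field(default_factory=dict)
    limits: dict = field(default_factory=dict)
    trace_id: Optional[str] = None


@dataclass
class ResultFrame:
    success: bool
    columns: list = field(default_factory=list)
    rows: list = field(default_factory=list)
    row_count: int = 0
    error: Optional[str] = None


@runtime_checkable
class DatasourceAdapterProtocol(Protocol):
    def capabilities(self) -> Set[str]: ...
    def connect(self) -> None: ...
    def fetch_schema_snapshot(self) -> Any: ...
    def execute(self, request: AdapterRequest) -> ResultFrame: ...
    def get_dialect(self) -> str: ...


class StaticAdapter:
    """Hypothetical adapter that returns canned rows instead of querying a database."""

    def capabilities(self) -> Set[str]:
        return {"sql"}

    def connect(self) -> None:
        return None

    def fetch_schema_snapshot(self) -> Any:
        return {"tables": []}

    def execute(self, request: AdapterRequest) -> ResultFrame:
        if request.plan_type != "sql":
            # Failures are reported in the frame rather than raised.
            return ResultFrame(success=False, error=f"unsupported plan_type: {request.plan_type}")
        rows = [(1, "alice"), (2, "bob")]
        return ResultFrame(success=True, columns=["id", "name"], rows=rows, row_count=len(rows))

    def get_dialect(self) -> str:
        return "sqlite"
```

Because the protocol is structural, `StaticAdapter` satisfies it without inheriting from it, so third-party adapters would only need to match the method names.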
+
+## Executor integration
+
+Execution nodes resolve the executor via `ExecutorRegistry`, which maps datasource capabilities to executor implementations (e.g., `SqlExecutorService` for SQL).
+
+## Source references
+
+- Adapter protocol and contracts: `packages/adapter-sdk/src/nl2sql_adapter_sdk/protocols.py`, `packages/adapter-sdk/src/nl2sql_adapter_sdk/contracts.py`
+- Adapter discovery: `packages/core/src/nl2sql/datasources/discovery.py`
+- Datasource registry: `packages/core/src/nl2sql/datasources/registry.py`
+- Executor registry: `packages/core/src/nl2sql/execution/executor/registry.py`
@@ -0,0 +1,34 @@
+# ADR-001: Sandboxed Execution and Indexing
+
+## Status
+
+Accepted (implemented in `SandboxManager`).
+
+## Context
+
+SQL drivers and indexing operations can crash or block the main process. The runtime needs isolation between orchestration and execution.
+
+## Decision
+
+Use **ProcessPoolExecutor** pools managed by `SandboxManager`:
+
+- `get_execution_pool()` for latency-sensitive SQL execution.
+- `get_indexing_pool()` for background indexing tasks.
+
+All sandbox calls are wrapped in `execute_in_sandbox()` to standardize timeouts and error handling.
+
+## Consequences
+
+- Worker crashes and timeouts are contained and converted into structured errors.
+- Execution concurrency is bounded by settings (`sandbox_exec_workers`, `sandbox_index_workers`).
+
+```mermaid
+flowchart TD
+    Orchestrator[run_with_graph] --> Sandbox[SandboxManager]
+    Sandbox --> ExecPool[get_execution_pool]
+    Sandbox --> IndexPool[get_indexing_pool]
+```
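
The idea of wrapping pool calls so that timeouts and crashes become structured errors can be sketched as follows. This is an illustration under stated assumptions, not the project's `execute_in_sandbox()`: the names `SandboxOutcome` and `run_in_pool`, the error labels, and the default timeout are all hypothetical.

```python
from concurrent.futures import Executor, TimeoutError as FutureTimeout
from concurrent.futures.process import BrokenProcessPool
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SandboxOutcome:
    ok: bool
    value: Any = None
    error_kind: Optional[str] = None  # "timeout", "worker_crash", or "task_error"
    message: Optional[str] = None


def run_in_pool(pool: Executor, fn: Callable, *args, timeout_s: float = 5.0) -> SandboxOutcome:
    """Submit fn(*args) to the pool and convert failures into a SandboxOutcome."""
    try:
        future = pool.submit(fn, *args)
        value = future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only succeeds if the task has not started running yet.
        future.cancel()
        return SandboxOutcome(False, error_kind="timeout", message=f"no result within {timeout_s}s")
    except BrokenProcessPool as exc:
        # Raised by ProcessPoolExecutor when a worker process dies abruptly.
        return SandboxOutcome(False, error_kind="worker_crash", message=str(exc))
    except Exception as exc:
        # The task itself raised; report it instead of propagating.
        return SandboxOutcome(False, error_kind="task_error", message=f"{type(exc).__name__}: {exc}")
    return SandboxOutcome(True, value=value)
```

With a `ProcessPoolExecutor`, the callable and its arguments must be picklable, and pool creation should sit under an `if __name__ == "__main__":` guard on platforms that spawn workers. A task that times out in a process pool keeps running in its worker until it finishes or the pool is torn down.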
+
+## Source references
+
+- `packages/core/src/nl2sql/common/sandbox.py`
@@ -0,0 +1,35 @@
+# ADR-002: Circuit Breakers for External Dependencies
+
+## Status
+
+Accepted (implemented in `resilience.py`).
+
+## Context
+
+LLM providers, vector stores, and databases can experience outages. Unbounded retries degrade system reliability.
+
+## Decision
+
+Use `pybreaker` circuit breakers for each dependency class:
+
+- `LLM_BREAKER`
+- `VECTOR_BREAKER`
+- `DB_BREAKER`
+
+All breakers are created with `create_breaker()` and emit log events via `ObservabilityListener`.
+
+## Consequences
+
+- Fail-fast behavior when dependencies are down.
+- Retry loops in the SQL agent only apply to retryable errors, not open circuits.
+
+```mermaid
+flowchart TD
+    Call[Dependency Call] --> Breaker["create_breaker()"]
+    Breaker -->|closed| Execute[Execute]
+    Breaker -->|open| FailFast[Fail Fast]
+```
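
The closed/open behaviour in the diagram can be illustrated without `pybreaker`. The sketch below is a deliberately small standard-library breaker, not the library the ADR adopts and not the project's `create_breaker()`; the class names and the `fail_max` / `reset_timeout` defaults are assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class SimpleBreaker:
    def __init__(self, fail_max: int = 3, reset_timeout: float = 30.0, clock=time.monotonic):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # allow one trial call through
        return "open"

    def call(self, fn, *args, **kwargs):
        state = self.state
        if state == "open":
            # Fail fast: the dependency is not touched at all.
            raise CircuitOpenError("circuit is open; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if state == "half-open" or self._failures >= self.fail_max:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller that distinguishes `CircuitOpenError` from ordinary failures can skip its retry loop when the circuit is open, which is the behaviour the Consequences section describes.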
+
+## Source references
+
+- `packages/core/src/nl2sql/common/resilience.py`
@@ -0,0 +1,6 @@
+# Architecture Decision Records
+
+This section captures architectural decisions that are reflected in the current codebase.
+
+- `adr-001-sandboxed-execution.md`
+- `adr-002-circuit-breakers.md`
@@ -0,0 +1,38 @@
+# Agent Architecture
+
+NL2SQL implements agent behavior as **LangGraph subgraphs**. The primary subgraph today is the SQL agent, built by `build_sql_agent_graph()`. Each node is a class with a `__call__` method that consumes and returns Pydantic state models (`SubgraphExecutionState`).
+
+## SQL Agent subgraph
+
+The SQL agent subgraph is built in `nl2sql.pipeline.subgraphs.sql_agent.build_sql_agent_graph`. It orchestrates schema retrieval, planning, validation, generation, execution, and optional refinement.
+
+```mermaid
+flowchart LR
+    Schema[SchemaRetrieverNode] --> AST[ASTPlannerNode]
+    AST -->|ok| Logical[LogicalValidatorNode]
+    AST -->|retry| Retry[retry_node]
+    Logical -->|ok| Generator[GeneratorNode]
+    Logical -->|retry| Retry
+    Generator --> Executor[ExecutorNode]
+    Retry --> Refiner[RefinerNode]
+    Refiner --> AST
+```
+
+### Node responsibilities
+
+- `SchemaRetrieverNode`: retrieves schema context from `VectorStore` and `SchemaStore`.
+- `ASTPlannerNode`: produces a structured plan (AST) for the sub-query.
+- `LogicalValidatorNode`: enforces schema and policy constraints on the AST.
+- `GeneratorNode`: renders SQL from the plan.
+- `ExecutorNode`: dispatches SQL to `ExecutorRegistry` and stores artifacts.
+- `RefinerNode`: refines the plan when validation fails.
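
The plan, validate, refine loop among these nodes can be pictured in plain Python, without LangGraph. Everything in this sketch is hypothetical: the toy state class, the node bodies, the "missing row limit" rule, and `max_retries` stand in for the real nodes only to show how callable node classes and a retry counter fit together. Schema retrieval and execution are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyState:
    sub_query: str
    plan: Optional[dict] = None
    sql: Optional[str] = None
    errors: list = field(default_factory=list)
    retry_count: int = 0


class PlannerNode:
    def __call__(self, state: ToyState) -> ToyState:
        # Toy rule: the first attempt forgets the row limit; later attempts add it.
        limit = 5 if state.retry_count > 0 else None
        state.plan = {"table": "customers", "columns": ["name"], "limit": limit}
        return state


class ValidatorNode:
    def __call__(self, state: ToyState) -> ToyState:
        state.errors = [] if state.plan["limit"] else ["plan has no row limit"]
        return state


class RefinerNode:
    def __call__(self, state: ToyState) -> ToyState:
        state.retry_count += 1  # the planner reads this on the next pass
        return state


class GeneratorNode:
    def __call__(self, state: ToyState) -> ToyState:
        plan = state.plan
        cols = ", ".join(plan["columns"])
        state.sql = f"SELECT {cols} FROM {plan['table']} LIMIT {plan['limit']}"
        return state


def run_subgraph(state: ToyState, max_retries: int = 2) -> ToyState:
    planner, validator = PlannerNode(), ValidatorNode()
    refiner, generator = RefinerNode(), GeneratorNode()
    while True:
        state = validator(planner(state))
        if not state.errors:
            return generator(state)  # "ok" edge: render SQL
        if state.retry_count >= max_retries:
            return state  # give up; errors stay on the state
        state = refiner(state)  # "retry" edge: refine, then plan again
```

Bounding the loop with a retry counter held in state is what keeps a cyclic graph from spinning indefinitely when validation can never succeed.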
+
+## Subgraph execution state
+
+`SubgraphExecutionState` tracks per-subgraph execution details including `sub_query`, `relevant_tables`, planner output, validator output, executor responses, errors, and retry counters.
+
+## Source references
+
+- SQL agent graph: `packages/core/src/nl2sql/pipeline/subgraphs/sql_agent.py`
+- Node classes: `packages/core/src/nl2sql/pipeline/nodes/`
+- Subgraph state: `packages/core/src/nl2sql/pipeline/state.py`