nadeem4
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/docs/nodes/aggregator_node.md b/docs/nodes/aggregator_node.md
Lines changed: 18 additions & 24 deletions
diff --git a/docs/nodes/decomposer_node.md b/docs/nodes/decomposer_node.md
Lines changed: 24 additions & 26 deletions
diff --git a/docs/nodes/direct_sql_node.md b/docs/nodes/direct_sql_node.md
Lines changed: 0 additions & 40 deletions
diff --git a/docs/nodes/executor_node.md b/docs/nodes/executor_node.md
Lines changed: 21 additions & 33 deletions
diff --git a/docs/nodes/generator_node.md b/docs/nodes/generator_node.md
Lines changed: 21 additions & 31 deletions
@@ -20,3 +20,5 @@ data/*.db
 chroma_db
 
 logs
+
+site
@@ -2,46 +2,40 @@
 
 ## Purpose
 
-The `AggregatorNode` is responsible for consolidating results from multiple parallel execution branches (triggered by the `DecomposerNode`) into a single, coherent response. It handles data merging, de-duplication, and formatting (e.g., combining two partial tables into one).
+The `AggregatorNode` combines results from the execution phase and prepares the final response. It implements a "Fast Path" for direct data streaming and a "Slow Path" for LLM-based summarization or answer synthesis.
 
-## Components
+## Class Reference
 
-- **`LLM`**: Used to synthesize the final answer and decide the best presentation format.
-- **`AggregatedResponse`**: Structured output schema.
+- **Class**: `AggregatorNode`
+- **Path**: `packages/core/src/nl2sql/pipeline/nodes/aggregator/node.py`
 
 ## Inputs
 
 The node reads the following fields from `GraphState`:
 
-- `state.intermediate_results`: A list of results collected from all parallel branches (each containing execution data or errors).
-- `state.user_query`: The original global query.
-- `state.errors`: List of errors encountered in the branches (to report partial failures).
+- `state.user_query` (str): The user's question.
+- `state.intermediate_results` (List): Results from the executor(s).
+- `state.output_mode` (str): "data" (Fast Path) or "summary"/"verbose" (Slow Path).
+- `state.errors` (List[PipelineError]): Any errors to include in the summary.
 
 ## Outputs
 
 The node updates the following fields in `GraphState`:
 
-- `state.final_answer`: A markdown-formatted string containing the summary and combined data.
-- `state.reasoning`: Log entry describing the chosen format.
-- `state.errors`: Appends `PipelineError` if aggregation fails.
+- `state.final_answer` (Any): The final text entry or data payload.
+- `state.reasoning` (List[Dict]): Log of which path was taken.
 
 ## Logic Flow
 
 1. **Fast Path Check**:
-    - Checks if `state.response_type` is `TABULAR` or `KPI`.
-    - If true, and there is a single successful result, returns `final_answer=None`.
-    - This signals the Presentation Layer (CLI) to display the raw `ExecutionModel` directly.
-2. **Slow Path (LLM)**:
-    - If `state.response_type` is `SUMMARY` or multiple results exist.
-    - Formats all `intermediate_results` into a single text block.
-    - Invokes the LLM to synthesize an answer (`AggregatedResponse`).
-3. **Formatting**:
-    - Constructs a markdown string combining the summary and content.
+    - Applies when there is exactly one result, no errors, and `output_mode` is "data".
+    - Returns the raw data directly, without invoking the LLM.
+2. **Slow Path (LLM Aggregation)**:
+    - Formats all `intermediate_results` (and errors) into a string.
+    - Prompts the LLM to synthesize an answer to the `user_query` using the provided data.
+    - Formats the LLM output (Table/List/Text).
+    - Returns the generated summary.
 
 ## Error Handling
 
-- **`AGGREGATOR_FAILED`**: If the LLM output is malformed or processing fails.
-
-## Dependencies
-
-- `nl2sql.nodes.aggregator.schemas.AggregatedResponse`
+- **`AGGREGATOR_FAILED`**: If the LLM summarization fails.
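The fast/slow path decision described in the new `AggregatorNode` Logic Flow can be sketched as follows. This is a minimal illustration, not the code at `aggregator/node.py`: the `State` dataclass is a hypothetical stand-in for `GraphState`, the `summarize` callable stands in for the LLM call, and the shape of the `reasoning` entries is assumed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class State:
    # Hypothetical stand-in for GraphState; only the fields the doc names.
    user_query: str
    intermediate_results: List[Any]
    output_mode: str = "data"
    errors: List[str] = field(default_factory=list)


def aggregate(state: State, summarize: Callable[[str], str]) -> Dict[str, Any]:
    """Return the state updates for either the fast or the slow path."""
    single_clean_result = len(state.intermediate_results) == 1 and not state.errors
    if single_clean_result and state.output_mode == "data":
        # Fast path: pass the raw result through without calling the LLM.
        return {
            "final_answer": state.intermediate_results[0],
            "reasoning": [{"node": "aggregator", "content": "fast path"}],
        }

    # Slow path: flatten the question, results, and errors into one prompt.
    lines = [f"Question: {state.user_query}"]
    lines += [f"Result {i}: {r}" for i, r in enumerate(state.intermediate_results, 1)]
    lines += [f"Error: {e}" for e in state.errors]
    return {
        "final_answer": summarize("\n".join(lines)),
        "reasoning": [{"node": "aggregator", "content": "slow path"}],
    }
```

Passing the summarizer in as a callable keeps the path selection testable without a live model; a real node would presumably wrap failures of that call in `AGGREGATOR_FAILED`.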
@@ -2,47 +2,45 @@
 
 ## Purpose
 
-The `DecomposerNode` acts as the **Router** and **Orchestrator** of the pipeline. It parses the canonicalized user query and breaks it down into independent sub-queries, each targeted at a specific datasource. This is crucial for handling multi-datasource requests or complex analytical questions.
+The `DecomposerNode` acts as the entry point and router for the pipeline. It is responsible for analyzing the user's query to determine which datasource(s) should handle the request. For complex requests, it can break the query down into sub-queries (though simple routing is the primary function). It also checks user authorization before proceeding.
 
-## Components
+## Class Reference
 
-- **`LLM`**: Used to perform the decomposition and reasoning.
-- **`OrchestratorVectorStore`**: Provides relevant schema context for the LLM to make informed routing decisions.
-- **`DatasourceRegistry`**: Provides metadata (descriptions) about available data sources.
+- **Class**: `DecomposerNode`
+- **Path**: `packages/core/src/nl2sql/pipeline/nodes/decomposer/node.py`
 
 ## Inputs
 
 The node reads the following fields from `GraphState`:
 
-- `state.semantic_analysis`: The **enriched** query context containing canonical query and synonyms (from SemanticAnalysisNode).
-- `state.selected_datasource_id`: (Optional) If set, the node acts in "Pass-through" mode.
+- `state.user_query` (str): The initial user question.
+- `state.user_context` (Dict): User session data, specifically `allowed_datasources` for authorization.
+- `state.semantic_analysis` (SemanticAnalysisResponse): Used to expand the query with keywords/synonyms for better vector retrieval.
 
 ## Outputs
 
 The node updates the following fields in `GraphState`:
 
-- `state.sub_queries`: A list of `SubQuery` objects, each containing:
-  - `datasource_id`: Target database.
-  - `query`: The specific question for that database.
-  - `candidate_tables`: (Optional) Pre-identified tables.
-- `state.reasoning`: Log entry explaining the decomposition logic.
-- `state.errors`: Appends `PipelineError` if orchestration fails.
+- `state.sub_queries` (List[SubQuery]): A list of routed queries. Each `SubQuery` contains:
+  - `question`: The specific question for the datasource.
+  - `datasource_id`: The ID of the chosen datasource.
+- `state.confidence` (float): The confidence score of the routing decision.
+- `state.reasoning` (List[Dict]): Explanation of why a specific datasource was selected.
+- `state.errors` (List[PipelineError]): `SECURITY_VIOLATION` if the user lacks access.
 
 ## Logic Flow
 
-1. **Direct Execution Check**:
-    - If `state.selected_datasource_id` is already present, it creates a single `SubQuery` targeting that datasource.
-2. **Context Retrieval**:
-    - Uses `state.user_query` + `state.enriched_terms` to query the `VectorStore`.
-3. **LLM Decomposition**:
-    - Prompts the LLM with the query, available datasources, and retrieved schema context.
-    - The LLM generates a plan (`DecomposerResponse`) consisting of one or more sub-queries.
-4. **State Update**: The resulting `sub_queries` are stored in the state, which triggers parallel execution branches.
+1. **Authorization Check**: Verifies if `state.user_context` contains accessible datasources. If not, returns `SECURITY_VIOLATION`.
+2. **Query Expansion**: If `state.semantic_analysis` is present, it augments the query with keywords and synonyms to improve retrieval recall.
+3. **Context Retrieval**:
+    - Queries the `OrchestratorVectorStore` using the expanded query.
+    - Retrieves relevant table schemas and datasource descriptions.
+4. **LLM Routing**:
+    - Uses the LLM to analyze the retrieved context and the user query.
+    - Decides which datasource is best suited to answer the question.
+5. **Output Generation**: Returns the routing decision (datasource selection) and confidence score.
 
 ## Error Handling
 
-- **`ORCHESTRATOR_CRASH`**: Critical failure in the decomposition process (e.g., LLM error, context retrieval failure).
-
-## Dependencies
-
-- `nl2sql.nodes.decomposer.schemas.DecomposerResponse`
+- **`SECURITY_VIOLATION`**: Critical error if the user has no allowed datasources.
+- **Retrieval Warnings**: Logs warnings if no relevant documents are found in the vector store.
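Steps 1 and 2 of the new `DecomposerNode` Logic Flow (authorization check and query expansion) might look roughly like this. It is a sketch under assumptions: the function names, the error dictionary shape, and the de-duplication rule are illustrative, and only `allowed_datasources` and `SECURITY_VIOLATION` come from the doc.

```python
from typing import Dict, List, Optional


def check_authorization(user_context: Dict) -> Optional[Dict[str, str]]:
    """Return a SECURITY_VIOLATION error, or None when access is allowed."""
    allowed = user_context.get("allowed_datasources") or []
    if not allowed:
        return {
            "error_code": "SECURITY_VIOLATION",
            "message": "User has no allowed datasources.",
        }
    return None


def expand_query(user_query: str, keywords: List[str], synonyms: List[str]) -> str:
    """Append keywords and synonyms to the query to improve retrieval recall."""
    query_lower = user_query.lower()
    seen = set()
    extras = []
    for term in [*keywords, *synonyms]:
        key = term.lower()
        # Skip repeats and terms already in the query (simple substring check).
        if key in seen or key in query_lower:
            continue
        seen.add(key)
        extras.append(term)
    return " ".join([user_query, *extras])
```

The expanded string would then be sent to the `OrchestratorVectorStore`, while the original `user_query` is what the LLM routes on.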
@@ -2,55 +2,43 @@
 
 ## Purpose
 
-The `ExecutorNode` is responsible for waiting for a SQL query (draft) and executing it against the actual database engine. It acts as the final "Effector" in the pipeline. It strictly enforces security protocols to prevent mutation or data loss.
+The `ExecutorNode` is responsible for executing the generated SQL query against the target datasource. It handles connection management via the `DatasourceRegistry` adapters, safeguards against massive result sets, and formats the output.
 
-## Components
+## Class Reference
 
-- **`DatasourceRegistry`**: To obtain the database engine/connection.
-- **`enforce_read_only`**: Security utility to scan for forbidden SQL keywords (INSERT, UPDATE, DROP, etc.).
-- **`engine_factory.run_read_query`**: Helper to execute the query.
+- **Class**: `ExecutorNode`
+- **Path**: `packages/core/src/nl2sql/pipeline/nodes/executor/node.py`
 
 ## Inputs
 
 The node reads the following fields from `GraphState`:
 
-- `state.sql_draft`: The SQL query string to execute.
-- `state.datasource_id`: ID of the target datasource.
+- `state.sql_draft` (str): The SQL query to execute.
+- `state.selected_datasource_id` (str): The target database ID.
 
 ## Outputs
 
 The node updates the following fields in `GraphState`:
 
-- `state.execution`: A structured `ExecutionModel` containing:
-  - `row_count`: Number of rows returned.
-  - `rows`: List of dictionaries representing the result set.
-  - `columns`: List of column names.
-  - `error`: String description of any database error.
-- `state.reasoning`: Log entry summarizing the execution stats.
-- `state.errors`: Appends `PipelineError` if security check fails or DB throws an error.
+- `state.execution` (`ExecutionModel`): The result of the query.
+  - `columns` (List[str]): Column names.
+  - `rows` (List[Dict]): The data returned.
+  - `row_count` (int): Number of rows.
+- `state.errors` (List[PipelineError]): Errors during execution.
 
 ## Logic Flow
 
-1. **Validation**: Checks if `sql_draft` and `datasource_id` are present.
-2. **Datasource Resolution**: Identifies the primary datasource if a list was provided.
-3. **Security Check**:
-    - Detects the dialect based on the profile.
-    - Calls `enforce_read_only` to validate the SQL.
-    - If violation is found, returns `SECURITY_VIOLATION` critical error.
+1. **Validation**: Ensures `sql_draft` and `selected_datasource_id` are present.
+2. **Adapter Retrieval**: Fetches the correct adapter (e.g., PostgresAdapter) from the registry.
+3. **Cost Estimation (Safeguard)**:
+    - If supported by the adapter, estimates the query cost.
+    - If the estimated row count exceeds `SAFEGUARD_ROW_LIMIT` (10,000), aborts execution and raises `SAFEGUARD_VIOLATION`.
 4. **Execution**:
-    - Uses SQLAlchemy engine to run the query.
-    - Fetches all results and maps them to a list of dictionaries.
-    - Captures metadata (column names).
-5. **Result Packaging**: Wraps results or exceptions into the `ExecutionModel`.
+    - Runs `adapter.execute(sql)`.
+    - Captures the result set.
+5. **Formatting**: Converts the results into the standard `ExecutionModel`.
 
 ## Error Handling
 
-- **`MISSING_SQL`**: If generator failed to produce output.
-- **`SECURITY_VIOLATION`**: If DML/DDL keywords are detected.
-- **`DB_EXECUTION_ERROR`**: Runtime errors from the database (e.g., syntax error, invalid table).
-- **`EXECUTOR_CRASH`**: Unhandled python exceptions.
-
-## Dependencies
-
-- `nl2sql.security`
-- `nl2sql.engine_factory`
+- **`SAFEGUARD_VIOLATION`**: If the query is predicted to return too many rows.
+- **`DB_EXECUTION_ERROR`**: If the database raises an exception (e.g., timeout, syntax error not caught by validator).
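The safeguard-then-execute flow in the new `ExecutorNode` doc can be illustrated with the sketch below. Only `adapter.execute(sql)`, the 10,000 row limit, and the two error codes come from the doc; the `estimate_rows` method name, the error dictionary shape, and the plain-dict result are assumptions made for the example.

```python
from typing import Any, Dict

SAFEGUARD_ROW_LIMIT = 10_000  # limit stated in the doc


def run_with_safeguard(adapter: Any, sql: str) -> Dict[str, Any]:
    """Estimate cost when the adapter supports it, then execute the query."""
    # `estimate_rows` is a hypothetical method name; adapters without it
    # skip the safeguard, matching "if supported by the adapter".
    estimate = getattr(adapter, "estimate_rows", None)
    if callable(estimate):
        predicted = estimate(sql)
        if predicted is not None and predicted > SAFEGUARD_ROW_LIMIT:
            return {"errors": [{
                "error_code": "SAFEGUARD_VIOLATION",
                "message": f"Estimated {predicted} rows exceeds {SAFEGUARD_ROW_LIMIT}.",
            }]}

    try:
        rows = adapter.execute(sql)
    except Exception as exc:  # database-side failure, e.g. timeout
        return {"errors": [{"error_code": "DB_EXECUTION_ERROR", "message": str(exc)}]}

    # Package the result in the shape the doc gives for ExecutionModel.
    columns = list(rows[0].keys()) if rows else []
    return {"execution": {"columns": columns, "rows": rows, "row_count": len(rows)}}
```

Checking the estimate before opening a cursor means an oversized query is rejected without touching the data, which is the point of the safeguard.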
@@ -2,52 +2,42 @@
 
 ## Purpose
 
-The `GeneratorNode` is responsible for converting the abstract query plan (generated by the `PlannerNode`) into a concrete, syntactically correct SQL query. It uses `sqlglot` to handle dialect differences (e.g., PostgreSQL vs T-SQL) and enforces system-wide guardrails like row limits.
+The `GeneratorNode` is the compiler of the pipeline. It takes the abstract execution plan (`PlanModel`) produced by the Planner and generates a valid, dialect-specific SQL string. It uses `sqlglot` to transpile the internal AST into the target SQL dialect (e.g., PostgreSQL, T-SQL, MySQL), enforcing syntactic correctness.
 
-## Components
+## Class Reference
 
-- **`sqlglot`**: A powerful SQL parser and transpiler library used to construct the query AST programmatically.
-- **`DatasourceRegistry`**: Used to determine the profile and specific SQL dialect of the target database.
+- **Class**: `GeneratorNode`
+- **Path**: `packages/core/src/nl2sql/pipeline/nodes/generator/node.py`
 
 ## Inputs
 
 The node reads the following fields from `GraphState`:
 
-- `state.plan`: The structured dictionary representing the logical query plan (SELECT, FROM, JOINs, WHERE, etc.).
-- `state.datasource_id`: ID of the target datasource (used to resolve dialect).
+- `state.plan` (`PlanModel`): The logical plan to compile.
+- `state.selected_datasource_id` (str): The ID of the target database, used to determine the SQL dialect.
 
 ## Outputs
 
 The node updates the following fields in `GraphState`:
 
-- `state.sql_draft`: The generated SQL query string.
-- `state.reasoning`: Log entry showing the generated SQL and rationale.
-- `state.errors`: Appends `PipelineError` if generation fails.
+- `state.sql_draft` (str): The generated SQL query string.
+- `state.reasoning` (List[Dict]): Logs the generated SQL.
+- `state.errors` (List[PipelineError]): `SQL_GEN_FAILED` if compilation errors occur.
 
 ## Logic Flow
 
-1. **Preparation**:
-    - Validates presence of `datasource_id` and `plan`.
-    - Retrieves the correct dialect capability (e.g., "postgres", "tsql") from the registry.
-    - Determines the row limit (minimum of system limit or plan limit).
-
-2. **Visitor Compilation**:
-    - Instantiates a `SqlVisitor` class to traverse the Recursive AST.
-    - **Recursion**: Calls `visit(expr)` for every node in the tree.
-    - **Dispatch**: Routes checks to `visit_binary`, `visit_literal`, `visit_func`, etc.
-    - **Strict Ordering**: Sorts all lists (`tables`, `select_items`) by `ordinal` before visiting.
-    - **Compilation**: Returns pure `sqlglot` expression objects (no string parsing).
-
-3. **Transpilation**:
-    - Calls `query.sql(dialect=target_dialect)` to generate the final string string matching the target database's syntax.
+1. **Validation**: Checks if a plan and a datasource ID are present in the state.
+2. **Profile Lookup**: Fetches the `dialect` (e.g., "postgres", "tsql") and default `row_limit` from the datasource registry.
+3. **AST Transformation (`SqlVisitor`)**:
+    - The node uses a `SqlVisitor` class to traverse the `PlanModel` (Expr tree).
+    - It builds a corresponding `sqlglot` Expression tree.
+    - This visitor handles literals, columns, functions, binary/unary operations, and case statements.
+4. **SQL Synthesis**:
+    - Constructs the top-level `SELECT` statement using `sqlglot` builders.
+    - Applies transformations for `SELECT`, `FROM` (Tables), `JOIN`, `WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`, and `LIMIT`.
+    - Handles dialect-specific nuances (e.g., quoting identifiers, function names) through `sqlglot`'s dialect-aware generator, invoked via `.sql(dialect=...)`.
+5. **Output**: Returns the final SQL string.
 
 ## Error Handling
 
-- **`MISSING_DATASOURCE_ID`**: If router failed to set a datasource.
-- **`MISSING_PLAN`**: If planner failed to produce a plan.
-- **`SQL_GEN_FAILED`**: If the plan contains invalid structure or references that `sqlglot` cannot parse.
-
-## Dependencies
-
-- `sqlglot` library
-- `nl2sql.capabilities`
+- **`SQL_GEN_FAILED`**: Raised if the visitor encounters unknown expression types or if `sqlglot` fails to generate the string.
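The visitor dispatch behind the "AST Transformation" step can be sketched with the standard library alone. Note the substitution: this example emits plain SQL strings, whereas the doc says the real `SqlVisitor` builds `sqlglot` expression objects. The `Column`, `Literal`, and `Binary` classes are hypothetical stand-ins for the `PlanModel` expression types.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Column:
    name: str


@dataclass
class Literal:
    value: Any


@dataclass
class Binary:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Literal, Binary]


class SqlVisitor:
    """Walks the expression tree, dispatching on the node's class name."""

    def visit(self, expr: Any) -> str:
        method = getattr(self, f"visit_{type(expr).__name__.lower()}", None)
        if method is None:
            # Mirrors the doc: unknown expression types fail generation.
            raise ValueError(f"SQL_GEN_FAILED: unknown expression type {type(expr).__name__}")
        return method(expr)

    def visit_column(self, expr: Column) -> str:
        return f'"{expr.name}"'

    def visit_literal(self, expr: Literal) -> str:
        if isinstance(expr.value, str):
            # Escape embedded single quotes by doubling them.
            return "'" + expr.value.replace("'", "''") + "'"
        return str(expr.value)  # numbers only; other types are out of scope here

    def visit_binary(self, expr: Binary) -> str:
        return f"({self.visit(expr.left)} {expr.op} {self.visit(expr.right)})"
```

Dispatching by class name keeps each expression type in its own method, so supporting a new node type means adding one `visit_*` method rather than extending a conditional chain.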