|
2 | 2 |
|
3 | 3 | ## Purpose |
4 | 4 |
|
5 | | -The `GeneratorNode` is responsible for converting the abstract query plan (generated by the `PlannerNode`) into a concrete, syntactically correct SQL query. It uses `sqlglot` to handle dialect differences (e.g., PostgreSQL vs T-SQL) and enforces system-wide guardrails like row limits. |
| 5 | +The `GeneratorNode` is the compiler of the pipeline. It takes the abstract execution plan (`PlanModel`) produced by the Planner and generates a valid, dialect-specific SQL string. It uses `sqlglot` to transpile the internal AST into the target SQL dialect (e.g., PostgreSQL, T-SQL, MySQL), enforcing syntactic correctness. |
6 | 6 |
|
7 | | -## Components |
| 7 | +## Class Reference |
8 | 8 |
|
9 | | -- **`sqlglot`**: A powerful SQL parser and transpiler library used to construct the query AST programmatically. |
10 | | -- **`DatasourceRegistry`**: Used to determine the profile and specific SQL dialect of the target database. |
| 9 | +- **Class**: `GeneratorNode` |
| 10 | +- **Path**: `packages/core/src/nl2sql/pipeline/nodes/generator/node.py` |
11 | 11 |
|
12 | 12 | ## Inputs |
13 | 13 |
|
14 | 14 | The node reads the following fields from `GraphState`: |
15 | 15 |
|
16 | | -- `state.plan`: The structured dictionary representing the logical query plan (SELECT, FROM, JOINs, WHERE, etc.). |
17 | | -- `state.datasource_id`: ID of the target datasource (used to resolve dialect). |
| 16 | +- `state.plan` (`PlanModel`): The logical plan to compile. |
| 17 | +- `state.selected_datasource_id` (str): The ID of the target database, used to determine the SQL dialect. |
18 | 18 |
|
19 | 19 | ## Outputs |
20 | 20 |
|
21 | 21 | The node updates the following fields in `GraphState`: |
22 | 22 |
|
23 | | -- `state.sql_draft`: The generated SQL query string. |
24 | | -- `state.reasoning`: Log entry showing the generated SQL and rationale. |
25 | | -- `state.errors`: Appends `PipelineError` if generation fails. |
| 23 | +- `state.sql_draft` (str): The generated SQL query string. |
| 24 | +- `state.reasoning` (List[Dict]): Logs the generated SQL. |
| 25 | +- `state.errors` (List[PipelineError]): Appends `SQL_GEN_FAILED` if compilation fails. |
26 | 26 |
|
27 | 27 | ## Logic Flow |
28 | 28 |
|
29 | | -1. **Preparation**: |
30 | | - - Validates presence of `datasource_id` and `plan`. |
31 | | - - Retrieves the correct dialect capability (e.g., "postgres", "tsql") from the registry. |
32 | | - - Determines the row limit (minimum of system limit or plan limit). |
33 | | - |
34 | | -2. **Visitor Compilation**: |
35 | | - - Instantiates a `SqlVisitor` class to traverse the Recursive AST. |
36 | | - - **Recursion**: Calls `visit(expr)` for every node in the tree. |
37 | | - - **Dispatch**: Routes checks to `visit_binary`, `visit_literal`, `visit_func`, etc. |
38 | | - - **Strict Ordering**: Sorts all lists (`tables`, `select_items`) by `ordinal` before visiting. |
39 | | - - **Compilation**: Returns pure `sqlglot` expression objects (no string parsing). |
40 | | - |
41 | | -3. **Transpilation**: |
42 | | - - Calls `query.sql(dialect=target_dialect)` to generate the final string string matching the target database's syntax. |
| 29 | +1. **Validation**: Checks if a plan and a datasource ID are present in the state. |
| 30 | +2. **Profile Lookup**: Fetches the `dialect` (e.g., "postgres", "tsql") and default `row_limit` from the datasource registry. |
| 31 | +3. **AST Transformation (`SqlVisitor`)**: |
| 32 | + - The node uses a `SqlVisitor` class to traverse the `PlanModel` (Expr tree). |
| 33 | + - It builds a corresponding `sqlglot` Expression tree. |
| 34 | + - This visitor handles literals, columns, functions, binary/unary operations, and case statements. |
| 35 | +4. **SQL Synthesis**: |
| 36 | + - Constructs the top-level `SELECT` statement using `sqlglot` builders. |
| 37 | + - Applies transformations for `SELECT`, `FROM` (Tables), `JOIN`, `WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`, and `LIMIT`. |
| 38 | + - Handles dialect-specific nuances (e.g., identifier quoting, function names) through `sqlglot`'s dialect-aware generator, which runs when `.sql(dialect=...)` is called. |
| 39 | +5. **Output**: Returns the final SQL string. |
43 | 40 |
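The type-based dispatch described under "AST Transformation" can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `Literal`, `Column`, `Binary`, and `SketchVisitor` are hypothetical names, not the real `PlanModel` or `SqlVisitor`. The documented visitor returns `sqlglot` expression objects; this sketch returns plain strings only to stay dependency-free.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical plan expression types; the real PlanModel may differ.
@dataclass
class Literal:
    value: Union[int, float, str]


@dataclass
class Column:
    table: str
    name: str


@dataclass
class Binary:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Column, Binary]


class SketchVisitor:
    """Routes each node to a visit_<type> method and recurses into children."""

    def visit(self, expr):
        method = getattr(self, f"visit_{type(expr).__name__.lower()}", None)
        if method is None:
            # Mirrors the documented failure mode: unknown expression types.
            raise ValueError(
                f"SQL_GEN_FAILED: unknown expression type {type(expr).__name__}"
            )
        return method(expr)

    def visit_literal(self, expr):
        if isinstance(expr.value, str):
            # Escape embedded single quotes by doubling them.
            escaped = expr.value.replace("'", "''")
            return f"'{escaped}'"
        return str(expr.value)

    def visit_column(self, expr):
        return f"{expr.table}.{expr.name}"

    def visit_binary(self, expr):
        return f"({self.visit(expr.left)} {expr.op} {self.visit(expr.right)})"


# A WHERE predicate expressed as a small tree.
predicate = Binary(">", Column("orders", "total"), Literal(100))
where_sql = SketchVisitor().visit(predicate)
print(where_sql)
```

Building expression objects rather than strings, as the documented node does, leaves quoting and function naming to the dialect generator at the final `.sql(dialect=...)` call.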
|
44 | 41 | ## Error Handling |
45 | 42 |
|
46 | | -- **`MISSING_DATASOURCE_ID`**: If router failed to set a datasource. |
47 | | -- **`MISSING_PLAN`**: If planner failed to produce a plan. |
48 | | -- **`SQL_GEN_FAILED`**: If the plan contains invalid structure or references that `sqlglot` cannot parse. |
49 | | - |
50 | | -## Dependencies |
51 | | - |
52 | | -- `sqlglot` library |
53 | | -- `nl2sql.capabilities` |
| 43 | +- **`SQL_GEN_FAILED`**: Raised if the visitor encounters unknown expression types or if `sqlglot` fails to generate the string. |
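The error path can be sketched as follows: failures are recorded on the state rather than propagated. This is a hedged illustration, not the node's actual code. `State`, the `PipelineError` fields, and `run_generator` are hypothetical stand-ins, and the diff does not say which code a failed validation produces, so reusing `SQL_GEN_FAILED` for that case is purely an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PipelineError:  # hypothetical shape; the real class may carry more fields
    code: str
    message: str


@dataclass
class State:  # hypothetical stand-in for GraphState
    plan: Optional[dict] = None
    selected_datasource_id: Optional[str] = None
    sql_draft: Optional[str] = None
    errors: List[PipelineError] = field(default_factory=list)


def run_generator(state: State, compile_plan: Callable[[dict], str]) -> State:
    """Validate inputs, compile the plan, and record failures on the state."""
    if state.plan is None or not state.selected_datasource_id:
        # Assumption: the code used for a failed validation is not documented.
        state.errors.append(
            PipelineError("SQL_GEN_FAILED", "missing plan or datasource ID")
        )
        return state
    try:
        state.sql_draft = compile_plan(state.plan)
    except Exception as exc:  # unknown expression type, generator failure, etc.
        state.errors.append(PipelineError("SQL_GEN_FAILED", str(exc)))
    return state


def failing_compiler(plan: dict) -> str:
    raise ValueError("unknown expression type")


ok = run_generator(State(plan={}, selected_datasource_id="db1"), lambda p: "SELECT 1")
bad = run_generator(State(plan={}, selected_datasource_id="db1"), failing_compiler)
missing = run_generator(State(), lambda p: "SELECT 1")
```

Collecting errors in `state.errors` lets downstream nodes decide whether to retry or stop, instead of the whole graph aborting on an exception.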