|
3 | 3 | STEP = """\ |
4 | 4 | ### Current Step: Build the Plan |
5 | 5 |
|
6 | | -Present a **complete plan** for user review in a single, well-structured message. The plan should include: |
| 6 | +Present a **complete plan** for user review in a single, well-structured message. |
| 7 | +
|
| 8 | +**Guiding principle:** Use every schema feature that adds value. The serialized_space schema has many sections — tables, column configs, text instructions, example SQLs, join specs, measures, filters, expressions, SQL functions, metric views, benchmarks, and sample questions. If the data or business context suggests a feature would help Genie answer questions more accurately, **include it**. A rich config produces a more capable space. |
| 9 | +
|
| 10 | +The plan should include: |
7 | 11 |
|
8 | 12 | 1. **Space title, description, audience** |
9 | | -2. **Selected tables** (with any excluded columns noted) |
| 13 | +2. **Selected tables** (with column-level detail) |
| 14 | + - **Column descriptions**: Add descriptions for columns whose names are ambiguous or domain-specific |
| 15 | + - **Column synonyms**: Add synonyms for columns users might refer to by different names (e.g., "cust_id" → "customer ID", "account number") |
| 16 | + - **Excluded columns**: List ETL metadata, internal IDs, and irrelevant columns to hide from Genie |
| 17 | + - **Metric views**: Include any metric views discovered during inspection — they simplify pre-aggregated metrics |
10 | 18 | 3. **Text instructions** — domain knowledge that CAN'T be expressed as SQL snippets, examples, joins, or column metadata |
11 | 19 |
|
12 | 20 | Text instructions are injected into Genie's LLM prompt. To avoid overlap with other config sections, follow this MECE boundary: |
|
96 | 104 |
|
97 | 105 | Aim for a mix: ~3-5 hardcoded examples for structural patterns, ~2-5 parameterized examples for entity-specific queries. |
98 | 106 |
|
| 107 | + **Usage guidance:** Add `usage_guidance` to each example SQL to tell Genie when this pattern applies (e.g., "Use this pattern for any top-N ranking question by a numeric metric"). This helps Genie pick the right example when a user asks a similar question. |
| 108 | +
|
99 | 109 | **Testing parameterized SQL:** When calling `test_sql` on parameterized queries, pass the `parameters` array with each parameter's `name` and `default_value`. The tool substitutes `:param_name` with the default value before execution. Without this, the query will fail with an UNBOUND_SQL_PARAMETER error. |
100 | 110 |
|
101 | 111 | Incorporate patterns from `profile_table_usage` query history where available — real query patterns make better few-shot examples than synthetic ones. Adapt them: clean up user-specific filters, add a natural question, and test via `test_sql`. |
|
114 | 124 | - `comment`: internal note explaining the formula or business context |
115 | 125 | Put the actual aggregation formula here, not in text instructions. If the user defined "conversion rate = orders / visits", create a measure with `sql: "CAST(COUNT(DISTINCT order_id) AS DOUBLE) / NULLIF(COUNT(DISTINCT session_id), 0)"`. |
116 | 126 |
|
117 | | -7. **Benchmark queries** (5-10 pairs) — for validating the space after creation |
| 127 | +7. **Expressions** — reusable computed columns / dimension expressions |
| 128 | +
|
| 129 | + Each expression has an `alias`, `sql` (a dimension expression), `display_name`, and optional `synonyms`, `instruction`, and `comment`. |
| 130 | + Use for date dimensions (`YEAR(order_date)`), computed categories (`CASE WHEN amount > 1000 THEN 'High' ELSE 'Low' END`), or derived columns that Genie should know about. |
| 131 | +
|
| 132 | +8. **Join specs** — table relationships for multi-table queries |
| 133 | +
|
| 134 | + Define join specs when 2+ tables need to be joined. Each has `left_table`, `right_table`, `left_column`, `right_column`, `relationship` (MANY_TO_ONE, ONE_TO_MANY, etc.), and optional `instruction` and `comment`. |
| 135 | + - `instruction`: tells Genie WHEN to use this join (e.g., "Use when customer demographics are needed for order analysis") |
| 136 | + - `comment`: describes the relationship in plain language |
| 137 | + Always define joins proactively when multi-table data is selected — don't wait for the user to ask. |
| 138 | +
|
| 139 | +9. **SQL functions** — Unity Catalog UDFs available to the space |
| 140 | +
|
| 141 | + If `discover_tables` surfaced, or the user mentioned, custom SQL functions (UDFs) relevant to the domain, include them. Each needs an `identifier` (catalog.schema.function_name). The function must already be registered in Unity Catalog.
| 142 | +
|
| 143 | +10. **Benchmark queries** (5-10 pairs) — for validating the space after creation |
118 | 144 |
|
119 | 145 | Benchmarks are test questions used to verify Genie produces correct SQL. They should: |
120 | 146 | - Include specific expected SQL or expected result characteristics |
|
125 | 151 |
|
126 | 152 | Use patterns from `profile_table_usage` query history to make benchmarks realistic. |
127 | 153 |
|
128 | | -8. **Sample questions** (3-5) — displayed in the space as conversation starters |
| 154 | +11. **Sample questions** (3-5) — displayed in the space as conversation starters |
129 | 155 |
|
130 | 156 | These should match the audience level. For executives: "What were our top 5 products by revenue this quarter?" For analysts: "Show me the daily trend of conversion rate over the past 30 days." Incorporate business context (fiscal definitions, terminology). |
131 | 157 |
|
|
140 | 166 |
|
141 | 167 | **Skipping:** If the user explicitly says "just create it" or "use defaults," generate a minimal plan with sensible defaults, present it briefly, and proceed after a quick confirmation.""" |
142 | 168 |
|
143 | | -SUMMARY = "Step 4 (Plan): Compose a full plan (instructions, SQL examples, filters, measures, benchmarks, sample questions) using inspection findings + business context." |
| 169 | +SUMMARY = "Step 4 (Plan): Compose a full plan (tables with column configs, text instructions, example SQLs, filters, measures, expressions, join specs, SQL functions, benchmarks, sample questions) using inspection findings + business context." |
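
The "Testing parameterized SQL" paragraph in the prompt says `test_sql` substitutes `:param_name` with each parameter's `default_value` before execution. The tool's real implementation is not part of this diff, so the following is only a minimal sketch of that substitution step. The helper name `bind_defaults`, the quoting rules, and the error type are assumptions made for illustration.

```python
import re


def bind_defaults(sql, parameters):
    """Replace :param_name placeholders with default values (hypothetical helper)."""
    defaults = {p["name"]: p["default_value"] for p in parameters}

    def replace(match):
        name = match.group(1)
        if name not in defaults:
            # Mirrors the idea of an unbound-parameter failure; the error type is assumed.
            raise KeyError(f"unbound SQL parameter: {name}")
        value = defaults[name]
        if isinstance(value, (int, float)):
            return str(value)
        # Quote everything else as a SQL string literal, doubling single quotes.
        return "'" + str(value).replace("'", "''") + "'"

    # (?<!:) leaves '::' cast syntax alone; names must start with a letter or underscore.
    return re.sub(r"(?<!:):([A-Za-z_]\w*)", replace, sql)


query = "SELECT * FROM orders WHERE region = :region AND amount > :min_amount"
params = [
    {"name": "region", "default_value": "EMEA"},
    {"name": "min_amount", "default_value": 100},
]
bound = bind_defaults(query, params)
```

This sketch does not parse SQL string literals, so a placeholder-like token inside a quoted string would also be replaced. A production version would need a real tokenizer or the engine's own parameter binding.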
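
The measure example in the prompt uses `CAST(COUNT(DISTINCT order_id) AS DOUBLE) / NULLIF(COUNT(DISTINCT session_id), 0)`. The sketch below shows why the `NULLIF` guard matters: with no sessions, the measure returns NULL instead of raising a division error. It runs on Python's built-in `sqlite3` as a stand-in engine, which is an assumption. The `events` table and its columns are hypothetical, and Databricks SQL may differ in details.

```python
import sqlite3

# The measure expression quoted in the prompt, kept as a reusable snippet.
MEASURE_SQL = (
    "CAST(COUNT(DISTINCT order_id) AS DOUBLE)"
    " / NULLIF(COUNT(DISTINCT session_id), 0)"
)


def conversion_rate(rows):
    """Evaluate the measure over (session_id, order_id) rows in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (session_id TEXT, order_id TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    (rate,) = conn.execute(f"SELECT {MEASURE_SQL} FROM events").fetchone()
    conn.close()
    return rate


# Four sessions, two of which produced an order (NULL order_id means no order).
sample = [("s1", "o1"), ("s2", None), ("s3", "o2"), ("s4", None)]
rate_with_data = conversion_rate(sample)
rate_when_empty = conversion_rate([])
```

`COUNT(DISTINCT ...)` ignores NULLs, so sessions without an order still count toward the denominator but not the numerator.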