diff --git a/skills/semantic-view-patterns/LICENSE b/skills/semantic-view-patterns/LICENSE new file mode 100644 index 00000000..c0c061fe --- /dev/null +++ b/skills/semantic-view-patterns/LICENSE @@ -0,0 +1,161 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship made available under + the License, as indicated by a copyright notice that is included in + or attached to the work (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other + modifications represent, as a whole, an original work of authorship. + For the purposes of this License, Derivative Works shall not include + works that remain separable from, or merely link (or bind by name) + to the interfaces of, the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including the + original version of the Work and any modifications or additions to + that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright + owner or by an individual or Legal Entity authorized to submit on + behalf of the copyright owner. + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution.
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or Derivative + Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative + Works that You distribute, all copyright, patent, trademark, + and attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, You must include a readable copy of the + attribution notices contained within such NOTICE file, in + at least one of the following places: within a NOTICE text + file distributed as part of the Derivative Works; within + the Source form or documentation, if provided along with + the Derivative Works; or, within a display generated by + the Derivative Works, if and wherever such third-party + notices normally appear. The contents of the NOTICE file + are for informational purposes only and do not modify the + License. You may add Your own attribution notices within + Derivative Works that You distribute, alongside or as an + addendum to the NOTICE text from the Work, provided that + such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or exemplary damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability.
While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + Copyright 2024 Snowflake Inc. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/skills/semantic-view-patterns/SKILL.md b/skills/semantic-view-patterns/SKILL.md new file mode 100644 index 00000000..552cfc4a --- /dev/null +++ b/skills/semantic-view-patterns/SKILL.md @@ -0,0 +1,459 @@ +--- +id: semantic-view-patterns +name: semantic-view-patterns +skill-name: $sv-patterns +description: Two modes for 25 Snowflake Semantic View modeling patterns — Tutorial mode deploys working examples and explains them live; Apply mode adapts a pattern to the user's own tables and generates ready-to-use DDL or YAML. +prompt: "$sv-patterns walk me through time intelligence" +language: en +categories: snowflake-site:taxonomy/product/ai, snowflake-site:taxonomy/snowflake-feature/build +status: Published +authors: Josh Klahr +type: snowflake +tools: + - snowflake_sql_execute + - bash +--- + +# Semantic View Patterns + +Interactive, end-to-end tutorials for 25 Snowflake Semantic View modeling patterns.
Each tutorial deploys a working example into your Snowflake account, walks through the annotated DDL or YAML, runs live queries, and surfaces what works and what doesn't. + +# When to Use + +This skill has two modes — use the right one based on what the user is trying to do: + +**Tutorial mode** — user wants to *learn* a pattern: +- "walk me through `<pattern>`", "teach me `<pattern>`", "explain how `<pattern>` works" +- "how does time intelligence work in SVs", "show me ASOF joins in action" +- "what snippets do you have", "what patterns are available" + +**Apply mode** — user wants to *use* a pattern on their own data: +- "help me add time intelligence to my SV" +- "my SV has tables X, Y, Z — how do I model SPLY?" +- "I want to add window metrics to my existing semantic view" +- "can you update my SV to handle SCD2 dimensions?" +- "I'm building a SV for [use case] — what pattern do I need and how do I implement it?" + +**Example triggers**: `$sv-patterns time intelligence` (Tutorial), `$sv-patterns apply time intelligence to my SV` (Apply), `$sv-patterns what snippets are available` (Discovery) + +# What This Skill Provides + +A library of 25 executable, self-contained Semantic View modeling patterns bundled alongside this skill, each with: +- Real problem statement and BI tool comparison +- Minimal but realistic seed data +- Fully annotated SV DDL **and** YAML (`semantic_view.sql` + `semantic_view.yaml`) +- Working `SEMANTIC_VIEW()` queries with live output +- Explicit gotchas and what-doesn't-work notes + +**Tutorial mode**: Deploys each snippet directly via `snowflake_sql_execute`, walks through the annotated DDL or YAML section by section, runs live queries, and offers to clean up all created objects at the end. + +**Apply mode**: The snippet files serve as annotated reference patterns. The skill reads the user's existing SV definition, maps the snippet's structure to their tables/columns, and generates adapted DDL or YAML ready to paste or deploy — no example data needed.
+ +## Available Patterns + +| Snippet | Concept | +|---------|---------| +| `range_join` | BETWEEN EXCLUSIVE — SCD2 temporal join | +| `asof_join` | ASOF — join to most recent record at event time | +| `multi_path_metrics` | USING — disambiguate multiple join paths | +| `shared_degenerate_dimension` | Shared degenerate dimension across two facts | +| `semi_additive_metric` | NON ADDITIVE BY — snapshot / headcount / balance | +| `window_metrics` | LAG, rolling avg, YTD window functions | +| `derived_metrics` | Cross-table derived metrics and ratios | +| `time_intelligence` | Role-playing aliases + computed-FK FACTS for SPLY/YoY/MoM | +| `entity_facts` | Aggregated entity-level facts and calculated dims | +| `variables` | VARIABLES clause for parameterized SVs | +| `multi_fact_table` | Multiple facts sharing product and date dims | +| `ai_metadata` | AI_SQL_GENERATION, AI_QUESTION_CATEGORIZATION, AI_VERIFIED_QUERIES | +| `tags` | WITH TAG on metrics | +| `introspection` | SHOW METRICS, SHOW DIMENSIONS, get_lineage() | +| `fact_as_relationship_key` | Computed FK fact — derive a join key from an expression when no physical FK column exists | +| `system_explain_semantic_query` | SYSTEM$EXPLAIN_SEMANTIC_QUERY — inspect generated SQL, debug errors without running the query | +| `caller_rights` | Ownership separation trick — make the SV owner have no base table access, forcing callers to supply their own; no privilege escalation ⚠️ Requires ACCOUNTADMIN | +| `standard_sql` | Plain SELECT on SVs — ANY_VALUE, metric-less queries | +| `inline_sv` | Inline SV CTEs ⚠️ Private Preview | +| `materialization` | SV materialization ⚠️ Private Preview | +| `scoped_dataset` | SQL query as logical table ⚠️ Private Preview | +| `row_access_policies` | RAP gotcha + two workarounds — prevent NULL rows when filtering dimension tables ⚠️ Requires ACCOUNTADMIN | +| `role_playing_dimensions` | Alias the same physical dimension table multiple times — independent ORDER_YEAR, SHIP_YEAR 
dimensions without USING | +| `accumulating_snapshot` | Kimball Accumulating Snapshot Fact Table — one row per pipeline entity, USING per milestone stage metric | +| `sv_diagnostics` | Six runtime and deploy-time failure modes — ambiguous path, fan trap, missing relationship, duplicate names/synonyms, wrong cardinality (silent inflation), semi-additive heuristic — with exact error messages and fixes | + +# Instructions + +## Step 0 — Detect Mode and Authoring Format + +Before doing anything else, determine two things: + +### 0a — Detect Mode + +Determine which mode the user wants: + +- If the user said something like "walk me through", "teach me", "explain", "show me in action", "what snippets" → **Tutorial mode** → go to Tutorial Steps +- If the user said something like "apply to my SV", "add X to my semantic view", "my tables are...", "help me implement", "update my SV" → **Apply mode** → go to Apply Steps +- If ambiguous (e.g. "help me with time intelligence"), ask: + > "Do you want me to walk you through the time intelligence pattern with a live example, or help you apply it directly to your own Semantic View?" + +### 0b — Detect Authoring Format + +Once mode is determined, ask which authoring format the user prefers. Use `ask_user_question` with these options: + +| Option | Label | Description | +|--------|-------|-------------| +| DDL | `CREATE SEMANTIC VIEW` DDL | SQL-first; deploy with `CREATE OR REPLACE SEMANTIC VIEW`. Best for programmatic scripts, stored procedures, full feature access. | +| YAML | YAML + `SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML` | Config-file-first; human-readable, version-control-friendly, includes `verify_only` dry-run. Some DDL-only features require post-deploy DDL. | + +Store `AUTHORING_FORMAT = DDL` or `AUTHORING_FORMAT = YAML` and use it in all subsequent steps. + +**Skip this question** if the user already indicated a preference (e.g. "show me the YAML", "give me the DDL").
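The Step 0a routing above can be sketched as a simple keyword heuristic. This is illustrative only — the cue lists are paraphrased from the trigger bullets, and in practice the agent judges intent rather than matching substrings:

```python
# Hypothetical sketch of the Step 0a mode routing; cue lists are
# paraphrased from the trigger examples above, not exhaustive.
TUTORIAL_CUES = ("walk me through", "teach me", "explain", "show me", "what snippets")
APPLY_CUES = ("apply", "my semantic view", "my sv", "my tables", "help me implement", "update my sv")

def detect_mode(prompt: str) -> str:
    """Return 'tutorial', 'apply', or 'ambiguous' for a user prompt."""
    p = prompt.lower()
    is_tutorial = any(cue in p for cue in TUTORIAL_CUES)
    is_apply = any(cue in p for cue in APPLY_CUES)
    if is_tutorial and not is_apply:
        return "tutorial"
    if is_apply and not is_tutorial:
        return "apply"
    return "ambiguous"  # fall through to the clarifying question in Step 0a
```

Note how a bare trigger like `time intelligence` matches neither cue list and falls through to the clarifying question, exactly as Step 0a prescribes.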
+ +### YAML Authoring — Key Facts + +When `AUTHORING_FORMAT = YAML`: + +**Deployment:** +```sql +-- Deploy: +CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML( + 'TARGET_DB.TARGET_SCHEMA', + $$ $$ +); + +-- Verify without deploying (dry-run — catch errors before they hit production): +CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML( + 'TARGET_DB.TARGET_SCHEMA', + $$ $$, + TRUE -- verify_only +); + +-- Export an existing DDL SV to YAML: +SELECT SYSTEM$READ_YAML_FROM_SEMANTIC_VIEW('DB.SCHEMA.MY_SV'); +``` + +**YAML ↔ DDL feature map:** + +| DDL feature | YAML equivalent | +|---|---| +| `USING (relationship)` on metrics | `using_relationships: [rel_name]` | +| `NON ADDITIVE BY (dim)` | `non_additive_dimensions: [{table, dimension, sort_direction}]` | +| `PRIVATE` fact/metric | `access_modifier: private_access` | +| `AI_VERIFIED_QUERIES` | `verified_queries: [{question, sql, ...}]` | +| `AI_SQL_GENERATION` | `module_custom_instructions: sql_generation: ...` | +| `RANGE BETWEEN ... EXCLUSIVE` join | `type: range` + `right_range: {start_column, end_column}` + `constraints: [{distinct_range}]` | +| `WITH SYNONYMS (...)` | `synonyms: [list]` | +| `COMMENT = '...'` | `description: ...` | + +**DDL-only features (no YAML equivalent):** +- `AI_QUESTION_CATEGORIZATION` — set post-deploy via `ALTER SEMANTIC VIEW` +- `WITH TAG` — apply post-deploy via `ALTER SEMANTIC VIEW ... ADD TAG` +- `MAX_STALENESS` / `ADD MATERIALIZATION` — DDL only +- `VARIABLES` clause — DDL only +- `ASOF` relationship syntax — DDL only +- Inline SQL subqueries in `TABLES` clause — DDL only +- `WITH ... AS SEMANTIC VIEW` inline CTE — DDL only + +When a snippet has DDL-only features and `AUTHORING_FORMAT = YAML`, note the limitation and show the YAML for the base structure plus the DDL commands to apply the unsupported features post-deploy. + +--- + +# Tutorial Steps + +## Step 1 — Identify the Snippet + +If the user named a snippet or concept, match it to the closest entry in the table above. 
If the user said something general like "what can you teach me" or "what's available", list the snippets with one-line descriptions and use `ask_user_question` to let them choose. + +## Step 2 — Locate the Snippets Directory + +The `snippets/` directory is bundled alongside this `SKILL.md`. Find the skill's location by checking where this SKILL.md lives (use `glob` to search common skill paths). The snippets are at `<skill_dir>/snippets/<snippet_name>/`. + +If the skill directory cannot be found automatically, ask the user: +``` +Where is the cortex-code-skills repo cloned on your machine? +``` +Then construct the path as `<repo_path>/skills/semantic-view-patterns/snippets/`. + +## Step 3 — Pre-Flight Check and Deploy Target (First Time Only) + +### 3a — Access-control snippets + +If the chosen snippet is `caller_rights` or `row_access_policies`: +- These snippets create **roles and a dedicated database** and require ACCOUNTADMIN (or both SECURITYADMIN + SYSADMIN). +- Run `SELECT CURRENT_ROLE(), CURRENT_WAREHOUSE()` to confirm. If the role is insufficient, warn the user before proceeding. +- These snippets use **hardcoded database names** (`SV_CALLER_TEST` or `RAP_TEST`) — no target DB question is needed. +- **Warehouse**: do NOT create a dedicated warehouse. Instead, ask: _"Which warehouse should the analyst roles use for running queries? I'll grant them USAGE on it."_ Default to `CURRENT_WAREHOUSE()` if the user has no preference. Then run: `GRANT USAGE, OPERATE ON WAREHOUSE <warehouse_name> TO ROLE <role_name>` for each role. +- Track the dedicated objects created so you can offer cleanup in Step 10. + +### 3b — Probe for Snowflake Learning Environment + +For all other snippets, silently run both checks: +```sql +SHOW DATABASES LIKE 'SNOWFLAKE_LEARNING_DB'; +SHOW ROLES LIKE 'SNOWFLAKE_LEARNING_ROLE'; +``` + +- **Both found** → include `SNOWFLAKE_LEARNING_DB.PUBLIC` as a recommended option in the next question, noting it uses `SNOWFLAKE_LEARNING_ROLE` / `SNOWFLAKE_LEARNING_WH`.
+- **Either missing** → don't offer it; go straight to asking for a custom location. + +### 3c — Ask for target location and role + +Ask a single question. Options depend on 3b: + +- If Learning Environment is available: offer `SNOWFLAKE_LEARNING_DB.PUBLIC` (recommended) + custom location +- If not available: only offer custom location + +For a **custom location**, follow up with: +- Target `DATABASE.SCHEMA` (you'll create the DB if it doesn't exist) +- Which **role** to use — needs `CREATE TABLE`, `CREATE SEMANTIC VIEW`, `CREATE SCHEMA` on that database +- Which **warehouse** to use + +For the **Learning Environment**, set: +- `TARGET_DB = SNOWFLAKE_LEARNING_DB`, `TARGET_SCHEMA = PUBLIC` +- `TARGET_ROLE = SNOWFLAKE_LEARNING_ROLE`, `TARGET_WAREHOUSE = SNOWFLAKE_LEARNING_WH` + +Store `TARGET_DB`, `TARGET_SCHEMA`, `TARGET_ROLE`, `TARGET_WAREHOUSE` for the rest of the session — don't re-ask. + +### 3d — Track objects created + +Before deploying anything, record an explicit list of every object you are about to create (tables, views, semantic views, DB if new). You'll use this list in the Step 10 cleanup offer. + +## Step 4 — Read Snippet Files + +Before presenting anything, read the relevant files for the chosen snippet: +- `snippets//README.md` +- `snippets//schema.sql` +- `snippets//seed_data.sql` +- `snippets//queries.sql` +- If `AUTHORING_FORMAT = DDL`: read `snippets//semantic_view.sql` +- If `AUTHORING_FORMAT = YAML`: read `snippets//semantic_view.yaml` (and `semantic_view.sql` for context on DDL-only features not in YAML) + +## Step 5 — Act 1: The Problem + +Present the framing conversationally — do NOT just paste the README. Synthesize: +1. What problem this snippet solves (2–3 sentences) +2. The "How You Might Express This Need" list + +End with: _"Here's how Snowflake Semantic Views handle it — without any of those workarounds."_ + +Then add this prompt hint on its own line: +> 💡 _Want to learn how other tools tackle this problem? 
Ask me "Tell me about other approaches."_ + +## Step 5b — Other Approaches Handler + +If at any point the user says "tell me about other approaches", "how does [tool] handle this", or "what would this look like in Power BI / Tableau / dbt / SQL": + +Read the `## Equivalent in Other Tools` table from the snippet's README and present it conversationally. For each tool, briefly explain: +- What mechanism or feature they use +- Why it's more work or less reliable than the SV approach +- Any genuine strengths that tool has for this use case (be honest) + +End with: _"The SV approach encodes the constraint in the model definition itself — the right answer is the only possible answer, regardless of who writes the query."_ + +## Step 6 — Act 2: The Data Model + +Walk through `schema.sql` with inline annotations explaining what each table represents and which columns matter. + +Then deploy schema + seed by executing the SQL files directly via `snowflake_sql_execute`. **Do not use `run_snippet.py`** — execute statements directly in the active Snowflake session: + +1. Read `schema.sql` and `seed_data.sql` +2. For **standard snippets**: substitute `SNIPPETS.PUBLIC` → `TARGET_DB.TARGET_SCHEMA`, `USE DATABASE SNIPPETS` → `USE DATABASE TARGET_DB`, `USE SCHEMA PUBLIC` → `USE SCHEMA TARGET_SCHEMA` throughout +3. For **access-control snippets** (`caller_rights`, `row_access_policies`): no substitution — execute as-is +4. Run a `USE ROLE TARGET_ROLE` and `USE WAREHOUSE TARGET_WAREHOUSE` before executing +5. Execute each statement via `snowflake_sql_execute`, confirm each succeeds before continuing + +After deployment, show 3–5 sample rows from each table: +```sql +SELECT * FROM TARGET_DB.TARGET_SCHEMA.TABLE_NAME LIMIT 5; +``` + +## Step 7 — Act 3: The SV Pattern + +Walk through `semantic_view.sql` **section by section** — TABLES, RELATIONSHIPS, FACTS, DIMENSIONS, METRICS — stopping to explain each novel concept. 
Don't paste the full file; excerpt and annotate only the parts that are specific to this pattern. + +Format each stop as: +> **[Section]** — Here's what this part does: [explanation] +> ```sql +> [excerpt] +> ``` +> Key things to notice: [2–3 bullet points] + +Then deploy the SV using the format appropriate to `AUTHORING_FORMAT`: + +**If DDL:** Read the file, substitute `SNIPPETS.PUBLIC` → `TARGET_DB.TARGET_SCHEMA`, execute via `snowflake_sql_execute`. + +**If YAML:** Read `semantic_view.yaml`. Substitute `TARGET_DB` and `TARGET_SCHEMA` throughout. Then deploy: +```sql +CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML( + 'TARGET_DB.TARGET_SCHEMA', + $$<yaml definition>$$ +); +``` +If the YAML has DDL-only features flagged in comments (e.g. `AI_QUESTION_CATEGORIZATION`, `MAX_STALENESS`, `WITH TAG`), note them and execute the corresponding DDL follow-up commands from `semantic_view.sql` after the YAML deploy. + +## Step 8 — Act 4: Live Queries + +Run each numbered working query from `queries.sql` one at a time using `snowflake_sql_execute`. Before each query: +1. State what it demonstrates +2. Adapt table/SV references if needed (`SNIPPETS.PUBLIC` → `TARGET_DB.TARGET_SCHEMA`) +3. Run it and show the output +4. Narrate what the specific numbers demonstrate — point to concrete rows/values + +For queries that include `USE ROLE` switches (access-control snippets), execute those role-switch statements directly via `snowflake_sql_execute` before the query that follows. + +After every 2–3 queries, check if the user wants to continue or dig deeper. + +## Step 9 — Act 5: Gotchas + +Read the `-- GOTCHAS` and `-- HOW ... WORKS` sections from `queries.sql` and the `## What Doesn't Work` section from `README.md`. Present each gotcha plainly: what trap exists, why it happens, how to avoid it. + +## Step 10 — Wrap-Up and Cleanup + +Summarize in 3–5 key takeaways. Show the `## Docs` links from `README.md`.
Then **always offer cleanup** — list every object created during this tutorial before asking: + +For **standard snippets**, list: +- Tables created: `TARGET_DB.TARGET_SCHEMA.TABLE_1`, `TABLE_2`, ... +- Semantic views created: `TARGET_DB.TARGET_SCHEMA.SV_NAME` +- If the database was created fresh: `TARGET_DB` itself +- Do **not** offer to drop a DB that existed before the tutorial — only drop tables/SVs/views you created inside it + +For **access-control snippets**, list the full dedicated environment: +- Database: `SV_CALLER_TEST` or `RAP_TEST` +- Warehouse: `SV_CALLER_TEST` or `RAP_TEST_WH` +- Roles: (all roles created) + +Ask: _"Want me to clean all of this up now, or leave it so you can keep exploring?"_ + +If yes: execute the `-- CLEANUP` block from `queries.sql` via `snowflake_sql_execute`. Switch back to SYSADMIN / SECURITYADMIN as needed per the cleanup SQL. Confirm each drop succeeded. + +Finally, offer to run a different snippet or switch to Apply mode to adapt the pattern to the user's own tables. + +--- + +# Apply Steps + +## A1 — Identify the Pattern + +Match the user's request to the closest snippet in the Available Patterns table. If the use case is ambiguous (e.g. "I want year-over-year comparisons"), confirm: "That maps to the `time_intelligence` pattern — role-playing aliases + computed FK facts for SPLY/YoY/MoM. Does that sound right?" + +If the user isn't sure which pattern fits, ask them to describe: +- What tables they have and how they relate +- What metric or question they're trying to answer + +Then recommend the best-fit snippet with a one-sentence explanation of why. + +## A2 — Read the Snippet Reference + +Read `snippets/<snippet>/README.md` in full.
Then read the authoring format file: +- If `AUTHORING_FORMAT = DDL`: read `snippets/<snippet>/semantic_view.sql` +- If `AUTHORING_FORMAT = YAML`: read `snippets/<snippet>/semantic_view.yaml` (and note any DDL-only features flagged in comments) + +Do NOT read `schema.sql`, `seed_data.sql`, or `queries.sql` — those are for Tutorial mode. + +## A3 — Get the User's Existing SV + +Ask for their current Semantic View definition. Accept any of: +- Paste DDL or YAML directly into the chat +- A local file path → use `read` to load it +- A Snowflake stage path → use `snowflake_sql_execute` with `GET_DDL('semantic_view', '<fully_qualified_sv_name>')` +- Export YAML from an existing SV: `SELECT SYSTEM$READ_YAML_FROM_SEMANTIC_VIEW('DB.SCHEMA.SV_NAME')` +- "I'm building one from scratch" → ask for table names and a brief description of what they're trying to measure + +If they have no existing SV yet, proceed with just the table descriptions — you'll generate the full definition. + +## A4 — Map the Pattern to Their Schema + +Show the user the core structural roles in the snippet (e.g. for `time_intelligence`: FACT table, date key column, measure columns). Then ask them to map each role to their actual tables/columns: + +> Here's what the time intelligence pattern needs: +> | Role | Snippet uses | Your equivalent? | +> |------|-------------|------------------| +> | Fact table | `FACT_SALES` | ? | +> | Date key column | `SALE_MONTH` (DATE) | ? | +> | Measure(s) to compare | `revenue`, `units` | ? | +> | Calendar/date dimension (optional) | `DIM_CALENDAR` | ? | + +Ask only for what the pattern actually requires — don't over-ask. If they have a calendar dim, great; if not, note that a self-join on the fact works too. + +## A5 — Generate Adapted Definition + +Using their mapping, generate a fully adapted SV definition in `AUTHORING_FORMAT`: + +**If DDL:** +1. **Existing SV**: produce a diff — show the exact blocks that need to be added or modified. +2. **From scratch**: produce complete `CREATE OR REPLACE SEMANTIC VIEW` DDL.
**If YAML:** +1. **Existing SV**: produce a YAML diff — show the table entries, relationship entries, or metric entries to add or modify. +2. **From scratch**: produce complete YAML ready to pass to `SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML`. +3. If the pattern has DDL-only features (ASOF/range join, VARIABLES, WITH TAG, etc.), show the YAML base + call out the follow-up DDL commands needed. + +For YAML output, always include the deployment snippet at the top: +```sql +-- Verify (dry-run): +CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('DB.SCHEMA', $$<yaml>$$, TRUE); +-- Deploy: +CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('DB.SCHEMA', $$<yaml>$$); +``` + +Annotate each adapted block with a brief comment explaining what it does and why. + +## A6 — Gotchas for Their Case + +Read the `## What Doesn't Work` section of the snippet's README. Flag any gotchas that are specifically relevant given their schema (e.g. if they have a non-standard date granularity, a composite key, or a table used in multiple roles).
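The mechanical half of the A5 adaptation — swapping the snippet's reference identifiers for the user's names — can be sketched as below. This is an illustrative helper, not part of the skill; the structural work of merging blocks into the user's SV remains a judgment call:

```python
import re

def adapt_identifiers(ddl: str, mapping: dict[str, str]) -> str:
    """Swap snippet identifiers (e.g. FACT_SALES) for the user's names.

    Word boundaries keep SALE_MONTH from clobbering a longer name
    like SALE_MONTH_LY, since '_' counts as a word character.
    """
    for snippet_name, user_name in mapping.items():
        ddl = re.sub(rf"\b{re.escape(snippet_name)}\b", user_name, ddl)
    return ddl
```

Using word-boundary substitution rather than plain string replacement is what lets the `_ly` role-playing aliases survive the rename untouched.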
+ +## A7 — Offer Next Steps + +Offer any of: +- Deploy the adapted SV to their account via `snowflake_sql_execute` +- Run test `SEMANTIC_VIEW()` queries against their data to verify the pattern works +- Switch to Tutorial mode if they want to see a live walkthrough with example data first +- Apply a second pattern on top of the same SV + +# Best Practices + +**Both modes:** +- Be honest about limitations — when a pattern doesn't work or has caveats, explain exactly why +- For ⚠️ Private Preview snippets (`inline_sv`, `materialization`, `scoped_dataset`), note upfront that the user may need to contact their Snowflake account team to enable the feature +- For `caller_rights`, note upfront that it requires ACCOUNTADMIN (or both SECURITYADMIN + SYSADMIN), creates its own dedicated database/warehouse/roles (`SV_CALLER_TEST`), and includes a cleanup block — run it when done +- For `row_access_policies`, note upfront that it requires ACCOUNTADMIN (or both SECURITYADMIN + SYSADMIN), creates roles (`REGION_A_ANALYST`, `REGION_B_ANALYST`) and a dedicated database (`RAP_TEST`), and grants those roles USAGE on an existing warehouse — no new warehouse is created +- Match the user's energy — if they're exploring, be expansive; if they're in a hurry, be terse + +**Tutorial mode:** +- Teach, don't just execute — every output needs a sentence explaining what it means +- Connect abstract to concrete: "Notice how `yoy_pct` for Jan 2024 is +12.4% — East revenue went from 105,000 to 118,000" +- Keep momentum — check in for pacing but don't block on confirmations + +**Apply mode:** +- Never rewrite their whole SV unprompted — make surgical additions only +- When mapping their schema to the snippet's roles, use their exact column/table names throughout; don't revert to snippet names like `FACT_SALES` or `SALE_MONTH` in the output DDL +- If their schema has edge cases the snippet doesn't cover (composite keys, non-standard date grains, many-to-many relationships), flag it explicitly 
rather than silently generating broken DDL +- Always verify: after generating adapted DDL, ask "Does this mapping look right before I generate the full DDL?" unless the mapping is obvious + +# Examples + +## Example 1: Named snippet +User: `$sv-patterns walk me through time intelligence` +Assistant: Reads all five files, presents the problem (no PREVIOUSYEAR in SVs), shows BI tool equivalents, deploys schema/seed, annotates the role-playing alias + computed-FK FACT pattern, runs live SPLY/YoY queries, explains the results. + +## Example 2: Concept match +User: `$sv-patterns how do I handle SCD2 dimensions` +Assistant: Matches to `range_join`, runs the full tutorial showing BETWEEN EXCLUSIVE range relationships and how historical dimension versions auto-resolve. + +## Example 3: Discovery +User: `$sv-patterns what snippets do you have` +Assistant: Lists all 25 patterns with one-line descriptions, asks which one to walk through. + +## Example 4: Apply mode — existing SV +User: `$sv-patterns help me add year-over-year to my existing SV` +Assistant: Matches to `time_intelligence`, reads the snippet reference (README + semantic_view.sql only). Asks the user to paste their SV DDL. Shows the mapping table (fact table, date key, measures). User fills in their names. Generates only the new FACTS + METRICS blocks and an updated TABLES/RELATIONSHIPS section with the `_ly` role-playing alias — as a diff against their existing DDL. Explains what each change does and flags the computed-FK gotcha. + +## Example 5: Apply mode — building from scratch +User: `$sv-patterns I'm building a SV for subscription churn analysis — I have a subscriptions table and a customers table. What pattern do I need?` +Assistant: Asks clarifying questions (what's the grain? what do you want to measure?). Determines `semi_additive_metric` (NON ADDITIVE BY) fits for a subscriber headcount metric that shouldn't sum across time.
Reads the snippet, maps their tables, generates full SV DDL with their table/column names.

## Example 6: Ambiguous trigger → mode clarification
User: `$sv-patterns time intelligence`
Assistant: "Do you want me to walk you through the time intelligence pattern with a working example deployed to your account, or help you apply it directly to your own Semantic View?" diff --git a/skills/semantic-view-patterns/run_snippet.py b/skills/semantic-view-patterns/run_snippet.py new file mode 100644 index 00000000..74ac3be1 --- /dev/null +++ b/skills/semantic-view-patterns/run_snippet.py @@ -0,0 +1,168 @@ +"""
Run a semantic view snippet interactively against your Snowflake account.

Usage:
    python run_snippet.py [SNIPPET] [options]

Options:
    --step schema|seed|sv|queries|all    Which step to run (default: all)
    --db DATABASE                        Target database (default: SNOWFLAKE_LEARNING_DB)
    --schema SCHEMA                      Target schema (default: PUBLIC)
    --connection CONNECTION_NAME         Snowflake connection name (default: active connection)
    --quiet                              Suppress query result rows

Examples:
    python run_snippet.py time_intelligence
    python run_snippet.py range_join --step sv
    python run_snippet.py window_metrics --db MY_DB --schema MY_SCHEMA
    python run_snippet.py asof_join --connection my_connection
"""

import os
import sys
import re
import argparse
import snowflake.connector

SNIPPETS_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "snippets")

STEP_FILES = {
    "schema": "schema.sql",
    "seed": "seed_data.sql",
    "sv": "semantic_view.sql",
    "queries": "queries.sql",
}


def adapt_sql(sql: str, target_db: str, target_schema: str) -> str:
    """Rewrite SNIPPETS.PUBLIC references to the user's target database and schema."""
    sql = re.sub(r'USE DATABASE SNIPPETS\s*;', f'USE DATABASE {target_db};', sql, flags=re.IGNORECASE)
    sql = re.sub(r'USE SCHEMA PUBLIC\s*;', f'USE SCHEMA {target_schema};', sql, flags=re.IGNORECASE)
    sql = re.sub(r'CREATE DATABASE IF NOT EXISTS SNIPPETS\s*;',
f'CREATE DATABASE IF NOT EXISTS {target_db};', sql, flags=re.IGNORECASE)
    sql = re.sub(r'CREATE SCHEMA IF NOT EXISTS SNIPPETS\.PUBLIC\s*;', f'CREATE SCHEMA IF NOT EXISTS {target_db}.{target_schema};', sql, flags=re.IGNORECASE)
    sql = re.sub(r'\bSNIPPETS\.PUBLIC\b', f'{target_db}.{target_schema}', sql, flags=re.IGNORECASE)
    return sql


def split_statements(sql: str) -> list[str]:
    """Split SQL into individual statements, skipping blank and comment-only blocks."""
    statements = []
    current = []
    for line in sql.splitlines():
        stripped = line.strip()
        current.append(line)
        if stripped.endswith(';') and not stripped.startswith('--'):
            stmt = '\n'.join(current).strip()
            if stmt and not all(l.strip().startswith('--') or l.strip() == '' for l in stmt.splitlines()):
                statements.append(stmt)
            current = []
    if current:
        stmt = '\n'.join(current).strip()
        if stmt and not all(l.strip().startswith('--') or l.strip() == '' for l in stmt.splitlines()):
            statements.append(stmt)
    return statements


def run_step(cur, snippet_dir: str, step: str, target_db: str, target_schema: str, verbose: bool = True):
    filename = STEP_FILES[step]
    filepath = os.path.join(snippet_dir, filename)

    if not os.path.exists(filepath):
        print(f"  ⚠️  {filename} not found — skipping")
        return

    with open(filepath) as f:
        raw = f.read()

    sql = adapt_sql(raw, target_db, target_schema)
    statements = split_statements(sql)

    print(f"\n{'='*60}")
    print(f"  {step.upper()}: {filename} ({len(statements)} statements)")
    print(f"{'='*60}")

    for stmt in statements:
        first_line = stmt.splitlines()[0].strip()
        if first_line.startswith('--'):
            first_line = next((l.strip() for l in stmt.splitlines() if l.strip() and not l.strip().startswith('--')), first_line)
        label = first_line[:80]

        if not stmt.strip() or all(l.strip().startswith('--') or not l.strip() for l in stmt.splitlines()):
            continue

        try:
            cur.execute(stmt)
            rows = cur.fetchall() if cur.description else
[] + if rows and verbose: + cols = [d[0] for d in cur.description] + col_widths = [max(len(str(c)), max((len(str(r[i])) for r in rows), default=0)) for i, c in enumerate(cols)] + header = ' ' + ' '.join(str(c).ljust(col_widths[i]) for i, c in enumerate(cols)) + print(f"\n ✓ {label}") + print(header) + print(' ' + ' '.join('-' * w for w in col_widths)) + for row in rows[:30]: + print(' ' + ' '.join(str(row[i]).ljust(col_widths[i]) for i in range(len(cols)))) + if len(rows) > 30: + print(f" ... ({len(rows)} rows total)") + else: + status = f"{cur.rowcount} row(s)" if cur.rowcount and cur.rowcount > 0 else "ok" + print(f" ✓ {label} [{status}]") + except Exception as e: + print(f" ✗ {label}") + print(f" ERROR: {e}") + + +def list_snippets() -> list[str]: + if not os.path.isdir(SNIPPETS_DIR): + return [] + return sorted(d for d in os.listdir(SNIPPETS_DIR) if os.path.isdir(os.path.join(SNIPPETS_DIR, d))) + + +def main(): + parser = argparse.ArgumentParser(description="Run a semantic view snippet against your Snowflake account") + parser.add_argument("snippet", nargs="?", help="Snippet name (e.g. time_intelligence). 
Omit to list available snippets.")
    parser.add_argument("--step", choices=["schema", "seed", "sv", "queries", "all"], default="all")
    parser.add_argument("--db", default="SNOWFLAKE_LEARNING_DB", help="Target database (default: SNOWFLAKE_LEARNING_DB)")
    parser.add_argument("--schema", default="PUBLIC", help="Target schema (default: PUBLIC)")
    parser.add_argument("--connection", default=None, help="Snowflake connection name (default: active session connection)")
    parser.add_argument("--quiet", action="store_true", help="Suppress query result rows")
    args = parser.parse_args()

    available = list_snippets()

    if not args.snippet:
        print("\nAvailable snippets:")
        for s in available:
            print(f"  {s}")
        print("\nUsage: python run_snippet.py SNIPPET [--db DB] [--schema SCHEMA]")
        sys.exit(0)

    snippet_dir = os.path.join(SNIPPETS_DIR, args.snippet)
    if not os.path.isdir(snippet_dir):
        print(f"ERROR: snippet '{args.snippet}' not found.")
        print(f"Available: {', '.join(available)}")
        sys.exit(1)

    print(f"\nTarget: {args.db}.{args.schema}")
    print("Connecting to Snowflake...")

    conn_kwargs = {}
    if args.connection:
        conn_kwargs["connection_name"] = args.connection
    conn = snowflake.connector.connect(**conn_kwargs)
    cur = conn.cursor()
    cur.execute(f"USE DATABASE {args.db}")
    cur.execute(f"USE SCHEMA {args.schema}")
    print(f"Connected: {conn.account}")

    steps = ["schema", "seed", "sv", "queries"] if args.step == "all" else [args.step]
    for step in steps:
        run_step(cur, snippet_dir, step, args.db, args.schema, verbose=not args.quiet)

    cur.close()
    conn.close()
    print("\nDone.")


if __name__ == "__main__":
    main() diff --git a/skills/semantic-view-patterns/snippets/README.md b/skills/semantic-view-patterns/snippets/README.md new file mode 100644 index 00000000..9a7f4d22 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/README.md @@ -0,0 +1,73 @@ +# Semantic Snippets

Atomic, executable patterns for Snowflake Semantic Views.
Each snippet covers one modeling concept end-to-end: the use case, example schema with seed data, the SV DDL, and example queries showing what works and what doesn't. + +## How to Use + +Each snippet is self-contained and deployable against any Snowflake account. Files in each directory: + +| File | Contents | +|------|----------| +| `README.md` | Use case, how to express the need, equivalents in other tools, gotchas | +| `schema.sql` | `CREATE TABLE` DDL for the example tables | +| `seed_data.sql` | `INSERT` statements — small enough to run in any account | +| `semantic_view.sql` | The `CREATE OR REPLACE SEMANTIC VIEW` DDL | +| `queries.sql` | `SEMANTIC_VIEW()` queries — what works, what doesn't, and why | + +All SQL targets `SNIPPETS.PUBLIC` by default. Swap in your own database/schema. + +## Snippets + +### Relationship Patterns +| Directory | Concept | Use Case | +|-----------|---------|----------| +| [`range_join/`](range_join/) | Range join (BETWEEN EXCLUSIVE) | Join to the dimension record valid within an explicit start/end window (SCD2 with both dates) | +| [`asof_join/`](asof_join/) | ASOF join | Join to the most recent dimension record active *as of* the event date — no end date required | +| [`multi_path_metrics/`](multi_path_metrics/) | USING clause | Disambiguate when a fact has two range relationships to the same dimension table | +| [`shared_degenerate_dimension/`](shared_degenerate_dimension/) | Shared degenerate dimension | Two facts share a low-cardinality column (`region`, `status`) — union distinct values into a helper, create one shared dimension entity | + +### Metric Patterns +| Directory | Concept | Use Case | +|-----------|---------|----------| +| [`semi_additive_metric/`](semi_additive_metric/) | Semi-additive / NON ADDITIVE BY | Snapshot data where summing across time double-counts (balances, headcount, inventory) | +| [`window_metrics/`](window_metrics/) | Window functions (LAG, rolling AVG, YTD) | Period-over-period comparisons, 
smoothed trends, year-to-date cumulative totals | +| [`derived_metrics/`](derived_metrics/) | Cross-table derived metrics | Totals, ratios, and % of total across multiple fact tables | +| [`time_intelligence/`](time_intelligence/) | Role-playing aliases + computed-FK FACTS | SPLY, SPLM, YoY%, MoM% — no window functions; date shift lives in the join key | + +### Entity & Dimension Patterns +| Directory | Concept | Use Case | +|-----------|---------|----------| +| [`entity_facts/`](entity_facts/) | Entity-level aggregated facts + calculated dims | Customer LTV aggregated from orders; derived value segments; calculated age from birth year | +| [`variables/`](variables/) | VARIABLES clause | Parameterized SVs with runtime-adjustable weights, thresholds, and date windows | + +### Multi-Fact Patterns +| Directory | Concept | Use Case | +|-----------|---------|----------| +| [`multi_fact_table/`](multi_fact_table/) | Multiple fact tables | Store, web, and returns as independent facts sharing a product and date dimension | + +### AI & Governance +| Directory | Concept | Use Case | +|-----------|---------|----------| +| [`ai_metadata/`](ai_metadata/) | AI_SQL_GENERATION, AI_QUESTION_CATEGORIZATION, AI_VERIFIED_QUERIES | Steer Cortex Analyst query style, scope, and pre-approved SQL | +| [`tags/`](tags/) | `WITH TAG` on metrics | Tag metrics with owner/status metadata; discover via `tag_references()` | + +### Ops & Tooling +| Directory | Concept | Use Case | +|-----------|---------|----------| +| [`introspection/`](introspection/) | SHOW METRICS, SHOW DIMENSIONS, DESCRIBE, get_lineage() | Discover what's in a SV, check metric-dimension compatibility, trace data lineage | +| [`standard_sql/`](standard_sql/) | Standard SQL on SVs | Query a SV like a view with plain SELECT; `ANY_VALUE()`, metric-less dim queries | + +### Inline SV ⚠️ *Private Preview* +| Directory | Concept | Use Case | +|-----------|---------|----------| +| [`inline_sv/`](inline_sv/) | Inline SV + SQL subquery 
as table | Ad-hoc SV CTEs for testing; SQL subquery as inline table definition — inline SV syntax requires account enablement | + +### Data Scoping ⚠️ *Private Preview* +| Directory | Concept | Use Case | +|-----------|---------|----------| +| [`scoped_dataset/`](scoped_dataset/) | SQL query as logical table (LOB scoping) | Embed `WHERE lob='Enterprise'` directly in the TABLES clause to create one SV per LOB/segment from a single source table | + +### Performance ⚠️ *Private Preview* +| Directory | Concept | Use Case | +|-----------|---------|----------| +| [`materialization/`](materialization/) | Semantic view materialization | Pre-aggregate selected dimension/metric combinations to speed up repeated rollup queries; use `IMMUTABLE WHERE` for incremental refresh of historical data | diff --git a/skills/semantic-view-patterns/snippets/accumulating_snapshot/README.md b/skills/semantic-view-patterns/snippets/accumulating_snapshot/README.md new file mode 100644 index 00000000..a1916fe2 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/accumulating_snapshot/README.md @@ -0,0 +1,162 @@ +# Accumulating Snapshot Fact Table + +## The Problem + +You're modeling a multi-stage business pipeline — loan origination, hiring, claims processing, SaaS trial-to-paid — where each entity moves through a sequence of milestones. Each milestone has its own date, and you want to analyze each stage independently: + +- "How many applications did we receive in January?" +- "How many loans funded in Q1?" +- "What was our review-to-decision rate last month?" + +A standard fact table with one date column can't answer all these questions from a single table. A separate fact table per stage works but forces analysts to write multi-table joins and risks inconsistent metric definitions. + +**How You Might Express This Need:** +- "I want to see the full loan funnel — applications, reviews, decisions, fundings — all in one table." 
+- "How do I model a pipeline where each stage has its own date?" +- "I want conversion rates between stages, but the stages happen on different dates." +- "My data team calls this an 'accumulating snapshot' — how do I build a SV for it?" + +## The Solution: Accumulating Snapshot + USING per Stage + +Kimball's **Accumulating Snapshot Fact Table** puts one row per business entity (one per loan application). The row accumulates updates as the entity moves through stages — milestone date columns start NULL and get filled in when each stage is reached. + +```sql +-- One row per application; milestone dates NULL until stage reached +LOAN_APPLICATIONS + application_id -- PK + application_date -- always set + review_date -- NULL until underwriting starts + decision_date -- NULL until approved or denied + funding_date -- NULL until funded + funded_amount -- NULL until funded +``` + +In the Semantic View, a **single `date_dim` alias** serves all four milestone paths. Each stage metric declares its own date relationship with `USING`: + +```sql +RELATIONSHIPS ( + app_to_application_date AS applications(APPLICATION_DATE) REFERENCES date_dim(DATE_KEY) + app_to_review_date AS applications(REVIEW_DATE) REFERENCES date_dim(DATE_KEY) + app_to_decision_date AS applications(DECISION_DATE) REFERENCES date_dim(DATE_KEY) + app_to_funding_date AS applications(FUNDING_DATE) REFERENCES date_dim(DATE_KEY) +) + +METRICS ( + -- USING (relationship) comes BEFORE AS — declares the date path for this metric + applications.application_count USING (app_to_application_date) AS COUNT(APPLICATION_ID) + applications.review_count USING (app_to_review_date) AS COUNT(REVIEW_DATE) + applications.decision_count USING (app_to_decision_date) AS COUNT(DECISION_DATE) + applications.funding_count USING (app_to_funding_date) AS COUNT(FUNDING_DATE) +) +``` + +When grouped by `date_dim.month`, each metric independently uses its own date path. 
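To make the routing concrete, here is a toy Python sketch (not how Snowflake executes the SV, just an illustration) that replays the snippet's 12 seed applications and buckets each stage count by that stage's own milestone month, the way the four `USING` clauses do inside one `SEMANTIC_VIEW()` query. The tuple literals mirror `seed_data.sql`; the helper name `count_by_month` is invented for this sketch.

```python
from collections import Counter

# (application_date, review_date, decision_date, funding_date); None = stage not reached.
# Rows mirror the 12 applications in seed_data.sql.
APPS = [
    ("2025-01-06", "2025-01-08", "2025-01-13", "2025-01-15"),
    ("2025-01-10", "2025-01-13", "2025-01-16", "2025-01-18"),
    ("2025-01-15", "2025-01-18", "2025-01-22", None),
    ("2025-01-22", "2025-01-25", None,         None),
    ("2025-02-03", "2025-02-06", "2025-02-11", "2025-02-13"),
    ("2025-02-10", "2025-02-13", "2025-02-18", "2025-02-20"),
    ("2025-02-17", "2025-02-20", None,         None),
    ("2025-02-24", None,         None,         None),
    ("2025-03-03", "2025-03-06", "2025-03-10", "2025-03-12"),
    ("2025-03-10", "2025-03-13", "2025-03-17", None),
    ("2025-03-17", "2025-03-20", None,         None),
    ("2025-03-24", None,         None,         None),
]

MONTHS = ("2025-01", "2025-02", "2025-03")

def count_by_month(stage_idx: int) -> list[int]:
    """COUNT(<milestone date>) grouped by that milestone's own month (NULLs drop out)."""
    c = Counter(row[stage_idx][:7] for row in APPS if row[stage_idx] is not None)
    return [c.get(m, 0) for m in MONTHS]

for stage, idx in [("application", 0), ("review", 1), ("decision", 2), ("funding", 3)]:
    print(f"{stage:<11} {count_by_month(idx)}")
# application [4, 4, 4]
# review      [4, 3, 3]
# decision    [3, 2, 2]
# funding     [2, 2, 1]
```

Each list reads Jan/Feb/Mar. Note that `funding` for January counts loans *funded* in January regardless of when they were submitted, which is exactly the per-metric date routing the `USING` clauses declare.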
`application_count` buckets by `APPLICATION_DATE`; `funding_count` buckets by `FUNDING_DATE` — in a single query.

## What the Demo Shows

12 loan applications across January–March 2025 (4 per month). The funnel narrows naturally:

| Stage | Count | Notes |
|-------|-------|-------|
| Applications | 12 | 4 per month |
| Reviews | 10 | 2 not yet reviewed |
| Decisions | 7 | 3 in review, not yet decided |
| Fundings | 5 | 2 denied/withdrawn |

Milestone dates may differ from application date — a January application may not fund until February or March. This cross-stage date shift is what makes USING essential.

**Full funnel by month, each stage bucketed by its own milestone date** (Q3 — all 4 metrics in one query):

```
YEAR  MONTH     APPLICATION_COUNT  REVIEW_COUNT  DECISION_COUNT  FUNDING_COUNT
2025  January   4                  4             3               2
2025  February  4                  3             2               2
2025  March     4                  3             2               1
```

Each column uses a different date path under the hood.

**Conversion rates by channel** (Q5):

```
CHANNEL      APPLICATION_COUNT  REVIEW_RATE  DECISION_RATE  FUNDING_RATE
Direct Mail  2                  1.00         0.50           0.50
Organic      5                  0.80         0.60           0.40
Paid Search  3                  1.00         1.00           0.67
Referral     2                  0.50         0.00           0.00
```

## USING Clause Syntax

```sql
-- CORRECT: USING comes BEFORE AS
applications.funding_count USING (app_to_funding_date) AS COUNT(FUNDING_DATE)

-- WRONG: USING after AS — will error at deploy time
applications.funding_count AS COUNT(FUNDING_DATE) USING (app_to_funding_date)
```

## Derived Metrics Referencing USING-Scoped Metrics

Derived metrics that combine USING-scoped constituents must be defined **without an entity prefix on the left side**:

```sql
-- CORRECT: no entity prefix on the left
, funding_rate AS DIV0(applications.funding_count, applications.application_count)

-- WRONG: entity prefix on left causes compilation error
, applications.funding_rate AS DIV0(applications.funding_count, applications.application_count)
```

The constituent references on the right side (`applications.funding_count`) still need
the entity prefix. + +## What Doesn't Work + +- **Cohort analysis**: The conversion rates here are *same-period* ratios, not cohort-based. `funding_rate` for January = fundings-in-January ÷ applications-in-January, not "of all January applications, how many eventually funded." January applications that fund in February are NOT counted in January's `funding_count` — they appear in February's. True cohort analysis requires a different model structure (e.g., a pre-aggregated cohort summary table). + +- **NULL milestone dates and COUNT**: `COUNT(REVIEW_DATE)` naturally skips NULLs, so non-reviewed applications are automatically excluded. This is intentional — it's what makes the pattern work. `COUNT(APPLICATION_ID)` counts all rows regardless. + +- **NULL row in output**: When grouping by a date dimension, applications with NULL milestone dates (e.g., unfunded loans when querying `funding_count`) produce a NULL dimension row. This is expected LEFT JOIN behavior. + +- **Mixing USING and non-USING metrics in one query**: Works correctly — each metric independently resolves its own date path. The NULL row appears when any metric in the query has a NULL date for some rows. + +## Accumulating Snapshot vs. Role-Playing Dimensions + +Both patterns handle multiple relationships to the same physical date table. The key distinction: + +| | Accumulating Snapshot (this snippet) | Role-Playing Dimensions (`role_playing_dimensions`) | +|--|--|--| +| Aliases in TABLES | One `date_dim` alias | Two aliases: `order_date_dim`, `ship_date_dim` | +| Disambiguation | `USING` on each metric | None needed — each alias has unique dim names | +| Date dimensions | Shared: `year`, `month_name` (resolve differently per metric) | Independent: `order_year`, `ship_year` | +| Use both dates together? 
| No — USING locks each metric to one path | Yes — produces cross-tab | +| Best for | Stage-based pipeline funnels | Entity with two independent date attributes | + +Use **accumulating snapshot + USING** when you have one entity moving through sequential stages. +Use **role-playing dimensions** when you have multiple independent date attributes (order date *and* ship date) that analysts need to group by simultaneously. + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **Power BI** | Separate inactive relationships to DIM_DATE; DAX measures use `USERELATIONSHIP()` to activate the correct path per metric. Verbose — each measure must repeat the relationship reference. | +| **Tableau** | Typically requires either a UNION of stage-level fact tables or a pre-pivoted "funnel summary" table. No clean accumulating snapshot pattern in the native semantic layer. | +| **LookML** | `dimension: review_date` with `fanout_on: applications`; derived measures using `sql_table_name` overrides. Requires careful handling to avoid double-counting. | +| **dbt** | Model the accumulating snapshot in SQL; metrics layer (dbt Semantic Layer / MetricFlow) can define multiple `time_grains` but doesn't natively handle per-metric date disambiguation at query time. | +| **Raw SQL** | Four LEFT JOINs to DIM_DATE with aliases (`app_date`, `review_date_dim`, etc.); each COUNT wrapped in a CASE or separate CTE. The SV encodes this join structure once and exposes it cleanly. 
| + +## Docs + +- [Semantic View — USING clause on metrics](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#metrics) +- [Semantic View — RELATIONSHIPS clause](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#relationships) +- [SEMANTIC_VIEW() table function](https://docs.snowflake.com/en/sql-reference/functions/semantic_view) +- [Kimball Group — Accumulating Snapshot Fact Tables](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/accumulating-snapshot-fact-table/) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `DIM_DATE` calendar table + `LOAN_APPLICATIONS` accumulating snapshot fact table | +| `seed_data.sql` | 12 loan applications (Jan–Mar 2025) + 26 DIM_DATE rows; funnel: 12→10→7→5 | +| `semantic_view.sql` | `LOAN_PIPELINE_SV` — one `date_dim` alias, four milestone relationships, USING per stage metric | +| `queries.sql` | Five verification queries with expected outputs: single-stage, multi-stage, rates by product and channel | diff --git a/skills/semantic-view-patterns/snippets/accumulating_snapshot/queries.sql b/skills/semantic-view-patterns/snippets/accumulating_snapshot/queries.sql new file mode 100644 index 00000000..c4720cdd --- /dev/null +++ b/skills/semantic-view-patterns/snippets/accumulating_snapshot/queries.sql @@ -0,0 +1,102 @@ +-- Accumulating Snapshot: Verification Queries +-- +-- All queries use SEMANTIC_VIEW() against LOAN_PIPELINE_SV. +-- Expected outputs are from live runs against the seed data. + +-- ── Q1: Applications by application month ───────────────────────────────── +-- Baseline: all 12 applications, dated by when they were submitted. 
+
-- Expected: 4 rows (Jan=4, Feb=4, Mar=4) + NULL row for metrics with no date match

SELECT * FROM SEMANTIC_VIEW(
    SNIPPETS.PUBLIC.LOAN_PIPELINE_SV
    DIMENSIONS date_dim.year, date_dim.month_num, date_dim.month_name
    METRICS applications.application_count
)
ORDER BY year, month_num;

-- | YEAR | MONTH_NUM | MONTH_NAME | APPLICATION_COUNT |
-- |------|-----------|------------|-------------------|
-- | 2025 | 1         | January    | 4                 |
-- | 2025 | 2         | February   | 4                 |
-- | 2025 | 3         | March      | 4                 |
-- | NULL | NULL      |            | NULL              |

-- ── Q2: Fundings by FUNDING month ─────────────────────────────────────────
-- The USING clause switches the date path to FUNDING_DATE.
-- Fundings are distributed differently than applications — some lag 1-2 months.
-- Expected: Jan=2, Feb=2, Mar=1 (5 total funded; NULL row for unfunded NULLs → 0)

SELECT * FROM SEMANTIC_VIEW(
    SNIPPETS.PUBLIC.LOAN_PIPELINE_SV
    DIMENSIONS date_dim.year, date_dim.month_num, date_dim.month_name
    METRICS applications.funding_count
)
ORDER BY year, month_num;

-- | YEAR | MONTH_NUM | MONTH_NAME | FUNDING_COUNT |
-- |------|-----------|------------|---------------|
-- | 2025 | 1         | January    | 2             |
-- | 2025 | 2         | February   | 2             |
-- | 2025 | 3         | March      | 1             |
-- | NULL | NULL      |            | 0             |

-- ── Q3: Full funnel — all 4 stage counts in one query ─────────────────────
-- The critical test: can multiple USING-scoped metrics share one date dimension?
-- Each metric independently resolves its date path via USING.
-- When grouped by date_dim.month, each count is bucketed by ITS OWN milestone date.
-- (application_count by APPLICATION_DATE, review_count by REVIEW_DATE, etc.)
+
-- Expected funnel totals: 12→10→7→5, each stage spread across Jan/Feb/Mar by its own milestone month

SELECT * FROM SEMANTIC_VIEW(
    SNIPPETS.PUBLIC.LOAN_PIPELINE_SV
    DIMENSIONS date_dim.year, date_dim.month_num, date_dim.month_name
    METRICS applications.application_count, applications.review_count,
            applications.decision_count, applications.funding_count
)
ORDER BY year, month_num;

-- | YEAR | MONTH_NUM | MONTH_NAME | APPLICATION_COUNT | REVIEW_COUNT | DECISION_COUNT | FUNDING_COUNT |
-- |------|-----------|------------|-------------------|--------------|----------------|---------------|
-- | 2025 | 1         | January    | 4                 | 4            | 3              | 2             |
-- | 2025 | 2         | February   | 4                 | 3            | 2              | 2             |
-- | 2025 | 3         | March      | 4                 | 3            | 2              | 1             |
-- | NULL | NULL      |            | NULL              | 0            | 0              | 0             |

-- ── Q4: Conversion rates by loan product ──────────────────────────────────
-- Derived metrics (funding_rate) reference USING-scoped constituent metrics.
-- No date dimension needed — product is a non-date attribute.
-- Note: Student Refi has 0% funding rate (no funded loans in seed data).

SELECT * FROM SEMANTIC_VIEW(
    SNIPPETS.PUBLIC.LOAN_PIPELINE_SV
    DIMENSIONS applications.loan_product
    METRICS applications.application_count, applications.funding_count,
            funding_rate
)
ORDER BY loan_product;

-- | LOAN_PRODUCT  | APPLICATION_COUNT | FUNDING_COUNT | FUNDING_RATE |
-- |---------------|-------------------|---------------|--------------|
-- | Home Equity   | 2                 | 1             | 0.500000     |
-- | Personal Loan | 7                 | 4             | 0.571429     |
-- | Student Refi  | 3                 | 0             | 0.000000     |

-- ── Q5: Full funnel rates by channel ──────────────────────────────────────
-- Referral channel: 2 applications, 1 review, 0 decisions → 0% decision and funding rate.
-- Paid Search: perfect 100% review and decision rate, 67% funding rate.
+
SELECT * FROM SEMANTIC_VIEW(
    SNIPPETS.PUBLIC.LOAN_PIPELINE_SV
    DIMENSIONS applications.channel
    METRICS applications.application_count, applications.review_count,
            applications.decision_count, applications.funding_count,
            review_rate, decision_rate, funding_rate
)
ORDER BY channel;

-- | CHANNEL     | APPLICATION_COUNT | REVIEW_COUNT | DECISION_COUNT | FUNDING_COUNT | REVIEW_RATE | DECISION_RATE | FUNDING_RATE |
-- |-------------|-------------------|--------------|----------------|---------------|-------------|---------------|--------------|
-- | Direct Mail | 2                 | 2            | 1              | 1             | 1.000000    | 0.500000      | 0.500000     |
-- | Organic     | 5                 | 4            | 3              | 2             | 0.800000    | 0.600000      | 0.400000     |
-- | Paid Search | 3                 | 3            | 3              | 2             | 1.000000    | 1.000000      | 0.666667     |
-- | Referral    | 2                 | 1            | 0              | 0             | 0.500000    | 0.000000      | 0.000000     | diff --git a/skills/semantic-view-patterns/snippets/accumulating_snapshot/schema.sql b/skills/semantic-view-patterns/snippets/accumulating_snapshot/schema.sql new file mode 100644 index 00000000..3e3414bc --- /dev/null +++ b/skills/semantic-view-patterns/snippets/accumulating_snapshot/schema.sql @@ -0,0 +1,48 @@ +-- Accumulating Snapshot: Schema Setup
--
-- Kimball's "Accumulating Snapshot Fact Table" pattern:
-- one row per business entity (loan application), updated as it moves
-- through pipeline stages. Each milestone gets its own date column.
--
-- Four milestone FKs all reference the same DIM_DATE table.
-- In the Semantic View, each stage metric uses USING to route through
-- the correct date relationship — no ambiguity, no dedicated date alias per stage.
+ +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- DIMENSION TABLE +-- ============================================================ + +CREATE OR REPLACE TABLE DIM_DATE ( + date_key DATE NOT NULL, + month_num INTEGER NOT NULL, + month_name VARCHAR(10) NOT NULL, + quarter VARCHAR(2) NOT NULL, + year INTEGER NOT NULL, + CONSTRAINT pk_dim_date PRIMARY KEY (date_key) +); + +-- ============================================================ +-- FACT TABLE — Accumulating Snapshot +-- ============================================================ + +-- One row per loan application. Milestone columns are NULL until +-- the application reaches that stage. FUNDED_AMOUNT is NULL for +-- denied, withdrawn, or in-progress applications. +CREATE OR REPLACE TABLE LOAN_APPLICATIONS ( + application_id INTEGER NOT NULL, + loan_product VARCHAR(20) NOT NULL, -- Personal Loan | Student Refi | Home Equity + state VARCHAR(2) NOT NULL, + channel VARCHAR(20) NOT NULL, -- Organic | Paid Search | Referral | Direct Mail + -- Milestone timestamps — FK to DIM_DATE, NULL until stage reached + application_date DATE NOT NULL, -- always set at row creation + review_date DATE, -- set when underwriting starts + decision_date DATE, -- set when approved or denied + funding_date DATE, -- set only for approved + funded loans + -- Measures + requested_amount NUMBER(10,2) NOT NULL, + funded_amount NUMBER(10,2), -- NULL until funded + CONSTRAINT pk_loan_app PRIMARY KEY (application_id) +); diff --git a/skills/semantic-view-patterns/snippets/accumulating_snapshot/seed_data.sql b/skills/semantic-view-patterns/snippets/accumulating_snapshot/seed_data.sql new file mode 100644 index 00000000..0f40ccfc --- /dev/null +++ b/skills/semantic-view-patterns/snippets/accumulating_snapshot/seed_data.sql @@ -0,0 +1,80 @@ +-- Accumulating Snapshot: Seed Data +-- +-- 12 loan applications modeled after a B2C lender (SoFi-style). 
+-- Funnel: 12 applied → 10 reviewed → 7 decisions → 5 funded +-- Conversion rates: review 83%, decision 58%, funding 42% +-- +-- Three months of applications (Jan–Mar 2025), 4 per month. +-- Some applications are still in-flight (NULL milestones) — that's the point. + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- DIM_DATE — one row per distinct date referenced in LOAN_APPLICATIONS +-- ============================================================ + +INSERT INTO DIM_DATE (date_key, month_num, month_name, quarter, year) VALUES + -- January 2025 + ('2025-01-06', 1, 'January', 'Q1', 2025), + ('2025-01-08', 1, 'January', 'Q1', 2025), + ('2025-01-10', 1, 'January', 'Q1', 2025), + ('2025-01-13', 1, 'January', 'Q1', 2025), + ('2025-01-15', 1, 'January', 'Q1', 2025), + ('2025-01-16', 1, 'January', 'Q1', 2025), + ('2025-01-18', 1, 'January', 'Q1', 2025), + ('2025-01-22', 1, 'January', 'Q1', 2025), + ('2025-01-25', 1, 'January', 'Q1', 2025), + -- February 2025 + ('2025-02-03', 2, 'February', 'Q1', 2025), + ('2025-02-06', 2, 'February', 'Q1', 2025), + ('2025-02-10', 2, 'February', 'Q1', 2025), + ('2025-02-11', 2, 'February', 'Q1', 2025), + ('2025-02-13', 2, 'February', 'Q1', 2025), + ('2025-02-17', 2, 'February', 'Q1', 2025), + ('2025-02-18', 2, 'February', 'Q1', 2025), + ('2025-02-20', 2, 'February', 'Q1', 2025), + ('2025-02-24', 2, 'February', 'Q1', 2025), + -- March 2025 + ('2025-03-03', 3, 'March', 'Q1', 2025), + ('2025-03-06', 3, 'March', 'Q1', 2025), + ('2025-03-10', 3, 'March', 'Q1', 2025), + ('2025-03-12', 3, 'March', 'Q1', 2025), + ('2025-03-13', 3, 'March', 'Q1', 2025), + ('2025-03-17', 3, 'March', 'Q1', 2025), + ('2025-03-20', 3, 'March', 'Q1', 2025), + ('2025-03-24', 3, 'March', 'Q1', 2025); + +-- ============================================================ +-- LOAN_APPLICATIONS — 12 rows across 3 months +-- NULL milestone = not yet reached +-- 
============================================================ + +INSERT INTO LOAN_APPLICATIONS ( + application_id, loan_product, state, channel, + application_date, review_date, decision_date, funding_date, + requested_amount, funded_amount +) VALUES + -- January cohort (4 apps) — mostly complete + (1, 'Personal Loan', 'CA', 'Organic', '2025-01-06', '2025-01-08', '2025-01-13', '2025-01-15', 25000, 25000), + (2, 'Personal Loan', 'TX', 'Paid Search', '2025-01-10', '2025-01-13', '2025-01-16', '2025-01-18', 15000, 15000), + (3, 'Student Refi', 'CA', 'Organic', '2025-01-15', '2025-01-18', '2025-01-22', NULL, 45000, NULL), -- denied + (4, 'Personal Loan', 'FL', 'Referral', '2025-01-22', '2025-01-25', NULL, NULL, 12000, NULL), -- in decision + + -- February cohort (4 apps) — partially complete + (5, 'Home Equity', 'NY', 'Direct Mail', '2025-02-03', '2025-02-06', '2025-02-11', '2025-02-13', 80000, 80000), + (6, 'Personal Loan', 'CA', 'Paid Search', '2025-02-10', '2025-02-13', '2025-02-18', '2025-02-20', 20000, 20000), + (7, 'Student Refi', 'TX', 'Organic', '2025-02-17', '2025-02-20', NULL, NULL, 55000, NULL), -- in decision + (8, 'Personal Loan', 'WA', 'Referral', '2025-02-24', NULL, NULL, NULL, 8000, NULL), -- just applied + + -- March cohort (4 apps) — early stage + (9, 'Personal Loan', 'CA', 'Organic', '2025-03-03', '2025-03-06', '2025-03-10', '2025-03-12', 18000, 18000), + (10, 'Home Equity', 'FL', 'Paid Search', '2025-03-10', '2025-03-13', '2025-03-17', NULL, 120000, NULL), -- denied + (11, 'Student Refi', 'NY', 'Direct Mail', '2025-03-17', '2025-03-20', NULL, NULL, 40000, NULL), -- in decision + (12, 'Personal Loan', 'TX', 'Organic', '2025-03-24', NULL, NULL, NULL, 10000, NULL); -- just applied + +-- Funnel summary: +-- Applied: 12 (all rows have application_date) +-- Reviewed: 10 (rows 1-7, 9-11 have review_date; 8 and 12 are NULL) +-- Decided: 7 (rows 1-3, 5-6, 9-10 have decision_date) +-- Funded: 5 (rows 1, 2, 5, 6, 9 have funding_date) diff --git 
a/skills/semantic-view-patterns/snippets/accumulating_snapshot/semantic_view.sql b/skills/semantic-view-patterns/snippets/accumulating_snapshot/semantic_view.sql new file mode 100644 index 00000000..3eccc9b7 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/accumulating_snapshot/semantic_view.sql @@ -0,0 +1,129 @@ +-- Accumulating Snapshot: Semantic View DDL +-- +-- Pattern: one DIM_DATE alias, four relationships (one per milestone). +-- Each stage metric uses USING to declare which date relationship it counts through. +-- This is the multi-path metrics pattern applied to a funnel. +-- +-- Syntax: entity.logical_name USING (relationship) AS physical_expression +-- USING comes BEFORE AS + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.LOAN_PIPELINE_SV + + TABLES ( + applications AS SNIPPETS.PUBLIC.LOAN_APPLICATIONS + PRIMARY KEY (APPLICATION_ID) + + -- One date dimension alias, four relationships pointing into it. + -- Each USING clause on a metric selects which relationship to traverse. 
+ , date_dim AS SNIPPETS.PUBLIC.DIM_DATE + PRIMARY KEY (DATE_KEY) + ) + + RELATIONSHIPS ( + -- Four milestone paths — all lead to the same DIM_DATE table + app_to_application_date AS applications(APPLICATION_DATE) + REFERENCES date_dim(DATE_KEY) + + , app_to_review_date AS applications(REVIEW_DATE) + REFERENCES date_dim(DATE_KEY) + + , app_to_decision_date AS applications(DECISION_DATE) + REFERENCES date_dim(DATE_KEY) + + , app_to_funding_date AS applications(FUNDING_DATE) + REFERENCES date_dim(DATE_KEY) + ) + + FACTS ( + -- logical: requested_amount → physical: REQUESTED_AMOUNT + applications.requested_amount AS REQUESTED_AMOUNT + + -- logical: funded_amount → physical: FUNDED_AMOUNT (NULL until loan funds) + , applications.funded_amount AS FUNDED_AMOUNT + ) + + DIMENSIONS ( + -- Application attributes — no date path needed + applications.loan_product AS LOAN_PRODUCT + WITH SYNONYMS ('product', 'loan type', 'product type') + , applications.state AS STATE + WITH SYNONYMS ('state', 'us state', 'geography') + , applications.channel AS CHANNEL + WITH SYNONYMS ('channel', 'acquisition channel', 'marketing channel') + + -- Date dimension — the same columns serve all four milestone roles via USING + , date_dim.month_name AS MONTH_NAME + WITH SYNONYMS ('month', 'month name') + , date_dim.month_num AS MONTH_NUM + WITH SYNONYMS ('month number') + , date_dim.quarter AS QUARTER + WITH SYNONYMS ('quarter', 'qtr') + , date_dim.year AS YEAR + WITH SYNONYMS ('year') + ) + + METRICS ( + -- ── Stage counts ───────────────────────────────────────────────────────── + -- USING (relationship) comes before AS — declares the date path for this metric. + -- "Count of X by the date that X happened." 
+ + -- Applications submitted — dated by APPLICATION_DATE + applications.application_count USING (app_to_application_date) AS COUNT(APPLICATION_ID) + WITH SYNONYMS ('applications', 'apps submitted', 'application volume') + COMMENT = 'Count of submitted applications, dated by application_date' + + -- Reviews started — COUNT(REVIEW_DATE) skips NULLs (not-yet-reviewed apps) + , applications.review_count USING (app_to_review_date) AS COUNT(REVIEW_DATE) + WITH SYNONYMS ('reviews', 'reviews started', 'underwriting count') + COMMENT = 'Count of applications that entered review, dated by review_date' + + -- Decisions made (approved or denied) + , applications.decision_count USING (app_to_decision_date) AS COUNT(DECISION_DATE) + WITH SYNONYMS ('decisions', 'decisions made', 'approvals and denials') + COMMENT = 'Count of applications with a final decision, dated by decision_date' + + -- Loans funded — COUNT(FUNDING_DATE) skips denied/in-progress applications + , applications.funding_count USING (app_to_funding_date) AS COUNT(FUNDING_DATE) + WITH SYNONYMS ('fundings', 'funded loans', 'loan count', 'originations') + COMMENT = 'Count of funded loans, dated by funding_date' + + -- ── Dollar volumes ──────────────────────────────────────────────────────── + , applications.total_requested USING (app_to_application_date) AS SUM(REQUESTED_AMOUNT) + WITH SYNONYMS ('requested amount', 'application volume dollars', 'pipeline value') + + , applications.total_funded USING (app_to_funding_date) AS SUM(FUNDED_AMOUNT) + WITH SYNONYMS ('funded amount', 'origination volume', 'funded dollars') + + -- ── Funnel conversion rates ─────────────────────────────────────────────── + -- Derived metrics combine stage counts from different USING paths. + -- When grouped by date_dim.month, each constituent is counted in its own + -- date bucket — the ratio is same-period, NOT cohort-based (see GOTCHAS). 
+ , applications.review_rate AS DIV0(review_count, application_count) + WITH SYNONYMS ('review rate', 'application to review rate') + COMMENT = 'Fraction of applications that entered review (same-period)' + + , applications.decision_rate AS DIV0(decision_count, application_count) + WITH SYNONYMS ('decision rate', 'approval rate', 'application to decision rate') + COMMENT = 'Fraction of applications that received a decision (same-period)' + + , applications.funding_rate AS DIV0(funding_count, application_count) + WITH SYNONYMS ('funding rate', 'close rate', 'conversion rate', 'pull-through rate') + COMMENT = 'Fraction of applications that funded (same-period, not cohort-based)' + ) + + COMMENT = 'Loan origination pipeline modeled as an Accumulating Snapshot Fact Table (Kimball). One row per application; four milestone dates (application, review, decision, funding). Each stage metric uses USING to declare its date relationship — enabling stage-specific time analysis from a single DIM_DATE alias.' + + AI_SQL_GENERATION 'This SV models a loan origination funnel as an Accumulating Snapshot Fact Table. One DIM_DATE alias with four milestone relationships; each metric uses USING to declare which milestone date it is counted against. + +Stage count metrics and their date paths: + application_count USING (app_to_application_date) → APPLICATION_DATE + review_count USING (app_to_review_date) → REVIEW_DATE + decision_count USING (app_to_decision_date) → DECISION_DATE + funding_count USING (app_to_funding_date) → FUNDING_DATE + +To analyze a single stage over time: use that stage metric alone with date_dim.year / date_dim.month_name dimensions. +Funnel conversion metrics (review_rate, decision_rate, funding_rate) are same-period ratios. 
+To slice by loan type or geography, add applications.loan_product / applications.state — these do not require a date path.'; diff --git a/skills/semantic-view-patterns/snippets/accumulating_snapshot/semantic_view.yaml b/skills/semantic-view-patterns/snippets/accumulating_snapshot/semantic_view.yaml new file mode 100644 index 00000000..f22bd52d --- /dev/null +++ b/skills/semantic-view-patterns/snippets/accumulating_snapshot/semantic_view.yaml @@ -0,0 +1,193 @@ +# Accumulating Snapshot: Semantic View YAML +# +# This file is the canonical YAML for LOAN_PIPELINE_SV, exported via: +# SELECT SYSTEM$READ_YAML_FROM_SEMANTIC_VIEW('SNIPPETS.PUBLIC.LOAN_PIPELINE_SV'); +# then lightly formatted and annotated. +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# Note: AI_SQL_GENERATION maps to module_custom_instructions: sql_generation: in YAML.
+ +name: LOAN_PIPELINE_SV + +tables: + - name: APPLICATIONS + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: LOAN_APPLICATIONS + primary_key: + columns: + - APPLICATION_ID + dimensions: + - name: CHANNEL + synonyms: + - channel + - acquisition channel + - marketing channel + expr: CHANNEL + data_type: VARCHAR(20) + - name: LOAN_PRODUCT + synonyms: + - product + - loan type + - product type + expr: LOAN_PRODUCT + data_type: VARCHAR(20) + - name: STATE + synonyms: + - state + - us state + - geography + expr: STATE + data_type: VARCHAR(2) + facts: + - name: FUNDED_AMOUNT + expr: FUNDED_AMOUNT + data_type: NUMBER(10,2) + access_modifier: public_access + - name: REQUESTED_AMOUNT + expr: REQUESTED_AMOUNT + data_type: NUMBER(10,2) + access_modifier: public_access + metrics: + # using_relationships is the YAML equivalent of DDL's USING (relationship_name) + - name: APPLICATION_COUNT + synonyms: + - applications + - apps submitted + - application volume + expr: COUNT(APPLICATION_ID) + access_modifier: public_access + using_relationships: + - APP_TO_APPLICATION_DATE + - name: REVIEW_COUNT + synonyms: + - reviews + - reviews started + - underwriting count + expr: COUNT(REVIEW_DATE) + access_modifier: public_access + using_relationships: + - APP_TO_REVIEW_DATE + - name: DECISION_COUNT + synonyms: + - decisions + - decisions made + - approvals and denials + expr: COUNT(DECISION_DATE) + access_modifier: public_access + using_relationships: + - APP_TO_DECISION_DATE + - name: FUNDING_COUNT + synonyms: + - fundings + - funded loans + - originations + expr: COUNT(FUNDING_DATE) + access_modifier: public_access + using_relationships: + - APP_TO_FUNDING_DATE + - name: TOTAL_REQUESTED + synonyms: + - requested amount + - pipeline value + expr: SUM(REQUESTED_AMOUNT) + access_modifier: public_access + using_relationships: + - APP_TO_APPLICATION_DATE + - name: TOTAL_FUNDED + synonyms: + - funded amount + - origination volume + expr: SUM(FUNDED_AMOUNT) + access_modifier: 
public_access + using_relationships: + - APP_TO_FUNDING_DATE + + - name: DATE_DIM + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: DIM_DATE + primary_key: + columns: + - DATE_KEY + dimensions: + - name: MONTH_NAME + synonyms: + - month + - month name + expr: MONTH_NAME + data_type: VARCHAR(10) + - name: MONTH_NUM + synonyms: + - month number + expr: MONTH_NUM + data_type: NUMBER + - name: QUARTER + synonyms: + - quarter + - qtr + expr: QUARTER + data_type: VARCHAR(2) + - name: YEAR + synonyms: + - year + expr: YEAR + data_type: NUMBER + +relationships: + - name: APP_TO_APPLICATION_DATE + left_table: APPLICATIONS + right_table: DATE_DIM + relationship_columns: + - left_column: APPLICATION_DATE + right_column: DATE_KEY + - name: APP_TO_REVIEW_DATE + left_table: APPLICATIONS + right_table: DATE_DIM + relationship_columns: + - left_column: REVIEW_DATE + right_column: DATE_KEY + - name: APP_TO_DECISION_DATE + left_table: APPLICATIONS + right_table: DATE_DIM + relationship_columns: + - left_column: DECISION_DATE + right_column: DATE_KEY + - name: APP_TO_FUNDING_DATE + left_table: APPLICATIONS + right_table: DATE_DIM + relationship_columns: + - left_column: FUNDING_DATE + right_column: DATE_KEY + +# Derived metrics combining using_relationships-scoped metrics +# must be view-level (not nested under a table) +metrics: + - name: REVIEW_RATE + synonyms: + - review rate + - application to review rate + expr: "DIV0(applications.review_count, applications.application_count)" + access_modifier: public_access + - name: DECISION_RATE + synonyms: + - decision rate + - approval rate + - application to decision rate + expr: "DIV0(applications.decision_count, applications.application_count)" + access_modifier: public_access + - name: FUNDING_RATE + synonyms: + - funding rate + - close rate + - conversion rate + - pull-through rate + expr: "DIV0(applications.funding_count, applications.application_count)" + access_modifier: public_access diff --git 
a/skills/semantic-view-patterns/snippets/ai_metadata/README.md b/skills/semantic-view-patterns/snippets/ai_metadata/README.md new file mode 100644 index 00000000..275ca9ad --- /dev/null +++ b/skills/semantic-view-patterns/snippets/ai_metadata/README.md @@ -0,0 +1,73 @@ +# AI Metadata in DDL + +## The Problem + +Out-of-the-box, Cortex Analyst uses the SV's metric/dimension definitions to generate SQL. But you want to: +1. **Steer query style** (e.g. always round amounts, never include refunded orders) +2. **Control topic scope** (reject or redirect off-topic questions) +3. **Pre-approve SQL** for common questions so the AI reuses exact, verified SQL instead of regenerating + +## The Three AI Metadata Blocks + +### `AI_SQL_GENERATION` +Free-text instructions injected into every SQL generation call for this SV. Used to encode: +- Formatting preferences (`round to 2 decimal places`) +- Implicit business rules (`never include refunded orders`) +- Disambiguation hints (`use customer_name for customer breakdowns`) + +```sql +AI_SQL_GENERATION 'Always round monetary values to 2 decimal places. +When asked about revenue, never include orders with status = ''refunded''.' +``` + +### `AI_QUESTION_CATEGORIZATION` +Instructions for the intent classification step — before SQL generation. Used to: +- Define which topics the SV handles +- Reject or redirect out-of-scope questions with a natural language message + +```sql +AI_QUESTION_CATEGORIZATION 'Answer questions about revenue, orders, and customers. +Politely decline questions about PII or internal cost structure.' +``` + +### `AI_VERIFIED_QUERIES` +Pre-approved SQL paired with a natural language question. When a user's question closely matches, the engine uses this SQL verbatim — bypassing generation. + +```sql +AI_VERIFIED_QUERIES ( + order_count_by_customer AS ( + QUESTION 'How many orders does each customer have?' 
+ VERIFIED_BY 'jklahr' + VERIFIED_AT 1750000000 + SQL 'SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_AI_SV + METRICS ai_orders.order_count + DIMENSIONS ai_customers.customer_name + ) ORDER BY order_count DESC' + ) +) +``` + +## Physical SQL VQR vs SEMANTIC_VIEW() VQR + +| | Physical SQL | SEMANTIC_VIEW() SQL | +|--|-------------|---------------------| +| Works in | AUTO mode only | AUTO + REQUIRE modes | +| Format | `SELECT col FROM table WHERE...` | `SELECT * FROM SEMANTIC_VIEW(sv METRICS ... DIMENSIONS ...)` | +| **Recommended** | Legacy | Preferred | + +Use `SEMANTIC_VIEW()` format in VQRs to ensure they work in both modes. + +## Docs + +- [CREATE SEMANTIC VIEW — AI_SQL_GENERATION / AI_QUESTION_CATEGORIZATION / AI_VERIFIED_QUERIES](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#ai-sql-generation) +- [Cortex Analyst overview](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/cortex-analyst-overview) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `ai_orders` + `ai_customers` | +| `seed_data.sql` | 6 orders, 3 customers | +| `semantic_view.sql` | SV with all three AI metadata blocks + 2 VQRs | +| `queries.sql` | Working queries + explanation of how each AI block functions | diff --git a/skills/semantic-view-patterns/snippets/ai_metadata/queries.sql b/skills/semantic-view-patterns/snippets/ai_metadata/queries.sql new file mode 100644 index 00000000..5475b289 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/ai_metadata/queries.sql @@ -0,0 +1,69 @@ +-- AI Metadata: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES (standard SEMANTIC_VIEW queries) +-- ============================================================ + +-- 1. 
Order count by customer name (matches the VQR exactly) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_AI_SV + DIMENSIONS ai_customers.customer_name + METRICS ai_orders.order_count +) +ORDER BY order_count DESC; + + +-- 2. Revenue by region +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_AI_SV + DIMENSIONS ai_customers.region + METRICS ai_orders.total_revenue +) +ORDER BY total_revenue DESC; + + +-- 3. Monthly revenue trend +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_AI_SV + DIMENSIONS ai_orders.order_month + METRICS ai_orders.total_revenue +) +ORDER BY order_month; + + +-- ============================================================ +-- AI METADATA IN ACTION +-- ============================================================ + +-- The AI_SQL_GENERATION instructions steer the LLM's query construction: +-- - "Round amounts to 2 decimal places" → AI will use ROUND(amount, 2) +-- - "Never include refunded orders" → AI adds WHERE status != 'refunded' + +-- The AI_QUESTION_CATEGORIZATION instructions enable pre-query steering: +-- - "Reject questions about internal cost structure" → AI responds with +-- a refusal or redirection instead of generating SQL + +-- AI_VERIFIED_QUERIES gives the engine pre-approved SQL to use verbatim +-- when a question closely matches — bypassing AI SQL generation entirely. + +-- To retrieve the VQRs from the DDL: +SHOW SEMANTIC VIEWS LIKE 'ORDERS_AI_SV' IN SNIPPETS.PUBLIC; +DESCRIBE SEMANTIC VIEW SNIPPETS.PUBLIC.ORDERS_AI_SV; + + +-- ============================================================ +-- PHYSICAL SQL VQR vs SEMANTIC_VIEW() VQR +-- ============================================================ + +-- Physical SQL VQR (in the DDL comments): +-- Works in AUTO mode only (Cortex Analyst backend). +-- SELECT ai_customers.customer_name, COUNT(ai_orders.order_id)... + +-- SEMANTIC_VIEW() SQL VQR (in the DDL above): +-- Works in both AUTO mode (Cortex Analyst) and REQUIRE mode (direct SEMANTIC_VIEW() invocation). 
+-- SELECT * FROM SEMANTIC_VIEW(ORDERS_AI_SV METRICS ... DIMENSIONS ...) + +-- Use SEMANTIC_VIEW() format in VQRs when you want them to work across both modes. diff --git a/skills/semantic-view-patterns/snippets/ai_metadata/schema.sql b/skills/semantic-view-patterns/snippets/ai_metadata/schema.sql new file mode 100644 index 00000000..45fb15f5 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/ai_metadata/schema.sql @@ -0,0 +1,21 @@ +-- AI Metadata: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE ai_orders ( + order_id INTEGER NOT NULL, + customer_id INTEGER NOT NULL, + amount NUMBER(10,2) NOT NULL, + status VARCHAR(20) NOT NULL, + order_date DATE NOT NULL +); + +CREATE OR REPLACE TABLE ai_customers ( + customer_id INTEGER NOT NULL, + customer_name VARCHAR(50) NOT NULL, + region VARCHAR(30) NOT NULL, + CONSTRAINT pk_ai_customers PRIMARY KEY (customer_id) +); diff --git a/skills/semantic-view-patterns/snippets/ai_metadata/seed_data.sql b/skills/semantic-view-patterns/snippets/ai_metadata/seed_data.sql new file mode 100644 index 00000000..7a3a2e45 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/ai_metadata/seed_data.sql @@ -0,0 +1,17 @@ +-- AI Metadata: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO ai_customers VALUES + (1, 'Alice Martin', 'West'), + (2, 'Bob Chen', 'East'), + (3, 'Carol White', 'West'); + +INSERT INTO ai_orders VALUES + (101, 1, 1200.00, 'completed', '2024-01-10'), + (102, 1, 800.00, 'completed', '2024-02-15'), + (103, 2, 500.00, 'completed', '2024-01-12'), + (104, 2, 300.00, 'refunded', '2024-03-01'), + (105, 3, 950.00, 'completed', '2024-02-20'), + (106, 1, 1500.00, 'pending', '2024-03-30'); diff --git a/skills/semantic-view-patterns/snippets/ai_metadata/semantic_view.sql b/skills/semantic-view-patterns/snippets/ai_metadata/semantic_view.sql new file mode 100644 index 00000000..8bf847cc 
--- /dev/null +++ b/skills/semantic-view-patterns/snippets/ai_metadata/semantic_view.sql @@ -0,0 +1,61 @@ +-- AI Metadata: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.ORDERS_AI_SV + + TABLES ( + ai_orders, + ai_customers UNIQUE (customer_id) + ) + + RELATIONSHIPS ( + ai_orders(customer_id) REFERENCES ai_customers + ) + + DIMENSIONS ( + ai_customers.customer_name AS customer_name + WITH SYNONYMS ('customer', 'name', 'who'), + ai_customers.region AS region + WITH SYNONYMS ('region', 'area'), + ai_orders.status AS status + WITH SYNONYMS ('order status', 'fulfillment status'), + ai_orders.order_month AS DATE_TRUNC('month', order_date) + WITH SYNONYMS ('month', 'order month') + ) + + METRICS ( + ai_orders.total_revenue AS SUM(amount) + WITH SYNONYMS ('revenue', 'total orders', 'sales'), + ai_orders.order_count AS COUNT(order_id) + WITH SYNONYMS ('orders', 'number of orders', 'order volume'), + ai_orders.avg_order_value AS AVG(amount) + WITH SYNONYMS ('AOV', 'average order', 'average order value') + ) + + -- Steers LLM query generation style — applied to every query on this SV + AI_SQL_GENERATION 'Always round monetary values to 2 decimal places. When asked about revenue, never include orders with status = ''refunded''. Use customer_name for customer-level breakdowns.' + + -- Steers how the LLM categorizes incoming questions — can reject/redirect + AI_QUESTION_CATEGORIZATION 'Answer questions about revenue, orders, and customers. Politely decline questions about individual customer PII (e.g. contact details) or internal pricing margins.' + + -- Pre-approved SQL snippets used verbatim when a question closely matches. + -- SEMANTIC_VIEW() format works in both AUTO (Cortex Analyst) and REQUIRE mode. + -- Physical SQL format works in AUTO mode only. + AI_VERIFIED_QUERIES ( + order_count_by_customer AS ( + QUESTION 'How many orders does each customer have?' 
+ VERIFIED_BY 'jklahr' + VERIFIED_AT 1750000000 + SQL 'SELECT * FROM SEMANTIC_VIEW(SNIPPETS.PUBLIC.ORDERS_AI_SV METRICS ai_orders.order_count DIMENSIONS ai_customers.customer_name) ORDER BY order_count DESC' + ), + revenue_by_region AS ( + QUESTION 'What is the revenue by region?' + VERIFIED_BY 'jklahr' + VERIFIED_AT 1750000000 + SQL 'SELECT * FROM SEMANTIC_VIEW(SNIPPETS.PUBLIC.ORDERS_AI_SV METRICS ai_orders.total_revenue DIMENSIONS ai_customers.region) ORDER BY total_revenue DESC' + ) + ) + + COMMENT = 'Order analytics with all AI metadata: AI_SQL_GENERATION (query style hints), AI_QUESTION_CATEGORIZATION (topic steering), and AI_VERIFIED_QUERIES (pre-approved SQL for common questions).'; diff --git a/skills/semantic-view-patterns/snippets/ai_metadata/semantic_view.yaml b/skills/semantic-view-patterns/snippets/ai_metadata/semantic_view.yaml new file mode 100644 index 00000000..c61b60e6 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/ai_metadata/semantic_view.yaml @@ -0,0 +1,92 @@ +# AI Metadata: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# DDL-only features → YAML equivalents: +# AI_SQL_GENERATION → not supported in YAML; set post-deploy via ALTER SEMANTIC VIEW +# AI_QUESTION_CATEGORIZATION → not supported in YAML; set post-deploy via ALTER SEMANTIC VIEW +# AI_VERIFIED_QUERIES → YAML equivalent: verified_queries section (see below) + +name: ORDERS_AI_SV +description: > + Order analytics with AI metadata: verified queries (pre-approved SQL for + common questions). AI_SQL_GENERATION and AI_QUESTION_CATEGORIZATION + must be set post-deploy via ALTER SEMANTIC VIEW. 
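+
+# Post-deploy sketch for the DDL-only AI blocks. The ALTER clause below is an
+# ASSUMED syntax (mirroring the CREATE SEMANTIC VIEW keyword) — verify against
+# the ALTER SEMANTIC VIEW reference before use:
+#
+# ALTER SEMANTIC VIEW TARGET_DB.TARGET_SCHEMA.ORDERS_AI_SV SET
+#   AI_SQL_GENERATION = 'Always round monetary values to 2 decimal places.';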
+ +tables: + - name: ai_orders + description: Order transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: AI_ORDERS + dimensions: + - name: status + synonyms: [order status, fulfillment status] + expr: STATUS + data_type: VARCHAR + - name: order_month + synonyms: [month, order month] + expr: DATE_TRUNC('month', ORDER_DATE) + data_type: DATE + metrics: + - name: total_revenue + synonyms: [revenue, total orders, sales] + expr: SUM(AMOUNT) + - name: order_count + synonyms: [orders, number of orders, order volume] + expr: COUNT(ORDER_ID) + - name: avg_order_value + synonyms: [AOV, average order, average order value] + expr: AVG(AMOUNT) + + - name: ai_customers + description: Customer master data + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: AI_CUSTOMERS + primary_key: + columns: [CUSTOMER_ID] + dimensions: + - name: customer_name + synonyms: [customer, name, who] + expr: CUSTOMER_NAME + data_type: VARCHAR + - name: region + synonyms: [region, area] + expr: REGION + data_type: VARCHAR + +relationships: + - name: orders_to_customers + left_table: ai_orders + right_table: ai_customers + relationship_columns: + - left_column: CUSTOMER_ID + right_column: CUSTOMER_ID + +# YAML equivalent of DDL's AI_VERIFIED_QUERIES block +verified_queries: + - name: order_count_by_customer + question: How many orders does each customer have? + verified_by: jklahr + verified_at: 1750000000 + sql: > + SELECT * FROM SEMANTIC_VIEW(TARGET_DB.TARGET_SCHEMA.ORDERS_AI_SV + METRICS ai_orders.order_count + DIMENSIONS ai_customers.customer_name) + ORDER BY order_count DESC + - name: revenue_by_region + question: What is the revenue by region? 
+ verified_by: jklahr + verified_at: 1750000000 + sql: > + SELECT * FROM SEMANTIC_VIEW(TARGET_DB.TARGET_SCHEMA.ORDERS_AI_SV + METRICS ai_orders.total_revenue + DIMENSIONS ai_customers.region) + ORDER BY total_revenue DESC diff --git a/skills/semantic-view-patterns/snippets/asof_join/README.md b/skills/semantic-view-patterns/snippets/asof_join/README.md new file mode 100644 index 00000000..d3badf52 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/asof_join/README.md @@ -0,0 +1,66 @@ +# ASOF Join + +## The Problem + +You need to join a fact to the dimension record that was active **at the time of the event**, but your dimension table only has a `start_date` — no explicit end date. You want "the most recent record whose start date is on or before the event date." + +This is an **ASOF join** (as-of join): "give me the record that was in effect *as of* this date." + +**Example in this snippet**: A customer moves addresses over time. Each order should be attributed to the address the customer lived at when they placed the order. + +## ASOF vs BETWEEN EXCLUSIVE (When to Use Which) + +| | ASOF Join | Range Join (BETWEEN EXCLUSIVE) | +|--|-----------|-------------------------------| +| **Dimension has** | Only a start date | Explicit start + end date | +| **Semantics** | "Latest record on or before the event date" | "Record whose range contains the event date" | +| **NULL valid_to handling** | Automatic — no sentinel needed | Requires sentinel (e.g. 
`9999-12-31`) | +| **Syntax** | `REFERENCES dim(id, ASOF start_date)` | `REFERENCES dim(id, BETWEEN start AND end EXCLUSIVE)` | +| **Use when** | Address history, price lists, org hierarchy changes | SCD2 with explicit validity windows | + +## How You Might Express This Need + +- "Join orders to the address the customer was at when they ordered" +- "Show revenue by the price tier that was in effect at purchase time, but we don't have explicit end dates" +- "My dimension table has a `valid_from` but no `valid_to` — how do I join?" +- "Which account manager owned this customer when the deal closed?" + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **SQL** | `JOIN dim ON id = id AND dim.start_date = (SELECT MAX(start_date) FROM dim WHERE id = fact.id AND start_date <= fact.date)` | +| **dbt** | No native ASOF; requires the subquery pattern above in a Jinja model | +| **LookML** | No native support; pre-join at ETL time | +| **Power BI** | CALCULATE + FILTER to find latest active record | +| **Tableau** | FIXED LOD `{ FIXED [ID]: MAX([start_date]) }` filtered to dates ≤ event timestamp; or blend a date-filtered extract. No native ASOF join. | + +## The SV Approach + +Two things are required: + +**1. Declare UNIQUE on the entity key + start date** (no end date needed): +```sql +Customer_address UNIQUE (ca_custid, ca_start_date) +``` + +**2. Reference with ASOF**: +```sql +Orders(o_custid, o_orddate) REFERENCES Customer_address(ca_custid, ASOF ca_start_date) +``` + +This automatically finds the `Customer_address` row with the largest `ca_start_date` that is ≤ `o_orddate` for the same `ca_custid`. 
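+
+Under the hood this corresponds to Snowflake's native `ASOF JOIN` construct (linked in Docs below). A hand-written sketch of the equivalent plain SQL for this snippet's tables — illustrative only; the SV's generated SQL may differ:
+
+```sql
+SELECT
+    o.o_ordid,
+    o.o_amount,
+    a.ca_zipcode  -- the zip in effect when the order was placed
+FROM SNIPPETS.PUBLIC.Orders o
+ASOF JOIN SNIPPETS.PUBLIC.Customer_address a
+    MATCH_CONDITION (o.o_orddate >= a.ca_start_date)
+    ON o.o_custid = a.ca_custid;
+```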
+ +## Docs + +- [Using a date, time, timestamp, or numeric range to join logical tables (ASOF)](https://docs.snowflake.com/en/user-guide/views-semantic/sql#using-a-date-time-timestamp-or-numeric-range-to-join-logical-tables) +- [ASOF JOIN syntax reference](https://docs.snowflake.com/en/sql-reference/constructs/asof-join) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `Customer_address`, `Customer_name`, `Orders` table DDL | +| `seed_data.sql` | Address history for 2 customers, 6 orders | +| `semantic_view.sql` | SV using ASOF relationship | +| `queries.sql` | Revenue by zip code per month + comparison with naive SQL mistake | diff --git a/skills/semantic-view-patterns/snippets/asof_join/queries.sql b/skills/semantic-view-patterns/snippets/asof_join/queries.sql new file mode 100644 index 00000000..2ab86fd9 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/asof_join/queries.sql @@ -0,0 +1,62 @@ +-- ASOF Join Example: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. Revenue by zip code (historically resolved address at order time) +-- Expected: +-- 90001 $300 (Ord100 + Ord101, Jan address) +-- 90002 $700 (Ord102 + Ord103, Apr address) +-- 90003 $500 (Ord104, Jul address) +-- 90010 $600 (Ord105, Cust002) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_BY_ADDRESS + DIMENSIONS Customer_address.zip + METRICS Orders.total_revenue +) +ORDER BY zip; + + +-- 2. Revenue by customer name and zip — shows address transitions clearly +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_BY_ADDRESS + DIMENSIONS Customer_name.name, Customer_address.zip + METRICS Orders.total_revenue +) +ORDER BY name, zip; + + +-- 3. 
Monthly revenue with zip — shows Mary moving between addresses each month +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_BY_ADDRESS + DIMENSIONS Customer_name.name, Orders.year_month, Customer_address.zip + METRICS Orders.total_revenue +) +ORDER BY year_month; + + +-- ============================================================ +-- THE MISTAKE THIS PATTERN PREVENTS +-- ============================================================ + +-- WRONG: Join on customer ID only (ignores address history) +-- Returns one "current" address per customer — all of Mary's orders +-- get attributed to her most recent zip (90003), which is wrong for +-- orders placed before she moved. +SELECT + o.o_custid, + a.ca_zipcode AS current_zip, -- always most recent + SUM(o.o_amount) AS wrong_revenue +FROM SNIPPETS.PUBLIC.Orders o +JOIN ( + SELECT ca_custid, ca_zipcode, + ROW_NUMBER() OVER (PARTITION BY ca_custid ORDER BY ca_start_date DESC) AS rn + FROM SNIPPETS.PUBLIC.Customer_address +) a ON o.o_custid = a.ca_custid AND a.rn = 1 +GROUP BY 1, 2 +ORDER BY 1; +-- Mary's $300 (zip 90001) + $700 (90002) all attributed to 90003 diff --git a/skills/semantic-view-patterns/snippets/asof_join/schema.sql b/skills/semantic-view-patterns/snippets/asof_join/schema.sql new file mode 100644 index 00000000..4c4f411e --- /dev/null +++ b/skills/semantic-view-patterns/snippets/asof_join/schema.sql @@ -0,0 +1,26 @@ +-- ASOF Join Example: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE Customer_address ( + ca_custid VARCHAR(10) NOT NULL, + ca_zipcode INTEGER NOT NULL, + ca_street_addr VARCHAR(50) NOT NULL, + ca_start_date DATE NOT NULL +); + +CREATE OR REPLACE TABLE Customer_name ( + c_custid VARCHAR(10) NOT NULL, + c_first_name VARCHAR(20) NOT NULL, + c_last_name VARCHAR(20) NOT NULL +); + +CREATE OR REPLACE TABLE Orders ( + o_ordid VARCHAR(10) NOT NULL, + o_custid VARCHAR(10) NOT NULL, 
+ o_orddate DATE NOT NULL, + o_amount NUMBER(10,2) NOT NULL +); diff --git a/skills/semantic-view-patterns/snippets/asof_join/seed_data.sql b/skills/semantic-view-patterns/snippets/asof_join/seed_data.sql new file mode 100644 index 00000000..405a1e2e --- /dev/null +++ b/skills/semantic-view-patterns/snippets/asof_join/seed_data.sql @@ -0,0 +1,30 @@ +-- ASOF Join Example: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- Cust001 moved addresses twice; Cust002 stayed put +INSERT INTO Customer_address VALUES + ('Cust001', 90001, '100 First St.', '2024-01-01'), + ('Cust001', 90002, '200 First St.', '2024-04-01'), + ('Cust001', 90003, '300 First St.', '2024-07-01'), + ('Cust002', 90010, '10 Second St.', '2024-01-01'); + +INSERT INTO Customer_name VALUES + ('Cust001', 'Mary', 'Smith'), + ('Cust002', 'Bill', 'Wilson'); + +-- Expected address at time of order: +-- Ord100 Feb 01 → Cust001 zip 90001 (moved Apr 01) +-- Ord101 Feb 02 → Cust001 zip 90001 +-- Ord102 May 01 → Cust001 zip 90002 (moved Apr 01, not yet Jul 01) +-- Ord103 May 02 → Cust001 zip 90002 +-- Ord104 Aug 01 → Cust001 zip 90003 (moved Jul 01) +-- Ord105 Aug 02 → Cust002 zip 90010 (never moved) +INSERT INTO Orders VALUES + ('Ord100', 'Cust001', '2024-02-01', 100), + ('Ord101', 'Cust001', '2024-02-02', 200), + ('Ord102', 'Cust001', '2024-05-01', 300), + ('Ord103', 'Cust001', '2024-05-02', 400), + ('Ord104', 'Cust001', '2024-08-01', 500), + ('Ord105', 'Cust002', '2024-08-02', 600); diff --git a/skills/semantic-view-patterns/snippets/asof_join/semantic_view.sql b/skills/semantic-view-patterns/snippets/asof_join/semantic_view.sql new file mode 100644 index 00000000..cc20b874 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/asof_join/semantic_view.sql @@ -0,0 +1,43 @@ +-- ASOF Join Example: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.ORDERS_BY_ADDRESS + + TABLES ( + Customer_address UNIQUE (ca_custid, ca_start_date), + 
Customer_name UNIQUE (c_custid),
+    Orders UNIQUE (o_ordid)
+  )
+
+  RELATIONSHIPS (
+    -- Join address to name (many-to-one: several address-history rows per customer)
+    addr_to_name AS Customer_address(ca_custid) REFERENCES Customer_name,
+
+    -- ASOF join: for each order, find the address record with the
+    -- largest ca_start_date that is <= o_orddate for the same customer
+    orders_to_addr AS Orders(o_custid, o_orddate)
+      REFERENCES Customer_address(ca_custid, ASOF ca_start_date)
+  )
+
+  DIMENSIONS (
+    Customer_name.name AS CONCAT(c_first_name, ' ', c_last_name)
+      COMMENT = 'Full customer name',
+    Customer_address.zip AS ca_zipcode
+      WITH SYNONYMS ('zip code', 'postal code', 'delivery zip'),
+    Customer_address.street AS ca_street_addr,
+    Orders.year_month AS DATE_TRUNC('month', o_orddate)
+      WITH SYNONYMS ('order month', 'month')
+  )
+
+  METRICS (
+    Orders.total_revenue AS SUM(o_amount)
+      WITH SYNONYMS ('revenue', 'order revenue', 'total order value'),
+    Orders.order_count AS COUNT(o_ordid)
+      WITH SYNONYMS ('number of orders', 'orders')
+  )
+
+  COMMENT = 'Orders attributed to the customer address active at time of order via ASOF join.'
+
+  AI_SQL_GENERATION 'Use Customer_address.zip to break down orders by the delivery zip code the customer had at order time.
The ASOF relationship resolves the historically-correct address automatically — no date filtering needed.'; diff --git a/skills/semantic-view-patterns/snippets/asof_join/semantic_view.yaml b/skills/semantic-view-patterns/snippets/asof_join/semantic_view.yaml new file mode 100644 index 00000000..c6bb1fd6 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/asof_join/semantic_view.yaml @@ -0,0 +1,88 @@ +# ASOF Join: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# ⚠️ ASOF JOIN LIMITATION: The YAML spec does not currently support the +# ASOF relationship syntax. This YAML defines the table structure and standard +# relationships as placeholders. Use semantic_view.sql for the full ASOF pattern. + +name: ORDERS_BY_ADDRESS +description: > + Orders attributed to the customer address active at time of order via ASOF join. + NOTE: full ASOF join requires DDL authoring (semantic_view.sql). 
+
+tables:
+  - name: Orders
+    description: Order transactions
+    base_table:
+      database: TARGET_DB
+      schema: TARGET_SCHEMA
+      table: ORDERS
+    primary_key:
+      columns: [O_ORDID]
+    dimensions:
+      - name: year_month
+        synonyms: [order month, month]
+        description: Month of order
+        expr: DATE_TRUNC('month', O_ORDDATE)
+        data_type: DATE
+    metrics:
+      - name: total_revenue
+        synonyms: [revenue, order revenue, total order value]
+        expr: SUM(O_AMOUNT)
+      - name: order_count
+        synonyms: [number of orders, orders]
+        expr: COUNT(O_ORDID)
+
+  - name: Customer_address
+    description: Customer address history — one row per address period per customer
+    base_table:
+      database: TARGET_DB
+      schema: TARGET_SCHEMA
+      table: CUSTOMER_ADDRESS
+    primary_key:
+      columns: [CA_CUSTID, CA_START_DATE]
+    dimensions:
+      - name: zip
+        synonyms: [zip code, postal code, delivery zip]
+        expr: CA_ZIPCODE
+        data_type: NUMBER
+      - name: street
+        expr: CA_STREET_ADDR
+        data_type: VARCHAR
+
+  - name: Customer_name
+    description: Customer master — one row per customer
+    base_table:
+      database: TARGET_DB
+      schema: TARGET_SCHEMA
+      table: CUSTOMER_NAME
+    primary_key:
+      columns: [C_CUSTID]
+    dimensions:
+      - name: name
+        description: Full customer name
+        expr: CONCAT(C_FIRST_NAME, ' ', C_LAST_NAME)
+        data_type: VARCHAR
+
+# NOTE: YAML does not support ASOF relationship syntax.
+# Full pattern requires DDL: +# orders_to_addr AS Orders(O_CUSTID, O_ORDDATE) +# REFERENCES Customer_address(CA_CUSTID, ASOF CA_START_DATE) +relationships: + - name: addr_to_name + left_table: Customer_address + right_table: Customer_name + relationship_columns: + - left_column: CA_CUSTID + right_column: C_CUSTID + - name: orders_to_addr + left_table: Orders + right_table: Customer_address + relationship_columns: + - left_column: O_CUSTID + right_column: CA_CUSTID diff --git a/skills/semantic-view-patterns/snippets/caller_rights/README.md b/skills/semantic-view-patterns/snippets/caller_rights/README.md new file mode 100644 index 00000000..9f54ee5c --- /dev/null +++ b/skills/semantic-view-patterns/snippets/caller_rights/README.md @@ -0,0 +1,91 @@ +# Caller Rights — Semantic View Access Control + +## The Problem + +By default, both standard Snowflake views and semantic views execute with **owner rights**: when a user queries the view, it runs with the *view owner's* privileges. This means a user with SELECT on the view can read data even if they have no SELECT on the underlying tables — the owner's access covers them. + +Sometimes that's exactly what you want. But for a governed semantic layer, it creates a problem: you may want to ensure that users can only query the semantic view if they also have *direct* access to the underlying base tables — so that row-level security policies, column masking, and schema-level access controls still apply to the caller. + +## The Trick — Ownership Separation + +The key is a single design decision: **make the SV owner a role that has no access to the base tables.** + +``` +SV_CREATOR creates the SV (needs base table access to define it) + ↓ future grant transfers ownership immediately +SV_OWNER owns the SV (deliberately has NO base table access) +``` + +Because the SV runs with the *owner's* rights (`SV_OWNER`), and `SV_OWNER` cannot access the base tables, the query cannot succeed on owner rights alone. 
The only way it can succeed is if the **caller** brings their own base table access. This converts the effective execution model to caller rights — without any special DDL clause. + +The critical line that makes this work: +```sql +GRANT OWNERSHIP ON FUTURE SEMANTIC VIEWS IN SCHEMA SV_CALLER_TEST.SV TO ROLE SV_OWNER; +``` + +`SV_OWNER` can grant SELECT on the SV to users — but granting SELECT on the SV alone is not enough. The caller must also have USAGE on the DATA schema and SELECT on every base table. + +## How This Compares to a Standard View + +| | Standard view (owner has table access) | This SV pattern (owner has NO table access) | +|--|---------------------------------------|---------------------------------------------| +| Executes with | Owner's rights | Owner's rights | +| Owner has SELECT on base tables? | Yes | **No — deliberately** | +| User needs SELECT on view? | Yes | Yes | +| User needs SELECT on base tables? | **No** — owner provides it | **Yes** — owner can't provide it | +| Effective execution model | Owner rights | Effectively caller rights | + +## How You Might Express This Need + +- "We want the SV to be an additional access gate, not a bypass around table-level permissions" +- "Our base tables have row-level security / column masking — we need the caller's policies to apply, not the owner's" +- "Can a user with SELECT on the SV read data they don't have SELECT on in the base tables?" +- "How do we design roles for a semantic layer so that base table access is still required?" + +## The Four-Role Pattern + +| Role | Creates SVs? | Owns SVs? | DATA schema access? | SELECT on SV? | Can query? 
|
+|------|-------------|-----------|---------------------|----------------|------------|
+| `SV_CREATOR` | Yes | No (future grant hands off) | **Yes** | Implicitly | Yes |
+| `SV_OWNER` | No | **Yes** | **No** | Owns | N/A (grants, doesn't query) |
+| `SV_USER` | No | No | **Yes** | **Yes** | ✅ Yes |
+| `SV_USER_NO_BASE_SELECT` | No | No | **No** | **Yes** | ❌ Fails |
+
+`SV_USER_NO_BASE_SELECT` has SELECT on the SV but the query fails because `SV_OWNER` (the view executor) has no base table access, and neither does the caller. The error is immediate and clear.
+
+## Schema Layout
+
+Two separate schemas reinforce the boundary:
+
+```
+SV_CALLER_TEST.SV    ← semantic view lives here (SV_CREATOR creates, SV_OWNER owns)
+SV_CALLER_TEST.DATA  ← base tables live here (SV_USER can see, SV_OWNER cannot)
+```
+
+## What Doesn't Work
+
+- **`USE SECONDARY ROLES ALL` can unexpectedly grant access** — if a user has secondary roles that include DATA schema access, the query may succeed. Always use `USE SECONDARY ROLES NONE` when testing access boundaries.
+- **The trick only works if the owner truly lacks table access** — if `SV_OWNER` accidentally gets USAGE on the DATA schema (e.g. via a future grant or role inheritance), the whole pattern breaks and the SV reverts to effectively owner-rights behavior.
+- **Bypassing column masking or row access policies through the SV** — because the query can only succeed when the *caller* has base table access, any policies on those tables apply to the caller's role.
+
+## Cleanup
+
+Run the cleanup block at the bottom of `queries.sql` to remove all objects created by this snippet (roles, warehouse, database).
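+
+## Verifying the Handoff
+
+Before granting SELECT to users, it is worth confirming that the ownership handoff actually happened, since the pattern silently degrades if `SV_OWNER` never took ownership or picked up DATA schema access. A sketch of the check (assumes the objects from `schema.sql` and `semantic_view.sql` exist; `SHOW GRANTS ON SEMANTIC VIEW` is assumed to accept the semantic view object type in your account):
+
+```sql
+-- Who owns the SV? Expect OWNERSHIP held by SV_OWNER, not SV_CREATOR.
+SHOW GRANTS ON SEMANTIC VIEW SV_CALLER_TEST.SV.CUSTOMER_ORDERS_VIEW;
+
+-- What can SV_OWNER reach? Expect no rows mentioning SV_CALLER_TEST.DATA;
+-- any USAGE or SELECT there means the boundary is broken.
+SHOW GRANTS TO ROLE SV_OWNER;
+```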
+
+## Docs
+
+- [Semantic view privileges](https://docs.snowflake.com/en/user-guide/views-semantic/privileges)
+- [CREATE SEMANTIC VIEW — access control](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#access-control-requirements)
+- [GRANT privilege on semantic view](https://docs.snowflake.com/en/sql-reference/sql/grant-privilege)
+- [GRANT OWNERSHIP on future objects](https://docs.snowflake.com/en/sql-reference/sql/grant-ownership)
+
+## Files
+
+| File | Description |
+|------|-------------|
+| `schema.sql` | Roles, warehouse, DB/schemas, tables, and all privilege grants |
+| `seed_data.sql` | Customer, address, and order data |
+| `semantic_view.sql` | SV creation (as SV_CREATOR) + SELECT grants (as SV_OWNER) |
+| `queries.sql` | Succeeding query (SV_USER), failing query (SV_USER_NO_BASE_SELECT), cleanup |
+
+> ⚠️ **Requires ACCOUNTADMIN** (or both SECURITYADMIN and SYSADMIN). This snippet creates roles, a warehouse, and a dedicated database (`SV_CALLER_TEST`). It does **not** use the `--db` / `--schema` arguments from `run_snippet.py` — all objects are created under `SV_CALLER_TEST`.
diff --git a/skills/semantic-view-patterns/snippets/caller_rights/queries.sql b/skills/semantic-view-patterns/snippets/caller_rights/queries.sql
new file mode 100644
index 00000000..129ca190
--- /dev/null
+++ b/skills/semantic-view-patterns/snippets/caller_rights/queries.sql
@@ -0,0 +1,85 @@
+-- Caller Rights: Queries
+--
+-- Demonstrates effective caller-rights behavior: the SV owner has no base
+-- table access, so the querying user must have SELECT on both the SV AND its base tables.
+
+USE WAREHOUSE SV_CALLER_TEST;
+
+-- ============================================================
+-- WORKING QUERY — SV_USER has SELECT on SV + base tables
+-- ============================================================
+
+-- SV_USER privileges:
+-- ✓ USAGE on SV_CALLER_TEST.SV schema
+-- ✓ USAGE on SV_CALLER_TEST.DATA schema
+-- ✓ SELECT on CUSTOMER, CUSTOMER_ADDRESS, ORDERS
+-- ✓ SELECT on CUSTOMER_ORDERS_VIEW
+
+USE SECONDARY ROLES NONE;
+USE ROLE SV_USER;
+
+-- Expected: 12 rows — monthly order totals with historically correct zip codes
+SELECT * FROM SEMANTIC_VIEW(
+    SV_CALLER_TEST.SV.CUSTOMER_ORDERS_VIEW
+    DIMENSIONS orders.dim_year_month, orders.f_cust_zipcode
+    METRICS orders.m_order_amount
+)
+ORDER BY dim_year_month, f_cust_zipcode;
+
+
+-- ============================================================
+-- FAILING QUERY — SV_USER_NO_BASE_SELECT has SV SELECT only
+-- ============================================================
+
+-- SV_USER_NO_BASE_SELECT privileges:
+-- ✓ USAGE on SV_CALLER_TEST.SV schema
+-- ✗ NO USAGE on SV_CALLER_TEST.DATA schema
+-- ✗ NO SELECT on CUSTOMER, CUSTOMER_ADDRESS, ORDERS
+-- ✓ SELECT on CUSTOMER_ORDERS_VIEW
+--
+-- Despite having SELECT on the SV, the query fails: the SV owner
+-- (SV_OWNER) has no base table access to lend, so the query can only
+-- succeed when the caller supplies that access, and this caller cannot.
+
+USE ROLE SV_USER_NO_BASE_SELECT;
+
+-- Expected: ERROR — insufficient privileges on the DATA schema / base tables
+SELECT * FROM SEMANTIC_VIEW(
+    SV_CALLER_TEST.SV.CUSTOMER_ORDERS_VIEW
+    DIMENSIONS orders.dim_year_month, orders.f_cust_zipcode
+    METRICS orders.m_order_amount
+)
+ORDER BY dim_year_month, f_cust_zipcode;
+
+
+-- ============================================================
+-- HOW THE EFFECTIVE CALLER-RIGHTS MODEL WORKS:
+-- The SV executes with its OWNER's rights, but SV_OWNER deliberately
+-- has no base table access, so a query succeeds only when the calling
+-- user's active role fills the gap. That role must have:
+-- 1. USAGE on the database and schema containing the SV
+-- 2. SELECT on the SV itself
+-- 3. USAGE on the database and schema containing every base table
+-- 4. SELECT on every base table referenced by the SV
+--
+-- Contrast with a standard owner-rights view whose owner CAN read the
+-- base tables: there, a user with SELECT on the view reads data even
+-- without SELECT on the underlying tables. Removing the owner's table
+-- access closes that path and makes base table grants mandatory.
+--
+-- USE SECONDARY ROLES ALL can alter this test — if a secondary role grants
+-- DATA schema access, SV_USER_NO_BASE_SELECT may unexpectedly succeed.
+-- Always use USE SECONDARY ROLES NONE when testing access boundaries.
+
+
+-- ============================================================
+-- CLEANUP — run to remove all objects created by this snippet
+-- ============================================================
+
+USE ROLE SYSADMIN;
+DROP DATABASE IF EXISTS SV_CALLER_TEST;
+DROP WAREHOUSE IF EXISTS SV_CALLER_TEST;
+
+USE ROLE SECURITYADMIN;
+DROP ROLE IF EXISTS SV_OWNER;
+DROP ROLE IF EXISTS SV_CREATOR;
+DROP ROLE IF EXISTS SV_USER;
+DROP ROLE IF EXISTS SV_USER_NO_BASE_SELECT;
diff --git a/skills/semantic-view-patterns/snippets/caller_rights/schema.sql b/skills/semantic-view-patterns/snippets/caller_rights/schema.sql
new file mode 100644
index 00000000..b9d1217d
--- /dev/null
+++ b/skills/semantic-view-patterns/snippets/caller_rights/schema.sql
@@ -0,0 +1,107 @@
+-- Caller Rights: Schema Setup
+--
+-- ⚠️ Requires ACCOUNTADMIN (or SECURITYADMIN + SYSADMIN).
+-- Creates dedicated resources: roles, warehouse SV_CALLER_TEST, database SV_CALLER_TEST.
+-- Does NOT use the --db / --schema arguments from run_snippet.py.
+ +-- Isolate role privileges for accurate testing +USE SECONDARY ROLES NONE; + +-- ============================================================ +-- ROLES +-- ============================================================ + +USE ROLE SECURITYADMIN; + +CREATE ROLE IF NOT EXISTS SV_OWNER; +CREATE ROLE IF NOT EXISTS SV_CREATOR; +CREATE ROLE IF NOT EXISTS SV_USER; +CREATE ROLE IF NOT EXISTS SV_USER_NO_BASE_SELECT; + +-- Grant to SYSADMIN so all roles are usable via the admin hierarchy +GRANT ROLE SV_OWNER TO ROLE SYSADMIN; +GRANT ROLE SV_CREATOR TO ROLE SYSADMIN; +GRANT ROLE SV_USER TO ROLE SYSADMIN; +GRANT ROLE SV_USER_NO_BASE_SELECT TO ROLE SYSADMIN; + +-- ============================================================ +-- WAREHOUSE +-- ============================================================ + +USE ROLE SYSADMIN; + +CREATE OR REPLACE WAREHOUSE SV_CALLER_TEST + WAREHOUSE_SIZE = XSMALL + AUTO_SUSPEND = 60 + AUTO_RESUME = TRUE; + +GRANT USAGE, OPERATE ON WAREHOUSE SV_CALLER_TEST TO ROLE SV_OWNER; +GRANT USAGE, OPERATE ON WAREHOUSE SV_CALLER_TEST TO ROLE SV_CREATOR; +GRANT USAGE, OPERATE ON WAREHOUSE SV_CALLER_TEST TO ROLE SV_USER; +GRANT USAGE, OPERATE ON WAREHOUSE SV_CALLER_TEST TO ROLE SV_USER_NO_BASE_SELECT; + +-- ============================================================ +-- DATABASE & SCHEMAS +-- ============================================================ + +CREATE DATABASE IF NOT EXISTS SV_CALLER_TEST; + +-- Separate schemas enforce the semantic layer / data layer boundary +CREATE OR REPLACE SCHEMA SV_CALLER_TEST.SV; -- semantic views live here +CREATE OR REPLACE SCHEMA SV_CALLER_TEST.DATA; -- base tables live here + +-- All roles can use the database +GRANT USAGE ON DATABASE SV_CALLER_TEST TO ROLE SV_OWNER; +GRANT USAGE ON DATABASE SV_CALLER_TEST TO ROLE SV_CREATOR; +GRANT USAGE ON DATABASE SV_CALLER_TEST TO ROLE SV_USER; +GRANT USAGE ON DATABASE SV_CALLER_TEST TO ROLE SV_USER_NO_BASE_SELECT; + +-- SV schema: all roles can use it; only SV_CREATOR can create 
SVs +GRANT USAGE ON SCHEMA SV_CALLER_TEST.SV TO ROLE SV_OWNER; +GRANT USAGE, CREATE SEMANTIC VIEW ON SCHEMA SV_CALLER_TEST.SV TO ROLE SV_CREATOR; +GRANT USAGE ON SCHEMA SV_CALLER_TEST.SV TO ROLE SV_USER; +GRANT USAGE ON SCHEMA SV_CALLER_TEST.SV TO ROLE SV_USER_NO_BASE_SELECT; + +-- DATA schema: granted only to SV_CREATOR and SV_USER. +-- SV_OWNER and SV_USER_NO_BASE_SELECT deliberately do NOT get DATA schema access. +GRANT USAGE ON SCHEMA SV_CALLER_TEST.DATA TO ROLE SV_CREATOR; +GRANT USAGE ON SCHEMA SV_CALLER_TEST.DATA TO ROLE SV_USER; + +-- Future SVs created in SV_CALLER_TEST.SV are owned by SV_OWNER +GRANT OWNERSHIP ON FUTURE SEMANTIC VIEWS IN SCHEMA SV_CALLER_TEST.SV TO ROLE SV_OWNER; + +-- ============================================================ +-- BASE TABLES +-- ============================================================ + +USE SCHEMA SV_CALLER_TEST.DATA; + +CREATE OR REPLACE TABLE CUSTOMER ( + c_cust_id VARCHAR NOT NULL, + c_first_name VARCHAR NOT NULL, + c_last_name VARCHAR NOT NULL +); + +CREATE OR REPLACE TABLE CUSTOMER_ADDRESS ( + ca_cust_id VARCHAR NOT NULL, + ca_zipcode VARCHAR NOT NULL, + ca_street_addr VARCHAR NOT NULL, + ca_start_date DATE NOT NULL, + ca_end_date DATE -- NULL = currently active address +); + +CREATE OR REPLACE TABLE ORDERS ( + o_ord_id VARCHAR NOT NULL, + o_cust_id VARCHAR NOT NULL, + o_ord_date DATE NOT NULL, + o_amount NUMBER(10, 2) NOT NULL +); + +-- SV_CREATOR and SV_USER can read the base tables. +-- SV_USER_NO_BASE_SELECT is explicitly NOT granted access here — that's the test. 
+GRANT SELECT ON TABLE SV_CALLER_TEST.DATA.CUSTOMER TO ROLE SV_CREATOR; +GRANT SELECT ON TABLE SV_CALLER_TEST.DATA.CUSTOMER TO ROLE SV_USER; +GRANT SELECT ON TABLE SV_CALLER_TEST.DATA.CUSTOMER_ADDRESS TO ROLE SV_CREATOR; +GRANT SELECT ON TABLE SV_CALLER_TEST.DATA.CUSTOMER_ADDRESS TO ROLE SV_USER; +GRANT SELECT ON TABLE SV_CALLER_TEST.DATA.ORDERS TO ROLE SV_CREATOR; +GRANT SELECT ON TABLE SV_CALLER_TEST.DATA.ORDERS TO ROLE SV_USER; diff --git a/skills/semantic-view-patterns/snippets/caller_rights/seed_data.sql b/skills/semantic-view-patterns/snippets/caller_rights/seed_data.sql new file mode 100644 index 00000000..0ea16d55 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/caller_rights/seed_data.sql @@ -0,0 +1,38 @@ +-- Caller Rights: Seed Data + +USE ROLE SYSADMIN; +USE SCHEMA SV_CALLER_TEST.DATA; + +INSERT INTO CUSTOMER VALUES + ('cust001', 'Mary', 'Smith'), + ('cust002', 'Bill', 'Wilson'); + +-- cust001 moved twice; cust002 moved once — used to test ASOF join resolves correct zip +INSERT INTO CUSTOMER_ADDRESS VALUES + ('cust001', '94025', '100 Main Street', '2024-01-01', '2024-03-31'), + ('cust001', '94026', '200 Main Street', '2024-04-01', '2024-06-30'), + ('cust001', '94027', '300 Main Street', '2024-07-01', NULL), + ('cust002', '94028', '400 Main Street', '2024-01-01', '2024-04-30'), + ('cust002', '94029', '500 Main Street', '2024-05-01', '2024-07-31'), + ('cust002', '94030', '600 Main Street', '2024-08-01', NULL); + +-- Expected zip at order time (ASOF resolves to the address active on o_ord_date): +-- ord100 2024-02-01 cust001 → 94025 ord101 2024-02-02 cust001 → 94025 +-- ord102 2024-05-01 cust001 → 94026 ord103 2024-05-02 cust001 → 94026 +-- ord104 2024-08-01 cust001 → 94027 ord105 2024-08-02 cust001 → 94027 +-- ord106 2024-03-01 cust002 → 94028 ord107 2024-03-02 cust002 → 94028 +-- ord108 2024-06-01 cust002 → 94029 ord109 2024-06-02 cust002 → 94029 +-- ord110 2024-09-01 cust002 → 94030 ord111 2024-09-02 cust002 → 94030 +INSERT INTO ORDERS VALUES + 
('ord100', 'cust001', '2024-02-01', 100), + ('ord101', 'cust001', '2024-02-02', 200), + ('ord102', 'cust001', '2024-05-01', 300), + ('ord103', 'cust001', '2024-05-02', 400), + ('ord104', 'cust001', '2024-08-01', 500), + ('ord105', 'cust001', '2024-08-02', 600), + ('ord106', 'cust002', '2024-03-01', 100), + ('ord107', 'cust002', '2024-03-02', 200), + ('ord108', 'cust002', '2024-06-01', 300), + ('ord109', 'cust002', '2024-06-02', 400), + ('ord110', 'cust002', '2024-09-01', 500), + ('ord111', 'cust002', '2024-09-02', 600); diff --git a/skills/semantic-view-patterns/snippets/caller_rights/semantic_view.sql b/skills/semantic-view-patterns/snippets/caller_rights/semantic_view.sql new file mode 100644 index 00000000..76ec9dd3 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/caller_rights/semantic_view.sql @@ -0,0 +1,49 @@ +-- Caller Rights: Semantic View DDL + Access Grants + +-- SV_CREATOR has SELECT on base tables and CREATE SEMANTIC VIEW on SV_CALLER_TEST.SV +USE ROLE SV_CREATOR; +USE WAREHOUSE SV_CALLER_TEST; +USE SCHEMA SV_CALLER_TEST.SV; + +CREATE OR REPLACE SEMANTIC VIEW SV_CALLER_TEST.SV.CUSTOMER_ORDERS_VIEW + + TABLES ( + -- Alias the fully-qualified DATA schema tables into the SV namespace + customer_address AS SV_CALLER_TEST.DATA.CUSTOMER_ADDRESS + UNIQUE (ca_cust_id, ca_start_date), + customer AS SV_CALLER_TEST.DATA.CUSTOMER + UNIQUE (c_cust_id), + orders AS SV_CALLER_TEST.DATA.ORDERS + UNIQUE (o_ord_id) + ) + + RELATIONSHIPS ( + customer_address(ca_cust_id) REFERENCES customer, + + -- ASOF join: each order resolves to the address active at order time + orders(o_cust_id, o_ord_date) + REFERENCES customer_address(ca_cust_id, ASOF ca_start_date) + ) + + FACTS ( + customer_address.f_zipcode AS ca_zipcode + ) + + DIMENSIONS ( + -- Zip code resolved via ASOF — the address the customer had on the order date + orders.f_cust_zipcode AS customer_address.f_zipcode, + orders.dim_year_month AS DATE_TRUNC('month', o_ord_date) + ) + + METRICS ( + 
orders.m_order_amount AS SUM(o_amount) + ) + + COMMENT = 'Customer orders attributed to the address active at time of order (ASOF join). Used to demonstrate caller-rights access control: users must have SELECT on both this SV and the DATA schema base tables.'; + +-- The future-grant in schema.sql set SV_OWNER as the owner. +-- SV_OWNER now grants SELECT to both user roles. +USE ROLE SV_OWNER; + +GRANT SELECT ON SEMANTIC VIEW SV_CALLER_TEST.SV.CUSTOMER_ORDERS_VIEW TO ROLE SV_USER; +GRANT SELECT ON SEMANTIC VIEW SV_CALLER_TEST.SV.CUSTOMER_ORDERS_VIEW TO ROLE SV_USER_NO_BASE_SELECT; diff --git a/skills/semantic-view-patterns/snippets/caller_rights/semantic_view.yaml b/skills/semantic-view-patterns/snippets/caller_rights/semantic_view.yaml new file mode 100644 index 00000000..a75f09de --- /dev/null +++ b/skills/semantic-view-patterns/snippets/caller_rights/semantic_view.yaml @@ -0,0 +1,81 @@ +# Caller Rights: Semantic View YAML +# +# ⚠️ CALLER RIGHTS IS A DDL-ONLY PATTERN: +# The caller-rights access control pattern relies on role-based ownership +# separation (SV_CREATOR, SV_OWNER, SV_USER roles) and specific GRANT sequences +# that cannot be expressed in a YAML specification. +# +# Additionally, ASOF join syntax is not supported in YAML. +# +# Use semantic_view.sql for this entire pattern. This YAML is provided only +# for structural reference. +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('SV_CALLER_TEST.SV', $$ $$); +# But note: role context (USE ROLE SV_CREATOR), grants, and ASOF join +# must all be applied manually via DDL from semantic_view.sql. + +name: CUSTOMER_ORDERS_VIEW +description: > + Customer orders attributed to the address active at time of order (ASOF join). + Used to demonstrate caller-rights access control. + NOTE: Full pattern (role switching, grants, ASOF join) requires DDL authoring. 
+ +tables: + - name: orders + description: Order transactions + base_table: + database: SV_CALLER_TEST + schema: DATA + table: ORDERS + primary_key: + columns: [O_ORD_ID] + dimensions: + - name: dim_year_month + expr: DATE_TRUNC('month', O_ORD_DATE) + data_type: DATE + metrics: + - name: m_order_amount + expr: SUM(O_AMOUNT) + + - name: customer_address + description: Customer address history + base_table: + database: SV_CALLER_TEST + schema: DATA + table: CUSTOMER_ADDRESS + primary_key: + columns: [CA_CUST_ID, CA_START_DATE] + facts: + - name: f_zipcode + expr: CA_ZIPCODE + data_type: VARCHAR + dimensions: + - name: f_cust_zipcode + expr: CA_ZIPCODE + data_type: VARCHAR + + - name: customer + description: Customer master + base_table: + database: SV_CALLER_TEST + schema: DATA + table: CUSTOMER + primary_key: + columns: [C_CUST_ID] + +# NOTE: ASOF join syntax not supported in YAML. +# DDL: orders(O_CUST_ID, O_ORD_DATE) REFERENCES customer_address(CA_CUST_ID, ASOF CA_START_DATE) +relationships: + - name: customer_address_to_customer + left_table: customer_address + right_table: customer + relationship_columns: + - left_column: CA_CUST_ID + right_column: C_CUST_ID + - name: orders_to_customer_address + left_table: orders + right_table: customer_address + relationship_columns: + - left_column: O_CUST_ID + right_column: CA_CUST_ID diff --git a/skills/semantic-view-patterns/snippets/derived_metrics/README.md b/skills/semantic-view-patterns/snippets/derived_metrics/README.md new file mode 100644 index 00000000..3ec3d67f --- /dev/null +++ b/skills/semantic-view-patterns/snippets/derived_metrics/README.md @@ -0,0 +1,59 @@ +# Derived Metrics + +## The Problem + +You have metrics defined on separate entities (e.g. store sales, web sales, catalog sales) and want to combine them into **cross-entity derived metrics**: totals, ratios, and % of total — all maintained in one place without duplicating SQL. 
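+
+For contrast, here is the hand-rolled version of the same rollup in plain SQL, written against the channel tables this snippet's `schema.sql` defines. It is a sketch of what every consumer would otherwise copy around; pre-aggregating each channel in a CTE avoids join fan-out:
+
+```sql
+WITH store AS (SELECT date_id, SUM(revenue) AS rev FROM store_sales   GROUP BY date_id),
+     web   AS (SELECT date_id, SUM(revenue) AS rev FROM web_sales     GROUP BY date_id),
+     cat   AS (SELECT date_id, SUM(revenue) AS rev FROM catalog_sales GROUP BY date_id)
+SELECT d.month,
+       store.rev + web.rev + cat.rev               AS total_revenue,
+       store.rev / (store.rev + web.rev + cat.rev) AS store_pct_of_total
+FROM dim_date d
+JOIN store ON store.date_id = d.date_id
+JOIN web   ON web.date_id   = d.date_id
+JOIN cat   ON cat.date_id   = d.date_id
+ORDER BY d.month;
+```
+
+Any change to the definition of total revenue has to be repeated wherever this query was pasted, which is exactly the duplication a semantic view removes.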
+
+## How You Might Express This Need
+
+- "What's our total revenue across all channels? And what % does each channel contribute?"
+- "Show net revenue = gross revenue minus returns"
+- "Store is growing — what's its share of total sales vs last quarter?"
+- "Derive a metric from two other metrics without writing a new SQL model"
+
+## Equivalent in Other Tools
+
+| Tool | Approach |
+|------|----------|
+| **SQL** | `store_revenue + web_revenue AS total_revenue` in a SELECT or CTE |
+| **LookML** | `measure: total_revenue { type: number sql: ${store_revenue} + ${web_revenue} ;; }` |
+| **dbt** | Calculated metric in metrics YAML or derived model |
+| **Power BI** | DAX `TOTAL_REVENUE = [STORE_REVENUE] + [WEB_REVENUE]` |
+| **Tableau** | Calculated fields: `[Store Revenue] + [Web Revenue]`. Cross-data-source ratios require blending; single-source is straightforward. |
+
+## The SV Approach
+
+Derived metrics reference other metric names by logical name — **no table prefix**:
+```sql
+METRICS (
+    store_sales.store_revenue AS SUM(revenue),
+    web_sales.web_revenue AS SUM(revenue),
+
+    -- Cross-table derived: NO table prefix on the derived metric name
+    total_revenue AS store_sales.store_revenue + web_sales.web_revenue,
+
+    -- Ratio: derives from the derived metric itself
+    store_pct AS store_sales.store_revenue / total_revenue
+)
+```
+
+## Key Rules
+
+- Derived metric names **must not** have a table prefix — they are not scoped to an entity
+- A derived metric can reference other derived metrics as building blocks
+- Division returns a decimal (0.0–1.0) — multiply by 100 in a standard SQL wrapper to display it as a percent
+- All referenced metrics must be reachable via the same set of relationships/dimensions in the query
+- Derived metrics are additive by default — they do not support NON ADDITIVE BY
+
+## Docs
+
+- [Defining derived metrics](https://docs.snowflake.com/en/user-guide/views-semantic/sql#defining-derived-metrics)
+
+## Files
+
+| File | Description |
+|------|-------------| +| `schema.sql` | Three channel fact tables + date dimension | +| `seed_data.sql` | 6 months × 3 channels | +| `semantic_view.sql` | SV with per-channel metrics, total, and % of total | +| `queries.sql` | Channel mix, quarterly comparison, standard SQL for display | diff --git a/skills/semantic-view-patterns/snippets/derived_metrics/queries.sql b/skills/semantic-view-patterns/snippets/derived_metrics/queries.sql new file mode 100644 index 00000000..47413f36 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/derived_metrics/queries.sql @@ -0,0 +1,64 @@ +-- Derived Metrics: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. Total revenue and per-channel breakdown by month +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_SALES_SV + DIMENSIONS dim_date.month + METRICS store_sales.store_revenue, web_sales.web_revenue, + catalog_sales.catalog_revenue, total_revenue +) +ORDER BY month; + + +-- 2. Channel mix — % of total per month +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_SALES_SV + DIMENSIONS dim_date.month + METRICS store_pct_of_total, web_pct_of_total, catalog_pct_of_total +) +ORDER BY month; + + +-- 3. Q1 vs Q2 channel revenue +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_SALES_SV + DIMENSIONS dim_date.quarter + METRICS store_sales.store_revenue, web_sales.web_revenue, + catalog_sales.catalog_revenue, total_revenue +) +ORDER BY quarter; + + +-- 4. 
Full year summary (no time dimension needed)
+SELECT * FROM SEMANTIC_VIEW(
+    SNIPPETS.PUBLIC.CHANNEL_SALES_SV
+    METRICS total_revenue, store_pct_of_total, web_pct_of_total, catalog_pct_of_total
+);
+
+
+-- ============================================================
+-- GOTCHAS
+-- ============================================================
+
+-- NOTE: Derived metric names have NO table prefix in the DDL:
+-- CORRECT:   total_revenue AS store_sales.store_revenue + ...
+-- INCORRECT: store_sales.total_revenue AS ... (would scope it to store entity)

+-- NOTE: Ratio metrics return decimals (0.0 - 1.0).
+-- To display as percent, wrap the SEMANTIC_VIEW() call in standard SQL:
+SELECT
+    month,
+    ROUND(store_pct_of_total * 100, 1) AS store_pct,
+    ROUND(web_pct_of_total * 100, 1) AS web_pct,
+    ROUND(catalog_pct_of_total * 100, 1) AS catalog_pct
+FROM SEMANTIC_VIEW(
+    SNIPPETS.PUBLIC.CHANNEL_SALES_SV
+    DIMENSIONS dim_date.year, dim_date.month
+    METRICS store_pct_of_total, web_pct_of_total, catalog_pct_of_total
+)
+WHERE year = 2024
+ORDER BY month;
diff --git a/skills/semantic-view-patterns/snippets/derived_metrics/schema.sql b/skills/semantic-view-patterns/snippets/derived_metrics/schema.sql
new file mode 100644
index 00000000..2e511c38
--- /dev/null
+++ b/skills/semantic-view-patterns/snippets/derived_metrics/schema.sql
@@ -0,0 +1,36 @@
+-- Derived Metrics: Schema
+
+CREATE DATABASE IF NOT EXISTS SNIPPETS;
+CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC;
+USE DATABASE SNIPPETS;
+USE SCHEMA PUBLIC;
+
+CREATE OR REPLACE TABLE dim_date (
+    date_id INTEGER NOT NULL,
+    full_date DATE NOT NULL,
+    year INTEGER NOT NULL,
+    quarter INTEGER NOT NULL,
+    month INTEGER NOT NULL,
+    CONSTRAINT pk_dim_date PRIMARY KEY (date_id)
+);
+
+CREATE OR REPLACE TABLE store_sales (
+    sale_id INTEGER NOT NULL,
+    date_id INTEGER NOT NULL,
+    revenue NUMBER(10,2) NOT NULL,
+    quantity INTEGER NOT NULL
+);
+
+CREATE OR REPLACE TABLE web_sales (
+    sale_id INTEGER NOT NULL,
+    date_id INTEGER NOT NULL,
+    revenue NUMBER(10,2) NOT NULL,
+    quantity INTEGER NOT NULL
+);
+
+CREATE OR REPLACE TABLE catalog_sales (
+    sale_id INTEGER NOT
NULL, + date_id INTEGER NOT NULL, + revenue NUMBER(10,2) NOT NULL, + quantity INTEGER NOT NULL +); diff --git a/skills/semantic-view-patterns/snippets/derived_metrics/seed_data.sql b/skills/semantic-view-patterns/snippets/derived_metrics/seed_data.sql new file mode 100644 index 00000000..6f3fbf5b --- /dev/null +++ b/skills/semantic-view-patterns/snippets/derived_metrics/seed_data.sql @@ -0,0 +1,36 @@ +-- Derived Metrics: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO dim_date VALUES + (1, '2024-01-01', 2024, 1, 1), + (2, '2024-02-01', 2024, 1, 2), + (3, '2024-03-01', 2024, 1, 3), + (4, '2024-04-01', 2024, 2, 4), + (5, '2024-05-01', 2024, 2, 5), + (6, '2024-06-01', 2024, 2, 6); + +INSERT INTO store_sales VALUES + (1, 1, 5000, 50), + (2, 2, 6000, 60), + (3, 3, 7000, 70), + (4, 4, 4500, 45), + (5, 5, 5500, 55), + (6, 6, 6500, 65); + +INSERT INTO web_sales VALUES + (1, 1, 2000, 25), + (2, 2, 2500, 30), + (3, 3, 3000, 35), + (4, 4, 3500, 40), + (5, 5, 4000, 45), + (6, 6, 4500, 50); + +INSERT INTO catalog_sales VALUES + (1, 1, 1000, 10), + (2, 2, 1200, 12), + (3, 3, 1400, 14), + (4, 4, 800, 8), + (5, 5, 1100, 11), + (6, 6, 1300, 13); diff --git a/skills/semantic-view-patterns/snippets/derived_metrics/semantic_view.sql b/skills/semantic-view-patterns/snippets/derived_metrics/semantic_view.sql new file mode 100644 index 00000000..26c59477 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/derived_metrics/semantic_view.sql @@ -0,0 +1,50 @@ +-- Derived Metrics: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.CHANNEL_SALES_SV + + TABLES ( + dim_date PRIMARY KEY (date_id), + store_sales, + web_sales, + catalog_sales + ) + + RELATIONSHIPS ( + store_to_date AS store_sales(date_id) REFERENCES dim_date, + web_to_date AS web_sales(date_id) REFERENCES dim_date, + catalog_to_date AS catalog_sales(date_id) REFERENCES dim_date + ) + + DIMENSIONS ( + dim_date.year AS year WITH SYNONYMS 
('year'), + dim_date.quarter AS quarter WITH SYNONYMS ('quarter', 'qtr'), + dim_date.month AS month WITH SYNONYMS ('month') + ) + + METRICS ( + store_sales.store_revenue AS SUM(revenue) + WITH SYNONYMS ('store sales', 'store revenue', 'brick and mortar revenue'), + web_sales.web_revenue AS SUM(revenue) + WITH SYNONYMS ('web sales', 'online revenue', 'e-commerce revenue'), + catalog_sales.catalog_revenue AS SUM(revenue) + WITH SYNONYMS ('catalog sales', 'catalog revenue', 'mail order revenue'), + + -- Cross-table derived metric: no table prefix; references per-channel metrics by name + total_revenue AS store_sales.store_revenue + web_sales.web_revenue + catalog_sales.catalog_revenue + WITH SYNONYMS ('total sales', 'all channel revenue', 'combined revenue'), + + -- Ratios/% of total — derived metrics using the total above + store_pct_of_total AS store_sales.store_revenue / total_revenue + WITH SYNONYMS ('store share', 'store contribution', '% from store'), + web_pct_of_total AS web_sales.web_revenue / total_revenue + WITH SYNONYMS ('web share', 'web contribution', '% from web'), + catalog_pct_of_total AS catalog_sales.catalog_revenue / total_revenue + WITH SYNONYMS ('catalog share', 'catalog contribution', '% from catalog') + ) + + COMMENT = 'Multi-channel revenue analytics. Demonstrates cross-table derived metrics (total_revenue = sum of three channels) and ratio metrics (% of total per channel).' + + AI_SQL_GENERATION 'Use total_revenue for combined across all channels. Use per-channel metrics (store_revenue, web_revenue, catalog_revenue) for channel comparison. Percent-of-total metrics (store_pct_of_total etc.) show channel mix as decimals — multiply by 100 for percentages. 
All metrics are combinable with dim_date dimensions for time breakdowns.'; diff --git a/skills/semantic-view-patterns/snippets/derived_metrics/semantic_view.yaml b/skills/semantic-view-patterns/snippets/derived_metrics/semantic_view.yaml new file mode 100644 index 00000000..01dcd403 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/derived_metrics/semantic_view.yaml @@ -0,0 +1,113 @@ +# Derived Metrics: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# Export an existing DDL SV to YAML: +# SELECT SYSTEM$READ_YAML_FROM_SEMANTIC_VIEW('TARGET_DB.TARGET_SCHEMA.CHANNEL_SALES_SV'); +# +# DDL-only features not in YAML: +# - AI_SQL_GENERATION → module_custom_instructions: sql_generation: in YAML + +name: CHANNEL_SALES_SV +description: > + Multi-channel revenue analytics. Demonstrates cross-table derived metrics + (total_revenue = sum of three channels) and ratio metrics (% of total per channel). 
+ +tables: + - name: dim_date + description: Date dimension + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: DIM_DATE + primary_key: + columns: [DATE_ID] + dimensions: + - name: year + synonyms: [year] + expr: YEAR + data_type: NUMBER + - name: quarter + synonyms: [quarter, qtr] + expr: QUARTER + data_type: NUMBER + - name: month + synonyms: [month] + expr: MONTH + data_type: NUMBER + + - name: store_sales + description: Brick-and-mortar store sales transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: STORE_SALES + metrics: + - name: store_revenue + synonyms: [store sales, store revenue, brick and mortar revenue] + expr: SUM(REVENUE) + + - name: web_sales + description: Online / e-commerce sales transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: WEB_SALES + metrics: + - name: web_revenue + synonyms: [web sales, online revenue, e-commerce revenue] + expr: SUM(REVENUE) + + - name: catalog_sales + description: Mail order / catalog sales transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: CATALOG_SALES + metrics: + - name: catalog_revenue + synonyms: [catalog sales, catalog revenue, mail order revenue] + expr: SUM(REVENUE) + +relationships: + - name: store_to_date + left_table: store_sales + right_table: dim_date + relationship_columns: + - left_column: DATE_ID + right_column: DATE_ID + - name: web_to_date + left_table: web_sales + right_table: dim_date + relationship_columns: + - left_column: DATE_ID + right_column: DATE_ID + - name: catalog_to_date + left_table: catalog_sales + right_table: dim_date + relationship_columns: + - left_column: DATE_ID + right_column: DATE_ID + +# Cross-table derived metrics live at the view level (not nested under any table) +metrics: + - name: total_revenue + synonyms: [total sales, all channel revenue, combined revenue] + description: Combined revenue across all three channels + expr: store_sales.store_revenue +
web_sales.web_revenue + catalog_sales.catalog_revenue + - name: store_pct_of_total + synonyms: [store share, store contribution, "% from store"] + description: Store revenue as a fraction of total revenue + expr: store_sales.store_revenue / total_revenue + - name: web_pct_of_total + synonyms: [web share, web contribution, "% from web"] + description: Web revenue as a fraction of total revenue + expr: web_sales.web_revenue / total_revenue + - name: catalog_pct_of_total + synonyms: [catalog share, catalog contribution, "% from catalog"] + description: Catalog revenue as a fraction of total revenue + expr: catalog_sales.catalog_revenue / total_revenue diff --git a/skills/semantic-view-patterns/snippets/entity_facts/README.md b/skills/semantic-view-patterns/snippets/entity_facts/README.md new file mode 100644 index 00000000..82c34dd2 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/entity_facts/README.md @@ -0,0 +1,81 @@ +# Entity Facts and Calculated Dimensions + +## The Problem + +You need analytics at the **customer (entity) level**, not just the order level. For example: +- A customer's **lifetime value** (total spend across all orders) +- A **value tier** ("high", "medium", "low") derived from that LTV +- A **calculated age** dimension from a birth year column + +These patterns require entity-level aggregation, derived dimensions, and expression-based dimensions — none of which require separate tables or pre-computed columns. + +## How You Might Express This Need + +- "Segment customers by total lifetime spend — show order volume per segment" +- "Each customer has a birth year. Compute their age and use it to filter/bucket." 
+- "I want VALUE_BUCKET to be derived dynamically from total spend, not stored in the DB" + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **SQL** | `SUM(amount) OVER (PARTITION BY customer_id)` as subquery for LTV; CASE in SELECT | +| **LookML** | `derived_table` + `dimension: value_segment { sql: CASE WHEN ... }` | +| **dbt** | `metrics.yml` customer_ltv + downstream dimension in model | +| **Power BI** | DAX CALCULATE + ALLEXCEPT for entity-level aggregation | +| **Tableau** | Fixed LOD for entity aggregation: `{ FIXED [Customer ID]: SUM([Order Amount]) }`; CASE WHEN on the LOD result for segmentation. | + +## Three Patterns in This Snippet + +### 1. Entity-Level Aggregated Fact +```sql +FACTS ( + PRIVATE customers.lifetime_value AS SUM(orders.order_amount) +) +``` +Aggregates `order_amount` up to the `customers` entity — produces one number per customer. `PRIVATE` means it's not directly queryable but can be used in DIMENSIONS expressions. + +### 2. Derived Dimension from Aggregated Fact +```sql +DIMENSIONS ( + customers.value_segment AS ( + CASE + WHEN customers.lifetime_value < 1000 THEN 'low' + WHEN customers.lifetime_value <= 3000 THEN 'medium' + ELSE 'high' + END + ) +) +``` +The CASE expression uses `lifetime_value` — which is a PRIVATE fact — to produce a queryable `value_segment` dimension. The LTV is never exposed directly; only the tier. + +### 3. Calculated Dimension (Expression on Physical Column) +```sql +DIMENSIONS ( + customers.age AS (YEAR(CURRENT_DATE()) - birth_year) +) +``` +Expression evaluated at query time. No stored column needed. 
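
Stitched together, the three patterns behave like the hand-written SQL below. This is an illustrative sketch of a roughly equivalent query (revenue by segment), not the SQL the engine actually generates:

```sql
-- Roughly equivalent plain SQL for "revenue by value segment" (illustrative)
WITH ltv AS (
  SELECT customer_id, SUM(amount) AS lifetime_value
  FROM orders
  GROUP BY customer_id
)
SELECT
  CASE
    WHEN l.lifetime_value < 1000 THEN 'low'
    WHEN l.lifetime_value <= 3000 THEN 'medium'
    ELSE 'high'
  END AS value_segment,
  SUM(o.amount)                  AS total_revenue,
  COUNT(DISTINCT o.customer_id)  AS customer_count
FROM orders o
JOIN ltv l ON l.customer_id = o.customer_id
GROUP BY 1;
```

The SV version encodes the LTV subquery and the CASE bucketing once, so every query (and Cortex Analyst) reuses the same definitions instead of repeating them.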
+ +## PRIVATE vs Public Facts + +| | PRIVATE fact | Public fact | +|--|-------------|-------------| +| Queryable as dimension | No | Yes | +| Usable in DIMENSIONS expressions | Yes | Yes | +| Shows in DESCRIBE / Cortex Analyst | No | Yes | +| Use when | Intermediate computation only | You want users to see and filter by the value | + +## Docs + +- [Defining facts, dimensions, and metrics](https://docs.snowflake.com/en/user-guide/views-semantic/sql#defining-facts-dimensions-and-metrics) +- [CREATE SEMANTIC VIEW — FACTS / DIMENSIONS syntax](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#facts) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `customers` + `orders` | +| `seed_data.sql` | 4 customers, 10 orders with known LTV tiers | +| `semantic_view.sql` | SV with PRIVATE fact, derived segment, and age dimension | +| `queries.sql` | Revenue by segment, age filtering, per-order fact in WHERE | diff --git a/skills/semantic-view-patterns/snippets/entity_facts/queries.sql b/skills/semantic-view-patterns/snippets/entity_facts/queries.sql new file mode 100644 index 00000000..0928d977 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/entity_facts/queries.sql @@ -0,0 +1,62 @@ +-- Entity Facts: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. Revenue by customer value segment +-- (derived dimension from PRIVATE aggregated fact) +-- Expected: high=$4200, medium=$1700, low=$600+$200=$800 +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CUSTOMER_ORDERS_SV + DIMENSIONS customers.value_segment + METRICS orders.total_revenue, customers.customer_count +) +ORDER BY value_segment; + + +-- 2. 
Number of customers by segment and age bucket +-- (calculated dimension customers.age) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CUSTOMER_ORDERS_SV + DIMENSIONS customers.value_segment, customers.age + METRICS customers.customer_count +) +ORDER BY value_segment, age; + + +-- 3. Monthly revenue by segment — show transitions over time +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CUSTOMER_ORDERS_SV + DIMENSIONS customers.value_segment, orders.order_month + METRICS orders.total_revenue +) +ORDER BY order_month, value_segment; + + +-- 4. Filter using the per-order fact in WHERE clause +-- (row-level filtering on orders.order_amount) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CUSTOMER_ORDERS_SV + DIMENSIONS customers.customer_name + METRICS orders.total_revenue, orders.order_count + WHERE orders.order_amount > 500 +); + + +-- ============================================================ +-- WHAT DOESN'T WORK +-- ============================================================ + +-- ERROR: Cannot directly query a PRIVATE fact as a dimension or metric. +-- customers.lifetime_value is PRIVATE — it only exists to power value_segment. +-- +-- SELECT * FROM SEMANTIC_VIEW( +-- SNIPPETS.PUBLIC.CUSTOMER_ORDERS_SV +-- DIMENSIONS customers.lifetime_value -- Error: no dimension named lifetime_value +-- ); +-- +-- Remove PRIVATE from the DDL if you want it directly queryable. 
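
-- For contrast, a hypothetical variant: if the DDL declared lifetime_value
-- WITHOUT the PRIVATE keyword (not the definition shipped in semantic_view.sql),
-- it would be directly queryable as a dimension, e.g.:
--
-- SELECT * FROM SEMANTIC_VIEW(
--   SNIPPETS.PUBLIC.CUSTOMER_ORDERS_SV
--   DIMENSIONS customers.customer_name, customers.lifetime_value
-- )
-- ORDER BY lifetime_value DESC;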
diff --git a/skills/semantic-view-patterns/snippets/entity_facts/schema.sql b/skills/semantic-view-patterns/snippets/entity_facts/schema.sql new file mode 100644 index 00000000..17fdda9d --- /dev/null +++ b/skills/semantic-view-patterns/snippets/entity_facts/schema.sql @@ -0,0 +1,20 @@ +-- Entity Facts: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE customers ( + customer_id INTEGER NOT NULL, + customer_name VARCHAR(50) NOT NULL, + birth_year INTEGER NOT NULL, + CONSTRAINT pk_customers PRIMARY KEY (customer_id) +); + +CREATE OR REPLACE TABLE orders ( + order_id INTEGER NOT NULL, + customer_id INTEGER NOT NULL, + order_date DATE NOT NULL, + amount NUMBER(10,2) NOT NULL +); diff --git a/skills/semantic-view-patterns/snippets/entity_facts/seed_data.sql b/skills/semantic-view-patterns/snippets/entity_facts/seed_data.sql new file mode 100644 index 00000000..f1f2b04e --- /dev/null +++ b/skills/semantic-view-patterns/snippets/entity_facts/seed_data.sql @@ -0,0 +1,26 @@ +-- Entity Facts: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO customers VALUES + (1, 'Alice Martin', 1988), + (2, 'Bob Chen', 1975), + (3, 'Carol White', 1995), + (4, 'Dan Patel', 2000); + +-- Alice: 4 orders totaling $4200 → "high value" (>$3000) +-- Bob: 3 orders totaling $1700 → "medium value" ($1000-$3000) +-- Carol: 2 orders totaling $600 → "low value" (<$1000) +-- Dan: 1 order totaling $200 → "low value" +INSERT INTO orders VALUES + (101, 1, '2024-01-10', 1200), + (102, 1, '2024-02-15', 800), + (103, 1, '2024-03-20', 1500), + (104, 1, '2024-04-05', 700), + (105, 2, '2024-01-12', 900), + (106, 2, '2024-03-18', 500), + (107, 2, '2024-05-01', 300), + (108, 3, '2024-02-22', 400), + (109, 3, '2024-04-30', 200), + (110, 4, '2024-03-15', 200); diff --git a/skills/semantic-view-patterns/snippets/entity_facts/semantic_view.sql 
b/skills/semantic-view-patterns/snippets/entity_facts/semantic_view.sql new file mode 100644 index 00000000..c2e7ddea --- /dev/null +++ b/skills/semantic-view-patterns/snippets/entity_facts/semantic_view.sql @@ -0,0 +1,66 @@ +-- Entity Facts: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.CUSTOMER_ORDERS_SV + + TABLES ( + customers PRIMARY KEY (customer_id), + orders + ) + + RELATIONSHIPS ( + orders(customer_id) REFERENCES customers + ) + + FACTS ( + -- Per-order fact: accessible in WHERE and as a dimension for row-level filtering + orders.order_amount AS amount + WITH SYNONYMS ('order value', 'transaction amount'), + + -- Entity-level aggregated fact: aggregates up to the customer entity. + -- This creates a single number per customer (their total lifetime spend) + -- which can then be used in dimension CASE expressions below. + PRIVATE customers.lifetime_value AS SUM(orders.order_amount) + WITH SYNONYMS ('customer LTV', 'customer lifetime value', 'total spend') + ) + + DIMENSIONS ( + customers.customer_name AS customer_name + WITH SYNONYMS ('name', 'customer'), + + -- Calculated dimension: expression evaluated row-by-row + customers.age AS (YEAR(CURRENT_DATE()) - birth_year) + WITH SYNONYMS ('customer age', 'age in years'), + + -- Derived dimension from an entity-level aggregated fact: + -- lifetime_value is PRIVATE (not directly queryable) but drives this bucketing + customers.value_segment AS ( + CASE + WHEN customers.lifetime_value < 1000 THEN 'low' + WHEN customers.lifetime_value <= 3000 THEN 'medium' + ELSE 'high' + END + ) + WITH SYNONYMS ('customer tier', 'value tier', 'segment'), + + orders.order_date AS order_date + WITH SYNONYMS ('date', 'purchase date'), + + orders.order_month AS DATE_TRUNC('month', order_date) + WITH SYNONYMS ('month', 'order month') + ) + + METRICS ( + orders.total_revenue AS SUM(amount) + WITH SYNONYMS ('revenue', 'total orders'), + orders.order_count AS 
COUNT(order_id) + WITH SYNONYMS ('orders', 'number of orders'), + customers.customer_count AS COUNT(customer_id) + WITH SYNONYMS ('customers', 'number of customers') + ) + + COMMENT = 'Customer order history. Demonstrates PRIVATE entity-level aggregated facts (lifetime_value) used to define a value_segment dimension, and a calculated age dimension using YEAR(CURRENT_DATE()).' + + AI_SQL_GENERATION 'Use customers.value_segment to segment customers by lifetime spend. Use customers.age to filter or bucket by customer age. The lifetime_value fact is PRIVATE — it powers value_segment internally but cannot be queried directly.'; diff --git a/skills/semantic-view-patterns/snippets/entity_facts/semantic_view.yaml b/skills/semantic-view-patterns/snippets/entity_facts/semantic_view.yaml new file mode 100644 index 00000000..b23d6d97 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/entity_facts/semantic_view.yaml @@ -0,0 +1,96 @@ +# Entity Facts: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# DDL-only features: +# - AI_SQL_GENERATION → module_custom_instructions: sql_generation: in YAML +# - PRIVATE entity-level aggregated facts → YAML equivalent: access_modifier: private_access + +name: CUSTOMER_ORDERS_SV +description: > + Customer order history. Demonstrates PRIVATE entity-level aggregated facts + (lifetime_value) used to define a value_segment dimension, and a calculated + age dimension using YEAR(CURRENT_DATE()). 
+ +tables: + - name: customers + description: Customer master data + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: CUSTOMERS + primary_key: + columns: [CUSTOMER_ID] + dimensions: + - name: customer_name + synonyms: [name, customer] + expr: CUSTOMER_NAME + data_type: VARCHAR + - name: age + synonyms: [customer age, age in years] + expr: YEAR(CURRENT_DATE()) - BIRTH_YEAR + data_type: NUMBER + - name: value_segment + synonyms: [customer tier, value tier, segment] + description: > + Customer tier based on lifetime spend. References the private + lifetime_value fact by logical name — not a raw aggregate expression. + expr: > + CASE + WHEN lifetime_value < 1000 THEN 'low' + WHEN lifetime_value <= 3000 THEN 'medium' + ELSE 'high' + END + data_type: VARCHAR + facts: + # YAML equivalent of DDL's PRIVATE: access_modifier: private_access + - name: lifetime_value + synonyms: [customer LTV, customer lifetime value, total spend] + description: Aggregated total lifetime spend per customer. Private — drives value_segment. 
+ expr: SUM(orders.order_amount) + data_type: NUMBER + access_modifier: private_access + metrics: + - name: customer_count + synonyms: [customers, number of customers] + expr: COUNT(CUSTOMER_ID) + + - name: orders + description: Order transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: ORDERS + dimensions: + - name: order_date + synonyms: [date, purchase date] + expr: ORDER_DATE + data_type: DATE + - name: order_month + synonyms: [month, order month] + expr: DATE_TRUNC('month', ORDER_DATE) + data_type: DATE + facts: + - name: order_amount + synonyms: [order value, transaction amount] + expr: AMOUNT + data_type: NUMBER + metrics: + - name: total_revenue + synonyms: [revenue, total orders] + expr: SUM(order_amount) + - name: order_count + synonyms: [orders, number of orders] + expr: COUNT(ORDER_ID) + +relationships: + - name: orders_to_customers + left_table: orders + right_table: customers + relationship_columns: + - left_column: CUSTOMER_ID + right_column: CUSTOMER_ID diff --git a/skills/semantic-view-patterns/snippets/fact_as_relationship_key/README.md b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/README.md new file mode 100644 index 00000000..3ecc450c --- /dev/null +++ b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/README.md @@ -0,0 +1,67 @@ +# Fact as Relationship Key + +## The Problem + +You need to join two tables, but the join key doesn't exist as a physical column on the fact table — it has to be **computed** from columns that are there. There's no way to add the derived key to the source table (read-only source, or it would be redundant denormalization). + +**Example in this snippet**: A `sales` table stores individual transactions with a `sale_date`. A separate `fiscal_quarters` table stores quarterly budget targets, keyed by a string like `"2024-Q2"`. The sales table has no `fiscal_quarter_key` column — but you can derive it from `sale_date`. 
The goal: join every sale to its fiscal quarter budget without transforming the source data. + +## How You Might Express This Need + +- "I want to join my sales table to a quota/budget table by fiscal quarter, but there's no fiscal quarter column on sales" +- "My dimension table is keyed by a composite or computed value — how do I join to it from a fact that only has the raw components?" +- "I need to map events to lookup values using a derived key (e.g. region extracted from a longer code)" +- "Can I define a computed FK in a semantic view without adding a column to the source table?" + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **SQL** | `JOIN fiscal_quarters fq ON CONCAT(YEAR(sale_date), '-Q', QUARTER(sale_date)) = fq.fiscal_quarter_key` | +| **dbt** | Add a computed column in the staging model: `CONCAT(YEAR(sale_date), '-Q', QUARTER(sale_date)) AS fiscal_qtr_key` | +| **LookML** | `dimension: fiscal_qtr_key { sql: CONCAT(YEAR(${sale_date}), '-Q', QUARTER(${sale_date})) }` + `join` block | +| **Power BI** | Add a calculated column in Power Query: `Text.From(Date.Year([sale_date])) & "-Q" & Text.From(Date.QuarterOfYear([sale_date]))` | +| **Tableau** | Computed join field in the data source dialog, or a pre-joined extract | + +All of these require either modifying the source table/model or writing the join expression directly in every query. The SV encodes it once in the model definition. + +## The SV Approach + +Two things are required: + +**1. Define the computed key as a FACT on the source table:** +```sql +FACTS ( + sales.fiscal_qtr_key AS CONCAT(TO_VARCHAR(YEAR(sale_date)), '-Q', TO_VARCHAR(QUARTER(sale_date))) +) +``` + +**2. Reference that fact in the RELATIONSHIP:** +```sql +RELATIONSHIPS ( + sales(sales.fiscal_qtr_key) REFERENCES fiscal_quarters +) +``` + +The engine evaluates `fiscal_qtr_key` per row at query time and uses it as the FK — no physical column needed. 
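
Once the view is created, a query that mixes metrics from both sides of the computed join resolves it automatically. This example mirrors the first query in `queries.sql`:

```sql
SELECT * FROM SEMANTIC_VIEW(
  SNIPPETS.PUBLIC.SALES_VS_BUDGET_SV
  METRICS sales.total_revenue, fiscal_quarters.total_budget
  DIMENSIONS fiscal_quarters.quarter_name
)
ORDER BY quarter_name;
```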
+ +## What Doesn't Work + +- **The computed fact is not a metric or dimension** — you can't query `sales.fiscal_qtr_key` in a `SEMANTIC_VIEW()` call directly. It exists only to power the join. +- **Aggregation expressions are not valid** — the fact used as a FK must be a scalar (row-level) expression. `SUM(...)`, `COUNT(...)`, etc. will fail. +- **The referenced table must have a matching PRIMARY KEY** — the right-hand side of `REFERENCES` must be the table's declared `PRIMARY KEY` (or omitted to use it implicitly). + +## Docs + +- [Defining facts, dimensions, and metrics](https://docs.snowflake.com/en/user-guide/views-semantic/sql#defining-facts-dimensions-and-metrics) +- [RELATIONSHIPS — using a fact as a foreign key](https://docs.snowflake.com/en/user-guide/views-semantic/sql#relationships) +- [CREATE SEMANTIC VIEW syntax reference](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `sales`, `fiscal_quarters`, `products` table DDL | +| `seed_data.sql` | 6 quarters of targets, 13 sales across 3 products | +| `semantic_view.sql` | SV with computed-FK fact + budget metrics | +| `queries.sql` | Revenue vs budget by quarter, attainment by category, gotchas | diff --git a/skills/semantic-view-patterns/snippets/fact_as_relationship_key/queries.sql b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/queries.sql new file mode 100644 index 00000000..586a6b75 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/queries.sql @@ -0,0 +1,81 @@ +-- Fact as Relationship Key: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. Revenue vs budget by fiscal quarter +-- The computed FK (fiscal_qtr_key) silently resolves the join to fiscal_quarters. 
+-- Expected: Q1-Q2 2023 at 43-44%, dipping to ~36% in Q3-Q4; Q2 2024 best at 53.7% +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SALES_VS_BUDGET_SV + METRICS sales.total_revenue, fiscal_quarters.total_budget + DIMENSIONS fiscal_quarters.quarter_name +) +ORDER BY quarter_name; + + +-- 2. Revenue vs budget by fiscal year — multi-quarter rollup +-- fiscal_year rolls up all quarters in the year; budget sums their targets +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SALES_VS_BUDGET_SV + METRICS sales.total_revenue, fiscal_quarters.total_budget + DIMENSIONS fiscal_quarters.fiscal_year +) +ORDER BY fiscal_year; + + +-- 3. Revenue by product category + fiscal quarter +-- Shows how product mix shifts across quarters (Services strong in Q4, Hardware in Q2 2024) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SALES_VS_BUDGET_SV + METRICS sales.total_revenue + DIMENSIONS products.category, fiscal_quarters.quarter_name +) +ORDER BY quarter_name, category; + + +-- 4. Budget attainment % — computed in the outer query +-- The SV exposes the raw revenue and budget; attainment is a derived ratio +SELECT + sv.quarter_name, + sv.total_revenue, + sv.total_budget, + ROUND(sv.total_revenue / NULLIF(sv.total_budget, 0) * 100, 1) AS attainment_pct +FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SALES_VS_BUDGET_SV + METRICS sales.total_revenue, fiscal_quarters.total_budget + DIMENSIONS fiscal_quarters.quarter_name +) AS sv +ORDER BY sv.quarter_name; + + +-- ============================================================ +-- WHAT DOESN'T WORK +-- ============================================================ + +-- ERROR: Cannot query fiscal_qtr_key as a dimension — it is a FACT +-- used only to resolve the join, not a queryable dimension or metric.
+-- +-- SELECT * FROM SEMANTIC_VIEW( +-- SNIPPETS.PUBLIC.SALES_VS_BUDGET_SV +-- DIMENSIONS sales.fiscal_qtr_key -- Error: no dimension named fiscal_qtr_key +-- ); +-- +-- If you want the quarter key to be queryable, add it as a DIMENSION instead: +-- DIMENSIONS ( +-- sales.fiscal_qtr_key_dim AS CONCAT(TO_VARCHAR(YEAR(sale_date)), '-Q', TO_VARCHAR(QUARTER(sale_date))) +-- ) +-- Note: a column cannot simultaneously be a FACT (for join use) and a DIMENSION. +-- You would define TWO separate entries — the FACT for the relationship and a +-- separate DIMENSION with the same expression for display purposes. + + +-- HOW COMPUTED FK FACTS WORK: +-- The engine evaluates the FACT expression (e.g. '2024-Q2') per row on the sales +-- table, then uses that value to look up a matching row in fiscal_quarters via its +-- PRIMARY KEY (fiscal_quarter_key). If no match is found, the row is excluded +-- (same semantics as an INNER JOIN). The computed value is never stored. diff --git a/skills/semantic-view-patterns/snippets/fact_as_relationship_key/schema.sql b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/schema.sql new file mode 100644 index 00000000..165dd484 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/schema.sql @@ -0,0 +1,29 @@ +-- Fact as Relationship Key: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE products ( + product_id INTEGER NOT NULL, + product_name VARCHAR(50) NOT NULL, + category VARCHAR(30) NOT NULL, + CONSTRAINT pk_products PRIMARY KEY (product_id) +); + +CREATE OR REPLACE TABLE fiscal_quarters ( + fiscal_quarter_key VARCHAR(10) NOT NULL, -- e.g. '2024-Q2' + quarter_name VARCHAR(20) NOT NULL, -- e.g. 
'Q2 FY2024' + fiscal_year INTEGER NOT NULL, + budget_amount NUMBER(12,2) NOT NULL, + CONSTRAINT pk_fiscal_quarters PRIMARY KEY (fiscal_quarter_key) +); + +CREATE OR REPLACE TABLE sales ( + sale_id INTEGER NOT NULL, + sale_date DATE NOT NULL, + product_id INTEGER NOT NULL, + amount NUMBER(10,2) NOT NULL, + CONSTRAINT pk_sales PRIMARY KEY (sale_id) +); diff --git a/skills/semantic-view-patterns/snippets/fact_as_relationship_key/seed_data.sql b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/seed_data.sql new file mode 100644 index 00000000..64beba49 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/seed_data.sql @@ -0,0 +1,35 @@ +-- Fact as Relationship Key: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO products VALUES + (1, 'Widget Pro', 'Hardware'), + (2, 'Widget Lite', 'Hardware'), + (3, 'SupportPlus', 'Services'); + +-- 6 fiscal quarters of budget targets +INSERT INTO fiscal_quarters VALUES + ('2023-Q1', 'Q1 FY2023', 2023, 80000.00), + ('2023-Q2', 'Q2 FY2023', 2023, 90000.00), + ('2023-Q3', 'Q3 FY2023', 2023, 95000.00), + ('2023-Q4', 'Q4 FY2023', 2023, 100000.00), + ('2024-Q1', 'Q1 FY2024', 2024, 85000.00), + ('2024-Q2', 'Q2 FY2024', 2024, 95000.00); + +-- 13 sales spanning 2023-Q1 through 2024-Q2 +-- Designed so attainment ranges from ~36% (2023-Q3) to ~54% (2024-Q2) +INSERT INTO sales VALUES + ( 1, '2023-01-15', 1, 15000.00), -- Q1 2023 + ( 2, '2023-02-20', 2, 8000.00), -- Q1 2023 + ( 3, '2023-03-10', 3, 12000.00), -- Q1 2023 → Q1 total: 35000 / 80000 = 43.8% + ( 4, '2023-04-05', 1, 18000.00), -- Q2 2023 + ( 5, '2023-05-12', 3, 22000.00), -- Q2 2023 → Q2 total: 40000 / 90000 = 44.4% + ( 6, '2023-07-08', 2, 14000.00), -- Q3 2023 + ( 7, '2023-09-25', 1, 20000.00), -- Q3 2023 → Q3 total: 34000 / 95000 = 35.8% + ( 8, '2023-10-11', 3, 25000.00), -- Q4 2023 + ( 9, '2023-11-30', 2, 11000.00), -- Q4 2023 → Q4 total: 36000 / 100000 = 36.0% + (10, '2024-01-20', 1, 17000.00), -- Q1 2024 + (11,
'2024-02-14', 3, 19000.00), -- Q1 2024 → Q1 total: 36000 / 85000 = 42.4% + (12, '2024-04-03', 2, 21000.00), -- Q2 2024 + (13, '2024-05-22', 1, 30000.00); -- Q2 total: 51000 / 95000 = 53.7% diff --git a/skills/semantic-view-patterns/snippets/fact_as_relationship_key/semantic_view.sql b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/semantic_view.sql new file mode 100644 index 00000000..71e0fcee --- /dev/null +++ b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/semantic_view.sql @@ -0,0 +1,58 @@ +-- Fact as Relationship Key: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.SALES_VS_BUDGET_SV + + TABLES ( + products PRIMARY KEY (product_id), + fiscal_quarters PRIMARY KEY (fiscal_quarter_key), + sales PRIMARY KEY (sale_id) + ) + + RELATIONSHIPS ( + -- Standard FK: each sale references a product + sales_to_products AS sales(product_id) REFERENCES products, + + -- Computed FK: sales has no fiscal_quarter_key column, so we derive + -- it as a FACT below and use that fact as the join key here. + sales_to_quarters AS sales(sales.fiscal_qtr_key) REFERENCES fiscal_quarters + ) + + FACTS ( + -- Computed FK fact: derives the fiscal quarter key from sale_date. + -- No physical column on the sales table — the engine evaluates this + -- expression per row and uses the result to resolve the join above. 
+ sales.fiscal_qtr_key AS CONCAT( + TO_VARCHAR(YEAR(sale_date)), + '-Q', + TO_VARCHAR(QUARTER(sale_date)) + ) + ) + + DIMENSIONS ( + products.category AS category + WITH SYNONYMS ('product category', 'category'), + products.product_name AS product_name + WITH SYNONYMS ('product', 'item'), + + fiscal_quarters.quarter_name AS quarter_name + WITH SYNONYMS ('quarter', 'fiscal quarter', 'period'), + fiscal_quarters.fiscal_year AS fiscal_year + WITH SYNONYMS ('year', 'fy'), + + sales.sale_date AS sale_date + WITH SYNONYMS ('date', 'transaction date') + ) + + METRICS ( + sales.total_revenue AS SUM(amount) + WITH SYNONYMS ('revenue', 'sales', 'total sales'), + fiscal_quarters.total_budget AS SUM(budget_amount) + WITH SYNONYMS ('budget', 'target', 'quota') + ) + + COMMENT = 'Sales vs fiscal-quarter budgets. Demonstrates joining a fact table to a dimension using a computed FK fact (CONCAT of YEAR + QUARTER) when no physical FK column exists on the source table.' + + AI_SQL_GENERATION 'Use fiscal_quarters.quarter_name or fiscal_quarters.fiscal_year to break down results by time period. Use products.category to compare Hardware vs Services. To compare revenue against budget, query both sales.total_revenue and fiscal_quarters.total_budget together.'; diff --git a/skills/semantic-view-patterns/snippets/fact_as_relationship_key/semantic_view.yaml b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/semantic_view.yaml new file mode 100644 index 00000000..5ded03bb --- /dev/null +++ b/skills/semantic-view-patterns/snippets/fact_as_relationship_key/semantic_view.yaml @@ -0,0 +1,104 @@ +# Fact as Relationship Key: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# DDL-only features: +# - AI_SQL_GENERATION +# - Computed FK facts used as relationship keys (e.g. 
CONCAT(YEAR, '-Q', QUARTER)) +# are not supported as relationship join keys in YAML. YAML relationships must +# reference physical columns. Use semantic_view.sql for the computed-FK pattern. +# This YAML uses a placeholder relationship on FISCAL_QUARTER_KEY for structural completeness. + +name: SALES_VS_BUDGET_SV +description: > + Sales vs fiscal-quarter budgets. The computed FK fact pattern (deriving a join + key via CONCAT(YEAR + QUARTER)) requires DDL authoring. This YAML defines the + base structure; use semantic_view.sql for the full computed-FK relationship. + +tables: + - name: products + description: Product catalog + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: PRODUCTS + primary_key: + columns: [PRODUCT_ID] + dimensions: + - name: category + synonyms: [product category, category] + expr: CATEGORY + data_type: VARCHAR + - name: product_name + synonyms: [product, item] + expr: PRODUCT_NAME + data_type: VARCHAR + + - name: fiscal_quarters + description: Fiscal quarter budget targets + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: FISCAL_QUARTERS + primary_key: + columns: [FISCAL_QUARTER_KEY] + dimensions: + - name: quarter_name + synonyms: [quarter, fiscal quarter, period] + expr: QUARTER_NAME + data_type: VARCHAR + - name: fiscal_year + synonyms: [year, fy] + expr: FISCAL_YEAR + data_type: NUMBER + metrics: + - name: total_budget + synonyms: [budget, target, quota] + expr: SUM(BUDGET_AMOUNT) + + - name: sales + description: Sales transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: SALES + primary_key: + columns: [SALE_ID] + dimensions: + - name: sale_date + synonyms: [date, transaction date] + expr: SALE_DATE + data_type: DATE + facts: + # The computed FK (CONCAT(YEAR, '-Q', QUARTER)) that drives the fiscal_quarter + # join is a DDL-only feature. Shown here as a regular fact for documentation. + - name: fiscal_qtr_key + description: > + Computed fiscal quarter key.
NOTE: as a relationship join key, + this requires DDL authoring (semantic_view.sql). + expr: CONCAT(TO_VARCHAR(YEAR(SALE_DATE)), '-Q', TO_VARCHAR(QUARTER(SALE_DATE))) + data_type: VARCHAR + metrics: + - name: total_revenue + synonyms: [revenue, sales, total sales] + expr: SUM(AMOUNT) + +relationships: + - name: sales_to_products + left_table: sales + right_table: products + relationship_columns: + - left_column: PRODUCT_ID + right_column: PRODUCT_ID + # NOTE: Full computed-FK join requires DDL. YAML placeholder uses a direct column match. + # In DDL: sales_to_quarters AS sales(sales.fiscal_qtr_key) REFERENCES fiscal_quarters + - name: sales_to_quarters + left_table: sales + right_table: fiscal_quarters + relationship_columns: + - left_column: FISCAL_QUARTER_KEY + right_column: FISCAL_QUARTER_KEY diff --git a/skills/semantic-view-patterns/snippets/inline_sv/README.md b/skills/semantic-view-patterns/snippets/inline_sv/README.md new file mode 100644 index 00000000..20bb1797 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/inline_sv/README.md @@ -0,0 +1,85 @@ +# Inline Semantic View and SQL Subquery as Table + +> ⚠️ **Private Preview feature** — the inline SV syntax (`WITH ... AS SEMANTIC VIEW`) is not yet generally available. Contact your Snowflake account team to enable. + +## Two Related Patterns + +Both patterns let you work with Semantic Views **without creating a persistent named object**. + +--- + +## Pattern 1: SQL Subquery as Table Definition + +Use a SQL query as the source for a table in the TABLES clause. The SV definition is persisted (via CREATE), but one of its "tables" is actually an inline SQL expression. + +**Use when:** You want to filter source data, exclude certain rows, or combine tables before exposing them through the SV — without creating an intermediate view. + +```sql +CREATE SEMANTIC VIEW my_sv +TABLES ( + orders, + customers AS ( + SELECT * FROM customers WHERE tier = 'premium' + ) UNIQUE (customer_id) +) +... 
+``` + +The subquery filters to only premium customers. The SV consumers see a "customers" entity that already has the filter applied. + +--- + +## Pattern 2: Inline / Ad-Hoc Semantic View (SV CTE) + +Define and query a SV in a single statement — no CREATE needed. The SV exists only for the duration of the query. + +**Use when:** Testing DDL before committing, writing dbt unit tests, rapid prototyping. + +```sql +WITH adhoc_sv AS SEMANTIC VIEW +TABLES ( + orders, + customers UNIQUE (customer_id) +) +RELATIONSHIPS ( + orders(customer_id) REFERENCES customers +) +DIMENSIONS ( + customers.customer_name AS customer_name +) +METRICS ( + orders.total_revenue AS SUM(amount) +) +SELECT * FROM SEMANTIC_VIEW( + adhoc_sv + DIMENSIONS customers.customer_name + METRICS orders.total_revenue +); +``` + +--- + +## Pattern Comparison + +| | SQL Subquery in TABLES | Ad-Hoc SV (WITH ... AS SEMANTIC VIEW) | +|--|------------------------|--------------------------------------| +| Persisted | Yes (CREATE SEMANTIC VIEW) | No — exists for one query only | +| Usable by Cortex Analyst | Yes | No | +| Use for testing | Limited | Ideal — no DDL pollution | +| dbt unit testing | No | Yes | +| Filter source data | Yes | Yes | + +--- + +## Docs + +- [Using an SQL query as a logical table in a semantic view ⚠️ Private Preview](https://docs.snowflake.com/en/LIMITEDACCESS/semantic-views-inline-view) +- [WITH ... 
AS SEMANTIC VIEW (inline SV)](https://docs.snowflake.com/en/sql-reference/constructs/semantic_view) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `inline_orders` + `inline_customers` | +| `seed_data.sql` | 4 customers (2 premium, 2 standard), 6 orders | +| `semantic_view.sql` | Both patterns with full working SQL | diff --git a/skills/semantic-view-patterns/snippets/inline_sv/schema.sql b/skills/semantic-view-patterns/snippets/inline_sv/schema.sql new file mode 100644 index 00000000..63610ded --- /dev/null +++ b/skills/semantic-view-patterns/snippets/inline_sv/schema.sql @@ -0,0 +1,21 @@ +-- Inline SV: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- Tables shared by both inline SV patterns +CREATE OR REPLACE TABLE inline_orders ( + order_id INTEGER NOT NULL, + customer_id INTEGER NOT NULL, + amount NUMBER(10,2) NOT NULL, + status VARCHAR(20) NOT NULL +); + +CREATE OR REPLACE TABLE inline_customers ( + customer_id INTEGER NOT NULL, + customer_name VARCHAR(50) NOT NULL, + tier VARCHAR(20) NOT NULL, + CONSTRAINT pk_inline_customers PRIMARY KEY (customer_id) +); diff --git a/skills/semantic-view-patterns/snippets/inline_sv/seed_data.sql b/skills/semantic-view-patterns/snippets/inline_sv/seed_data.sql new file mode 100644 index 00000000..83810e7d --- /dev/null +++ b/skills/semantic-view-patterns/snippets/inline_sv/seed_data.sql @@ -0,0 +1,18 @@ +-- Inline SV: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO inline_customers VALUES + (1, 'Alice Martin', 'premium'), + (2, 'Bob Chen', 'standard'), + (3, 'Carol White', 'premium'), + (4, 'Dan Patel', 'standard'); + +INSERT INTO inline_orders VALUES + (101, 1, 1200.00, 'completed'), + (102, 1, 800.00, 'completed'), + (103, 2, 500.00, 'completed'), + (104, 2, 300.00, 'refunded'), + (105, 3, 950.00, 'completed'), + (106, 4, 200.00, 'pending'); diff --git 
a/skills/semantic-view-patterns/snippets/inline_sv/semantic_view.sql b/skills/semantic-view-patterns/snippets/inline_sv/semantic_view.sql new file mode 100644 index 00000000..8d89c23a --- /dev/null +++ b/skills/semantic-view-patterns/snippets/inline_sv/semantic_view.sql @@ -0,0 +1,111 @@ +-- Inline SV: Queries / Semantic View DDL +-- This snippet has two distinct patterns — both are shown here. + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- PATTERN 1: SQL SUBQUERY AS TABLE DEFINITION +-- A SQL query used as the source of a table in the TABLES clause. +-- Useful for: filtering source data, combining tables at load time, +-- exposing only certain rows to the SV without an intermediate view. +-- ============================================================ + +-- SV with inline subquery filter: +-- Only "premium" customers are exposed to the SV consumer. +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.PREMIUM_ORDERS_SV +TABLES ( + inline_orders, + inline_customers AS ( + SELECT * FROM inline_customers + WHERE tier = 'premium' + ) UNIQUE (customer_id) +) +RELATIONSHIPS ( + inline_orders(customer_id) REFERENCES inline_customers +) +DIMENSIONS ( + inline_customers.customer_name AS customer_name +) +METRICS ( + inline_orders.total_revenue AS SUM(amount), + inline_orders.order_count AS COUNT(order_id) +); + +-- Query it — only premium customers are included +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.PREMIUM_ORDERS_SV + DIMENSIONS inline_customers.customer_name + METRICS inline_orders.total_revenue +); + + +-- ============================================================ +-- PATTERN 2: INLINE / AD-HOC SEMANTIC VIEW (SV CTE) +-- Define and query a SV in one statement — no CREATE needed. +-- The SV exists only for the duration of the query. +-- Useful for: testing SV DDL before committing, +-- dbt unit testing, ad-hoc exploration. 
+-- ============================================================ + +-- Inline SV using WITH ... AS SEMANTIC VIEW: +WITH adhoc_sv AS SEMANTIC VIEW +TABLES ( + inline_orders, + inline_customers UNIQUE (customer_id) +) +RELATIONSHIPS ( + inline_orders(customer_id) REFERENCES inline_customers +) +DIMENSIONS ( + inline_customers.customer_name AS customer_name, + inline_customers.tier AS tier +) +METRICS ( + inline_orders.total_revenue AS SUM(amount), + inline_orders.order_count AS COUNT(order_id) +) +SELECT * FROM SEMANTIC_VIEW( + adhoc_sv + DIMENSIONS inline_customers.customer_name, inline_customers.tier + METRICS inline_orders.total_revenue +) +ORDER BY total_revenue DESC; + + +-- Another inline SV — test a filter before committing to the DDL: +WITH test_sv AS SEMANTIC VIEW +TABLES ( + inline_orders, + inline_customers UNIQUE (customer_id) +) +RELATIONSHIPS ( + inline_orders(customer_id) REFERENCES inline_customers +) +DIMENSIONS ( + inline_customers.customer_name AS customer_name +) +METRICS ( + inline_orders.completed_revenue AS SUM(amount) +) +-- Metric-only query (no dimension needed): +SELECT * FROM SEMANTIC_VIEW( + test_sv + METRICS inline_orders.completed_revenue +); + + +-- ============================================================ +-- RULES AND GOTCHAS +-- ============================================================ + +-- Pattern 1 (subquery in TABLES clause): +-- - SQL query must include the unique/primary key columns +-- - Subquery syntax: table_alias AS (SELECT ...) UNIQUE (key_col) +-- - Changes to underlying tables NOT visible until SV is replaced + +-- Pattern 2 (WITH ... 
AS SEMANTIC VIEW): +-- - Does NOT create a persistent SV — exists only for the query +-- - Cannot be referenced by Cortex Analyst (no saved SV to target) +-- - Great for dbt model testing and iterative DDL development +-- - The SELECT ... FROM SEMANTIC_VIEW() query immediately follows the WITH block (no CREATE needed) diff --git a/skills/semantic-view-patterns/snippets/inline_sv/semantic_view.yaml b/skills/semantic-view-patterns/snippets/inline_sv/semantic_view.yaml new file mode 100644 index 00000000..f4ea8a92 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/inline_sv/semantic_view.yaml @@ -0,0 +1,62 @@ +# Inline SV: Semantic View YAML +# +# ⚠️ INLINE SQL SUBQUERIES IN TABLES NOT SUPPORTED IN YAML: +# The scoped_dataset / inline SQL pattern (WHERE filter or JOIN embedded in +# the TABLES clause) is a DDL-only feature. YAML base_table must point to +# a physical table or view — inline SQL is not supported. +# +# ⚠️ WITH ... AS SEMANTIC VIEW (inline / ad-hoc SV CTE) is DDL-only. +# There is no YAML equivalent; YAML always creates a persistent named SV. +# +# This YAML defines the base SV structure using the physical tables. +# For the filter-in-TABLES and inline CTE patterns, use semantic_view.sql. +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); + +name: PREMIUM_ORDERS_SV +description: > + Orders for premium customers. NOTE: the inline SQL filter (WHERE tier='premium') + embedded in the TABLES clause is DDL-only. This YAML uses the physical tables; + create a helper view (CREATE VIEW ... AS SELECT ... WHERE tier='premium') and + reference that view here as the base_table for the same scoping effect. + +tables: + - name: inline_orders + description: Order transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: INLINE_ORDERS + metrics: + - name: total_revenue + expr: SUM(AMOUNT) + - name: order_count + expr: COUNT(ORDER_ID) + + - name: inline_customers + description: > + Customer master.
To scope to premium-only, replace this base_table + with a helper view: CREATE VIEW inline_customers_premium AS + SELECT * FROM INLINE_CUSTOMERS WHERE tier = 'premium'. + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: INLINE_CUSTOMERS + primary_key: + columns: [CUSTOMER_ID] + dimensions: + - name: customer_name + expr: CUSTOMER_NAME + data_type: VARCHAR + - name: tier + expr: TIER + data_type: VARCHAR + +relationships: + - name: orders_to_customers + left_table: inline_orders + right_table: inline_customers + relationship_columns: + - left_column: CUSTOMER_ID + right_column: CUSTOMER_ID diff --git a/skills/semantic-view-patterns/snippets/introspection/README.md b/skills/semantic-view-patterns/snippets/introspection/README.md new file mode 100644 index 00000000..3a03e998 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/introspection/README.md @@ -0,0 +1,72 @@ +# Introspection Commands + +## The Problem + +You need to discover what's in a Semantic View, understand metric-dimension compatibility, or trace data lineage — without reading the raw DDL. + +## Commands Covered + +### `DESCRIBE SEMANTIC VIEW` +Full DDL round-trip inspection — returns every table, relationship, dimension, metric, fact, VQR, and AI metadata block. +```sql +DESCRIBE SEMANTIC VIEW SNIPPETS.PUBLIC.MULTI_CHANNEL_SV; +``` + +### `SHOW SEMANTIC VIEWS` +List all SVs in a schema (with optional pattern matching). +```sql +SHOW SEMANTIC VIEWS IN SNIPPETS.PUBLIC; +SHOW SEMANTIC VIEWS LIKE '%CHANNEL%' IN SNIPPETS.PUBLIC; +``` + +### `SHOW SEMANTIC METRICS` +List all metrics in a SV — logical name, expression, synonyms, tags. +```sql +SHOW SEMANTIC METRICS IN SNIPPETS.PUBLIC.MULTI_CHANNEL_SV; +``` + +### `SHOW SEMANTIC DIMENSIONS FOR METRIC` +Critical for multi-fact SVs: which dimensions are **compatible** with a specific metric? 
+```sql +SHOW SEMANTIC DIMENSIONS IN SNIPPETS.PUBLIC.MULTI_CHANNEL_SV +FOR METRIC CHANNEL_STORE_SALES.STORE_REVENUE; +``` +Different metrics may have different dimension availability — this command tells you exactly what can be paired without errors. + +### `snowflake.core.get_lineage()` +Table function for upstream (source tables) and downstream (reports, agents) dependency tracing. +```sql +SELECT * FROM TABLE( + SNOWFLAKE.CORE.GET_LINEAGE( + 'SNIPPETS.PUBLIC.MULTI_CHANNEL_SV', 'SEMANTIC_VIEW', 'UPSTREAM', 5 + ) +); +``` + +## When to Use Each + +| Task | Command | +|------|---------| +| Read the SV definition | `DESCRIBE SEMANTIC VIEW` | +| Find all SVs in a schema | `SHOW SEMANTIC VIEWS` | +| Discover available metrics | `SHOW SEMANTIC METRICS` | +| Check metric-dim compatibility | `SHOW SEMANTIC DIMENSIONS FOR METRIC` | +| Find which tables feed this SV | `GET_LINEAGE ... 'UPSTREAM'` | +| Find what depends on this SV | `GET_LINEAGE ... 'DOWNSTREAM'` | + +## Docs + +- [DESCRIBE SEMANTIC VIEW](https://docs.snowflake.com/en/sql-reference/sql/desc-semantic-view) +- [SHOW SEMANTIC VIEWS](https://docs.snowflake.com/en/sql-reference/sql/show-semantic-views) +- [SHOW SEMANTIC METRICS](https://docs.snowflake.com/en/sql-reference/sql/show-semantic-metrics) +- [SHOW SEMANTIC DIMENSIONS FOR METRIC](https://docs.snowflake.com/en/sql-reference/sql/show-semantic-dimensions-for-metric) +- [GET_LINEAGE function](https://docs.snowflake.com/en/sql-reference/functions/get_lineage) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | Reference note — uses `multi_fact_table` SV | +| `queries.sql` | All introspection commands with annotations | + +**Prerequisites:** Deploy `multi_fact_table/semantic_view.sql` first (creates `SNIPPETS.PUBLIC.MULTI_CHANNEL_SV`). 
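The `SHOW SEMANTIC DIMENSIONS ... FOR METRIC` results can also drive tooling. As a minimal sketch — assuming you have already fetched the rows (e.g. via the Snowflake Python connector) into `(metric, dimension)` pairs, and noting that the exact result-set column names are an assumption here — this helper computes which dimensions are safe to pair with *every* metric in a planned query:

```python
def compatible_dimensions(pairs, metrics):
    """Given (metric_name, dimension_name) compatibility pairs, return the
    set of dimensions usable with every metric in `metrics`."""
    by_metric = {}
    for metric, dim in pairs:
        by_metric.setdefault(metric, set()).add(dim)
    sets = [by_metric.get(m, set()) for m in metrics]
    return set.intersection(*sets) if sets else set()

# Hypothetical flattened output of SHOW SEMANTIC DIMENSIONS ... FOR METRIC
# (dimension names here are illustrative, not from the actual SV):
pairs = [
    ("STORE_REVENUE", "PRODUCTS.CATEGORY"),
    ("STORE_REVENUE", "STORES.REGION"),
    ("WEB_REVENUE", "PRODUCTS.CATEGORY"),
    ("WEB_REVENUE", "WEB_SESSIONS.DEVICE"),
]
print(compatible_dimensions(pairs, ["STORE_REVENUE", "WEB_REVENUE"]))
# → {'PRODUCTS.CATEGORY'}
```

Intersecting per-metric dimension sets this way is a quick pre-flight check before issuing a multi-metric `SEMANTIC_VIEW()` query.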
diff --git a/skills/semantic-view-patterns/snippets/introspection/queries.sql b/skills/semantic-view-patterns/snippets/introspection/queries.sql new file mode 100644 index 00000000..9509c6e5 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/introspection/queries.sql @@ -0,0 +1,84 @@ +-- Introspection: Queries +-- Prerequisites: deploy multi_fact_table/semantic_view.sql first + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- DESCRIBE — full DDL round-trip view +-- ============================================================ + +-- Returns all tables, relationships, dimensions, metrics, facts, VQRs, and AI metadata +DESCRIBE SEMANTIC VIEW SNIPPETS.PUBLIC.MULTI_CHANNEL_SV; + + +-- ============================================================ +-- SHOW SEMANTIC VIEWS +-- ============================================================ + +-- List all SVs in a schema +SHOW SEMANTIC VIEWS IN SNIPPETS.PUBLIC; + +-- List SVs matching a pattern +SHOW SEMANTIC VIEWS LIKE '%CHANNEL%' IN SNIPPETS.PUBLIC; + + +-- ============================================================ +-- SHOW SEMANTIC METRICS — discover what metrics are available +-- ============================================================ + +-- List all metrics in a SV (name, synonyms, expression, tags) +SHOW SEMANTIC METRICS IN SNIPPETS.PUBLIC.MULTI_CHANNEL_SV; + +-- You can also use this output in downstream tooling: +-- the result includes metric logical names usable in SEMANTIC_VIEW() queries. + + +-- ============================================================ +-- SHOW SEMANTIC DIMENSIONS FOR METRIC — dimension compatibility +-- ============================================================ + +-- Which dimensions can be used with store_revenue? +SHOW SEMANTIC DIMENSIONS IN SNIPPETS.PUBLIC.MULTI_CHANNEL_SV +FOR METRIC CHANNEL_STORE_SALES.STORE_REVENUE; + +-- Which dimensions can be used with web_revenue? 
+SHOW SEMANTIC DIMENSIONS IN SNIPPETS.PUBLIC.MULTI_CHANNEL_SV +FOR METRIC CHANNEL_WEB_SALES.WEB_REVENUE; + +-- Key insight: metrics from different fact tables may have different +-- dimension compatibility. SHOW SEMANTIC DIMENSIONS tells you exactly +-- which dimensions are reachable for each metric — useful when debugging +-- "dimension not available for this metric" errors. + + +-- ============================================================ +-- LINEAGE — upstream and downstream dependencies +-- ============================================================ + +-- What tables does this SV depend on? (upstream) +SELECT SOURCE_OBJECT_NAME, TARGET_OBJECT_NAME, SOURCE_OBJECT_DOMAIN, + TARGET_OBJECT_DOMAIN, DISTANCE +FROM TABLE( + SNOWFLAKE.CORE.GET_LINEAGE( + 'SNIPPETS.PUBLIC.MULTI_CHANNEL_SV', + 'SEMANTIC_VIEW', + 'UPSTREAM', + 5 + ) +) +ORDER BY DISTANCE, SOURCE_OBJECT_NAME; + + +-- What depends on this SV? (downstream — reports, pipelines, agents) +SELECT SOURCE_OBJECT_NAME, TARGET_OBJECT_NAME, SOURCE_OBJECT_DOMAIN, + TARGET_OBJECT_DOMAIN, DISTANCE +FROM TABLE( + SNOWFLAKE.CORE.GET_LINEAGE( + 'SNIPPETS.PUBLIC.MULTI_CHANNEL_SV', + 'SEMANTIC_VIEW', + 'DOWNSTREAM', + 5 + ) +) +ORDER BY DISTANCE, TARGET_OBJECT_NAME; diff --git a/skills/semantic-view-patterns/snippets/introspection/schema.sql b/skills/semantic-view-patterns/snippets/introspection/schema.sql new file mode 100644 index 00000000..e7e60874 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/introspection/schema.sql @@ -0,0 +1,8 @@ +-- Introspection: Schema +-- This snippet uses the multi_fact_table SV as its example target. +-- Run multi_fact_table/schema.sql and seed_data.sql first, then deploy the SV. + +-- No new tables needed — introspection commands operate on existing objects. 
+ +-- Reference SV: SNIPPETS.PUBLIC.MULTI_CHANNEL_SV +-- Created by: multi_fact_table/semantic_view.sql diff --git a/skills/semantic-view-patterns/snippets/materialization/README.md b/skills/semantic-view-patterns/snippets/materialization/README.md new file mode 100644 index 00000000..0c94a9ae --- /dev/null +++ b/skills/semantic-view-patterns/snippets/materialization/README.md @@ -0,0 +1,95 @@ +# Materialization + +> ⚠️ **Private Preview feature** — available only to selected accounts. Contact your Snowflake account team to enable. + +## The Problem + +Semantic view queries scan the underlying base tables and re-aggregate on every request. For large datasets or high-query-volume analytics, this can be slow and expensive. **Materialization** pre-computes selected dimension/metric combinations and stores them, so queries can read from the pre-aggregated result instead of scanning base tables. + +## How You Might Express This Need + +- "Our revenue-by-customer query runs on 100M rows every time — can we pre-aggregate it?" +- "We have historical data from 3 years ago that never changes — can we freeze-materialize it?" +- "The SV is fast for small queries but slow for the full rollup our CFO dashboard runs daily" + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **SQL** | Materialized views; pre-aggregated summary tables | +| **dbt** | `+materialized: table` on summary models; incremental models | +| **LookML** | Aggregate awareness / persistent derived tables (PDTs) | +| **Power BI** | Aggregations feature on Import tables | +| **Tableau** | Materialized extracts | + +## How It Works + +**Step 1:** Set `MAX_STALENESS` on the SV (minimum 120 seconds): +```sql +CREATE SEMANTIC VIEW my_sv + ... 
+ MAX_STALENESS = '1 hour'; +``` + +**Step 2:** Grant the materialization privilege to your role: +```sql +GRANT ADD SEMANTIC VIEW MATERIALIZATION ON SCHEMA db.schema TO ROLE my_role; +``` + +**Step 3:** Add a materialization for the dimensions/metrics you want pre-aggregated: +```sql +ALTER SEMANTIC VIEW my_sv ADD MATERIALIZATION revenue_by_customer + WAREHOUSE = my_wh + AS + DIMENSIONS mat_customers.customer_name, mat_orders.order_year + METRICS mat_orders.total_revenue; +``` + +Queries automatically use the materialization — no change to query syntax required. + +## Reaggregation: Additive vs Non-Additive + +A materialization on `(customer, year)` can serve a query for just `(customer)` — by summing across year values. + +| Metric type | Reaggregatable? | Notes | +|-------------|----------------|-------| +| `SUM` | ✅ Yes | Sum across extra dimensions | +| `COUNT` | ✅ Yes | Sum across extra dimensions | +| `MIN` / `MAX` | ✅ Yes | Re-apply MIN/MAX | +| `AVG` | ❌ No | Weighted average can't be derived from group averages | +| `COUNT(DISTINCT ...)` | ❌ No | Can't re-count distinct from a pre-counted result | +| `MEDIAN`, `PERCENTILE` | ❌ No | Non-decomposable statistics | + +## IMMUTABLE WHERE — Incremental Refresh + +Without `IMMUTABLE WHERE`, every refresh recomputes the entire materialization (expensive for large SVs). + +```sql +ALTER SEMANTIC VIEW my_sv ADD MATERIALIZATION historical_revenue + WAREHOUSE = my_wh + IMMUTABLE WHERE (order_date < '2024-01-01') -- only rows AFTER this date are refreshed + AS ... +``` + +**Strongly recommended** for historical data that doesn't change. 
+ +## What Cannot Be Materialized + +- Window function metrics (LAG, rolling AVG, YTD) +- Semi-additive metrics (`NON ADDITIVE BY`) +- Metrics with `USING` clause (multi-path disambiguation) + +## Docs + +- [Materializing dimensions and metrics in semantic views ⚠️ Private Preview](https://docs.snowflake.com/en/LIMITEDACCESS/semantic-views-materialization) +- [ALTER SEMANTIC VIEW — ADD / REFRESH / DROP MATERIALIZATION](https://docs.snowflake.com/en/sql-reference/sql/alter-semantic-view) +- [SEMANTIC_VIEW_MATERIALIZATION_REFRESH_HISTORY](https://docs.snowflake.com/en/sql-reference/functions/semantic_view_materialization_refresh_history) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `mat_orders` + `mat_customers` | +| `seed_data.sql` | 5 customers, 12 orders spanning 2023-2024 | +| `semantic_view.sql` | SV creation + ADD MATERIALIZATION + all operational commands | +| `queries.sql` | Queries showing when materialization is used, reaggregation, fallback | diff --git a/skills/semantic-view-patterns/snippets/materialization/queries.sql b/skills/semantic-view-patterns/snippets/materialization/queries.sql new file mode 100644 index 00000000..7738ad3b --- /dev/null +++ b/skills/semantic-view-patterns/snippets/materialization/queries.sql @@ -0,0 +1,86 @@ +-- Materialization: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- QUERIES — the SV is queried identically with or without materialization. +-- When a suitable materialization exists and is fresh, Snowflake rewrites +-- the query to read from the materialized result instead of the base tables. +-- No change to query syntax is needed. +-- ============================================================ + +-- 1. 
Revenue by customer and year +-- → WILL use the revenue_by_customer_year materialization (exact match) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV + DIMENSIONS mat_customers.customer_name, mat_orders.order_year + METRICS mat_orders.total_revenue +) +ORDER BY customer_name, order_year; + + +-- 2. Revenue by customer only (no year) +-- → WILL use revenue_by_customer_year: reaggregates by summing across order_year. +-- Reaggregation works because total_revenue is SUM (additive). +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV + DIMENSIONS mat_customers.customer_name + METRICS mat_orders.total_revenue +) +ORDER BY total_revenue DESC; + + +-- 3. Revenue by year only +-- → WILL use revenue_by_customer_year: reaggregates by summing across customer_name. +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV + DIMENSIONS mat_orders.order_year + METRICS mat_orders.total_revenue +); + + +-- 4. Revenue by date, segment, region (pre-2024 data) +-- → WILL use historical_revenue materialization (IMMUTABLE WHERE order_date < 2024) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV + DIMENSIONS mat_orders.order_date, mat_customers.segment, mat_orders.region + METRICS mat_orders.total_revenue + WHERE mat_orders.order_date < '2024-01-01' +) +ORDER BY order_date; + + +-- 5. Average order value (AVG = non-additive) +-- → CANNOT use materialization for reaggregation — falls back to base tables. +-- To use a materialization for AVG queries, the materialization must include +-- the EXACT same set of dimensions as the query. 
+SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV + DIMENSIONS mat_customers.segment + METRICS mat_orders.avg_order +); + + +-- ============================================================ +-- REAGGREGATION RULES SUMMARY +-- ============================================================ +-- Additive (can reaggregate from finer-grained materialization): +-- SUM ✓ COUNT ✓ MIN ✓ MAX ✓ + +-- Non-additive (cannot reaggregate — materialization must be exact match): +-- AVG ✗ COUNT(DISTINCT) ✗ MEDIAN ✗ PERCENTILE ✗ + +-- Cannot be materialized at all: +-- Window function metrics (LAG, rolling AVG, YTD) ✗ +-- Semi-additive metrics (NON ADDITIVE BY) ✗ +-- Metrics with USING clause ✗ + + +-- ============================================================ +-- WHEN MATERIALIZATION IS SKIPPED (fallback to base tables) +-- ============================================================ +-- • No materialization covers the requested dimensions/metrics +-- • Materialization is older than MAX_STALENESS +-- • A masking policy or row access policy exists on base tables +-- • Non-additive metric requires reaggregation (dimensions don't match exactly) diff --git a/skills/semantic-view-patterns/snippets/materialization/schema.sql b/skills/semantic-view-patterns/snippets/materialization/schema.sql new file mode 100644 index 00000000..fdc77594 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/materialization/schema.sql @@ -0,0 +1,21 @@ +-- Materialization: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE mat_orders ( + order_id INTEGER NOT NULL, + customer_id INTEGER NOT NULL, + order_date DATE NOT NULL, + amount NUMBER(10,2) NOT NULL, + region VARCHAR(30) NOT NULL +); + +CREATE OR REPLACE TABLE mat_customers ( + customer_id INTEGER NOT NULL, + customer_name VARCHAR(50) NOT NULL, + segment VARCHAR(30) NOT NULL, + CONSTRAINT pk_mat_customers PRIMARY KEY 
(customer_id) +); diff --git a/skills/semantic-view-patterns/snippets/materialization/seed_data.sql b/skills/semantic-view-patterns/snippets/materialization/seed_data.sql new file mode 100644 index 00000000..3ea1056a --- /dev/null +++ b/skills/semantic-view-patterns/snippets/materialization/seed_data.sql @@ -0,0 +1,25 @@ +-- Materialization: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO mat_customers VALUES + (1, 'Acme Corp', 'Enterprise'), + (2, 'Globex LLC', 'SMB'), + (3, 'Initech', 'Enterprise'), + (4, 'Umbrella Co', 'SMB'), + (5, 'Soylent Corp', 'Enterprise'); + +INSERT INTO mat_orders VALUES + (101, 1, '2023-01-10', 1200, 'West'), + (102, 1, '2023-03-15', 800, 'West'), + (103, 1, '2024-01-20', 1500, 'West'), + (104, 2, '2023-02-14', 300, 'East'), + (105, 2, '2023-08-22', 450, 'East'), + (106, 3, '2023-04-01', 2100, 'South'), + (107, 3, '2024-02-28', 1800, 'South'), + (108, 4, '2023-06-10', 150, 'North'), + (109, 4, '2023-11-05', 200, 'North'), + (110, 5, '2023-07-07', 3400, 'West'), + (111, 5, '2024-03-12', 2900, 'West'), + (112, 1, '2024-04-18', 950, 'East'); diff --git a/skills/semantic-view-patterns/snippets/materialization/semantic_view.sql b/skills/semantic-view-patterns/snippets/materialization/semantic_view.sql new file mode 100644 index 00000000..b096fa62 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/materialization/semantic_view.sql @@ -0,0 +1,120 @@ +-- Materialization: Semantic View DDL + Materialization Setup + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- STEP 1: Create the SV with MAX_STALENESS +-- MAX_STALENESS enables the materialization feature. +-- Without it, ADD MATERIALIZATION will fail. +-- Minimum allowed: 120 seconds. 
+-- ============================================================ + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV + + TABLES ( + mat_orders, + mat_customers UNIQUE (customer_id) + ) + + RELATIONSHIPS ( + orders_to_customers AS mat_orders(customer_id) REFERENCES mat_customers + ) + + DIMENSIONS ( + mat_customers.customer_name AS customer_name + WITH SYNONYMS ('customer', 'account name'), + mat_customers.segment AS segment + WITH SYNONYMS ('customer segment', 'tier'), + mat_orders.region AS region + WITH SYNONYMS ('region', 'geo'), + mat_orders.order_date AS order_date + WITH SYNONYMS ('date', 'order date'), + mat_orders.order_year AS YEAR(order_date) + WITH SYNONYMS ('year') + ) + + METRICS ( + -- Additive metrics (SUM, COUNT, MIN, MAX) can be reaggregated + -- from a materialization with MORE dimensions — great for rollup queries + mat_orders.total_revenue AS SUM(amount) + WITH SYNONYMS ('revenue', 'total sales'), + mat_orders.order_count AS COUNT(order_id) + WITH SYNONYMS ('orders', 'number of orders'), + mat_orders.min_order AS MIN(amount) + WITH SYNONYMS ('smallest order'), + mat_orders.max_order AS MAX(amount) + WITH SYNONYMS ('largest order'), + + -- Non-additive metrics (AVG, COUNT DISTINCT) cannot be reaggregated + -- They require the materialization to include ALL the same dimensions as the query + mat_orders.avg_order AS AVG(amount) + WITH SYNONYMS ('average order', 'AOV') + ) + + -- MAX_STALENESS: how stale can materialized data be before falling back to base tables? + -- Minimum: 120 seconds + MAX_STALENESS = '1 hour' + + COMMENT = 'Revenue analysis SV with materialization support. 
Demonstrates full-refresh vs incremental-refresh materializations and reaggregation of additive metrics.'; + + +-- ============================================================ +-- STEP 2: Grant the materialization privilege (run as ACCOUNTADMIN) +-- ============================================================ + +-- GRANT ADD SEMANTIC VIEW MATERIALIZATION ON SCHEMA SNIPPETS.PUBLIC TO ROLE ; + + +-- ============================================================ +-- STEP 3: Add materializations +-- ============================================================ + +-- Materialization 1: Customer + year revenue rollup +-- No IMMUTABLE WHERE → full refresh each time (expensive for large datasets) +ALTER SEMANTIC VIEW SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV + ADD MATERIALIZATION revenue_by_customer_year + WAREHOUSE = SNOWADHOC + AS + DIMENSIONS mat_customers.customer_name, mat_orders.order_year + METRICS mat_orders.total_revenue, mat_orders.order_count; + + +-- Materialization 2: Historical data (pre-2024) — IMMUTABLE WHERE limits refresh scope +-- Snowflake strongly recommends IMMUTABLE WHERE to reduce refresh cost. +-- Rows where order_date < '2024-01-01' are computed once and not recomputed +-- unless the materialization is explicitly dropped and re-added. 
+ALTER SEMANTIC VIEW SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV + ADD MATERIALIZATION historical_revenue + WAREHOUSE = SNOWADHOC + IMMUTABLE WHERE (order_date < '2024-01-01') + AS + DIMENSIONS mat_orders.order_date, mat_customers.segment, mat_orders.region + METRICS mat_orders.total_revenue, mat_orders.order_count; + + +-- ============================================================ +-- STEP 4: Operational commands +-- ============================================================ + +-- List all materializations for this SV +-- (shows name, state, stale_by, warehouse, dimensions, metrics, immutable_where) +SHOW MATERIALIZATIONS IN SEMANTIC VIEW SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV; + +-- Manual refresh (uses current session warehouse) +ALTER SEMANTIC VIEW SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV + REFRESH MATERIALIZATION revenue_by_customer_year; + +-- View refresh history +SELECT * FROM TABLE(SNIPPETS.INFORMATION_SCHEMA.SEMANTIC_VIEW_MATERIALIZATION_REFRESH_HISTORY( + NAME => 'revenue_by_customer_year' +)); + +-- Remove a materialization +-- ALTER SEMANTIC VIEW SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV +-- DROP MATERIALIZATION revenue_by_customer_year; + +-- Change MAX_STALENESS (e.g. if refreshes are taking too long and materializations +-- are exceeding the staleness limit and being skipped) +-- ALTER SEMANTIC VIEW SNIPPETS.PUBLIC.REVENUE_ANALYSIS_SV +-- SET MAX_STALENESS = '2 hours'; diff --git a/skills/semantic-view-patterns/snippets/materialization/semantic_view.yaml b/skills/semantic-view-patterns/snippets/materialization/semantic_view.yaml new file mode 100644 index 00000000..e88a34dc --- /dev/null +++ b/skills/semantic-view-patterns/snippets/materialization/semantic_view.yaml @@ -0,0 +1,85 @@ +# Materialization: Semantic View YAML +# +# ⚠️ MATERIALIZATION NOT SUPPORTED IN YAML: +# The MAX_STALENESS option and ALTER SEMANTIC VIEW ... ADD MATERIALIZATION +# are DDL-only features. There is no YAML equivalent. +# +# This YAML defines the base SV structure. 
After deploying via +# SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML, run the ALTER SEMANTIC VIEW DDL +# commands from semantic_view.sql to add materializations. +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# Then add materializations via DDL: +# ALTER SEMANTIC VIEW TARGET_DB.TARGET_SCHEMA.REVENUE_ANALYSIS_SV SET MAX_STALENESS = '1 hour'; +# ALTER SEMANTIC VIEW TARGET_DB.TARGET_SCHEMA.REVENUE_ANALYSIS_SV +# ADD MATERIALIZATION ... AS DIMENSIONS ... METRICS ...; + +name: REVENUE_ANALYSIS_SV +description: > + Revenue analysis SV with materialization support. NOTE: MAX_STALENESS and + ADD MATERIALIZATION are DDL-only — apply them via semantic_view.sql after + deploying this YAML. + +tables: + - name: mat_orders + description: Order transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: MAT_ORDERS + dimensions: + - name: region + synonyms: [region, geo] + expr: REGION + data_type: VARCHAR + - name: order_date + synonyms: [date, order date] + expr: ORDER_DATE + data_type: DATE + - name: order_year + synonyms: [year] + expr: YEAR(ORDER_DATE) + data_type: NUMBER + metrics: + - name: total_revenue + synonyms: [revenue, total sales] + expr: SUM(AMOUNT) + - name: order_count + synonyms: [orders, number of orders] + expr: COUNT(ORDER_ID) + - name: min_order + synonyms: [smallest order] + expr: MIN(AMOUNT) + - name: max_order + synonyms: [largest order] + expr: MAX(AMOUNT) + - name: avg_order + synonyms: [average order, AOV] + expr: AVG(AMOUNT) + + - name: mat_customers + description: Customer master + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: MAT_CUSTOMERS + primary_key: + columns: [CUSTOMER_ID] + dimensions: + - name: customer_name + synonyms: [customer, account name] + expr: CUSTOMER_NAME + data_type: VARCHAR + - name: segment + synonyms: [customer segment, tier] + expr: SEGMENT + data_type: VARCHAR + +relationships: + - name: orders_to_customers + left_table: mat_orders + 
right_table: mat_customers + relationship_columns: + - left_column: CUSTOMER_ID + right_column: CUSTOMER_ID diff --git a/skills/semantic-view-patterns/snippets/multi_fact_table/README.md b/skills/semantic-view-patterns/snippets/multi_fact_table/README.md new file mode 100644 index 00000000..0f06e345 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/multi_fact_table/README.md @@ -0,0 +1,67 @@ +# Multi-Fact Table + +## The Problem + +A single business domain has **multiple independent fact tables** that should all be queryable through one semantic view — sharing common dimensions (product, date) but with separate metrics. You also want **cross-fact derived metrics** (e.g. net revenue = store + web − returns). + +## How You Might Express This Need + +- "I have store_sales, web_sales, and returns tables — I want them all in one SV sharing a product and date dimension" +- "Total revenue should include both channels. Net revenue subtracts returns." +- "SHOW DIMENSIONS for store_sales shouldn't require including web_sales columns" + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **SQL** | Multiple CTEs, LEFT JOINs on shared dims; fan-out/funnel query pattern | +| **LookML** | Multiple explores joined to a single shared dimension view | +| **dbt** | `metrics.yml` with multiple models; union or join at reporting layer | +| **Power BI** | Multiple fact tables in a star schema with shared dim tables | +| **Tableau** | Multi-source Relationships or data blending. Each fact connects to shared dimension tables; cross-fact comparisons require blends or custom SQL. 
| + +## The SV Approach + +Each fact table is declared independently and joined to the **shared dimensions**: +```sql +TABLES ( + dim_product PRIMARY KEY (product_id), + channel_dim_date PRIMARY KEY (date_id), + channel_store_sales, + channel_web_sales, + channel_returns +) +RELATIONSHIPS ( + store_to_date AS channel_store_sales(date_id) REFERENCES channel_dim_date, + store_to_product AS channel_store_sales(product_id) REFERENCES dim_product, + web_to_date AS channel_web_sales(date_id) REFERENCES channel_dim_date, + ... +) +``` + +Cross-fact derived metrics reference both fact entities: +```sql +total_gross_revenue AS channel_store_sales.store_revenue + channel_web_sales.web_revenue +net_revenue AS total_gross_revenue - channel_returns.total_returns +``` + +## Key Behavior + +- Querying only `store_revenue` does **not** join `channel_web_sales` — the engine is selective +- `SHOW SEMANTIC DIMENSIONS FOR METRIC store_revenue` will show only dims reachable via store_sales's relationships +- Cross-fact derived metrics trigger a join/aggregation across all referenced facts when queried +- Each fact can have its own set of fact-specific metrics; they coexist in the same SV + +## Docs + +- [Using SQL commands to create and manage semantic views](https://docs.snowflake.com/en/user-guide/views-semantic/sql) +- [Defining derived metrics (cross-fact totals)](https://docs.snowflake.com/en/user-guide/views-semantic/sql#defining-derived-metrics) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | Product dim, date dim, 3 fact tables | +| `seed_data.sql` | 3 products × 6 months across all 3 facts | +| `semantic_view.sql` | SV with 3 facts sharing 2 dimensions + cross-fact derived metrics | +| `queries.sql` | Channel comparison, net revenue, return rate | diff --git a/skills/semantic-view-patterns/snippets/multi_fact_table/queries.sql b/skills/semantic-view-patterns/snippets/multi_fact_table/queries.sql new file mode 100644 index 00000000..237631d4 --- 
/dev/null +++ b/skills/semantic-view-patterns/snippets/multi_fact_table/queries.sql @@ -0,0 +1,59 @@ +-- Multi-Fact Table: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. Store vs web revenue by category (cross-fact comparison) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.MULTI_CHANNEL_SV + DIMENSIONS dim_product.category + METRICS channel_store_sales.store_revenue, channel_web_sales.web_revenue, total_gross_revenue +) +ORDER BY total_gross_revenue DESC; + + +-- 2. Gross vs net revenue by month (returns subtracted) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.MULTI_CHANNEL_SV + DIMENSIONS channel_dim_date.month + METRICS total_gross_revenue, channel_returns.total_returns, net_revenue +) +ORDER BY month; + + +-- 3. Return rate by product +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.MULTI_CHANNEL_SV + DIMENSIONS dim_product.product_name + METRICS channel_store_sales.store_quantity, channel_returns.return_quantity +) +ORDER BY product_name; + + +-- 4. Brand quarterly performance +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.MULTI_CHANNEL_SV + DIMENSIONS dim_product.brand, channel_dim_date.quarter + METRICS net_revenue, store_share +) +ORDER BY quarter, brand; + + +-- ============================================================ +-- MULTI-FACT KEY CONCEPTS +-- ============================================================ + +-- Each fact table is independent — a query requesting only store_revenue +-- will only involve channel_store_sales in the generated SQL. The web and +-- returns tables are NOT joined unless their metrics or dimensions are requested. + +-- Cross-fact derived metrics (total_gross_revenue, net_revenue) will trigger +-- a join/union across the relevant fact tables when queried. 
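+
+-- For illustration only (a hand-written sketch, NOT the SV's actual generated
+-- SQL): a query with DIMENSIONS dim_product.category and METRICS
+-- store_revenue, web_revenue, total_gross_revenue is conceptually
+-- equivalent to aggregating each fact separately, then joining on the
+-- shared dimension key:
+--
+--   WITH store AS (
+--     SELECT p.category, SUM(s.revenue) AS store_revenue
+--     FROM channel_store_sales s
+--     JOIN dim_product p ON s.product_id = p.product_id
+--     GROUP BY p.category
+--   ), web AS (
+--     SELECT p.category, SUM(w.revenue) AS web_revenue
+--     FROM channel_web_sales w
+--     JOIN dim_product p ON w.product_id = p.product_id
+--     GROUP BY p.category
+--   )
+--   SELECT COALESCE(store.category, web.category) AS category,
+--          store.store_revenue,
+--          web.web_revenue,
+--          store.store_revenue + web.web_revenue AS total_gross_revenue
+--   FROM store
+--   FULL OUTER JOIN web ON store.category = web.category;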
+ +-- Shared dimensions (dim_product, channel_dim_date) work as a fan-out: +-- the SV resolves aggregates per-fact then joins them on the shared dim keys. +-- This is semantically equivalent to a fanout query pattern in standard SQL. diff --git a/skills/semantic-view-patterns/snippets/multi_fact_table/schema.sql b/skills/semantic-view-patterns/snippets/multi_fact_table/schema.sql new file mode 100644 index 00000000..5676e4c4 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/multi_fact_table/schema.sql @@ -0,0 +1,47 @@ +-- Multi-Fact Table: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE dim_product ( + product_id INTEGER NOT NULL, + product_name VARCHAR(50) NOT NULL, + category VARCHAR(30) NOT NULL, + brand VARCHAR(30) NOT NULL, + CONSTRAINT pk_dim_product PRIMARY KEY (product_id) +); + +CREATE OR REPLACE TABLE channel_dim_date ( + date_id INTEGER NOT NULL, + full_date DATE NOT NULL, + year INTEGER NOT NULL, + quarter INTEGER NOT NULL, + month INTEGER NOT NULL, + CONSTRAINT pk_channel_dim_date PRIMARY KEY (date_id) +); + +CREATE OR REPLACE TABLE channel_store_sales ( + sale_id INTEGER NOT NULL, + date_id INTEGER NOT NULL, + product_id INTEGER NOT NULL, + revenue NUMBER(10,2) NOT NULL, + quantity INTEGER NOT NULL +); + +CREATE OR REPLACE TABLE channel_web_sales ( + sale_id INTEGER NOT NULL, + date_id INTEGER NOT NULL, + product_id INTEGER NOT NULL, + revenue NUMBER(10,2) NOT NULL, + quantity INTEGER NOT NULL +); + +CREATE OR REPLACE TABLE channel_returns ( + return_id INTEGER NOT NULL, + date_id INTEGER NOT NULL, + product_id INTEGER NOT NULL, + amount NUMBER(10,2) NOT NULL, + quantity INTEGER NOT NULL +); diff --git a/skills/semantic-view-patterns/snippets/multi_fact_table/seed_data.sql b/skills/semantic-view-patterns/snippets/multi_fact_table/seed_data.sql new file mode 100644 index 00000000..b4ff645d --- /dev/null +++ 
b/skills/semantic-view-patterns/snippets/multi_fact_table/seed_data.sql @@ -0,0 +1,39 @@ +-- Multi-Fact Table: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO dim_product VALUES + (1, 'Laptop Pro', 'Electronics', 'TechBrand'), + (2, 'Desk Chair', 'Furniture', 'OfficeCo'), + (3, 'Monitor 4K', 'Electronics', 'TechBrand'); + +INSERT INTO channel_dim_date VALUES + (1, '2024-01-01', 2024, 1, 1), + (2, '2024-02-01', 2024, 1, 2), + (3, '2024-03-01', 2024, 1, 3), + (4, '2024-04-01', 2024, 2, 4), + (5, '2024-05-01', 2024, 2, 5), + (6, '2024-06-01', 2024, 2, 6); + +INSERT INTO channel_store_sales VALUES + (1, 1, 1, 2400, 2), (2, 1, 2, 750, 5), (3, 1, 3, 2400, 4), + (4, 2, 1, 1200, 1), (5, 2, 2, 300, 2), (6, 2, 3, 1800, 3), + (7, 3, 1, 3600, 3), (8, 3, 2, 1500, 10), (9, 3, 3, 3000, 5), + (10, 4, 1, 1200, 1), (11, 4, 2, 450, 3), (12, 4, 3, 1200, 2), + (13, 5, 1, 2400, 2), (14, 5, 2, 900, 6), (15, 5, 3, 2400, 4), + (16, 6, 1, 3600, 3), (17, 6, 2, 600, 4), (18, 6, 3, 3600, 6); + +INSERT INTO channel_web_sales VALUES + (1, 1, 1, 1200, 1), (2, 1, 3, 600, 1), + (3, 2, 1, 2400, 2), (4, 2, 3, 1200, 2), + (5, 3, 1, 1200, 1), (6, 3, 2, 150, 1), (7, 3, 3, 1800, 3), + (8, 4, 1, 2400, 2), (9, 4, 3, 2400, 4), + (10, 5, 2, 300, 2), (11, 5, 3, 3000, 5), + (12, 6, 1, 3600, 3), (13, 6, 2, 450, 3), (14, 6, 3, 3600, 6); + +INSERT INTO channel_returns VALUES + (1, 2, 1, 1200, 1), + (2, 3, 2, 150, 1), + (3, 5, 3, 600, 1), + (4, 6, 1, 1200, 1); diff --git a/skills/semantic-view-patterns/snippets/multi_fact_table/semantic_view.sql b/skills/semantic-view-patterns/snippets/multi_fact_table/semantic_view.sql new file mode 100644 index 00000000..1a011231 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/multi_fact_table/semantic_view.sql @@ -0,0 +1,70 @@ +-- Multi-Fact Table: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.MULTI_CHANNEL_SV + + TABLES ( + dim_product PRIMARY KEY (product_id), + 
channel_dim_date PRIMARY KEY (date_id), + channel_store_sales, + channel_web_sales, + channel_returns + ) + + RELATIONSHIPS ( + -- Store sales joins to both shared dimensions + store_to_date AS channel_store_sales(date_id) REFERENCES channel_dim_date, + store_to_product AS channel_store_sales(product_id) REFERENCES dim_product, + + -- Web sales joins to both shared dimensions + web_to_date AS channel_web_sales(date_id) REFERENCES channel_dim_date, + web_to_product AS channel_web_sales(product_id) REFERENCES dim_product, + + -- Returns joins to both shared dimensions + returns_to_date AS channel_returns(date_id) REFERENCES channel_dim_date, + returns_to_product AS channel_returns(product_id) REFERENCES dim_product + ) + + DIMENSIONS ( + dim_product.category AS category WITH SYNONYMS ('category', 'product category'), + dim_product.brand AS brand WITH SYNONYMS ('brand'), + dim_product.product_name AS product_name WITH SYNONYMS ('product'), + channel_dim_date.year AS year WITH SYNONYMS ('year'), + channel_dim_date.quarter AS quarter WITH SYNONYMS ('quarter', 'qtr'), + channel_dim_date.month AS month WITH SYNONYMS ('month') + ) + + METRICS ( + channel_store_sales.store_revenue AS SUM(revenue) + WITH SYNONYMS ('store sales', 'store revenue'), + channel_store_sales.store_quantity AS SUM(quantity) + WITH SYNONYMS ('store units', 'units sold in store'), + + channel_web_sales.web_revenue AS SUM(revenue) + WITH SYNONYMS ('web sales', 'online revenue'), + channel_web_sales.web_quantity AS SUM(quantity) + WITH SYNONYMS ('web units', 'units sold online'), + + channel_returns.total_returns AS SUM(amount) + WITH SYNONYMS ('returns', 'total returned amount'), + channel_returns.return_quantity AS SUM(quantity) + WITH SYNONYMS ('returned units', 'return volume'), + + -- Cross-fact derived metric: combines revenue from both channel fact tables + total_gross_revenue AS channel_store_sales.store_revenue + channel_web_sales.web_revenue + WITH SYNONYMS ('gross revenue', 'combined 
revenue', 'all channels revenue'), + + -- Net revenue = gross minus returns + net_revenue AS total_gross_revenue - channel_returns.total_returns + WITH SYNONYMS ('net revenue', 'revenue after returns'), + + -- Store share of total channel revenue + store_share AS channel_store_sales.store_revenue / total_gross_revenue + WITH SYNONYMS ('store contribution', 'store share') + ) + + COMMENT = 'Multi-fact table SV: store sales, web sales, and returns as three independent fact tables sharing a product and date dimension. Demonstrates cross-fact derived metrics (total_gross_revenue, net_revenue).' + + AI_SQL_GENERATION 'This SV has three fact tables: channel_store_sales, channel_web_sales, and channel_returns. Use total_gross_revenue for combined channel sales. Use net_revenue for returns-adjusted revenue. Use store_share for channel mix. All metrics can be broken down by dim_product and channel_dim_date dimensions.'; diff --git a/skills/semantic-view-patterns/snippets/multi_fact_table/semantic_view.yaml b/skills/semantic-view-patterns/snippets/multi_fact_table/semantic_view.yaml new file mode 100644 index 00000000..24a930d8 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/multi_fact_table/semantic_view.yaml @@ -0,0 +1,154 @@ +# Multi-Fact Table: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# DDL-only features: AI_SQL_GENERATION + +name: MULTI_CHANNEL_SV +description: > + Multi-fact table SV: store sales, web sales, and returns as three independent + fact tables sharing a product and date dimension. Demonstrates cross-fact + derived metrics (total_gross_revenue, net_revenue). 
+ +tables: + - name: dim_product + description: Product catalog + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: DIM_PRODUCT + primary_key: + columns: [PRODUCT_ID] + dimensions: + - name: category + synonyms: [category, product category] + expr: CATEGORY + data_type: VARCHAR + - name: brand + synonyms: [brand] + expr: BRAND + data_type: VARCHAR + - name: product_name + synonyms: [product] + expr: PRODUCT_NAME + data_type: VARCHAR + + - name: channel_dim_date + description: Date dimension + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: CHANNEL_DIM_DATE + primary_key: + columns: [DATE_ID] + dimensions: + - name: year + synonyms: [year] + expr: YEAR + data_type: NUMBER + - name: quarter + synonyms: [quarter, qtr] + expr: QUARTER + data_type: VARCHAR + - name: month + synonyms: [month] + expr: MONTH + data_type: NUMBER + + - name: channel_store_sales + description: Store sales transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: CHANNEL_STORE_SALES + metrics: + - name: store_revenue + synonyms: [store sales, store revenue] + expr: SUM(REVENUE) + - name: store_quantity + synonyms: [store units, units sold in store] + expr: SUM(QUANTITY) + + - name: channel_web_sales + description: Web / online sales transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: CHANNEL_WEB_SALES + metrics: + - name: web_revenue + synonyms: [web sales, online revenue] + expr: SUM(REVENUE) + - name: web_quantity + synonyms: [web units, units sold online] + expr: SUM(QUANTITY) + + - name: channel_returns + description: Product returns + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: CHANNEL_RETURNS + metrics: + - name: total_returns + synonyms: [returns, total returned amount] + expr: SUM(AMOUNT) + - name: return_quantity + synonyms: [returned units, return volume] + expr: SUM(QUANTITY) + +relationships: + - name: store_to_date + left_table: channel_store_sales + right_table: 
channel_dim_date + relationship_columns: + - left_column: DATE_ID + right_column: DATE_ID + - name: store_to_product + left_table: channel_store_sales + right_table: dim_product + relationship_columns: + - left_column: PRODUCT_ID + right_column: PRODUCT_ID + - name: web_to_date + left_table: channel_web_sales + right_table: channel_dim_date + relationship_columns: + - left_column: DATE_ID + right_column: DATE_ID + - name: web_to_product + left_table: channel_web_sales + right_table: dim_product + relationship_columns: + - left_column: PRODUCT_ID + right_column: PRODUCT_ID + - name: returns_to_date + left_table: channel_returns + right_table: channel_dim_date + relationship_columns: + - left_column: DATE_ID + right_column: DATE_ID + - name: returns_to_product + left_table: channel_returns + right_table: dim_product + relationship_columns: + - left_column: PRODUCT_ID + right_column: PRODUCT_ID + +metrics: + - name: total_gross_revenue + synonyms: [gross revenue, combined revenue, all channels revenue] + description: Combined revenue from store and web channels + expr: channel_store_sales.store_revenue + channel_web_sales.web_revenue + - name: net_revenue + synonyms: [net revenue, revenue after returns] + description: Gross revenue minus returns + expr: total_gross_revenue - channel_returns.total_returns + - name: store_share + synonyms: [store contribution, store share] + description: Store revenue as a fraction of total gross revenue + expr: channel_store_sales.store_revenue / total_gross_revenue diff --git a/skills/semantic-view-patterns/snippets/multi_path_metrics/README.md b/skills/semantic-view-patterns/snippets/multi_path_metrics/README.md new file mode 100644 index 00000000..19b1f494 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/multi_path_metrics/README.md @@ -0,0 +1,68 @@ +# Multi-Path Metrics (USING Clause) + +## The Problem + +A fact table has **two foreign keys that both point to the same dimension**. 
You want separate metrics for each path — for example, a flight has both a `departure_city` and an `arrival_city`, and you want to look up the weather conditions at each city at the time of departure/arrival. + +Without disambiguation, the SV engine would raise: `Multi-path relationship between dimension entity 'X' and base metric entity 'Y'`. + +## How You Might Express This Need + +- "Flights have both a departure and arrival airport. I want to break down delays by departure weather AND by arrival weather separately." +- "My orders table has both a ship-to address and a bill-to address, both joining the same address dim. How do I use both?" +- "I want a metric for sales by origin region AND a metric for sales by destination region." + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **SQL** | Two separate JOINs with aliases: `JOIN weather AS dep_weather ON ... JOIN weather AS arr_weather ON ...` | +| **LookML** | `view: departure_weather { extends: [weather_base] }` — role-playing views | +| **dbt** | Two separate `ref()` models or CTEs aliasing the same source | +| **Power BI** | Multiple relationships to the same table; mark one as inactive; use USERELATIONSHIP() | +| **Tableau** | Duplicate data source connections with aliases as role-playing workaround, or custom SQL self-joins. No native multi-path disambiguation. | + +## The SV Approach + +Two mechanisms work together: + +**1. Two range relationships** to the same physical table (weather): +```sql +flight_departure_weather AS flights(departure_city, departure_time) + REFERENCES weather(city_code, BETWEEN start_date AND end_date EXCLUSIVE), +flight_arrival_weather AS flights(arrival_city, arrival_time) + REFERENCES weather(city_code, BETWEEN start_date AND end_date EXCLUSIVE) +``` + +**2. 
USING clause** on each metric to specify which path to follow (note that `USING` comes before `AS`, matching `semantic_view.sql`):
+```sql
+flights.late_departure_count USING (flight_departure_weather) AS COUNT_IF(is_late)
+    WITH SYNONYMS ('late flights by departure weather'),
+
+flights.late_arrival_count USING (flight_arrival_weather) AS COUNT_IF(is_late)
+    WITH SYNONYMS ('late flights by arrival weather')
+```
+
+The `USING` clause tells the engine: "when resolving `weather.weather_condition` for this metric, take the `flight_departure_weather` path."
+
+## Key Rules
+
+- `USING` specifies a **path prefix** from the metric entity to a disambiguating entity
+- Without `USING`, querying `weather.weather_condition` with a metric that has two paths to `weather` will error
+- Metrics without `USING` cannot be broken down by the ambiguous dimension at all
+- Each metric can have a different `USING` path; to compare paths when grouping by the ambiguous dimension, run one query per path (see `queries.sql`)
+
+## Docs
+
+- [Specifying the relationship for a metric when multiple relationship paths exist](https://docs.snowflake.com/en/user-guide/views-semantic/sql#specifying-the-relationship-for-a-metric-when-multiple-relationship-paths-exist)
+
+## Files
+
+| File | Description |
+|------|-------------|
+| `schema.sql` | `flights` and `weather` table DDL |
+| `seed_data.sql` | 4 flights, 4 weather records spanning the departure/arrival windows |
+| `semantic_view.sql` | SV with two range relationships + USING on each metric |
+| `queries.sql` | Departure vs arrival weather breakdown + the error without USING | diff --git a/skills/semantic-view-patterns/snippets/multi_path_metrics/queries.sql b/skills/semantic-view-patterns/snippets/multi_path_metrics/queries.sql new file mode 100644 index 00000000..448ff02c --- /dev/null +++ b/skills/semantic-view-patterns/snippets/multi_path_metrics/queries.sql @@ -0,0 +1,58 @@ +-- Multi-Path Metrics: Queries
+
+USE DATABASE SNIPPETS;
+USE SCHEMA PUBLIC;
+
+-- ============================================================
+-- WORKING QUERIES
+--
============================================================ + +-- 1. Late flights by DEPARTURE weather condition +-- (uses departure_flight_count/late_departure_count → flight_departure_weather path) +-- +-- Expected: sunny=2 total (2 late), rainy=2 total (0 late) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.FLIGHT_WEATHER_SV + DIMENSIONS weather.weather_condition + METRICS flights.departure_flight_count, flights.late_departure_count +); + + +-- 2. Late flights by ARRIVAL weather condition +-- (uses arrival_flight_count/late_arrival_count → flight_arrival_weather path) +-- +-- Expected: rainy=1 (1 late), sunny=1 (0 late), cloudy=2 (1 late) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.FLIGHT_WEATHER_SV + DIMENSIONS weather.weather_condition + METRICS flights.arrival_flight_count, flights.late_arrival_count +); + + +-- 3. Total flights (no weather dim — no disambiguation needed) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.FLIGHT_WEATHER_SV + METRICS flights.total_flights +); + + +-- ============================================================ +-- WHAT DOESN'T WORK +-- ============================================================ + +-- ERROR: Combining a non-USING metric with the ambiguous weather dimension +-- The engine can't determine which of the two paths to use for total_flights +-- when grouped by weather.weather_condition. +-- +-- SELECT * FROM SEMANTIC_VIEW( +-- SNIPPETS.PUBLIC.FLIGHT_WEATHER_SV +-- DIMENSIONS weather.weather_condition +-- METRICS flights.total_flights -- no USING — ambiguous path to weather +-- ); +-- Error: Multi-path relationship between dimension entity 'WEATHER' +-- and base metric entity 'FLIGHTS' + +-- NOTE: You CANNOT mix a departure-USING metric with an arrival-USING metric +-- in the same query and group by weather.weather_condition — the dimension +-- would resolve differently for each metric, which is undefined behavior. +-- Instead, run two separate queries or use subqueries. 
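+
+-- Workaround sketch (hand-written, not SV-generated SQL): join two
+-- SEMANTIC_VIEW subqueries, one per USING path, to compare departure vs
+-- arrival conditions side by side:
+--
+--   SELECT d.weather_condition,
+--          d.late_departure_count,
+--          a.late_arrival_count
+--   FROM (
+--     SELECT * FROM SEMANTIC_VIEW(
+--       SNIPPETS.PUBLIC.FLIGHT_WEATHER_SV
+--       DIMENSIONS weather.weather_condition
+--       METRICS flights.late_departure_count
+--     )
+--   ) d
+--   FULL OUTER JOIN (
+--     SELECT * FROM SEMANTIC_VIEW(
+--       SNIPPETS.PUBLIC.FLIGHT_WEATHER_SV
+--       DIMENSIONS weather.weather_condition
+--       METRICS flights.late_arrival_count
+--     )
+--   ) a ON d.weather_condition = a.weather_condition;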
diff --git a/skills/semantic-view-patterns/snippets/multi_path_metrics/schema.sql b/skills/semantic-view-patterns/snippets/multi_path_metrics/schema.sql new file mode 100644 index 00000000..c6f10a78 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/multi_path_metrics/schema.sql @@ -0,0 +1,22 @@ +-- Multi-Path Metrics: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE flights ( + flight_id INTEGER NOT NULL, + departure_city VARCHAR(10) NOT NULL, + arrival_city VARCHAR(10) NOT NULL, + is_late BOOLEAN NOT NULL, + departure_time TIMESTAMP_NTZ NOT NULL, + arrival_time TIMESTAMP_NTZ NOT NULL +); + +CREATE OR REPLACE TABLE weather ( + city_code VARCHAR(10) NOT NULL, + weather_condition VARCHAR(20) NOT NULL, + start_date TIMESTAMP_NTZ NOT NULL, + end_date TIMESTAMP_NTZ NOT NULL +); diff --git a/skills/semantic-view-patterns/snippets/multi_path_metrics/seed_data.sql b/skills/semantic-view-patterns/snippets/multi_path_metrics/seed_data.sql new file mode 100644 index 00000000..33037c1c --- /dev/null +++ b/skills/semantic-view-patterns/snippets/multi_path_metrics/seed_data.sql @@ -0,0 +1,27 @@ +-- Multi-Path Metrics: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO flights VALUES + (1, 'SFO', 'SEA', true, '2025-01-03 06:00:00', '2025-01-03 11:00:00'), + (2, 'SEA', 'SFO', false, '2025-01-03 11:00:00', '2025-01-03 16:00:00'), + (3, 'SEA', 'PVG', false, '2025-01-03 11:00:00', '2025-01-04 11:00:00'), + (4, 'SFO', 'PVG', true, '2025-01-03 06:00:00', '2025-01-04 11:00:00'); + +INSERT INTO weather VALUES + ('SEA', 'rainy', '2025-01-03 10:00:00', '2025-01-03 12:00:00'), + ('SFO', 'sunny', '2025-01-03 05:00:00', '2025-01-03 09:00:00'), + ('SFO', 'sunny', '2025-01-03 10:00:00', '2025-01-03 18:00:00'), + ('PVG', 'cloudy', '2025-01-04 10:00:00', '2025-01-04 12:00:00'); + +-- Expected: departure weather conditions +-- Flight 1 (SFO dep 06:00) → 
SFO sunny (05:00-09:00) → late=true +-- Flight 2 (SEA dep 11:00) → SEA rainy (10:00-12:00) → late=false +-- Flight 3 (SEA dep 11:00) → SEA rainy (10:00-12:00) → late=false +-- Flight 4 (SFO dep 06:00) → SFO sunny (05:00-09:00) → late=true +-- Expected: arrival weather conditions +-- Flight 1 (SEA arr 11:00) → SEA rainy → late=true +-- Flight 2 (SFO arr 16:00) → SFO sunny → late=false +-- Flight 3 (PVG arr Jan04 11:00) → PVG cloudy → late=false +-- Flight 4 (PVG arr Jan04 11:00) → PVG cloudy → late=true diff --git a/skills/semantic-view-patterns/snippets/multi_path_metrics/semantic_view.sql b/skills/semantic-view-patterns/snippets/multi_path_metrics/semantic_view.sql new file mode 100644 index 00000000..a9fb8d43 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/multi_path_metrics/semantic_view.sql @@ -0,0 +1,55 @@ +-- Multi-Path Metrics: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.FLIGHT_WEATHER_SV + + TABLES ( + flights PRIMARY KEY (flight_id), + weather PRIMARY KEY (city_code, start_date, end_date) + UNIQUE (city_code, start_date, end_date) + CONSTRAINT weather_range DISTINCT RANGE BETWEEN start_date AND end_date EXCLUSIVE + ) + + RELATIONSHIPS ( + -- Two paths from flights to weather — one per city role + flight_departure_weather AS flights(departure_city, departure_time) + REFERENCES weather(city_code, BETWEEN start_date AND end_date EXCLUSIVE), + flight_arrival_weather AS flights(arrival_city, arrival_time) + REFERENCES weather(city_code, BETWEEN start_date AND end_date EXCLUSIVE) + ) + + DIMENSIONS ( + -- A single physical column (weather_condition) reached via two paths. + -- Queries using this dimension must pair it with a USING-scoped metric. 
+ weather.weather_condition AS weather_condition + WITH SYNONYMS ('weather', 'conditions', 'sky condition') + ) + + METRICS ( + -- Total flights — no disambiguation needed (doesn't use weather dim) + flights.total_flights AS COUNT(flight_id) + WITH SYNONYMS ('number of flights', 'flight count'), + + -- Late flights broken down by DEPARTURE weather + -- USING comes before AS: entity.metric USING (relationship) AS expression + flights.late_departure_count USING (flight_departure_weather) AS COUNT_IF(is_late) + WITH SYNONYMS ('late departures', 'delayed departures', 'flights late at departure'), + + -- Late flights broken down by ARRIVAL weather + flights.late_arrival_count USING (flight_arrival_weather) AS COUNT_IF(is_late) + WITH SYNONYMS ('late arrivals', 'delayed arrivals', 'flights late at arrival'), + + -- All flights broken down by departure weather + flights.departure_flight_count USING (flight_departure_weather) AS COUNT(flight_id) + WITH SYNONYMS ('flights by departure weather'), + + -- All flights broken down by arrival weather + flights.arrival_flight_count USING (flight_arrival_weather) AS COUNT(flight_id) + WITH SYNONYMS ('flights by arrival weather') + ) + + COMMENT = 'Flight delays analyzed by weather at departure city and weather at arrival city. Uses USING clause to disambiguate two range relationships to the same weather table.' + + AI_SQL_GENERATION 'This SV has two relationships to the weather table: departure weather and arrival weather. Use USING-scoped metrics: late_departure_count/departure_flight_count for departure weather analysis; late_arrival_count/arrival_flight_count for arrival weather analysis. 
You can combine both in one query to compare departure vs arrival conditions side by side.'; diff --git a/skills/semantic-view-patterns/snippets/multi_path_metrics/semantic_view.yaml b/skills/semantic-view-patterns/snippets/multi_path_metrics/semantic_view.yaml new file mode 100644 index 00000000..989314ae --- /dev/null +++ b/skills/semantic-view-patterns/snippets/multi_path_metrics/semantic_view.yaml @@ -0,0 +1,86 @@ +# Multi-Path Metrics (USING): Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# DDL-only features: AI_SQL_GENERATION +# YAML USING equivalent: using_relationships list on each metric + +name: FLIGHT_WEATHER_SV +description: > + Flight delays analyzed by weather at departure city and weather at arrival city. + Uses using_relationships to disambiguate two range relationships to the same + weather table (YAML equivalent of DDL's USING clause). 
+ +tables: + - name: flights + description: Flight records with departure and arrival cities and times + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: FLIGHTS + primary_key: + columns: [FLIGHT_ID] + metrics: + - name: total_flights + synonyms: [number of flights, flight count] + expr: COUNT(FLIGHT_ID) + - name: late_departure_count + synonyms: [late departures, delayed departures, flights late at departure] + description: Late flights broken down by departure weather + expr: COUNT_IF(IS_LATE) + using_relationships: + - flight_departure_weather + - name: late_arrival_count + synonyms: [late arrivals, delayed arrivals, flights late at arrival] + description: Late flights broken down by arrival weather + expr: COUNT_IF(IS_LATE) + using_relationships: + - flight_arrival_weather + - name: departure_flight_count + synonyms: [flights by departure weather] + description: All flights broken down by departure weather + expr: COUNT(FLIGHT_ID) + using_relationships: + - flight_departure_weather + - name: arrival_flight_count + synonyms: [flights by arrival weather] + description: All flights broken down by arrival weather + expr: COUNT(FLIGHT_ID) + using_relationships: + - flight_arrival_weather + + - name: weather + description: Weather conditions by city and date range + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: WEATHER + primary_key: + columns: [CITY_CODE, START_DATE, END_DATE] + dimensions: + - name: weather_condition + synonyms: [weather, conditions, sky condition] + expr: WEATHER_CONDITION + data_type: VARCHAR + +# Two paths from flights to weather — one per city role. +# NOTE: range relationship syntax (BETWEEN EXCLUSIVE) not supported in YAML. +# These are standard column-equality relationships as placeholders. +# For full range join, use semantic_view.sql. 
+relationships: + - name: flight_departure_weather + left_table: flights + right_table: weather + relationship_columns: + - left_column: DEPARTURE_CITY + right_column: CITY_CODE + - name: flight_arrival_weather + left_table: flights + right_table: weather + relationship_columns: + - left_column: ARRIVAL_CITY + right_column: CITY_CODE diff --git a/skills/semantic-view-patterns/snippets/range_join/README.md b/skills/semantic-view-patterns/snippets/range_join/README.md new file mode 100644 index 00000000..bdebbd37 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/range_join/README.md @@ -0,0 +1,99 @@ +# Range Join (Temporal / SCD2 Join) + +## The Problem + +You have a fact table and a dimension table where the dimension data changes over time. When you join them, you need the version of the dimension that was **active at the time of the fact event** — not the current version. + +**Example in this snippet**: A customer's subscription tier (Free → Growth → Enterprise) changes over time. When reporting revenue by tier, each order should be attributed to the tier the customer was on *at the time of purchase*. + +## How You Might Express This Need + +- "Show me revenue broken down by the subscription tier the customer was on at time of purchase" +- "What plan was each user on when they churned?" +- "Join each event to the pricing tier that was in effect at that time" +- "I want to use our SCD2 customer dimension — but the segment should reflect what it was historically, not what it is today" +- "My dimension table has `valid_from` / `valid_to` columns. How do I use those in the semantic layer?" + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **dbt** | `dbt snapshot` creates SCD2 history. Temporal join is done manually in SQL: `JOIN dim ON id = id AND event_ts BETWEEN dbt_valid_from AND dbt_valid_to` | +| **LookML** | No native SCD2 support. Typically denormalized at ETL time or handled in a Liquid-templated derived table. 
| +| **Power BI / DAX** | No native temporal join. Requires pre-built snapshot tables or complex CALCULATE + FILTER patterns. | +| **SSAS / Tabular** | No native temporal join. Denormalization is standard. | +| **Raw SQL** | `JOIN dim ON fact.id = dim.id AND fact.event_date BETWEEN dim.valid_from AND dim.valid_to` | +| **Tableau** | Custom SQL data source with date-range JOIN, or pre-joined SCD2 snapshot at extract time. No native temporal join. | + +Snowflake Semantic Views handle this natively with a **range relationship** — no denormalization or ETL-time join needed. + +## The SV Approach + +Three things are required: + +**1. Declare the time range on the dimension table** (`UNIQUE` + `CONSTRAINT DISTINCT RANGE`): +```sql +customer_segments AS DB.SCHEMA.CUSTOMER_SEGMENTS + PRIMARY KEY (SEGMENT_ID) + UNIQUE (CUSTOMER_ID, VALID_FROM, VALID_TO) + CONSTRAINT segment_period DISTINCT RANGE BETWEEN VALID_FROM AND VALID_TO EXCLUSIVE +``` + +**2. Define the compound relationship** (entity key + temporal column → entity key + range): +```sql +orders_to_segment AS orders(CUSTOMER_ID, ORDER_DATE) + REFERENCES customer_segments(CUSTOMER_ID, BETWEEN VALID_FROM AND VALID_TO EXCLUSIVE) +``` + +**3. Use dimensions from `customer_segments`** in your queries — they'll automatically resolve to the historically-correct record. + +### EXCLUSIVE vs INCLUSIVE End Dates + +This snippet uses **EXCLUSIVE** end dates: `valid_to` is the first day the record is *no longer* active. + +| Customer | Segment | `valid_from` | `valid_to` (exclusive) | Active through | +|----------|---------|-------------|----------------------|----------------| +| C001 | Free | 2024-01-01 | 2024-04-01 | March 31 | +| C001 | Growth | 2024-04-01 | 2024-07-01 | June 30 | +| C001 | Enterprise | 2024-07-01 | 9999-12-31 | current | + +An order on `2024-03-31` falls in `[2024-01-01, 2024-04-01)` → Free. ✓ +An order on `2024-04-01` falls in `[2024-04-01, 2024-07-01)` → Growth. 
✓ + +> If your data uses **inclusive** end dates (`valid_to = 2024-03-31`), either convert them at load time or create a view with `valid_to + 1 day` before referencing in the SV. + +### Type Compatibility + +The fact's temporal FK column must be type-coercible to the dimension's range columns. If your order date is `DATE` and your segment dates are `TIMESTAMP_NTZ`, add a `PRIVATE` FACT to cast: + +```sql +FACTS ( + PRIVATE orders.order_ts AS ORDER_DATE::TIMESTAMP_NTZ +) +RELATIONSHIPS ( + orders_to_segment AS orders(CUSTOMER_ID, order_ts) + REFERENCES customer_segments(CUSTOMER_ID, BETWEEN VALID_FROM AND VALID_TO EXCLUSIVE) +) +``` + +## Entity Isolation (Key Gotcha) + +The SV engine enforces **entity isolation across range join boundaries**. You cannot use a dimension from the range-joined entity (`customer_segments`) with a metric defined on a *different* entity that is only connected through that range join. + +If you add a second fact table (e.g., `support_tickets`) that is NOT directly related to `customer_segments`, you cannot query `support_tickets` metrics broken down by `customer_segments.segment`. The dimension and metric must share a direct join path. + +**Fix**: Add the dimension you need directly to the metric's entity table (if the physical column exists there), or establish a direct relationship from the second fact to the dimension. 
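+The inclusive-to-exclusive conversion suggested in the end-dates section above can be sketched as a view. This is a sketch only: `CUSTOMER_SEGMENTS_INCLUSIVE` is a hypothetical source table name, and the `IFF` guard keeps the open-ended sentinel from overflowing the DATE range:
+
+```sql
+-- Sketch: wrap an inclusive-ended SCD2 table in a view with exclusive end dates.
+CREATE OR REPLACE VIEW CUSTOMER_SEGMENTS AS
+SELECT
+  SEGMENT_ID,
+  CUSTOMER_ID,
+  SEGMENT,
+  VALID_FROM,
+  -- inclusive 2024-03-31 becomes exclusive 2024-04-01; leave the
+  -- 9999-12-31 sentinel alone so DATEADD cannot overflow the DATE type
+  IFF(VALID_TO = '9999-12-31'::DATE, VALID_TO, DATEADD(day, 1, VALID_TO)) AS VALID_TO
+FROM CUSTOMER_SEGMENTS_INCLUSIVE;
+```
+
+The range relationship in `semantic_view.sql` can then reference the view with no other change.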
+ +## Docs + +- [Joining logical tables that contain ranges of values](https://docs.snowflake.com/en/user-guide/views-semantic/sql#joining-logical-tables-that-contain-ranges-of-values) +- [CREATE SEMANTIC VIEW — CONSTRAINT / BETWEEN syntax](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#label-create-semantic-view-tables-constraint) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `ORDERS` and `CUSTOMER_SEGMENTS` table DDL | +| `seed_data.sql` | 3 customers × segment history + 8 orders across different tier periods | +| `semantic_view.sql` | SV with range relationship between orders and segment history | +| `queries.sql` | Working queries + the naive SQL mistake this pattern prevents | diff --git a/skills/semantic-view-patterns/snippets/range_join/queries.sql b/skills/semantic-view-patterns/snippets/range_join/queries.sql new file mode 100644 index 00000000..e0901fd8 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/range_join/queries.sql @@ -0,0 +1,123 @@ +-- Range Join Example: Queries +-- Run schema.sql, seed_data.sql, and semantic_view.sql first. + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. Revenue by subscription tier (historically resolved) +-- Each order is matched to the tier the customer was on at time of purchase. +-- +-- Expected: +-- Enterprise $998.00 +-- Growth $298.00 +-- Free $196.00 +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_BY_SEGMENT + DIMENSIONS customer_segments.segment + METRICS orders.total_revenue +) +ORDER BY total_revenue DESC; + + +-- 2. Order count by tier +-- +-- Expected: +-- Free 4 +-- Enterprise 2 +-- Growth 2 +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_BY_SEGMENT + DIMENSIONS customer_segments.segment + METRICS orders.order_count +) +ORDER BY order_count DESC; + + +-- 3. 
Revenue by customer and tier (shows the historical transitions clearly)
+--
+-- Expected:
+--   C001  Free        $49
+--   C001  Growth      $149
+--   C001  Enterprise  $499
+--   C002  Growth      $149
+--   C002  Enterprise  $499
+--   C003  Free        $147
+SELECT * FROM SEMANTIC_VIEW(
+  SNIPPETS.PUBLIC.ORDERS_BY_SEGMENT
+  DIMENSIONS orders.customer_id, customer_segments.segment
+  METRICS orders.total_revenue
+)
+ORDER BY customer_id, total_revenue;
+
+
+-- 4. Revenue for Enterprise tier only
+SELECT * FROM SEMANTIC_VIEW(
+  SNIPPETS.PUBLIC.ORDERS_BY_SEGMENT
+  DIMENSIONS customer_segments.segment
+  METRICS orders.total_revenue
+  WHERE customer_segments.segment = 'Enterprise'
+);
+
+
+-- ============================================================
+-- THE MISTAKE THIS PATTERN PREVENTS
+-- ============================================================
+
+-- WRONG: Naive SQL join without temporal constraint
+-- This joins ALL segment records for a customer to ALL their orders,
+-- causing each order to appear once per segment history record (fan-out).
+--
+-- Result: C001 has 3 segment records → each of C001's 3 orders appears 3 times.
+-- C001's true total is $697; the naive join reports $697 in EACH of the three
+-- tiers ($2,091 in total), massively overcounted.
+--
+-- (Run this to see the incorrect output)
+SELECT
+  o.customer_id,
+  cs.segment,
+  SUM(o.order_amount) AS wrong_revenue
+FROM SNIPPETS.PUBLIC.ORDERS o
+JOIN SNIPPETS.PUBLIC.CUSTOMER_SEGMENTS cs ON o.customer_id = cs.customer_id
+GROUP BY 1, 2
+ORDER BY 1, 3 DESC;
+-- C001 shows revenue in ALL THREE tiers, even for orders placed before they upgraded.
+
+
+-- ALSO WRONG: Current-only join (loses history)
+-- Joining only on is_current = true / max(valid_from) assigns today's tier to all orders.
+-- C001's January order (correctly "Free") gets credited to "Enterprise" (their current tier).
+SELECT + o.customer_id, + cs.segment AS current_segment, + SUM(o.order_amount) AS wrong_revenue +FROM SNIPPETS.PUBLIC.ORDERS o +JOIN SNIPPETS.PUBLIC.CUSTOMER_SEGMENTS cs + ON o.customer_id = cs.customer_id + AND cs.valid_to = '9999-12-31' -- "current record only" +GROUP BY 1, 2 +ORDER BY 1; +-- All of C001's revenue ($697) attributed to Enterprise, none to Free or Growth. + + +-- ============================================================ +-- WHAT DOESN'T WORK IN SEMANTIC_VIEW() +-- ============================================================ + +-- ERROR: Querying customer_segments dimensions without the orders metrics +-- (The range-joined entity doesn't have its own metrics defined) +-- +-- SELECT * FROM SEMANTIC_VIEW( +-- SNIPPETS.PUBLIC.ORDERS_BY_SEGMENT +-- DIMENSIONS customer_segments.segment +-- METRICS customer_segments.some_metric -- no metrics defined on this entity +-- ); + +-- NOTE: If you add a second fact table (e.g., support_tickets) to this SV that is +-- NOT directly related to customer_segments, you cannot break down support_tickets +-- metrics by customer_segments.segment. The segment dimension is only reachable +-- from entities that join to customer_segments directly. +-- Error you'd see: "The dimension entity 'CUSTOMER_SEGMENTS' must be related to +-- the base metric entity 'SUPPORT_TICKETS'" diff --git a/skills/semantic-view-patterns/snippets/range_join/schema.sql b/skills/semantic-view-patterns/snippets/range_join/schema.sql new file mode 100644 index 00000000..5935fa17 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/range_join/schema.sql @@ -0,0 +1,30 @@ +-- Range Join Example: Schema +-- Target: SNIPPETS.PUBLIC (replace with your database/schema) +-- +-- Scenario: E-commerce orders joined to the customer subscription tier +-- that was active at the time of each order. 
+ +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- Customer subscription tier history (SCD Type 2) +-- valid_to uses EXCLUSIVE semantics: the tier is active through (valid_to - 1 day) +-- Current records use the sentinel value 9999-12-31 for valid_to +CREATE OR REPLACE TABLE CUSTOMER_SEGMENTS ( + SEGMENT_ID INTEGER NOT NULL, -- surrogate key + CUSTOMER_ID VARCHAR(10) NOT NULL, + SEGMENT VARCHAR(20) NOT NULL, -- Free | Growth | Enterprise + VALID_FROM DATE NOT NULL, + VALID_TO DATE NOT NULL -- exclusive end date; 9999-12-31 = current +); + +-- Orders fact table +CREATE OR REPLACE TABLE ORDERS ( + ORDER_ID INTEGER NOT NULL, + CUSTOMER_ID VARCHAR(10) NOT NULL, + ORDER_DATE DATE NOT NULL, + ORDER_AMOUNT NUMBER(10,2) NOT NULL +); diff --git a/skills/semantic-view-patterns/snippets/range_join/seed_data.sql b/skills/semantic-view-patterns/snippets/range_join/seed_data.sql new file mode 100644 index 00000000..1361edfa --- /dev/null +++ b/skills/semantic-view-patterns/snippets/range_join/seed_data.sql @@ -0,0 +1,45 @@ +-- Range Join Example: Seed Data +-- Run schema.sql first. 
+ +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- Customer segment history (SCD2, EXCLUSIVE end dates) +-- +-- C001: Free → Growth → Enterprise +-- C002: Growth → Enterprise +-- C003: Free (never upgraded) +INSERT INTO CUSTOMER_SEGMENTS VALUES + (1, 'C001', 'Free', '2024-01-01', '2024-04-01'), + (2, 'C001', 'Growth', '2024-04-01', '2024-07-01'), + (3, 'C001', 'Enterprise', '2024-07-01', '9999-12-31'), + (4, 'C002', 'Growth', '2024-01-01', '2024-06-01'), + (5, 'C002', 'Enterprise', '2024-06-01', '9999-12-31'), + (6, 'C003', 'Free', '2024-01-01', '9999-12-31'); + +-- Orders +-- +-- Expected tier at time of purchase: +-- O001: C001 ordered 2024-01-15 → Free (valid 2024-01-01 to 2024-04-01) +-- O002: C001 ordered 2024-04-20 → Growth (valid 2024-04-01 to 2024-07-01) +-- O003: C001 ordered 2024-09-10 → Enterprise (valid 2024-07-01+) +-- O004: C002 ordered 2024-02-05 → Growth (valid 2024-01-01 to 2024-06-01) +-- O005: C002 ordered 2024-07-03 → Enterprise (valid 2024-06-01+) +-- O006: C003 ordered 2024-03-12 → Free +-- O007: C003 ordered 2024-06-08 → Free +-- O008: C003 ordered 2024-10-01 → Free +INSERT INTO ORDERS VALUES + (1, 'C001', '2024-01-15', 49.00), + (2, 'C001', '2024-04-20', 149.00), + (3, 'C001', '2024-09-10', 499.00), + (4, 'C002', '2024-02-05', 149.00), + (5, 'C002', '2024-07-03', 499.00), + (6, 'C003', '2024-03-12', 49.00), + (7, 'C003', '2024-06-08', 49.00), + (8, 'C003', '2024-10-01', 49.00); + +-- Verify: Revenue by correct historical segment should be: +-- Free: O001 + O006 + O007 + O008 = 49 + 49 + 49 + 49 = $196 +-- Growth: O002 + O004 = 149 + 149 = $298 +-- Enterprise: O003 + O005 = 499 + 499 = $998 +-- Total: $1,492 diff --git a/skills/semantic-view-patterns/snippets/range_join/semantic_view.sql b/skills/semantic-view-patterns/snippets/range_join/semantic_view.sql new file mode 100644 index 00000000..44378a2f --- /dev/null +++ b/skills/semantic-view-patterns/snippets/range_join/semantic_view.sql @@ -0,0 +1,56 @@ +-- Range Join Example: Semantic View 
DDL +-- Run schema.sql and seed_data.sql first. + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.ORDERS_BY_SEGMENT + + TABLES ( + orders AS SNIPPETS.PUBLIC.ORDERS + PRIMARY KEY (ORDER_ID), + + customer_segments AS SNIPPETS.PUBLIC.CUSTOMER_SEGMENTS + PRIMARY KEY (SEGMENT_ID) + UNIQUE (CUSTOMER_ID, VALID_FROM, VALID_TO) + CONSTRAINT segment_period DISTINCT RANGE BETWEEN VALID_FROM AND VALID_TO EXCLUSIVE + ) + + RELATIONSHIPS ( + -- Compound key: match on customer_id AND the order_date falling within the segment's valid range + orders_to_segment AS orders(CUSTOMER_ID, ORDER_DATE) + REFERENCES customer_segments(CUSTOMER_ID, BETWEEN VALID_FROM AND VALID_TO EXCLUSIVE) + ) + + FACTS ( + orders.order_revenue AS ORDER_AMOUNT + COMMENT = 'Order amount in USD' + ) + + DIMENSIONS ( + orders.order_id AS ORDER_ID, + orders.customer_id AS CUSTOMER_ID, + orders.order_date AS ORDER_DATE, + + customer_segments.segment AS SEGMENT + WITH SYNONYMS ('tier', 'subscription tier', 'plan', 'customer plan') + COMMENT = 'Subscription tier active at time of order (historically resolved via range join)', + customer_segments.valid_from AS VALID_FROM + COMMENT = 'Start of this segment period (inclusive)', + customer_segments.valid_to AS VALID_TO + COMMENT = 'End of this segment period (exclusive; 9999-12-31 = current)' + ) + + METRICS ( + orders.total_revenue AS SUM(ORDER_AMOUNT) + WITH SYNONYMS ('revenue', 'total sales', 'gmv') + COMMENT = 'Sum of order amounts', + + orders.order_count AS COUNT(ORDER_ID) + WITH SYNONYMS ('number of orders', 'orders', 'order volume') + COMMENT = 'Number of orders' + ) + + COMMENT = 'Orders joined to the subscription tier active at time of purchase via SCD2 range relationship.' + + AI_SQL_GENERATION 'Use customer_segments.segment to break down order metrics by the subscription tier the customer was on at the time of each order. 
The range relationship automatically resolves the historically-correct segment — there is no need to filter on valid_from/valid_to manually. Each order maps to exactly one segment record.'; diff --git a/skills/semantic-view-patterns/snippets/range_join/semantic_view.yaml b/skills/semantic-view-patterns/snippets/range_join/semantic_view.yaml new file mode 100644 index 00000000..1e86b28a --- /dev/null +++ b/skills/semantic-view-patterns/snippets/range_join/semantic_view.yaml @@ -0,0 +1,120 @@ +# Range Join (SCD2 Temporal): Semantic View YAML +# +# This file is the canonical YAML for ORDERS_BY_SEGMENT, exported via: +# SELECT SYSTEM$READ_YAML_FROM_SEMANTIC_VIEW('SEMANTIC_SKILLS.SNIPPETS.ORDERS_BY_SEGMENT'); +# then lightly formatted and annotated. +# +# KEY FINDING: Range join relationships ARE fully supported in YAML. +# The syntax uses `type: range` on the date column and a `constraints` +# block on the dimension table — no DDL required. +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); + +name: ORDERS_BY_SEGMENT +description: Orders joined to the subscription tier active at time of purchase via SCD2 range relationship. 
+ +tables: + - name: CUSTOMER_SEGMENTS + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: CUSTOMER_SEGMENTS + primary_key: + columns: + - SEGMENT_ID + unique_keys: + - columns: + - CUSTOMER_ID + - VALID_FROM + - VALID_TO + # DISTINCT RANGE constraint — enables the range join relationship below + constraints: + - name: SEGMENT_PERIOD + distinct_range: + start_column: VALID_FROM + end_column: VALID_TO + dimensions: + - name: SEGMENT + synonyms: + - tier + - subscription tier + - plan + - customer plan + description: Subscription tier active at time of order (historically resolved via range join) + expr: SEGMENT + data_type: VARCHAR(20) + - name: VALID_FROM + description: Start of this segment period (inclusive) + expr: VALID_FROM + data_type: DATE + - name: VALID_TO + description: End of this segment period (exclusive; 9999-12-31 = current) + expr: VALID_TO + data_type: DATE + + - name: ORDERS + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: ORDERS + primary_key: + columns: + - ORDER_ID + dimensions: + - name: CUSTOMER_ID + expr: CUSTOMER_ID + data_type: VARCHAR(10) + - name: ORDER_DATE + expr: ORDER_DATE + data_type: DATE + - name: ORDER_ID + expr: ORDER_ID + data_type: NUMBER + facts: + - name: ORDER_REVENUE + description: Order amount in USD + expr: ORDER_AMOUNT + data_type: NUMBER(10,2) + access_modifier: public_access + metrics: + - name: ORDER_COUNT + synonyms: + - number of orders + - orders + - order volume + description: Number of orders + expr: COUNT(ORDER_ID) + access_modifier: public_access + - name: TOTAL_REVENUE + synonyms: + - revenue + - total sales + - gmv + description: Sum of order amounts + expr: SUM(ORDER_AMOUNT) + access_modifier: public_access + +relationships: + - name: ORDERS_TO_SEGMENT + left_table: ORDERS + right_table: CUSTOMER_SEGMENTS + relationship_columns: + - left_column: CUSTOMER_ID + right_column: CUSTOMER_ID + # Range join: ORDER_DATE must fall within [VALID_FROM, VALID_TO) + - left_column: 
ORDER_DATE + type: range + right_range: + start_column: VALID_FROM + end_column: VALID_TO + +module_custom_instructions: + sql_generation: > + Use customer_segments.segment to break down order metrics by the subscription + tier the customer was on at the time of each order. The range relationship + automatically resolves the historically-correct segment — there is no need to + filter on valid_from/valid_to manually. Each order maps to exactly one segment record. diff --git a/skills/semantic-view-patterns/snippets/role_playing_dimensions/README.md b/skills/semantic-view-patterns/snippets/role_playing_dimensions/README.md new file mode 100644 index 00000000..c49b0ce5 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/role_playing_dimensions/README.md @@ -0,0 +1,133 @@ +# Role-Playing Dimensions + +## The Problem + +You have a fact table with multiple foreign keys that all reference the same physical dimension table. A classic example: an `ORDERS` table with both an `ORDER_DATE` and a `SHIP_DATE`, both of which should join to the same `DIM_DATE` calendar table. + +The naive approach — one dimension table, two relationships, shared columns — hits an immediate ambiguity. The SV engine can't tell which relationship path to use when you group by a dimension like `year` or `month`, because that column belongs to both roles simultaneously. + +**How You Might Express This Need:** +- "I want to see revenue both by when orders were placed *and* by when they shipped." +- "My DIM_DATE has 50 columns. I don't want to duplicate it just to get two date roles." +- "In Power BI I'd just create two date table relationships. How do I do that in a Semantic View?" +- "I need ORDER_YEAR and SHIP_YEAR as independent dimensions in the same report." + +## The Solution: Alias the Dimension Table Twice + +In the `TABLES` clause, list the same physical table under two different logical names. 
The SV engine treats each alias as a completely separate entity — separate joins, separate dimension columns, no ambiguity. + +```sql +TABLES ( + orders AS ORDERS PRIMARY KEY (ORDER_ID), + order_date_dim AS DIM_DATE PRIMARY KEY (DATE_KEY), -- role 1 + ship_date_dim AS DIM_DATE PRIMARY KEY (DATE_KEY) -- role 2, same physical table +) +RELATIONSHIPS ( + orders_to_order_date AS orders(ORDER_DATE) REFERENCES order_date_dim(DATE_KEY), + orders_to_ship_date AS orders(SHIP_DATE) REFERENCES ship_date_dim(DATE_KEY) +) +DIMENSIONS ( + -- logical_name AS physical_column — each role gets unique logical names + order_date_dim.order_year AS YEAR, -- logical: order_year → physical: YEAR + order_date_dim.order_month_name AS MONTH_NAME, -- logical: order_month_name → physical: MONTH_NAME + ship_date_dim.ship_year AS YEAR, -- logical: ship_year → physical: YEAR (same col, different role) + ship_date_dim.ship_month_name AS MONTH_NAME -- logical: ship_month_name → physical: MONTH_NAME +) +``` + +Each role gets its own uniquely named dimensions. No `USING` clause is needed. You can use `ORDER_YEAR` and `SHIP_YEAR` independently or together in the same query. + +## What the Demo Shows + +This snippet uses 8 orders placed between November 2024 and February 2025. Four of them **ship in a different month than they were placed** — that's what makes the role distinction meaningful: + +| Order | Customer | Order Date | Ship Date | Amount | Cross-month? 
| +|-------|----------|-----------|-----------|--------|--------------| +| 1 | Acme Corp | Nov 15, 2024 | Nov 20, 2024 | $500 | — | +| 2 | Beta LLC | Nov 28, 2024 | Dec 3, 2024 | $800 | ← Nov order, Dec ship | +| 3 | Gamma Inc | Dec 1, 2024 | Dec 5, 2024 | $300 | — | +| 4 | Delta Co | Dec 20, 2024 | Jan 4, 2025 | $1,200 | ← Dec order, **Jan ship (crosses year!)** | +| 5 | Acme Corp | Jan 10, 2025 | Jan 15, 2025 | $450 | — | +| 6 | Epsilon Ltd | Jan 25, 2025 | Feb 2, 2025 | $900 | ← Jan order, Feb ship | +| 7 | Beta LLC | Feb 14, 2025 | Feb 20, 2025 | $650 | — | +| 8 | Gamma Inc | Feb 28, 2025 | Mar 5, 2025 | $1,100 | ← Feb order, Mar ship | + +**Revenue by ORDER month** (4 rows): +``` +November 2024 $1,300 2 orders +December 2024 $1,500 2 orders +January 2025 $1,350 2 orders +February 2025 $1,750 2 orders +``` + +**Revenue by SHIP month** (5 rows — same $5,900, different distribution): +``` +November 2024 $500 1 order +December 2024 $1,100 2 orders +January 2025 $1,650 2 orders ← Order 4 (Dec) shows up here +February 2025 $1,550 2 orders +March 2025 $1,100 1 order +``` + +## Combining Both Roles in One Query + +Because `order_date_dim` and `ship_date_dim` are independent entities, you can group by both simultaneously. The result is a cross-tab showing the joint distribution of order date and ship date — useful for fulfillment lag analysis: + +``` +ORDER_MONTH SHIP_MONTH REVENUE +November 2024 → November 2024 $500 (same-month) +November 2024 → December 2024 $800 (1-month lag) +December 2024 → December 2024 $300 (same-month) +December 2024 → January 2025 $1,200 (crosses year) +... +``` + +## Role-Playing Dimensions vs. Multi-Path Metrics + +Both patterns handle a single physical table reached via two join paths. 
The choice depends on what you want to expose: + +| | Role-Playing Dimensions (this snippet) | Multi-Path Metrics (`multi_path_metrics`) | +|--|--|--| +| Aliases in TABLES | Two aliases of the dim table | One alias; two relationships point to it | +| Disambiguation | None needed — each alias has unique dim names | `USING` clause on each metric | +| Dimensions | `ORDER_YEAR`, `SHIP_YEAR` — independently named | Single shared column (e.g. `weather_condition`) | +| Use both in one query? | Yes — produces cross-tab | No — USING locks each metric to one path | +| Best for | Multiple date roles, multiple geography roles | Weather-at-departure vs weather-at-arrival style analysis | + +Use **role-playing dimensions** when each role needs its own independently named columns. +Use **multi-path metrics** when the dimension column is shared and disambiguation is needed at the metric level. + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **Power BI** | Mark the same date table as active for one relationship, inactive for the others; use `USERELATIONSHIP()` in DAX measures. Requires a separate DAX expression per relationship — role-playing is implicit, not structural. | +| **Tableau** | Duplicate the date dimension data source, rename it, join each copy to the appropriate date FK. Doubles the data loaded into the extract. | +| **LookML** | `view: order_date { from: dim_date }` and `view: ship_date { from: dim_date }` — exact equivalent. LookML pioneered the from-alias pattern; SV TABLES aliasing follows the same idea. | +| **dbt** | No semantic layer equivalent; handled in SQL via aliased CTEs or multiple joins. Query author must remember which date column to use. | +| **Raw SQL** | `LEFT JOIN dim_date AS order_date_dim ON ... LEFT JOIN dim_date AS ship_date_dim ON ...` — the SV pattern encodes this join structure once in the model definition. 
| + +## What Doesn't Work + +- **Using a single alias with two relationships (multi-path without USING)**: If you define only one `date_dim` alias but two relationships pointing to it, any metric grouped by `date_dim.year` will error with "multi-path relationship". The engine can't resolve which path to use without explicit `USING` disambiguation. + +- **Expecting a simple list when combining both roles**: Grouping by `ORDER_MONTH_NAME` and `SHIP_MONTH_NAME` together produces a cross-tab (one row per unique combination), not a flat list. This is correct behavior, but can produce many rows if orders span many months. + +- **Sparse DIM_DATE**: The SV uses LEFT JOINs. An `ORDER_DATE` with no matching row in `DIM_DATE` will produce NULL values for all order date dimensions (`ORDER_YEAR`, `ORDER_MONTH_NAME`, etc.). Populate `DIM_DATE` for the full date range of the fact table. + +- **Column name collisions**: If both aliases expose a dimension with the same name (e.g., `year AS YEAR`), the SV will fail to deploy — dimension names must be globally unique. Always prefix them per role (`ORDER_YEAR`, `SHIP_YEAR`). 
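+To make the first bullet above concrete, the ambiguous single-alias shape looks roughly like this (a fragment, not a complete `CREATE SEMANTIC VIEW` statement):
+
+```sql
+-- ANTI-PATTERN (fragment): one alias, two relationships pointing at it.
+TABLES (
+  orders AS ORDERS PRIMARY KEY (ORDER_ID),
+  date_dim AS DIM_DATE PRIMARY KEY (DATE_KEY)  -- single alias serving both roles
+)
+RELATIONSHIPS (
+  orders_to_order_date AS orders(ORDER_DATE) REFERENCES date_dim(DATE_KEY),
+  orders_to_ship_date AS orders(SHIP_DATE) REFERENCES date_dim(DATE_KEY)
+)
+-- Any metric grouped by date_dim.year now errors: two join paths reach
+-- date_dim and neither is unambiguous. Either alias the table twice (this
+-- snippet) or scope each metric with USING (see multi_path_metrics).
+```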
+ +## Docs + +- [Semantic View — TABLES clause (logical table aliases)](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view) +- [Semantic View — RELATIONSHIPS clause](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#relationships) +- [SEMANTIC_VIEW() table function](https://docs.snowflake.com/en/sql-reference/functions/semantic_view) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `DIM_DATE` calendar table + `ORDERS` fact table with two date FKs | +| `seed_data.sql` | 16 calendar dates + 8 orders (4 with cross-month ship dates) | +| `semantic_view.sql` | `ORDERS_RPD_SV` — DIM_DATE aliased as `order_date_dim` and `ship_date_dim` | +| `queries.sql` | Revenue by order month, revenue by ship month, fulfillment lag cross-tab, gotchas | diff --git a/skills/semantic-view-patterns/snippets/role_playing_dimensions/queries.sql b/skills/semantic-view-patterns/snippets/role_playing_dimensions/queries.sql new file mode 100644 index 00000000..db9eb2d9 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/role_playing_dimensions/queries.sql @@ -0,0 +1,139 @@ +-- Role-Playing Dimensions: Queries +-- +-- Demonstrates how the same DIM_DATE table, aliased under two logical names, +-- produces completely independent date dimensions for order date and ship date. +-- +-- In SEMANTIC_VIEW() queries, dimensions are referenced as entity_alias.logical_name. + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. Revenue by ORDER month — when were orders placed? 
+-- +-- Expected (4 rows): +-- November 2024 $1,300 2 orders (orders 1 + 2) +-- December 2024 $1,500 2 orders (orders 3 + 4) +-- January 2025 $1,350 2 orders (orders 5 + 6) +-- February 2025 $1,750 2 orders (orders 7 + 8) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_RPD_SV + DIMENSIONS order_date_dim.order_year, + order_date_dim.order_month_num, + order_date_dim.order_month_name + METRICS orders.total_revenue, orders.order_count +) +ORDER BY order_year, order_month_num; + + +-- 2. Revenue by SHIP month — when did revenue actually leave the warehouse? +-- +-- Expected (5 rows — order 4 crosses a year boundary): +-- November 2024 $500 1 order (order 1) +-- December 2024 $1,100 2 orders (orders 2 + 3) +-- January 2025 $1,650 2 orders (orders 4 + 5) ← Dec order shows up here +-- February 2025 $1,550 2 orders (orders 6 + 7) +-- March 2025 $1,100 1 order (order 8) +-- +-- Note: total is still $5,900 — same revenue, different monthly distribution. +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_RPD_SV + DIMENSIONS ship_date_dim.ship_year, + ship_date_dim.ship_month_num, + ship_date_dim.ship_month_name + METRICS orders.total_revenue, orders.order_count +) +ORDER BY ship_year, ship_month_num; + + +-- 3. Fulfillment lag — order_month_name and ship_month_name in the same query +-- +-- Because order_date_dim and ship_date_dim are independent entities, +-- you can combine them freely. The result is a cross-tab: +-- each row shows (order_month, ship_month, revenue) for orders +-- where those two dates occur. +-- +-- Expected (8 rows — one per order): +-- November 2024 → November 2024 $500 (order 1, same month) +-- November 2024 → December 2024 $800 (order 2, 1-month lag) +-- December 2024 → December 2024 $300 (order 3, same month) +-- December 2024 → January 2025 $1,200 (order 4, crosses year!) 
+-- January 2025 → January 2025 $450 (order 5, same month) +-- January 2025 → February 2025 $900 (order 6, 1-month lag) +-- February 2025 → February 2025 $650 (order 7, same month) +-- February 2025 → March 2025 $1,100 (order 8, 1-month lag) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_RPD_SV + DIMENSIONS order_date_dim.order_year, + order_date_dim.order_month_num, + order_date_dim.order_month_name, + ship_date_dim.ship_year, + ship_date_dim.ship_month_num, + ship_date_dim.ship_month_name + METRICS orders.total_revenue, orders.order_count +) +ORDER BY order_year, order_month_num, ship_year, ship_month_num; + + +-- 4. Revenue per customer broken down by both order_year and ship_year +-- +-- Delta Co has one order placed in Dec 2024 but shipped in Jan 2025 — +-- watch for the (2024, 2025) cross-year row. +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ORDERS_RPD_SV + DIMENSIONS orders.customer_name, + order_date_dim.order_year, + ship_date_dim.ship_year + METRICS orders.total_revenue, orders.order_count +) +ORDER BY customer_name, order_year; + + +-- ============================================================ +-- HOW ROLE-PLAYING DIMENSIONS WORK +-- ============================================================ + +-- The SV engine generates one JOIN per alias: +-- +-- SELECT ... +-- FROM ORDERS o +-- LEFT JOIN DIM_DATE odate ON o.ORDER_DATE = odate.DATE_KEY ← order_date_dim path +-- LEFT JOIN DIM_DATE sdate ON o.SHIP_DATE = sdate.DATE_KEY ← ship_date_dim path +-- +-- odate.YEAR → logical name order_year (independent GROUP BY) +-- sdate.YEAR → logical name ship_year (independent GROUP BY) +-- +-- No USING clause needed because there is no ambiguity — each alias +-- is bound to exactly one relationship. + + +-- ============================================================ +-- GOTCHAS +-- ============================================================ + +-- Using order_month_name and ship_month_name together produces a cross-tab, not a list. 
+-- Query 3 above returns 8 rows (one per order) rather than 4–5 rows (one per month).
+-- This is correct behavior — it shows the joint distribution of (order month, ship month).
+-- If you only want one date perspective, use EITHER set of dimensions, not both.
+
+-- The multi_path_metrics approach (single alias + USING) does NOT work here.
+-- That pattern requires a single dimension column shared across both paths.
+-- Role-playing dimensions give each path its own logical names — use this pattern
+-- whenever you want to slice independently by each date role.
+
+-- NULL month names appear when an order_date or ship_date has no matching row in DIM_DATE.
+-- Always ensure DIM_DATE is fully populated for all dates in the fact table.
+-- A sparse DIM_DATE will cause silent NULLs in dimension columns (LEFT JOIN semantics).
+
+
+-- ============================================================
+-- CLEANUP — run to remove objects created by this snippet
+-- ============================================================
+
+-- Drop the semantic view first, then its base tables.
+DROP SEMANTIC VIEW IF EXISTS SNIPPETS.PUBLIC.ORDERS_RPD_SV;
+DROP TABLE IF EXISTS SNIPPETS.PUBLIC.ORDERS;
+DROP TABLE IF EXISTS SNIPPETS.PUBLIC.DIM_DATE;
diff --git a/skills/semantic-view-patterns/snippets/role_playing_dimensions/schema.sql b/skills/semantic-view-patterns/snippets/role_playing_dimensions/schema.sql
new file mode 100644
index 00000000..f85cf7a5
--- /dev/null
+++ b/skills/semantic-view-patterns/snippets/role_playing_dimensions/schema.sql
@@ -0,0 +1,41 @@
+-- Role-Playing Dimensions: Schema Setup
+--
+-- One physical date dimension table (DIM_DATE) aliased twice in the SV:
+--   order_date_dim → joined on ORDERS.ORDER_DATE
+--   ship_date_dim  → joined on ORDERS.SHIP_DATE
+--
+-- This gives each date role its own named dimensions (ORDER_YEAR, SHIP_YEAR, etc.)
+-- without any USING clause or metric-level disambiguation.
+ +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- DIMENSION TABLE +-- ============================================================ + +-- One row per calendar date. +-- In a real DW this table may have hundreds of columns; here we use five. +CREATE OR REPLACE TABLE DIM_DATE ( + date_key DATE NOT NULL, + month_num INTEGER NOT NULL, + month_name VARCHAR(10) NOT NULL, + quarter VARCHAR(2) NOT NULL, + year INTEGER NOT NULL, + CONSTRAINT pk_dim_date PRIMARY KEY (date_key) +); + +-- ============================================================ +-- FACT TABLE +-- ============================================================ + +-- ORDERS: each row is one order. +-- Two date FKs — both reference the same physical DIM_DATE. +CREATE OR REPLACE TABLE ORDERS ( + order_id INTEGER NOT NULL, + customer_name VARCHAR(50) NOT NULL, + order_date DATE NOT NULL, + ship_date DATE NOT NULL, + amount NUMBER(10,2) NOT NULL, + CONSTRAINT pk_orders PRIMARY KEY (order_id) +); diff --git a/skills/semantic-view-patterns/snippets/role_playing_dimensions/seed_data.sql b/skills/semantic-view-patterns/snippets/role_playing_dimensions/seed_data.sql new file mode 100644 index 00000000..3cf9ef22 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/role_playing_dimensions/seed_data.sql @@ -0,0 +1,52 @@ +-- Role-Playing Dimensions: Seed Data +-- +-- 8 orders spanning Nov 2024 – Feb 2025. +-- Four orders ship in a different month than they were placed — +-- that's what makes the role-playing demo interesting. 
+-- +-- Revenue by ORDER_MONTH: Nov=$1,300 Dec=$1,500 Jan=$1,350 Feb=$1,750 (total $5,900) +-- Revenue by SHIP_MONTH: Nov=$500 Dec=$1,100 Jan=$1,650 Feb=$1,550 Mar=$1,100 + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- DIM_DATE — one row for every date referenced by ORDERS +-- ============================================================ + +INSERT INTO DIM_DATE (date_key, month_num, month_name, quarter, year) VALUES + -- November 2024 dates + ('2024-11-15', 11, 'November', 'Q4', 2024), + ('2024-11-20', 11, 'November', 'Q4', 2024), + ('2024-11-28', 11, 'November', 'Q4', 2024), + -- December 2024 dates + ('2024-12-01', 12, 'December', 'Q4', 2024), + ('2024-12-03', 12, 'December', 'Q4', 2024), + ('2024-12-05', 12, 'December', 'Q4', 2024), + ('2024-12-20', 12, 'December', 'Q4', 2024), + -- January 2025 dates + ('2025-01-04', 1, 'January', 'Q1', 2025), + ('2025-01-10', 1, 'January', 'Q1', 2025), + ('2025-01-15', 1, 'January', 'Q1', 2025), + ('2025-01-25', 1, 'January', 'Q1', 2025), + -- February 2025 dates + ('2025-02-02', 2, 'February', 'Q1', 2025), + ('2025-02-14', 2, 'February', 'Q1', 2025), + ('2025-02-20', 2, 'February', 'Q1', 2025), + ('2025-02-28', 2, 'February', 'Q1', 2025), + -- March 2025 dates + ('2025-03-05', 3, 'March', 'Q1', 2025); + +-- ============================================================ +-- ORDERS — 8 rows; 4 cross-month shippers marked with ← +-- ============================================================ + +INSERT INTO ORDERS (order_id, customer_name, order_date, ship_date, amount) VALUES + (1, 'Acme Corp', '2024-11-15', '2024-11-20', 500.00), -- same month + (2, 'Beta LLC', '2024-11-28', '2024-12-03', 800.00), -- ← Nov order, Dec ship + (3, 'Gamma Inc', '2024-12-01', '2024-12-05', 300.00), -- same month + (4, 'Delta Co', '2024-12-20', '2025-01-04', 1200.00), -- ← Dec order, Jan ship (crosses year!) 
+ (5, 'Acme Corp', '2025-01-10', '2025-01-15', 450.00), -- same month + (6, 'Epsilon Ltd', '2025-01-25', '2025-02-02', 900.00), -- ← Jan order, Feb ship + (7, 'Beta LLC', '2025-02-14', '2025-02-20', 650.00), -- same month + (8, 'Gamma Inc', '2025-02-28', '2025-03-05', 1100.00); -- ← Feb order, Mar ship diff --git a/skills/semantic-view-patterns/snippets/role_playing_dimensions/semantic_view.sql b/skills/semantic-view-patterns/snippets/role_playing_dimensions/semantic_view.sql new file mode 100644 index 00000000..4b932299 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/role_playing_dimensions/semantic_view.sql @@ -0,0 +1,94 @@ +-- Role-Playing Dimensions: Semantic View DDL +-- +-- Pattern: alias the same physical DIM_DATE table twice under different names. +-- Each alias gets its own dedicated logical dimension names — no USING clause needed. +-- +-- Syntax reminder: entity.logical_name AS physical_column + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.ORDERS_RPD_SV + + TABLES ( + orders AS SNIPPETS.PUBLIC.ORDERS + PRIMARY KEY (ORDER_ID) + + -- The same physical DIM_DATE table aliased under two logical names. + -- 'order_date_dim' will be joined on ORDER_DATE. + -- 'ship_date_dim' will be joined on SHIP_DATE. + -- The SV engine treats them as completely separate entities. + , order_date_dim AS SNIPPETS.PUBLIC.DIM_DATE + PRIMARY KEY (DATE_KEY) + COMMENT = 'Date dimension for when the order was placed' + + , ship_date_dim AS SNIPPETS.PUBLIC.DIM_DATE + PRIMARY KEY (DATE_KEY) + COMMENT = 'Date dimension for when the order was shipped' + ) + + RELATIONSHIPS ( + -- orders.ORDER_DATE → the order-date role of DIM_DATE + orders_to_order_date AS orders(ORDER_DATE) + REFERENCES order_date_dim(DATE_KEY) + + -- orders.SHIP_DATE → the ship-date role of DIM_DATE + -- Identical physical table; different logical role; no conflict. 
+ , orders_to_ship_date AS orders(SHIP_DATE) + REFERENCES ship_date_dim(DATE_KEY) + ) + + FACTS ( + -- logical: revenue → physical column: AMOUNT + orders.revenue AS AMOUNT + ) + + DIMENSIONS ( + -- logical: customer_name → physical: CUSTOMER_NAME + orders.customer_name AS CUSTOMER_NAME + WITH SYNONYMS ('customer', 'buyer', 'account') + + -- ORDER DATE role — logical names are unique per role; physical columns come from DIM_DATE + , order_date_dim.order_year AS YEAR + WITH SYNONYMS ('order year', 'year ordered', 'placed year') + , order_date_dim.order_quarter AS QUARTER + WITH SYNONYMS ('order quarter', 'quarter ordered') + , order_date_dim.order_month_num AS MONTH_NUM + WITH SYNONYMS ('order month number', 'month number ordered') + , order_date_dim.order_month_name AS MONTH_NAME + WITH SYNONYMS ('order month', 'order month name', 'month ordered') + + -- SHIP DATE role — same physical columns (YEAR, QUARTER, etc.), unique logical names + , ship_date_dim.ship_year AS YEAR + WITH SYNONYMS ('ship year', 'shipped year', 'fulfillment year') + , ship_date_dim.ship_quarter AS QUARTER + WITH SYNONYMS ('ship quarter', 'shipped quarter') + , ship_date_dim.ship_month_num AS MONTH_NUM + WITH SYNONYMS ('ship month number', 'month number shipped') + , ship_date_dim.ship_month_name AS MONTH_NAME + WITH SYNONYMS ('ship month', 'ship month name', 'month shipped') + ) + + METRICS ( + -- logical: total_revenue → physical: SUM(AMOUNT) + orders.total_revenue AS SUM(AMOUNT) + WITH SYNONYMS ('revenue', 'sales', 'total sales') + COMMENT = 'Sum of order amounts' + + -- logical: order_count → physical: COUNT(ORDER_ID) + , orders.order_count AS COUNT(ORDER_ID) + WITH SYNONYMS ('orders', 'order count', 'number of orders') + COMMENT = 'Number of orders' + ) + + COMMENT = 'Orders with two independent date roles — order date and ship date — both backed by the same physical DIM_DATE table. 
Demonstrates role-playing dimensions: aliasing a dimension table multiple times so each role gets its own logical dimension names without any USING clause.'

  AI_SQL_GENERATION 'This SV uses two aliases of the same physical DIM_DATE table to model order date and ship date as independent roles.

ORDER DATE dimensions: order_date_dim.order_year, order_date_dim.order_quarter, order_date_dim.order_month_num, order_date_dim.order_month_name
SHIP DATE dimensions: ship_date_dim.ship_year, ship_date_dim.ship_quarter, ship_date_dim.ship_month_num, ship_date_dim.ship_month_name

Use ORDER date dimensions when the question is about when orders were placed.
Use SHIP date dimensions when the question is about when orders shipped or were fulfilled.

You can combine both in one query (e.g. order_date_dim.order_month_name + ship_date_dim.ship_month_name) to see the cross-tab of order date vs ship date — useful for lead-time or fulfillment lag analysis.';
diff --git a/skills/semantic-view-patterns/snippets/role_playing_dimensions/semantic_view.yaml b/skills/semantic-view-patterns/snippets/role_playing_dimensions/semantic_view.yaml
new file mode 100644
index 00000000..1a3fe685
--- /dev/null
+++ b/skills/semantic-view-patterns/snippets/role_playing_dimensions/semantic_view.yaml
@@ -0,0 +1,144 @@
+# Role-Playing Dimensions: Semantic View YAML
+#
+# This file is the canonical YAML for ORDERS_RPD_SV, exported via:
+#   SELECT SYSTEM$READ_YAML_FROM_SEMANTIC_VIEW('SNIPPETS.PUBLIC.ORDERS_RPD_SV');
+# then lightly formatted and annotated.
+#
+# Deploy with:
+#   CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$);
+#
+# Verify without deploying:
+#   CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE);
+
+name: ORDERS_RPD_SV
+description: Orders with two independent date roles backed by the same physical DIM_DATE table.
+ +tables: + - name: ORDERS + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: ORDERS + primary_key: + columns: + - ORDER_ID + dimensions: + - name: CUSTOMER_NAME + synonyms: + - customer + - buyer + - account + expr: CUSTOMER_NAME + data_type: VARCHAR(50) + facts: + - name: REVENUE + expr: AMOUNT + data_type: NUMBER(10,2) + access_modifier: public_access + metrics: + - name: TOTAL_REVENUE + synonyms: + - revenue + - sales + - total sales + expr: SUM(AMOUNT) + access_modifier: public_access + - name: ORDER_COUNT + synonyms: + - orders + - order count + - number of orders + expr: COUNT(ORDER_ID) + access_modifier: public_access + + # Role 1: date dimension for when the order was placed + # Same physical DIM_DATE table, aliased as ORDER_DATE_DIM + - name: ORDER_DATE_DIM + description: Date dimension for when the order was placed + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: DIM_DATE + primary_key: + columns: + - DATE_KEY + dimensions: + - name: ORDER_YEAR + synonyms: + - order year + - year ordered + - placed year + expr: YEAR + data_type: NUMBER + - name: ORDER_QUARTER + synonyms: + - order quarter + - quarter ordered + expr: QUARTER + data_type: VARCHAR(2) + - name: ORDER_MONTH_NUM + synonyms: + - order month number + - month number ordered + expr: MONTH_NUM + data_type: NUMBER + - name: ORDER_MONTH_NAME + synonyms: + - order month + - order month name + - month ordered + expr: MONTH_NAME + data_type: VARCHAR(10) + + # Role 2: same physical DIM_DATE table, aliased as SHIP_DATE_DIM + # Unique logical dimension names prevent any ambiguity with ORDER_DATE_DIM + - name: SHIP_DATE_DIM + description: Date dimension for when the order was shipped + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: DIM_DATE + primary_key: + columns: + - DATE_KEY + dimensions: + - name: SHIP_YEAR + synonyms: + - ship year + - shipped year + - fulfillment year + expr: YEAR + data_type: NUMBER + - name: SHIP_QUARTER + synonyms: + - 
ship quarter + - shipped quarter + expr: QUARTER + data_type: VARCHAR(2) + - name: SHIP_MONTH_NUM + synonyms: + - ship month number + - month number shipped + expr: MONTH_NUM + data_type: NUMBER + - name: SHIP_MONTH_NAME + synonyms: + - ship month + - ship month name + - month shipped + expr: MONTH_NAME + data_type: VARCHAR(10) + +relationships: + - name: ORDERS_TO_ORDER_DATE + left_table: ORDERS + right_table: ORDER_DATE_DIM + relationship_columns: + - left_column: ORDER_DATE + right_column: DATE_KEY + - name: ORDERS_TO_SHIP_DATE + left_table: ORDERS + right_table: SHIP_DATE_DIM + relationship_columns: + - left_column: SHIP_DATE + right_column: DATE_KEY diff --git a/skills/semantic-view-patterns/snippets/row_access_policies/README.md b/skills/semantic-view-patterns/snippets/row_access_policies/README.md new file mode 100644 index 00000000..c933966b --- /dev/null +++ b/skills/semantic-view-patterns/snippets/row_access_policies/README.md @@ -0,0 +1,102 @@ +# Row Access Policies with Semantic Views + +## The Problem + +You have a fact table and a dimension table. You want users to see only metrics for the dimensions they are authorized to access — for example, a regional sales analyst should see only their region's revenue, with no visibility into other regions. + +The natural instinct is to apply a Row Access Policy (RAP) to the dimension table. But in a Semantic View, this creates an unexpected result: fact rows for filtered-out dimensions still appear — they just show up with **NULL dimension values**. The aggregated metrics from those rows are visible in the NULL row, leaking both the existence and magnitude of data the user should not see. + +**Example in this snippet**: Four sales regions (Northeast, Southeast, Northwest, Southwest). `REGION_A_ANALYST` should see only Northeast and Southeast. 
With a RAP on the dimension table only, they instead see: + +``` +Northeast $1,250 2 orders ← correct +Southeast $750 2 orders ← correct +NULL $2,600 4 orders ← should not exist +``` + +The NULL row reveals that $2,600 of revenue exists in other regions — even though those region names are hidden. + +## Why This Happens + +The SV engine generates a **LEFT JOIN** between the fact table and the dimension table. The RAP filters rows in the dimension table, but LEFT JOIN semantics mean unmatched fact rows survive — they just receive NULL for every dimension column. Those rows are then grouped under a single NULL dimension value in the result. + +A RAP on the dimension table controls **what dimension data is visible**, but it does not control **which fact rows are included**. + +## Two Workarounds + +### Workaround 1: Helper view with inner join + +Create a SQL view that **inner-joins** the fact table to the dimension table, and use that view as the fact entity in the Semantic View instead of the raw fact table: + +```sql +CREATE OR REPLACE VIEW ORDERS_FILTERED AS + SELECT o.order_id, o.region_id, o.order_date, o.amount + FROM ORDERS o + INNER JOIN SALES_REGIONS r ON o.region_id = r.region_id; +``` + +When the RAP hides a `SALES_REGIONS` row, the `INNER JOIN` also drops the corresponding `ORDERS` rows from the view. No orphaned fact rows reach the SV — no NULL rows appear. + +**Best for**: Situations where you cannot or should not alter the physical fact table (e.g., it is shared by other SVs or queries that must not be filtered). + +**Trade-off**: Adds an intermediate view object to manage; the filter lives in two places (view DDL + RAP on dimension table). + +### Workaround 2: Apply the RAP to the fact table + +Apply the same Row Access Policy directly to the fact table: + +```sql +ALTER TABLE ORDERS + ADD ROW ACCESS POLICY region_access_policy ON (REGION_ID); +``` + +Now fact rows are filtered at the source — before any join occurs. 
The original SV (without the helper view) works correctly. No NULL rows, no leaked metrics. + +**Best for**: Most cases. Simpler than the helper view approach and more robust — the filter is enforced regardless of how the data is queried (SV, direct SQL, etc.). + +**Trade-off**: Modifies the underlying table. All queries against `ORDERS` — not just through the SV — will be subject to the RAP. + +## Comparison + +| | RAP on dimension only (anti-pattern) | Workaround 1: helper view | Workaround 2: RAP on fact | +|--|--|--|--| +| NULL rows in results? | **Yes — data leakage** | No | No | +| Modifies underlying fact table? | No | No | **Yes** | +| Intermediate view required? | No | **Yes** | No | +| Applies outside the SV too? | No | No | **Yes** | +| Simpler to maintain? | — | Moderate | **Yes** | + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **Power BI** | Row-level security (RLS) defined on the semantic model. Applied at the model layer, not the table. Fan-out/NULL behavior depends on report layout and cross-filter direction. | +| **Tableau** | User filters or row-level security via data source filters or `USERNAME()` / `ISMEMBEROF()` calculations. Must be applied to every data source that joins to a sensitive table. | +| **dbt** | No native row-level security. Typically handled at the warehouse layer via Snowflake policies or views, then modeled in dbt. | +| **LookML** | `access_filter` or `sql_where` on an Explore. Looker applies this as a WHERE clause on the SQL query, effectively filtering both the dimension and any joined facts. | +| **Raw SQL** | Requires a WHERE clause or JOIN condition on every query — no enforcement guarantee. | + +Snowflake RAPs give you centralized, policy-driven enforcement at the storage layer. The challenge is understanding _where_ in the data pipeline to apply them when using the semantic layer. 
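The LEFT JOIN leak and the inner-join fix can be reproduced in miniature outside Snowflake. The following sketch uses Python's sqlite3, approximating the RAP as a filtered subquery over the dimension table — an illustration under that assumption, not Snowflake's actual execution plan (table names mirror this snippet):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_regions (region_id TEXT PRIMARY KEY, region_name TEXT);
    CREATE TABLE orders (order_id INTEGER, region_id TEXT, amount REAL);
    INSERT INTO sales_regions VALUES ('R001','Northeast'), ('R002','Southeast'),
                                     ('R003','Northwest'), ('R004','Southwest');
    INSERT INTO orders VALUES (1,'R001',500), (2,'R001',750), (3,'R002',300), (4,'R002',450),
                              (5,'R003',600), (6,'R003',800), (7,'R004',250), (8,'R004',950);
""")

# Anti-pattern: the "RAP" filters only the dimension side. The LEFT JOIN keeps
# the R003/R004 fact rows and groups them under a NULL region name.
leaky = conn.execute("""
    SELECT r.region_name, SUM(o.amount)
    FROM orders o
    LEFT JOIN (SELECT * FROM sales_regions WHERE region_id IN ('R001','R002')) r
           ON o.region_id = r.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name IS NULL, r.region_name
""").fetchall()
print(leaky)  # [('Northeast', 1250.0), ('Southeast', 750.0), (None, 2600.0)]

# Workaround: an INNER JOIN (the ORDERS_FILTERED helper view) drops the
# orphaned fact rows, so no NULL row leaks the hidden regions' revenue.
fixed = conn.execute("""
    SELECT r.region_name, SUM(o.amount)
    FROM orders o
    JOIN (SELECT * FROM sales_regions WHERE region_id IN ('R001','R002')) r
      ON o.region_id = r.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(fixed)  # [('Northeast', 1250.0), ('Southeast', 750.0)]
```

The `(None, 2600.0)` row in the first result is the leak this snippet demonstrates: the hidden regions' aggregate survives even though their names do not.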
+ +## What Doesn't Work + +- **RAP on dimension table alone**: Causes NULL rows in SEMANTIC_VIEW() results for any fact rows whose dimension join is blocked. This is the core anti-pattern this snippet addresses. +- **Filtering in AI_SQL_GENERATION instructions**: Instructing Cortex Analyst to "only return results for the user's region" is not a security boundary — it can be bypassed and is not enforced by the engine. +- **Relying on SEMANTIC_VIEW() WHERE clause**: A caller can omit or override WHERE conditions. Policy enforcement must live at the table or view layer. + +## Docs + +- [Row Access Policies — Snowflake Documentation](https://docs.snowflake.com/en/user-guide/security-row-intro) +- [ALTER TABLE — ADD ROW ACCESS POLICY](https://docs.snowflake.com/en/sql-reference/sql/alter-table) +- [Semantic Views — Overview](https://docs.snowflake.com/en/user-guide/views-semantic) + +## Files + +> **Note**: This snippet creates a dedicated environment (`RAP_TEST` database, `REGION_A_ANALYST` / `REGION_B_ANALYST` roles). The `--db` / `--schema` arguments to `run_snippet.py` are ignored — all objects are hardcoded to `RAP_TEST.PUBLIC`. The analyst roles need USAGE on an existing warehouse (Tutorial mode handles this automatically). Run the cleanup block in `queries.sql` when done. 
+ +| File | Description | +|------|-------------| +| `schema.sql` | `ORDERS` and `SALES_REGIONS` tables; roles `REGION_A_ANALYST` and `REGION_B_ANALYST`; RAP applied to dimension table (anti-pattern setup) | +| `seed_data.sql` | 4 regions + 8 orders (2 per region) with known per-region totals | +| `semantic_view.sql` | Anti-pattern SV + helper view + workaround 1 SV | +| `queries.sql` | Role-switching demo: NULL-row problem → workaround 1 → workaround 2 → cleanup | diff --git a/skills/semantic-view-patterns/snippets/row_access_policies/queries.sql b/skills/semantic-view-patterns/snippets/row_access_policies/queries.sql new file mode 100644 index 00000000..9ee8aea6 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/row_access_policies/queries.sql @@ -0,0 +1,152 @@ +-- Row Access Policy Example: Queries +-- +-- Demonstrates the NULL-row problem and both workarounds in sequence. +-- Run schema.sql, seed_data.sql, and semantic_view.sql first. +-- +-- Roles used: +-- REGION_A_ANALYST — can see R001 (Northeast) and R002 (Southeast) only +-- REGION_B_ANALYST — can see R003 (Northwest) and R004 (Southwest) only +-- SYSADMIN — unrestricted access + +-- Queries use whatever warehouse the analyst roles have been granted USAGE on. +-- In Tutorial mode, this is handled by the skill before running queries. 
+-- If running standalone: GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE REGION_A_ANALYST;
+--                        GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE REGION_B_ANALYST;
+USE DATABASE RAP_TEST;
+USE SCHEMA PUBLIC;
+
+
+-- ============================================================
+-- BASELINE: SYSADMIN sees all four regions correctly
+-- ============================================================
+
+USE SECONDARY ROLES NONE;
+USE ROLE SYSADMIN;
+
+-- Expected: 4 rows, total $4,600
+SELECT * FROM SEMANTIC_VIEW(
+    RAP_TEST.PUBLIC.SALES_BY_REGION_SV
+    DIMENSIONS regions.region_name
+    METRICS orders.total_revenue, orders.order_count
+)
+ORDER BY total_revenue DESC;
+-- Northwest  $1,400  2
+-- Northeast  $1,250  2
+-- Southwest  $1,200  2
+-- Southeast  $750    2
+
+
+-- ============================================================
+-- ANTI-PATTERN: RAP on dimension table only
+-- Expected problem: REGION_A sees 2 allowed regions PLUS a NULL row
+-- that aggregates all revenue from the filtered-out regions (R003+R004).
+-- ============================================================
+
+USE ROLE REGION_A_ANALYST;
+USE SECONDARY ROLES NONE;
+
+-- Expected (broken): 3 rows instead of 2
+-- Northeast  $1,250  2
+-- Southeast  $750    2
+-- NULL       $2,600  4  ← R003+R004 orders: fact rows survive, dimension is NULL
+SELECT * FROM SEMANTIC_VIEW(
+    RAP_TEST.PUBLIC.SALES_BY_REGION_SV
+    DIMENSIONS regions.region_name
+    METRICS orders.total_revenue, orders.order_count
+)
+ORDER BY region_name NULLS LAST;
+
+-- WHY THIS HAPPENS:
+-- The SV engine generates a LEFT JOIN between ORDERS and SALES_REGIONS.
+-- The RAP filters SALES_REGIONS rows for R003 and R004, so those region rows
+-- are invisible. The LEFT JOIN still includes the ORDERS rows for R003/R004,
+-- but produces NULL for every dimension column. Those orphaned orders are
+-- grouped together under a single NULL dimension row.
+-- +-- The NULL row is not just cosmetically wrong — it leaks information: +-- REGION_A_ANALYST can now infer that $2,600 of revenue exists in regions +-- they are not supposed to see at all. + + +-- ============================================================ +-- WORKAROUND 1: Helper view with inner join +-- The ORDERS_FILTERED view inner-joins ORDERS to SALES_REGIONS. +-- When the RAP hides a SALES_REGIONS row, the INNER JOIN also drops +-- the corresponding ORDERS row — no orphaned facts reach the SV. +-- ============================================================ + +-- Expected (correct): 2 rows, no NULL +-- Northeast $1,250 2 +-- Southeast $750 2 +SELECT * FROM SEMANTIC_VIEW( + RAP_TEST.PUBLIC.SALES_BY_REGION_VIEW_SV + DIMENSIONS regions.region_name + METRICS orders.total_revenue, orders.order_count +) +ORDER BY total_revenue DESC; + + +-- ============================================================ +-- WORKAROUND 2: Apply the RAP directly to the fact table +-- Adding the same RAP to ORDERS means the fact rows themselves are +-- filtered before any join occurs. The original (simpler) SV works +-- correctly without needing an intermediate helper view. +-- ============================================================ + +USE ROLE SYSADMIN; + +ALTER TABLE RAP_TEST.PUBLIC.ORDERS + ADD ROW ACCESS POLICY RAP_TEST.PUBLIC.region_access_policy ON (REGION_ID); + +USE ROLE REGION_A_ANALYST; +USE SECONDARY ROLES NONE; + +-- Same SV as the anti-pattern — but now ORDERS itself is also filtered. 
+-- Expected (correct): 2 rows, no NULL +-- Northeast $1,250 2 +-- Southeast $750 2 +SELECT * FROM SEMANTIC_VIEW( + RAP_TEST.PUBLIC.SALES_BY_REGION_SV + DIMENSIONS regions.region_name + METRICS orders.total_revenue, orders.order_count +) +ORDER BY total_revenue DESC; + +-- REGION_B_ANALYST sees their two regions +USE ROLE REGION_B_ANALYST; +USE SECONDARY ROLES NONE; + +-- Expected: 2 rows +-- Northwest $1,400 2 +-- Southwest $1,200 2 +SELECT * FROM SEMANTIC_VIEW( + RAP_TEST.PUBLIC.SALES_BY_REGION_SV + DIMENSIONS regions.region_name + METRICS orders.total_revenue, orders.order_count +) +ORDER BY total_revenue DESC; + + +-- ============================================================ +-- HOW THE JOIN DIRECTION MATTERS: +-- RAP on dimension → LEFT JOIN survives, orphaned facts get NULL dims. +-- RAP on fact → fact rows are filtered first; no orphaned rows exist. +-- RAP on both → belt-and-suspenders; the fact-table RAP alone is sufficient. +-- +-- Helper view approach: the INNER JOIN in the view mimics fact-table filtering +-- without modifying the underlying table. Useful when you cannot or prefer not +-- to alter the physical table (e.g., shared tables used by other SVs or queries +-- that should NOT be filtered). 
+-- ============================================================ + + +-- ============================================================ +-- CLEANUP — run to remove all objects created by this snippet +-- ============================================================ + +USE ROLE SYSADMIN; +DROP DATABASE IF EXISTS RAP_TEST; + +USE ROLE SECURITYADMIN; +DROP ROLE IF EXISTS REGION_A_ANALYST; +DROP ROLE IF EXISTS REGION_B_ANALYST; diff --git a/skills/semantic-view-patterns/snippets/row_access_policies/schema.sql b/skills/semantic-view-patterns/snippets/row_access_policies/schema.sql new file mode 100644 index 00000000..9fd3e9e2 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/row_access_policies/schema.sql @@ -0,0 +1,88 @@ +-- Row Access Policy Example: Schema Setup +-- +-- ⚠️ Requires ACCOUNTADMIN (or SECURITYADMIN + SYSADMIN). +-- Creates a dedicated environment: database RAP_TEST and +-- roles REGION_A_ANALYST and REGION_B_ANALYST. +-- Does NOT create a warehouse — grant the analyst roles USAGE on an existing +-- warehouse before running queries.sql (see Tutorial mode instructions). +-- Does NOT use the --db / --schema arguments from run_snippet.py. 
+
+USE SECONDARY ROLES NONE;
+
+-- ============================================================
+-- ROLES
+-- ============================================================
+
+USE ROLE SECURITYADMIN;
+
+CREATE ROLE IF NOT EXISTS REGION_A_ANALYST;
+CREATE ROLE IF NOT EXISTS REGION_B_ANALYST;
+
+GRANT ROLE REGION_A_ANALYST TO ROLE SYSADMIN;
+GRANT ROLE REGION_B_ANALYST TO ROLE SYSADMIN;
+
+-- ============================================================
+-- DATABASE & SCHEMA
+-- ============================================================
+
+-- SECURITYADMIN cannot create databases; switch back to SYSADMIN first.
+USE ROLE SYSADMIN;
+
+CREATE DATABASE IF NOT EXISTS RAP_TEST;
+CREATE SCHEMA IF NOT EXISTS RAP_TEST.PUBLIC;
+
+GRANT USAGE ON DATABASE RAP_TEST TO ROLE REGION_A_ANALYST;
+GRANT USAGE ON DATABASE RAP_TEST TO ROLE REGION_B_ANALYST;
+GRANT USAGE ON SCHEMA RAP_TEST.PUBLIC TO ROLE REGION_A_ANALYST;
+GRANT USAGE ON SCHEMA RAP_TEST.PUBLIC TO ROLE REGION_B_ANALYST;
+
+-- ============================================================
+-- TABLES
+-- ============================================================
+
+USE DATABASE RAP_TEST;
+USE SCHEMA PUBLIC;
+
+-- Dimension table: one row per sales region
+CREATE OR REPLACE TABLE RAP_TEST.PUBLIC.SALES_REGIONS (
+    region_id         VARCHAR(10) NOT NULL,
+    region_name       VARCHAR(50) NOT NULL,
+    reporting_manager VARCHAR(50) NOT NULL,
+    CONSTRAINT pk_regions PRIMARY KEY (region_id)
+);
+
+-- Fact table: individual orders
+CREATE OR REPLACE TABLE RAP_TEST.PUBLIC.ORDERS (
+    order_id   INTEGER NOT NULL,
+    region_id  VARCHAR(10) NOT NULL,
+    order_date DATE NOT NULL,
+    amount     NUMBER(10,2) NOT NULL,
+    CONSTRAINT pk_orders PRIMARY KEY (order_id)
+);
+
+-- ============================================================
+-- ROW ACCESS POLICY
+-- ============================================================
+
+-- REGION_A_ANALYST: Northeast (R001) + Southeast (R002) only.
+-- REGION_B_ANALYST: Northwest (R003) + Southwest (R004) only.
+-- SYSADMIN / ACCOUNTADMIN: unrestricted access.
+CREATE OR REPLACE ROW ACCESS POLICY RAP_TEST.PUBLIC.region_access_policy + AS (region_id VARCHAR) RETURNS BOOLEAN -> + CASE + WHEN CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN') THEN TRUE + WHEN CURRENT_ROLE() = 'REGION_A_ANALYST' THEN region_id IN ('R001', 'R002') + WHEN CURRENT_ROLE() = 'REGION_B_ANALYST' THEN region_id IN ('R003', 'R004') + ELSE FALSE + END; + +-- ANTI-PATTERN SETUP: Apply the RAP to the dimension table only. +-- This is the configuration that causes NULL rows — see queries.sql. +ALTER TABLE RAP_TEST.PUBLIC.SALES_REGIONS + ADD ROW ACCESS POLICY RAP_TEST.PUBLIC.region_access_policy ON (REGION_ID); + +-- ============================================================ +-- GRANTS TO ANALYST ROLES +-- ============================================================ + +GRANT SELECT ON TABLE RAP_TEST.PUBLIC.ORDERS TO ROLE REGION_A_ANALYST; +GRANT SELECT ON TABLE RAP_TEST.PUBLIC.ORDERS TO ROLE REGION_B_ANALYST; +GRANT SELECT ON TABLE RAP_TEST.PUBLIC.SALES_REGIONS TO ROLE REGION_A_ANALYST; +GRANT SELECT ON TABLE RAP_TEST.PUBLIC.SALES_REGIONS TO ROLE REGION_B_ANALYST; diff --git a/skills/semantic-view-patterns/snippets/row_access_policies/seed_data.sql b/skills/semantic-view-patterns/snippets/row_access_policies/seed_data.sql new file mode 100644 index 00000000..101bfc5b --- /dev/null +++ b/skills/semantic-view-patterns/snippets/row_access_policies/seed_data.sql @@ -0,0 +1,33 @@ +-- Row Access Policy Example: Seed Data +-- Run schema.sql first. 
+ +USE DATABASE RAP_TEST; +USE SCHEMA PUBLIC; + +-- Four sales regions — two accessible per analyst role +INSERT INTO SALES_REGIONS VALUES + ('R001', 'Northeast', 'Alice Johnson'), + ('R002', 'Southeast', 'Bob Smith'), + ('R003', 'Northwest', 'Carol Davis'), + ('R004', 'Southwest', 'David Wilson'); + +-- Eight orders, two per region +INSERT INTO ORDERS VALUES + (1, 'R001', '2024-01-15', 500.00), + (2, 'R001', '2024-02-20', 750.00), + (3, 'R002', '2024-01-10', 300.00), + (4, 'R002', '2024-03-05', 450.00), + (5, 'R003', '2024-01-25', 600.00), + (6, 'R003', '2024-02-14', 800.00), + (7, 'R004', '2024-01-08', 250.00), + (8, 'R004', '2024-03-20', 950.00); + +-- Expected totals by region: +-- R001 Northeast $1,250 ← REGION_A_ANALYST can see +-- R002 Southeast $750 ← REGION_A_ANALYST can see +-- R003 Northwest $1,400 ← REGION_B_ANALYST can see +-- R004 Southwest $1,200 ← REGION_B_ANALYST can see +-- +-- REGION_A total: $2,000 across 4 orders +-- REGION_B total: $2,600 across 4 orders +-- Full total: $4,600 across 8 orders diff --git a/skills/semantic-view-patterns/snippets/row_access_policies/semantic_view.sql b/skills/semantic-view-patterns/snippets/row_access_policies/semantic_view.sql new file mode 100644 index 00000000..775500e6 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/row_access_policies/semantic_view.sql @@ -0,0 +1,119 @@ +-- Row Access Policy Example: Semantic View DDL +-- Run schema.sql and seed_data.sql first. +-- +-- Three objects are created here: +-- 1. SALES_BY_REGION_SV — anti-pattern: RAP on dim only → NULL row for filtered facts +-- 2. ORDERS_FILTERED — helper view (workaround 1): inner join drops unmatched fact rows +-- 3. SALES_BY_REGION_VIEW_SV — workaround 1: uses ORDERS_FILTERED as the fact entity +-- +-- Workaround 2 (apply RAP directly to the ORDERS fact table) is demonstrated +-- in queries.sql after the anti-pattern behaviour is shown. 
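+--
+-- For reference, Workaround 2 is a single statement (shown here as a comment
+-- only; queries.sql runs it after demonstrating the anti-pattern). It reuses
+-- the policy created in schema.sql:
+--
+--   ALTER TABLE RAP_TEST.PUBLIC.ORDERS
+--     ADD ROW ACCESS POLICY RAP_TEST.PUBLIC.region_access_policy ON (REGION_ID);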
+ +USE ROLE SYSADMIN; +USE DATABASE RAP_TEST; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- ANTI-PATTERN SV +-- The RAP is attached to SALES_REGIONS (the dimension). When the SV +-- engine joins ORDERS to SALES_REGIONS, filtered-out region rows produce +-- NULL dimension values for the orphaned ORDERS rows. +-- ============================================================ + +CREATE OR REPLACE SEMANTIC VIEW RAP_TEST.PUBLIC.SALES_BY_REGION_SV + + TABLES ( + orders AS RAP_TEST.PUBLIC.ORDERS + PRIMARY KEY (ORDER_ID), + + regions AS RAP_TEST.PUBLIC.SALES_REGIONS + PRIMARY KEY (REGION_ID) + ) + + RELATIONSHIPS ( + orders_to_regions AS orders(REGION_ID) REFERENCES regions + ) + + FACTS ( + orders.revenue AS AMOUNT + COMMENT = 'Order amount in USD' + ) + + DIMENSIONS ( + orders.order_date AS ORDER_DATE, + regions.region_name AS REGION_NAME, + regions.reporting_manager AS REPORTING_MANAGER + ) + + METRICS ( + orders.total_revenue AS SUM(AMOUNT) + WITH SYNONYMS ('revenue', 'sales', 'total sales') + COMMENT = 'Sum of order amounts in USD', + + orders.order_count AS COUNT(ORDER_ID) + WITH SYNONYMS ('orders', 'number of orders', 'order volume') + COMMENT = 'Number of orders' + ) + + COMMENT = 'Anti-pattern: RAP on dimension only. Query as REGION_A_ANALYST to see the NULL-row problem.'; + +GRANT SELECT ON SEMANTIC VIEW RAP_TEST.PUBLIC.SALES_BY_REGION_SV TO ROLE REGION_A_ANALYST; +GRANT SELECT ON SEMANTIC VIEW RAP_TEST.PUBLIC.SALES_BY_REGION_SV TO ROLE REGION_B_ANALYST; + + +-- ============================================================ +-- WORKAROUND 1: HELPER VIEW WITH INNER JOIN +-- The view pre-filters ORDERS by inner-joining to SALES_REGIONS. +-- When the RAP hides a region row, the INNER JOIN also drops the +-- corresponding ORDERS rows — no orphaned facts, no NULL dimension rows. 
+-- ============================================================ + +CREATE OR REPLACE VIEW RAP_TEST.PUBLIC.ORDERS_FILTERED AS + SELECT o.order_id, o.region_id, o.order_date, o.amount + FROM RAP_TEST.PUBLIC.ORDERS o + INNER JOIN RAP_TEST.PUBLIC.SALES_REGIONS r ON o.region_id = r.region_id; + +GRANT SELECT ON VIEW RAP_TEST.PUBLIC.ORDERS_FILTERED TO ROLE REGION_A_ANALYST; +GRANT SELECT ON VIEW RAP_TEST.PUBLIC.ORDERS_FILTERED TO ROLE REGION_B_ANALYST; + +-- Structurally identical to SALES_BY_REGION_SV, but the fact entity is +-- ORDERS_FILTERED (the pre-filtered view) instead of raw ORDERS. +CREATE OR REPLACE SEMANTIC VIEW RAP_TEST.PUBLIC.SALES_BY_REGION_VIEW_SV + + TABLES ( + orders AS RAP_TEST.PUBLIC.ORDERS_FILTERED + PRIMARY KEY (ORDER_ID), + + regions AS RAP_TEST.PUBLIC.SALES_REGIONS + PRIMARY KEY (REGION_ID) + ) + + RELATIONSHIPS ( + orders_to_regions AS orders(REGION_ID) REFERENCES regions + ) + + FACTS ( + orders.revenue AS AMOUNT + COMMENT = 'Order amount in USD' + ) + + DIMENSIONS ( + orders.order_date AS ORDER_DATE, + regions.region_name AS REGION_NAME, + regions.reporting_manager AS REPORTING_MANAGER + ) + + METRICS ( + orders.total_revenue AS SUM(AMOUNT) + WITH SYNONYMS ('revenue', 'sales', 'total sales') + COMMENT = 'Sum of order amounts in USD', + + orders.order_count AS COUNT(ORDER_ID) + WITH SYNONYMS ('orders', 'number of orders', 'order volume') + COMMENT = 'Number of orders' + ) + + COMMENT = 'Workaround 1: fact entity is a helper view that inner-joins to the dimension, eliminating NULL rows.'; + +GRANT SELECT ON SEMANTIC VIEW RAP_TEST.PUBLIC.SALES_BY_REGION_VIEW_SV TO ROLE REGION_A_ANALYST; +GRANT SELECT ON SEMANTIC VIEW RAP_TEST.PUBLIC.SALES_BY_REGION_VIEW_SV TO ROLE REGION_B_ANALYST; diff --git a/skills/semantic-view-patterns/snippets/row_access_policies/semantic_view.yaml b/skills/semantic-view-patterns/snippets/row_access_policies/semantic_view.yaml new file mode 100644 index 00000000..32c0a306 --- /dev/null +++ 
b/skills/semantic-view-patterns/snippets/row_access_policies/semantic_view.yaml @@ -0,0 +1,78 @@ +# Row Access Policies: Semantic View YAML +# +# ⚠️ ROW ACCESS POLICIES ARE DDL-ONLY: +# RAPs are attached to tables via ALTER TABLE ... ADD ROW ACCESS POLICY, +# which cannot be expressed in a YAML SV specification. The YAML defines +# the SV structure only; RAP setup must be done via DDL (semantic_view.sql). +# +# This YAML defines both the anti-pattern SV and the workaround SV. +# The workaround (using a helper view with INNER JOIN) can be expressed +# in YAML once the helper view is created as a physical object. +# +# Pre-requisite (run before deploying workaround YAML): +# CREATE OR REPLACE VIEW RAP_TEST.PUBLIC.ORDERS_FILTERED AS +# SELECT o.order_id, o.region_id, o.order_date, o.amount +# FROM RAP_TEST.PUBLIC.ORDERS o +# INNER JOIN RAP_TEST.PUBLIC.SALES_REGIONS r ON o.region_id = r.region_id; +# +# Deploy anti-pattern SV: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('RAP_TEST.PUBLIC', $$ $$); + +# ── ANTI-PATTERN SV ────────────────────────────────────────────────────────── +name: SALES_BY_REGION_SV +description: > + Anti-pattern: RAP on dimension only. Query as REGION_A_ANALYST to see the + NULL-row problem — filtered region rows produce NULL dimension values for + orphaned ORDERS rows. 
+ +tables: + - name: orders + description: Order transactions + base_table: + database: RAP_TEST + schema: PUBLIC + table: ORDERS + primary_key: + columns: [ORDER_ID] + dimensions: + - name: order_date + expr: ORDER_DATE + data_type: DATE + facts: + - name: revenue + description: Order amount in USD + expr: AMOUNT + data_type: NUMBER + metrics: + - name: total_revenue + synonyms: [revenue, sales, total sales] + description: Sum of order amounts in USD + expr: SUM(revenue) + - name: order_count + synonyms: [orders, number of orders, order volume] + description: Number of orders + expr: COUNT(ORDER_ID) + + - name: regions + description: Sales regions (has RAP attached) + base_table: + database: RAP_TEST + schema: PUBLIC + table: SALES_REGIONS + primary_key: + columns: [REGION_ID] + dimensions: + - name: region_name + expr: REGION_NAME + data_type: VARCHAR + - name: reporting_manager + expr: REPORTING_MANAGER + data_type: VARCHAR + +relationships: + - name: orders_to_regions + left_table: orders + right_table: regions + relationship_columns: + - left_column: REGION_ID + right_column: REGION_ID diff --git a/skills/semantic-view-patterns/snippets/scoped_dataset/README.md b/skills/semantic-view-patterns/snippets/scoped_dataset/README.md new file mode 100644 index 00000000..69436e76 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/scoped_dataset/README.md @@ -0,0 +1,80 @@ +# Scoped Dataset (SQL Query as Logical Table) + +> ⚠️ **Private Preview feature** — available only to selected accounts. Contact your Snowflake account team to enable. + +## The Problem + +Your source table contains data for multiple lines of business, regions, or tenants. You want to create a Semantic View that is **pre-scoped** to one subset — so that users of the SV only see data for their specific LOB/region, without adding a WHERE clause to every query. No intermediate view or additional table required. 
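+
+Without pre-scoping, every consumer query has to carry the filter itself. A sketch of the repetition this pattern removes, using this snippet's `sales_transactions` table:
+
+```sql
+-- Each query must repeat the LOB filter; omit it once and the scope leaks:
+SELECT region, SUM(amount) FROM sales_transactions WHERE lob = 'Retail' GROUP BY region;
+SELECT COUNT(*) FROM sales_transactions WHERE lob = 'Retail';
+```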
+ +A related use case: **pre-joining two tables** into a single logical entity inside the SV, so downstream consumers see it as one flat entity. + +## How You Might Express This Need + +- "I have a single `sales_transactions` table with a `lob` column. I want one SV for 'Retail' and one for 'Enterprise' — without creating separate physical tables." +- "My orders and order_items tables should look like one entity to analysts — join them inline before exposing." +- "Don't show SMB data in the EMEA team's SV." + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **SQL** | `CREATE VIEW retail_sales AS SELECT * FROM sales WHERE lob = 'Retail'` | +| **LookML** | `sql_table_name: (SELECT * FROM sales WHERE lob = 'Retail') ;;` in a view | +| **dbt** | Separate model with `WHERE` filter | +| **Power BI** | Row-level security or filtered Import table | +| **Tableau** | Custom SQL in the data source definition, or Data Source Filters applied at connect time. | + +## The SV Approach + +Use `AS (SELECT ...)` in the TABLES clause: + +```sql +CREATE SEMANTIC VIEW retail_orders_sv +TABLES ( + -- The alias 'orders' is required when using AS (...) + orders AS ( + SELECT * FROM sales_transactions WHERE lob = 'Retail' + ) PRIMARY KEY (transaction_id) +) +... 
+``` + +You can also **pre-join tables** into a single logical entity: +```sql +customer_info AS ( + SELECT * FROM customers JOIN addresses + ON customers.id = addresses.customer_id +) PRIMARY KEY (id) +``` + +## Key Rules (from docs) + +- The alias for the logical table is **required** when using `AS (...)` +- Session variables (`$var`) cannot be used in the inline query +- Same limitations as `CREATE VIEW` — no DDL, no DML, no transactions +- The filter is embedded in the SV DDL — changing it requires `CREATE OR REPLACE SEMANTIC VIEW` +- `DESCRIBE SEMANTIC VIEW` shows the inline query in a `DEFINITION` property (not `BASE_TABLE_NAME`) + +## Two SVs from One Table Pattern + +A powerful use: create **separate SVs for each LOB** from a single source table. Each SV has its own metrics and dimension scoping: + +``` +sales_transactions (all LOBs) + ↓ ↓ +retail_sv enterprise_sv +(lob='Retail') (lob='Enterprise') +``` + +## Docs + +- [Using an SQL query as a logical table in a semantic view ⚠️ Private Preview](https://docs.snowflake.com/en/LIMITEDACCESS/semantic-views-inline-view) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `sales_transactions` table with `lob` column | +| `seed_data.sql` | Transactions across Retail, Enterprise, and SMB LOBs | +| `semantic_view.sql` | Two separate SVs scoped by LOB + a join-inline example | +| `queries.sql` | Queries against each scoped SV; DESCRIBE to verify inline filter | diff --git a/skills/semantic-view-patterns/snippets/scoped_dataset/queries.sql b/skills/semantic-view-patterns/snippets/scoped_dataset/queries.sql new file mode 100644 index 00000000..0d276dc7 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/scoped_dataset/queries.sql @@ -0,0 +1,81 @@ +-- Scoped Dataset: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- QUERIES ON ENTERPRISE SV (scoped by lob='Enterprise') +-- Only the 4 Enterprise transactions 
are visible. +-- ============================================================ + +-- 1. Total revenue for Enterprise — the lob filter is baked in +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ENTERPRISE_ORDERS_SV + METRICS ent_orders.total_revenue +); +-- Expected: 5000 + 3200 + 7500 + 4100 = $19,800 + + +-- 2. Revenue by region (Enterprise only) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ENTERPRISE_ORDERS_SV + DIMENSIONS ent_orders.region + METRICS ent_orders.total_revenue +) +ORDER BY total_revenue DESC; +-- Expected: East=$11,600, West=$8,200 + + +-- 3. Customer name + revenue (uses the joined-in customer data) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ENTERPRISE_ORDERS_SV + DIMENSIONS ent_orders.customer_name + METRICS ent_orders.total_revenue +) +ORDER BY total_revenue DESC; + + +-- ============================================================ +-- QUERIES ON RETAIL SV (scoped by lob='Retail') +-- Only 4 Retail transactions are visible. +-- ============================================================ + +-- 4. Retail revenue total +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.RETAIL_ORDERS_SV + METRICS retail_orders.total_revenue +); +-- Expected: 120 + 95 + 210 + 175 = $600 + + +-- 5. Retail revenue by region +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.RETAIL_ORDERS_SV + DIMENSIONS retail_orders.region + METRICS retail_orders.total_revenue +) +ORDER BY region; + + +-- ============================================================ +-- VERIFY THE INLINE FILTER VIA DESCRIBE +-- ============================================================ + +-- DESCRIBE shows the DEFINITION property (the inline SQL) instead of +-- BASE_TABLE_NAME — confirming the filter is embedded in the SV DDL. 
+DESCRIBE SEMANTIC VIEW SNIPPETS.PUBLIC.ENTERPRISE_ORDERS_SV; +-- Look for: object_kind=TABLE, property=DEFINITION, +-- property_value=SELECT * FROM sales_transactions WHERE lob = 'Enterprise' + +DESCRIBE SEMANTIC VIEW SNIPPETS.PUBLIC.RETAIL_ORDERS_SV; + + +-- ============================================================ +-- THE ALTERNATIVE WITHOUT THIS PATTERN (for comparison) +-- ============================================================ +-- Without inline dataset, you'd need to: +-- 1. CREATE VIEW enterprise_sales AS SELECT * FROM sales_transactions WHERE lob = 'Enterprise'; +-- 2. Reference enterprise_sales in the SV's TABLES clause +-- +-- The inline approach avoids creating an intermediate view object, +-- keeps the filter co-located with the SV definition, and simplifies governance. diff --git a/skills/semantic-view-patterns/snippets/scoped_dataset/schema.sql b/skills/semantic-view-patterns/snippets/scoped_dataset/schema.sql new file mode 100644 index 00000000..10f3f7aa --- /dev/null +++ b/skills/semantic-view-patterns/snippets/scoped_dataset/schema.sql @@ -0,0 +1,25 @@ +-- Scoped Dataset: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- Single source table with data across all lines of business +CREATE OR REPLACE TABLE sales_transactions ( + transaction_id INTEGER NOT NULL, + customer_id INTEGER NOT NULL, + order_date DATE NOT NULL, + amount NUMBER(10,2) NOT NULL, + region VARCHAR(20) NOT NULL, + lob VARCHAR(20) NOT NULL -- 'Retail', 'Enterprise', 'SMB' +); + +-- Customer table used for the join-inline example +CREATE OR REPLACE TABLE lob_customers ( + customer_id INTEGER NOT NULL, + customer_name VARCHAR(50) NOT NULL, + tier VARCHAR(20) NOT NULL, + lob VARCHAR(20) NOT NULL, + CONSTRAINT pk_lob_customers PRIMARY KEY (customer_id) +); diff --git a/skills/semantic-view-patterns/snippets/scoped_dataset/seed_data.sql 
b/skills/semantic-view-patterns/snippets/scoped_dataset/seed_data.sql new file mode 100644 index 00000000..a203c6e9 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/scoped_dataset/seed_data.sql @@ -0,0 +1,29 @@ +-- Scoped Dataset: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO lob_customers VALUES + (1, 'Acme Corp', 'Enterprise', 'Enterprise'), + (2, 'TechStart LLC', 'Enterprise', 'Enterprise'), + (3, 'Corner Store', 'SMB', 'SMB'), + (4, 'Main St Deli', 'SMB', 'SMB'), + (5, 'City Dept Store','Standard', 'Retail'), + (6, 'Fashion Hub', 'Standard', 'Retail'); + +INSERT INTO sales_transactions VALUES + -- Enterprise LOB + (1, 1, '2024-01-10', 5000, 'West', 'Enterprise'), + (2, 1, '2024-02-15', 3200, 'West', 'Enterprise'), + (3, 2, '2024-01-20', 7500, 'East', 'Enterprise'), + (4, 2, '2024-03-08', 4100, 'East', 'Enterprise'), + -- SMB LOB + (5, 3, '2024-01-12', 450, 'North', 'SMB'), + (6, 3, '2024-02-22', 320, 'North', 'SMB'), + (7, 4, '2024-01-30', 180, 'South', 'SMB'), + (8, 4, '2024-03-15', 250, 'South', 'SMB'), + -- Retail LOB + (9, 5, '2024-01-05', 120, 'West', 'Retail'), + (10, 5, '2024-02-10', 95, 'West', 'Retail'), + (11, 6, '2024-01-25', 210, 'East', 'Retail'), + (12, 6, '2024-03-20', 175, 'East', 'Retail'); diff --git a/skills/semantic-view-patterns/snippets/scoped_dataset/semantic_view.sql b/skills/semantic-view-patterns/snippets/scoped_dataset/semantic_view.sql new file mode 100644 index 00000000..c166b2cb --- /dev/null +++ b/skills/semantic-view-patterns/snippets/scoped_dataset/semantic_view.sql @@ -0,0 +1,91 @@ +-- Scoped Dataset: Semantic View DDL +-- Two SVs from one source table, each scoped to a different LOB + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- PATTERN 1: Two separate SVs, each scoped to one LOB +-- ============================================================ + +-- Enterprise SV — only sees transactions where lob = 'Enterprise' +-- The 
alias 'ent_orders' is REQUIRED when using AS (...) +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.ENTERPRISE_ORDERS_SV +TABLES ( + ent_orders AS ( + -- This filter is embedded permanently in the SV. + -- Enterprise SV users never see SMB or Retail rows. + SELECT + t.transaction_id, + t.customer_id, + t.order_date, + t.amount, + t.region, + c.customer_name, + c.tier + FROM sales_transactions t + JOIN lob_customers c ON t.customer_id = c.customer_id + WHERE t.lob = 'Enterprise' + ) PRIMARY KEY (transaction_id) +) +DIMENSIONS ( + ent_orders.region AS region + WITH SYNONYMS ('region', 'geo'), + ent_orders.customer_name AS customer_name + WITH SYNONYMS ('customer', 'account'), + ent_orders.order_month AS DATE_TRUNC('month', ent_orders.order_date) + WITH SYNONYMS ('month', 'order month') +) +METRICS ( + ent_orders.total_revenue AS SUM(amount) + WITH SYNONYMS ('revenue', 'enterprise revenue'), + ent_orders.deal_count AS COUNT(transaction_id) + WITH SYNONYMS ('deals', 'transactions') +) +COMMENT = 'Enterprise LOB only. Inline SQL filter (lob=''Enterprise'') + join to customer table embedded in TABLES clause. Pre-scoped — no WHERE needed in queries.'; + + +-- Retail SV — only sees transactions where lob = 'Retail' +-- Notice: different entity alias (retail_orders), different synonyms +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.RETAIL_ORDERS_SV +TABLES ( + retail_orders AS ( + SELECT * FROM sales_transactions + WHERE lob = 'Retail' + ) PRIMARY KEY (transaction_id) +) +DIMENSIONS ( + retail_orders.region AS region + WITH SYNONYMS ('region', 'store region'), + retail_orders.order_month AS DATE_TRUNC('month', retail_orders.order_date) + WITH SYNONYMS ('month') +) +METRICS ( + retail_orders.total_revenue AS SUM(amount) + WITH SYNONYMS ('revenue', 'retail revenue', 'store revenue'), + retail_orders.transaction_count AS COUNT(transaction_id) + WITH SYNONYMS ('transactions', 'purchases') +) +COMMENT = 'Retail LOB only. 
Filtered via inline SQL in TABLES clause.'; + + +-- ============================================================ +-- PATTERN 2: Pre-join two tables into one logical entity +-- The SV consumer sees 'customer_info' as a single flat entity +-- combining both customer and address columns. +-- ============================================================ +-- CREATE OR REPLACE SEMANTIC VIEW my_sv +-- TABLES ( +-- customer_info AS ( +-- SELECT c.*, a.zipcode, a.street_addr +-- FROM customers c +-- JOIN addresses a ON c.customer_id = a.customer_id +-- ) PRIMARY KEY (customer_id) WITH SYNONYMS ('customer') +-- ) +-- DIMENSIONS ( +-- customer_info.customer_name AS customer_name, +-- customer_info.zipcode AS zip +-- ) +-- METRICS ( +-- ... +-- ); diff --git a/skills/semantic-view-patterns/snippets/scoped_dataset/semantic_view.yaml b/skills/semantic-view-patterns/snippets/scoped_dataset/semantic_view.yaml new file mode 100644 index 00000000..59eaacce --- /dev/null +++ b/skills/semantic-view-patterns/snippets/scoped_dataset/semantic_view.yaml @@ -0,0 +1,60 @@ +# Scoped Dataset: Semantic View YAML +# +# ⚠️ INLINE SQL SUBQUERIES IN TABLES NOT SUPPORTED IN YAML: +# The core pattern of this snippet — embedding a SQL filter in the TABLES clause — +# is DDL-only. YAML base_table must reference a physical table or view. +# +# Workaround: create helper views that apply the filters, then reference those +# views as base_table entries. This YAML demonstrates that approach. 
+# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Pre-requisite helper views (run before deploying YAML): +# CREATE OR REPLACE VIEW TARGET_DB.TARGET_SCHEMA.ENT_ORDERS_VIEW AS +# SELECT t.*, c.customer_name, c.tier +# FROM TARGET_DB.TARGET_SCHEMA.SALES_TRANSACTIONS t +# JOIN TARGET_DB.TARGET_SCHEMA.LOB_CUSTOMERS c ON t.customer_id = c.customer_id +# WHERE t.lob = 'Enterprise'; +# +# CREATE OR REPLACE VIEW TARGET_DB.TARGET_SCHEMA.RETAIL_ORDERS_VIEW AS +# SELECT * FROM TARGET_DB.TARGET_SCHEMA.SALES_TRANSACTIONS +# WHERE lob = 'Retail'; + +name: ENTERPRISE_ORDERS_SV +description: > + Enterprise LOB only — scoped via helper view (WHERE lob='Enterprise'). + In DDL, the filter is embedded inline in the TABLES clause; in YAML it + requires a pre-created helper view as the base_table. + +tables: + - name: ent_orders + description: > + Enterprise orders — pre-filtered helper view. + Equivalent to the inline SQL: SELECT ... WHERE lob = 'Enterprise'. 
+ base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: ENT_ORDERS_VIEW + primary_key: + columns: [TRANSACTION_ID] + dimensions: + - name: region + synonyms: [region, geo] + expr: REGION + data_type: VARCHAR + - name: customer_name + synonyms: [customer, account] + expr: CUSTOMER_NAME + data_type: VARCHAR + - name: order_month + synonyms: [month, order month] + expr: DATE_TRUNC('month', ORDER_DATE) + data_type: DATE + metrics: + - name: total_revenue + synonyms: [revenue, enterprise revenue] + expr: SUM(AMOUNT) + - name: deal_count + synonyms: [deals, transactions] + expr: COUNT(TRANSACTION_ID) diff --git a/skills/semantic-view-patterns/snippets/semi_additive_metric/README.md b/skills/semantic-view-patterns/snippets/semi_additive_metric/README.md new file mode 100644 index 00000000..e3c5d1c4 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/semi_additive_metric/README.md @@ -0,0 +1,76 @@ +# Semi-Additive Metric + +## The Problem + +Your fact table represents a **snapshot in time** — each row records a value at a specific point (e.g., account balance at end of day, headcount at end of month, inventory on hand at midnight). + +Summing these snapshots **across time is mathematically wrong** — it double-counts. A balance of $1,000 on Monday and $1,000 on Tuesday is still $1,000, not $2,000. But summing across accounts on the *same* date is fine. + +This kind of measure is called **semi-additive**: additive across some dimensions (accounts, regions), non-additive across others (time). + +## How You Might Express This Need + +- "What is the total account balance?" (wants a point-in-time sum across accounts) +- "Show me average daily balance by account over the last quarter" +- "How many employees did we have at the end of each month?" +- "What was our inventory level on hand last Friday?" 
+- "My numbers are way too high — I think I'm double-counting across dates" + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **Power BI / DAX** | `LASTNONBLANK`, `FIRSTNONBLANK`, `AVERAGEX` — semi-additive measures are a first-class concept | +| **SSAS / Tabular** | `FirstChild`, `LastChild`, `AverageOfChildren` measure types | +| **LookML** | `type: sum` with a date filter in the measure, or `type: average` — no native semi-additive guard | +| **dbt** | No native concept; requires window functions (`LAST_VALUE`, `SUM OVER`) in the model | +| **Raw SQL** | `WHERE balance_date = CURRENT_DATE` for point-in-time; `AVG(balance_usd)` for trend | +| **Tableau** | Fixed LOD for point-in-time: `{ FIXED [Date]: SUM([Balance]) }`; `WINDOW_AVG` table calculation for trends. Requires careful scope management to avoid double-counting. | + +Snowflake Semantic Views handle this with `NON ADDITIVE BY` — it marks a metric as non-aggregatable across a specific time dimension, preventing accidental cross-date summing. + +## The SV Approach + +Define **two separate metrics** with non-overlapping synonyms — one for point-in-time, one for averages: + +```sql +METRICS ( + -- NON ADDITIVE BY prevents summing across balance_date + -- Use this when you want totals at a specific point in time + balances.total_balance NON ADDITIVE BY (balance_date) AS SUM(BALANCE_USD) + WITH SYNONYMS ('current balance', 'balance as of date', 'snapshot balance'), + + -- Use this when you want trends or averages across time + balances.avg_daily_balance AS AVG(BALANCE_USD) + WITH SYNONYMS ('average balance', 'average daily balance', 'mean balance over time') +) +``` + +### What `NON ADDITIVE BY` Does + +When you include `balance_date` as a dimension in your query, `total_balance` correctly sums across accounts for that date. 
When you *don't* include `balance_date`, the SV engine refuses to sum across all dates — instead it returns the metric grouped by date internally (you'll see date-level values, not a single cross-date total). + +### Why You Need Two Metrics (Not One) + +You **cannot** apply `AVG()` to a `NON ADDITIVE` metric. They are separate operations on the underlying fact, not composable. Define: +- `total_balance NON ADDITIVE BY (balance_date) AS SUM(BALANCE_USD)` for "what is the total right now" +- `avg_daily_balance AS AVG(BALANCE_USD)` for "what is the typical balance over a period" + +### Synonym Discipline + +If both metrics mention "balance", the AI may pick the wrong one. Make synonyms explicitly intent-oriented: +- `total_balance`: *"current balance", "snapshot balance", "balance as of a date", "end of day balance"* +- `avg_daily_balance`: *"average balance", "mean balance", "typical balance", "balance trend"* + +## Docs + +- [Identifying the dimensions that should be non-additive for a metric](https://docs.snowflake.com/en/user-guide/views-semantic/sql#identifying-the-dimensions-that-should-be-non-additive-for-a-metric) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `ACCOUNT_BALANCES` snapshot table DDL | +| `seed_data.sql` | 3 accounts × 5 daily snapshots (15 rows) | +| `semantic_view.sql` | SV with both `NON ADDITIVE BY` and `AVG` metrics | +| `queries.sql` | Correct point-in-time and trend queries + the double-counting mistake | diff --git a/skills/semantic-view-patterns/snippets/semi_additive_metric/queries.sql b/skills/semantic-view-patterns/snippets/semi_additive_metric/queries.sql new file mode 100644 index 00000000..071806eb --- /dev/null +++ b/skills/semantic-view-patterns/snippets/semi_additive_metric/queries.sql @@ -0,0 +1,117 @@ +-- Semi-Additive Metric Example: Queries +-- Run schema.sql, seed_data.sql, and semantic_view.sql first. 
+ +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. Total balance across all accounts as of a specific date +-- Include balance_date as a dimension so NON ADDITIVE BY sums correctly. +-- +-- Expected: 2024-05-31 → $7,500.00 (1250 + 5300 + 950) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ACCOUNT_BALANCES_SV + DIMENSIONS balances.balance_date + METRICS balances.total_balance + WHERE balances.balance_date = '2024-05-31' +); + + +-- 2. Balance by account as of a specific date +-- +-- Expected (2024-05-31): +-- A001 Checking Account $1,250 +-- A002 Business Reserve $5,300 +-- A003 Savings Account $ 950 +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ACCOUNT_BALANCES_SV + DIMENSIONS balances.account_id, balances.account_name, balances.balance_date + METRICS balances.total_balance + WHERE balances.balance_date = '2024-05-31' +) +ORDER BY total_balance DESC; + + +-- 3. Total balance per date (all months — shows the portfolio growing over time) +-- +-- Expected: +-- 2024-01-31 $6,800 +-- 2024-02-29 $6,900 +-- 2024-03-31 $7,150 +-- 2024-04-30 $7,400 +-- 2024-05-31 $7,500 +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ACCOUNT_BALANCES_SV + DIMENSIONS balances.balance_date + METRICS balances.total_balance +) +ORDER BY balance_date; + + +-- 4. Average monthly balance per account (trend analysis) +-- Uses avg_daily_balance — correct for this question. +-- +-- Expected: +-- A002 Business Reserve $5,080 +-- A001 Checking Account $1,170 +-- A003 Savings Account $ 900 +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.ACCOUNT_BALANCES_SV + DIMENSIONS balances.account_id, balances.account_name + METRICS balances.avg_daily_balance +) +ORDER BY avg_daily_balance DESC; + + +-- 5. 
Average monthly balance over Q1 only
+SELECT * FROM SEMANTIC_VIEW(
+    SNIPPETS.PUBLIC.ACCOUNT_BALANCES_SV
+    DIMENSIONS balances.account_id
+    METRICS balances.avg_daily_balance
+    WHERE balances.balance_date BETWEEN '2024-01-01' AND '2024-03-31'
+)
+ORDER BY account_id;
+
+
+-- ============================================================
+-- THE MISTAKE THIS PATTERN PREVENTS
+-- ============================================================
+
+-- WRONG: Naive SUM across all rows
+-- Counts each account's balance once per snapshot date → 5× overcount.
+-- A $5,000 balance that existed for 5 months looks like $25,000.
+--
+-- Raw SQL that gives the wrong answer:
+SELECT
+    account_id,
+    SUM(balance_usd) AS wrong_total  -- $5,000 × 5 months = $25,000 for A002
+FROM SNIPPETS.PUBLIC.ACCOUNT_BALANCES
+GROUP BY 1
+ORDER BY 2 DESC;
+-- A002 shows $25,400 instead of the correct $5,300 (latest) or $5,080 (average)
+
+
+-- ALSO WRONG: Grand total across all rows
+SELECT SUM(balance_usd) AS wrong_grand_total FROM SNIPPETS.PUBLIC.ACCOUNT_BALANCES;
+-- Returns $35,750 — the sum of all 15 snapshot rows.
+-- Correct answers: $7,500 (latest point-in-time) or ~$2,383 (average per account per month)
+
+
+-- ============================================================
+-- WHAT DOESN'T WORK IN SEMANTIC_VIEW()
+-- ============================================================
+
+-- NOTE: Querying total_balance WITHOUT a balance_date dimension or filter
+-- will not produce a meaningful single total.
+-- The NON ADDITIVE BY clause causes the engine to return per-date subtotals
+-- rather than a collapsed grand total, preventing silent overcounting.
+--
+-- If you want a single scalar total, always filter: WHERE balance_date = '2024-05-31'
+-- If you want a collapsed average, use avg_daily_balance instead.
+
+-- ALSO NOTE: You cannot use AVG(total_balance) — total_balance is already an
+-- aggregated metric. To get the average, use the dedicated avg_daily_balance metric.
+-- Attempting to nest metric references is not supported. diff --git a/skills/semantic-view-patterns/snippets/semi_additive_metric/schema.sql b/skills/semantic-view-patterns/snippets/semi_additive_metric/schema.sql new file mode 100644 index 00000000..77e08bf8 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/semi_additive_metric/schema.sql @@ -0,0 +1,19 @@ +-- Semi-Additive Metric Example: Schema +-- Target: SNIPPETS.PUBLIC (replace with your database/schema) +-- +-- Scenario: Daily account balance snapshots for a small portfolio of accounts. +-- Each row represents the balance at end-of-day for one account on one date. + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE ACCOUNT_BALANCES ( + BALANCE_ID INTEGER NOT NULL, -- surrogate key + ACCOUNT_ID VARCHAR(10) NOT NULL, + ACCOUNT_NAME VARCHAR(50) NOT NULL, + BALANCE_DATE DATE NOT NULL, + BALANCE_USD NUMBER(12, 2) NOT NULL -- end-of-day balance +); diff --git a/skills/semantic-view-patterns/snippets/semi_additive_metric/seed_data.sql b/skills/semantic-view-patterns/snippets/semi_additive_metric/seed_data.sql new file mode 100644 index 00000000..43929a6b --- /dev/null +++ b/skills/semantic-view-patterns/snippets/semi_additive_metric/seed_data.sql @@ -0,0 +1,38 @@ +-- Semi-Additive Metric Example: Seed Data +-- Run schema.sql first. 
+ +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- 3 accounts, 5 daily snapshots each (15 rows total) +-- +-- Point-in-time totals (sum across accounts per date): +-- 2024-01-31: 1000 + 5000 + 800 = 6,800 +-- 2024-02-29: 1200 + 4800 + 900 = 6,900 +-- 2024-03-31: 1100 + 5200 + 850 = 7,150 +-- 2024-04-30: 1300 + 5100 + 1000 = 7,400 +-- 2024-05-31: 1250 + 5300 + 950 = 7,500 +-- +-- WRONG: naive SUM across all rows = 6,800+6,900+7,150+7,400+7,500 = 35,750 +-- (5x overcounted — every balance counted once per month instead of once total) +-- +-- AVG balance per account (across months): +-- A001: (1000+1200+1100+1300+1250) / 5 = 1,170 +-- A002: (5000+4800+5200+5100+5300) / 5 = 5,080 +-- A003: ( 800+ 900+ 850+1000+ 950) / 5 = 900 +INSERT INTO ACCOUNT_BALANCES VALUES + ( 1, 'A001', 'Checking Account', '2024-01-31', 1000.00), + ( 2, 'A001', 'Checking Account', '2024-02-29', 1200.00), + ( 3, 'A001', 'Checking Account', '2024-03-31', 1100.00), + ( 4, 'A001', 'Checking Account', '2024-04-30', 1300.00), + ( 5, 'A001', 'Checking Account', '2024-05-31', 1250.00), + ( 6, 'A002', 'Business Reserve', '2024-01-31', 5000.00), + ( 7, 'A002', 'Business Reserve', '2024-02-29', 4800.00), + ( 8, 'A002', 'Business Reserve', '2024-03-31', 5200.00), + ( 9, 'A002', 'Business Reserve', '2024-04-30', 5100.00), + (10, 'A002', 'Business Reserve', '2024-05-31', 5300.00), + (11, 'A003', 'Savings Account', '2024-01-31', 800.00), + (12, 'A003', 'Savings Account', '2024-02-29', 900.00), + (13, 'A003', 'Savings Account', '2024-03-31', 850.00), + (14, 'A003', 'Savings Account', '2024-04-30', 1000.00), + (15, 'A003', 'Savings Account', '2024-05-31', 950.00); diff --git a/skills/semantic-view-patterns/snippets/semi_additive_metric/semantic_view.sql b/skills/semantic-view-patterns/snippets/semi_additive_metric/semantic_view.sql new file mode 100644 index 00000000..db79763e --- /dev/null +++ b/skills/semantic-view-patterns/snippets/semi_additive_metric/semantic_view.sql @@ -0,0 +1,54 @@ +-- Semi-Additive 
Metric Example: Semantic View DDL +-- Run schema.sql and seed_data.sql first. + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.ACCOUNT_BALANCES_SV + + TABLES ( + balances AS SNIPPETS.PUBLIC.ACCOUNT_BALANCES + PRIMARY KEY (BALANCE_ID) + ) + + FACTS ( + balances.balance_usd AS BALANCE_USD + COMMENT = 'End-of-day account balance in USD' + ) + + DIMENSIONS ( + balances.account_id AS ACCOUNT_ID + WITH SYNONYMS ('account', 'account number') + COMMENT = 'Account identifier', + + balances.account_name AS ACCOUNT_NAME + WITH SYNONYMS ('account label', 'account description'), + + balances.balance_date AS BALANCE_DATE + WITH SYNONYMS ('date', 'snapshot date', 'as of date', 'month end') + COMMENT = 'The date this balance snapshot was recorded (end of day)' + ) + + METRICS ( + -- Point-in-time balance: additive across accounts, NOT across dates. + -- NON ADDITIVE BY (balance_date) prevents summing across time periods. + -- Use this when you want: "total balance as of [date]" or "balance by account on [date]" + balances.total_balance NON ADDITIVE BY (balance_date) AS SUM(BALANCE_USD) + WITH SYNONYMS ('current balance', 'balance as of date', 'snapshot balance', + 'end of day balance', 'point in time balance', 'balance on hand') + COMMENT = 'Sum of balances across accounts for a given date. Non-additive across time — always filter or group by balance_date to get a meaningful total.', + + -- Average balance over time: use for trend analysis, not point-in-time. + -- Use this when you want: "average monthly balance over Q1" or "mean balance by account" + balances.avg_daily_balance AS AVG(BALANCE_USD) + WITH SYNONYMS ('average balance', 'average daily balance', 'mean balance', + 'typical balance', 'balance trend', 'average monthly balance') + COMMENT = 'Average balance across snapshot periods. Use for trend analysis, not point-in-time reporting.' + ) + + COMMENT = 'Daily account balance snapshots. 
Balances are semi-additive: sum across accounts is valid; sum across time periods double-counts. Use total_balance for point-in-time; avg_daily_balance for trends.' + + AI_SQL_GENERATION 'IMPORTANT: ACCOUNT_BALANCES is a snapshot table — each row is a balance at a point in time, not a transaction. +- For point-in-time totals: use total_balance with balance_date as a dimension or filter. Summing total_balance across all dates is meaningless (double-counts). +- For trends over time: use avg_daily_balance — it averages across the snapshot dates. +- Never use total_balance without a balance_date dimension or WHERE filter on balance_date.'; diff --git a/skills/semantic-view-patterns/snippets/semi_additive_metric/semantic_view.yaml b/skills/semantic-view-patterns/snippets/semi_additive_metric/semantic_view.yaml new file mode 100644 index 00000000..31c9cf9a --- /dev/null +++ b/skills/semantic-view-patterns/snippets/semi_additive_metric/semantic_view.yaml @@ -0,0 +1,78 @@ +# Semi-Additive Metric: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# DDL-only features not in YAML: +# - AI_SQL_GENERATION → module_custom_instructions: sql_generation: in YAML +# - NON ADDITIVE BY uses the YAML equivalent: non_additive_dimensions + +name: ACCOUNT_BALANCES_SV +description: > + Daily account balance snapshots. Balances are semi-additive: sum across + accounts is valid; sum across time periods double-counts. Use total_balance + for point-in-time; avg_daily_balance for trends. 
+
+tables:
+  - name: balances
+    description: End-of-day account balance snapshots
+    base_table:
+      database: TARGET_DB
+      schema: TARGET_SCHEMA
+      table: ACCOUNT_BALANCES
+    primary_key:
+      columns: [BALANCE_ID]
+    dimensions:
+      - name: account_id
+        synonyms: [account, account number]
+        description: Account identifier
+        expr: ACCOUNT_ID
+        data_type: VARCHAR
+      - name: account_name
+        synonyms: [account label, account description]
+        expr: ACCOUNT_NAME
+        data_type: VARCHAR
+      - name: balance_date
+        synonyms: [date, snapshot date, as of date, month end]
+        description: The date this balance snapshot was recorded (end of day)
+        expr: BALANCE_DATE
+        data_type: DATE
+    facts:
+      - name: balance_usd
+        description: End-of-day account balance in USD
+        expr: BALANCE_USD
+        data_type: NUMBER
+    metrics:
+      - name: total_balance
+        synonyms:
+          - current balance
+          - balance as of date
+          - snapshot balance
+          - end of day balance
+          - point in time balance
+          - balance on hand
+        description: >
+          Sum of balances across accounts for a given date. Non-additive across
+          time — always filter or group by balance_date to get a meaningful total.
+        expr: SUM(balance_usd)
+        # YAML equivalent of DDL's NON ADDITIVE BY (balance_date)
+        non_additive_dimensions:
+          - table: balances
+            dimension: balance_date
+            sort_direction: ascending
+            null_order: last
+      - name: avg_daily_balance
+        synonyms:
+          - average balance
+          - average daily balance
+          - mean balance
+          - typical balance
+          - balance trend
+          - average monthly balance
+        description: >
+          Average balance across snapshot periods. Use for trend analysis,
+          not point-in-time reporting.
+ expr: AVG(balance_usd) diff --git a/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/README.md b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/README.md new file mode 100644 index 00000000..bf9a8d51 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/README.md @@ -0,0 +1,85 @@ +# Shared Degenerate Dimension + +## The Problem + +You have two (or more) fact tables that each contain a low-cardinality categorical column — like `region`, `country`, or `status` — but there is **no dedicated dimension table** for that column. The same concept exists on both facts, and you want a single dimension that works across all of them. + +This is a **degenerate dimension**: a dimension attribute that lives on the fact table itself rather than in a separate dimension table. Making it "shared" so both facts can group by it requires creating a synthetic dimension entity that aggregates the distinct values from all sources. + +## How You Might Express This Need + +- "Both `store_orders` and `web_orders` have a `region` column. How do I create one `region` dimension that works for both?" +- "My three fact tables all have a `status` column. I want to slice any metric by `status` without picking one fact table's version." +- "I don't have a region lookup table — the values are baked into the facts." 
+ +## The Core Pattern + +``` +store_orders.region ─┐ + ├→ UNION → region_dim → regions.region (shared dimension) +web_orders.region ─┘ +``` + +**Step 1:** Create a helper that UNIONs distinct values from all fact tables: +```sql +CREATE VIEW region_dim AS + SELECT DISTINCT region FROM store_orders + UNION + SELECT DISTINCT region FROM web_orders; +``` + +**Step 2:** Reference it as a `UNIQUE` entity in TABLES: +```sql +TABLES ( + regions AS region_dim UNIQUE (region), + store_orders, + web_orders +) +``` + +**Step 3:** Create relationships from each fact to the shared dim: +```sql +RELATIONSHIPS ( + store_to_region AS store_orders(region) REFERENCES regions, + web_to_region AS web_orders(region) REFERENCES regions +) +``` + +Now `regions.region` is a single dimension that can be used with metrics from either fact. + +## Physical View vs Inline SQL + +| | Physical helper view | Inline SQL in TABLES | +|--|---------------------|---------------------| +| Syntax | `CREATE VIEW region_dim AS ...` | `regions AS (SELECT DISTINCT ... UNION ...) UNIQUE (region)` | +| Reusable across SVs | Yes | No — re-declare in each SV | +| Governance/discovery | Yes — view appears in catalog | No separate object | +| Best for | Production SVs | Quick prototyping | + +## When Values Differ Across Facts + +If `store_orders` has `region = 'Pacific'` but `web_orders` has no `'Pacific'` rows, the UNION ensures `'Pacific'` is still in `region_dim`. Web revenue will show 0 or NULL for `'Pacific'` — which is correct and expected. 
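+
+For a quick sanity check, the sketch below (hypothetical: the seed data contains no `'Pacific'` rows) groups both channels' metrics by the shared dimension; any region missing from one fact still appears, with that fact's metric empty:
+
+```sql
+-- If 'Pacific' existed only in store_orders, it would still appear here,
+-- with web_revenue NULL for that row.
+SELECT * FROM SEMANTIC_VIEW(
+    SNIPPETS.PUBLIC.CHANNEL_BY_REGION_SV
+    DIMENSIONS regions.region
+    METRICS store_orders.store_revenue, web_orders.web_revenue
+)
+ORDER BY region;
+```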
+ +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **dbt** | `ref('region_seed')` — a standalone seed or model with the dimension values | +| **LookML** | Role-playing dimension / one-to-one relationship from each explore | +| **Star schema** | Add a physical `dim_region` table to the warehouse layer | +| **Power BI** | Common "Region" table referenced by both fact tables | +| **Tableau** | Shared dimension table via Relationships or data blending. Both fact sources join to the same physical dimension. | + +## Docs + +- [Identifying the relationships between logical tables](https://docs.snowflake.com/en/user-guide/views-semantic/sql#identifying-the-relationships-between-logical-tables) +- [CREATE SEMANTIC VIEW — RELATIONSHIPS clause](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#label-create-semantic-view-relationships) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `store_orders`, `web_orders`, and `CREATE VIEW region_dim` (UNION helper) | +| `seed_data.sql` | 6 store orders and 7 web orders across 4 regions | +| `semantic_view.sql` | Two SV variants: physical helper view + inline SQL UNION | +| `queries.sql` | Shared region dimension queries, side-by-side channel comparison | diff --git a/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/queries.sql b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/queries.sql new file mode 100644 index 00000000..5642e30a --- /dev/null +++ b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/queries.sql @@ -0,0 +1,93 @@ +-- Shared Degenerate Dimension: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- Using CHANNEL_BY_REGION_SV (physical helper view approach) +-- ============================================================ + +-- 1. 
Total revenue by region across both channels +-- regions.region is shared — works with metrics from EITHER fact table +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_BY_REGION_SV + DIMENSIONS regions.region + METRICS total_revenue +) +ORDER BY total_revenue DESC; + + +-- 2. Store revenue only by region +-- regions.region is reachable from store_orders via store_to_region +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_BY_REGION_SV + DIMENSIONS regions.region + METRICS store_orders.store_revenue +) +ORDER BY region; + + +-- 3. Web revenue only by region +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_BY_REGION_SV + DIMENSIONS regions.region + METRICS web_orders.web_revenue +) +ORDER BY region; + + +-- 4. Side-by-side channel comparison by region +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_BY_REGION_SV + DIMENSIONS regions.region + METRICS store_orders.store_revenue, web_orders.web_revenue, total_revenue +) +ORDER BY total_revenue DESC; + + +-- 5. Store revenue by category (fact-specific dimension — NOT shared) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_BY_REGION_SV + DIMENSIONS store_orders.category + METRICS store_orders.store_revenue +) +ORDER BY store_revenue DESC; + + +-- 6. 
Web revenue by channel and region +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_BY_REGION_SV + DIMENSIONS regions.region, web_orders.channel + METRICS web_orders.web_revenue +) +ORDER BY region, channel; + + +-- ============================================================ +-- INLINE APPROACH (same queries, different SV name) +-- ============================================================ + +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_BY_REGION_INLINE_SV + DIMENSIONS regions.region + METRICS store_orders.store_revenue, web_orders.web_revenue, total_revenue +) +ORDER BY total_revenue DESC; + + +-- ============================================================ +-- WHAT DOESN'T WORK (WITHOUT THIS PATTERN) +-- ============================================================ + +-- If you tried to use store_orders.region and web_orders.region as separate dimensions, +-- there is no relationship between them — they're just independent columns on separate facts. +-- A query asking "revenue by region" would have to pick one fact's region dimension, +-- and the other fact's metrics would not be groupable by the same region concept. + +-- The union helper creates a SINGLE authoritative entity that both facts reference, +-- enabling consistent region-level analytics across both channels. + +-- Also note: if one fact had a region value the other didn't (e.g. 'Pacific' only +-- in store_orders), UNION ensures 'Pacific' is still in the region_dim so it can +-- serve as the outer join anchor for web_revenue = 0 in that region. 
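+
+
+-- ============================================================
+-- FOR INTUITION: ROUGH RAW-SQL EQUIVALENT
+-- ============================================================
+
+-- A sketch only (an assumption about the join shape, not the engine's actual
+-- plan): the shared-dimension query resolves to roughly this join, with
+-- region_dim as the anchor so every region appears even when one channel
+-- has no rows for it.
+SELECT
+    r.region,
+    s.store_rev,
+    w.web_rev,
+    COALESCE(s.store_rev, 0) + COALESCE(w.web_rev, 0) AS total_rev
+FROM region_dim r
+LEFT JOIN (SELECT region, SUM(amount) AS store_rev
+           FROM store_orders GROUP BY region) s ON s.region = r.region
+LEFT JOIN (SELECT region, SUM(amount) AS web_rev
+           FROM web_orders GROUP BY region) w ON w.region = r.region
+ORDER BY total_rev DESC;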
diff --git a/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/schema.sql b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/schema.sql new file mode 100644 index 00000000..c9ab2389 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/schema.sql @@ -0,0 +1,33 @@ +-- Shared Degenerate Dimension: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- Two fact tables — each has a 'region' column but there is no dedicated region dim table. +-- Region is a "degenerate dimension": a dimension stored on the fact table itself. + +CREATE OR REPLACE TABLE store_orders ( + order_id INTEGER NOT NULL, + order_date DATE NOT NULL, + region VARCHAR(20) NOT NULL, + category VARCHAR(30) NOT NULL, + amount NUMBER(10,2) NOT NULL +); + +CREATE OR REPLACE TABLE web_orders ( + order_id INTEGER NOT NULL, + order_date DATE NOT NULL, + region VARCHAR(20) NOT NULL, + channel VARCHAR(20) NOT NULL, + amount NUMBER(10,2) NOT NULL +); + +-- Helper view: unions the distinct region values from both fact tables. +-- This becomes the shared dimension entity in the semantic view. +-- Alternative: use inline SQL in the TABLES clause (shown in semantic_view.sql). 
+CREATE OR REPLACE VIEW region_dim AS + SELECT DISTINCT region FROM store_orders + UNION + SELECT DISTINCT region FROM web_orders; diff --git a/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/seed_data.sql b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/seed_data.sql new file mode 100644 index 00000000..2cc63f6d --- /dev/null +++ b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/seed_data.sql @@ -0,0 +1,25 @@ +-- Shared Degenerate Dimension: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO store_orders VALUES + (1, '2024-01-05', 'North', 'Electronics', 1200), + (2, '2024-01-12', 'South', 'Furniture', 450), + (3, '2024-02-01', 'East', 'Electronics', 800), + (4, '2024-02-15', 'North', 'Apparel', 95), + (5, '2024-03-10', 'West', 'Electronics', 1500), + (6, '2024-03-22', 'South', 'Apparel', 120); + +INSERT INTO web_orders VALUES + (1, '2024-01-08', 'North', 'mobile', 300), + (2, '2024-01-20', 'East', 'desktop', 750), + (3, '2024-02-05', 'West', 'mobile', 420), + (4, '2024-02-18', 'South', 'desktop', 680), + (5, '2024-03-01', 'North', 'mobile', 210), + (6, '2024-03-14', 'East', 'desktop', 940), + (7, '2024-03-28', 'West', 'mobile', 380); + +-- Verify the union view contains exactly the 4 regions from both fact tables +-- SELECT * FROM region_dim ORDER BY region; +-- → East, North, South, West diff --git a/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/semantic_view.sql b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/semantic_view.sql new file mode 100644 index 00000000..17a6e82c --- /dev/null +++ b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/semantic_view.sql @@ -0,0 +1,106 @@ +-- Shared Degenerate Dimension: Semantic View DDL +-- Two approaches shown. Use whichever fits your environment. 
+ +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- APPROACH A: Physical helper view (region_dim) +-- Created in schema.sql as: +-- CREATE VIEW region_dim AS +-- SELECT DISTINCT region FROM store_orders +-- UNION +-- SELECT DISTINCT region FROM web_orders; +-- +-- Best for: production SVs, when the helper view is reused across multiple SVs +-- ============================================================ + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.CHANNEL_BY_REGION_SV + + TABLES ( + -- Shared dimension: a helper view that unions distinct region values + -- from both fact tables. Referenced as 'regions' in the SV. + regions AS region_dim UNIQUE (region), + + store_orders, + web_orders + ) + + RELATIONSHIPS ( + -- Both fact tables relate to the shared dimension on the 'region' column. + store_to_region AS store_orders(region) REFERENCES regions, + web_to_region AS web_orders(region) REFERENCES regions + ) + + DIMENSIONS ( + -- The shared dimension: defined once, usable with metrics from either fact. 
+ regions.region AS region + WITH SYNONYMS ('region', 'geo', 'geography') + COMMENT = 'Shared region dimension covering all channels.', + + -- Fact-specific dimensions (not shared) + store_orders.category AS category + WITH SYNONYMS ('product category'), + web_orders.channel AS channel + WITH SYNONYMS ('device type', 'web channel'), + + store_orders.order_month AS DATE_TRUNC('month', store_orders.order_date) + WITH SYNONYMS ('store month', 'month'), + web_orders.order_month AS DATE_TRUNC('month', web_orders.order_date) + WITH SYNONYMS ('web month') + ) + + METRICS ( + store_orders.store_revenue AS SUM(amount) + WITH SYNONYMS ('store sales', 'store revenue'), + web_orders.web_revenue AS SUM(amount) + WITH SYNONYMS ('web sales', 'online revenue'), + + -- Cross-fact derived metric using the shared region dimension + total_revenue AS store_orders.store_revenue + web_orders.web_revenue + WITH SYNONYMS ('total revenue', 'all channel revenue') + ) + + COMMENT = 'Store and web orders with a shared region dimension derived from a UNION helper view. Demonstrates the degenerate dimension pattern.'; + + +-- ============================================================ +-- APPROACH B: Inline SQL dataset (no separate physical view) +-- Best for: ad-hoc or when you don't want to create a view object +-- ============================================================ + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.CHANNEL_BY_REGION_INLINE_SV + + TABLES ( + -- Same UNION logic, but inline in the TABLES clause. + -- Alias is required when using AS (...). 
+ regions AS ( + SELECT DISTINCT region FROM SNIPPETS.PUBLIC.store_orders + UNION + SELECT DISTINCT region FROM SNIPPETS.PUBLIC.web_orders + ) UNIQUE (region), + + store_orders, + web_orders + ) + + RELATIONSHIPS ( + store_to_region AS store_orders(region) REFERENCES regions, + web_to_region AS web_orders(region) REFERENCES regions + ) + + DIMENSIONS ( + regions.region AS region + WITH SYNONYMS ('region', 'geo') + ) + + METRICS ( + store_orders.store_revenue AS SUM(amount) + WITH SYNONYMS ('store revenue'), + web_orders.web_revenue AS SUM(amount) + WITH SYNONYMS ('web revenue'), + total_revenue AS store_orders.store_revenue + web_orders.web_revenue + WITH SYNONYMS ('total revenue') + ) + + COMMENT = 'Same as CHANNEL_BY_REGION_SV but using an inline SQL UNION instead of a physical helper view.'; diff --git a/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/semantic_view.yaml b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/semantic_view.yaml new file mode 100644 index 00000000..3955fec0 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/shared_degenerate_dimension/semantic_view.yaml @@ -0,0 +1,96 @@ +# Shared Degenerate Dimension: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# YAML NOTE: Inline SQL subqueries in the TABLES clause (Approach B in the DDL) +# are not supported in YAML. This YAML implements Approach A (physical helper view). +# Approach B requires DDL authoring (semantic_view.sql). + +name: CHANNEL_BY_REGION_SV +description: > + Store and web orders with a shared region dimension derived from a UNION helper + view. Demonstrates the degenerate dimension pattern. + Approach B (inline SQL UNION) is DDL-only — see semantic_view.sql. 
+ +tables: + - name: regions + description: > + Shared region dimension — union of distinct regions from both fact tables. + Physical helper view: CREATE VIEW region_dim AS SELECT DISTINCT region FROM + store_orders UNION SELECT DISTINCT region FROM web_orders. + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: REGION_DIM + primary_key: + columns: [REGION] + dimensions: + - name: region + synonyms: [region, geo, geography] + description: Shared region dimension covering all channels + expr: REGION + data_type: VARCHAR + + - name: store_orders + description: Store order transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: STORE_ORDERS + dimensions: + - name: category + synonyms: [product category] + expr: CATEGORY + data_type: VARCHAR + - name: order_month + synonyms: [store month, month] + expr: DATE_TRUNC('month', ORDER_DATE) + data_type: DATE + metrics: + - name: store_revenue + synonyms: [store sales, store revenue] + expr: SUM(AMOUNT) + + - name: web_orders + description: Web order transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: WEB_ORDERS + dimensions: + - name: channel + synonyms: [device type, web channel] + expr: CHANNEL + data_type: VARCHAR + - name: order_month + synonyms: [web month] + expr: DATE_TRUNC('month', ORDER_DATE) + data_type: DATE + metrics: + - name: web_revenue + synonyms: [web sales, online revenue] + expr: SUM(AMOUNT) + +relationships: + - name: store_to_region + left_table: store_orders + right_table: regions + relationship_columns: + - left_column: REGION + right_column: REGION + - name: web_to_region + left_table: web_orders + right_table: regions + relationship_columns: + - left_column: REGION + right_column: REGION + +metrics: + - name: total_revenue + synonyms: [total revenue, all channel revenue] + description: Combined store and web revenue + expr: store_orders.store_revenue + web_orders.web_revenue diff --git 
a/skills/semantic-view-patterns/snippets/standard_sql/README.md b/skills/semantic-view-patterns/snippets/standard_sql/README.md new file mode 100644 index 00000000..7663d29d --- /dev/null +++ b/skills/semantic-view-patterns/snippets/standard_sql/README.md @@ -0,0 +1,56 @@ +# Standard SQL on Semantic Views + +## The Problem + +Not every consumer of a Semantic View uses Cortex Analyst or the `SEMANTIC_VIEW()` function. Data analysts in SQL clients, BI tools, and dbt models often need to query a SV like a regular table or view using **plain SQL syntax**. + +Snowflake allows querying a SV with a regular SELECT — but with some important rules around aggregate functions. + +## How You Might Express This Need + +- "Can I connect Tableau directly to a Semantic View without using SEMANTIC_VIEW()?" +- "I want to query the SV in a dbt model using SELECT ... FROM ... WHERE" +- "How do I use a window function on top of SV output?" +- "I just want to see distinct dates from the SV — no aggregation needed" + +## The Rules + +When using standard SQL on a SV (not SEMANTIC_VIEW()): + +| Scenario | Rule | +|----------|------| +| SELECT metric + other columns | Wrap metric in `ANY_VALUE()`, `MIN()`, or `MAX()` | +| SELECT metric only | No wrapper needed | +| SELECT dimensions only | No wrapper, no GROUP BY needed — returns distinct values | +| WHERE clause | Works normally on dimensions | +| ORDER BY, LIMIT | Work normally | +| JOIN another table/SV | Works normally | + +### Why `ANY_VALUE()`? +Because SVs are not regular tables — they are aggregated semantic objects. When combined with non-metric columns, the engine needs an aggregate function to resolve the grouping. `ANY_VALUE` is the idiomatic "I know this is functionally deterministic for this group" wrapper. 
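+
+For contrast, a minimal sketch of the shape that fails (assuming the `CHANNEL_SALES_SV` from the `derived_metrics` snippet): a metric selected next to a dimension with no aggregate wrapper gives the engine no way to resolve the grouping, so the query is rejected.
+
+```sql
+-- Expected to error: metric beside a dimension, no ANY_VALUE()/MIN()/MAX()
+SELECT month, total_revenue
+FROM SNIPPETS.PUBLIC.CHANNEL_SALES_SV;
+```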
+ +## Example + +```sql +SELECT + month, + ANY_VALUE(total_revenue) AS revenue +FROM SNIPPETS.PUBLIC.CHANNEL_SALES_SV +WHERE year = 2024 +GROUP BY ALL +ORDER BY month; +``` + +## Docs + +- [Querying semantic views — standard SQL FROM clause](https://docs.snowflake.com/en/user-guide/views-semantic/querying#specifying-the-name-of-the-semantic-view-in-the-from-clause) +- [Querying semantic views — SEMANTIC_VIEW() clause](https://docs.snowflake.com/en/user-guide/views-semantic/querying#specifying-the-semantic-view-clause-in-the-from-clause) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | Reference note — uses `derived_metrics` SV | +| `queries.sql` | Standard SQL patterns: ANY_VALUE, metric-only, dim-only, JOINs, window on top | + +**Prerequisites:** Deploy `derived_metrics/semantic_view.sql` first (creates `SNIPPETS.PUBLIC.CHANNEL_SALES_SV`). diff --git a/skills/semantic-view-patterns/snippets/standard_sql/queries.sql b/skills/semantic-view-patterns/snippets/standard_sql/queries.sql new file mode 100644 index 00000000..6681d063 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/standard_sql/queries.sql @@ -0,0 +1,87 @@ +-- Standard SQL: Queries +-- Prerequisites: deploy derived_metrics/semantic_view.sql first +-- Reference SV: SNIPPETS.PUBLIC.CHANNEL_SALES_SV (no SEMANTIC_VIEW() function needed) + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- STANDARD SQL ON A SEMANTIC VIEW +-- Regular SELECT from a SV as if it were a view. +-- ============================================================ + +-- 1. Monthly revenue across channels using standard SQL (not SEMANTIC_VIEW()) +-- IMPORTANT: metrics must be wrapped in ANY_VALUE(), MIN(), or MAX() +-- if other columns are selected; otherwise use them ungrouped. 
+SELECT + month, + ANY_VALUE(store_revenue) AS store_rev, + ANY_VALUE(web_revenue) AS web_rev, + ANY_VALUE(catalog_revenue) AS catalog_rev, + ANY_VALUE(total_revenue) AS total_rev +FROM SNIPPETS.PUBLIC.CHANNEL_SALES_SV +WHERE year = 2024 +GROUP BY ALL +ORDER BY month; + + +-- 2. MIN/MAX pattern for metrics +SELECT + quarter, + MIN(total_revenue) AS min_quarterly_rev, + MAX(total_revenue) AS max_quarterly_rev +FROM SNIPPETS.PUBLIC.CHANNEL_SALES_SV +GROUP BY quarter +ORDER BY quarter; + + +-- 3. Metric-less dimension query — returns distinct dimension values +-- No GROUP BY needed; the SV engine handles deduplication. +SELECT month +FROM SNIPPETS.PUBLIC.CHANNEL_SALES_SV +WHERE year = 2024; + + +-- 4. WHERE clause on dimensions + standard aggregation +SELECT + year, + quarter, + ANY_VALUE(store_revenue) AS store_rev, + ANY_VALUE(total_revenue) AS total_rev +FROM SNIPPETS.PUBLIC.CHANNEL_SALES_SV +WHERE quarter IN (1, 2) +GROUP BY ALL +ORDER BY year, quarter; + + +-- 5. Combine with standard SQL window functions on top of the SV +SELECT + month, + ANY_VALUE(total_revenue) AS monthly_rev, + SUM(ANY_VALUE(total_revenue)) OVER (ORDER BY month) AS running_total +FROM SNIPPETS.PUBLIC.CHANNEL_SALES_SV +GROUP BY month +ORDER BY month; + + +-- ============================================================ +-- RULES FOR STANDARD SQL ON SVs +-- ============================================================ + +-- 1. If you SELECT a metric alongside other columns, wrap it in +-- ANY_VALUE(), MIN(), or MAX() +-- 2. If you SELECT only metrics (no other columns), no wrapping needed +-- 3. If you SELECT only dimensions (no metrics), no wrapping needed +-- and GROUP BY is not required — SV returns distinct values +-- 4. Standard WHERE, ORDER BY, LIMIT all work normally +-- 5. You can JOIN a SV to another table or SV using standard SQL syntax + +-- ============================================================ +-- WHY USE STANDARD SQL OVER SEMANTIC_VIEW()? 
+-- ============================================================ +-- + Familiar SQL syntax for analysts already in SQL tools +-- + Works with BI tools that don't speak SEMANTIC_VIEW() syntax +-- + Enables standard SQL window functions on top of SV output +-- + Can JOIN multiple SVs together +-- - Less explicit about which metrics/dimensions are being requested +-- - No AI routing or VQR matching (Cortex Analyst uses SEMANTIC_VIEW() internally) diff --git a/skills/semantic-view-patterns/snippets/standard_sql/schema.sql b/skills/semantic-view-patterns/snippets/standard_sql/schema.sql new file mode 100644 index 00000000..7fd944e2 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/standard_sql/schema.sql @@ -0,0 +1,6 @@ +-- Standard SQL: Schema +-- This snippet uses the channel_sales SV from derived_metrics/. +-- No new tables needed — deploy derived_metrics first. + +-- Reference SV: SNIPPETS.PUBLIC.CHANNEL_SALES_SV +-- Created by: derived_metrics/semantic_view.sql diff --git a/skills/semantic-view-patterns/snippets/sv_diagnostics/README.md b/skills/semantic-view-patterns/snippets/sv_diagnostics/README.md new file mode 100644 index 00000000..fef75815 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/sv_diagnostics/README.md @@ -0,0 +1,276 @@ +# SV Diagnostics + +## The Problem + +You've written a Semantic View. It deploys without errors. But then an analyst reports that a query isn't working, or Cortex Analyst gives them the wrong answer, or the numbers don't add up. Where do you start? + +This snippet is a diagnostic reference: six real failure modes, the exact error messages they produce, how to tell them apart, and what to fix. + +**How You Might Express This Need:** +- "My SV deployed fine but queries are erroring — what's wrong?" +- "Cortex Analyst keeps saying it can't answer my question." +- "I added a date dimension and now nothing works." +- "My revenue numbers look wrong when I break down by product." 
+- "The fan trap error went away after I changed something, but now I'm not sure the numbers are right." +- "How do I know if my model is structured correctly before I ship it?" + +--- + +## The Six Diagnostics + +### 1. Ambiguous Path Relationship + +**Symptom:** The SV deploys. Queries without date dimensions work fine. The moment an analyst tries to group by `year` or `month`, it errors. + +**Error:** +``` +Invalid dimension specified: Multi-path relationship between the dimension +entity 'DATE_DIM' and the base metric or dimension entity 'DEALS' is not supported. +``` + +**Root cause:** A fact table has two FKs that both reference the same dimension table (e.g., `CREATED_DATE` and `CLOSE_DATE` both pointing to `DIM_DATE`). Two relationships exist with no disambiguation. + +**Why it's insidious:** Many queries succeed. The bug hides until someone tries a time-series breakdown. + +**Fix:** Add `USING (relationship)` to every metric, explicitly declaring which date path that metric uses. + +```sql +-- BROKEN: no USING — ambiguous at query time +deals.total_amount AS SUM(AMOUNT) + +-- FIXED: USING before AS — each metric owns its date path +deals.total_amount_created USING (deals_to_created_date) AS SUM(AMOUNT) +deals.total_amount_closed USING (deals_to_close_date) AS SUM(AMOUNT) +``` + +> **See also:** `multi_path_metrics` snippet for the full USING pattern; `accumulating_snapshot` for USING on a multi-milestone fact table. + +--- + +### 2. Fan Trap + +**Symptom:** Querying a metric grouped by a "downstream" dimension errors at query time. + +**Error:** +``` +Invalid dimension specified: The dimension entity 'PRODUCTS' must be related to +and have an equal or lower level of granularity compared to the base metric or +dimension entity 'DEALS'. +``` + +**Root cause:** The metric is at a coarser grain than the dimension it's being grouped by. 
Revenue lives at the `DEALS` header (one row per deal), but `DIM_PRODUCT` is only reachable through `DEAL_ITEMS` (many rows per deal). The SV engine detects the potential fan-out and refuses. + +**Distinguishing from Scenario 3:** Same error message. Check your RELATIONSHIPS clause — in a fan trap the relationship exists but at the wrong grain; in Scenario 3 the relationship is simply missing. + +**Fix:** Move the metric to the table that directly joins the dimension. + +```sql +-- BROKEN: metric at DEALS grain, dimension only reachable via DEAL_ITEMS +FACTS ( deals.amount AS AMOUNT ) +METRICS ( deals.total_amount AS SUM(AMOUNT) ) -- can't group by products.category + +-- FIXED: metric at DEAL_ITEMS grain — same level as DIM_PRODUCT +FACTS ( deal_items.line_amount AS LINE_AMOUNT ) +METRICS ( deal_items.total_revenue AS SUM(LINE_AMOUNT) ) -- ✓ +``` + +--- + +### 3. Table With No Relationship + +**Symptom:** A table is listed in `TABLES` and its dimensions are defined, but any query using those dimensions errors. + +**Error:** +``` +Invalid dimension specified: The dimension entity 'DIM_REGION' must be related +to and have an equal or lower level of granularity compared to the base metric +or dimension entity 'DEALS'. +``` + +**Root cause:** The table was added to `TABLES` but no `RELATIONSHIPS` entry connects it. The engine can't build a join path. + +**Distinguishing from Scenario 2:** Same error message. Search the `RELATIONSHIPS` clause for the orphaned table's name — it won't appear on either side of any relationship. + +**Fix:** Add the missing relationship, or remove the orphaned table from `TABLES`. + +```sql +-- BROKEN: no relationship for dim_region +RELATIONSHIPS ( + deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) +) + +-- FIXED: add the missing link +RELATIONSHIPS ( + deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) + , rep_to_region AS rep_dim(REGION) REFERENCES dim_region(REGION_CODE) +) +``` + +--- + +### 4. 
Duplicate Names and Ambiguous Synonyms + +Two flavors with different consequences. + +#### 4a. Duplicate Logical Name — Deploy-Time Error + +**Error:** `SQL compilation error: invalid identifier ''` at CREATE time. + +**Root cause:** Two dimensions (or metrics, or a dimension and a metric) share the same logical name. Logical names must be globally unique within a SV. + +**Fix:** Entity-scope your logical names. + +```sql +-- BROKEN +rep_dim.segment AS REGION -- logical: "segment" — duplicate +products.segment AS CATEGORY -- logical: "segment" — duplicate → deploy error + +-- FIXED +rep_dim.rep_segment AS REGION +products.product_segment AS CATEGORY +``` + +#### 4b. Overlapping Synonyms — Cortex Analyst Ambiguity + +**Symptom:** SV deploys, SQL queries work. Cortex Analyst refuses natural language questions. + +**CA response:** +``` +The term 'segment' is ambiguous. It could refer to 'product_segment' or +'rep_segment'. Could you clarify which segment you mean? +``` + +**Root cause:** Multiple definitions share the same synonym. CA never silently picks the wrong one — it refuses. This is correct behavior, but your analysts hit a wall. + +**Fix:** Give each definition a synonym set that is unique and scoped. Never share high-value terms (`revenue`, `total`, `count`, `segment`, `area`) across multiple definitions. + +--- + +### 5. Wrong Relationship Direction and Wrong Cardinality + +Two flavors. One fails loudly; the other is the most dangerous issue in this entire guide. + +#### 5a. Reversed Direction — Deploy-Time Error + +**Error:** +``` +The referenced key in the relationship 'REP_DIM REFERENCES DEALS' must be the +primary or unique key of the referenced entity. +``` + +**Root cause:** The relationship direction is flipped — the dimension table is on the left of `REFERENCES`, pointing to the fact table on the right. The engine enforces that the RHS of `REFERENCES` must be a declared PK/UK. Since `DEALS.REP_ID` is not a PK of DEALS, it errors immediately. 
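In this snippet's model, the broken and fixed declarations differ only in which side of `REFERENCES` holds the fact table (a sketch — the relationship name is illustrative):

```sql
-- BROKEN: dimension on the left of REFERENCES — rejected at CREATE time
deals_to_rep AS rep_dim(REP_ID) REFERENCES deals(REP_ID)

-- FIXED: many side (FK) on the left, one side (PK) on the right
deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID)
```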
+ +**The guardrail limit:** This protection only works when the FK column is not the PK of its own table. If both sides happen to declare the same column as PK (Scenario 5b), the engine can't detect the lie. + +**Fix:** Always write relationships as `many_side(FK) REFERENCES one_side(PK)`. The right-hand side is always the dimension/parent primary key. + +#### 5b. Wrong Cardinality (Lying About the PK) — Silent Wrong Results + +**This is the most dangerous diagnostic in this guide. No error. Ever.** + +**Symptom:** The SV deploys. Most queries return correct results. But certain queries — specifically, header-level metrics grouped by fine-grain dimensions — return silently inflated numbers. + +**Root cause:** `DEAL_ITEMS` has `ITEM_ID` as its real PK (many items per deal). The modeler accidentally declares `PRIMARY KEY (DEAL_ID)` instead — asserting 1:1 with DEALS. Snowflake doesn't enforce PK uniqueness, so the model deploys. + +The SV engine uses the declared PK to assess join cardinality. Believing the relationship is 1:1, it disables its fan trap guard. The exact query that would correctly error on a properly-declared model now runs — and inflates every number by the average number of items per parent row: + +``` +Correct model → deals.total_amount by products.category → ERROR (fan trap caught ✓) +Wrong PK model → deals.total_amount by products.category → $430k instead of ~$240k ✗ +``` + +Multi-item deals get their `AMOUNT` counted once per item. The numbers look plausible. They're wrong. + +**Detection:** Compare the SV metric total against a raw `SELECT SUM(...)` on the table. If they don't match when grouping across all rows, there's a cardinality lie in your `TABLES` clause. + +```sql +-- Detection query: does the SV total match raw SQL? +SELECT SUM(amount) FROM DEALS; -- should equal SV total_amount with no grouping +``` + +**Fix:** Declare `PRIMARY KEY` on the column that is actually unique in that table. 
For bridge and line-item tables, that is the surrogate item key — not the FK back to the parent. + +```sql +-- WRONG: declaring the FK column as PK +deal_items AS DEAL_ITEMS PRIMARY KEY (DEAL_ID) -- DEAL_ID is not unique in DEAL_ITEMS + +-- CORRECT: declare the actual unique key +deal_items AS DEAL_ITEMS PRIMARY KEY (ITEM_ID) -- ITEM_ID is unique ✓ +``` + +--- + +### 6. Forgotten Semi-Additive Behavior + +**No error. No query failure. No CA refusal. Just wrong answers.** + +This is a model design review item, not a detectable error. Ask this question for every `FACT` and `METRIC` before you deploy: + +> **"Does this column represent a SNAPSHOT or a FLOW?"** + +| Type | Examples | Correct aggregation | +|------|----------|-------------------| +| **Flow** — accumulates over time | Revenue, quantity sold, transactions | `SUM` ✓ | +| **Snapshot** — point-in-time | Account balance, headcount, inventory, open pipeline | `SUM` across time = **wrong** | + +Summing daily account balances across 30 days gives a number 30× too large. The same deal counted in open pipeline every day it's open gets multiplied by its age in days. + +**The fix:** Use `NON ADDITIVE BY` in your metric definition. See the `semi_additive_metric` snippet for the full pattern. The checklist question above is your trigger to go look at that snippet. + +--- + +## Diagnostic Cheat Sheet + +| Error / Symptom | Possible Causes | How to Tell Apart | Fix | +|---|---|---|---| +| "Multi-path relationship not supported" | Two relationships to same dim, no USING | Only one root cause — check RELATIONSHIPS for duplicate target | Add USING to each metric | +| "Dimension must be equal or lower granularity" | Fan trap OR missing relationship | Check RELATIONSHIPS — is the table connected at all? 
| Fan trap → move metric to bridge grain; Missing rel → add relationship |
| "invalid identifier" at CREATE time | Duplicate logical name | Scan DIMENSIONS/METRICS for repeated names | Entity-scope all logical names |
| CA refuses with ambiguity explanation | Overlapping synonyms | Scan WITH SYNONYMS for shared terms | Remove shared terms; unique synonym sets per definition |
| "Referenced key must be PK/UK" at CREATE time | Reversed relationship direction | FK is on the RHS instead of LHS | Flip: `many(FK) REFERENCES one(PK)` |
| No error, silently inflated numbers | Wrong PK declaration (cardinality lie) | Compare SV total to raw SQL total | Declare PK on the actually-unique column |
| No error, subtly wrong aggregations over time | Snapshot metric using SUM | Ask: snapshot or flow? | Use `NON ADDITIVE BY` — see `semi_additive_metric` snippet |

---

## Pre-Deployment Checklist

Before running `CREATE SEMANTIC VIEW`, scan your DDL for these patterns:

- [ ] **Every fact table with two or more date FKs**: does every metric have `USING`?
- [ ] **Every table in TABLES**: does it appear in at least one RELATIONSHIP?
- [ ] **Every metric**: is it defined at the same grain as, or a finer grain than, the dimensions it will be grouped by?
- [ ] **Every logical name in DIMENSIONS and METRICS**: is it globally unique within the SV?
- [ ] **Every synonym**: does it appear in only one definition? Check especially: `revenue`, `count`, `total`, `amount`, `segment`, `type`, `name`, `date`, `area`.
- [ ] **Every RELATIONSHIP**: is it written as `many_side(FK) REFERENCES one_side(PK)`?
- [ ] **Every PRIMARY KEY declaration in TABLES**: is it the column that is actually unique in that table (not a FK, not a non-unique attribute)?
- [ ] **Every FACT and METRIC**: is it a flow (SUM is correct) or a snapshot (needs NON ADDITIVE BY)?

---

## What Doesn't Work

- **Pre-deployment dry-run**: There is no `VALIDATE SEMANTIC VIEW` command. 
The only way to test deploy-time errors is to attempt `CREATE`. For query-time errors, deploy first and run test queries. + +- **DESCRIBE as a validator**: `DESCRIBE SEMANTIC VIEW` shows structure after deployment but cannot detect query-time issues like fan traps or ambiguous paths. + +- **PK enforcement**: Snowflake does not enforce PRIMARY KEY uniqueness on tables. The SV engine trusts whatever you declare. A wrong PK declaration deploys silently and disables cardinality guards — always verify your PK declarations against actual data. + +- **Fixing overlapping synonyms at query time**: Once deployed with ambiguous synonyms, CA will refuse those questions until the SV is altered. + +--- + +## Docs + +- [Semantic View DDL reference](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view) +- [SEMANTIC_VIEW() table function](https://docs.snowflake.com/en/sql-reference/functions/semantic_view) +- [Cortex Analyst — semantic views](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/semantic-model-spec) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `DIM_REP`, `DIM_PRODUCT`, `DIM_REGION` (orphaned), `DEALS`, `DEAL_ITEMS` | +| `seed_data.sql` | 4 reps, 4 products, 12 deals, 18 line items, 15 DIM_DATE rows | +| `semantic_view.sql` | All broken and fixed SVs for scenarios 1–5; scenario 6 is checklist only | +| `queries.sql` | Error-triggering queries with exact messages, fixed queries with verified output, and the semi-additive checklist | diff --git a/skills/semantic-view-patterns/snippets/sv_diagnostics/queries.sql b/skills/semantic-view-patterns/snippets/sv_diagnostics/queries.sql new file mode 100644 index 00000000..75da9a8d --- /dev/null +++ b/skills/semantic-view-patterns/snippets/sv_diagnostics/queries.sql @@ -0,0 +1,306 @@ +-- SV Diagnostics: Verification Queries +-- +-- Each section triggers a specific error on the BROKEN SV, states the exact +-- error message, then demonstrates the FIX on the 
corrected SV. + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 1: AMBIGUOUS PATH RELATIONSHIP +-- ══════════════════════════════════════════════════════════════════════════════ + +-- Step 1a: Confirm the broken SV appears healthy — non-date queries work fine. +-- This is why the bug is insidious: many queries succeed before anyone +-- tries a time-series breakdown. +SELECT * FROM SEMANTIC_VIEW( + SEMANTIC_SKILLS.SNIPPETS.DEALS_AMBIGUOUS_PATH_SV + DIMENSIONS deals.product + METRICS deals.total_amount +) +ORDER BY product; +-- | PRODUCT | TOTAL_AMOUNT | +-- |-----------------|--------------| +-- | Analytics | 240000.00 | +-- | Data Pipelines | 73500.00 | + +-- Step 1b: Trigger the ambiguous path error — group by a date dimension. +-- ANY metric + ANY date dimension fires this error. +SELECT * FROM SEMANTIC_VIEW( + SEMANTIC_SKILLS.SNIPPETS.DEALS_AMBIGUOUS_PATH_SV + DIMENSIONS date_dim.year, date_dim.month_name + METRICS deals.total_amount +) +ORDER BY year, month_name; +-- ERROR: SQL compilation error: +-- Invalid dimension specified: Multi-path relationship between the dimension +-- entity 'DATE_DIM' and the base metric or dimension entity 'DEALS' is not supported. + +-- Step 1c: Fixed SV — USING on each metric picks the correct date path. 
+-- deal_count_created: buckets by creation date (all 12 deals visible) +-- deal_count_closed: buckets by close date (5 open deals → NULL row) +SELECT * FROM SEMANTIC_VIEW( + SEMANTIC_SKILLS.SNIPPETS.DEALS_FIXED_SV + DIMENSIONS date_dim.year, date_dim.month_num, date_dim.month_name + METRICS deals.deal_count_created, deals.deal_count_closed +) +ORDER BY year, month_num; +-- | YEAR | MONTH_NUM | MONTH_NAME | DEAL_COUNT_CREATED | DEAL_COUNT_CLOSED | +-- |------|-----------|------------|--------------------|-------------------| +-- | 2025 | 1 | January | 4 | 1 | +-- | 2025 | 2 | February | 4 | 2 | +-- | 2025 | 3 | March | 4 | 4 | +-- | NULL | NULL | | NULL | 5 | + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 2: FAN TRAP +-- ══════════════════════════════════════════════════════════════════════════════ + +-- Step 2a: Confirm the broken SV works for dimensions at or above DEALS grain. +-- Rep and stage are fine — they join directly to DEALS. +SELECT * FROM SEMANTIC_VIEW( + SEMANTIC_SKILLS.SNIPPETS.DEALS_FAN_TRAP_SV + DIMENSIONS rep_dim.rep_name + METRICS deals.total_amount +) +ORDER BY rep_name; +-- | REP_NAME | TOTAL_AMOUNT | +-- |---------------|--------------| +-- | Alice Nguyen | 160000.00 | +-- | Bob Torres | 45000.00 | +-- | Carol Kim | 80000.00 | +-- | David Osei | 28500.00 | + +-- Step 2b: Trigger the fan trap — group by a product dimension. +-- products.category is only reachable via DEAL_ITEMS (many-per-deal), +-- which is at a finer grain than DEALS. The SV engine catches this. +SELECT * FROM SEMANTIC_VIEW( + SEMANTIC_SKILLS.SNIPPETS.DEALS_FAN_TRAP_SV + DIMENSIONS products.category + METRICS deals.total_amount +) +ORDER BY category; +-- ERROR: SQL compilation error: +-- Invalid dimension specified: The dimension entity 'PRODUCTS' must be related +-- to and have an equal or lower level of granularity compared to the base metric +-- or dimension entity 'DEALS'. 
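-- (Aside) The raw-SQL equivalent of the refused query shows what the guard is
-- protecting against: joining header-grain AMOUNT through DEAL_ITEMS counts
-- each deal once per matching item. (Illustrative query — this bypasses the SV
-- entirely and hits the base tables directly.)
SELECT p.category, SUM(d.amount) AS inflated_amount
FROM SEMANTIC_SKILLS.SNIPPETS.DEALS d
JOIN SEMANTIC_SKILLS.SNIPPETS.DEAL_ITEMS di ON di.deal_id = d.deal_id
JOIN SEMANTIC_SKILLS.SNIPPETS.DIM_PRODUCT p ON p.product_id = di.product_id
GROUP BY p.category
ORDER BY p.category;
-- | CATEGORY        | INFLATED_AMOUNT |
-- |-----------------|-----------------|
-- | Analytics       | 430000.00       | ← vs. 240000.00 at header grain
-- | Data Pipelines  | 146500.00       | ← vs. 73500.00 at header grain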
+
-- Step 2c: Fixed SV — metric moved to DEAL_ITEMS grain.
--          Total revenue by product category, no fan trap.
SELECT * FROM SEMANTIC_VIEW(
    SEMANTIC_SKILLS.SNIPPETS.DEALS_FAN_TRAP_FIXED_SV
    DIMENSIONS products.category
    METRICS deal_items.total_revenue, deal_items.item_count
)
ORDER BY category;
-- | CATEGORY        | TOTAL_REVENUE | ITEM_COUNT |
-- |-----------------|---------------|------------|
-- | Analytics       | 221666.66     | 10         |
-- | Data Pipelines  | 91833.34      | 8          |


-- ══════════════════════════════════════════════════════════════════════════════
-- SCENARIO 3: TABLE WITH NO RELATIONSHIP
-- ══════════════════════════════════════════════════════════════════════════════

-- Step 3a: Confirm the broken SV works for connected dimensions.
--          rep_dim has a relationship — its dimensions work fine.
SELECT * FROM SEMANTIC_VIEW(
    SEMANTIC_SKILLS.SNIPPETS.DEALS_NO_REL_SV
    DIMENSIONS rep_dim.region
    METRICS deals.total_amount
)
ORDER BY region;
-- | REGION | TOTAL_AMOUNT |
-- |--------|--------------|
-- | East   | 108500.00    |
-- | West   | 205000.00    |

-- Step 3b: Trigger the no-relationship error — use the orphaned dim_region table.
--          Note: IDENTICAL error message to the fan trap (Scenario 2).
--          Distinguishing factor: check RELATIONSHIPS clause for a missing entry.
SELECT * FROM SEMANTIC_VIEW(
    SEMANTIC_SKILLS.SNIPPETS.DEALS_NO_REL_SV
    DIMENSIONS dim_region.region_name
    METRICS deals.total_amount
);
-- ERROR: SQL compilation error:
-- Invalid dimension specified: The dimension entity 'DIM_REGION' must be related
-- to and have an equal or lower level of granularity compared to the base metric
-- or dimension entity 'DEALS'.

-- Step 3c: Fixed SV — relationship added: deals → rep_dim → dim_region. 
+SELECT * FROM SEMANTIC_VIEW( + SEMANTIC_SKILLS.SNIPPETS.DEALS_NO_REL_FIXED_SV + DIMENSIONS dim_region.region_name + METRICS deals.total_amount +) +ORDER BY region_name; +-- | REGION_NAME | TOTAL_AMOUNT | +-- |-----------------|--------------| +-- | Eastern Region | 108500.00 | +-- | Western Region | 205000.00 | + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 4A: DUPLICATE LOGICAL NAME — DEPLOY-TIME ERROR +-- ══════════════════════════════════════════════════════════════════════════════ + +-- Reproducing this error requires attempting to CREATE the broken SV. +-- See semantic_view.sql — the DEALS_DUPE_NAME_SV definition is commented out. +-- Uncomment and run to observe the deploy-time error: +-- +-- ERROR: SQL compilation error: error line N at position N +-- invalid identifier '' +-- +-- Fix: ensure every dimension and metric has a globally unique logical name +-- within the SV. Use entity-scoped names (rep_segment, product_segment) when +-- the same concept appears on multiple entities. + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 4B: OVERLAPPING SYNONYMS — CORTEX ANALYST AMBIGUITY +-- ══════════════════════════════════════════════════════════════════════════════ +-- These queries are run via the Cortex Analyst API, not SEMANTIC_VIEW(). +-- Results are shown as CA responses. + +-- Q4b-1: "What is total revenue by segment?" +-- → CA response on DEALS_AMBIGUOUS_NAMES_SV: +-- "The term 'segment' is ambiguous. It could refer to 'product_segment' +-- (product category/segment) or 'rep_segment' (rep region/territory/segment). +-- Could you clarify which segment you mean?" +-- +-- → CA response on DEALS_CLEAR_NAMES_SV: +-- "The term 'segment' is ambiguous. The closest dimensions are 'product_category' +-- or 'rep_territory'. Could you clarify which 'segment' you mean?" +-- (Still can't answer — "segment" was deliberately removed from all synonyms.) 
+ +-- Q4b-2: "What is total revenue by area?" +-- → CA response on DEALS_AMBIGUOUS_NAMES_SV: +-- "The term 'area' is ambiguous — it matches both 'product_segment' and +-- 'rep_segment'. Could you clarify?" + +-- Q4b-3: "What is total revenue?" +-- → CA response on DEALS_AMBIGUOUS_NAMES_SV: +-- "'total revenue' can refer to: (1) 'total_amount' — deal-level total, or +-- (2) 'total_revenue' — line-item level total. These may produce different +-- results. Could you clarify?" + +-- Q4b-4: "What is product revenue by product category?" +-- → CA on DEALS_AMBIGUOUS_NAMES_SV: answers correctly (unambiguous phrasing) +-- → CA on DEALS_CLEAR_NAMES_SV: answers correctly ✓ + +-- Q4b-5: "What is total deal value by rep territory?" +-- → CA on DEALS_CLEAR_NAMES_SV: answers correctly ✓ +-- Generated SQL routes to deals.deal_value metric + rep_dim.rep_territory dimension + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 5A: REVERSED RELATIONSHIP DIRECTION — DEPLOY-TIME ERROR +-- ══════════════════════════════════════════════════════════════════════════════ + +-- Reproducing this error requires attempting to CREATE the broken SV. +-- See semantic_view.sql — DEALS_REVERSED_REL_SV is commented out. +-- Uncomment and run to observe: +-- +-- ERROR: SQL compilation error: +-- The referenced key in the relationship 'REP_DIM REFERENCES DEALS' must be +-- the primary or unique key of the referenced entity. +-- +-- This fires because DEALS.REP_ID is not a declared PK or UK of DEALS. +-- The SV engine enforces that the RHS of REFERENCES must be a PK/UK — this +-- catches reversed-direction mistakes whenever the FK column is not also unique. +-- +-- Note: this guard only works when the FK column is NOT the PK of its table. +-- If both the left and right columns happen to be declared PKs (Scenario 5b), +-- the engine cannot detect the cardinality lie and the model deploys silently. 
+ + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 5B: WRONG CARDINALITY — SILENT WRONG RESULTS +-- ══════════════════════════════════════════════════════════════════════════════ + +-- Step 5b-1: Confirm the wrong-cardinality SV looks healthy for safe queries. +-- Line-item metrics by product work correctly because the join path +-- is the same regardless of the declared PK. +SELECT * FROM SEMANTIC_VIEW( + SEMANTIC_SKILLS.SNIPPETS.DEALS_BOTH_UNIQUE_SV + DIMENSIONS products.category + METRICS deal_items.total_revenue, deal_items.item_count +) +ORDER BY category; +-- | CATEGORY | TOTAL_REVENUE | ITEM_COUNT | +-- |-----------------|---------------|------------| +-- | Analytics | 221666.66 | 10 | ← correct ✓ +-- | Data Pipelines | 91833.34 | 8 | ← correct ✓ + +-- Step 5b-2: The dangerous query — header-level metric by fine-grain dimension. +-- On a correctly-declared SV this errors (fan trap caught). +-- On the wrong-cardinality SV it runs and silently inflates numbers. +-- +-- Why: declaring PRIMARY KEY (DEAL_ID) on DEAL_ITEMS tells the engine +-- the relationship is 1:1. It believes no fan-out can occur and skips +-- the cardinality check. Every deal with multiple items gets its +-- AMOUNT counted once per item. +SELECT * FROM SEMANTIC_VIEW( + SEMANTIC_SKILLS.SNIPPETS.DEALS_BOTH_UNIQUE_SV + DIMENSIONS products.category + METRICS deals.total_amount +) +ORDER BY category; +-- | CATEGORY | TOTAL_AMOUNT | +-- |-----------------|---------------| +-- | Analytics | 430000.00 | ← WRONG: should be ~$240k (multi-item deals counted 2-3×) +-- | Data Pipelines | 146500.00 | ← WRONG: should be ~$73.5k + +-- Step 5b-3: Prove the correctly-declared SV would catch this as a fan trap error. 
+SELECT * FROM SEMANTIC_VIEW( + SEMANTIC_SKILLS.SNIPPETS.DEALS_FAN_TRAP_SV + DIMENSIONS products.category + METRICS deals.total_amount +) +ORDER BY category; +-- ERROR: SQL compilation error: +-- Invalid dimension specified: The dimension entity 'PRODUCTS' must be related +-- to and have an equal or lower level of granularity compared to the base metric +-- or dimension entity 'DEALS'. +-- +-- The guard is disabled by the wrong PK declaration. Same model structure, +-- same wrong query — one errors (safe), one silently returns garbage (dangerous). + +-- Step 5b-4: Detection heuristic — compare totals against raw SQL. +-- If a SV metric total doesn't match a direct table aggregate, the +-- model likely has a cardinality lie somewhere in the TABLES clause. +SELECT SUM(amount) AS raw_total FROM SEMANTIC_SKILLS.SNIPPETS.DEALS; +-- | RAW_TOTAL | +-- |-------------| +-- | 313500.00 | ← correct deal total; $430k + $146.5k = $576.5k in the SV ≠ this + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 6: FORGOTTEN SEMI-ADDITIVE BEHAVIOR — CHECKLIST ONLY +-- ══════════════════════════════════════════════════════════════════════════════ +-- +-- No query to run — this is a model review heuristic, not a detectable error. +-- +-- Ask this question for every FACT and METRIC in your model: +-- "Does this column represent a SNAPSHOT (balance, headcount, inventory level) +-- or a FLOW (revenue, quantity sold, events)?" +-- +-- FLOW → SUM is correct. Adding up revenue across time periods is meaningful. +-- SNAPSHOT → SUM across time is almost certainly wrong. Summing daily account +-- balances across 30 days gives a number 30× too large. 
+
--
-- Examples of snapshot metrics that should NOT be SUM'd across time:
--   - Account balance (bank, savings, investment)
--   - Headcount / employee count
--   - Inventory on hand
--   - Active subscriptions
--   - Open pipeline value (the same deal counted every day it's open)
--
-- The correct aggregation for snapshots is either:
--   - LAST_VALUE (closing balance, end-of-period headcount)
--   - AVG (average daily balance, average inventory)
--   - MAX/MIN (peak/trough)
--
-- Snowflake SVs support this via NON ADDITIVE BY — see the `semi_additive_metric`
-- snippet for the full pattern. Use the question above as your trigger to go look
-- at that snippet before defining a SUM on any snapshot column.
diff --git a/skills/semantic-view-patterns/snippets/sv_diagnostics/schema.sql b/skills/semantic-view-patterns/snippets/sv_diagnostics/schema.sql
new file mode 100644
index 00000000..9985a6a8
--- /dev/null
+++ b/skills/semantic-view-patterns/snippets/sv_diagnostics/schema.sql
@@ -0,0 +1,76 @@
+-- SV Diagnostics: Shared Schema
--
-- One schema supports all six diagnostic scenarios; the four that shape its
-- design are:
--
-- Scenario 1 — Ambiguous path
--   DEALS has two date FKs (created_date, close_date) both pointing to DIM_DATE.
--   Without USING on metrics, any date dimension query errors at runtime.
--
-- Scenario 2 — Fan trap
--   Revenue lives at the DEALS header (one row per deal).
--   DEAL_ITEMS links each deal to one or more products.
--   Routing header-level revenue through DEAL_ITEMS to DIM_PRODUCT fans out rows
--   and produces a query-time error.
--   Fix: move the metric to DEAL_ITEMS.LINE_AMOUNT (line-item grain).
--
-- Scenario 3 — Table with no relationship
--   DIM_REGION is defined in TABLES but never given a RELATIONSHIP to any fact.
--   Deploying succeeds; using its dimensions at query time errors.
--
-- Scenario 4 — Duplicate logical name / ambiguous synonyms
--   Duplicate logical name across entities → deploy-time error (hard stop).
+-- Overlapping synonyms across dimensions/metrics → deploys fine, but Cortex +-- Analyst can't disambiguate and refuses to answer. + +USE DATABASE SEMANTIC_SKILLS; +USE SCHEMA SNIPPETS; + +-- ── Dimensions ──────────────────────────────────────────────────────────────── + +CREATE OR REPLACE TABLE DIM_REP ( + rep_id INTEGER NOT NULL, + rep_name VARCHAR(50) NOT NULL, + region VARCHAR(20) NOT NULL, + team VARCHAR(20) NOT NULL, + CONSTRAINT pk_dim_rep PRIMARY KEY (rep_id) +); + +CREATE OR REPLACE TABLE DIM_PRODUCT ( + product_id INTEGER NOT NULL, + product_name VARCHAR(50) NOT NULL, + category VARCHAR(30) NOT NULL, + CONSTRAINT pk_dim_product PRIMARY KEY (product_id) +); + +-- Intentionally orphaned for Scenario 3 — defined in TABLES, no RELATIONSHIP +CREATE OR REPLACE TABLE DIM_REGION ( + region_code VARCHAR(20) NOT NULL, + region_name VARCHAR(50) NOT NULL, + CONSTRAINT pk_dim_region PRIMARY KEY (region_code) +); + +-- ── Facts ───────────────────────────────────────────────────────────────────── + +-- DEALS: two date FKs → Scenario 1 (ambiguous path). +-- Revenue (AMOUNT) lives here at header grain → Scenario 2 (fan trap source). +CREATE OR REPLACE TABLE DEALS ( + deal_id INTEGER NOT NULL, + rep_id INTEGER NOT NULL, -- FK → DIM_REP + created_date DATE NOT NULL, -- FK → DIM_DATE (pipeline entry) + close_date DATE, -- FK → DIM_DATE (NULL if open) + amount NUMBER(10,2) NOT NULL, + product VARCHAR(30) NOT NULL, + stage VARCHAR(20) NOT NULL, + CONSTRAINT pk_deals PRIMARY KEY (deal_id) +); + +-- DEAL_ITEMS: bridge between DEALS and DIM_PRODUCT. +-- LINE_AMOUNT is the per-product allocation of the deal amount. +-- This is the correct grain for product-level revenue metrics. 
+CREATE OR REPLACE TABLE DEAL_ITEMS ( + item_id INTEGER NOT NULL, + deal_id INTEGER NOT NULL, -- FK → DEALS + product_id INTEGER NOT NULL, -- FK → DIM_PRODUCT + line_amount NUMBER(10,2), -- revenue at line-item grain + CONSTRAINT pk_deal_items PRIMARY KEY (item_id) +); diff --git a/skills/semantic-view-patterns/snippets/sv_diagnostics/seed_data.sql b/skills/semantic-view-patterns/snippets/sv_diagnostics/seed_data.sql new file mode 100644 index 00000000..f68e3746 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/sv_diagnostics/seed_data.sql @@ -0,0 +1,93 @@ +-- SV Diagnostics: Seed Data + +USE DATABASE SEMANTIC_SKILLS; +USE SCHEMA SNIPPETS; + +-- ── DIM_DATE (shared calendar) ──────────────────────────────────────────────── +INSERT INTO DIM_DATE (date_key, month_num, month_name, quarter, year) +SELECT column1, column2, column3, column4, column5 +FROM VALUES + ('2025-01-06'::DATE, 1, 'January', 'Q1', 2025), + ('2025-01-15'::DATE, 1, 'January', 'Q1', 2025), + ('2025-01-22'::DATE, 1, 'January', 'Q1', 2025), + ('2025-01-28'::DATE, 1, 'January', 'Q1', 2025), + ('2025-01-31'::DATE, 1, 'January', 'Q1', 2025), + ('2025-02-03'::DATE, 2, 'February', 'Q1', 2025), + ('2025-02-10'::DATE, 2, 'February', 'Q1', 2025), + ('2025-02-14'::DATE, 2, 'February', 'Q1', 2025), + ('2025-02-20'::DATE, 2, 'February', 'Q1', 2025), + ('2025-02-28'::DATE, 2, 'February', 'Q1', 2025), + ('2025-03-05'::DATE, 3, 'March', 'Q1', 2025), + ('2025-03-12'::DATE, 3, 'March', 'Q1', 2025), + ('2025-03-19'::DATE, 3, 'March', 'Q1', 2025), + ('2025-03-25'::DATE, 3, 'March', 'Q1', 2025), + ('2025-03-31'::DATE, 3, 'March', 'Q1', 2025) +WHERE NOT EXISTS (SELECT 1 FROM DIM_DATE WHERE date_key = column1); + +-- ── DIM_REP ─────────────────────────────────────────────────────────────────── +DELETE FROM DIM_REP; +INSERT INTO DIM_REP VALUES + (1, 'Alice Nguyen', 'West', 'Enterprise'), + (2, 'Bob Torres', 'West', 'SMB'), + (3, 'Carol Kim', 'East', 'Enterprise'), + (4, 'David Osei', 'East', 'SMB'); + +-- ── 
DIM_PRODUCT ─────────────────────────────────────────────────────────────── +DELETE FROM DIM_PRODUCT; +INSERT INTO DIM_PRODUCT VALUES + (1, 'Analytics Cloud', 'Analytics'), + (2, 'Data Pipeline Pro', 'Data Pipelines'), + (3, 'ML Workbench', 'Analytics'), + (4, 'Connector Suite', 'Data Pipelines'); + +-- ── DIM_REGION (region_code must match DIM_REP.region values exactly) ───────── +DELETE FROM DIM_REGION; +INSERT INTO DIM_REGION VALUES + ('West', 'Western Region'), + ('East', 'Eastern Region'); + +-- ── DEALS ───────────────────────────────────────────────────────────────────── +DELETE FROM DEALS; +-- 12 deals Q1 2025. created_date and close_date often differ by weeks — +-- that gap makes the ambiguous-path error meaningful (Scenario 1). +-- AMOUNT is at header grain — used to demonstrate the fan trap (Scenario 2). +INSERT INTO DEALS VALUES +-- id rep created_date close_date amount product stage + (1, 1, '2025-01-06', '2025-01-31', 45000.00, 'Analytics', 'Closed Won'), + (2, 2, '2025-01-15', '2025-02-20', 12000.00, 'Data Pipelines', 'Closed Won'), + (3, 3, '2025-01-22', '2025-02-28', 30000.00, 'Analytics', 'Closed Lost'), + (4, 4, '2025-01-28', NULL, 8500.00, 'Data Pipelines', 'Open'), + (5, 1, '2025-02-03', '2025-03-12', 55000.00, 'Analytics', 'Closed Won'), + (6, 2, '2025-02-10', '2025-03-25', 18000.00, 'Data Pipelines', 'Closed Won'), + (7, 3, '2025-02-14', '2025-03-31', 22000.00, 'Analytics', 'Closed Won'), + (8, 4, '2025-02-20', NULL, 11000.00, 'Data Pipelines', 'Open'), + (9, 1, '2025-03-05', '2025-03-31', 60000.00, 'Analytics', 'Closed Won'), + (10, 2, '2025-03-12', NULL, 15000.00, 'Data Pipelines', 'Open'), + (11, 3, '2025-03-19', NULL, 28000.00, 'Analytics', 'Open'), + (12, 4, '2025-03-25', NULL, 9000.00, 'Data Pipelines', 'Open'); + +-- ── DEAL_ITEMS ──────────────────────────────────────────────────────────────── +-- Each deal links to 1–3 products. LINE_AMOUNT is the per-product share of +-- the deal amount (evenly split). 
Deals with multiple items (1, 3, 5, 6, 9)
+-- are the ones that would double/triple-count if header AMOUNT were used.
+DELETE FROM DEAL_ITEMS;
+INSERT INTO DEAL_ITEMS VALUES
+-- id deal product line_amount
+ (1, 1, 1, 22500.00), -- Deal 1 ($45k): Analytics Cloud (split 2 ways)
+ (2, 1, 3, 22500.00), -- Deal 1 ($45k): ML Workbench
+ (3, 2, 2, 12000.00), -- Deal 2 ($12k): Data Pipeline Pro (single product)
+ (4, 3, 1, 15000.00), -- Deal 3 ($30k): Analytics Cloud (split 2 ways)
+ (5, 3, 3, 15000.00), -- Deal 3 ($30k): ML Workbench
+ (6, 4, 2, 8500.00), -- Deal 4 ($8.5k): Data Pipeline Pro (single product)
+ (7, 5, 1, 18333.33), -- Deal 5 ($55k): Analytics Cloud (split 3 ways)
+ (8, 5, 3, 18333.33), -- Deal 5 ($55k): ML Workbench
+ (9, 5, 4, 18333.34), -- Deal 5 ($55k): Connector Suite
+ (10, 6, 2, 9000.00), -- Deal 6 ($18k): Data Pipeline Pro (split 2 ways)
+ (11, 6, 4, 9000.00), -- Deal 6 ($18k): Connector Suite
+ (12, 7, 1, 22000.00), -- Deal 7 ($22k): Analytics Cloud (single product)
+ (13, 8, 2, 11000.00), -- Deal 8 ($11k): Data Pipeline Pro (single product)
+ (14, 9, 1, 30000.00), -- Deal 9 ($60k): Analytics Cloud (split 2 ways)
+ (15, 9, 3, 30000.00), -- Deal 9 ($60k): ML Workbench
+ (16,10, 4, 15000.00), -- Deal 10 ($15k): Connector Suite (single product)
+ (17,11, 1, 28000.00), -- Deal 11 ($28k): Analytics Cloud (single product)
+ (18,12, 2, 9000.00); -- Deal 12 ($9k): Data Pipeline Pro (single product)
diff --git a/skills/semantic-view-patterns/snippets/sv_diagnostics/semantic_view.sql b/skills/semantic-view-patterns/snippets/sv_diagnostics/semantic_view.sql
new file mode 100644
index 00000000..60b79def
--- /dev/null
+++ b/skills/semantic-view-patterns/snippets/sv_diagnostics/semantic_view.sql
@@ -0,0 +1,476 @@
+-- SV Diagnostics: All Semantic View DDL
+--
+-- Six diagnostic scenarios — each has a BROKEN version (to trigger the error)
+-- and a FIXED version (the correct model). Read together with queries.sql and README.md.
+
+--
+-- Scenario 1: Ambiguous path relationship → DEALS_AMBIGUOUS_PATH_SV / DEALS_FIXED_SV
+-- Scenario 2: Fan trap → DEALS_FAN_TRAP_SV / DEALS_FAN_TRAP_FIXED_SV
+-- Scenario 3: Table with no relationship → DEALS_NO_REL_SV / DEALS_NO_REL_FIXED_SV
+-- Scenario 4: Duplicate name / ambiguous synonyms → DEALS_DUPE_NAME_SV (deploy error)
+-- DEALS_AMBIGUOUS_NAMES_SV / DEALS_CLEAR_NAMES_SV
+-- Scenario 5: Wrong relationship direction → deploy-time error (reversed FK/PK)
+-- Wrong cardinality (lying PK) → DEALS_BOTH_UNIQUE_SV bypasses fan trap guard
+-- Scenario 6: Forgotten semi-additive metric → checklist only, no SV DDL
+
+USE DATABASE SEMANTIC_SKILLS;
+USE SCHEMA SNIPPETS;
+
+-- ══════════════════════════════════════════════════════════════════════════════
+-- SCENARIO 1: AMBIGUOUS PATH RELATIONSHIP
+-- ══════════════════════════════════════════════════════════════════════════════
+--
+-- PROBLEM: DEALS has two date FKs (CREATED_DATE, CLOSE_DATE) both referencing
+-- DIM_DATE. Two relationships exist with no disambiguation. The SV deploys
+-- without error, but any query that uses a date dimension fails at runtime:
+--
+-- "Multi-path relationship between the dimension entity 'DATE_DIM' and the
+-- base metric or dimension entity 'DEALS' is not supported."
+--
+-- KEY INSIGHT: Queries that don't touch date dimensions work fine — the bug
+-- hides until an analyst tries to do time-series analysis.
+--
+-- FIX: Add USING (relationship) to every metric to declare which date path
+-- that metric should use. Each metric independently picks its own date path.
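+
+-- Illustrative queries for Scenario 1 (a sketch only; the canonical runnable
+-- versions live in queries.sql, and the names assume the SVs defined below):
+--
+-- -- Fails at runtime on the broken SV because it touches the date dimension:
+-- SELECT sv.* FROM SEMANTIC_VIEW(
+--     DEALS_AMBIGUOUS_PATH_SV
+--     METRICS deals.total_amount
+--     DIMENSIONS date_dim.month_name
+-- ) AS sv;
+--
+-- -- Works on the fixed SV, where the metric itself declares its date path:
+-- SELECT sv.* FROM SEMANTIC_VIEW(
+--     DEALS_FIXED_SV
+--     METRICS deals.total_amount_created
+--     DIMENSIONS date_dim.month_name
+-- ) AS sv;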
+ +-- ── BROKEN ──────────────────────────────────────────────────────────────────── +CREATE OR REPLACE SEMANTIC VIEW DEALS_AMBIGUOUS_PATH_SV + TABLES ( + deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID) + , date_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_DATE PRIMARY KEY (DATE_KEY) + , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID) + ) + RELATIONSHIPS ( + deals_to_created_date AS deals(CREATED_DATE) REFERENCES date_dim(DATE_KEY) + , deals_to_close_date AS deals(CLOSE_DATE) REFERENCES date_dim(DATE_KEY) + , deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) + ) + FACTS ( deals.amount AS AMOUNT ) + DIMENSIONS ( + deals.product AS PRODUCT + , deals.stage AS STAGE + , rep_dim.rep_name AS REP_NAME + , rep_dim.region AS REGION + , date_dim.year AS YEAR + , date_dim.month_num AS MONTH_NUM + , date_dim.month_name AS MONTH_NAME + ) + METRICS ( + -- No USING → ambiguous which date path to use at query time + deals.total_amount AS SUM(AMOUNT) + , deals.deal_count AS COUNT(DEAL_ID) + ); + +-- ── FIXED ───────────────────────────────────────────────────────────────────── +CREATE OR REPLACE SEMANTIC VIEW DEALS_FIXED_SV + TABLES ( + deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID) + , date_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_DATE PRIMARY KEY (DATE_KEY) + , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID) + ) + RELATIONSHIPS ( + deals_to_created_date AS deals(CREATED_DATE) REFERENCES date_dim(DATE_KEY) + , deals_to_close_date AS deals(CLOSE_DATE) REFERENCES date_dim(DATE_KEY) + , deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) + ) + FACTS ( deals.amount AS AMOUNT ) + DIMENSIONS ( + deals.product AS PRODUCT + , deals.stage AS STAGE + , rep_dim.rep_name AS REP_NAME + , rep_dim.region AS REGION + , date_dim.year AS YEAR + , date_dim.month_num AS MONTH_NUM + , date_dim.month_name AS MONTH_NAME + ) + METRICS ( + -- USING declares which date relationship each metric uses + -- Syntax: entity.logical_name USING 
(relationship) AS physical_expression + deals.total_amount_created USING (deals_to_created_date) AS SUM(AMOUNT) + COMMENT = 'Total deal value, dated by when the deal was created' + , deals.deal_count_created USING (deals_to_created_date) AS COUNT(DEAL_ID) + COMMENT = 'Count of deals, dated by creation date' + , deals.total_amount_closed USING (deals_to_close_date) AS SUM(AMOUNT) + COMMENT = 'Total deal value, dated by close date (excludes open deals)' + , deals.deal_count_closed USING (deals_to_close_date) AS COUNT(DEAL_ID) + COMMENT = 'Count of closed deals, dated by close date' + ); + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 2: FAN TRAP +-- ══════════════════════════════════════════════════════════════════════════════ +-- +-- PROBLEM: Revenue (AMOUNT) lives at the DEALS header level — one row per deal. +-- DEAL_ITEMS is a bridge table: each deal maps to one or more products. +-- Defining a metric on DEALS and grouping by a dimension from DIM_PRODUCT +-- requires routing through DEAL_ITEMS, which fans out DEALS rows. +-- The SV engine catches this and errors at query time: +-- +-- "The dimension entity 'PRODUCTS' must be related to and have an equal or +-- lower level of granularity compared to the base metric or dimension +-- entity 'DEALS'." +-- +-- NOTE: This same error appears for Scenario 3 (no relationship). The fix +-- is different — here the relationship exists but at the wrong grain; in +-- Scenario 3 the relationship is simply missing entirely. +-- +-- FIX: Move the metric to DEAL_ITEMS.LINE_AMOUNT (line-item grain). A metric +-- defined on DEAL_ITEMS can be grouped by DIM_PRODUCT because DEAL_ITEMS +-- directly references DIM_PRODUCT — same or lower granularity. 
✓
+
+-- ── BROKEN ────────────────────────────────────────────────────────────────────
+CREATE OR REPLACE SEMANTIC VIEW DEALS_FAN_TRAP_SV
+ TABLES (
+ deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID)
+ , deal_items AS SEMANTIC_SKILLS.SNIPPETS.DEAL_ITEMS PRIMARY KEY (ITEM_ID)
+ , products AS SEMANTIC_SKILLS.SNIPPETS.DIM_PRODUCT PRIMARY KEY (PRODUCT_ID)
+ , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID)
+ )
+ RELATIONSHIPS (
+ items_to_deals AS deal_items(DEAL_ID) REFERENCES deals(DEAL_ID)
+ , items_to_products AS deal_items(PRODUCT_ID) REFERENCES products(PRODUCT_ID)
+ , deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID)
+ )
+ FACTS ( deals.amount AS AMOUNT ) -- metric at DEALS grain
+ DIMENSIONS (
+ deals.stage AS STAGE
+ , rep_dim.rep_name AS REP_NAME
+ , products.product_name AS PRODUCT_NAME -- dimension at PRODUCTS grain
+ , products.category AS CATEGORY
+ )
+ METRICS (
+ -- AMOUNT is at DEALS grain. PRODUCTS is reachable only via DEAL_ITEMS.
+ -- Grouping by PRODUCTS fans out and multiplies DEALS rows → fan trap.
+ deals.total_amount AS SUM(AMOUNT) + , deals.deal_count AS COUNT(DEAL_ID) + ); + +-- ── FIXED ───────────────────────────────────────────────────────────────────── +CREATE OR REPLACE SEMANTIC VIEW DEALS_FAN_TRAP_FIXED_SV + TABLES ( + deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID) + , deal_items AS SEMANTIC_SKILLS.SNIPPETS.DEAL_ITEMS PRIMARY KEY (ITEM_ID) + , products AS SEMANTIC_SKILLS.SNIPPETS.DIM_PRODUCT PRIMARY KEY (PRODUCT_ID) + , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID) + ) + RELATIONSHIPS ( + items_to_deals AS deal_items(DEAL_ID) REFERENCES deals(DEAL_ID) + , items_to_products AS deal_items(PRODUCT_ID) REFERENCES products(PRODUCT_ID) + , deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) + ) + FACTS ( + deal_items.line_amount AS LINE_AMOUNT -- metric moved to DEAL_ITEMS grain + ) + DIMENSIONS ( + deals.stage AS STAGE + , rep_dim.rep_name AS REP_NAME + , products.product_name AS PRODUCT_NAME + , products.category AS CATEGORY + ) + METRICS ( + -- LINE_AMOUNT is at DEAL_ITEMS grain — same level as DIM_PRODUCT. + -- Grouping by product category is now safe. ✓ + deal_items.total_revenue AS SUM(LINE_AMOUNT) + COMMENT = 'Revenue at line-item grain — safe to group by product or category' + , deal_items.item_count AS COUNT(ITEM_ID) + ); + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 3: TABLE WITH NO RELATIONSHIP +-- ══════════════════════════════════════════════════════════════════════════════ +-- +-- PROBLEM: DIM_REGION is listed in the TABLES clause but has no RELATIONSHIP +-- connecting it to any fact. The SV deploys without error. At query time, +-- using any dimension from DIM_REGION triggers the same error as a fan trap: +-- +-- "The dimension entity 'DIM_REGION' must be related to and have an equal or +-- lower level of granularity compared to the base metric or dimension +-- entity 'DEALS'." 
+-- +-- The error message is identical to the fan trap — the diagnostic difference +-- is that here there is NO relationship at all, whereas in a fan trap there IS +-- a relationship but at the wrong grain. +-- +-- FIX: Either add the missing relationship, or remove the orphaned table from +-- the TABLES clause if it was included by mistake. + +-- ── BROKEN ──────────────────────────────────────────────────────────────────── +CREATE OR REPLACE SEMANTIC VIEW DEALS_NO_REL_SV + TABLES ( + deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID) + , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID) + , dim_region AS SEMANTIC_SKILLS.SNIPPETS.DIM_REGION PRIMARY KEY (REGION_CODE) + ) + RELATIONSHIPS ( + deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) + -- dim_region has no relationship — orphaned table + ) + FACTS ( deals.amount AS AMOUNT ) + DIMENSIONS ( + deals.stage AS STAGE + , rep_dim.rep_name AS REP_NAME + , rep_dim.region AS REGION + , dim_region.region_name AS REGION_NAME -- will error at query time + ) + METRICS ( deals.total_amount AS SUM(AMOUNT) ); + +-- ── FIXED ───────────────────────────────────────────────────────────────────── +CREATE OR REPLACE SEMANTIC VIEW DEALS_NO_REL_FIXED_SV + TABLES ( + deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID) + , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID) + , dim_region AS SEMANTIC_SKILLS.SNIPPETS.DIM_REGION PRIMARY KEY (REGION_CODE) + ) + RELATIONSHIPS ( + deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) + , rep_to_region AS rep_dim(REGION) REFERENCES dim_region(REGION_CODE) + -- dim_region is now reachable: deals → rep_dim → dim_region ✓ + ) + FACTS ( deals.amount AS AMOUNT ) + DIMENSIONS ( + deals.stage AS STAGE + , rep_dim.rep_name AS REP_NAME + , rep_dim.region AS REGION + , dim_region.region_name AS REGION_NAME + ) + METRICS ( deals.total_amount AS SUM(AMOUNT) ); + + +-- 
══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 4A: DUPLICATE LOGICAL NAME — DEPLOY-TIME ERROR +-- ══════════════════════════════════════════════════════════════════════════════ +-- +-- PROBLEM: Two dimensions in the same SV share the same logical name, even +-- across different entities. The SV engine enforces globally unique logical +-- names within a SV — this fails immediately at CREATE time: +-- +-- "SQL compilation error: invalid identifier ''" +-- +-- Attempting to run this will fail. It is included here to show the error. +-- +-- FIX: Give each dimension a unique logical name that reflects its entity context. +-- If two dimensions represent the same concept from different join paths, +-- consider whether they should be in separate SVs or use distinct names like +-- rep_segment vs product_segment. + +-- Uncomment to reproduce the deploy-time error: +-- CREATE OR REPLACE SEMANTIC VIEW DEALS_DUPE_NAME_SV +-- TABLES ( +-- deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID) +-- , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID) +-- , products AS SEMANTIC_SKILLS.SNIPPETS.DIM_PRODUCT PRIMARY KEY (PRODUCT_ID) +-- , deal_items AS SEMANTIC_SKILLS.SNIPPETS.DEAL_ITEMS PRIMARY KEY (ITEM_ID) +-- ) +-- RELATIONSHIPS ( +-- deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) +-- , items_to_deals AS deal_items(DEAL_ID) REFERENCES deals(DEAL_ID) +-- , items_to_products AS deal_items(PRODUCT_ID) REFERENCES products(PRODUCT_ID) +-- ) +-- FACTS ( deals.amount AS AMOUNT ) +-- DIMENSIONS ( +-- deals.stage AS STAGE +-- , rep_dim.rep_name AS REP_NAME +-- , rep_dim.segment AS REGION -- logical name: "segment" +-- , products.segment AS CATEGORY -- logical name: "segment" ← DUPLICATE → error +-- ) +-- METRICS ( deals.total_amount AS SUM(AMOUNT) ); + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 4B: OVERLAPPING SYNONYMS — CORTEX ANALYST AMBIGUITY +-- 
══════════════════════════════════════════════════════════════════════════════ +-- +-- PROBLEM: Distinct logical names but overlapping synonyms. The SV deploys +-- and all SQL queries work correctly. However Cortex Analyst cannot disambiguate: +-- when the user asks "what is total revenue by segment?" CA refuses to answer +-- because "revenue" matches two metrics and "segment" matches two dimensions. +-- +-- FIX: Give each dimension and metric a synonym set that is unique and scoped +-- to its entity context. Avoid sharing high-value terms like "revenue", "count", +-- "total", "segment" across multiple definitions. + +-- ── BROKEN (deploys, but CA refuses ambiguous queries) ─────────────────────── +CREATE OR REPLACE SEMANTIC VIEW DEALS_AMBIGUOUS_NAMES_SV + TABLES ( + deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID) + , deal_items AS SEMANTIC_SKILLS.SNIPPETS.DEAL_ITEMS PRIMARY KEY (ITEM_ID) + , products AS SEMANTIC_SKILLS.SNIPPETS.DIM_PRODUCT PRIMARY KEY (PRODUCT_ID) + , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID) + ) + RELATIONSHIPS ( + items_to_deals AS deal_items(DEAL_ID) REFERENCES deals(DEAL_ID) + , items_to_products AS deal_items(PRODUCT_ID) REFERENCES products(PRODUCT_ID) + , deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) + ) + FACTS ( + deals.amount AS AMOUNT + , deal_items.line_amount AS LINE_AMOUNT + ) + DIMENSIONS ( + deals.stage AS STAGE + , rep_dim.rep_name AS REP_NAME + -- "segment", "area" claimed by both → CA can't resolve either + , rep_dim.rep_segment AS REGION WITH SYNONYMS ('segment', 'region', 'territory', 'area') + , products.product_segment AS CATEGORY WITH SYNONYMS ('segment', 'category', 'product type', 'area') + ) + METRICS ( + -- "revenue", "total revenue" claimed by both → CA can't resolve either + deals.total_amount AS SUM(AMOUNT) + WITH SYNONYMS ('revenue', 'total revenue', 'sales') + COMMENT = 'Total deal value at header level — not suitable for product breakdowns' + , 
deal_items.total_revenue AS SUM(LINE_AMOUNT) + WITH SYNONYMS ('revenue', 'total revenue', 'product revenue') + COMMENT = 'Revenue at line-item level — use this when grouping by product' + ); + +-- ── FIXED ───────────────────────────────────────────────────────────────────── +CREATE OR REPLACE SEMANTIC VIEW DEALS_CLEAR_NAMES_SV + TABLES ( + deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID) + , deal_items AS SEMANTIC_SKILLS.SNIPPETS.DEAL_ITEMS PRIMARY KEY (ITEM_ID) + , products AS SEMANTIC_SKILLS.SNIPPETS.DIM_PRODUCT PRIMARY KEY (PRODUCT_ID) + , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID) + ) + RELATIONSHIPS ( + items_to_deals AS deal_items(DEAL_ID) REFERENCES deals(DEAL_ID) + , items_to_products AS deal_items(PRODUCT_ID) REFERENCES products(PRODUCT_ID) + , deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) + ) + FACTS ( + deals.amount AS AMOUNT + , deal_items.line_amount AS LINE_AMOUNT + ) + DIMENSIONS ( + deals.stage AS STAGE + , rep_dim.rep_name AS REP_NAME + -- Each dimension owns a non-overlapping synonym set + , rep_dim.rep_territory AS REGION WITH SYNONYMS ('rep territory', 'sales territory', 'rep region') + , products.product_category AS CATEGORY WITH SYNONYMS ('product category', 'product type', 'product line') + ) + METRICS ( + -- Each metric owns a non-overlapping synonym set + deals.deal_value AS SUM(AMOUNT) + WITH SYNONYMS ('deal value', 'total deal value', 'closed deal value', 'pipeline value') + COMMENT = 'Total deal value at header level — use for deal-level analysis by rep, stage, or time' + , deal_items.product_revenue AS SUM(LINE_AMOUNT) + WITH SYNONYMS ('product revenue', 'revenue by product', 'line item revenue') + COMMENT = 'Revenue at line-item level — use when grouping by product or product category' + ); + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 5A: REVERSED RELATIONSHIP DIRECTION — DEPLOY-TIME ERROR +-- 
══════════════════════════════════════════════════════════════════════════════ +-- +-- PROBLEM: The REFERENCES direction is flipped — the dimension (one side) is +-- placed on the left of REFERENCES, pointing to the fact (many side) on the right. +-- The SV engine enforces that the right-hand side of REFERENCES must be a +-- declared primary or unique key. Since DEALS.REP_ID is not a PK, this fails +-- immediately at CREATE time: +-- +-- "The referenced key in the relationship 'REP_DIM REFERENCES DEALS' must be +-- the primary or unique key of the referenced entity." +-- +-- Attempting to run this will fail. It is included here to show the error. +-- +-- FIX: Always write relationships as many_side(FK) REFERENCES one_side(PK). +-- The right-hand side must be the primary key of the dimension/parent table. + +-- Uncomment to reproduce the deploy-time error: +-- CREATE OR REPLACE SEMANTIC VIEW DEALS_REVERSED_REL_SV +-- TABLES ( +-- deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID) +-- , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID) +-- ) +-- RELATIONSHIPS ( +-- rep_to_deals AS rep_dim(REP_ID) REFERENCES deals(REP_ID) -- ← reversed! +-- -- ^^^^^^ not a PK of DEALS +-- ) +-- FACTS ( deals.amount AS AMOUNT ) +-- DIMENSIONS ( rep_dim.rep_name AS REP_NAME ) +-- METRICS ( deals.total_amount AS SUM(AMOUNT) ); + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 5B: WRONG CARDINALITY (LYING ABOUT THE PRIMARY KEY) +-- ══════════════════════════════════════════════════════════════════════════════ +-- +-- PROBLEM: DEAL_ITEMS has ITEM_ID as its real primary key (many items per deal). +-- The modeler accidentally declares PRIMARY KEY (DEAL_ID) instead — asserting +-- that DEAL_ITEMS is at DEAL grain (one item per deal). Snowflake does NOT +-- enforce PK uniqueness, so the SV deploys without error. 
+-- +-- The consequence is subtle and dangerous: the SV engine's fan trap detector +-- uses the declared PK to assess cardinality. By lying that DEAL_ITEMS is +-- at DEAL grain, the engine believes the DEAL_ITEMS → DEALS relationship is +-- 1:1. It therefore allows querying DEALS.AMOUNT (header-level) grouped by +-- DIM_PRODUCT dimensions — the exact query that correctly errors on a properly +-- declared model. The fan trap runs silently and inflates every number. +-- +-- COMPARISON: +-- Correct SV (DEALS_FAN_TRAP_SV, PK=ITEM_ID): +-- deals.total_amount by products.category → ERROR (fan trap caught ✓) +-- Wrong cardinality SV (below, PK=DEAL_ID): +-- deals.total_amount by products.category → runs, returns inflated numbers ✗ +-- +-- Analytics: correct ≈ $240k → wrong = $430k (multi-item deals counted 2-3×) +-- Data Pipelines: correct ≈ $73.5k → wrong = $146.5k +-- +-- DETECTION: Run the same query on both a correctly-declared and wrong-cardinality +-- SV and compare totals. Wrong-cardinality results will be inflated by a factor +-- roughly equal to the average number of items per deal. +-- +-- FIX: Declare PRIMARY KEY on the column that is actually unique in that table. +-- For bridge/line-item tables: PRIMARY KEY (ITEM_ID), not the FK column. + +CREATE OR REPLACE SEMANTIC VIEW SEMANTIC_SKILLS.SNIPPETS.DEALS_BOTH_UNIQUE_SV + TABLES ( + deals AS SEMANTIC_SKILLS.SNIPPETS.DEALS PRIMARY KEY (DEAL_ID) + , deal_items AS SEMANTIC_SKILLS.SNIPPETS.DEAL_ITEMS PRIMARY KEY (DEAL_ID) + -- ^^^^^^^^ + -- Wrong: DEAL_ID is a FK in DEAL_ITEMS, not unique. 
Correct: PRIMARY KEY (ITEM_ID) + , products AS SEMANTIC_SKILLS.SNIPPETS.DIM_PRODUCT PRIMARY KEY (PRODUCT_ID) + , rep_dim AS SEMANTIC_SKILLS.SNIPPETS.DIM_REP PRIMARY KEY (REP_ID) + ) + RELATIONSHIPS ( + items_to_deals AS deal_items(DEAL_ID) REFERENCES deals(DEAL_ID) + , items_to_products AS deal_items(PRODUCT_ID) REFERENCES products(PRODUCT_ID) + , deals_to_rep AS deals(REP_ID) REFERENCES rep_dim(REP_ID) + ) + FACTS ( + deals.amount AS AMOUNT + , deal_items.line_amount AS LINE_AMOUNT + ) + DIMENSIONS ( + deals.stage AS STAGE + , rep_dim.rep_name AS REP_NAME + , products.product_name AS PRODUCT_NAME + , products.category AS CATEGORY + ) + METRICS ( + deals.total_amount AS SUM(AMOUNT) + , deal_items.total_revenue AS SUM(LINE_AMOUNT) + , deal_items.item_count AS COUNT(ITEM_ID) + ); + + +-- ══════════════════════════════════════════════════════════════════════════════ +-- SCENARIO 6: FORGOTTEN SEMI-ADDITIVE BEHAVIOR +-- ══════════════════════════════════════════════════════════════════════════════ +-- +-- This scenario has no broken SV — the query always runs without error. +-- A SUM() on a balance, headcount, or inventory snapshot is syntactically valid +-- but semantically wrong: summing a point-in-time snapshot across time produces +-- a number that has no business meaning. +-- +-- Example: a daily account balance table. Each row is the end-of-day balance +-- for one account. SUM(balance) across all days = nonsense. The correct +-- aggregation is LAST_VALUE(balance) per account, or AVG if smoothing is needed. +-- +-- See the `semi_additive_metric` snippet for the full NON ADDITIVE BY pattern. +-- The checklist question here is: "Does this column represent a snapshot +-- (balance, headcount, inventory) rather than a flow (revenue, quantity sold)? +-- If yes, SUM across time is almost certainly wrong." +-- +-- No DDL for this scenario — it is a model design heuristic, not a detectable error. 
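+
+-- ──────────────────────────────────────────────────────────────────────────────
+-- APPENDIX: detection sketch for Scenario 5B (wrong cardinality).
+-- A sketch only; see queries.sql for the canonical versions. On the same
+-- wrong-cardinality SV, compare the header-grain metric with the line-grain
+-- metric grouped by the same dimension. A large gap means the declared
+-- PRIMARY KEY is lying about the table's grain:
+--
+-- -- Header-grain revenue by category (runs, but inflated by the fan-out):
+-- SELECT sv.* FROM SEMANTIC_VIEW(
+--     DEALS_BOTH_UNIQUE_SV
+--     METRICS deals.total_amount
+--     DIMENSIONS products.category
+-- ) AS sv;
+--
+-- -- Line-grain revenue by category on the same SV (not inflated):
+-- SELECT sv.* FROM SEMANTIC_VIEW(
+--     DEALS_BOTH_UNIQUE_SV
+--     METRICS deal_items.total_revenue
+--     DIMENSIONS products.category
+-- ) AS sv;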
diff --git a/skills/semantic-view-patterns/snippets/sv_diagnostics/semantic_view.yaml b/skills/semantic-view-patterns/snippets/sv_diagnostics/semantic_view.yaml new file mode 100644 index 00000000..e8d24690 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/sv_diagnostics/semantic_view.yaml @@ -0,0 +1,136 @@ +# SV Diagnostics: Semantic View YAML +# +# The DEALS_FIXED_SV (the correctly-structured diagnostic SV) is exported via: +# SELECT SYSTEM$READ_YAML_FROM_SEMANTIC_VIEW('SEMANTIC_SKILLS.SNIPPETS.DEALS_FIXED_SV'); +# +# KEY YAML ADVANTAGE: verify_only=TRUE in SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML +# provides a pre-deployment dry-run — the closest thing to a model linter: +# +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML( +# 'TARGET_DB.TARGET_SCHEMA', +# $$ $$, +# TRUE -- verify_only: validates without deploying +# ); +# +# DIAGNOSTIC SCENARIOS IN YAML: +# Scenario 1 — Ambiguous path: fully expressible (no using_relationships → same error) +# Scenario 2 — Fan trap: fully expressible (wrong grain → same error) +# Scenario 3 — No relationship: fully expressible (orphaned table → same error) +# Scenario 4a — Duplicate name: verify_only=TRUE catches this before deployment ✓ +# Scenario 4b — Ambiguous synonyms: fully expressible (same CA confusion) +# Scenario 5a — Reversed direction: verify_only=TRUE catches this ✓ +# Scenario 5b — Wrong cardinality: fully expressible (same silent wrong-result risk) +# Scenario 6 — Semi-additive: non_additive_dimensions is the YAML fix ✓ +# +# For broken/error-triggering variants, see semantic_view.sql. 
+# +# Deploy: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# Verify only: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); + +name: DEALS_FIXED_SV + +tables: + - name: DEALS + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: DEALS + primary_key: + columns: + - DEAL_ID + dimensions: + - name: PRODUCT + expr: PRODUCT + data_type: VARCHAR(30) + - name: STAGE + expr: STAGE + data_type: VARCHAR(20) + facts: + - name: AMOUNT + expr: AMOUNT + data_type: NUMBER(10,2) + access_modifier: public_access + metrics: + # using_relationships is the YAML equivalent of DDL's USING (relationship_name) + # Scenario 1 fix: each metric declares its own date path — no ambiguity + - name: TOTAL_AMOUNT_CREATED + description: Total deal value, dated by when the deal was created + expr: SUM(AMOUNT) + access_modifier: public_access + using_relationships: + - DEALS_TO_CREATED_DATE + - name: DEAL_COUNT_CREATED + description: Count of deals, dated by creation date + expr: COUNT(DEAL_ID) + access_modifier: public_access + using_relationships: + - DEALS_TO_CREATED_DATE + - name: TOTAL_AMOUNT_CLOSED + description: Total deal value, dated by close date + expr: SUM(AMOUNT) + access_modifier: public_access + using_relationships: + - DEALS_TO_CLOSE_DATE + - name: DEAL_COUNT_CLOSED + description: Count of closed deals, dated by close date + expr: COUNT(DEAL_ID) + access_modifier: public_access + using_relationships: + - DEALS_TO_CLOSE_DATE + + - name: DATE_DIM + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: DIM_DATE + primary_key: + columns: + - DATE_KEY + dimensions: + - name: MONTH_NAME + expr: MONTH_NAME + data_type: VARCHAR(10) + - name: MONTH_NUM + expr: MONTH_NUM + data_type: NUMBER + - name: YEAR + expr: YEAR + data_type: NUMBER + + - name: REP_DIM + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: DIM_REP + primary_key: + columns: + - REP_ID + dimensions: + - name: 
REP_NAME + expr: REP_NAME + data_type: VARCHAR(50) + - name: REGION + expr: REGION + data_type: VARCHAR(20) + +relationships: + - name: DEALS_TO_CREATED_DATE + left_table: DEALS + right_table: DATE_DIM + relationship_columns: + - left_column: CREATED_DATE + right_column: DATE_KEY + - name: DEALS_TO_CLOSE_DATE + left_table: DEALS + right_table: DATE_DIM + relationship_columns: + - left_column: CLOSE_DATE + right_column: DATE_KEY + - name: DEALS_TO_REP + left_table: DEALS + right_table: REP_DIM + relationship_columns: + - left_column: REP_ID + right_column: REP_ID diff --git a/skills/semantic-view-patterns/snippets/system_explain_semantic_query/README.md b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/README.md new file mode 100644 index 00000000..d082989b --- /dev/null +++ b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/README.md @@ -0,0 +1,63 @@ +# SYSTEM$EXPLAIN_SEMANTIC_QUERY + +## The Problem + +A `SEMANTIC_VIEW()` query fails — or returns unexpected results — and you need to understand what SQL the engine is actually generating. The error message points to a column name that doesn't exist, or a result looks wrong, but the SV definition looks correct. Without seeing the generated SQL, debugging is guesswork. + +`SYSTEM$EXPLAIN_SEMANTIC_QUERY` solves this: given a semantic view name and a `SEMANTIC_VIEW()` query string, it returns the exact SQL the engine would generate and execute — without running it. 
+ +## When to Use It + +| Situation | What EXPLAIN tells you | +|-----------|----------------------| +| Query fails with `invalid identifier` | Which column or alias the engine generated that doesn't resolve | +| Unexpected metric values | Whether the aggregation, join, or GROUP BY is what you expect | +| Debugging PRIVATE facts | Whether an intermediate fact is inlined correctly into downstream expressions | +| Verifying a complex join path | Which tables are joined and in what order | +| Learning how SVs work | The generated SQL is plain, readable SELECT — see the "magic" | + +## How You Might Express This Need + +- "My SEMANTIC_VIEW query fails and I can't tell why — how do I debug it?" +- "I want to see the SQL the semantic view generates for a given query" +- "The metric value looks wrong — how can I verify the aggregation?" +- "How do I know which join path the engine is using?" + +## The SV Approach + +```sql +SELECT SYSTEM$EXPLAIN_SEMANTIC_QUERY( + 'SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV', + $$ + SELECT sv.* + FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV + METRICS tickets.total_tickets + DIMENSIONS customers.tier + ) AS sv + $$ +); +``` + +The function returns a single string: the SQL that would be executed against the underlying tables. It does **not** run the query — it is safe to call even for queries that would fail at runtime. + +## What Doesn't Work + +- **Only accepts `SEMANTIC_VIEW()` query syntax** — you cannot pass a general SQL query. The inner string must use the `SEMANTIC_VIEW(... METRICS ... DIMENSIONS ...)` syntax. +- **Does not validate that the generated SQL is correct** — it shows what the engine *intends* to generate, but the result can still fail if the generated SQL references something that doesn't exist (that's the point: you can see *why* it fails). +- **Output is a single long string** — use `PARSE_JSON` or a `::`-cast and pretty-print in your client if needed. 
+ +## Docs + +- [SYSTEM$EXPLAIN_SEMANTIC_QUERY](https://docs.snowflake.com/en/sql-reference/functions/system_explain_semantic_query) +- [Querying a semantic view](https://docs.snowflake.com/en/user-guide/views-semantic/querying) +- [DESCRIBE SEMANTIC VIEW](https://docs.snowflake.com/en/sql-reference/sql/desc-semantic-view) — complementary introspection command + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `customers`, `support_tickets` table DDL | +| `seed_data.sql` | 4 customers (tiers), 10 tickets | +| `semantic_view.sql` | SV with PRIVATE fact, derived dimension, cross-table metric | +| `queries.sql` | EXPLAIN calls for simple, cross-table, and PRIVATE-fact queries | diff --git a/skills/semantic-view-patterns/snippets/system_explain_semantic_query/queries.sql b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/queries.sql new file mode 100644 index 00000000..0002fa17 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/queries.sql @@ -0,0 +1,135 @@ +-- SYSTEM$EXPLAIN_SEMANTIC_QUERY: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- HOW TO USE +-- ============================================================ +-- +-- SYSTEM$EXPLAIN_SEMANTIC_QUERY(sv_name, query_string) +-- sv_name : fully qualified semantic view name (string literal) +-- query_string : a SEMANTIC_VIEW() query in $$...$$ dollar-quoting +-- +-- Returns: the SQL the engine would generate — without executing it. +-- Safe to call even for queries that would fail at runtime. + + +-- ============================================================ +-- 1. 
Simple metric + dimension +-- Shows: generated SELECT, GROUP BY, and join to customers table +-- ============================================================ + +SELECT SYSTEM$EXPLAIN_SEMANTIC_QUERY( + 'SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV', + $$ + SELECT sv.* + FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV + METRICS support_tickets.total_tickets + DIMENSIONS customers.tier + ) AS sv + $$ +); + + +-- ============================================================ +-- 2. Derived dimension from PRIVATE fact +-- Shows: the CASE expression for value_segment is inlined directly +-- into the generated SQL — the PRIVATE fact is never exposed as +-- a column; it appears as a subexpression inside the CASE. +-- ============================================================ + +SELECT SYSTEM$EXPLAIN_SEMANTIC_QUERY( + 'SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV', + $$ + SELECT sv.* + FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV + METRICS support_tickets.total_tickets, support_tickets.total_revenue + DIMENSIONS customers.value_segment + ) AS sv + $$ +); + + +-- ============================================================ +-- 3. Multi-metric cross-table query +-- Shows: how two metrics from different granularities +-- (ticket-level count + customer-level count) are combined +-- ============================================================ + +SELECT SYSTEM$EXPLAIN_SEMANTIC_QUERY( + 'SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV', + $$ + SELECT sv.* + FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV + METRICS support_tickets.total_tickets, customers.customer_count + DIMENSIONS customers.tier + ) AS sv + $$ +); + + +-- ============================================================ +-- 4. Using EXPLAIN to diagnose a failing query BEFORE running it +-- The query below mixes FACTS and METRICS — which is illegal. +-- EXPLAIN shows you the intended SQL so you can spot the issue +-- without waiting for a runtime error. 
+-- ============================================================ + +-- First, see what EXPLAIN shows for the mixed query: +SELECT SYSTEM$EXPLAIN_SEMANTIC_QUERY( + 'SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV', + $$ + SELECT sv.* + FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV + FACTS support_tickets.ticket_amount + METRICS support_tickets.total_tickets -- mixing FACTS + METRICS + ) AS sv + $$ +); + +-- Then confirm the actual runtime error: +-- SELECT sv.* +-- FROM SEMANTIC_VIEW( +-- SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV +-- FACTS support_tickets.ticket_amount +-- METRICS support_tickets.total_tickets +-- ) AS sv; +-- → Error: Cannot specify FACTS and METRICS in the same SEMANTIC_VIEW clause + + +-- ============================================================ +-- RUNNING QUERIES (for comparison with EXPLAIN output) +-- ============================================================ + +-- Q1: What EXPLAIN showed in query 1 — tickets by tier +SELECT sv.* +FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV + METRICS support_tickets.total_tickets + DIMENSIONS customers.tier +) AS sv +ORDER BY sv.total_tickets DESC; + + +-- Q2: What EXPLAIN showed in query 2 — revenue by value segment +SELECT sv.* +FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV + METRICS support_tickets.total_tickets, support_tickets.total_revenue + DIMENSIONS customers.value_segment +) AS sv +ORDER BY sv.value_segment; + + +-- ============================================================ +-- HOW SYSTEM$EXPLAIN_SEMANTIC_QUERY WORKS: +-- The function compiles the SEMANTIC_VIEW() call against the SV's +-- metadata — resolving metric expressions, join paths, and dimension +-- derivations — then serializes the resulting logical plan as SQL. +-- No tables are scanned; no query is executed. Use it freely as a +-- debugging and learning tool. 
diff --git a/skills/semantic-view-patterns/snippets/system_explain_semantic_query/schema.sql b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/schema.sql new file mode 100644 index 00000000..db35a36f --- /dev/null +++ b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/schema.sql @@ -0,0 +1,22 @@ +-- SYSTEM$EXPLAIN_SEMANTIC_QUERY: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE customers ( + customer_id INTEGER NOT NULL, + customer_name VARCHAR(50) NOT NULL, + tier VARCHAR(20) NOT NULL, -- 'enterprise', 'mid-market', 'smb' + CONSTRAINT pk_customers_explain PRIMARY KEY (customer_id) +); + +CREATE OR REPLACE TABLE support_tickets ( + ticket_id INTEGER NOT NULL, + customer_id INTEGER NOT NULL, + opened_date DATE NOT NULL, + priority VARCHAR(10) NOT NULL, -- 'P1', 'P2', 'P3' + amount NUMBER(10,2) NOT NULL, -- contract value at time of ticket + CONSTRAINT pk_tickets PRIMARY KEY (ticket_id) +); diff --git a/skills/semantic-view-patterns/snippets/system_explain_semantic_query/seed_data.sql b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/seed_data.sql new file mode 100644 index 00000000..b190ecfc --- /dev/null +++ b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/seed_data.sql @@ -0,0 +1,22 @@ +-- SYSTEM$EXPLAIN_SEMANTIC_QUERY: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO customers VALUES + (1, 'Acme Corp', 'enterprise'), + (2, 'Riverside LLC', 'mid-market'), + (3, 'Tiny Co', 'smb'), + (4, 'Global Inc', 'enterprise'); + +INSERT INTO support_tickets VALUES + ( 1, 1, '2024-01-10', 'P1', 120000.00), + ( 2, 1, '2024-02-14', 'P2', 80000.00), + ( 3, 1, '2024-03-05', 'P3', 50000.00), + ( 4, 2, '2024-01-22', 'P1', 45000.00), + ( 5, 2, '2024-03-18', 'P2', 30000.00), + ( 6, 3, '2024-02-01', 'P3', 5000.00), + ( 7, 3, '2024-04-09', 'P2', 8000.00), + ( 8, 4, 
'2024-01-30', 'P1', 95000.00), + ( 9, 4, '2024-02-20', 'P1', 110000.00), + (10, 4, '2024-04-15', 'P2', 60000.00); diff --git a/skills/semantic-view-patterns/snippets/system_explain_semantic_query/semantic_view.sql b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/semantic_view.sql new file mode 100644 index 00000000..21b37714 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/semantic_view.sql @@ -0,0 +1,58 @@ +-- SYSTEM$EXPLAIN_SEMANTIC_QUERY: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.SUPPORT_ANALYTICS_SV + + TABLES ( + customers PRIMARY KEY (customer_id), + support_tickets PRIMARY KEY (ticket_id) + ) + + RELATIONSHIPS ( + support_tickets(customer_id) REFERENCES customers + ) + + FACTS ( + support_tickets.ticket_amount AS amount, + + -- PRIVATE entity-level fact: total contract value per customer + -- Cannot be queried directly; used only inside the derived dimension below + PRIVATE customers.total_contract_value AS SUM(support_tickets.amount) + ) + + DIMENSIONS ( + customers.tier AS tier + WITH SYNONYMS ('customer tier', 'segment'), + customers.customer_name AS customer_name + WITH SYNONYMS ('customer', 'account'), + + -- Derived dimension from a PRIVATE aggregated fact + customers.value_segment AS ( + CASE + WHEN customers.total_contract_value >= 200000 THEN 'high-value' + WHEN customers.total_contract_value >= 50000 THEN 'mid-value' + ELSE 'low-value' + END + ) + WITH SYNONYMS ('value segment', 'account segment'), + + support_tickets.priority AS priority + WITH SYNONYMS ('ticket priority', 'severity'), + support_tickets.opened_date AS opened_date + WITH SYNONYMS ('date', 'ticket date') + ) + + METRICS ( + support_tickets.total_tickets AS COUNT(ticket_id) + WITH SYNONYMS ('tickets', 'ticket count', 'number of tickets'), + support_tickets.total_revenue AS SUM(amount) + WITH SYNONYMS ('revenue', 'contract value'), + 
customers.customer_count AS COUNT(customer_id) + WITH SYNONYMS ('customers', 'accounts') + ) + + COMMENT = 'Support ticket analytics. Includes a PRIVATE aggregated fact (total_contract_value) that drives the value_segment derived dimension — useful for demonstrating SYSTEM$EXPLAIN_SEMANTIC_QUERY on queries with inlined PRIVATE logic.' + + AI_SQL_GENERATION 'Use customers.tier for enterprise/mid-market/smb breakdowns. Use customers.value_segment for high/mid/low-value account segmentation. Use support_tickets.priority to filter or group by P1/P2/P3. The total_contract_value fact is PRIVATE — it powers value_segment internally.'; diff --git a/skills/semantic-view-patterns/snippets/system_explain_semantic_query/semantic_view.yaml b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/semantic_view.yaml new file mode 100644 index 00000000..d79c1d4a --- /dev/null +++ b/skills/semantic-view-patterns/snippets/system_explain_semantic_query/semantic_view.yaml @@ -0,0 +1,93 @@ +# System Explain Semantic Query: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# DDL-only features: +# - AI_SQL_GENERATION → module_custom_instructions: sql_generation: in YAML +# - PRIVATE entity-level aggregated facts → YAML: access_modifier: private_access + +name: SUPPORT_ANALYTICS_SV +description: > + Support ticket analytics. Includes a PRIVATE aggregated fact (total_contract_value) + that drives the value_segment derived dimension — useful for demonstrating + SYSTEM$EXPLAIN_SEMANTIC_QUERY. 
+ +tables: + - name: customers + description: Customer accounts + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: CUSTOMERS + primary_key: + columns: [CUSTOMER_ID] + dimensions: + - name: tier + synonyms: [customer tier, segment] + expr: TIER + data_type: VARCHAR + - name: customer_name + synonyms: [customer, account] + expr: CUSTOMER_NAME + data_type: VARCHAR + - name: value_segment + synonyms: [value segment, account segment] + description: High/mid/low-value segment based on total contract value + expr: > + CASE + WHEN SUM(support_tickets.amount) >= 200000 THEN 'high-value' + WHEN SUM(support_tickets.amount) >= 50000 THEN 'mid-value' + ELSE 'low-value' + END + data_type: VARCHAR + facts: + - name: total_contract_value + description: Total contract value per customer — private, drives value_segment + expr: SUM(support_tickets.amount) + data_type: NUMBER + access_modifier: private_access + metrics: + - name: customer_count + synonyms: [customers, accounts] + expr: COUNT(CUSTOMER_ID) + + - name: support_tickets + description: Support ticket records + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: SUPPORT_TICKETS + primary_key: + columns: [TICKET_ID] + dimensions: + - name: priority + synonyms: [ticket priority, severity] + expr: PRIORITY + data_type: VARCHAR + - name: opened_date + synonyms: [date, ticket date] + expr: OPENED_DATE + data_type: DATE + facts: + - name: ticket_amount + expr: AMOUNT + data_type: NUMBER + metrics: + - name: total_tickets + synonyms: [tickets, ticket count, number of tickets] + expr: COUNT(TICKET_ID) + - name: total_revenue + synonyms: [revenue, contract value] + expr: SUM(ticket_amount) + +relationships: + - name: tickets_to_customers + left_table: support_tickets + right_table: customers + relationship_columns: + - left_column: CUSTOMER_ID + right_column: CUSTOMER_ID diff --git a/skills/semantic-view-patterns/snippets/tags/README.md b/skills/semantic-view-patterns/snippets/tags/README.md new file 
mode 100644 index 00000000..3781819a --- /dev/null +++ b/skills/semantic-view-patterns/snippets/tags/README.md @@ -0,0 +1,71 @@ +# Tags on Metrics + +## The Problem + +You have dozens of metrics across a Semantic View. You need a way to: +- Track **ownership** ("who is responsible for `store_revenue`?") +- Communicate **certification status** ("is this metric approved for reporting?") +- Enable **governance discovery** ("show me all certified finance metrics") + +**WITH TAG** attaches Snowflake governance tag key-value pairs directly to metrics in the SV DDL. These are queryable via `tag_references()` using standard Snowflake governance tooling. + +## How You Might Express This Need + +- "Mark our finance-owned metrics as 'certified' and analytics-owned ones as 'in_development'" +- "I want to build a data catalog that shows which SV metrics are ready for production" +- "Alert me if anyone queries a 'deprecated' metric in their BI tool" + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **dbt** | `meta: {owner: ..., tier: ...}` in metrics YAML | +| **Atlan / Alation** | Manual tag assignment in catalog UI | +| **Power BI / Tableau** | Certification flags in dataset/workbook metadata | +| **LookML** | `tags: ["certified", "finance"]` on measures | + +## The SV Approach + +**Step 1: Create the tags** (one-time DDL): +```sql +CREATE TAG metric_owner; +CREATE TAG metric_status; +``` + +**Step 2: Apply tags in the SV METRICS block:** +```sql +store_revenue AS SUM(revenue) + WITH SYNONYMS ('store revenue') + WITH TAG (metric_owner = 'finance_team', metric_status = 'certified'), +``` + +**Step 3: Query tags via `tag_references()`:** +```sql +SELECT OBJECT_NAME, TAG_NAME, TAG_VALUE +FROM TABLE(SNIPPETS.INFORMATION_SCHEMA.TAG_REFERENCES( + 'SNIPPETS.PUBLIC.CHANNEL_SALES_TAGGED_SV!TAG_STORE_SALES.STORE_REVENUE', + 'semantic metric' +)); +``` + +## `tag_references()` Object Name Format + +``` 
+'DATABASE.SCHEMA.VIEW_NAME!ENTITY_TABLE.METRIC_LOGICAL_NAME' +``` + +The `!` separates the SV fully-qualified name from the metric reference. + +## Docs + +- [CREATE SEMANTIC VIEW — WITH TAG clause](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#with-tag) +- [TAG_REFERENCES function](https://docs.snowflake.com/en/sql-reference/functions/tag_references) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | Tag objects, store/web sales tables, date dim | +| `seed_data.sql` | 4 months × 2 channels | +| `semantic_view.sql` | SV with 5 tagged metrics (3 owners, 2 statuses) | +| `queries.sql` | SV queries + `tag_references()` discovery queries | diff --git a/skills/semantic-view-patterns/snippets/tags/queries.sql b/skills/semantic-view-patterns/snippets/tags/queries.sql new file mode 100644 index 00000000..0736d77a --- /dev/null +++ b/skills/semantic-view-patterns/snippets/tags/queries.sql @@ -0,0 +1,61 @@ +-- Tags: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES ON THE SV +-- ============================================================ + +-- 1. Channel revenue by month (standard query — tags don't affect query behavior) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.CHANNEL_SALES_TAGGED_SV + DIMENSIONS tag_dim_date.month + METRICS tag_store_sales.store_revenue, tag_web_sales.web_revenue, + total_channel_revenue +) +ORDER BY month; + + +-- ============================================================ +-- QUERYING TAGS (via tag_references) +-- ============================================================ + +-- 2. All tags for a specific metric +-- Note the special syntax: 'database.schema.view_name!entity.metric_name' +SELECT OBJECT_NAME, TAG_NAME, TAG_VALUE +FROM TABLE(SNIPPETS.INFORMATION_SCHEMA.TAG_REFERENCES( + 'SNIPPETS.PUBLIC.CHANNEL_SALES_TAGGED_SV!TAG_STORE_SALES.STORE_REVENUE', + 'semantic metric' +)); + + +-- 3. 
All certified metrics in the SV +-- (query tag_references for each metric, filter by status='certified') +SELECT OBJECT_NAME, TAG_NAME, TAG_VALUE +FROM TABLE(SNIPPETS.INFORMATION_SCHEMA.TAG_REFERENCES( + 'SNIPPETS.PUBLIC.CHANNEL_SALES_TAGGED_SV', + 'semantic view' +)) +WHERE TAG_NAME = 'METRIC_STATUS' AND TAG_VALUE = 'certified'; + + +-- 4. Discovery: find all metrics owned by a specific team +-- Use SHOW SEMANTIC METRICS to get the list, then join to tag_references +SHOW SEMANTIC METRICS IN SNIPPETS.PUBLIC.CHANNEL_SALES_TAGGED_SV; + + +-- ============================================================ +-- TAG SYNTAX NOTES +-- ============================================================ + +-- In DDL: +-- WITH TAG (tag_name = 'value', tag_name2 = 'value2') +-- Tag names refer to TAG objects already created via CREATE TAG + +-- In tag_references(): +-- Object reference format: 'DB.SCHEMA.VIEW_NAME!ENTITY.METRIC_LOGICAL_NAME' +-- Object type: 'semantic metric' + +-- Tags are Snowflake governance objects — they can be queried via +-- INFORMATION_SCHEMA.TAG_REFERENCES and governed via roles/policies. 
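As a sanity check on what query 1 above resolves to, here is a small SQLite stand-in seeded with this snippet's data. The derived metric `total_channel_revenue` is literally `store_revenue + web_revenue`; note that the naive double join is safe here only because the seed has one row per month per channel — with multiple rows per month it would fan out, which the real SV engine guards against.

```python
import sqlite3

# SQLite stand-in for tag_dim_date / tag_store_sales / tag_web_sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag_dim_date (date_id INTEGER, full_date TEXT, year INTEGER, month INTEGER);
    CREATE TABLE tag_store_sales (sale_id INTEGER, date_id INTEGER, revenue REAL, quantity INTEGER);
    CREATE TABLE tag_web_sales (sale_id INTEGER, date_id INTEGER, revenue REAL, quantity INTEGER);
    INSERT INTO tag_dim_date VALUES (1,'2024-01-01',2024,1),(2,'2024-02-01',2024,2),
                                    (3,'2024-03-01',2024,3),(4,'2024-04-01',2024,4);
    INSERT INTO tag_store_sales VALUES (1,1,5000,50),(2,2,6000,60),(3,3,7000,70),(4,4,4500,45);
    INSERT INTO tag_web_sales   VALUES (1,1,2000,25),(2,2,2500,30),(3,3,3000,35),(4,4,3500,40);
""")

# Channel revenue by month; the derived metric is simple addition.
rows = conn.execute("""
    SELECT d.month,
           SUM(s.revenue)                  AS store_revenue,
           SUM(w.revenue)                  AS web_revenue,
           SUM(s.revenue) + SUM(w.revenue) AS total_channel_revenue
    FROM tag_dim_date d
    JOIN tag_store_sales s ON s.date_id = d.date_id
    JOIN tag_web_sales   w ON w.date_id = d.date_id
    GROUP BY d.month
    ORDER BY d.month
""").fetchall()
print(rows)  # month 3 peaks at 10000.0 total
```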
diff --git a/skills/semantic-view-patterns/snippets/tags/schema.sql b/skills/semantic-view-patterns/snippets/tags/schema.sql new file mode 100644 index 00000000..c7672ac2 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/tags/schema.sql @@ -0,0 +1,38 @@ +-- Tags: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- Create tags that will be applied to metrics in the semantic view +CREATE TAG IF NOT EXISTS metric_owner + COMMENT = 'Team or person responsible for this metric'; + +CREATE TAG IF NOT EXISTS metric_status + COMMENT = 'Development status: certified, in_development, deprecated'; + +CREATE TAG IF NOT EXISTS metric_domain + COMMENT = 'Business domain: sales, marketing, finance, ops'; + +CREATE OR REPLACE TABLE tag_store_sales ( + sale_id INTEGER NOT NULL, + date_id INTEGER NOT NULL, + revenue NUMBER(10,2) NOT NULL, + quantity INTEGER NOT NULL +); + +CREATE OR REPLACE TABLE tag_web_sales ( + sale_id INTEGER NOT NULL, + date_id INTEGER NOT NULL, + revenue NUMBER(10,2) NOT NULL, + quantity INTEGER NOT NULL +); + +CREATE OR REPLACE TABLE tag_dim_date ( + date_id INTEGER NOT NULL, + full_date DATE NOT NULL, + year INTEGER NOT NULL, + month INTEGER NOT NULL, + CONSTRAINT pk_tag_dim_date PRIMARY KEY (date_id) +); diff --git a/skills/semantic-view-patterns/snippets/tags/seed_data.sql b/skills/semantic-view-patterns/snippets/tags/seed_data.sql new file mode 100644 index 00000000..86c3a825 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/tags/seed_data.sql @@ -0,0 +1,16 @@ +-- Tags: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO tag_dim_date VALUES + (1, '2024-01-01', 2024, 1), + (2, '2024-02-01', 2024, 2), + (3, '2024-03-01', 2024, 3), + (4, '2024-04-01', 2024, 4); + +INSERT INTO tag_store_sales VALUES + (1, 1, 5000, 50), (2, 2, 6000, 60), (3, 3, 7000, 70), (4, 4, 4500, 45); + +INSERT INTO tag_web_sales VALUES + (1, 1, 2000, 25), (2, 
2, 2500, 30), (3, 3, 3000, 35), (4, 4, 3500, 40); diff --git a/skills/semantic-view-patterns/snippets/tags/semantic_view.sql b/skills/semantic-view-patterns/snippets/tags/semantic_view.sql new file mode 100644 index 00000000..3699044a --- /dev/null +++ b/skills/semantic-view-patterns/snippets/tags/semantic_view.sql @@ -0,0 +1,47 @@ +-- Tags: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.CHANNEL_SALES_TAGGED_SV + + TABLES ( + tag_dim_date PRIMARY KEY (date_id), + tag_store_sales, + tag_web_sales + ) + + RELATIONSHIPS ( + store_to_date AS tag_store_sales(date_id) REFERENCES tag_dim_date, + web_to_date AS tag_web_sales(date_id) REFERENCES tag_dim_date + ) + + DIMENSIONS ( + tag_dim_date.year AS year WITH SYNONYMS ('year'), + tag_dim_date.month AS month WITH SYNONYMS ('month') + ) + + METRICS ( + tag_store_sales.store_revenue AS SUM(revenue) + WITH SYNONYMS ('store revenue', 'store sales') + WITH TAG (metric_owner = 'finance_team', metric_status = 'certified', metric_domain = 'sales'), + + tag_store_sales.store_quantity AS SUM(quantity) + WITH SYNONYMS ('store units') + WITH TAG (metric_owner = 'finance_team', metric_status = 'certified', metric_domain = 'sales'), + + tag_web_sales.web_revenue AS SUM(revenue) + WITH SYNONYMS ('web revenue', 'online sales') + WITH TAG (metric_owner = 'growth_team', metric_status = 'in_development', metric_domain = 'sales'), + + tag_web_sales.web_quantity AS SUM(quantity) + WITH SYNONYMS ('web units', 'online units') + WITH TAG (metric_owner = 'growth_team', metric_status = 'in_development', metric_domain = 'sales'), + + -- Cross-channel derived metric — no entity prefix; tagged as well + total_channel_revenue AS tag_store_sales.store_revenue + tag_web_sales.web_revenue + WITH SYNONYMS ('total revenue', 'all channel revenue') + WITH TAG (metric_owner = 'finance_team', metric_status = 'certified', metric_domain = 'sales') + ) + + COMMENT = 'Demonstrates WITH TAG on 
metrics. Tags are queryable via tag_references() to discover metric ownership, certification status, and business domain.'; diff --git a/skills/semantic-view-patterns/snippets/tags/semantic_view.yaml b/skills/semantic-view-patterns/snippets/tags/semantic_view.yaml new file mode 100644 index 00000000..4ed7828e --- /dev/null +++ b/skills/semantic-view-patterns/snippets/tags/semantic_view.yaml @@ -0,0 +1,90 @@ +# Tags: Semantic View YAML +# +# ⚠️ TAGS NOT SUPPORTED IN YAML: The WITH TAG clause on metrics is a DDL-only +# feature. Tags cannot be declared in the YAML specification. This YAML defines +# the base SV structure without metric tags. +# +# For metric tagging, use semantic_view.sql and ALTER SEMANTIC VIEW ... ADD TAG +# post-deploy, or use the CREATE SEMANTIC VIEW DDL directly. +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); + +name: CHANNEL_SALES_TAGGED_SV +description: > + Multi-channel sales with metric tagging for ownership and certification tracking. + NOTE: WITH TAG is a DDL-only feature — tags must be applied via semantic_view.sql + or via ALTER SEMANTIC VIEW ... ADD TAG after deploying this YAML. + +tables: + - name: tag_dim_date + description: Date dimension + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: TAG_DIM_DATE + primary_key: + columns: [DATE_ID] + dimensions: + - name: year + synonyms: [year] + expr: YEAR + data_type: NUMBER + - name: month + synonyms: [month] + expr: MONTH + data_type: NUMBER + + - name: tag_store_sales + description: Store sales transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: TAG_STORE_SALES + metrics: + # Tags would be applied here in DDL: WITH TAG (metric_owner='finance_team', ...) + - name: store_revenue + synonyms: [store revenue, store sales] + description: "Finance team certified. 
Tag: metric_owner=finance_team, metric_status=certified" + expr: SUM(REVENUE) + - name: store_quantity + synonyms: [store units] + description: "Finance team certified. Tag: metric_owner=finance_team, metric_status=certified" + expr: SUM(QUANTITY) + + - name: tag_web_sales + description: Web sales transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: TAG_WEB_SALES + metrics: + # Tags would be applied here in DDL: WITH TAG (metric_owner='growth_team', ...) + - name: web_revenue + synonyms: [web revenue, online sales] + description: "Growth team in development. Tag: metric_owner=growth_team, metric_status=in_development" + expr: SUM(REVENUE) + - name: web_quantity + synonyms: [web units, online units] + description: "Growth team in development. Tag: metric_owner=growth_team, metric_status=in_development" + expr: SUM(QUANTITY) + +relationships: + - name: store_to_date + left_table: tag_store_sales + right_table: tag_dim_date + relationship_columns: + - left_column: DATE_ID + right_column: DATE_ID + - name: web_to_date + left_table: tag_web_sales + right_table: tag_dim_date + relationship_columns: + - left_column: DATE_ID + right_column: DATE_ID + +metrics: + - name: total_channel_revenue + synonyms: [total revenue, all channel revenue] + description: "Finance team certified. Tag: metric_owner=finance_team, metric_status=certified" + expr: tag_store_sales.store_revenue + tag_web_sales.web_revenue diff --git a/skills/semantic-view-patterns/snippets/time_intelligence/README.md b/skills/semantic-view-patterns/snippets/time_intelligence/README.md new file mode 100644 index 00000000..af1c34b5 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/time_intelligence/README.md @@ -0,0 +1,116 @@ +# Time Intelligence (SPLY, YoY, MoM) + +## The Problem + +BI tools like Power BI, Tableau, and Looker have built-in time intelligence functions (`PREVIOUSYEAR`, `SAMEPERIODLASTYEAR`, `DATEADD`). 
In a semantic layer you need an equivalent pattern that lets users ask "how does this month compare to last year?" without writing any SQL. + +**Example in this snippet**: Monthly sales revenue compared to the same period last month (SPLM) and same period last year (SPLY), with MoM% and YoY% derived metrics. + +## How You Might Express This Need + +- "Show me revenue this month vs last month" +- "What's our year-over-year growth rate?" +- "Compare Q3 2024 to Q3 2023" +- "Revenue this year vs same period last year, broken down by region" +- "I want PREVIOUSMONTH and SAMEPERIODLASTYEAR like we had in Power BI" + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **Power BI / DAX** | `CALCULATE([Revenue], SAMEPERIODLASTYEAR(Calendar[Date]))` — time intelligence functions on a Date table | +| **LookML** | `type: yesno` + `offset_periods` parameter, or `required_access_grants` + derived table with shifted dates | +| **Tableau** | `LOOKUP(SUM([Revenue]), -1)` for period-over-period in table calcs; `DATE_TRUNC` + FIXED LOD for explicit prior periods | +| **Raw SQL** | `LAG(revenue, 12) OVER (PARTITION BY region ORDER BY month)` for 12-month lag on monthly data | + +Snowflake Semantic Views handle this with a **role-playing alias + computed FACT as a shifted join key** — no window functions, no ETL views, no pre-aggregated tables required. + +## The SV Approach: Role-Playing Aliases with Shifted Keys + +The core idea: create **multiple logical table aliases pointing to the same physical table**, each with a different join key that "shifts" those rows into the current period bucket. + +### Three Parts + +**1. Role-playing table alias** in TABLES: +```sql +, sales_ly AS SNIPPETS.PUBLIC.FACT_SALES + PRIMARY KEY (ROW_ID) + COMMENT = 'SPLY alias: same rows, shifted +1 year in the calendar join' +``` + +**2. 
Computed FACT as the shifted join key** in FACTS: +```sql +, sales_ly.sale_month_shifted_ly AS DATEADD('year', 1, SALE_MONTH) + COMMENT = 'Computed FK: SALE_MONTH + 1 year' +``` +The expression `DATEADD('year', 1, SALE_MONTH)` is a **scalar expression on the physical column**. The SV evaluates it per row to produce the join key. + +**3. Relationship using the computed key** in RELATIONSHIPS: +```sql +, sales_ly_to_calendar AS sales_ly(sale_month_shifted_ly) REFERENCES calendar(MONTH) +``` + +### Why This Works + +When you query `calendar.MONTH = '2024-03-01'`: + +| Entity | Join condition | Rows returned | +|--------|---------------|---------------| +| `sales` | `SALE_MONTH = '2024-03-01'` | March 2024 rows | +| `sales_ly` | `DATEADD('year',1, SALE_MONTH) = '2024-03-01'` → `SALE_MONTH = '2023-03-01'` | March 2023 rows | +| `sales_lm` | `DATEADD('month',1, SALE_MONTH) = '2024-03-01'` → `SALE_MONTH = '2024-02-01'` | February 2024 rows | + +The "shift" happens entirely in the join evaluation — no extra rows, no UNION ALL, no pre-built view. + +### Cross-Entity Derived Metrics + +Once you have LY and LM metrics, YoY% and MoM% are just derived metrics referencing both entities: + +```sql +, yoy_pct AS DIV0(sales.revenue - sales_ly.revenue_ly, sales_ly.revenue_ly) * 100 + COMMENT = 'Revenue % change vs same period last year' +``` + +No table prefix on the left side (`yoy_pct`) — these are **global derived metrics** that reference metrics from different entities. 
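The join mechanics above can be sketched in plain SQL. In this SQLite stand-in, `date(col, '+1 year')` plays the role of `DATEADD('year', 1, col)`: the same fact table is joined to the calendar twice — once on the raw month (current role) and once on the shifted month (LY role) — and YoY% falls out as arithmetic over the two roles. One row per month keeps the double join fan-out-free; the real SV handles the general case.

```python
import sqlite3

# Minimal calendar + fact with one March row per year (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (month TEXT PRIMARY KEY);
    CREATE TABLE fact_sales (sale_month TEXT, revenue REAL);
    INSERT INTO calendar VALUES ('2023-03-01'), ('2024-03-01');
    INSERT INTO fact_sales VALUES ('2023-03-01', 100.0), ('2024-03-01', 130.0);
""")

# Two role-playing joins: raw key (current) and shifted key (last year).
rows = conn.execute("""
    SELECT c.month,
           SUM(cur.revenue) AS revenue,
           SUM(ly.revenue)  AS revenue_ly,
           100.0 * (SUM(cur.revenue) - SUM(ly.revenue)) / SUM(ly.revenue) AS yoy_pct
    FROM calendar c
    LEFT JOIN fact_sales cur ON cur.sale_month = c.month
    LEFT JOIN fact_sales ly  ON date(ly.sale_month, '+1 year') = c.month
    GROUP BY c.month
    ORDER BY c.month
""").fetchall()
print(rows)  # [('2023-03-01', 100.0, None, None), ('2024-03-01', 130.0, 100.0, 30.0)]
```

The 2023 row shows the boundary NULL described below: there is no 2022 data to shift into it, so `revenue_ly` and `yoy_pct` are NULL.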
+ +## What Works + +| Pattern | Query | +|---------|-------| +| Monthly SPLY comparison | `DIMENSIONS calendar.year, calendar.month METRICS sales.revenue, sales_ly.revenue_ly` | +| Annual YoY totals | `DIMENSIONS calendar.year METRICS sales.revenue, sales_ly.revenue_ly, yoy_pct` | +| MoM % by region | `DIMENSIONS calendar.year, calendar.month, sales.region METRICS mom_pct` | +| Full dashboard row | All metrics together grouped by month | + +## What Doesn't Work + +**YTD / QTD / MTD** — this pattern gives point-in-time period comparisons, not cumulative running totals. For YTD use `SUM OVER (PARTITION BY year ORDER BY date ROWS UNBOUNDED PRECEDING)`. See `window_metrics/`. + +**NULL for boundary periods** — `revenue_ly` is NULL for all 2023 rows (no 2022 data in dataset). `revenue_lm` is NULL for January 2023. Handle with `COALESCE(revenue_ly, 0)` in a `standard_sql` wrapper. + +**Quarter/period-to-date breakdowns** — the shift is a full period (1 month or 1 year). Partial periods (e.g., "Q1 to date" for a mid-quarter query) require additional calendar filtering logic outside the SV. 
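For contrast, the cumulative (YTD) pattern this snippet defers to `window_metrics/` looks like this in plain SQL — a running `SUM() OVER (...)` on a single entity, sketched here in SQLite (window functions require SQLite ≥ 3.25; values are hypothetical):

```python
import sqlite3

# One year of monthly revenue to accumulate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly (year INTEGER, month INTEGER, revenue REAL);
    INSERT INTO monthly VALUES (2024,1,100.0),(2024,2,120.0),(2024,3,90.0);
""")

# Running total restarts each year: PARTITION BY year, ordered by month.
rows = conn.execute("""
    SELECT year, month,
           SUM(revenue) OVER (PARTITION BY year ORDER BY month
                              ROWS UNBOUNDED PRECEDING) AS revenue_ytd
    FROM monthly
    ORDER BY year, month
""").fetchall()
print(rows)  # [(2024, 1, 100.0), (2024, 2, 220.0), (2024, 3, 310.0)]
```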
+ +## Comparison with `window_metrics/` + +| | `window_metrics` | `time_intelligence` | +|-|-----------------|---------------------| +| Pattern | `LAG(n)` / `SUM() OVER (...)` window functions on a single entity | Role-playing aliases + shifted join keys across entities | +| Best for | Daily grain, rolling averages, YTD accumulators | Monthly/quarterly grain, SPLY, SPLM, YoY% | +| Requires calendar table | No | Yes | +| Cross-period filters | Limited (by row offset) | Natural — filter on calendar dimensions | +| NULL behavior | First N rows are NULL | First period in dataset is NULL | + +## Docs + +- [CREATE SEMANTIC VIEW — FACTS (scalar expressions)](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#label-create-semantic-view-facts) +- [Cross-table (derived) metrics](https://docs.snowflake.com/en/user-guide/views-semantic/sql#defining-cross-table-metrics) +- [Role-playing logical tables](https://docs.snowflake.com/en/user-guide/views-semantic/sql#defining-role-playing-logical-tables) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `DIM_CALENDAR` and `FACT_SALES` table DDL | +| `seed_data.sql` | 24 calendar months (2023–2024) + 48 sales rows (East/West × 24 months) | +| `semantic_view.sql` | SV with 3 logical table aliases, computed-FK FACTS, and 8 metrics | +| `queries.sql` | SPLY comparison, MoM%, YoY by region, full dashboard row — plus gotchas | diff --git a/skills/semantic-view-patterns/snippets/time_intelligence/queries.sql b/skills/semantic-view-patterns/snippets/time_intelligence/queries.sql new file mode 100644 index 00000000..7ccb6d02 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/time_intelligence/queries.sql @@ -0,0 +1,104 @@ +-- Time Intelligence: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. 
Monthly revenue vs same period last year (SPLY) +-- For 2024 months: revenue_ly shows the aligned 2023 value +-- yoy_pct shows % growth — positive means 2024 > 2023 +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.TIME_INTELLIGENCE_SV + DIMENSIONS calendar.year, calendar.month, calendar.month_name + METRICS sales.revenue, sales_ly.revenue_ly, yoy_change, yoy_pct +) +ORDER BY YEAR ASC, MONTH ASC; + + +-- 2. Month-over-month revenue change +-- revenue_lm for January 2024 = December 2023 (the prior month) +-- First month in dataset (Jan 2023) shows NULL for revenue_lm — no prior month +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.TIME_INTELLIGENCE_SV + DIMENSIONS calendar.year, calendar.month, calendar.month_name + METRICS sales.revenue, sales_lm.revenue_lm, mom_change, mom_pct +) +ORDER BY YEAR ASC, MONTH ASC; + + +-- 3. Annual totals with YoY growth +-- Grouping by year collapses all months — shift arithmetic still aligns correctly +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.TIME_INTELLIGENCE_SV + DIMENSIONS calendar.year + METRICS sales.revenue, sales_ly.revenue_ly, yoy_change, yoy_pct +) +ORDER BY YEAR ASC; + + +-- 4. YoY growth by region +-- Both East and West show their own YoY numbers +-- Works because region lives on the current-period entity (sales) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.TIME_INTELLIGENCE_SV + DIMENSIONS calendar.year, sales.region + METRICS sales.revenue, sales_ly.revenue_ly, yoy_pct +) +ORDER BY YEAR ASC, REGION ASC; + + +-- 5. 
Full dashboard row: current, SPLM, SPLY, MoM%, YoY% +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.TIME_INTELLIGENCE_SV + DIMENSIONS calendar.year, calendar.month, calendar.month_name + METRICS sales.revenue, + sales_lm.revenue_lm, + sales_ly.revenue_ly, + mom_pct, + yoy_pct +) +ORDER BY YEAR ASC, MONTH ASC; + + +-- ============================================================ +-- HOW THE DATE SHIFT WORKS +-- ============================================================ + +-- The sales_ly entity has a computed FACT: +-- sales_ly.sale_month_shifted_ly AS DATEADD('year', 1, SALE_MONTH) +-- +-- The relationship joins on that computed fact: +-- sales_ly(sale_month_shifted_ly) REFERENCES calendar(MONTH) +-- +-- So when the query filters calendar.MONTH = '2024-03-01': +-- → sales rows where SALE_MONTH = '2024-03-01' (current period) +-- → sales_ly rows where DATEADD('year',1, SALE_MONTH) = '2024-03-01' +-- = rows where SALE_MONTH = '2023-03-01' (last year) ✓ +-- +-- No ETL, no UNION ALL view, no window function needed. + + +-- ============================================================ +-- GOTCHAS +-- ============================================================ + +-- NULL for the first/last period: +-- revenue_lm is NULL for Jan 2023 (no prior month in the dataset). +-- revenue_ly is NULL for all 2023 months (no 2022 data to shift from). +-- Handle with COALESCE(revenue_ly, 0) in standard SQL wrapping if needed. + +-- YTD / QTD / MTD are NOT supported by this pattern. +-- The time-shift pattern gives you point-in-time period comparisons. +-- For cumulative running totals (YTD, QTD), use window metrics with +-- SUM(total_revenue) OVER (PARTITION BY year ORDER BY date ROWS UNBOUNDED PRECEDING). +-- See the window_metrics/ snippet. + +-- Cross-period region breakdown: +-- Query 4 uses sales.region (current period) with revenue_ly. +-- The SV automatically applies region to both entities because they share +-- the same physical table (FACT_SALES). 
If you add a separate entity for +-- the LY role-play that joins through a different path, you may need to +-- define region on sales_ly explicitly. diff --git a/skills/semantic-view-patterns/snippets/time_intelligence/schema.sql b/skills/semantic-view-patterns/snippets/time_intelligence/schema.sql new file mode 100644 index 00000000..7f8d9dd4 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/time_intelligence/schema.sql @@ -0,0 +1,22 @@ +-- Time Intelligence: Schema DDL + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE DIM_CALENDAR ( + MONTH DATE PRIMARY KEY COMMENT 'First day of the month (e.g. 2024-03-01)', + MONTH_NAME VARCHAR(10) COMMENT 'January, February, ...', + QUARTER NUMBER(1) COMMENT '1–4', + YEAR NUMBER(4) COMMENT '2023, 2024, ...' +); + +CREATE OR REPLACE TABLE FACT_SALES ( + ROW_ID NUMBER AUTOINCREMENT PRIMARY KEY, + SALE_MONTH DATE COMMENT 'First day of the sale month — FK to DIM_CALENDAR.MONTH', + REGION VARCHAR(50), + REVENUE FLOAT, + UNITS NUMBER +); diff --git a/skills/semantic-view-patterns/snippets/time_intelligence/seed_data.sql b/skills/semantic-view-patterns/snippets/time_intelligence/seed_data.sql new file mode 100644 index 00000000..1944cb74 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/time_intelligence/seed_data.sql @@ -0,0 +1,91 @@ +-- Time Intelligence: Seed Data +-- 24 calendar months (2023–2024) + 48 sales rows (2 regions × 24 months) + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO DIM_CALENDAR VALUES + ('2023-01-01', 'January', 1, 2023), + ('2023-02-01', 'February', 1, 2023), + ('2023-03-01', 'March', 1, 2023), + ('2023-04-01', 'April', 2, 2023), + ('2023-05-01', 'May', 2, 2023), + ('2023-06-01', 'June', 2, 2023), + ('2023-07-01', 'July', 3, 2023), + ('2023-08-01', 'August', 3, 2023), + ('2023-09-01', 'September', 3, 2023), + ('2023-10-01', 'October', 4, 2023), + ('2023-11-01', 
'November', 4, 2023), + ('2023-12-01', 'December', 4, 2023), + ('2024-01-01', 'January', 1, 2024), + ('2024-02-01', 'February', 1, 2024), + ('2024-03-01', 'March', 1, 2024), + ('2024-04-01', 'April', 2, 2024), + ('2024-05-01', 'May', 2, 2024), + ('2024-06-01', 'June', 2, 2024), + ('2024-07-01', 'July', 3, 2024), + ('2024-08-01', 'August', 3, 2024), + ('2024-09-01', 'September', 3, 2024), + ('2024-10-01', 'October', 4, 2024), + ('2024-11-01', 'November', 4, 2024), + ('2024-12-01', 'December', 4, 2024); + +-- East region 2023 +INSERT INTO FACT_SALES (SALE_MONTH, REGION, REVENUE, UNITS) VALUES + ('2023-01-01', 'East', 105000, 105), + ('2023-02-01', 'East', 92000, 92), + ('2023-03-01', 'East', 98000, 98), + ('2023-04-01', 'East', 110000, 110), + ('2023-05-01', 'East', 115000, 115), + ('2023-06-01', 'East', 128000, 128), + ('2023-07-01', 'East', 138000, 138), + ('2023-08-01', 'East', 133000, 133), + ('2023-09-01', 'East', 121000, 121), + ('2023-10-01', 'East', 112000, 112), + ('2023-11-01', 'East', 140000, 140), + ('2023-12-01', 'East', 158000, 158); + +-- East region 2024 (~12% YoY growth) +INSERT INTO FACT_SALES (SALE_MONTH, REGION, REVENUE, UNITS) VALUES + ('2024-01-01', 'East', 118000, 118), + ('2024-02-01', 'East', 103000, 103), + ('2024-03-01', 'East', 110000, 110), + ('2024-04-01', 'East', 123000, 123), + ('2024-05-01', 'East', 129000, 129), + ('2024-06-01', 'East', 143000, 143), + ('2024-07-01', 'East', 154000, 154), + ('2024-08-01', 'East', 149000, 149), + ('2024-09-01', 'East', 136000, 136), + ('2024-10-01', 'East', 125000, 125), + ('2024-11-01', 'East', 156000, 156), + ('2024-12-01', 'East', 177000, 177); + +-- West region 2023 (~65% of East) +INSERT INTO FACT_SALES (SALE_MONTH, REGION, REVENUE, UNITS) VALUES + ('2023-01-01', 'West', 68000, 68), + ('2023-02-01', 'West', 60000, 60), + ('2023-03-01', 'West', 64000, 64), + ('2023-04-01', 'West', 72000, 72), + ('2023-05-01', 'West', 75000, 75), + ('2023-06-01', 'West', 83000, 83), + ('2023-07-01', 'West', 90000, 
90), + ('2023-08-01', 'West', 87000, 87), + ('2023-09-01', 'West', 79000, 79), + ('2023-10-01', 'West', 73000, 73), + ('2023-11-01', 'West', 91000, 91), + ('2023-12-01', 'West', 103000, 103); + +-- West region 2024 (~10% YoY growth over West 2023) +INSERT INTO FACT_SALES (SALE_MONTH, REGION, REVENUE, UNITS) VALUES + ('2024-01-01', 'West', 75000, 75), + ('2024-02-01', 'West', 66000, 66), + ('2024-03-01', 'West', 70000, 70), + ('2024-04-01', 'West', 79000, 79), + ('2024-05-01', 'West', 82000, 82), + ('2024-06-01', 'West', 91000, 91), + ('2024-07-01', 'West', 99000, 99), + ('2024-08-01', 'West', 95000, 95), + ('2024-09-01', 'West', 87000, 87), + ('2024-10-01', 'West', 80000, 80), + ('2024-11-01', 'West', 100000, 100), + ('2024-12-01', 'West', 113000, 113); diff --git a/skills/semantic-view-patterns/snippets/time_intelligence/semantic_view.sql b/skills/semantic-view-patterns/snippets/time_intelligence/semantic_view.sql new file mode 100644 index 00000000..64e818a5 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/time_intelligence/semantic_view.sql @@ -0,0 +1,123 @@ +-- Time Intelligence: Semantic View DDL +-- Run schema.sql and seed_data.sql first. 
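+-- For intuition: the shifted-join relationships defined below resolve to roughly
+-- the following plain-SQL self-join (illustrative sketch only — the engine
+-- generates its own SQL):
+--
+--   SELECT c.YEAR, c.MONTH, s.revenue, ly.revenue AS revenue_ly
+--   FROM DIM_CALENDAR c
+--   LEFT JOIN (SELECT SALE_MONTH, SUM(REVENUE) AS revenue
+--              FROM FACT_SALES GROUP BY SALE_MONTH) s
+--          ON s.SALE_MONTH = c.MONTH
+--   LEFT JOIN (SELECT DATEADD('year', 1, SALE_MONTH) AS shifted_month,
+--                     SUM(REVENUE) AS revenue
+--              FROM FACT_SALES GROUP BY SALE_MONTH) ly
+--          ON ly.shifted_month = c.MONTH;
+--
+-- 2023 months get NULL revenue_ly (no 2022 data) — the same NULL behavior the SV shows.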
+ +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.TIME_INTELLIGENCE_SV + + TABLES ( + -- Calendar dimension (one row per month) + calendar AS SNIPPETS.PUBLIC.DIM_CALENDAR + PRIMARY KEY (MONTH) + + -- Current-period sales fact + , sales AS SNIPPETS.PUBLIC.FACT_SALES + PRIMARY KEY (ROW_ID) + + -- Role-playing alias: same physical table, date will be shifted +1 month + -- When you query by calendar.month = '2024-03', this entity returns Feb 2024 rows + , sales_lm AS SNIPPETS.PUBLIC.FACT_SALES + PRIMARY KEY (ROW_ID) + COMMENT = 'Same-period last month: FACT_SALES with date shifted forward 1 month' + + -- Role-playing alias: same physical table, date will be shifted +1 year + -- When you query by calendar.month = '2024-03', this entity returns Mar 2023 rows + , sales_ly AS SNIPPETS.PUBLIC.FACT_SALES + PRIMARY KEY (ROW_ID) + COMMENT = 'Same-period last year (SPLY): FACT_SALES with date shifted forward 1 year' + ) + + RELATIONSHIPS ( + -- Current period joins directly on SALE_MONTH + sales_to_calendar AS sales(SALE_MONTH) REFERENCES calendar(MONTH) + + -- LM alias joins on the COMPUTED fact sale_month_shifted_lm (= SALE_MONTH + 1 month) + -- This makes "Feb 2024" rows appear under "Mar 2024" in queries + , sales_lm_to_calendar AS sales_lm(sale_month_shifted_lm) REFERENCES calendar(MONTH) + + -- LY alias joins on the COMPUTED fact sale_month_shifted_ly (= SALE_MONTH + 1 year) + -- This makes "Mar 2023" rows appear under "Mar 2024" in queries + , sales_ly_to_calendar AS sales_ly(sale_month_shifted_ly) REFERENCES calendar(MONTH) + ) + + FACTS ( + sales.revenue AS REVENUE + , sales.units AS UNITS + + -- Computed FK for last-month join: shift the actual date forward by 1 month. + -- The relationship uses this column to join to calendar(MONTH), so when + -- calendar.MONTH = '2024-03-01', we need DATEADD('month',1,SALE_MONTH) = '2024-03-01' + -- → SALE_MONTH = '2024-02-01' → last month's rows appear in March's bucket. 
+ , sales_lm.sale_month_shifted_lm AS DATEADD('month', 1, SALE_MONTH) + COMMENT = 'Computed FK: SALE_MONTH + 1 month — shifts LM rows into the current period bucket' + + -- Same idea, 1 year forward: SALE_MONTH = '2023-03-01' → appears in March 2024 bucket. + , sales_ly.sale_month_shifted_ly AS DATEADD('year', 1, SALE_MONTH) + COMMENT = 'Computed FK: SALE_MONTH + 1 year — shifts LY rows into the current period bucket' + ) + + DIMENSIONS ( + calendar.month AS MONTH + WITH SYNONYMS ('period', 'month', 'sale month') + COMMENT = 'Calendar month (first day of month)' + , calendar.month_name AS MONTH_NAME + WITH SYNONYMS ('month name') + , calendar.quarter AS QUARTER + WITH SYNONYMS ('quarter', 'qtr') + , calendar.year AS YEAR + WITH SYNONYMS ('year') + , sales.region AS REGION + WITH SYNONYMS ('region', 'sales region', 'territory') + ) + + METRICS ( + -- ── Current period ────────────────────────────────────────────────────── + sales.revenue AS SUM(revenue) + WITH SYNONYMS ('revenue', 'sales', 'net sales', 'total revenue') + COMMENT = 'Total revenue in the selected period' + + , sales.units AS SUM(units) + WITH SYNONYMS ('units', 'units sold', 'quantity') + COMMENT = 'Total units sold in the selected period' + + -- ── Same period last month (SPLM) ──────────────────────────────────────── + , sales_lm.revenue_lm AS SUM(revenue) + WITH SYNONYMS ('revenue last month', 'LM revenue', 'prior month revenue', 'SPLM') + COMMENT = 'Revenue for the same period last month' + + -- ── Same period last year (SPLY) ───────────────────────────────────────── + , sales_ly.revenue_ly AS SUM(revenue) + WITH SYNONYMS ('revenue last year', 'LY revenue', 'prior year revenue', 'SPLY', 'same period last year') + COMMENT = 'Revenue for the same period last year' + + -- ── Month-over-month (cross-entity derived metrics) ────────────────────── + , mom_change AS sales.revenue - sales_lm.revenue_lm + WITH SYNONYMS ('MoM change', 'month over month change', 'monthly delta') + COMMENT = 'Revenue change 
vs prior month (positive = growth)' + + , mom_pct AS DIV0(sales.revenue - sales_lm.revenue_lm, sales_lm.revenue_lm) * 100 + WITH SYNONYMS ('MoM %', 'MoM growth', 'month over month growth rate') + COMMENT = 'Revenue % change vs prior month' + + -- ── Year-over-year (cross-entity derived metrics) ──────────────────────── + , yoy_change AS sales.revenue - sales_ly.revenue_ly + WITH SYNONYMS ('YoY change', 'year over year change', 'annual delta') + COMMENT = 'Revenue change vs same period last year (positive = growth)' + + , yoy_pct AS DIV0(sales.revenue - sales_ly.revenue_ly, sales_ly.revenue_ly) * 100 + WITH SYNONYMS ('YoY %', 'YoY growth', 'year over year growth rate') + COMMENT = 'Revenue % change vs same period last year' + ) + + COMMENT = 'Monthly sales with time-shifted role-playing aliases for same-period-last-month (SPLM) and same-period-last-year (SPLY) comparisons. Demonstrates the computed-FK pattern for time intelligence without window functions or pre-aggregated ETL views.' + + AI_SQL_GENERATION 'This semantic view demonstrates two time intelligence patterns using role-playing logical table aliases: + +1. sales_lm: same physical table as sales, but SALE_MONTH is shifted forward 1 month via a computed FACT (sale_month_shifted_lm). When you group by calendar.month = March 2024, this entity returns February 2024 rows — enabling MoM comparison without window functions. + +2. sales_ly: same physical table as sales, but SALE_MONTH is shifted forward 1 year via a computed FACT (sale_month_shifted_ly). When you group by calendar.year = 2024, this entity returns 2023 rows aligned to the same calendar periods — enabling SPLY/YoY comparison. + +Cross-entity derived metrics (mom_change, mom_pct, yoy_change, yoy_pct) reference metrics from both the current and prior-period entities. + +Always include calendar.year and/or calendar.month in DIMENSIONS when querying period-over-period metrics so results are aligned by period. 
For YoY queries, grouping by calendar.year returns annual totals; grouping by calendar.year + calendar.month returns month-level comparisons.'; diff --git a/skills/semantic-view-patterns/snippets/time_intelligence/semantic_view.yaml b/skills/semantic-view-patterns/snippets/time_intelligence/semantic_view.yaml new file mode 100644 index 00000000..92db224f --- /dev/null +++ b/skills/semantic-view-patterns/snippets/time_intelligence/semantic_view.yaml @@ -0,0 +1,149 @@ +# Time Intelligence: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# DDL-only features not in YAML: +# - AI_SQL_GENERATION → module_custom_instructions: sql_generation: in YAML +# - Computed FK FACTS used as relationship keys (DATEADD shift trick) are DDL-only. +# The role-playing alias pattern (sales_lm, sales_ly) is fully supported in YAML +# via multiple table entries pointing to the same physical table, but the +# computed-FK join mechanism requires DDL. See semantic_view.sql for the full +# time intelligence pattern including SPLM/SPLY shifted joins. + +name: TIME_INTELLIGENCE_SV +description: > + Monthly sales with time-shifted role-playing aliases for same-period-last-month + (SPLM) and same-period-last-year (SPLY) comparisons. + NOTE: The computed-FK DATEADD shift mechanism requires DDL authoring. This YAML + defines the base structure; use semantic_view.sql for full YoY/MoM metrics. 
+ +tables: + - name: calendar + description: Calendar dimension — one row per month + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: DIM_CALENDAR + primary_key: + columns: [MONTH] + dimensions: + - name: month + synonyms: [period, month, sale month] + description: Calendar month (first day of month) + expr: MONTH + data_type: DATE + - name: month_name + synonyms: [month name] + expr: MONTH_NAME + data_type: VARCHAR + - name: quarter + synonyms: [quarter, qtr] + expr: QUARTER + data_type: NUMBER + - name: year + synonyms: [year] + expr: YEAR + data_type: NUMBER + + - name: sales + description: Current-period sales fact + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: FACT_SALES + primary_key: + columns: [ROW_ID] + dimensions: + - name: region + synonyms: [region, sales region, territory] + expr: REGION + data_type: VARCHAR + metrics: + - name: revenue + synonyms: [revenue, sales, net sales, total revenue] + description: Total revenue in the selected period + expr: SUM(REVENUE) + - name: units + synonyms: [units, units sold, quantity] + description: Total units sold in the selected period + expr: SUM(UNITS) + + # Role-playing alias: same physical table as sales, but joined via shifted date. + # NOTE: the DATEADD computed-FK shift requires DDL. In YAML, this alias can be + # defined but will join on the raw SALE_MONTH column (no date shift). + - name: sales_lm + description: > + Same physical table as sales — intended for last-month comparison. + Full SPLM shift (DATEADD 1 month) requires DDL semantic_view.sql. + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: FACT_SALES + primary_key: + columns: [ROW_ID] + metrics: + - name: revenue_lm + synonyms: [revenue last month, LM revenue, prior month revenue, SPLM] + description: Revenue for the same period last month + expr: SUM(REVENUE) + + # Role-playing alias: same physical table as sales, but joined via +1 year shift.
+ - name: sales_ly + description: > + Same physical table as sales — intended for last-year comparison. + Full SPLY shift (DATEADD 1 year) requires DDL semantic_view.sql. + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: FACT_SALES + primary_key: + columns: [ROW_ID] + metrics: + - name: revenue_ly + synonyms: [revenue last year, LY revenue, prior year revenue, SPLY, same period last year] + description: Revenue for the same period last year + expr: SUM(REVENUE) + +relationships: + - name: sales_to_calendar + left_table: sales + right_table: calendar + relationship_columns: + - left_column: SALE_MONTH + right_column: MONTH + # These relationships join on raw SALE_MONTH; for shifted joins use semantic_view.sql + - name: sales_lm_to_calendar + left_table: sales_lm + right_table: calendar + relationship_columns: + - left_column: SALE_MONTH + right_column: MONTH + - name: sales_ly_to_calendar + left_table: sales_ly + right_table: calendar + relationship_columns: + - left_column: SALE_MONTH + right_column: MONTH + +# Cross-entity derived metrics +metrics: + - name: mom_change + synonyms: [MoM change, month over month change, monthly delta] + description: Revenue change vs prior month (positive = growth) + expr: sales.revenue - sales_lm.revenue_lm + - name: mom_pct + synonyms: [MoM %, MoM growth, month over month growth rate] + description: Revenue % change vs prior month + expr: DIV0(sales.revenue - sales_lm.revenue_lm, sales_lm.revenue_lm) * 100 + - name: yoy_change + synonyms: [YoY change, year over year change, annual delta] + description: Revenue change vs same period last year (positive = growth) + expr: sales.revenue - sales_ly.revenue_ly + - name: yoy_pct + synonyms: [YoY %, YoY growth, year over year growth rate] + description: Revenue % change vs same period last year + expr: DIV0(sales.revenue - sales_ly.revenue_ly, sales_ly.revenue_ly) * 100 diff --git a/skills/semantic-view-patterns/snippets/variables/README.md 
b/skills/semantic-view-patterns/snippets/variables/README.md new file mode 100644 index 00000000..6d1b6a72 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/variables/README.md @@ -0,0 +1,80 @@ +# Variables (Parameterized Semantic Views) + +## The Problem + +You have a metric or dimension whose calculation depends on a **user-defined threshold, weight, or date window** that shouldn't be hard-coded. Different business users or use cases need different values — but you don't want to create a separate SV for each configuration. + +**Variables** let you define adjustable parameters in the SV DDL with optional defaults, then override them at query time. + +## How You Might Express This Need + +- "Our data science team wants to adjust the weights in our composite score model without changing the SV" +- "We want 'premium' to mean >$500 in Q1 but >$400 in Q2 promotions — without duplicating the SV" +- "Let users define the 'lookback window' for recency without touching the DDL" +- "Dynamic thresholds, dynamic scoring — same SV, different parameters" + +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **SQL** | Query parameters (`$1`, `%s`) or Jinja templates (`{{ var('threshold') }}`) | +| **LookML** | Liquid parameters: `{% parameter threshold %}` | +| **dbt** | `{{ var('premium_threshold', 500) }}` in model SQL | +| **Power BI** | What-if parameters / Power Query parameters | +| **Tableau** | Parameters with Parameter Actions for dynamic filtering; Fixed LOD with parameter-driven thresholds. Unlike SV variables, which are supplied per query, Tableau parameters persist for the session. | + +## The SV Approach + +**Declare variables with types and defaults in the DDL:** +```sql +VARIABLES ( + premium_threshold NUMBER(10,2) DEFAULT 500.00, + rating_weight NUMBER(3,2) DEFAULT 0.6 +) +``` + +**Reference by name in DIMENSIONS or METRICS expressions:** +```sql +DIMENSIONS ( + product_sales.price_tier AS ( + CASE WHEN unit_price >= premium_threshold THEN 'premium' ...
END + ) +) +METRICS ( + product_sales.performance_score AS ( + rating_weight * AVG(customer_rating) / 5.0 + ... + ) +) +``` + +**Override at query time with `VARIABLES key => value`:** +```sql +SELECT * FROM SEMANTIC_VIEW( + product_performance + DIMENSIONS price_tier + METRICS total_sales + VARIABLES premium_threshold => 400.00 +) +``` + +## Key Rules + +- Variables can only be used in **DIMENSIONS, METRICS, FACTS** expressions +- Variables **cannot** be used in TABLES or RELATIONSHIPS clauses +- `DEFAULT` is optional — if omitted, the variable **must** be supplied at every query call +- Value must be coercible to the declared type (e.g. integer `1` for `NUMBER(3,2)` works) +- Variables are not exposed as queryable dimensions or metrics + +## Docs + +- [CREATE SEMANTIC VIEW — VARIABLES clause](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#variables) +- [SEMANTIC_VIEW clause — VARIABLES at query time](https://docs.snowflake.com/en/sql-reference/constructs/semantic_view#variables) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `product_sales` table | +| `seed_data.sql` | 8 product sale rows across 3 categories | +| `semantic_view.sql` | SV with 6 variables: scoring weights, price tier thresholds, date windows | +| `queries.sql` | Default vs override for each variable pattern | diff --git a/skills/semantic-view-patterns/snippets/variables/queries.sql b/skills/semantic-view-patterns/snippets/variables/queries.sql new file mode 100644 index 00000000..8b50f6a8 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/variables/queries.sql @@ -0,0 +1,75 @@ +-- Variables: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. 
Default scoring weights: price=0.4, rating=0.6 +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.PRODUCT_PERFORMANCE_SV + DIMENSIONS product_name, category + METRICS total_sales, avg_rating, performance_score +) +ORDER BY performance_score DESC; + + +-- 2. Override to rating-only weighting (price_weight=0, rating_weight=1) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.PRODUCT_PERFORMANCE_SV + DIMENSIONS product_name, category + METRICS total_sales, avg_rating, performance_score + VARIABLES price_weight => 0, rating_weight => 1 +) +ORDER BY performance_score DESC; + + +-- 3. Price tier breakdown using default thresholds (budget<$100, mid $100-$500, premium>$500) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.PRODUCT_PERFORMANCE_SV + DIMENSIONS price_tier + METRICS total_sales, total_revenue +) +ORDER BY price_tier; + + +-- 4. Adjust tier thresholds at query time +-- New tiers: budget <$200, mid-range $200-$400, premium >$400 +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.PRODUCT_PERFORMANCE_SV + DIMENSIONS price_tier + METRICS total_sales, total_revenue + VARIABLES premium_threshold => 400.00, budget_threshold => 200.00 +) +ORDER BY price_tier; + + +-- 5. 
"Recent products" flag with custom analysis window +-- Only items sold in the last 60 days (from a specific reference date) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.PRODUCT_PERFORMANCE_SV + DIMENSIONS product_name, is_recent + METRICS total_sales + VARIABLES recent_days => 60, analysis_date => '2024-03-31' +) +ORDER BY is_recent DESC, total_sales DESC; + + +-- ============================================================ +-- VARIABLE RULES +-- ============================================================ + +-- Variables can ONLY be used in: +-- DIMENSIONS, METRICS, FACTS calculation expressions + +-- Variables CANNOT be used in: +-- TABLES clause, RELATIONSHIPS clause + +-- At query time: VARIABLES key => value +-- All unspecified variables use their DEFAULT value +-- If a variable has no DEFAULT, specifying it at query time is REQUIRED + +-- Type coercion: the supplied value must be coercible to the declared type +-- (e.g. passing integer 1 for a DECIMAL(3,2) variable works fine) diff --git a/skills/semantic-view-patterns/snippets/variables/schema.sql b/skills/semantic-view-patterns/snippets/variables/schema.sql new file mode 100644 index 00000000..70c647e6 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/variables/schema.sql @@ -0,0 +1,17 @@ +-- Variables: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE product_sales ( + sale_id INTEGER NOT NULL, + product_id INTEGER NOT NULL, + product_name VARCHAR(50) NOT NULL, + category VARCHAR(30) NOT NULL, + sale_date DATE NOT NULL, + quantity INTEGER NOT NULL, + unit_price NUMBER(10,2) NOT NULL, + customer_rating NUMBER(3,2) NOT NULL +); diff --git a/skills/semantic-view-patterns/snippets/variables/seed_data.sql b/skills/semantic-view-patterns/snippets/variables/seed_data.sql new file mode 100644 index 00000000..9ed4f31c --- /dev/null +++ 
b/skills/semantic-view-patterns/snippets/variables/seed_data.sql @@ -0,0 +1,14 @@ +-- Variables: Seed Data + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO product_sales VALUES + (1, 101, 'Laptop Pro', 'Electronics', '2024-01-15', 2, 1200.00, 4.5), + (2, 102, 'Desk Chair', 'Furniture', '2024-01-16', 5, 150.00, 4.0), + (3, 101, 'Laptop Pro', 'Electronics', '2024-02-20', 1, 1200.00, 4.8), + (4, 103, 'Mouse Pad', 'Accessories', '2024-01-17', 10, 15.00, 3.5), + (5, 104, 'Standing Desk', 'Furniture', '2024-02-10', 3, 450.00, 4.9), + (6, 102, 'Desk Chair', 'Furniture', '2024-03-05', 2, 150.00, 3.8), + (7, 103, 'Mouse Pad', 'Accessories', '2024-01-25', 8, 15.00, 3.2), + (8, 105, 'Monitor 4K', 'Electronics', '2024-03-10', 4, 600.00, 4.7); diff --git a/skills/semantic-view-patterns/snippets/variables/semantic_view.sql b/skills/semantic-view-patterns/snippets/variables/semantic_view.sql new file mode 100644 index 00000000..8ea00f7f --- /dev/null +++ b/skills/semantic-view-patterns/snippets/variables/semantic_view.sql @@ -0,0 +1,65 @@ +-- Variables: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.PRODUCT_PERFORMANCE_SV + + TABLES (product_sales) + + VARIABLES ( + -- Pattern 1: Adjustable scoring weights for performance composite index + price_weight NUMBER(3,2) DEFAULT 0.4, + rating_weight NUMBER(3,2) DEFAULT 0.6, + + -- Pattern 2: Adjustable tier boundaries for price bucketing + premium_threshold NUMBER(10,2) DEFAULT 500.00, + budget_threshold NUMBER(10,2) DEFAULT 100.00, + + -- Pattern 3: Date window for "recent sales" analysis + recent_days INTEGER DEFAULT 90, + analysis_date DATE DEFAULT CURRENT_DATE() + ) + + DIMENSIONS ( + product_sales.product_name AS product_name + WITH SYNONYMS ('product', 'name'), + product_sales.category AS category + WITH SYNONYMS ('category', 'product category'), + + -- Dynamic price tier — bucket boundaries come from VARIABLES + product_sales.price_tier AS ( + CASE + 
WHEN product_sales.unit_price >= premium_threshold THEN 'premium' + WHEN product_sales.unit_price >= budget_threshold THEN 'mid-range' + ELSE 'budget' + END + ) + WITH SYNONYMS ('tier', 'price tier', 'price segment'), + + -- Dynamic recency flag — uses analysis_date and recent_days variables + product_sales.is_recent AS ( + product_sales.sale_date >= DATEADD('day', -recent_days, analysis_date) + ) + WITH SYNONYMS ('recent', 'is recent sale') + ) + + METRICS ( + product_sales.total_sales AS COUNT(sale_id) + WITH SYNONYMS ('sales count', 'number of sales'), + product_sales.total_revenue AS SUM(unit_price * quantity) + WITH SYNONYMS ('revenue', 'total revenue'), + product_sales.avg_rating AS AVG(customer_rating) + WITH SYNONYMS ('rating', 'average rating'), + + -- Composite score blending price and rating via weighted VARIABLES + product_sales.performance_score AS ( + price_weight * AVG(unit_price) / MAX(unit_price) + + rating_weight * AVG(customer_rating) / 5.0 + ) + WITH SYNONYMS ('score', 'performance', 'composite score') + ) + + COMMENT = 'Product performance analytics with runtime-configurable variables. Scoring weights, price tier boundaries, and recency windows are all adjustable at query time without changing the SV DDL.' + + AI_SQL_GENERATION 'Variables can be overridden at query time using VARIABLES key => value. Default weights: price_weight=0.4, rating_weight=0.6. Default tiers: budget <$100, mid-range $100-$500, premium >$500. Use price_tier dimension to segment by dynamically-defined price buckets. 
Use is_recent to filter to recent products.'; diff --git a/skills/semantic-view-patterns/snippets/variables/semantic_view.yaml b/skills/semantic-view-patterns/snippets/variables/semantic_view.yaml new file mode 100644 index 00000000..5686d857 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/variables/semantic_view.yaml @@ -0,0 +1,73 @@ +# Variables: Semantic View YAML +# +# ⚠️ VARIABLES NOT SUPPORTED IN YAML: The YAML specification for Semantic Views +# does not support the VARIABLES clause. This is a DDL-only feature. +# +# This YAML defines the base SV structure without variables. The dynamic +# dimensions (price_tier, is_recent) and variable-driven performance_score +# metric are replaced with static equivalents. Use semantic_view.sql for +# the full parameterized VARIABLES pattern. +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); + +name: PRODUCT_PERFORMANCE_SV +description: > + Product performance analytics. NOTE: the full VARIABLES pattern (adjustable + scoring weights, price tiers, recency windows) requires DDL authoring + (semantic_view.sql). This YAML provides the base structure without runtime variables. + +tables: + - name: product_sales + description: Product sales transactions + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: PRODUCT_SALES + dimensions: + - name: product_name + synonyms: [product, name] + expr: PRODUCT_NAME + data_type: VARCHAR + - name: category + synonyms: [category, product category] + expr: CATEGORY + data_type: VARCHAR + - name: price_tier + synonyms: [tier, price tier, price segment] + description: > + Static price tier bucketing. For dynamic tier boundaries via VARIABLES, + use semantic_view.sql instead. 
+ expr: > + CASE + WHEN UNIT_PRICE >= 500 THEN 'premium' + WHEN UNIT_PRICE >= 100 THEN 'mid-range' + ELSE 'budget' + END + data_type: VARCHAR + - name: is_recent + synonyms: [recent, is recent sale] + description: > + Whether the sale occurred in the last 90 days. For a configurable + recency window via VARIABLES, use semantic_view.sql instead. + expr: SALE_DATE >= DATEADD('day', -90, CURRENT_DATE()) + data_type: BOOLEAN + metrics: + - name: total_sales + synonyms: [sales count, number of sales] + expr: COUNT(SALE_ID) + - name: total_revenue + synonyms: [revenue, total revenue] + expr: SUM(UNIT_PRICE * QUANTITY) + - name: avg_rating + synonyms: [rating, average rating] + expr: AVG(CUSTOMER_RATING) + - name: performance_score + synonyms: [score, performance, composite score] + description: > + Composite score blending normalized price and rating with fixed weights + (0.4 price, 0.6 rating). For adjustable weights via VARIABLES, use + semantic_view.sql instead. + expr: > + 0.4 * AVG(UNIT_PRICE) / MAX(UNIT_PRICE) + + 0.6 * AVG(CUSTOMER_RATING) / 5.0 diff --git a/skills/semantic-view-patterns/snippets/window_metrics/README.md b/skills/semantic-view-patterns/snippets/window_metrics/README.md new file mode 100644 index 00000000..2d94e217 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/window_metrics/README.md @@ -0,0 +1,133 @@ +# Window Metrics (LAG, Rolling Average, YTD) + +## The Problem + +You need metrics that **span time** — comparing today to a prior period, smoothing daily noise into a rolling average, or accumulating a year-to-date total. These require window functions that operate over ordered rows, not simple aggregations. + +## How You Might Express This Need + +- "Show me revenue with a 7-day rolling average to smooth out weekend dips" +- "Compare today's sales to the same day 30 days ago" +- "I want a running YTD total that resets each January 1st" +- "What was the 7-day rolling average 30 days ago, for period-over-period comparison?" 
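+Outside a semantic view, each of those asks is a window function over an ordered date column. A plain-SQL sketch, assuming a hypothetical `daily_sales(date, revenue)` table:
+
+```sql
+-- Hypothetical daily_sales(date, revenue): the three asks as window functions.
+SELECT
+  date,
+  SUM(revenue) AS total_revenue,
+  -- 7-day rolling average (current day + 6 preceding days)
+  AVG(SUM(revenue)) OVER (ORDER BY date
+    RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW) AS rolling_7d_avg,
+  -- value from 30 days earlier; NULL for the first 30 days
+  LAG(SUM(revenue), 30) OVER (ORDER BY date) AS revenue_30d_ago,
+  -- running total that resets each January 1st
+  SUM(SUM(revenue)) OVER (PARTITION BY YEAR(date) ORDER BY date
+    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ytd_revenue
+FROM daily_sales
+GROUP BY date;
+```
+
+The SV patterns that follow express the same logic declaratively, so the windows adapt to whatever dimensions a query requests.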
+ +## Equivalent in Other Tools + +| Tool | Approach | +|------|----------| +| **SQL** | `AVG(revenue) OVER (ORDER BY date RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW)` | +| **LookML** | `measure: rolling_7d { type: running_total ... }` (limited) | +| **dbt** | Window functions in model SQL; must be pre-materialized | +| **Power BI** | DAX `DATESINPERIOD()`, `TOTALYTD()` | +| **Tableau** | Table calculations: `WINDOW_AVG` for rolling, `LOOKUP(SUM([metric]), -1)` for LAG, `RUNNING_SUM` for YTD. Limited to dimensions present in the current view. | + +## Three Window Metric Patterns + +### 1. Rolling Average (RANGE INTERVAL) +```sql +daily_sales.rolling_7d_avg AS + AVG(total_revenue) + OVER (PARTITION BY EXCLUDING daily_sales.date + ORDER BY daily_sales.date + RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW) +``` +`PARTITION BY EXCLUDING` = "partition by all dimensions in the query **except** the ORDER BY dim." If channel is requested, each channel gets its own independent window. + +### 2. LAG — Prior Period Comparison +```sql +daily_sales.revenue_30d_ago AS + LAG(total_revenue, 30) + OVER (PARTITION BY EXCLUDING daily_sales.date + ORDER BY daily_sales.date) +``` +Returns the value of `total_revenue` from 30 rows (days) earlier in the same partition. NULL for the first 30 rows. + +### 3. YTD Cumulative Sum +```sql +daily_sales.ytd_revenue AS + SUM(total_revenue) + OVER (PARTITION BY daily_sales.year + ORDER BY daily_sales.date + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) +``` +`PARTITION BY year` (not EXCLUDING) explicitly partitions by year — the running total resets at each year boundary. + +## Key Rules + +- Window metrics must include their ORDER BY dimension in the DIMENSIONS clause of the query +- `PARTITION BY EXCLUDING <dim>` partitions by all other query dimensions — adding more dimensions (e.g.
channel) automatically applies the window per-group +- `PARTITION BY <dim>` (without EXCLUDING) partitions explicitly by that dimension only +- `LAG(n)` returns NULL for the first n rows — expected behavior + +## What Doesn't Work + +### PARTITION BY EXCLUDING on FACT-based metrics + +If you declare a measure column in the `FACTS` clause and then use it in a base metric, `PARTITION BY EXCLUDING` will fail: + +```sql +FACTS (fact_table.revenue AS revenue) -- ← declares revenue as a FACT + +METRICS ( + fact_table.total_revenue AS SUM(fact_table.revenue), -- ← references a FACT + + fact_table.rolling_avg AS + AVG(total_revenue) + OVER (PARTITION BY EXCLUDING fact_table.date -- ← FAILS + ORDER BY fact_table.date + RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW) +) +``` +**Error:** `PARTITION BY EXCLUDING is not allowed when the window function operates over a row-level expression.` + +The engine classifies any metric whose expression references a FACT column as "row-level" — even though it's wrapped in `SUM()`. The fix is to **not declare measure columns in FACTS**.
Leave them as plain table columns and reference them by bare physical name in the metric: + +```sql +-- No FACTS clause for revenue + +METRICS ( + fact_table.total_revenue AS SUM(revenue), -- ← bare physical column name, no entity prefix + + fact_table.rolling_avg AS + AVG(total_revenue) + OVER (PARTITION BY EXCLUDING fact_table.date -- ← works + ORDER BY fact_table.date + RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW) +) +``` + +### ROWS BETWEEN without PARTITION BY EXCLUDING + +Dropping `PARTITION BY EXCLUDING` entirely and using bare `ORDER BY` with `ROWS BETWEEN PRECEDING` also fails: + +```sql +fact_table.rolling AS + SUM(total_revenue) + OVER (ORDER BY fact_table.date + ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) -- ← FAILS +``` +**Error:** `Unsupported expression in the definition of derived metric.` + +Always include `PARTITION BY EXCLUDING` (or an explicit `PARTITION BY`) with window metrics. + +### Entity prefix in metric expressions + +Window metrics must use the **entity prefix on the metric name** (matching the snippet style `entity.metric_name AS ...`). Metrics defined without the entity prefix (`total_revenue AS SUM(revenue)`) may fail to resolve correctly in window function context. Always use: +```sql +fact_table.total_revenue AS SUM(revenue) +fact_table.rolling_avg AS AVG(total_revenue) OVER (...) 
+``` + +## Docs + +- [Defining and querying window function metrics](https://docs.snowflake.com/en/user-guide/views-semantic/querying#defining-and-querying-window-function-metrics) +- [CREATE SEMANTIC VIEW — window function syntax](https://docs.snowflake.com/en/sql-reference/sql/create-semantic-view#label-create-semantic-view-window-function-syntax) + +## Files + +| File | Description | +|------|-------------| +| `schema.sql` | `daily_sales` table | +| `seed_data.sql` | 35 days of daily sales | +| `semantic_view.sql` | SV with rolling avg, LAG, and YTD metrics | +| `queries.sql` | Each window pattern queried independently + combined | diff --git a/skills/semantic-view-patterns/snippets/window_metrics/queries.sql b/skills/semantic-view-patterns/snippets/window_metrics/queries.sql new file mode 100644 index 00000000..6c4489e4 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/window_metrics/queries.sql @@ -0,0 +1,94 @@ +-- Window Metrics: Queries + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +-- ============================================================ +-- WORKING QUERIES +-- ============================================================ + +-- 1. Daily revenue with 7-day rolling average +-- Shows smoothed trend vs raw daily noise +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.DAILY_SALES_SV + DIMENSIONS daily_sales.date + METRICS daily_sales.total_revenue, daily_sales.rolling_7d_avg_revenue +) +ORDER BY date; + + +-- 2. Period-over-period comparison: today vs 30 days ago +-- revenue_30d_ago is NULL for the first 30 rows (no prior data) +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.DAILY_SALES_SV + DIMENSIONS daily_sales.date + METRICS daily_sales.total_revenue, daily_sales.revenue_30d_ago +) +ORDER BY date; + + +-- 3. YTD cumulative revenue by day +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.DAILY_SALES_SV + DIMENSIONS daily_sales.date + METRICS daily_sales.total_revenue, daily_sales.ytd_revenue +) +ORDER BY date; + + +-- 4. 
All window metrics together — full picture +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.DAILY_SALES_SV + DIMENSIONS daily_sales.date + METRICS daily_sales.total_revenue, + daily_sales.rolling_7d_avg_revenue, + daily_sales.revenue_30d_ago, + daily_sales.ytd_revenue +) +ORDER BY date; + + +-- ============================================================ +-- IMPORTANT NOTES +-- ============================================================ + +-- PARTITION BY EXCLUDING <dim>: The window partitions by all other dimensions +-- requested in the query, EXCLUDING the specified one. +-- This means if you add channel to the query, each channel gets its own window. + +-- Window with channel breakdown (each channel gets its own 7-day window): +SELECT * FROM SEMANTIC_VIEW( + SNIPPETS.PUBLIC.DAILY_SALES_SV + DIMENSIONS daily_sales.date, daily_sales.channel + METRICS daily_sales.total_revenue, daily_sales.rolling_7d_avg_revenue +) +ORDER BY channel, date; + + +-- ============================================================ +-- GOTCHAS +-- ============================================================ + +-- 1. Window metrics require their ORDER BY dimension in the DIMENSIONS clause. +-- If you ask for ytd_revenue without also requesting daily_sales.date, +-- the result is ambiguous — include date in DIMENSIONS. + +-- 2. LAG(n) will be NULL for the first n rows — expected behavior. +-- Handle with COALESCE if needed in standard SQL wrapping. + +-- 3. Do NOT declare measure columns in FACTS if you want to use them in +-- window metrics. FACTS columns are treated as "row-level expressions"; +-- PARTITION BY EXCLUDING will fail with: +-- "PARTITION BY EXCLUDING is not allowed when the window function +-- operates over a row-level expression." +-- Fix: omit measure columns from FACTS and reference them by bare +-- physical column name in the base metric: SUM(revenue) not SUM(entity.revenue). + +-- 4. Always include PARTITION BY EXCLUDING (or explicit PARTITION BY) in +-- window metrics.
Bare ORDER BY without PARTITION BY is unsupported: +-- SUM(total_revenue) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING ...) +-- will fail with "Unsupported expression in the definition of derived metric". + +-- 5. Use entity prefix on ALL metric names when the SV includes window metrics: +-- daily_sales.total_revenue AS SUM(revenue) -- correct +-- total_revenue AS SUM(revenue) -- may fail in window context diff --git a/skills/semantic-view-patterns/snippets/window_metrics/schema.sql b/skills/semantic-view-patterns/snippets/window_metrics/schema.sql new file mode 100644 index 00000000..06ec64b4 --- /dev/null +++ b/skills/semantic-view-patterns/snippets/window_metrics/schema.sql @@ -0,0 +1,14 @@ +-- Window Metrics: Schema + +CREATE DATABASE IF NOT EXISTS SNIPPETS; +CREATE SCHEMA IF NOT EXISTS SNIPPETS.PUBLIC; +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE TABLE daily_sales ( + sale_id INTEGER NOT NULL, + sale_date DATE NOT NULL, + channel VARCHAR(20) NOT NULL, + revenue NUMBER(10,2) NOT NULL, + quantity INTEGER NOT NULL +); diff --git a/skills/semantic-view-patterns/snippets/window_metrics/seed_data.sql b/skills/semantic-view-patterns/snippets/window_metrics/seed_data.sql new file mode 100644 index 00000000..0b0b58ff --- /dev/null +++ b/skills/semantic-view-patterns/snippets/window_metrics/seed_data.sql @@ -0,0 +1,43 @@ +-- Window Metrics: Seed Data +-- 35 days of daily sales data — enough for LAG(30) and rolling averages + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +INSERT INTO daily_sales VALUES + (1, '2024-01-01', 'online', 1200, 12), + (2, '2024-01-02', 'online', 950, 10), + (3, '2024-01-03', 'online', 1100, 11), + (4, '2024-01-04', 'online', 1300, 13), + (5, '2024-01-05', 'online', 1450, 15), + (6, '2024-01-06', 'online', 900, 9), + (7, '2024-01-07', 'online', 1000, 10), + (8, '2024-01-08', 'online', 1150, 12), + (9, '2024-01-09', 'online', 1250, 13), + (10, '2024-01-10', 'online', 1350, 14), + (11, '2024-01-11', 'online', 1600, 16), + (12,
'2024-01-12', 'online', 1700, 17), + (13, '2024-01-13', 'online', 1400, 14), + (14, '2024-01-14', 'online', 1550, 16), + (15, '2024-01-15', 'online', 1800, 18), + (16, '2024-01-16', 'online', 1650, 17), + (17, '2024-01-17', 'online', 1500, 15), + (18, '2024-01-18', 'online', 1100, 11), + (19, '2024-01-19', 'online', 950, 9), + (20, '2024-01-20', 'online', 1200, 12), + (21, '2024-01-21', 'online', 1400, 14), + (22, '2024-01-22', 'online', 1600, 16), + (23, '2024-01-23', 'online', 1750, 18), + (24, '2024-01-24', 'online', 1850, 19), + (25, '2024-01-25', 'online', 1950, 20), + (26, '2024-01-26', 'online', 2000, 20), + (27, '2024-01-27', 'online', 1900, 19), + (28, '2024-01-28', 'online', 1800, 18), + (29, '2024-01-29', 'online', 2100, 21), + (30, '2024-01-30', 'online', 2200, 22), + -- February: used to show LAG(30) comparison back to January + (31, '2024-02-01', 'online', 1400, 14), + (32, '2024-02-02', 'online', 1250, 13), + (33, '2024-02-03', 'online', 1300, 13), + (34, '2024-02-04', 'online', 1500, 15), + (35, '2024-02-05', 'online', 1650, 17); diff --git a/skills/semantic-view-patterns/snippets/window_metrics/semantic_view.sql b/skills/semantic-view-patterns/snippets/window_metrics/semantic_view.sql new file mode 100644 index 00000000..dc4b644d --- /dev/null +++ b/skills/semantic-view-patterns/snippets/window_metrics/semantic_view.sql @@ -0,0 +1,73 @@ +-- Window Metrics: Semantic View DDL + +USE DATABASE SNIPPETS; +USE SCHEMA PUBLIC; + +CREATE OR REPLACE SEMANTIC VIEW SNIPPETS.PUBLIC.DAILY_SALES_SV + + TABLES ( + daily_sales + ) + + FACTS ( + daily_sales.sale_date_fact AS sale_date + ) + + DIMENSIONS ( + daily_sales.date AS sale_date + WITH SYNONYMS ('date', 'day', 'sale date'), + daily_sales.channel AS channel + WITH SYNONYMS ('channel', 'sales channel'), + daily_sales.year AS YEAR(sale_date) + WITH SYNONYMS ('year'), + daily_sales.month AS MONTH(sale_date) + WITH SYNONYMS ('month') + ) + + METRICS ( + -- Base metric: daily total + -- NOTE: 'revenue' is a bare 
physical column name (no entity prefix). + -- Do NOT declare revenue in FACTS — FACTS columns are "row-level" and + -- PARTITION BY EXCLUDING will fail on any metric that references them. + -- Always use entity prefix on the metric name (daily_sales.total_revenue) + -- when the SV includes window metrics. + daily_sales.total_revenue AS SUM(revenue) + WITH SYNONYMS ('revenue', 'daily revenue'), + + daily_sales.total_quantity AS SUM(quantity) + WITH SYNONYMS ('quantity', 'units sold'), + + -- Rolling 7-day average: + -- PARTITION BY EXCLUDING daily_sales.date → group by everything else (e.g. channel) + -- ORDER BY date → march forward in time + -- RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW → 7-day window + daily_sales.rolling_7d_avg_revenue AS + AVG(total_revenue) + OVER (PARTITION BY EXCLUDING daily_sales.date + ORDER BY daily_sales.date + RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW) + WITH SYNONYMS ('7 day rolling average', '7-day avg', 'weekly rolling average'), + + -- LAG — value 30 rows (days) ago in the same partition + -- Used to compare current period vs the same period last month + daily_sales.revenue_30d_ago AS + LAG(total_revenue, 30) + OVER (PARTITION BY EXCLUDING daily_sales.date + ORDER BY daily_sales.date) + WITH SYNONYMS ('revenue 30 days ago', 'prior month revenue', 'last month revenue'), + + -- YTD: running total within each year + -- PARTITION BY daily_sales.year → reset at year boundary + -- ORDER BY date → accumulate forward + -- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW → include all prior rows + daily_sales.ytd_revenue AS + SUM(total_revenue) + OVER (PARTITION BY daily_sales.year + ORDER BY daily_sales.date + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) + WITH SYNONYMS ('YTD revenue', 'year to date revenue', 'cumulative revenue') + ) + + COMMENT = 'Daily sales metrics demonstrating three window function patterns: rolling 7-day average, LAG for period-over-period comparison, and YTD cumulative sum.'
+ + AI_SQL_GENERATION 'Use rolling_7d_avg_revenue for smoothed trend analysis. Use revenue_30d_ago alongside total_revenue to compare current vs prior period (month-over-month). Use ytd_revenue for cumulative year-to-date totals. Window metrics require daily_sales.date in the DIMENSIONS clause to show day-level results. PARTITION BY EXCLUDING means "partition by all other dimensions in the query except date".'; diff --git a/skills/semantic-view-patterns/snippets/window_metrics/semantic_view.yaml b/skills/semantic-view-patterns/snippets/window_metrics/semantic_view.yaml new file mode 100644 index 00000000..031ece9f --- /dev/null +++ b/skills/semantic-view-patterns/snippets/window_metrics/semantic_view.yaml @@ -0,0 +1,79 @@ +# Window Metrics: Semantic View YAML +# +# Deploy with: +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$); +# +# Verify without deploying (dry-run): +# CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML('TARGET_DB.TARGET_SCHEMA', $$ $$, TRUE); +# +# DDL-only features not in YAML: +# - AI_SQL_GENERATION → module_custom_instructions: sql_generation: in YAML +# - PARTITION BY EXCLUDING syntax in window metrics is DDL-only; +# YAML expresses window functions using standard SQL in the expr field + +name: DAILY_SALES_SV +description: > + Daily sales metrics demonstrating three window function patterns: rolling + 7-day average, LAG for period-over-period comparison, and YTD cumulative sum. 
+ +tables: + - name: daily_sales + description: One row per day per sales channel + base_table: + database: TARGET_DB + schema: TARGET_SCHEMA + table: DAILY_SALES + dimensions: + - name: date + synonyms: [date, day, sale date] + expr: SALE_DATE + data_type: DATE + - name: channel + synonyms: [channel, sales channel] + expr: CHANNEL + data_type: VARCHAR + - name: year + synonyms: [year] + expr: YEAR(SALE_DATE) + data_type: NUMBER + - name: month + synonyms: [month] + expr: MONTH(SALE_DATE) + data_type: NUMBER + facts: + - name: sale_date_fact + expr: SALE_DATE + data_type: DATE + metrics: + - name: total_revenue + synonyms: [revenue, daily revenue] + expr: SUM(REVENUE) + - name: total_quantity + synonyms: [quantity, units sold] + expr: SUM(QUANTITY) + # Window metrics: YAML uses standard SQL window syntax in the expr field. + # DDL uses PARTITION BY EXCLUDING which is not available in YAML; + # the equivalent below partitions by channel explicitly. + - name: rolling_7d_avg_revenue + synonyms: [7 day rolling average, 7-day avg, weekly rolling average] + description: 7-day rolling average of daily revenue, partitioned by channel + expr: > + AVG(SUM(REVENUE)) + OVER (PARTITION BY CHANNEL + ORDER BY SALE_DATE + RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW) + - name: revenue_30d_ago + synonyms: [revenue 30 days ago, prior month revenue, last month revenue] + description: Revenue value 30 days prior in the same channel + expr: > + LAG(SUM(REVENUE), 30) + OVER (PARTITION BY CHANNEL + ORDER BY SALE_DATE) + - name: ytd_revenue + synonyms: [YTD revenue, year to date revenue, cumulative revenue] + description: Running year-to-date total, reset at each year boundary + expr: > + SUM(SUM(REVENUE)) + OVER (PARTITION BY YEAR(SALE_DATE), CHANNEL + ORDER BY SALE_DATE + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
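
The DDL and YAML above both rely on standard SQL window semantics. As an engine-agnostic sanity check of the three patterns (rolling average, LAG, running total), here is a minimal sketch using SQLite via Python's `sqlite3`. Everything in it is illustrative: the one-row-per-day data is synthetic, Snowflake's `RANGE BETWEEN INTERVAL '6 days'` is approximated with `ROWS BETWEEN 6 PRECEDING` (equivalent only when every day has exactly one row), and `LAG(revenue, 7)` stands in for the snippet's `LAG(..., 30)`.

```python
import sqlite3

# Synthetic one-row-per-day data: 30 days of January, revenue = 1000 + 10 * day.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_sales (sale_date TEXT, revenue REAL)")
con.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(f"2024-01-{d:02d}", 1000 + 10 * d) for d in range(1, 31)],
)

result = con.execute("""
    SELECT sale_date,
           revenue,
           -- 7-day rolling average (ROWS approximates Snowflake's RANGE INTERVAL)
           AVG(revenue) OVER (ORDER BY sale_date
                              ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d,
           -- prior-period comparison: NULL for the first 7 rows
           LAG(revenue, 7) OVER (ORDER BY sale_date) AS revenue_7d_ago,
           -- running total that resets at each year boundary
           SUM(revenue) OVER (PARTITION BY substr(sale_date, 1, 4)
                              ORDER BY sale_date
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ytd
    FROM daily_sales
    ORDER BY sale_date
""").fetchall()

print(result[0])   # first row: LAG is NULL (None), YTD equals that day's revenue
print(result[-1])  # last row: fully populated 7-day window and month-long YTD
```

Note the row-based vs. range-based distinction this sketch glosses over: when the date series has gaps (such as the missing 2024-01-31 in `seed_data.sql`), `ROWS` windows count rows while Snowflake's `RANGE ... INTERVAL` window counts calendar days, so the two can diverge.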