
Commit bcd9558

feat: altimate-code v0.1.4
AI-powered CLI for SQL analysis, dbt integration, and data engineering.

- TypeScript CLI with cross-platform binaries (npm, Homebrew)
- Python engine sidecar for SQL analysis and warehouse connectivity
- JSON-RPC bridge between CLI and engine
- AI-powered code review, SQL optimization, and lineage analysis
- dbt project integration with model parsing and profile management
- GitHub Actions and CI/CD integration
- MCP (Model Context Protocol) support

741 files changed

Lines changed: 195320 additions & 0 deletions

Lines changed: 116 additions & 0 deletions
---
name: cost-report
description: Analyze Snowflake query costs and identify optimization opportunities
---

# Cost Report

## Requirements
**Agent:** any (read-only analysis)
**Tools used:** sql_execute, sql_analyze, sql_predict_cost, sql_record_feedback

Analyze Snowflake warehouse query costs, identify the most expensive queries, detect anti-patterns, and recommend optimizations.

## Workflow

1. **Query SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY** for the top 20 most expensive queries by credits used:

```sql
SELECT
    query_id,
    query_text,
    user_name,
    warehouse_name,
    query_type,
    credits_used_cloud_services,
    bytes_scanned,
    rows_produced,
    total_elapsed_time,
    execution_status,
    start_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
  AND execution_status = 'SUCCESS'
  AND credits_used_cloud_services > 0
ORDER BY credits_used_cloud_services DESC
LIMIT 20;
```

Use `sql_execute` to run this query against the connected Snowflake warehouse.

2. **Group and summarize** the results by:
   - **User**: Which users are driving the most cost?
   - **Warehouse**: Which warehouses consume the most credits?
   - **Query type**: SELECT vs INSERT vs CREATE TABLE AS SELECT vs MERGE, etc.

   Present each grouping as a markdown table (see the aggregation sketch below).
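If the grouping is done in the warehouse rather than client-side, a single aggregation per dimension works; a minimal sketch for the per-user grouping (the warehouse and query-type groupings swap the GROUP BY column):

```sql
-- Sketch: 30-day cost grouped by user; replace user_name with warehouse_name
-- or query_type to produce the other two groupings.
SELECT
    user_name,
    SUM(credits_used_cloud_services) AS total_credits,
    COUNT(*)                         AS query_count,
    AVG(credits_used_cloud_services) AS avg_credits_per_query
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
  AND execution_status = 'SUCCESS'
  AND credits_used_cloud_services > 0
GROUP BY user_name
ORDER BY total_credits DESC;
```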
3. **Analyze the top offenders** - For each of the top 10 most expensive queries:
   - Run `sql_analyze` on the query text to detect anti-patterns (SELECT *, missing LIMIT, cartesian products, correlated subqueries, etc.)
   - Run `sql_predict_cost` to get the cost tier prediction based on historical feedback data
   - Summarize the anti-patterns found and their severity

4. **Classify each query into a cost tier** (a SQL sketch of the mapping follows the table):

| Tier | Estimated Cost | Label | Action |
|------|----------------|-------|--------|
| 1 | < $0.01 | Cheap | No action needed |
| 2 | $0.01 - $1.00 | Moderate | Review if frequent |
| 3 | $1.00 - $100.00 | Expensive | Optimize or review warehouse sizing |
| 4 | > $100.00 | Dangerous | Immediate review required |
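A sketch of how the tier mapping could be pushed into SQL, assuming a hypothetical flat rate of $3.00 per credit (the real rate depends on your Snowflake edition and contract, so treat the constant as a placeholder):

```sql
-- Sketch: classify queries into the tiers above.
-- 3.00 is a placeholder USD-per-credit rate; adjust to your contract.
SELECT
    query_id,
    credits_used_cloud_services * 3.00 AS est_cost_usd,
    CASE
        WHEN credits_used_cloud_services * 3.00 < 0.01   THEN 1  -- Cheap
        WHEN credits_used_cloud_services * 3.00 < 1.00   THEN 2  -- Moderate
        WHEN credits_used_cloud_services * 3.00 < 100.00 THEN 3  -- Expensive
        ELSE 4                                                   -- Dangerous
    END AS cost_tier
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
  AND execution_status = 'SUCCESS'
  AND credits_used_cloud_services > 0;
```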
5. **Record feedback** - For each query analyzed, call `sql_record_feedback` to store the execution metrics so future predictions improve:
   - Pass `bytes_scanned`, `execution_time_ms`, `credits_used`, and `warehouse_size` from the query history results

6. **Output the final report** as a structured markdown document:

```
# Snowflake Cost Report (Last 30 Days)

## Summary
- Total credits consumed: X
- Number of unique queries: Y
- Most expensive query: Z credits

## Cost by User
| User | Total Credits | Query Count | Avg Credits/Query |
|------|--------------|-------------|-------------------|

## Cost by Warehouse
| Warehouse | Total Credits | Query Count | Avg Credits/Query |
|-----------|--------------|-------------|-------------------|

## Cost by Query Type
| Query Type | Total Credits | Query Count | Avg Credits/Query |
|------------|--------------|-------------|-------------------|

## Top 10 Expensive Queries (Detailed Analysis)

### Query 1 (X credits) - DANGEROUS
**User:** user_name | **Warehouse:** wh_name | **Type:** SELECT
**Anti-patterns found:**
- SELECT_STAR (warning): Query uses SELECT * ...
- MISSING_LIMIT (info): ...

**Optimization suggestions:**
1. Select only needed columns
2. Add LIMIT clause
3. Consider partitioning strategy

**Cost prediction:** Tier 1 (fingerprint match, high confidence)

...

## Recommendations
1. Top priority optimizations
2. Warehouse sizing suggestions
3. Scheduling recommendations
```
## Usage

The user invokes this skill with:
- `/cost-report` -- Analyze the last 30 days
- `/cost-report 7` -- Analyze the last 7 days (change the step 1 filter to `DATEADD('day', -7, CURRENT_TIMESTAMP())`)

Use the tools: `sql_execute`, `sql_analyze`, `sql_predict_cost`, `sql_record_feedback`.
Lines changed: 85 additions & 0 deletions
---
name: dbt-docs
description: Generate or improve dbt model documentation — column descriptions, model descriptions, and doc blocks.
---

# Generate dbt Documentation

## Requirements
**Agent:** builder or migrator (requires file write access)
**Tools used:** glob, read, schema_inspect, dbt_manifest, edit, write

> **When to use this vs other skills:** Use /dbt-docs to add or improve descriptions in existing schema.yml. Use /yaml-config to create schema.yml from scratch. Use /generate-tests to add test scaffolding.

Generate comprehensive documentation for dbt models by analyzing SQL logic, schema metadata, and existing docs.

## Workflow

1. **Find the target model** — Use `glob` to locate the model SQL and any existing schema YAML
2. **Read the model SQL** — Understand the transformations, business logic, and column derivations
3. **Read existing docs** — Check for existing `schema.yml`, `_<model>__models.yml`, and `docs/` blocks
4. **Inspect schema** — Use `schema_inspect` to get column types and nullability
5. **Read upstream models** — Use `dbt_manifest` to find dependencies, then `read` upstream SQL to understand data flow
6. **Generate documentation**:

### Model-Level Description
Write a clear, concise description that covers:
- **What** this model represents (business entity)
- **Why** it exists (use case)
- **How** it's built (key transformations, joins, filters)
- **When** it refreshes (materialization strategy)

Example:
```yaml
- name: fct_daily_revenue
  description: >
    Daily revenue aggregation by product category. Joins staged orders with
    product dimensions and calculates gross/net revenue. Materialized as
    incremental with a unique key on (date_day, category_id). Used by the
    finance team for daily P&L reporting.
```

### Column-Level Descriptions
For each column, describe:
- What the column represents in business terms
- How it's derived (if calculated/transformed)
- Any important caveats (nullability, edge cases)

Example:
```yaml
columns:
  - name: net_revenue
    description: >
      Total revenue minus refunds and discounts for the day.
      Calculated as: gross_revenue - refund_amount - discount_amount.
      Can be negative if refunds exceed sales.
```
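For context, here is a hypothetical excerpt of the model SQL that such descriptions might document (the column and source names are illustrative, not taken from the actual project):

```sql
-- Hypothetical excerpt of fct_daily_revenue.sql, shown only to illustrate the
-- derivation documented above.
SELECT
    o.order_date                                                AS date_day,
    p.category_id,
    SUM(o.gross_revenue)                                        AS gross_revenue,
    SUM(o.refund_amount)                                        AS refund_amount,
    SUM(o.discount_amount)                                      AS discount_amount,
    SUM(o.gross_revenue - o.refund_amount - o.discount_amount)  AS net_revenue
FROM {{ ref('stg_orders') }} AS o
JOIN {{ ref('dim_products') }} AS p
    ON o.product_id = p.product_id
GROUP BY 1, 2
```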
### Doc Blocks (for shared definitions)
If a definition is reused across models, generate a doc block:

```markdown
{% docs customer_id %}
Unique identifier for a customer. Sourced from the `customers` table
in the raw Stripe schema. Used as the primary join key across all
customer-related models.
{% enddocs %}
```

7. **Write output** — Use `edit` to update existing YAML or `write` to create new files

## Quality Checklist
- Every column has a description (no empty descriptions)
- Descriptions use business terms, not technical jargon
- Calculated columns explain their formula
- Primary keys are identified
- Foreign key relationships are documented
- Edge cases and null handling are noted

## Usage

- `/dbt-docs models/marts/fct_daily_revenue.sql`
- `/dbt-docs stg_stripe__payments`
- `/dbt-docs --all models/staging/stripe/` — Document all models in a directory

Use the tools: `glob`, `read`, `schema_inspect`, `dbt_manifest`, `edit`, `write`.
Lines changed: 61 additions & 0 deletions
---
name: generate-tests
description: Generate dbt tests for a model by inspecting its schema and SQL, producing schema.yml test definitions.
---

# Generate dbt Tests

## Requirements
**Agent:** builder or migrator (requires file write access)
**Tools used:** glob, read, schema_inspect, write, edit

> **When to use this vs other skills:** Use /generate-tests for automated test scaffolding based on column patterns. Use /yaml-config for generating full schema.yml from scratch. Use /dbt-docs for adding descriptions to existing YAML.

Generate comprehensive dbt test definitions for a model. This skill inspects the model's schema, reads its SQL, and produces appropriate tests.

## Workflow

1. **Find the model file** — Use `glob` to locate the model SQL file
2. **Read the model SQL** — Understand the transformations, joins, and column expressions
3. **Inspect the schema** — Use `schema_inspect` to get column names, types, and constraints if a warehouse connection is available. If not, infer columns from the SQL.
4. **Read existing schema.yml** — Use `glob` and `read` to find and load any existing `schema.yml` or `_schema.yml` in the same directory
5. **Generate tests** based on column patterns (see the example model sketch after the rules table):

### Test Generation Rules

| Column Pattern | Tests to Generate |
|---|---|
| `*_id` columns | `unique`, `not_null`, `relationships` (if source table is identifiable) |
| `status`, `type`, `category` columns | `accepted_values` (infer values from SQL if possible, otherwise leave as placeholder) |
| Date/timestamp columns | `not_null` |
| Boolean columns | `accepted_values: [true, false]` |
| Columns in PRIMARY KEY | `unique`, `not_null` |
| Columns marked NOT NULL in schema | `not_null` |
| All columns | Consider `not_null` if they appear in JOIN conditions or WHERE filters |
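As a hypothetical illustration (model and column names are invented for this sketch), a staging model like the one below would get `unique` and `not_null` on `order_id`, a `relationships` test on `customer_id`, `accepted_values` on `status` (the values are inferable from the CASE expression), and `not_null` on `ordered_at`:

```sql
-- Hypothetical stg_orders.sql, used only to illustrate the rules above.
SELECT
    id                    AS order_id,    -- *_id column: unique, not_null
    customer_id,                          -- *_id column: relationships to its source model
    CASE
        WHEN state IN ('pending', 'shipped') THEN state
        ELSE 'other'
    END                   AS status,      -- accepted_values: pending, shipped, other
    created_at::timestamp AS ordered_at   -- timestamp column: not_null
FROM {{ source('shop', 'orders') }}
```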
### Output Format

Generate a YAML block that can be merged into the model's `schema.yml`:

```yaml
models:
  - name: model_name
    columns:
      - name: column_name
        tests:
          - unique
          - not_null
          - relationships:
              to: ref('source_model')
              field: id
```

6. **Write or patch the schema.yml** — If a schema.yml exists, merge the new tests into it (don't duplicate existing tests). If none exists, create one in the same directory as the model.

## Usage

The user invokes this skill with a model name or path:
- `/generate-tests models/staging/stg_orders.sql`
- `/generate-tests stg_orders`

Use the tools: `glob`, `read`, `schema_inspect` (if warehouse available), `write` or `edit`.
Lines changed: 98 additions & 0 deletions
---
name: impact-analysis
description: Analyze the downstream impact of changes to a dbt model by combining column-level lineage with the dbt dependency graph.
---

# Impact Analysis

## Requirements
**Agent:** any (read-only analysis)
**Tools used:** dbt_manifest, lineage_check, sql_analyze, glob, bash, read

Determine which downstream models, tests, and dashboards are affected when a dbt model changes.

## Workflow

1. **Identify the changed model** — Either:
   - Accept a model name or file path from the user
   - Detect changed `.sql` files via `git diff --name-only` using `bash`

2. **Load the dbt manifest** — Call `dbt_manifest` with the project's `target/manifest.json` path.
   - If the user specifies a manifest path, use that
   - Otherwise search for `target/manifest.json` or `manifest.json` using `glob`

3. **Find the changed model in the manifest** — Match by model name or file path.
   Extract: `unique_id`, `depends_on`, `columns`, `materialized`

4. **Build the downstream dependency graph** — From the manifest:
   - Find all models whose `depends_on` includes the changed model's `unique_id`
   - Recursively expand to get the full downstream tree (depth-first)
   - Track depth level for each downstream model

5. **Run column-level lineage** — Call `lineage_check` on the changed model's SQL to get:
   - Which source columns flow to which output columns
   - Which columns were added, removed, or renamed (if comparing old vs new)

6. **Cross-reference lineage with downstream models** — For each downstream model:
   - Check if it references any of the changed columns
   - Run `lineage_check` on the downstream model's SQL if available
   - Classify impact: BREAKING (removed/renamed column used downstream), SAFE (added column, no downstream reference), UNKNOWN (can't determine)

7. **Generate the impact report** (a sketch of the SQL change behind this example follows the report):

```
Impact Analysis: stg_orders
════════════════════════════

Changed Model: stg_orders (materialized: view)
Source columns: 5 → 6 (+1 added)
Removed columns: none
Modified columns: order_total (renamed from total_amount)

Downstream Impact (3 models affected):

Depth 1:
  [BREAKING] int_order_metrics
    References: order_total (was total_amount) — COLUMN RENAMED
    Action needed: Update column reference

  [SAFE] int_order_summary
    No references to changed columns

Depth 2:
  [BREAKING] mart_revenue
    References: order_total via int_order_metrics — CASCADING BREAK
    Action needed: Verify after fixing int_order_metrics

Tests at Risk: 4
  - not_null_stg_orders_order_total
  - unique_int_order_metrics_order_id
  - accepted_values_stg_orders_status
  - relationships_int_order_metrics_order_id

Summary: 2 BREAKING, 1 SAFE, 0 UNKNOWN
Recommended: Fix int_order_metrics first, then run `dbt test -s stg_orders+`
```
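For reference, the rename flagged in this example report could come from a change like the following in stg_orders; this is a hypothetical sketch, not code from the project:

```sql
-- Hypothetical stg_orders.sql after the change: total_amount is now exposed as
-- order_total (BREAKING downstream), and currency_code is newly added (SAFE).
SELECT
    order_id,
    customer_id,
    status,
    ordered_at,
    currency_code,                -- newly added column
    total_amount AS order_total   -- renamed output column
FROM {{ source('shop', 'orders') }}
```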
## Without Manifest (SQL-only mode)

If no dbt manifest is available, fall back to SQL-only analysis:
1. Run `lineage_check` on the changed SQL
2. Show the column-level data flow
3. Note that downstream impact cannot be determined without a manifest
4. Suggest running `dbt docs generate` to create a manifest

## Tools Used

- `dbt_manifest` — Load the dbt dependency graph
- `lineage_check` — Column-level lineage for each model
- `sql_analyze` — Check for anti-patterns in changed SQL
- `glob` — Find manifest and SQL files
- `bash` — Git operations for detecting changes
- `read` — Read SQL files from disk

## Usage Examples

- `/impact-analysis stg_orders` — Analyze impact of changes to stg_orders
- `/impact-analysis models/staging/stg_orders.sql` — Analyze by file path
- `/impact-analysis` — Auto-detect changed models from git diff
