Commit 05b3d42

anandgupta42 and claude authored

feat: /discover command — data stack setup with project_scan tool (#30)
* feat: replace /init with data stack setup command

  Replace the AGENTS.md-generating /init command with a comprehensive data stack scanner that detects dbt projects, warehouse connections, Docker databases, installed tools, and config files. The AI agent then walks the user through adding connections, testing them, and indexing schemas.

  New project_scan tool with 5 exported detection functions:
  - detectGit: branch, remote URL
  - detectDbtProject: dbt_project.yml, manifest, packages
  - detectEnvVars: Snowflake, BigQuery, Databricks, Postgres, MySQL, Redshift
  - detectDataTools: dbt, sqlfluff, airflow, dagster, prefect, soda, sqlmesh, great_expectations, sqlfmt
  - detectConfigFiles: .altimate-code/, .sqlfluff, .pre-commit-config.yaml

  Tests: 71 TypeScript (bun:test) + 24 Python (pytest)

* refactor: restore /init, rename data stack setup to /discover

  Restore the original /init command (creates AGENTS.md) and move the data stack setup functionality to /discover instead. Update all docs to reference /discover as the recommended first-run command.

* fix: make detectGit test resilient to CI detached HEAD

  GitHub Actions checks out in detached HEAD state, so `git branch --show-current` returns empty. The test now accepts an undefined branch in that case.

* fix: treat empty git branch as undefined in detectGit

  In CI detached HEAD, `git branch --show-current` returns an empty string. Convert the empty string to undefined so callers get a clean undefined instead of "".

* fix: remove branch type assertion from detectGit repo test

  The "detects a git repository" test should only assert isRepo. Branch validation is handled by the dedicated branch test, which accounts for CI detached HEAD returning undefined.

* fix: address PR review — redact secrets, parse DATABASE_URL scheme, parallelize tool checks

  - Redact sensitive env var values (password, access_token, connection_string) in scan output with "***" so secrets are never sent to the LLM
  - Parse the DATABASE_URL scheme (postgresql://, mysql://, etc.) to detect the correct database type instead of assuming Postgres
  - Parallelize tool version checks with Promise.all instead of a sequential loop

* fix: sync Python reference implementation with TypeScript changes

  - Remove DATABASE_URL from postgres signals, add scheme-based detection
  - Add secret redaction (password, access_token, connection_string)
  - Add tests for DATABASE_URL scheme parsing and deduplication

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
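
The detached-HEAD fixes above reduce to one normalization: `git branch --show-current` prints an empty string when HEAD is detached, and that empty string should surface as `undefined`. A minimal sketch of the idea (function names are illustrative, not the actual detectGit source):

```typescript
import { execSync } from "node:child_process"

// Illustrative sketch, not the actual detectGit implementation.
// `git branch --show-current` prints an empty string in detached HEAD
// (as on GitHub Actions checkouts), so empty output maps to undefined.
function normalizeBranch(raw: string): string | undefined {
  const branch = raw.trim()
  return branch === "" ? undefined : branch
}

function currentBranch(cwd: string): string | undefined {
  try {
    return normalizeBranch(execSync("git branch --show-current", { cwd }).toString())
  } catch {
    return undefined // not a git repository
  }
}
```

Callers then get a clean `string | undefined` instead of having to special-case `""`.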
1 parent 6b87f0f commit 05b3d42

File tree

11 files changed: +2027 −5 lines

docs/docs/configure/commands.md

Lines changed: 32 additions & 0 deletions

# Commands

## Built-in Commands

altimate-code ships with three built-in slash commands:

| Command | Description |
|---------|-------------|
| `/init` | Create or update an AGENTS.md file with build commands and code style guidelines. |
| `/discover` | Scan your data stack and set up warehouse connections. Detects dbt projects, warehouse connections from profiles/Docker/env vars, installed tools, and config files. Walks you through adding and testing new connections, then indexes schemas. |
| `/review` | Review changes — accepts `commit`, `branch`, or `pr` as an argument (defaults to uncommitted changes). |

### `/discover`

The recommended way to set up a new data engineering project. Run `/discover` in the TUI and the agent will:

1. Call `project_scan` to detect your full environment
2. Present what was found (dbt project, connections, tools, config files)
3. Offer to add each new connection discovered (from dbt profiles, Docker, environment variables)
4. Test each connection with `warehouse_test`
5. Offer to index schemas for autocomplete and context-aware analysis
6. Show available skills and agent modes

### `/review`

```
/review          # review uncommitted changes
/review commit   # review the last commit
/review branch   # review all changes on the current branch
/review pr       # review the current pull request
```

## Custom Commands

Custom commands let you define reusable slash commands.

## Creating Commands

docs/docs/data-engineering/tools/index.md

Lines changed: 1 addition & 1 deletion

@@ -9,6 +9,6 @@ altimate-code has 55+ specialized tools organized by function.

  | [FinOps Tools](finops-tools.md) | 8 tools | Cost analysis, warehouse sizing, unused resources, RBAC |
  | [Lineage Tools](lineage-tools.md) | 1 tool | Column-level lineage tracing with confidence scoring |
  | [dbt Tools](dbt-tools.md) | 2 tools + 6 skills | Run, manifest parsing, test generation, scaffolding |
- | [Warehouse Tools](warehouse-tools.md) | 2 tools | Connection management and testing |
+ | [Warehouse Tools](warehouse-tools.md) | 6 tools | Environment scanning, connection management, discovery, testing |

  All tools are available in the interactive TUI. The agent automatically selects the right tools based on your request.

docs/docs/data-engineering/tools/warehouse-tools.md

Lines changed: 124 additions & 0 deletions

# Warehouse Tools

## project_scan

Scan the entire data engineering environment in one call. Detects dbt projects, warehouse connections, Docker databases, installed tools, and configuration files. Used by the `/discover` command.

```
> /discover

# Environment Scan

## Python Engine
✓ Engine healthy

## Git Repository
✓ Git repo on branch `main` (origin: github.com/org/analytics)

## dbt Project
✓ Project "analytics" (profile: snowflake_prod)
  Models: 47, Sources: 12, Tests: 89
✓ packages.yml found

## Warehouse Connections

### Already Configured
Name           | Type      | Database
prod-snowflake | snowflake | ANALYTICS

### From dbt profiles.yml
Name              | Type      | Source
dbt_snowflake_dev | snowflake | dbt-profile

### From Docker
Container      | Type     | Host:Port
local-postgres | postgres | localhost:5432

### From Environment Variables
Name         | Type     | Signal
env_bigquery | bigquery | GOOGLE_APPLICATION_CREDENTIALS

## Installed Data Tools
✓ dbt v1.8.4
✓ sqlfluff v3.1.0
✗ airflow (not found)

## Config Files
✓ .altimate-code/altimate-code.json
✓ .sqlfluff
✗ .pre-commit-config.yaml (not found)
```

### What it detects

| Category | Detection method |
|----------|-----------------|
| **Git** | `git` commands (branch, remote) |
| **dbt project** | Walks up directories for `dbt_project.yml`, reads name/profile |
| **dbt manifest** | Parses `target/manifest.json` for model/source/test counts |
| **dbt profiles** | Bridge call to parse `~/.dbt/profiles.yml` |
| **Docker DBs** | Bridge call to discover running PostgreSQL/MySQL/MSSQL containers |
| **Existing connections** | Bridge call to list already-configured warehouses |
| **Environment variables** | Scans `process.env` for warehouse signals (see table below) |
| **Schema cache** | Bridge call for indexed warehouse status |
| **Data tools** | Spawns `tool --version` for 9 common tools |
| **Config files** | Checks for `.altimate-code/`, `.sqlfluff`, `.pre-commit-config.yaml` |
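
The "Data tools" row probes each tool's `--version` concurrently; per the commit, these checks were parallelized with `Promise.all` rather than a sequential loop. A hedged sketch of that approach (the result shape and timeout are assumptions, not the actual detectDataTools source):

```typescript
import { exec } from "node:child_process"
import { promisify } from "node:util"

const execAsync = promisify(exec)

// The nine tools the scanner probes, per the commit message.
const TOOLS = [
  "dbt", "sqlfluff", "airflow", "dagster", "prefect",
  "soda", "sqlmesh", "great_expectations", "sqlfmt",
]

// Sketch: run every `tool --version` probe concurrently so one slow or
// missing binary doesn't serialize the whole scan. Result shape is assumed.
async function detectDataTools(tools: string[] = TOOLS) {
  return Promise.all(
    tools.map(async (tool) => {
      try {
        const { stdout } = await execAsync(`${tool} --version`, { timeout: 5000 })
        return { tool, installed: true, version: stdout.trim().split("\n")[0] }
      } catch {
        return { tool, installed: false }
      }
    }),
  )
}
```

With ~9 probes and a 5-second timeout each, the worst case drops from roughly the sum of probe times to the slowest single probe.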
### Environment variable detection

| Warehouse | Signal (any one triggers detection) |
|-----------|-------------------------------------|
| Snowflake | `SNOWFLAKE_ACCOUNT` |
| BigQuery | `GOOGLE_APPLICATION_CREDENTIALS`, `BIGQUERY_PROJECT`, `GCP_PROJECT` |
| Databricks | `DATABRICKS_HOST`, `DATABRICKS_SERVER_HOSTNAME` |
| PostgreSQL | `PGHOST`, `PGDATABASE`, `DATABASE_URL` |
| MySQL | `MYSQL_HOST`, `MYSQL_DATABASE` |
| Redshift | `REDSHIFT_HOST` |
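
Two details from this commit are worth noting about env var detection: `DATABASE_URL` is no longer assumed to mean Postgres (its scheme selects the type), and secret-looking values (`password`, `access_token`, `connection_string`) are replaced with `***` before scan output reaches the LLM. A minimal sketch, with the scheme map and key list assumed from the commit message rather than the actual source:

```typescript
// Map a DATABASE_URL scheme to a warehouse type; the commit parses the
// scheme instead of assuming Postgres. Mapping is an assumption.
const SCHEME_TYPES: Record<string, string> = {
  postgres: "postgres",
  postgresql: "postgres",
  mysql: "mysql",
  redshift: "redshift",
}

function typeFromDatabaseUrl(url: string): string | undefined {
  const scheme = url.split("://")[0].toLowerCase()
  return SCHEME_TYPES[scheme]
}

// Redact secret-looking env values so they are never sent to the LLM.
// Key substrings are taken from the commit message.
const SECRET_HINTS = ["password", "access_token", "connection_string"]

function redactValue(name: string, value: string): string {
  const lower = name.toLowerCase()
  return SECRET_HINTS.some((hint) => lower.includes(hint)) ? "***" : value
}
```

So `DATABASE_URL=mysql://…` is reported as a MySQL signal, while something like `SNOWFLAKE_PASSWORD` appears in the scan output only as `***`.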
### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `skip_docker` | boolean | Skip Docker container discovery (faster) |
| `skip_tools` | boolean | Skip installed tool detection (faster) |

---

## warehouse_list

List all configured warehouse connections.

@@ -54,3 +138,43 @@ Testing connection to bigquery-prod (bigquery)...

| `Object does not exist` | Wrong database/schema | Verify database name in config |
| `Role not authorized` | Insufficient privileges | Use a role with USAGE on warehouse |
| `Timeout` | Network latency | Increase connection timeout |

---

## warehouse_add

Add a new warehouse connection by providing a name and configuration.

```
> warehouse_add my-postgres {"type": "postgres", "host": "localhost", "port": 5432, "database": "analytics", "user": "analyst", "password": "secret"}

✓ Added warehouse 'my-postgres' (postgres)
```

---

## warehouse_remove

Remove an existing warehouse connection.

```
> warehouse_remove my-postgres

✓ Removed warehouse 'my-postgres'
```

---

## warehouse_discover

Discover database containers running in Docker. Detects PostgreSQL, MySQL/MariaDB, and SQL Server containers with their connection details.

```
> warehouse_discover

Container      | Type     | Host:Port      | User     | Database | Status
local-postgres | postgres | localhost:5432 | postgres | postgres | running
mysql-dev      | mysql    | localhost:3306 | root     | mydb     | running

Use warehouse_add to save any of these as a connection.
```

docs/docs/getting-started.md

Lines changed: 16 additions & 3 deletions

@@ -12,10 +12,23 @@ npm install -g @altimateai/altimate-code

  altimate-code
  ```

- The TUI launches with an interactive terminal. On first run, use the `/connect` command to configure:
+ The TUI launches with an interactive terminal. On first run, use the `/discover` command to auto-detect your data stack:

- 1. **LLM provider** — Choose your AI backend (Anthropic, OpenAI, Codex, etc.)
- 2. **Warehouse connection** — Connect to your data warehouse
+ ```
+ /discover
+ ```
+
+ `/discover` scans your environment and sets up everything automatically:
+
+ 1. **Detects your dbt project** — finds `dbt_project.yml`, parses the manifest, and reads profiles
+ 2. **Discovers warehouse connections** — from `~/.dbt/profiles.yml`, running Docker containers, and environment variables (e.g. `SNOWFLAKE_ACCOUNT`, `PGHOST`, `DATABASE_URL`)
+ 3. **Checks installed tools** — dbt, sqlfluff, airflow, dagster, prefect, soda, sqlmesh, great_expectations, sqlfmt
+ 4. **Offers to configure connections** — walks you through adding and testing each discovered warehouse
+ 5. **Indexes schemas** — populates the schema cache for autocomplete and context-aware analysis
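
The dbt detection in step 1 works by walking up from the current directory until a `dbt_project.yml` is found (per the tool docs). Roughly sketched, with an illustrative function name rather than the actual detectDbtProject source:

```typescript
import * as fs from "node:fs"
import * as path from "node:path"

// Walk up from `start` toward the filesystem root looking for
// dbt_project.yml; return the directory containing it, or undefined.
// Illustrative sketch of the detection idea, not the shipped code.
function findDbtProjectDir(start: string): string | undefined {
  let dir = path.resolve(start)
  while (true) {
    if (fs.existsSync(path.join(dir, "dbt_project.yml"))) return dir
    const parent = path.dirname(dir)
    if (parent === dir) return undefined // reached the root without a match
    dir = parent
  }
}
```

This is why `/discover` works from anywhere inside a dbt repo, not just its root.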
+ You can also configure connections manually — see [Warehouse connections](#warehouse-connections) below.
+
+ To set up your LLM provider, use the `/connect` command.

  ## Configuration

docs/docs/usage/tui.md

Lines changed: 1 addition & 1 deletion

@@ -20,7 +20,7 @@ The TUI has three main areas:

  |--------|--------|---------|
  | `@` | Reference a file | `@src/models/user.sql explain this model` |
  | `!` | Run a shell command | `!dbt run --select my_model` |
- | `/` | Slash command | `/connect`, `/models`, `/theme` |
+ | `/` | Slash command | `/discover`, `/connect`, `/review`, `/models`, `/theme` |

  ## Leader Key

packages/altimate-code/src/command/index.ts

Lines changed: 11 additions & 0 deletions

@@ -4,6 +4,7 @@ import { Config } from "../config/config"

  import { Instance } from "../project/instance"
  import { Identifier } from "../id/id"
  import PROMPT_INITIALIZE from "./template/initialize.txt"
+ import PROMPT_DISCOVER from "./template/discover.txt"
  import PROMPT_REVIEW from "./template/review.txt"
  import { MCP } from "../mcp"
  import { Skill } from "../skill"

@@ -53,6 +54,7 @@ export namespace Command {

  export const Default = {
    INIT: "init",
+   DISCOVER: "discover",
    REVIEW: "review",
  } as const

@@ -69,6 +71,15 @@ export namespace Command {

    },
    hints: hints(PROMPT_INITIALIZE),
  },
+ [Default.DISCOVER]: {
+   name: Default.DISCOVER,
+   description: "scan data stack and set up connections",
+   source: "command",
+   get template() {
+     return PROMPT_DISCOVER
+   },
+   hints: hints(PROMPT_DISCOVER),
+ },
  [Default.REVIEW]: {
    name: Default.REVIEW,
    description: "review changes [commit|branch|pr], defaults to uncommitted",
packages/altimate-code/src/command/template/discover.txt

Lines changed: 55 additions & 0 deletions

@@ -0,0 +1,55 @@

You are setting up altimate-code for a data engineering project. Guide the user through environment detection and warehouse connection setup.

Step 1 — Scan the environment:
Call the `project_scan` tool to detect the full data engineering environment. Present the results clearly to the user.

Step 2 — Review what was found:
Summarize the scan results in a friendly way:
- Git repository details
- dbt project (name, profile, model/source/test counts)
- Warehouse connections already configured
- New connections discovered from dbt profiles, Docker containers, and environment variables
- Schema cache status (which warehouses are indexed)
- Installed data tools (dbt, sqlfluff, etc.)
- Configuration files found

Step 3 — Set up new connections:
For each NEW warehouse connection discovered (not already configured):
- Present the connection details and ask the user if they want to add it
- If yes, call `warehouse_add` with the detected configuration
- Then call `warehouse_test` to verify connectivity
- Report whether the connection succeeded or failed
- If it failed, offer to let the user correct the configuration

Skip this step if there are no new connections to add.

Step 4 — Index schemas:
If any warehouses are connected but not yet indexed in the schema cache:
- Ask the user if they want to index schemas now (explain this enables autocomplete, search, and context-aware analysis)
- If yes, call `schema_index` for each selected warehouse
- Report the number of schemas, tables, and columns indexed

Skip this step if all connected warehouses are already indexed or if no warehouses are connected.

Step 5 — Show next steps:
Present a summary of what was set up, then suggest what the user can do next:

**Available skills:**
- `/cost-report` — Analyze warehouse spending and find optimization opportunities
- `/dbt-docs` — Generate or improve dbt model documentation
- `/generate-tests` — Auto-generate dbt tests for your models
- `/sql-review` — Review SQL for correctness, performance, and best practices
- `/migrate-sql` — Translate SQL between warehouse dialects

**Agent modes to explore:**
- `analyst` — Deep-dive into data quality, lineage, and schema questions
- `builder` — Generate SQL, dbt models, and data pipelines
- `validator` — Validate SQL correctness and catch issues before they hit production
- `migrator` — Plan and execute warehouse migrations

**Useful commands:**
- `warehouse_list` — See all configured connections
- `schema_search` — Find tables and columns across warehouses
- `sql_execute` — Run queries against any connected warehouse

$ARGUMENTS
