diff --git a/docs/docs/configure/commands.md b/docs/docs/configure/commands.md
index 61f7d01091..7e1ba1fffe 100644
--- a/docs/docs/configure/commands.md
+++ b/docs/docs/configure/commands.md
@@ -1,5 +1,37 @@
 # Commands
 
+## Built-in Commands
+
+altimate-code ships with three built-in slash commands:
+
+| Command | Description |
+|---------|-------------|
+| `/init` | Create or update an AGENTS.md file with build commands and code style guidelines. |
+| `/discover` | Scan your data stack and set up warehouse connections. Detects dbt projects, warehouse connections from profiles/Docker/env vars, installed tools, and config files. Walks you through adding and testing new connections, then indexes schemas. |
+| `/review` | Review changes — accepts `commit`, `branch`, or `pr` as an argument (defaults to uncommitted changes). |
+
+### `/discover`
+
+The recommended way to set up a new data engineering project. Run `/discover` in the TUI and the agent will:
+
+1. Call `project_scan` to detect your full environment
+2. Present what was found (dbt project, connections, tools, config files)
+3. Offer to add each new connection discovered (from dbt profiles, Docker, environment variables)
+4. Test each connection with `warehouse_test`
+5. Offer to index schemas for autocomplete and context-aware analysis
+6. Show available skills and agent modes
+
+### `/review`
+
+```
+/review          # review uncommitted changes
+/review commit   # review the last commit
+/review branch   # review all changes on the current branch
+/review pr       # review the current pull request
+```
+
+## Custom Commands
+
 Custom commands let you define reusable slash commands.
 
 ## Creating Commands
diff --git a/docs/docs/data-engineering/tools/index.md b/docs/docs/data-engineering/tools/index.md
index 7db5783abf..5635ab2065 100644
--- a/docs/docs/data-engineering/tools/index.md
+++ b/docs/docs/data-engineering/tools/index.md
@@ -9,6 +9,6 @@ altimate-code has 55+ specialized tools organized by function.
 | [FinOps Tools](finops-tools.md) | 8 tools | Cost analysis, warehouse sizing, unused resources, RBAC |
 | [Lineage Tools](lineage-tools.md) | 1 tool | Column-level lineage tracing with confidence scoring |
 | [dbt Tools](dbt-tools.md) | 2 tools + 6 skills | Run, manifest parsing, test generation, scaffolding |
-| [Warehouse Tools](warehouse-tools.md) | 2 tools | Connection management and testing |
+| [Warehouse Tools](warehouse-tools.md) | 6 tools | Environment scanning, connection management, discovery, testing |
 
 All tools are available in the interactive TUI. The agent automatically selects the right tools based on your request.
diff --git a/docs/docs/data-engineering/tools/warehouse-tools.md b/docs/docs/data-engineering/tools/warehouse-tools.md
index d5a653e303..adaa76daf7 100644
--- a/docs/docs/data-engineering/tools/warehouse-tools.md
+++ b/docs/docs/data-engineering/tools/warehouse-tools.md
@@ -1,5 +1,89 @@
 # Warehouse Tools
 
+## project_scan
+
+Scan the entire data engineering environment in one call. Detects dbt projects, warehouse connections, Docker databases, installed tools, and configuration files. Used by the `/discover` command.
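+
+The two boolean parameters (see [Parameters](#parameters) below) skip the slower parts of the scan. As a rough sketch, assuming `project_scan` accepts the same `tool_name {json}` invocation style used by the other warehouse tools on this page, a trimmed-down scan would look something like:
+
+```
+> project_scan {"skip_docker": true, "skip_tools": true}
+```
+
+A full scan, as triggered by `/discover`, produces a report along these lines: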
+
+```
+> /discover
+
+# Environment Scan
+
+## Python Engine
+✓ Engine healthy
+
+## Git Repository
+✓ Git repo on branch `main` (origin: github.com/org/analytics)
+
+## dbt Project
+✓ Project "analytics" (profile: snowflake_prod)
+  Models: 47, Sources: 12, Tests: 89
+  ✓ packages.yml found
+
+## Warehouse Connections
+
+### Already Configured
+Name | Type | Database
+prod-snowflake | snowflake | ANALYTICS
+
+### From dbt profiles.yml
+Name | Type | Source
+dbt_snowflake_dev | snowflake | dbt-profile
+
+### From Docker
+Container | Type | Host:Port
+local-postgres | postgres | localhost:5432
+
+### From Environment Variables
+Name | Type | Signal
+env_bigquery | bigquery | GOOGLE_APPLICATION_CREDENTIALS
+
+## Installed Data Tools
+✓ dbt v1.8.4
+✓ sqlfluff v3.1.0
+✗ airflow (not found)
+
+## Config Files
+✓ .altimate-code/altimate-code.json
+✓ .sqlfluff
+✗ .pre-commit-config.yaml (not found)
+```
+
+### What it detects
+
+| Category | Detection method |
+|----------|-----------------|
+| **Git** | `git` commands (branch, remote) |
+| **dbt project** | Walks up directories for `dbt_project.yml`, reads name/profile |
+| **dbt manifest** | Parses `target/manifest.json` for model/source/test counts |
+| **dbt profiles** | Bridge call to parse `~/.dbt/profiles.yml` |
+| **Docker DBs** | Bridge call to discover running PostgreSQL/MySQL/MSSQL containers |
+| **Existing connections** | Bridge call to list already-configured warehouses |
+| **Environment variables** | Scans `process.env` for warehouse signals (see table below) |
+| **Schema cache** | Bridge call for indexed warehouse status |
+| **Data tools** | Spawns `tool --version` for 9 common tools |
+| **Config files** | Checks for `.altimate-code/`, `.sqlfluff`, `.pre-commit-config.yaml` |
+
+### Environment variable detection
+
+| Warehouse | Signal (any one triggers detection) |
+|-----------|-------------------------------------|
+| Snowflake | `SNOWFLAKE_ACCOUNT` |
+| BigQuery | `GOOGLE_APPLICATION_CREDENTIALS`, `BIGQUERY_PROJECT`, `GCP_PROJECT` |
+| Databricks | `DATABRICKS_HOST`, `DATABRICKS_SERVER_HOSTNAME` |
+| PostgreSQL | `PGHOST`, `PGDATABASE`, `DATABASE_URL` |
+| MySQL | `MYSQL_HOST`, `MYSQL_DATABASE` |
+| Redshift | `REDSHIFT_HOST` |
+
+### Parameters
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `skip_docker` | boolean | Skip Docker container discovery (faster) |
+| `skip_tools` | boolean | Skip installed tool detection (faster) |
+
+---
+
 ## warehouse_list
 
 List all configured warehouse connections.
@@ -54,3 +138,43 @@ Testing connection to bigquery-prod (bigquery)...
 | `Object does not exist` | Wrong database/schema | Verify database name in config |
 | `Role not authorized` | Insufficient privileges | Use a role with USAGE on warehouse |
 | `Timeout` | Network latency | Increase connection timeout |
+
+---
+
+## warehouse_add
+
+Add a new warehouse connection by providing a name and configuration.
+
+```
+> warehouse_add my-postgres {"type": "postgres", "host": "localhost", "port": 5432, "database": "analytics", "user": "analyst", "password": "secret"}
+
+✓ Added warehouse 'my-postgres' (postgres)
+```
+
+---
+
+## warehouse_remove
+
+Remove an existing warehouse connection.
+
+```
+> warehouse_remove my-postgres
+
+✓ Removed warehouse 'my-postgres'
+```
+
+---
+
+## warehouse_discover
+
+Discover database containers running in Docker. Detects PostgreSQL, MySQL/MariaDB, and SQL Server containers with their connection details.
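+
+A typical follow-up is to save one of the discovered containers with `warehouse_add` and then check it with `warehouse_test`. A sketch using the `local-postgres` container from the example below; the password is illustrative, since discovery does not report credentials, and the exact argument syntax may differ:
+
+```
+> warehouse_add local-postgres {"type": "postgres", "host": "localhost", "port": 5432, "database": "postgres", "user": "postgres", "password": "postgres"}
+> warehouse_test local-postgres
+```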
+
+```
+> warehouse_discover
+
+Container | Type | Host:Port | User | Database | Status
+local-postgres | postgres | localhost:5432 | postgres | postgres | running
+mysql-dev | mysql | localhost:3306 | root | mydb | running
+
+Use warehouse_add to save any of these as a connection.
+```
diff --git a/docs/docs/getting-started.md b/docs/docs/getting-started.md
index 42bbe11d54..b4224e4fa6 100644
--- a/docs/docs/getting-started.md
+++ b/docs/docs/getting-started.md
@@ -12,10 +12,23 @@ npm install -g @altimateai/altimate-code
 altimate-code
 ```
 
-The TUI launches with an interactive terminal. On first run, use the `/connect` command to configure:
+The TUI launches with an interactive terminal. On first run, use the `/discover` command to auto-detect your data stack:
 
-1. **LLM provider** — Choose your AI backend (Anthropic, OpenAI, Codex, etc.)
-2. **Warehouse connection** — Connect to your data warehouse
+```
+/discover
+```
+
+`/discover` scans your environment and sets up everything automatically:
+
+1. **Detects your dbt project** — finds `dbt_project.yml`, parses the manifest, and reads profiles
+2. **Discovers warehouse connections** — from `~/.dbt/profiles.yml`, running Docker containers, and environment variables (e.g. `SNOWFLAKE_ACCOUNT`, `PGHOST`, `DATABASE_URL`)
+3. **Checks installed tools** — dbt, sqlfluff, airflow, dagster, prefect, soda, sqlmesh, great_expectations, sqlfmt
+4. **Offers to configure connections** — walks you through adding and testing each discovered warehouse
+5. **Indexes schemas** — populates the schema cache for autocomplete and context-aware analysis
+
+You can also configure connections manually — see [Warehouse connections](#warehouse-connections) below.
+
+To set up your LLM provider, use the `/connect` command.
 
 ## Configuration
diff --git a/docs/docs/usage/tui.md b/docs/docs/usage/tui.md
index 7f966639c2..17c8c88a32 100644
--- a/docs/docs/usage/tui.md
+++ b/docs/docs/usage/tui.md
@@ -20,7 +20,7 @@ The TUI has three main areas:
 |--------|--------|---------|
 | `@` | Reference a file | `@src/models/user.sql explain this model` |
 | `!` | Run a shell command | `!dbt run --select my_model` |
-| `/` | Slash command | `/connect`, `/models`, `/theme` |
+| `/` | Slash command | `/discover`, `/connect`, `/review`, `/models`, `/theme` |
 
 ## Leader Key
diff --git a/packages/altimate-code/src/command/index.ts b/packages/altimate-code/src/command/index.ts
index dce7ac8bbc..acd1498fdd 100644
--- a/packages/altimate-code/src/command/index.ts
+++ b/packages/altimate-code/src/command/index.ts
@@ -4,6 +4,7 @@ import { Config } from "../config/config"
 import { Instance } from "../project/instance"
 import { Identifier } from "../id/id"
 import PROMPT_INITIALIZE from "./template/initialize.txt"
+import PROMPT_DISCOVER from "./template/discover.txt"
 import PROMPT_REVIEW from "./template/review.txt"
 import { MCP } from "../mcp"
 import { Skill } from "../skill"
@@ -53,6 +54,7 @@ export namespace Command {
   export const Default = {
     INIT: "init",
+    DISCOVER: "discover",
     REVIEW: "review",
   } as const
 
@@ -69,6 +71,15 @@ export namespace Command {
       },
       hints: hints(PROMPT_INITIALIZE),
     },
+    [Default.DISCOVER]: {
+      name: Default.DISCOVER,
+      description: "scan data stack and set up connections",
+      source: "command",
+      get template() {
+        return PROMPT_DISCOVER
+      },
+      hints: hints(PROMPT_DISCOVER),
+    },
     [Default.REVIEW]: {
       name: Default.REVIEW,
       description: "review changes [commit|branch|pr], defaults to uncommitted",
diff --git a/packages/altimate-code/src/command/template/discover.txt
b/packages/altimate-code/src/command/template/discover.txt new file mode 100644 index 0000000000..3b459c00cf --- /dev/null +++ b/packages/altimate-code/src/command/template/discover.txt @@ -0,0 +1,55 @@ +You are setting up altimate-code for a data engineering project. Guide the user through environment detection and warehouse connection setup. + +Step 1 — Scan the environment: +Call the `project_scan` tool to detect the full data engineering environment. Present the results clearly to the user. + +Step 2 — Review what was found: +Summarize the scan results in a friendly way: +- Git repository details +- dbt project (name, profile, model/source/test counts) +- Warehouse connections already configured +- New connections discovered from dbt profiles, Docker containers, and environment variables +- Schema cache status (which warehouses are indexed) +- Installed data tools (dbt, sqlfluff, etc.) +- Configuration files found + +Step 3 — Set up new connections: +For each NEW warehouse connection discovered (not already configured): +- Present the connection details and ask the user if they want to add it +- If yes, call `warehouse_add` with the detected configuration +- Then call `warehouse_test` to verify connectivity +- Report whether the connection succeeded or failed +- If it failed, offer to let the user correct the configuration + +Skip this step if there are no new connections to add. + +Step 4 — Index schemas: +If any warehouses are connected but not yet indexed in the schema cache: +- Ask the user if they want to index schemas now (explain this enables autocomplete, search, and context-aware analysis) +- If yes, call `schema_index` for each selected warehouse +- Report the number of schemas, tables, and columns indexed + +Skip this step if all connected warehouses are already indexed or if no warehouses are connected. 
+ +Step 5 — Show next steps: +Present a summary of what was set up, then suggest what the user can do next: + +**Available skills:** +- `/cost-report` — Analyze warehouse spending and find optimization opportunities +- `/dbt-docs` — Generate or improve dbt model documentation +- `/generate-tests` — Auto-generate dbt tests for your models +- `/sql-review` — Review SQL for correctness, performance, and best practices +- `/migrate-sql` — Translate SQL between warehouse dialects + +**Agent modes to explore:** +- `analyst` — Deep-dive into data quality, lineage, and schema questions +- `builder` — Generate SQL, dbt models, and data pipelines +- `validator` — Validate SQL correctness and catch issues before they hit production +- `migrator` — Plan and execute warehouse migrations + +**Useful commands:** +- `warehouse_list` — See all configured connections +- `schema_search` — Find tables and columns across warehouses +- `sql_execute` — Run queries against any connected warehouse + +$ARGUMENTS diff --git a/packages/altimate-code/src/tool/project-scan.ts b/packages/altimate-code/src/tool/project-scan.ts new file mode 100644 index 0000000000..c99ffeae78 --- /dev/null +++ b/packages/altimate-code/src/tool/project-scan.ts @@ -0,0 +1,583 @@ +import z from "zod" +import { Tool } from "./tool" +import { Bridge } from "../bridge/client" +import { existsSync, readFileSync } from "fs" +import path from "path" + +// --- Types --- + +export interface GitInfo { + isRepo: boolean + branch?: string + remoteUrl?: string +} + +export interface DbtProjectInfo { + found: boolean + path?: string + name?: string + profile?: string + manifestPath?: string + hasPackages?: boolean +} + +export interface EnvVarConnection { + name: string + type: string + source: "env-var" + signal: string + config: Record +} + +export interface DataToolInfo { + name: string + installed: boolean + version?: string +} + +export interface ConfigFileInfo { + altimateConfig: boolean + sqlfluff: boolean + preCommit: boolean +} + +// --- Detection functions (exported for testing) --- + +export async function detectGit(): Promise { + const isRepoResult = Bun.spawnSync(["git", "rev-parse", "--is-inside-work-tree"], { + stdout: "pipe", + stderr: "pipe", + }) + if (isRepoResult.exitCode !== 0) { + return { isRepo: false } + } + + const branchResult = Bun.spawnSync(["git", "branch", "--show-current"], { + stdout: "pipe", + stderr: "pipe", + }) + const branch = branchResult.exitCode === 0 ? 
branchResult.stdout.toString().trim() || undefined : undefined + + let remoteUrl: string | undefined + const remoteResult = Bun.spawnSync(["git", "remote", "get-url", "origin"], { + stdout: "pipe", + stderr: "pipe", + }) + if (remoteResult.exitCode === 0) { + remoteUrl = remoteResult.stdout.toString().trim() + } + + return { isRepo: true, branch, remoteUrl } +} + +export async function detectDbtProject(startDir: string): Promise { + let dir = startDir + for (let i = 0; i < 5; i++) { + const candidate = path.join(dir, "dbt_project.yml") + if (existsSync(candidate)) { + let name: string | undefined + let profile: string | undefined + try { + const content = readFileSync(candidate, "utf-8") + const nameMatch = content.match(/^name:\s*['"]?([^\s'"]+)['"]?/m) + if (nameMatch) name = nameMatch[1] + const profileMatch = content.match(/^profile:\s*['"]?([^\s'"]+)['"]?/m) + if (profileMatch) profile = profileMatch[1] + } catch { + // ignore read errors + } + + const manifestPath = path.join(dir, "target", "manifest.json") + const hasManifest = existsSync(manifestPath) + + const hasPackages = existsSync(path.join(dir, "packages.yml")) || existsSync(path.join(dir, "dependencies.yml")) + + return { + found: true, + path: dir, + name, + profile, + manifestPath: hasManifest ? manifestPath : undefined, + hasPackages, + } + } + const parent = path.dirname(dir) + if (parent === dir) break + dir = parent + } + return { found: false } +} + +export async function detectEnvVars(): Promise { + const connections: EnvVarConnection[] = [] + + const warehouses: Array<{ + type: string + signals: string[] + configMap: Record + }> = [ + { + type: "snowflake", + signals: ["SNOWFLAKE_ACCOUNT"], + configMap: { + account: "SNOWFLAKE_ACCOUNT", + user: "SNOWFLAKE_USER", + password: "SNOWFLAKE_PASSWORD", + warehouse: "SNOWFLAKE_WAREHOUSE", + database: "SNOWFLAKE_DATABASE", + schema: "SNOWFLAKE_SCHEMA", + role: "SNOWFLAKE_ROLE", + }, + }, + { + type: "bigquery", + signals: ["GOOGLE_APPLICATION_CREDENTIALS", "BIGQUERY_PROJECT", "GCP_PROJECT"], + configMap: { + project: ["BIGQUERY_PROJECT", "GCP_PROJECT"], + credentials_path: "GOOGLE_APPLICATION_CREDENTIALS", + location: "BIGQUERY_LOCATION", + }, + }, + { + type: "databricks", + signals: ["DATABRICKS_HOST", "DATABRICKS_SERVER_HOSTNAME"], + configMap: { + server_hostname: ["DATABRICKS_HOST", "DATABRICKS_SERVER_HOSTNAME"], + http_path: "DATABRICKS_HTTP_PATH", + access_token: "DATABRICKS_TOKEN", + }, + }, + { + type: "postgres", + signals: ["PGHOST", "PGDATABASE"], + configMap: { + host: "PGHOST", + port: "PGPORT", + database: "PGDATABASE", + user: "PGUSER", + password: "PGPASSWORD", + connection_string: "DATABASE_URL", + }, + }, + { + type: "mysql", + signals: ["MYSQL_HOST", "MYSQL_DATABASE"], + configMap: { + host: "MYSQL_HOST", + port: "MYSQL_TCP_PORT", + database: "MYSQL_DATABASE", + user: "MYSQL_USER", + password: "MYSQL_PASSWORD", + }, + }, + { + type: "redshift", + signals: ["REDSHIFT_HOST"], + configMap: { + host: "REDSHIFT_HOST", + port: "REDSHIFT_PORT", + database: "REDSHIFT_DATABASE", + user: "REDSHIFT_USER", + password: "REDSHIFT_PASSWORD", + }, + }, + ] + + for (const wh of warehouses) { + const matchedSignal = wh.signals.find((s) => process.env[s]) + if (!matchedSignal) continue + + const sensitiveKeys = new Set(["password", "access_token", "connection_string", "private_key_path"]) + const config: Record = {} + for (const [key, envNames] of Object.entries(wh.configMap)) { + const names = Array.isArray(envNames) ? 
envNames : [envNames] + for (const envName of names) { + const val = process.env[envName] + if (val) { + config[key] = sensitiveKeys.has(key) ? "***" : val + break + } + } + } + + connections.push({ + name: `env_${wh.type}`, + type: wh.type, + source: "env-var", + signal: matchedSignal, + config, + }) + } + + // DATABASE_URL can point to any database type — parse the scheme to categorize correctly + const databaseUrl = process.env["DATABASE_URL"] + if (databaseUrl && !connections.some((c) => c.signal === "DATABASE_URL")) { + const scheme = databaseUrl.split("://")[0]?.toLowerCase() ?? "" + const schemeTypeMap: Record = { + postgresql: "postgres", + postgres: "postgres", + mysql: "mysql", + mysql2: "mysql", + redshift: "redshift", + sqlite: "sqlite", + sqlite3: "sqlite", + } + const dbType = schemeTypeMap[scheme] ?? "postgres" + // Only add if we don't already have this type detected from other env vars + if (!connections.some((c) => c.type === dbType)) { + connections.push({ + name: `env_${dbType}`, + type: dbType, + source: "env-var", + signal: "DATABASE_URL", + config: { connection_string: "***" }, + }) + } + } + + return connections +} + +export const DATA_TOOL_NAMES = [ + "dbt", + "sqlfluff", + "airflow", + "dagster", + "prefect", + "soda", + "sqlmesh", + "great_expectations", + "sqlfmt", +] as const + +/** Extract a semver-like version string from command output. */ +export function parseToolVersion(output: string): string | undefined { + const firstLine = output.trim().split("\n")[0] + const match = firstLine.match(/(\d+\.\d+[\.\d]*)/) + return match ? match[1] : undefined +} + +export async function detectDataTools(skip: boolean): Promise { + if (skip) return [] + + const results = await Promise.all( + DATA_TOOL_NAMES.map(async (tool): Promise => { + try { + const result = Bun.spawnSync([tool, "--version"], { + stdout: "pipe", + stderr: "pipe", + timeout: 5000, + }) + if (result.exitCode === 0) { + return { + name: tool, + installed: true, + version: parseToolVersion(result.stdout.toString()), + } + } + return { name: tool, installed: false } + } catch { + return { name: tool, installed: false } + } + }), + ) + + return results +} + +export async function detectConfigFiles(startDir: string): Promise { + return { + altimateConfig: existsSync(path.join(startDir, ".altimate-code", "altimate-code.json")), + sqlfluff: existsSync(path.join(startDir, ".sqlfluff")), + preCommit: existsSync(path.join(startDir, ".pre-commit-config.yaml")), + } +} + +// --- Connection deduplication --- + +interface ConnectionSource { + name: string + type: string + source: string + database?: string + host?: string + port?: number + config?: Record + signal?: string + container?: string +} + +function normalizeName(name: string): string { + return name.toLowerCase().replace(/^(dbt_|docker_|env_)/, "") +} + +function deduplicateConnections( + existing: Array<{ name: string; type: string; database?: string }>, + dbtProfiles: Array<{ name: string; type: string; config: Record }>, + dockerContainers: Array<{ name: string; db_type: string; host: string; port: number; database?: string }>, + envVars: EnvVarConnection[], +): { + alreadyConfigured: ConnectionSource[] + newFromDbt: ConnectionSource[] + newFromDocker: ConnectionSource[] + newFromEnv: ConnectionSource[] +} { + const seen = new Set() + + const alreadyConfigured: ConnectionSource[] = existing.map((c) => { + seen.add(normalizeName(c.name)) + return { name: c.name, type: c.type, source: "configured", database: c.database } + }) + + const newFromDbt: 
ConnectionSource[] = [] + for (const c of dbtProfiles) { + const normalized = normalizeName(c.name) + if (!seen.has(normalized)) { + seen.add(normalized) + newFromDbt.push({ name: c.name, type: c.type, source: "dbt-profile", config: c.config }) + } + } + + const newFromDocker: ConnectionSource[] = [] + for (const c of dockerContainers) { + const normalized = normalizeName(c.name) + if (!seen.has(normalized)) { + seen.add(normalized) + newFromDocker.push({ + name: c.name, + type: c.db_type, + source: "docker", + host: c.host, + port: c.port, + database: c.database, + container: c.name, + }) + } + } + + const newFromEnv: ConnectionSource[] = [] + for (const c of envVars) { + const normalized = normalizeName(c.name) + if (!seen.has(normalized)) { + seen.add(normalized) + newFromEnv.push({ name: c.name, type: c.type, source: "env-var", signal: c.signal }) + } + } + + return { alreadyConfigured, newFromDbt, newFromDocker, newFromEnv } +} + +// --- Tool definition --- + +export const ProjectScanTool = Tool.define("project_scan", { + description: + "Scan the data engineering environment to detect dbt projects, warehouse connections, Docker databases, installed tools, and configuration files. Returns a comprehensive report for project setup.", + parameters: z.object({ + skip_docker: z.boolean().optional().describe("Skip Docker container discovery (faster scan)"), + skip_tools: z.boolean().optional().describe("Skip installed tool detection (faster scan)"), + }), + async execute(args, ctx) { + const cwd = process.cwd() + + // Run local detections in parallel + const [git, dbtProject, envVars, dataTools, configFiles] = await Promise.all([ + detectGit(), + detectDbtProject(cwd), + detectEnvVars(), + detectDataTools(!!args.skip_tools), + detectConfigFiles(cwd), + ]) + + // Run bridge-dependent detections with individual error handling + const engineHealth = await Bridge.call("ping", {} as any) + .then((r) => ({ healthy: true, status: r.status })) + .catch(() => ({ healthy: false, status: undefined as string | undefined })) + + const existingConnections = await Bridge.call("warehouse.list", {}) + .then((r) => r.warehouses) + .catch(() => [] as Array<{ name: string; type: string; database?: string }>) + + const dbtProfiles = await Bridge.call("dbt.profiles", {}) + .then((r) => r.connections ?? []) + .catch(() => [] as Array<{ name: string; type: string; config: Record }>) + + const dockerContainers = args.skip_docker + ? [] + : await Bridge.call("warehouse.discover", {} as any) + .then((r) => r.containers ?? []) + .catch(() => [] as Array<{ name: string; db_type: string; host: string; port: number; database?: string }>) + + const schemaCache = await Bridge.call("schema.cache_status", {}).catch(() => null) + + const dbtManifest = dbtProject.manifestPath + ? await Bridge.call("dbt.manifest", { path: dbtProject.manifestPath }).catch(() => null) + : null + + // Deduplicate connections + const connections = deduplicateConnections(existingConnections, dbtProfiles, dockerContainers, envVars) + + // Build output + const lines: string[] = [] + + // Python Engine + lines.push("# Environment Scan") + lines.push("") + lines.push("## Python Engine") + if (engineHealth.healthy) { + lines.push(`✓ Engine healthy (${engineHealth.status})`) + } else { + lines.push("✗ Engine not available") + } + + // Git + lines.push("") + lines.push("## Git Repository") + if (git.isRepo) { + const remote = git.remoteUrl ? ` (origin: ${git.remoteUrl})` : "" + lines.push(`✓ Git repo on branch \`${git.branch ?? 
"unknown"}\`${remote}`) + } else { + lines.push("✗ Not a git repository") + } + + // dbt Project + lines.push("") + lines.push("## dbt Project") + if (dbtProject.found) { + lines.push(`✓ Project "${dbtProject.name ?? "unknown"}" (profile: ${dbtProject.profile ?? "not set"})`) + lines.push(` Path: ${dbtProject.path}`) + if (dbtProject.manifestPath) { + lines.push(` ✓ manifest.json found`) + if (dbtManifest) { + lines.push(` Models: ${dbtManifest.model_count}, Sources: ${dbtManifest.source_count}, Tests: ${dbtManifest.test_count}`) + } + } else { + lines.push(` ✗ No manifest.json (run dbt compile or dbt build)`) + } + if (dbtProject.hasPackages) { + lines.push(` ✓ packages.yml or dependencies.yml found`) + } + } else { + lines.push("✗ No dbt_project.yml found") + } + + // Warehouse Connections + lines.push("") + lines.push("## Warehouse Connections") + + if (connections.alreadyConfigured.length > 0) { + lines.push("") + lines.push("### Already Configured") + lines.push("Name | Type | Database") + lines.push("-----|------|--------") + for (const c of connections.alreadyConfigured) { + lines.push(`${c.name} | ${c.type} | ${c.database ?? "-"}`) + } + } + + if (connections.newFromDbt.length > 0) { + lines.push("") + lines.push("### From dbt profiles.yml") + lines.push("Name | Type | Source") + lines.push("-----|------|------") + for (const c of connections.newFromDbt) { + lines.push(`${c.name} | ${c.type} | dbt-profile`) + } + } + + if (connections.newFromDocker.length > 0) { + lines.push("") + lines.push("### From Docker") + lines.push("Container | Type | Host:Port") + lines.push("----------|------|----------") + for (const c of connections.newFromDocker) { + lines.push(`${c.container} | ${c.type} | ${c.host}:${c.port}`) + } + } + + if (connections.newFromEnv.length > 0) { + lines.push("") + lines.push("### From Environment Variables") + lines.push("Name | Type | Signal") + lines.push("-----|------|------") + for (const c of connections.newFromEnv) { + lines.push(`${c.name} | ${c.type} | ${c.signal}`) + } + } + + const totalConnections = + connections.alreadyConfigured.length + + connections.newFromDbt.length + + connections.newFromDocker.length + + connections.newFromEnv.length + if (totalConnections === 0) { + lines.push("") + lines.push("No warehouse connections found from any source.") + } + + // Schema Cache + if (schemaCache) { + lines.push("") + lines.push("## Schema Cache") + lines.push(`Tables: ${schemaCache.total_tables}, Columns: ${schemaCache.total_columns}`) + if (schemaCache.warehouses.length > 0) { + lines.push("Warehouse | Type | Tables | Columns | Last Indexed") + lines.push("----------|------|--------|---------|-------------") + for (const w of schemaCache.warehouses) { + const indexed = w.last_indexed ? new Date(w.last_indexed).toLocaleString() : "never" + lines.push(`${w.name} | ${w.type} | ${w.tables_count} | ${w.columns_count} | ${indexed}`) + } + } + } + + // Installed Data Tools + if (dataTools.length > 0) { + lines.push("") + lines.push("## Installed Data Tools") + for (const t of dataTools) { + if (t.installed) { + lines.push(`✓ ${t.name} v${t.version ?? "unknown"}`) + } else { + lines.push(`✗ ${t.name} (not found)`) + } + } + } + + // Config Files + lines.push("") + lines.push("## Config Files") + lines.push(configFiles.altimateConfig ? "✓ .altimate-code/altimate-code.json" : "✗ .altimate-code/altimate-code.json (not found)") + lines.push(configFiles.sqlfluff ? "✓ .sqlfluff" : "✗ .sqlfluff (not found)") + lines.push(configFiles.preCommit ? 
"✓ .pre-commit-config.yaml" : "✗ .pre-commit-config.yaml (not found)") + + // Build metadata + const toolsFound = dataTools.filter((t) => t.installed).map((t) => t.name) + + return { + title: `Scan: ${totalConnections} connection(s), ${dbtProject.found ? "dbt found" : "no dbt"}`, + metadata: { + engine_healthy: engineHealth.healthy, + git: { isRepo: git.isRepo, branch: git.branch }, + dbt: { + found: dbtProject.found, + name: dbtProject.name, + modelCount: dbtManifest?.model_count, + }, + connections: { + existing: connections.alreadyConfigured.length, + new_dbt: connections.newFromDbt.length, + new_docker: connections.newFromDocker.length, + new_env: connections.newFromEnv.length, + }, + schema_cache: schemaCache + ? { + warehouses: schemaCache.warehouses.length, + tables: schemaCache.total_tables, + columns: schemaCache.total_columns, + } + : { warehouses: 0, tables: 0, columns: 0 }, + tools_found: toolsFound, + }, + output: lines.join("\n"), + } + }, +}) diff --git a/packages/altimate-code/src/tool/registry.ts b/packages/altimate-code/src/tool/registry.ts index 3a542391ad..6f4c74b265 100644 --- a/packages/altimate-code/src/tool/registry.ts +++ b/packages/altimate-code/src/tool/registry.ts @@ -98,6 +98,7 @@ import { SqlGuardFingerprintTool } from "./sqlguard-fingerprint" import { SqlGuardIntrospectionSqlTool } from "./sqlguard-introspection-sql" import { SqlGuardParseDbtTool } from "./sqlguard-parse-dbt" import { SqlGuardIsSafeTool } from "./sqlguard-is-safe" +import { ProjectScanTool } from "./project-scan" import { Glob } from "../util/glob" export namespace ToolRegistry { @@ -262,6 +263,7 @@ export namespace ToolRegistry { SqlGuardIntrospectionSqlTool, SqlGuardParseDbtTool, SqlGuardIsSafeTool, + ProjectScanTool, ...custom, ] } diff --git a/packages/altimate-code/test/tool/project-scan.test.ts b/packages/altimate-code/test/tool/project-scan.test.ts new file mode 100644 index 0000000000..d1483438ab --- /dev/null +++ b/packages/altimate-code/test/tool/project-scan.test.ts @@ -0,0 +1,831 @@ +import { describe, expect, test, beforeEach, afterEach } from "bun:test" +import path from "path" +import os from "os" +import fsp from "fs/promises" + +import { + detectGit, + detectDbtProject, + detectEnvVars, + detectDataTools, + detectConfigFiles, + parseToolVersion, + DATA_TOOL_NAMES, + type GitInfo, + type DbtProjectInfo, + type EnvVarConnection, + type DataToolInfo, + type ConfigFileInfo, +} from "../../src/tool/project-scan" + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const tmpRoot = path.join( + os.tmpdir(), + "project-scan-test-" + process.pid + "-" + Math.random().toString(36).slice(2), +) + +let tmpCounter = 0 +function nextTmpDir(): string { + return path.join(tmpRoot, String(++tmpCounter)) +} + +async function createFile(filePath: string, content = "") { + await fsp.mkdir(path.dirname(filePath), { recursive: true }) + await fsp.writeFile(filePath, content) +} + +// --------------------------------------------------------------------------- +// detectGit +// --------------------------------------------------------------------------- + +describe("detectGit", () => { + test("detects a git repository in the current repo", async () => { + const result = await detectGit() + expect(result.isRepo).toBe(true) + }) + + test("branch is a non-empty string or undefined (detached HEAD)", async () => { + const result = await detectGit() + // In CI, GitHub Actions checks 
out in detached HEAD → branch is undefined + // Locally, branch is a non-empty string + if (result.branch !== undefined) { + expect(typeof result.branch).toBe("string") + expect(result.branch.length).toBeGreaterThan(0) + } + }) + + test("returns a remote URL when origin exists", async () => { + const result = await detectGit() + // The altimate-code repo should have an origin remote + expect(result.remoteUrl).toBeDefined() + expect(result.remoteUrl!.length).toBeGreaterThan(0) + }) + + test("returns isRepo true for an initialized git directory", async () => { + const dir = nextTmpDir() + await fsp.mkdir(dir, { recursive: true }) + + // Initialize a fresh git repo + Bun.spawnSync(["git", "init", dir], { stdout: "pipe", stderr: "pipe" }) + + // Save and change cwd so detectGit runs in the temp dir + const originalCwd = process.cwd() + process.chdir(dir) + try { + const result = await detectGit() + expect(result.isRepo).toBe(true) + } finally { + process.chdir(originalCwd) + } + }) + + test("returns no remote for a fresh git repo with no origin", async () => { + const dir = nextTmpDir() + await fsp.mkdir(dir, { recursive: true }) + Bun.spawnSync(["git", "init", dir], { stdout: "pipe", stderr: "pipe" }) + + const originalCwd = process.cwd() + process.chdir(dir) + try { + const result = await detectGit() + expect(result.isRepo).toBe(true) + expect(result.remoteUrl).toBeUndefined() + } finally { + process.chdir(originalCwd) + } + }) + + test("returns isRepo false for a non-git directory", async () => { + const dir = nextTmpDir() + await fsp.mkdir(dir, { recursive: true }) + + const originalCwd = process.cwd() + process.chdir(dir) + try { + const result = await detectGit() + expect(result.isRepo).toBe(false) + expect(result.branch).toBeUndefined() + expect(result.remoteUrl).toBeUndefined() + } finally { + process.chdir(originalCwd) + } + }) + + afterEach(async () => { + await fsp.rm(tmpRoot, { recursive: true, force: true }).catch(() => {}) + }) +}) + +// --------------------------------------------------------------------------- +// detectDbtProject +// --------------------------------------------------------------------------- + +describe("detectDbtProject", () => { + afterEach(async () => { + await fsp.rm(tmpRoot, { recursive: true, force: true }).catch(() => {}) + }) + + test("finds dbt_project.yml in the current directory", async () => { + const dir = nextTmpDir() + await createFile( + path.join(dir, "dbt_project.yml"), + "name: my_project\nprofile: my_profile\n", + ) + + const result = await detectDbtProject(dir) + expect(result.found).toBe(true) + expect(result.path).toBe(dir) + expect(result.name).toBe("my_project") + expect(result.profile).toBe("my_profile") + }) + + test("finds dbt_project.yml in a parent directory", async () => { + const rootDir = nextTmpDir() + const childDir = path.join(rootDir, "models", "staging") + await fsp.mkdir(childDir, { recursive: true }) + await createFile( + path.join(rootDir, "dbt_project.yml"), + "name: parent_proj\nprofile: parent_prof\n", + ) + + const result = await detectDbtProject(childDir) + expect(result.found).toBe(true) + expect(result.path).toBe(rootDir) + expect(result.name).toBe("parent_proj") + }) + + test("finds dbt_project.yml in a grandparent directory", async () => { + const rootDir = nextTmpDir() + const deepDir = path.join(rootDir, "a", "b", "c") + await fsp.mkdir(deepDir, { recursive: true }) + await createFile( + path.join(rootDir, "dbt_project.yml"), + "name: deep_proj\nprofile: deep_prof\n", + ) + + const result = await 
detectDbtProject(deepDir) + expect(result.found).toBe(true) + expect(result.path).toBe(rootDir) + expect(result.name).toBe("deep_proj") + }) + + test("does not search beyond 5 levels", async () => { + const rootDir = nextTmpDir() + // Create directory 6 levels deep + const deepDir = path.join(rootDir, "a", "b", "c", "d", "e", "f") + await fsp.mkdir(deepDir, { recursive: true }) + await createFile( + path.join(rootDir, "dbt_project.yml"), + "name: too_far\nprofile: too_far_prof\n", + ) + + const result = await detectDbtProject(deepDir) + expect(result.found).toBe(false) + }) + + test("returns found false when no dbt_project.yml exists", async () => { + const dir = nextTmpDir() + await fsp.mkdir(dir, { recursive: true }) + + const result = await detectDbtProject(dir) + expect(result.found).toBe(false) + expect(result.path).toBeUndefined() + expect(result.name).toBeUndefined() + expect(result.profile).toBeUndefined() + }) + + test("extracts name and profile from dbt_project.yml", async () => { + const dir = nextTmpDir() + await createFile( + path.join(dir, "dbt_project.yml"), + "name: 'analytics'\nversion: '1.0'\nprofile: 'warehouse_prod'\n", + ) + + const result = await detectDbtProject(dir) + expect(result.found).toBe(true) + expect(result.name).toBe("analytics") + expect(result.profile).toBe("warehouse_prod") + }) + + test("handles quoted name and profile values", async () => { + const dir = nextTmpDir() + await createFile( + path.join(dir, "dbt_project.yml"), + 'name: "quoted_name"\nprofile: "quoted_profile"\n', + ) + + const result = await detectDbtProject(dir) + expect(result.name).toBe("quoted_name") + expect(result.profile).toBe("quoted_profile") + }) + + test("detects manifest.json in target directory", async () => { + const dir = nextTmpDir() + await createFile(path.join(dir, "dbt_project.yml"), "name: proj\n") + await createFile(path.join(dir, "target", "manifest.json"), "{}") + + const result = await detectDbtProject(dir) + expect(result.found).toBe(true) + expect(result.manifestPath).toBe(path.join(dir, "target", "manifest.json")) + }) + + test("manifestPath is undefined when no manifest exists", async () => { + const dir = nextTmpDir() + await createFile(path.join(dir, "dbt_project.yml"), "name: proj\n") + + const result = await detectDbtProject(dir) + expect(result.found).toBe(true) + expect(result.manifestPath).toBeUndefined() + }) + + test("detects packages.yml", async () => { + const dir = nextTmpDir() + await createFile(path.join(dir, "dbt_project.yml"), "name: proj\n") + await createFile(path.join(dir, "packages.yml"), "packages:\n - package: dbt-labs/dbt_utils\n") + + const result = await detectDbtProject(dir) + expect(result.found).toBe(true) + expect(result.hasPackages).toBe(true) + }) + + test("detects dependencies.yml as packages", async () => { + const dir = nextTmpDir() + await createFile(path.join(dir, "dbt_project.yml"), "name: proj\n") + await createFile(path.join(dir, "dependencies.yml"), "packages:\n - package: foo\n") + + const result = await detectDbtProject(dir) + expect(result.hasPackages).toBe(true) + }) + + test("hasPackages is false when neither packages.yml nor dependencies.yml exist", async () => { + const dir = nextTmpDir() + await createFile(path.join(dir, "dbt_project.yml"), "name: proj\n") + + const result = await detectDbtProject(dir) + expect(result.hasPackages).toBe(false) + }) + + test("handles malformed dbt_project.yml with no name or profile", async () => { + const dir = nextTmpDir() + await createFile( + path.join(dir, "dbt_project.yml"), + 
"version: 1.0\nconfig-version: 2\n", + ) + + const result = await detectDbtProject(dir) + expect(result.found).toBe(true) + expect(result.name).toBeUndefined() + expect(result.profile).toBeUndefined() + }) +}) + +// --------------------------------------------------------------------------- +// detectEnvVars +// --------------------------------------------------------------------------- + +describe("detectEnvVars", () => { + let savedEnv: NodeJS.ProcessEnv + + beforeEach(() => { + savedEnv = { ...process.env } + }) + + afterEach(() => { + process.env = savedEnv + }) + + // Helper to clear all warehouse-related env vars + function clearWarehouseEnvVars() { + const vars = [ + "SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD", + "SNOWFLAKE_WAREHOUSE", "SNOWFLAKE_DATABASE", "SNOWFLAKE_SCHEMA", "SNOWFLAKE_ROLE", + "GOOGLE_APPLICATION_CREDENTIALS", "BIGQUERY_PROJECT", "GCP_PROJECT", "BIGQUERY_LOCATION", + "DATABRICKS_HOST", "DATABRICKS_SERVER_HOSTNAME", "DATABRICKS_HTTP_PATH", "DATABRICKS_TOKEN", + "PGHOST", "PGPORT", "PGDATABASE", "PGUSER", "PGPASSWORD", "DATABASE_URL", + "MYSQL_HOST", "MYSQL_TCP_PORT", "MYSQL_DATABASE", "MYSQL_USER", "MYSQL_PASSWORD", + "REDSHIFT_HOST", "REDSHIFT_PORT", "REDSHIFT_DATABASE", "REDSHIFT_USER", "REDSHIFT_PASSWORD", + ] + for (const v of vars) { + delete process.env[v] + } + } + + test("returns empty array when no env vars are set", async () => { + clearWarehouseEnvVars() + const result = await detectEnvVars() + expect(result).toEqual([]) + }) + + test("detects Snowflake via SNOWFLAKE_ACCOUNT", async () => { + clearWarehouseEnvVars() + process.env.SNOWFLAKE_ACCOUNT = "my_account" + process.env.SNOWFLAKE_USER = "admin" + process.env.SNOWFLAKE_DATABASE = "analytics" + + const result = await detectEnvVars() + const sf = result.find((r) => r.type === "snowflake") + expect(sf).toBeDefined() + expect(sf!.name).toBe("env_snowflake") + expect(sf!.source).toBe("env-var") + expect(sf!.signal).toBe("SNOWFLAKE_ACCOUNT") + expect(sf!.config.account).toBe("my_account") + expect(sf!.config.user).toBe("admin") + expect(sf!.config.database).toBe("analytics") + }) + + test("detects BigQuery via GOOGLE_APPLICATION_CREDENTIALS", async () => { + clearWarehouseEnvVars() + process.env.GOOGLE_APPLICATION_CREDENTIALS = "/path/to/creds.json" + + const result = await detectEnvVars() + const bq = result.find((r) => r.type === "bigquery") + expect(bq).toBeDefined() + expect(bq!.signal).toBe("GOOGLE_APPLICATION_CREDENTIALS") + expect(bq!.config.credentials_path).toBe("/path/to/creds.json") + }) + + test("detects BigQuery via BIGQUERY_PROJECT", async () => { + clearWarehouseEnvVars() + process.env.BIGQUERY_PROJECT = "my-gcp-project" + + const result = await detectEnvVars() + const bq = result.find((r) => r.type === "bigquery") + expect(bq).toBeDefined() + expect(bq!.signal).toBe("BIGQUERY_PROJECT") + expect(bq!.config.project).toBe("my-gcp-project") + }) + + test("detects BigQuery via GCP_PROJECT when BIGQUERY_PROJECT is not set", async () => { + clearWarehouseEnvVars() + process.env.GCP_PROJECT = "fallback-project" + + const result = await detectEnvVars() + const bq = result.find((r) => r.type === "bigquery") + expect(bq).toBeDefined() + expect(bq!.signal).toBe("GCP_PROJECT") + expect(bq!.config.project).toBe("fallback-project") + }) + + test("detects Databricks via DATABRICKS_HOST", async () => { + clearWarehouseEnvVars() + process.env.DATABRICKS_HOST = "adb-1234.cloud.databricks.com" + process.env.DATABRICKS_TOKEN = "dapi123456" + process.env.DATABRICKS_HTTP_PATH = 
"/sql/1.0/warehouses/abc" + + const result = await detectEnvVars() + const db = result.find((r) => r.type === "databricks") + expect(db).toBeDefined() + expect(db!.signal).toBe("DATABRICKS_HOST") + expect(db!.config.server_hostname).toBe("adb-1234.cloud.databricks.com") + expect(db!.config.access_token).toBe("***") + expect(db!.config.http_path).toBe("/sql/1.0/warehouses/abc") + }) + + test("prefers DATABRICKS_HOST over DATABRICKS_SERVER_HOSTNAME for signal", async () => { + clearWarehouseEnvVars() + process.env.DATABRICKS_HOST = "primary.databricks.com" + process.env.DATABRICKS_SERVER_HOSTNAME = "secondary.databricks.com" + + const result = await detectEnvVars() + const db = result.find((r) => r.type === "databricks") + expect(db).toBeDefined() + expect(db!.signal).toBe("DATABRICKS_HOST") + // server_hostname configMap entry uses [DATABRICKS_HOST, DATABRICKS_SERVER_HOSTNAME], + // so it should prefer DATABRICKS_HOST + expect(db!.config.server_hostname).toBe("primary.databricks.com") + }) + + test("detects Databricks via DATABRICKS_SERVER_HOSTNAME when DATABRICKS_HOST is absent", async () => { + clearWarehouseEnvVars() + process.env.DATABRICKS_SERVER_HOSTNAME = "alt.databricks.com" + + const result = await detectEnvVars() + const db = result.find((r) => r.type === "databricks") + expect(db).toBeDefined() + expect(db!.signal).toBe("DATABRICKS_SERVER_HOSTNAME") + expect(db!.config.server_hostname).toBe("alt.databricks.com") + }) + + test("detects Postgres via PGHOST", async () => { + clearWarehouseEnvVars() + process.env.PGHOST = "localhost" + process.env.PGPORT = "5432" + process.env.PGDATABASE = "mydb" + process.env.PGUSER = "pgadmin" + + const result = await detectEnvVars() + const pg = result.find((r) => r.type === "postgres") + expect(pg).toBeDefined() + expect(pg!.signal).toBe("PGHOST") + expect(pg!.config.host).toBe("localhost") + expect(pg!.config.port).toBe("5432") + expect(pg!.config.database).toBe("mydb") + expect(pg!.config.user).toBe("pgadmin") + }) + + test("detects Postgres via DATABASE_URL scheme", async () => { + clearWarehouseEnvVars() + process.env.DATABASE_URL = "postgresql://user:pass@host:5432/db" + + const result = await detectEnvVars() + const pg = result.find((r) => r.type === "postgres") + expect(pg).toBeDefined() + expect(pg!.signal).toBe("DATABASE_URL") + expect(pg!.config.connection_string).toBe("***") + }) + + test("detects MySQL via DATABASE_URL with mysql scheme", async () => { + clearWarehouseEnvVars() + process.env.DATABASE_URL = "mysql://user:pass@host:3306/db" + + const result = await detectEnvVars() + const my = result.find((r) => r.type === "mysql") + expect(my).toBeDefined() + expect(my!.signal).toBe("DATABASE_URL") + }) + + test("DATABASE_URL does not duplicate when type already detected", async () => { + clearWarehouseEnvVars() + process.env.PGHOST = "localhost" + process.env.DATABASE_URL = "postgresql://user:pass@host:5432/db" + + const result = await detectEnvVars() + const pgConns = result.filter((r) => r.type === "postgres") + expect(pgConns.length).toBe(1) + expect(pgConns[0].signal).toBe("PGHOST") + }) + + test("detects MySQL via MYSQL_HOST", async () => { + clearWarehouseEnvVars() + process.env.MYSQL_HOST = "mysql.example.com" + process.env.MYSQL_DATABASE = "shop" + + const result = await detectEnvVars() + const my = result.find((r) => r.type === "mysql") + expect(my).toBeDefined() + expect(my!.signal).toBe("MYSQL_HOST") + expect(my!.config.host).toBe("mysql.example.com") + expect(my!.config.database).toBe("shop") + }) + + test("detects MySQL 
via MYSQL_DATABASE alone", async () => { + clearWarehouseEnvVars() + process.env.MYSQL_DATABASE = "testdb" + + const result = await detectEnvVars() + const my = result.find((r) => r.type === "mysql") + expect(my).toBeDefined() + expect(my!.signal).toBe("MYSQL_DATABASE") + expect(my!.config.database).toBe("testdb") + }) + + test("detects Redshift via REDSHIFT_HOST", async () => { + clearWarehouseEnvVars() + process.env.REDSHIFT_HOST = "redshift-cluster.abc.us-east-1.redshift.amazonaws.com" + process.env.REDSHIFT_DATABASE = "warehouse" + process.env.REDSHIFT_USER = "admin" + + const result = await detectEnvVars() + const rs = result.find((r) => r.type === "redshift") + expect(rs).toBeDefined() + expect(rs!.signal).toBe("REDSHIFT_HOST") + expect(rs!.config.host).toBe("redshift-cluster.abc.us-east-1.redshift.amazonaws.com") + expect(rs!.config.database).toBe("warehouse") + expect(rs!.config.user).toBe("admin") + }) + + test("detects multiple warehouses simultaneously", async () => { + clearWarehouseEnvVars() + process.env.SNOWFLAKE_ACCOUNT = "sf_account" + process.env.PGHOST = "pg_host" + process.env.MYSQL_HOST = "my_host" + + const result = await detectEnvVars() + const types = result.map((r) => r.type) + expect(types).toContain("snowflake") + expect(types).toContain("postgres") + expect(types).toContain("mysql") + expect(result.length).toBe(3) + }) + + test("config only includes keys with actual values", async () => { + clearWarehouseEnvVars() + process.env.SNOWFLAKE_ACCOUNT = "my_account" + // Do NOT set SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, etc. + + const result = await detectEnvVars() + const sf = result.find((r) => r.type === "snowflake") + expect(sf).toBeDefined() + expect(sf!.config.account).toBe("my_account") + // Keys without env var values should not be present + expect(sf!.config.user).toBeUndefined() + expect(sf!.config.password).toBeUndefined() + expect(sf!.config.warehouse).toBeUndefined() + }) + + test("all connections have source set to env-var", async () => { + clearWarehouseEnvVars() + process.env.SNOWFLAKE_ACCOUNT = "acct" + process.env.PGHOST = "host" + + const result = await detectEnvVars() + for (const conn of result) { + expect(conn.source).toBe("env-var") + } + }) + + test("connection names follow env_ prefix convention", async () => { + clearWarehouseEnvVars() + process.env.SNOWFLAKE_ACCOUNT = "acct" + process.env.DATABRICKS_HOST = "host" + process.env.REDSHIFT_HOST = "host" + + const result = await detectEnvVars() + for (const conn of result) { + expect(conn.name).toMatch(/^env_/) + expect(conn.name).toBe(`env_${conn.type}`) + } + }) +}) + +// --------------------------------------------------------------------------- +// parseToolVersion +// --------------------------------------------------------------------------- + +describe("parseToolVersion", () => { + test("parses standard semver (dbt core output)", () => { + // dbt --version outputs "installed: 1.8.4" on first line in newer versions + expect(parseToolVersion("dbt Core - 1.8.4")).toBe("1.8.4") + }) + + test("returns undefined when version is not on first line", () => { + // parseToolVersion only reads the first line + expect(parseToolVersion("Core:\n - installed: 1.8.4")).toBeUndefined() + }) + + test("parses simple version (sqlfluff)", () => { + expect(parseToolVersion("sqlfluff, version 3.1.0")).toBe("3.1.0") + }) + + test("parses version with prefix text (airflow)", () => { + expect(parseToolVersion("apache-airflow==2.9.3")).toBe("2.9.3") + }) + + test("parses version at start of line", () => { + 
expect(parseToolVersion("1.2.3")).toBe("1.2.3") + }) + + test("parses two-part version", () => { + expect(parseToolVersion("dagster, version 1.7")).toBe("1.7") + }) + + test("parses four-part version", () => { + expect(parseToolVersion("tool 1.2.3.4")).toBe("1.2.3.4") + }) + + test("takes first line only", () => { + expect(parseToolVersion("0.19.2\nsome other output")).toBe("0.19.2") + }) + + test("returns undefined for non-version output", () => { + expect(parseToolVersion("no version here")).toBeUndefined() + }) + + test("returns undefined for empty string", () => { + expect(parseToolVersion("")).toBeUndefined() + }) + + test("returns undefined for whitespace only", () => { + expect(parseToolVersion(" \n ")).toBeUndefined() + }) + + test("handles version embedded in path-like string", () => { + expect(parseToolVersion("/usr/local/lib/python3.11/site-packages (1.0.3)")).toBe("3.11") + }) + + test("parses great_expectations version output", () => { + expect(parseToolVersion("great_expectations, version 1.0.3")).toBe("1.0.3") + }) + + test("parses sqlfmt version output", () => { + expect(parseToolVersion("sqlfmt 0.19.2")).toBe("0.19.2") + }) +}) + +// --------------------------------------------------------------------------- +// DATA_TOOL_NAMES +// --------------------------------------------------------------------------- + +describe("DATA_TOOL_NAMES", () => { + test("contains all expected tools", () => { + expect(DATA_TOOL_NAMES).toContain("dbt") + expect(DATA_TOOL_NAMES).toContain("sqlfluff") + expect(DATA_TOOL_NAMES).toContain("airflow") + expect(DATA_TOOL_NAMES).toContain("dagster") + expect(DATA_TOOL_NAMES).toContain("prefect") + expect(DATA_TOOL_NAMES).toContain("soda") + expect(DATA_TOOL_NAMES).toContain("sqlmesh") + expect(DATA_TOOL_NAMES).toContain("great_expectations") + expect(DATA_TOOL_NAMES).toContain("sqlfmt") + }) + + test("has exactly 9 tools", () => { + expect(DATA_TOOL_NAMES.length).toBe(9) + }) + + test("contains no duplicates", () => { + const unique = new Set(DATA_TOOL_NAMES) + expect(unique.size).toBe(DATA_TOOL_NAMES.length) + }) +}) + +// --------------------------------------------------------------------------- +// detectDataTools +// --------------------------------------------------------------------------- + +describe("detectDataTools", () => { + test("returns empty array when skip is true", async () => { + const result = await detectDataTools(true) + expect(result).toEqual([]) + }) + + test("skip=true returns empty regardless of environment", async () => { + const result1 = await detectDataTools(true) + const result2 = await detectDataTools(true) + expect(result1).toEqual([]) + expect(result2).toEqual([]) + }) + + test("skip=false returns one entry per tool", async () => { + const result = await detectDataTools(false) + expect(result.length).toBe(DATA_TOOL_NAMES.length) + const names = result.map((t) => t.name) + for (const toolName of DATA_TOOL_NAMES) { + expect(names).toContain(toolName) + } + }) + + test("each entry has correct shape", async () => { + const result = await detectDataTools(false) + for (const tool of result) { + expect(typeof tool.name).toBe("string") + expect(typeof tool.installed).toBe("boolean") + if (tool.installed) { + expect(tool.version === undefined || typeof tool.version === "string").toBe(true) + } + } + }) + + test("marks missing tools as not installed", async () => { + const result = await detectDataTools(false) + // At least some tools should be not-installed on a typical dev machine + const notInstalled = result.filter((t) => 
!t.installed) + expect(notInstalled.length).toBeGreaterThan(0) + for (const tool of notInstalled) { + expect(tool.installed).toBe(false) + } + }) + + test("installed tools have a parseable version", async () => { + const result = await detectDataTools(false) + const installed = result.filter((t) => t.installed) + for (const tool of installed) { + // version should be a string matching semver-like pattern + if (tool.version) { + expect(tool.version).toMatch(/^\d+\.\d+/) + } + } + }) + + test("handles ENOENT gracefully for missing executables", async () => { + // This should not throw — ENOENT is caught internally + const result = await detectDataTools(false) + expect(Array.isArray(result)).toBe(true) + }) +}) + +// --------------------------------------------------------------------------- +// detectConfigFiles +// --------------------------------------------------------------------------- + +describe("detectConfigFiles", () => { + afterEach(async () => { + await fsp.rm(tmpRoot, { recursive: true, force: true }).catch(() => {}) + }) + + test("returns all false when no config files exist", async () => { + const dir = nextTmpDir() + await fsp.mkdir(dir, { recursive: true }) + + const result = await detectConfigFiles(dir) + expect(result.altimateConfig).toBe(false) + expect(result.sqlfluff).toBe(false) + expect(result.preCommit).toBe(false) + }) + + test("detects .altimate-code/altimate-code.json", async () => { + const dir = nextTmpDir() + await createFile(path.join(dir, ".altimate-code", "altimate-code.json"), "{}") + + const result = await detectConfigFiles(dir) + expect(result.altimateConfig).toBe(true) + expect(result.sqlfluff).toBe(false) + expect(result.preCommit).toBe(false) + }) + + test("detects .sqlfluff", async () => { + const dir = nextTmpDir() + await createFile(path.join(dir, ".sqlfluff"), "[sqlfluff]\n") + + const result = await detectConfigFiles(dir) + expect(result.altimateConfig).toBe(false) + expect(result.sqlfluff).toBe(true) + expect(result.preCommit).toBe(false) + }) + + test("detects .pre-commit-config.yaml", async () => { + const dir = nextTmpDir() + await createFile(path.join(dir, ".pre-commit-config.yaml"), "repos:\n") + + const result = await detectConfigFiles(dir) + expect(result.altimateConfig).toBe(false) + expect(result.sqlfluff).toBe(false) + expect(result.preCommit).toBe(true) + }) + + test("detects all config files when present", async () => { + const dir = nextTmpDir() + await createFile(path.join(dir, ".altimate-code", "altimate-code.json"), "{}") + await createFile(path.join(dir, ".sqlfluff"), "") + await createFile(path.join(dir, ".pre-commit-config.yaml"), "") + + const result = await detectConfigFiles(dir) + expect(result.altimateConfig).toBe(true) + expect(result.sqlfluff).toBe(true) + expect(result.preCommit).toBe(true) + }) + + test("returns correct type shape", async () => { + const dir = nextTmpDir() + await fsp.mkdir(dir, { recursive: true }) + + const result = await detectConfigFiles(dir) + expect(typeof result.altimateConfig).toBe("boolean") + expect(typeof result.sqlfluff).toBe("boolean") + expect(typeof result.preCommit).toBe("boolean") + // No extra keys + const keys = Object.keys(result) + expect(keys).toEqual(["altimateConfig", "sqlfluff", "preCommit"]) + }) +}) + +// --------------------------------------------------------------------------- +// Integration-style type/shape tests +// --------------------------------------------------------------------------- + +describe("return type contracts", () => { + test("detectGit returns GitInfo 
shape", async () => { + const result: GitInfo = await detectGit() + expect(typeof result.isRepo).toBe("boolean") + if (result.isRepo) { + expect(result.branch === undefined || typeof result.branch === "string").toBe(true) + expect(result.remoteUrl === undefined || typeof result.remoteUrl === "string").toBe(true) + } + }) + + test("detectDbtProject returns DbtProjectInfo shape", async () => { + const dir = os.tmpdir() + const result: DbtProjectInfo = await detectDbtProject(dir) + expect(typeof result.found).toBe("boolean") + if (result.found) { + expect(typeof result.path).toBe("string") + } + }) + + test("detectEnvVars returns EnvVarConnection[] shape", async () => { + const result: EnvVarConnection[] = await detectEnvVars() + expect(Array.isArray(result)).toBe(true) + for (const conn of result) { + expect(typeof conn.name).toBe("string") + expect(typeof conn.type).toBe("string") + expect(conn.source).toBe("env-var") + expect(typeof conn.signal).toBe("string") + expect(typeof conn.config).toBe("object") + } + }) + + test("detectDataTools returns DataToolInfo[] shape", async () => { + const result: DataToolInfo[] = await detectDataTools(true) + expect(Array.isArray(result)).toBe(true) + }) + + test("detectConfigFiles returns ConfigFileInfo shape", async () => { + const result: ConfigFileInfo = await detectConfigFiles(os.tmpdir()) + expect(typeof result.altimateConfig).toBe("boolean") + expect(typeof result.sqlfluff).toBe("boolean") + expect(typeof result.preCommit).toBe("boolean") + }) +}) diff --git a/packages/altimate-engine/tests/test_env_detect.py b/packages/altimate-engine/tests/test_env_detect.py new file mode 100644 index 0000000000..ba018475cb --- /dev/null +++ b/packages/altimate-engine/tests/test_env_detect.py @@ -0,0 +1,371 @@ +"""Tests for environment variable based warehouse detection. + +These tests validate the env-var-to-warehouse mapping logic used by the +project_scan tool. The canonical implementation is in TypeScript +(src/tool/project-scan.ts), but these tests document the expected behavior +and can validate a Python-side implementation if one is added later. 
+""" + +from __future__ import annotations + +import pytest + + +# --- Reference implementation (mirrors TypeScript detectEnvVars) --- + +ENV_VAR_SIGNALS: dict[str, dict] = { + "snowflake": { + "signals": ["SNOWFLAKE_ACCOUNT"], + "config_map": { + "account": "SNOWFLAKE_ACCOUNT", + "user": "SNOWFLAKE_USER", + "password": "SNOWFLAKE_PASSWORD", + "warehouse": "SNOWFLAKE_WAREHOUSE", + "database": "SNOWFLAKE_DATABASE", + "schema": "SNOWFLAKE_SCHEMA", + "role": "SNOWFLAKE_ROLE", + }, + }, + "bigquery": { + "signals": ["GOOGLE_APPLICATION_CREDENTIALS", "BIGQUERY_PROJECT", "GCP_PROJECT"], + "config_map": { + "project": ["BIGQUERY_PROJECT", "GCP_PROJECT"], + "credentials_path": "GOOGLE_APPLICATION_CREDENTIALS", + "location": "BIGQUERY_LOCATION", + }, + }, + "databricks": { + "signals": ["DATABRICKS_HOST", "DATABRICKS_SERVER_HOSTNAME"], + "config_map": { + "server_hostname": ["DATABRICKS_HOST", "DATABRICKS_SERVER_HOSTNAME"], + "http_path": "DATABRICKS_HTTP_PATH", + "access_token": "DATABRICKS_TOKEN", + }, + }, + "postgres": { + "signals": ["PGHOST", "PGDATABASE"], + "config_map": { + "host": "PGHOST", + "port": "PGPORT", + "database": "PGDATABASE", + "user": "PGUSER", + "password": "PGPASSWORD", + "connection_string": "DATABASE_URL", + }, + }, + "mysql": { + "signals": ["MYSQL_HOST", "MYSQL_DATABASE"], + "config_map": { + "host": "MYSQL_HOST", + "port": "MYSQL_TCP_PORT", + "database": "MYSQL_DATABASE", + "user": "MYSQL_USER", + "password": "MYSQL_PASSWORD", + }, + }, + "redshift": { + "signals": ["REDSHIFT_HOST"], + "config_map": { + "host": "REDSHIFT_HOST", + "port": "REDSHIFT_PORT", + "database": "REDSHIFT_DATABASE", + "user": "REDSHIFT_USER", + "password": "REDSHIFT_PASSWORD", + }, + }, +} + + +SENSITIVE_KEYS = {"password", "access_token", "connection_string", "private_key_path"} + +DATABASE_URL_SCHEME_MAP: dict[str, str] = { + "postgresql": "postgres", + "postgres": "postgres", + "mysql": "mysql", + "mysql2": "mysql", + "redshift": "redshift", + "sqlite": "sqlite", + "sqlite3": "sqlite", +} + + +def detect_env_connections(env: dict[str, str] | None = None) -> list[dict]: + """Detect warehouse connections from environment variables. + + Mirrors the TypeScript detectEnvVars implementation. Sensitive values + (password, access_token, connection_string) are redacted with "***". + + Args: + env: Environment dict to scan. Defaults to os.environ. 
+ + Returns: + List of detected connection dicts with keys: name, type, source, signal, config + """ + if env is None: + env = dict(os.environ) + + results: list[dict] = [] + + for wh_type, spec in ENV_VAR_SIGNALS.items(): + # Check if any signal env var is present + triggered_signal = None + for signal_var in spec["signals"]: + if signal_var in env and env[signal_var]: + triggered_signal = signal_var + break + + if triggered_signal is None: + continue + + # Build config from env vars, redacting sensitive fields + config: dict[str, str] = {} + for config_key, env_key in spec["config_map"].items(): + if isinstance(env_key, list): + # First match wins + for key in env_key: + if key in env and env[key]: + config[config_key] = "***" if config_key in SENSITIVE_KEYS else env[key] + break + else: + if env_key in env and env[env_key]: + config[config_key] = "***" if config_key in SENSITIVE_KEYS else env[env_key] + + results.append({ + "name": f"env_{wh_type}", + "type": wh_type, + "source": "env-var", + "signal": triggered_signal, + "config": config, + }) + + # DATABASE_URL scheme-based detection + database_url = env.get("DATABASE_URL", "") + if database_url and not any(r.get("signal") == "DATABASE_URL" for r in results): + scheme = database_url.split("://")[0].lower() if "://" in database_url else "" + db_type = DATABASE_URL_SCHEME_MAP.get(scheme, "postgres") + # Only add if this type wasn't already detected from other env vars + if not any(r["type"] == db_type for r in results): + results.append({ + "name": f"env_{db_type}", + "type": db_type, + "source": "env-var", + "signal": "DATABASE_URL", + "config": {"connection_string": "***"}, + }) + + return results + + +# --- Tests --- + + +class TestSnowflakeDetection: + def test_detected_with_account(self): + env = {"SNOWFLAKE_ACCOUNT": "myorg.us-east-1", "SNOWFLAKE_USER": "admin"} + result = detect_env_connections(env) + assert len(result) == 1 + assert result[0]["type"] == "snowflake" + assert result[0]["signal"] == "SNOWFLAKE_ACCOUNT" + assert result[0]["config"]["account"] == "myorg.us-east-1" + assert result[0]["config"]["user"] == "admin" + + def test_full_config(self): + env = { + "SNOWFLAKE_ACCOUNT": "org.region", + "SNOWFLAKE_USER": "user", + "SNOWFLAKE_PASSWORD": "pass", + "SNOWFLAKE_WAREHOUSE": "COMPUTE_WH", + "SNOWFLAKE_DATABASE": "ANALYTICS", + "SNOWFLAKE_SCHEMA": "PUBLIC", + "SNOWFLAKE_ROLE": "SYSADMIN", + } + result = detect_env_connections(env) + assert len(result) == 1 + assert len(result[0]["config"]) == 7 + # Password should be redacted + assert result[0]["config"]["password"] == "***" + # Non-sensitive values should be present + assert result[0]["config"]["account"] == "org.region" + + def test_not_detected_without_account(self): + env = {"SNOWFLAKE_USER": "admin", "SNOWFLAKE_PASSWORD": "pass"} + result = detect_env_connections(env) + snowflake = [r for r in result if r["type"] == "snowflake"] + assert len(snowflake) == 0 + + +class TestBigQueryDetection: + def test_detected_with_credentials(self): + env = {"GOOGLE_APPLICATION_CREDENTIALS": "/path/to/creds.json"} + result = detect_env_connections(env) + bq = [r for r in result if r["type"] == "bigquery"] + assert len(bq) == 1 + assert bq[0]["config"]["credentials_path"] == "/path/to/creds.json" + + def test_detected_with_bigquery_project(self): + env = {"BIGQUERY_PROJECT": "my-project-123"} + result = detect_env_connections(env) + bq = [r for r in result if r["type"] == "bigquery"] + assert len(bq) == 1 + assert bq[0]["config"]["project"] == "my-project-123" + + def 
test_detected_with_gcp_project(self): + env = {"GCP_PROJECT": "my-project"} + result = detect_env_connections(env) + bq = [r for r in result if r["type"] == "bigquery"] + assert len(bq) == 1 + + def test_bigquery_project_preferred_over_gcp_project(self): + env = { + "BIGQUERY_PROJECT": "bq-proj", + "GCP_PROJECT": "gcp-proj", + "GOOGLE_APPLICATION_CREDENTIALS": "/creds.json", + } + result = detect_env_connections(env) + bq = [r for r in result if r["type"] == "bigquery"] + assert bq[0]["config"]["project"] == "bq-proj" + + +class TestDatabricksDetection: + def test_detected_with_host(self): + env = {"DATABRICKS_HOST": "adb-123.azuredatabricks.net"} + result = detect_env_connections(env) + db = [r for r in result if r["type"] == "databricks"] + assert len(db) == 1 + assert db[0]["config"]["server_hostname"] == "adb-123.azuredatabricks.net" + + def test_detected_with_server_hostname(self): + env = {"DATABRICKS_SERVER_HOSTNAME": "dbc-abc.cloud.databricks.com"} + result = detect_env_connections(env) + db = [r for r in result if r["type"] == "databricks"] + assert len(db) == 1 + + def test_host_preferred_over_server_hostname(self): + env = {"DATABRICKS_HOST": "host1", "DATABRICKS_SERVER_HOSTNAME": "host2"} + result = detect_env_connections(env) + db = [r for r in result if r["type"] == "databricks"] + assert db[0]["config"]["server_hostname"] == "host1" + + +class TestPostgresDetection: + def test_detected_with_pghost(self): + env = {"PGHOST": "localhost", "PGDATABASE": "mydb"} + result = detect_env_connections(env) + pg = [r for r in result if r["type"] == "postgres"] + assert len(pg) == 1 + assert pg[0]["config"]["host"] == "localhost" + + def test_detected_with_database_url_postgres_scheme(self): + env = {"DATABASE_URL": "postgresql://user:pass@localhost:5432/mydb"} + result = detect_env_connections(env) + pg = [r for r in result if r["type"] == "postgres"] + assert len(pg) == 1 + assert pg[0]["signal"] == "DATABASE_URL" + assert pg[0]["config"]["connection_string"] == "***" + + def test_database_url_mysql_scheme(self): + env = {"DATABASE_URL": "mysql://user:pass@localhost:3306/mydb"} + result = detect_env_connections(env) + my = [r for r in result if r["type"] == "mysql"] + assert len(my) == 1 + assert my[0]["signal"] == "DATABASE_URL" + + def test_database_url_does_not_duplicate(self): + env = {"PGHOST": "localhost", "DATABASE_URL": "postgresql://user:pass@host/db"} + result = detect_env_connections(env) + pg = [r for r in result if r["type"] == "postgres"] + assert len(pg) == 1 + assert pg[0]["signal"] == "PGHOST" + + def test_detected_with_pgdatabase_only(self): + env = {"PGDATABASE": "analytics"} + result = detect_env_connections(env) + pg = [r for r in result if r["type"] == "postgres"] + assert len(pg) == 1 + + +class TestMysqlDetection: + def test_detected_with_host(self): + env = {"MYSQL_HOST": "mysql.example.com", "MYSQL_DATABASE": "shop"} + result = detect_env_connections(env) + my = [r for r in result if r["type"] == "mysql"] + assert len(my) == 1 + + def test_not_detected_without_signals(self): + env = {"MYSQL_USER": "root", "MYSQL_PASSWORD": "secret"} + result = detect_env_connections(env) + my = [r for r in result if r["type"] == "mysql"] + assert len(my) == 0 + + +class TestRedshiftDetection: + def test_detected_with_host(self): + env = {"REDSHIFT_HOST": "cluster.abc.us-east-1.redshift.amazonaws.com"} + result = detect_env_connections(env) + rs = [r for r in result if r["type"] == "redshift"] + assert len(rs) == 1 + + +class TestNoEnvVars: + def test_empty_env(self): + result 
= detect_env_connections({}) + assert result == [] + + def test_unrelated_env_vars(self): + env = {"HOME": "/home/user", "PATH": "/usr/bin", "EDITOR": "vim"} + result = detect_env_connections(env) + assert result == [] + + def test_empty_signal_values_ignored(self): + env = {"SNOWFLAKE_ACCOUNT": "", "PGHOST": ""} + result = detect_env_connections(env) + assert result == [] + + +class TestMultipleDetections: + def test_multiple_warehouses(self): + env = { + "SNOWFLAKE_ACCOUNT": "org.region", + "PGHOST": "localhost", + "DATABRICKS_HOST": "adb.net", + } + result = detect_env_connections(env) + types = {r["type"] for r in result} + assert "snowflake" in types + assert "postgres" in types + assert "databricks" in types + assert len(result) == 3 + + def test_all_warehouses_detected(self): + env = { + "SNOWFLAKE_ACCOUNT": "org", + "GOOGLE_APPLICATION_CREDENTIALS": "/creds.json", + "DATABRICKS_HOST": "host", + "PGHOST": "localhost", + "MYSQL_HOST": "mysql", + "REDSHIFT_HOST": "redshift", + } + result = detect_env_connections(env) + assert len(result) == 6 + + +class TestConnectionNames: + def test_name_format(self): + env = {"SNOWFLAKE_ACCOUNT": "org"} + result = detect_env_connections(env) + assert result[0]["name"] == "env_snowflake" + + def test_source_is_env_var(self): + env = {"PGHOST": "localhost"} + result = detect_env_connections(env) + assert result[0]["source"] == "env-var" + + +class TestPartialConfig: + def test_only_populated_keys_in_config(self): + env = {"SNOWFLAKE_ACCOUNT": "org"} + result = detect_env_connections(env) + # Only account should be in config, not user/password/etc + assert "account" in result[0]["config"] + assert "password" not in result[0]["config"] + assert "user" not in result[0]["config"]
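+
+
+class TestDefaultEnvironment:
+    # Illustrative sketch, not part of the canonical TypeScript suite: exercises the
+    # default path where env is omitted and os.environ is read instead. Uses pytest's
+    # built-in monkeypatch fixture; "org.region" is an arbitrary placeholder value.
+    def test_reads_os_environ_when_env_omitted(self, monkeypatch):
+        monkeypatch.setenv("SNOWFLAKE_ACCOUNT", "org.region")
+        result = detect_env_connections()
+        # The real environment may contain other warehouse signals, so only assert
+        # that the Snowflake connection seeded above is among the detections.
+        assert any(r["type"] == "snowflake" and r["signal"] == "SNOWFLAKE_ACCOUNT" for r in result)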