
Commit c5e9cfa

anandgupta42 and claude authored
feat: plan agent refinement, feature discovery (#556)
* **feat: plan agent refinement, feature discovery, and telemetry instrumentation**

  **Plan Agent Improvements**
  - Two-step plan approach: outline first, confirm, then expand
  - Plan refinement loop: edit instead of restart (capped at 5 revisions)
  - Approval detection for common phrases ("looks good", "proceed", "lgtm")
  - Bug fix: `agent_outcome` telemetry for plan sessions emitted null for `duration_ms`, `tool_calls`, `generations` — `sessionAgentName` was set too late

  **Feature Discovery & Progressive Disclosure**
  - Post-warehouse-connect contextual suggestions (schema, SQL, lineage, PII)
  - Progressive: `sql_execute` → `sql_analyze` → `schema_inspect` → `lineage_check`
  - dbt auto-detection → recommends `/dbt-develop`, `/dbt-troubleshoot`
  - All suggestions try/catch wrapped, never block core flow

  **Telemetry**
  - New `plan_revision` event: revision count + action (refine/approve/reject)
  - New `feature_suggestion` event: suggestions shown, type, warehouse
  - `skill_used.trigger`: user_command | llm_selected | auto_suggested
  - `classifySkillTrigger()` helper

  **Tests**
  - 140 new tests across 4 files, all passing

* **fix: address CodeRabbit review comments**
  - Validate `sessionId` before telemetry track in post-connect-suggestions
  - Rename shadowed `ctx` to `suggestionCtx` in warehouse-add.ts
  - Add rejection detection ("no", "stop", "reject", etc.) to plan revision action — telemetry now emits "reject" in addition to "approve"/"refine"
  - Add `warehouseType` to `trackSuggestions` in sql-analyze and schema-inspect for telemetry payload consistency
  - Use hyphenated skill names in telemetry (`dbt-develop` not `dbt_develop`) to match user-facing `/dbt-develop` command names
  - Derive `suggestionsShown` from `suggestionCtx` to prevent drift
  - DRY up event types list in telemetry.test.ts (`ALL_EVENT_TYPES` constant)
  - Remove `@ts-nocheck` from plan-skill-telemetry.test.ts
  - Add runtime `Telemetry.track()` verification to type-only tests
  - Use semantic regex in plan-refinement tests instead of brittle string matches
  - Fix trackBlock extraction with generous region window instead of fixed slice
  - Remove duplicate regression tests (covered in telemetry.test.ts)

* **feat: e2e tests, performance benchmarks, and UX gap fixes**

  **E2E Integration Tests (41 tests)**
  - `feature-discovery-e2e.test.ts` (29 tests): Full warehouse-add → suggestions flow, progressive disclosure chain, plan refinement session, telemetry event validation with mocked Dispatcher
  - `performance-regression.test.ts` (12 tests): 1000x suggestion generation < 50ms, 10000x progressive lookup < 50ms, 100k phrase detection < 200ms, output determinism verification

  **UX Gap Fixes**
  - **Suggestion deduplication**: Progressive hints shown at most once per session per tool via `shownProgressiveSuggestions` Set. Running `sql_execute` 10 times no longer repeats the same tip.
  - **Approval false positives**: "yes, but change X" is now correctly classified as "refine", not "approve". Added `refinementQualifiers` (" but ", " however ", " except ", " change ", etc.) that override approval detection. "no" uses a `\bno\b` word boundary to avoid matching "know", "notion", etc.
  - **Revision cap communication**: When the 5-revision cap is hit, a synthetic message informs the LLM to tell the user. Telemetry tracks "cap_reached".
  - **Warehouse add latency**: Suggestion gathering now uses `Promise.all` + `Promise.race` with a 1500ms timeout. Slow schema/dbt checks are silently skipped instead of blocking the response.

* **test: plan layer safety e2e tests (68 tests)**

  Comprehensive user-perspective tests for the plan refinement changes:

  **Phrase Classification (42 tests)**
  - 14 approval cases ("looks good", "proceed", "lgtm", etc.)
  - 12 rejection cases ("no", "stop", "reject", etc.)
  - 8 pure refinement cases (feedback without approval/rejection signals)
  - 8 tricky edge cases:
    - "yes, but change X" → refine (qualifier overrides approval)
    - "no, I mean yes" → reject (rejection takes priority)
    - "I know this looks good" → approve ("know" ≠ "no" via word boundary)
    - Unicode input, multiline, special characters, empty string

  **Non-Plan Agent Safety (3 tests)**
  - Plan variables initialized before loop
  - Refinement block guarded by `agent.name === "plan"`
  - Plan file detection guarded by agent check

  **Session Agent Name Fix (2 tests)**
  - `sessionAgentName` set before early break conditions
  - `agent_outcome` telemetry uses `sessionAgentName`

  **Revision Cap (4 tests)**
  - Cap enforced at >= 5
  - Synthetic message communicates limit to LLM
  - Telemetry emits "cap_reached"
  - Synthetic message doesn't persist to DB

  **Adversarial (4 tests)**
  - 70k char input doesn't crash
  - Unicode, special chars, multiline handled

  **Import Safety (2 tests)**
  - post-connect-suggestions is self-contained (no heavy imports)
  - Progressive suggestion dedup works correctly

* **fix: add mongodb to devDependencies for typecheck resolution**

  The mongodb driver (#482) was added to optionalDependencies only, so `tsgo --noEmit` failed with TS2307 when checking the drivers package. Adding it to devDependencies ensures types are available during CI typecheck without making it a runtime requirement.

* **fix: track suggestion failures in warehouse-add telemetry**

  Per Surya's review — the catch block in warehouse-add now emits a `core_failure` event with `error_class: "internal"` and `input_signature: "post_connect_suggestions"` so we can monitor how often post-connect suggestions fail and why.

* **test: 125 simulated user scenarios for plan + suggestions**

  Exercises the real code paths with realistic inputs:
  - 84 phrase classification scenarios: approval (20), rejection (24), refinement (16), qualifier overrides (12), word boundary (12)
  - 13 warehouse suggestion configs: 8 warehouse types x indexed/dbt/multi
  - 8 progressive disclosure chains: full progression, dedup, interleaved
  - 6 revision cap simulations: mixed actions, all rejections, cap behavior
  - 5 performance stress: 10k classifications < 500ms, determinism checks
  - 10 adversarial: unicode lookalikes, 50KB, SQL injection, null bytes

  All 125 pass in 255ms.

* **test: 40 real tool execution simulations with mocked Dispatcher**

  Spawns actual tool execute() functions (not mocked) with registered Dispatcher handlers to simulate real user tool invocations:

  **Warehouse Add (18 scenarios)**
  - 8 warehouse types with post-connect suggestions
  - Schema indexed/not-indexed variations
  - Multi-warehouse data_diff suggestion
  - Failure modes: add fails, throws, missing type
  - Resilience: schema.cache_status fails, warehouse.list fails
  - Timeout: slow dispatcher (3s) races against 1.5s timeout

  **SQL Execute (6 scenarios)**
  - First call gets suggestion, subsequent calls deduped
  - 10 consecutive calls — only first has hint
  - Failure and empty result handling
  - Blocked query (DROP DATABASE) throws

  **SQL Analyze (4 scenarios)**
  - First call suggests schema_inspect, second deduped
  - Parse error and analyzer failure handling

  **Schema Inspect (3 scenarios)**
  - First call suggests lineage_check, second deduped
  - Failure handling

  **Schema Index (3 scenarios)**
  - Lists all capabilities on first call
  - Dedup on second, failure handling

  **Full User Journeys (4 scenarios)**
  - Complete 5-tool chain: warehouse_add → schema_index → sql_execute → sql_analyze → schema_inspect
  - 20 repeated queries with dedup verification
  - Interleaved tool calls with independent dedup
  - All dispatchers failing — warehouse add still succeeds

  **Performance (2 scenarios)**
  - Warehouse add < 500ms with fast dispatchers
  - 50 consecutive sql_execute < 2s

* **docs: document plan refinement, feature discovery, and new telemetry events**
  - agent-modes.md: Expanded Plan section with two-step workflow (outline then expand), refinement loop (approve/refine/reject), 5-revision cap, and example conversation
  - telemetry.md: Added plan_revision and feature_suggestion events, updated skill_used with trigger field
  - warehouses.md: Added "Post-Connection Suggestions" section covering progressive disclosure chain and once-per-session dedup
  - getting-started.md: Added feature suggestions mention after /discover

* **fix: replace mock.module() with spyOn to prevent cross-file test pollution**

  Root cause of 112 CI failures: three test files used Bun's mock.module() to replace the Telemetry module with a partial mock. mock.module() is process-global in Bun — it persisted across ALL test files, causing Telemetry functions (categorizeToolName, classifyError, bucketCount, computeInputSignature, maskArgs, etc.) to be undefined for any test file loaded after the mock.

  Fix: replaced mock.module() with spyOn() + afterEach(mock.restore()) in post-connect-suggestions.test.ts, performance-regression.test.ts, and feature-discovery-e2e.test.ts. spyOn is per-function and properly cleaned up between tests.

  Result: 5740 pass, 0 fail (was 112 fail) across 285 files.

* **fix: reset dedup state in tests + replace passwords with fake values**
  - post-connect-suggestions.test.ts: add resetShownSuggestions() in beforeEach to prevent cross-file dedup state from causing the "after sql_execute suggests sql_analyze" test to fail in CI
  - connections.test.ts: replace "secret", "pw123", "ssh-pw", "access-token-123" with obviously fake test values to resolve GitGuardian false positives

* **fix: replace all credential-like test values for GitGuardian**

  Replace remaining flagged values: "passphrase", "oauth-token", "client-secret", "my_secret", "my-passphrase", dapi_secret with obviously fake test-prefixed placeholders.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent: ffe03b4

27 files changed (+3673 −115 lines)

bun.lock

Lines changed: 36 additions & 9 deletions

docs/docs/configure/warehouses.md

Lines changed: 13 additions & 0 deletions
````diff
@@ -365,3 +365,16 @@ Testing connection to prod-snowflake (snowflake)...
 Warehouse: COMPUTE_WH
 Database: ANALYTICS
 ```
+
+## Post-Connection Suggestions
+
+After you successfully connect a warehouse, altimate suggests next steps to help you get the most out of your connection. Suggestions are shown progressively based on what you've already done:
+
+1. **Index your schemas** — populate the schema cache for autocomplete and context-aware analysis
+2. **Run SQL analysis** — scan your query history for anti-patterns and optimization opportunities
+3. **Inspect schema structure** — review tables, columns, and relationships
+4. **Check lineage** — trace column-level data flow across your models
+
+If altimate detects a dbt project in your workspace, it also recommends relevant dbt skills (`/dbt-develop`, `/dbt-troubleshoot`, `/dbt-analyze`).
+
+Each suggestion is shown **once per session** — dismissing or acting on a suggestion removes it from the queue. You can also run a suggested action later via its corresponding tool or slash command.
````
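The once-per-session behavior described above amounts to a small per-session set keyed by tool name. A minimal sketch — the `shownProgressiveSuggestions` Set and `resetShownSuggestions()` names appear in the commit messages, but the surrounding map and function are hypothetical:

```typescript
// Tools whose hint has already been shown this session.
const shownProgressiveSuggestions = new Set<string>();

// Hypothetical next-step map mirroring the progressive chain in the docs.
const nextStep: Record<string, string> = {
  sql_execute: "sql_analyze",
  sql_analyze: "schema_inspect",
  schema_inspect: "lineage_check",
};

function progressiveSuggestionFor(tool: string): string | null {
  const next = nextStep[tool];
  if (!next || shownProgressiveSuggestions.has(tool)) return null;
  shownProgressiveSuggestions.add(tool); // each hint shows at most once per session
  return `Try \`${next}\` next.`;
}

// Clears dedup state between sessions (mirrors the test helper mentioned
// in the commit messages).
function resetShownSuggestions(): void {
  shownProgressiveSuggestions.clear();
}
```

Because the set lives for the whole session, running `sql_execute` ten times produces the hint only on the first call.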

docs/docs/data-engineering/agent-modes.md

Lines changed: 45 additions & 0 deletions
````diff
@@ -142,6 +142,51 @@ altimate --agent plan

 Plan mode restricts the agent to reading files and editing plan files only. No SQL, no bash, no file modifications. Use this to outline an approach before switching to builder to execute it.

+### Two-step workflow
+
+Plan mode uses a two-step approach to keep you in control:
+
+1. **Outline** — The agent produces a short outline (3-5 bullet points) summarizing what it intends to do.
+2. **Expand** — After you confirm, the agent expands the outline into a full, detailed plan.
+
+This prevents wasted effort on plans that are heading in the wrong direction.
+
+### Refinement loop
+
+You don't have to start over if the plan isn't quite right. After the agent presents its outline or full plan, you can:
+
+- **Approve** — Accept the plan as-is (e.g., "looks good", "approved", "let's go").
+- **Refine** — Edit specific parts and ask the agent to revise (e.g., "change step 3 to use incremental instead of table").
+- **Reject** — Discard the plan entirely (e.g., "scrap this", "start over", "reject").
+
+Refinements are capped at **5 revisions per session** to avoid endless loops. After the cap is reached, the agent asks you to approve or reject the current version.
+
+### Example conversation
+
+```
+You: Plan a migration of our raw_events table from a view to an incremental model
+
+Plan: Here's my proposed approach:
+• Inspect the current raw_events view and its downstream dependencies
+• Design an incremental model with a timestamp-based strategy
+• Add schema tests for the new model
+• Create a cutover checklist for swapping the view to the model
+• Update the dbt DAG documentation
+
+You: Change the second bullet — use insert_overwrite instead of timestamp-based
+
+Plan: Updated outline:
+• Inspect the current raw_events view and its downstream dependencies
+• Design an incremental model with an insert_overwrite strategy
+• Add schema tests for the new model
+• Create a cutover checklist for swapping the view to the model
+• Update the dbt DAG documentation
+
+You: Approved
+
+Plan: [expands into a full detailed plan]
+```
+
 ---

 ## SQL Write Access Control
````
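The approve/refine/reject detection behind this refinement loop can be sketched from details scattered through the commit messages: rejection takes priority, "no" is matched with a `\bno\b` word boundary so "know" doesn't trigger it, and `refinementQualifiers` like " but " override approval ("yes, but change X" → refine). The function name and exact phrase lists below are assumptions, not the shipped implementation.

```typescript
type PlanAction = "approve" | "refine" | "reject";

// Example phrases from the commit messages; the real lists are longer.
const approvals = ["looks good", "proceed", "lgtm", "approved", "let's go"];
const rejections = [/\bno\b/, /\bstop\b/, /\breject\b/, /scrap this/, /start over/];
const refinementQualifiers = [" but ", " however ", " except ", " change "];

function classifyPlanAction(message: string): PlanAction {
  const text = message.toLowerCase();
  // Rejection takes priority; \bno\b avoids matching "know" or "notion".
  if (rejections.some((r) => r.test(text))) return "reject";
  const approved = approvals.some((a) => text.includes(a));
  // "yes, but change X" carries feedback, so a qualifier overrides approval.
  const qualified = refinementQualifiers.some((q) => text.includes(q));
  if (approved && !qualified) return "approve";
  // Anything else is treated as feedback to fold into a revision.
  return "refine";
}
```

This ordering reproduces the edge cases listed in the test descriptions: "no, I mean yes" rejects, "I know this looks good" approves, and plain feedback without any signal falls through to refine.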

docs/docs/getting-started.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -60,7 +60,7 @@ Set up warehouse connections so altimate can query your data platform. You have
 4. **Offers to configure connections** and walks you through adding and testing each discovered warehouse
 5. **Indexes schemas** to populate the schema cache for autocomplete and context-aware analysis

-Once complete, altimate indexes your schemas and detects your tooling, enabling schema-aware autocomplete and context-rich analysis.
+Once complete, altimate indexes your schemas and detects your tooling, enabling schema-aware autocomplete and context-rich analysis. After connecting, you'll see feature suggestions tailored to your warehouse type — such as indexing schemas, running SQL analysis, or checking lineage. These appear progressively and each is shown once per session. See [Post-Connection Suggestions](configure/warehouses.md#post-connection-suggestions) for details.

 ### Option B: Manual configuration
```

docs/docs/reference/telemetry.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -33,7 +33,9 @@ We collect the following categories of events:
 | `error_recovered` | Successful recovery from a transient error (error type, strategy, attempt count) |
 | `mcp_server_census` | MCP server capabilities after connect (tool and resource counts, but no tool names) |
 | `context_overflow_recovered` | Context overflow is handled (strategy) |
-| `skill_used` | A skill is loaded (skill name and source — `builtin`, `global`, or `project` — no skill content) |
+| `skill_used` | A skill is loaded (skill name, source — `builtin`, `global`, or `project` — and trigger — `user`, `auto`, or `suggestion` — no skill content) |
+| `plan_revision` | A plan revision occurs in Plan mode (revision_number, action: `refine`, `approve`, `reject`, or `cap_reached`) |
+| `feature_suggestion` | A post-connection feature suggestion is shown (suggestion_type, suggestions_shown, warehouse_type — no user input) |
 | `sql_execute_failure` | A SQL execution fails (warehouse type, query type, error message, PII-masked SQL — no raw values) |
 | `core_failure` | An internal tool error occurs (tool name, category, error class, truncated error message, PII-safe input signature, and optionally masked arguments — no raw values or credentials) |
 | `first_launch` | Fired once on first CLI run after installation. Contains version and is_upgrade flag. No PII. |
```
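For concreteness, the `plan_revision` row implies a payload carrying a revision number and an action. The shape below is inferred from the event table and the commit messages only; the type names and the builder function are hypothetical, not the actual telemetry code.

```typescript
// Action values from the plan_revision event description.
type PlanRevisionAction = "refine" | "approve" | "reject" | "cap_reached";

// Hypothetical payload shape for the plan_revision event.
interface PlanRevisionEvent {
  event: "plan_revision";
  revision_number: number;
  action: PlanRevisionAction;
}

function planRevisionEvent(revision: number, action: PlanRevisionAction): PlanRevisionEvent {
  return { event: "plan_revision", revision_number: revision, action };
}
```

A session that hits the revision cap would emit, for example, `planRevisionEvent(5, "cap_reached")` — no plan content or user input is included.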

packages/drivers/package.json

Lines changed: 3 additions & 0 deletions
```diff
@@ -8,6 +8,9 @@
     "./*": "./src/*.ts"
   },
   "files": ["src"],
+  "devDependencies": {
+    "mongodb": "^6.0.0"
+  },
   "optionalDependencies": {
     "pg": "^8.0.0",
     "snowflake-sdk": "^2.0.3",
```
