Commit f94f0cb

anandgupta42 and claude committed
docs: add comprehensive training guide with scenarios and limitations
- New `data-engineering/training/index.md` (350+ lines):
  - Quick start with 3 entry points (trainer mode, inline corrections, /train skill)
  - Deep dive into all 4 trainer workflows (scan, validate, teach, gap analysis)
  - 5 comprehensive scenarios: new project onboarding, post-incident learning, quarterly review, business domain teaching, pre-migration documentation
  - Explicit limitations section (not a hard gate, budget limits, no auto-learning, heuristic validation, no conflict resolution, no version history)
  - Full reference tables for tools, skills, limits, and feature flag
- Updated `agent-modes.md`: add Researcher and Trainer mode sections with examples, capabilities, and "when to use" guidance
- Updated `getting-started.md`: add training link to "Next steps"
- Updated `mkdocs.yml`: add Training nav section under Data Engineering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 335dcb8 commit f94f0cb

4 files changed, +649 -2 lines changed

docs/docs/data-engineering/agent-modes.md

Lines changed: 143 additions & 1 deletion
@@ -1,6 +1,16 @@
 # Agent Modes

-altimate runs in one of four specialized modes. Each mode has different permissions, tool access, and behavioral guardrails.
+altimate runs in one of seven specialized modes. Each mode has different permissions, tool access, and behavioral guardrails.
+
+| Mode | Access | Purpose |
+|---|---|---|
+| **Builder** | Read/Write | Create and modify data pipelines |
+| **Analyst** | Read-only | Safe exploration and cost analysis |
+| **Validator** | Read + Validate | Data quality and integrity checks |
+| **Migrator** | Cross-warehouse | Dialect translation and migration |
+| **Researcher** | Read-only + Parallel | Deep multi-step investigations |
+| **Trainer** | Read-only + Training | Teach your AI teammate |
+| **Executive** | Read-only | Business-friendly reporting (no SQL jargon) |

 ## Builder

@@ -210,3 +220,135 @@ Migrator:
 | PostgreSQL | Snowflake, BigQuery, Databricks |
 | MySQL | PostgreSQL, Snowflake |
 | SQL Server | PostgreSQL, Snowflake |
+
+---
+
+## Researcher
+
+**Read-only + parallel investigation. For complex analytical questions.**
+
+```bash
+altimate --agent researcher
+```
+
+Researcher mode follows a 4-phase protocol for thorough investigation:
+
+1. **Plan** — Outline questions, data sources, and tool sequence
+2. **Gather** — Execute investigation steps, parallelize where possible
+3. **Analyze** — Cross-reference findings, identify root causes
+4. **Report** — Produce structured report with evidence and recommendations
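
Reviewer note: the four phases listed above can be pictured as a small pipeline. The sketch below is only an illustration in Python and is not how altimate implements Researcher mode. Every name in it (`plan`, `gather`, `analyze`, `report`, `investigate`, and the step labels) is hypothetical, and `run_step` stands in for whatever tool call a real investigation would make.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(question):
    # Phase 1 (Plan): decide which steps to run. Step names are hypothetical.
    return ["credit_trends", "expensive_queries", "recent_commits"]


def gather(steps, run_step):
    # Phase 2 (Gather): run independent steps in parallel.
    # pool.map returns results in the same order as the input steps.
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        results = list(pool.map(run_step, steps))
    return dict(zip(steps, results))


def analyze(evidence):
    # Phase 3 (Analyze): keep only the steps that produced a finding.
    return [f"{step}: {finding}" for step, finding in evidence.items() if finding]


def report(question, findings):
    # Phase 4 (Report): render a short structured report as plain text.
    lines = [f"# Investigation Report: {question}", "", "## Key Findings"]
    lines += [f"{i}. {finding}" for i, finding in enumerate(findings, 1)]
    return "\n".join(lines)


def investigate(question, run_step):
    # Chain the four phases in order.
    steps = plan(question)
    evidence = gather(steps, run_step)
    return report(question, analyze(evidence))
```

Passing a dictionary lookup such as `{"credit_trends": "up 38%"}.get` as `run_step` is enough to try the flow without any warehouse access.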
+
+### Example: Investigate a cost spike
+
+```
+You: Why did our Snowflake costs spike last week?
+
+Researcher: Let me plan the investigation.
+
+Investigation Plan:
+1. Pull credit consumption trends (finops_analyze_credits)
+2. Identify top expensive queries (finops_expensive_queries)
+3. Check for deployment changes (git log)
+4. Correlate query patterns with cost changes
+
+Proceeding...
+
+> finops_analyze_credits prod-snowflake --days 14
+Last 7 days: 847 credits (↑ 38% vs prior week)
+Top warehouse: TRANSFORM_WH (523 credits, 62%)
+
+> finops_expensive_queries prod-snowflake --days 7
+1. Full table scan on events (89 credits, 47 runs)
+2. New model fct_attribution (45 credits, added 3 days ago)
+
+> bash: git log --oneline --since="7 days ago" -- models/
+abc1234 feat: add fct_attribution model
+def5678 fix: remove partition filter from stg_events
+
+# Investigation Report
+
+## Summary
+Costs increased 38% due to two changes deployed 3 days ago.
+
+## Key Findings
+1. fct_attribution model (new) scans 2.1B events rows without partition filter — 45 credits/week
+2. Partition filter was accidentally removed from stg_events in commit def5678 — caused 47 full scans
+
+## Recommendations
+1. [ ] Add date partition filter to fct_attribution (estimated savings: 40 credits/week)
+2. [ ] Restore partition filter in stg_events (estimated savings: 60 credits/week)
+```
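
Reviewer note: the percentages in the example transcript above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the "38%" is a plain week-over-week change and the "62%" is the warehouse's share of the 7-day total. The helper names are hypothetical and not part of any altimate tool.

```python
def prior_week_credits(current, pct_increase):
    # Invert a week-over-week increase to recover the implied prior-week total.
    return current / (1 + pct_increase / 100)


def pct_change(prior, current):
    # Week-over-week change as a percentage of the prior week.
    return 100 * (current - prior) / prior


def share_pct(part, total):
    # One component's share of a total, rounded to a whole percent.
    return round(100 * part / total)


current = 847                              # "Last 7 days: 847 credits"
prior = prior_week_credits(current, 38)    # implied prior week, about 614 credits
share = share_pct(523, current)            # TRANSFORM_WH share of the total

print(round(prior), round(pct_change(prior, current)), share)
```

Under these assumptions the figures in the transcript are internally consistent: 523 of 847 credits rounds to 62%, and a 38% rise to 847 implies roughly 614 credits in the prior week.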
+
+### What researcher can do
+
+- Run all read-only SQL and schema tools
+- Use `bash` for git history and code analysis
+- Launch parallel sub-investigations with the `task` tool
+- Access `websearch` and `webfetch` for external research
+- Save discoveries as training for future sessions
+
+### When to use researcher vs analyst
+
+| Use Analyst for... | Use Researcher for... |
+|---|---|
+| Quick data exploration | Multi-step investigations |
+| Single-query answers | Root cause analysis |
+| Schema browsing | Cross-referencing multiple data sources |
+| Ad-hoc cost checks | Comprehensive audits with reports |
+
+---
+
+## Trainer
+
+**Read-only + training management. For teaching your AI teammate.**
+
+```bash
+altimate --agent trainer
+```
+
+Trainer mode is dedicated to building your AI's knowledge base. It cannot modify project files — it can only read, scan, validate, and manage training entries.
+
+### Example: Scan for patterns and teach
+
+```
+You: Scan my project for patterns I should teach you
+
+Trainer:
+> training_scan target:models
+Scanned 24 files in models/
+
+Discovered Patterns:
+- Naming: `stg_*` (12 files), `fct_*` (6 files), `dim_*` (4 files)
+- Common: CTEs in 92%, {{ source() }} in 50%, incremental in 25%
+
+Suggested Next Steps:
+Review the patterns above and tell me which ones to save.
+
+You: Save the naming convention
+
+Trainer: Saved pattern "dbt-model-naming" to project training.
+Content: "stg_{source}__{entity}, int_{entity}__{verb}, fct_{entity}, dim_{entity}"
+Training usage: 180/6000 chars (3% full).
+This will be shared with your team when committed to git.
+```
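
Reviewer note: the naming counts and the usage line in the example above are easy to reproduce by hand. The sketch below is an assumption about what such a summary could look like and is not the `training_scan` implementation. `count_by_pattern` and `usage_line` are hypothetical helpers, and the 6000-character budget is taken from the example output only.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Glob patterns matching the prefixes reported in the example transcript.
PATTERNS = ["stg_*", "fct_*", "dim_*"]


def count_by_pattern(filenames, patterns=PATTERNS):
    # Count each file under the first pattern it matches, if any.
    counts = Counter()
    for name in filenames:
        for pattern in patterns:
            if fnmatchcase(name, pattern):
                counts[pattern] += 1
                break
    return counts


def usage_line(used_chars, budget_chars):
    # Format used characters against a budget as a whole-percent summary.
    pct = round(100 * used_chars / budget_chars)
    return f"Training usage: {used_chars}/{budget_chars} chars ({pct}% full)."


print(usage_line(180, 6000))
```

Note that the three prefixes in the transcript account for 22 of the 24 scanned files, so a real scan would also need to decide how to report files that match no pattern.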
+
+### What trainer can do
+
+- Scan codebases for patterns (`training_scan`)
+- Validate training against actual code (`training_validate`)
+- Save, list, and remove training entries
+- Guide users through systematic knowledge capture
+- Analyze training gaps and suggest what to teach next
+
+### When to use trainer mode
+
+| Scenario | Why trainer mode |
+|---|---|
+| New project setup | Systematically scan and extract conventions |
+| Team onboarding | Walk through existing training with explanations |
+| Post-incident review | Save lessons learned as rules |
+| Quarterly audit | Validate training, remove stale entries, consolidate |
+| Loading a style guide | Extract rules and standards from documentation |
+| Pre-migration prep | Document current patterns as context |
+
+For a comprehensive guide with scenarios and examples, see [Training Your AI Teammate](training/index.md).

0 commit comments
