Commit 6d56feb

anandgupta42 and claude authored
feat: add AI Teammate training system with learn-by-example patterns (#148)
* Add AI Teammate repositioning design document

Comprehensive design for repositioning altimate from "AI tool" to "AI teammate" — including trainable knowledge system (/teach, /train, /feedback), Deep Research mode for multi-step investigations, team memory that persists via git, and UX reframing from "agent modes" to "teammate roles."

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Enrich design doc with OpenClaw research and proactive behaviors

Add detailed competitive analysis from OpenClaw (self-improving memory, heartbeat scheduler, meet-users-where-they-are), Devin ($10.2B valuation, "junior partner" framing), and Factory AI (workflow embedding). Add proactive behaviors section with background monitors (cost alerts, freshness checks, schema drift, PII scanning) and auto-promotion of learned corrections.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Implement AI Teammate training system and Deep Research mode

Core training infrastructure built on top of existing memory system:

Training Store & Types:
- TrainingStore wraps MemoryStore with training-specific conventions
- Four knowledge kinds: pattern, rule, glossary, standard
- Structured metadata (applied count, source, acceptance tracking)
- Training blocks stored in .opencode/memory/training/ (git-committable)
- One person teaches, whole team benefits via git

Training Tools:
- training_save: Save learned patterns, rules, glossary, standards
- training_list: List all learned knowledge with applied counts
- training_remove: Remove outdated training entries

Training Skills:
- /teach: Learn patterns from example files in the codebase
- /train: Learn standards from documents or style guides
- /training-status: Dashboard of all learned knowledge

System Prompt Injection:
- Training knowledge injected alongside memory at session start
- Structured by kind: rules first, then patterns, standards, glossary
- Budget-limited to 6000 chars to control prompt size
- Zero LLM calls on startup — just reads files from disk

Deep Research Agent Mode:
- New "researcher" agent for multi-step investigations
- 4-phase protocol: Plan → Gather → Analyze → Report
- Read-only access to all warehouse, schema, FinOps tools
- Structured reports with evidence, root causes, action items

Agent Awareness:
- All agent prompts updated with training awareness section
- Agents offer to save corrections as rules when users correct behavior
- Training tools permitted in all agent modes

Tests:
- 88 new tests across 5 test files (types, store, prompt, tools, integration)
- All tests standalone (no Instance dependency)
- Full lifecycle tests: save → list → format → inject → remove
- Edge cases: budget limits, meta roundtrips, coexistence with memory

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Polish AI Teammate training UX: auto-lowercase names, update detection, budget visibility

- Fix researcher agent permissions: add training_save/remove (was read-only)
- Auto-lowercase + space-to-hyphen name transform in training_save (ARR → arr)
- Detect update vs new save, show "Updated" with preserved applied count
- Show training budget usage (chars/percent) on save, list, and remove
- Improve training_list: group by kind, show most-applied entries, budget %
- Improve training_remove: show available entries on not-found, applied count
- Show similar entry names in duplicate warnings (not just count)
- Raise content limit from 1800 to 2500 chars
- Export TRAINING_BUDGET constant, add budgetUsage() to TrainingPrompt
- Add 30 new tests: auto-lowercase, update detection, budget overflow, name collision, scale (80 entries), improved messaging
- All 118 training tests + 305 memory tests pass

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Enhance training UX: attribution, correction detection, priority sorting

- Builder prompt: add attribution instructions (cite training entries that influenced output), correction detection (explicit + implicit patterns), conflict flagging between contradictory training entries
- Add /teach, /train, /training-status to Available Skills list in builder prompt
- Sort training entries by applied count (descending) in prompt injection so most-used entries get priority within the 6000-char budget
- Restructure Teammate Training section with clear subsections

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Fix experience gaps from user journey simulations

Simulation findings and fixes:

1. training_save now echoes back saved content so user can verify what was captured (new saves show content preview, updates show old vs new diff)
2. When training limit is reached, error now lists existing entries sorted by applied count and suggests the least-applied entry for removal
3. Researcher prompt now documents training_save/remove permissions (was contradicting its own permissions by saying "read-only" while having write access to training)
4. Added 10 new tests: content echo, update diff, limit suggestion, special character preservation (SQL -->, Jinja, HTML comments, code blocks), priority sorting verification

Verified: --> in content does NOT corrupt meta block (false positive). The non-greedy regex terminates at the meta block's own --> correctly.

128 training tests + 305 memory tests all pass.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Add self-improvement loop: applied tracking, insights, staleness detection

OpenClaw-inspired self-improvement mechanisms:

1. Wire up incrementApplied at injection time — counters now actually increment once per session per entry (deduped via session-scoped set), making "Most Applied" dashboard and priority sorting meaningful
2. TrainingInsights module analyzes training metadata and surfaces:
   - Stale entries (7+ days old, never applied) — suggests cleanup
   - High-value entries (5+ applications) — highlights most impactful
   - Near-limit warnings (18-19 of 20 entries per kind)
   - Consolidation opportunities (3+ entries with shared name prefix)
3. Insights automatically shown in training_list output
4. 24 new tests covering all insight types, boundary conditions, session tracking dedup, and format output

152 training tests + 305 memory tests all pass.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* fix: add dedicated training feature flag and remove unused insight type

- Add `ALTIMATE_DISABLE_TRAINING` flag independent of memory's disable flag
- Use new flag in session prompt injection and tool registry
- Remove unused `budget-warning` insight type from `TrainingInsight`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reset training session tracking, add error logging, fix list truncation

- Call `TrainingPrompt.resetSession()` at session start (step === 1) to prevent applied counters from growing unbounded across sessions
- Add structured error logging to all three training tools
- Add truncation indicator (`...`) when training list preview is cut off

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use `.altimate-code/memory` as primary storage path with `.opencode` fallback

Memory store was hardcoded to `.opencode/memory/` but the config system already uses `.altimate-code` as primary with `.opencode` as fallback. Now checks for `.altimate-code/` directory first, falls back to `.opencode/`, and defaults to `.altimate-code/` for new projects. Result is cached per process to avoid repeated filesystem checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Trainer agent mode with pattern discovery and training validation

Add dedicated trainer mode — the 8th primary agent — for systematically building the AI teammate's knowledge base. Unlike inline corrections in other modes, trainer mode actively scans codebases, validates training against reality, and guides knowledge curation.

Changes:
- New `trainer` agent mode with read-only permissions (no write/edit/sql_execute)
- New `training_scan` tool: auto-discover patterns in models, SQL, config, tests, docs
- New `training_validate` tool: check training compliance against actual codebase
- Expand `TrainingKind` to 6 types: add `context` (background "why" knowledge) and `playbook` (multi-step procedures)
- Update `count()` to derive from enum (prevents drift when kinds change)
- Add KIND_HEADERS for context and playbook in prompt injection
- Update injection order: rules first, playbooks last (budget priority)
- Update training-save and training-list descriptions for new kinds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add comprehensive training guide with scenarios and limitations

- New `data-engineering/training/index.md` (350+ lines):
  - Quick start with 3 entry points (trainer mode, inline corrections, /train skill)
  - Deep dive into all 4 trainer workflows (scan, validate, teach, gap analysis)
  - 5 comprehensive scenarios: new project onboarding, post-incident learning, quarterly review, business domain teaching, pre-migration documentation
  - Explicit limitations section (not a hard gate, budget limits, no auto-learning, heuristic validation, no conflict resolution, no version history)
  - Full reference tables for tools, skills, limits, and feature flag
- Updated `agent-modes.md`: add Researcher and Trainer mode sections with examples, capabilities, and "when to use" guidance
- Updated `getting-started.md`: add training link to "Next steps"
- Updated `mkdocs.yml`: add Training nav section under Data Engineering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: increase training budget to 16K chars and rewrite docs as harness customization guide

Training is not a CLAUDE.md replacement — it's the mechanism by which users customize the data engineering harness for their specific project. The agent works WITH the user to discover what it needs to know, rather than requiring users to write perfect static instructions.

Changes:
- Increase TRAINING_BUDGET from 6000 to 16000 chars (removes the #1 criticism from user simulations — budget was worse than unlimited CLAUDE.md)
- Complete docs rewrite with correct positioning:
  - "Customizing Your AI Teammate" framing (not "Training Your AI Teammate")
  - Research-backed "why" section (40-70% knowledge omission, guided discovery)
  - Clear comparison table: training vs CLAUDE.md (complementary, not competing)
  - 6 real-world scenarios including Databricks, Salesforce quirks, cost spikes
  - Honest limitations section (not a linter, not an audit trail, not automatic)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: merge training into memory with context-aware relevance scoring

Replace two parallel injection systems (memory 8KB + training 16KB) with a single unified injection that scores blocks by relevance to the current agent.

How it works:
- All blocks (memory + training) loaded in one pass
- Each block scored: agent tag match (+10), training kind relevance per agent (+1-5), applied count bonus (+0-3), recency (+0-2), non-training base (+5)
- Builder sees rules/patterns first; analyst sees glossary/context first
- Budget is 20KB unified, filled greedily by score
- Training blocks still tracked with applied counts (fire-and-forget)

Architecture:
- memory/prompt.ts: new scoreBlock(), unified inject() with InjectionContext
- memory/types.ts: UNIFIED_INJECTION_BUDGET, AGENT_TRAINING_RELEVANCE weights
- session/prompt.ts: single inject call with agent context (was 2 separate)
- training/prompt.ts: deprecated, delegates to MemoryPrompt (backward compat)

No changes to: MemoryStore, TrainingStore, training tools, memory tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: cut training_scan and training_validate, simplify docs

Research from 8 independent evaluations + SkillsBench (7,308 test runs) found that compact focused context beats comprehensive docs by 20pp. The training system's value is in correction capture (2-sec saves) and team propagation (git sync) — not in regex scanning or keyword grep.

Removed:
- training_scan (255 lines) — regex pattern counting, not discovery
- training_validate (315 lines) — keyword grep, not validation

Simplified:
- trainer.txt: removed scan/validate workflows, focused on guided teaching and curation
- agent-modes.md: updated trainer section with correction-focused example
- training docs: complete rewrite with new pitch: "Correct the agent once. It remembers forever. Your team inherits it." Backed by SkillsBench research showing compact > comprehensive.

Net: -753 lines. 152 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove dead accepted/rejected fields, add training tips, expand limitations

Gaps found by simulation team:

1. Remove `accepted`/`rejected` counters from TrainingBlockMeta — they were never incremented anywhere in the codebase (dead code since inception)
2. Add 5 training discoverability tips to TUI tips (was 0 mentions in 152 tips)
3. Expand limitations section in docs with honest, complete list: context budget, 20/kind limit, no approval workflow, SQL-focused, git discipline required

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update site-wide docs for training and new agent modes

- Homepage: update from "Four agents" to "Seven agents" — add Researcher, Trainer, Executive cards with descriptions
- Getting Started: update training link to match new pitch "Corrections That Stick"
- Tools index: add Training row (3 tools + 3 skills) with link
- All references now consistent with simplified training system

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Sentry review findings — 7 bugs fixed

1. stripTrainingMeta/parseTrainingMeta regex: remove multiline `m` flag that could match user content starting with `<!-- training` mid-string (types.ts, store.ts)
2. training_save content limit: reduce from 2500 to 1800 chars to account for ~200 char metadata overhead against MemoryStore's 2048 char limit (training-save.ts)
3. injectTrainingOnly: change `break` to `continue` so budget-exceeding section headers skip to next kind instead of stopping all injection (memory/prompt.ts)
4. injectTrainingOnly: track itemCount and return empty string when no items injected (was returning header-only string, inflating budget reports) (memory/prompt.ts)
5. projectDir cache: replace module-level singleton with Map keyed by Instance.directory to prevent stale paths when AsyncLocalStorage context changes across concurrent requests (memory/store.ts)
6. budgetUsage side effect: already fixed — delegates to injectTrainingOnly which is read-only (no applied count increment). Sentry comments were against pre-refactor code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: CI failure + new Sentry finding — orphaned headers and agent test

1. Agent test: add researcher + trainer to "all disabled" test so it correctly expects "no primary visible agent" when ALL agents are off
2. Orphaned section headers: add pre-check that at least one entry fits before adding section header in both injectTrainingOnly and inject memory section (prevents header-only output inflating budget reports)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address multi-model code review findings

Fixes from 6-model consensus review (Claude + GPT + Gemini + Kimi + MiniMax + GLM-5):

1. training_remove: add name validation regex matching training_save (Gemini finding — prevents path traversal via malformed names)
2. training_save: improve name transform to strip ALL non-alphanumeric chars, not just whitespace (Gemini finding — "don't-use-float!" now becomes "don-t-use-float" instead of failing regex)
3. incrementApplied: replace silent `.catch(() => {})` with warning log (Kimi + GLM-5 consensus — fire-and-forget is by design but failures should be visible in logs for debugging)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address new Sentry findings — regex m flag and off-by-one budget check

1. formatTrainingEntry regex: remove multiline `m` flag that could match user content mid-string (memory/prompt.ts:82)
2. Memory block budget check: change `<` to `<=` so blocks that fit exactly into remaining budget are included (memory/prompt.ts:204)

3 prior Sentry findings already fixed in earlier commits:
- projectDir cache (Map keyed by Instance.directory)
- injectTrainingOnly header-only return (itemCount guard)
- orphaned section headers (first-entry pre-check)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 6-model consensus review — 4 remaining bugs

Fixes from consensus across Claude, GPT 5.2, Gemini 3.1, Kimi K2.5, MiniMax M2.5, and GLM-5:

1. parseTrainingMeta: check safeParse().success before accessing .data (GLM-5 + MiniMax consensus — accessing .data on failed parse returns undefined, could cause downstream errors)
2. Stale detection: use `e.updated` not `e.created` so entries updated recently aren't incorrectly flagged as stale (MiniMax finding)
3. training_list: pass scope/kind filter to count() so summary table matches the filtered entries list (GPT finding)
4. training_remove: show hint entries from same scope only, not all scopes (GPT + MiniMax finding)

Prior fixes already addressed: name validation on remove (Gemini), name transform punctuation (Gemini), silent incrementApplied catch (Kimi + GLM-5), regex m flag (MiniMax + Sentry).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
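
The unified injection described in the commit message (score each block, then fill the budget greedily) can be sketched roughly as follows. This is an illustrative sketch, not the code in `memory/prompt.ts`: the `Block` shape, the per-agent weight table, the thresholds that map applied counts and recency onto their bonus ranges, and the exact budget value are all assumptions. Only the bonus ranges, the `<=` fit check, and the skip-instead-of-stop behaviour come from the message itself.

```typescript
// Hypothetical shapes for illustration; the real types in memory/types.ts may differ.
type TrainingKind = "rule" | "pattern" | "standard" | "glossary" | "context" | "playbook";

interface Block {
  id: string;
  content: string;
  tags: string[];
  kind?: TrainingKind; // undefined for plain (non-training) memory blocks
  applied: number;
  updated: Date;
}

// Assumed weights in the +1..+5 range; the actual AGENT_TRAINING_RELEVANCE table is not shown in the commit.
const AGENT_TRAINING_RELEVANCE: Record<string, Partial<Record<TrainingKind, number>>> = {
  builder: { rule: 5, pattern: 5, standard: 3, glossary: 2, context: 1, playbook: 1 },
  analyst: { glossary: 5, context: 5, standard: 2, rule: 2, pattern: 1, playbook: 1 },
};

// "20KB unified" per the commit message; the exact constant is an assumption.
const UNIFIED_INJECTION_BUDGET = 20_000;

function scoreBlock(block: Block, agent: string, now: Date): number {
  let score = 0;
  if (block.tags.includes(agent)) score += 10; // agent tag match
  if (block.kind === undefined) {
    score += 5; // non-training base
  } else {
    score += AGENT_TRAINING_RELEVANCE[agent]?.[block.kind] ?? 1; // kind relevance, +1..+5
    score += Math.min(3, Math.floor(block.applied / 3)); // applied bonus, +0..+3 (assumed mapping)
  }
  const ageDays = (now.getTime() - block.updated.getTime()) / 86_400_000;
  score += ageDays <= 7 ? 2 : ageDays <= 30 ? 1 : 0; // recency, +0..+2 (assumed thresholds)
  return score;
}

function selectBlocks(
  blocks: Block[],
  agent: string,
  now: Date,
  budget: number = UNIFIED_INJECTION_BUDGET,
): Block[] {
  const ranked = [...blocks].sort((a, b) => scoreBlock(b, agent, now) - scoreBlock(a, agent, now));
  const chosen: Block[] = [];
  let remaining = budget;
  for (const block of ranked) {
    // `<=` so a block that fits exactly is included; an oversized block is skipped
    // rather than ending the loop, so smaller lower-ranked blocks can still fit.
    if (block.content.length <= remaining) {
      chosen.push(block);
      remaining -= block.content.length;
    }
  }
  return chosen;
}
```

With these assumed weights, a `rule` block outranks a `glossary` block for the builder agent and the order flips for the analyst, which is the behaviour the commit message describes.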
1 parent ac82e29 commit 6d56feb

46 files changed

+6107
-144
lines changed

.github/meta/commit.txt

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+docs: update site-wide docs for training and new agent modes
+
+- Homepage: update from "Four agents" to "Seven agents" — add Researcher,
+  Trainer, Executive cards with descriptions
+- Getting Started: update training link to match new pitch
+  "Corrections That Stick"
+- Tools index: add Training row (3 tools + 3 skills) with link
+- All references now consistent with simplified training system
+
+Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

.opencode/skills/teach/SKILL.md

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,54 @@
1+
---
2+
name: teach
3+
description: Teach your AI teammate a pattern by showing it an example file from your codebase
4+
---
5+
6+
# Teach
7+
8+
## Purpose
9+
Learn a reusable pattern from an example file. The user shows you a well-written artifact (model, query, config), and you extract the patterns worth following.
10+
11+
## Workflow
12+
13+
1. **Identify the file**: The user provides a file reference (e.g., `@models/staging/stg_orders.sql`). Read the file.
14+
15+
2. **Analyze patterns**: Extract the structural patterns, NOT the specific content. Focus on:
16+
- File structure and organization (sections, ordering)
17+
- Naming conventions (prefixes, suffixes, casing)
18+
- SQL patterns (CTE vs subquery, join style, column ordering)
19+
- dbt conventions (materialization, tests, config blocks)
20+
- Common boilerplate (headers, comments, imports)
21+
- Data type choices
22+
- Error handling patterns
23+
24+
3. **Present findings**: Show the user what you learned in a structured list. Be specific:
25+
- Good: "Column order: keys first, then dimensions, then measures, then timestamps"
26+
- Bad: "Good column ordering"
27+
28+
4. **Ask for confirmation**: Let the user confirm, modify, or reject your findings before saving.
29+
30+
5. **Save via training_save**: Use the `training_save` tool with:
31+
- `kind`: "pattern"
32+
- `name`: A descriptive slug (e.g., "staging-model", "incremental-config")
33+
- `content`: The extracted patterns as a concise, actionable checklist
34+
- `scope`: "project" (default — shared with team via git)
35+
- `source`: The file path you learned from
36+
- `citations`: Reference to the source file
37+
38+
## Important Guidelines
39+
40+
- Extract PATTERNS, not content. "Use `{{ source() }}` macro" is a pattern. "Query the orders table" is content.
41+
- Keep it concise — max 10 bullet points per pattern. If more are needed, split into multiple patterns.
42+
- Use the file's actual conventions, don't impose your own preferences.
43+
- If the file doesn't have clear patterns worth learning, say so honestly.
44+
- Do NOT make any LLM calls beyond the normal conversation flow — pattern extraction happens in your analysis, not via separate API calls.
45+
46+
## Usage Examples
47+
48+
```
49+
/teach @models/staging/stg_orders.sql
50+
/teach staging-model @models/staging/stg_customers.sql
51+
/teach @dbt_project.yml
52+
```
53+
54+
If the user provides a name (first argument before the @file), use that as the pattern name. Otherwise, infer a name from the file type and purpose.
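
As a rough illustration of step 5 in the `/teach` workflow, the arguments passed to `training_save` might be assembled like this. The sketch is hypothetical: `TrainingSaveArgs`, `normalizeName`, and `buildTeachArgs` are names invented here, the `citations` type is a guess, and the real tool schema is not shown in this diff. The name transform follows the behaviour described in the commit message (lowercase, non-alphanumeric characters collapsed to hyphens), the 1800-character limit is the value that message settles on, and the 10-bullet cap comes from the skill's guidelines.

```typescript
// Hypothetical argument shape; the real training_save schema may differ.
interface TrainingSaveArgs {
  kind: "pattern" | "rule" | "glossary" | "standard" | "context" | "playbook";
  name: string;
  content: string;
  scope: "project" | "global";
  source?: string;
  citations?: string[]; // assumed to be a list of file references
}

const MAX_CONTENT_CHARS = 1800; // content limit named in the commit message
const MAX_BULLETS = 10; // "max 10 bullet points per pattern"

// Lowercase, turn every run of non-alphanumeric characters into one hyphen, trim stray hyphens.
function normalizeName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function buildTeachArgs(name: string, filePath: string, bullets: string[]): TrainingSaveArgs {
  if (bullets.length > MAX_BULLETS) {
    throw new Error(`Too many bullets (${bullets.length}); split into multiple patterns`);
  }
  const content = bullets.map((b) => `- ${b}`).join("\n");
  if (content.length > MAX_CONTENT_CHARS) {
    throw new Error(`Content is ${content.length} chars; limit is ${MAX_CONTENT_CHARS}`);
  }
  return {
    kind: "pattern",
    name: normalizeName(name),
    content,
    scope: "project", // default: shared with the team via git
    source: filePath,
    citations: [filePath],
  };
}

const args = buildTeachArgs("Staging Model", "models/staging/stg_orders.sql", [
  "Column order: keys first, then dimensions, then measures, then timestamps",
  "Use the {{ source() }} macro for raw tables",
]);
console.log(args.name); // staging-model
```

Rejecting oversized input up front, rather than truncating it, keeps the saved checklist identical to what the user confirmed in step 4.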

.opencode/skills/train/SKILL.md

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
---
2+
name: train
3+
description: Train your AI teammate on team standards from a document or style guide
4+
---
5+
6+
# Train
7+
8+
## Purpose
9+
Learn team standards and conventions from a document (style guide, review checklist, coding standards, etc.). Extracts actionable rules and saves them as training.
10+
11+
## Workflow
12+
13+
1. **Get the document**: The user provides either:
14+
- A file reference: `@docs/sql-style-guide.md`
15+
- A URL: The full URL to fetch (use webfetch tool)
16+
- Inline text: Pasted directly in the chat
17+
18+
2. **Read and analyze**: Parse the document and extract:
19+
- Specific, enforceable rules (naming, formatting, prohibited patterns)
20+
- Review criteria and checklists
21+
- Glossary terms and definitions
22+
- Architectural standards
23+
24+
3. **Categorize**: Group findings by training kind:
25+
- `rule` — Specific do/don't rules (e.g., "Never use SELECT *")
26+
- `standard` — Broader conventions (e.g., "SQL style guide compliance")
27+
- `glossary` — Term definitions (e.g., "ARR = Annual Recurring Revenue")
28+
29+
4. **Present summary**: Show the user what you extracted:
30+
- Number of rules, standards, and glossary terms found
31+
- Preview of each item
32+
- Ask for confirmation before saving
33+
34+
5. **Save via training_save**: Save each item using the `training_save` tool. For documents with many rules, consolidate related rules into logical groups (e.g., "sql-naming-rules" with 5 rules, rather than 5 separate entries).
35+
36+
## Important Guidelines
37+
38+
- Only extract ACTIONABLE items. Skip vague guidance like "write clean code."
39+
- Consolidate related rules into single training entries to avoid clutter.
40+
- Preserve the original wording when it's specific and clear.
41+
- If the document is too large, focus on the most impactful rules.
42+
- Always use `scope: project` unless the user specifies global.
43+
- Do NOT make any extra LLM calls — analysis happens in the normal conversation flow.
44+
45+
## Usage Examples
46+
47+
```
48+
/train @docs/sql-style-guide.md
49+
/train https://wiki.company.com/data-team/review-checklist
50+
/train (then paste content inline)
51+
```
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+---
+name: training-status
+description: Show what your AI teammate has learned — patterns, rules, glossary, and standards
+---
+
+# Training Status
+
+## Purpose
+Display a comprehensive overview of everything your AI teammate has been trained on.
+
+## Workflow
+
+1. **Fetch all training**: Use the `training_list` tool with no filters to get all training entries.
+
+2. **Present the dashboard**: Format the output as a clean status report:
+
+```
+Training Status
+
+Patterns: X (staging-model, incremental-config, ...)
+Rules: X (no-float, no-select-star, ...)
+Glossary: X (arr, mrr, churn-date, ...)
+Standards: X (sql-style-guide, review-checklist, ...)
+
+Recent Training:
+- 2 days ago: Learned rule "no-float" (from user correction)
+- 5 days ago: Learned pattern "staging-model" (from stg_orders.sql)
+- 1 week ago: Loaded standard "sql-style-guide" (from docs/sql-style.md)
+
+Most Applied:
+- "staging-model" pattern — applied 12 times
+- "no-float" rule — applied 8 times
+```
+
+3. **Offer actions**: After showing status, suggest:
+   - `/teach` to learn new patterns
+   - `/train` to load standards from documents
+   - `training_remove` to remove outdated entries
+   - `training_list` with filters for detailed views
+
+## Usage
+
+```
+/training-status
+```
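
For reference, the per-kind counts and "Most Applied" section of the dashboard above could be rendered from a flat list of entries roughly as follows. This is a sketch under assumptions: the `Entry` shape and `formatStatus` helper are hypothetical, the output of `training_list` is not specified in this diff, and the "Recent Training" section is left out because it would need timestamps. The separator in the "Most Applied" lines is simplified to a comma.

```typescript
// Hypothetical entry shape; the actual training_list output format is not shown here.
type Kind = "pattern" | "rule" | "glossary" | "standard";

interface Entry {
  kind: Kind;
  name: string;
  applied: number;
}

function formatStatus(entries: Entry[], topN: number = 3): string {
  const labels: Array<[Kind, string]> = [
    ["pattern", "Patterns"],
    ["rule", "Rules"],
    ["glossary", "Glossary"],
    ["standard", "Standards"],
  ];
  const lines: string[] = ["Training Status", ""];
  for (const [kind, label] of labels) {
    const names = entries.filter((e) => e.kind === kind).map((e) => e.name);
    const suffix = names.length > 0 ? ` (${names.join(", ")})` : "";
    lines.push(`${label}: ${names.length}${suffix}`);
  }
  // Only entries that were applied at least once, most-applied first.
  const top = entries
    .filter((e) => e.applied > 0)
    .sort((a, b) => b.applied - a.applied)
    .slice(0, topN);
  if (top.length > 0) {
    lines.push("", "Most Applied:");
    for (const e of top) {
      lines.push(`- "${e.name}" ${e.kind}, applied ${e.applied} times`);
    }
  }
  return lines.join("\n");
}

const report = formatStatus([
  { kind: "rule", name: "no-float", applied: 8 },
  { kind: "pattern", name: "staging-model", applied: 12 },
  { kind: "rule", name: "no-select-star", applied: 0 },
]);
console.log(report);
```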
