Commit f41e03e

anandgupta42 and claude authored
feat: add telemetry intelligence signals for debugging and improvements (#564)
* feat: add implicit quality signal telemetry event

  Add `task_outcome_signal` event that maps agent outcomes to behavioral signals (accepted/error/abandoned/cancelled). Emitted alongside `agent_outcome` at session end with zero user cost: pure client-side computation from data already in memory.

  - New event type with `signal`, `tool_count`, `step_count`, `duration_ms`, `last_tool_category` fields
  - Exported `deriveQualitySignal()` for testable outcome→signal mapping
  - MCP tool detection via `mcp__` prefix for accurate categorization
  - 8 unit tests covering all signal derivations and event shape

  Closes AI-6028

* feat: add task intent classification telemetry event

  Add `task_classified` event emitted at session start with keyword/regex classification of the first user message. Categories: debug_dbt, write_sql, optimize_query, build_model, analyze_lineage, explore_schema, migrate_sql, manage_warehouse, finops, general.

  - `classifyTaskIntent()`: pure regex matcher, zero LLM cost, <1ms
  - Includes warehouse type from fingerprint cache
  - Strong/weak confidence levels (1.0 vs 0.5)
  - 15 unit tests covering all intent categories + edge cases

  Closes AI-6029

* feat: emit aggregated tool chain outcome at session end

  Add `tool_chain_outcome` event that captures the ordered tool sequence, error count, recovery count, and final outcome at session end. Only emitted when tools were actually used (non-empty chain).

  - Tracks up to 50 tool names in execution order
  - Detects error→success recovery patterns for auto-fix insights
  - Aggregates existing per-tool-call data, so near-zero additional cost
  - 3 unit tests for event shape and error/recovery tracking

  Closes AI-6030

* feat: link error-recovery pairs with hashed fingerprint

  Add `error_fingerprint` event emitted per unique error at session end. SHA256-hashes normalized error messages for anonymous grouping, links each error to its recovery tool (if the next tool succeeded).

  - `hashError()`: 16-char hex hash of masked error messages
  - Tracks error→recovery pairs during tool chain execution
  - Capped at 20 fingerprints per session to bound telemetry volume
  - 4 unit tests for hashing, event shape, and recovery tracking

  Closes AI-6031

* feat: emit SQL structure fingerprint using altimate-core

  Add `sql_fingerprint` event emitted after successful SQL execution via `sql_execute`. Uses `extractMetadata()` + `getStatementTypes()` from altimate-core NAPI: local parsing, no API calls, ~1-5ms.

  - Captures: statement types, categories, table/function count, subqueries, aggregation, window functions, AST node count
  - No table/column names or SQL content, so PII-safe by design
  - Wrapped in try/catch so fingerprinting never breaks query execution
  - `computeSqlFingerprint()` exported from sql-classify for reuse
  - 6 unit tests including PII safety verification

  Closes AI-6032

* feat: expand environment_census with dbt project fingerprint

  Add optional dbt project metrics to the existing `environment_census` event: snapshot/seed count buckets, materialization distribution (table/view/incremental/ephemeral counts). Data already parsed at startup; this just extracts more fields from the same manifest parse.

  - Backward compatible: new fields are optional
  - No extra file reads or API calls

  Closes AI-6033

* feat: emit schema complexity signal during warehouse introspection

  Add `schema_complexity` event emitted alongside `warehouse_introspection` after successful schema indexing. Uses data already computed during introspection, with no extra warehouse queries.

  - Bucketed table/column/schema counts + avg columns per table
  - Division-by-zero guard for empty warehouses
  - Emitted inside existing try/catch, so it never breaks introspection

  Closes AI-6034

* docs: update telemetry reference with 7 new intelligence signals

  Add task_outcome_signal, task_classified, tool_chain_outcome, error_fingerprint, sql_fingerprint, schema_complexity to the event catalog. Update environment_census description for new dbt fields. Update naming convention section.

* test: add comprehensive integration tests for telemetry signals

  Add 38 integration tests that verify all 7 telemetry signals fire through real code paths with spy on Telemetry.track():

  - Signal 1: quality signal derivation + error/abandoned/cancelled cases
  - Signal 2: intent classifier with 10 real DE prompts + PII safety
  - Signal 3: tool chain collection with error recovery state machine
  - Signal 4: error fingerprint hashing + consecutive error flush
  - Signal 5: SQL fingerprint via altimate-core (aggregation, CTE, DDL)
  - Signal 6: environment_census expansion + backward compatibility
  - Signal 7: schema complexity bucketing + zero-table edge case
  - Full E2E: complete session simulation with all 7 signals in order

  Also fixes regex patterns for natural language flexibility:

  - dbt debug: allows words between "dbt" and error keywords
  - migrate: allows words between "to/from" and warehouse name

* test: add altimate-core failure isolation tests

  Verify computeSqlFingerprint resilience when altimate-core NAPI:

  - throws (segfault, OOM): returns null, never leaks exception
  - returns undefined: uses safe defaults (empty arrays, 0 counts)
  - returns garbage data: handled gracefully via ?? fallbacks

  Also verifies sql-execute.ts code structure ensures fingerprinting runs AFTER query result and is wrapped in isolated try/catch. Tests crash-resistant SQL inputs (control chars, empty, incomplete, very wide queries) and deterministic output.

* fix: address stakeholder review findings

  Fixes from 5-stakeholder review (architect, privacy, perf, markers, tests):

  - Marker fix: remove nested altimate_change start/end, fold new variables into existing session telemetry tracking block
  - Performance: cap errorRecords at 200 entries (prevent unbounded growth)
  - Performance: slice intent classifier input to 2000 chars (bound regex)
  - Architecture: fix import path in sql-execute.ts (../telemetry not ../../altimate/telemetry)

* chore: bump altimate-core to 0.2.6

  Picks up extractMetadata fixes:

  - Aggregate function names (COUNT, SUM, AVG, etc.) now in functions array
  - IN (SELECT ...) and EXISTS (SELECT ...) subquery detection
  - Any/All quantified comparison subquery detection (guarded)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 22c9c38 commit f41e03e

12 files changed: +1772 -23 lines changed

.github/meta/commit.txt

Lines changed: 10 additions & 12 deletions
@@ -1,17 +1,15 @@
-ci: add Verdaccio sanity suite to CI and release workflows
+feat: add task intent classification telemetry event

-Adds the Verdaccio-based sanity suite (real `npm install -g` flow)
-to both CI and release pipelines:
+Add `task_classified` event emitted at session start with keyword/regex
+classification of the first user message. Categories: debug_dbt, write_sql,
+optimize_query, build_model, analyze_lineage, explore_schema, migrate_sql,
+manage_warehouse, finops, general.

-**CI (`ci.yml`):**
-- New `sanity-verdaccio` job on push to main
-- Builds linux-x64 binary + dbt-tools, runs full Docker Compose suite
-- Independent of other jobs (doesn't block PRs)
+- `classifyTaskIntent()` — pure regex matcher, zero LLM cost, <1ms
+- Includes warehouse type from fingerprint cache
+- Strong/weak confidence levels (1.0 vs 0.5)
+- 15 unit tests covering all intent categories + edge cases

-**Release (`release.yml`):**
-- New `sanity-verdaccio` job between build and npm publish
-- Downloads linux-x64 artifact from build matrix
-- **Blocks `publish-npm`** — broken install flow prevents release
-- Dependency chain: build → sanity-verdaccio → publish-npm → github-release
+Closes AI-6029


 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
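
The strong/weak confidence levels described above imply a two-pass match: every intent's strong patterns are tried before any weak pattern. The following is a minimal sketch of that idea, assuming a trimmed-down, hypothetical rule table with two intents (the function name `classify` and the `Rule` type are illustrative, not taken from the commit).

```typescript
// Hypothetical, trimmed-down rule table for illustration only.
type Rule = { intent: string; strong: RegExp[]; weak: RegExp[] }

const rules: Rule[] = [
  {
    intent: "debug_dbt",
    strong: [/dbt\s+.*?(error|fail|fix)/],
    weak: [/dbt\s+(run|build|test)/],
  },
  {
    intent: "write_sql",
    strong: [/(?:write|create)\s+(?:a\s+)?(?:sql|query)/],
    weak: [/\bsql\b/, /\bquery\b/],
  },
]

function classify(text: string): { intent: string; confidence: number } {
  // Bound the regex input, then normalize case.
  const lower = text.slice(0, 2000).toLowerCase()

  // Pass 1: a strong match from any intent wins with confidence 1.0.
  for (const { intent, strong } of rules) {
    if (strong.some((r) => r.test(lower))) return { intent, confidence: 1.0 }
  }
  // Pass 2: only if no strong pattern matched, fall back to weak patterns.
  for (const { intent, weak } of rules) {
    if (weak.some((r) => r.test(lower))) return { intent, confidence: 0.5 }
  }
  return { intent: "general", confidence: 1.0 }
}
```

With this ordering, a message such as "Write a query, then dbt run it" is classified as `write_sql` with confidence 1.0, even though the earlier `debug_dbt` rule has a weak pattern that also matches.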

bun.lock

Lines changed: 7 additions & 7 deletions
Some generated files are not rendered by default.

docs/docs/reference/telemetry.md

Lines changed: 12 additions & 1 deletion
@@ -27,7 +27,7 @@ We collect the following categories of events:
 | `doom_loop_detected` | A repeated tool call pattern is detected (tool name and count) |
 | `compaction_triggered` | Context compaction runs (strategy and token counts) |
 | `tool_outputs_pruned` | Tool outputs are pruned during compaction (count) |
-| `environment_census` | Environment snapshot on project scan (warehouse types, dbt presence, feature flags, but no hostnames) |
+| `environment_census` | Environment snapshot on project scan (warehouse types, dbt presence, dbt materialization distribution, snapshot/seed counts, feature flags, but no hostnames or project names) |
 | `context_utilization` | Context window usage per generation (token counts, utilization percentage, cache hit ratio) |
 | `agent_outcome` | Agent session outcome (agent type, tool/generation counts, cost, outcome status) |
 | `error_recovered` | Successful recovery from a transient error (error type, strategy, attempt count) |
@@ -39,6 +39,12 @@ We collect the following categories of events:
 | `sql_execute_failure` | A SQL execution fails (warehouse type, query type, error message, PII-masked SQL — no raw values) |
 | `core_failure` | An internal tool error occurs (tool name, category, error class, truncated error message, PII-safe input signature, and optionally masked arguments — no raw values or credentials) |
 | `first_launch` | Fired once on first CLI run after installation. Contains version and is_upgrade flag. No PII. |
+| `task_outcome_signal` | Behavioral quality signal at session end — accepted, error, abandoned, or cancelled. Includes tool count, step count, duration, and last tool category. No user content. |
+| `task_classified` | Intent classification of the first user message using keyword matching — category (e.g. `debug_dbt`, `write_sql`, `optimize_query`), confidence score, and detected warehouse type. No user text is sent — only the classified category. |
+| `tool_chain_outcome` | Aggregated tool execution sequence at session end — ordered tool names (capped at 50), error count, recovery count, final outcome, duration, and cost. No tool arguments or outputs. |
+| `error_fingerprint` | Hashed error pattern for anonymous grouping — SHA-256 hash of masked error message, error class, tool name, and whether recovery succeeded. Raw error content is never sent. |
+| `sql_fingerprint` | SQL structural shape via AST parsing — statement types, table count, function count, subquery/aggregation/window function presence, and AST node count. No table names, column names, or SQL content. |
+| `schema_complexity` | Warehouse schema structural metrics from introspection — bucketed table, column, and schema counts plus average columns per table. No schema names or content. |

 Each event includes a timestamp, anonymous session ID, CLI version, and an anonymous machine ID (a random UUID stored in `~/.altimate/machine-id`, generated once and never tied to any personal information).

@@ -129,6 +135,11 @@ Event type names use **snake_case** with a `domain_action` pattern:
 - `context_utilization`, `context_overflow_recovered` for context management events
 - `agent_outcome` for agent session events
 - `error_recovered` for error recovery events
+- `task_outcome_signal`, `task_classified` for session quality signals
+- `tool_chain_outcome` for tool execution chain aggregation
+- `error_fingerprint` for anonymous error pattern grouping
+- `sql_fingerprint` for SQL structural analysis
+- `schema_complexity` for warehouse schema metrics

 ### Adding a New Event
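
The `tool_chain_outcome` row above mentions an error count and a recovery count derived from the ordered tool sequence. A minimal sketch of how such an aggregation could work follows. The `ToolCall` type and `summarizeChain` function are hypothetical, and the sketch assumes that a run of consecutive errors followed by one success counts as a single recovery; the commit does not show how the real implementation treats that case.

```typescript
// Hypothetical record of one tool call; only the name and success flag are kept.
type ToolCall = { tool: string; ok: boolean }

function summarizeChain(calls: ToolCall[], cap = 50) {
  let errorCount = 0
  let recoveryCount = 0
  let pendingError = false // true while the most recent outcome is an unrecovered error

  for (const call of calls) {
    if (!call.ok) {
      errorCount++
      pendingError = true
    } else if (pendingError) {
      // Assumption: the first success after one or more errors is one recovery.
      recoveryCount++
      pendingError = false
    }
  }

  const names = calls.slice(0, cap).map((c) => c.tool)
  return {
    chain: JSON.stringify(names), // tool names only, no arguments or outputs
    chain_length: names.length,
    had_errors: errorCount > 0,
    error_count: errorCount,
    error_recovery_count: recoveryCount,
  }
}
```

Only tool names and counters leave the function, which matches the "no tool arguments or outputs" constraint stated in the table.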

packages/opencode/package.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -78,7 +78,7 @@
7878
"@ai-sdk/togetherai": "1.0.34",
7979
"@ai-sdk/vercel": "1.0.33",
8080
"@ai-sdk/xai": "2.0.51",
81-
"@altimateai/altimate-core": "0.2.5",
81+
"@altimateai/altimate-core": "0.2.6",
8282
"@altimateai/drivers": "workspace:*",
8383
"@aws-sdk/credential-providers": "3.993.0",
8484
"@clack/prompts": "1.0.0-alpha.1",
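
The commit message says SQL fingerprinting built on altimate-core must return null when the parser throws and fall back to safe defaults when it returns undefined or partial data. The sketch below illustrates that isolation pattern under stated assumptions: `ParsedMetadata`, `Parser`, and `safeFingerprint` are hypothetical names, and the parser is injected as a plain function because the real altimate-core signatures are not shown in this diff.

```typescript
// Hypothetical metadata shape; every field is optional to model partial or garbage results.
type ParsedMetadata = {
  tables?: string[]
  functions?: string[]
  has_subqueries?: boolean
  node_count?: number
}
type Parser = (sql: string) => ParsedMetadata | undefined

function safeFingerprint(sql: string, parse: Parser) {
  try {
    const meta = parse(sql)
    // Only counts and flags are kept; table and function names are dropped.
    return {
      table_count: meta?.tables?.length ?? 0,
      function_count: meta?.functions?.length ?? 0,
      has_subqueries: meta?.has_subqueries ?? false,
      node_count: meta?.node_count ?? 0,
    }
  } catch {
    // A parser failure must never surface to the query execution path.
    return null
  }
}
```

Callers can then skip the telemetry event when the result is null, so query results are unaffected by parser problems.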

packages/opencode/src/altimate/native/schema/register.ts

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,20 @@ register("schema.index", async (params: SchemaIndexParams): Promise<SchemaIndexR
4646
duration_ms: Date.now() - startTime,
4747
result_count: result.tables_indexed,
4848
})
49+
// altimate_change start — schema complexity signal from introspection results
50+
Telemetry.track({
51+
type: "schema_complexity",
52+
timestamp: Date.now(),
53+
session_id: Telemetry.getContext().sessionId,
54+
warehouse_type: warehouseType,
55+
table_count_bucket: Telemetry.bucketCount(result.tables_indexed),
56+
column_count_bucket: Telemetry.bucketCount(result.columns_indexed),
57+
schema_count_bucket: Telemetry.bucketCount(result.schemas_indexed),
58+
avg_columns_per_table: result.tables_indexed > 0
59+
? Math.round(result.columns_indexed / result.tables_indexed)
60+
: 0,
61+
})
62+
// altimate_change end
4963
} catch {}
5064
return result
5165
} catch (e) {
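
The hunk above reports bucketed counts rather than exact ones and guards the average against empty warehouses. The following sketch shows both ideas in isolation. The bucket boundaries are hypothetical, since the thresholds used by `Telemetry.bucketCount` do not appear in this diff, and `avgColumnsPerTable` is an illustrative helper name.

```typescript
// Hypothetical bucket boundaries, chosen only for illustration.
function bucketCount(n: number): string {
  if (n <= 0) return "0"
  if (n <= 10) return "1-10"
  if (n <= 50) return "11-50"
  if (n <= 200) return "51-200"
  return "200+"
}

// Same guard as the diff: report 0 instead of dividing by zero when no tables were indexed.
function avgColumnsPerTable(columns: number, tables: number): number {
  return tables > 0 ? Math.round(columns / tables) : 0
}
```

Bucketing keeps the exact size of a warehouse out of telemetry while still allowing coarse grouping.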

packages/opencode/src/altimate/telemetry/index.ts

Lines changed: 208 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -212,6 +212,13 @@ export namespace Telemetry {
212212
dbt_model_count_bucket: string
213213
dbt_source_count_bucket: string
214214
dbt_test_count_bucket: string
215+
// altimate_change start — dbt project fingerprint expansion
216+
dbt_snapshot_count_bucket?: string
217+
dbt_seed_count_bucket?: string
218+
/** JSON-encoded Record<string, number> — count per materialization type */
219+
dbt_materialization_dist?: string
220+
dbt_macro_count_bucket?: string
221+
// altimate_change end
215222
connection_sources: string[]
216223
mcp_server_count: number
217224
skill_count: number
@@ -445,8 +452,209 @@ export namespace Telemetry {
         dialect?: string
         duration_ms: number
       }
+    // implicit quality signal for task outcome intelligence
+    | {
+        type: "task_outcome_signal"
+        timestamp: number
+        session_id: string
+        /** Behavioral signal derived from session outcome patterns */
+        signal: "accepted" | "error" | "abandoned" | "cancelled"
+        /** Total tool calls in this loop() invocation */
+        tool_count: number
+        /** Number of LLM generation steps in this loop() invocation */
+        step_count: number
+        /** Total session wall-clock duration in milliseconds */
+        duration_ms: number
+        /** Last tool category the agent used (or "none") */
+        last_tool_category: string
+      }
+    // task intent classification for understanding DE problem distribution
+    | {
+        type: "task_classified"
+        timestamp: number
+        session_id: string
+        /** Classified intent category */
+        intent:
+          | "debug_dbt"
+          | "write_sql"
+          | "optimize_query"
+          | "build_model"
+          | "analyze_lineage"
+          | "explore_schema"
+          | "migrate_sql"
+          | "manage_warehouse"
+          | "finops"
+          | "general"
+        /** Keyword match confidence: 1.0 for strong match, 0.5 for weak */
+        confidence: number
+        /** Detected warehouse type from fingerprint (or "unknown") */
+        warehouse_type: string
+      }
+    // schema complexity signal — structural metrics from warehouse introspection
+    | {
+        type: "schema_complexity"
+        timestamp: number
+        session_id: string
+        warehouse_type: string
+        /** Bucketed table count */
+        table_count_bucket: string
+        /** Bucketed total column count across all tables */
+        column_count_bucket: string
+        /** Bucketed schema count */
+        schema_count_bucket: string
+        /** Average columns per table (rounded to integer) */
+        avg_columns_per_table: number
+      }
+    // sql structure fingerprint — AST shape without content
+    | {
+        type: "sql_fingerprint"
+        timestamp: number
+        session_id: string
+        /** JSON-encoded statement types, e.g. ["SELECT"] */
+        statement_types: string
+        /** Broad categories, e.g. ["query"] */
+        categories: string
+        /** Number of tables referenced */
+        table_count: number
+        /** Number of functions used */
+        function_count: number
+        /** Whether the query has subqueries */
+        has_subqueries: boolean
+        /** Whether the query uses aggregation */
+        has_aggregation: boolean
+        /** Whether the query uses window functions */
+        has_window_functions: boolean
+        /** AST node count — proxy for complexity */
+        node_count: number
+      }
+    // error pattern fingerprint — hashed error grouping with recovery data
+    | {
+        type: "error_fingerprint"
+        timestamp: number
+        session_id: string
+        /** SHA256 hash of normalized (masked) error message for grouping */
+        error_hash: string
+        /** Classification from classifyError() */
+        error_class: string
+        /** Tool that produced the error */
+        tool_name: string
+        /** Tool category */
+        tool_category: string
+        /** Whether a subsequent tool call succeeded (error was recovered) */
+        recovery_successful: boolean
+        /** Tool that succeeded after the error (if recovered) */
+        recovery_tool: string
+      }
+    // tool chain effectiveness — aggregated tool sequence + outcome at session end
+    | {
+        type: "tool_chain_outcome"
+        timestamp: number
+        session_id: string
+        /** JSON-encoded ordered tool names (capped at 50) */
+        chain: string
+        /** Number of tools in the chain */
+        chain_length: number
+        /** Whether any tool call errored */
+        had_errors: boolean
+        /** Number of errors followed by successful tool calls */
+        error_recovery_count: number
+        /** Final session outcome */
+        final_outcome: string
+        /** Total session duration in ms */
+        total_duration_ms: number
+        /** Total LLM cost */
+        total_cost: number
+      }
   // altimate_change end

+  /** SHA256 hash a masked error message for anonymous grouping. */
+  export function hashError(maskedMessage: string): string {
+    return createHash("sha256").update(maskedMessage).digest("hex").slice(0, 16)
+  }
+
+  /** Classify user intent from the first message text.
+   * Pure regex/keyword matcher — zero LLM cost, <1ms. */
+  export function classifyTaskIntent(
+    text: string,
+  ): { intent: string; confidence: number } {
+    const lower = text.slice(0, 2000).toLowerCase()
+
+    // Order matters: more specific patterns first
+    const patterns: Array<{ intent: string; strong: RegExp[]; weak: RegExp[] }> = [
+      {
+        intent: "debug_dbt",
+        strong: [/dbt\s+.*?(error|fail|bug|issue|broken|fix|debug|not\s+work)/],
+        weak: [/dbt\s+(run|build|test|compile|parse)/, /dbt_project/, /ref\s*\(/, /source\s*\(/],
+      },
+      {
+        intent: "build_model",
+        strong: [/(?:create|build|write|add|new)\s+.*?(?:dbt\s+)?model/, /(?:create|build)\s+.*?(?:staging|mart|dim|fact)/],
+        weak: [/\bmodel\b/, /materialization/, /incremental/],
+      },
+      {
+        intent: "optimize_query",
+        strong: [/optimiz|performance|slow\s+query|speed\s+up|make.*faster|too\s+slow|query\s+cost/],
+        weak: [/index|partition|cluster|explain\s+plan/],
+      },
+      {
+        intent: "write_sql",
+        strong: [/(?:write|create|build|generate)\s+(?:a\s+)?(?:sql|query)/, /(?:write|create)\s+(?:a\s+)?(?:select|insert|update|delete)/],
+        weak: [/\bsql\b/, /\bquery\b/, /\bjoin\b/, /\bwhere\b/],
+      },
+      {
+        intent: "analyze_lineage",
+        strong: [/lineage|upstream|downstream|dependency|depends\s+on|impact\s+analysis/],
+        weak: [/dag|graph|flow|trace/],
+      },
+      {
+        intent: "explore_schema",
+        strong: [/(?:show|list|describe|inspect|explore)\s+.*?(?:schema|tables?|columns?|database)/, /what\s+.*?(?:tables|columns|schemas)/],
+        weak: [/\bschema\b/, /\btable\b/, /\bcolumn\b/, /introspect/],
+      },
+      {
+        intent: "migrate_sql",
+        strong: [/migrat|convert.*(?:to|from)\s+.*?(?:snowflake|bigquery|postgres|redshift|databricks)/, /translate.*(?:sql|dialect)/],
+        weak: [/dialect|transpile|port\s+(?:to|from)/],
+      },
+      {
+        intent: "manage_warehouse",
+        strong: [/(?:connect|setup|configure|add|test)\s+.*?(?:warehouse|connection|database)/, /warehouse.*(?:config|setting)/],
+        weak: [/\bwarehouse\b/, /connection\s+string/, /\bcredentials\b/],
+      },
+      {
+        intent: "finops",
+        strong: [/cost|spend|bill|credits|usage|expensive\s+quer|warehouse\s+size/],
+        weak: [/resource|utilization|idle/],
+      },
+    ]
+
+    for (const { intent, strong, weak } of patterns) {
+      if (strong.some((r) => r.test(lower))) return { intent, confidence: 1.0 }
+    }
+    for (const { intent, weak } of patterns) {
+      if (weak.some((r) => r.test(lower))) return { intent, confidence: 0.5 }
+    }
+    return { intent: "general", confidence: 1.0 }
+  }
+
+  /** Derive a quality signal from the agent outcome.
+   * Exported so tests can verify the derivation logic without
+   * duplicating the implementation. */
+  export function deriveQualitySignal(
+    outcome: "completed" | "abandoned" | "aborted" | "error",
+  ): "accepted" | "error" | "abandoned" | "cancelled" {
+    switch (outcome) {
+      case "abandoned":
+        return "abandoned"
+      case "aborted":
+        return "cancelled"
+      case "error":
+        return "error"
+      case "completed":
+        return "accepted"
+    }
+  }
+
   // altimate_change start — expanded error classification patterns for better triage
   // Order matters: earlier patterns take priority. Use specific phrases, not
   // single words, to avoid false positives (e.g., "connection refused" not "connection").