Commit 2b5c8a4

feat(ai): improve integration with advanced telemetry (#272)
1 parent 583fab4 commit 2b5c8a4

9 files changed

Lines changed: 896 additions & 37 deletions

Lines changed: 5 additions & 0 deletions
````diff
@@ -0,0 +1,5 @@
+---
+'evlog': minor
+---
+
+Add AI SDK telemetry integration (`createEvlogIntegration`), cost estimation, and enriched embedding capture. `createEvlogIntegration()` implements the AI SDK's `TelemetryIntegration` interface to capture per-tool execution timing/success/errors and total generation wall time. Cost estimation computes `ai.estimatedCost` from a user-provided pricing map. `captureEmbed` now accepts model ID, dimensions, and batch count for richer embedding observability.
````
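The changeset above says cost estimation computes `ai.estimatedCost` from a user-provided pricing map. The arithmetic can be sketched as follows; the `estimateCost` name is illustrative, not evlog's actual internals, and prices are dollars per 1M tokens, matching the `cost` option's documented shape:

```typescript
// Illustrative sketch of cost estimation from a pricing map.
// `estimateCost` is a hypothetical name; prices are dollars per 1M tokens.
interface ModelCost { input: number; output: number }

function estimateCost(
  usage: { inputTokens: number; outputTokens: number },
  cost: ModelCost | undefined,
): number | undefined {
  if (!cost) return undefined // model not in the pricing map: no estimate
  return (usage.inputTokens / 1_000_000) * cost.input
    + (usage.outputTokens / 1_000_000) * cost.output
}
```

For example, 3,500 input and 800 output tokens at $3/$15 per 1M comes to $0.0225.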

AGENTS.md

Lines changed: 22 additions & 1 deletion
````diff
@@ -139,11 +139,32 @@ export default defineEventHandler(async (event) => {
 })
 ```
 
+For deeper observability (tool execution timing, total generation wall time), add `createEvlogIntegration()`:
+
+```typescript
+import { createAILogger, createEvlogIntegration } from 'evlog/ai'
+
+const ai = createAILogger(log, {
+  cost: { 'claude-sonnet-4.6': { input: 3, output: 15 } },
+})
+
+const agent = new ToolLoopAgent({
+  model: ai.wrap('anthropic/claude-sonnet-4.6'),
+  tools: { searchWeb, queryDatabase },
+  experimental_telemetry: {
+    isEnabled: true,
+    integrations: [createEvlogIntegration(ai)],
+  },
+})
+```
+
+This adds `ai.tools` (per-tool `{ name, durationMs, success, error? }`), `ai.totalDurationMs`, and `ai.estimatedCost` to the wide event.
+
 For embedding calls, use `captureEmbed`:
 
 ```typescript
 const { embedding, usage } = await embed({ model: embeddingModel, value: query })
-ai.captureEmbed({ usage })
+ai.captureEmbed({ usage, model: 'text-embedding-3-small', dimensions: 1536 })
 ```
 
 ### Structured Errors
````
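The enriched `captureEmbed` call above feeds an `ai.embedding` field shaped like `{ model?, tokens, dimensions?, count? }`. How that metadata might be folded together across multiple calls can be sketched like this; the merge rules are assumptions for illustration, not evlog's implementation:

```typescript
// Hypothetical accumulator for embedding metadata. Field names mirror the
// documented `ai.embedding` shape, but the merge semantics are assumed.
interface EmbedMeta { model?: string; tokens: number; dimensions?: number; count: number }

function mergeEmbed(
  prev: EmbedMeta | undefined,
  capture: { usage: { tokens: number }; model?: string; dimensions?: number; count?: number },
): EmbedMeta {
  return {
    model: capture.model ?? prev?.model,
    tokens: (prev?.tokens ?? 0) + capture.usage.tokens, // token usage sums across calls
    dimensions: capture.dimensions ?? prev?.dimensions,
    count: (prev?.count ?? 0) + (capture.count ?? 1), // embedMany passes a batch count
  }
}
```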

apps/docs/content/2.logging/5.ai-sdk.md

Lines changed: 116 additions & 6 deletions
````diff
@@ -16,19 +16,21 @@ links:
     variant: subtle
 ---
 
-`evlog/ai` gives you full AI observability by wrapping your model with middleware. Token usage, tool calls, streaming performance, cache hits, reasoning tokens, all captured into the wide event automatically.
+`evlog/ai` gives you full AI observability by wrapping your model with middleware and an optional telemetry integration. Token usage, tool calls, tool execution timing, streaming performance, cache hits, reasoning tokens, cost estimation — all captured into the wide event automatically.
 
 ::code-collapse
 
 ```txt [Prompt]
 Add AI observability to my app with evlog.
 
 - Install the AI SDK: pnpm add ai
-- Import createAILogger from 'evlog/ai'
+- Import createAILogger and createEvlogIntegration from 'evlog/ai'
 - Create an AI logger with createAILogger(log) where log is your request logger
 - Wrap your model with ai.wrap('anthropic/claude-sonnet-4.6') and pass it to generateText, streamText, etc.
 - Token usage, tool calls, streaming metrics, and errors are captured automatically into the wide event
-- For embedding calls, use ai.captureEmbed({ usage }) after embed() or embedMany()
+- For deeper observability (tool execution timing, total generation wall time), add createEvlogIntegration(ai) to experimental_telemetry.integrations
+- For embedding calls, use ai.captureEmbed({ usage, model, dimensions, count }) after embed() or embedMany()
+- For cost estimation, pass a cost map: createAILogger(log, { cost: { 'claude-sonnet-4.6': { input: 3, output: 15 } } })
 - Works with all frameworks: Nuxt, Express, Hono, Fastify, NestJS, Elysia, standalone
 
 Docs: https://www.evlog.dev/logging/ai-sdk
@@ -117,8 +119,8 @@ Your wide event now includes:
 
 | Method | Description |
 |--------|-------------|
-| `wrap(model)` | Wraps a language model with middleware. Accepts a model string (e.g. `'anthropic/claude-sonnet-4.6'`) or a `LanguageModelV3` object. Works with `generateText`, `streamText`, `generateObject`, `streamObject`, and `ToolLoopAgent`. Also works with pre-wrapped models (e.g. from supermemory). |
-| `captureEmbed(result)` | Manually captures token usage from `embed()` or `embedMany()` results (embedding models use a different type). |
+| `wrap(model)` | Wraps a language model with middleware. Accepts a model string (e.g. `'anthropic/claude-sonnet-4.6'`) or a `LanguageModelV3` object. Works with `generateText`, `streamText`, and `ToolLoopAgent`. Also works with pre-wrapped models (e.g. from supermemory). |
+| `captureEmbed(result)` | Manually captures token usage, model info, and dimensions from `embed()` or `embedMany()` results (embedding models use a different type). |
 
 The middleware intercepts calls at the provider level. It does not touch your callbacks, prompts, or responses. Captured data flows through the normal evlog pipeline (sampling, enrichers, drains) and ends up in Axiom, Better Stack, or wherever you drain to.
 
@@ -127,6 +129,7 @@ The middleware intercepts calls at the provider level. It does not touch your ca
 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
 | `toolInputs` | `boolean \| ToolInputsOptions` | `false` | When enabled, `toolCalls` contains `{ name, input }` objects instead of plain strings. Opt-in because inputs can be large and may contain sensitive data. |
+| `cost` | `Record<string, ModelCost>` | `undefined` | Pricing map for cost estimation. Keys are model IDs, values are `{ input, output }` in dollars per 1M tokens. |
 
 Pass `true` to capture all inputs as-is, or an options object for fine-grained control:
 
@@ -152,6 +155,14 @@ const ai = createAILogger(log, {
     },
   },
 })
+
+// Cost estimation
+const ai = createAILogger(log, {
+  cost: {
+    'claude-sonnet-4.6': { input: 3, output: 15 },
+    'gpt-4o': { input: 2.5, output: 10 },
+  },
+})
 ```
 
 ## Usage Patterns
@@ -282,7 +293,11 @@ export default defineEventHandler(async (event) => {
     model: openai.embedding('text-embedding-3-small'),
     value: query,
   })
-  ai.captureEmbed({ usage })
+  ai.captureEmbed({
+    usage,
+    model: 'text-embedding-3-small',
+    dimensions: 1536,
+  })
 
   const docs = await findSimilar(embedding)
 
@@ -295,6 +310,16 @@ export default defineEventHandler(async (event) => {
 })
 ```
 
+For `embedMany`, pass the batch count:
+
+```typescript
+const { embeddings, usage } = await embedMany({
+  model: openai.embedding('text-embedding-3-small'),
+  values: documents,
+})
+ai.captureEmbed({ usage, model: 'text-embedding-3-small', count: documents.length })
+```
+
 ### Multiple models
 
 Wrap each model separately, they share the same accumulator. When multiple models are used, the wide event includes both `model` (last model) and `models` (all unique models):
@@ -335,6 +360,87 @@ import { anthropic } from '@ai-sdk/anthropic'
 const model = ai.wrap(anthropic('claude-sonnet-4.6'))
 ```
 
+## Telemetry Integration
+
+For deeper observability — tool execution timing, success/failure tracking, and total generation wall time — use `createEvlogIntegration()`. It implements the AI SDK's `TelemetryIntegration` interface and captures data that middleware alone cannot see.
+
+### Combined with middleware (recommended)
+
+When passed an `AILogger`, the integration shares its accumulator. Both paths write to the same `ai.*` fields:
+
+```typescript [server/api/agent.post.ts]
+import { generateText } from 'ai'
+import { createAILogger, createEvlogIntegration } from 'evlog/ai'
+
+export default defineEventHandler(async (event) => {
+  const log = useLogger(event)
+  const ai = createAILogger(log)
+
+  const result = await generateText({
+    model: ai.wrap('anthropic/claude-sonnet-4.6'),
+    tools: { getWeather, searchDB },
+    experimental_telemetry: {
+      isEnabled: true,
+      integrations: [createEvlogIntegration(ai)],
+    },
+  })
+
+  return { text: result.text }
+})
+```
+
+Your wide event now includes tool execution details:
+
+```json [Wide Event]
+{
+  "ai": {
+    "calls": 2,
+    "steps": 2,
+    "model": "claude-sonnet-4.6",
+    "provider": "anthropic",
+    "inputTokens": 3500,
+    "outputTokens": 800,
+    "totalTokens": 4300,
+    "toolCalls": ["getWeather", "searchDB"],
+    "tools": [
+      { "name": "getWeather", "durationMs": 150, "success": true },
+      { "name": "searchDB", "durationMs": 45, "success": true }
+    ],
+    "totalDurationMs": 2340,
+    "msToFirstChunk": 180,
+    "msToFinish": 2100,
+    "tokensPerSecond": 380
+  }
+}
+```
+
+### Standalone (without middleware)
+
+If your model is already wrapped (e.g. by another middleware), pass the request logger directly:
+
+```typescript [server/api/chat.post.ts]
+import { createEvlogIntegration } from 'evlog/ai'
+
+const integration = createEvlogIntegration(log)
+
+const result = await generateText({
+  model: somePreWrappedModel,
+  experimental_telemetry: {
+    isEnabled: true,
+    integrations: [integration],
+  },
+})
+```
+
+### What the integration captures
+
+| Data | Source | Description |
+|------|--------|-------------|
+| `ai.tools[]` | `onToolCallFinish` | Per-tool `name`, `durationMs`, `success`, and `error` (if failed) |
+| `ai.totalDurationMs` | `onStart` → `onFinish` | Total wall time from generation start to completion |
+
+The middleware captures tokens, model info, and streaming metrics. The integration captures tool execution timing. Together, they give you complete AI observability.
+
 ## Captured Data
 
 | Wide event field | Source | Description |
@@ -358,6 +464,10 @@ const model = ai.wrap(anthropic('claude-sonnet-4.6'))
 | `ai.msToFinish` | Stream timing | Total stream duration (streaming only) |
 | `ai.tokensPerSecond` | Computed | Output tokens per second (streaming only) |
 | `ai.error` | Error capture | Error message if a model call fails |
+| `ai.tools` | `TelemetryIntegration` | Per-tool `{ name, durationMs, success, error? }` (requires `createEvlogIntegration`) |
+| `ai.totalDurationMs` | `TelemetryIntegration` | Total generation wall time (requires `createEvlogIntegration`) |
+| `ai.embedding` | `captureEmbed` | `{ model?, tokens, dimensions?, count? }` — embedding metadata |
+| `ai.estimatedCost` | Computed | Estimated cost in dollars (requires `cost` option) |
 
 ## Composability
````
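The `ai.tools` and `ai.totalDurationMs` fields the docs above describe boil down to stopwatch bookkeeping around the telemetry hooks. A minimal standalone sketch — the hook names and call shapes here are assumptions for illustration, not the AI SDK's actual `TelemetryIntegration` interface:

```typescript
// Standalone sketch of per-tool timing accumulation. Hook names
// (onStart/onToolCallStart/onToolCallFinish/onFinish) and the explicit
// millisecond timestamps are illustrative assumptions.
interface ToolRecord { name: string; durationMs: number; success: boolean; error?: string }

class ToolTimingAccumulator {
  private pending = new Map<string, number>() // callId -> start timestamp
  readonly tools: ToolRecord[] = []
  private startedAt = 0
  totalDurationMs = 0

  onStart(nowMs: number): void {
    this.startedAt = nowMs
  }

  onToolCallStart(callId: string, nowMs: number): void {
    this.pending.set(callId, nowMs)
  }

  onToolCallFinish(callId: string, name: string, nowMs: number, error?: string): void {
    const startedAt = this.pending.get(callId) ?? nowMs
    this.pending.delete(callId)
    const record: ToolRecord = { name, durationMs: nowMs - startedAt, success: !error }
    if (error) record.error = error // only present on failure, as in `ai.tools[]`
    this.tools.push(record)
  }

  onFinish(nowMs: number): void {
    this.totalDurationMs = nowMs - this.startedAt // total generation wall time
  }
}
```

Keying pending calls by an ID (rather than tool name) is what lets overlapping calls to the same tool get independent durations.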

apps/docs/skills/review-logging-patterns/SKILL.md

Lines changed: 52 additions & 8 deletions
````diff
@@ -1,6 +1,6 @@
 ---
 name: review-logging-patterns
-description: Review code for logging patterns and suggest evlog adoption. Guides setup on Nuxt, Next.js, SvelteKit, Nitro, TanStack Start, React Router, NestJS, Express, Hono, Fastify, Elysia, Cloudflare Workers, and standalone TypeScript. Detects console.log spam, unstructured errors, and missing context. Covers wide events, structured errors, drain adapters (Axiom, OTLP, HyperDX, PostHog, Sentry, Better Stack, Datadog), sampling, enrichers, and AI SDK integration (token usage, tool calls, streaming metrics).
+description: Review code for logging patterns and suggest evlog adoption. Guides setup on Nuxt, Next.js, SvelteKit, Nitro, TanStack Start, React Router, NestJS, Express, Hono, Fastify, Elysia, Cloudflare Workers, and standalone TypeScript. Detects console.log spam, unstructured errors, and missing context. Covers wide events, structured errors, drain adapters (Axiom, OTLP, HyperDX, PostHog, Sentry, Better Stack, Datadog), sampling, enrichers, and AI SDK integration (token usage, tool calls, streaming metrics, telemetry integration, cost estimation, embedding metadata).
 license: MIT
 metadata:
   author: HugoRCD
@@ -866,7 +866,9 @@ Works in all frameworks: Nuxt (`evlog` config), Nitro (`evlog()` module options)
 
 ## AI SDK Integration
 
-Capture token usage, tool calls, model info, and streaming metrics from the Vercel AI SDK into wide events. Import from `evlog/ai`. Requires `ai >= 6.0.0` as a peer dependency.
+Capture token usage, tool calls, model info, streaming metrics, tool execution timing, cost estimation, and embedding metadata from the Vercel AI SDK into wide events. Import from `evlog/ai`. Requires `ai >= 6.0.0` as a peer dependency.
+
+### Basic setup (middleware)
 
 ```typescript
 import { createAILogger } from 'evlog/ai'
@@ -877,22 +879,62 @@ const ai = createAILogger(log)
 const result = streamText({
   model: ai.wrap('anthropic/claude-sonnet-4.6'), // accepts string or model object
   messages,
-  onFinish: ({ text }) => {
-    // User callbacks remain free — no conflict
+})
+```
+
+`ai.wrap()` uses model middleware to transparently capture all LLM calls. Works with `generateText`, `streamText`, and `ToolLoopAgent`.
+
+### Telemetry integration (deeper observability)
+
+For tool execution timing, success/failure tracking, and total generation wall time, add `createEvlogIntegration()`:
+
+```typescript
+import { createAILogger, createEvlogIntegration } from 'evlog/ai'
+
+const ai = createAILogger(log)
+
+const agent = new ToolLoopAgent({
+  model: ai.wrap('anthropic/claude-sonnet-4.6'),
+  tools: { searchWeb, queryDatabase },
+  stopWhen: stepCountIs(5),
+  experimental_telemetry: {
+    isEnabled: true,
+    integrations: [createEvlogIntegration(ai)],
   },
 })
 ```
 
-`ai.wrap()` uses model middleware to transparently capture all LLM calls. Works with `generateText`, `streamText`, `generateObject`, `streamObject`, and `ToolLoopAgent`.
+This adds `ai.tools` (per-tool `{ name, durationMs, success, error? }`) and `ai.totalDurationMs` to the wide event.
 
-For embeddings (different model type):
+### Embeddings
 
 ```typescript
 const { embedding, usage } = await embed({ model: embeddingModel, value: query })
-ai.captureEmbed({ usage })
+ai.captureEmbed({ usage, model: 'text-embedding-3-small', dimensions: 1536 })
 ```
 
-Wide event `ai` field includes: `calls`, `model`, `provider`, `inputTokens`, `outputTokens`, `totalTokens`, `cacheReadTokens`, `reasoningTokens`, `finishReason`, `toolCalls`, `steps`, `msToFirstChunk`, `msToFinish`, `tokensPerSecond`, `error`.
+For `embedMany`, pass the batch count:
+
+```typescript
+ai.captureEmbed({ usage, model: 'text-embedding-3-small', count: documents.length })
+```
+
+### Cost estimation
+
+Pass a pricing map to get `ai.estimatedCost` in the wide event:
+
+```typescript
+const ai = createAILogger(log, {
+  cost: {
+    'claude-sonnet-4.6': { input: 3, output: 15 },
+    'gpt-4o': { input: 2.5, output: 10 },
+  },
+})
+```
+
+### Wide event `ai` field
+
+Includes: `calls`, `model`, `provider`, `inputTokens`, `outputTokens`, `totalTokens`, `cacheReadTokens`, `reasoningTokens`, `finishReason`, `toolCalls`, `steps`, `msToFirstChunk`, `msToFinish`, `tokensPerSecond`, `error`, `tools` (via telemetry integration), `totalDurationMs` (via telemetry integration), `embedding` (via `captureEmbed`), `estimatedCost` (via `cost` option).
 
 Anti-patterns to detect:
 
@@ -901,6 +943,8 @@ Anti-patterns to detect:
 | Manual token tracking in `onFinish` | `ai.wrap()` — middleware captures automatically |
 | `console.log('tokens:', result.usage)` | `ai.wrap()` — structured `ai.*` fields in wide event |
 | No AI observability | Add `createAILogger(log)` + `ai.wrap()` |
+| No tool execution timing | Add `createEvlogIntegration(ai)` to `experimental_telemetry.integrations` |
+| Manual cost calculation | Use `cost` option in `createAILogger()` |
 
 ---
 
````
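The streaming fields in the list above (`msToFirstChunk`, `msToFinish`, `tokensPerSecond`) are simple derivations from three timestamps and the output token count. A sketch of that arithmetic, under the assumption that `tokensPerSecond` is rounded to an integer (evlog's exact rounding is not shown in this diff):

```typescript
// Derives the documented streaming metrics from raw timing values.
// The integer rounding of tokensPerSecond is an assumption for illustration.
function streamingMetrics(
  startMs: number,
  firstChunkMs: number,
  finishMs: number,
  outputTokens: number,
): { msToFirstChunk: number; msToFinish: number; tokensPerSecond: number } {
  const msToFinish = finishMs - startMs
  return {
    msToFirstChunk: firstChunkMs - startMs, // latency until first streamed chunk
    msToFinish,                             // total stream duration
    tokensPerSecond: msToFinish > 0 ? Math.round((outputTokens / msToFinish) * 1000) : 0,
  }
}
```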

apps/nuxthub-playground/server/api/chat.post.ts

Lines changed: 11 additions & 2 deletions
````diff
@@ -1,5 +1,5 @@
 import { ToolLoopAgent, createAgentUIStreamResponse, stepCountIs } from 'ai'
-import { createAILogger } from 'evlog/ai'
+import { createAILogger, createEvlogIntegration } from 'evlog/ai'
 import { queryEvents } from '../tools/query-events'
 
 const systemPrompt = `You are a helpful assistant that analyzes application logs stored in a SQLite database.
@@ -63,14 +63,23 @@ export default defineEventHandler(async (event) => {
 
   logger.set({ action: 'chat', messagesCount: messages.length })
 
-  const ai = createAILogger(logger, { toolInputs: true })
+  const ai = createAILogger(logger, {
+    toolInputs: true,
+    cost: {
+      'gemini-3-flash': { input: 0.1, output: 0.4 },
+    },
+  })
 
   try {
     const agent = new ToolLoopAgent({
       model: ai.wrap('google/gemini-3-flash'),
       instructions: systemPrompt,
       tools: { queryEvents },
       stopWhen: stepCountIs(5),
+      experimental_telemetry: {
+        isEnabled: true,
+        integrations: [createEvlogIntegration(ai)],
+      },
     })
     return createAgentUIStreamResponse({
       agent,
````

apps/nuxthub-playground/server/api/test/ai-wrap.get.ts

Lines changed: 11 additions & 2 deletions
````diff
@@ -1,6 +1,6 @@
 import { gateway, generateText, wrapLanguageModel } from 'ai'
 import type { LanguageModelV3Middleware } from '@ai-sdk/provider'
-import { createAILogger } from 'evlog/ai'
+import { createAILogger, createEvlogIntegration } from 'evlog/ai'
 
 /**
  * Simulates an external middleware (supermemory, guardrails, etc.)
@@ -23,7 +23,12 @@ export default defineEventHandler(async (event) => {
   const logger = useLogger(event)
   logger.set({ action: 'test-ai-wrap-composition' })
 
-  const ai = createAILogger(logger, { toolInputs: true })
+  const ai = createAILogger(logger, {
+    toolInputs: true,
+    cost: {
+      'gemini-3-flash': { input: 0.1, output: 0.4 },
+    },
+  })
 
   const base = gateway('google/gemini-3-flash')
   const preWrapped = wrapLanguageModel({ model: base, middleware: externalMiddleware })
@@ -33,6 +38,10 @@ export default defineEventHandler(async (event) => {
     model,
     prompt: 'Say hello.',
     maxOutputTokens: 200,
+    experimental_telemetry: {
+      isEnabled: true,
+      integrations: [createEvlogIntegration(ai)],
+    },
   })
 
   const middlewareRan = result.text.startsWith('MIDDLEWARE_OK:')
````
