Commit e45312c

chore: Add cursor rules for AI integrations contributions (#19167)

This PR adds `.cursor/rules/adding-a-new-ai-integration.mdc`, a complete reference guide for implementing AI provider integrations (OpenAI, Anthropic, Vercel AI, LangChain, etc.) in the Sentry JavaScript SDK. Closes #19168.

1 parent b458a81 commit e45312c

2 files changed: 359 additions & 0 deletions

.cursor/rules/adding-a-new-ai-integration.mdc (351 additions):
---
description: Guidelines for contributing a new Sentry JavaScript SDK AI integration.
alwaysApply: true
---

# Adding a New AI Integration

Use these guidelines when contributing a new Sentry JavaScript SDK AI integration.

## Quick Decision Tree

**CRITICAL**

```
Does the AI SDK have native OpenTelemetry support?
├─ YES → Does it emit OTel spans automatically?
│   ├─ YES (like Vercel AI) → Pattern 1: OTEL Span Processors
│   └─ NO → Pattern 2: OTEL Instrumentation (wrap client)
└─ NO → Does the SDK provide hooks/callbacks?
    ├─ YES (like LangChain) → Pattern 3: Callback/Hook Based
    └─ NO → Pattern 4: Client Wrapping

Multi-runtime considerations:
- Node.js: use OpenTelemetry instrumentation
- Edge (Cloudflare/Vercel): no OTel available; use span processors only or manual wrapping
```

---
## Span Hierarchy

**Two span types:**

- `gen_ai.invoke_agent` - Parent/pipeline spans (chains, agents, orchestration)
- `gen_ai.chat`, `gen_ai.generate_text`, etc. - Child spans (actual LLM calls)

**Hierarchy example:**

```
gen_ai.invoke_agent (ai.generateText)
└── gen_ai.generate_text (ai.generateText.doGenerate)
```

**References:**

- Vercel AI: `packages/core/src/tracing/vercel-ai/constants.ts`
- LangChain: `onChainStart` callback in `packages/core/src/tracing/langchain/index.ts`

---
## Streaming vs Non-Streaming

**Non-streaming:** Use `startSpan()` and set attributes immediately from the response.

**Streaming:** Use `startSpanManual()` and prefer event listeners/hooks when available (like Anthropic's `stream.on()`). If not available, use the async generator pattern:

```typescript
interface StreamingState {
  responseTexts: string[]; // Accumulate fragments
  promptTokens: number | undefined;
  completionTokens: number | undefined;
  // ...
}

async function* instrumentStream(stream, span, recordOutputs) {
  const state: StreamingState = { responseTexts: [], promptTokens: undefined, completionTokens: undefined };
  try {
    for await (const event of stream) {
      processEvent(event, state, recordOutputs); // Accumulate data
      yield event; // Pass through
    }
  } finally {
    setTokenUsageAttributes(span, state.promptTokens, state.completionTokens);
    span.setAttributes({ [GEN_AI_RESPONSE_STREAMING_ATTRIBUTE]: true });
    span.end(); // MUST call manually
  }
}
```

**Key rules:**

- Accumulate with arrays/strings, don't overwrite
- Set `GEN_AI_RESPONSE_STREAMING_ATTRIBUTE: true`
- Call `span.end()` in the finally block

**Detection:** Check the request parameters for `stream: true` to determine whether the response will be streamed.

**References:**

- OpenAI async generator: `instrumentStream` in `packages/core/src/tracing/openai/streaming.ts`
- Anthropic event listeners: `instrumentMessageStream` in `packages/core/src/tracing/anthropic-ai/streaming.ts`
- Detection logic: check `params.stream === true` in `packages/core/src/tracing/openai/index.ts`
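The accumulate-and-pass-through shape can be exercised standalone. In this sketch the span is a plain mock object (not Sentry's span API) and the attribute keys are illustrative; only the generator control flow mirrors the pattern above:

```typescript
// Minimal mock of a span - stands in for Sentry's span object.
interface MockSpan {
  attributes: Record<string, unknown>;
  ended: boolean;
}

// Pass events through while accumulating text, then finalize in `finally`
// so the span is ended even if the consumer stops iterating early.
async function* instrumentTextStream(
  stream: AsyncIterable<{ text: string }>,
  span: MockSpan,
): AsyncGenerator<{ text: string }> {
  const responseTexts: string[] = []; // accumulate, never overwrite
  try {
    for await (const event of stream) {
      responseTexts.push(event.text);
      yield event; // pass through untouched
    }
  } finally {
    span.attributes['gen_ai.response.streaming'] = true;
    span.attributes['gen_ai.response.text'] = responseTexts.join('');
    span.ended = true; // the real code calls span.end() here
  }
}

async function* fakeStream(): AsyncGenerator<{ text: string }> {
  yield { text: 'Hel' };
  yield { text: 'lo' };
}

async function demo(): Promise<void> {
  const span: MockSpan = { attributes: {}, ended: false };
  for await (const _event of instrumentTextStream(fakeStream(), span)) {
    // the consumer sees the original events, unaware of instrumentation
  }
  console.log(span.attributes['gen_ai.response.text'], span.ended); // Hello true
}

demo();
```

Because finalization lives in `finally`, the span gets its accumulated attributes and is ended whether the stream completes, throws, or is abandoned mid-iteration.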
---
## Token Accumulation

**Child spans (LLM calls):** Set tokens directly from the API response

```typescript
setTokenUsageAttributes(span, inputTokens, outputTokens, totalTokens);
```

**Parent spans (invoke_agent):** Accumulate from children using an event processor

```typescript
// First pass: accumulate from children
for (const span of event.spans) {
  if (span.parent_span_id && isGenAiOperationSpan(span)) {
    accumulateTokensForParent(span, tokenAccumulator);
  }
}

// Second pass: apply to invoke_agent parents
for (const span of event.spans) {
  if (span.op === 'gen_ai.invoke_agent') {
    applyAccumulatedTokens(span, tokenAccumulator);
  }
}
```

**Reference:** `vercelAiEventProcessor` and `accumulateTokensForParent` in `packages/core/src/tracing/vercel-ai/`
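The two passes can be demonstrated end to end with plain objects. The field names below mirror the span JSON used above, but the helper is a simplified stand-in for `accumulateTokensForParent`/`applyAccumulatedTokens`, not the real implementation:

```typescript
// Simplified span JSON - only the fields the accumulation logic needs.
interface SpanJSON {
  span_id: string;
  parent_span_id?: string;
  op: string;
  data: Record<string, number | undefined>;
}

function accumulateGenAiTokens(spans: SpanJSON[]): void {
  const acc = new Map<string, { input: number; output: number }>();

  // First pass: sum each child LLM-call span's tokens into its parent's bucket.
  for (const span of spans) {
    if (span.parent_span_id && span.op.startsWith('gen_ai.') && span.op !== 'gen_ai.invoke_agent') {
      const bucket = acc.get(span.parent_span_id) ?? { input: 0, output: 0 };
      bucket.input += span.data['gen_ai.usage.input_tokens'] ?? 0;
      bucket.output += span.data['gen_ai.usage.output_tokens'] ?? 0;
      acc.set(span.parent_span_id, bucket);
    }
  }

  // Second pass: apply the accumulated totals to invoke_agent parents.
  for (const span of spans) {
    const bucket = acc.get(span.span_id);
    if (span.op === 'gen_ai.invoke_agent' && bucket) {
      span.data['gen_ai.usage.input_tokens'] = bucket.input;
      span.data['gen_ai.usage.output_tokens'] = bucket.output;
      span.data['gen_ai.usage.total_tokens'] = bucket.input + bucket.output;
    }
  }
}

const spans: SpanJSON[] = [
  { span_id: 'a', op: 'gen_ai.invoke_agent', data: {} },
  { span_id: 'b', parent_span_id: 'a', op: 'gen_ai.chat', data: { 'gen_ai.usage.input_tokens': 10, 'gen_ai.usage.output_tokens': 5 } },
  { span_id: 'c', parent_span_id: 'a', op: 'gen_ai.chat', data: { 'gen_ai.usage.input_tokens': 7, 'gen_ai.usage.output_tokens': 3 } },
];
accumulateGenAiTokens(spans);
console.log(spans[0].data); // parent now carries the summed input/output/total tokens
```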
---
## Shared Utilities

Location: `packages/core/src/tracing/ai/`

### `gen-ai-attributes.ts`

OpenTelemetry Semantic Convention attribute names. **Always use these constants!**

- `GEN_AI_SYSTEM_ATTRIBUTE` - 'openai', 'anthropic', etc.
- `GEN_AI_REQUEST_MODEL_ATTRIBUTE` - Model from request
- `GEN_AI_RESPONSE_MODEL_ATTRIBUTE` - Model from response
- `GEN_AI_INPUT_MESSAGES_ATTRIBUTE` - Input (requires `recordInputs`)
- `GEN_AI_RESPONSE_TEXT_ATTRIBUTE` - Output (requires `recordOutputs`)
- `GEN_AI_USAGE_INPUT_TOKENS_ATTRIBUTE` - Token counts
- `GEN_AI_OPERATION_NAME_ATTRIBUTE` - 'chat', 'embeddings', etc.

### `utils.ts`

- `setTokenUsageAttributes()` - Set token usage on a span
- `getTruncatedJsonString()` - Truncate values for attributes
- `truncateGenAiMessages()` - Truncate message arrays
- `buildMethodPath()` - Build a method path from traversal
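To make the truncation helper's role concrete, here is a hedged sketch of what a `getTruncatedJsonString()`-style function does; the real implementation in `utils.ts` may use a different limit and strategy:

```typescript
// Illustrative sketch only - the real getTruncatedJsonString() in
// packages/core/src/tracing/ai/utils.ts may differ; the 20k default is assumed.
function getTruncatedJsonString(value: unknown, maxLength = 20_000): string {
  const json = typeof value === 'string' ? value : JSON.stringify(value);
  return json.length > maxLength ? json.slice(0, maxLength) : json;
}

// Large payloads are cut down before being set as span attributes:
const messages = [{ role: 'user', content: 'x'.repeat(50_000) }];
console.log(getTruncatedJsonString(messages, 100).length); // 100
```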
---
## Pattern 1: OTEL Span Processors

**Use when:** SDK emits OTel spans automatically (Vercel AI)

### Key Steps

1. **Core:** Create `add{Provider}Processors()` in `packages/core/src/tracing/{provider}/index.ts`
   - Registers a `spanStart` listener + event processor
   - Post-processes spans to match semantic conventions

2. **Node.js:** Add a performance optimization in `packages/node/src/integrations/tracing/{provider}/index.ts`
   - Use `callWhenPatched()` to defer processor registration
   - Only register when the package is actually imported (see the `vercelAIIntegration` function)

3. **Edge:** Direct registration in `packages/cloudflare/src/integrations/tracing/{provider}.ts`
   - No OTel patching available
   - Just call `add{Provider}Processors()` immediately

**Reference:** `packages/node/src/integrations/tracing/vercelai/`
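The post-processing idea in step 1 can be sketched standalone: a span-start hook that maps an SDK's native attributes onto `gen_ai.*` semantic conventions. Both the listener shape and the `ai.model.id` key here are illustrative assumptions, not the real Vercel AI or Sentry API:

```typescript
// Stand-in for a span whose attributes can be rewritten at spanStart time.
interface MutableSpan {
  attributes: Record<string, unknown>;
}

// Hypothetical spanStart listener: rename a native SDK attribute to the
// semantic-convention key so downstream tooling sees gen_ai.* attributes.
function onSpanStart(span: MutableSpan): void {
  const model = span.attributes['ai.model.id']; // assumed native key
  if (model !== undefined) {
    span.attributes['gen_ai.request.model'] = model;
    delete span.attributes['ai.model.id'];
  }
}

const span: MutableSpan = { attributes: { 'ai.model.id': 'gpt-4o' } };
onSpanStart(span);
console.log(span.attributes); // { 'gen_ai.request.model': 'gpt-4o' }
```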
---
## Pattern 2: OTEL Instrumentation (Client Wrapping)

**Use when:** SDK has NO native OTel support (OpenAI, Anthropic, Google GenAI)

### Key Steps

1. **Core:** Create `instrument{Provider}Client()` in `packages/core/src/tracing/{provider}/index.ts`
   - Use a Proxy to wrap client methods recursively
   - Create spans manually with `startSpan()` or `startSpanManual()`

2. **Node.js instrumentation:** Patch module exports in `instrumentation.ts`
   - Wrap the client constructor
   - Check `_INTERNAL_shouldSkipAiProviderWrapping()` (for LangChain)
   - See `instrumentOpenAi` in `packages/node/src/integrations/tracing/openai/instrumentation.ts`

3. **Node.js integration:** Export the instrumentation function
   - Use the `generateInstrumentOnce()` helper
   - See `openAIIntegration` in `packages/node/src/integrations/tracing/openai/index.ts`

**Reference:** `packages/node/src/integrations/tracing/openai/`
---
## Pattern 3: Callback/Hook Based

**Use when:** SDK provides lifecycle hooks (LangChain, LangGraph)

### Key Steps

1. **Core:** Create `create{Provider}CallbackHandler()` in `packages/core/src/tracing/{provider}/index.ts`
   - Implement the SDK's callback interface
   - Create spans in the callback methods

2. **Node.js instrumentation:** Auto-inject callbacks
   - Patch runnable methods to add the handler automatically
   - **Important:** Disable underlying AI provider wrapping (see `instrumentLangchain` in `packages/node/src/integrations/tracing/langchain/instrumentation.ts`)

**Reference:** `packages/node/src/integrations/tracing/langchain/`
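A callback handler of this kind maps lifecycle hooks to span start/end, keyed by run ID. In this self-contained sketch the method names only loosely follow LangChain's callback interface, and the "spans" are recorded strings rather than real Sentry spans:

```typescript
// Log of span lifecycle events, in place of real span start/end calls.
const events: string[] = [];

class SketchCallbackHandler {
  private ops = new Map<string, string>(); // runId -> span op

  handleChainStart(runId: string): void {
    this.ops.set(runId, 'gen_ai.invoke_agent'); // parent/pipeline span
    events.push(`start ${this.ops.get(runId)}`);
  }

  handleLLMStart(runId: string): void {
    this.ops.set(runId, 'gen_ai.chat'); // child LLM-call span
    events.push(`start ${this.ops.get(runId)}`);
  }

  handleEnd(runId: string): void {
    events.push(`end ${this.ops.get(runId)}`); // real code ends the span here
    this.ops.delete(runId);
  }
}

// A chain run containing one LLM call yields the nested hierarchy from above:
const handler = new SketchCallbackHandler();
handler.handleChainStart('run-1');
handler.handleLLMStart('run-2');
handler.handleEnd('run-2');
handler.handleEnd('run-1');
console.log(events);
```

The run-ID map is the key design point: hooks arrive as a flat event stream, so the handler must track which span belongs to which run to close them correctly.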
---
## Auto-Instrumentation (Out-of-the-Box Support)

**RULE:** AI SDKs should be auto-enabled in the Node.js runtime if possible.

✅ **Auto-enable if:**

- SDK works in the Node.js runtime
- OTel only patches when the package is imported (zero cost if unused)

❌ **Don't auto-enable if:**

- SDK is niche/experimental
- Integration has significant limitations

### Steps to Auto-Enable

**1. Add to auto performance integrations**

Location: `packages/node/src/integrations/tracing/index.ts`

```typescript
export function getAutoPerformanceIntegrations(): Integration[] {
  return [
    // AI providers - IMPORTANT: LangChain MUST come first!
    langChainIntegration(), // Disables underlying providers
    langGraphIntegration(),
    vercelAIIntegration(),
    openAIIntegration(),
    anthropicAIIntegration(),
    googleGenAIIntegration(),
    {provider}Integration(), // <-- Add here
  ];
}
```

**2. Add to preload instrumentation**

```typescript
export function getOpenTelemetryInstrumentationToPreload() {
  return [
    instrumentOpenAi,
    instrumentAnthropicAi,
    instrument{Provider}, // <-- Add here
  ];
}
```

**3. Export from the package index**

```typescript
// packages/node/src/index.ts
export { {provider}Integration } from './integrations/tracing/{provider}';
export type { {Provider}Options } from './integrations/tracing/{provider}';

// If browser-compatible: packages/browser/src/index.ts
export { {provider}Integration } from './integrations/tracing/{provider}';
```

**4. Add an E2E test** in `packages/node-integration-tests/suites/{provider}/`

- Verify spans are created automatically (no manual setup)
- Test the `recordInputs` and `recordOutputs` options
- Test that the integration can be disabled

---
## Directory Structure

```
packages/
├── core/src/tracing/
│   ├── ai/                      # Shared utilities
│   │   ├── gen-ai-attributes.ts
│   │   ├── utils.ts
│   │   └── messageTruncation.ts
│   └── {provider}/              # Provider-specific
│       ├── index.ts             # Main logic
│       ├── types.ts
│       ├── constants.ts
│       └── streaming.ts
│
├── node/src/integrations/tracing/{provider}/
│   ├── index.ts                 # Integration definition
│   └── instrumentation.ts       # OTel instrumentation
│
├── cloudflare/src/integrations/tracing/
│   └── {provider}.ts            # Single file
│
└── vercel-edge/src/integrations/tracing/
    └── {provider}.ts            # Single file
```

---
## Key Best Practices

1. **Respect `sendDefaultPii`** for `recordInputs`/`recordOutputs`
2. **Use semantic attributes** from `gen-ai-attributes.ts` (never hardcode)
3. **Set the Sentry origin**: `SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN = 'auto.ai.openai'` (use the provider name: `openai`, `anthropic`, `vercelai`, etc. - only alphanumerics, `_`, and `.` allowed)
4. **Truncate large data**: Use the helper functions from `utils.ts`
5. **Correct span operations**: `gen_ai.invoke_agent` for parents, `gen_ai.chat` for children
6. **Streaming**: Use `startSpanManual()`, accumulate state, call `span.end()`
7. **Token accumulation**: Set directly on child spans, accumulate on parents from children
8. **Performance**: Use `callWhenPatched()` for Pattern 1
9. **LangChain**: Check `_INTERNAL_shouldSkipAiProviderWrapping()` in Pattern 2
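Respecting `sendDefaultPii` (item 1) typically comes down to a small defaulting step: explicit integration options win, and `sendDefaultPii` fills the gap. The helper name and exact resolution below are an assumption sketched for illustration, not the SDK's actual code:

```typescript
// Hypothetical per-integration options - names mirror the existing AI integrations.
interface RecordingOptions {
  recordInputs?: boolean;
  recordOutputs?: boolean;
}

// Explicit options take precedence; otherwise fall back to sendDefaultPii.
function resolveRecordingOptions(
  options: RecordingOptions,
  sendDefaultPii: boolean,
): Required<RecordingOptions> {
  return {
    recordInputs: options.recordInputs ?? sendDefaultPii,
    recordOutputs: options.recordOutputs ?? sendDefaultPii,
  };
}

console.log(resolveRecordingOptions({}, false)); // { recordInputs: false, recordOutputs: false }
console.log(resolveRecordingOptions({ recordInputs: true }, false)); // inputs on, outputs off
```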
---
## Reference Implementations

- **Pattern 1 (Span Processors):** `packages/node/src/integrations/tracing/vercelai/`
- **Pattern 2 (Client Wrapping):** `packages/node/src/integrations/tracing/openai/`
- **Pattern 3 (Callback/Hooks):** `packages/node/src/integrations/tracing/langchain/`

---
## Auto-Instrumentation Checklist

- [ ] Added to `getAutoPerformanceIntegrations()` in the correct order
- [ ] Added to `getOpenTelemetryInstrumentationToPreload()`
- [ ] Exported from `packages/node/src/index.ts`
- [ ] **If browser-compatible:** Exported from `packages/browser/src/index.ts`
- [ ] Added an E2E test in `packages/node-integration-tests/suites/{provider}/`
- [ ] E2E test verifies auto-instrumentation
- [ ] JSDoc says "enabled by default" or "not enabled by default"
- [ ] Documented how to disable (if auto-enabled)
- [ ] Documented limitations clearly
- [ ] Verified OTel only patches when the package is imported

---
## Questions?

1. Look at the reference implementations above
2. Check the shared utilities in `packages/core/src/tracing/ai/`
3. Review the OpenTelemetry Semantic Conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/

**When in doubt, follow the pattern of the most similar existing integration!**

.cursor/rules/sdk_development.mdc (8 additions):

```diff
@@ -78,6 +78,14 @@ This is a Lerna monorepo with 40+ packages in the `@sentry/*` namespace.
 - Client/server entry points where applicable (nextjs, nuxt, sveltekit)
 - Integration tests use Playwright (Remix, browser-integration-tests)

+### AI Integrations
+
+- `packages/core/src/tracing/{provider}/` - Core instrumentation logic (OpenAI, Anthropic, Vercel AI, LangChain, etc.)
+- `packages/node/src/integrations/tracing/{provider}/` - Node.js-specific integration + OTel instrumentation
+- `packages/cloudflare/src/integrations/tracing/{provider}.ts` - Edge runtime support
+- Patterns: OTEL Span Processors, Client Wrapping, Callback/Hook Based
+- See `.cursor/rules/adding-a-new-ai-integration.mdc` for implementation guide
+
 ### User Experience Packages

 - `packages/replay-internal/` - Session replay functionality
```