
feat(instrumentation-llamaindex): migrate to OTel 1.40 GenAI semantic conventions#925

Open

lenatraceloop wants to merge 2 commits into main from lena/llama-index-otel-1.40

Conversation

@lenatraceloop
Member

@lenatraceloop lenatraceloop commented Apr 9, 2026

Fixed TLP-2036

Summary

  • fix(instrumentation-llamaindex): patch the OpenAI LLM from @llamaindex/openai — customLLMInstrumentation promoted to a class field, and patchOpenAI() now correctly wraps the OpenAI LLM class from the separate @llamaindex/openai package
  • Replace deprecated gen_ai.system, gen_ai.prompt, gen_ai.completion attributes with OTel 1.40 standard: gen_ai.provider.name, gen_ai.input.messages, gen_ai.output.messages
  • Add gen_ai.response.finish_reasons and token usage (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) — always emitted as metadata regardless of traceContent setting
  • Fix streaming: track last chunk's raw field to extract token usage and finish reason when stream_options: { include_usage: true } is set on the LLM
  • Span name changed from llamaindex.open_ai.chat to chat {model} (OTel 1.40 convention)
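
The attribute migration above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the constant names mirror the OTel 1.40 attribute keys listed in the bullets, while `SpanLike` and `setGenAiAttributes` are hypothetical helpers showing how metadata (finish reasons, token usage) is emitted unconditionally and message content is gated on traceContent.

```typescript
// Illustrative sketch only — constants mirror OTel 1.40 GenAI semconv keys.
const ATTR_GEN_AI_PROVIDER_NAME = "gen_ai.provider.name"; // replaces gen_ai.system
const ATTR_GEN_AI_INPUT_MESSAGES = "gen_ai.input.messages"; // replaces gen_ai.prompt
const ATTR_GEN_AI_OUTPUT_MESSAGES = "gen_ai.output.messages"; // replaces gen_ai.completion
const ATTR_GEN_AI_RESPONSE_FINISH_REASONS = "gen_ai.response.finish_reasons";
const ATTR_GEN_AI_USAGE_INPUT_TOKENS = "gen_ai.usage.input_tokens";
const ATTR_GEN_AI_USAGE_OUTPUT_TOKENS = "gen_ai.usage.output_tokens";

interface SpanLike {
  setAttribute(key: string, value: string | number | string[]): void;
}

// Hypothetical helper: token usage and finish reasons are metadata, so they
// are emitted regardless of traceContent; message content is gated on it.
function setGenAiAttributes(
  span: SpanLike,
  opts: {
    traceContent: boolean;
    inputMessages: string; // JSON-serialized messages
    outputMessage: string;
    finishReasons: string[];
    inputTokens?: number;
    outputTokens?: number;
  },
) {
  span.setAttribute(ATTR_GEN_AI_PROVIDER_NAME, "openai");
  span.setAttribute(ATTR_GEN_AI_RESPONSE_FINISH_REASONS, opts.finishReasons);
  if (opts.inputTokens !== undefined)
    span.setAttribute(ATTR_GEN_AI_USAGE_INPUT_TOKENS, opts.inputTokens);
  if (opts.outputTokens !== undefined)
    span.setAttribute(ATTR_GEN_AI_USAGE_OUTPUT_TOKENS, opts.outputTokens);
  if (opts.traceContent) {
    span.setAttribute(ATTR_GEN_AI_INPUT_MESSAGES, opts.inputMessages);
    span.setAttribute(ATTR_GEN_AI_OUTPUT_MESSAGES, opts.outputMessage);
  }
}
```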

New tests

  • test/semconv.test.ts — pure unit tests for OTel constants, mapOpenAIContentBlock, formatInputMessages, formatOutputMessage; covers traceContent: true/false behavior
  • test/finish_reasons.test.ts — unit tests for openAIFinishReasonMap; verifies every OpenAI raw finish reason maps to a valid OTel value
  • test/instrumentation.test.ts — new mock LLM tests covering span attributes for non-streaming, streaming, and traceContent: false scenarios
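
A hedged sketch of the kind of mapping test/finish_reasons.test.ts exercises: openAIFinishReasonMap is the exported name from the PR, but the entries below are assumptions based on OpenAI's raw finish reasons and the OTel GenAI value set, not copied from the source.

```typescript
// Assumed mapping (illustrative): OpenAI raw finish reasons -> OTel GenAI values.
const openAIFinishReasonMapSketch: Record<string, string> = {
  stop: "stop",
  length: "length",
  content_filter: "content_filter",
  tool_calls: "tool_call", // OpenAI plural -> OTel singular (assumed)
  function_call: "tool_call", // legacy OpenAI value (assumed)
};

// Every OpenAI raw reason should map to a valid OTel value.
const validOtelValues = new Set(["stop", "length", "content_filter", "tool_call", "error"]);
for (const v of Object.values(openAIFinishReasonMapSketch)) {
  if (!validOtelValues.has(v)) throw new Error(`invalid OTel finish reason: ${v}`);
}
```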

Test plan

  • pnpm nx run @traceloop/instrumentation-llamaindex:test — 40 tests pass

Link for data

https://app.traceloop.dev/projects/lena-bedrock-anthropic-js-tests/trace?projectSlug=lena-bedrock-anthropic-js-tests

Summary by CodeRabbit

  • New Features

    • Improved LlamaIndex instrumentation: expanded GenAI semantic attributes, better attribution of input/output messages, and enhanced finish-reason and token-usage reporting for streaming and non‑streaming flows.
    • Optional OpenAI integration support for LlamaIndex manual instrumentation.
  • Tests

    • Added comprehensive tests for semantic-convention mapping, finish-reason translation, streaming behavior, and token/message handling.
  • Chores

    • Updated package dependencies and sample app initialization to enable the new instrumentation.

@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Adds OpenAI GenAI semantic-convention support to LlamaIndex instrumentation, instruments OpenAI chat directly, replaces legacy attributes with GenAI attributes (provider, operation, input/output messages, finish reasons, usage), adjusts streaming handling, introduces a shared CustomLLMInstrumentation instance, and expands tests and SDK initialization options.

Changes

Cohort / File(s): Summary

  • Dependency (packages/instrumentation-llamaindex/package.json): Added @traceloop/instrumentation-utils as a workspace dependency.
  • Custom LLM Instrumentation (packages/instrumentation-llamaindex/src/custom-llm-instrumentation.ts): Reworked span naming to chat {model}; replaced legacy attributes with GenAI semconv attributes (provider_name, operation_name, input/output messages, response.finish_reasons, token usage); conditional emission of input/output messages via formatInputMessages/formatOutputMessage; exported openAIFinishReasonMap and internal provider mapping; config changed to be a function.
  • Instrumentation Core (packages/instrumentation-llamaindex/src/instrumentation.ts): Persistent customLLMInstrumentation created in constructor; manuallyInstrument() signature extended to accept an optional OpenAI module; uses the shared instrumentation for LLM chat wrapping; adds patch/unpatch for OpenAI.prototype.chat.
  • Streaming Utils (packages/instrumentation-llamaindex/src/utils.ts): llmGeneratorWrapper callback signature updated to (message, lastChunk?), capturing the final streamed chunk for finish-reason and usage extraction.
  • Tests — SemConv & Finish Reasons (packages/instrumentation-llamaindex/test/semconv.test.ts, packages/instrumentation-llamaindex/test/finish_reasons.test.ts): Added unit tests validating GenAI semantic-convention mappings, content-block mapping, input/output message formatting, and openAIFinishReasonMap correctness.
  • Tests — Instrumentation (packages/instrumentation-llamaindex/test/instrumentation.test.ts): Expanded integration-style tests for CustomLLMInstrumentation covering span naming, provider/operation/model attributes, conditional traceContent behavior, finish-reason extraction for streaming/non-streaming, and token usage attributes.
  • Sample Apps (packages/sample-app/src/sample_llama_index_openai_agent.ts, packages/sample-app/src/sample_llamaindex.ts): Registered @llamaindex/openai in sample-app instrumentation and configured OpenAI to include stream_options: { include_usage: true }.
  • SDK Integration (packages/traceloop-sdk/src/lib/interfaces/initialize-options.interface.ts, packages/traceloop-sdk/src/lib/tracing/index.ts): Added optional llamaIndexOpenAI?: any to InitializeOptions.instrumentModules and passed it into manuallyInstrument() during manual initialization.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant LI as LlamaIndex Chat
    participant CLLI as CustomLLMInstrumentation
    participant OAI as OpenAI Client
    participant Span as Span

    App->>LI: chat(messages)
    activate LI
    LI->>CLLI: chatWrapper invoked
    activate CLLI
    CLLI->>Span: set provider_name, operation_name, model
    alt traceContent enabled
        CLLI->>Span: formatInputMessages -> gen_ai.input.messages
    end
    CLLI->>OAI: forward request (streaming or non-streaming)
    activate OAI
    OAI-->>CLLI: response / stream chunks
    deactivate OAI
    alt Streaming
        CLLI->>CLLI: collect chunks, record lastChunk
        CLLI->>Span: on completion set finish_reasons and usage from lastChunk.raw
    else Non-streaming
        CLLI->>Span: set finish_reasons and usage from result.raw
    end
    alt traceContent enabled
        CLLI->>Span: formatOutputMessage -> gen_ai.output.messages
    end
    CLLI-->>LI: return result
    deactivate CLLI
    LI-->>App: response
    deactivate LI
sequenceDiagram
    participant Init as Initialize
    participant SDK as Traceloop SDK
    participant LIINST as LlamaIndex Instrumentation
    participant OAI as OpenAI Module
    participant CLLI as CustomLLMInstrumentation

    Init->>SDK: initialize({ instrumentModules })
    activate SDK
    SDK->>LIINST: manuallyInstrument(llamaIndex, llamaIndexOpenAI)
    activate LIINST
    LIINST->>CLLI: create persistent CustomLLMInstrumentation
    activate CLLI
    CLLI-->>LIINST: ready
    deactivate CLLI
    LIINST->>OAI: patch OpenAI.prototype.chat (if provided)
    LIINST-->>SDK: instrumentation wired
    deactivate LIINST
    SDK-->>Init: done
    deactivate SDK

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Suggested reviewers

  • galzilber
  • doronkopit5
  • OzBenSimhonTraceloop

Poem

🐇 I stitched the spans with carrot thread,
Mapped finish reasons, hopped ahead,
Provider named and messages spun,
Streams keep usage till they're done,
A little trace, a hop — instrumentation fun!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title accurately describes the main objective of the pull request: migrating LlamaIndex instrumentation from deprecated GenAI attributes to OTel 1.40 semantic conventions, which is the primary focus across all modified files.
  • Description check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/traceloop-sdk/src/lib/tracing/index.ts (1)

216-224: ⚠️ Potential issue | 🟡 Minor

llamaIndexOpenAI is silently ignored unless llamaIndex is also set.

InitializeOptions.instrumentModules now exposes llamaIndexOpenAI as a sibling option, but this branch only runs when llamaIndex exists. Passing llamaIndexOpenAI alone becomes a silent no-op in the manual-init path, so please either validate that combination and warn/throw, or encode the dependency in the API shape.

Possible guard
   if (instrumentModules?.llamaIndex) {
     llamaIndexInstrumentation = new LlamaIndexInstrumentation({
       exceptionLogger,
     });
     instrumentations.push(llamaIndexInstrumentation);
     llamaIndexInstrumentation.manuallyInstrument(
       instrumentModules.llamaIndex,
       instrumentModules.llamaIndexOpenAI,
     );
+  } else if (instrumentModules?.llamaIndexOpenAI) {
+    diag.warn(
+      "[Traceloop] instrumentModules.llamaIndexOpenAI requires instrumentModules.llamaIndex",
+    );
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/src/lib/tracing/index.ts` around lines 216 - 224, The
code currently only initializes llamaIndexInstrumentation when
instrumentModules.llamaIndex is present, so instrumentModules.llamaIndexOpenAI
passed alone is ignored; update the initialization logic in the block that
references instrumentModules to detect when llamaIndexOpenAI is provided without
llamaIndex and either (a) emit a clear warning/error (using exceptionLogger or
process logger) or (b) enforce the dependency by throwing an error;
specifically, check instrumentModules.llamaIndexOpenAI &&
!instrumentModules.llamaIndex and then call exceptionLogger.error (or throw)
with a message explaining that llamaIndexOpenAI requires llamaIndex, or
alternatively adjust the creation of LlamaIndexInstrumentation / the call to
llamaIndexInstrumentation.manuallyInstrument to accept and handle
llamaIndexOpenAI-only usage so manuallyInstrument receives
instrumentModules.llamaIndexOpenAI when llamaIndex is present or otherwise fail
fast.
🧹 Nitpick comments (1)
packages/instrumentation-llamaindex/test/instrumentation.test.ts (1)

234-240: Consider adding test coverage for streaming with usage data.

The streaming mock only yields { delta } without a raw field. This tests the case where stream_options: { include_usage: true } is not enabled on the LLM.

Per the PR description, the instrumentation now extracts token usage and finish reason from the last chunk's raw field when available. Consider adding a test that verifies this behavior works correctly when the final streaming chunk includes raw.choices[0].finish_reason and raw.usage.

💡 Example streaming mock with usage
function makeMockChatWithStreamUsage(options: {
  responseContent?: string;
  finishReason?: string;
  promptTokens?: number;
  completionTokens?: number;
}) {
  const responseContent = options.responseContent ?? "Hello!";
  const finishReason = options.finishReason ?? "stop";
  const promptTokens = options.promptTokens ?? 10;
  const completionTokens = options.completionTokens ?? 5;

  return async function chat({ stream }: any) {
    if (stream) {
      async function* generate() {
        yield { delta: responseContent.slice(0, 3) };
        // Final chunk with raw usage (like OpenAI with include_usage: true)
        yield {
          delta: responseContent.slice(3),
          raw: {
            choices: [{ finish_reason: finishReason }],
            usage: {
              prompt_tokens: promptTokens,
              completion_tokens: completionTokens,
              total_tokens: promptTokens + completionTokens,
            },
          },
        };
      }
      return generate();
    }
    // ... non-streaming path
  };
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-llamaindex/test/instrumentation.test.ts` around
lines 234 - 240, Add a new test that uses a streaming mock which yields multiple
chunks and includes a final chunk with a raw field containing
choices[0].finish_reason and usage (prompt_tokens, completion_tokens,
total_tokens); update or add a helper like makeMockChatWithStreamUsage (or
modify the existing chat/generate generator in the test) to yield partial
delta(s) and then a final yield with raw. In the test, invoke the instrumented
LLM with stream: true and assert that the instrumentation recorded the finish
reason and the token usage values (prompt/completion/total) from the final
chunk's raw field and that non-streaming behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/instrumentation-llamaindex/src/instrumentation.ts`:
- Around line 39-47: CustomLLMInstrumentation is being constructed with a stale
snapshot of this._config; change the constructor call in
LlamaIndexInstrumentation to pass a getter callback (e.g., () => this._config)
instead of this._config so the wrapper reads the live config after setConfig()
updates it, and then update CustomLLMInstrumentation's constructor signature and
any internal references in custom-llm-instrumentation.ts to call that getter
(e.g., configGetter()) rather than treating config as a static object; apply the
same change for the other instantiations mentioned (lines 50-52) so all uses
read the live config.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 55b87ea2-35bf-4754-95ad-f41fd1b11b02

📥 Commits

Reviewing files that changed from the base of the PR and between a869545 and 6713c74.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (11)
  • packages/instrumentation-llamaindex/package.json
  • packages/instrumentation-llamaindex/src/custom-llm-instrumentation.ts
  • packages/instrumentation-llamaindex/src/instrumentation.ts
  • packages/instrumentation-llamaindex/src/utils.ts
  • packages/instrumentation-llamaindex/test/finish_reasons.test.ts
  • packages/instrumentation-llamaindex/test/instrumentation.test.ts
  • packages/instrumentation-llamaindex/test/semconv.test.ts
  • packages/sample-app/src/sample_llama_index_openai_agent.ts
  • packages/sample-app/src/sample_llamaindex.ts
  • packages/traceloop-sdk/src/lib/interfaces/initialize-options.interface.ts
  • packages/traceloop-sdk/src/lib/tracing/index.ts

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (3)
packages/instrumentation-llamaindex/test/instrumentation.test.ts (3)

215-215: Misleading variable name: noopDiag is assigned the real diagnostics logger.

The variable is named noopDiag but diag is the actual OpenTelemetry diagnostics logger, not a no-op. This could cause confusion for maintainers and may produce unexpected log output during tests.

🔧 Suggested fix: either rename or use a true no-op

Option 1: Rename to reflect actual behavior:

-const noopDiag: DiagLogger = diag;
+const testDiag: DiagLogger = diag;

Option 2: Use an actual no-op logger if silence is intended:

-const noopDiag: DiagLogger = diag;
+const noopDiag: DiagLogger = {
+  error: () => {},
+  warn: () => {},
+  info: () => {},
+  debug: () => {},
+  verbose: () => {},
+};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-llamaindex/test/instrumentation.test.ts` at line
215, The variable named noopDiag is misleading because it references the real
OpenTelemetry diagnostics logger (diag); either rename noopDiag to a truthful
identifier like diagLogger or realDiag (update all usages in
instrumentation.test.ts accordingly), or replace the assignment with an actual
no-op DiagLogger implementation (an object implementing the DiagLogger methods
such as debug/info/warn/error as no-ops) so tests are truly silent; update the
variable type (DiagLogger) and any assertions to match the chosen approach.

228-234: Missing test coverage for streaming with usage data.

The streaming mock only yields { delta: responseContent } without the raw field. This correctly tests the scenario where OpenAI streaming doesn't include usage. However, the PR description mentions fixing streaming handling "when stream_options: { include_usage: true } is set on the LLM" — but there's no test covering that scenario where streaming DOES provide finish_reason and usage in the last chunk.

🧪 Suggested: Add test for streaming with usage

Add a variant mock that includes raw in the final chunk:

function makeMockChatWithStreamUsage(options: {
  responseContent?: string;
  finishReason?: string;
  promptTokens?: number;
  completionTokens?: number;
}) {
  const responseContent = options.responseContent ?? "Hello!";
  const finishReason = options.finishReason ?? "stop";
  const promptTokens = options.promptTokens ?? 10;
  const completionTokens = options.completionTokens ?? 5;

  return async function chat({ stream }: any) {
    if (stream) {
      async function* generate() {
        yield { delta: "partial" };
        // Final chunk includes raw with usage (OpenAI behavior with stream_options)
        yield {
          delta: responseContent,
          raw: {
            choices: [{ finish_reason: finishReason }],
            usage: {
              prompt_tokens: promptTokens,
              completion_tokens: completionTokens,
              total_tokens: promptTokens + completionTokens,
            },
          },
        };
      }
      return generate();
    }
    // ... non-streaming case
  };
}

Then add a test:

it("sets finish_reasons and usage when streaming with stream_options", async () => {
  // test that attributes ARE set when raw data is available
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-llamaindex/test/instrumentation.test.ts` around
lines 228 - 234, The streaming mock in the test only yields { delta:
responseContent } and doesn't cover the case where the final streaming chunk
includes raw usage/finish_reason; add a new mock (e.g.,
makeMockChatWithStreamUsage) that returns an async generator from chat({ stream
}) which yields a partial delta and then a final chunk containing delta plus
raw: { choices: [{ finish_reason }], usage: { prompt_tokens, completion_tokens,
total_tokens } }, then add a test (e.g., it("sets finish_reasons and usage when
streaming with stream_options")) that uses this mock to assert the code under
test records finish_reasons and usage fields when the streaming raw data is
present; update any existing test harness to swap in the new chat mock for this
scenario.

502-520: Consider updating the test description for accuracy.

The test name says "not available" but finish_reasons can be available in streaming when stream_options: { include_usage: true } is set. The current mock simply doesn't provide this data. Consider renaming for clarity:

-    it("does NOT set finish_reasons in streaming (not available)", async () => {
+    it("does NOT set finish_reasons in streaming when raw data is absent", async () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-llamaindex/test/instrumentation.test.ts` around
lines 502 - 520, The test description is misleading — it claims finish_reasons
are "not available" in streaming, but they can be present when stream_options
include usage; update the test title string in the it(...) block to accurately
reflect that the mock simply doesn't include finish_reasons (e.g., "does NOT set
finish_reasons in streaming when mock omits them"), keeping the test body and
assertions (references: makeInstrumentation, chatWrapper, wrapped.call,
ATTR_GEN_AI_RESPONSE_FINISH_REASONS) unchanged so the behavior is still
validated against the mocked stream.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3ac1a5c7-5bc8-4001-9d7f-29c36c8034b8

📥 Commits

Reviewing files that changed from the base of the PR and between 6713c74 and 42af26a.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (1)
  • packages/instrumentation-llamaindex/test/instrumentation.test.ts

@lenatraceloop lenatraceloop force-pushed the lena/llama-index-otel-1.40 branch from 42af26a to 51cd26b Compare April 9, 2026 13:32
Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/instrumentation-llamaindex/src/utils.ts (1)

69-83: ⚠️ Potential issue | 🟠 Major

Ensure callback always runs on stream close/error, not only normal completion.

On Line 82, fn(message, lastChunk) runs only after full iteration. If the consumer stops early or iteration throws, callback execution is skipped, which can miss usage/finish_reason emission and leave streaming spans unfinalized.

Proposed fix
 export async function* llmGeneratorWrapper(
   streamingResult:
     | AsyncIterable<llamaindex.ChatResponseChunk>
     | AsyncIterable<llamaindex.CompletionResponse>,
   ctx: Context,
   fn: (message: string, lastChunk?: any) => void,
 ) {
   let message = "";
   // Track the last chunk so the callback can extract usage/finish_reason from
   // chunk.raw — OpenAI sends these in the final streaming chunk when
   // stream_options: { include_usage: true } is set on the LLM.
   let lastChunk: any;
 
-  for await (const messageChunk of bindAsyncGenerator(
-    ctx,
-    streamingResult as AsyncGenerator,
-  )) {
-    if ((messageChunk as llamaindex.ChatResponseChunk).delta) {
-      message += (messageChunk as llamaindex.ChatResponseChunk).delta;
-    }
-    if ((messageChunk as llamaindex.CompletionResponse).text) {
-      message += (messageChunk as llamaindex.CompletionResponse).text;
-    }
-    lastChunk = messageChunk;
-    yield messageChunk;
-  }
-  fn(message, lastChunk);
+  try {
+    for await (const messageChunk of bindAsyncGenerator(
+      ctx,
+      streamingResult as AsyncGenerator,
+    )) {
+      if ((messageChunk as llamaindex.ChatResponseChunk).delta) {
+        message += (messageChunk as llamaindex.ChatResponseChunk).delta;
+      }
+      if ((messageChunk as llamaindex.CompletionResponse).text) {
+        message += (messageChunk as llamaindex.CompletionResponse).text;
+      }
+      lastChunk = messageChunk;
+      yield messageChunk;
+    }
+  } finally {
+    fn(message, lastChunk);
+  }
 }

As per coding guidelines: "Instrumentations must extract request/response data and token usage from wrapped calls" and "Instrumentations must capture and record errors appropriately."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-llamaindex/src/utils.ts` around lines 69 - 83, The
callback fn(message, lastChunk) currently only runs after successful full
iteration; wrap the for-await loop in try/catch/finally so fn is always invoked
on normal completion or on error/early return: use try { for await (const
messageChunk of bindAsyncGenerator(ctx, streamingResult as AsyncGenerator)) {
... } } catch (err) { fn(message, lastChunk); throw err; } finally { if not
already called, fn(message, lastChunk); } — ensure this change is applied around
the bindAsyncGenerator loop and preserves rethrowing errors from streamingResult
so streaming spans are finalized and errors still propagate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e0e7c660-d053-495e-8d8d-58c1f39c7067

📥 Commits

Reviewing files that changed from the base of the PR and between 42af26a and 6b1ba71.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (8)
  • packages/instrumentation-llamaindex/package.json
  • packages/instrumentation-llamaindex/src/custom-llm-instrumentation.ts
  • packages/instrumentation-llamaindex/src/instrumentation.ts
  • packages/instrumentation-llamaindex/src/utils.ts
  • packages/instrumentation-llamaindex/test/finish_reasons.test.ts
  • packages/instrumentation-llamaindex/test/instrumentation.test.ts
  • packages/instrumentation-llamaindex/test/semconv.test.ts
  • packages/sample-app/src/sample_llamaindex.ts
✅ Files skipped from review due to trivial changes (2)
  • packages/instrumentation-llamaindex/package.json
  • packages/sample-app/src/sample_llamaindex.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/instrumentation-llamaindex/test/finish_reasons.test.ts
  • packages/instrumentation-llamaindex/test/semconv.test.ts
  • packages/instrumentation-llamaindex/test/instrumentation.test.ts
  • packages/instrumentation-llamaindex/src/custom-llm-instrumentation.ts
  • packages/instrumentation-llamaindex/src/instrumentation.ts

@lenatraceloop lenatraceloop force-pushed the lena/llama-index-otel-1.40 branch 2 times, most recently from be5d69f to 97367b4 Compare April 9, 2026 14:03
@lenatraceloop
Member Author

lenatraceloop commented Apr 9, 2026

Thanks for the thorough reviews! Here's a rundown of everything we addressed:


utils.ts — callback not called on stream error/early exit (Major)

Fixed in a previous commit — wrapped the for await loop in try/catch/finally with an fnCalled guard so the callback is guaranteed to run regardless of whether the stream completes normally, the consumer breaks early, or the stream throws. Spans are always finalized and usage/finish_reason data is never lost.
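
The guard described above might look like the following sketch. The names (llmGeneratorWrapper, fn, lastChunk, fnCalled) follow the PR discussion, but the body is simplified and illustrative, not the actual source:

```typescript
// Simplified sketch of the try/finally + fnCalled guard. The callback runs
// exactly once on normal completion, early `break` by the consumer, or a
// thrown error — so the span is always finalized.
async function* llmGeneratorWrapperSketch<T extends { delta?: string }>(
  source: AsyncIterable<T>,
  fn: (message: string, lastChunk?: T) => void,
): AsyncGenerator<T> {
  let message = "";
  let lastChunk: T | undefined;
  let fnCalled = false;
  const invoke = () => {
    if (!fnCalled) {
      fnCalled = true;
      fn(message, lastChunk);
    }
  };
  try {
    for await (const chunk of source) {
      if (chunk.delta) message += chunk.delta;
      lastChunk = chunk; // final chunk may carry raw usage/finish_reason
      yield chunk;
    }
  } finally {
    // `finally` also runs when the consumer breaks early (generator.return())
    // or when iteration throws, guaranteeing exactly one callback invocation.
    invoke();
  }
}
```

An early `break` in a `for await...of` loop calls the generator's `return()`, which triggers the `finally` block — that is what makes this pattern reliable for span finalization.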


tracing/index.ts — llamaIndexOpenAI silently ignored without llamaIndex (Minor)

Added a JSDoc comment on llamaIndexOpenAI in InitializeOptions.instrumentModules making the dependency on llamaIndex explicit.
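
The documented option might look like this sketch; the field names come from the PR, but the JSDoc wording and the interface shape shown here are assumptions:

```typescript
// Illustrative shape only — not the actual InitializeOptions source.
interface InstrumentModulesSketch {
  /** The llamaindex module. Required for any LlamaIndex instrumentation. */
  llamaIndex?: unknown;
  /**
   * The @llamaindex/openai module. Only takes effect when `llamaIndex` is
   * also provided; passing it alone is a silent no-op in the manual-init path.
   */
  llamaIndexOpenAI?: unknown;
}
```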


Nitpick: noopDiag is misleadingly named

Fixed — renamed to testDiag throughout instrumentation.test.ts.


Nitpick: Missing test for streaming with usage data

Fixed — added makeMockChatWithStreamUsage mock and a new test "sets finish_reasons and usage when streaming with raw data in last chunk" that asserts finish_reason and token usage are correctly recorded when the final streaming chunk includes raw data.


Nitpick: Test description "not available in streaming" is misleading

Fixed — updated to "does NOT set finish_reasons in streaming when mock omits them".

@lenatraceloop lenatraceloop force-pushed the lena/llama-index-otel-1.40 branch from 123df85 to ed7af2a Compare April 9, 2026 14:33
ATTR_GEN_AI_USAGE_OUTPUT_TOKENS,
GEN_AI_OPERATION_NAME_VALUE_CHAT,
GEN_AI_PROVIDER_NAME_VALUE_OPENAI,
} from "@opentelemetry/semantic-conventions/incubating";
Contributor

The import from /incubating was used when the attributes were not yet official.
See if some of the attributes can be imported from the official entry point now:
} from "@opentelemetry/semantic-conventions";

If not, leave it, of course.

Member Author

Checked it — in v1.40.0, the main @opentelemetry/semantic-conventions entry point only exports stable attributes (HTTP, network, URL, etc.).
All GEN_AI attributes are still in Development status and only available via /incubating.
Confirmed in the GenAI semconv docs (https://opentelemetry.io/docs/specs/semconv/gen-ai/) and the
v1.40.0 release notes (https://github.com/open-telemetry/semantic-conventions/releases/tag/v1.40.0).
So keeping the import from /incubating for now.

.startSpan(`llamaindex.${lodash.snakeCase(className)}.chat`, {
kind: SpanKind.CLIENT,
});
const span = plugin.tracer().startSpan(`chat ${this.metadata.model}`, {
Contributor

Just verifying — is this the new convention?

Member Author

@lenatraceloop lenatraceloop Apr 9, 2026

Yes. This is the new OTel 1.40 GenAI convention — the span name format is {operation} {model} (e.g. chat gpt-4o), as specified in the GenAI spans spec (https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/).
The old llamaindex.open_ai.chat format was replaced as part of the migration.
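
A minimal illustration of the {operation} {model} span-name format; the helper name is hypothetical:

```typescript
// Builds an OTel 1.40 GenAI span name, e.g. "chat gpt-4o".
function genAiSpanName(operation: string, model: string): string {
  return `${operation} ${model}`;
}
```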

