
feat(instrumentation-llamaindex): migrate to OTel 1.40 GenAI semantic conventions#925

Open

lenatraceloop wants to merge 2 commits into main from lena/llama-index-otel-1.40

Conversation

@lenatraceloop
Member

@lenatraceloop lenatraceloop commented Apr 9, 2026

Fixed TLP-2036

Summary

  • fix(instrumentation-llamaindex): patch the OpenAI LLM from @llamaindex/openai — customLLMInstrumentation promoted to a class field, and patchOpenAI() now correctly wraps the OpenAI LLM class from the separate @llamaindex/openai package
  • Replace deprecated gen_ai.system, gen_ai.prompt, gen_ai.completion attributes with OTel 1.40 standard: gen_ai.provider.name, gen_ai.input.messages, gen_ai.output.messages
  • Add gen_ai.response.finish_reasons and token usage (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) — always emitted as metadata regardless of traceContent setting
  • Fix streaming: track last chunk's raw field to extract token usage and finish reason when stream_options: { include_usage: true } is set on the LLM
  • Span name changed from llamaindex.open_ai.chat to chat {model} (OTel 1.40 convention)
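
The attribute migration above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the constant names mirror the OTel 1.40 attribute keys listed in the bullets, while `SpanLike` and `setGenAiAttributes` are hypothetical helpers showing how metadata (finish reasons, token usage) is emitted unconditionally and message content is gated on traceContent.

```typescript
// Illustrative sketch only — constants mirror OTel 1.40 GenAI semconv keys.
const ATTR_GEN_AI_PROVIDER_NAME = "gen_ai.provider.name"; // replaces gen_ai.system
const ATTR_GEN_AI_INPUT_MESSAGES = "gen_ai.input.messages"; // replaces gen_ai.prompt
const ATTR_GEN_AI_OUTPUT_MESSAGES = "gen_ai.output.messages"; // replaces gen_ai.completion
const ATTR_GEN_AI_RESPONSE_FINISH_REASONS = "gen_ai.response.finish_reasons";
const ATTR_GEN_AI_USAGE_INPUT_TOKENS = "gen_ai.usage.input_tokens";
const ATTR_GEN_AI_USAGE_OUTPUT_TOKENS = "gen_ai.usage.output_tokens";

interface SpanLike {
  setAttribute(key: string, value: string | number | string[]): void;
}

// Hypothetical helper: token usage and finish reasons are metadata, so they
// are emitted regardless of traceContent; message content is gated on it.
function setGenAiAttributes(
  span: SpanLike,
  opts: {
    traceContent: boolean;
    inputMessages: string; // JSON-serialized messages
    outputMessage: string;
    finishReasons: string[];
    inputTokens?: number;
    outputTokens?: number;
  },
) {
  span.setAttribute(ATTR_GEN_AI_PROVIDER_NAME, "openai");
  span.setAttribute(ATTR_GEN_AI_RESPONSE_FINISH_REASONS, opts.finishReasons);
  if (opts.inputTokens !== undefined)
    span.setAttribute(ATTR_GEN_AI_USAGE_INPUT_TOKENS, opts.inputTokens);
  if (opts.outputTokens !== undefined)
    span.setAttribute(ATTR_GEN_AI_USAGE_OUTPUT_TOKENS, opts.outputTokens);
  if (opts.traceContent) {
    span.setAttribute(ATTR_GEN_AI_INPUT_MESSAGES, opts.inputMessages);
    span.setAttribute(ATTR_GEN_AI_OUTPUT_MESSAGES, opts.outputMessage);
  }
}
```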

New tests

  • test/semconv.test.ts — pure unit tests for OTel constants, mapOpenAIContentBlock, formatInputMessages, formatOutputMessage; covers traceContent: true/false behavior
  • test/finish_reasons.test.ts — unit tests for openAIFinishReasonMap; verifies every OpenAI raw finish reason maps to a valid OTel value
  • test/instrumentation.test.ts — new mock LLM tests covering span attributes for non-streaming, streaming, and traceContent: false scenarios
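
A hedged sketch of the kind of mapping test/finish_reasons.test.ts exercises: openAIFinishReasonMap is the exported name from the PR, but the entries below are assumptions based on OpenAI's raw finish reasons and the OTel GenAI value set, not copied from the source.

```typescript
// Assumed mapping (illustrative): OpenAI raw finish reasons -> OTel GenAI values.
const openAIFinishReasonMapSketch: Record<string, string> = {
  stop: "stop",
  length: "length",
  content_filter: "content_filter",
  tool_calls: "tool_call", // OpenAI plural -> OTel singular (assumed)
  function_call: "tool_call", // legacy OpenAI value (assumed)
};

// Every OpenAI raw reason should map to a valid OTel value.
const validOtelValues = new Set(["stop", "length", "content_filter", "tool_call", "error"]);
for (const v of Object.values(openAIFinishReasonMapSketch)) {
  if (!validOtelValues.has(v)) throw new Error(`invalid OTel finish reason: ${v}`);
}
```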

Test plan

  • pnpm nx run @traceloop/instrumentation-llamaindex:test — 40 tests pass

Link for data

https://app.traceloop.dev/projects/lena-bedrock-anthropic-js-tests/trace?projectSlug=lena-bedrock-anthropic-js-tests

Summary by CodeRabbit

  • New Features

    • Improved LlamaIndex instrumentation: expanded GenAI semantic attributes, better attribution of input/output messages, and enhanced finish-reason and token-usage reporting for streaming and non‑streaming flows.
    • Optional OpenAI integration support for LlamaIndex manual instrumentation.
  • Tests

    • Added comprehensive tests for semantic-convention mapping, finish-reason translation, streaming behavior, and token/message handling.
  • Chores

    • Updated package dependencies and sample app initialization to enable the new instrumentation.

@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Adds OpenAI GenAI semantic-convention support to LlamaIndex instrumentation, instruments OpenAI chat directly, replaces legacy attributes with GenAI attributes (provider, operation, input/output messages, finish reasons, usage), adjusts streaming handling, introduces a shared CustomLLMInstrumentation instance, and expands tests and SDK initialization options.

Changes

Cohort / File(s): Summary

  • Dependency (packages/instrumentation-llamaindex/package.json): Added @traceloop/instrumentation-utils as a workspace dependency.
  • Custom LLM Instrumentation (packages/instrumentation-llamaindex/src/custom-llm-instrumentation.ts): Reworked span naming to chat {model}; replaced legacy attributes with GenAI semconv attributes (provider_name, operation_name, input/output messages, response.finish_reasons, token usage); conditional emission of input/output messages via formatInputMessages/formatOutputMessage; exported openAIFinishReasonMap and internal provider mapping; config changed to be a function.
  • Instrumentation Core (packages/instrumentation-llamaindex/src/instrumentation.ts): Persistent customLLMInstrumentation created in constructor; manuallyInstrument() signature extended to accept an optional OpenAI module; uses the shared instrumentation for LLM chat wrapping; adds patch/unpatch for OpenAI.prototype.chat.
  • Streaming Utils (packages/instrumentation-llamaindex/src/utils.ts): llmGeneratorWrapper callback signature updated to (message, lastChunk?), capturing the final streamed chunk for finish-reason and usage extraction.
  • Tests — SemConv & Finish Reasons (packages/instrumentation-llamaindex/test/semconv.test.ts, packages/instrumentation-llamaindex/test/finish_reasons.test.ts): Added unit tests validating GenAI semantic-convention mappings, content-block mapping, input/output message formatting, and openAIFinishReasonMap correctness.
  • Tests — Instrumentation (packages/instrumentation-llamaindex/test/instrumentation.test.ts): Expanded integration-style tests for CustomLLMInstrumentation covering span naming, provider/operation/model attributes, conditional traceContent behavior, finish-reason extraction for streaming/non-streaming, and token usage attributes.
  • Sample Apps (packages/sample-app/src/sample_llama_index_openai_agent.ts, packages/sample-app/src/sample_llamaindex.ts): Registered @llamaindex/openai in sample-app instrumentation and configured OpenAI to include stream_options: { include_usage: true }.
  • SDK Integration (packages/traceloop-sdk/src/lib/interfaces/initialize-options.interface.ts, packages/traceloop-sdk/src/lib/tracing/index.ts): Added optional llamaIndexOpenAI?: any to InitializeOptions.instrumentModules and passed it into manuallyInstrument() during manual initialization.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant LI as LlamaIndex Chat
    participant CLLI as CustomLLMInstrumentation
    participant OAI as OpenAI Client
    participant Span as Span

    App->>LI: chat(messages)
    activate LI
    LI->>CLLI: chatWrapper invoked
    activate CLLI
    CLLI->>Span: set provider_name, operation_name, model
    alt traceContent enabled
        CLLI->>Span: formatInputMessages -> gen_ai.input.messages
    end
    CLLI->>OAI: forward request (streaming or non-streaming)
    activate OAI
    OAI-->>CLLI: response / stream chunks
    deactivate OAI
    alt Streaming
        CLLI->>CLLI: collect chunks, record lastChunk
        CLLI->>Span: on completion set finish_reasons and usage from lastChunk.raw
    else Non-streaming
        CLLI->>Span: set finish_reasons and usage from result.raw
    end
    alt traceContent enabled
        CLLI->>Span: formatOutputMessage -> gen_ai.output.messages
    end
    CLLI-->>LI: return result
    deactivate CLLI
    LI-->>App: response
    deactivate LI
sequenceDiagram
    participant Init as Initialize
    participant SDK as Traceloop SDK
    participant LIINST as LlamaIndex Instrumentation
    participant OAI as OpenAI Module
    participant CLLI as CustomLLMInstrumentation

    Init->>SDK: initialize({ instrumentModules })
    activate SDK
    SDK->>LIINST: manuallyInstrument(llamaIndex, llamaIndexOpenAI)
    activate LIINST
    LIINST->>CLLI: create persistent CustomLLMInstrumentation
    activate CLLI
    CLLI-->>LIINST: ready
    deactivate CLLI
    LIINST->>OAI: patch OpenAI.prototype.chat (if provided)
    LIINST-->>SDK: instrumentation wired
    deactivate LIINST
    SDK-->>Init: done
    deactivate SDK

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Suggested reviewers

  • galzilber
  • doronkopit5
  • OzBenSimhonTraceloop

Poem

🐇 I stitched the spans with carrot thread,
Mapped finish reasons, hopped ahead,
Provider named and messages spun,
Streams keep usage till they're done,
A little trace, a hop — instrumentation fun!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title accurately describes the main objective of the pull request: migrating LlamaIndex instrumentation from deprecated GenAI attributes to OTel 1.40 semantic conventions, which is the primary focus across all modified files.
  • Description check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/traceloop-sdk/src/lib/tracing/index.ts (1)

216-224: ⚠️ Potential issue | 🟡 Minor

llamaIndexOpenAI is silently ignored unless llamaIndex is also set.

InitializeOptions.instrumentModules now exposes llamaIndexOpenAI as a sibling option, but this branch only runs when llamaIndex exists. Passing llamaIndexOpenAI alone becomes a silent no-op in the manual-init path, so please either validate that combination and warn/throw, or encode the dependency in the API shape.

Possible guard
   if (instrumentModules?.llamaIndex) {
     llamaIndexInstrumentation = new LlamaIndexInstrumentation({
       exceptionLogger,
     });
     instrumentations.push(llamaIndexInstrumentation);
     llamaIndexInstrumentation.manuallyInstrument(
       instrumentModules.llamaIndex,
       instrumentModules.llamaIndexOpenAI,
     );
+  } else if (instrumentModules?.llamaIndexOpenAI) {
+    diag.warn(
+      "[Traceloop] instrumentModules.llamaIndexOpenAI requires instrumentModules.llamaIndex",
+    );
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/src/lib/tracing/index.ts` around lines 216 - 224, The
code currently only initializes llamaIndexInstrumentation when
instrumentModules.llamaIndex is present, so instrumentModules.llamaIndexOpenAI
passed alone is ignored; update the initialization logic in the block that
references instrumentModules to detect when llamaIndexOpenAI is provided without
llamaIndex and either (a) emit a clear warning/error (using exceptionLogger or
process logger) or (b) enforce the dependency by throwing an error;
specifically, check instrumentModules.llamaIndexOpenAI &&
!instrumentModules.llamaIndex and then call exceptionLogger.error (or throw)
with a message explaining that llamaIndexOpenAI requires llamaIndex, or
alternatively adjust the creation of LlamaIndexInstrumentation / the call to
llamaIndexInstrumentation.manuallyInstrument to accept and handle
llamaIndexOpenAI-only usage so manuallyInstrument receives
instrumentModules.llamaIndexOpenAI when llamaIndex is present or otherwise fail
fast.
🧹 Nitpick comments (1)
packages/instrumentation-llamaindex/test/instrumentation.test.ts (1)

234-240: Consider adding test coverage for streaming with usage data.

The streaming mock only yields { delta } without a raw field. This tests the case where stream_options: { include_usage: true } is not enabled on the LLM.

Per the PR description, the instrumentation now extracts token usage and finish reason from the last chunk's raw field when available. Consider adding a test that verifies this behavior works correctly when the final streaming chunk includes raw.choices[0].finish_reason and raw.usage.

💡 Example streaming mock with usage
function makeMockChatWithStreamUsage(options: {
  responseContent?: string;
  finishReason?: string;
  promptTokens?: number;
  completionTokens?: number;
}) {
  const responseContent = options.responseContent ?? "Hello!";
  const finishReason = options.finishReason ?? "stop";
  const promptTokens = options.promptTokens ?? 10;
  const completionTokens = options.completionTokens ?? 5;

  return async function chat({ stream }: any) {
    if (stream) {
      async function* generate() {
        yield { delta: responseContent.slice(0, 3) };
        // Final chunk with raw usage (like OpenAI with include_usage: true)
        yield {
          delta: responseContent.slice(3),
          raw: {
            choices: [{ finish_reason: finishReason }],
            usage: {
              prompt_tokens: promptTokens,
              completion_tokens: completionTokens,
              total_tokens: promptTokens + completionTokens,
            },
          },
        };
      }
      return generate();
    }
    // ... non-streaming path
  };
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-llamaindex/test/instrumentation.test.ts` around
lines 234 - 240, Add a new test that uses a streaming mock which yields multiple
chunks and includes a final chunk with a raw field containing
choices[0].finish_reason and usage (prompt_tokens, completion_tokens,
total_tokens); update or add a helper like makeMockChatWithStreamUsage (or
modify the existing chat/generate generator in the test) to yield partial
delta(s) and then a final yield with raw. In the test, invoke the instrumented
LLM with stream: true and assert that the instrumentation recorded the finish
reason and the token usage values (prompt/completion/total) from the final
chunk's raw field and that non-streaming behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/instrumentation-llamaindex/src/instrumentation.ts`:
- Around line 39-47: CustomLLMInstrumentation is being constructed with a stale
snapshot of this._config; change the constructor call in
LlamaIndexInstrumentation to pass a getter callback (e.g., () => this._config)
instead of this._config so the wrapper reads the live config after setConfig()
updates it, and then update CustomLLMInstrumentation's constructor signature and
any internal references in custom-llm-instrumentation.ts to call that getter
(e.g., configGetter()) rather than treating config as a static object; apply the
same change for the other instantiations mentioned (lines 50-52) so all uses
read the live config.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 55b87ea2-35bf-4754-95ad-f41fd1b11b02

📥 Commits

Reviewing files that changed from the base of the PR and between a869545 and 6713c74.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (11)
  • packages/instrumentation-llamaindex/package.json
  • packages/instrumentation-llamaindex/src/custom-llm-instrumentation.ts
  • packages/instrumentation-llamaindex/src/instrumentation.ts
  • packages/instrumentation-llamaindex/src/utils.ts
  • packages/instrumentation-llamaindex/test/finish_reasons.test.ts
  • packages/instrumentation-llamaindex/test/instrumentation.test.ts
  • packages/instrumentation-llamaindex/test/semconv.test.ts
  • packages/sample-app/src/sample_llama_index_openai_agent.ts
  • packages/sample-app/src/sample_llamaindex.ts
  • packages/traceloop-sdk/src/lib/interfaces/initialize-options.interface.ts
  • packages/traceloop-sdk/src/lib/tracing/index.ts

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (3)
packages/instrumentation-llamaindex/test/instrumentation.test.ts (3)

215-215: Misleading variable name: noopDiag is assigned the real diagnostics logger.

The variable is named noopDiag but diag is the actual OpenTelemetry diagnostics logger, not a no-op. This could cause confusion for maintainers and may produce unexpected log output during tests.

🔧 Suggested fix: either rename or use a true no-op

Option 1: Rename to reflect actual behavior:

-const noopDiag: DiagLogger = diag;
+const testDiag: DiagLogger = diag;

Option 2: Use an actual no-op logger if silence is intended:

-const noopDiag: DiagLogger = diag;
+const noopDiag: DiagLogger = {
+  error: () => {},
+  warn: () => {},
+  info: () => {},
+  debug: () => {},
+  verbose: () => {},
+};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-llamaindex/test/instrumentation.test.ts` at line
215, The variable named noopDiag is misleading because it references the real
OpenTelemetry diagnostics logger (diag); either rename noopDiag to a truthful
identifier like diagLogger or realDiag (update all usages in
instrumentation.test.ts accordingly), or replace the assignment with an actual
no-op DiagLogger implementation (an object implementing the DiagLogger methods
such as debug/info/warn/error as no-ops) so tests are truly silent; update the
variable type (DiagLogger) and any assertions to match the chosen approach.

228-234: Missing test coverage for streaming with usage data.

The streaming mock only yields { delta: responseContent } without the raw field. This correctly tests the scenario where OpenAI streaming doesn't include usage. However, the PR description mentions fixing streaming handling "when stream_options: { include_usage: true } is set on the LLM" — but there's no test covering that scenario where streaming DOES provide finish_reason and usage in the last chunk.

🧪 Suggested: Add test for streaming with usage

Add a variant mock that includes raw in the final chunk:

function makeMockChatWithStreamUsage(options: {
  responseContent?: string;
  finishReason?: string;
  promptTokens?: number;
  completionTokens?: number;
}) {
  const responseContent = options.responseContent ?? "Hello!";
  const finishReason = options.finishReason ?? "stop";
  const promptTokens = options.promptTokens ?? 10;
  const completionTokens = options.completionTokens ?? 5;

  return async function chat({ stream }: any) {
    if (stream) {
      async function* generate() {
        yield { delta: "partial" };
        // Final chunk includes raw with usage (OpenAI behavior with stream_options)
        yield {
          delta: responseContent,
          raw: {
            choices: [{ finish_reason: finishReason }],
            usage: {
              prompt_tokens: promptTokens,
              completion_tokens: completionTokens,
              total_tokens: promptTokens + completionTokens,
            },
          },
        };
      }
      return generate();
    }
    // ... non-streaming case
  };
}

Then add a test:

it("sets finish_reasons and usage when streaming with stream_options", async () => {
  // test that attributes ARE set when raw data is available
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-llamaindex/test/instrumentation.test.ts` around
lines 228 - 234, The streaming mock in the test only yields { delta:
responseContent } and doesn't cover the case where the final streaming chunk
includes raw usage/finish_reason; add a new mock (e.g.,
makeMockChatWithStreamUsage) that returns an async generator from chat({ stream
}) which yields a partial delta and then a final chunk containing delta plus
raw: { choices: [{ finish_reason }], usage: { prompt_tokens, completion_tokens,
total_tokens } }, then add a test (e.g., it("sets finish_reasons and usage when
streaming with stream_options")) that uses this mock to assert the code under
test records finish_reasons and usage fields when the streaming raw data is
present; update any existing test harness to swap in the new chat mock for this
scenario.

502-520: Consider updating the test description for accuracy.

The test name says "not available" but finish_reasons can be available in streaming when stream_options: { include_usage: true } is set. The current mock simply doesn't provide this data. Consider renaming for clarity:

-    it("does NOT set finish_reasons in streaming (not available)", async () => {
+    it("does NOT set finish_reasons in streaming when raw data is absent", async () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-llamaindex/test/instrumentation.test.ts` around
lines 502 - 520, The test description is misleading — it claims finish_reasons
are "not available" in streaming, but they can be present when stream_options
include usage; update the test title string in the it(...) block to accurately
reflect that the mock simply doesn't include finish_reasons (e.g., "does NOT set
finish_reasons in streaming when mock omits them"), keeping the test body and
assertions (references: makeInstrumentation, chatWrapper, wrapped.call,
ATTR_GEN_AI_RESPONSE_FINISH_REASONS) unchanged so the behavior is still
validated against the mocked stream.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3ac1a5c7-5bc8-4001-9d7f-29c36c8034b8

📥 Commits

Reviewing files that changed from the base of the PR and between 6713c74 and 42af26a.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (1)
  • packages/instrumentation-llamaindex/test/instrumentation.test.ts

@lenatraceloop lenatraceloop force-pushed the lena/llama-index-otel-1.40 branch from 42af26a to 51cd26b Compare April 9, 2026 13:32
Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/instrumentation-llamaindex/src/utils.ts (1)

69-83: ⚠️ Potential issue | 🟠 Major

Ensure callback always runs on stream close/error, not only normal completion.

On Line 82, fn(message, lastChunk) runs only after full iteration. If the consumer stops early or iteration throws, callback execution is skipped, which can miss usage/finish_reason emission and leave streaming spans unfinalized.

Proposed fix
 export async function* llmGeneratorWrapper(
   streamingResult:
     | AsyncIterable<llamaindex.ChatResponseChunk>
     | AsyncIterable<llamaindex.CompletionResponse>,
   ctx: Context,
   fn: (message: string, lastChunk?: any) => void,
 ) {
   let message = "";
   // Track the last chunk so the callback can extract usage/finish_reason from
   // chunk.raw — OpenAI sends these in the final streaming chunk when
   // stream_options: { include_usage: true } is set on the LLM.
   let lastChunk: any;
 
-  for await (const messageChunk of bindAsyncGenerator(
-    ctx,
-    streamingResult as AsyncGenerator,
-  )) {
-    if ((messageChunk as llamaindex.ChatResponseChunk).delta) {
-      message += (messageChunk as llamaindex.ChatResponseChunk).delta;
-    }
-    if ((messageChunk as llamaindex.CompletionResponse).text) {
-      message += (messageChunk as llamaindex.CompletionResponse).text;
-    }
-    lastChunk = messageChunk;
-    yield messageChunk;
-  }
-  fn(message, lastChunk);
+  try {
+    for await (const messageChunk of bindAsyncGenerator(
+      ctx,
+      streamingResult as AsyncGenerator,
+    )) {
+      if ((messageChunk as llamaindex.ChatResponseChunk).delta) {
+        message += (messageChunk as llamaindex.ChatResponseChunk).delta;
+      }
+      if ((messageChunk as llamaindex.CompletionResponse).text) {
+        message += (messageChunk as llamaindex.CompletionResponse).text;
+      }
+      lastChunk = messageChunk;
+      yield messageChunk;
+    }
+  } finally {
+    fn(message, lastChunk);
+  }
 }

As per coding guidelines: "Instrumentations must extract request/response data and token usage from wrapped calls" and "Instrumentations must capture and record errors appropriately."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-llamaindex/src/utils.ts` around lines 69 - 83, The
callback fn(message, lastChunk) currently only runs after successful full
iteration; wrap the for-await loop in try/catch/finally so fn is always invoked
on normal completion or on error/early return: use try { for await (const
messageChunk of bindAsyncGenerator(ctx, streamingResult as AsyncGenerator)) {
... } } catch (err) { fn(message, lastChunk); throw err; } finally { if not
already called, fn(message, lastChunk); } — ensure this change is applied around
the bindAsyncGenerator loop and preserves rethrowing errors from streamingResult
so streaming spans are finalized and errors still propagate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e0e7c660-d053-495e-8d8d-58c1f39c7067

📥 Commits

Reviewing files that changed from the base of the PR and between 42af26a and 6b1ba71.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (8)
  • packages/instrumentation-llamaindex/package.json
  • packages/instrumentation-llamaindex/src/custom-llm-instrumentation.ts
  • packages/instrumentation-llamaindex/src/instrumentation.ts
  • packages/instrumentation-llamaindex/src/utils.ts
  • packages/instrumentation-llamaindex/test/finish_reasons.test.ts
  • packages/instrumentation-llamaindex/test/instrumentation.test.ts
  • packages/instrumentation-llamaindex/test/semconv.test.ts
  • packages/sample-app/src/sample_llamaindex.ts
✅ Files skipped from review due to trivial changes (2)
  • packages/instrumentation-llamaindex/package.json
  • packages/sample-app/src/sample_llamaindex.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/instrumentation-llamaindex/test/finish_reasons.test.ts
  • packages/instrumentation-llamaindex/test/semconv.test.ts
  • packages/instrumentation-llamaindex/test/instrumentation.test.ts
  • packages/instrumentation-llamaindex/src/custom-llm-instrumentation.ts
  • packages/instrumentation-llamaindex/src/instrumentation.ts

@lenatraceloop lenatraceloop force-pushed the lena/llama-index-otel-1.40 branch 2 times, most recently from be5d69f to 97367b4 Compare April 9, 2026 14:03
@lenatraceloop
Member Author

lenatraceloop commented Apr 9, 2026

Thanks for the thorough reviews! Here's a rundown of everything we addressed:


utils.ts — callback not called on stream error/early exit (Major)

Fixed in a previous commit — wrapped the for await loop in try/catch/finally with an fnCalled guard so the callback is guaranteed to run regardless of whether the stream completes normally, the consumer breaks early, or the stream throws. Spans are always finalized and usage/finish_reason data is never lost.
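
The guard described above might look like the following sketch. The names (llmGeneratorWrapper, fn, lastChunk, fnCalled) follow the PR discussion, but the body is simplified and illustrative, not the actual source:

```typescript
// Simplified sketch of the try/finally + fnCalled guard. The callback runs
// exactly once on normal completion, early `break` by the consumer, or a
// thrown error — so the span is always finalized.
async function* llmGeneratorWrapperSketch<T extends { delta?: string }>(
  source: AsyncIterable<T>,
  fn: (message: string, lastChunk?: T) => void,
): AsyncGenerator<T> {
  let message = "";
  let lastChunk: T | undefined;
  let fnCalled = false;
  const invoke = () => {
    if (!fnCalled) {
      fnCalled = true;
      fn(message, lastChunk);
    }
  };
  try {
    for await (const chunk of source) {
      if (chunk.delta) message += chunk.delta;
      lastChunk = chunk; // final chunk may carry raw usage/finish_reason
      yield chunk;
    }
  } finally {
    // `finally` also runs when the consumer breaks early (generator.return())
    // or when iteration throws, guaranteeing exactly one callback invocation.
    invoke();
  }
}
```

An early `break` in a `for await...of` loop calls the generator's `return()`, which triggers the `finally` block — that is what makes this pattern reliable for span finalization.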


tracing/index.ts — llamaIndexOpenAI silently ignored without llamaIndex (Minor)

Added a JSDoc comment on llamaIndexOpenAI in InitializeOptions.instrumentModules making the dependency on llamaIndex explicit.
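
The documented option might look like this sketch; the field names come from the PR, but the JSDoc wording and the interface shape shown here are assumptions:

```typescript
// Illustrative shape only — not the actual InitializeOptions source.
interface InstrumentModulesSketch {
  /** The llamaindex module. Required for any LlamaIndex instrumentation. */
  llamaIndex?: unknown;
  /**
   * The @llamaindex/openai module. Only takes effect when `llamaIndex` is
   * also provided; passing it alone is a silent no-op in the manual-init path.
   */
  llamaIndexOpenAI?: unknown;
}
```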


Nitpick: noopDiag is misleadingly named

Fixed — renamed to testDiag throughout instrumentation.test.ts.


Nitpick: Missing test for streaming with usage data

Fixed — added makeMockChatWithStreamUsage mock and a new test "sets finish_reasons and usage when streaming with raw data in last chunk" that asserts finish_reason and token usage are correctly recorded when the final streaming chunk includes raw data.


Nitpick: Test description "not available in streaming" is misleading

Fixed — updated to "does NOT set finish_reasons in streaming when mock omits them".

@lenatraceloop lenatraceloop force-pushed the lena/llama-index-otel-1.40 branch from 123df85 to ed7af2a Compare April 9, 2026 14:33
ATTR_GEN_AI_USAGE_OUTPUT_TOKENS,
GEN_AI_OPERATION_NAME_VALUE_CHAT,
GEN_AI_PROVIDER_NAME_VALUE_OPENAI,
} from "@opentelemetry/semantic-conventions/incubating";
Contributor

The import from /incubating was used when the attributes were not yet official.
See if some of the attributes can be imported from the official entry point now:
} from "@opentelemetry/semantic-conventions";

If not, leave it, of course.

Member Author

Checked it — in v1.40.0, the main @opentelemetry/semantic-conventions entry point only exports stable attributes (HTTP, network, URL, etc.).
All GEN_AI attributes are still in Development status and only available via /incubating.
Confirmed in the GenAI semconv docs (https://opentelemetry.io/docs/specs/semconv/gen-ai/) and the
v1.40.0 release notes (https://github.com/open-telemetry/semantic-conventions/releases/tag/v1.40.0).
So keeping the import from /incubating for now.

.startSpan(`llamaindex.${lodash.snakeCase(className)}.chat`, {
kind: SpanKind.CLIENT,
});
const span = plugin.tracer().startSpan(`chat ${this.metadata.model}`, {
Contributor

Just verifying — is this the new convention?

Member Author

@lenatraceloop lenatraceloop Apr 9, 2026

Yes. This is the new OTel 1.40 GenAI convention — the span name format is {operation} {model} (e.g. chat gpt-4o), as specified in the GenAI spans spec (https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/).
The old llamaindex.open_ai.chat format was replaced as part of the migration.
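
A minimal illustration of the {operation} {model} span-name format; the helper name is hypothetical:

```typescript
// Builds an OTel 1.40 GenAI span name, e.g. "chat gpt-4o".
function genAiSpanName(operation: string, model: string): string {
  return `${operation} ${model}`;
}
```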

