
Adding embedding token metric #248

Open
shuningc wants to merge 4 commits into `main` from `AddingEmbeddingTokenMetrics`

Conversation

@shuningc
Contributor

@shuningc shuningc commented Apr 7, 2026

Summary

This PR adds token usage metrics for embedding operations, bringing parity with LLM invocations. Previously, embeddings only emitted duration metrics; now they also emit gen_ai.client.token.usage with input token counts.

Changes

Core (opentelemetry-util-genai)

util/opentelemetry-util-genai/src/opentelemetry/util/genai/emitters/metrics.py

  • Added _record_token_metrics() call in on_end() for EmbeddingInvocation
  • Added _record_token_metrics() call in on_error() for EmbeddingInvocation
  • Added get_context_metric_attributes() for session context on embedding metrics
  • Passes None for completion_tokens (embeddings don't produce output tokens)
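The recording logic can be pictured with a minimal sketch. Note the type and helper names below are illustrative stand-ins, not the actual `opentelemetry-util-genai` API, and the histogram is faked so the example is self-contained:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EmbeddingInvocation:
    """Illustrative stand-in for the util-genai invocation type."""
    provider: str
    request_model: str
    input_tokens: Optional[int] = None  # completion tokens do not exist for embeddings

@dataclass
class FakeHistogram:
    """Minimal stand-in for an OTel Histogram: collects (value, attributes) pairs."""
    points: list = field(default_factory=list)

    def record(self, value, attributes=None):
        self.points.append((value, dict(attributes or {})))

def record_token_metrics(histogram, invocation):
    """Record input-token usage for an embedding; skip when no count is available."""
    if invocation.input_tokens is None:
        return  # e.g. tiktoken unavailable, so no client-side count
    histogram.record(
        invocation.input_tokens,
        attributes={
            "gen_ai.token.type": "input",
            "gen_ai.provider.name": invocation.provider,
            "gen_ai.operation.name": "embedding",
            "gen_ai.request.model": invocation.request_model,
        },
    )

hist = FakeHistogram()
record_token_metrics(
    hist, EmbeddingInvocation("openai", "text-embedding-ada-002", input_tokens=7)
)
```

Only the `input` token type is ever emitted, which matches the `None`-for-completion-tokens behavior described above.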

util/opentelemetry-util-genai/tests/test_metrics.py

  • Added EmbeddingInvocation import
  • Added _invoke_embedding() helper method
  • Added _invoke_embedding_failure() helper method
  • Added test_embedding_emits_input_token_metric - verifies token metric with correct attributes
  • Added test_embedding_failure_emits_token_metric - verifies metrics on error path

LangChain Instrumentation

instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/__init__.py

  • Added _count_tokens(self, texts, model) method using the tiktoken library
  • Uses model-specific encoding with cl100k_base fallback
  • Modified _start_embedding() to count tokens client-side and populate input_tokens on EmbeddingInvocation

Why client-side counting? LangChain's embed_documents() returns only the embedding vectors—it strips the API response metadata including usage.prompt_tokens. Unlike ChatOpenAI which exposes response.llm_output.usage, there's no way to get server-reported token counts for embeddings through LangChain's API.
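The client-side counting approach can be sketched roughly as follows (the helper name and signature are illustrative, not the exact PR code; the fallback behavior matches the bullets above):

```python
def count_embedding_tokens(texts, model):
    """Best-effort client-side token count for a batch of embedding inputs.

    Returns None when tiktoken is unavailable (or its encoding files
    cannot be loaded), so callers can skip token metrics gracefully.
    """
    try:
        import tiktoken
        try:
            # Prefer the model-specific encoding...
            enc = tiktoken.encoding_for_model(model)
        except KeyError:
            # ...and fall back to cl100k_base for unknown models.
            enc = tiktoken.get_encoding("cl100k_base")
    except Exception:
        return None  # tiktoken missing or encodings not loadable
    return sum(len(enc.encode(text)) for text in texts)

tokens = count_embedding_tokens(
    ["What is the capital of France?"], "text-embedding-ada-002"
)
```

When this returns `None`, `input_tokens` is simply left unset on the `EmbeddingInvocation`, so duration metrics still work without tiktoken.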

instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py

  • New VCR-based integration test for embedding token metrics
  • Validates span attributes (gen_ai.operation.name, gen_ai.request.model, gen_ai.usage.input_tokens)
  • Validates metrics (gen_ai.client.token.usage, gen_ai.client.operation.duration)
  • Outputs full OTLP-style JSON for debugging/verification

instrumentation-genai/opentelemetry-instrumentation-langchain/tests/cassettes/test_langchain_embedding_call.yaml

  • VCR cassette recording for the embedding API call

instrumentation-genai/opentelemetry-instrumentation-langchain/tests/conftest.py

  • Added ignore_hosts: ["openaipublic.blob.core.windows.net"] to VCR config to prevent intercepting tiktoken encoding downloads
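In the `vcr_config` fixture style used by pytest VCR plugins, that change looks roughly like the following sketch (fixture scope is an assumption here):

```python
import pytest

# Hosts VCR should pass through instead of recording/replaying;
# tiktoken fetches its BPE encoding files from this Azure blob host.
VCR_IGNORE_HOSTS = ["openaipublic.blob.core.windows.net"]

@pytest.fixture(scope="module")
def vcr_config():
    return {"ignore_hosts": VCR_IGNORE_HOSTS}
```

Without this, the first test run would try to serve tiktoken's encoding download from the cassette and fail.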

Metrics Emitted

After this change, embedding operations emit:

| Metric | Description | Token Type |
| --- | --- | --- |
| `gen_ai.client.token.usage` | Number of tokens used | input only |
| `gen_ai.client.operation.duration` | Duration in seconds | N/A |

Example metric output:

```json
"metrics": [
    {
        "name": "gen_ai.client.token.usage",
        "description": "Number of input and output tokens used",
        "unit": "{token}",
        "data": {
            "data_points": [
                {
                    "attributes": {
                        "gen_ai.token.type": "input",
                        "gen_ai.provider.name": "openai",
                        "gen_ai.operation.name": "embedding",
                        "gen_ai.request.model": "text-embedding-ada-002"
                    },
                    "start_time_unix_nano": 1775523710327305000,
                    "time_unix_nano": 1775523710327705000,
                    "count": 1,
                    "sum": 7,
                    "min": 7,
                    "max": 7,
                    "exemplars": [],
                    "bucket_counts": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    "explicit_bounds": [1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864]
                }
            ],
            "aggregation_temporality": 2
        }
    }
]
```

Testing

# Run util-genai metric tests
pytest ./util/opentelemetry-util-genai/tests/test_metrics.py -v -k embedding

# Run LangChain embedding test
pytest ./instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py -v -s -p no:deepeval

Dependencies

  • tiktoken (optional): Used for client-side token estimation in LangChain. If not installed, token metrics won't be emitted for embeddings but duration metrics still work.

@shuningc shuningc requested review from a team as code owners April 7, 2026 01:08
Contributor

why are we not making this change in other instrumentations?

Contributor Author

We are using langchain as an example right now. If it looks good, I will add it to other instrumentations.

@pradystar
Contributor

> Dependencies
> tiktoken (optional): Used for client-side token estimation in LangChain. If not installed, token metrics won't be emitted for embeddings but duration metrics still work.

Are you sure langchain does not provide tokens consumed in embedding callbacks? Is this dependency ok to be added in upstream and where is this documented for SDOT? I don't see it in pyproject.toml or requirements.

@shuningc
Contributor Author

shuningc commented Apr 7, 2026

> Dependencies
> tiktoken (optional): Used for client-side token estimation in LangChain. If not installed, token metrics won't be emitted for embeddings but duration metrics still work.
>
> Are you sure langchain does not provide tokens consumed in embedding callbacks? Is this dependency ok to be added in upstream and where is this documented for SDOT? I don't see it in pyproject.toml or requirements.

LangChain doesn't have an embedding callback analogous to LLMInvocation. Tokens are computed internally with tiktoken, but the return value is a list[list[float]] of embedding vectors; no token counts are exposed. https://github.com/langchain-ai/langchain/blob/master/libs/partners/openai/langchain_openai/embeddings/base.py#L578 If the user already has langchain-openai installed, it brings in the tiktoken dependency. I just updated the README.

