Track AI model usage for accurate cost attribution across providers.
Botanu provides LLM tracking that aligns with the OpenTelemetry GenAI Semantic Conventions. This ensures compatibility with standard observability tooling while enabling detailed cost analysis.
```python
from openai import AsyncOpenAI

from botanu.tracking.llm import track_llm_call

client = AsyncOpenAI()

with track_llm_call(provider="openai", model="gpt-4") as tracker:
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
    tracker.set_tokens(
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
    )
    tracker.set_request_id(response.id)
```

Each tracked call records the standard OpenTelemetry GenAI attributes:

| Attribute | Example | Description |
|---|---|---|
| `gen_ai.operation.name` | `chat` | Type of operation |
| `gen_ai.provider.name` | `openai` | Normalized provider name |
| `gen_ai.request.model` | `gpt-4` | Requested model |
| `gen_ai.response.model` | `gpt-4-0613` | Actual model used |
| `gen_ai.usage.input_tokens` | `150` | Input/prompt tokens |
| `gen_ai.usage.output_tokens` | `200` | Output/completion tokens |
| `gen_ai.response.id` | `chatcmpl-...` | Provider request ID |
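Put together, the quick-start call above would yield span attributes roughly like this (values are illustrative):

```python
# Illustrative span attributes for the quick-start call above.
{
    "gen_ai.operation.name": "chat",
    "gen_ai.provider.name": "openai",
    "gen_ai.request.model": "gpt-4",
    "gen_ai.response.model": "gpt-4-0613",
    "gen_ai.usage.input_tokens": 150,
    "gen_ai.usage.output_tokens": 200,
    "gen_ai.response.id": "chatcmpl-...",
}
```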
Record token usage from the response:
```python
tracker.set_tokens(
    input_tokens=150,
    output_tokens=200,
    cached_tokens=50,        # For providers with caching
    cache_read_tokens=50,    # Anthropic-style cache read
    cache_write_tokens=100,  # Anthropic-style cache write
)
```
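For example, with Anthropic prompt caching, the cache counters map onto the usage block of a Messages API response. A sketch, assuming `response` comes from Anthropic's SDK:

```python
# Sketch: mapping Anthropic prompt-caching usage fields onto set_tokens().
# Assumes `response` is the result of an Anthropic Messages API call.
tracker.set_tokens(
    input_tokens=response.usage.input_tokens,
    output_tokens=response.usage.output_tokens,
    cache_read_tokens=response.usage.cache_read_input_tokens,
    cache_write_tokens=response.usage.cache_creation_input_tokens,
)
```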
Record provider and client request IDs for billing reconciliation:

```python
tracker.set_request_id(
    provider_request_id=response.id,    # From provider response
    client_request_id="my-client-123",  # Your tracking ID
)
```

When the response uses a different model than requested:
```python
tracker.set_response_model("gpt-4-0613")
```

Record request parameters for analysis:
```python
tracker.set_request_params(
    temperature=0.7,
    top_p=0.9,
    max_tokens=1000,
    stop_sequences=["END"],
    frequency_penalty=0.5,
    presence_penalty=0.3,
)
```

Mark as a streaming request:
```python
tracker.set_streaming(True)
```

Mark as a cache hit (for semantic caching):
```python
tracker.set_cache_hit(True)
```

Track retry attempts:
```python
tracker.set_attempt(2)  # Second attempt
```
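A sketch of how this might look in a simple retry loop (the loop itself is illustrative, not part of the SDK; it assumes the `client` and `openai` import from the earlier examples):

```python
# Sketch: one tracked call per attempt, each recording its attempt number.
for attempt in range(1, 4):
    try:
        with track_llm_call(provider="openai", model="gpt-4") as tracker:
            tracker.set_attempt(attempt)
            response = await client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": "Hello"}],
            )
        break  # Success: stop retrying
    except openai.RateLimitError:
        continue  # The failed attempt still recorded its span and attempt number
else:
    raise RuntimeError("All attempts were rate limited")
```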
Record the stop reason:

```python
tracker.set_finish_reason("stop")  # or "length", "content_filter", etc.
```

Record errors (`set_error` is called automatically on uncaught exceptions):
```python
try:
    response = await client.chat.completions.create(...)
except openai.RateLimitError as e:
    tracker.set_error(e)
    raise
```

Add custom attributes:
```python
tracker.add_metadata(
    prompt_version="v2.1",
    experiment_id="exp-123",
)
```

Use `ModelOperation` constants for the `operation` parameter:
```python
from botanu.tracking.llm import track_llm_call, ModelOperation

# Chat completion
with track_llm_call(provider="openai", model="gpt-4", operation=ModelOperation.CHAT):
    ...

# Embeddings
with track_llm_call(provider="openai", model="text-embedding-3-small", operation=ModelOperation.EMBEDDINGS):
    ...

# Text completion (legacy)
with track_llm_call(provider="openai", model="davinci", operation=ModelOperation.TEXT_COMPLETION):
    ...
```

Available operations:
| Constant | Value | Use Case |
|---|---|---|
| `CHAT` | `chat` | Chat completions (default) |
| `TEXT_COMPLETION` | `text_completion` | Legacy completions |
| `EMBEDDINGS` | `embeddings` | Embedding generation |
| `GENERATE_CONTENT` | `generate_content` | Generic content generation |
| `EXECUTE_TOOL` | `execute_tool` | Tool/function execution |
| `CREATE_AGENT` | `create_agent` | Agent creation |
| `INVOKE_AGENT` | `invoke_agent` | Agent invocation |
| `RERANK` | `rerank` | Reranking |
| `IMAGE_GENERATION` | `image_generation` | Image generation |
| `SPEECH_TO_TEXT` | `speech_to_text` | Transcription |
| `TEXT_TO_SPEECH` | `text_to_speech` | Speech synthesis |
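Non-text operations follow the same pattern. For example, an image generation call has no token usage, so you might attach counts as metadata instead (a sketch; the model name and metadata key are illustrative, and `client` is the AsyncOpenAI client from earlier):

```python
# Sketch: tracking an image generation call; token usage does not apply here.
with track_llm_call(
    provider="openai",
    model="dall-e-3",
    operation=ModelOperation.IMAGE_GENERATION,
) as tracker:
    result = await client.images.generate(model="dall-e-3", prompt="A red fox at dawn")
    tracker.add_metadata(images_generated=len(result.data))
```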
Provider names are automatically normalized:
| Input | Normalized |
|---|---|
| `openai`, `OpenAI` | `openai` |
| `azure_openai`, `azure-openai` | `azure.openai` |
| `anthropic`, `claude` | `anthropic` |
| `bedrock`, `aws_bedrock` | `aws.bedrock` |
| `vertex`, `vertexai`, `gemini` | `gcp.vertex_ai` |
| `cohere` | `cohere` |
| `mistral`, `mistralai` | `mistral` |
| `together`, `togetherai` | `together` |
| `groq` | `groq` |
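Aliases can therefore be passed as-is. For example, per the table above, the following would record `gen_ai.provider.name` as `anthropic`:

```python
# "claude" is normalized to "anthropic" before being recorded.
with track_llm_call(provider="claude", model="claude-3-opus") as tracker:
    ...
```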
Track tool calls triggered by LLMs:
```python
from botanu.tracking.llm import track_tool_call

with track_tool_call(tool_name="search_database", tool_call_id="call_abc123") as tool:
    results = await do_work(query)
    tool.set_result(
        success=True,
        items_returned=len(results),
        bytes_processed=1024,
    )
```

The tool tracker exposes the same style of helpers:

```python
# Set execution result
tool.set_result(
    success=True,
    items_returned=10,
    bytes_processed=2048,
)

# Set tool call ID from LLM response
tool.set_tool_call_id("call_abc123")

# Record an error
tool.set_error(exception)

# Add custom metadata
tool.add_metadata(query_type="semantic")
```

For cases where you can't use context managers:
```python
from botanu.tracking.llm import set_llm_attributes

set_llm_attributes(
    provider="openai",
    model="gpt-4",
    operation="chat",
    input_tokens=150,
    output_tokens=200,
    streaming=True,
    provider_request_id="chatcmpl-...",
)
```

To record only token usage:

```python
from botanu.tracking.llm import set_token_usage

set_token_usage(
    input_tokens=150,
    output_tokens=200,
    cached_tokens=50,
)
```
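A minimal sketch of calling these helpers from a span you manage yourself, assuming they attach attributes to the current active OpenTelemetry span:

```python
# Sketch: using the helper functions inside a manually managed span.
# Assumes set_llm_attributes() writes to the current active span.
from opentelemetry import trace

from botanu.tracking.llm import set_llm_attributes

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("chat gpt-4"):
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
    set_llm_attributes(
        provider="openai",
        model="gpt-4",
        operation="chat",
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
    )
```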
For wrapping existing client methods:

```python
import openai

from botanu.tracking.llm import llm_instrumented

class MyOpenAIClient:
    @llm_instrumented(provider="openai", tokens_from_response=True)
    def chat(self, model: str, messages: list):
        return openai.chat.completions.create(model=model, messages=messages)
```
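Each call to the wrapped method is then tracked automatically; with `tokens_from_response=True`, token counts are presumably read from the usage data on the returned response. A usage sketch:

```python
# Each call below is tracked as an LLM operation without any
# per-call tracking code at the call site.
client = MyOpenAIClient()
response = client.chat(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
```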
The SDK automatically records these metrics:

| Metric | Type | Description |
|---|---|---|
| `gen_ai.client.token.usage` | Histogram | Token counts by type |
| `gen_ai.client.operation.duration` | Histogram | Operation duration in seconds |
| `botanu.gen_ai.attempts` | Counter | Request attempts (including retries) |
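To get these metrics out of the process, configure a metric exporter as usual. A minimal sketch using the OpenTelemetry SDK's console exporter, assuming Botanu records through the globally configured meter provider:

```python
# Sketch: exporting metrics with the standard OpenTelemetry SDK.
# Assumes Botanu uses the global meter provider.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=10_000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
```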
A complete example combining workflow tracking, per-attempt LLM tracking, and outcome emission:

```python
from botanu import botanu_workflow, emit_outcome
from botanu.tracking.llm import track_llm_call

@botanu_workflow("process-with-fallback", event_id=event_id, customer_id=customer_id)
async def process_with_fallback(data: str):
    """Try one provider first, fall back to another."""
    try:
        with track_llm_call(provider="anthropic", model="claude-3-opus") as tracker:
            tracker.set_attempt(1)
            response = await do_work(data, provider="anthropic")
            tracker.set_tokens(
                input_tokens=response.usage.input_tokens,
                output_tokens=response.usage.output_tokens,
            )
        emit_outcome("success", value_type="items_processed", value_amount=1)
        return response.content
    except RateLimitError:  # placeholder: use your provider SDK's error type
        # Fall back to the second provider
        with track_llm_call(provider="openai", model="gpt-4") as tracker:
            tracker.set_attempt(2)
            response = await do_work(data, provider="openai")
            tracker.set_tokens(
                input_tokens=response.usage.prompt_tokens,
                output_tokens=response.usage.completion_tokens,
            )
        emit_outcome("success", value_type="items_processed", value_amount=1)
        return response.content
```

See also:

- Auto-Instrumentation - Automatic LLM tracking
- Data Tracking - Database and storage tracking
- Outcomes - Recording business outcomes