diff --git a/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/.env.example b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/.env.example new file mode 100644 index 00000000..d8fa35fb --- /dev/null +++ b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/.env.example @@ -0,0 +1,33 @@ +# ============================================================================= +# .env.example — copy to .env and fill in your values for local development +# ============================================================================= + +# --------------------------------------------------------------------------- +# Azure OpenAI credentials +# --------------------------------------------------------------------------- +AZURE_OPENAI_API_KEY= +AZURE_OPENAI_ENDPOINT=https://.cognitiveservices.azure.com/ +AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini +AZURE_OPENAI_API_VERSION=2024-12-01-preview + +# Alternative: use the OpenAI-compatible endpoint (recommended — reports +# gen_ai.request.model correctly in telemetry instead of "gpt-4o-mini-...-deployment") +# OPENAI_BASE_URL=https://.openai.azure.com/openai/deployments// +# OPENAI_API_KEY= +# OPENAI_MODEL=gpt-4o-mini + +# --------------------------------------------------------------------------- +# OpenTelemetry — local development +# --------------------------------------------------------------------------- +OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 +OTEL_EXPORTER_OTLP_PROTOCOL=grpc +OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta +OTEL_SERVICE_NAME=multi-agent-travel-planner-gunicorn +OTEL_RESOURCE_ATTRIBUTES=deployment.environment=local +OTEL_INSTRUMENTATION_GENAI_EMITTERS=span_metric +OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY +# Required on Azure App Service (opentelemetry-instrument does not configure +# the MeterProvider unless these 
are explicitly set) +OTEL_METRICS_EXPORTER=otlp +OTEL_TRACES_EXPORTER=otlp +OTEL_LOGS_EXPORTER=otlp diff --git a/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/README.md b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/README.md new file mode 100644 index 00000000..cd992682 --- /dev/null +++ b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/README.md @@ -0,0 +1,322 @@ +# Multi-Agent Travel Planner — Gunicorn + Azure App Service + +A LangGraph multi-agent travel planner served by **Gunicorn + Uvicorn workers** with +OpenTelemetry instrumentation sending `gen_ai.*` spans and metrics to Splunk Observability Cloud. + +Five specialized agents collaborate to produce a full itinerary: + +``` +coordinator_gc → flight_specialist_gc → hotel_specialist_gc → activity_specialist_gc → plan_synthesizer_gc +``` + +--- + +## Architecture + +``` +HTTP client + │ + ▼ +Azure App Service (Gunicorn + UvicornWorker, FastAPI) + │ OTLP/gRPC + ▼ +Azure Container Instance (Splunk OTel Collector 0.123.0) + │ signalfx exporter + otlphttp/splunk + ▼ +Splunk Observability Cloud (APM traces + gen_ai.* metrics) +``` + +--- + +## Local development + +### Prerequisites + +```bash +pip install splunk-otel-instrumentation-langchain==0.1.14 +pip install -r requirements.txt +``` + +Copy `.env.example` to `~/.env` and fill in your Azure OpenAI credentials and OTel settings: + +```bash +cp .env.example ~/.env +# edit ~/.env +``` + +Start a local Splunk OTel Collector (or use `otel-tui` for quick inspection): + +```bash +# Example with Docker — replace and +docker run -d --name otelcol-local \ + -p 4317:4317 -p 4318:4318 -p 13133:13133 \ + -e SPLUNK_ACCESS_TOKEN= \ + -e SPLUNK_REALM= \ + quay.io/signalfx/splunk-otel-collector:0.123.0 +``` + +### Run the app + +OTel is initialised **programmatically** inside `app.py` (see +[OTel 
troubleshooting — use programmatic auto-instrumentation](https://opentelemetry.io/docs/zero-code/python/troubleshooting/#use-programmatic-auto-instrumentation)). +Run Gunicorn directly — no `opentelemetry-instrument` wrapper needed: + +```bash +source ~/.env + +OTEL_SERVICE_NAME=multi-agent-travel-planner-gunicorn \ +OTEL_RESOURCE_ATTRIBUTES=deployment.environment=local \ +OTEL_INSTRUMENTATION_GENAI_EMITTERS=span_metric \ +OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY \ +OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta \ +OTEL_METRICS_EXPORTER=otlp \ +OTEL_TRACES_EXPORTER=otlp \ +OTEL_LOGS_EXPORTER=otlp \ +gunicorn \ + -w 1 \ + -k uvicorn.workers.UvicornWorker \ + app:app \ + --access-logfile "-" \ + --timeout 301 \ + --bind 0.0.0.0:8000 +``` + +### Run without OTel (quick test) + +```bash +source ~/.env +uvicorn app:app --port 8000 +``` + +### Test requests + +```bash +# Health check +curl http://localhost:8000/health + +# Plan a trip +curl -X POST http://localhost:8000/plan \ + -H "Content-Type: application/json" \ + -d '{"origin":"Seattle","destination":"Tokyo","travellers":2}' +``` + +--- + +## Azure deployment + +### Prerequisites + +- Azure CLI installed and authenticated (`az login`) +- Contributor access to an Azure resource group +- Splunk Observability Cloud access token and realm + +### Step 1 — Deploy the OTel Collector to ACI + +The collector receives OTLP from the App Service and forwards to Splunk. Deploy it +first so you have the collector IP for Step 3. + +```bash +export SPLUNK_ACCESS_TOKEN= +export SPLUNK_HEC_TOKEN= +export SPLUNK_HEC_URL=https://http-inputs-.splunkcloud.com:443/services/collector/event +export SPLUNK_REALM= # e.g. us1 +export RESOURCE_GROUP= +export STORAGE_ACCOUNT= # max 24 chars + +chmod +x collector/deploy-aci.sh +./collector/deploy-aci.sh +``` + +The script creates an Azure File Share, uploads `otel-collector-config.yaml`, and +starts the container. Note the **Public IP** printed at the end. 
+ +Collector image: `quay.io/signalfx/splunk-otel-collector:0.123.0` +(see `collector/deploy-aci.sh` — override with `CONTAINER_IMAGE=...` if needed). + +### Step 2 — Create the App Service plan and web app + +```bash +export RESOURCE_GROUP= +export LOCATION=westus +export PLAN_NAME= +export APP_NAME= + +# Create Linux App Service plan (B1 is sufficient) +az appservice plan create \ + --resource-group "${RESOURCE_GROUP}" \ + --name "${PLAN_NAME}" \ + --location "${LOCATION}" \ + --is-linux \ + --sku B1 + +# Create the web app (Python 3.12) +az webapp create \ + --resource-group "${RESOURCE_GROUP}" \ + --plan "${PLAN_NAME}" \ + --name "${APP_NAME}" \ + --runtime "PYTHON|3.12" \ + --startup-file "sh startup.sh" + +# Enable Oryx build during zip deployment. +# Without this, az webapp deployment source config-zip extracts the zip but +# does NOT run pip install, so packages from requirements.txt are missing and +# startup.sh fails with "gunicorn: not found" or import errors. +az webapp config appsettings set \ + --resource-group "${RESOURCE_GROUP}" \ + --name "${APP_NAME}" \ + --settings SCM_DO_BUILD_DURING_DEPLOYMENT=true \ + --output none +``` + +### Step 3 — Configure application settings + +Replace placeholders with your values. Use the collector IP from Step 1. 
+ +```bash +az webapp config appsettings set \ + --resource-group "${RESOURCE_GROUP}" \ + --name "${APP_NAME}" \ + --settings \ + AZURE_OPENAI_API_KEY="" \ + AZURE_OPENAI_ENDPOINT="https://.cognitiveservices.azure.com/" \ + AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini" \ + AZURE_OPENAI_API_VERSION="2024-12-01-preview" \ + OTEL_EXPORTER_OTLP_ENDPOINT="http://:4317" \ + OTEL_EXPORTER_OTLP_PROTOCOL="grpc" \ + OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE="delta" \ + OTEL_SERVICE_NAME="multi-agent-travel-planner-azure" \ + OTEL_RESOURCE_ATTRIBUTES="deployment.environment=" \ + OTEL_INSTRUMENTATION_GENAI_EMITTERS="span_metric" \ + OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT="SPAN_ONLY" \ + OTEL_METRICS_EXPORTER="otlp" \ + OTEL_TRACES_EXPORTER="otlp" \ + OTEL_LOGS_EXPORTER="otlp" +``` + +### Step 4 — Build and deploy the app + +Run from the `gunicorn/` directory: + +```bash +cd instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn + +zip -j /tmp/gunicorn-deploy.zip app.py requirements.txt startup.sh + +# config-zip triggers the Oryx build that installs requirements.txt into antenv. +# (az webapp deploy --type zip skips the Oryx build and leaves packages uninstalled.) 
+az webapp deployment source config-zip \ + --resource-group "${RESOURCE_GROUP}" \ + --name "${APP_NAME}" \ + --src /tmp/gunicorn-deploy.zip +``` + +### Step 5 — Verify + +```bash +# Health check +curl https://${APP_NAME}.azurewebsites.net/health + +# Plan a trip +curl -X POST https://${APP_NAME}.azurewebsites.net/plan \ + -H "Content-Type: application/json" \ + -d '{"origin":"Seattle","destination":"Paris","travellers":2}' + +# Tail live logs +az webapp log tail \ + --resource-group "${RESOURCE_GROUP}" \ + --name "${APP_NAME}" + +# Check collector health +curl http://:13133/ +``` + +--- + +## App management + +```bash +# Stop the app (the B1 plan is still billed while the app is stopped; +# delete or scale down the App Service plan to stop charges) +az webapp stop --resource-group "${RESOURCE_GROUP}" --name "${APP_NAME}" + +# Start +az webapp start --resource-group "${RESOURCE_GROUP}" --name "${APP_NAME}" + +# Redeploy after code changes +zip -j /tmp/gunicorn-deploy.zip app.py requirements.txt startup.sh +az webapp deployment source config-zip \ + --resource-group "${RESOURCE_GROUP}" \ + --name "${APP_NAME}" \ + --src /tmp/gunicorn-deploy.zip + +# Update collector IP if ACI was recreated +az webapp config appsettings set \ + --resource-group "${RESOURCE_GROUP}" \ + --name "${APP_NAME}" \ + --settings OTEL_EXPORTER_OTLP_ENDPOINT="http://:4317" +az webapp restart --resource-group "${RESOURCE_GROUP}" --name "${APP_NAME}" +``` + +--- + +## Expected telemetry + +In **Splunk APM** (filter by `deployment.environment = `): + +| Signal | What you see | +|---|---| +| Traces | One root trace per `/plan` request; child spans per agent (`coordinator_gc`, `flight_specialist_gc`, …) | +| Spans | `gen_ai.system`, `gen_ai.request.model`, `gen_ai.operation.name` on each LLM call | +| Metrics | `gen_ai.client.operation.duration` histogram, `gen_ai.client.token.usage` histogram | +| Agent view | Per-agent requests, latency, token usage, quality scores | + +--- + +## How OTel instrumentation works + +### Programmatic initialization (fork-safe) + +OTel is initialized **inside the 
worker process** via +[`opentelemetry.instrumentation.auto_instrumentation.initialize()`](https://opentelemetry.io/docs/zero-code/python/troubleshooting/#use-programmatic-auto-instrumentation) +at the top of `app.py`, guarded so it runs exactly once per process: + +```python +from opentelemetry import trace +from opentelemetry.sdk.trace import TracerProvider + +if not isinstance(trace.get_tracer_provider(), TracerProvider): + from opentelemetry.instrumentation.auto_instrumentation import initialize + initialize() +``` + +**Why not `opentelemetry-instrument gunicorn`?** + +The CLI wrapper initializes the OTel SDK in the Gunicorn **master** process. After +`fork()`, only the calling thread is preserved in each worker — the +`PeriodicExportingMetricReader` timer thread is silently lost, so **metrics are never +exported** even though traces continue to flow (the `BatchSpanProcessor` is more +resilient to fork). + +The programmatic approach runs `initialize()` **after** fork, giving each worker its +own fresh metric reader thread. + +Reference: [Pre-fork server issues — OTel Python troubleshooting](https://opentelemetry.io/docs/zero-code/python/troubleshooting/#pre-fork-server-issues) + +Support matrix for `opentelemetry-instrument` (multiple workers): + +| Stack | Traces | Metrics | Logs | +|---|---|---|---| +| Uvicorn | ✓ | ✗ | ✓ | +| Gunicorn | ✓ | ✗ | ✓ | +| **Gunicorn + UvicornWorker** | **✓** | **✓** | **✓** | + +> On **Linux** (Azure App Service), Gunicorn + UvicornWorker with the CLI wrapper +> also works because UvicornWorker handles fork safety. On **macOS**, gRPC C extensions +> cause a SIGSEGV after fork — use the programmatic approach for local development. + +### Collector version + +The ACI collector is pinned to `quay.io/signalfx/splunk-otel-collector:0.123.0`. + +The `sapm` exporter was deprecated in `0.115.0` and **removed in `0.147.0`**. The +config uses `otlphttp/splunk` for traces, which is compatible with all versions from +`0.115.0` onwards. 
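### Observing thread loss after fork

The fork behaviour described under "Why not `opentelemetry-instrument gunicorn`?" can be reproduced with the standard library alone. The sketch below is illustrative and not part of `app.py`: the helper name `count_threads_after_fork` is hypothetical, and a plain `threading.Thread` stands in for the SDK's metric reader thread. It starts a background thread, forks, and reports how many threads Python sees on each side of the fork. It assumes a POSIX platform where `os.fork()` is available.

```python
import os
import threading


def count_threads_after_fork() -> tuple[int, int]:
    """Return (threads seen by the parent, threads seen by the forked child)."""
    stop = threading.Event()
    # Stand-in for a background exporter thread (assumption for illustration).
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    # Python 3.12+ may emit a DeprecationWarning here because the
    # process is multi-threaded at the time of the fork.
    pid = os.fork()
    if pid == 0:
        # Child process: report its own thread count through the pipe, then exit.
        try:
            os.close(read_fd)
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)

    # Parent process: read the child's report and clean up.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count


if __name__ == "__main__":
    parent, child = count_threads_after_fork()
    print(f"threads in parent: {parent}, threads in child: {child}")
```

The child should report a single thread, because only the thread that called `fork()` survives. A reader thread started in the Gunicorn master is lost in the same way, which is why `initialize()` is run inside each worker instead.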
diff --git a/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/app.py b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/app.py new file mode 100644 index 00000000..1e9b7acb --- /dev/null +++ b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/app.py @@ -0,0 +1,603 @@ +"""Multi-Agent Travel Planner — FastAPI + LangGraph served by Gunicorn + Uvicorn workers. + +Overview +──────── +A five-agent LangGraph pipeline that produces a complete travel itinerary: + + coordinator → flight_specialist → hotel_specialist + → activity_specialist → plan_synthesizer + +Each agent is an independent LLM call (Azure OpenAI or standard OpenAI) tagged +with ``gen_ai.agent.name`` metadata so that per-agent telemetry is visible in +Splunk Observability Cloud's APM Agent view. + +Telemetry emitted (via splunk-otel-instrumentation-langchain) +────────────────────────────────────────────────────────────── +Spans: + - One ``gen_ai.agent.invoke`` span per LangGraph node (coordinator, + flight_specialist, hotel_specialist, activity_specialist, plan_synthesizer) + - One ``gen_ai.client.chat`` span per LLM call inside each node + +Metrics (delta temporality): + - ``gen_ai.client.token.usage`` — prompt / completion tokens per agent + - ``gen_ai.client.operation.duration`` — latency histogram per operation + +All signals carry ``deployment.environment``, ``service.name``, +``gen_ai.agent.name``, and ``gen_ai.request.model`` attributes. 
+ +OTel initialisation strategy +───────────────────────────── +Two approaches are supported and can be selected at startup time: + + A) CLI auto-instrumentation (zero-code, recommended for Gunicorn + UvicornWorker): + + opentelemetry-instrument gunicorn -w N -k uvicorn.workers.UvicornWorker app:app + + UvicornWorker is fork-aware and preserves background threads, so all three + signals (traces, metrics, logs) flow correctly without extra configuration. + Reference: https://opentelemetry.io/docs/zero-code/python/troubleshooting/#pre-fork-server-issues + + B) Programmatic auto-instrumentation (used in this file / Azure App Service): + + gunicorn -w N -k uvicorn.workers.UvicornWorker app:app # startup.sh + + ``initialize()`` is called at module import time, inside each worker process + after the Gunicorn ``fork()``. This ensures the ``PeriodicExportingMetricReader`` + background thread starts in the right process, preventing the silent metric + drop that occurs when the SDK is initialised in the master process and then + inherited across a fork (Linux). + Reference: https://opentelemetry.io/docs/zero-code/python/troubleshooting/#use-programmatic-auto-instrumentation + +Guard logic (lines below the docstring): + 1. ``sys.modules`` sentinel — prevents re-running ``initialize()`` if app.py is + reimported within the same process (hot-reload, test collection). + 2. ``TracerProvider`` type check — skips ``initialize()`` when the CLI wrapper + (approach A) has already set up a real SDK provider in this process, so both + approaches coexist cleanly without double-initialisation. 
+ +Support matrix (Gunicorn + UvicornWorker, verified on Azure App Service): + Approach Workers Traces Metrics Logs Notes + CLI wrapper 1 ✓ ✓ ✓ + CLI wrapper N ✓ ✓ ✓ UvicornWorker handles fork safety + Programmatic 1 ✓ ✓ ✓ + Programmatic N ✓ ✓ ✓ initialize() runs post-fork per worker + +Verified deployments +───────────────────── +• Local: uvicorn app:app --reload (development) +• Local: gunicorn -w 1 -k uvicorn.workers.UvicornWorker app:app +• Azure App Svc: Python 3.12 runtime, startup.sh, Splunk OTel Collector 0.123.0 in ACI + - Traces visible in Splunk APM Trace Analyzer + - Metrics (token usage, operation duration) visible in Splunk APM Agent view + +Required environment variables +──────────────────────────────── + AZURE_OPENAI_ENDPOINT Azure OpenAI resource endpoint + AZURE_OPENAI_API_KEY Azure OpenAI API key + AZURE_OPENAI_DEPLOYMENT Model deployment name (e.g. gpt-4o-mini) + AZURE_OPENAI_API_VERSION API version (e.g. 2024-02-01) + + OTEL_SERVICE_NAME Service name shown in Splunk + OTEL_RESOURCE_ATTRIBUTES e.g. deployment.environment=my-env + OTEL_EXPORTER_OTLP_ENDPOINT OTel Collector endpoint (e.g. http://host:4317) + OTEL_EXPORTER_OTLP_PROTOCOL grpc (recommended) + OTEL_METRICS_EXPORTER otlp + OTEL_TRACES_EXPORTER otlp + OTEL_LOGS_EXPORTER otlp + OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE delta + OTEL_INSTRUMENTATION_GENAI_EMITTERS span_metric + OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT SPAN_ONLY + +Test: + curl -X POST http://localhost:8000/plan \\ + -H "Content-Type: application/json" \\ + -d '{"origin":"Seattle","destination":"Tokyo","travellers":2}' +""" + +from __future__ import annotations + +import sys + +# --------------------------------------------------------------------------- +# Programmatic OTel initialisation — approach B (see docstring above). +# +# Guard logic: +# 1. sys.modules sentinel: prevents re-running if app.py is re-imported +# within the same process (e.g. hot-reload, test collection). +# 2. 
SDK provider check: skips initialize() when the CLI wrapper +# (opentelemetry-instrument) has already set up a real TracerProvider +# in this process. This lets both approach A and B coexist cleanly: +# - approach A: CLI inits first → SDK provider present → skip here +# - approach B: no CLI → proxy provider present → initialize() runs here +# --------------------------------------------------------------------------- +_OTEL_INIT_KEY = "__travel_planner_otel_initialized__" +if _OTEL_INIT_KEY not in sys.modules: + sys.modules[_OTEL_INIT_KEY] = True # type: ignore[assignment] + try: + from opentelemetry import trace as _otel_trace + from opentelemetry.sdk.trace import TracerProvider as _SDKTracerProvider + + if not isinstance(_otel_trace.get_tracer_provider(), _SDKTracerProvider): + from opentelemetry.instrumentation.auto_instrumentation import initialize + + initialize() + except Exception: + pass # SDK not installed or initialisation failed; continue without telemetry + +import json # noqa: E402 +import os # noqa: E402 +import random # noqa: E402 +from datetime import datetime, timedelta # noqa: E402 +from pathlib import Path # noqa: E402 +from typing import Annotated, List, Optional, TypedDict # noqa: E402 +from uuid import uuid4 # noqa: E402 + +from dotenv import load_dotenv # noqa: E402 + +# Load ~/.env so Azure OpenAI credentials are available at import time. +# initialize() above has already run, so OTEL_* settings that the SDK reads at +# start-up (exporter endpoint, protocol, ...) must already be exported in the +# process environment; values loaded here arrive too late for them. 
+load_dotenv(Path.home() / ".env") + +from fastapi import FastAPI, HTTPException # noqa: E402 +from langchain.agents import create_agent as _create_react_agent # noqa: E402 +from langchain_core.messages import ( # noqa: E402 + AIMessage, + BaseMessage, + HumanMessage, + SystemMessage, +) +from langchain_core.tools import tool # noqa: E402 +from langchain_openai import AzureChatOpenAI, ChatOpenAI # noqa: E402 +from langgraph.graph import END, START, StateGraph # noqa: E402 +from langgraph.graph.message import AnyMessage, add_messages # noqa: E402 +from pydantic import BaseModel # noqa: E402 + +# --------------------------------------------------------------------------- +# LLM factory — auto-detects AzureOpenAI or standard OpenAI from env vars. +# Agent names have "gc" suffix to distinguish this gunicorn deployment. +# --------------------------------------------------------------------------- + +_GC_SUFFIX = "_gc" + + +def _create_llm( + agent_name: str, *, temperature: float, session_id: str +) -> ChatOpenAI | AzureChatOpenAI: + """Create an LLM tagged with the gc-suffixed agent name.""" + model = os.environ.get("OPENAI_MODEL", "gpt-4o-mini") + full_name = f"{agent_name}{_GC_SUFFIX}" + tags = [f"agent:{full_name}", "travel-planner-gc"] + metadata = { + "agent_name": full_name, + "agent_type": full_name, + "session_id": session_id, + "thread_id": session_id, + "ls_model_name": model, + "ls_temperature": temperature, + } + + # Prefer OpenAI-compatible Azure endpoint (OPENAI_BASE_URL) — reports + # gen_ai.request.model correctly in telemetry. 
+ base_url = os.environ.get("OPENAI_BASE_URL") + if base_url: + return ChatOpenAI( + model=model, + api_key=os.environ.get("OPENAI_API_KEY", ""), + base_url=base_url, + temperature=temperature, + tags=tags, + metadata=metadata, + ) + + if os.environ.get("AZURE_OPENAI_ENDPOINT"): + # AZURE_OPENAI_DEPLOYMENT is the canonical name used in Azure App Service docs; + # AZURE_CHAT_DEPLOYMENT is the legacy name kept for backward compatibility. + deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT") or os.environ.get( + "AZURE_CHAT_DEPLOYMENT", model + ) + return AzureChatOpenAI( + azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], + api_key=os.environ.get("AZURE_OPENAI_API_KEY", ""), + api_version=os.environ.get("AZURE_OPENAI_API_VERSION", "2024-02-01"), + azure_deployment=deployment, + temperature=temperature, + tags=tags, + model_kwargs={"metadata": metadata}, + ) + + return ChatOpenAI( + model=model, + temperature=temperature, + tags=tags, + metadata=metadata, + ) + + +# --------------------------------------------------------------------------- +# Sample data +# --------------------------------------------------------------------------- + +DESTINATIONS = { + "paris": { + "country": "France", + "currency": "EUR", + "airport": "CDG", + "highlights": [ + "Eiffel Tower at sunset", + "Seine dinner cruise", + "Day trip to Versailles", + ], + }, + "tokyo": { + "country": "Japan", + "currency": "JPY", + "airport": "HND", + "highlights": [ + "Tsukiji market food tour", + "Ghibli Museum visit", + "Day trip to Hakone hot springs", + ], + }, + "rome": { + "country": "Italy", + "currency": "EUR", + "airport": "FCO", + "highlights": [ + "Colosseum underground tour", + "Private pasta masterclass", + "Sunset walk through Trastevere", + ], + }, +} + + +def _compute_dates() -> tuple[str, str]: + start = datetime.now() + timedelta(days=30) + end = start + timedelta(days=7) + return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d") + + +# 
--------------------------------------------------------------------------- +# Tools +# --------------------------------------------------------------------------- + + +@tool +def mock_search_flights(origin: str, destination: str, departure: str) -> str: + """Return mock flight options for a given origin/destination pair.""" + random.seed(hash((origin, destination, departure)) % (2**32)) + airline = random.choice(["SkyLine", "AeroJet", "CloudNine"]) + fare = random.randint(700, 1250) + return ( + f"Top choice: {airline} non-stop {origin}->{destination}, " + f"depart {departure} 09:15, arrive 17:05. Premium economy ${fare} return." + ) + + +@tool +def mock_search_hotels(destination: str, check_in: str, check_out: str) -> str: + """Return mock hotel recommendation for the stay.""" + random.seed(hash((destination, check_in, check_out)) % (2**32)) + name = random.choice(["Grand Meridian", "Hotel Lumière", "The Atlas"]) + rate = random.randint(240, 410) + return ( + f"{name} near the historic centre. Boutique suites, rooftop bar, " + f"average nightly rate ${rate} including breakfast." 
+ ) + + +@tool +def mock_search_activities(destination: str) -> str: + """Return a short list of signature activities for the destination.""" + data = DESTINATIONS.get(destination.lower(), DESTINATIONS["paris"]) + bullets = "\n".join(f"- {item}" for item in data["highlights"]) + return f"Signature experiences in {destination.title()}:\n{bullets}" + + +# --------------------------------------------------------------------------- +# LangGraph state +# --------------------------------------------------------------------------- + + +class PlannerState(TypedDict): + messages: Annotated[List[AnyMessage], add_messages] + user_request: str + session_id: str + origin: str + destination: str + departure: str + return_date: str + travellers: int + flight_summary: Optional[str] + hotel_summary: Optional[str] + activities_summary: Optional[str] + final_itinerary: Optional[str] + current_agent: str + + +# --------------------------------------------------------------------------- +# LangGraph nodes — agent names use _gc suffix +# --------------------------------------------------------------------------- + + +def coordinator_node(state: PlannerState) -> PlannerState: + llm = _create_llm("coordinator", temperature=0.2, session_id=state["session_id"]) + agent = _create_react_agent(llm, tools=[]).with_config( + { + "run_name": f"coordinator{_GC_SUFFIX}", + "tags": ["agent", f"agent:coordinator{_GC_SUFFIX}"], + "metadata": { + "agent_name": f"coordinator{_GC_SUFFIX}", + "session_id": state["session_id"], + }, + } + ) + system_message = SystemMessage( + content=( + "You are the lead travel coordinator. Extract the key details from the " + "traveller's request and describe the plan for the specialist agents." 
+ ) + ) + result = agent.invoke({"messages": [system_message] + list(state["messages"])}) + final_message = result["messages"][-1] + state["messages"].append( + final_message + if isinstance(final_message, BaseMessage) + else AIMessage(content=str(final_message)) + ) + state["current_agent"] = "flight_specialist" + return state + + +def flight_specialist_node(state: PlannerState) -> PlannerState: + llm = _create_llm( + "flight_specialist", temperature=0.4, session_id=state["session_id"] + ) + agent = _create_react_agent(llm, tools=[mock_search_flights]).with_config( + { + "run_name": f"flight_specialist{_GC_SUFFIX}", + "tags": ["agent", f"agent:flight_specialist{_GC_SUFFIX}"], + "metadata": { + "agent_name": f"flight_specialist{_GC_SUFFIX}", + "session_id": state["session_id"], + }, + } + ) + step = ( + f"Find an appealing flight from {state['origin']} to {state['destination']} " + f"departing {state['departure']} for {state['travellers']} travellers." + ) + result = agent.invoke({"messages": [HumanMessage(content=step)]}) + final_message = result["messages"][-1] + state["flight_summary"] = ( + final_message.content + if isinstance(final_message, BaseMessage) + else str(final_message) + ) + state["messages"].append( + final_message + if isinstance(final_message, BaseMessage) + else AIMessage(content=str(final_message)) + ) + state["current_agent"] = "hotel_specialist" + return state + + +def hotel_specialist_node(state: PlannerState) -> PlannerState: + llm = _create_llm( + "hotel_specialist", temperature=0.5, session_id=state["session_id"] + ) + agent = _create_react_agent(llm, tools=[mock_search_hotels]).with_config( + { + "run_name": f"hotel_specialist{_GC_SUFFIX}", + "tags": ["agent", f"agent:hotel_specialist{_GC_SUFFIX}"], + "metadata": { + "agent_name": f"hotel_specialist{_GC_SUFFIX}", + "session_id": state["session_id"], + }, + } + ) + step = ( + f"Recommend a boutique hotel in {state['destination']} between " + f"{state['departure']} and 
{state['return_date']} for {state['travellers']} travellers." + ) + result = agent.invoke({"messages": [HumanMessage(content=step)]}) + final_message = result["messages"][-1] + state["hotel_summary"] = ( + final_message.content + if isinstance(final_message, BaseMessage) + else str(final_message) + ) + state["messages"].append( + final_message + if isinstance(final_message, BaseMessage) + else AIMessage(content=str(final_message)) + ) + state["current_agent"] = "activity_specialist" + return state + + +def activity_specialist_node(state: PlannerState) -> PlannerState: + llm = _create_llm( + "activity_specialist", temperature=0.6, session_id=state["session_id"] + ) + agent = _create_react_agent(llm, tools=[mock_search_activities]).with_config( + { + "run_name": f"activity_specialist{_GC_SUFFIX}", + "tags": ["agent", f"agent:activity_specialist{_GC_SUFFIX}"], + "metadata": { + "agent_name": f"activity_specialist{_GC_SUFFIX}", + "session_id": state["session_id"], + }, + } + ) + step = f"Curate signature activities for travellers spending a week in {state['destination']}." + result = agent.invoke({"messages": [HumanMessage(content=step)]}) + final_message = result["messages"][-1] + state["activities_summary"] = ( + final_message.content + if isinstance(final_message, BaseMessage) + else str(final_message) + ) + state["messages"].append( + final_message + if isinstance(final_message, BaseMessage) + else AIMessage(content=str(final_message)) + ) + state["current_agent"] = "plan_synthesizer" + return state + + +def plan_synthesizer_node(state: PlannerState) -> PlannerState: + llm = _create_llm( + "plan_synthesizer", temperature=0.3, session_id=state["session_id"] + ) + system_prompt = SystemMessage( + content=( + "You are the travel plan synthesiser. Combine the specialist insights into a " + "concise, structured itinerary covering flights, accommodation and activities." 
+ ) + ) + content = json.dumps( + { + "flight": state["flight_summary"], + "hotel": state["hotel_summary"], + "activities": state["activities_summary"], + }, + indent=2, + ) + response = llm.invoke( + [ + system_prompt, + HumanMessage( + content=( + f"Traveller request: {state['user_request']}\n\n" + f"Origin: {state['origin']} | Destination: {state['destination']}\n" + f"Dates: {state['departure']} to {state['return_date']}\n\n" + f"Specialist summaries:\n{content}" + ) + ), + ] + ) + state["final_itinerary"] = response.content + state["messages"].append(response) + state["current_agent"] = "completed" + return state + + +def should_continue(state: PlannerState) -> str: + mapping = { + "start": "coordinator", + "flight_specialist": "flight_specialist", + "hotel_specialist": "hotel_specialist", + "activity_specialist": "activity_specialist", + "plan_synthesizer": "plan_synthesizer", + } + return mapping.get(state["current_agent"], END) + + +def build_workflow() -> StateGraph: + graph = StateGraph(PlannerState) + graph.add_node("coordinator", coordinator_node) + graph.add_node("flight_specialist", flight_specialist_node) + graph.add_node("hotel_specialist", hotel_specialist_node) + graph.add_node("activity_specialist", activity_specialist_node) + graph.add_node("plan_synthesizer", plan_synthesizer_node) + graph.add_conditional_edges(START, should_continue) + graph.add_conditional_edges("coordinator", should_continue) + graph.add_conditional_edges("flight_specialist", should_continue) + graph.add_conditional_edges("hotel_specialist", should_continue) + graph.add_conditional_edges("activity_specialist", should_continue) + graph.add_conditional_edges("plan_synthesizer", should_continue) + return graph + + +# --------------------------------------------------------------------------- +# FastAPI app +# --------------------------------------------------------------------------- + +app = FastAPI( + title="Multi-Agent Travel Planner (Gunicorn)", + description="LangGraph 
travel planner served by Gunicorn + Uvicorn workers with zero-code OTel.", + version="0.1.0", +) + + +class PlanRequest(BaseModel): + origin: str = "Seattle" + destination: str = "Paris" + travellers: int = 2 + user_request: Optional[str] = None + + +class PlanResponse(BaseModel): + session_id: str + origin: str + destination: str + departure: str + return_date: str + travellers: int + flight_summary: Optional[str] + hotel_summary: Optional[str] + activities_summary: Optional[str] + final_itinerary: Optional[str] + + +@app.get("/health") +async def health(): + return {"status": "ok"} + + +@app.post("/plan", response_model=PlanResponse) +async def plan(request: PlanRequest): + """Run the multi-agent travel planner and return the itinerary.""" + session_id = str(uuid4()) + departure, return_date = _compute_dates() + + user_request = request.user_request or ( + f"Planning a week-long trip from {request.origin} to {request.destination}. " + "Looking for a boutique hotel, comfortable flights and unique local experiences." 
+ ) + + initial_state: PlannerState = { + "messages": [HumanMessage(content=user_request)], + "user_request": user_request, + "session_id": session_id, + "origin": request.origin, + "destination": request.destination, + "departure": departure, + "return_date": return_date, + "travellers": request.travellers, + "flight_summary": None, + "hotel_summary": None, + "activities_summary": None, + "final_itinerary": None, + "current_agent": "start", + } + + workflow = build_workflow() + compiled = workflow.compile() + config = {"configurable": {"thread_id": session_id}, "recursion_limit": 10} + + final_state: Optional[PlannerState] = None + try: + for step in compiled.stream(initial_state, config): + _, node_state = next(iter(step.items())) + final_state = node_state + except Exception as exc: + raise HTTPException(status_code=500, detail=str(exc)) from exc + + if not final_state: + raise HTTPException(status_code=500, detail="Workflow produced no state") + + return PlanResponse( + session_id=session_id, + origin=request.origin, + destination=request.destination, + departure=departure, + return_date=return_date, + travellers=request.travellers, + flight_summary=final_state.get("flight_summary"), + hotel_summary=final_state.get("hotel_summary"), + activities_summary=final_state.get("activities_summary"), + final_itinerary=final_state.get("final_itinerary"), + ) diff --git a/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/collector/README.md b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/collector/README.md new file mode 100644 index 00000000..32ac24f1 --- /dev/null +++ b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/collector/README.md @@ -0,0 +1,94 @@ +# Splunk OTel Collector — Azure Container Instance + +This directory contains the configuration and deployment script for running the Splunk 
Distribution of the OpenTelemetry Collector as an **Azure Container Instance (ACI)** gateway. + +The collector receives traces and metrics from the App Service web app over OTLP gRPC (port 4317) and forwards them to Splunk Observability Cloud. + +## Prerequisites + +- Azure CLI installed and logged in (`az login`) +- Contributor rights on the target resource group +- Splunk Observability Cloud ingest token and realm + +## Required environment variables + +| Variable | Description | Example | +|---|---|---| +| `SPLUNK_ACCESS_TOKEN` | Splunk Observability Cloud ingest token | `gXgmP9v-...` | +| `SPLUNK_HEC_TOKEN` | Splunk HEC token for log ingestion | `bdef2e63-...` | +| `SPLUNK_HEC_URL` | Splunk HEC endpoint URL | `https://http-inputs-.splunkcloud.com:443/services/collector/event` | +| `SPLUNK_REALM` | Splunk Observability Cloud realm | `us1` | +| `RESOURCE_GROUP` | Azure resource group | `my-resource-group` | +| `STORAGE_ACCOUNT` | Storage account name (globally unique, lowercase, max 24 chars) | `myotelcfgstorage` | + +## Optional overrides + +| Variable | Default | Description | +|---|---|---| +| `LOCATION` | `westus` | Azure region (must match the resource group) | +| `CONTAINER_NAME` | `splunk-otel-collector` | ACI container name | +| `DEPLOYMENT_ENV` | `azure` | Value for `deployment.environment` resource attribute | +| `SPLUNK_MEMORY_LIMIT_MIB` | `900` | Memory ceiling for the collector's `memory_limiter` processor | + +## Deploy + +```bash +export SPLUNK_ACCESS_TOKEN= +export SPLUNK_HEC_TOKEN= +export SPLUNK_HEC_URL=https://http-inputs-.splunkcloud.com:443/services/collector/event +export SPLUNK_REALM=us1 +export RESOURCE_GROUP= +export STORAGE_ACCOUNT= + +chmod +x collector/deploy-aci.sh +./collector/deploy-aci.sh +``` + +The script prints the container's public IP at the end. 
Use it to configure `OTEL_EXPORTER_OTLP_ENDPOINT` in the App Service settings: + +```bash +az webapp config appsettings set \ + --resource-group "${RESOURCE_GROUP}" \ + --name "${APP_NAME}" \ + --settings OTEL_EXPORTER_OTLP_ENDPOINT="http://:4317" +``` + +## Operations + +```bash +# Health check +curl http://:13133/ + +# Tail live logs +az container logs \ + --resource-group "${RESOURCE_GROUP}" \ + --name splunk-otel-collector \ + --follow + +# Restart +az container restart \ + --resource-group "${RESOURCE_GROUP}" \ + --name splunk-otel-collector + +# Delete +az container delete \ + --resource-group "${RESOURCE_GROUP}" \ + --name splunk-otel-collector \ + --yes +``` + +## Architecture + +``` +App Service (Gunicorn + Uvicorn) + │ OTLP gRPC :4317 + ▼ +Azure Container Instance + splunk-otel-collector + │ otlphttp/splunk + ├──────────────► Splunk APM (traces) + │ signalfx + ├──────────────► Splunk IMM (metrics) + │ splunk_hec + └──────────────► Splunk Log Observer (logs) +``` diff --git a/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/collector/deploy-aci.sh b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/collector/deploy-aci.sh new file mode 100644 index 00000000..a879dd51 --- /dev/null +++ b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/collector/deploy-aci.sh @@ -0,0 +1,180 @@ +#!/usr/bin/env bash +# ============================================================================= +# deploy-aci.sh — Deploy Splunk OTel Collector to Azure Container Instance +# +# Prerequisites: +# - Azure CLI installed and logged in (az login) +# - Contributor rights on the resource group +# +# Required environment variables (set before running): +# SPLUNK_ACCESS_TOKEN Splunk Observability Cloud ingest token +# SPLUNK_HEC_TOKEN Splunk HEC token for log ingestion +# SPLUNK_HEC_URL Splunk HEC endpoint URL +# SPLUNK_REALM 
Splunk realm (e.g. us1, eu0) +# RESOURCE_GROUP Azure resource group +# STORAGE_ACCOUNT Storage account name (globally unique, lowercase, max 24 chars) +# +# Optional overrides (defaults shown below): +# LOCATION Azure region (default: westus) +# CONTAINER_NAME ACI container name (default: splunk-otel-collector) +# DEPLOYMENT_ENV deployment.environment tag (default: azure) +# SPLUNK_MEMORY_LIMIT_MIB Memory ceiling in MiB (default: 900) +# +# Usage: +# export SPLUNK_ACCESS_TOKEN= +# export SPLUNK_HEC_TOKEN= +# export SPLUNK_HEC_URL=https://http-inputs-.splunkcloud.com:443/services/collector/event +# export SPLUNK_REALM=us1 +# export RESOURCE_GROUP= +# export STORAGE_ACCOUNT= +# chmod +x deploy-aci.sh && ./deploy-aci.sh +# ============================================================================= +set -euo pipefail + +# --------------------------------------------------------------------------- +# Validate required env vars +# --------------------------------------------------------------------------- +: "${SPLUNK_ACCESS_TOKEN:?Required env var SPLUNK_ACCESS_TOKEN is not set}" +: "${SPLUNK_HEC_TOKEN:?Required env var SPLUNK_HEC_TOKEN is not set}" +: "${SPLUNK_HEC_URL:?Required env var SPLUNK_HEC_URL is not set}" +: "${SPLUNK_REALM:?Required env var SPLUNK_REALM is not set}" +: "${RESOURCE_GROUP:?Required env var RESOURCE_GROUP is not set}" +: "${STORAGE_ACCOUNT:?Required env var STORAGE_ACCOUNT is not set (globally unique, lowercase, max 24 chars)}" + +# --------------------------------------------------------------------------- +# Config — override via env vars or edit defaults here +# --------------------------------------------------------------------------- +LOCATION="${LOCATION:-westus}" # must match your RG location +FILE_SHARE="${FILE_SHARE:-otelconfig}" +CONFIG_FILE="$(dirname "$0")/otel-collector-config.yaml" + +CONTAINER_NAME="${CONTAINER_NAME:-splunk-otel-collector}" +# Pinned to 0.123.0 — the sapm exporter was deprecated in 0.115.0 and removed +# in 
0.147.0 (collector v1.12.0). If upgrading past 0.147.0, verify the +# otel-collector-config.yaml uses otlphttp/splunk instead of sapm for traces. +CONTAINER_IMAGE="${CONTAINER_IMAGE:-quay.io/signalfx/splunk-otel-collector:0.123.0}" + +DEPLOYMENT_ENV="${DEPLOYMENT_ENV:-azure}" + +# Derived from realm +SPLUNK_API_URL="https://api.${SPLUNK_REALM}.signalfx.com" +SPLUNK_INGEST_URL="https://ingest.${SPLUNK_REALM}.signalfx.com" +SPLUNK_TRACE_URL="https://ingest.${SPLUNK_REALM}.signalfx.com/v2/trace" + +# Container paths for the Splunk OTel Collector image (fixed for this image) +SPLUNK_BUNDLE_DIR="/usr/lib/splunk-otel-collector/agent-bundle" +SPLUNK_COLLECTD_DIR="/usr/lib/splunk-otel-collector/agent-bundle/run/collectd" +SPLUNK_LISTEN_INTERFACE="0.0.0.0" +SPLUNK_MEMORY_LIMIT_MIB="${SPLUNK_MEMORY_LIMIT_MIB:-900}" + +# --------------------------------------------------------------------------- +# 1. Create Storage Account + File Share (for the config file) +# --------------------------------------------------------------------------- +echo "" +echo "==> [1/4] Creating storage account: ${STORAGE_ACCOUNT}" +az storage account create \ + --name "${STORAGE_ACCOUNT}" \ + --resource-group "${RESOURCE_GROUP}" \ + --location "${LOCATION}" \ + --sku Standard_LRS \ + --kind StorageV2 \ + --output none + +STORAGE_KEY=$(az storage account keys list \ + --resource-group "${RESOURCE_GROUP}" \ + --account-name "${STORAGE_ACCOUNT}" \ + --query "[0].value" \ + --output tsv) + +echo "==> [1/4] Creating file share: ${FILE_SHARE}" +az storage share create \ + --name "${FILE_SHARE}" \ + --account-name "${STORAGE_ACCOUNT}" \ + --account-key "${STORAGE_KEY}" \ + --output none + +# --------------------------------------------------------------------------- +# 2. 
Upload collector config to the file share +# --------------------------------------------------------------------------- +echo "" +echo "==> [2/4] Uploading otel-collector-config.yaml to file share" +az storage file upload \ + --share-name "${FILE_SHARE}" \ + --source "${CONFIG_FILE}" \ + --path "otel-collector-config.yaml" \ + --account-name "${STORAGE_ACCOUNT}" \ + --account-key "${STORAGE_KEY}" \ + --output none + +# --------------------------------------------------------------------------- +# 3. Deploy the Azure Container Instance +# --------------------------------------------------------------------------- +echo "" +echo "==> [3/4] Creating container instance: ${CONTAINER_NAME}" + +az container create \ + --resource-group "${RESOURCE_GROUP}" \ + --name "${CONTAINER_NAME}" \ + --image "${CONTAINER_IMAGE}" \ + --os-type Linux \ + --cpu 1 \ + --memory 2 \ + --restart-policy Always \ + \ + --ports 4317 4318 13133 6060 9411 \ + --ip-address Public \ + \ + --azure-file-volume-account-name "${STORAGE_ACCOUNT}" \ + --azure-file-volume-account-key "${STORAGE_KEY}" \ + --azure-file-volume-share-name "${FILE_SHARE}" \ + --azure-file-volume-mount-path /etc/otel \ + \ + --command-line "/otelcol --config /etc/otel/otel-collector-config.yaml" \ + \ + --environment-variables \ + SPLUNK_ACCESS_TOKEN="${SPLUNK_ACCESS_TOKEN}" \ + SPLUNK_HEC_TOKEN="${SPLUNK_HEC_TOKEN}" \ + SPLUNK_HEC_URL="${SPLUNK_HEC_URL}" \ + SPLUNK_API_URL="${SPLUNK_API_URL}" \ + SPLUNK_INGEST_URL="${SPLUNK_INGEST_URL}" \ + SPLUNK_TRACE_URL="${SPLUNK_TRACE_URL}" \ + SPLUNK_REALM="${SPLUNK_REALM}" \ + SPLUNK_BUNDLE_DIR="${SPLUNK_BUNDLE_DIR}" \ + SPLUNK_COLLECTD_DIR="${SPLUNK_COLLECTD_DIR}" \ + SPLUNK_LISTEN_INTERFACE="${SPLUNK_LISTEN_INTERFACE}" \ + SPLUNK_MEMORY_LIMIT_MIB="${SPLUNK_MEMORY_LIMIT_MIB}" \ + OTEL_RESOURCE_ATTRIBUTES="deployment.environment=${DEPLOYMENT_ENV}" \ + \ + --output table + +# --------------------------------------------------------------------------- +# 4. 
Show the public IP so we can update OTEL_EXPORTER_OTLP_ENDPOINT +# --------------------------------------------------------------------------- +echo "" +echo "==> [4/4] Fetching container IP address" +COLLECTOR_IP=$(az container show \ + --resource-group "${RESOURCE_GROUP}" \ + --name "${CONTAINER_NAME}" \ + --query "ipAddress.ip" \ + --output tsv) + +echo "" +echo "============================================================" +echo " Splunk OTel Collector deployed successfully!" +echo "============================================================" +echo " Container: ${CONTAINER_NAME}" +echo " Public IP: ${COLLECTOR_IP}" +echo "" +echo " Update your App Service configuration:" +echo "" +echo " az webapp config appsettings set \\" +echo " --resource-group \"${RESOURCE_GROUP}\" \\" +echo " --name \"\${APP_NAME}\" \\" +echo " --settings OTEL_EXPORTER_OTLP_ENDPOINT=\"http://${COLLECTOR_IP}:4317\"" +echo "" +echo " Health check:" +echo " curl http://${COLLECTOR_IP}:13133/" +echo "" +echo " View logs:" +echo " az container logs --resource-group ${RESOURCE_GROUP} --name ${CONTAINER_NAME} --follow" +echo "============================================================" diff --git a/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/collector/otel-collector-config.yaml b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/collector/otel-collector-config.yaml new file mode 100644 index 00000000..74f50bd6 --- /dev/null +++ b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/collector/otel-collector-config.yaml @@ -0,0 +1,142 @@ +# Splunk OTel Collector — gateway config for Azure Container Instance +# +# Tested with: quay.io/signalfx/splunk-otel-collector:0.123.0 +# +# Required env vars (set in ACI via deploy-aci.sh): +# SPLUNK_ACCESS_TOKEN Splunk Observability Cloud ingest token +# SPLUNK_API_URL e.g. 
https://api.us1.signalfx.com +# SPLUNK_INGEST_URL e.g. https://ingest.us1.signalfx.com +# SPLUNK_TRACE_URL e.g. https://ingest.us1.signalfx.com/v2/trace +# SPLUNK_HEC_TOKEN Splunk HEC token for log ingestion +# SPLUNK_HEC_URL e.g. https://http-inputs-.splunkcloud.com:443/services/collector/event +# SPLUNK_BUNDLE_DIR /usr/lib/splunk-otel-collector/agent-bundle +# SPLUNK_COLLECTD_DIR /usr/lib/splunk-otel-collector/agent-bundle/run/collectd +# SPLUNK_LISTEN_INTERFACE 0.0.0.0 in ACI +# SPLUNK_MEMORY_LIMIT_MIB e.g. 900 +# +# Note: deployment.environment is set by the application via OTEL_RESOURCE_ATTRIBUTES. +# The resourcedetection processor uses override: false so it never overwrites +# app-supplied attributes. + +extensions: + health_check: + endpoint: "${SPLUNK_LISTEN_INTERFACE}:13133" + http_forwarder: + ingress: + endpoint: "${SPLUNK_LISTEN_INTERFACE}:6060" + egress: + endpoint: "${SPLUNK_API_URL}" + +receivers: + otlp: + protocols: + grpc: + endpoint: "${SPLUNK_LISTEN_INTERFACE}:4317" + http: + endpoint: "${SPLUNK_LISTEN_INTERFACE}:4318" + jaeger: + protocols: + grpc: + endpoint: "${SPLUNK_LISTEN_INTERFACE}:14250" + thrift_binary: + endpoint: "${SPLUNK_LISTEN_INTERFACE}:6832" + thrift_compact: + endpoint: "${SPLUNK_LISTEN_INTERFACE}:6831" + thrift_http: + endpoint: "${SPLUNK_LISTEN_INTERFACE}:14268" + zipkin: + endpoint: "${SPLUNK_LISTEN_INTERFACE}:9411" + signalfx: + endpoint: "${SPLUNK_LISTEN_INTERFACE}:9943" + prometheus/internal: + config: + scrape_configs: + - job_name: 'otel-collector' + scrape_interval: 10s + static_configs: + - targets: ["localhost:8888"] + metric_relabel_configs: + - source_labels: [__name__] + regex: 'otelcol_rpc_.*' + action: drop + - source_labels: [__name__] + regex: 'otelcol_http_.*' + action: drop + - source_labels: [__name__] + regex: 'otelcol_processor_batch_.*' + action: drop + nop: + +processors: + batch: + metadata_keys: + - X-SF-Token + memory_limiter: + check_interval: 2s + limit_mib: ${SPLUNK_MEMORY_LIMIT_MIB} + 
resourcedetection: + detectors: [azure, system] + # override: false preserves app-sent attributes (e.g. deployment.environment + # from OTEL_RESOURCE_ATTRIBUTES). Cloud/host attributes from the detector are + # still added when they are absent from the incoming resource. + override: false +exporters: + # Traces → Splunk APM via OTLP HTTP (replaces sapm, removed in latest image) + otlphttp/splunk: + traces_endpoint: "${SPLUNK_INGEST_URL}/v2/trace/otlp" + headers: + X-SF-Token: "${SPLUNK_ACCESS_TOKEN}" + signalfx: + access_token: "${SPLUNK_ACCESS_TOKEN}" + api_url: "${SPLUNK_API_URL}" + ingest_url: "${SPLUNK_INGEST_URL}" + sync_host_metadata: true + correlation: + send_otlp_histograms: true + otlphttp/entities: + logs_endpoint: "${SPLUNK_INGEST_URL}/v3/event" + headers: + "X-SF-Token": "${SPLUNK_ACCESS_TOKEN}" + splunk_hec: + token: "${SPLUNK_HEC_TOKEN}" + endpoint: "${SPLUNK_HEC_URL}" + source: "otel" + sourcetype: "otel" + profiling_data_enabled: false + splunk_hec/profiling: + token: "${SPLUNK_ACCESS_TOKEN}" + endpoint: "${SPLUNK_INGEST_URL}/v1/log" + log_data_enabled: false + debug: + verbosity: basic + +service: + telemetry: + metrics: + level: basic + extensions: [health_check, http_forwarder] + pipelines: + traces: + receivers: [jaeger, otlp, zipkin] + processors: [memory_limiter, batch, resourcedetection] + exporters: [otlphttp/splunk, signalfx] + metrics: + receivers: [otlp, signalfx] + processors: [memory_limiter, batch, resourcedetection] + exporters: [signalfx] + metrics/internal: + receivers: [prometheus/internal] + processors: [memory_limiter, batch, resourcedetection] + exporters: [signalfx] + logs/signalfx: + receivers: [signalfx] + processors: [memory_limiter, batch, resourcedetection] + exporters: [signalfx] + logs/entities: + receivers: [nop] + processors: [memory_limiter, batch, resourcedetection] + exporters: [otlphttp/entities] + logs: + receivers: [otlp] + processors: [memory_limiter, batch, resourcedetection] + exporters: [splunk_hec, 
splunk_hec/profiling] diff --git a/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/requirements.txt b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/requirements.txt new file mode 100644 index 00000000..8a7cc5f9 --- /dev/null +++ b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/requirements.txt @@ -0,0 +1,11 @@ +fastapi +gunicorn +uvicorn[standard] +langchain +langchain-core +langchain-openai +langgraph +python-dotenv +opentelemetry-distro +opentelemetry-exporter-otlp +splunk-otel-instrumentation-langchain==0.1.14 diff --git a/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/startup.sh b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/startup.sh new file mode 100644 index 00000000..fc901a28 --- /dev/null +++ b/instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/gunicorn/startup.sh @@ -0,0 +1,59 @@ +#!/bin/sh +# ============================================================================= +# startup.sh — Azure App Service entry point for FastAPI + Gunicorn + OTel +# +# Why this file exists: +# App Service runs startup commands with /bin/sh (not bash). +# Oryx builds a virtual environment called "antenv" and sets PYTHONPATH to +# its site-packages, but does NOT add antenv/bin to PATH. +# This means `opentelemetry-instrument` and `gunicorn` are not found without +# first activating the virtual environment. +# +# This script: +# 1. Activates antenv using POSIX-compatible `.` (not bash `source`) +# 2. 
Runs gunicorn with Uvicorn workers (OTel is initialised inside app.py, not by the opentelemetry-instrument wrapper) +# +# Startup command to set in App Service → Configuration → General settings: +# sh startup.sh +# ============================================================================= + +set -e + +echo "[startup] Python: $(python3 --version 2>&1)" +echo "[startup] Working directory: $(pwd)" + +# Activate the Oryx-built virtual environment. +# antenv is always co-located with app.py after Oryx extracts the zip. +if [ -f "antenv/bin/activate" ]; then + # shellcheck disable=SC1091 + . antenv/bin/activate + echo "[startup] Activated antenv at $(pwd)/antenv" +else + echo "[startup] WARNING: antenv/bin/activate not found — falling back to system Python" + echo "[startup] Contents of current directory:" + ls -la +fi + +# App Service sets PORT; fall back to 8000 (App Service default for custom apps). +APP_PORT="${PORT:-8000}" +echo "[startup] Launching on port ${APP_PORT}" +echo "[startup] Service: ${OTEL_SERVICE_NAME:-multi-agent-travel-planner-azure}" +echo "[startup] OTLP endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:-not set}" +echo "[startup] Environment: ${OTEL_RESOURCE_ATTRIBUTES:-not set}" + +# OTel is initialised programmatically inside app.py via initialize() with a +# sys.modules guard (see the top of app.py for the full explanation). +# This runs post-fork in each Gunicorn worker, giving every worker its own +# fresh PeriodicExportingMetricReader thread. That fixes the silent metric drop +# caused by the opentelemetry-instrument wrapper + --preload pattern. +# +# UvicornWorker: required to serve the FastAPI ASGI application. +# --timeout 301: well above Gunicorn's 30s default, so the LLM pipeline can +# complete before Gunicorn kills a slow worker. +exec gunicorn \ + -w 1 \ + -k uvicorn.workers.UvicornWorker \ + app:app \ + --access-logfile "-" \ + --timeout 301 \ + --bind "0.0.0.0:${APP_PORT}"
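
Review note: the Gunicorn flags passed in `startup.sh` could also live in a Gunicorn config file, which keeps the startup script focused on activating `antenv`. The sketch below is only an illustration and is not part of this diff. The file name `gunicorn.conf.py` is an assumption (Gunicorn looks for a file of that name in the working directory by default), and the values simply mirror the command-line flags above.

```python
# gunicorn.conf.py (hypothetical file, not part of this diff)
# Mirrors the flags that startup.sh passes to `gunicorn` on the command line.
import os

# App Service sets PORT; fall back to 8000 when it is unset or empty,
# matching the shell expansion ${PORT:-8000} in startup.sh.
_port = os.environ.get("PORT") or "8000"

bind = f"0.0.0.0:{_port}"                       # --bind "0.0.0.0:${APP_PORT}"
workers = 1                                     # -w 1
worker_class = "uvicorn.workers.UvicornWorker"  # -k uvicorn.workers.UvicornWorker
accesslog = "-"                                 # --access-logfile "-" (stdout)
timeout = 301                                   # --timeout 301 (seconds)
```

With such a file next to `app.py`, the last line of `startup.sh` could presumably shrink to `exec gunicorn app:app`. Flags given on the command line still take precedence over the config file, so the two approaches can be mixed during a migration.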