
feat(langchain): multi-agent travel planner with Gunicorn + Azure App Service deployment#347

Open
adityamehra wants to merge 2 commits into main from feat/multi-agent-travel-planner-gunicorn

adityamehra commented Jun 12, 2026

Summary

  • Adds examples/multi_agent_travel_planner/gunicorn/ — a five-agent LangGraph travel itinerary pipeline (coordinator → flight → hotel → activity → synthesizer) served by Gunicorn + UvicornWorker with full OpenTelemetry observability.
  • Includes a complete Azure deployment path: Azure App Service (Python 3.12) + Splunk OTel Collector 0.123.0 in Azure Container Instance (ACI).
  • Verified end-to-end: traces and metrics visible in Splunk Observability Cloud from both local Gunicorn and Azure App Service.
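The pipeline shape can be sketched without LangGraph. The stdlib-only stand-in below is a hypothetical illustration, not app.py: the node functions, the plan_notes field, and the stream() generator are assumptions, and only the four response keys come from this PR. Each step is a one-entry {node_name: state} dict, which is the shape the reviewed loop further down relies on when it calls next(iter(step.items())).

```python
from typing import Callable, Dict, Iterator, List, Tuple, TypedDict


class PlannerState(TypedDict, total=False):
    # Hypothetical state shape; only the four summary/itinerary keys are
    # taken from the PR's documented /plan response.
    origin: str
    destination: str
    travellers: int
    plan_notes: str
    flight_summary: str
    hotel_summary: str
    activities_summary: str
    final_itinerary: str


Node = Callable[[PlannerState], PlannerState]


def coordinator(s: PlannerState) -> PlannerState:
    return {**s, "plan_notes": f"{s['travellers']} travellers, {s['origin']} -> {s['destination']}"}


def flight(s: PlannerState) -> PlannerState:
    return {**s, "flight_summary": f"Flights {s['origin']} -> {s['destination']}"}


def hotel(s: PlannerState) -> PlannerState:
    return {**s, "hotel_summary": f"Hotel in {s['destination']}"}


def activity(s: PlannerState) -> PlannerState:
    return {**s, "activities_summary": f"Things to do in {s['destination']}"}


def synthesizer(s: PlannerState) -> PlannerState:
    parts = [s["flight_summary"], s["hotel_summary"], s["activities_summary"]]
    return {**s, "final_itinerary": " | ".join(parts)}


# Fixed linear order: coordinator -> flight -> hotel -> activity -> synthesizer
PIPELINE: List[Tuple[str, Node]] = [
    ("coordinator", coordinator),
    ("flight", flight),
    ("hotel", hotel),
    ("activity", activity),
    ("synthesizer", synthesizer),
]


def stream(state: PlannerState) -> Iterator[Dict[str, PlannerState]]:
    """Run each node in order, yielding {node_name: state_after_node}."""
    for name, fn in PIPELINE:
        state = fn(state)
        yield {name: state}
```

Because every stand-in node returns the full state, the last step already holds the finished itinerary. In the real example each agent calls an LLM, and the agent names carry a _gc suffix; these stand-ins do neither.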

Files added

| File | Purpose |
|---|---|
| app.py | FastAPI application; five LangGraph agents whose names carry a _gc suffix |
| startup.sh | Azure App Service startup script (Gunicorn + UvicornWorker, no --preload) |
| requirements.txt | Pins splunk-otel-instrumentation-langchain==0.1.14 from PyPI |
| .env.example | Template for all required environment variables |
| README.md | Full local and Azure deployment guide with verification steps |
| collector/otel-collector-config.yaml | Splunk OTel Collector config (ACI gateway mode) |
| collector/deploy-aci.sh | Script to deploy or redeploy the ACI collector |
| collector/README.md | Collector setup and troubleshooting notes |
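startup.sh itself is not reproduced here. As a hedged sketch, the same settings could also be expressed in a Gunicorn config file (gunicorn.conf.py, which is plain Python). The file itself, the WEB_CONCURRENCY fallback, and the bind address (borrowed from the local test plan) are assumptions; the setting that matters for this PR is preload_app = False, which keeps the import of app.py, and therefore the OTel initialization, inside each worker.

```python
# gunicorn.conf.py -- hypothetical equivalent of the startup.sh settings
import os

# Same worker class as the local test plan command.
worker_class = "uvicorn.workers.UvicornWorker"

# Assumption: worker count from WEB_CONCURRENCY, defaulting to 1 as in the test plan.
workers = int(os.environ.get("WEB_CONCURRENCY", "1"))

bind = "0.0.0.0:8000"

# Do not import app.py in the master process. Each worker imports it after
# fork(), so module-level OTel initialization runs once per worker.
preload_app = False

# Application path, equivalent to the "app:app" positional argument.
wsgi_app = "app:app"
```

With this file in place, gunicorn -c gunicorn.conf.py would start the same server as the test plan command.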

Key design decisions

Fork-safe OTel initialization (app.py)

The standard opentelemetry-instrument gunicorn pattern fails silently for metrics on Linux when --preload is used: the PeriodicExportingMetricReader background thread is created in the master process, and because fork() copies only the calling thread into the child, the workers never get a running export thread. This example instead initializes programmatically (initialize() is called at module import time). Because startup.sh does not pass --preload, app.py is imported in each Gunicorn worker after the fork, so every worker initializes the full OTel SDK, including a fresh metric reader thread:

import sys

# Sentinel stored in sys.modules so the guard survives a reimport of app.py
_OTEL_INIT_KEY = "__travel_planner_otel_initialized__"
if _OTEL_INIT_KEY not in sys.modules:
    sys.modules[_OTEL_INIT_KEY] = True
    from opentelemetry import trace as _otel_trace
    from opentelemetry.sdk.trace import TracerProvider as _SDKTracerProvider
    # Skip if opentelemetry-instrument has already installed an SDK TracerProvider
    if not isinstance(_otel_trace.get_tracer_provider(), _SDKTracerProvider):
        from opentelemetry.instrumentation.auto_instrumentation import initialize
        initialize()

The sys.modules sentinel prevents double-init on reimport; the TracerProvider type check lets the CLI wrapper (opentelemetry-instrument) and programmatic approaches coexist.

For more details, refer to the OTel troubleshooting guide.
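The root cause can be reproduced without OTel: a thread started before fork() does not exist in the child. In the sketch below, a sleeping daemon thread is a hypothetical stand-in for PeriodicExportingMetricReader's export thread, and the parent/child pair plays the role of a Gunicorn master running with --preload and one of its workers. It needs a POSIX platform (os.fork).

```python
import os
import threading
import time
import warnings


def start_background_reader() -> threading.Thread:
    """Start a daemon thread standing in for a metric reader's export loop (hypothetical)."""
    t = threading.Thread(target=time.sleep, args=(60,), daemon=True, name="fake-metric-reader")
    t.start()
    return t


def reader_alive_in_child(reader: threading.Thread) -> bool:
    """fork() like a preloading master, then ask the child whether the reader thread exists."""
    r, w = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ warns when forking a multi-threaded process; that is the point here.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:  # child: plays the role of a Gunicorn worker
        os.close(r)
        alive = any(t.name == reader.name for t in threading.enumerate())
        os.write(w, b"1" if alive else b"0")
        os._exit(0)
    os.close(w)  # parent: read the child's answer, then reap it
    answer = os.read(r, 1)
    os.close(r)
    os.waitpid(pid, 0)
    return answer == b"1"


if __name__ == "__main__":
    reader = start_background_reader()
    print("alive in parent:", reader.is_alive())              # True
    print("alive in child: ", reader_alive_in_child(reader))  # False
```

This is why the SDK has to be initialized after the fork: anything that starts its own thread during initialization must run in the worker process that will use it.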

Collector: sapm → otlphttp/splunk

splunk-otel-collector:latest (≥ v0.147.0 / collector v1.12.0) removed the sapm exporter. The config uses otlphttp/splunk for traces instead and the collector image is pinned to 0.123.0 for stability.

resourcedetection processor: override: false

Ensures deployment.environment set by the app via OTEL_RESOURCE_ATTRIBUTES is never overwritten by the collector's resource detection.
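As a rough model of what override: false means (not the processor's actual code): detected attributes are only added where the resource does not already have the key, so app-supplied values win. The attribute values below are hypothetical.

```python
from typing import Dict

Attrs = Dict[str, str]


def apply_detected(resource: Attrs, detected: Attrs, override: bool) -> Attrs:
    """Merge detector output into an existing resource.

    override=True:  detected values replace existing keys.
    override=False: existing keys are kept; detected values only fill gaps.
    """
    if override:
        return {**resource, **detected}
    return {**detected, **resource}


# Resource as sent by the app via OTEL_RESOURCE_ATTRIBUTES (illustrative values).
app_resource = {"service.name": "travel-planner", "deployment.environment": "azure-demo"}
# Hypothetical detector output on the collector host, including a conflicting key.
detected = {"host.name": "aci-collector", "deployment.environment": "collector-default"}
```

With override=False the merged resource keeps deployment.environment="azure-demo" and still gains host.name from detection.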

Telemetry emitted

  • Spans: gen_ai.agent.invoke (one per LangGraph node) + gen_ai.client.chat (one per LLM call)
  • Metrics (delta): gen_ai.client.token.usage, gen_ai.client.operation.duration
  • All signals carry gen_ai.agent.name, deployment.environment, service.name, gen_ai.request.model
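Delta temporality means each export carries only what was recorded since the previous export, not a running total. A small worked example of the relationship between the two views, using a hypothetical running sum of token counts recorded on gen_ai.client.token.usage for one agent (the agent name flight_agent_gc is illustrative):

```python
from typing import List


def cumulative_to_delta(cumulative: List[int]) -> List[int]:
    """Convert successive cumulative readings into per-interval deltas."""
    deltas = []
    previous = 0
    for value in cumulative:
        deltas.append(value - previous)
        previous = value
    return deltas


# Hypothetical running sums for gen_ai.agent.name="flight_agent_gc"
# at three consecutive export intervals.
cumulative_tokens = [100, 250, 400]
print(cumulative_to_delta(cumulative_tokens))  # [100, 150, 150]
```

Summing the delta points on the backend recovers the cumulative total (400 here), which is why per-agent token totals remain available with delta export.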

Test plan

# Local
cd examples/multi_agent_travel_planner/gunicorn
pip install -r requirements.txt
gunicorn -w 1 -k uvicorn.workers.UvicornWorker app:app --bind 0.0.0.0:8000

curl -X POST http://localhost:8000/plan \
  -H "Content-Type: application/json" \
  -d '{"origin":"Seattle","destination":"Tokyo","travellers":2}'
  • Response returns flight_summary, hotel_summary, activities_summary, final_itinerary
  • Traces appear in Splunk APM Trace Analyzer under the configured service.name
  • gen_ai.client.token.usage metric appears in Splunk APM Agent view per agent
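The manual curl check can also be scripted. A minimal stdlib sketch, assuming the app listens on localhost:8000 as in the command above; REQUIRED_KEYS mirrors the four response fields listed, and the helper names are hypothetical:

```python
import json
import urllib.request
from typing import Any, Dict, List

REQUIRED_KEYS = ("flight_summary", "hotel_summary", "activities_summary", "final_itinerary")


def build_plan_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST /plan request as the curl example."""
    body = json.dumps({"origin": "Seattle", "destination": "Tokyo", "travellers": 2}).encode()
    return urllib.request.Request(
        f"{base_url}/plan",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def missing_keys(response: Dict[str, Any]) -> List[str]:
    """Return the expected response fields that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not response.get(k)]


def run_check(base_url: str = "http://localhost:8000", timeout: float = 300.0) -> None:
    # Five sequential agents making LLM calls can be slow; the timeout is a guess.
    with urllib.request.urlopen(build_plan_request(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    missing = missing_keys(payload)
    if missing:
        raise AssertionError(f"missing fields: {missing}")
    print("OK:", payload["final_itinerary"][:80])
```

Calling run_check() against the running server covers the first expectation; the trace and metric checks still happen in Splunk Observability Cloud.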

Made with Cursor

… + Azure deployment

Adds a new example demonstrating a five-agent LangGraph travel planner
(coordinator → flight → hotel → activity → synthesizer) served by Gunicorn
+ UvicornWorker with full OpenTelemetry observability.

Key design decisions:
- Programmatic OTel initialization via initialize() at module import time,
  running post-fork in each Gunicorn worker to ensure PeriodicExportingMetricReader
  starts in the correct process (fixes silent metric drop on Linux with --preload).
- sys.modules sentinel + TracerProvider type check allow both the CLI wrapper
  (opentelemetry-instrument) and programmatic approaches to coexist without
  double-initialization.
- Agent names carry a _gc suffix to distinguish the Gunicorn deployment from
  other runtimes in Splunk APM Agent view.
- Azure App Service deployment via startup.sh + Oryx build (SCM_DO_BUILD_DURING_DEPLOYMENT=true).
- Splunk OTel Collector in Azure Container Instance (pinned to 0.123.0) with
  otlphttp/splunk exporter replacing the deprecated sapm exporter.
- resourcedetection processor uses override: false so app-supplied
  deployment.environment is never overwritten by the collector.

Verified: traces (gen_ai.agent.invoke, gen_ai.client.chat) and metrics
(gen_ai.client.token.usage, gen_ai.client.operation.duration) are visible
in Splunk Observability Cloud from both local Gunicorn and Azure App Service.

Co-authored-by: Cursor <cursoragent@cursor.com>
adityamehra requested review from a team as code owners June 12, 2026 00:58
…corn example

Co-authored-by: Cursor <cursoragent@cursor.com>

keith-decker left a comment

SPLUNK_ACCESS_TOKEN and SPLUNK_HEC_TOKEN are passed as regular ACI environment variables; they should be passed as secure environment variables instead.

But that shouldn't block the demo.
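For reference, az container create takes secrets through --secure-environment-variables, separately from --environment-variables. A hedged Python sketch of how the deploy script's arguments could be split; the function name, the resource names, the image reference, and SPLUNK_REALM as the non-secret variable are assumptions, not taken from deploy-aci.sh:

```python
from typing import Dict, List

# Variables treated as secrets (names taken from this review comment).
SECRET_VARS = {"SPLUNK_ACCESS_TOKEN", "SPLUNK_HEC_TOKEN"}


def az_container_create_args(
    resource_group: str, name: str, image: str, env: Dict[str, str]
) -> List[str]:
    """Build an `az container create` argv, routing secrets to --secure-environment-variables."""
    plain = [f"{k}={v}" for k, v in sorted(env.items()) if k not in SECRET_VARS]
    secure = [f"{k}={v}" for k, v in sorted(env.items()) if k in SECRET_VARS]
    args = [
        "az", "container", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--image", image,
    ]
    if plain:
        args += ["--environment-variables", *plain]
    if secure:
        args += ["--secure-environment-variables", *secure]
    return args
```

Passing the resulting list to subprocess.run(argv, check=True) avoids shell interpolation of the token values.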

I'm approving, but we might want to consider pinning the langchain and langgraph dependencies. Without pins, we find out when an upstream release breaks the demo; with pins, the demo keeps working through upstream updates.


final_state: Optional[PlannerState] = None
try:
    for step in compiled.stream(initial_state, config):


compiled.stream() is a synchronous, blocking iterator, not an async one.
compiled.astream() (the async version) is probably the better choice here.
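A minimal sketch of the difference, using stand-in generators (hypothetical; the real code would iterate compiled.astream(initial_state, config) from LangGraph). This matters if the /plan handler is declared async def, since FastAPI runs plain def handlers in a threadpool. The async loop yields to the event loop between steps; iterating the sync generator inside async def blocks it. If switching to astream() is not an option, asyncio.to_thread can move the blocking loop off the event loop.

```python
import asyncio
import time
from typing import Any, AsyncIterator, Dict, Iterator, Optional

NODES = ["coordinator", "flight", "hotel", "activity", "synthesizer"]


def fake_stream(state: Dict[str, Any]) -> Iterator[Dict[str, Dict[str, Any]]]:
    """Stand-in for compiled.stream(): blocking work per node."""
    for name in NODES:
        time.sleep(0.01)  # blocks the calling thread (the event loop, if called there)
        state = {**state, "last_node": name}
        yield {name: state}


async def fake_astream(state: Dict[str, Any]) -> AsyncIterator[Dict[str, Dict[str, Any]]]:
    """Stand-in for compiled.astream(): awaits between nodes."""
    for name in NODES:
        await asyncio.sleep(0.01)  # yields control to the event loop
        state = {**state, "last_node": name}
        yield {name: state}


async def run_plan_async(initial_state: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    final_state = None
    async for step in fake_astream(initial_state):
        _, final_state = next(iter(step.items()))
    return final_state


def _drain_sync(initial_state: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    final_state = None
    for step in fake_stream(initial_state):
        _, final_state = next(iter(step.items()))
    return final_state


async def run_plan_in_thread(initial_state: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    # Fallback: keep the sync API but run it in a worker thread (Python 3.9+).
    return await asyncio.to_thread(_drain_sync, initial_state)
```

asyncio.to_thread copies the current contextvars context into the thread, so the active OTel span context is still available to the instrumented calls.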

        _, node_state = next(iter(step.items()))
        final_state = node_state
except Exception as exc:
    raise HTTPException(status_code=500, detail=str(exc)) from exc


str(exc) may contain endpoints or secrets, which would then be exposed to the client in the 500 response.
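One common alternative, sketched with the standard library only (the helper name and the error-ID scheme are hypothetical): log the full exception server-side under a correlation ID and return only that ID to the client.

```python
import logging
import uuid

logger = logging.getLogger("travel_planner")


def client_safe_detail(exc: BaseException) -> str:
    """Log the full exception server-side; return a detail string safe for clients."""
    error_id = uuid.uuid4().hex[:12]
    # The full message and traceback go to server logs only.
    logger.error("plan failed [error_id=%s]", error_id, exc_info=exc)
    return f"Internal error while planning the trip (error_id={error_id})"
```

In the handler this would become raise HTTPException(status_code=500, detail=client_safe_detail(exc)) from exc; the exception chaining stays server-side and is not serialized into the response.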
