📊 Observability — distributed tracing

The app can emit distributed traces to either Elastic APM or Jaeger via OpenTelemetry. Tracing is off by default and activated by one of two Spring profiles, each paired with its own Docker Compose overlay.

👉 Back to the main README.

Table of contents

  1. Backends — pick one
  2. Stack components
  3. Run with Elastic APM
  4. Run with Jaeger
  5. Run without tracing (default)
  6. Ports & URLs
  7. Exploring traces in Kibana — guided tour
  8. Same traces in Jaeger
  9. Useful filters
  10. Why Axon splits work across multiple traces

Backends — pick one

| Backend | Spring profile | Compose overlay | Footprint | UI | When to use |
|---|---|---|---|---|---|
| Elastic APM | observability-elastic | docker-compose.observability-elastic.yaml | ~2 GB RAM | Kibana APM (rich, service map, KQL) | Production-realistic UX, queries, dashboards |
| Jaeger | observability-jaeger | docker-compose.observability-jaeger.yaml | ~150 MB RAM | Jaeger UI (simple waterfall) | Quick spin-up, demos, CI, constrained machines |

The two profiles are alternatives — pick the backend you want for a given session. The base observability profile (sampling, OTLP/HTTP transport, Axon OpenTelemetrySpanFactory wiring) is inherited by both via Spring's spring.profiles.group, so switching backends is a one-line change.

Stack components

| Component | Role |
|---|---|
| axon-tracing-opentelemetry | Instruments command/event/query handlers, aggregates, repositories, event store |
| micrometer-tracing-bridge-otel | Spring Boot's official bridge from Micrometer Observation to OpenTelemetry |
| opentelemetry-exporter-otlp | Pushes traces over OTLP/HTTP to the chosen backend |
| Elastic APM 9.x | Receives OTLP, stores in Elasticsearch, visualizes in Kibana |
| Jaeger 2.x | Receives OTLP directly, in-memory storage, lightweight UI |

Activation surface:

  • Base profile observability — common tracing config (sampling, transport, Axon span factory bean). Never activated directly.
  • Profile observability-elastic — base + Elastic APM endpoint
  • Profile observability-jaeger — base + Jaeger endpoint
  • Profile groups in application.yaml make the two child profiles automatically include the base.

Run with Elastic APM

  1. Start the base stack + Elastic observability stack:

    docker compose -f docker-compose.yaml -f docker-compose.observability-elastic.yaml up -d

    Wait ~60s for Elasticsearch and Kibana to be ready.

  2. Run the app with the observability-elastic profile:

    SPRING_PROFILES_ACTIVE=observability-elastic ./mvnw spring-boot:run

    Or: ./mvnw spring-boot:run -Dspring-boot.run.profiles=observability-elastic.

  3. Generate some traffic via Swagger UI at http://localhost:3773/swagger-ui/index.html.

  4. Open Kibana APM — see the guided tour below.
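The fixed ~60s wait in step 1 can be replaced with a readiness poll. This is a minimal sketch, not part of the project: `wait_for_url` and the `PROBE` override are hypothetical names, and it assumes `curl` is installed and that Kibana's `/api/status` endpoint answers without authentication in this Compose setup.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until the probe succeeds or attempts run out.
# PROBE is an assumption added for testability; by default curl is used, and
# -f makes it fail on HTTP error statuses (e.g. 503 while Kibana is booting).
wait_for_url() {
  url="$1"
  attempts="${2:-30}"   # how many times to try
  delay="${3:-2}"       # seconds to sleep between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    if ${PROBE:-curl -fsS -o /dev/null} "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out: $url" >&2
  return 1
}

# Usage (uncomment to run against the stack):
# wait_for_url http://localhost:5601/api/status 60 2
```

The same function should work for the Jaeger UI at http://localhost:16686, although Jaeger usually starts fast enough that polling is unnecessary.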

Run with Jaeger

  1. Start the base stack + Jaeger:

    docker compose -f docker-compose.yaml -f docker-compose.observability-jaeger.yaml up -d

    Jaeger v2 starts in seconds — UI is ready almost immediately.

  2. Run the app with the observability-jaeger profile:

    SPRING_PROFILES_ACTIVE=observability-jaeger ./mvnw spring-boot:run

    Or: ./mvnw spring-boot:run -Dspring-boot.run.profiles=observability-jaeger.

  3. Generate some traffic via Swagger UI at http://localhost:3773/swagger-ui/index.html.

  4. Open Jaeger UI at http://localhost:16686 — see Same traces in Jaeger below.

Run without tracing (default)

docker compose up -d
./mvnw spring-boot:run

No observability-* profile = no traces emitted, no extra containers needed. Useful for daily development.
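The three run modes differ only in the Compose overlay and the Spring profile, so they can be driven from a single variable. The sketch below is illustrative only: `compose_args` and `spring_profile` are hypothetical helpers, not scripts shipped with the project, and the file names are the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: map a backend name to the Compose files named in this
# README. "none" means the default, tracing-free setup.
compose_args() {
  case "$1" in
    elastic|jaeger) echo "-f docker-compose.yaml -f docker-compose.observability-$1.yaml" ;;
    none)           echo "-f docker-compose.yaml" ;;
    *)              echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: map a backend name to the Spring profile to activate.
# Assumption: an empty SPRING_PROFILES_ACTIVE activates no extra profile.
spring_profile() {
  case "$1" in
    elastic|jaeger) echo "observability-$1" ;;
    none)           echo "" ;;
    *)              echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Usage (uncomment to run): switch backends by changing one word.
# backend=jaeger
# docker compose $(compose_args "$backend") up -d
# SPRING_PROFILES_ACTIVE=$(spring_profile "$backend") ./mvnw spring-boot:run
```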

Ports & URLs

| Service | Port | URL | Used by profile |
|---|---|---|---|
| Kibana (APM UI) | 5601 | http://localhost:5601 | observability-elastic |
| Kibana → APM → Services | | http://localhost:5601/app/apm/services | observability-elastic |
| Kibana → APM → Traces | | http://localhost:5601/app/apm/traces | observability-elastic |
| Elasticsearch | 9200 | http://localhost:9200 | observability-elastic |
| Elastic APM Server (OTLP receiver) | 8200 | http://localhost:8200/v1/traces | observability-elastic |
| Jaeger UI | 16686 | http://localhost:16686 | observability-jaeger |
| Jaeger OTLP receiver (HTTP) | 4318 | http://localhost:4318/v1/traces | observability-jaeger |
| Jaeger OTLP receiver (gRPC) | 4317 | grpc://localhost:4317 | observability-jaeger |
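To check that the Jaeger OTLP/HTTP receiver is reachable before starting the app, one hand-made span can be posted to it. This is a rough sketch under assumptions: `otlp_payload`, the `smoke-test` service name and the `smoke-span` span name are made up for illustration, and it assumes the receiver accepts OTLP encoded as JSON on port 4318.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal OTLP/JSON payload holding one span.
# Args: trace id (32 hex chars), span id (16 hex chars), start and end time
# in nanoseconds since the Unix epoch. "kind":1 is an internal span.
otlp_payload() {
  printf '{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"smoke-test"}}]},'
  printf '"scopeSpans":[{"scope":{"name":"manual"},"spans":[{"traceId":"%s","spanId":"%s","name":"smoke-span","kind":1,' "$1" "$2"
  printf '"startTimeUnixNano":"%s","endTimeUnixNano":"%s"}]}]}]}\n' "$3" "$4"
}

# Usage (uncomment to run against the Jaeger overlay):
# trace_id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
# span_id=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
# now=$(date +%s)
# otlp_payload "$trace_id" "$span_id" "${now}000000000" "${now}500000000" |
#   curl -sS -X POST -H 'Content-Type: application/json' \
#     --data-binary @- http://localhost:4318/v1/traces
```

If the request is accepted, a `smoke-test` entry should show up in the Service dropdown of the Jaeger UI.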

Exploring traces in Kibana — guided tour

A quick walkthrough of what you can see and where to click. Examples below were captured after running the full chain BuildDwelling → IncreaseAvailableCreatures → RecruitCreature → GetDwellingById through Swagger UI with the observability-elastic profile active.

1. Service inventory

Kibana → ☰ → Observability → APM → Services — landing page lists every service emitting traces. After firing a few requests you should see heroesofddd with average latency, throughput, and error rate.

[Screenshot: Service inventory]

2. Transactions

Click heroesofddd → Transactions tab. Each row is a distinct entry point: HTTP endpoints (auto-instrumented by Spring Web) as well as Axon's async boundaries. Each command, event, or query handler shows up as a top-level transaction because Axon hops across the gRPC bus and async event processors.

[Screenshot: Transactions]

3. Traces — the full list

Kibana → APM → Traces shows every individual trace tree.

For this project you'll see roots like:

  • http put /games/{gameId}/dwellings/{dwellingId} — HTTP root
  • CommandBus.handleDistributedCommand(RecruitCreature) — command handling on the aggregate side after the gRPC hop to Axon Server
  • StreamingEventProcessor.process(CreatureRecruited) — projector / automation
  • CommandBus.handleCommand(AddCreatureToArmy) — command emitted by the WhenCreatureRecruitedThenAddToArmy automation
  • QueryBus.processQueryMessage(GetDwellingById) — query side

[Screenshot: Traces list]

4. Trace waterfall — Axon internals visible

Click any CommandBus.handleCommand(...) trace and Kibana renders a waterfall like this — the full call path of the Axon command handler, including aggregate loading and event publication:

[Screenshot: Axon trace waterfall]

What you're looking at:

✓ CommandBus.handleDistributedCommand(RecruitCreature)     22 ms     ← gRPC server side
  └─ CommandBus.dispatchCommand(RecruitCreature)           22 ms
     └─ CommandBus.handleCommand(RecruitCreature)          22 ms
        ├─ Repository.load                                 10 ms     ← event sourcing
        │  ├─ Repository.obtainLock                        41 μs
        │  └─ Repository.initializeState(Dwelling)          1.0 ms   ← rehydrate aggregate
        ├─ Dwelling.decide(RecruitCreature)                 3.2 ms   ← AGGREGATE business logic
        ├─ EventBus.publishEvent(CreatureRecruited)        29 μs
        └─ EventBus.commitEvents                            5.8 ms   ← persist to event store

This is exactly the layered shape from Event Sourcing theory — repository → aggregate → event publication — rendered as data, not as a diagram in a slide.

5. Span attributes — gameId, playerId, message metadata

Click any Axon span (e.g. CommandBus.handleCommand(RecruitCreature)) → Metadata tab. The flyout shows OpenTelemetry attributes — including correlation data injected by GameConfiguration.gameDataProvider:

[Screenshot: Span attributes]

labels.axon_metadata_gameId    = scenario-1                  ← from gameDataProvider
labels.axon_metadata_playerId  = player-1                    ← from gameDataProvider
labels.axon_message_id         = 2201ae5d-3871-45b2-a661-...
labels.axon_message_name       = com.dddheroes…RecruitCreature
labels.axon_message_type       = GrpcBackedCommandMessage
labels.axon_payload_type       = com.dddheroes…RecruitCreature

This is the practical payoff: filtering traces by labels.axon_metadata_gameId : "scenario-1" in the Kibana search bar isolates every span — across every aggregate, processor and projector — that participated in one game session.

Same traces in Jaeger

With observability-jaeger active, the same trace data is produced — Jaeger just renders it differently:

  1. Open http://localhost:16686.
  2. Service dropdown → pick heroesofddd.
  3. Operation dropdown → e.g. CommandBus.handleCommand(RecruitCreature).
  4. Click Find Traces → see the same waterfall tree (Repository.load, Dwelling.decide, EventBus.publishEvent, …).
  5. Click any span → "Tags" panel shows the same OTel attributes as Kibana labels, with dot-notation: axon.message.id, axon.message.name, axon.metadata.gameId, axon.metadata.playerId, etc.
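The naming difference between the two UIs is mechanical: the dotted attribute key seen in Jaeger appears in Kibana with dots replaced by underscores and a `labels.` prefix. A small sketch of that mapping follows; `to_kibana_label` and `kql_filter` are hypothetical helpers, and the rule is taken from the string-valued attributes shown in this README only.

```shell
#!/bin/sh
# Hypothetical helper: turn an OTel attribute key as shown in Jaeger into the
# Kibana field name used in this README (dots become underscores, plus prefix).
to_kibana_label() {
  printf 'labels.%s\n' "$(printf '%s' "$1" | tr '.' '_')"
}

# Hypothetical helper: build the KQL equality filter for one attribute.
kql_filter() {
  printf '%s : "%s"\n' "$(to_kibana_label "$1")" "$2"
}

kql_filter axon.metadata.gameId scenario-1
# prints: labels.axon_metadata_gameId : "scenario-1"
```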

Trade-offs vs Kibana APM:

  • Lighter — one container, instant startup, no Elasticsearch index management
  • Simpler — direct trace search by service / operation / tag / duration
  • No service map — Kibana shows topology between services; Jaeger v2 OSS doesn't
  • No KQL — Jaeger uses a simpler tag-equality search (tag: axon.metadata.gameId=scenario-1) rather than full KQL
  • No persistence by default — all-in-one stores traces in memory; restart loses them
  • No metrics/logs correlation — Kibana correlates traces with the rest of the Elastic stack

Useful filters

Kibana (KQL)

Paste into the Kibana search bar (KQL) at the top of any APM page:

| Goal | KQL |
|---|---|
| Traces for one game | labels.axon_metadata_gameId : "scenario-1" |
| Only command handlers | transaction.name : "CommandBus.handleCommand*" |
| Only one aggregate's decisions | span.name : "Dwelling.decide(*)" |
| Only automation reactions | span.name : "*Processor.react(*)" |
| Only event publications | span.name : "EventBus.publishEvent(*)" |

Jaeger (Tags field)

In the Jaeger UI search form, the Tags field accepts space-separated key=value pairs:

| Goal | Tag query |
|---|---|
| Traces for one game | axon.metadata.gameId=scenario-1 |
| One specific player's traces | axon.metadata.playerId=player-1 |
| Combine | axon.metadata.gameId=scenario-1 axon.metadata.playerId=player-1 |

(Operation-level filtering — e.g. "only Dwelling.decide spans" — is done via the Operation dropdown, not the Tags field.)
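The same tag search can be scripted against the HTTP endpoint that the Jaeger UI itself calls. Treat this as a sketch under assumptions: `tags_json` is a hypothetical helper, and `/api/traces` is an internal Jaeger endpoint with no stability guarantee that takes its tags as a JSON object rather than as `key=value` pairs.

```shell
#!/bin/sh
# Hypothetical helper: convert key=value pairs (the Tags field syntax) into the
# JSON object passed as the "tags" query parameter.
# Limitation: keys and values are not escaped, so they must not contain
# double quotes or backslashes.
tags_json() {
  out=""
  for pair in "$@"; do
    key=${pair%%=*}      # text before the first "="
    value=${pair#*=}     # text after the first "="
    out="$out${out:+,}\"$key\":\"$value\""
  done
  printf '{%s}\n' "$out"
}

# Usage (uncomment to run against the Jaeger overlay):
# curl -sS -G http://localhost:16686/api/traces \
#   --data-urlencode 'service=heroesofddd' \
#   --data-urlencode "tags=$(tags_json axon.metadata.gameId=scenario-1)"
```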

Why Axon splits work across multiple traces

You'll notice that an HTTP request often produces two or three separate trace trees rather than one giant tree. That's expected. Axon hops over async boundaries that don't preserve OpenTelemetry context: the gRPC call to Axon Server (server-side handleDistributedCommand starts a new root) and the asynchronous event processors (each process(Event) is its own root). Inside one boundary, however, the tree is complete — as the waterfall above shows. Behavior is identical in both Kibana APM and Jaeger.

To stitch sessions together end-to-end, filter on labels.axon_metadata_gameId (Kibana) or axon.metadata.gameId (Jaeger), as described under Useful filters.