📊 Observability — distributed tracing

The app can emit distributed traces to either Elastic APM or Jaeger via OpenTelemetry. Tracing is off by default and activated by one of two Spring profiles, each paired with its own Docker Compose overlay.

👉 Back to the main README.

Backends — pick one
Stack components
Run with Elastic APM
Run with Jaeger
Run without tracing (default)
Ports & URLs
Exploring traces in Kibana — guided tour
Same traces in Jaeger
Useful filters
Why Axon splits work across multiple traces

Backends — pick one

| Backend | Spring profile | Compose overlay | Footprint | UI | When to use |
|---|---|---|---|---|---|
| Elastic APM | `observability-elastic` | `docker-compose.observability-elastic.yaml` | ~2 GB RAM | Kibana APM (rich, service map, KQL) | Production-realistic UX, queries, dashboards |
| Jaeger | `observability-jaeger` | `docker-compose.observability-jaeger.yaml` | ~150 MB RAM | Jaeger UI (simple waterfall) | Quick spin-up, demos, CI, constrained machines |

The two profiles are alternatives — pick the backend you want for a given session. The base observability profile (sampling, OTLP/HTTP transport, Axon OpenTelemetrySpanFactory wiring) is inherited by both via Spring's spring.profiles.group, so switching backends is a one-line change.

Stack components

| Component | Role |
|---|---|
| `axon-tracing-opentelemetry` | Instruments command/event/query handlers, aggregates, repositories, event store |
| `micrometer-tracing-bridge-otel` | Spring Boot's official bridge from Micrometer Observation to OpenTelemetry |
| `opentelemetry-exporter-otlp` | Pushes traces over OTLP/HTTP to the chosen backend |
| Elastic APM 9.x | Receives OTLP, stores in Elasticsearch, visualizes in Kibana |
| Jaeger 2.x | Receives OTLP directly, in-memory storage, lightweight UI |

Activation surface:

- Base profile `observability`: common tracing config (sampling, transport, Axon span factory bean). Never activated directly.
- Profile `observability-elastic`: base + Elastic APM endpoint.
- Profile `observability-jaeger`: base + Jaeger endpoint.
- Profile groups in `application.yaml` make the two child profiles automatically include the base.

Run with Elastic APM

1. Start the base stack and the Elastic observability stack:
   ```
   docker compose -f docker-compose.yaml -f docker-compose.observability-elastic.yaml up -d
   ```
2. Wait about 60 seconds for Elasticsearch and Kibana to be ready.
3. Run the app with the `observability-elastic` profile:
   ```
   SPRING_PROFILES_ACTIVE=observability-elastic ./mvnw spring-boot:run
   ```
   Or: `./mvnw spring-boot:run -Dspring-boot.run.profiles=observability-elastic`
4. Generate some traffic via Swagger UI at http://localhost:3773/swagger-ui/index.html.
5. Open Kibana APM and follow the guided tour below.
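Instead of waiting a fixed 60 seconds, the readiness check can be scripted. The sketch below is one possible approach, not part of the project: `wait_for_http` is a hypothetical helper name, and it assumes `curl` is installed and that the polled URLs answer unauthenticated requests with a 2xx status (an Elasticsearch instance with security enabled would need credentials added).

```shell
# Poll a URL until it answers with a successful HTTP status, or give up.
# Usage: wait_for_http <url> [max_attempts] [delay_seconds]
wait_for_http() {
  url=$1
  max_attempts=${2:-30}
  delay=${3:-2}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors (status >= 400)
    if curl -fsS -o /dev/null "$url"; then
      echo "ready: $url (attempt $attempt)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "timed out: $url" >&2
  return 1
}

# Example (assumed endpoints, taken from the Ports & URLs table):
#   wait_for_http http://localhost:9200 && wait_for_http http://localhost:5601/api/status
```

With the defaults this polls for roughly a minute (30 attempts, 2 seconds apart), which matches the startup time mentioned in step 2.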

Run with Jaeger

1. Start the base stack and Jaeger:
   ```
   docker compose -f docker-compose.yaml -f docker-compose.observability-jaeger.yaml up -d
   ```
   Jaeger v2 starts in seconds, so the UI is ready almost immediately.
2. Run the app with the `observability-jaeger` profile:
   ```
   SPRING_PROFILES_ACTIVE=observability-jaeger ./mvnw spring-boot:run
   ```
   Or: `./mvnw spring-boot:run -Dspring-boot.run.profiles=observability-jaeger`
3. Generate some traffic via Swagger UI at http://localhost:3773/swagger-ui/index.html.
4. Open Jaeger UI at http://localhost:16686 and see Same traces in Jaeger below.

Run without tracing (default)

docker compose up -d
./mvnw spring-boot:run

No observability-* profile = no traces emitted, no extra containers needed. Useful for daily development.
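The three run modes differ only in the Compose overlay and the Spring profile, so the choice can be reduced to a single argument. The following is a minimal sketch under that assumption; `tracing_commands` is a hypothetical helper, not something shipped with the project, and it only prints the commands documented above rather than running them.

```shell
# Map a backend name to the commands documented above.
# Prints two lines: the docker compose command and the Maven command.
# Usage: tracing_commands <elastic|jaeger|none>
tracing_commands() {
  backend=${1:-none}
  case "$backend" in
    elastic|jaeger)
      # Profile and overlay file share the same "observability-<backend>" name
      profile="observability-$backend"
      echo "docker compose -f docker-compose.yaml -f docker-compose.$profile.yaml up -d"
      echo "SPRING_PROFILES_ACTIVE=$profile ./mvnw spring-boot:run"
      ;;
    none)
      echo "docker compose up -d"
      echo "./mvnw spring-boot:run"
      ;;
    *)
      echo "unknown backend: $backend (expected elastic, jaeger or none)" >&2
      return 1
      ;;
  esac
}
```

For example, `tracing_commands jaeger` prints the two commands from the Run with Jaeger section, and `tracing_commands` with no argument prints the default, tracing-free pair.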

Ports & URLs

| Service | Port | URL | Used by profile |
|---|---|---|---|
| Kibana (APM UI) | 5601 | http://localhost:5601 | `observability-elastic` |
| Kibana → APM → Services | | http://localhost:5601/app/apm/services | `observability-elastic` |
| Kibana → APM → Traces | | http://localhost:5601/app/apm/traces | `observability-elastic` |
| Elasticsearch | 9200 | http://localhost:9200 | `observability-elastic` |
| Elastic APM Server (OTLP receiver) | 8200 | http://localhost:8200/v1/traces | `observability-elastic` |
| Jaeger UI | 16686 | http://localhost:16686 | `observability-jaeger` |
| Jaeger OTLP receiver (HTTP) | 4318 | http://localhost:4318/v1/traces | `observability-jaeger` |
| Jaeger OTLP receiver (gRPC) | 4317 | grpc://localhost:4317 | `observability-jaeger` |
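To check that an OTLP/HTTP receiver from the table is reachable before starting the app, one option is to post a single hand-built span to it. This is a rough sketch, not a project script: the function names and the `otlp-smoke-test` service name are made up for illustration, and it assumes `curl`, `od` and `/dev/urandom` are available and that the receiver accepts unauthenticated OTLP in JSON encoding (an APM Server configured with a secret token would reject it).

```shell
# Build a minimal OTLP/JSON payload containing one span.
# Usage: otlp_test_payload [service-name]
otlp_test_payload() {
  service=${1:-otlp-smoke-test}
  # OTLP/JSON encodes IDs as hex strings: 16 random bytes for the trace, 8 for the span
  trace_id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
  span_id=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
  now=$(date +%s)
  # Timestamps are nanoseconds since the epoch, sent as strings; the span lasts 0.5 s
  start="${now}000000000"
  end="${now}500000000"
  printf '{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"%s"}}]},"scopeSpans":[{"spans":[{"traceId":"%s","spanId":"%s","name":"smoke-test","kind":1,"startTimeUnixNano":"%s","endTimeUnixNano":"%s"}]}]}]}\n' \
    "$service" "$trace_id" "$span_id" "$start" "$end"
}

# Post the payload and print only the HTTP status code.
# Usage: send_test_span <otlp-traces-url> [service-name]
send_test_span() {
  otlp_test_payload "${2:-otlp-smoke-test}" |
    curl -sS -o /dev/null -w '%{http_code}\n' \
      -H 'Content-Type: application/json' --data-binary @- "$1"
}
```

Running `send_test_span http://localhost:4318/v1/traces` against Jaeger (or the port 8200 URL for Elastic APM) should print a 2xx status if the receiver accepted the span, after which the test service should be visible in the corresponding UI.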

Exploring traces in Kibana — guided tour

A quick walkthrough of what you can see and where to click. Examples below were captured after running the full chain BuildDwelling → IncreaseAvailableCreatures → RecruitCreature → GetDwellingById through Swagger UI with the observability-elastic profile active.

1. Service inventory

Kibana → ☰ → Observability → APM → Services. The landing page lists every service emitting traces. After firing a few requests you should see heroesofddd with its average latency, throughput, and error rate.

2. Transactions

Click heroesofddd → Transactions tab. Each row is a distinct "entry point" — both HTTP endpoints (auto-instrumented by Spring Web) and Axon's async boundaries (each command/event/query handler is a top-level transaction because Axon hops across the gRPC bus and async event processors).

3. Traces — the full list

Kibana → APM → Traces shows every individual trace tree.

For this project you'll see roots like:

http put /games/{gameId}/dwellings/{dwellingId} — HTTP root
CommandBus.handleDistributedCommand(RecruitCreature) — command handling on the aggregate side after the gRPC hop to Axon Server
StreamingEventProcessor.process(CreatureRecruited) — projector / automation
CommandBus.handleCommand(AddCreatureToArmy) — command emitted by the WhenCreatureRecruitedThenAddToArmy automation
QueryBus.processQueryMessage(GetDwellingById) — query side

4. Trace waterfall — Axon internals visible

Click a command trace, for example CommandBus.handleDistributedCommand(RecruitCreature), and Kibana renders a waterfall of the full call path of the Axon command handler, including aggregate loading and event publication.

What you're looking at:

✓ CommandBus.handleDistributedCommand(RecruitCreature)     22 ms     ← gRPC server side
  └─ CommandBus.dispatchCommand(RecruitCreature)           22 ms
     └─ CommandBus.handleCommand(RecruitCreature)          22 ms
        ├─ Repository.load                                 10 ms     ← event sourcing
        │  ├─ Repository.obtainLock                        41 μs
        │  └─ Repository.initializeState(Dwelling)          1.0 ms   ← rehydrate aggregate
        ├─ Dwelling.decide(RecruitCreature)                 3.2 ms   ← AGGREGATE business logic
        ├─ EventBus.publishEvent(CreatureRecruited)        29 μs
        └─ EventBus.commitEvents                            5.8 ms   ← persist to event store

This is exactly the layered shape from Event Sourcing theory — repository → aggregate → event publication — rendered as data, not as a diagram in a slide.

5. Span attributes — gameId, playerId, message metadata

Click any Axon span (e.g. CommandBus.handleCommand(RecruitCreature)) → Metadata tab. The flyout shows OpenTelemetry attributes — including correlation data injected by GameConfiguration.gameDataProvider:

labels.axon_metadata_gameId    = scenario-1                  ← from gameDataProvider
labels.axon_metadata_playerId  = player-1                    ← from gameDataProvider
labels.axon_message_id         = 2201ae5d-3871-45b2-a661-...
labels.axon_message_name       = com.dddheroes…RecruitCreature
labels.axon_message_type       = GrpcBackedCommandMessage
labels.axon_payload_type       = com.dddheroes…RecruitCreature

This is the practical payoff: filtering traces by labels.axon_metadata_gameId : "scenario-1" in the Kibana search bar isolates every span — across every aggregate, processor and projector — that participated in one game session.

Same traces in Jaeger

With observability-jaeger active, the same trace data is produced — Jaeger just renders it differently:

Open http://localhost:16686.
Service dropdown → pick heroesofddd.
Operation dropdown → e.g. CommandBus.handleCommand(RecruitCreature).
Click Find Traces → see the same waterfall tree (Repository.load, Dwelling.decide, EventBus.publishEvent, …).
Click any span → "Tags" panel shows the same OTel attributes as Kibana labels, with dot-notation: axon.message.id, axon.message.name, axon.metadata.gameId, axon.metadata.playerId, etc.

Trade-offs vs Kibana APM:

✅ Lighter — one container, instant startup, no Elasticsearch index management
✅ Simpler — direct trace search by service / operation / tag / duration
❌ No service map — Kibana shows topology between services; Jaeger v2 OSS doesn't
❌ No KQL — Jaeger uses a simpler tag-equality search (tag: axon.metadata.gameId=scenario-1) rather than full KQL
❌ No persistence by default — all-in-one stores traces in memory; restart loses them
❌ No metrics/logs correlation — Kibana correlates traces with the rest of the Elastic stack

Useful filters

Kibana (KQL)

Paste into the Kibana search bar (KQL) at the top of any APM page:

| Goal | KQL |
|---|---|
| Traces for one game | `labels.axon_metadata_gameId : "scenario-1"` |
| Only command handlers | `transaction.name : "CommandBus.handleCommand*"` |
| Only one aggregate's decisions | `span.name : "Dwelling.decide(*)"` |
| Only automation reactions | `span.name : "Processor.react()"` |
| Only event publications | `span.name : "EventBus.publishEvent(*)"` |

Jaeger (Tags field)

In the Jaeger UI search form, the Tags field accepts space-separated key=value pairs:

| Goal | Tag query |
|---|---|
| Traces for one game | `axon.metadata.gameId=scenario-1` |
| One specific player's traces | `axon.metadata.playerId=player-1` |
| Combine | `axon.metadata.gameId=scenario-1 axon.metadata.playerId=player-1` |

(Operation-level filtering — e.g. "only Dwelling.decide spans" — is done via the Operation dropdown, not the Tags field.)
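The same tag search can also be run from a terminal against the HTTP endpoint that the Jaeger UI itself calls. Treat the following as a sketch under assumptions: `/api/traces` is the UI's internal query API rather than a documented stable contract, both function names are hypothetical, and the JSON builder assumes keys and values contain no double quotes or backslashes.

```shell
# Turn key=value arguments into the JSON object expected by the "tags" parameter.
# Assumes keys and values contain no double quotes or backslashes.
# Usage: jaeger_tags_json key=value [key=value ...]
jaeger_tags_json() {
  json=""
  for pair in "$@"; do
    key=${pair%%=*}     # text before the first "="
    value=${pair#*=}    # text after the first "="
    json="$json${json:+,}\"$key\":\"$value\""
  done
  printf '{%s}\n' "$json"
}

# Query traces of one service filtered by tags; prints the raw JSON response.
# Usage: jaeger_find_traces <service> key=value [key=value ...]
jaeger_find_traces() {
  service=$1
  shift
  # -G sends the url-encoded fields as a query string on a GET request
  curl -sS -G "http://localhost:16686/api/traces" \
    --data-urlencode "service=$service" \
    --data-urlencode "limit=20" \
    --data-urlencode "tags=$(jaeger_tags_json "$@")"
}
```

For instance, `jaeger_find_traces heroesofddd axon.metadata.gameId=scenario-1` mirrors the first row of the table; if `jq` is installed, piping the output through `jq '.data | length'` gives the number of matching traces.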

Why Axon splits work across multiple traces

You'll notice that an HTTP request often produces two or three separate trace trees rather than one giant tree. That's expected. Axon hops over async boundaries that don't preserve OpenTelemetry context: the gRPC call to Axon Server (server-side handleDistributedCommand starts a new root) and the asynchronous event processors (each process(Event) is its own root). Inside one boundary, however, the tree is complete — as the waterfall above shows. Behavior is identical in both Kibana APM and Jaeger.

To stitch sessions together end-to-end, use the axon_metadata_gameId (Kibana) / axon.metadata.gameId (Jaeger) tag filter described above.
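The two filter spellings follow a simple pattern in this document: Jaeger shows the OpenTelemetry attribute name as is, while Kibana shows it under `labels.` with dots replaced by underscores. A small helper can derive both from one attribute name. This is an illustrative sketch based only on the names shown above; `kibana_label` and `session_filters` are hypothetical, and attributes that Elastic maps to dedicated fields would not follow this rule.

```shell
# Convert an OpenTelemetry attribute name to the label field shown in Kibana:
# dots become underscores and the result is prefixed with "labels.".
kibana_label() {
  printf 'labels.%s\n' "$(printf '%s' "$1" | tr '.' '_')"
}

# Print the equivalent filter for both UIs.
# Usage: session_filters <attribute> <value>
session_filters() {
  printf 'Kibana (KQL):  %s : "%s"\n' "$(kibana_label "$1")" "$2"
  printf 'Jaeger (Tags): %s=%s\n' "$1" "$2"
}
```

Calling `session_filters axon.metadata.gameId scenario-1` prints the KQL expression for the Kibana search bar and the key=value pair for the Jaeger Tags field, which together cover every trace tree of one game session regardless of where Axon started a new root.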
