|
| 1 | +# Architecture of dd-trace-java |
| 2 | + |
| 3 | +High-level architecture of the Datadog Java APM agent. |
| 4 | +Start here to orient yourself in the codebase. |
| 5 | + |
| 6 | +## Bird's Eye View |
| 7 | + |
| 8 | +dd-trace-java is a Java agent that auto-instruments JVM applications at runtime via bytecode manipulation. |
| 9 | +It is loaded at JVM startup via the `-javaagent` flag, intercepts class loading, and rewrites method bytecode |
| 10 | +to inject tracing, security, profiling, and observability logic. No application code changes required. |
| 11 | + |
| 12 | +Ships ~120 integrations (~200 instrumentations) covering major frameworks |
| 13 | +(Spring, Servlet, gRPC, JDBC, Kafka, etc.) and supports multiple Datadog products through a single jar: |
| 14 | +**Tracing**, **Profiling**, **Application Security (AppSec)**, **IAST**, **CI Visibility**, |
| 15 | +**Dynamic Instrumentation**, **LLM Observability**, **Crash Tracking**, **Data Streams**, |
| 16 | +**Feature Flagging**, and **USM**. |
| 17 | + |
| 18 | +Communicates with a local Datadog Agent process (or directly with the Datadog intake APIs) |
| 19 | +to send collected telemetry. |
| 20 | + |
| 21 | +## Startup Sequence |
| 22 | + |
| 23 | +1. **`AgentBootstrap.premain()`** — JVM entry point. Runs on the application classloader |
| 24 | + with minimal logic: locates the agent jar, creates an isolated classloader, jumps to `Agent.start()`. |
| 25 | + Must remain tiny and side-effect-free. |
| 26 | + |
| 27 | +2. **`Agent.start()`** — Runs on the bootstrap classloader. Creates the agent classloader, |
| 28 | + reads configuration, determines which products are enabled, starts each subsystem on dedicated threads. |
| 29 | + |
| 30 | +3. **`AgentInstaller`** — Installs the ByteBuddy `ClassFileTransformer` that intercepts all class loading. |
| 31 | + Discovers all `InstrumenterModule` implementations via service loading, registers their |
| 32 | + type matchers and advice classes. |
| 33 | + |
| 34 | +4. **Product subsystems start** — Each enabled product is started via its own `*System.start()` method, |
| 35 | + receiving shared communication objects. |
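
The `-javaagent` entry point in steps 1 and 3 builds on the standard `java.lang.instrument` contract. The following is a minimal sketch of that contract only, not the project's code: `MiniAgent` and `CountingTransformer` are hypothetical names, and the real `AgentBootstrap` hands off to `Agent.start()` rather than registering a transformer itself.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical agent class; a real agent jar names it in the Premain-Class manifest attribute.
final class MiniAgent {
  static final AtomicInteger SEEN = new AtomicInteger();

  // Sees every class the JVM loads after registration.
  static final class CountingTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(
        ClassLoader loader,
        String className,
        Class<?> classBeingRedefined,
        ProtectionDomain protectionDomain,
        byte[] classfileBuffer) {
      SEEN.incrementAndGet();
      return null; // null tells the JVM to keep the original bytecode
    }
  }

  // Called by the JVM before the application's main method.
  public static void premain(String agentArgs, Instrumentation inst) {
    inst.addTransformer(new CountingTransformer());
  }

  // Exercises the transformer directly, without a real agent attachment.
  public static void main(String[] args) {
    byte[] result =
        new CountingTransformer().transform(null, "com/example/Foo", null, null, new byte[0]);
    System.out.println("rewritten=" + (result != null) + ", seen=" + SEEN.get());
  }
}
```

Packaged in a jar whose manifest declares `Premain-Class: MiniAgent`, a class like this would be started with `java -javaagent:<path-to-jar> ...`.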
| 36 | + |
| 37 | +## Codemap |
| 38 | + |
| 39 | +### `dd-java-agent/` |
| 40 | + |
| 41 | +Main agent module. Produces the final shadow jar (`dd-java-agent.jar`) using a composite shadow jar |
| 42 | +strategy. Each product module builds its own shadow jar, embedded as a nested directory inside the |
| 43 | +main jar (`inst/`, `profiling/`, `appsec/`, `iast/`, `debugger/`, `ci-visibility/`, `llm-obs/`, |
| 44 | +`shared/`, `trace/`, etc.). A dedicated `sharedShadowJar` bundles common transitive dependencies |
| 45 | +(OkHttp, JCTools, LZ4, etc.) to avoid duplication across feature jars. All dependencies are relocated |
| 46 | +under `datadog.` prefixes to prevent classpath conflicts. Class files inside feature jars are renamed |
| 47 | +to `.classdata` to prevent unintended loading. See [`docs/how_to_work_with_gradle.md`](docs/how_to_work_with_gradle.md). |
| 48 | + |
| 49 | +- **`src/`** — `AgentBootstrap` and `AgentJar`, the entry point loaded by `-javaagent`. |
| 50 | + Deliberately minimal. |
| 51 | + |
| 52 | +- **`agent-bootstrap/`** — Classes on the bootstrap classloader: `Agent` (startup orchestrator), |
| 53 | + decorator base classes (`HttpServerDecorator`, `DatabaseClientDecorator`, etc.), and bootstrap-safe |
| 54 | + utilities. Visible to all classloaders, so instrumentation advice and helpers can use them directly. |
| 55 | + See [`docs/bootstrap_design_guidelines.md`](docs/bootstrap_design_guidelines.md). |
| 56 | + |
| 57 | +- **`agent-builder/`** — ByteBuddy integration layer. Class transformer pipeline: |
| 58 | + `DDClassFileTransformer` intercepts every class load, `GlobalIgnoresMatcher` applies early |
| 59 | + filtering, `CombiningMatcher` evaluates instrumentation matchers, `SplittingTransformer` |
| 60 | + applies matched transformations. `ignored_class_name.trie` is compiled into a trie at |
| 61 | + build time and short-circuits matcher evaluation for classes known to be non-transformable (JVM |
| 62 | + internals, agent infrastructure, monitoring libraries, large framework packages). When a class |
| 63 | + is unexpectedly not instrumented, check the trie first. |
| 64 | + |
| 65 | +- **`agent-tooling/`** — Instrumentation framework. Key types: |
| 66 | + - `InstrumenterModule` — Base class for all instrumentation modules. Declares a target system |
| 67 | + (Tracing, AppSec, IAST, Profiling, CiVisibility, USM, etc.) and one or more instrumentations. |
| 68 | + - `Instrumenter` — Type matching interface: `ForSingleType`, `ForKnownTypes`, |
| 69 | + `ForTypeHierarchy`, `ForBootstrap`. |
| 70 | + - `muzzle/` — Build-time and runtime safety checks. Verifies that expected types and methods |
| 71 | + exist in the library version at runtime. If not, the instrumentation is silently skipped. |
| 72 | + See [`docs/how_instrumentations_work.md`](docs/how_instrumentations_work.md) and [`docs/add_new_instrumentation.md`](docs/add_new_instrumentation.md). |
| 73 | + |
| 74 | +- **`instrumentation/`** — All auto-instrumentations, organized as `{framework}/{framework}-{minVersion}/`. |
| 75 | + Nearly 200 framework directories. Each follows the same pattern: an `InstrumenterModule` declares the |
| 76 | + target system and integration name, one or more `Instrumenter` implementations select target types |
| 77 | + via matchers, advice classes inject bytecode via `@Advice.OnMethodEnter`/`@Advice.OnMethodExit`, |
| 78 | + and decorator/helper classes contain the actual product logic. Instrumentations are discovered |
| 79 | + via `@AutoService(InstrumenterModule.class)` (Java SPI) and validated by Muzzle at build time. |
| 80 | + See [`docs/how_instrumentations_work.md`](docs/how_instrumentations_work.md) and [`docs/add_new_instrumentation.md`](docs/add_new_instrumentation.md) for details. |
| 81 | + |
| 82 | +- **`appsec/`** — Application Security. Entry point: `AppSecSystem.start()`. Runs the Datadog WAF |
| 83 | + to detect and block attacks in real-time. Hooks into the gateway to intercept HTTP requests. |
| 84 | + |
| 85 | +- **`agent-iast/`** — Interactive Application Security Testing. Entry point: `IastSystem.start()`. |
| 86 | + Performs taint tracking: marks user input as tainted, propagates taint through string operations, |
| 87 | + and reports when tainted data reaches dangerous sinks (SQL injection, XSS, command injection, etc.). |
| 88 | + |
| 89 | +- **`agent-ci-visibility/`** — CI Visibility. Entry point: `CiVisibilitySystem.start()`. |
| 90 | + Instruments test frameworks (JUnit, TestNG, Gradle, Maven, Cucumber) to collect test results, |
| 91 | + code coverage, and performance metrics. |
| 92 | + |
| 93 | +- **`agent-profiling/`** — Continuous Profiling. Entry point: `ProfilingAgent`. |
| 94 | + Collects CPU, memory, and wall-clock profiles using JFR or the Datadog native profiler (`ddprof`). |
| 95 | + Uploads profiles to the Datadog backend. |
| 96 | + |
| 97 | +- **`agent-debugger/`** — Dynamic Instrumentation. Entry point: `DebuggerAgent`. |
| 98 | + Probes, snapshot capture, exception replay, code origin mapping. |
| 99 | + Driven by remote configuration. |
| 100 | + |
| 101 | +- **`agent-llmobs/`** — LLM Observability. Entry point: `LLMObsSystem.start()`. |
| 102 | + Monitors LLM API calls (OpenAI, LangChain, etc.): token usage, model inference, evaluations. |
| 103 | + |
| 104 | +- **`agent-crashtracking/`** — Crash Tracking. Detects JVM crashes and fatal exceptions, |
| 105 | + collects system metadata, and uploads crash reports to Datadog's error tracking intake. |
| 106 | + |
| 107 | +- **`agent-otel/`** — OpenTelemetry compatibility shim. `OtelTracerProvider`, `OtelSpan`, |
| 108 | + `OtelContext` and other wrappers implement the OTel API by delegating to the Datadog tracer. |
| 109 | + Paired with instrumentations in `instrumentation/opentelemetry/` that intercept OTel API calls |
| 110 | + and redirect them to shim instances. |
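
The early-ignore step described under `agent-builder/` can be pictured as a prefix lookup that runs before any matcher. This is a sketch under assumptions: `IgnoreTrie` is a hypothetical class, and the real `ignored_class_name.trie` format and matching rules are not described here and may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical prefix trie: a class is ignored if any stored prefix starts its name.
final class IgnoreTrie {
  private static final class Node {
    final Map<Character, Node> children = new HashMap<>();
    boolean terminal; // true if a stored prefix ends at this node
  }

  private final Node root = new Node();

  void addPrefix(String prefix) {
    Node node = root;
    for (int i = 0; i < prefix.length(); i++) {
      node = node.children.computeIfAbsent(prefix.charAt(i), c -> new Node());
    }
    node.terminal = true;
  }

  // Walks the name one character at a time and stops at the first stored prefix.
  boolean isIgnored(String className) {
    Node node = root;
    for (int i = 0; i < className.length(); i++) {
      if (node.terminal) {
        return true;
      }
      node = node.children.get(className.charAt(i));
      if (node == null) {
        return false;
      }
    }
    return node.terminal;
  }

  public static void main(String[] args) {
    IgnoreTrie trie = new IgnoreTrie();
    trie.addPrefix("java.");
    System.out.println(trie.isIgnored("java.lang.String"));
    System.out.println(trie.isIgnored("com.example.App"));
  }
}
```

The lookup cost depends only on the length of the class name, which is why a check like this is cheap enough to run on every class load.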
| 111 | + |
| 112 | +### `dd-trace-core/` |
| 113 | + |
| 114 | +Core tracing engine. Grew organically and now also hosts product-specific features that depend on |
| 115 | +tight integration with span creation, interception, or serialization. New code should go in |
| 116 | +`products/` or `components/` instead. Core tracing types: |
| 117 | + |
| 118 | +- `CoreTracer` — Tracer implementation. Creates spans, manages sampling, drives the writer pipeline. |
| 119 | + Implements `AgentTracer.TracerAPI`. |
| 120 | +- `DDSpan` / `DDSpanContext` — Concrete span and context implementations with Datadog-specific metadata. |
| 121 | +- `PendingTrace` — Collects all spans in a trace. Flushes to the writer when the root span finishes. |
| 122 | +- `scopemanager/` — `ContinuableScopeManager`, `ContinuableScope`, `ScopeContinuation`. Active span |
| 123 | + per thread, async context propagation via continuations. |
| 124 | +- `propagation/` — Trace context propagation codecs: Datadog, W3C TraceContext, B3, Haystack, X-Ray. |
| 125 | +- `common/writer/` — Writer pipeline. `DDAgentWriter` buffers traces and dispatches via |
| 126 | + `PayloadDispatcherImpl` to the Datadog Agent's `/v0.4/traces` endpoint. `DDIntakeWriter` for |
| 127 | + direct API submission. `TraceProcessingWorker` for async processing. |
| 128 | +- `common/sampling/` — Sampling logic: `RuleBasedTraceSampler`, `RateByServiceTraceSampler`, |
| 129 | + `SingleSpanSampler`. Supports both head-based and rule-based sampling. |
| 130 | +- `tagprocessor/` — Post-processing of span tags: peer service calculation, base service naming, |
| 131 | + query obfuscation, endpoint resolution. |
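
Of the codecs listed under `propagation/`, W3C TraceContext has the simplest wire format: a `traceparent` header of the form `version-traceid-parentid-flags`. A minimal sketch of extracting it, assuming only version `00`; `TraceParent` is a hypothetical class and not the project's codec.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the W3C "traceparent" header, version 00 only.
final class TraceParent {
  // 32 lowercase hex chars of trace id, 16 of parent id, 2 of flags.
  private static final Pattern FORMAT =
      Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

  final String traceId;
  final String parentId;
  final boolean sampled;

  private TraceParent(String traceId, String parentId, boolean sampled) {
    this.traceId = traceId;
    this.parentId = parentId;
    this.sampled = sampled;
  }

  static Optional<TraceParent> parse(String header) {
    if (header == null) {
      return Optional.empty();
    }
    Matcher m = FORMAT.matcher(header.trim());
    if (!m.matches()) {
      return Optional.empty();
    }
    String traceId = m.group(1);
    String parentId = m.group(2);
    // The specification treats all-zero ids as invalid.
    if (isAllZero(traceId) || isAllZero(parentId)) {
      return Optional.empty();
    }
    int flags = Integer.parseInt(m.group(3), 16);
    return Optional.of(new TraceParent(traceId, parentId, (flags & 0x01) != 0));
  }

  private static boolean isAllZero(String hex) {
    return hex.chars().allMatch(c -> c == '0');
  }

  public static void main(String[] args) {
    Optional<TraceParent> parsed =
        parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
    System.out.println(parsed.map(tp -> tp.traceId + " sampled=" + tp.sampled).orElse("invalid"));
  }
}
```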
| 132 | + |
| 133 | +Non-tracing code that also lives here due to organic growth: |
| 134 | + |
| 135 | +- `datastreams/` — Data Streams Monitoring. Tracks message pipeline latency across Kafka, RabbitMQ, SQS, etc. |
| 136 | +- `civisibility/` — CI Visibility trace interceptors and protocol adapters. Hooks into the trace |
| 137 | + completion pipeline to filter and reformat test spans for the CI Test Cycle intake. |
| 138 | +- `lambda/` — AWS Lambda support. Coordinates span creation with the serverless extension, |
| 139 | + handling invocation start/end and trace context propagation. |
| 140 | +- `llmobs/` — LLM Observability span mapper. Serializes LLM-specific spans (messages, tool calls) |
| 141 | + to the dedicated LLM Obs intake format. |
| 142 | + |
| 143 | +### `dd-trace-api/` |
| 144 | + |
| 145 | +Public API. Types application developers may use directly: `Tracer`, `GlobalTracer`, `DDTags`, |
| 146 | +`DDSpanTypes`, `Trace` (annotation), `ConfigDefaults`. Also houses all configuration key constants |
| 147 | +by domain: `TracerConfig`, `GeneralConfig`, `AppSecConfig`, `ProfilingConfig`, `CiVisibilityConfig`, |
| 148 | +`IastConfig`, `DebuggerConfig`, etc. |
| 149 | + |
| 150 | +### `internal-api/` |
| 151 | + |
| 152 | +Internal shared API across all agent modules (not public). Like `dd-trace-core`, grew organically |
| 153 | +and now hosts interfaces for many products beyond tracing. New product APIs should go in |
| 154 | +`products/` or `components/`. |
| 155 | + |
| 156 | +Core tracing abstractions: |
| 157 | + |
| 158 | +- `AgentTracer` — Static tracer facade. Instrumentations call `AgentTracer.startSpan()`, |
| 159 | + `AgentTracer.activateSpan()`, etc. |
| 160 | +- `AgentSpan` / `AgentScope` / `AgentSpanContext` — Internal span/scope/context interfaces. |
| 161 | +- `AgentPropagation` — Context propagation interfaces (`Getter`, `Setter`) that instrumentations |
| 162 | + implement to inject/extract trace context from framework-specific carriers (HTTP headers, message |
| 163 | + properties, etc.). |
| 164 | +- `Config` / `InstrumenterConfig` — Master configuration class and instrumenter-specific config, |
| 165 | + centralizing settings for all products. `InstrumenterConfig` is kept separate from `Config` because of |
| 166 | + GraalVM native-image constraints. In native-image builds, all bytecode instrumentation is |
| 167 | + applied at build time (ahead-of-time compilation), so any setting that controls instrumentation |
| 168 | + decisions (which classes to instrument, which integrations to enable, resolver behavior, field |
| 169 | + injection flags) must be frozen into the native image binary. Runtime-only settings (agent |
| 170 | + endpoints, service names, sampling rates) remain in `Config`. |
| 171 | + See [`docs/add_new_configurations.md`](docs/add_new_configurations.md). |
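
The `Getter`/`Setter` idea behind `AgentPropagation` is that the propagation code never knows the carrier type; the instrumentation supplies small adapters. A rough sketch, assuming hypothetical `CarrierSetter`/`CarrierGetter` interfaces (the real signatures may differ) and using the Datadog header names `x-datadog-trace-id` and `x-datadog-parent-id`:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical carrier adapters; the names are illustrative, not the project's API.
final class PropagationSketch {
  interface CarrierSetter<C> {
    void set(C carrier, String key, String value);
  }

  interface CarrierGetter<C> {
    String get(C carrier, String key);
  }

  static final String TRACE_ID = "x-datadog-trace-id";
  static final String PARENT_ID = "x-datadog-parent-id";

  // Writes the ids as unsigned decimal strings into whatever carrier the caller uses.
  static <C> void inject(long traceId, long spanId, C carrier, CarrierSetter<C> setter) {
    setter.set(carrier, TRACE_ID, Long.toUnsignedString(traceId));
    setter.set(carrier, PARENT_ID, Long.toUnsignedString(spanId));
  }

  // Returns {traceId, parentId}, or null when either header is missing.
  static <C> long[] extract(C carrier, CarrierGetter<C> getter) {
    String traceId = getter.get(carrier, TRACE_ID);
    String parentId = getter.get(carrier, PARENT_ID);
    if (traceId == null || parentId == null) {
      return null;
    }
    return new long[] {Long.parseUnsignedLong(traceId), Long.parseUnsignedLong(parentId)};
  }

  public static void main(String[] args) {
    Map<String, String> headers = new HashMap<>();
    inject(123L, 456L, headers, (c, k, v) -> c.put(k, v));
    long[] ids = extract(headers, (c, k) -> c.get(k));
    System.out.println(headers + " -> " + ids[0] + "/" + ids[1]);
  }
}
```

A plain `Map` stands in for the carrier here; an HTTP client instrumentation would pass its request object and a setter that calls the client's own header API.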
| 172 | + |
| 173 | +Cross-product abstractions: |
| 174 | + |
| 175 | +- `gateway/` — Instrumentation Gateway: event bus (`InstrumentationGateway`, |
| 176 | + `SubscriptionService`, `Events`, `CallbackProvider`, `RequestContext`) decoupling |
| 177 | + instrumentations from product modules. Primarily used by AppSec and IAST to hook into |
| 178 | + the HTTP request lifecycle without modifying instrumentations. |
| 179 | +- `cache/` — Shared caching primitives (`DDCache`, `FixedSizeCache`, `RadixTreeCache`) used |
| 180 | + throughout the agent. |
| 181 | +- `naming/` — Service and span operation naming schemas (v0, v1) for databases, messaging, |
| 182 | + cloud services, etc. |
| 183 | +- `telemetry/` — Multi-product telemetry collection interfaces (`MetricCollector`, |
| 184 | + `WafMetricCollector`, `LLMObsMetricCollector`, etc.). |
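
The decoupling that `gateway/` provides can be illustrated with a small typed event bus: instrumentations publish events, products subscribe, and neither references the other. This is a sketch only; `MiniGateway` and `EventType` are hypothetical and much simpler than `InstrumentationGateway` and `Events`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical event bus; publishers and subscribers only share EventType keys.
final class MiniGateway {
  // The type parameter ties an event key to its payload type.
  static final class EventType<T> {
    final String name;

    EventType(String name) {
      this.name = name;
    }
  }

  private final Map<EventType<?>, List<Consumer<?>>> callbacks = new ConcurrentHashMap<>();

  // Called by a product module (for example a security product) at startup.
  <T> void subscribe(EventType<T> type, Consumer<T> callback) {
    callbacks.computeIfAbsent(type, t -> new CopyOnWriteArrayList<>()).add(callback);
  }

  // Called by an instrumentation when something happens in the instrumented code.
  @SuppressWarnings("unchecked")
  <T> void publish(EventType<T> type, T payload) {
    List<Consumer<?>> subscribers = callbacks.getOrDefault(type, Collections.emptyList());
    for (Consumer<?> callback : subscribers) {
      // Safe because subscribe only accepts Consumer<T> for an EventType<T>.
      ((Consumer<T>) callback).accept(payload);
    }
  }

  public static void main(String[] args) {
    MiniGateway gateway = new MiniGateway();
    EventType<String> requestStarted = new EventType<>("request.started");
    List<String> seen = new ArrayList<>();
    gateway.subscribe(requestStarted, seen::add);
    gateway.publish(requestStarted, "GET /health");
    System.out.println(seen);
  }
}
```

With no subscriber registered, `publish` does nothing, so an instrumentation behaves the same whether or not the consuming product is enabled.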
| 185 | + |
| 186 | +Product-specific APIs that also live here: |
| 187 | + |
| 188 | +- `iast/` — IAST vulnerability detection interfaces: taint tracking (`Taintable`, `IastContext`), |
| 189 | + sink definitions for each vulnerability type (SQL injection, XSS, command injection, etc.), |
| 190 | + and call site instrumentation hooks. About 60 files. |
| 191 | +- `civisibility/` — CI Visibility interfaces: test identification, code coverage, build/test |
| 192 | + event handlers, and CI-specific telemetry metrics. About 95 files. |
| 193 | +- `datastreams/` — Data Streams Monitoring interfaces: pathway context, stats points, |
| 194 | + and schema registry integration. |
| 195 | +- `appsec/` — AppSec interfaces: HTTP client request/response payloads for WAF analysis, |
| 196 | + RASP call sites. |
| 197 | +- `profiling/` — Profiler integration: recording data, timing, and enablement interfaces. |
| 198 | +- `llmobs/` — LLM Observability context. |
| 199 | + |
| 200 | +### `components/` |
| 201 | + |
| 202 | +Low-level shared platform components. Not tied to any product, no external dependencies, |
| 203 | +bootstrap-safe: |
| 204 | + |
| 205 | +- `context` — Immutable context propagation framework. Provides `Context`, `ContextKey`, |
| 206 | + and `Propagator` abstractions for storing and propagating key-value pairs across threads |
| 207 | + and carrier objects. |
| 208 | +- `environment` — JVM and OS detection utilities. `JavaVersion` for version parsing, |
| 209 | + `JavaVirtualMachine` for JVM implementation detection (OpenJDK, Graal, J9), |
| 210 | + `OperatingSystem` for OS/architecture detection, and `EnvironmentVariables`/`SystemProperties` |
| 211 | + for safe access and mocking. |
| 212 | +- `json` — Lightweight, dependency-free JSON serialization. `JsonWriter` for building JSON |
| 213 | + with a fluent API, `JsonReader` for streaming parsing. |
| 214 | +- `native-loader` — Platform-aware native library loading with pluggable strategies. |
| 215 | + `NativeLoader` handles OS/architecture detection, resource extraction from JARs, |
| 216 | + and temp file management. |
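
"Immutable" in the `context` component means that adding a value produces a new context and leaves the original untouched. A minimal sketch of that idea, assuming a hypothetical `MiniContext` with typed keys; the real `Context` and `ContextKey` APIs may look different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable context: every "with" returns a new instance.
final class MiniContext {
  // Typed key, compared by identity, so two keys with the same name never collide.
  static final class Key<T> {
    final String name;

    Key(String name) {
      this.name = name;
    }
  }

  private static final MiniContext ROOT = new MiniContext(new HashMap<>());

  private final Map<Key<?>, Object> values;

  private MiniContext(Map<Key<?>, Object> values) {
    this.values = values;
  }

  static MiniContext root() {
    return ROOT;
  }

  // Copy-on-write: the receiver is never modified.
  <T> MiniContext with(Key<T> key, T value) {
    Map<Key<?>, Object> copy = new HashMap<>(values);
    copy.put(key, value);
    return new MiniContext(copy);
  }

  @SuppressWarnings("unchecked")
  <T> T get(Key<T> key) {
    return (T) values.get(key); // null when the key is absent
  }

  public static void main(String[] args) {
    Key<String> user = new Key<>("user");
    MiniContext child = root().with(user, "alice");
    System.out.println(root().get(user) + " / " + child.get(user));
  }
}
```

Because instances never change after construction, a context can be handed to another thread without synchronization.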
| 217 | + |
| 218 | +### `products/` |
| 219 | + |
| 220 | +Self-contained product modules following a layered submodule pattern: |
| 221 | + |
| 222 | +- `{product}-api/` — Public API interfaces, zero dependencies. |
| 223 | +- `{product}-bootstrap/` — Data classes safe for the bootstrap classloader. |
| 224 | +- `{product}-lib/` — Core implementation (shadow jar, excludes shared dependencies). |
| 225 | +- `{product}-agent/` — Agent integration entry point (shadow jar). |
| 226 | + |
| 227 | +Current products: |
| 228 | + |
| 229 | +- `metrics/` — StatsD client and monitoring abstraction. Provides `Monitoring` interface with |
| 230 | + counters, timers, and histograms for internal agent metrics collection. |
| 231 | +- `feature-flagging/` — Server-side feature flag evaluation driven by remote configuration. |
| 232 | + Implements the OpenFeature SDK, handles the Unified Feature Control (UFC) protocol, |
| 233 | + and tracks flag exposure per user/session. |
| 234 | + |
| 235 | +### `communication/` |
| 236 | + |
| 237 | +HTTP transport to the Datadog Agent and intake APIs. `SharedCommunicationObjects` holds shared |
| 238 | +`OkHttpClient` instances (Unix domain socket and named pipe support), agent URL, feature discovery, |
| 239 | +and the configuration poller. All product modules receive this at startup. |
| 240 | + |
| 241 | +### `remote-config/` |
| 242 | + |
| 243 | +Remote configuration client. `DefaultConfigurationPoller` periodically polls the Datadog Agent |
| 244 | +for configuration updates (AppSec rules, debugger probes, sampling rates, feature flags). |
| 245 | +Uses TUF (The Update Framework) for signature validation. |
| 246 | + |
| 247 | +### `telemetry/` |
| 248 | + |
| 249 | +Agent telemetry. `TelemetrySystem` collects and reports which features are enabled, |
| 250 | +which integrations loaded, performance metrics, and product-specific counters. |
| 251 | +Each product registers periodic actions that collect domain-specific metrics. |
| 252 | + |
| 253 | +### `utils/` |
| 254 | + |
| 255 | +Shared utilities, each in its own submodule: |
| 256 | + |
| 257 | +- `config-utils` — `ConfigProvider` for reading and merging configuration from environment variables, |
| 258 | + system properties, properties files, and CI environment. |
| 259 | +- `container-utils` — Parses container runtime information (Docker, Kubernetes, ECS). |
| 260 | +- `filesystem-utils` — Permission-safe file existence checks that handle `SecurityException`. |
| 261 | +- `flare-utils` — Tracer flare collection (`TracerFlareService`) that gathers diagnostics |
| 262 | + (logs, spans, system info) and sends them to Datadog for troubleshooting. |
| 263 | +- `queue-utils` — High-performance lock-free queues (`MpscArrayQueue`, `SpscArrayQueue`) for |
| 264 | + inter-thread communication and span buffering. |
| 265 | +- `socket-utils` — Socket factories (`UnixDomainSocketFactory`, `NamedPipeSocket`) for connecting |
| 266 | + to the local Datadog Agent via Unix sockets or named pipes. |
| 267 | +- `time-utils` — Time source abstractions (`TimeSource`, `ControllableTimeSource`) for testable |
| 268 | + time handling and delay parsing. |
| 269 | +- `version-utils` — Agent version string (`VersionInfo.VERSION`) read from packaged resources. |
| 270 | +- `test-utils` — Testing utilities: `@Flaky` annotation, log capture, GC control, |
| 271 | + forked test configuration. |
| 272 | +- `test-agent-utils` — Message decoders for parsing v04/v05 binary protocol frames in tests. |
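
Merging configuration sources, as `config-utils` does, comes down to mapping one logical key onto each source's naming scheme and checking the sources in priority order. The sketch below assumes system properties (`dd.` prefix) win over environment variables (`DD_` prefix) and covers only those two sources; `ConfigLookup` is hypothetical and the real `ConfigProvider` handles more.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical two-source lookup; the precedence shown here is an assumption.
final class ConfigLookup {
  private final Properties systemProperties;
  private final Map<String, String> environment;

  ConfigLookup(Properties systemProperties, Map<String, String> environment) {
    this.systemProperties = systemProperties;
    this.environment = environment;
  }

  // "trace.enabled" becomes "dd.trace.enabled".
  static String toSystemProperty(String key) {
    return "dd." + key;
  }

  // "trace.enabled" becomes "DD_TRACE_ENABLED".
  static String toEnvVar(String key) {
    return "DD_" + key.replace('.', '_').replace('-', '_').toUpperCase(Locale.ROOT);
  }

  String get(String key, String defaultValue) {
    String value = systemProperties.getProperty(toSystemProperty(key));
    if (value == null) {
      value = environment.get(toEnvVar(key));
    }
    return value != null ? value : defaultValue;
  }

  public static void main(String[] args) {
    ConfigLookup config = new ConfigLookup(System.getProperties(), System.getenv());
    System.out.println(config.get("service", "unnamed-java-app"));
  }
}
```

Passing the sources in through the constructor, rather than reading `System.getenv()` inside `get`, keeps the lookup testable with plain maps.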
| 273 | + |
| 274 | +### `dd-trace-ot/` |
| 275 | + |
| 276 | +Legacy OpenTracing compatibility library. Publishes a standalone JAR artifact (`dd-trace-ot.jar`) |
| 277 | +that implements the `io.opentracing.Tracer` interface by wrapping the Datadog `CoreTracer`. |
| 278 | +This is a pure library for manual instrumentation only — there is no auto-instrumentation or |
| 279 | +bytecode advice. |
| 280 | + |
| 281 | +### `dd-smoke-tests/` |
| 282 | + |
| 283 | +End-to-end smoke tests. Each boots a real application with the agent jar and verifies traces, spans, |
| 284 | +and product behavior. Covers Spring Boot, Play, Vert.x, Quarkus, WildFly, and more. |
| 285 | +Core test hierarchy (Groovy/Spock): |
| 286 | +- `ProcessManager` — Base. Spawns forked JVM processes with the agent via `ProcessBuilder`, |
| 287 | + captures stdout to log files, tears down on cleanup. `assertNoErrorLogs()` scans logs for errors. |
| 288 | +- `AbstractSmokeTest` extends `ProcessManager` — Adds a mock Datadog Agent (`TestHttpServer`) |
| 289 | + receiving traces (v0.4/v0.5), telemetry, remote config, and EVP proxy requests. Polling helpers: |
| 290 | + `waitForTraceCount`, `waitForSpan`, `waitForTelemetryFlat`. |
| 291 | +- `AbstractServerSmokeTest` extends `AbstractSmokeTest` — For HTTP server apps. Adds port |
| 292 | + management, waits for server port to open, verifies expected trace output. |