feat(profiling): Complete OTLP profiles implementation with JFR conversion pipeline#10098
Draft
c98c2a0 to
52c579e
This pull request has been marked as stale because it has not had activity over the past quarter. It will be closed in 7 days if no further activity occurs. Feel free to reopen the PR if you are still working on it.
Add profiling-otel module with core infrastructure for JFR to OTLP profiles conversion:
- Dictionary tables for OTLP compression (StringTable, FunctionTable, LocationTable, StackTable, LinkTable, AttributeTable)
- ProtobufEncoder for hand-coded protobuf wire format encoding
- OtlpProtoFields constants for OTLP profiles proto field numbers
- Unit tests for all dictionary tables and encoder
- Architecture documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
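The dictionary tables store each distinct value once and refer to it by integer index, with index 0 reserved for an empty sentinel entry. The following is a minimal sketch of that interning idea for strings. `StringTableSketch` is a hypothetical name and is not the module's actual `StringTable`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an OTLP-style string dictionary (not the PR's StringTable). */
public final class StringTableSketch {
    private final Map<String, Integer> indexByValue = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    public StringTableSketch() {
        // OTLP dictionary tables reserve index 0 for an empty/default entry.
        intern("");
    }

    /** Returns the existing index for value, or appends it and returns the new index. */
    public int intern(String value) {
        Integer existing = indexByValue.get(value);
        if (existing != null) {
            return existing;
        }
        int index = values.size();
        values.add(value);
        indexByValue.put(value, index);
        return index;
    }

    public String get(int index) {
        return values.get(index);
    }

    public int size() {
        return values.size();
    }

    public static void main(String[] args) {
        StringTableSketch table = new StringTableSketch();
        int a = table.intern("java.lang.Thread.run");
        int b = table.intern("java.lang.Thread.run"); // deduplicated: same index
        System.out.println(a + " " + b + " " + table.size()); // prints "1 1 2"
    }
}
```

Samples and locations then carry small integers instead of repeated strings. This is where the size reduction of the OTLP dictionary format comes from.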
- Add JMH benchmark filtering via -PjmhIncludes property in build.gradle.kts
- Update JfrToOtlpConverterBenchmark parameters to {50, 500, 5000} events
- Run comprehensive benchmarks and document actual performance results
- Update BENCHMARKS.md with measured throughput data (Apple M3 Max)
- Update ARCHITECTURE.md with performance characteristics
- Key findings: Stack depth is the primary bottleneck (~60% throughput reduction per 10x increase in stack depth)
- Linear scaling with event count; context count has minimal impact
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…pport
Reverted Phase 1 optimization attempts that showed no improvement:
- Removed tryGetExisting() optimization from JfrToOtlpConverter
- Deleted tryGetExisting() method from FunctionTable
- The optimization added overhead (2 FunctionKey allocations vs 1)
Added JMH profiling support:
- Added profiling configuration to build.gradle.kts
- Enable with -PjmhProfile=true flag
- Configures stack profiler (CPU sampling) and GC profiler (allocations)
Profiling results reveal the actual bottlenecks:
- JFR file I/O: ~20% (jafar-parser, external dependency)
- Protobuf encoding: ~5% (fundamental serialization cost)
- Conversion logic: ~3% (our code)
- Dictionary operations: ~1-2% (NOT the bottleneck)
Key findings:
- Dictionary operations are already well-optimized at ~1-2% of runtime
- Modern JVM escape analysis optimizes away temporary allocations
- Stack depth is the dominant factor (O(n) frame processing)
- HashMap lookups (~10-20ns) are dominated by I/O overhead
Updated documentation:
- BENCHMARKS.md: Added profiling section with findings
- ARCHITECTURE.md: Added profiling support and results
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…ant pool IDs
Leverage JFR's internal stack trace deduplication by caching conversions based on constant pool IDs. This avoids redundant processing of identical stack traces that appear multiple times in profiling data.
Implementation:
- Add @JfrField(raw=true) stackTraceId() methods to all event interfaces (ExecutionSample, MethodSample, ObjectSample, JavaMonitorEnter, JavaMonitorWait)
- Implement HashMap cache in JfrToOtlpConverter with lazy stack trace resolution
- Cache key combines stackTraceId XOR (identityHashCode(chunkInfo) << 32) for chunk-unique identification
- Modify convertStackTrace() to accept Supplier<JfrStackTrace> and check the cache before resolution
- Update all event handlers to pass method references (event::stackTrace) instead of resolved stacks
- Add stackDuplicationPercent parameter to JfrToOtlpConverterBenchmark (0%, 70%, 90%)
- Document Phase 5.6: Stack Trace Deduplication Optimization in ARCHITECTURE.md
Performance results:
- 0% stack duplication: 8.1 ops/s (baseline, no cache benefit)
- 70% stack duplication: 14.4 ops/s (+78%, typical production workload)
- 90% stack duplication: 20.5 ops/s (+153%, i.e. 2.5x faster for hot-path-heavy workloads)
All 82 tests pass. Zero overhead for unique stacks, significant gains for realistic duplication patterns.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
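The caching scheme can be sketched as follows: build a per-chunk key from the raw constant pool ID, look it up, and call the `Supplier` only on a miss. This is a minimal sketch under stated assumptions. `StackCacheSketch` is hypothetical, a plain `Object` stands in for jafar's chunk info, and `Supplier<String[]>` stands in for `JfrStackTrace`. The `(long)` cast matters because Java masks an `int` shift distance to 5 bits. Identity hash codes are also not guaranteed unique, so a composite key object would be the collision-free alternative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of a stack-trace conversion cache keyed by constant pool ID. */
public final class StackCacheSketch {
    private final Map<Long, Integer> stackIndexByKey = new HashMap<>();
    private int nextStackIndex = 1; // index 0 is the OTLP sentinel
    private int resolutions = 0;

    /** Combines a per-chunk stack trace ID with the chunk's identity hash. */
    public static long cacheKey(long stackTraceId, Object chunk) {
        // Widen before shifting: for an int, "<< 32" shifts by 0 and leaves the value unchanged.
        return stackTraceId ^ (((long) System.identityHashCode(chunk)) << 32);
    }

    /** Resolves and converts the stack only on a cache miss. */
    public int convertStackTrace(long stackTraceId, Object chunk, Supplier<String[]> stackSupplier) {
        long key = cacheKey(stackTraceId, chunk);
        Integer cached = stackIndexByKey.get(key);
        if (cached != null) {
            return cached; // hit: the stack is never resolved
        }
        String[] frames = stackSupplier.get(); // expensive resolution happens only here
        resolutions++;
        // A real converter would intern `frames` into function/location/stack tables here.
        int index = nextStackIndex++;
        stackIndexByKey.put(key, index);
        return index;
    }

    public int resolutions() {
        return resolutions;
    }
}
```

On a cache hit the `Supplier` is never invoked, so the cost of resolving the stack is paid once per distinct stack per chunk. This is consistent with the gains growing with the duplication percentage.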
…n Docker unavailable
Use @Testcontainers(disabledWithoutDocker = true) to automatically skip OtlpCollectorValidationTest when Docker is not available, instead of failing with IllegalStateException. This allows the test suite to pass cleanly in environments without Docker while still running all other tests. When Docker is available, these tests run normally.
Result: 82 tests pass; Docker tests are gracefully skipped when Docker is unavailable.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implement support for the OTLP profiles original_payload and original_payload_format fields (fields 9 and 10) to include the source JFR recording(s) in the OTLP output for debugging and compliance verification.
Key features:
- Streaming architecture using SequenceInputStream (recordings are never fully buffered in memory)
- Automatic uber-JFR concatenation for multiple recordings
- Disabled by default per OTLP spec recommendation (size considerations)
- Fluent API: setIncludeOriginalPayload(boolean)
Implementation details:
- Enhanced ProtobufEncoder with a streaming writeBytesField(InputStream, long) method
- Single file optimization: direct FileInputStream
- Multiple files: SequenceInputStream chains the files without buffering them in memory
- Streams data in 8KB chunks directly into the protobuf output
Test coverage:
- Default behavior verification (payload disabled)
- Single file with payload enabled
- Multiple files creating uber-JFR concatenation
- Setting persistence across converter reuse
Documentation:
- Added Phase 6 to ARCHITECTURE.md with usage examples, design decisions, and performance characteristics
- Centralized jafar-parser dependency version in gradle/libs.versions.toml
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
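A protobuf `bytes` field is length-delimited, so the total length has to be written before the payload. That is why a streaming `writeBytesField(InputStream, long)` needs the length up front. The following is a minimal sketch of that idea. `StreamingBytesFieldSketch` is hypothetical and not the PR's `ProtobufEncoder`, and field number 10 is used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import java.util.Arrays;

/** Hypothetical sketch: stream a length-delimited protobuf field without buffering the payload. */
public final class StreamingBytesFieldSketch {
    private static final int WIRE_TYPE_LEN = 2;

    static void writeVarint(OutputStream out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Writes tag and length, then copies exactly `length` bytes in 8KB chunks. */
    public static void writeBytesField(OutputStream out, int fieldNumber, InputStream data, long length)
            throws IOException {
        writeVarint(out, ((long) fieldNumber << 3) | WIRE_TYPE_LEN);
        writeVarint(out, length); // must be known before any payload byte is written
        byte[] buffer = new byte[8192];
        long remaining = length;
        while (remaining > 0) {
            int read = data.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (read < 0) {
                throw new EOFException("payload shorter than declared length");
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] first = {1, 2};
        byte[] second = {3};
        // SequenceInputStream drains the first stream, then the second: a lazy concatenation.
        InputStream combined = new SequenceInputStream(
                new ByteArrayInputStream(first), new ByteArrayInputStream(second));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBytesField(out, 10, combined, first.length + second.length);
        System.out.println(Arrays.toString(out.toByteArray())); // prints "[82, 3, 1, 2, 3]"
    }
}
```

For files, the declared length would be the sum of the file sizes. Only one 8KB buffer is live at a time, regardless of how large the recordings are.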
…uration constants
Implement foundation for parallel OTLP profile uploads alongside JFR format.
**Step 1: RecordingData Reference Counting**
Add thread-safe reference counting to support multiple listeners accessing the same RecordingData:
- Add AtomicInteger refCount and volatile boolean released flag
- Add retain() method to increment the reference count before passing to additional listeners
- Make release() final with automatic reference counting (decrements and calls doRelease() at 0)
- Add protected doRelease() for actual cleanup (called when refcount reaches 0)
- Update all implementations: OpenJdkRecordingData, DatadogProfilerRecordingData, OracleJdkRecordingData, CompositeRecordingData
The reference counting pattern enables multiple uploaders (JFR + OTLP) to safely share RecordingData without double-release or resource leaks. Each listener calls retain() before use and release() when done. Actual cleanup happens only when the refcount reaches zero.
**Step 2: OTLP Configuration Constants**
Add configuration property keys to ProfilingConfig for OTLP profile format support:
- profiling.otlp.enabled (default: false) - Enable parallel OTLP upload
- profiling.otlp.include.original.payload (default: false) - Embed source JFR in OTLP
- profiling.otlp.url (default: "") - OTLP endpoint URL (empty = derive from agent URL)
- profiling.otlp.compression (default: "gzip") - Compression type for OTLP upload
Configuration will be read directly from ConfigProvider in OtlpProfileUploader for testability.
Next steps:
- Step 3: Implement OtlpProfileUploader class (reads config from ConfigProvider)
- Step 4: Integrate with ProfilingAgent
- Step 5: Add tests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add OtlpProfileUploader class implementing RecordingDataListener
- Read configuration from ConfigProvider for testability
- Support GZIP compression (configurable via boolean flag)
- Use JfrToOtlpConverter to transform JFR recordings to OTLP format
- Derive OTLP endpoint from agent URL (port 4318, /v1/profiles)
- Handle both synchronous and asynchronous uploads
- Use TempLocationManager for temp file creation
- Add profiling-otel dependency to profiling-uploader module
- Add basic unit tests for OtlpProfileUploader
Configuration options:
- profiling.otlp.enabled (default: false)
- profiling.otlp.url (default: derived from agent URL)
- profiling.otlp.compression.enabled (default: true)
- profiling.otlp.include.original.payload (default: false)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
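The endpoint derivation rule can be sketched as follows: use `profiling.otlp.url` when it is set, otherwise keep the agent URL's scheme and host and substitute port 4318 and the `/v1/profiles` path. This sketch follows the rule as written in the commit message. `OtlpEndpointSketch` and `resolveEndpoint` are hypothetical names, and the code assumes an http(s) agent URL that has a host (not, for example, a Unix socket).

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch of deriving the OTLP profiles endpoint from the agent URL. */
public final class OtlpEndpointSketch {
    static final int OTLP_HTTP_PORT = 4318;
    static final String PROFILES_PATH = "/v1/profiles";

    /** Returns the configured URL if non-empty; otherwise derives one from the agent URL. */
    public static String resolveEndpoint(String configuredUrl, String agentUrl) throws URISyntaxException {
        if (configuredUrl != null && !configuredUrl.isEmpty()) {
            return configuredUrl;
        }
        URI agent = new URI(agentUrl);
        // Keep scheme and host, replace port and path; drop any query or fragment.
        URI otlp = new URI(agent.getScheme(), null, agent.getHost(), OTLP_HTTP_PORT, PROFILES_PATH, null, null);
        return otlp.toString();
    }
}
```

For example, an agent URL of `http://localhost:8126` resolves to `http://localhost:4318/v1/profiles` when no explicit URL is configured.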
…e counting
Integrate OtlpProfileUploader into ProfilingAgent to enable parallel JFR and OTLP profile uploads when configured. Implements an explicit reference counting pattern for RecordingData to safely support multiple concurrent handlers.
Key changes:
1. ProfilingAgent integration:
- Add OtlpProfileUploader alongside ProfileUploader
- Extract handler methods (handleRecordingData, handleRecordingDataWithDump)
- Use method references instead of capturing lambdas for better performance
- Call retain() once for each handler (dumper, OTLP, JFR)
- Update shutdown hooks to properly clean up the OTLP uploader
2. Explicit reference counting in RecordingData:
- Change initial refcount from 1 to 0 for clarity
- Each handler must call retain() before processing
- Each handler calls release() when done
- doRelease() is called only when the refcount reaches 0
- Updated javadocs to reflect the explicit counting pattern
3. Comprehensive test coverage:
- RecordingDataRefCountingTest validates all handler combinations
- Tests single, dual, and triple handler scenarios
- Verifies thread safety with concurrent handlers
- Tests error conditions (premature release, retain after release)
- Confirms idempotent release behavior
Benefits:
- Symmetric treatment of all handlers (no special first handler)
- Clear, explicit reference counting (easier to understand and verify)
- No resource leaks or premature cleanup
- Efficient method references (no lambda capture overhead)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
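The explicit counting pattern described above works as follows: the count starts at 0, each handler calls `retain()` once, and the cleanup hook runs exactly once when the last `release()` brings the count back to 0. The following is a minimal sketch of that pattern. `RefCountedSketch` is hypothetical, and it is not the agent's `RecordingData`. Its error behavior (throwing on retain-after-release or on an unmatched release, no-op on release after cleanup) is an assumption modeled on the listed test cases. The sketch also assumes that all `retain()` calls happen before any handler releases, as in the dispatch described above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the explicit retain/release pattern (not the agent's RecordingData). */
public abstract class RefCountedSketch {
    private final AtomicInteger refCount = new AtomicInteger(0); // explicit counting: starts at 0
    private final AtomicBoolean released = new AtomicBoolean(false);

    /** Each handler calls this once before processing; assumes all retains precede any release. */
    public final RefCountedSketch retain() {
        if (released.get()) {
            throw new IllegalStateException("retain() after release");
        }
        refCount.incrementAndGet();
        return this;
    }

    /** Each handler calls this when done; doRelease() runs once when the count reaches 0. */
    public final void release() {
        if (released.get()) {
            return; // idempotent after the final release
        }
        int remaining = refCount.decrementAndGet();
        if (remaining < 0) {
            refCount.incrementAndGet(); // undo the bogus decrement
            throw new IllegalStateException("release() without matching retain()");
        }
        if (remaining == 0 && released.compareAndSet(false, true)) {
            doRelease();
        }
    }

    /** Subclasses free the underlying resources here. */
    protected abstract void doRelease();
}
```

With three handlers (dumper, OTLP, JFR), the dispatcher would call `retain()` three times before handing off. Whichever handler finishes last triggers `doRelease()`, regardless of completion order.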
Include OTLP profiles converter and its dependencies in the agent-profiling uber JAR for integration into dd-java-agent.jar. The profiling-otel module and its jafar-parser dependency are now bundled, while shared dependencies (internal-api, components:json) are correctly excluded via the existing excludeShared configuration.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add command-line interface for testing and validating JFR to OTLP
conversions with real profiling data.
Features:
- Convert single or multiple JFR files to OTLP protobuf or JSON
- Include original JFR payload for validation (optional)
- Merge multiple recordings into single output
- Detailed conversion statistics
Usage:
./gradlew :dd-java-agent:agent-profiling:profiling-otel:convertJfr \
-Pargs="recording.jfr output.pb"
./gradlew :dd-java-agent:agent-profiling:profiling-otel:convertJfr \
-Pargs="--json recording.jfr output.json"
See doc/CLI.md for complete documentation and examples.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add --pretty flag to control JSON pretty-printing in the CLI converter. By default, JSON output is compact for efficient processing. Use --pretty for human-readable output with indentation.
Usage:
# Compact JSON (default)
./gradlew convertJfr --args="--json input.jfr output.json"
# Pretty-printed JSON
./gradlew convertJfr --args="--json --pretty input.jfr output.json"
The pretty-printer is a simple, dependency-free implementation that adds newlines and 2-space indentation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
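A dependency-free pretty-printer of this kind can be a single pass over the compact JSON. It tracks nesting depth and whether the cursor is inside a string literal, so that braces and commas inside strings are left alone. The following is a minimal sketch of that approach. `JsonPrettySketch` is hypothetical and is not the CLI's actual code, and the choice to keep empty containers on one line is an assumption.

```java
/** Hypothetical single-pass pretty-printer for compact JSON (2-space indentation). */
public final class JsonPrettySketch {
    public static String pretty(String json) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (escaped) {
                    escaped = false; // this char was escaped, e.g. \" or \\
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            switch (c) {
                case '"':
                    inString = true;
                    out.append(c);
                    break;
                case '{':
                case '[': {
                    out.append(c);
                    char close = c == '{' ? '}' : ']';
                    if (i + 1 < json.length() && json.charAt(i + 1) == close) {
                        out.append(close); // keep {} and [] on one line
                        i++;
                    } else {
                        depth++;
                        newline(out, depth);
                    }
                    break;
                }
                case '}':
                case ']':
                    depth--;
                    newline(out, depth);
                    out.append(c);
                    break;
                case ',':
                    out.append(c);
                    newline(out, depth);
                    break;
                case ':':
                    out.append(": ");
                    break;
                default:
                    if (!Character.isWhitespace(c)) {
                        out.append(c); // numbers, true/false/null
                    }
            }
        }
        return out.toString();
    }

    private static void newline(StringBuilder out, int depth) {
        out.append('\n');
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }
}
```

Because the input is already valid compact JSON from the encoder, the printer does not need a full parser. Tracking string and escape state is enough to avoid reformatting text inside values.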
Integrates OpenTelemetry's profcheck tool to validate that OTLP profiles conform to the specification. This provides automated conformance testing and helps catch encoding bugs early.
Key additions:
- Docker-based profcheck integration (docker/Dockerfile.profcheck)
- Gradle tasks for building the profcheck image and running validation
- ProfcheckValidationTest with Testcontainers integration
- Comprehensive documentation in PROFCHECK_INTEGRATION.md
Gradle tasks:
- buildProfcheck: Builds profcheck Docker image from upstream PR
- validateOtlp: Validates OTLP files using profcheck
- Auto-build profcheck image before tests tagged with @Tag("docker")
Test results:
- ✅ testEmptyProfile: Passes validation
- ✅ testAllocationProfile: Passes validation
- ❌ testCpuProfile: Revealed stack_index out of range bugs
- ❌ testMixedProfile: Revealed protobuf wire-format encoding bugs
The test failures are expected and valuable: they uncovered real bugs in the OTLP encoder that need to be fixed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Dictionary tables (location, function, link, stack, attribute) were omitting their required index 0 sentinel entries from the wire format, causing profcheck validation failures.
Root cause:
1. Dictionary loops started at i=1 instead of i=0, skipping sentinels
2. ProtobufEncoder.writeNestedMessage() had an if (length > 0) check that completely skipped writing empty messages
3. Sentinel entries encode as empty messages (all fields are 0/empty)
4. Result: Index 0 was not present in the wire format, causing off-by-one array indexing errors in profcheck validation
Fix:
- Changed ProtobufEncoder.writeNestedMessage() to always write tag+length, even for empty messages (required for sentinels)
- Changed all dictionary table loops to start from i=0 to include sentinels
- Added attribute_table encoding (was completely missing)
- Updated JSON encoding to match protobuf encoding
- Fixed test to use the correct event type (datadog.ObjectSample)
All profcheck validation tests now pass with "conformance checks passed".
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
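Both halves of the fix can be sketched together: a nested-message writer that always emits the tag and a zero length for an empty body, and a table loop that starts at `i = 0`. An empty sentinel then still occupies an element on the wire (two bytes). This is a minimal sketch under stated assumptions. `SentinelEncodingSketch` is hypothetical, and field number 1 is used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

/** Hypothetical sketch of the sentinel fix (names mirror the commit message; code is illustrative). */
public final class SentinelEncodingSketch {
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Always writes tag + length, even when the body is empty (length 0). */
    public static void writeNestedMessage(ByteArrayOutputStream out, int fieldNumber, byte[] body) {
        writeVarint(out, ((long) fieldNumber << 3) | 2); // wire type 2 = length-delimited
        writeVarint(out, body.length);
        // No "if (length > 0)" guard: an empty sentinel must still appear as an element.
        out.write(body, 0, body.length);
    }

    /** Encodes every dictionary entry starting at i = 0, so the sentinel keeps index 0. */
    public static byte[] encodeTable(int fieldNumber, List<byte[]> entries) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < entries.size(); i++) {
            writeNestedMessage(out, fieldNumber, entries.get(i));
        }
        return out.toByteArray();
    }
}
```

Encoding `[<empty sentinel>, {08 01}]` under field 1 yields `0A 00 0A 02 08 01`. Decoders therefore see two repeated elements, and index 1 refers to the real entry rather than being shifted to 0.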
…rter
This commit adds support for mapping JFR event attributes to OTLP profile
sample attributes, enabling richer profiling data with contextual metadata.
Key changes:
1. Sample Attributes Implementation:
- Added attributeIndices field to SampleData class
- Implemented getSampleTypeAttributeIndex() helper for creating sample type attributes
- Updated all event handlers (CPU, allocation, lock) to include sample.type attribute
- Uses packed repeated int32 format for attribute_indices per proto3 spec
2. ObjectSample Enhancements:
- Added objectClass, size, and weight fields to ObjectSample interface
- Implemented upscaling: sample value = size * weight
- Added alloc.class attribute for allocation profiling
- Maintains backwards compatibility with allocationSize field
3. OTLP Proto Field Number Corrections:
- Fixed Sample field numbers to match official Go module proto:
* stack_index = 1
* values = 2 (was 4)
* attribute_indices = 3 (was 2)
* link_index = 4 (was 3)
* timestamps_unix_nano = 5 (unchanged)
- Corrects discrepancy between proto file and generated Go code
4. Dual Validation System:
- Updated Dockerfile.profcheck to include both protoc and profcheck
- Created validate-profile wrapper script
- Protoc validation is authoritative (official Protocol Buffers compiler)
- Profcheck warnings are captured but don't fail builds
- Documents known profcheck timestamp validation issues
5. Test Updates:
- Updated smoke tests to use new ObjectSample fields (size, weight)
- Modified validation tests to check for protoc validation success
- All validation tests passing with spec-compliant output
Design decisions:
- Measurements (duration, size*weight) are stored as sample VALUES
- Labels/metadata (sample.type, alloc.class) are stored as ATTRIBUTES
- AttributeTable provides automatic deduplication via internString()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
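In proto3, repeated scalar fields such as `attribute_indices` use packed encoding: one tag with wire type 2, one byte-length prefix, then the varints back to back. The following is a minimal sketch of that encoding, using field number 3 from the corrected Sample numbering above. `PackedIndicesSketch` and `writePackedInt32` are hypothetical names, and this is not the PR's encoder.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of packed repeated int32 encoding for Sample.attribute_indices. */
public final class PackedIndicesSketch {
    static final int ATTRIBUTE_INDICES_FIELD = 3;

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** One tag, one byte length, then each value as a varint. */
    public static void writePackedInt32(ByteArrayOutputStream out, int fieldNumber, int[] values) {
        if (values.length == 0) {
            return; // proto3 omits empty repeated fields
        }
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        for (int v : values) {
            // int32 negatives sign-extend to 10-byte varints; dictionary indices are non-negative.
            writeVarint(payload, v);
        }
        byte[] bytes = payload.toByteArray();
        writeVarint(out, ((long) fieldNumber << 3) | 2); // wire type 2 = length-delimited
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }
}
```

For indices `[1, 300]` this produces `1A 03 01 AC 02`. The tag is `(3 << 3) | 2 = 0x1A`, the payload is 3 bytes, and 300 encodes as the two-byte varint `AC 02`.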
Fixed profcheck timestamp validation errors and made profcheck validation mandatory to pass alongside protoc validation.
Timestamp issues fixed:
- Removed manual startTime field assignments in all test JFR events
- Manual timestamps were being interpreted as JFR ticks (not epoch nanos)
- Let the JFR recording system automatically assign correct timestamps
- JFR auto-timestamps are properly converted via chunkInfo.asInstant()
Validation changes:
- Made profcheck validation mandatory (previously only protoc was required)
- Updated validation script to require both protoc AND profcheck to pass
- Removed special handling for the "known attribute_indices bug" (now fixed)
- Updated test assertions to verify both validators pass
- Both validators now cleanly pass for all test profiles
Result: Complete OTLP profiles spec compliance with both:
- protoc (official Protocol Buffers compiler) - structural validation
- profcheck (OpenTelemetry conformance checker) - semantic validation
All tests passing: empty, CPU, allocation, and mixed profiles.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
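JFR event timestamps are stored as ticks. Each chunk header records its start time in epoch nanoseconds, its start tick value, and the tick frequency, so an epoch value written into a tick field is shifted and rescaled into a wrong time. The following is a minimal sketch of the tick-to-epoch arithmetic. `JfrTicksSketch.ticksToEpochNanos` is a hypothetical helper, not jafar's `chunkInfo.asInstant()`. It assumes `ticksPerSecond` is below about 9.2e9 so the remainder term cannot overflow.

```java
/** Hypothetical sketch: convert a JFR tick value to epoch nanoseconds using chunk header values. */
public final class JfrTicksSketch {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    public static long ticksToEpochNanos(
            long ticks, long chunkStartTicks, long chunkStartEpochNanos, long ticksPerSecond) {
        long delta = ticks - chunkStartTicks;
        // Split into whole seconds plus remainder so delta * 1e9 cannot overflow a long.
        long seconds = delta / ticksPerSecond;
        long remainderTicks = delta % ticksPerSecond;
        return chunkStartEpochNanos
                + seconds * NANOS_PER_SECOND
                + remainderTicks * NANOS_PER_SECOND / ticksPerSecond;
    }
}
```

With a 2 GHz tick clock, an event 3,000,000,001 ticks after chunk start lands 1.5 s after the chunk's epoch start, rounded down to whole nanoseconds.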
Added convert-jfr.sh script that provides a simplified interface for converting JFR files to OTLP format without needing to remember Gradle task paths.
Features:
- Automatic compilation if needed
- Simplified command-line interface
- Colored output for better visibility
- File size reporting
- Comprehensive help message
- Error handling with clear messages
Usage:
./convert-jfr.sh recording.jfr output.pb
./convert-jfr.sh --json recording.jfr output.json
./convert-jfr.sh --pretty recording.jfr output.json
./convert-jfr.sh file1.jfr file2.jfr merged.pb
Updated CLI.md documentation with:
- Quick start section featuring the convenience script
- Complete usage examples
- Feature list and when to use the script vs Gradle directly
The script wraps the existing Gradle convertJfr task, providing a more user-friendly interface for development and testing workflows.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Enhanced the conversion script with detailed diagnostic output showing:
- Input file sizes (individual and total)
- Output file size
- Wall-clock conversion time
- Compression ratio (output vs input size)
- Space savings (bytes and percentage)
Usage:
./convert-jfr.sh --diagnostics recording.jfr output.pb
Example output:
[DIAG] Input: recording.jfr (89.3KB)
[DIAG] Total input size: 89.3KB
[DIAG] === Conversion Diagnostics ===
[DIAG] Wall time: 127.3ms
[DIAG] Output size: 45.2KB
[DIAG] Size ratio: 50.6% of input
[DIAG] Savings: 44.1KB (49.4% reduction)
Features:
- Cross-platform file size detection (macOS and Linux)
- Nanosecond-precision timing
- Human-readable size formatting (B, KB, MB, GB)
- Automatic compression ratio calculation
- Color-coded diagnostic output (cyan)
Updated CLI.md with:
- --diagnostics option documentation
- Example output showing diagnostic information
- Updated feature list
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
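The diagnostic figures follow from simple arithmetic on the input and output byte counts. The following Java rendering reproduces the example numbers above. It assumes 1024-byte units and one decimal place. The real script does this in shell, and `SizeDiagnosticsSketch` is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical Java rendering of the diagnostics arithmetic (the real script is shell). */
public final class SizeDiagnosticsSketch {
    private static final String[] UNITS = {"B", "KB", "MB", "GB"};

    /** Formats a byte count with 1024-based units, one decimal place above bytes. */
    public static String formatSize(long bytes) {
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return unit == 0 ? bytes + "B" : String.format(Locale.ROOT, "%.1f%s", value, UNITS[unit]);
    }

    /** Output size as a percentage of input size ("Size ratio"). */
    public static double ratioPercent(long inputBytes, long outputBytes) {
        return 100.0 * outputBytes / inputBytes;
    }

    public static void main(String[] args) {
        long in = 91_443;  // ~89.3KB
        long out = 46_285; // ~45.2KB
        double ratio = ratioPercent(in, out);
        System.out.println(formatSize(in) + " -> " + formatSize(out));
        System.out.println(String.format(Locale.ROOT, "%.1f%% of input, %s saved (%.1f%% reduction)",
                ratio, formatSize(in - out), 100 - ratio));
    }
}
```

`Locale.ROOT` keeps the decimal separator a period regardless of the machine's locale, which matters when the output is parsed by scripts.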
…iagnostics
Added convert-jfr.sh convenience wrapper for JFR to OTLP conversion with comprehensive diagnostic output and cross-platform compatibility.
Features:
- Simple CLI interface wrapping the Gradle convertJfr task
- Support for all converter options (--json, --pretty, --include-payload)
- --diagnostics flag showing detailed metrics:
* Input/output file sizes with human-readable formatting
* Actual conversion time (parsed from converter output)
* Compression ratios and savings
- Colored output for better readability
- Cross-platform file size detection (Linux and macOS)
- Automatic compilation via Gradle
Implementation:
- Parses the converter's own timing output to show actual conversion time (e.g., 141ms) instead of total Gradle execution time (13+ seconds)
- Uses a try-fallback approach for the stat command (GNU stat → BSD stat)
- Works on Linux, macOS with GNU coreutils, and native macOS
Documentation:
- Added "Convenience Script" section to doc/CLI.md
- Usage examples and feature list
- Diagnostic output examples
Example:
./convert-jfr.sh --diagnostics recording.jfr output.pb
Shows: 141ms conversion time, 2.0MB → 2.2KB (99.9% reduction)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…speedup
Replaced Gradle-based execution with a fat jar approach for a dramatic performance improvement in the JFR to OTLP conversion script.
Performance improvement:
- Previous: ~13+ seconds (Gradle overhead)
- New: ~0.4 seconds (< 0.5s total)
- Speedup: ~31x faster
- Actual conversion time: ~120ms (unchanged)
Implementation:
- Added shadowJar task to build.gradle.kts with minimization
- Modified convert-jfr.sh to use the fat jar directly via java -jar
- Added automatic rebuild detection based on source file mtimes
- Jar only rebuilds when source files are newer than the jar
- Cross-platform mtime detection (GNU stat → BSD stat fallback)
- Suppressed harmless SLF4J warnings (defaults to NOP logger)
Features:
- Automatic jar rebuild only when source files change
- Fast startup (no Gradle overhead)
- Clean output with SLF4J warnings filtered
- All existing diagnostics and features preserved
Fat jar details:
- Size: 1.9MB (minimized with shadow plugin)
- Location: build/libs/profiling-otel-*-cli.jar
- Main-Class manifest entry for direct execution
- Excludes unnecessary SLF4J service providers
Documentation:
- Updated CLI.md to highlight performance improvements
- Noted fat jar usage instead of Gradle task
Example:
./convert-jfr.sh --diagnostics recording.jfr output.pb
Total time: 0.4s (vs 13+ seconds with Gradle)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Simplified the conversion script output to avoid duplicate information:
Default mode (no flags):
- Single concise line: "[SUCCESS] Converted: output.pb (45.2KB, 127ms)"
- No verbose converter output shown
- Perfect for scripting and quick conversions
Diagnostics mode (--diagnostics):
- Shows converter's detailed output (files, format, time)
- Enhanced diagnostics section with compression metrics
- Clear input→output flow visualization
- Space savings calculations
Changes:
- Removed duplicate "Converting..." and "Conversion complete" messages
- Eliminated redundant output file info in default mode
- Consolidated size/time reporting
- Renamed section to "Enhanced Diagnostics" to distinguish from converter output
Example outputs:
Default:
[SUCCESS] Converted: output.pb (45.2KB, 127ms)
With --diagnostics:
[DIAG] Input: recording.jfr (89.3KB)
Converting 1 JFR file(s) to OTLP format...
Adding: recording.jfr
Conversion complete!
Output: output.pb
Format: PROTO
Size: 45.2 KB
Time: 127 ms
[DIAG] === Enhanced Diagnostics ===
[DIAG] Input → Output: 89.3KB → 45.2KB
[DIAG] Compression: 50.6% of original
[DIAG] Space saved: 44.1KB (49.4% reduction)
Documentation updated in CLI.md with both output examples.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- IOLogger: datadog.trace.relocate.api → datadog.logging
- getFile() → getPath() in OtlpProfileUploader and JfrToOtlpConverter
- ScrubbedRecordingData: override doRelease() instead of final release()
- libs.versions.toml: restore jafar-tools entry for profiling-scrubber
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
52c579e to
340ae63
Benchmarks
Startup
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 62 metrics, 9 unstable metrics.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~633db7ca1c, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.059 s) : 0, 1058894
Total [baseline] (11.086 s) : 0, 11085944
Agent [candidate] (1.065 s) : 0, 1065397
Total [candidate] (11.126 s) : 0, 11126378
section appsec
Agent [baseline] (1.263 s) : 0, 1263060
Total [baseline] (11.027 s) : 0, 11027464
Agent [candidate] (1.266 s) : 0, 1265766
Total [candidate] (11.063 s) : 0, 11063206
section iast
Agent [baseline] (1.242 s) : 0, 1241718
Total [baseline] (11.312 s) : 0, 11311800
Agent [candidate] (1.232 s) : 0, 1232323
Total [candidate] (11.305 s) : 0, 11304627
section profiling
Agent [baseline] (1.187 s) : 0, 1187251
Total [baseline] (11.013 s) : 0, 11013293
Agent [candidate] (1.2 s) : 0, 1199721
Total [candidate] (11.065 s) : 0, 11064570
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~633db7ca1c, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.226 ms) : 0, 1226
crashtracking [candidate] (1.223 ms) : 0, 1223
BytebuddyAgent [baseline] (632.05 ms) : 0, 632050
BytebuddyAgent [candidate] (636.37 ms) : 0, 636370
AgentMeter [baseline] (29.587 ms) : 0, 29587
AgentMeter [candidate] (29.774 ms) : 0, 29774
GlobalTracer [baseline] (248.904 ms) : 0, 248904
GlobalTracer [candidate] (250.151 ms) : 0, 250151
AppSec [baseline] (32.435 ms) : 0, 32435
AppSec [candidate] (32.713 ms) : 0, 32713
Debugger [baseline] (59.947 ms) : 0, 59947
Debugger [candidate] (60.229 ms) : 0, 60229
Remote Config [baseline] (603.226 µs) : 0, 603
Remote Config [candidate] (605.015 µs) : 0, 605
Telemetry [baseline] (8.065 ms) : 0, 8065
Telemetry [candidate] (8.111 ms) : 0, 8111
Flare Poller [baseline] (9.881 ms) : 0, 9881
Flare Poller [candidate] (10.073 ms) : 0, 10073
section appsec
crashtracking [baseline] (1.245 ms) : 0, 1245
crashtracking [candidate] (1.233 ms) : 0, 1233
BytebuddyAgent [baseline] (673.594 ms) : 0, 673594
BytebuddyAgent [candidate] (678.253 ms) : 0, 678253
AgentMeter [baseline] (12.179 ms) : 0, 12179
AgentMeter [candidate] (12.191 ms) : 0, 12191
GlobalTracer [baseline] (250.019 ms) : 0, 250019
GlobalTracer [candidate] (249.862 ms) : 0, 249862
IAST [baseline] (24.488 ms) : 0, 24488
IAST [candidate] (24.21 ms) : 0, 24210
AppSec [baseline] (186.897 ms) : 0, 186897
AppSec [candidate] (185.16 ms) : 0, 185160
Debugger [baseline] (66.134 ms) : 0, 66134
Debugger [candidate] (66.592 ms) : 0, 66592
Remote Config [baseline] (582.763 µs) : 0, 583
Remote Config [candidate] (569.9 µs) : 0, 570
Telemetry [baseline] (7.993 ms) : 0, 7993
Telemetry [candidate] (7.859 ms) : 0, 7859
Flare Poller [baseline] (3.53 ms) : 0, 3530
Flare Poller [candidate] (3.431 ms) : 0, 3431
section iast
crashtracking [baseline] (1.25 ms) : 0, 1250
crashtracking [candidate] (1.228 ms) : 0, 1228
BytebuddyAgent [baseline] (816.429 ms) : 0, 816429
BytebuddyAgent [candidate] (809.739 ms) : 0, 809739
AgentMeter [baseline] (11.546 ms) : 0, 11546
AgentMeter [candidate] (11.444 ms) : 0, 11444
GlobalTracer [baseline] (240.453 ms) : 0, 240453
GlobalTracer [candidate] (239.122 ms) : 0, 239122
IAST [baseline] (28.436 ms) : 0, 28436
IAST [candidate] (29.96 ms) : 0, 29960
AppSec [baseline] (30.431 ms) : 0, 30431
AppSec [candidate] (28.38 ms) : 0, 28380
Debugger [baseline] (65.137 ms) : 0, 65137
Debugger [candidate] (64.747 ms) : 0, 64747
Remote Config [baseline] (540.102 µs) : 0, 540
Remote Config [candidate] (533.41 µs) : 0, 533
Telemetry [baseline] (7.821 ms) : 0, 7821
Telemetry [candidate] (7.742 ms) : 0, 7742
Flare Poller [baseline] (3.429 ms) : 0, 3429
Flare Poller [candidate] (3.409 ms) : 0, 3409
section profiling
crashtracking [baseline] (1.192 ms) : 0, 1192
crashtracking [candidate] (1.205 ms) : 0, 1205
BytebuddyAgent [baseline] (692.601 ms) : 0, 692601
BytebuddyAgent [candidate] (700.857 ms) : 0, 700857
AgentMeter [baseline] (9.198 ms) : 0, 9198
AgentMeter [candidate] (9.274 ms) : 0, 9274
GlobalTracer [baseline] (207.082 ms) : 0, 207082
GlobalTracer [candidate] (209.028 ms) : 0, 209028
AppSec [baseline] (33.039 ms) : 0, 33039
AppSec [candidate] (33.258 ms) : 0, 33258
Debugger [baseline] (66.295 ms) : 0, 66295
Debugger [candidate] (65.886 ms) : 0, 65886
Remote Config [baseline] (590.47 µs) : 0, 590
Remote Config [candidate] (585.033 µs) : 0, 585
Telemetry [baseline] (7.822 ms) : 0, 7822
Telemetry [candidate] (8.78 ms) : 0, 8780
Flare Poller [baseline] (3.57 ms) : 0, 3570
Flare Poller [candidate] (3.663 ms) : 0, 3663
ProfilingAgent [baseline] (94.308 ms) : 0, 94308
ProfilingAgent [candidate] (95.522 ms) : 0, 95522
Profiling [baseline] (94.877 ms) : 0, 94877
Profiling [candidate] (96.089 ms) : 0, 96089
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~633db7ca1c, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.06 s) : 0, 1060291
Total [baseline] (8.809 s) : 0, 8808807
Agent [candidate] (1.065 s) : 0, 1064946
Total [candidate] (8.878 s) : 0, 8878252
section iast
Agent [baseline] (1.241 s) : 0, 1241231
Total [baseline] (9.599 s) : 0, 9599195
Agent [candidate] (1.226 s) : 0, 1226402
Total [candidate] (9.547 s) : 0, 9546976
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~633db7ca1c, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.231 ms) : 0, 1231
crashtracking [candidate] (1.222 ms) : 0, 1222
BytebuddyAgent [baseline] (635.056 ms) : 0, 635056
BytebuddyAgent [candidate] (637.437 ms) : 0, 637437
AgentMeter [baseline] (29.577 ms) : 0, 29577
AgentMeter [candidate] (29.83 ms) : 0, 29830
GlobalTracer [baseline] (248.764 ms) : 0, 248764
GlobalTracer [candidate] (250.687 ms) : 0, 250687
AppSec [baseline] (32.409 ms) : 0, 32409
AppSec [candidate] (32.744 ms) : 0, 32744
Debugger [baseline] (59.105 ms) : 0, 59105
Debugger [candidate] (59.917 ms) : 0, 59917
Remote Config [baseline] (595.772 µs) : 0, 596
Remote Config [candidate] (601.85 µs) : 0, 602
Telemetry [baseline] (8.82 ms) : 0, 8820
Telemetry [candidate] (8.112 ms) : 0, 8112
Flare Poller [baseline] (8.48 ms) : 0, 8480
Flare Poller [candidate] (8.23 ms) : 0, 8230
section iast
crashtracking [baseline] (1.241 ms) : 0, 1241
crashtracking [candidate] (1.214 ms) : 0, 1214
BytebuddyAgent [baseline] (817.433 ms) : 0, 817433
BytebuddyAgent [candidate] (805.729 ms) : 0, 805729
AgentMeter [baseline] (11.563 ms) : 0, 11563
AgentMeter [candidate] (11.391 ms) : 0, 11391
GlobalTracer [baseline] (240.311 ms) : 0, 240311
GlobalTracer [candidate] (238.392 ms) : 0, 238392
AppSec [baseline] (28.437 ms) : 0, 28437
AppSec [candidate] (29.251 ms) : 0, 29251
Debugger [baseline] (62.536 ms) : 0, 62536
Debugger [candidate] (63.814 ms) : 0, 63814
Remote Config [baseline] (538.5 µs) : 0, 539
Remote Config [candidate] (523.891 µs) : 0, 524
Telemetry [baseline] (7.771 ms) : 0, 7771
Telemetry [candidate] (7.681 ms) : 0, 7681
Flare Poller [baseline] (3.395 ms) : 0, 3395
Flare Poller [candidate] (3.395 ms) : 0, 3395
IAST [baseline] (31.008 ms) : 0, 31008
IAST [candidate] (29.043 ms) : 0, 29043
Load
Summary: Found 0 performance improvements and 2 performance regressions! Performance is the same for 16 metrics, 18 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~633db7ca1c, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section baseline
no_agent (17.917 ms) : 17740, 18093
. : milestone, 17917,
appsec (18.695 ms) : 18506, 18885
. : milestone, 18695,
code_origins (17.86 ms) : 17684, 18037
. : milestone, 17860,
iast (17.752 ms) : 17577, 17926
. : milestone, 17752,
profiling (18.166 ms) : 17986, 18345
. : milestone, 18166,
tracing (17.649 ms) : 17475, 17824
. : milestone, 17649,
section candidate
no_agent (19.025 ms) : 18838, 19212
. : milestone, 19025,
appsec (18.788 ms) : 18600, 18976
. : milestone, 18788,
code_origins (17.922 ms) : 17752, 18092
. : milestone, 17922,
iast (17.77 ms) : 17596, 17944
. : milestone, 17770,
profiling (255.107 µs) : 244, 266
. : milestone, 255,
tracing (17.79 ms) : 17620, 17960
. : milestone, 17790,
Request duration reports for insecure-bank
```mermaid
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~633db7ca1c, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section baseline
no_agent (1.251 ms) : 1239, 1264
. : milestone, 1251,
iast (3.4 ms) : 3354, 3447
. : milestone, 3400,
iast_FULL (5.808 ms) : 5749, 5866
. : milestone, 5808,
iast_GLOBAL (3.703 ms) : 3641, 3764
. : milestone, 3703,
profiling (2.196 ms) : 2177, 2215
. : milestone, 2196,
tracing (1.897 ms) : 1881, 1912
. : milestone, 1897,
section candidate
no_agent (1.271 ms) : 1259, 1283
. : milestone, 1271,
iast (3.45 ms) : 3405, 3494
. : milestone, 3450,
iast_FULL (6.025 ms) : 5964, 6087
. : milestone, 6025,
iast_GLOBAL (3.735 ms) : 3679, 3792
. : milestone, 3735,
profiling (374.702 µs) : 367, 382
. : milestone, 375,
tracing (1.889 ms) : 1874, 1904
. : milestone, 1889,
```
Dacapo
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.
Execution time for biojava
```mermaid
gantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~633db7ca1c, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section baseline
no_agent (15.522 s) : 15522000, 15522000
. : milestone, 15522000,
appsec (14.738 s) : 14738000, 14738000
. : milestone, 14738000,
iast (18.471 s) : 18471000, 18471000
. : milestone, 18471000,
iast_GLOBAL (18.002 s) : 18002000, 18002000
. : milestone, 18002000,
profiling (14.981 s) : 14981000, 14981000
. : milestone, 14981000,
tracing (14.744 s) : 14744000, 14744000
. : milestone, 14744000,
section candidate
no_agent (15.524 s) : 15524000, 15524000
. : milestone, 15524000,
appsec (14.869 s) : 14869000, 14869000
. : milestone, 14869000,
iast (18.552 s) : 18552000, 18552000
. : milestone, 18552000,
iast_GLOBAL (18.318 s) : 18318000, 18318000
. : milestone, 18318000,
profiling (15.089 s) : 15089000, 15089000
. : milestone, 15089000,
tracing (14.814 s) : 14814000, 14814000
. : milestone, 14814000,
```
Execution time for tomcat
```mermaid
gantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~633db7ca1c, baseline=1.62.0-SNAPSHOT~c72f06780f
dateFormat X
axisFormat %s
section baseline
no_agent (1.486 ms) : 1474, 1497
. : milestone, 1486,
appsec (3.829 ms) : 3608, 4050
. : milestone, 3829,
iast (2.263 ms) : 2194, 2333
. : milestone, 2263,
iast_GLOBAL (2.325 ms) : 2255, 2395
. : milestone, 2325,
profiling (2.119 ms) : 2064, 2175
. : milestone, 2119,
tracing (2.078 ms) : 2024, 2131
. : milestone, 2078,
section candidate
no_agent (1.496 ms) : 1484, 1507
. : milestone, 1496,
appsec (3.827 ms) : 3605, 4048
. : milestone, 3827,
iast (2.276 ms) : 2207, 2346
. : milestone, 2276,
iast_GLOBAL (2.319 ms) : 2249, 2389
. : milestone, 2319,
profiling (2.099 ms) : 2044, 2154
. : milestone, 2099,
tracing (2.078 ms) : 2025, 2132
. : milestone, 2078,
```
- Swap log messages in DatadogProfiler set/clear span context
- Fix timestamp precision loss (millis→nanos) in JfrToOtlpConverter
- Fix TOCTOU: move convert() inside try block before temp file deletion
- Remove dead handleRecordingData/handleRecordingDataWithDump methods
- Fix retain leak in OTLP listener lambda on upload exception
- Fix retain/release TOCTOU race in RecordingData with synchronized
- Eliminate URL derivation duplication in OtlpProfileUploader
- Fix pipe exit code masking in convert-jfr.sh
- Clarify DatadogProfilingScope.close() no-op comment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
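The two retain/release fixes above follow a common reference-counting pattern: perform the released-check and the increment under one lock so a concurrent release cannot slip in between, and release in a `finally` block so an upload exception cannot leak a reference. The sketch below assumes a plain integer count; `RefCountedData` and `UploadListenerSketch` are hypothetical names, not the PR's `RecordingData` or its OTLP listener.

```java
import java.util.function.Consumer;

/** Hypothetical ref-counted holder; not the PR's RecordingData. */
final class RefCountedData {
  private int refCount = 1; // the creator holds the first reference
  private boolean released;

  /** Takes another reference; check and increment happen under one lock. */
  synchronized RefCountedData retain() {
    if (released) {
      throw new IllegalStateException("already released");
    }
    refCount++;
    return this;
  }

  /** Drops one reference and frees resources when the count reaches zero. */
  synchronized void release() {
    if (released) {
      return; // extra calls after the final release are ignored
    }
    if (--refCount == 0) {
      released = true;
      // free underlying buffers / delete temp files here
    }
  }

  synchronized boolean isReleased() {
    return released;
  }
}

/** Hypothetical listener: holds a reference only for the duration of the upload. */
final class UploadListenerSketch {
  static void onData(RefCountedData data, Consumer<RefCountedData> upload) {
    RefCountedData held = data.retain();
    try {
      upload.accept(held);
    } finally {
      held.release(); // runs whether the upload succeeds or throws
    }
  }
}
```

Using `synchronized` keeps the sketch simple; an `AtomicInteger` with a compare-and-set loop is the usual lock-free alternative.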
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
convertStackTrace signature changed; replace reflection with file-based API

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Share ParsingContext across parse calls to avoid ServiceLoader per open
- Cache FunctionKey and LocationKey hashCode (was boxing+Object[] per lookup)
- Reuse ProtobufEncoder across convert() calls via reset()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
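Caching the hash as described in this commit means computing it once from the primitive fields in the constructor, instead of calling `Objects.hash(...)`, which boxes each `int` and allocates a varargs `Object[]` on every lookup. A minimal sketch; `FunctionKeySketch` and its three index fields are assumptions, not the PR's `FunctionKey`.

```java
/** Hypothetical dictionary key; fields are final, so the cached hash never goes stale. */
final class FunctionKeySketch {
  final int nameIndex;
  final int systemNameIndex;
  final int filenameIndex;
  private final int hash;

  FunctionKeySketch(int nameIndex, int systemNameIndex, int filenameIndex) {
    this.nameIndex = nameIndex;
    this.systemNameIndex = systemNameIndex;
    this.filenameIndex = filenameIndex;
    // Primitive arithmetic: no boxing and no varargs array, computed once.
    int h = nameIndex;
    h = 31 * h + systemNameIndex;
    h = 31 * h + filenameIndex;
    this.hash = h;
  }

  @Override
  public int hashCode() {
    return hash;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof FunctionKeySketch)) return false;
    FunctionKeySketch k = (FunctionKeySketch) o;
    // Comparing the cached hash first is a cheap early exit for mismatches.
    return hash == k.hash
        && nameIndex == k.nameIndex
        && systemNameIndex == k.systemNameIndex
        && filenameIndex == k.filenameIndex;
  }
}
```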
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Overview
This PR implements a complete JFR to OTLP (OpenTelemetry Protocol) profiles converter with comprehensive validation, performance optimizations, and a high-performance CLI tool.
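Conceptually, the converter keys each supported JFR event type to an OTLP sample type and unit. A minimal lookup-table sketch built from the event list under JFR Event Type Support, reading each `x/y` pair there as type/unit; `SampleType` and `EventMapping` are hypothetical names, and a real converter would also extract stacks, timestamps, and attributes from each event.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical (type, unit) pair, mirroring an OTLP sample type's type/unit strings. */
final class SampleType {
  final String type;
  final String unit;

  SampleType(String type, String unit) {
    this.type = type;
    this.unit = unit;
  }
}

/** Hypothetical event-name lookup; unknown events return null and are skipped. */
final class EventMapping {
  private static final Map<String, SampleType> BY_EVENT = new HashMap<>();

  static {
    BY_EVENT.put("datadog.ExecutionSample", new SampleType("cpu", "samples"));
    BY_EVENT.put("datadog.MethodSample", new SampleType("wall", "samples"));
    BY_EVENT.put("datadog.ObjectSample", new SampleType("alloc-samples", "bytes"));
    // Both monitor events feed the same lock-contention profile.
    SampleType lock = new SampleType("lock-contention", "nanoseconds");
    BY_EVENT.put("jdk.JavaMonitorEnter", lock);
    BY_EVENT.put("jdk.JavaMonitorWait", lock);
  }

  static SampleType forEvent(String eventName) {
    return BY_EVENT.get(eventName);
  }
}
```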
Core Implementation
OTLP Profiles Format Support
`{stack_index, attribute_indices, link_index}`
JFR Event Type Support
- `datadog.ExecutionSample` → cpu/samples
- `datadog.MethodSample` → wall/samples
- `datadog.ObjectSample` → alloc-samples/bytes with `objectClass` attributes
- `jdk.JavaMonitorEnter`, `jdk.JavaMonitorWait` → lock-contention/nanoseconds

Performance Optimizations
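One optimization in this PR reuses the `ProtobufEncoder` across `convert()` calls via `reset()` rather than allocating a fresh buffer each time. A minimal sketch of that idea, assuming a growable `byte[]` backing store; `ReusableEncoder` and its methods are hypothetical, though the varint and tag layout follow the standard protobuf wire format.

```java
import java.util.Arrays;

/** Hypothetical protobuf writer that keeps its backing array between uses. */
final class ReusableEncoder {
  private byte[] buf = new byte[256];
  private int pos;

  /** Rewinds to empty; the backing array and its grown capacity are kept. */
  void reset() {
    pos = 0;
  }

  private void ensure(int extra) {
    if (pos + extra > buf.length) {
      buf = Arrays.copyOf(buf, Math.max(buf.length * 2, pos + extra));
    }
  }

  /** Writes an unsigned base-128 varint, least significant group first. */
  void writeVarint(long value) {
    ensure(10); // a 64-bit varint needs at most 10 bytes
    while ((value & ~0x7FL) != 0) {
      buf[pos++] = (byte) ((value & 0x7F) | 0x80); // set continuation bit
      value >>>= 7;
    }
    buf[pos++] = (byte) value;
  }

  /** Writes a field tag: (fieldNumber << 3) | wireType. */
  void writeTag(int fieldNumber, int wireType) {
    writeVarint(((long) fieldNumber << 3) | wireType);
  }

  /** Returns a copy of the bytes written since the last reset(). */
  byte[] toByteArray() {
    return Arrays.copyOf(buf, pos);
  }
}
```

After the first few conversions the array has grown to the typical payload size, so later calls mostly avoid reallocation; the trade-off is that one instance is not thread-safe and keeps its peak capacity.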
Original Payload Support
- `original_payload` + `original_payload_format` fields
- (`SequenceInputStream`)

Usage
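The original-payload support references `SequenceInputStream`, which presents several input streams as one, read back to back. A minimal sketch of that building block, assuming the raw recording is assembled from multiple parts before being placed in `original_payload`; `PayloadConcat` is a hypothetical helper, not the PR's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for streaming several parts as one payload. */
final class PayloadConcat {
  /**
   * Presents the parts as a single stream without first buffering them into one
   * array. SequenceInputStream reads each part to EOF, closes it, then moves on.
   */
  static InputStream concat(List<InputStream> parts) {
    return new SequenceInputStream(Collections.enumeration(parts));
  }

  /** Drains a stream into a byte array (for example, to fill a bytes field). */
  static byte[] readAll(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) {
      out.write(chunk, 0, n);
    }
    return out.toByteArray();
  }
}
```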
Validation & Compliance
CLI Tool
Features
Usage
References
Note: R&D work for OTLP profile support. May still be changing.
🤖 Generated with Claude Code