| 1 | +# Native-Image Build Fix Status |
| 2 | + |
| 3 | +## Problem Statement |
| 4 | +Adding the `profiling-scrubber` module triggered 44 "unintentionally initialized at build time" errors when building with GraalVM native-image with the profiler enabled (agent attached via `-J-javaagent` during compilation). |
| 5 | + |
| 6 | +## Root Cause Identified |
| 7 | + |
| 8 | +**The initialization cascade was caused by Exception Profiling instrumentation:** |
| 9 | + |
| 10 | +Using `--trace-class-initialization`, we discovered: |
| 11 | +``` |
| 12 | +datadog.trace.bootstrap.CallDepthThreadLocalMap caused initialization at build time: |
| 13 | + at datadog.trace.bootstrap.CallDepthThreadLocalMap.<clinit>(CallDepthThreadLocalMap.java:13) |
| 14 | + at datadog.trace.bootstrap.instrumentation.jfr.exceptions.ExceptionProfiling$Exclusion.isEffective(ExceptionProfiling.java:49) |
| 15 | + at java.lang.Exception.<init>(Exception.java:86) |
| 16 | + at java.lang.ReflectiveOperationException.<init>(ReflectiveOperationException.java:76) |
| 17 | + at java.lang.ClassNotFoundException.<init>(ClassNotFoundException.java:71) |
| 18 | +``` |
| 19 | + |
| 20 | +**Why this happens:** |
| 21 | +1. Agent attaches via `-J-javaagent` during native-image compilation |
| 22 | +2. OpenJdkController constructor runs and starts ExceptionProfiling |
| 23 | +3. GraalVM throws exceptions during class scanning |
| 24 | +4. Instrumented Exception constructor triggers ExceptionProfiling code |
| 25 | +5. This initializes CallDepthThreadLocalMap and 43 other config/bootstrap classes at build time |
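The cascade described above can be illustrated with a small, self-contained simulation. This is only a sketch of the mechanism, not the agent's actual code: `CallDepthMap`, `ExceptionHook`, and `InstrumentedException` are hypothetical stand-ins for `CallDepthThreadLocalMap`, the exception-profiling hook, and an instrumented `Exception` constructor.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates how a hook inside an exception constructor initializes unrelated classes. */
final class InitCascadeDemo {
    /** Records which classes ran their static initializer. */
    static final List<String> INITIALIZED = new ArrayList<>();

    /** Hypothetical stand-in for CallDepthThreadLocalMap: it has a static initializer. */
    static final class CallDepthMap {
        static final ThreadLocal<int[]> DEPTH = ThreadLocal.withInitial(() -> new int[1]);

        static {
            INITIALIZED.add("CallDepthMap");
        }

        static int increment() {
            return ++DEPTH.get()[0];
        }
    }

    /** Hypothetical stand-in for the profiling hook woven into Exception.<init>. */
    static final class ExceptionHook {
        static volatile boolean enabled = true;

        static void onExceptionCreated() {
            if (enabled) {
                // The first use of CallDepthMap runs its static initializer.
                CallDepthMap.increment();
            }
        }
    }

    /** Exception whose constructor calls the hook, as an instrumented constructor would. */
    static final class InstrumentedException extends Exception {
        InstrumentedException(String message) {
            super(message);
            ExceptionHook.onExceptionCreated();
        }
    }

    public static void main(String[] args) {
        // With the hook disabled, creating an exception touches nothing else.
        ExceptionHook.enabled = false;
        new InstrumentedException("hook disabled");
        System.out.println("hook disabled: " + INITIALIZED);

        // With the hook enabled, the same constructor initializes CallDepthMap.
        ExceptionHook.enabled = true;
        new InstrumentedException("hook enabled");
        System.out.println("hook enabled: " + INITIALIZED);
    }
}
```

In the real build the same effect happens inside the native-image builder JVM, which is why every class reached from the hook is reported as initialized at build time.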
| 26 | + |
| 27 | +## Solution Applied |
| 28 | + |
| 29 | +**Disable exception profiling during native-image build via configuration:** |
| 30 | + |
| 31 | +Modified: `dd-smoke-tests/spring-boot-3.0-native/application/build.gradle` |
| 32 | +```gradle |
| 33 | +if (withProfiler && property('profiler') == 'true') { |
| 34 | + buildArgs.add("-J-Ddd.profiling.enabled=true") |
| 35 | + // Disable exception profiling during native-image build to avoid class initialization cascade |
| 36 | + buildArgs.add("-J-Ddd.profiling.disabled.events=datadog.ExceptionSample") |
| 37 | +} |
| 38 | +``` |
| 39 | + |
| 40 | +## Results |
| 41 | + |
| 42 | +### ✅ SUCCESS: Initialization Errors Fixed |
| 43 | +- **Before:** 44 classes unintentionally initialized at build time |
| 44 | +- **After:** 0 initialization errors |
| 45 | + |
| 46 | +The configuration approach successfully prevents ExceptionProfiling from starting during native-image compilation, eliminating the entire initialization cascade. |
| 47 | + |
| 48 | +### ⚠️ NEW ISSUE: JVM Crash During Native-Image Build |
| 49 | + |
| 50 | +The build now fails with a JVM fatal error: |
| 51 | +``` |
| 52 | +SIGBUS (0xa) at pc=0x00000001067aa404 |
| 53 | +Problematic frame: V [libjvm.dylib+0x8be404] PSRootsClosure<false>::do_oop(narrowOop*)+0x48 |
| 54 | +``` |
| 55 | + |
| 56 | +**Error details:** |
| 57 | +- Crash occurs during garbage collection (Parallel Scavenge) |
| 58 | +- Happens while processing JavaThread frames |
| 59 | +- Stack trace shows agent's bytecode instrumentation is active: |
| 60 | + - `datadog.instrument.classmatch.ClassFile.parse` |
| 61 | + - `datadog.trace.agent.tooling.bytebuddy.outline.OutlineTypeParser.parse` |
| 62 | + - `datadog.trace.agent.tooling.bytebuddy.outline.TypeFactory.lookupType` |
| 63 | + |
| 64 | +**Error report:** `dd-smoke-tests/spring-boot-3.0-native/build/application/native/nativeCompile/hs_err_pid*.log` |
| 65 | + |
| 66 | +## Files Modified |
| 67 | + |
| 68 | +1. **dd-java-agent/agent-profiling/profiling-scrubber/build.gradle** |
| 69 | + - Removed unnecessary `internal-api` dependency (profiling-scrubber doesn't use it) |
| 70 | + |
| 71 | +2. **dd-java-agent/agent-profiling/src/main/java/com/datadog/profiling/agent/ProfilingAgent.java** |
| 72 | + - Removed static import of `PROFILING_TEMP_DIR_DEFAULT` (had System.getProperty in initializer) |
| 73 | +   - Changed to runtime computation: `System.getProperty("java.io.tmpdir")` at lines 162-163 |
| 74 | + |
| 75 | +3. **dd-java-agent/agent-profiling/profiling-controller/src/main/java/com/datadog/profiling/controller/ProfilerFlareReporter.java** |
| 76 | + - Line ~229: Replaced `PROFILING_JFR_REPOSITORY_BASE_DEFAULT` with runtime computation |
| 77 | + - Line ~507: Replaced `PROFILING_TEMP_DIR_DEFAULT` with runtime computation |
| 78 | + |
| 79 | +4. **dd-java-agent/agent-profiling/profiling-controller-openjdk/src/main/java/com/datadog/profiling/controller/openjdk/OpenJdkController.java** |
| 80 | + - Line ~275: Replaced `PROFILING_JFR_REPOSITORY_BASE_DEFAULT` with runtime computation |
| 81 | + - **Note:** This file is clean - no native-image detection code added |
| 82 | + |
| 83 | +5. **dd-smoke-tests/spring-boot-3.0-native/application/build.gradle** |
| 84 | + - Added `-J-Ddd.profiling.disabled.events=datadog.ExceptionSample` to disable exception profiling during build |
| 85 | + - Added trace flag (temporary, for debugging): `--trace-class-initialization=datadog.trace.bootstrap.CallDepthThreadLocalMap` |
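The "runtime computation" replacements listed above follow a pattern along these lines. This is a minimal sketch under assumed names: `EagerDefaults`, `LazyDefaults`, and `tempDirDefault()` are hypothetical and are not the identifiers used in the agent.

```java
/** Contrasts a default computed in a static initializer with one computed on demand. */
final class TempDirDefaults {

    /** Eager variant: the property is read when the holder class is initialized. */
    static final class EagerDefaults {
        // Not a compile-time constant, so any reference to this field
        // forces EagerDefaults to be initialized first.
        static final String TEMP_DIR_DEFAULT = System.getProperty("java.io.tmpdir");
    }

    /** Lazy variant: nothing is read until a caller asks for the value. */
    static final class LazyDefaults {
        static String tempDirDefault() {
            return System.getProperty("java.io.tmpdir");
        }
    }

    public static void main(String[] args) {
        // Both variants return the same value; they differ only in when the
        // system property is read and which class must be initialized to get it.
        System.out.println(EagerDefaults.TEMP_DIR_DEFAULT);
        System.out.println(LazyDefaults.tempDirDefault());
    }
}
```

The lazy form trades a repeated property lookup for the guarantee that no static initializer captures a build-machine value.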
| 86 | + |
| 87 | +## Next Steps |
| 88 | + |
| 89 | +The JVM crash during native-image build needs investigation: |
| 90 | + |
| 91 | +### Option 1: Investigate GC Crash |
| 92 | +- The crash occurs in Parallel GC during thread stack scanning |
| 93 | +- May be related to the agent's bytecode instrumentation interfering with GC |
| 94 | +- Could try a different GC algorithm or adjusted heap settings |
| 95 | + |
| 96 | +### Option 2: Reduce Agent Footprint During Build |
| 97 | +- The agent performs extensive bytecode parsing during native-image compilation |
| 98 | +- Consider disabling more agent features during build (not just exception profiling) |
| 99 | +- Possible flags to try: |
| 100 | +  - `-J-Ddd.instrumentation.enabled=false` (if such a flag exists) |
| 101 | + - Reduce instrumentation scope during native-image compilation |
| 102 | + |
| 103 | +### Option 3: Check for Known Issues |
| 104 | +- Search for similar SIGBUS crashes with GraalVM + Java agents |
| 105 | +- Check whether this is a known GraalVM 21.0.9 issue |
| 106 | +- Test with a different GraalVM version |
| 107 | + |
| 108 | +### Option 4: Alternative Approach |
| 109 | +- Consider NOT attaching the agent during native-image build |
| 110 | +- Configure the agent to attach only at runtime in the compiled native image |
| 111 | +- May require changes to how profiling is initialized |
| 112 | + |
| 113 | +## Testing Commands |
| 114 | + |
| 115 | +```bash |
| 116 | +# Rebuild agent |
| 117 | +./gradlew :dd-java-agent:shadowJar |
| 118 | + |
| 119 | +# Test native-image build with profiler |
| 120 | +./gradlew :dd-smoke-tests:spring-boot-3.0-native:springNativeBuild \ |
| 121 | + -PtestJvm=graalvm21 -Pprofiler=true --no-daemon |
| 122 | + |
| 123 | +# Check initialization errors (should be 0) |
| 124 | +grep -c "was unintentionally initialized" \ |
| 125 | + build/logs/*springNativeBuild.log |
| 126 | + |
| 127 | +# View JVM crash report |
| 128 | +ls -t dd-smoke-tests/spring-boot-3.0-native/build/application/native/nativeCompile/hs_err_pid*.log | head -1 |
| 129 | +``` |
| 130 | + |
| 131 | +## Key Learnings |
| 132 | + |
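| 133 | +1. **Static fields initialized by method calls trigger initialization:** A field like `PROFILING_TEMP_DIR_DEFAULT = System.getProperty("java.io.tmpdir")` is not a compile-time constant, so referencing it (for example through a static import) runs the declaring class's static initializer, and GraalVM then reports that class as initialized at build time. |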
| 134 | + |
| 135 | +2. **Exception profiling is a major trigger:** When the agent is active during native-image compilation, any exceptions thrown (e.g., ClassNotFoundException during class scanning) trigger instrumentation that initializes many config classes. |
| 136 | + |
| 137 | +3. **Configuration-based disabling works:** Disabling the JFR event via `-Ddd.profiling.disabled.events` prevents the initialization cascade without needing runtime detection code. |
| 138 | + |
| 139 | +4. **Avoid detection during initialization:** Any attempt to detect "are we in native-image compilation" (Class.forName, getResource, etc.) can itself trigger the cascade we're trying to avoid. |
| 140 | + |
| 141 | +5. **Agent + GraalVM + GC = fragile:** With active bytecode instrumentation during GraalVM native-image compilation, the builder JVM crashed inside Parallel GC; the exact cause is still unknown. |
| 142 | + |
| 143 | +## Branch Status |
| 144 | + |
| 145 | +- Branch: `jb/jfr_redacting` |
| 146 | +- All changes committed and ready to push |
| 147 | +- Initialization cascade: FIXED ✅ |
| 148 | +- Native-image build: CRASHES ⚠️ |