The HotSpot JVM (Java Virtual Machine) is the primary execution engine for Java
applications. Developed by Sun Microsystems and now maintained by Oracle, HotSpot
is the default JVM included in the Oracle JDK and OpenJDK distributions. It earned
its name from its ability to identify and optimize "hot spots" in code—sections
that execute frequently—through Just-In-Time (JIT) compilation.
HotSpot revolutionized Java performance by introducing adaptive optimization
techniques that allow Java applications to achieve near-native execution speeds.
Unlike static compilers that optimize code before execution, HotSpot continuously
monitors application behavior at runtime and applies targeted optimizations to
the most performance-critical code paths.
Understanding HotSpot's internals is essential for Java developers who want to
write high-performance applications, tune JVM settings for production workloads,
or diagnose performance issues. This document provides a comprehensive overview
of HotSpot's architecture, execution model, garbage collection, memory management,
and performance tuning techniques.
HotSpot's origins trace back to the mid-1990s at Longview Technologies, which
was acquired by Sun Microsystems in 1997. The first HotSpot JVM shipped in 1999
as an add-on for Java 1.2, and it became the default JVM starting with Java 1.3.
Over the years,
HotSpot has evolved significantly, incorporating modern garbage collectors,
tiered compilation, and integration with advanced technologies like GraalVM.
HotSpot's architecture is designed to balance startup performance with peak
throughput. The JVM consists of several key components that work together to
load, verify, compile, and execute Java bytecode efficiently.
┌─────────────────────────────────────────────────────────────────────────────┐
│ HotSpot JVM Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Class Loader Subsystem │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Bootstrap │ │ Extension │ │ Application │ │ │
│ │ │ Loader │ │ Loader │ │ Loader │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Runtime Data Areas │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ Method │ │ Heap │ │ Per-Thread Areas │ │ │
│ │ │ Area │ │ (Objects) │ │ ┌───────┐ ┌────────────┐ │ │ │
│ │ │ (Metaspace) │ │ │ │ │ Stack │ │ PC Register│ │ │ │
│ │ └─────────────┘ └─────────────┘ │ └───────┘ └────────────┘ │ │ │
│ │ └─────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Execution Engine │ │
│ │ ┌─────────────┐ ┌─────────────────────────────┐ ┌─────────────┐ │ │
│ │ │ Interpreter │ │ JIT Compiler │ │ Garbage │ │ │
│ │ │ │ │ ┌─────┐ ┌─────────┐ │ │ Collector │ │ │
│ │ │ │ │ │ C1 │ │ C2 │ │ │ │ │ │
│ │ │ │ │ │Client │ Server │ │ │ │ │ │
│ │ └─────────────┘ │ └─────┘ └─────────┘ │ └─────────────┘ │ │
│ │ └─────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The class loader subsystem is responsible for loading, linking, and initializing
Java classes. HotSpot uses a hierarchical class loading model with three main
class loaders that work together using the delegation principle.
| Loader Type | Responsibility |
|---|---|
| Bootstrap | Loads core Java classes from rt.jar (Java 8 and earlier) or core modules such as java.base (Java 9+) |
| Extension/Platform | Extension loader (through Java 8) loads from extension directories; replaced by the platform loader (Java 9+), which loads platform modules |
| Application | Loads classes from the application classpath and module path |
When loading a class, each loader first delegates to its parent. Only if the
parent cannot find the class does the child attempt to load it. This ensures
that core Java classes are always loaded by the bootstrap loader, preventing
malicious code from replacing critical system classes.
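The delegation chain can be inspected at runtime. Core classes report a null loader (the bootstrap loader has no Java-level representation), and walking getParent() from the application loader reaches the platform loader. A minimal sketch:

```java
void main() {
    // Core classes are loaded by the bootstrap loader, which the
    // Java API represents as null
    IO.println("String loaded by: " + String.class.getClassLoader());

    // java.sql lives in a platform module, loaded by the platform loader
    IO.println("java.sql.Connection loaded by: "
            + java.sql.Connection.class.getClassLoader());

    // Walk the delegation chain upward from the application loader;
    // a parent of null means we reached the bootstrap loader
    ClassLoader loader = ClassLoader.getSystemClassLoader();
    while (loader != null) {
        IO.println("Loader: " + loader.getName());
        loader = loader.getParent();
    }
}
```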
HotSpot manages several memory regions during program execution:
| Memory Area | Shared? | Purpose |
|---|---|---|
| Heap | Yes | Stores all object instances |
| Metaspace | Yes | Stores class metadata, replaces PermGen |
| Code Cache | Yes | Stores JIT-compiled native code |
| Thread Stacks | No | Per-thread call stacks and local variables |
| PC Registers | No | Per-thread program counter for bytecode execution |
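The per-thread nature of stacks can be demonstrated by exhausting one thread's stack without disturbing the rest of the JVM. This sketch requests a small stack through the Thread constructor; note that the requested size is only a hint and its effect is highly platform dependent:

```java
void main() throws InterruptedException {
    // Each thread gets its own stack; deep recursion exhausts only
    // that thread's stack, not the shared heap
    Thread t = new Thread(null, () -> {
        try {
            recurse(0);
        } catch (StackOverflowError e) {
            IO.println("Stack exhausted in worker thread");
        }
    }, "small-stack", 256 * 1024); // request a 256 KB stack (a hint only)
    t.start();
    t.join();
    IO.println("Main thread unaffected: its own stack is intact");
}

int recurse(int depth) {
    // Unbounded recursion; each call adds a frame to this thread's stack
    return recurse(depth + 1);
}
```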
The execution engine transforms bytecode into machine code through multiple stages:
Interpreter: Provides immediate execution of bytecode without compilation
overhead. Essential for fast startup and rarely-executed code paths.
C1 Compiler (Client): A fast compiler that produces moderately optimized
code quickly. Ideal for methods that need optimization sooner but may not
benefit from expensive analysis.
C2 Compiler (Server): A highly optimizing compiler that produces the fastest
code but takes longer to compile. Applied to "hot" methods that benefit from
aggressive optimizations like inlining and loop unrolling.
Garbage Collector: Automatically reclaims memory from objects that are no
longer reachable. HotSpot offers multiple GC algorithms optimized for different
workload characteristics.
HotSpot uses an adaptive execution model that balances startup performance with
peak throughput. When a Java application starts, methods are initially executed
by the interpreter. As methods are called repeatedly, they become candidates for
JIT compilation.
The interpreter executes Java bytecode instruction by instruction. While slower
than compiled code, interpretation provides several benefits:
- Fast startup: No compilation delay before execution begins
- Low memory footprint: No need to store compiled code
- Accurate profiling: Execution counters guide optimization decisions
- Debugging support: Easier to maintain source-level debugging
void main() {
    // This method starts in the interpreter
    // Each bytecode instruction is decoded and executed
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += computeValue(i);
    }
    IO.println("Sum: " + sum);
}

int computeValue(int x) {
    // If called frequently, this becomes a "hot" method
    // and will be compiled by the JIT compiler
    return x * x + 2 * x + 1;
}

During interpretation, HotSpot maintains invocation counters for methods and
back-edge counters for loops. When these counters exceed configured thresholds,
the method becomes eligible for JIT compilation.
Modern HotSpot uses tiered compilation, which combines the fast compilation of
C1 with the high-quality code of C2. Methods progress through compilation tiers
based on their execution frequency and profiling data.
┌─────────────────────────────────────────────────────────────────────────────┐
│ Tiered Compilation Levels │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Level 0: Interpreter │
│ │ │
│ │ Method invocation count exceeds threshold │
│ ▼ │
│ Level 1: C1 with Full Optimization (no profiling) │
│ │ - Used for trivial methods │
│ │ │
│ Level 2: C1 with Limited Profiling │
│ │ - Invocation and back-edge counters │
│ ▼ │
│ Level 3: C1 with Full Profiling │
│ │ - Type profiling, branch profiling │
│ │ - Used to collect data for C2 │
│ ▼ │
│ Level 4: C2 with Full Optimization │
│ - Maximum optimization based on profiling data │
│ - Speculative optimizations with deoptimization guards │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
| Level | Compiler | Profiling | Use Case |
|---|---|---|---|
| 0 | None | Basic | Initial execution, cold code |
| 1 | C1 | None | Trivial methods, getters/setters |
| 2 | C1 | Limited | Limited profiling for quick warmup |
| 3 | C1 | Full | Collecting data for C2 compilation |
| 4 | C2 | Uses L3 | Hot methods requiring maximum speed |
HotSpot's adaptive optimization continuously monitors application behavior and
adjusts compilation strategies accordingly. The JVM collects several types of
profiling information:
Invocation Counts: How often each method is called
Back-Edge Counts: How many loop iterations execute
Type Profiles: Actual types of objects at call sites
Branch Profiles: Which branches are taken and how often
void main() {
    // Type profiling example
    List<String> items = new ArrayList<>();
    items.add("one");
    items.add("two");
    items.add("three");
    // HotSpot profiles that 'items' is always ArrayList
    // This enables virtual call devirtualization
    int totalLength = 0;
    for (var item : items) {
        totalLength += item.length();
    }
    IO.println("Total length: " + totalLength);
}

The JIT compiler uses profiling data to make speculative optimizations. If the
runtime behavior changes and an optimization becomes invalid, HotSpot can
deoptimize—reverting compiled code back to interpretation and recompiling
with updated profiling information.
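This situation can be provoked deliberately: warm up a call site with a single receiver type, then introduce a second one. Run with -XX:+PrintCompilation and the compiled method is typically marked "made not entrant" before being recompiled. The Shape, Circle, and Square types below are hypothetical examples for this sketch, not part of any library:

```java
interface Shape {
    double area();
}

record Circle(double r) implements Shape {
    public double area() { return Math.PI * r * r; }
}

record Square(double s) implements Shape {
    public double area() { return s * s; }
}

double totalArea(Shape shape) {
    // While only one receiver type has been observed here, the JIT can
    // speculate on that type, devirtualize the call, and inline area()
    return shape.area();
}

void main() {
    double sum = 0;
    // Warmup: the call site inside totalArea sees only Circle
    for (int i = 0; i < 200_000; i++) {
        sum += totalArea(new Circle(1.0));
    }
    // A second type appears: the type guard in the compiled code fails,
    // HotSpot deoptimizes, and later recompiles with bimorphic dispatch
    for (int i = 0; i < 200_000; i++) {
        sum += totalArea(new Square(2.0));
    }
    IO.println("Total area: " + sum);
}
```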
The JIT compiler is the heart of HotSpot's performance. It translates Java
bytecode into highly optimized native machine code at runtime, achieving
performance comparable to statically compiled languages.
HotSpot identifies "hot" methods through execution counters. Methods that exceed
compilation thresholds are queued for background compilation. The thresholds are
configurable via JVM flags:
| Flag | Default | Description |
|---|---|---|
| -XX:CompileThreshold | 10000 | Invocations before C2 compile (non-tiered mode) |
| -XX:Tier3InvocationThreshold | 200 | Invocations before C1 level 3 |
| -XX:Tier4InvocationThreshold | 5000 | Invocations before C2 from level 3 |
HotSpot's JIT compilers apply numerous optimizations to improve performance:
Inlining: Replaces method calls with the method body, eliminating call
overhead and enabling further optimizations.
void main() {
    long sum = 0;
    // Without inlining: method call overhead for each iteration
    // With inlining: getSquare code is inserted directly into loop
    for (int i = 0; i < 1000000; i++) {
        sum += getSquare(i);
    }
    IO.println("Sum of squares: " + sum);
}

// This small method is an ideal inlining candidate
int getSquare(int n) {
    return n * n;
}

Escape Analysis: Determines if objects are accessible outside their
allocating scope. Non-escaping objects can be stack-allocated or eliminated
entirely.
void main() {
    long total = 0;
    for (int i = 0; i < 1000000; i++) {
        // The Point object doesn't escape this method
        // HotSpot may eliminate the allocation entirely
        var point = new Point(i, i * 2);
        total += point.distance();
    }
    IO.println("Total distance: " + total);
}

record Point(int x, int y) {
    double distance() {
        return Math.sqrt(x * x + y * y);
    }
}

Loop Unrolling: Replicates loop bodies to reduce loop overhead and enable
instruction-level parallelism.
Dead Code Elimination: Removes code that has no effect on program output.
Constant Folding: Evaluates constant expressions at compile time.
Bounds Check Elimination: Removes redundant array bounds checks when the
compiler can prove safety.
Null Check Elimination: Removes null checks when the compiler can prove
non-nullness.
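Bounds check elimination is easiest to see in a canonical counted loop. Whether the checks are actually removed is up to the JIT and cannot be observed from plain Java, so this sketch only shows the code shape that makes elimination possible, contrasted with an access the compiler must still guard:

```java
long sumAll(int[] data) {
    long sum = 0;
    // The loop bound is data.length, so the JIT can prove every index
    // is in range and eliminate the per-access bounds checks
    for (int i = 0; i < data.length; i++) {
        sum += data[i];
    }
    return sum;
}

void main() {
    int[] data = new int[1_000_000];
    for (int i = 0; i < data.length; i++) {
        data[i] = i;
    }
    IO.println("Sum: " + sumAll(data));

    // An access the compiler cannot prove safe keeps its bounds check
    try {
        IO.println("Value: " + data[data.length]); // deliberately out of range
    } catch (ArrayIndexOutOfBoundsException e) {
        IO.println("Bounds check caught index " + data.length);
    }
}
```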
Monitor JIT compilation activity with these flags:
-XX:+PrintCompilation # Log compilation events
-XX:+UnlockDiagnosticVMOptions # Enable diagnostic options
-XX:+PrintInlining # Show inlining decisions
-XX:+LogCompilation # Generate detailed XML logs
void main() {
    // Run with -XX:+PrintCompilation to see compilation output
    // Methods will show compilation tier and timing
    var startTime = System.nanoTime();
    long result = 0;
    for (int i = 0; i < 10_000_000; i++) {
        result += fibonacci(20);
    }
    var endTime = System.nanoTime();
    var durationMs = (endTime - startTime) / 1_000_000;
    IO.println("Result: " + result);
    IO.println("Duration: " + durationMs + " ms");
}

int fibonacci(int n) {
    if (n <= 1) return n;
    return fibonacci(n - 1) + fibonacci(n - 2);
}

After repeated invocations, the fibonacci method will be compiled by C2,
resulting in significantly faster execution compared to interpretation.
Garbage collection (GC) is the process of automatically reclaiming memory
occupied by objects that are no longer reachable by the application. HotSpot
provides multiple garbage collectors, each optimized for different workload
characteristics.
HotSpot's garbage collectors are based on the generational hypothesis:
- Most objects die young (are short-lived)
- Few references from old objects to young objects exist
This observation enables efficient collection by focusing on the young
generation, where most garbage resides.
┌─────────────────────────────────────────────────────────────────────────────┐
│ Heap Memory Structure │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────┐ ┌───────────────────────────────┐ │
│ │ Young Generation │ │ Old Generation │ │
│ │ ┌───────┐ ┌───────┐ ┌───────┐ │ │ │ │
│ │ │ Eden │ │ S0 │ │ S1 │ │ │ Tenured/Old Objects │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ New │ │Survivor│Survivor│ │ │ (Long-lived objects that │ │
│ │ │Objects│ │ From │ │ To │ │ │ survived multiple GCs) │ │
│ │ └───────┘ └───────┘ └───────┘ │ │ │ │
│ └───────────────────────────────────┘ └───────────────────────────────┘ │
│ │
│ Object lifecycle: │
│ 1. New objects allocated in Eden │
│ 2. Surviving objects move to Survivor spaces │
│ 3. After aging, objects promote to Old generation │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
| Collector | Young GC | Old GC | Pause Goal | Best For |
|---|---|---|---|---|
| Serial GC | Serial | Serial | None | Small apps, testing |
| Parallel GC | Parallel | Parallel | Throughput | Batch processing |
| G1 GC | Parallel | Concurrent | Predictable | General purpose |
| ZGC | Concurrent | Concurrent | Sub-millisecond | Low latency, large heaps |
| Shenandoah | Concurrent | Concurrent | Low latency | Low latency |
Serial GC (-XX:+UseSerialGC): Single-threaded collector suitable for
single-processor machines or applications with small heaps and minimal pause
time requirements.
Parallel GC (-XX:+UseParallelGC): Multi-threaded collector optimized for
throughput. Performs stop-the-world collections using multiple threads.
G1 GC (-XX:+UseG1GC): The default collector since Java 9. Divides the
heap into regions and collects garbage incrementally to meet pause time targets.
ZGC (-XX:+UseZGC): Scalable, low-latency collector with pause times
typically under a millisecond, largely independent of heap size. Ideal for
large heaps and latency-sensitive applications.
Shenandoah (-XX:+UseShenandoahGC): Similar to ZGC with concurrent
compaction. Available in OpenJDK but not Oracle JDK.
void main() {
    // Demonstrate memory allocation patterns
    var objects = new ArrayList<byte[]>();
    IO.println("Allocating objects...");
    for (int i = 0; i < 100; i++) {
        // Allocate 1 MB chunks
        objects.add(new byte[1024 * 1024]);
        // Occasionally release some to trigger GC
        if (i % 20 == 0 && i > 0) {
            objects.subList(0, 10).clear();
        }
    }
    IO.println("Allocated " + objects.size() + " MB worth of objects");
    // Force GC for observation (don't do this in production)
    System.gc();
    Runtime runtime = Runtime.getRuntime();
    long usedMemory = runtime.totalMemory() - runtime.freeMemory();
    IO.println("Used memory: " + (usedMemory / 1024 / 1024) + " MB");
}

| Requirement | Recommended GC | Key Flags |
|---|---|---|
| Maximum throughput | Parallel GC | -XX:+UseParallelGC |
| Predictable pause times < 200ms | G1 GC | -XX:MaxGCPauseMillis |
| Sub-millisecond pauses | ZGC | -XX:+UseZGC |
| Very large heap (TB+) | ZGC | -XX:+UseZGC |
| Single processor | Serial GC | -XX:+UseSerialGC |
Common flags for tuning garbage collection:
# G1 GC tuning
-XX:MaxGCPauseMillis=200 # Target maximum pause time
-XX:G1HeapRegionSize=16m # Region size (1-32 MB, power of 2)
-XX:InitiatingHeapOccupancyPercent=45 # When to start concurrent GC
# ZGC tuning
-XX:ZCollectionInterval=5 # Force GC every 5 seconds (for testing)
-XX:SoftMaxHeapSize=4g # Soft limit for heap size
# General GC logging
-Xlog:gc*:file=gc.log:time,level,tags # Modern unified logging
HotSpot manages memory across several distinct regions, each serving a specific
purpose in the execution of Java applications.
The heap is where all Java objects live. Its structure depends on the garbage
collector but typically includes generational divisions.
┌─────────────────────────────────────────────────────────────────────────────┐
│ JVM Memory Layout (Non-Heap + Heap) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Heap (-Xmx, -Xms) │ │
│ │ ┌─────────────────────────┐ ┌─────────────────────────────────┐ │ │
│ │ │ Young Generation │ │ Old Generation │ │ │
│ │ │ (-Xmn) │ │ │ │ │
│ │ │ ┌─────┬───────┬─────┐ │ │ Long-lived objects that │ │ │
│ │ │ │Eden │ S0 │ S1 │ │ │ survived multiple minor GCs │ │ │
│ │ │ └─────┴───────┴─────┘ │ │ │ │ │
│ │ └─────────────────────────┘ └─────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Metaspace (native memory) │ │
│ │ - Class metadata - Method bytecode │ │
│ │ - Constant pools - Annotations │ │
│ │ - Controlled by: -XX:MaxMetaspaceSize │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Code Cache │ │
│ │ - JIT compiled code - Native method stubs │ │
│ │ - Controlled by: -XX:ReservedCodeCacheSize │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Thread Stacks (per thread) │ │
│ │ - Local variables - Method call frames │ │
│ │ - Controlled by: -Xss (stack size per thread) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Native Memory (Direct Buffers) │ │
│ │ - NIO direct buffers - JNI allocations │ │
│ │ - Controlled by: -XX:MaxDirectMemorySize │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
| Flag | Purpose |
|---|---|
| -Xms<size> | Initial heap size |
| -Xmx<size> | Maximum heap size |
| -Xmn<size> | Young generation size |
| -Xss<size> | Thread stack size |
| -XX:MaxMetaspaceSize | Maximum metaspace size |
| -XX:ReservedCodeCacheSize | Maximum code cache size |
| -XX:MaxDirectMemorySize | Maximum direct buffer memory |
void main() {
    Runtime runtime = Runtime.getRuntime();
    // Heap memory
    long maxMemory = runtime.maxMemory();
    long totalMemory = runtime.totalMemory();
    long freeMemory = runtime.freeMemory();
    long usedMemory = totalMemory - freeMemory;
    IO.println("=== Heap Memory ===");
    IO.println("Max: " + formatBytes(maxMemory));
    IO.println("Total: " + formatBytes(totalMemory));
    IO.println("Used: " + formatBytes(usedMemory));
    IO.println("Free: " + formatBytes(freeMemory));
    // Processors
    IO.println("\n=== System ===");
    IO.println("Available processors: " + runtime.availableProcessors());
}

String formatBytes(long bytes) {
    if (bytes < 1024) return bytes + " B";
    if (bytes < 1024 * 1024) return (bytes / 1024) + " KB";
    if (bytes < 1024 * 1024 * 1024) return (bytes / 1024 / 1024) + " MB";
    return (bytes / 1024 / 1024 / 1024) + " GB";
}

For detailed memory analysis, use the MemoryMXBean:
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
void main() {
    MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
    MemoryUsage heapUsage = memoryBean.getHeapMemoryUsage();
    MemoryUsage nonHeapUsage = memoryBean.getNonHeapMemoryUsage();
    IO.println("=== Heap Memory ===");
    printMemoryUsage(heapUsage);
    IO.println("\n=== Non-Heap Memory ===");
    printMemoryUsage(nonHeapUsage);
}

void printMemoryUsage(MemoryUsage usage) {
    IO.println("Init: " + formatBytes(usage.getInit()));
    IO.println("Used: " + formatBytes(usage.getUsed()));
    IO.println("Committed: " + formatBytes(usage.getCommitted()));
    IO.println("Max: " + formatBytes(usage.getMax()));
}

String formatBytes(long bytes) {
    if (bytes < 0) return "undefined";
    if (bytes < 1024) return bytes + " B";
    if (bytes < 1024 * 1024) return (bytes / 1024) + " KB";
    return (bytes / 1024 / 1024) + " MB";
}

Performance tuning in HotSpot involves configuring JVM flags, monitoring
runtime behavior, and iteratively adjusting settings based on observations.
Effective tuning requires understanding the application's workload
characteristics and performance goals.
Heap Sizing:
-Xms4g # Initial heap size
-Xmx4g # Maximum heap size (set equal to -Xms for production)
-Xmn1g # Young generation size
GC Selection and Tuning:
-XX:+UseG1GC # Use G1 garbage collector
-XX:MaxGCPauseMillis=200 # Target maximum pause time
-XX:G1HeapRegionSize=16m # G1 region size
JIT Compilation:
-XX:+TieredCompilation # Enable tiered compilation (default)
-XX:TieredStopAtLevel=1 # Stop at C1 for faster startup
-XX:CompileThreshold=5000 # Lower threshold for faster warmup
Diagnostic Flags:
-XX:+HeapDumpOnOutOfMemoryError # Generate heap dump on OOM
-XX:HeapDumpPath=/path/dump.hprof # Heap dump location
-XX:+PrintFlagsFinal # Print all JVM flags
HotSpot includes several tools for performance analysis:
Java Flight Recorder (JFR): A low-overhead profiling tool built into
the JVM. Collects detailed runtime information with minimal performance impact.
# Enable JFR from command line
-XX:StartFlightRecording=duration=60s,filename=recording.jfr
# Start JFR on an already-running JVM with jcmd
jcmd <pid> JFR.start duration=60s filename=recording.jfr
JDK Mission Control: GUI application for analyzing JFR recordings.
Visualizes CPU usage, memory allocation, thread activity, and more.
VisualVM: Standalone profiler that can connect to running JVMs.
Provides CPU and memory profiling, thread analysis, and heap dumps.
Async Profiler: Third-party profiler with low overhead flame graph
generation. Excellent for production profiling.
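JFR can also be controlled from inside the application through the jdk.jfr API, including user-defined events that appear alongside built-in ones in Mission Control. A minimal sketch; the event name, its field, and the output path are arbitrary choices for this example:

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import java.nio.file.Path;

// Custom JFR event; "demo.Work" is an arbitrary name for this sketch
@Name("demo.Work")
@Label("Work Item")
static class WorkEvent extends Event {
    @Label("Iterations")
    int iterations;
}

long doWork(int iterations) {
    long sum = 0;
    for (int i = 0; i < iterations; i++) {
        sum += i;
    }
    return sum;
}

void main() throws Exception {
    try (Recording recording = new Recording()) {
        recording.start();
        var event = new WorkEvent();
        event.begin();                       // timestamp the start of the work
        long sum = doWork(1_000_000);
        event.iterations = 1_000_000;
        event.commit();                      // record the event (with duration)
        recording.stop();
        recording.dump(Path.of("demo.jfr")); // open this file in Mission Control
        IO.println("Recorded, sum = " + sum);
    }
}
```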
import java.lang.management.ManagementFactory;
import java.lang.management.GarbageCollectorMXBean;
void main() {
    IO.println("=== GC Statistics ===");
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
        IO.println("Collector: " + gc.getName());
        IO.println("  Collection count: " + gc.getCollectionCount());
        IO.println("  Collection time: " + gc.getCollectionTime() + " ms");
        IO.println("  Memory pools: " + String.join(", ", gc.getMemoryPoolNames()));
        IO.println();
    }
}

- Establish Baseline: Measure current performance with realistic workload
- Identify Bottleneck: Use profiling tools to find the constraint
- Make One Change: Modify a single setting at a time
- Measure Impact: Compare against baseline with same workload
- Iterate: Repeat until performance goals are met
| Symptom | Likely Cause | Tuning Action |
|---|---|---|
| Long GC pauses | Heap too small/large | Adjust -Xmx, switch GC |
| Frequent minor GCs | Young gen too small | Increase -Xmn |
| Full GCs | Old gen fragmentation | Use G1 or ZGC |
| Slow startup | JIT warmup | Lower CompileThreshold |
| High CPU in GC | Inefficient collection | Use Parallel or G1 GC |
HotSpot continues to evolve with new features that enhance performance,
scalability, and developer productivity.
Tiered compilation was introduced in Java 7 and became the default in Java 8.
It combines the benefits of fast C1 compilation with the optimizing power of
C2, providing both quick warmup and peak performance.
The compilation policy considers:
- Method invocation frequency
- Loop iteration counts
- Compilation queue length
- Available compiler threads
void main() {
    IO.println("=== Compilation Tier Demo ===");
    // Cold method - interpreted initially
    int result = calculate(5);
    IO.println("First call (interpreted): " + result);
    // Warming up - triggers C1 compilation
    for (int i = 0; i < 10000; i++) {
        result = calculate(i % 100);
    }
    IO.println("After warmup (C1 compiled): " + result);
    // Hot path - triggers C2 compilation
    long total = 0;
    for (int i = 0; i < 1000000; i++) {
        total += calculate(i % 100);
    }
    IO.println("After hot loop (C2 compiled): total = " + total);
}

int calculate(int n) {
    // Simple method that benefits from inlining and optimization
    int sum = 0;
    for (int i = 0; i <= n; i++) {
        sum += i;
    }
    return sum;
}

GraalVM is an advanced polyglot virtual machine that includes the Graal
compiler—a modern, highly optimizing JIT compiler written in Java. Graal can
be used as a drop-in replacement for C2 in HotSpot.
# Use Graal as the JIT compiler (requires GraalVM or JDK with JVMCI)
-XX:+UnlockExperimentalVMOptions
-XX:+EnableJVMCI
-XX:+UseJVMCICompiler
Graal provides several advantages:
- Written in Java for easier maintenance and extension
- Advanced optimizations for dynamic languages
- Native image compilation (ahead-of-time)
- Better support for partial escape analysis
Project Loom introduces virtual threads—lightweight threads managed by the
JVM rather than the operating system. Virtual threads enable scalable
concurrent applications with the simple thread-per-request model.
void main() throws InterruptedException {
    IO.println("=== Virtual Threads Demo ===");
    var startTime = System.currentTimeMillis();
    // Create 100,000 virtual threads
    var threads = new ArrayList<Thread>();
    for (int i = 0; i < 100_000; i++) {
        Thread vt = Thread.ofVirtual().start(() -> {
            // Simulate blocking I/O
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        threads.add(vt);
    }
    // Wait for all virtual threads to complete
    for (Thread t : threads) {
        t.join();
    }
    var duration = System.currentTimeMillis() - startTime;
    IO.println("Completed 100,000 virtual threads in " + duration + " ms");
}

Virtual threads are designed for high-throughput I/O-bound workloads. They
efficiently handle blocking operations by unmounting from platform threads,
allowing a small number of platform threads to execute millions of virtual
threads.
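In application code, virtual threads are usually created through an executor rather than one at a time. This sketch uses Executors.newVirtualThreadPerTaskExecutor(), which starts one virtual thread per submitted task; closing the executor waits for all tasks to finish:

```java
import java.util.ArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Runs one short blocking task per virtual thread and sums the task ids
long runTasks(int taskCount) throws Exception {
    try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
        var futures = new ArrayList<Future<Integer>>();
        for (int i = 0; i < taskCount; i++) {
            int taskId = i;
            futures.add(executor.submit(() -> {
                Thread.sleep(10); // blocking call: the virtual thread unmounts
                return taskId;
            }));
        }
        long sum = 0;
        for (Future<Integer> f : futures) {
            sum += f.get();
        }
        return sum;
    } // close() implicitly awaits task completion
}

void main() throws Exception {
    IO.println("Sum of task ids: " + runTasks(10_000));
}
```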
Project Valhalla introduces value objects—objects that are defined by their
data rather than their identity. Value objects can be flattened into arrays
and stored inline, eliminating object header overhead and improving cache
locality.
Value objects will eventually provide:
- Primitive-like performance for user-defined types
- Specialized generics (no more boxing for primitives)
- Better memory layout for data-intensive applications
Recent JDK versions include experimental support for compact object headers,
reducing the per-object memory overhead from 12-16 bytes to 8 bytes.
-XX:+UnlockExperimentalVMOptions
-XX:+UseCompactObjectHeaders
Understanding common mistakes helps developers avoid performance problems and
configuration issues.
One of the most common mistakes is excessive tuning based on assumptions rather
than measurements.
Problem: Setting too many JVM flags without understanding their interactions
can lead to worse performance or unexpected behavior.
Solution: Start with defaults, measure performance, and tune only based on
observed bottlenecks. Modern HotSpot versions have excellent ergonomics that
automatically adjust many settings.
void main() {
    // Bad: Assuming more threads is always better
    // -XX:ParallelGCThreads=64 (on an 8-core machine)
    // Good: Let HotSpot choose based on available processors
    int processors = Runtime.getRuntime().availableProcessors();
    IO.println("Available processors: " + processors);
    IO.println("HotSpot will configure GC threads appropriately");
    // Only override defaults when measurements show improvement
}

Problem: Focusing on JVM tuning when the real issue is in application code,
database queries, or external service calls.
Solution: Profile the application first. Often, optimizing algorithms or
reducing I/O has far greater impact than JVM tuning.
void main() {
    // Example: Inefficient string concatenation in a loop
    // No amount of JVM tuning will fix this O(n²) algorithm
    String result = "";
    for (int i = 0; i < 10000; i++) {
        result += i + ","; // Creates new String each iteration
    }
    // Better: Use StringBuilder (O(n))
    var sb = new StringBuilder();
    for (int i = 0; i < 10000; i++) {
        sb.append(i).append(",");
    }
    result = sb.toString();
    IO.println("Length: " + result.length());
}

Problem: Misinterpreting garbage collection logs leads to incorrect
tuning decisions.
Solution: Learn to read GC logs properly. Understand the difference between
minor GC, major GC, and full GC. Know what metrics indicate problems.
# Enable comprehensive GC logging
-Xlog:gc*:file=gc.log:time,level,tags:filecount=5,filesize=10m
Key GC log metrics to monitor:
| Metric | Healthy Range | Problem Indicator |
|---|---|---|
| Minor GC pause | < 50ms | > 200ms regularly |
| Full GC frequency | Rare | Multiple per minute |
| GC throughput | > 95% | < 90% |
| Heap after GC | < 70% of max | > 90% consistently |
Problem: Setting different values for initial and maximum heap causes
the JVM to resize the heap during operation, triggering garbage collections.
Solution: In production, set -Xms equal to -Xmx to avoid heap resizing.
# Production configuration - avoid heap resizing
-Xms4g -Xmx4g
# Development configuration - OK to let heap grow
-Xms256m -Xmx2g
Following these guidelines helps achieve optimal HotSpot performance while
maintaining system stability.
- Establish Baseline Metrics: Before any tuning, capture normal operation
metrics including throughput, latency, GC frequency, and memory usage.
- Use Proper Tools: JFR for detailed profiling, GC logs for collection
analysis, and metrics systems for long-term trending.
- Monitor in Production: Performance characteristics differ between
development and production. Monitor continuously.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
void main() {
    ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
    IO.println("=== Thread Monitoring ===");
    IO.println("Thread count: " + threadBean.getThreadCount());
    IO.println("Peak thread count: " + threadBean.getPeakThreadCount());
    IO.println("Daemon thread count: " + threadBean.getDaemonThreadCount());
    // Check for deadlocks
    long[] deadlockedThreads = threadBean.findDeadlockedThreads();
    if (deadlockedThreads == null) {
        IO.println("No deadlocks detected");
    } else {
        IO.println("DEADLOCK DETECTED: " + deadlockedThreads.length + " threads");
    }
}

| Scenario | Recommendation |
|---|---|
| Standard web application | Accept G1 GC defaults |
| Sub-millisecond latency required | Switch to ZGC, tune carefully |
| Batch processing | Consider Parallel GC for throughput |
| Memory constrained environment | Tune heap sizes, consider Serial GC |
| Microservices with fast startup | Consider GraalVM native image |
Memory Configuration:
-Xms4g -Xmx4g                    # Equal heap bounds
-XX:MaxMetaspaceSize=256m        # Bound metaspace
-XX:+HeapDumpOnOutOfMemoryError  # Capture OOM heap dump

GC Configuration:
-XX:+UseG1GC                     # Default collector
-XX:MaxGCPauseMillis=200         # Pause time target
-Xlog:gc*:file=gc.log:time       # Enable GC logging

Diagnostic Preparation:
-XX:+UnlockDiagnosticVMOptions   # Enable diagnostics
-XX:+DebugNonSafepoints          # Better profiling accuracy
-XX:ErrorFile=/path/hs_err.log   # Error log location
Performance tuning is only valid when tested under production-like conditions:
- Use production data volumes
- Simulate realistic request patterns
- Include warmup period before measurements
- Run tests for extended periods
- Test failure scenarios (high memory, high CPU)
void main() throws InterruptedException {
    IO.println("=== Warmup and Measurement Demo ===");
    // Warmup phase - trigger JIT compilation
    IO.println("Warming up...");
    for (int i = 0; i < 100_000; i++) {
        doWork();
    }
    // Measurement phase
    IO.println("Measuring...");
    long startTime = System.nanoTime();
    int iterations = 1_000_000;
    for (int i = 0; i < iterations; i++) {
        doWork();
    }
    long endTime = System.nanoTime();
    double avgNanos = (endTime - startTime) / (double) iterations;
    IO.println("Average time per operation: " + String.format("%.2f", avgNanos) + " ns");
    IO.println("Operations per second: " + String.format("%.0f", 1_000_000_000.0 / avgNanos));
}

int doWork() {
    // Simulated work
    int result = 0;
    for (int i = 0; i < 100; i++) {
        result += i * i;
    }
    return result;
}

The HotSpot JVM is a sophisticated runtime environment that has enabled Java
to achieve excellent performance across a wide range of applications. Its
adaptive optimization approach, combining interpretation with tiered JIT
compilation, provides both fast startup and peak throughput.
| Aspect | Summary |
|---|---|
| Architecture | Class loading, runtime data areas, execution engine |
| Execution Model | Interpreter + tiered JIT compilation (C1, C2) |
| JIT Compilation | Hot method detection, inlining, escape analysis |
| Garbage Collection | Generational collectors: G1 (default), ZGC, etc. |
| Memory Management | Heap, metaspace, code cache, thread stacks |
| Performance Tuning | Profile first, tune based on measurements |
| Advanced Features | Virtual threads, Graal integration, compact headers |
Understanding HotSpot internals enables developers to:
- Write Efficient Code: Understanding JIT optimizations helps write code
that compiles well, avoiding patterns that prevent optimization.
- Configure for Production: Knowledge of GC algorithms and memory layout
enables appropriate configuration for specific workload requirements.
- Diagnose Performance Issues: Familiarity with profiling tools and
runtime behavior helps identify and resolve performance bottlenecks.
- Plan for Scalability: Understanding resource usage patterns helps design
systems that scale efficiently with load.
HotSpot continues to evolve with each JDK release, incorporating new features
and optimizations. Staying current with developments—such as virtual threads,
value objects, and improved garbage collectors—ensures that applications can
take advantage of the latest performance improvements.
The best approach to HotSpot performance is to start with defaults, measure
under realistic conditions, and tune only when measurements indicate specific
bottlenecks. Modern HotSpot ergonomics handle most situations well, and
over-tuning often causes more problems than it solves.