JVM tuning is the process of adjusting Java Virtual Machine parameters to
optimize application performance, memory usage, and response times. Profiling
involves analyzing runtime behavior to identify bottlenecks, memory leaks, and
inefficient code paths. Together, these practices form the foundation of Java
performance engineering.
Understanding JVM tuning and profiling is essential for building high-performance
Java applications. Modern applications face demanding requirements: low latency,
high throughput, efficient resource utilization, and predictable behavior under
load. Without proper tuning, applications may suffer from excessive garbage
collection pauses, memory exhaustion, or suboptimal thread utilization.
Tuning should always be guided by profiling data. Premature optimization based
on assumptions often leads to wasted effort or even degraded performance. The
correct approach is to measure first, identify actual bottlenecks, apply targeted
changes, and validate improvements through repeated measurement.
The JVM manages memory automatically, freeing developers from manual memory
allocation and deallocation. However, understanding how the JVM organizes and
manages memory is crucial for effective tuning.
The JVM divides memory into several distinct regions, each serving a specific
purpose:
| Region | Description |
|---|---|
| Heap | Primary storage for objects; managed by garbage collector |
| Stack | Per-thread memory for method calls and local variables |
| Metaspace | Class metadata, method definitions, constant pools |
| Code Cache | JIT-compiled native code |
| Native Memory | Direct buffers, JNI allocations, thread stacks |
┌─────────────────────────────────────────────────────────────────────────────┐
│ JVM Process Memory │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ HEAP │ │
│ │ ┌─────────────────────────┐ ┌─────────────────────────────────────┐ │ │
│ │ │ Young Generation │ │ Old Generation │ │ │
│ │ │ ┌───────┐ ┌──────────┐ │ │ │ │ │
│ │ │ │ Eden │ │ Survivor │ │ │ Long-lived objects survive │ │ │
│ │ │ │ │ │ S0 | S1 │ │ │ multiple GC cycles here │ │ │
│ │ │ │ New │ │ │ │ │ │ │ │
│ │ │ │objects│ │ Aging │ │ │ Collected by Major/Full GC │ │ │
│ │ │ │ here │ │ objects │ │ │ │ │ │
│ │ │ └───────┘ └──────────┘ │ │ │ │ │
│ │ └─────────────────────────┘ └─────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ NON-HEAP │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌───────────────────────┐ │ │
│ │ │ Metaspace │ │ Code Cache │ │ Direct Buffers │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ Class metadata │ │ JIT compiled │ │ Off-heap memory │ │ │
│ │ │ Method data │ │ native code │ │ for NIO operations │ │ │
│ │ │ Constant pool │ │ │ │ │ │ │
│ │ └─────────────────┘ └─────────────────┘ └───────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ THREAD STACKS │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Thread1 │ │ Thread2 │ │ Thread3 │ │ Thread4 │ │ ... │ │ │
│ │ │ Stack │ │ Stack │ │ Stack │ │ Stack │ │ │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The heap is the runtime data area from which memory for all class instances and
arrays is allocated. It is created when the JVM starts and may increase or
decrease in size during application runtime.
Young Generation: Where new objects are allocated. Most objects die young
and are collected quickly through Minor GC. It is further divided into Eden
space (initial allocation) and Survivor spaces (S0 and S1) for aging objects.
Old Generation: Objects that survive multiple Minor GC cycles are promoted
here. Collected less frequently through Major GC or Full GC operations.
Each thread has its own stack, created when the thread is created. The stack
stores frames containing local variables, partial results, and method invocation
data. Stack size is fixed per thread and configured via -Xss.
void main() {
// Each method call creates a new stack frame
int result = calculateFactorial(5);
println("Factorial: " + result);
}
int calculateFactorial(int n) {
// Local variables stored in stack frame
if (n <= 1) {
return 1;
}
// Recursive call creates new stack frame
return n * calculateFactorial(n - 1);
}

Stack memory is automatically reclaimed when a method returns. Deep recursion
or excessively large local variables can cause StackOverflowError.
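The relationship between -Xss and recursion depth can be observed directly. This sketch recurses until the stack overflows; the exact depth depends on the JVM version, the stack size flag, and the size of each frame:

```java
int depth = 0;

void main() {
    try {
        recurse();
    } catch (StackOverflowError e) {
        // The stack has already unwound here, so printing is safe
        println("Stack overflowed after " + depth + " frames");
    }
}

void recurse() {
    depth++;   // Each call adds one frame to the thread stack
    recurse();
}
```

Running this with -Xss512k and again with -Xss2m shows the reported depth scale roughly with the configured stack size.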
Metaspace replaced PermGen in Java 8. It stores class metadata, including:
- Class definitions and bytecode
- Method metadata and compiled method information
- Constant pool data
- Annotation processing information
Unlike PermGen, Metaspace uses native memory and can grow dynamically. It is
still subject to garbage collection when classes are unloaded.
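The heap, Metaspace, and code cache described above can be inspected at runtime through the standard java.lang.management API. Pool names such as "Metaspace" and "CodeHeap" are HotSpot-specific, so treat this as a sketch rather than portable behavior:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

void main() {
    var memory = ManagementFactory.getMemoryMXBean();
    println("Heap:     " + memory.getHeapMemoryUsage());
    println("Non-heap: " + memory.getNonHeapMemoryUsage());

    // Per-pool breakdown of the non-heap regions (HotSpot pool names)
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
        String name = pool.getName();
        if (name.contains("Metaspace") || name.contains("CodeHeap")) {
            println(name + ": " + pool.getUsage().getUsed() + " bytes used");
        }
    }
}
```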
Modern collectors like G1 organize the heap into regions for more efficient
collection:
┌─────────────────────────────────────────────────────────────────────────────┐
│ G1 Heap Region Layout │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐ │
│ │ E │ E │ S │ O │ O │ E │ H │ H │ O │ E │ S │ O │ │
│ ├─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┤ │
│ │ O │ E │ O │ O │ E │ E │ H │ O │ O │ E │ E │ O │ │
│ ├─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┤ │
│ │ O │ O │ E │ S │ O │ E │ O │ O │ E │ O │ O │ E │ │
│ └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘ │
│ │
│ E = Eden Region S = Survivor Region O = Old Region H = Humongous │
│ │
│ - Each region is typically 1-32 MB in size │
│ - Regions can change role dynamically │
│ - Humongous regions store objects larger than half a region │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
JVM flags control memory allocation, garbage collection behavior, and various
runtime optimizations. Flags are categorized into standard options (stable across
releases) and non-standard options (may change between JVM versions).
| Prefix | Description |
|---|---|
| - | Standard options (e.g., -version, -classpath) |
| -X | Non-standard options, relatively stable |
| -XX: | Advanced options, may change without notice |
| -XX:+ | Enable boolean option |
| -XX:- | Disable boolean option |
| -XX:<name>=<value> | Set numeric or string value |
The -Xmx flag sets the maximum heap size the JVM can allocate. When heap
usage approaches this limit, aggressive garbage collection occurs. If memory
cannot be freed, OutOfMemoryError is thrown.
# Set maximum heap to 4 gigabytes
java -Xmx4g MyApplication
# Set maximum heap to 2048 megabytes
java -Xmx2048m MyApplication
# Set maximum heap to specific kilobytes
java -Xmx2097152k MyApplication

Best practices for -Xmx:
- Set based on available physical memory (typically 50-75% of total RAM)
- Account for non-heap memory (Metaspace, native memory, thread stacks)
- Monitor actual usage before increasing; larger heaps mean longer GC pauses
- Container deployments should consider memory limits carefully
The -Xms flag sets the initial heap size. The JVM starts with this amount
and grows toward -Xmx as needed. Setting -Xms equal to -Xmx prevents
heap resizing during runtime.
# Start with 2GB, can grow to 4GB
java -Xms2g -Xmx4g MyApplication
# Fixed heap size of 4GB (recommended for production)
java -Xms4g -Xmx4g MyApplication

Best practices for -Xms:
- Set equal to -Xmx for production systems to avoid resize overhead
- Lower values acceptable for development to conserve resources
- Monitor startup time; larger initial heap may slow JVM initialization
void main() {
var runtime = Runtime.getRuntime();
long maxMemory = runtime.maxMemory();
long totalMemory = runtime.totalMemory();
long freeMemory = runtime.freeMemory();
long usedMemory = totalMemory - freeMemory;
println("JVM Memory Information:");
println("─".repeat(50));
println("Max Memory (Xmx): " + formatBytes(maxMemory));
println("Total Memory: " + formatBytes(totalMemory));
println("Used Memory: " + formatBytes(usedMemory));
println("Free Memory: " + formatBytes(freeMemory));
println("Available Processors: " + runtime.availableProcessors());
}
String formatBytes(long bytes) {
if (bytes < 1024) return bytes + " B";
if (bytes < 1024 * 1024) return (bytes / 1024) + " KB";
if (bytes < 1024 * 1024 * 1024) return (bytes / (1024 * 1024)) + " MB";
return String.format("%.2f GB", bytes / (1024.0 * 1024 * 1024));
}

Running this program shows current JVM memory configuration. Use it to verify
that your flags are applied correctly.
The JVM includes several garbage collectors, each optimized for different
workloads. Selecting the appropriate collector significantly impacts performance.
| Collector | Best For | Pause Times | Throughput |
|---|---|---|---|
| G1GC | General purpose, balanced | Medium (10-200ms) | High |
| ZGC | Low latency, large heaps | Very low (<1-10ms) | High |
| Shenandoah | Low latency, any heap size | Very low (<10ms) | High |
| Parallel GC | Batch processing, throughput | Higher | Highest |
| Serial GC | Small heaps, single CPU | Variable | Low |
G1 (Garbage-First) is the default collector since Java 9. It divides the heap
into regions and prioritizes collecting regions with the most garbage.
# Enable G1GC (default in Java 9+)
java -XX:+UseG1GC MyApplication
# Set target pause time (milliseconds)
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 MyApplication
# Set heap region size (1MB-32MB, must be power of 2)
java -XX:+UseG1GC -XX:G1HeapRegionSize=16m MyApplication
# Tune concurrent GC threads
java -XX:+UseG1GC -XX:ConcGCThreads=4 MyApplication

Key G1 tuning options:
| Flag | Description |
|---|---|
| -XX:MaxGCPauseMillis=200 | Target maximum GC pause time |
| -XX:G1HeapRegionSize=16m | Size of G1 heap regions |
| -XX:G1NewSizePercent=5 | Minimum young generation percentage |
| -XX:G1MaxNewSizePercent=60 | Maximum young generation percentage |
| -XX:InitiatingHeapOccupancyPercent=45 | Heap occupancy to trigger marking |
ZGC is designed for applications requiring extremely low latency. It performs
most work concurrently, keeping pause times consistently under 10ms regardless
of heap size.
# Enable ZGC
java -XX:+UseZGC MyApplication
# ZGC with generational mode (Java 21+)
java -XX:+UseZGC -XX:+ZGenerational MyApplication
# Set concurrent GC threads
java -XX:+UseZGC -XX:ConcGCThreads=4 MyApplication

ZGC characteristics:
- Pause times typically under 1ms
- Scales to multi-terabyte heaps
- Slightly lower throughput than G1
- Best for latency-sensitive applications
Shenandoah is another low-pause collector, performing concurrent compaction
to minimize stop-the-world pauses.
# Enable Shenandoah
java -XX:+UseShenandoahGC MyApplication
# Set heuristics mode
java -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive MyApplication
# Compact mode for smaller heaps
java -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=compact MyApplication

Parallel GC maximizes throughput by using multiple threads for collection.
It is ideal for batch processing where pause times are acceptable.
# Enable Parallel GC
java -XX:+UseParallelGC MyApplication
# Set number of GC threads
java -XX:+UseParallelGC -XX:ParallelGCThreads=8 MyApplication
# Enable adaptive sizing
java -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy MyApplication

Metaspace sizing is controlled with dedicated flags:

# Set maximum Metaspace size
java -XX:MaxMetaspaceSize=256m MyApplication
# Set initial Metaspace size
java -XX:MetaspaceSize=128m MyApplication
# Limit class metadata allocation
java -XX:CompressedClassSpaceSize=128m MyApplication

Modern GC logging (Java 9+) uses the Unified Logging Framework:
# Basic GC logging
java -Xlog:gc MyApplication
# Detailed GC logging with timestamps
java -Xlog:gc*:file=gc.log:time,uptime,level,tags MyApplication
# Include heap information
java -Xlog:gc+heap=debug:file=gc.log MyApplication
# Log GC pauses only
java -Xlog:gc+pause=info MyApplication

Legacy GC logging (Java 8):
# Print GC details (Java 8)
java -XX:+PrintGCDetails -XX:+PrintGCDateStamps MyApplication
# Write to file
java -XX:+PrintGCDetails -Xloggc:gc.log MyApplication

Heap dump flags capture the heap state for post-mortem analysis:

# Generate heap dump on OutOfMemoryError
java -XX:+HeapDumpOnOutOfMemoryError MyApplication
# Specify heap dump location
java -XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/var/dumps/heap.hprof MyApplication
# Run script on OutOfMemoryError
java -XX:+HeapDumpOnOutOfMemoryError \
    -XX:OnOutOfMemoryError="./notify-admin.sh %p" MyApplication

Thread stack size is configured with -Xss:

# Set thread stack size to 512KB
java -Xss512k MyApplication
# Larger stacks for deep recursion
java -Xss2m MyApplication

| Flag | Description |
|---|---|
| -Xmx<size> | Maximum heap size |
| -Xms<size> | Initial heap size |
| -Xss<size> | Thread stack size |
| -XX:+UseG1GC | Enable G1 garbage collector |
| -XX:+UseZGC | Enable Z garbage collector |
| -XX:+UseShenandoahGC | Enable Shenandoah collector |
| -XX:+UseParallelGC | Enable Parallel collector |
| -XX:MaxGCPauseMillis=<ms> | Target maximum GC pause time |
| -XX:MaxMetaspaceSize=<size> | Maximum Metaspace size |
| -XX:+HeapDumpOnOutOfMemoryError | Dump heap on OOM |
| -XX:HeapDumpPath=<path> | Heap dump file location |
| -Xlog:gc* | Enable GC logging |
| -XX:+PrintFlagsFinal | Print all flag values |
| -XX:+UnlockDiagnosticVMOptions | Enable diagnostic options |
| -XX:+UnlockExperimentalVMOptions | Enable experimental options |
Profiling tools provide visibility into application behavior, enabling data-
driven optimization decisions. Java includes several powerful profiling tools
that integrate directly with the JVM.
Java Flight Recorder is a low-overhead profiling framework built into the JVM.
It continuously collects diagnostic data with minimal performance impact,
making it suitable for production environments.
# Start recording from command line
java -XX:StartFlightRecording=duration=60s,filename=recording.jfr \
MyApplication
# Start with continuous recording
java -XX:StartFlightRecording=disk=true,maxage=1h,maxsize=500m \
MyApplication
# Start recording without disk writes
java -XX:StartFlightRecording=disk=false MyApplication

| Setting | Description |
|---|---|
| duration | Recording length (e.g., 60s, 5m, 1h) |
| filename | Output file path |
| disk | Enable disk repository (true/false) |
| maxage | Maximum age of data to keep |
| maxsize | Maximum disk space for recording |
| settings | Predefined profile (default, profile) |
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import java.nio.file.Path;
void main() throws Exception {
// Load a configuration
var config = Configuration.getConfiguration("profile");
// Create and start recording
var recording = new Recording(config);
recording.setName("MyRecording");
recording.setMaxAge(java.time.Duration.ofMinutes(10));
recording.start();
println("Recording started: " + recording.getName());
println("Recording ID: " + recording.getId());
// Run your application workload here
simulateWorkload();
// Stop and save recording
recording.stop();
recording.dump(Path.of("programmatic-recording.jfr"));
println("Recording saved to programmatic-recording.jfr");
recording.close();
}
void simulateWorkload() throws InterruptedException {
var list = new ArrayList<String>();
for (int i = 0; i < 100000; i++) {
list.add("Item " + i);
if (i % 10000 == 0) {
Thread.sleep(100);
println("Processed " + i + " items");
}
}
}

This example creates a JFR recording programmatically, which is useful for
capturing specific application scenarios or integrating profiling into
automated testing.
import jdk.jfr.Category;
import jdk.jfr.Description;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
@Name("com.example.OrderProcessed")
@Label("Order Processed")
@Category("Business Events")
@Description("Fired when an order is successfully processed")
class OrderProcessedEvent extends Event {
@Label("Order ID")
String orderId;
@Label("Customer")
String customer;
@Label("Total Amount")
double amount;
@Label("Processing Time (ms)")
long processingTimeMs;
}
void main() {
// Process several orders
for (int i = 1; i <= 5; i++) {
processOrder("ORD-" + i, "Customer-" + i, i * 100.0);
}
println("Order processing complete");
}
void processOrder(String orderId, String customer, double amount) {
var event = new OrderProcessedEvent();
event.orderId = orderId;
event.customer = customer;
event.amount = amount;
long startTime = System.currentTimeMillis();
// Simulate order processing
try {
Thread.sleep((long) (Math.random() * 100 + 50));
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
event.processingTimeMs = System.currentTimeMillis() - startTime;
// Commit the event if enabled
event.commit();
println("Processed order: " + orderId +
" in " + event.processingTimeMs + "ms");
}

Custom events enable tracking application-specific metrics alongside standard
JVM events. They are invaluable for correlating business logic with system
behavior.
Use the jfr command-line tool to extract data from recordings:
# Print recording summary
jfr summary recording.jfr
# Print specific event types
jfr print --events jdk.GCPhasePause recording.jfr
# Print CPU sampling data
jfr print --events jdk.ExecutionSample recording.jfr
# Export to JSON format
jfr print --json recording.jfr > recording.json
# List all event types
jfr metadata recording.jfr

VisualVM is a graphical tool for monitoring, troubleshooting, and profiling
Java applications. It combines several command-line JDK tools into an intuitive
interface.
| Feature | Description |
|---|---|
| Monitor | Real-time CPU, memory, classes, threads display |
| Threads | Thread state visualization and deadlock detection |
| Sampler | CPU and memory sampling with low overhead |
| Profiler | Detailed method-level CPU and memory profiling |
| Heap Dump | Capture and analyze heap snapshots |
| Thread Dump | Capture and view thread stack traces |
VisualVM is available as a standalone download or bundled with GraalVM. For
remote profiling, start the application with JMX enabled:
# Enable JMX for remote connection
java -Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=9090 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
    MyApplication

Memory leaks occur when objects are no longer needed but remain reachable,
preventing garbage collection. VisualVM helps identify leaks through:
- Monitor tab: Watch heap usage grow over time without stabilizing
- Sampler tab: Identify classes consuming the most memory
- Heap Dump: Analyze object references to find retention paths
import java.util.HashMap;
void main() throws InterruptedException {
// Simulated memory leak: map grows indefinitely
var leakyMap = new HashMap<String, byte[]>();
int counter = 0;
while (counter < 1000) {
// Each entry holds 1MB, never removed
leakyMap.put("key-" + counter, new byte[1024 * 1024]);
counter++;
if (counter % 100 == 0) {
println("Entries: " + counter +
", Memory: " + formatBytes(
Runtime.getRuntime().totalMemory() -
Runtime.getRuntime().freeMemory()));
}
Thread.sleep(100);
}
}
String formatBytes(long bytes) {
return String.format("%.2f MB", bytes / (1024.0 * 1024));
}

When profiling this application with VisualVM:
- Connect to the running application
- Open the Monitor tab to observe heap growth
- Take a heap dump when memory is high
- Analyze retained size by class to find the byte[] arrays
- Trace references to discover the HashMap holding them
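Heap dumps can also be captured programmatically, which is useful when a monitoring check detects high memory and you want a snapshot without attaching a tool. This sketch uses the HotSpot-specific HotSpotDiagnosticMXBean from the com.sun.management package; the dump path is an arbitrary example:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

void main() throws Exception {
    var diagnostics =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    Path dump = Path.of("manual-dump.hprof");
    Files.deleteIfExists(dump);                  // dumpHeap refuses to overwrite
    diagnostics.dumpHeap(dump.toString(), true); // true = live objects only
    println("Heap dump written: " + Files.size(dump) + " bytes");
}
```

The resulting .hprof file opens directly in VisualVM's heap dump viewer.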
JDK Mission Control (JMC) is an advanced suite of tools for analyzing JFR
recordings. It provides deep insights into application behavior through
specialized views and automated analysis.
| View | Purpose |
|---|---|
| Outline | High-level recording summary |
| Method Profiling | Hot methods and call stacks |
| Memory | Allocation patterns and object statistics |
| Lock Instances | Contention analysis and lock profiling |
| Garbage Collections | GC events, durations, and pause analysis |
| Event Browser | Raw access to all recorded events |
JMC includes an automated analysis engine that examines recordings and
provides actionable recommendations:
- Memory pressure warnings
- Thread contention hotspots
- I/O bottleneck identification
- GC inefficiency detection
- Code execution anomalies
# Detailed recording for JMC analysis
java -XX:StartFlightRecording=settings=profile,\
duration=5m,\
filename=analysis.jfr \
MyApplication
# Continuous recording for production
java -XX:StartFlightRecording=settings=default,\
disk=true,\
maxage=24h,\
dumponexit=true,\
    filename=exit-recording.jfr \
    MyApplication

| Tool | Best For | Overhead | Production-Safe |
|---|---|---|---|
| JFR | Continuous monitoring | Very Low | Yes |
| VisualVM | Interactive debugging | Medium | Development |
| JMC | Deep analysis of JFR files | N/A | Analysis tool |
| async-profiler | CPU/allocation profiling | Low | Yes |
| jstack | Quick thread dump | Minimal | Yes |
| jmap | Heap statistics/dump | High | With caution |
Effective performance tuning follows a systematic, iterative process. Avoid
random flag changes; instead, use data to guide decisions.
┌─────────────────────────────────────────────────────────────────────────────┐
│ Performance Tuning Workflow │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ 1. │ │ 2. │ │ 3. │ │ 4. │ │
│ │ MEASURE │────►│ ANALYZE │────►│ TUNE │────►│ VERIFY │────┐ │
│ │ │ │ │ │ │ │ │ │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ ▲ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ Iterate │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Before tuning, establish baseline metrics:
# Enable GC logging
java -Xlog:gc*:file=baseline-gc.log:time,level,tags \
-XX:+HeapDumpOnOutOfMemoryError \
MyApplication
# Start JFR recording
java -XX:StartFlightRecording=duration=10m,filename=baseline.jfr \
    MyApplication

Key metrics to capture:
- Response times (p50, p95, p99 percentiles)
- Throughput (requests/second)
- GC pause times and frequency
- Heap usage patterns
- CPU utilization
- Thread counts and states
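Percentile metrics such as p99 are straightforward to compute from collected samples. A minimal nearest-rank sketch, using hypothetical latency values:

```java
import java.util.Arrays;

void main() {
    // Hypothetical response-time samples in milliseconds
    long[] samples = {12, 15, 11, 240, 13, 14, 16, 980, 12, 13};
    println("p50: " + percentile(samples, 50) + "ms");
    println("p95: " + percentile(samples, 95) + "ms");
    println("p99: " + percentile(samples, 99) + "ms");
}

long percentile(long[] samples, int pct) {
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    // Nearest-rank method: smallest value with at least pct% of samples below it
    int index = (int) Math.ceil(pct / 100.0 * sorted.length) - 1;
    return sorted[Math.max(0, index)];
}
```

Note how a few outliers dominate p95 and p99 while barely moving p50, which is why averages alone hide latency problems.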
Use profiling tools to identify bottlenecks:
# Analyze GC patterns
jfr print --events jdk.GCPhasePause,jdk.GarbageCollection baseline.jfr
# Find hot methods
jfr print --events jdk.ExecutionSample baseline.jfr | \
sort | uniq -c | sort -rn | head -20
# Check memory allocation
jfr print --events jdk.ObjectAllocationInNewTLAB baseline.jfr

Common findings and solutions:
| Finding | Likely Cause | Solution |
|---|---|---|
| Frequent Full GC | Heap too small or leak | Increase Xmx, fix leak |
| High GC pause times | Large old generation | Tune GC, consider ZGC |
| Many young GC collections | High allocation rate | Reduce object creation |
| Thread contention | Lock bottlenecks | Reduce synchronization |
| High CPU in specific methods | Algorithmic inefficiency | Optimize code |
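For the "high allocation rate" case, the fix is usually in application code rather than GC flags. A common example is string building: repeated concatenation allocates a new String per iteration, while a reused StringBuilder does not (the method names here are illustrative, not from the source):

```java
void main() {
    String naive = concatNaive(1_000);
    String pooled = concatBuilder(1_000);
    println("Equal results: " + naive.equals(pooled));
}

// Allocation-heavy: each += creates a new String and copies the old contents
String concatNaive(int n) {
    String s = "";
    for (int i = 0; i < n; i++) {
        s += i;
    }
    return s;
}

// Allocation-light: one StringBuilder with amortized growth
String concatBuilder(int n) {
    var sb = new StringBuilder();
    for (int i = 0; i < n; i++) {
        sb.append(i);
    }
    return sb.toString();
}
```

Both produce the same result, but the jdk.ObjectAllocationInNewTLAB events for the first version would show far more allocation.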
Apply targeted changes based on analysis:
# If heap is too small
java -Xms4g -Xmx4g MyApplication
# If GC pauses are too long
java -XX:+UseZGC MyApplication
# If throughput is priority
java -XX:+UseParallelGC -XX:ParallelGCThreads=8 MyApplication
# Combined tuning example
java -Xms8g -Xmx8g \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=100 \
-XX:+HeapDumpOnOutOfMemoryError \
-Xlog:gc*:file=tuned-gc.log:time,level,tags \
    MyApplication

Compare tuned metrics against baseline:
# Record with same duration as baseline
java -XX:StartFlightRecording=duration=10m,filename=tuned.jfr \
[tuning flags] \
    MyApplication

Verification criteria:
- Did target metric improve?
- Did other metrics degrade?
- Is improvement consistent under load?
- Does change behave as expected?
This example demonstrates a complete tuning cycle:
Scenario: Application has slow response times during peak load.
Step 1: Measure
# Start with JFR recording and GC logging
java -XX:StartFlightRecording=settings=profile,duration=30m,\
filename=perf-issue.jfr \
-Xlog:gc*:file=perf-gc.log:time,level,tags \
-Xms2g -Xmx2g \
    MyApplication

Step 2: Analyze in JMC
Open perf-issue.jfr in JDK Mission Control:
- Check Automated Analysis for warnings
- Review GC tab: many Full GCs taking >500ms
- Memory tab: heap consistently near 100%
- Conclusion: Heap is too small
Step 3: Tune
# Increase heap and set initial = max
java -Xms4g -Xmx4g \
-XX:StartFlightRecording=settings=profile,duration=30m,\
filename=after-tune.jfr \
-Xlog:gc*:file=after-tune-gc.log:time,level,tags \
    MyApplication

Step 4: Verify
Compare recordings in JMC:
- Full GC count: 47 → 3
- Average GC pause: 423ms → 45ms
- p99 response time: 2.3s → 0.8s
The change achieved the goal. Document the configuration and monitor
production metrics.
Following these guidelines leads to more effective and maintainable
performance tuning.
Modern JVMs are well-tuned for general workloads. The HotSpot JVM uses
ergonomics to automatically configure many settings based on available
resources.
# Let JVM choose GC and heap size
java MyApplication
# View auto-selected options
java -XX:+PrintFlagsFinal -version 2>&1 | grep -E "UseG1GC|MaxHeapSize"

Only override defaults when profiling reveals specific issues.
Tuning in isolation often produces misleading results. Performance
characteristics change significantly between test and production:
| Factor | Test Environment | Production |
|---|---|---|
| Data volume | Sample data | Full dataset |
| Concurrent users | 1-10 | 100-10000+ |
| Request patterns | Uniform | Spiky, varied |
| Duration | Minutes | Days/weeks |
| External services | Mocked | Real latency |
Use load testing tools (JMeter, Gatling, k6) to simulate production
conditions during tuning.
Establish continuous monitoring in production:
# Production GC logging configuration
java -Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=10,filesize=100m \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/var/dumps/ \
    MyApplication

Key monitoring metrics:
- GC pause time distribution (especially p99)
- GC frequency and type (Minor vs Full)
- Heap occupancy over time
- Promotion rate (objects moving to old generation)
- Allocation rate
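These metrics can also be sampled in-process to feed a metrics system, using the standard GarbageCollectorMXBean API. Collector names vary by the GC in use (e.g., "G1 Young Generation"), so this is a sketch rather than a fixed contract:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

void main() {
    // Snapshot cumulative GC counts and times since JVM start;
    // sample periodically and diff to get frequency and pause budget
    for (GarbageCollectorMXBean gc :
            ManagementFactory.getGarbageCollectorMXBeans()) {
        println(gc.getName()
                + ": collections=" + gc.getCollectionCount()
                + ", totalTime=" + gc.getCollectionTime() + "ms");
    }
}
```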
Never tune blindly. Each change should address a specific, measured issue:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Flag Change Decision Process │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Question │ Action │
│ ───────────────────────────────── │ ────────────────────────────────── │
│ Is there a performance problem? │ No → Don't tune │
│ Have you profiled? │ No → Profile first │
│ Did profiling identify root cause? │ No → Investigate more │
│ Will this flag address the cause? │ No → Find correct solution │
│ Have you tested the change? │ No → Test in staging │
│ Does it improve target metric? │ No → Revert and try alternative │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Maintain records of tuning decisions:
# production-jvm-config.sh
# JVM Configuration for Order Processing Service
# Last updated: 2024-01-15
# Owner: Performance Team
# Heap Configuration
# Reason: Production load analysis showed peak usage of 3.2GB
HEAP_OPTS="-Xms4g -Xmx4g"
# GC Configuration
# Reason: p99 latency requirement of <100ms
GC_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=80"
# Monitoring
# Reason: Required for production troubleshooting
MONITOR_OPTS="-Xlog:gc*:file=/var/log/gc.log:time,level,tags"
MONITOR_OPTS="$MONITOR_OPTS -XX:+HeapDumpOnOutOfMemoryError"
MONITOR_OPTS="$MONITOR_OPTS -XX:HeapDumpPath=/var/dumps"
java $HEAP_OPTS $GC_OPTS $MONITOR_OPTS -jar app.jar

Avoid these frequent mistakes when tuning JVM performance.
Too Small: Causes frequent garbage collection, OutOfMemoryError, and
degraded performance. The JVM spends more time collecting than executing
application code.
Too Large: Results in longer GC pauses, slower startup, and wasted
resources. Very large heaps may exceed physical memory, causing swapping.
# Incorrect: Heap larger than physical RAM
java -Xmx32g MyApplication # On machine with 16GB RAM
# Incorrect: Heap too small for workload
java -Xmx256m MyApplication # With 2GB working set
# Correct: Sized based on analysis, accounting for non-heap
java -Xmx12g MyApplication # On 16GB machine (leaves room for OS, Metaspace)

Common log interpretation errors:
| Misinterpretation | Reality |
|---|---|
| All GC is bad | Minor GC is normal and efficient |
| Longer pause = bigger heap | Pause depends on live objects, not heap |
| More GC threads always helps | Too many causes contention |
| G1 needs extensive tuning | Defaults work well for most workloads |
Example log entry:
[2024-01-15T10:30:00.123+0000] GC(42) Pause Young (Normal)
(G1 Evacuation Pause) 2048M->1024M(4096M) 45.123ms
Interpretation:
- Minor GC (#42), copying live objects from young generation
- Heap reduced from 2GB to 1GB (4GB max)
- Pause of 45ms - acceptable for G1
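Entries in this shape are easy to post-process. A small sketch that extracts the before->after(total) heap figures from such a line; the regex assumes exactly the format shown above:

```java
import java.util.regex.Pattern;

void main() {
    var line = "[2024-01-15T10:30:00.123+0000] GC(42) Pause Young (Normal) "
             + "(G1 Evacuation Pause) 2048M->1024M(4096M) 45.123ms";
    long[] heap = parseHeapChange(line);
    println("Before: " + heap[0] + "M, after: " + heap[1] + "M, max: "
            + heap[2] + "M, reclaimed: " + (heap[0] - heap[1]) + "M");
}

// Returns {before, after, total} in MB from one GC log line
long[] parseHeapChange(String line) {
    var matcher = Pattern.compile("(\\d+)M->(\\d+)M\\((\\d+)M\\)").matcher(line);
    if (!matcher.find()) {
        throw new IllegalArgumentException("No heap change found: " + line);
    }
    return new long[] {
        Long.parseLong(matcher.group(1)),
        Long.parseLong(matcher.group(2)),
        Long.parseLong(matcher.group(3))
    };
}
```

Aggregating reclaimed bytes per collection over time gives a rough view of allocation rate from logs alone.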
JVM tuning cannot fix algorithmic problems:
void main() {
// No amount of JVM tuning helps this inefficient code
var list = new ArrayList<String>();
for (int i = 0; i < 100000; i++) {
list.add("item-" + i);
}
// O(n) lookup instead of O(1) with Set
for (int i = 0; i < 100000; i++) {
boolean found = list.contains("item-" + i); // Slow!
}
}

Profiling with JFR or VisualVM would show hot spots in ArrayList.contains.
The solution is changing data structure, not GC tuning.
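The data-structure fix looks like this: build a HashSet once, then rely on its constant-time lookups:

```java
import java.util.ArrayList;
import java.util.HashSet;

void main() {
    var list = new ArrayList<String>();
    for (int i = 0; i < 100_000; i++) {
        list.add("item-" + i);
    }
    // Build the set once (O(n)); each contains() is then O(1) on average,
    // turning the overall lookup loop from O(n^2) into O(n)
    var set = new HashSet<>(list);
    boolean allFound = true;
    for (int i = 0; i < 100_000; i++) {
        allFound &= set.contains("item-" + i);
    }
    println("All items found: " + allFound);
}
```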
The "cargo cult" approach of copying configurations without understanding:
# Copied from blog post without context
java -Xms32g -Xmx32g \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=50 \
-XX:G1HeapRegionSize=32m \
-XX:G1NewSizePercent=30 \
-XX:G1MaxNewSizePercent=50 \
-XX:InitiatingHeapOccupancyPercent=35 \
-XX:G1MixedGCLiveThresholdPercent=85 \
-XX:ConcGCThreads=8 \
-XX:ParallelGCThreads=16 \
    MyApplication

Problems with this approach:
- Heap size may be wrong for your hardware
- Pause time target may be unrealistic
- Thread counts may not match CPU count
- Some flags may conflict or be deprecated
Development and production environments differ significantly:
| Aspect | Development | Production |
|---|---|---|
| Heap size | 512MB | 4GB+ |
| CPU cores | 4 | 16+ |
| OS memory | Shared | Dedicated |
| JIT optimization | Minimal | Fully warmed |
| Class loading | Dynamic | Stable |
Always validate tuning changes in an environment that mirrors production.
These examples provide starting configurations for common deployment scenarios.
Adjust based on your specific requirements and profiling results.
Web applications typically prioritize low latency and consistent response times.
#!/bin/bash
# Web Application JVM Configuration
# Characteristics: Many concurrent requests, session state, moderate heap
# Heap: 4GB for medium traffic site
HEAP="-Xms4g -Xmx4g"
# G1GC with low pause target
GC="-XX:+UseG1GC"
GC="$GC -XX:MaxGCPauseMillis=100"
GC="$GC -XX:+ParallelRefProcEnabled"
# Large code cache for compiled JSPs/servlets
CODE_CACHE="-XX:ReservedCodeCacheSize=256m"
# Thread stack (default is usually fine)
STACK="-Xss512k"
# Metaspace for web frameworks with many classes
META="-XX:MaxMetaspaceSize=256m"
# Production monitoring
MONITOR="-Xlog:gc*:file=/var/log/app/gc.log:time,uptime:filecount=5,filesize=50m"
MONITOR="$MONITOR -XX:+HeapDumpOnOutOfMemoryError"
MONITOR="$MONITOR -XX:HeapDumpPath=/var/dumps/"
# JFR for production profiling
JFR="-XX:StartFlightRecording=disk=true,maxage=12h,maxsize=500m"
JFR="$JFR -XX:FlightRecorderOptions=repository=/var/jfr/"
java $HEAP $GC $CODE_CACHE $STACK $META $MONITOR $JFR \
    -jar web-application.jar

Rationale:
- Fixed heap prevents resize pauses during traffic spikes
- G1GC provides predictable pause times
- Generous Metaspace for framework class loading
- Continuous JFR for diagnosing production issues
Microservices need fast startup, small footprint, and container awareness.
#!/bin/bash
# Microservice JVM Configuration
# Characteristics: Small heap, fast startup, container deployment
# Smaller heap for microservice (adjust based on service)
HEAP="-Xms512m -Xmx512m"
# ZGC for consistent low latency (or G1 for smaller heaps)
GC="-XX:+UseZGC"
# Enable container support (default in modern JVMs)
CONTAINER="-XX:+UseContainerSupport"
CONTAINER="$CONTAINER -XX:MaxRAMPercentage=75.0"
# Optimize for startup time (used only in the dev/test command below)
STARTUP="-XX:TieredStopAtLevel=1"
STARTUP="$STARTUP -XX:+UseSerialGC"  # Serial GC suits very small services
# Alternative: CDS for faster startup
# Generate: java -Xshare:dump
CDS="-Xshare:on"
# Minimal monitoring in container (use sidecar for full logging)
MONITOR="-Xlog:gc:stderr:time"
MONITOR="$MONITOR -XX:+HeapDumpOnOutOfMemoryError"
MONITOR="$MONITOR -XX:HeapDumpPath=/tmp/"
# For production microservice with moderate load
# Note: an explicit -Xmx overrides MaxRAMPercentage; drop $HEAP to size from the container limit
java $HEAP $GC $CONTAINER $MONITOR \
-jar microservice.jar
# For startup-optimized (dev/test)
# java $HEAP $STARTUP $CDS $MONITOR -jar microservice.jar
Rationale:
- Container awareness respects cgroup limits
- MaxRAMPercentage for automatic sizing in containers
- ZGC provides low latency without tuning
- Startup options available for faster cold starts
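The arithmetic behind MaxRAMPercentage is worth seeing once. A minimal sketch, where `heap_from_limit` is a hypothetical helper; the JVM's actual ergonomics may round differently:

```shell
#!/bin/bash
# Sketch: how -XX:MaxRAMPercentage translates a container memory limit into
# a max heap size. Pure arithmetic; actual JVM ergonomics may round differently.
heap_from_limit() {
  local limit_mb="$1" percent="$2"
  echo $(( limit_mb * percent / 100 ))
}

# A 512 MiB container limit with MaxRAMPercentage=75 gives roughly 384 MiB of heap
heap_from_limit 512 75
```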
Batch jobs prioritize throughput over latency, processing the maximum amount
of data per unit of time.
#!/bin/bash
# Batch Processing JVM Configuration
# Characteristics: Large heap, throughput priority, long-running
# Large heap for batch data processing
HEAP="-Xms16g -Xmx16g"
# Parallel GC maximizes throughput
GC="-XX:+UseParallelGC"
GC="$GC -XX:ParallelGCThreads=12"  # set to match available CPU cores
GC="$GC -XX:+UseNUMA"
# Alternative: G1 if pauses become too long
# GC="-XX:+UseG1GC -XX:MaxGCPauseMillis=500"
# Large young generation for allocation-heavy workloads
# (NewRatio=1 gives the young gen half the heap; the default NewRatio=2 gives 1/3)
YOUNG="-XX:NewRatio=1"
# Larger inlining thresholds for long-running hot paths
# (HotSpot defaults: MaxInlineSize=35, FreqInlineSize=325; validate with profiling)
JIT="-XX:MaxInlineSize=100"
JIT="$JIT -XX:FreqInlineSize=400"
# Monitoring for batch analysis
MONITOR="-Xlog:gc*:file=/var/log/batch/gc.log:time,uptime"
MONITOR="$MONITOR -XX:+HeapDumpOnOutOfMemoryError"
# JFR for post-run analysis
JFR="-XX:StartFlightRecording=settings=profile,dumponexit=true"
JFR="$JFR -XX:FlightRecorderOptions=repository=/var/jfr/"
java $HEAP $GC $YOUNG $JIT $MONITOR $JFR \
  -jar batch-processor.jar
Rationale:
- Parallel GC provides highest throughput
- Large heap accommodates batch data sets
- NUMA support for multi-socket systems
- Aggressive JIT settings for long-running hot paths
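The NewRatio flag used above sizes the generations indirectly: old/young = NewRatio, so young = heap / (NewRatio + 1). A small sketch of that arithmetic, where `young_gen_mb` is a hypothetical helper (the JVM aligns real sizes to region boundaries):

```shell
#!/bin/bash
# Sketch: how -XX:NewRatio maps to young-generation size.
# old/young = NewRatio, so young = heap / (NewRatio + 1).
# Illustrative only; the JVM aligns actual sizes to region boundaries.
young_gen_mb() {
  local heap_mb="$1" new_ratio="$2"
  echo $(( heap_mb / (new_ratio + 1) ))
}

young_gen_mb 16384 2   # 16GB heap, NewRatio=2: young gen ~5461 MB (1/3 of heap)
young_gen_mb 16384 1   # NewRatio=1: young gen 8192 MB (1/2 of heap)
```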
| Scenario | Heap | GC | Key Priorities |
|---|---|---|---|
| Web Application | 4-8GB | G1GC | Low latency, consistent response times |
| Microservice | 256MB-1GB | ZGC/G1GC | Fast startup, small footprint |
| Batch Processing | 8-32GB | Parallel GC | Throughput, large datasets |
| Low-Latency Trading | 4-16GB | ZGC | Sub-millisecond pauses |
| Desktop/GUI | 256MB-2GB | G1GC | Responsiveness, modest heap |
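The table above can double as a starting-point lookup. A hedged sketch: `flags_for` is a hypothetical helper, and the flag sets are this document's suggested starting values, not drop-in production settings:

```shell
#!/bin/bash
# Sketch: map a deployment scenario to the starting flags suggested above.
# These are starting points to refine with profiling, not final values.
flags_for() {
  case "$1" in
    web)          echo "-Xms4g -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=100" ;;
    microservice) echo "-XX:+UseZGC -XX:MaxRAMPercentage=75.0" ;;
    batch)        echo "-Xms16g -Xmx16g -XX:+UseParallelGC" ;;
    *)            echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

flags_for web
```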
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
// Compact source file (JDK 25+): instance main, with println implicitly available
void main() {
println("=== JVM Configuration Analysis ===");
println();
// Runtime info
var runtime = Runtime.getRuntime();
println("Available Processors: " + runtime.availableProcessors());
println();
// Memory configuration
println("Memory Configuration:");
println(" Max Memory (-Xmx): " + formatBytes(runtime.maxMemory()));
println(" Total Memory (current): " + formatBytes(runtime.totalMemory()));
println(" Free Memory: " + formatBytes(runtime.freeMemory()));
println(" Used Memory: " +
formatBytes(runtime.totalMemory() - runtime.freeMemory()));
println();
// Garbage collector info
println("Garbage Collectors:");
for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
println(" " + gc.getName());
println(" Collection Count: " + gc.getCollectionCount());
println(" Collection Time: " + gc.getCollectionTime() + "ms");
}
println();
// Memory pools
MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
println("Heap Memory Usage:");
var heapUsage = memory.getHeapMemoryUsage();
println(" Init: " + formatBytes(heapUsage.getInit()));
println(" Used: " + formatBytes(heapUsage.getUsed()));
println(" Committed: " + formatBytes(heapUsage.getCommitted()));
println(" Max: " + formatBytes(heapUsage.getMax()));
println();
println("Non-Heap Memory Usage:");
var nonHeapUsage = memory.getNonHeapMemoryUsage();
println(" Init: " + formatBytes(nonHeapUsage.getInit()));
println(" Used: " + formatBytes(nonHeapUsage.getUsed()));
println(" Committed: " + formatBytes(nonHeapUsage.getCommitted()));
println();
// JVM arguments
println("JVM Arguments:");
var runtimeMXBean = ManagementFactory.getRuntimeMXBean();
for (String arg : runtimeMXBean.getInputArguments()) {
println(" " + arg);
}
}
String formatBytes(long bytes) {
if (bytes < 0) return "N/A";
if (bytes < 1024) return bytes + " B";
if (bytes < 1024 * 1024) return (bytes / 1024) + " KB";
if (bytes < 1024 * 1024 * 1024) return (bytes / (1024 * 1024)) + " MB";
return String.format("%.2f GB", bytes / (1024.0 * 1024 * 1024));
}
This utility displays current JVM configuration. Run it with different flags
to verify settings are applied correctly.
JVM tuning and profiling are essential skills for building high-performance
Java applications. Effective performance engineering requires a disciplined,
data-driven approach rather than ad-hoc flag adjustments.
| Principle | Guidance |
|---|---|
| Measure First | Always profile before tuning |
| Start Simple | Use JVM defaults until proven insufficient |
| Make One Change at a Time | Isolate variables to understand impact |
| Validate Improvements | Confirm changes achieve desired outcomes |
| Test Realistically | Use production-like conditions |
| Document Decisions | Record rationale for future reference |
| Monitor Continuously | Track metrics to detect regressions |
The goal of tuning is not maximum performance at any cost. Instead, aim for
the optimal balance:
Performance: Meet response time and throughput requirements with
appropriate resource utilization. Avoid over-optimization that yields
diminishing returns.
Stability: Ensure consistent behavior under varying loads. A well-tuned
application handles traffic spikes gracefully without crashes or severe
degradation.
Maintainability: Keep configurations understandable and documented. Complex
tuning that nobody understands becomes a liability during incidents.
JVM performance engineering evolves with each Java release. Stay current by:
- Reading JVM release notes for new GC features and deprecations
- Following OpenJDK mailing lists and JEPs
- Experimenting with new collectors (ZGC, Shenandoah) as they mature
- Practicing with profiling tools on sample applications
- Reviewing production metrics to understand real-world behavior
With the foundation provided in this document and continued practice, you can
effectively diagnose performance issues, apply targeted improvements, and
build Java applications that meet demanding performance requirements.