JVM Tuning and Profiling

JVM tuning is the process of adjusting Java Virtual Machine parameters to
optimize application performance, memory usage, and response times. Profiling
involves analyzing runtime behavior to identify bottlenecks, memory leaks, and
inefficient code paths. Together, these practices form the foundation of Java
performance engineering.

Understanding JVM tuning and profiling is essential for building high-performance
Java applications. Modern applications face demanding requirements: low latency,
high throughput, efficient resource utilization, and predictable behavior under
load. Without proper tuning, applications may suffer from excessive garbage
collection pauses, memory exhaustion, or suboptimal thread utilization.

Tuning should always be guided by profiling data. Premature optimization based
on assumptions often leads to wasted effort or even degraded performance. The
correct approach is to measure first, identify actual bottlenecks, apply targeted
changes, and validate improvements through repeated measurement.

JVM Memory Management Basics

The JVM manages memory automatically, freeing developers from manual memory
allocation and deallocation. However, understanding how the JVM organizes and
manages memory is crucial for effective tuning.

Memory Regions Overview

The JVM divides memory into several distinct regions, each serving a specific
purpose:

Region Description
Heap Primary storage for objects; managed by garbage collector
Stack Per-thread memory for method calls and local variables
Metaspace Class metadata, method definitions, constant pools
Code Cache JIT-compiled native code
Native Memory Direct buffers, JNI allocations, thread stacks
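These regions can be observed from inside a running application through the
standard MemoryMXBean, which reports the heap and non-heap split directly.
A minimal sketch in the same style as the later examples:

```java
import java.lang.management.ManagementFactory;

void main() {

    var memory = ManagementFactory.getMemoryMXBean();

    // Heap: objects and arrays, managed by the garbage collector
    var heap = memory.getHeapMemoryUsage();
    println("Heap used:      " + heap.getUsed() / (1024 * 1024) + " MB");
    println("Heap committed: " + heap.getCommitted() / (1024 * 1024) + " MB");

    // Non-heap: Metaspace, code cache, compressed class space
    var nonHeap = memory.getNonHeapMemoryUsage();
    println("Non-heap used:  " + nonHeap.getUsed() / (1024 * 1024) + " MB");
}
```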

JVM Memory Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                           JVM Process Memory                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                              HEAP                                     │  │
│  │  ┌─────────────────────────┐  ┌─────────────────────────────────────┐ │  │
│  │  │      Young Generation   │  │          Old Generation             │ │  │
│  │  │  ┌───────┐ ┌──────────┐ │  │                                     │ │  │
│  │  │  │ Eden  │ │ Survivor │ │  │    Long-lived objects survive      │ │  │
│  │  │  │       │ │  S0 | S1 │ │  │    multiple GC cycles here         │ │  │
│  │  │  │ New   │ │          │ │  │                                     │ │  │
│  │  │  │objects│ │ Aging    │ │  │    Collected by Major/Full GC      │ │  │
│  │  │  │ here  │ │ objects  │ │  │                                     │ │  │
│  │  │  └───────┘ └──────────┘ │  │                                     │ │  │
│  │  └─────────────────────────┘  └─────────────────────────────────────┘ │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                           NON-HEAP                                    │  │
│  │  ┌─────────────────┐  ┌─────────────────┐  ┌───────────────────────┐  │  │
│  │  │    Metaspace    │  │   Code Cache    │  │   Direct Buffers      │  │  │
│  │  │                 │  │                 │  │                       │  │  │
│  │  │  Class metadata │  │  JIT compiled   │  │  Off-heap memory      │  │  │
│  │  │  Method data    │  │  native code    │  │  for NIO operations   │  │  │
│  │  │  Constant pool  │  │                 │  │                       │  │  │
│  │  └─────────────────┘  └─────────────────┘  └───────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                        THREAD STACKS                                  │  │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐     │  │
│  │  │ Thread1 │  │ Thread2 │  │ Thread3 │  │ Thread4 │  │  ...    │     │  │
│  │  │  Stack  │  │  Stack  │  │  Stack  │  │  Stack  │  │         │     │  │
│  │  └─────────┘  └─────────┘  └─────────┘  └─────────┘  └─────────┘     │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Heap Memory

The heap is the runtime data area from which memory for all class instances and
arrays is allocated. It is created when the JVM starts and may increase or
decrease in size during application runtime.

Young Generation: Where new objects are allocated. Most objects die young
and are collected quickly through Minor GC. It is further divided into Eden
space (initial allocation) and Survivor spaces (S0 and S1) for aging objects.

Old Generation: Objects that survive multiple Minor GC cycles are promoted
here. Collected less frequently through Major GC or Full GC operations.

Stack Memory

Each thread has its own stack, created when the thread is created. The stack
stores frames containing local variables, partial results, and method invocation
data. Stack size is fixed per thread and configured via -Xss.

void main() {

    // Each method call creates a new stack frame
    int result = calculateFactorial(5);
    println("Factorial: " + result);
}

int calculateFactorial(int n) {
    // Local variables stored in stack frame
    if (n <= 1) {
        return 1;
    }
    // Recursive call creates new stack frame
    return n * calculateFactorial(n - 1);
}

Stack memory is automatically reclaimed when a method returns. Deep recursion
or excessively large local variables can cause StackOverflowError.
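The limit can be demonstrated by probing how deep recursion goes before the
per-thread stack budget set by -Xss runs out. This sketch is illustrative
only; catching StackOverflowError is not something production code should do:

```java
void main() {
    // The exact depth varies with JVM version, -Xss, and frame size
    int depth = probeDepth(0);
    println("Recursion depth reached before StackOverflowError: " + depth);
}

int probeDepth(int depth) {
    try {
        // Each call pushes a new frame onto the thread's stack
        return probeDepth(depth + 1);
    } catch (StackOverflowError e) {
        // Thrown at the deepest frame; return how far we got
        return depth;
    }
}
```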

Metaspace

Metaspace replaced PermGen in Java 8. It stores class metadata, including:

  • Class definitions and bytecode
  • Method metadata and compiled method information
  • Constant pool data
  • Annotation processing information

Unlike PermGen, Metaspace uses native memory and can grow dynamically. It is
still subject to garbage collection when classes are unloaded.
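Metaspace usage can be observed at runtime through the standard memory pool
MXBeans; a minimal sketch that filters the pools by name:

```java
import java.lang.management.ManagementFactory;

void main() {

    // Metaspace and the compressed class space appear as memory pools
    for (var pool : ManagementFactory.getMemoryPoolMXBeans()) {
        var name = pool.getName();
        if (name.contains("Metaspace") || name.contains("Class")) {
            var usage = pool.getUsage();
            println(name + ": " + usage.getUsed() / 1024 + " KB used");
        }
    }
}
```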

Garbage Collection Regions (G1 Collector)

Modern collectors like G1 organize the heap into regions for more efficient
collection:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         G1 Heap Region Layout                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐  │
│  │  E  │  E  │  S  │  O  │  O  │  E  │  H  │  H  │  O  │  E  │  S  │  O  │  │
│  ├─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┤  │
│  │  O  │  E  │  O  │  O  │  E  │  E  │  H  │  O  │  O  │  E  │  E  │  O  │  │
│  ├─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┤  │
│  │  O  │  O  │  E  │  S  │  O  │  E  │  O  │  O  │  E  │  O  │  O  │  E  │  │
│  └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘  │
│                                                                             │
│  E = Eden Region    S = Survivor Region    O = Old Region    H = Humongous  │
│                                                                             │
│  - Each region is typically 1-32 MB in size                                 │
│  - Regions can change role dynamically                                      │
│  - Humongous regions store objects larger than half a region                │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Tuning with JVM Flags

JVM flags control memory allocation, garbage collection behavior, and various
runtime optimizations. Flags are categorized into standard options (stable across
releases) and non-standard options (may change between JVM versions).

Flag Categories

Prefix Description
- Standard options (e.g., -version, -classpath)
-X Non-standard options, relatively stable
-XX: Advanced options, may change without notice
-XX:+<flag> Enable boolean option
-XX:-<flag> Disable boolean option
-XX:<name>=<value> Set numeric or string value

Heap Size Flags

Maximum Heap Size (-Xmx)

The -Xmx flag sets the maximum heap size the JVM can allocate. When heap
usage approaches this limit, aggressive garbage collection occurs. If memory
cannot be freed, OutOfMemoryError is thrown.

# Set maximum heap to 4 gigabytes
java -Xmx4g MyApplication

# Set maximum heap to 2048 megabytes
java -Xmx2048m MyApplication

# Set maximum heap to specific kilobytes
java -Xmx2097152k MyApplication

Best practices for -Xmx:

  • Set based on available physical memory (typically 50-75% of total RAM)
  • Account for non-heap memory (Metaspace, native memory, thread stacks)
  • Monitor actual usage before increasing; larger heaps mean longer GC pauses
  • Container deployments should consider memory limits carefully

Initial Heap Size (-Xms)

The -Xms flag sets the initial heap size. The JVM starts with this amount
and grows toward -Xmx as needed. Setting -Xms equal to -Xmx prevents
heap resizing during runtime.

# Start with 2GB, can grow to 4GB
java -Xms2g -Xmx4g MyApplication

# Fixed heap size of 4GB (recommended for production)
java -Xms4g -Xmx4g MyApplication

Best practices for -Xms:

  • Set equal to -Xmx for production systems to avoid resize overhead
  • Lower values acceptable for development to conserve resources
  • Monitor startup time; larger initial heap may slow JVM initialization

Viewing Current Memory Settings

void main() {

    var runtime = Runtime.getRuntime();

    long maxMemory = runtime.maxMemory();
    long totalMemory = runtime.totalMemory();
    long freeMemory = runtime.freeMemory();
    long usedMemory = totalMemory - freeMemory;

    println("JVM Memory Information:");
    println("─".repeat(50));
    println("Max Memory (Xmx):     " + formatBytes(maxMemory));
    println("Total Memory:         " + formatBytes(totalMemory));
    println("Used Memory:          " + formatBytes(usedMemory));
    println("Free Memory:          " + formatBytes(freeMemory));
    println("Available Processors: " + runtime.availableProcessors());
}

String formatBytes(long bytes) {
    if (bytes < 1024) return bytes + " B";
    if (bytes < 1024 * 1024) return (bytes / 1024) + " KB";
    if (bytes < 1024 * 1024 * 1024) return (bytes / (1024 * 1024)) + " MB";
    return String.format("%.2f GB", bytes / (1024.0 * 1024 * 1024));
}

Running this program shows current JVM memory configuration. Use it to verify
that your flags are applied correctly.

Garbage Collection Flags

The JVM includes several garbage collectors, each optimized for different
workloads. Selecting the appropriate collector significantly impacts performance.

Garbage Collector Comparison

Collector    Best For                      Pause Times        Throughput
G1GC         General purpose, balanced     Medium (10-200ms)  High
ZGC          Low latency, large heaps      Very low (<1ms)    High
Shenandoah   Low latency, any heap size    Very low (<10ms)   High
Parallel GC  Batch processing, throughput  Higher             Highest
Serial GC    Small heaps, single CPU       Variable           Low

G1 Garbage Collector

G1 (Garbage-First) is the default collector since Java 9. It divides the heap
into regions and prioritizes collecting regions with the most garbage.

# Enable G1GC (default in Java 9+)
java -XX:+UseG1GC MyApplication

# Set target pause time (milliseconds)
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 MyApplication

# Set heap region size (1MB-32MB, must be power of 2)
java -XX:+UseG1GC -XX:G1HeapRegionSize=16m MyApplication

# Tune concurrent GC threads
java -XX:+UseG1GC -XX:ConcGCThreads=4 MyApplication

Key G1 tuning options:

Flag Description
-XX:MaxGCPauseMillis=200 Target maximum GC pause time
-XX:G1HeapRegionSize=16m Size of G1 heap regions
-XX:G1NewSizePercent=5 Minimum young generation percentage
-XX:G1MaxNewSizePercent=60 Maximum young generation percentage
-XX:InitiatingHeapOccupancyPercent=45 Heap occupancy to trigger marking

ZGC (Z Garbage Collector)

ZGC is designed for applications requiring extremely low latency. It performs
most work concurrently, keeping pause times consistently below a millisecond
regardless of heap size.

# Enable ZGC
java -XX:+UseZGC MyApplication

# ZGC with generational mode (Java 21+)
java -XX:+UseZGC -XX:+ZGenerational MyApplication

# Set concurrent GC threads
java -XX:+UseZGC -XX:ConcGCThreads=4 MyApplication

ZGC characteristics:

  • Pause times typically under 1ms
  • Scales to multi-terabyte heaps
  • Slightly lower throughput than G1
  • Best for latency-sensitive applications

Shenandoah GC

Shenandoah is another low-pause collector, performing concurrent compaction
to minimize stop-the-world pauses.

# Enable Shenandoah
java -XX:+UseShenandoahGC MyApplication

# Set heuristics mode
java -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive MyApplication

# Compact mode for smaller heaps
java -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=compact MyApplication

Parallel GC

Parallel GC maximizes throughput by using multiple threads for collection.
It is ideal for batch processing where pause times are acceptable.

# Enable Parallel GC
java -XX:+UseParallelGC MyApplication

# Set number of GC threads
java -XX:+UseParallelGC -XX:ParallelGCThreads=8 MyApplication

# Enable adaptive sizing
java -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy MyApplication

Other Useful Flags

Metaspace Configuration

# Set maximum Metaspace size
java -XX:MaxMetaspaceSize=256m MyApplication

# Set initial Metaspace size
java -XX:MetaspaceSize=128m MyApplication

# Limit the compressed class space used for class metadata
java -XX:CompressedClassSpaceSize=128m MyApplication

GC Logging Flags

Modern GC logging (Java 9+) uses the Unified Logging Framework:

# Basic GC logging
java -Xlog:gc MyApplication

# Detailed GC logging with timestamps
java -Xlog:gc*:file=gc.log:time,uptime,level,tags MyApplication

# Include heap information
java -Xlog:gc+heap=debug:file=gc.log MyApplication

# Log GC pauses only
java -Xlog:gc+pause=info MyApplication

Legacy GC logging (Java 8):

# Print GC details (Java 8)
java -XX:+PrintGCDetails -XX:+PrintGCDateStamps MyApplication

# Write to file
java -XX:+PrintGCDetails -Xloggc:gc.log MyApplication

Heap Dump Configuration

# Generate heap dump on OutOfMemoryError
java -XX:+HeapDumpOnOutOfMemoryError MyApplication

# Specify heap dump location
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/dumps/heap.hprof MyApplication

# Run script on OutOfMemoryError
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:OnOutOfMemoryError="./notify-admin.sh %p" MyApplication

Thread Stack Size

# Set thread stack size to 512KB
java -Xss512k MyApplication

# Larger stacks for deep recursion
java -Xss2m MyApplication

Complete Flag Reference Table

Flag Description
-Xmx<size> Maximum heap size
-Xms<size> Initial heap size
-Xss<size> Thread stack size
-XX:+UseG1GC Enable G1 garbage collector
-XX:+UseZGC Enable Z garbage collector
-XX:+UseShenandoahGC Enable Shenandoah collector
-XX:+UseParallelGC Enable Parallel collector
-XX:MaxGCPauseMillis=<ms> Target maximum GC pause time
-XX:MaxMetaspaceSize=<size> Maximum Metaspace size
-XX:+HeapDumpOnOutOfMemoryError Dump heap on OOM
-XX:HeapDumpPath=<path> Heap dump file location
-Xlog:gc* Enable GC logging
-XX:+PrintFlagsFinal Print all flag values
-XX:+UnlockDiagnosticVMOptions Enable diagnostic options
-XX:+UnlockExperimentalVMOptions Enable experimental options
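Most of these flags can also be inspected from inside a running JVM via the
HotSpotDiagnosticMXBean in the jdk.management module. The flag names queried
below are just examples; getVMOption throws IllegalArgumentException for
unknown flags:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

void main() {

    var diagnostic =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    // Read the effective value and origin (default, command line, ...)
    for (var flag : new String[] {"MaxHeapSize", "UseG1GC", "MaxGCPauseMillis"}) {
        var option = diagnostic.getVMOption(flag);
        println(flag + " = " + option.getValue()
                + " (origin: " + option.getOrigin() + ")");
    }
}
```

Writable flags (for example HeapDumpPath) can additionally be changed at
runtime with setVMOption.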

Profiling Tools

Profiling tools provide visibility into application behavior, enabling data-
driven optimization decisions. Java includes several powerful profiling tools
that integrate directly with the JVM.

Java Flight Recorder (JFR)

Java Flight Recorder is a low-overhead profiling framework built into the JVM.
It continuously collects diagnostic data with minimal performance impact,
making it suitable for production environments.

Enabling JFR

# Start recording from command line
java -XX:StartFlightRecording=duration=60s,filename=recording.jfr \
     MyApplication

# Start with continuous recording
java -XX:StartFlightRecording=disk=true,maxage=1h,maxsize=500m \
     MyApplication

# Start recording kept in memory only (no disk repository)
java -XX:StartFlightRecording=disk=false MyApplication

Recording Settings

Setting Description
duration Recording length (e.g., 60s, 5m, 1h)
filename Output file path
disk Enable disk repository (true/false)
maxage Maximum age of data to keep
maxsize Maximum disk space for recording
settings Predefined profile (default, profile)

Starting Recording Programmatically

import jdk.jfr.Configuration;
import jdk.jfr.Recording;

import java.nio.file.Path;
import java.util.ArrayList;

void main() throws Exception {

    // Load a configuration
    var config = Configuration.getConfiguration("profile");

    // Create and start recording
    var recording = new Recording(config);
    recording.setName("MyRecording");
    recording.setMaxAge(java.time.Duration.ofMinutes(10));
    recording.start();

    println("Recording started: " + recording.getName());
    println("Recording ID: " + recording.getId());

    // Run your application workload here
    simulateWorkload();

    // Stop and save recording
    recording.stop();
    recording.dump(Path.of("programmatic-recording.jfr"));

    println("Recording saved to programmatic-recording.jfr");
    recording.close();
}

void simulateWorkload() throws InterruptedException {
    var list = new ArrayList<String>();
    for (int i = 0; i < 100000; i++) {
        list.add("Item " + i);
        if (i % 10000 == 0) {
            Thread.sleep(100);
            println("Processed " + i + " items");
        }
    }
}

This example creates a JFR recording programmatically, which is useful for
capturing specific application scenarios or integrating profiling into
automated testing.

Creating Custom JFR Events

import jdk.jfr.Category;
import jdk.jfr.Description;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

@Name("com.example.OrderProcessed")
@Label("Order Processed")
@Category("Business Events")
@Description("Fired when an order is successfully processed")
class OrderProcessedEvent extends Event {

    @Label("Order ID")
    String orderId;

    @Label("Customer")
    String customer;

    @Label("Total Amount")
    double amount;

    @Label("Processing Time (ms)")
    long processingTimeMs;
}

void main() {

    // Process several orders
    for (int i = 1; i <= 5; i++) {
        processOrder("ORD-" + i, "Customer-" + i, i * 100.0);
    }

    println("Order processing complete");
}

void processOrder(String orderId, String customer, double amount) {

    var event = new OrderProcessedEvent();
    event.orderId = orderId;
    event.customer = customer;
    event.amount = amount;

    long startTime = System.currentTimeMillis();

    // Simulate order processing
    try {
        Thread.sleep((long) (Math.random() * 100 + 50));
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }

    event.processingTimeMs = System.currentTimeMillis() - startTime;

    // Commit the event if enabled
    event.commit();

    println("Processed order: " + orderId + 
            " in " + event.processingTimeMs + "ms");
}

Custom events enable tracking application-specific metrics alongside standard
JVM events. They are invaluable for correlating business logic with system
behavior.

Analyzing JFR Recordings

Use the jfr command-line tool to extract data from recordings:

# Print recording summary
jfr summary recording.jfr

# Print specific event types
jfr print --events jdk.GCPhasePause recording.jfr

# Print CPU sampling data
jfr print --events jdk.ExecutionSample recording.jfr

# Export to JSON format
jfr print --json recording.jfr > recording.json

# List all event types
jfr metadata recording.jfr
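The same data is accessible programmatically through
jdk.jfr.consumer.RecordingFile. The sketch below records a short in-process
window, then counts events by type, roughly mirroring jfr summary (the
filename self-recording.jfr is arbitrary):

```java
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;
import java.nio.file.Path;
import java.util.HashMap;

void main() throws Exception {

    var path = Path.of("self-recording.jfr");

    // Record a short in-process window using the "default" settings
    try (var recording = new Recording(Configuration.getConfiguration("default"))) {
        recording.start();
        for (int i = 0; i < 1_000; i++) {
            var data = new byte[1024];   // provoke some allocation activity
        }
        recording.stop();
        recording.dump(path);
    }

    // Count events by type, similar to what `jfr summary` reports
    var counts = new HashMap<String, Long>();
    for (var event : RecordingFile.readAllEvents(path)) {
        counts.merge(event.getEventType().getName(), 1L, Long::sum);
    }
    counts.entrySet().stream()
          .sorted((a, b) -> Long.compare(b.getValue(), a.getValue()))
          .limit(5)
          .forEach(e -> println(e.getValue() + "  " + e.getKey()));
}
```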

VisualVM

VisualVM is a graphical tool for monitoring, troubleshooting, and profiling
Java applications. It combines several command-line JDK tools into an intuitive
interface.

Key Features

Feature Description
Monitor Real-time CPU, memory, classes, threads display
Threads Thread state visualization and deadlock detection
Sampler CPU and memory sampling with low overhead
Profiler Detailed method-level CPU and memory profiling
Heap Dump Capture and analyze heap snapshots
Thread Dump Capture and view thread stack traces

Installation and Setup

VisualVM is available as a standalone download or bundled with GraalVM. For
remote profiling, start the application with JMX enabled:

# Enable JMX for remote connection
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9090 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     MyApplication

Diagnosing Memory Leaks

Memory leaks occur when objects are no longer needed but remain reachable,
preventing garbage collection. VisualVM helps identify leaks through:

  1. Monitor tab: Watch heap usage grow over time without stabilizing
  2. Sampler tab: Identify classes consuming the most memory
  3. Heap Dump: Analyze object references to find retention paths

import java.util.HashMap;

void main() throws InterruptedException {

    // Simulated memory leak: map grows indefinitely
    var leakyMap = new HashMap<String, byte[]>();

    int counter = 0;
    while (counter < 1000) {
        // Each entry holds 1MB, never removed
        leakyMap.put("key-" + counter, new byte[1024 * 1024]);
        counter++;

        if (counter % 100 == 0) {
            println("Entries: " + counter + 
                    ", Memory: " + formatBytes(
                        Runtime.getRuntime().totalMemory() - 
                        Runtime.getRuntime().freeMemory()));
        }

        Thread.sleep(100);
    }
}

String formatBytes(long bytes) {
    return String.format("%.2f MB", bytes / (1024.0 * 1024));
}

When profiling this application with VisualVM:

  1. Connect to the running application
  2. Open the Monitor tab to observe heap growth
  3. Take a heap dump when memory is high
  4. Analyze retained size by class to find the byte[] arrays
  5. Trace references to discover the HashMap holding them
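A common fix for this pattern is to bound the map so old entries are evicted.
One standard approach, sketched here, overrides
LinkedHashMap.removeEldestEntry (the limit of 100 entries is arbitrary):

```java
import java.util.LinkedHashMap;
import java.util.Map;

void main() {

    var cache = boundedCache(100);

    // Insert far more entries than the bound allows
    for (int i = 0; i < 1000; i++) {
        cache.put("key-" + i, new byte[1024]);
    }

    // Memory stays bounded: only the most recent entries are retained
    println("Entries retained: " + cache.size());
}

// An access-ordered map that evicts its eldest entry beyond maxEntries
LinkedHashMap<String, byte[]> boundedCache(int maxEntries) {
    return new LinkedHashMap<>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
            return size() > maxEntries;
        }
    };
}
```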

JDK Mission Control

JDK Mission Control (JMC) is an advanced suite of tools for analyzing JFR
recordings. It provides deep insights into application behavior through
specialized views and automated analysis.

Key Analysis Views

View Purpose
Outline High-level recording summary
Method Profiling Hot methods and call stacks
Memory Allocation patterns and object statistics
Lock Instances Contention analysis and lock profiling
Garbage Collections GC events, durations, and pause analysis
Event Browser Raw access to all recorded events

Automated Analysis

JMC includes an automated analysis engine that examines recordings and
provides actionable recommendations:

  • Memory pressure warnings
  • Thread contention hotspots
  • I/O bottleneck identification
  • GC inefficiency detection
  • Code execution anomalies

Starting Recording for JMC

# Detailed recording for JMC analysis
java -XX:StartFlightRecording=settings=profile,duration=5m,filename=analysis.jfr \
     MyApplication

# Continuous recording for production
java -XX:StartFlightRecording=settings=default,disk=true,maxage=24h,dumponexit=true,filename=exit-recording.jfr \
     MyApplication

Profiling Tools Comparison

Tool            Best For                    Overhead  Production-Safe
JFR             Continuous monitoring       Very Low  Yes
VisualVM        Interactive debugging       Medium    Development
JMC             Deep analysis of JFR files  N/A       Analysis tool
async-profiler  CPU/allocation profiling    Low       Yes
jstack          Quick thread dump           Minimal   Yes
jmap            Heap statistics/dump        High      With caution

Performance Tuning Workflow

Effective performance tuning follows a systematic, iterative process. Avoid
random flag changes; instead, use data to guide decisions.

Step-by-Step Tuning Process

┌─────────────────────────────────────────────────────────────────────────────┐
│                      Performance Tuning Workflow                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────┐     ┌─────────┐     ┌─────────┐     ┌─────────┐             │
│   │  1.     │     │  2.     │     │  3.     │     │  4.     │             │
│   │ MEASURE │────►│ ANALYZE │────►│  TUNE   │────►│ VERIFY  │────┐        │
│   │         │     │         │     │         │     │         │    │        │
│   └─────────┘     └─────────┘     └─────────┘     └─────────┘    │        │
│        ▲                                                          │        │
│        │                                                          │        │
│        └──────────────────────────────────────────────────────────┘        │
│                              Iterate                                        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Step 1: Measure (Identify Performance Issues)

Before tuning, establish baseline metrics:

# Enable GC logging
java -Xlog:gc*:file=baseline-gc.log:time,level,tags \
     -XX:+HeapDumpOnOutOfMemoryError \
     MyApplication

# Start JFR recording
java -XX:StartFlightRecording=duration=10m,filename=baseline.jfr \
     MyApplication

Key metrics to capture:

  • Response times (p50, p95, p99 percentiles)
  • Throughput (requests/second)
  • GC pause times and frequency
  • Heap usage patterns
  • CPU utilization
  • Thread counts and states
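Percentiles such as p95 and p99 can be computed from raw samples with the
nearest-rank method; a minimal sketch using made-up response times:

```java
import java.util.Arrays;

void main() {

    // Simulated response times in milliseconds
    long[] samples = {12, 15, 9, 240, 18, 14, 11, 95, 13, 17,
                      16, 10, 22, 19, 14, 480, 12, 15, 13, 20};

    Arrays.sort(samples);
    println("p50: " + percentile(samples, 50) + " ms");
    println("p95: " + percentile(samples, 95) + " ms");
    println("p99: " + percentile(samples, 99) + " ms");
}

// Nearest-rank percentile over an already-sorted array
long percentile(long[] sorted, int p) {
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(0, rank - 1)];
}
```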

Step 2: Analyze (Collect and Interpret Profiling Data)

Use profiling tools to identify bottlenecks:

# Analyze GC patterns
jfr print --events jdk.GCPhasePause,jdk.GarbageCollection baseline.jfr

# Find hot methods
jfr print --events jdk.ExecutionSample baseline.jfr | \
    sort | uniq -c | sort -rn | head -20

# Check memory allocation
jfr print --events jdk.ObjectAllocationInNewTLAB baseline.jfr

Common findings and solutions:

Finding                       Likely Cause              Solution
Frequent Full GC              Heap too small or leak    Increase Xmx, fix leak
High GC pause times           Large old generation      Tune GC, consider ZGC
Many young GC collections     High allocation rate      Reduce object creation
Thread contention             Lock bottlenecks          Reduce synchronization
High CPU in specific methods  Algorithmic inefficiency  Optimize code
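As an example of the "reduce object creation" remedy, allocation in a hot
loop can often be cut by replacing repeated String concatenation, which
creates a new String on every iteration, with one reusable StringBuilder:

```java
void main() {

    int n = 10_000;

    // One StringBuilder grows in place instead of n intermediate Strings
    var sb = new StringBuilder(n * 8);
    for (int i = 0; i < n; i++) {
        sb.append("item-").append(i).append('\n');
    }
    String result = sb.toString();

    println("Built " + n + " lines, total length " + result.length());
}
```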

Step 3: Tune (Adjust JVM Flags)

Apply targeted changes based on analysis:

# If heap is too small
java -Xms4g -Xmx4g MyApplication

# If GC pauses are too long
java -XX:+UseZGC MyApplication

# If throughput is priority
java -XX:+UseParallelGC -XX:ParallelGCThreads=8 MyApplication

# Combined tuning example
java -Xms8g -Xmx8g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=100 \
     -XX:+HeapDumpOnOutOfMemoryError \
     -Xlog:gc*:file=tuned-gc.log:time,level,tags \
     MyApplication

Step 4: Verify (Validate Improvements)

Compare tuned metrics against baseline:

# Record with same duration as baseline
java -XX:StartFlightRecording=duration=10m,filename=tuned.jfr \
     [tuning flags] \
     MyApplication

Verification criteria:

  • Did target metric improve?
  • Did other metrics degrade?
  • Is improvement consistent under load?
  • Does change behave as expected?

Sample Workflow: VisualVM and JFR

This example demonstrates a complete tuning cycle:

Scenario: Application has slow response times during peak load.

Step 1: Measure

# Start with JFR recording and GC logging
java -XX:StartFlightRecording=settings=profile,duration=30m,filename=perf-issue.jfr \
     -Xlog:gc*:file=perf-gc.log:time,level,tags \
     -Xms2g -Xmx2g \
     MyApplication

Step 2: Analyze in JMC
Open perf-issue.jfr in JDK Mission Control:

  1. Check Automated Analysis for warnings
  2. Review GC tab: many Full GCs taking >500ms
  3. Memory tab: heap consistently near 100%
  4. Conclusion: Heap is too small

Step 3: Tune

# Increase heap and set initial = max
java -Xms4g -Xmx4g \
     -XX:StartFlightRecording=settings=profile,duration=30m,filename=after-tune.jfr \
     -Xlog:gc*:file=after-tune-gc.log:time,level,tags \
     MyApplication

Step 4: Verify
Compare recordings in JMC:

  • Full GC count: 47 → 3
  • Average GC pause: 423ms → 45ms
  • p99 response time: 2.3s → 0.8s

The change achieved the goal. Document the configuration and monitor
production metrics.

Best Practices

Following these guidelines leads to more effective and maintainable
performance tuning.

Start with Defaults

Modern JVMs are well-tuned for general workloads. The HotSpot JVM uses
ergonomics to automatically configure many settings based on available
resources.

# Let JVM choose GC and heap size
java MyApplication

# View auto-selected options
java -XX:+PrintFlagsFinal -version 2>&1 | grep -E "UseG1GC|MaxHeapSize"

Only override defaults when profiling reveals specific issues.

Test Under Realistic Workloads

Tuning in isolation often produces misleading results. Performance
characteristics change significantly between test and production:

Factor             Test Environment   Production
Data volume        Sample data        Full dataset
Concurrent users   1-10               100-10000+
Request patterns   Uniform            Spiky, varied
Duration           Minutes            Days/weeks
External services  Mocked             Real latency

Use load testing tools (JMeter, Gatling, k6) to simulate production
conditions during tuning.

Monitor GC Logs and Heap Usage

Establish continuous monitoring in production:

# Production GC logging configuration
java -Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=10,filesize=100m \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/dumps/ \
     MyApplication

Key monitoring metrics:

  • GC pause time distribution (especially p99)
  • GC frequency and type (Minor vs Full)
  • Heap occupancy over time
  • Promotion rate (objects moving to old generation)
  • Allocation rate
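Several of these metrics are also exposed in-process by the
GarbageCollectorMXBeans; a minimal sketch that prints cumulative counts and
times per collector (the allocation loop merely provokes some collections):

```java
import java.lang.management.ManagementFactory;

void main() {

    // Allocate to trigger at least some GC activity
    for (int i = 0; i < 10_000; i++) {
        var data = new byte[10_240];
    }

    // Cumulative collection counts and times since JVM start
    for (var gc : ManagementFactory.getGarbageCollectorMXBeans()) {
        println(gc.getName() + ": " + gc.getCollectionCount()
                + " collections, " + gc.getCollectionTime() + " ms total");
    }
}
```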

Profile Before Changing Flags

Never tune blindly. Each change should address a specific, measured issue:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    Flag Change Decision Process                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Question                           │  Action                              │
│   ─────────────────────────────────  │  ──────────────────────────────────  │
│   Is there a performance problem?    │  No  → Don't tune                   │
│   Have you profiled?                 │  No  → Profile first                │
│   Did profiling identify root cause? │  No  → Investigate more             │
│   Will this flag address the cause?  │  No  → Find correct solution        │
│   Have you tested the change?        │  No  → Test in staging              │
│   Does it improve target metric?     │  No  → Revert and try alternative   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Document Configuration Decisions

Maintain records of tuning decisions:

# production-jvm-config.sh
# JVM Configuration for Order Processing Service
# Last updated: 2024-01-15
# Owner: Performance Team

# Heap Configuration
# Reason: Production load analysis showed peak usage of 3.2GB
HEAP_OPTS="-Xms4g -Xmx4g"

# GC Configuration  
# Reason: p99 latency requirement of <100ms
GC_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=80"

# Monitoring
# Reason: Required for production troubleshooting
MONITOR_OPTS="-Xlog:gc*:file=/var/log/gc.log:time,level,tags"
MONITOR_OPTS="$MONITOR_OPTS -XX:+HeapDumpOnOutOfMemoryError"
MONITOR_OPTS="$MONITOR_OPTS -XX:HeapDumpPath=/var/dumps"

java $HEAP_OPTS $GC_OPTS $MONITOR_OPTS -jar app.jar

Common Pitfalls

Avoid these frequent mistakes when tuning JVM performance.

Setting Heap Sizes Incorrectly

Too Small: Causes frequent garbage collection, OutOfMemoryError, and
degraded performance. The JVM spends more time collecting than executing
application code.

Too Large: Results in longer GC pauses, slower startup, and wasted
resources. Very large heaps may exceed physical memory, causing swapping.

# Incorrect: Heap larger than physical RAM
java -Xmx32g MyApplication  # On machine with 16GB RAM

# Incorrect: Heap too small for workload
java -Xmx256m MyApplication  # With 2GB working set

# Correct: Sized based on analysis, accounting for non-heap
java -Xmx12g MyApplication  # On 16GB machine (leaves room for OS, Metaspace)
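The sizing arithmetic above can be sketched as a rule of thumb (an assumption for illustration, not an official formula): reserve headroom for the OS and non-heap JVM memory, and cap the heap at roughly 75% of physical RAM.

```java
// Rule-of-thumb sketch (assumption, not an official formula): pick a heap size
// that leaves headroom for the OS, Metaspace, code cache, and thread stacks,
// capped at ~75% of physical RAM.
class HeapSizer {

    static long recommendedHeapBytes(long physicalRamBytes, long nonHeapReserveBytes) {
        long afterReserve = physicalRamBytes - nonHeapReserveBytes;
        long cap = physicalRamBytes * 3 / 4;  // ~75% of RAM
        return Math.max(0, Math.min(afterReserve, cap));
    }
}
```

For a 16GB machine with about 4GB reserved for the OS and non-heap usage, this yields the 12GB heap shown in the correct example above.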

Misinterpreting GC Logs

Common log interpretation errors:

| Misinterpretation | Reality |
| --- | --- |
| All GC is bad | Minor GC is normal and efficient |
| Longer pause = bigger heap | Pause time depends on live-object volume, not total heap size |
| More GC threads always help | Too many threads cause contention |
| G1 needs extensive tuning | Defaults work well for most workloads |

Example log entry:

[2024-01-15T10:30:00.123+0000] GC(42) Pause Young (Normal) 
    (G1 Evacuation Pause) 2048M->1024M(4096M) 45.123ms

Interpretation:

  • Minor GC (#42), copying live objects from young generation
  • Heap reduced from 2GB to 1GB (4GB max)
  • Pause of 45ms - acceptable for G1

Ignoring Application-Level Bottlenecks

JVM tuning cannot fix algorithmic problems:

import java.util.ArrayList;

void main() {

    // No amount of JVM tuning helps this inefficient code
    var list = new ArrayList<String>();
    for (int i = 0; i < 100000; i++) {
        list.add("item-" + i);
    }

    // O(n) lookup per call (O(n^2) overall) instead of O(1) with a Set
    for (int i = 0; i < 100000; i++) {
        boolean found = list.contains("item-" + i);  // Slow!
    }
}

Profiling with JFR or VisualVM would show a hot spot in ArrayList.contains.
The solution is changing the data structure, not GC tuning.
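A minimal sketch of that fix: build a HashSet once, so each membership check is expected O(1) instead of a linear scan.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the data-structure fix: a HashSet makes each contains() call
// expected O(1), so the overall lookup loop drops from O(n^2) to O(n).
class LookupFix {

    static int countFound(int n) {
        Set<String> items = new HashSet<>();
        for (int i = 0; i < n; i++) {
            items.add("item-" + i);
        }
        int found = 0;
        for (int i = 0; i < n; i++) {
            if (items.contains("item-" + i)) {
                found++;
            }
        }
        return found;
    }
}
```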

Over-Reliance on Tuning Without Profiling

The "cargo cult" approach of copying configurations without understanding:

# Copied from blog post without context
java -Xms32g -Xmx32g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=50 \
     -XX:G1HeapRegionSize=32m \
     -XX:G1NewSizePercent=30 \
     -XX:G1MaxNewSizePercent=50 \
     -XX:InitiatingHeapOccupancyPercent=35 \
     -XX:G1MixedGCLiveThresholdPercent=85 \
     -XX:ConcGCThreads=8 \
     -XX:ParallelGCThreads=16 \
     MyApplication

Problems with this approach:

  • Heap size may be wrong for your hardware
  • Pause time target may be unrealistic
  • Thread counts may not match CPU count
  • Some flags may conflict or be deprecated

Tuning in Wrong Environment

Development and production environments differ significantly:

| Aspect | Development | Production |
| --- | --- | --- |
| Heap size | 512MB | 4GB+ |
| CPU cores | 4 | 16+ |
| OS memory | Shared | Dedicated |
| JIT optimization | Minimal | Fully warmed |
| Class loading | Dynamic | Stable |

Always validate tuning changes in an environment that mirrors production.

Examples

These examples provide starting configurations for common deployment scenarios.
Adjust based on your specific requirements and profiling results.

Web Application Configuration

Web applications typically prioritize low latency and consistent response times.

#!/bin/bash
# Web Application JVM Configuration
# Characteristics: Many concurrent requests, session state, moderate heap

# Heap: 4GB for medium traffic site
HEAP="-Xms4g -Xmx4g"

# G1GC with low pause target
GC="-XX:+UseG1GC"
GC="$GC -XX:MaxGCPauseMillis=100"
GC="$GC -XX:+ParallelRefProcEnabled"

# Large code cache for compiled JSPs/servlets
CODE_CACHE="-XX:ReservedCodeCacheSize=256m"

# Thread stack: 512k (reduced from the ~1MB default to support many request threads)
STACK="-Xss512k"

# Metaspace for web frameworks with many classes
META="-XX:MaxMetaspaceSize=256m"

# Production monitoring
MONITOR="-Xlog:gc*:file=/var/log/app/gc.log:time,uptime:filecount=5,filesize=50m"
MONITOR="$MONITOR -XX:+HeapDumpOnOutOfMemoryError"
MONITOR="$MONITOR -XX:HeapDumpPath=/var/dumps/"

# JFR for production profiling
JFR="-XX:StartFlightRecording=disk=true,maxage=12h,maxsize=500m"
JFR="$JFR -XX:FlightRecorderOptions=repository=/var/jfr/"

java $HEAP $GC $CODE_CACHE $STACK $META $MONITOR $JFR \
     -jar web-application.jar

Rationale:

  • Fixed heap prevents resize pauses during traffic spikes
  • G1GC provides predictable pause times
  • Generous Metaspace for framework class loading
  • Continuous JFR for diagnosing production issues

Microservice Configuration

Microservices need fast startup, small footprint, and container awareness.

#!/bin/bash
# Microservice JVM Configuration
# Characteristics: Small heap, fast startup, container deployment

# Smaller heap for microservice (adjust based on service)
HEAP="-Xms512m -Xmx512m"

# ZGC for consistent low latency (or G1 for smaller heaps)
GC="-XX:+UseZGC"

# Enable container support (default in modern JVMs)
CONTAINER="-XX:+UseContainerSupport"
CONTAINER="$CONTAINER -XX:MaxRAMPercentage=75.0"

# Optimize for startup time (C1 only: faster startup, lower peak throughput)
STARTUP="-XX:TieredStopAtLevel=1"
STARTUP="$STARTUP -XX:+UseSerialGC"  # For very small services; do not combine with $GC

# Alternative: CDS for faster startup
# Generate: java -Xshare:dump
CDS="-Xshare:on"

# Minimal monitoring in container (use sidecar for full logging)
MONITOR="-Xlog:gc:stderr:time"
MONITOR="$MONITOR -XX:+HeapDumpOnOutOfMemoryError"
MONITOR="$MONITOR -XX:HeapDumpPath=/tmp/"

# For production microservice with moderate load
java $HEAP $GC $CONTAINER $MONITOR \
     -jar microservice.jar

# For startup-optimized (dev/test)
# java $HEAP $STARTUP $CDS $MONITOR -jar microservice.jar

Rationale:

  • Container awareness respects cgroup limits
  • MaxRAMPercentage for automatic sizing in containers
  • ZGC provides low latency without tuning
  • Startup options available for faster cold starts
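To make the MaxRAMPercentage sizing concrete, here is a small sketch of the arithmetic applied against the container memory limit (simplified; the JVM's actual ergonomics involve additional factors):

```java
// Simplified sketch of -XX:MaxRAMPercentage sizing against a container memory
// limit (the JVM's real ergonomics involve additional factors).
class ContainerHeap {

    static long maxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * maxRamPercentage / 100.0);
    }
}
```

A 1 GiB container limit with MaxRAMPercentage=75.0 thus yields a maximum heap of roughly 768 MiB.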

High-Throughput Batch Job Configuration

Batch jobs prioritize throughput over latency, maximizing the amount of data
processed per unit of time.

#!/bin/bash
# Batch Processing JVM Configuration
# Characteristics: Large heap, throughput priority, long-running

# Large heap for batch data processing
HEAP="-Xms16g -Xmx16g"

# Parallel GC maximizes throughput
GC="-XX:+UseParallelGC"
GC="$GC -XX:ParallelGCThreads=12"
GC="$GC -XX:+UseNUMA"

# Alternative: G1 if pauses become too long
# GC="-XX:+UseG1GC -XX:MaxGCPauseMillis=500"

# Large young generation for allocation-heavy workloads
YOUNG="-XX:NewRatio=2"

# Aggressive inlining for hot paths
JIT="-XX:MaxInlineSize=100"
JIT="$JIT -XX:FreqInlineSize=200"

# Monitoring for batch analysis
MONITOR="-Xlog:gc*:file=/var/log/batch/gc.log:time,uptime"
MONITOR="$MONITOR -XX:+HeapDumpOnOutOfMemoryError"

# JFR for post-run analysis
JFR="-XX:StartFlightRecording=settings=profile,dumponexit=true"
JFR="$JFR -XX:FlightRecorderOptions=repository=/var/jfr/"

java $HEAP $GC $YOUNG $JIT $MONITOR $JFR \
     -jar batch-processor.jar

Rationale:

  • Parallel GC provides highest throughput
  • Large heap accommodates batch data sets
  • NUMA support for multi-socket systems
  • Aggressive JIT settings for long-running hot paths
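The -XX:NewRatio=2 flag in the configuration above sets the old:young ratio to 2:1; a quick sketch of the resulting generation split:

```java
// Sketch of -XX:NewRatio arithmetic: NewRatio=N means old:young = N:1,
// so the young generation receives heap / (N + 1) of the total.
class NewRatioCalc {

    static long youngGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    static long oldGenBytes(long heapBytes, int newRatio) {
        return heapBytes - youngGenBytes(heapBytes, newRatio);
    }
}
```

With the 16GB heap above and NewRatio=2, the young generation is roughly 5.3GB.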

Configuration Summary Table

| Scenario | Heap | GC | Key Priorities |
| --- | --- | --- | --- |
| Web Application | 4-8GB | G1GC | Low latency, consistent response times |
| Microservice | 256MB-1GB | ZGC/G1GC | Fast startup, small footprint |
| Batch Processing | 8-32GB | Parallel GC | Throughput, large datasets |
| Low-Latency Trading | 4-16GB | ZGC | Sub-millisecond pauses |
| Desktop/GUI | 256MB-2GB | G1GC | Responsiveness, modest heap |

Complete Example Application

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

void main() {

    println("=== JVM Configuration Analysis ===");
    println();

    // Runtime info
    var runtime = Runtime.getRuntime();
    println("Available Processors: " + runtime.availableProcessors());
    println();

    // Memory configuration
    println("Memory Configuration:");
    println("  Max Memory (-Xmx):     " + formatBytes(runtime.maxMemory()));
    println("  Total Memory (current):" + formatBytes(runtime.totalMemory()));
    println("  Free Memory:           " + formatBytes(runtime.freeMemory()));
    println("  Used Memory:           " + 
            formatBytes(runtime.totalMemory() - runtime.freeMemory()));
    println();

    // Garbage collector info
    println("Garbage Collectors:");
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
        println("  " + gc.getName());
        println("    Collection Count: " + gc.getCollectionCount());
        println("    Collection Time:  " + gc.getCollectionTime() + "ms");
    }
    println();

    // Memory pools
    MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    println("Heap Memory Usage:");
    var heapUsage = memory.getHeapMemoryUsage();
    println("  Init:      " + formatBytes(heapUsage.getInit()));
    println("  Used:      " + formatBytes(heapUsage.getUsed()));
    println("  Committed: " + formatBytes(heapUsage.getCommitted()));
    println("  Max:       " + formatBytes(heapUsage.getMax()));
    println();

    println("Non-Heap Memory Usage:");
    var nonHeapUsage = memory.getNonHeapMemoryUsage();
    println("  Init:      " + formatBytes(nonHeapUsage.getInit()));
    println("  Used:      " + formatBytes(nonHeapUsage.getUsed()));
    println("  Committed: " + formatBytes(nonHeapUsage.getCommitted()));
    println();

    // JVM arguments
    println("JVM Arguments:");
    var runtimeMXBean = ManagementFactory.getRuntimeMXBean();
    for (String arg : runtimeMXBean.getInputArguments()) {
        println("  " + arg);
    }
}

String formatBytes(long bytes) {
    if (bytes < 0) return "N/A";
    if (bytes < 1024) return bytes + " B";
    if (bytes < 1024 * 1024) return (bytes / 1024) + " KB";
    if (bytes < 1024 * 1024 * 1024) return (bytes / (1024 * 1024)) + " MB";
    return String.format("%.2f GB", bytes / (1024.0 * 1024 * 1024));
}

This utility displays current JVM configuration. Run it with different flags
to verify settings are applied correctly.

Conclusion

JVM tuning and profiling are essential skills for building high-performance
Java applications. Effective performance engineering requires a disciplined,
data-driven approach rather than ad-hoc flag adjustments.

Key Takeaways

| Principle | Guidance |
| --- | --- |
| Measure First | Always profile before tuning |
| Start Simple | Use JVM defaults until proven insufficient |
| Make One Change at a Time | Isolate variables to understand impact |
| Validate Improvements | Confirm changes achieve desired outcomes |
| Test Realistically | Use production-like conditions |
| Document Decisions | Record rationale for future reference |
| Monitor Continuously | Track metrics to detect regressions |

Balancing Performance, Stability, and Maintainability

The goal of tuning is not maximum performance at any cost. Instead, aim for
the optimal balance:

Performance: Meet response time and throughput requirements with
appropriate resource utilization. Avoid over-optimization that yields
diminishing returns.

Stability: Ensure consistent behavior under varying loads. A well-tuned
application handles traffic spikes gracefully without crashes or severe
degradation.

Maintainability: Keep configurations understandable and documented. Complex
tuning that nobody understands becomes a liability during incidents.

Continued Learning

JVM performance engineering evolves with each Java release. Stay current by:

  • Reading JVM release notes for new GC features and deprecations
  • Following OpenJDK mailing lists and JEPs
  • Experimenting with new collectors (ZGC, Shenandoah) as they mature
  • Practicing with profiling tools on sample applications
  • Reviewing production metrics to understand real-world behavior

With the foundation provided in this document and continued practice, you can
effectively diagnose performance issues, apply targeted improvements, and
build Java applications that meet demanding performance requirements.