▸ JDK 25 · G1GC · Tiered Compilation

JVM Execution & Memory Lifecycle

Complete Reference — Startup · Class Loading · Memory · JIT · GC · Object Lifecycle
01 SYSTEM STARTUP & BOOTSTRAPPING
Terminal Command
java -Xlog:all=trace MyApp — OS spawns the JVM process. The -Xlog flag enables verbose GC/JIT/class-load tracing useful for debugging.
OS → Native Bridge
OS invokes JNI_CreateJavaVM() — the native C++ function inside libjvm.so that initialises the entire JVM runtime from scratch.
Validation & Hardware Profiling
JVM validates the classpath, arguments, and flags. Its ergonomics logic inspects system resources (CPU cores, RAM) to pick GC and heap defaults.
Default GC Selection
G1GC has been the default since JDK 9 (SerialGC is still selected on very small machines). ZGC/Shenandoah are opt-in choices for ultra-low latency.
Method Area Created
Creates Metaspace — a native off-heap region (outside the Java heap) mapped to OS virtual memory.
02 CLASS LOADING & INITIALIZATION PIPELINE

Delegation model: a class loader always asks its parent first before attempting to load the class itself. Prevents duplicate definitions of core classes.

Bootstrap Loader — JDK core libs (java.lang.*)
Platform Loader — JDK platform modules
App Loader — Your classpath / jars
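The delegation chain can be observed from running code. A minimal sketch (the class and method names are illustrative; the built-in loader names "app" and "platform" are the real JDK 9+ names returned by ClassLoader.getName()):

```java
// Walks the loader chain for a class, app loader first, bootstrap last.
public class LoaderChainDemo {
    static String chain(Class<?> c) {
        StringBuilder sb = new StringBuilder();
        for (ClassLoader l = c.getClassLoader(); ; l = l.getParent()) {
            sb.append(l == null ? "bootstrap" : l.getName()); // null = bootstrap loader
            if (l == null) break;
            sb.append(" -> ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chain(LoaderChainDemo.class)); // app -> platform -> bootstrap
        System.out.println(chain(String.class));          // bootstrap (java.lang.* loads at the root)
    }
}
```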
Phase 1 — Loading
JVM locates the binary .class file from classpath/module path → Parses bytecode → Derives an internal C++ structure called a Klass → Stores the Klass in Metaspace. Each Klass = the JVM's runtime blueprint of your Java class.
Phase 2 — Linking (3 sub-steps)
Verification
Bytecode verifier checks stack maps and opcodes to guarantee type safety and memory safety before any code runs. Prevents malicious or corrupt bytecode from crashing the JVM.
Preparation
JVM allocates memory for all static fields and assigns default values: int → 0, boolean → false, Object → null. Your code's values are NOT applied yet.
Resolution
Replaces symbolic references in the Constant Pool (e.g. "java/lang/String" as a string) with actual direct memory pointers. Makes method calls fast.
Phase 3 — Initialization
JVM executes the compiler-generated <clinit> method — runs all static { } blocks and assigns the exact user-defined values to static fields. Only runs once per class per JVM.
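The three initialization facts above (defaults during Preparation, <clinit> assigning real values, once-per-class execution) are directly observable. A small sketch with illustrative names:

```java
public class InitOnceDemo {
    static int initCount;          // Preparation: holds default 0 before <clinit> runs
    static final int ANSWER;       // not a compile-time constant, so reads trigger init

    static {                       // compiled into <clinit>; runs once, at first active use
        initCount++;
        ANSWER = 42;
    }

    public static void main(String[] args) {
        new InitOnceDemo();
        new InitOnceDemo();        // second instantiation does NOT rerun <clinit>
        System.out.println(InitOnceDemo.initCount); // prints 1
    }
}
```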
03 MEMORY TOPOGRAPHY — METASPACE

Metaspace lives in native OS memory — completely outside the Java Heap. Capped by -XX:MaxMetaspaceSize. Grows automatically by default (unlike the old PermGen which had a fixed size and frequently caused OutOfMemoryError).

METASPACE NATIVE MEMORY — OFF HEAP
Compressed Class Space (Max 3 GB)
📦
Addressed via 32-bit shifted pointers — Saves memory vs full 64-bit pointers. Enabled by -XX:+UseCompressedClassPointers
📐
Must be 100% Contiguous — All Klass definitions laid out in one uninterrupted block of memory. Required for the pointer arithmetic to work.
🏗️
Stores Klass Definitions Only — The C++ runtime blueprint of each Java class: field layouts, vtable, etc.
Non-Class Metaspace
🔗
Standard 64-bit Pointers — No compression, no contiguity requirement.
🧩
Can Be Fragmented / Scattered — Memory layout is flexible; chunks are allocated wherever the OS provides space.
📝
Stores Everything Else — Bytecode (method bodies), Constant Pool, Annotations, Method metadata.
Key Distinction from PermGen
Metaspace was introduced in JDK 8. PermGen had a fixed size → java.lang.OutOfMemoryError: PermGen space. Metaspace auto-expands using native memory, vastly reducing this error class.
When Metaspace grows
Every time a new class is loaded (e.g., during framework startup, reflection, dynamic proxying), a new Klass + metadata is stored. Heavy use of runtime code generation (Spring, Hibernate, bytecode weavers) can spike Metaspace.
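Runtime class generation is easy to trigger deliberately. A sketch using java.lang.reflect.Proxy (the interface and handler here are illustrative): each distinct proxy interface combination generates a brand-new class whose Klass and metadata land in Metaspace, not on the Java heap.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyClassDemo {
    public interface Greeter { String greet(); }

    public static void main(String[] args) {
        InvocationHandler h = (proxy, method, a) -> "hi";
        // Generates a new class at runtime; its metadata is stored in Metaspace.
        Greeter g = (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[]{Greeter.class}, h);

        System.out.println(g.getClass().getName()); // a generated name, e.g. jdk.proxy1.$Proxy0
        System.out.println(g.greet());              // "hi"
    }
}
```

Frameworks like Spring and Hibernate do this at scale during startup, which is why their Metaspace footprint spikes early.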
04 THREAD CREATION & STACK FRAME EXECUTION
Thread Spawn
JVM provisions each thread with its own private JVM Stack. Threads are completely isolated — no stack sharing. Size controlled by -Xss. Exceeding it → StackOverflowError.
Method Invoked → Stack Frame Pushed
Every method call creates one Stack Frame (Activation Record) pushed onto the thread's stack. Frame size is known exactly at compile time from .class metadata.
STACK FRAME (Activation Record)
Local Variable Array (LVA)
Zero-indexed array. Holds:
Index 0: this reference (instance methods)
Index 1..n: method parameters then local variables

Example: int add(int a, int b)
[this][a][b] at indices 0,1,2
Operand Stack (OS)
LIFO workspace for bytecode computation.

Example: a + b:
1. iload_1 → push a
2. iload_2 → push b
3. iadd → pop both, push result

All arithmetic flows through here.
Frame Data (FD)
Bridges the frame to global JVM state:

Constant Pool ref: pointer to the class's runtime constant pool
Return address: where to jump after the method returns
Exception table: maps bytecode ranges to catch handlers
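The add example above, written out with the bytecode javac emits (viewable via javap -c) inlined as comments:

```java
public class AddDemo {
    int add(int a, int b) {     // LVA: [0]=this  [1]=a  [2]=b
        return a + b;
        // javap -c output for add:
        //   iload_1    // push a from LVA slot 1 onto the operand stack
        //   iload_2    // push b from LVA slot 2
        //   iadd       // pop both, push a + b
        //   ireturn    // pop result, hand it to the caller's frame
    }

    public static void main(String[] args) {
        System.out.println(new AddDemo().add(2, 3)); // 5
    }
}
```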
No GC Needed for Stacks
When a method returns, its frame is instantly popped and destroyed. No garbage collection involved — stack allocation/deallocation is O(1) and far faster than heap operations.
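Stack exhaustion from the section above can be demonstrated directly. A minimal sketch (the exact frame count it reports depends on -Xss and per-frame size, so treat the number as machine-specific):

```java
public class StackDepthDemo {
    static int depth = 0;

    static void recurse() {
        depth++;
        recurse();                 // each call pushes one new frame onto this thread's stack
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Frame size is fixed per method, so depth scales roughly linearly with -Xss.
            System.out.println("overflowed after ~" + depth + " frames");
        }
    }
}
```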
05 ADAPTIVE JIT COMPILATION — TIERED COMPILATION ENGINE

JVM tracks two counters per method: i (invocation count) and b (backedge/loop count). Hot code is progressively compiled to native machine code.

LEVEL 0 Interpreter Parses bytecode instruction-by-instruction. Slowest execution. Builds profiling data (type feedback, branch statistics). Every method starts here.
LEVEL 1 C1 — No Profile Simple fast native code, no profiling overhead. Used for trivial methods (getters/setters). Rarely a stepping stone — mostly a fast-track for tiny methods.
LEVEL 2 C1 — Light Profile Light profiling only. Activated when the C2 compiler queue is saturated and can't keep up. A pressure-relief valve.
LEVEL 3 C1 — Full Profile Standard path. Full profiling: type feedback, branch prediction data collected. Reached after ~200 invocations. Feeds rich profile data to C2.
LEVEL 4 C2 — Aggressive Optimize Uses profile data for speculative optimizations: method inlining, escape analysis, loop unrolling, scalar replacement, dead code elimination. Profiling STOPS. Reached after ~5000+ invocations. Output stored in Code Cache.
Escape Analysis
If C2 proves an object never "escapes" the method (no heap reference kept), it can allocate it on the stack instead. Zero GC pressure, instant cleanup on return.
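A sketch of allocation code that is a classic escape-analysis candidate (class and method names are illustrative; whether scalar replacement actually fires depends on the JIT's decisions at runtime):

```java
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // 'p' never escapes: no field store, no return, never passed to unknown code.
    // Once this method reaches Level 4, C2 can scalar-replace the allocation,
    // keeping x and y in registers so nothing ever touches the heap.
    static long sumCoords(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            sum += p.x + p.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumCoords(1_000_000)); // hot enough to reach C2
    }
}
```

Compare GC logs with and without -XX:-DoEscapeAnalysis to see the allocation-rate difference.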
Code Cache
Final native assembly stored in the Code Cache. Subsequent calls skip the interpreter entirely → near-bare-metal performance. Tune with -XX:ReservedCodeCacheSize.
Deoptimization
If C2's speculation proves wrong (e.g., assumed monomorphic call suddenly gets a new subtype), JVM deoptimizes back to interpreted mode and re-profiles. Rare but important to understand.
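The monomorphic-call scenario can be sketched like this (illustrative types; whether and when the deopt fires is the JIT's call, observable with -XX:+PrintCompilation as "made not entrant"):

```java
import java.util.Arrays;

public class DeoptDemo {
    public interface Shape { double area(); }
    public record Square(double s) implements Shape { public double area() { return s * s; } }
    public record Circle(double r) implements Shape { public double area() { return Math.PI * r * r; } }

    static double total(Shape[] shapes) {
        double t = 0;
        for (Shape s : shapes) t += s.area();  // call site profiled as monomorphic while only Squares appear
        return t;
    }

    public static void main(String[] args) {
        Shape[] warm = new Shape[10_000];
        Arrays.fill(warm, new Square(2));
        for (int i = 0; i < 1_000; i++) total(warm);  // C2 likely inlines Square.area speculatively

        // First Circle invalidates the speculation: uncommon trap -> deoptimize -> re-profile -> recompile.
        System.out.println(total(new Shape[]{ new Square(2), new Circle(1) }));
    }
}
```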
06 OBJECT ALLOCATION & G1GC HEAP REGIONS
JEP 519 — COMPACT OBJECT HEADERS (JDK 25 Default) — Total: 64 bits (8 bytes)
Mark Word layout: Compressed Klass Ptr (→ Metaspace) · 31-bit Hash Code (lazy) · 4-bit GC Age (0–15) · Tag Bits (lock state) · Reserved 4 bits (Valhalla)
Compressed Klass Pointer — Points to this object's Klass in Metaspace. Subsumed into the 64-bit Mark Word (previously a separate 32-bit field).
31-bit Hash Code — Lazily computed; only written into the header if hashCode() is ever called. Zero cost otherwise.
4-bit GC Age — Incremented for every Young GC the object survives. At 15 → promoted to Old Gen.
Tag Bits (Lock State) — Encode whether the object is unlocked, holds a lightweight lock, or is contended with a full monitor.
Reserved 4 bits — Explicitly reserved for upcoming Project Valhalla value types (JEP 401+). Future-proofs the header layout.

G1GC divides the heap into equal-sized regions (1–32 MB each). Allocation path depends on object size:

✓ Size < 50% of Region → EDEN REGION
Uses TLAB (Thread-Local Allocation Buffer).

Each thread pre-owns a private chunk of Eden. Allocation = just bumping a pointer. Zero synchronization needed.

Extremely fast: comparable to stack allocation. TLAB exhausted → thread requests a new one.
⚠ Size ≥ 50% of Region → HUMONGOUS REGION
Bypasses Eden entirely. Locked straight into Old Generation.

Requires the backing regions to be contiguous end-to-end in the heap's address space. Any leftover space in the last region is wasted (internal fragmentation).

Avoid large byte[] / int[] allocations in tight loops. Example: a 10 MB byte[] on a 16 MB region size goes Humongous immediately.
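The size rule above is just "50% of region size". A toy predicate mirroring it (this is a model for reasoning about allocations, not JVM source):

```java
public class HumongousDemo {
    // G1's humongous threshold is half the region size.
    static long humongousThresholdBytes(long regionBytes) { return regionBytes / 2; }

    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= humongousThresholdBytes(regionBytes);
    }

    public static void main(String[] args) {
        long region = 16L * 1024 * 1024;                            // -XX:G1HeapRegionSize=16m
        System.out.println(isHumongous(10L * 1024 * 1024, region)); // true: 10 MB >= 8 MB threshold
        System.out.println(isHumongous(4L * 1024 * 1024, region));  // false: normal Eden/TLAB path
    }
}
```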
07 CROSS-REGION REFERENCE TRACKING — Write Barrier, Card Table, RSet

Problem: During Young GC, we need to know which Young objects are referenced by Old objects — without scanning ALL of Old Gen (could be 100 GB!). G1GC solves this with a 3-layer tracking system.

① Write Barrier An Old object's field is assigned a reference to a Young object. JVM-injected barrier code intercepts this assignment.
② Card Table The 512-byte "card" covering the Old object's address is marked dirty. The Card Table is a compact byte array shadowing the entire heap: 1 byte per 512 bytes of heap.
③ Refinement Threads Background threads continuously scan for dirty cards. For each dirty card, they resolve the exact pointer inside it.
④ Remembered Set (RSet) Pointer registered in the Young region's RSet. At GC time, only the RSet is scanned — not all of Old Gen. Huge speedup.
RSet Structure
Per-region Hash Table: key = the referencing (external) region, value = the dirty card indices within it. Allows GC to scan only a tiny set of cards instead of the full Old Gen.
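Steps ① and ② can be captured in a toy model (this is a simplification for intuition, not HotSpot source; real G1 refinement then resolves each dirty card into RSet entries):

```java
// Toy card table: 1 byte per 512-byte heap chunk, marked dirty by the write barrier.
public class CardTableModel {
    static final int CARD_SHIFT = 9;            // 512 = 2^9 bytes per card
    final byte[] cards;

    CardTableModel(long heapBytes) { cards = new byte[(int) (heapBytes >> CARD_SHIFT)]; }

    // Write barrier sketch: on "oldObj.field = youngObj", dirty the card covering oldObj.
    void writeBarrier(long oldObjAddress) { cards[(int) (oldObjAddress >> CARD_SHIFT)] = 1; }

    // Refinement sketch: find dirty cards (real G1 resolves each pointer and updates RSets).
    int dirtyCount() { int n = 0; for (byte c : cards) n += c; return n; }

    public static void main(String[] args) {
        CardTableModel ct = new CardTableModel(1 << 20);  // 1 MB "heap" -> 2048 cards
        ct.writeBarrier(0x1234);                          // dirties card 9 (0x1234 >> 9)
        ct.writeBarrier(0x1300);                          // same card: marking is idempotent
        System.out.println(ct.dirtyCount());              // 1
    }
}
```

Note the key property: however many writes hit the same 512-byte span, the GC only has one card to examine.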
Write Barrier Cost
Every object field assignment in Java has a tiny hidden overhead — the write barrier check. This is why immutable objects (no writes after construction) can be slightly faster in tight loops.
RSet Memory Overhead
Large RSets can consume significant heap metadata memory. -XX:G1RSetUpdatingPauseTimePercent controls how much STW time is spent on RSet updates.
08 GARBAGE COLLECTION LIFECYCLE — Young GC → Mixed GC → Full GC
YOUNG GC (Minor Collection) STOP-THE-WORLD
Trigger: All Eden Regions are full.

  • Scans RSets of young regions to find incoming Old→Young pointers
  • Evacuates (copies) live objects from Eden + From-Survivor → To-Survivor
  • GC Age incremented (+1) for every surviving object
PROMOTION RULES (Young → Old)
Rule · Condition · Action
Age Threshold · GC Age reaches 15 (default) · Object promoted to an Old region
Survivor Overflow · Survivor space is full (can't fit more) · Excess objects promoted early
Humongous Reclaim · Humongous primitive array with 0 incoming refs · Eagerly reclaimed during this Young GC
IHOP THRESHOLD — Triggering Concurrent Marking
IHOP = Initiating Heap Occupancy Percent (adaptive, ~45% default).
When Old Gen crosses IHOP, G1GC calculates: can marking threads finish before Old Gen completely fills up?
If yes → start concurrent marking. This is an adaptive prediction, not a fixed threshold.
CONCURRENT MARKING CYCLE CONCURRENT (app keeps running)
1. Concurrent Marking — GC threads trace live objects while your app runs simultaneously. Uses SATB (Snapshot-At-The-Beginning) barriers:
  → Before overwriting a reference, the old value is logged to an SATB queue.
  → Preserves the logical "snapshot" of the heap from marking's start point.
  → Objects that become unreachable after the snapshot = "floating garbage" (collected next cycle).

2. Remark STW — Short pause. Drains remaining SATB buffers. Processes weak/soft/phantom references.

3. Cleanup STW — Scrubs/rebuilds RSets. Instantly reclaims regions that are proven 100% garbage (no live objects at all).

4. Mixed GC — Collects ALL Young regions + a carefully chosen subset of Old regions (those with the highest garbage ratio). G1 stops collecting Old regions the moment it would breach -XX:MaxGCPauseMillis. This is the core of G1's "Garbage First" naming — it always prioritizes the most garbage-dense regions.
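The SATB pre-write barrier in step 1 can be modeled in a few lines (a toy for intuition, not HotSpot source; the real queues are per-thread and drained by GC threads):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy SATB barrier: before overwriting a reference, log the OLD value so marking
// still sees the heap as it was when the snapshot was taken.
public class SatbModel {
    public static final Deque<Object> satbQueue = new ArrayDeque<>();
    private Object field;

    public void writeField(Object newValue) {
        if (field != null) satbQueue.add(field);  // pre-write barrier: enqueue old referent
        field = newValue;
    }

    public static void main(String[] args) {
        SatbModel obj = new SatbModel();
        obj.writeField(new Object());          // old value was null: nothing logged
        obj.writeField(new Object());          // first referent logged before being overwritten
        System.out.println(satbQueue.size());  // 1 — Remark drains this queue
    }
}
```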
FULL GC — Last Resort STOP-THE-WORLD · VERY SLOW
Triggered only if Mixed GC cannot reclaim memory fast enough, or a Humongous allocation needs contiguous regions that fragmentation has made unavailable. Performs a compacting collection of the entire heap (single-threaded before JDK 10; parallel since JEP 307, but still drastically slower than a Mixed GC). Can pause for seconds. Must be avoided in production.
Tuning MaxGCPauseMillis
-XX:MaxGCPauseMillis=200 is the default. G1 will reduce the Mixed GC collection set to stay within budget. Lower = more frequent, smaller GC pauses.
SATB Floating Garbage
Objects that become unreachable AFTER the concurrent marking snapshot are "floating garbage" — they won't be collected until the NEXT cycle. Normal behavior, not a memory leak.
G1 = "Garbage First"
Named because it prioritizes collecting regions with the most garbage first, maximizing reclaimed bytes per millisecond of pause time.
Humongous Allocation Tip
To avoid Humongous allocations, increase region size: -XX:G1HeapRegionSize=32m. This raises the Humongous threshold from 8 MB → 16 MB.