Lightning talk
Carl Chesser
@che55er | che55er.io
The trap
The symptom
The container crosses its memory limit, even when heap graphs look calm.
Teams raise requests and limits "just to be safe", then pay for unused capacity.
The real budget
Use the whole container, but keep explicit margin; padding limits blindly hides the real budget.
Kubernetes controls
resources:
  requests:
    memory: "768Mi" # scheduler reserves this much node capacity
    cpu: "500m"
  limits:
    memory: "1Gi" # cgroup hard boundary for the process
    cpu: "1"
Requests ask Kubernetes to place the pod where at least that much capacity is available. Limits are the hard cgroup boundary the process must stay under.
Java 17+
First check
java -XshowSettings:system -version
jcmd <pid> VM.info
java -Xlog:os+container=debug -version
Use the first two for normal checks; use logging when the JVM's view looks wrong.
Example
-XshowSettings:system surfaces cgroups
$ java -XshowSettings:system -version
Operating System Metrics:
Provider: cgroupv2
Effective CPU Count: 1
CPU Quota: 100000us
CPU Period: 100000us
Memory Limit: 1.00G
Memory & Swap Limit: 1.00G
Provider: cgroupv2 means the JVM is reading Linux's unified cgroup hierarchy, available since Linux 4.5 in 2016.
Memory Limit: 1.00G is the ceiling for heap plus native memory.
Effective CPU Count: 1 influences GC threads, JIT threads, and application parallelism.
CPU Quota and CPU Period should line up with the deployment CPU limit.
This is the quick "am I sizing from the pod or the node?" check.
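The same question can be asked from inside the application. A minimal sketch, assuming a hypothetical class name `ContainerView`; it only reads the standard `Runtime` API and prints whatever the JVM derived from its environment.

```java
// Minimal sketch: print the CPU count and heap ceiling this JVM is working with.
// ContainerView is a hypothetical name used only for this illustration.
class ContainerView {
    // CPUs the JVM reports; GC, JIT, and common thread pools size from this value.
    static int effectiveCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Upper bound on the Java heap in MiB (reflects -Xmx or MaxRAMPercentage).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Effective CPUs: " + effectiveCpus());
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

If these values track the node rather than the pod limits, the JVM is likely not seeing the container boundary.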
Example
MaxRAMPercentage in the pod spec
# heap budget = limit - observed native memory - safety margin
1Gi limit - 220Mi native - 190Mi margin = 614Mi heap
614Mi / 1024Mi = 60%
env:
  - name: JAVA_TOOL_OPTIONS
    value: >
      -XX:MaxRAMPercentage=60
Measure native memory under load first. Start with a lower percentage for thread-heavy services (the JVM default is 25%), then raise it as you understand native memory needs.
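The budget arithmetic above can be sketched as a small helper. The class and method names (`HeapBudget`, `maxRamPercentage`) are hypothetical, and rounding to the nearest whole percent is an assumption made to match the 60% on the slide.

```java
// Minimal sketch of: heap budget = limit - observed native memory - safety margin.
// HeapBudget and maxRamPercentage are hypothetical names for illustration.
class HeapBudget {
    // Returns the heap share of the container limit, rounded to a whole percent.
    static long maxRamPercentage(long limitMiB, long nativeMiB, long marginMiB) {
        long heapMiB = limitMiB - nativeMiB - marginMiB;
        if (limitMiB <= 0 || heapMiB <= 0) {
            throw new IllegalArgumentException("native memory plus margin must fit under the limit");
        }
        return Math.round(100.0 * heapMiB / limitMiB);
    }

    public static void main(String[] args) {
        // 1Gi limit, 220Mi observed native memory, 190Mi safety margin
        long pct = maxRamPercentage(1024, 220, 190);
        System.out.println("-XX:MaxRAMPercentage=" + pct); // prints -XX:MaxRAMPercentage=60
    }
}
```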
Native threads
HotSpot Linux/x64 default is commonly 1024 KB per OS thread
500 OS threads * 1MiB stack = 500MiB reserved
env:
  - name: JAVA_TOOL_OPTIONS
    value: >
      -Xss512k
The JVM default is intentionally roomy. Lowering -Xss can reclaim native headroom when a process has many OS-backed threads.
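A rough sketch of the stack arithmetic, assuming every OS thread reserves the full configured stack size. `StackBudget` is a hypothetical name, and the result is reserved address space, not committed memory.

```java
// Minimal sketch: address space reserved for thread stacks at a given stack size.
// StackBudget is a hypothetical name for illustration.
class StackBudget {
    // threads * stack size (KiB), expressed in MiB.
    static long reservedMiB(int threads, long stackKiB) {
        return threads * stackKiB / 1024;
    }

    public static void main(String[] args) {
        System.out.println(reservedMiB(500, 1024) + " MiB at the 1024 KB default"); // 500 MiB
        System.out.println(reservedMiB(500, 512) + " MiB with -Xss512k");           // 250 MiB
    }
}
```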
Built-in tools
Start the JVM with -XX:NativeMemoryTracking=summary.
summary is lighter than detail.
Run jcmd against the running JVM.
Java 17+ example
kubectl exec hello-world-8675d6cff9-b8v6x -- jcmd 1 VM.native_memory summary
Total: reserved=1677076KB, committed=43616KB
- Java Heap (reserved=262144KB, committed=16384KB)
- Class (reserved=1048664KB, committed=280KB)
- Thread (reserved=30653KB, committed=1141KB)
- Code (reserved=249735KB, committed=7735KB)
- GC (reserved=860KB, committed=64KB)
- Compiler (reserved=200KB, committed=200KB)
- Internal (reserved=1228KB, committed=1228KB)
- Other (reserved=16KB, committed=16KB)
- Symbol (reserved=1157KB, committed=1157KB)
- Native Memory Tracking (reserved=131KB, committed=131KB)
Heap appears here for context; the native budget is mostly everything outside the Java heap.
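To put a number on "everything outside the Java heap", the summary text can be parsed and the non-heap buckets added up. A minimal sketch, assuming the line shape shown above (`- Name (reserved=…KB, committed=…KB)`); `NmtSummary` and its methods are hypothetical names, not part of any JDK API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: sum committed memory for every NMT bucket except the Java heap.
// NmtSummary is a hypothetical helper; it assumes the summary line format shown above.
class NmtSummary {
    // Matches lines such as "- Java Heap (reserved=262144KB, committed=16384KB)".
    private static final Pattern BUCKET =
        Pattern.compile("^-\\s*(.+?)\\s*\\(reserved=(\\d+)KB, committed=(\\d+)KB\\)");

    // Bucket name -> committed KB, in the order the buckets appear.
    static Map<String, Long> committedByBucket(String summary) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String line : summary.split("\\R")) {
            Matcher m = BUCKET.matcher(line.trim());
            if (m.find()) {
                out.put(m.group(1), Long.parseLong(m.group(3)));
            }
        }
        return out;
    }

    // Committed KB across all parsed buckets other than "Java Heap".
    static long nativeCommittedKB(String summary) {
        long total = 0;
        for (Map.Entry<String, Long> e : committedByBucket(summary).entrySet()) {
            if (!e.getKey().equals("Java Heap")) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "Total: reserved=1677076KB, committed=43616KB",
            "- Java Heap (reserved=262144KB, committed=16384KB)",
            "- Class (reserved=1048664KB, committed=280KB)",
            "- Thread (reserved=30653KB, committed=1141KB)");
        System.out.println("Native committed (KB): " + nativeCommittedKB(sample));
    }
}
```

The `Total:` line and nested detail lines are skipped because they do not start with `-`.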
🪣 Read the buckets
- Thread (reserved=59260KB, committed=5588KB)
(threads #29)
(stack: reserved=59160KB, committed=5488KB)
(malloc=68KB tag=Thread #175)
threads #29: 29 JVM and application threads exist.
reserved: about 58MB of address space.
committed: about 5.5MB is backed by real memory and can count toward pressure.
Thread count and -Xss change this budget.
🪣 Read the buckets
Class: many loaded classes, frameworks, proxies, or hot redeploy behavior.
Thread: large servlet pools, scheduler pools, Kafka consumers, or blocking I/O.
Code: lots of hot methods compiled by the JIT over time.
GC: bigger heaps, more regions, remembered sets, and collector worker data.
Workflow
jcmd <pid> VM.native_memory summary
jcmd <pid> VM.native_memory baseline
# run representative load
jcmd <pid> VM.native_memory summary.diff
The goal is not perfect accounting. The goal is to stop guessing.
🚀 Java Flight Recorder
env:
  - name: JAVA_TOOL_OPTIONS
    value: >
      -XX:StartFlightRecording=filename=/tmp/load-test.jfr,dumponexit=true
kubectl exec <pod> -- jcmd <pid> JFR.dump filename=/tmp/load-test.jfr
kubectl cp <pod>:/tmp/load-test.jfr ./load-test.jfr
Open the recording in JDK Mission Control to review heap usage, allocation pressure, and GC behavior across the test window.
Utilize NMT through diagnostic calls
Compare
jcmd plus NMT explains the gap
> jcmd 1 GC.heap_info
garbage-first heap total reserved 262144K, committed 18432K, used 2925K
> jcmd 1 VM.native_memory summary
Total: reserved=1726277KB, committed=80765KB
- Java Heap (reserved=262144KB, committed=18432KB)
- Class (reserved=1048665KB, committed=281KB)
- Thread (reserved=40867KB, committed=1219KB)
- Code (reserved=249739KB, committed=7739KB)
For Java heap, NMT's reserved and committed match GC.heap_info; heap_info adds used, the live heap occupancy inside that committed space.
Right-size loop
MaxRAMPercentage with explicit native headroom.