kubernetes · severity: blocker

OOMKilled

Kubernetes OOMKilled — container exceeded memory limit and was killed

✓ 92% fixable · ~20 min · difficulty: intermediate

Verified against Kubernetes docs — Container Resources, Linux kernel OOM killer documentation, JDK -XX:+UseContainerSupport docs · Updated June 2026

> quick_fix

OOMKilled means the Linux kernel's Out-Of-Memory killer terminated your container because it exceeded its memory limit (spec.containers[].resources.limits.memory). Either the limit is too low for the workload, or the application has a memory leak. Check the actual memory usage with kubectl top pod, then either increase the limit or fix the leak.

# Check which container was OOMKilled and its last state
kubectl describe pod <pod-name> | grep -A5 'Last State\|OOMKilled\|Limits'

# Check actual memory usage vs limit
kubectl top pod <pod-name> --containers

# Check the memory limit set on each container in the pod
kubectl get pod <pod-name> -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.limits.memory}{"\n"}{end}'

# Increase memory limit (quick fix)
kubectl set resources deployment <name> --limits=memory=512Mi

What causes this error

When a container's memory usage, as accounted by its cgroup (anonymous memory plus page cache and some kernel memory), exceeds the limit defined in resources.limits.memory, the Linux cgroup OOM killer kills a process in the container, usually the main one. Kubernetes then records exit code 137 (128 + 9, where 9 is the signal number of SIGKILL) and the reason OOMKilled. The container restarts according to the pod's restartPolicy, but if the same workload hits the same limit again, the pod enters CrashLoopBackOff. Common triggers: a memory leak in the application that grows over hours or days; a batch job processing a dataset larger than expected; a JVM heap set higher than the container limit; the Node.js default heap ceiling (version-dependent, commonly cited as about 1.7GB on 64-bit) exceeding a 512Mi limit; or a sidecar container (like Istio envoy) outgrowing its own limit.
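The exit-code arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the helper name describe_exit_code is made up for illustration, and signal names are resolved by the local platform (Linux assumed).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code; codes above 128 mean 'killed by signal (code - 128)'."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return "exited normally" if code == 0 else f"exited with error status {code}"


# 137 - 128 = 9, the signal number of SIGKILL.
print(describe_exit_code(137))
```

Note that 137 only says the process received SIGKILL. Any sender of SIGKILL produces the same code, so the OOMKilled reason in the pod status is what ties it to the OOM killer.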

How to fix it

step 1

Confirm the OOMKilled event and check exit code

Verify that the termination reason is OOMKilled (not some other crash). Exit code 137 means the process received SIGKILL; the OOMKilled reason is what identifies the kernel's OOM killer as the sender.

# Detailed pod status showing last termination reason
kubectl describe pod <pod-name> | grep -A10 'Last State'

# Look for:
#   Last State:  Terminated
#     Reason:    OOMKilled
#     Exit Code: 137

# Check events for the OOM
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'
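The same check can be scripted against the JSON form of the pod (for example, the output of kubectl get pod <pod-name> -o json). The sketch below assumes the standard status.containerStatuses[].lastState.terminated layout; the function name and the sample object are hypothetical.

```python
import json


def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose last termination reason was OOMKilled."""
    names = []
    status = pod.get("status", {})
    for key in ("initContainerStatuses", "containerStatuses"):
        for cs in status.get(key, []):
            terminated = cs.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(cs["name"])
    return names


# Hypothetical, trimmed-down pod object for illustration only.
sample = json.loads("""
{"status": {"containerStatuses": [
  {"name": "app", "restartCount": 4,
   "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
  {"name": "istio-proxy", "restartCount": 0, "lastState": {}}
]}}
""")
print(oom_killed_containers(sample))
```

Only the last termination is recorded in lastState, so a container that was OOMKilled earlier and then crashed for another reason will not be listed.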

step 2

Measure actual peak memory usage

Before changing limits, measure what the container actually needs. Use kubectl top (which reads from metrics-server) for current usage, and Prometheus or another metrics store for historical peaks, since metrics-server keeps no history.

# Current memory usage per container in the pod
kubectl top pod <pod-name> --containers

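# If you have Prometheus, query current and peak memory:
# container_memory_rss{pod="<pod-name>"}
# max_over_time(container_memory_rss{pod="<pod-name>"}[24h])
# container_memory_working_set_bytes is the figure kubectl top reports and also counts active page cache:
# max_over_time(container_memory_working_set_bytes{pod="<pod-name>"}[24h])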

# Check node-level memory pressure
kubectl describe node <node-name> | grep -A5 'Conditions'
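To compare a usage figure against a limit, both quantities need to be in the same unit. The following sketch converts Kubernetes-style memory quantities to bytes and reports usage as a percentage of the limit. The helper names are illustrative, and only the common suffixes are handled.

```python
# Binary and decimal suffixes commonly used in Kubernetes memory quantities.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '1G' to bytes; bare numbers are bytes."""
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):  # two-character suffixes first
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(quantity)


def percent_of_limit(usage: str, limit: str) -> float:
    """Usage as a percentage of the limit."""
    return 100.0 * to_bytes(usage) / to_bytes(limit)


print(f"{percent_of_limit('410Mi', '512Mi'):.1f}% of limit")
```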

step 3

Set memory limits based on measured usage

Set the limit to 1.5-2x the observed peak usage to allow for normal variance, and set the request to the average observed usage. For workloads with variable memory use, avoid setting the limit equal to the request: it leaves no headroom for bursts.

```
# deployment.yaml (fragment of the pod template, under spec.template)
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "256Mi"   # average observed usage
      limits:
        memory: "512Mi"   # 1.5-2x observed peak (example values)
```
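The sizing rule can be turned into a small calculation. This is a sketch under stated assumptions: samples are in Mi, the headroom factor defaults to 1.5 (the low end of the 1.5-2x range), and results are rounded up to a 64Mi step. Both the rounding step and the function name are choices made here for illustration.

```python
import math


def suggest_memory(samples_mi: list[float], headroom: float = 1.5, step_mi: int = 64) -> dict:
    """Request = average usage, limit = peak * headroom, both rounded up to step_mi."""
    if not samples_mi:
        raise ValueError("need at least one sample")

    def round_up(value: float) -> int:
        return int(math.ceil(value / step_mi) * step_mi)

    average = sum(samples_mi) / len(samples_mi)
    peak = max(samples_mi)
    return {"request": f"{round_up(average)}Mi", "limit": f"{round_up(peak * headroom)}Mi"}


# Hypothetical usage samples in Mi.
print(suggest_memory([210, 240, 250, 300, 330]))
```

With these samples the average is 266Mi and the peak is 330Mi, so the sketch suggests a 320Mi request and a 512Mi limit.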

step 4

Fix runtime-specific memory issues

Some runtimes need explicit memory configuration to stay within container limits. The JVM and Node.js have heap settings that should be sized to the container limit. Python has no heap cap, so the focus there is on known leak patterns.

# JVM — set heap to ~75% of container limit
# If limit is 512Mi, set -Xmx384m
java -Xmx384m -Xms256m -XX:+UseContainerSupport -jar app.jar

# Node.js — set max old space to ~75% of container limit
node --max-old-space-size=384 app.js

# Python — no built-in limit, but check for known leak patterns:
# - Growing lists/dicts that never get pruned
# - Unclosed file handles or database connections
# - Caching without eviction (functools.lru_cache(maxsize=None) or functools.cache; the default maxsize is 128)
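The 75% rule used in the JVM and Node.js commands above is simple arithmetic. Here is a minimal sketch that derives both flags from a container limit. The function name is hypothetical, and the 0.75 default is the figure from this step rather than a value either runtime prescribes.

```python
def heap_flags(limit_mi: int, heap_fraction: float = 0.75) -> dict:
    """Derive heap flags for a container limit given in Mi."""
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mi = int(limit_mi * heap_fraction)  # round down to stay under the target
    return {
        "jvm": f"-Xmx{heap_mi}m",
        "node": f"--max-old-space-size={heap_mi}",
    }


print(heap_flags(512))
```

For a 512Mi limit this yields -Xmx384m and --max-old-space-size=384, matching the commands above.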

How to verify the fix

kubectl describe pod shows no OOMKilled events in the last 24 hours.
kubectl top pod shows memory usage staying below 80% of the configured limit.
The pod is no longer entering CrashLoopBackOff due to memory issues.
JVM/Node.js heap settings are explicitly configured to stay within container limits.

Why OOMKilled happens at the runtime level

OOMKilled is enforced by the Linux kernel's cgroup memory controller, not by Kubernetes itself. The kubelet and container runtime write the limit from the pod spec into the container's cgroup (memory.limit_in_bytes on cgroup v1, memory.max on cgroup v2). When the cgroup's memory usage (anonymous memory plus page cache and some kernel memory) reaches that limit and the kernel cannot reclaim enough pages, it invokes the OOM killer on the cgroup. The OOM killer sends SIGKILL (signal 9) to the process with the highest oom_score within the cgroup, a badness score based mainly on memory use and shifted by oom_score_adj. That process is almost always the main container process. The exit code is 137 (128 + 9). Kubernetes detects the exit and records the reason as OOMKilled. The kubelet then restarts the container according to the pod's restartPolicy.
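From inside a container, the limit and current usage that the kernel enforces can usually be read from the cgroup filesystem. The sketch below assumes the conventional mount point /sys/fs/cgroup and the standard file names (memory.max and memory.current on cgroup v2, memory.limit_in_bytes and memory.usage_in_bytes on cgroup v1). The function name and the root parameter are illustrative.

```python
from pathlib import Path


def read_cgroup_memory(root: str = "/sys/fs/cgroup") -> dict:
    """Return {'version', 'usage', 'limit'} in bytes; limit is None when cgroup v2 reports 'max'."""
    base = Path(root)
    v2_limit = base / "memory.max"
    if v2_limit.exists():
        raw = v2_limit.read_text().strip()
        limit = None if raw == "max" else int(raw)
        usage = int((base / "memory.current").read_text())
        return {"version": 2, "usage": usage, "limit": limit}
    # cgroup v1 layout; an unlimited cgroup reports a very large number here.
    v1 = base / "memory"
    limit = int((v1 / "memory.limit_in_bytes").read_text())
    usage = int((v1 / "memory.usage_in_bytes").read_text())
    return {"version": 1, "usage": usage, "limit": limit}
```

Calling read_cgroup_memory() in a container with a 512Mi limit on a cgroup v2 node should report a limit of 536870912 bytes, assuming the default mount layout.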

Common debug mistakes for OOMKilled

Setting JVM -Xmx equal to the container memory limit — the JVM uses significant memory outside the heap (metaspace, thread stacks, direct buffers), so -Xmx should be 60-75% of the container limit.
Increasing the memory limit without investigating why usage grew — if the application has a memory leak, a higher limit just delays the OOMKill by hours instead of minutes.
Checking only the main container when a sidecar (Istio envoy, log agent, monitoring agent) is the actual memory consumer — use kubectl top pod --containers to see per-container usage.
Setting memory request equal to limit for all workloads — this prevents burstable QoS and wastes cluster resources for workloads with variable memory patterns.
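Assuming the application exceeded its limit when the OOM kill was actually caused by the node running out of memory: the kernel's node-level OOM killer can kill a container that is still under its own limit, and it favors containers using more than their memory request. Check the node's conditions and kernel log before tuning the container.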

When OOMKilled signals a deeper problem

Recurring OOMKilled events that keep coming back after limit increases point to a genuine memory leak. In garbage-collected languages (Java, Node.js, Python, Go), common leak patterns include event listeners or callbacks registered but never removed, growing caches without eviction policies, closures that capture large objects and prevent GC, and database connection pools that hold result sets in memory.

The diagnostic approach: capture a heap snapshot before and after a sustained workload period, then diff the two snapshots to find objects that grew. In Node.js use --inspect and Chrome DevTools; in Java use jmap -dump:format=b,file=heap.hprof <pid>; in Go use pprof; in Python use tracemalloc.

If heap snapshots look healthy but the container's memory still climbs with request count, the issue is often per-request allocations that are freed inside the runtime but not returned to the OS. This is common in Go, where the runtime holds freed memory for reuse rather than returning it to the kernel right away.

Editor's take

OOMKilled is the Kubernetes error where people waste the most time increasing limits when they should be investigating usage. I've seen teams bump a container from 256Mi to 512Mi to 1Gi to 2Gi over the course of a month, each time 'fixing' the OOMKill for a few days before it comes back. The actual problem was a memory leak in a Node.js Express app that kept a growing array of request objects for logging that was never flushed.

The first thing to check is not the limit — it's the trajectory. Is memory usage stable at a high level (limit is genuinely too low) or is it growing over time (memory leak)? Run kubectl top pod every 5 minutes for an hour and plot the numbers. Stable = increase limit. Growing = find the leak.
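Once the samples are collected, the stable-versus-growing call can be made with a least-squares slope instead of by eye. This is a rough sketch: the 5% tolerance and the function name are assumptions chosen for illustration, not an established threshold.

```python
def memory_trend(samples_mi: list[float], interval_min: float = 5.0, tolerance: float = 0.05) -> str:
    """Fit a straight line to the samples; report 'growing' if the fitted line
    rises by more than tolerance * mean usage over the sampling window."""
    n = len(samples_mi)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [i * interval_min for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples_mi) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_mi)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    rise = slope * (xs[-1] - xs[0])  # fitted increase over the whole window, in Mi
    return "growing" if rise > tolerance * mean_y else "stable"


# Hypothetical readings taken every 5 minutes for about an hour.
print(memory_trend([400, 405, 398, 402, 401, 399, 403, 400, 402, 401, 399, 400]))
print(memory_trend([200 + 10 * i for i in range(12)]))
```

An hour of samples can miss slow leaks that grow over days, so a "stable" result on a short window is weaker evidence than a "growing" one.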

The JVM trap catches teams regularly. A pod spec says limits.memory: 512Mi. The developer sets -Xmx512m thinking they're matching. But JVM total memory is heap + metaspace + thread stacks + direct buffers + code cache — easily 1.5x the heap size. The container OOMKills at 512Mi even though the heap never exceeded its limit. The fix: set -Xmx to 60-75% of the container limit, not 100%.

One more pattern: a sidecar container is the one running out of memory while you keep tuning the main container. Istio's envoy proxy typically uses 50-150Mi depending on traffic. Memory limits are set per container, so a sidecar with a tight limit can be OOMKilled on its own, and its restarts show up on the same pod. Always check per-container usage with the --containers flag.
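Picking out the heaviest container from kubectl top pod --containers output can be scripted. The sketch below assumes the usual four-column text layout (POD, NAME, CPU, MEMORY) with memory reported in Mi; the sample output and the function name are hypothetical.

```python
def biggest_consumer(top_output: str) -> tuple[str, int]:
    """Return (container name, Mi) for the container using the most memory."""
    usage = {}
    for line in top_output.strip().splitlines()[1:]:  # first line is the header
        _pod, name, _cpu, mem = line.split()
        if not mem.endswith("Mi"):
            raise ValueError(f"unexpected memory unit in {mem!r}")
        usage[name] = int(mem[:-2])
    if not usage:
        raise ValueError("no container rows found")
    name = max(usage, key=usage.get)
    return name, usage[name]


# Hypothetical output for a pod with an Istio sidecar.
sample = """\
POD     NAME          CPU(cores)   MEMORY(bytes)
web-1   app           12m          95Mi
web-1   istio-proxy   8m           140Mi
"""
print(biggest_consumer(sample))
```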

By Bikram Nath · Curator · Updated June 2026

Frequently asked questions

What's the difference between OOMKilled and Evicted?

OOMKilled means the container exceeded its own memory limit — the cgroup OOM killer sent SIGKILL to the container process. Evicted means the node ran out of memory and the kubelet evicted the entire pod to protect the node. OOMKilled is a container-level limit problem; Eviction is a node-level capacity problem. Fix OOMKilled by adjusting container limits; fix Eviction by adding more nodes or reducing overall cluster memory pressure.

Why does my Java app get OOMKilled even with -Xmx set correctly?

JVM memory is more than just heap. The total includes: heap (-Xmx), metaspace, thread stacks (1MB per thread by default), direct byte buffers, code cache, and GC overhead. A JVM with -Xmx256m might use 400-500MB total. Set -Xmx to about 60-75% of the container memory limit to leave room for non-heap memory. Use -XX:+UseContainerSupport (default since JDK 10) so the JVM auto-detects cgroup limits.
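A back-of-the-envelope estimate makes the gap between -Xmx and total footprint concrete. In the sketch below, every default other than the 1MB thread stack is an illustrative assumption rather than a JVM default, and thread stacks are counted at their full reserved size, so treat the result as a rough upper bound.

```python
def estimate_jvm_footprint_mb(
    heap_mb: int,
    threads: int,
    metaspace_mb: int = 64,    # assumed value for illustration
    code_cache_mb: int = 48,   # assumed value for illustration
    direct_mb: int = 32,       # assumed direct byte buffer usage
    thread_stack_mb: int = 1,  # 1MB per thread, as stated above
) -> int:
    """Sum heap and the main non-heap areas to approximate total JVM memory in MB."""
    return heap_mb + metaspace_mb + code_cache_mb + direct_mb + threads * thread_stack_mb


total = estimate_jvm_footprint_mb(heap_mb=256, threads=100)
print(f"-Xmx256m with 100 threads: roughly {total} MB total")
```

Under these assumptions a 256MB heap comes to about 500MB overall, which is why a container limit close to the heap size leaves no room for the rest.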

Should I remove memory limits entirely to avoid OOMKilled?

No. Without limits, a single leaking container can consume all node memory, causing the kubelet to evict other pods or even crash the node. Limits protect the cluster. Instead of removing limits, set them based on measured usage with appropriate headroom (1.5-2x the observed peak).
