Kubernetes Pod status: CrashLoopBackOff
Pod keeps crashing and restarting
Verified against Kubernetes 1.32 docs (tasks/debug/debug-application), Stack Overflow #50472960 (top answer, 700 upvotes) · Updated April 2026
> quick_fix
Your pod's main process keeps exiting. Run `kubectl logs <pod> --previous` to see the last crashed container's output — the real error is there. Common causes: wrong command, missing env var, OOMKilled, failed liveness probe.
# See the last crashed container's logs (not the current one that's being restarted)
kubectl logs <pod-name> --previous
# Describe the pod to see restart reason and events
kubectl describe pod <pod-name>
# Check resource limits
kubectl get pod <pod-name> -o yaml | grep -A 5 resources
What causes this error
Kubernetes reports a pod as CrashLoopBackOff when one of its containers (usually the main one) has crashed repeatedly and the kubelet is waiting out a back-off delay before the next restart. The kubelet intentionally slows down restart attempts (10s → 20s → 40s ... max 5min) to avoid thrashing the node. The crash reason is usually in the previous container's logs or the pod events.
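To make the back-off progression concrete, here is a small sketch that prints the delays described above. `backoff_schedule` is a hypothetical helper written for illustration, not a kubectl feature; it only assumes the doubling rule and the 5-minute (300s) cap stated in the text.

```shell
# Print the restart delays: start at 10s, double each time, cap at 300s.
# backoff_schedule is a hypothetical helper, not part of kubectl.
backoff_schedule() {
  attempts="$1"   # how many restart delays to print
  delay=10
  out=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    out="${out}${out:+ }${delay}s"
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then delay=300; fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

backoff_schedule 7
# prints: 10s 20s 40s 80s 160s 300s 300s
```

After roughly five failed restarts the delay sits at the cap, which is why a pod that has been crashing for a while can take minutes to retry even after the underlying problem is fixed.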
How to fix it
step 1
Get the previous crash logs
The replacement container has usually just started and may not have logged anything yet. Use --previous to see what the last container printed before exiting.
kubectl logs <pod-name> --previous
kubectl logs <pod-name> -c <container-name> --previous  # if multi-container
step 2
Describe the pod for events
`kubectl describe pod` shows each container's Last State and, at the bottom, the pod's events. In the events, look for "Back-off restarting failed container", "Liveness probe failed", or "CreateContainerConfigError"; "OOMKilled" appears as the Last State reason.
kubectl describe pod <pod-name> | tail -40
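When the describe output is long, it can help to filter it down to the strings named in this step. The sketch below assumes nothing beyond `grep`; `crash_signals` is a hypothetical helper name, and the patterns are simply the phrases listed above.

```shell
# Keep only the lines of `kubectl describe pod` output that name a known crash cause.
# crash_signals is a hypothetical helper; it reads describe output on stdin.
crash_signals() {
  grep -E 'Back-off restarting failed container|OOMKilled|Liveness probe failed|CreateContainerConfigError'
}

# Usage (needs a cluster, so it is not run here):
#   kubectl describe pod "$POD" | crash_signals
```

If the filter prints nothing, the cause is probably in the application's own output, so go back to the logs from --previous.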
step 3
Check resource limits vs usage
If the container was OOMKilled, its last exit code is 137 (128 + 9, the SIGKILL signal number). Raise the memory limit, or profile your app to reduce memory usage.
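One way to confirm the OOM kill without reading the whole describe output is to query the container's last termination state directly. This is a sketch: `last_termination` is a hypothetical wrapper name, while the `-o jsonpath` fields (`lastState.terminated.reason` and `exitCode`) come from the pod status.

```shell
# Print "<container> <reason> <exitCode>" (tab-separated) for each container's
# last termination. last_termination is a hypothetical wrapper around kubectl.
last_termination() {
  # $1 = pod name
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Usage (needs a cluster, so it is not run here):
#   last_termination "$POD"
```

A reason of OOMKilled with exit code 137 points at the memory limit; any other reason means memory is probably not the problem.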
step 4
Verify the command and args
If your Dockerfile has CMD ["./start.sh"] but the image doesn't actually contain start.sh, the container fails as soon as it starts. Check with `kubectl exec -it <pod> -- /bin/sh`. If exec fails because the container keeps crashing, temporarily override the container's command with sleep 3600 so it stays up long enough to inspect.
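For a pod managed by a Deployment, the temporary sleep override can be applied as a JSON patch rather than by rebuilding the image. This is a sketch under assumptions: `sleep_patch` is a hypothetical helper, the image must contain a `sleep` binary, and the container should not set `args` in the pod spec (they would be passed to `sleep`).

```shell
# Build a JSON patch that sets the container command to "sleep 3600".
# sleep_patch is a hypothetical helper; it only prints the patch document.
sleep_patch() {
  # $1 = zero-based container index in the pod template (defaults to 0)
  printf '[{"op":"add","path":"/spec/template/spec/containers/%s/command","value":["sleep","3600"]}]\n' "${1:-0}"
}

# Usage (needs a cluster, so it is not run here):
#   kubectl patch deployment "$DEPLOYMENT" --type=json -p "$(sleep_patch 0)"
#   kubectl exec -it "$NEW_POD" -- /bin/sh
#   kubectl rollout undo deployment "$DEPLOYMENT"   # revert when done
```

Note that a liveness probe may still kill the sleeping container, since nothing is serving the probe endpoint.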
step 5
Check liveness probes
An aggressive liveness probe (e.g., initialDelaySeconds too low, or wrong port) can kill a healthy container. Temporarily remove the probe to test.
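Before removing a probe, it may be worth checking whether the kubelet actually recorded probe failures for the pod. The sketch below assumes that probe failures are reported as events with reason `Unhealthy`; `probe_failures` is a hypothetical wrapper name.

```shell
# List probe-failure events for one pod, oldest first.
# probe_failures is a hypothetical wrapper around kubectl.
probe_failures() {
  # $1 = pod name, $2 = namespace (defaults to "default")
  kubectl get events \
    --namespace "${2:-default}" \
    --field-selector "involvedObject.name=$1,reason=Unhealthy" \
    --sort-by=.lastTimestamp
}

# Usage (needs a cluster, so it is not run here):
#   probe_failures "$POD" "$NAMESPACE"
```

If this returns no events, the liveness probe is unlikely to be what is killing the container.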
Why CrashLoopBackOff happens at the runtime level
The kubelet's container manager monitors each container's exit code and restart count. When a container exits with non-zero status (or zero with restartPolicy=Always), the kubelet schedules a restart with exponential backoff: 10s, 20s, 40s, capped at 5 minutes. The pod status transitions to CrashLoopBackOff once the backoff window starts, indicating the kubelet is intentionally delaying the next attempt. The actual crash reason is in the previous container's stdout/stderr (kubectl logs --previous) or the pod events from the kubelet (kubectl describe pod), which surface OOMKilled (exit 137), liveness probe failures, or the application's own non-zero exits. Image pull failures are reported under a separate status, ImagePullBackOff, rather than CrashLoopBackOff.
Common debug mistakes for CrashLoopBackOff
- Running kubectl logs <pod> without --previous. The current container has usually just started and has few or no logs yet, so you see empty output and assume the app is silent on crash.
- Increasing the liveness probe initialDelaySeconds without first checking startup logs. If the app crashes during init, the probe is irrelevant and a longer delay just postpones the same crash.
- Setting resources.limits.memory equal to resources.requests.memory and assuming Guaranteed QoS prevents OOM kills. The JVM, Node.js, and Go runtimes commonly need headroom over their working set for GC (sometimes 50-100%), so a limit sized to the steady-state working set can trigger OOMKilled during a GC cycle.
- Adding restartPolicy: OnFailure to a Deployment template. Deployments only support Always, so the API server rejects the change with a validation error; read the output of kubectl apply instead of assuming the change took effect.
- Editing the live pod with kubectl edit instead of the Deployment. Most pod spec fields are immutable, and anything that does change is lost as soon as the controller replaces the pod; updates must go to the Deployment's spec.template.
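As an illustration of changing the Deployment rather than the pod, the sketch below adjusts memory requests and limits on the pod template and then waits for the rollout. `set_memory` is a hypothetical wrapper; the deployment name, container name, and sizes in the usage line are placeholders.

```shell
# Update memory request/limit on a Deployment's pod template, then wait for the rollout.
# set_memory is a hypothetical wrapper around kubectl.
set_memory() {
  # $1 = deployment, $2 = container, $3 = memory request, $4 = memory limit
  kubectl set resources deployment "$1" -c "$2" \
    --requests="memory=$3" --limits="memory=$4" &&
    kubectl rollout status "deployment/$1"
}

# Usage (needs a cluster, so it is not run here):
#   set_memory web app 256Mi 512Mi
```

Because the change lands in spec.template, the controller rolls out new pods with the new values instead of discarding the edit.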
When CrashLoopBackOff signals a deeper problem
CrashLoopBackOff that hits multiple unrelated services on the same node usually means a node-level resource issue, not application code. Common causes: the kernel OOM killer triggering under node memory pressure when pods run without memory limits, a full conntrack table causing networking failures that look like app crashes, and container runtime bugs (containerd 1.6.x on certain kernels) corrupting layers and causing exec failures. The architectural fix is to add node monitoring (node-exporter, cAdvisor) that surfaces kernel events, and to set resource requests on every workload so the scheduler avoids overcommit. Pod Disruption Budgets help limit how many replicas a voluntary disruption such as a node drain can take down at once, though they do not stop containers from crashing. Without node-level observability, every CrashLoop looks like an app bug even when the cause is infrastructure.
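A quick way to test the "same node" hypothesis is to count crash-looping pods per node. The sketch below is one possible approach: `crashloops_by_node` is a hypothetical filter that reads two columns (node name, comma-separated waiting reasons) on stdin, and the `custom-columns` expression in the usage line is an assumption about how you choose to produce that input.

```shell
# Count pods in CrashLoopBackOff per node, busiest node first.
# crashloops_by_node is a hypothetical filter; it reads "<node> <waiting reasons>" lines.
crashloops_by_node() {
  awk '$2 ~ /CrashLoopBackOff/ { n[$1]++ } END { for (k in n) print n[k], k }' | sort -rn
}

# Usage (needs a cluster, so it is not run here):
#   kubectl get pods -A --no-headers \
#     -o custom-columns='NODE:.spec.nodeName,REASON:.status.containerStatuses[*].state.waiting.reason' \
#     | crashloops_by_node
```

If one node accounts for most of the count across unrelated workloads, investigate that node before the applications.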
Editor's take
CrashLoopBackOff hits hardest during a canary rollout at 2am when your on-call rotation is down to one engineer covering three timezones. The scenario that surfaces constantly in small platform teams: a ConfigMap change gets merged without a deployment restart, the new pods come up referencing an env var that no longer exists, and within four minutes every replica in the affected namespace is oscillating between Error and CrashLoopBackOff. The exponential backoff — 10s, 20s, 40s, up to five minutes per retry — means your readiness probes stay dark long enough to trigger a full PagerDuty storm before anyone's even opened a terminal.
Diagnosing CrashLoopBackOff cleanly is a genuine seniority marker. A junior dev will run `kubectl describe pod` and stop at the Events section, confused why there's no stack trace. A mid-level engineer knows to reach for `kubectl logs <pod> --previous` immediately, because the current container has only just started and the crash output lives in the previous one. The senior move is correlating restart count against the timeline: `kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].restartCount}'` combined with `kubectl get events --sort-by=.lastTimestamp`. Understanding the kubelet restart policy distinctions between `OnFailure` and `Always`, and how a failing init container shows up differently (as `Init:CrashLoopBackOff`), separates engineers who fix it once from those who prevent it systematically.
When you're deep in a CrashLoopBackOff incident, you'll commonly surface `OOMKilled` with exit code 137 in `kubectl describe`, which points upstream toward a memory limit that is too low for the workload. Downstream, expect `Readiness probe failed` events cascading into `Service Unavailable` at the ingress layer, and if the pod holds a leader-election lease, `context deadline exceeded` errors appearing in sibling pods waiting to acquire the lock. On nodes under pressure, you may simultaneously see `Evicted` status on lower-priority pods, a node-level symptom that's easy to misread as an unrelated problem.
By Bikram Nath · Curator · Updated April 2026
Frequently asked questions
What is a Kubernetes CrashLoopBackOff?
A pod status indicating the main container keeps exiting and kubelet is applying exponential back-off between restart attempts.
What does exit code 137 mean?
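Usually out of memory. 137 is 128 + 9, meaning the container was terminated with SIGKILL. When `kubectl describe pod` shows the reason `OOMKilled`, the kernel OOM killer ended the container because it exceeded its memory limit. Raise `resources.limits.memory` or reduce app memory usage.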
What does exit code 1 mean?
A generic application error — the app exited via `sys.exit(1)` or threw an unhandled exception. Check the application logs from --previous.
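The exit-code answers above follow a general shell convention: values above 128 mean "terminated by signal (code minus 128)". The sketch below applies that rule; `explain_exit_code` is a hypothetical helper, and it relies on the shell's `kill -l` to turn a signal number into a name.

```shell
# Describe a container exit code using the 128 + signal-number convention.
# explain_exit_code is a hypothetical helper, not part of kubectl.
explain_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    printf 'killed by signal %s (SIG%s)\n' "$sig" "$(kill -l "$sig")"
  elif [ "$code" -eq 0 ]; then
    printf 'clean exit\n'
  else
    printf 'application exit status %s\n' "$code"
  fi
}

explain_exit_code 137
explain_exit_code 1
```

An exit code of 137 alone only says SIGKILL; the `OOMKilled` reason in the pod status is what confirms the memory limit as the cause.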