http · severity: blocker

HTTP 503 Service Unavailable — server overloaded or under maintenance

Service Unavailable — server can't handle request right now

✓ 90% fixable · ~15 min · difficulty: intermediate

Verified against RFC 9110 §15.6.4, MDN Web Docs — 503 Service Unavailable, Nginx upstream docs · Updated June 2026

> quick_fix

HTTP 503 means the server is alive but temporarily unable to serve requests — usually because it's overloaded, restarting, or behind a load balancer that has no healthy backends. If you control the server, check process health, connection pool exhaustion, and upstream dependency availability. If you're a client, retry with exponential backoff and respect the Retry-After header if present.

# Check which status line and headers the server actually returns
curl -sI https://example.com | head -5

# If you run Nginx, check upstream health
nginx -t && systemctl status nginx
tail -50 /var/log/nginx/error.log | grep upstream

# Check if backend processes are alive (pgrep does not match itself, unlike ps | grep)
pgrep -af 'node|python|java|gunicorn|uvicorn'

# Check which ports have a listener (is the app bound where the proxy expects?)
lsof -i -P | grep LISTEN
ss -tlnp

What causes this error

A 503 response means the server is reachable at the network level but cannot process the request right now. Common causes: the application process crashed or is restarting; a load balancer has no healthy backend targets; the server hit a connection pool limit (database connections, worker threads, file descriptors); a deployment is rolling and all old pods terminated before new ones passed health checks; a rate limiter or circuit breaker activated; or the server is in scheduled maintenance mode. Unlike a 500 (internal error), a 503 explicitly signals the condition is temporary.


How to fix it

step 1

Check if the application process is running

The most common cause is simply that the app isn't running. Check process status, memory usage, and recent crash logs.

# Node.js / PM2
pm2 status
pm2 logs --lines 50

# Systemd service
systemctl status myapp
journalctl -u myapp --since '5 minutes ago'

# Docker
docker ps -a | grep myapp
docker logs myapp --tail 50

step 2

Check load balancer / reverse proxy health

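If you use Nginx, HAProxy, or a cloud load balancer, the 503 often comes from the proxy, not the app. Look for 'no live upstreams' or 'upstream prematurely closed connection' in proxy logs (Nginx itself usually answers those with a 502), and for 'limiting requests' or 'limiting connections', which Nginx's limit_req and limit_conn modules reject with a 503 by default.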

# Nginx upstream check
tail -100 /var/log/nginx/error.log | grep -i 'upstream\|502\|503'

# AWS ALB — check target group health
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:...

# HAProxy stats
echo 'show stat' | socat unix:/var/run/haproxy.sock -

step 3

Check resource exhaustion (connections, memory, file descriptors)

503s spike when the server runs out of a finite resource. Database connection pools, worker threads, and OS file descriptor limits are the usual suspects.

```
# Check open connections to your DB
psql -c "SELECT count(*) FROM pg_stat_activity;"

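# Check the process's own file descriptor limit vs usage
# (pgrep -o picks the oldest matching PID, so this works with multiple workers)
pid=$(pgrep -of myapp)
grep 'open files' /proc/$pid/limits
ls /proc/$pid/fd | wc -l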

# Check memory
free -m
top -bn1 | head -20
```
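
To turn the two file descriptor numbers above into something you can alert on, a minimal sketch follows. `fd_usage_pct` and `fd_report` are hypothetical helper names, not part of any tool; `fd_report` assumes a Linux `/proc` filesystem and that the soft limit is the fourth field of the "Max open files" row.

```shell
# Print how much of a file descriptor limit is in use, as a whole percentage.
# Arguments: <used> <limit> (plain integers)
fd_usage_pct() {
  fd_used=$1
  fd_limit=$2
  if [ "$fd_limit" -le 0 ]; then
    echo "limit must be positive" >&2
    return 1
  fi
  echo $(( fd_used * 100 / fd_limit ))
}

# Report usage for one PID by reading /proc (Linux only).
fd_report() {
  pid=$1
  used=$(ls "/proc/$pid/fd" | wc -l)
  # Field 4 of the "Max open files" row is the soft limit.
  limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
  echo "pid $pid: $used/$limit fds ($(fd_usage_pct "$used" "$limit")%)"
}
```

For example, `fd_report "$(pgrep -of myapp)"` prints one summary line for the oldest `myapp` process. Usage that keeps climbing toward 100% between restarts suggests a descriptor leak rather than a traffic spike.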

step 4

Handle rolling deployments properly

During a deployment, old pods can terminate before new ones are ready to serve traffic. Configure your orchestrator so that no old pod is removed until its replacement passes its readiness check.

# Kubernetes — ensure zero-downtime deploys
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    spec:
      containers:
      - name: app
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
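
To see what `maxUnavailable: 0` buys you compared with a percentage setting, the arithmetic can be sketched as below. `min_available` is a hypothetical helper; it relies only on the documented Kubernetes behaviour that a percentage `maxUnavailable` is rounded down to a whole number of pods.

```shell
# Minimum pods kept available during a rolling update.
# Arguments: <replicas> <maxUnavailable as a percentage>
min_available() {
  replicas=$1
  pct=$2
  # Kubernetes rounds a percentage maxUnavailable down to a whole pod.
  unavailable=$(( replicas * pct / 100 ))
  echo $(( replicas - unavailable ))
}
```

With 4 replicas, `min_available 4 25` gives 3 (one pod may be down mid-rollout), while `min_available 4 0` gives 4, which is the guarantee the manifest above asks for.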

How to verify the fix

curl -sI https://your-domain.com returns 200 (not 503).
Load balancer target group shows all targets as 'healthy'.
Application logs show no OOM kills or connection pool exhaustion.
A rolling deployment completes without any 503 responses to real users.
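
The last check can be approximated by polling the service while a deployment runs and counting the 503s. This is a rough sketch, not a load test: `probe` and `count_503` are hypothetical helpers, and the one-second interval and five-second timeout are arbitrary assumptions.

```shell
# Request <url> once per second, <n> times, printing one status code per line.
probe() {
  url=$1
  n=$2
  i=0
  while [ "$i" -lt "$n" ]; do
    # -w '%{http_code}' prints only the status; 000 means no HTTP response at all.
    curl -s -o /dev/null -m 5 -w '%{http_code}\n' "$url"
    i=$(( i + 1 ))
    sleep 1
  done
}

# Count how many status codes on stdin (one per line) are exactly 503.
count_503() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c '^503$' || true
}
```

Start `probe https://your-domain.com 120 | count_503` in one terminal, trigger the deployment in another, and expect the final count to be 0.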

Why 503 happens at the runtime level

At the protocol level, 503 is the server's way of saying 'I'm alive but temporarily unable to process your request.' The TCP listener accepted the connection (otherwise you'd get a connection refused or a timeout), and the HTTP layer generated a valid response, but the application, or a proxy in front of it, decided it cannot fulfill the request right now. This differs from 500 (an unexpected server-side error) and 502 (the proxy got an invalid response from upstream). The 503 contract implies the condition is temporary, which is why RFC 9110 lets the server attach a Retry-After header telling clients how long to wait.

Common debug mistakes for 503

Assuming the app crashed when it's actually the load balancer returning 503 because health checks are failing — the app may be running fine but on the wrong port or path.
Restarting the server without checking why it went down — if the cause is memory exhaustion or connection pool leak, restarting buys you minutes before the same 503 returns.
Setting aggressive health check intervals (every 1 second) that themselves consume server resources and contribute to the overload causing the 503.
Ignoring the Retry-After header on the client side and hammering the server with rapid retries, making the overload worse.
Confusing a CDN-generated 503 (Cloudflare, CloudFront) with an origin 503 — the fix is completely different depending on which layer generated the response.
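
For the last mistake, a quick way to guess which layer produced a 503 is to look for telltale strings in the response body. The sketch below is a heuristic only: `classify_503_body` is a hypothetical helper, and the signatures it matches are assumptions about each product's default error page, which custom error pages will defeat.

```shell
# Read a response body on stdin and guess which layer generated it.
classify_503_body() {
  body=$(tr '[:upper:]' '[:lower:]')
  case "$body" in
    *"generated by cloudfront"*) echo "cloudfront" ;;  # assumed CloudFront footer text
    *cloudflare*)                echo "cloudflare" ;;  # assumed Cloudflare branding
    *"<center>nginx"*)           echo "nginx" ;;       # nginx default error page footer
    *)                           echo "unknown" ;;     # possibly the application itself
  esac
}
```

Run it as `curl -s https://your-domain.com | classify_503_body`. An `unknown` result is a hint to check the application logs first; any other result points at the proxy or CDN configuration.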

When 503 signals a deeper problem

When 503s appear intermittently under normal traffic, the server is operating near its capacity ceiling. The immediate fix (restart, scale up) buys time, but the architectural question is why the server can't handle this load. Common underlying issues: synchronous blocking calls that tie up worker threads while waiting for slow dependencies, missing connection pooling that opens a new database connection per request, no circuit breaker on external API calls that hang for 30 seconds before timing out, or autoscaling rules that react too slowly to traffic spikes. A single 503 is an incident; recurring 503s under predictable load are a capacity planning problem.
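
A rough way to see why one slow dependency can exhaust a worker pool is the steady-state estimate known as Little's law: workers in use ≈ requests per second × seconds each request holds a worker. The sketch below is illustrative only; `workers_needed` is a hypothetical helper and the traffic figures in the note after it are assumed, not measured.

```shell
# Workers kept busy at steady state, rounded up.
# Arguments: <requests per second> <time each request holds a worker, in ms>
workers_needed() {
  rps=$1
  latency_ms=$2
  # rps * latency_ms / 1000, with +999 to round up in integer arithmetic.
  echo $(( (rps * latency_ms + 999) / 1000 ))
}
```

At an assumed 50 requests per second, a 200 ms dependency call keeps about 10 workers busy (`workers_needed 50 200`), while the same call hanging for 30 seconds would need 1500 (`workers_needed 50 30000`), far more than a typical pool provides.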

Editor's take

503 is the most misdiagnosed HTTP error I see in production incidents. Teams spend hours debugging the application when the actual problem is two layers up — the load balancer's health check is hitting a path that returns 404, or the new deployment changed the listening port from 3000 to 8080 but nobody updated the target group.

The first thing I check on any 503 is: who generated it? Read the response body. Nginx, Cloudflare, AWS ALB, and your application all generate different 503 pages. If it's the proxy, your app logs won't show anything because the request never reached the app. If it's your app returning 503 deliberately (maintenance mode, rate limiting), that's a different diagnosis entirely.

The second pattern I see: 503s during deployments. Kubernetes' default rolling update allows up to 25% of pods to be unavailable at once (maxUnavailable: 25%, rounded down). If your app takes 30 seconds to start and you have 4 pods, a deploy briefly leaves you with 3 pods, and if traffic spikes during that window, you get cascading 503s. Set maxUnavailable to 0 and configure a proper readiness probe.

Knowing how to diagnose 503s quickly is what separates an SRE from someone who just restarts services. The error itself is simple — the diagnostic path is where expertise shows.

By Bikram Nath · Curator · Updated June 2026

Frequently asked questions

What is the difference between HTTP 502 and 503?

502 Bad Gateway means a proxy or gateway received an invalid response from the upstream server (garbage, a connection closed mid-response, or in many proxies a refused connection). 503 Service Unavailable means the server, or a proxy acting for it, is declining to handle the request for now: no healthy backend is available, the app is overloaded, or it is in maintenance. In short: 502 = the upstream answered badly; 503 = nothing is currently able to take the request.

Should I retry automatically on a 503?

Yes, 503 is explicitly designed to be retried. Use exponential backoff (e.g. 1s, 2s, 4s, 8s) and check the Retry-After response header if present — the server may tell you exactly when to try again. Cap retries at 3-5 attempts to avoid hammering a server that's already overloaded.
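
The retry policy described here can be sketched as a small client-side wrapper. Both function names are hypothetical, and the sketch makes simplifying assumptions: it honours Retry-After only when it is a plain number of seconds (an HTTP-date value is ignored and the backoff schedule is used instead), and it adds no jitter.

```shell
# Delay in seconds before retry number <attempt> (1-based): 1, 2, 4, 8, ...
# A numeric Retry-After value passed as the second argument takes precedence.
backoff_delay() {
  n=$1
  ra_arg=${2:-}
  case "$ra_arg" in
    ''|*[!0-9]*) ;;                        # missing or not delta-seconds: ignore
    *) echo "$ra_arg"; return 0 ;;
  esac
  delay=1
  k=1
  while [ "$k" -lt "$n" ]; do
    delay=$(( delay * 2 ))
    k=$(( k + 1 ))
  done
  echo "$delay"
}

# GET <url>, retrying on 503 up to <max> times (default 4).
# Prints the final status code; returns 1 if every attempt got a 503.
fetch_with_retry() {
  url=$1
  max=${2:-4}
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    hdrs=$(mktemp)
    code=$(curl -s -o /dev/null -D "$hdrs" -w '%{http_code}' "$url")
    # Pull the Retry-After value, if any, from the saved response headers.
    ra=$(tr -d '\r' < "$hdrs" | awk -F': *' 'tolower($1)=="retry-after" {print $2; exit}')
    rm -f "$hdrs"
    if [ "$code" != "503" ]; then
      echo "$code"
      return 0
    fi
    sleep "$(backoff_delay "$attempt" "$ra")"
    attempt=$(( attempt + 1 ))
  done
  echo "503"
  return 1
}
```

For example, `fetch_with_retry https://example.com 4` makes at most four attempts. Only retry requests that are safe to repeat, such as GETs, unless the API documents otherwise.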

Can a CDN or WAF cause a 503?

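Yes. An edge layer can generate its own 503, for example AWS CloudFront when a Lambda@Edge function fails or hits a limit. Note that Cloudflare usually reports an unreachable origin with its own 52x codes (521, 522, 523) rather than a 503, and CloudFront reports origin timeouts as 504, so a 503 seen through a CDN is often the origin's response passed along. Check the response body: CDN-generated error pages usually identify themselves.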
