HTTP 429 Too Many Requests — What It Means and How to Fix It

95% fixable · ~15 min · difficulty: intermediate

Verified against RFC 6585 §4 (429 Too Many Requests), RFC 7231 §7.1.3 (Retry-After header), OpenAI Rate Limits documentation · Updated June 2026

> quick_fix

You've sent too many requests in too short a time. Check the `Retry-After` and `X-RateLimit-*` response headers to see when the window resets, then back off and retry. Never hammer an endpoint after a 429 — it worsens the situation and may trigger an IP ban.

# Read rate limit headers from the response (header names vary by API)
curl -si https://api.example.com/endpoint | grep -iE 'x-rate|retry-after'
# x-ratelimit-limit: 100
# x-ratelimit-remaining: 0
# x-ratelimit-reset: 1748901600   (Unix timestamp)
# retry-after: 30                 (seconds to wait)

What causes this error

HTTP 429 is returned when a client exceeds the server's rate limit — the maximum number of requests allowed in a time window (e.g., 100 requests per minute per API key). Rate limits operate at different levels: per IP, per API key, per user, per endpoint, or globally across the service. Third-party APIs (OpenAI, Stripe, Twilio) have tiered limits by plan.
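
To make the "requests per window per key" idea concrete, here is a minimal sketch of a fixed-window counter as a server might keep it. The `createFixedWindowLimiter` name, its return shape, and the injectable `now` clock are assumptions made for illustration; real services often use more elaborate algorithms.

```javascript
// Hypothetical fixed-window limiter: at most `limit` requests per `windowMs` for each key.
// `now` returns milliseconds and is injectable so the logic can be exercised without waiting.
function createFixedWindowLimiter(limit, windowMs, now = () => Date.now()) {
  const windows = new Map() // key -> { start, count }
  return function check(key) {
    const t = now()
    let w = windows.get(key)
    if (!w || t - w.start >= windowMs) {
      w = { start: t, count: 0 } // first request, or the previous window expired
      windows.set(key, w)
    }
    if (w.count < limit) {
      w.count++
      return { status: 200, retryAfterSec: 0 }
    }
    // Over the limit: report how long until this key's window resets
    return { status: 429, retryAfterSec: Math.ceil((w.start + windowMs - t) / 1000) }
  }
}

// Usage: 100 requests per minute, counted separately per API key
const check = createFixedWindowLimiter(100, 60_000)
console.log(check('key-A').status) // 200
```

Each key gets its own counter, which is why one noisy API key can be limited while others on the same service are unaffected.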

How to fix it

  1. Read the Retry-After header and wait before retrying

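    RFC 6585 says a 429 response may include a `Retry-After` header, and RFC 7231 §7.1.3 defines its value as either a number of seconds to wait or an HTTP-date. Never retry immediately after a 429: the window has not reset, so the retry will most likely be rejected too. Wait for the window to reset.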

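    async function fetchWithRateLimit(url, options, retriesLeft = 1) {
      const res = await fetch(url, options)
      if (res.status === 429 && retriesLeft > 0) {
        const retryAfter = res.headers.get('retry-after')
        // Retry-After is either delta-seconds ("30") or an HTTP-date
        const seconds = Number(retryAfter)
        const dateMs = Date.parse(retryAfter ?? '')
        let waitMs = 60_000  // fallback when the header is missing or unparseable
        if (retryAfter && Number.isFinite(seconds)) waitMs = Math.max(0, seconds * 1000)
        else if (!Number.isNaN(dateMs)) waitMs = Math.max(0, dateMs - Date.now())
        console.log(`Rate limited. Waiting ${waitMs}ms...`)
        await new Promise(r => setTimeout(r, waitMs))
        return fetchWithRateLimit(url, options, retriesLeft - 1)  // retry once by default
      }
      return res
    }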
  2. Implement exponential backoff with jitter

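    Exponential backoff lengthens the wait after each failed attempt, which gives the limit time to recover. On its own it does not prevent a thundering herd, where all clients retry at the same moment and re-trigger the rate limit. Add random jitter to spread retries across the window.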

    async function fetchWithBackoff(url, options, attempt = 0) {
      const res = await fetch(url, options)
      if (res.status === 429 && attempt < 5) {
        const base = Math.min(1000 * 2 ** attempt, 30_000)  // cap at 30s
        const jitter = Math.random() * 1000                 // up to 1s random
        await new Promise(r => setTimeout(r, base + jitter))
        return fetchWithBackoff(url, options, attempt + 1)
      }
      return res
    }
  3. Track X-RateLimit-Remaining and slow down proactively

    Don't wait for a 429. Read `X-RateLimit-Remaining` on every response. When it approaches zero, add a delay before the next request. This avoids hitting the limit in the first place.

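    // Assumes x-ratelimit-reset is a Unix timestamp in seconds; some APIs send seconds-until-reset instead
    async function apiCall(url) {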
      const res = await fetch(url, { headers: { Authorization: `Bearer ${API_KEY}` } })
      const remaining = parseInt(res.headers.get('x-ratelimit-remaining') ?? '100')
      const reset = parseInt(res.headers.get('x-ratelimit-reset') ?? '0')
      if (remaining < 5) {
        const msUntilReset = (reset * 1000) - Date.now()
        if (msUntilReset > 0) await new Promise(r => setTimeout(r, msUntilReset))
      }
      return res
    }
  4. Batch requests or use bulk endpoints

    If you're calling an API in a loop (e.g., enriching 10,000 records one by one), switch to bulk endpoints where they exist. Many APIs that rate-limit individual calls offer a batch endpoint that processes many items per request, which cuts the call count dramatically: 10,000 single calls become 100 calls at 100 items each.

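    // Instead of 10,000 individual calls:
    for (const id of ids) await fetch(`/api/users/${id}`)  // 10,000 requests

    // Use a bulk endpoint (the /api/users/batch path is illustrative):
    const chunk = (arr, size) =>
      Array.from({ length: Math.ceil(arr.length / size) }, (_, i) => arr.slice(i * size, (i + 1) * size))
    const chunks = chunk(ids, 100)  // split into groups of 100
    for (const group of chunks) {
      await fetch('/api/users/batch', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ ids: group }),
      })
      await new Promise(r => setTimeout(r, 1000))  // pace between batches
    }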
  5. Cache responses to reduce repeat calls

    Many 429 situations arise from fetching the same data repeatedly. Add a short-lived cache (even 60 seconds) for idempotent GET requests. For user-facing apps, also deduplicate concurrent identical requests so that they share a single in-flight call.

    const cache = new Map()
    async function cachedFetch(url, ttlMs = 60_000) {
      const cached = cache.get(url)
      if (cached && Date.now() - cached.ts < ttlMs) return cached.data
      const res = await fetch(url)
      if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`)  // don't cache 429s or other errors
      const data = await res.json()
      cache.set(url, { data, ts: Date.now() })
      return data
    }
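
The request deduplication mentioned in step 5 can be sketched as follows. This is a minimal illustration, assuming a hypothetical `dedupe` helper that wraps any promise-returning `loader` function; neither name comes from a library.

```javascript
// Hypothetical helper: concurrent calls with the same key share one in-flight promise.
function dedupe(loader) {
  const inFlight = new Map() // key -> pending promise
  return function (key) {
    if (inFlight.has(key)) return inFlight.get(key)
    const promise = Promise.resolve()
      .then(() => loader(key))
      .finally(() => inFlight.delete(key)) // allow a fresh call once this one settles
    inFlight.set(key, promise)
    return promise
  }
}

// Usage (assumes a global fetch, e.g. Node 18+ or a browser):
// const getJson = dedupe(url => fetch(url).then(r => r.json()))
// await Promise.all([getJson('/api/users/1'), getJson('/api/users/1')]) // one network call
```

Unlike the TTL cache, this keeps nothing after the request settles, so it only collapses calls that overlap in time. The two techniques combine well.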

How to verify the fix

  • `X-RateLimit-Remaining` stays above 0 during normal operation.
  • Retry logic correctly waits for `Retry-After` before retrying.
  • No bursts of parallel requests: a queue with a concurrency limit paces them.
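
A queue with a concurrency limit, as named in the last check, might look like the sketch below. `runWithConcurrency` is a hypothetical helper written for this page, and `tasks` is assumed to be an array of functions that each return a promise.

```javascript
// Hypothetical helper: run async tasks with at most `limit` in flight at once.
// Results come back in the same order as `tasks`.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length)
  let next = 0
  async function worker() {
    while (next < tasks.length) {
      const i = next++ // claim the next index; safe because JavaScript runs this synchronously
      results[i] = await tasks[i]()
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker)
  await Promise.all(workers)
  return results
}

// Usage (assumes `urls` is an array of strings and a global fetch):
// const responses = await runWithConcurrency(urls.map(u => () => fetch(u)), 3)
```

If any task rejects, the returned promise rejects with that error while the other workers keep draining the queue. Wrap tasks in their own error handling if partial results matter.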

Why 429 happens at the runtime level

HTTP 429 is defined in RFC 6585. Rate limiting is implemented at multiple layers: application-level (middleware counting requests per API key in Redis), reverse-proxy level (Nginx `limit_req` module using a leaky bucket algorithm), and CDN level (Cloudflare, AWS WAF). The algorithm used matters: token bucket allows bursting; leaky bucket enforces a steady rate; sliding window log is most accurate but memory-expensive. The `Retry-After` header value reflects when the current window resets or how many seconds until a new token is available.
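
To illustrate why a token bucket allows bursting, here is a minimal sketch. The `TokenBucket` class, its `tryTake` method, and the injectable `now` clock are assumptions made for this example, not the implementation of any particular proxy or CDN.

```javascript
// Minimal token bucket: holds up to `capacity` tokens, refilled at `refillPerSec` tokens per second.
// `now` returns milliseconds and is injectable so the logic can be tested without real waiting.
class TokenBucket {
  constructor(capacity, refillPerSec, now = () => Date.now()) {
    this.capacity = capacity
    this.refillPerSec = refillPerSec
    this.now = now
    this.tokens = capacity // start full, which is what permits an initial burst
    this.last = now()
  }

  // Returns 0 if a token was taken; otherwise the whole seconds until one is available,
  // which is the kind of value a server could send as Retry-After.
  tryTake() {
    const t = this.now()
    const elapsedSec = (t - this.last) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec)
    this.last = t
    if (this.tokens >= 1) {
      this.tokens -= 1
      return 0
    }
    return Math.ceil((1 - this.tokens) / this.refillPerSec)
  }
}
```

A client that has been idle finds the bucket full and can send `capacity` requests back to back; after that it is held to the refill rate. A leaky bucket would instead space every request evenly.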

Common debug mistakes for 429

  • Retrying immediately after a 429 with no delay: this guarantees more 429s and can trigger extended bans.
  • Not reading `X-RateLimit-Remaining` proactively: waiting for a 429 instead of throttling before hitting the limit.
  • Firing requests in parallel without a concurrency limit: 10 concurrent workers with round trips of about 600 ms can exhaust a 100/minute limit in roughly 6 seconds.
  • Using multiple API keys on the same account to bypass limits: this typically violates the ToS, and providers can detect it by account or IP.
  • Ignoring `Retry-After` headers on webhook delivery retries: external services have retry budgets and will stop retrying after N failures.

When 429 signals a deeper problem

A sustained 429 rate in production typically means the application architecture is making too many API calls per user action. If each page load triggers 20 API calls, a small user base can exhaust enterprise-tier rate limits. The architectural fix is a backend-for-frontend (BFF) pattern that aggregates multiple API calls into one, combined with aggressive caching at the BFF layer. For background jobs, a queue-based worker system with a configurable `requestsPerSecond` ceiling prevents bursts entirely. Rate limit headers should be treated as first-class metrics in your observability platform.
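
For the background-job case, a worker with a `requestsPerSecond` ceiling can be sketched like this. `runPaced` is a hypothetical helper, and the injectable `sleep` parameter is an assumption that keeps the pacing testable; a production system would more likely use a real job queue.

```javascript
// Hypothetical paced runner: runs jobs one at a time and waits 1000 / requestsPerSecond ms
// between them, so the start rate never exceeds the configured ceiling.
// `sleep` is injectable so the pacing can be checked without real delays.
async function runPaced(
  jobs,
  requestsPerSecond,
  sleep = ms => new Promise(r => setTimeout(r, ms))
) {
  const intervalMs = 1000 / requestsPerSecond
  const results = []
  for (let i = 0; i < jobs.length; i++) {
    if (i > 0) await sleep(intervalMs) // gap before every job except the first
    results.push(await jobs[i]())
  }
  return results
}

// Usage (assumes `ids` is an array and a global fetch): at most 5 requests per second
// const pages = await runPaced(ids.map(id => () => fetch(`/api/items/${id}`)), 5)
```

Because each job is awaited before the pause starts, slow responses lower the effective rate further, which errs on the safe side of the limit.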

Editor's take

The 429 becomes a production incident when a batch job that runs fine in staging — against a small dataset — runs against the full production dataset for the first time and fires 50,000 requests in two minutes. The AI enrichment pipeline that adds sentiment scores to user posts, the nightly sync job that validates 200,000 addresses against a postal API, the data migration that fetches updated pricing for every catalog item — they all work perfectly in development and explode on first production run.

The engineering discipline here is rate-limit-first thinking: before writing any code that calls an external API in a loop, look up the rate limit and calculate whether your batch volume fits within the window. At 100 requests/minute, 50,000 items take 500 minutes, or about 8.3 hours. That calculation takes 30 seconds and determines whether you need to spread the job over a week or negotiate a burst allowance with the API provider before a single line of code is written.
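
That back-of-the-envelope calculation can be written down as a small helper. `estimateBatchMinutes` is a hypothetical function, and it assumes the job is the only consumer of the rate limit and that the limit counts requests rather than items.

```javascript
// Estimate how many minutes a batch takes at a given rate limit.
// itemsPerRequest > 1 models a bulk endpoint that accepts several items per call.
function estimateBatchMinutes(items, requestsPerMinute, itemsPerRequest = 1) {
  const requests = Math.ceil(items / itemsPerRequest)
  return requests / requestsPerMinute
}

const minutes = estimateBatchMinutes(50_000, 100)
console.log(`${minutes} min ≈ ${(minutes / 60).toFixed(1)} h`) // 500 min ≈ 8.3 h
```

Passing `itemsPerRequest = 100` for the same job gives 5 minutes, which shows how much a bulk endpoint changes the plan.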

The adjacent error seen in the same runbook: 503s from the destination API during a retry storm. When every client in a fleet retries at the same time after a 429, they collectively overwhelm the server, causing 503s that then trigger more retries. Exponential backoff with jitter isn't a nice-to-have; it's the mechanism that prevents this. In distributed systems, uncoordinated retries are a common cause of cascading failures that look like infrastructure issues but are actually client misbehavior.

By Bikram Nath · Curator · Updated June 2026

Frequently asked questions

What is the difference between rate limiting and throttling?

Rate limiting sets a hard cap on requests in a window; exceeding it returns a 429 immediately. Throttling slows requests down to a maximum throughput: the server accepts the request but queues or delays processing it, so the client usually sees added latency rather than an error, although an overloaded server may return 503 temporarily. Rate limiting is the client's problem to handle; throttling is mostly transparent to the client.

Can rate limits apply differently to different endpoints?

Yes. Many APIs have separate rate limit buckets per endpoint type. OpenAI has per-model limits (tokens per minute) separate from requests per minute. Stripe has limits on charge creation separate from customer reads. Always read the API's rate limit documentation for each endpoint group, not just the global limit.

My app uses multiple API keys — can I spread requests across keys?

This violates most API terms of service. APIs typically track usage by account, not just by key, so they can detect and ban this pattern. The legitimate fix is requesting a higher rate limit tier from the provider or using an official enterprise plan.

Disclosure: Errordex runs AdSense, has zero third-party affiliate or sponsored links, and occasionally links to the editor’s own paid digital products (clearly labelled). Every fix is cross-referenced against the official sources listed in the “sources” sidebar before it ships. If a fix here didn’t work for you, please email so we can update the page.