Rate Limiting
Definition
Rate Limiting is a technique that controls the number of requests a user or IP address can make to an API or service within a specific time window. It protects servers from abuse, prevents denial-of-service attacks, ensures fair resource allocation among users, and manages infrastructure costs. Common implementations include fixed window (X requests per minute), sliding window (smoothed request counting), and token bucket (burst-friendly with sustained limits) algorithms. Rate limiting is typically implemented at the API gateway, load balancer, or application level. For public APIs, rate limits are documented and often tiered by pricing plan (free tier: 60 req/min, paid: 1000 req/min). Modern platforms like Vercel, Cloudflare, and Supabase provide built-in rate limiting, while custom implementations often use an in-process store (such as a Map) for a single server, or an external in-memory store like Redis when counts must be shared across instances.
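The fixed-window algorithm mentioned above is the simplest of the three to implement. A minimal sketch in TypeScript, using an in-process Map as the counter store (the class and method names here are illustrative, not from any real library):

```typescript
// Fixed-window rate limiter: allows up to `limit` requests per `windowMs`
// window, then blocks until the window resets.
type WindowState = { count: number; windowStart: number };

class FixedWindowLimiter {
  private counters = new Map<string, WindowState>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request identified by `key` (API key, IP, user ID)
  // is allowed in the current window.
  allow(key: string, now: number = Date.now()): boolean {
    const state = this.counters.get(key);
    if (!state || now - state.windowStart >= this.windowMs) {
      // The previous window has expired (or this is a new key): start fresh.
      this.counters.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (state.count < this.limit) {
      state.count += 1;
      return true;
    }
    return false; // limit exceeded for this window
  }
}
```

The trade-off is the window boundary: a client can burst up to 2× the limit by sending requests at the end of one window and the start of the next, which is what sliding-window counting smooths out.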
How It Works
Rate limiting controls how many requests a client can make to an API or service within a defined time window. Implementation typically uses algorithms like the token bucket (tokens replenish at a fixed rate; each request consumes one), sliding window (tracks requests within a rolling time period), or fixed window (resets the counter at regular intervals). When a client exceeds the limit, the server responds with HTTP 429 (Too Many Requests) and often includes a Retry-After header indicating when the client can try again. Rate limits are enforced per identifier — usually an API key, IP address, or authenticated user ID. Distributed systems implement rate limiting using shared state stores like Redis, which track request counts across multiple server instances. Rate limits are typically communicated through response headers: X-RateLimit-Limit (maximum requests), X-RateLimit-Remaining (requests left), and X-RateLimit-Reset (when the window resets). Different endpoints often have different limits based on their computational cost.
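The token bucket described above can be sketched in a few lines. This is an illustrative single-process version (a distributed one would keep the token count in Redis, as noted); the names are assumptions, not a real library's API:

```typescript
// Token bucket: holds up to `burst` tokens, refilled continuously at
// `refillPerSec` tokens per second. Each request consumes one token.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private burst: number,
    private refillPerSec: number,
    now: number = Date.now()
  ) {
    this.tokens = burst; // start full, so short bursts are allowed immediately
    this.lastRefill = now;
  }

  // Returns true and consumes a token if one is available.
  tryConsume(now: number = Date.now()): boolean {
    // Lazily refill based on elapsed time, capped at the burst capacity.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // caller should respond with HTTP 429 + Retry-After
  }
}
```

When `tryConsume` returns false, the server would send 429 along with the `X-RateLimit-*` headers described above; the remaining-token count maps directly onto `X-RateLimit-Remaining`.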
Why It Matters
Rate limiting is essential for API security, stability, and fair resource allocation. Without it, a single misbehaving client — whether malicious or buggy — can overwhelm your servers and degrade service for everyone. For developers building applications that consume APIs, understanding rate limits is critical for designing resilient integrations that handle 429 responses gracefully with exponential backoff. For builders operating APIs, rate limiting protects infrastructure costs, prevents abuse, and enables tiered pricing models where higher-paying customers get higher limits. Rate limiting also plays a defensive role against brute-force attacks, credential stuffing, and DDoS attempts at the application layer.
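Handling 429 responses gracefully, as described above, usually means retrying with exponential backoff and honoring `Retry-After` when the server provides it. A minimal client-side sketch, where `doRequest` and the delay schedule are assumptions for illustration:

```typescript
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retries a request on HTTP 429, preferring the server's Retry-After hint
// and falling back to exponential backoff (250ms, 500ms, 1s, ... capped at 8s).
async function fetchWithBackoff(
  doRequest: () => Promise<{ status: number; retryAfterSec?: number }>,
  maxAttempts: number = 5
): Promise<{ status: number }> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await doRequest();
    if (res.status !== 429) return res;
    const delayMs =
      res.retryAfterSec != null
        ? res.retryAfterSec * 1000
        : Math.min(2 ** attempt * 250, 8000);
    await sleep(delayMs);
  }
  throw new Error("rate limited: retries exhausted");
}
```

Production clients often add jitter (randomizing each delay) so that many clients rate-limited at the same moment do not all retry in lockstep.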
Real-World Examples
GitHub's API enforces 5,000 requests per hour for authenticated users and 60 per hour for unauthenticated requests. OpenAI's API rate limits vary by model and tier — hitting limits on GPT-4 is a common developer pain point. Stripe allows 100 read requests per second in live mode. At ThePlanetTools.ai, we document API rate limits in our tool reviews because they directly affect what you can build. Cloudflare's Rate Limiting rules let you define custom thresholds per URL pattern. Supabase implements rate limiting on its API endpoints to protect shared infrastructure on free-tier projects. Redis is the most common backend for distributed rate limiting, with libraries like rate-limiter-flexible for Node.js making implementation straightforward. Kong and nginx also offer built-in rate limiting modules.
Related Terms
API
Development: Rules and protocols enabling software applications to communicate.
Caching
Infrastructure: Storing data copies in fast-access locations to avoid repeating expensive operations.
Load Balancing
Infrastructure: Distributing traffic across multiple servers to prevent overload.
Serverless
Infrastructure: Cloud model where providers manage servers — devs deploy functions on-demand.