API Gateway
A single entry point that handles cross-cutting concerns such as auth, rate limiting, and routing for a set of backend services.
The Problem Without One
When every microservice handles its own authentication, rate limiting, and request routing independently, you replicate the same logic across dozens of services. A new auth token format requires changes in 20 codebases. A rate limiting bug affects every team differently. A client making 6 calls to 6 services to render one page absorbs 6 round-trip latencies. None of this is a scalability problem in the database sense. It is a coordination problem: cross-cutting concerns belong at the edge, not scattered through the interior.
How It Works
An API gateway sits between external clients and internal services. Every inbound request passes through it before touching a backend. At the gateway layer, the system performs:
Auth offloading: the gateway validates JWT tokens or API keys and forwards only authenticated requests downstream. Services receive a verified identity header and never handle raw credentials. A single gateway update propagates auth policy changes instantly across all services.
Request routing: the gateway maps URL paths or hostnames to backend service addresses. /api/payments/* routes to the payments service; /api/orders/* routes to the orders service. This is functionally similar to a reverse proxy, but a gateway adds programmable logic on top of simple forwarding.
Rate limiting at the edge: enforcing quotas at the gateway is more efficient than at each service. One in-memory or Redis-backed counter tracks requests per API key, per IP, or per tenant. Requests that exceed the quota are rejected with a 429 before they consume any backend resources.
Request aggregation: a gateway can fan out a single client call to multiple backend services, merge the responses, and return one payload. This is closely related to the Backend for Frontend (BFF) pattern, in which each client type gets its own aggregating gateway; it is particularly useful for mobile clients that cannot afford multiple sequential calls.
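The fan-out-and-merge step can be sketched with parallel calls; the backend functions below are stubs standing in for real HTTP or gRPC calls:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in backend calls; a real gateway would issue HTTP/gRPC requests.
def fetch_profile(user_id: str) -> dict:
    return {"name": f"user-{user_id}"}

def fetch_orders(user_id: str) -> list:
    return [{"order_id": 1}]

def render_home(user_id: str) -> dict:
    """One inbound client call fans out to both services in parallel
    and returns a single merged payload."""
    with ThreadPoolExecutor() as pool:
        profile = pool.submit(fetch_profile, user_id)
        orders = pool.submit(fetch_orders, user_id)
        return {"profile": profile.result(), "orders": orders.result()}
```

Because the calls run concurrently, the client pays roughly one round trip plus the slowest backend, not the sum of all backends.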
Reverse Proxy vs Gateway
A reverse proxy forwards requests and optionally terminates TLS. An API gateway does all of that plus adds auth, rate limiting, transformation, and observability as programmable middleware. Nginx and HAProxy are reverse proxies. Kong, AWS API Gateway, and Apigee are API gateways. The distinction matters when scoping what belongs in each layer.
Gateway vs Service Mesh
A gateway operates at the north-south boundary: traffic between external clients and internal services. A service mesh operates east-west: traffic between internal services. They are complementary, not alternatives. A typical production stack uses both: the gateway handles external entry, the mesh handles inter-service reliability.
Observability at the Edge
Because every request passes through the gateway, it is the natural place to collect latency histograms, error rates, and request counts per route and per client. A gateway emitting metrics to Prometheus or Datadog gives you a unified view of API surface performance without instrumenting each service independently. Request IDs injected at the gateway propagate through downstream services, enabling distributed trace correlation without requiring each team to generate their own trace context.
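Request ID injection at the edge can be sketched as a tiny header middleware; the `X-Request-ID` header name is a common convention, used here as an assumption rather than a standard:

```python
import uuid

def with_request_id(headers: dict) -> dict:
    """Inject a request ID at the gateway if the client did not supply one.
    Downstream services copy the same ID into their logs and outbound
    calls, which is what makes trace correlation possible."""
    headers = dict(headers)  # do not mutate the caller's dict
    headers.setdefault("X-Request-ID", str(uuid.uuid4()))
    return headers
```

Preserving an ID supplied by a trusted upstream, rather than always overwriting it, is what lets traces span multiple hops.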
Protocol Translation
Some gateways handle protocol translation: a client sends HTTP/1.1, the gateway translates to gRPC for internal services that use it. Or a mobile client speaks WebSocket, and the gateway converts to HTTP/2 streams internally. This keeps internal service protocols decoupled from external client constraints. Services can adopt gRPC for its binary efficiency and streaming capabilities without requiring clients to support it.
When Not to Use One
A gateway adds a network hop, typically 1-5ms depending on co-location. If all services are internal and latency budgets are tight, a gateway in the critical path is a liability. For very small deployments (two or three services), the operational overhead of managing a gateway exceeds the benefit. A single monolith has no need for one at all. The gateway also becomes a single point of failure: it must be deployed with high availability (multiple instances behind a load balancer) and sized for the full request volume of the platform.
Interview Tip
The question that separates strong answers from weak ones: "Where do you enforce rate limiting, at the gateway or at each service?" The right answer is both, with different semantics. The gateway enforces global quotas per client (protecting the entire platform). Each service enforces its own limits for self-protection against internal callers. Candidates who say "just the gateway" miss that a compromised internal service or a misconfigured mesh can still flood a backend if it has no self-defense. Candidates who say "just each service" miss that per-service enforcement cannot aggregate across services to enforce a platform-level tenant quota. The senior-level (L5+) addition: the gateway rate limiter must be backed by a shared store (Redis) when multiple gateway instances run in parallel; otherwise each instance maintains independent counters and the effective limit is multiplied by the number of instances.
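The limit-multiplication failure mode is easy to demonstrate numerically; this is an illustrative simulation with made-up numbers, not a real limiter implementation:

```python
LIMIT = 100
REQUESTS = 300

# Independent in-memory counters: each of two gateway instances admits up
# to LIMIT on its own, so a client whose traffic is load-balanced across
# both effectively gets twice the intended quota.
independent = [0, 0]
admitted_independent = 0
for i in range(REQUESTS):
    inst = i % 2                      # load balancer alternates instances
    if independent[inst] < LIMIT:
        independent[inst] += 1
        admitted_independent += 1

# Shared counter (what a Redis-backed limiter provides): one count
# across all gateway instances.
shared = 0
admitted_shared = 0
for _ in range(REQUESTS):
    if shared < LIMIT:
        shared += 1
        admitted_shared += 1

assert admitted_independent == 2 * LIMIT   # effective limit doubled
assert admitted_shared == LIMIT
```

With N gateway instances and independent counters, the effective limit is N times the configured one, which is exactly why the shared store is non-negotiable at scale.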
Related Concepts
Rate Limiting: controls the rate of requests to a service. Common algorithms: Token Bucket, Leaky Bucket, Fixed Window, and Sliding Window.
Load Balancing: distributes incoming traffic across multiple servers to prevent any single node from becoming a bottleneck. The mechanism that makes horizontal scaling functional in practice.
Service Mesh: an infrastructure layer that manages service-to-service communication through sidecar proxies, providing retries, mTLS, and observability without application code changes.