Webhook Rate Limiting and Backpressure
Sustaining Resilient Delivery & Retry Strategies at scale means a dispatcher must throttle itself: rate limiting caps how fast it offers events to a consumer, while backpressure is the feedback that slows the producer when the consumer cannot keep up. Without both, a healthy spike in events becomes a self-inflicted outage — the dispatcher saturates a slow endpoint, retries pile on top of the original load, and a recoverable slowdown cascades into total failure. This page covers the two halves together in Python: a token bucket governs the steady-state send rate, concurrency caps and queue-depth signals convert downstream slowness into producer-side slowdown, and HTTP 429 Retry-After responses let the consumer assert its own limit.
Token Bucket Rate Limiting
A token bucket is the workhorse for outbound dispatch because it permits short bursts while bounding the long-run average. The bucket holds up to capacity tokens and refills at rate tokens per second; each send consumes one token, and an empty bucket forces the dispatcher to wait. Tuning is two-dimensional: rate matches the consumer’s documented sustained throughput, while capacity controls how large a burst you allow before the average reasserts itself. A capacity equal to one second of rate behaves almost like a fixed-window limiter; a larger capacity tolerates spiky producers without dropping below the consumer’s ceiling.
The alternatives trade smoothness for simplicity. A fixed-window counter (N requests per minute) is trivial but admits double-rate bursts at window boundaries. A leaky bucket enforces a perfectly smooth output rate with no burst allowance, which is ideal for strict per-endpoint contracts but wasteful when the consumer could absorb occasional spikes. For multi-instance dispatchers, the bucket state must be shared — a Redis-backed token bucket (often a single atomic Lua script) keeps the global rate correct no matter how many workers draw from it, the same coordination pattern used for nonce-based replay protection state.
Concurrency Caps and Queue-Depth Signals
Rate limiting bounds how often you start a send; a concurrency cap bounds how many are in flight at once. The two are distinct controls and you need both: a generous rate with no concurrency cap lets a sudden batch of slow responses open thousands of simultaneous connections and exhaust sockets and memory. An asyncio.Semaphore (or a fixed worker pool) caps in-flight deliveries; sizing it to the consumer’s connection limit is what actually protects a slow endpoint.
Queue depth is the primary backpressure signal. When dispatch work lands in a bounded queue, a full queue means consumers are draining slower than producers are filling — the signal to push back arrives automatically as the enqueue operation blocks or fails. An unbounded queue hides this until the process runs out of memory, converting a slow consumer into a crash. Watch the high-water mark: when depth crosses a threshold, slow the producer (shed, pause, or delay intake) rather than letting latency grow without bound. This shares the failure-isolation goal of Circuit Breaker Patterns, but where a breaker stops traffic on errors, backpressure modulates traffic on saturation.
Honoring 429 and Retry-After
A well-behaved consumer asserts its own limit by returning HTTP 429 with a Retry-After header. Treating 429 as a generic failure is a common and damaging bug: feeding it into Exponential Backoff Algorithms ignores the explicit instruction the consumer just gave you. Honor Retry-After precisely — wait exactly that long — and only fall back to exponential backoff with jitter when the header is absent. Critically, a 429 should also lower the bucket’s own rate for that endpoint, so the next burst does not immediately re-trigger the limit. Apply adaptive concurrency: on repeated 429s, shrink the in-flight cap; on a clean streak, grow it back toward the ceiling.
Failure Mode Analysis & Mitigation
| Failure Mode | Impact | Mitigation Strategy |
|---|---|---|
| Unbounded dispatch queue | Slow consumer drives the producer out of memory and crashes it | Use a bounded queue; treat a full queue as the backpressure signal |
| 429 fed into blind retry | Dispatcher ignores the consumer’s stated limit and amplifies the overload | Honor Retry-After exactly; lower the per-endpoint token rate on 429 |
| Rate cap without concurrency cap | A burst of slow responses opens thousands of connections, exhausting sockets | Add an asyncio.Semaphore sized to the consumer’s connection limit |
| Per-instance rate limits | N workers each enforce the limit, so the global rate is N× the target | Share token-bucket state in Redis with an atomic refill-and-take script |
| Retry storm on recovery | All paused deliveries fire at once when the consumer returns | Drain through the same token bucket and add jitter to release timing |
Runnable Implementation Example
This async dispatcher combines a token bucket, a concurrency cap, a bounded queue, and 429 handling.
import asyncio
import time
import httpx
class TokenBucket:
def __init__(self, rate: float, capacity: float):
self.rate = rate
self.capacity = capacity
self._tokens = capacity
self._updated = time.monotonic()
self._lock = asyncio.Lock()
async def acquire(self) -> None:
async with self._lock:
while True:
now = time.monotonic()
# Refill based on elapsed wall-clock time.
self._tokens = min(
self.capacity,
self._tokens + (now - self._updated) * self.rate,
)
self._updated = now
if self._tokens >= 1:
self._tokens -= 1
return
# Sleep just long enough for one token to accrue.
await asyncio.sleep((1 - self._tokens) / self.rate)
def throttle(self, factor: float = 0.5) -> None:
"""Shrink the steady-state rate after a 429."""
self.rate = max(1.0, self.rate * factor)
class Dispatcher:
def __init__(self, rate: float, max_inflight: int, queue_size: int):
self.bucket = TokenBucket(rate, capacity=rate)
self.sem = asyncio.Semaphore(max_inflight) # concurrency cap
self.queue: asyncio.Queue = asyncio.Queue(maxsize=queue_size) # bounded
self.client = httpx.AsyncClient(timeout=10.0)
async def submit(self, event: dict) -> None:
# A full queue is the backpressure signal: block the producer here.
await self.queue.put(event)
async def _send(self, event: dict) -> None:
await self.bucket.acquire() # rate limit
async with self.sem: # concurrency cap
resp = await self.client.post(event["url"], json=event["body"])
if resp.status_code == 429:
# Honor the consumer's explicit limit, then lower our rate.
delay = float(resp.headers.get("Retry-After", "5"))
self.bucket.throttle()
await asyncio.sleep(delay)
await self.queue.put(event) # requeue for another pass
async def run(self, workers: int = 4) -> None:
async def worker():
while True:
event = await self.queue.get()
try:
await self._send(event)
finally:
self.queue.task_done()
await asyncio.gather(*(worker() for _ in range(workers)))
Operational Workflows & CI/CD Integration
Export the levers as runtime configuration, not constants: per-endpoint rate, capacity, max_inflight, and queue_size should be tunable without a redeploy so operators can throttle a misbehaving integration in seconds. Emit queue depth, token-bucket fill level, in-flight count, and 429-rate as metrics, and alert when queue depth trends toward its bound — that is the leading indicator of an impending backlog, well before latency SLOs are breached. In load tests, drive the dispatcher against a deliberately slow endpoint and assert the queue stays bounded and memory stays flat; a test that only exercises the happy path will never catch an unbounded-queue regression.
Debugging Checklist
- Confirm the dispatch queue has a
maxsizeand that a full queue blocks or sheds the producer. - Verify 429 responses read
Retry-Afterand lower the per-endpoint rate, rather than entering blind retry. - Check that a concurrency cap exists independently of the rate limit.
- For multi-instance dispatchers, confirm token-bucket state is shared (Redis), not per-process.
- Ensure paused or requeued deliveries drain through the bucket with jitter, not all at once.
- Validate that queue depth and bucket fill level are exported as metrics with alerts.
Related
- Circuit Breaker Patterns — stop traffic on errors, the complement to throttling on saturation.
- Exponential Backoff Algorithms — the fallback when no
Retry-Afteris supplied. - Applying backpressure to webhook consumers — detecting slow consumers and pausing intake step by step.
- Resilient Delivery & Retry Strategies — the broader resilience context.