Applying Backpressure to Webhook Consumers

This guide implements consumer-side backpressure for a single Python webhook receiver under load, the concrete companion to Webhook Rate Limiting & Backpressure. The scenario: a FastAPI endpoint accepts events, hands them to background workers that do slow processing (a database write, a downstream call), and during a traffic spike the workers fall behind. Without backpressure the receiver either runs out of memory buffering or silently drops events. The fix is to make saturation explicit — a bounded queue, a 429 Retry-After response that tells the sender to slow down, and pause/resume control with hysteresis. Because the producer must honor that 429 to break the loop, pair this with sender-side protection such as per-endpoint circuit breaker state machines.

Consumer backpressure with high/low water marks Incoming deliveries fill a bounded queue; above the high-water mark the receiver returns 429, below the low-water mark it resumes accepting. Sender POST events Bounded queue high-water -> 429 low-water -> resume Workers slow drain 429 + Retry-After
The receiver returns 429 once the queue crosses the high-water mark and resumes accepting only after it drains below the low-water mark.

Prerequisites

Step 1: Bound the work queue

Replace any unbounded buffer with a fixed-size asyncio.Queue. The size is a deliberate memory budget: roughly the number of in-flight events you can hold without risking the process. A bounded queue turns “we are overloaded” from a silent memory leak into an observable, actionable condition.

import asyncio

QUEUE_MAX = 1000
HIGH_WATER = int(QUEUE_MAX * 0.8)   # start shedding
LOW_WATER = int(QUEUE_MAX * 0.4)    # resume accepting

queue: asyncio.Queue = asyncio.Queue(maxsize=QUEUE_MAX)

Step 2: Signal backpressure with 429 and Retry-After

When the queue is at or above the high-water mark, reject new deliveries with 429 Too Many Requests and a Retry-After header. This is the entire point of consumer backpressure: you are telling the sender to come back later instead of accepting work you cannot process.

from fastapi import FastAPI, Request, Response

app = FastAPI()

@app.post("/webhooks")
async def receive(request: Request):
    if queue.qsize() >= HIGH_WATER or not accepting.is_set():
        # Suggest a wait proportional to the backlog drain time.
        retry_after = max(1, queue.qsize() // max(1, DRAIN_PER_SEC))
        return Response(
            status_code=429,
            headers={"Retry-After": str(retry_after)},
        )
    event = await request.json()
    try:
        queue.put_nowait(event)        # never block the request on a full queue
    except asyncio.QueueFull:
        return Response(status_code=429, headers={"Retry-After": "5"})
    return Response(status_code=202)

Use put_nowait rather than await queue.put(...): blocking the HTTP handler on a full queue holds a connection open and converts backpressure into latency. Reject fast and let the sender retry.

Step 3: Pause and resume intake with hysteresis

A single threshold causes flapping — the receiver toggles 202/429 on every event near the boundary. Use two thresholds: stop accepting at the high-water mark, and only resume once the queue drains below the low-water mark. An asyncio.Event makes the accepting/paused state explicit and cheap to check.

accepting = asyncio.Event()
accepting.set()                         # start in the accepting state
DRAIN_PER_SEC = 50                      # measured worker throughput

async def watermark_monitor():
    while True:
        depth = queue.qsize()
        if depth >= HIGH_WATER and accepting.is_set():
            accepting.clear()           # pause intake
        elif depth <= LOW_WATER and not accepting.is_set():
            accepting.set()             # resume intake
        await asyncio.sleep(0.25)

async def worker():
    while True:
        event = await queue.get()
        try:
            await process(event)        # the slow work
        finally:
            queue.task_done()

@app.on_event("startup")
async def startup():
    asyncio.create_task(watermark_monitor())
    for _ in range(8):                   # concurrency cap = worker count
        asyncio.create_task(worker())

The gap between HIGH_WATER and LOW_WATER is the hysteresis band; widen it if you observe rapid accept/reject oscillation.

Step 4: Detect slow consumers from drain rate

Backpressure is reactive; detection lets you act before the queue saturates. Track per-event dwell time (enqueue-to-dequeue) and processing latency. A rising dwell time with steady intake means workers are falling behind — the leading indicator of an impending 429 storm.

import time

dwell_samples: list[float] = []

async def worker_instrumented():
    while True:
        event = await queue.get()
        dwell = time.monotonic() - event["_enqueued_at"]
        dwell_samples.append(dwell)
        try:
            await process(event)
        finally:
            queue.task_done()

def health_snapshot() -> dict:
    recent = dwell_samples[-100:] or [0.0]
    return {
        "queue_depth": queue.qsize(),
        "accepting": accepting.is_set(),
        "p95_dwell_seconds": sorted(recent)[int(len(recent) * 0.95)],
    }

Stamp _enqueued_at = time.monotonic() when you enqueue in Step 2. Export health_snapshot() to your metrics pipeline and alert on p95_dwell_seconds trending up.

Verification / Testing

Drive the receiver faster than it drains and assert it sheds load instead of growing without bound.

import asyncio, httpx

async def flood():
    async with httpx.AsyncClient(base_url="http://localhost:8000") as c:
        codes = await asyncio.gather(*[
            c.post("/webhooks", json={"i": i}) for i in range(5000)
        ], return_exceptions=True)
    statuses = [r.status_code for r in codes if hasattr(r, "status_code")]
    assert 429 in statuses, "receiver never applied backpressure"
    assert 202 in statuses, "receiver rejected everything"
    print("202:", statuses.count(202), "429:", statuses.count(429))

asyncio.run(flood())

A passing test shows a mix of 202 and 429, and the receiver’s memory stays flat throughout the flood — confirming the bounded queue, not the heap, absorbs the spike.

Failure modes and gotchas