Designing webhook fan-out architectures
A single domain event — an order placed, a payment settled — often needs to reach many subscribers at once. This guide builds a fan-out architecture where one inbound event is delivered to every interested endpoint through per-subscriber queues and isolated workers, so one slow or failing subscriber cannot stall delivery to the others. It builds on sync vs async webhooks and complements when to use synchronous callbacks vs async webhooks, which establishes why fan-out delivery must be asynchronous in the first place.
The defining hazard of fan-out is head-of-line blocking: if all subscribers share one queue and one endpoint hangs for 30 seconds per request, every other subscriber waits behind it. Isolating each subscriber onto its own queue with its own workers removes that coupling — a dead endpoint backs up only its own lane.
Prerequisites
- Python 3.10+ with
redisandrq(Redis Queue) installed, plus a reachable Redis instance. - A subscriptions table or store mapping event types to subscriber endpoints and secrets.
- An understanding of why this work is asynchronous, covered in when to use synchronous callbacks vs async webhooks.
- A retry policy in mind; this design defers the backoff details to resilient delivery & retry strategies.
Step 1: Persist the event once and return immediately
The ingest endpoint does the minimum: validate, store the event durably, and respond 202 Accepted. It must never block on delivery, because delivery to N subscribers can take arbitrarily long.
import json
import uuid
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import redis
app = FastAPI()
r = redis.Redis(decode_responses=True)
@app.post("/events")
async def ingest(request: Request):
payload = await request.json()
event_id = str(uuid.uuid4())
event = {"id": event_id, "type": payload["type"], "body": payload}
# Store the canonical event once; deliveries reference it by id.
r.set(f"event:{event_id}", json.dumps(event))
fan_out(event) # enqueue jobs; does not perform HTTP delivery
return JSONResponse({"event_id": event_id}, status_code=202)
Step 2: Fan out to per-subscriber queues
Look up every subscriber for the event type and enqueue one delivery job per subscriber onto that subscriber’s own queue. The queue name is keyed by subscriber id, which is what gives each one an isolated lane.
from rq import Queue
def subscribers_for(event_type: str) -> list[dict]:
# In production this reads your subscriptions store.
raw = r.smembers(f"subs:{event_type}")
return [json.loads(s) for s in raw]
def fan_out(event: dict) -> None:
for sub in subscribers_for(event["type"]):
# One queue per subscriber => no shared head-of-line.
q = Queue(f"deliver:{sub['id']}", connection=r)
q.enqueue(
"delivery.deliver", # worker function path
event_id=event["id"],
subscriber=sub,
job_timeout=30, # cap a single attempt
retry=None, # retries handled explicitly below
)
Step 3: Deliver from isolated workers
The delivery function loads the stored event, signs it, and POSTs to the subscriber. Each subscriber’s queue is drained by its own worker process, so a hung endpoint consumes only its lane’s worker, never another subscriber’s.
# delivery.py
import hashlib
import hmac
import json
import httpx
import redis
r = redis.Redis(decode_responses=True)
def sign(secret: str, body: bytes) -> str:
return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
def deliver(event_id: str, subscriber: dict) -> str:
event = json.loads(r.get(f"event:{event_id}"))
body = json.dumps(event["body"]).encode()
headers = {
"Content-Type": "application/json",
"X-Event-Id": event_id, # stable id => consumer can dedupe
"X-Signature": sign(subscriber["secret"], body),
}
state_key = f"delivery:{event_id}:{subscriber['id']}"
r.hincrby(state_key, "attempts", 1)
try:
resp = httpx.post(subscriber["url"], content=body, headers=headers, timeout=10)
resp.raise_for_status()
except httpx.HTTPError as exc:
r.hset(state_key, "status", "failed")
raise # re-raise so the queue's retry/DLQ policy can act
r.hset(state_key, "status", "delivered")
return "delivered"
Run one worker pool per subscriber lane. With RQ you point a worker at the specific queues it should drain:
# A worker dedicated to subscriber sub_42's lane.
rq worker deliver:sub_42 --url redis://localhost:6379
Step 4: Track per-subscriber delivery state
Because each subscriber is independent, delivery state is per (event_id, subscriber_id): attempts, last status, and whether it has been dead-lettered. This lets you retry or replay a single subscriber without touching the others, and lets you answer “who got event X?” precisely. Route a subscriber to a dead-letter queue only after its own attempts are exhausted; the other subscribers’ deliveries are unaffected.
Verification
A unit test should confirm that one ingested event produces exactly one job per matching subscriber, on distinct queues.
from rq import Queue
import fakeredis, json
def test_fan_out_enqueues_one_job_per_subscriber():
fake = fakeredis.FakeStrictRedis(decode_responses=True)
fake.sadd("subs:order.created",
json.dumps({"id": "a", "url": "http://a", "secret": "s"}),
json.dumps({"id": "b", "url": "http://b", "secret": "s"}))
# ... wire fan_out to `fake`, then:
event = {"id": "e1", "type": "order.created", "body": {}}
fan_out(event)
assert Queue("deliver:a", connection=fake).count == 1
assert Queue("deliver:b", connection=fake).count == 1
To prove isolation, point one subscriber at a sink that sleeps and confirm the other still receives promptly:
# Slow endpoint for sub A; healthy endpoint for sub B.
curl -fsS -X POST localhost:8000/events \
-H 'content-type: application/json' \
--data '{"type":"order.created","order_id":"ord_1"}'
# sub B's worker should mark "delivered" while sub A is still retrying.
Failure modes and gotchas
- Shared queue defeats the whole design. If you enqueue all deliveries onto one queue and merely tag them with a subscriber id, a single slow endpoint blocks the head of the line for everyone. The queue name must be per subscriber.
- Unbounded queue-per-subscriber sprawl. Tens of thousands of subscribers means tens of thousands of queues and workers. Above a few hundred, shard subscribers across a fixed pool of N queues by
hash(subscriber_id) % Nso isolation is approximate but worker count stays bounded. - Re-fetching a mutated event on retry. Workers load the event by id; if some other process mutates
event:<id>between attempts, retries deliver different bytes and break signatures and consumer dedupe. Treat the stored event as immutable once written. - Enqueue after a partial crash. If ingest stores the event but crashes before all jobs are enqueued, some subscribers never receive it. Make fan-out idempotent and re-runnable from the stored event so a sweeper can re-enqueue missing
(event_id, subscriber_id)pairs.