Designing webhook fan-out architectures

A single domain event — an order placed, a payment settled — often needs to reach many subscribers at once. This guide builds a fan-out architecture where one inbound event is delivered to every interested endpoint through per-subscriber queues and isolated workers, so one slow or failing subscriber cannot stall delivery to the others. It builds on sync vs async webhooks and complements when to use synchronous callbacks vs async webhooks, which establishes why fan-out delivery must be asynchronous in the first place.

The defining hazard of fan-out is head-of-line blocking: if all subscribers share one queue and one endpoint hangs for 30 seconds per request, every other subscriber waits behind it. Isolating each subscriber onto its own queue with its own workers removes that coupling — a dead endpoint backs up only its own lane.

One stored event fans out to a dedicated queue and worker per subscriber, so a stalled endpoint only backs up its own lane.
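To make the head-of-line effect concrete, here is a minimal sketch that compares completion times for one event under a shared queue and under per-subscriber lanes. The subscriber names and durations are invented for illustration, and the model assumes a single worker on the shared queue and one worker per lane.

```python
def shared_queue_completion(durations: dict[str, float]) -> dict[str, float]:
    """One queue, one worker: each job waits for every job ahead of it."""
    finished: dict[str, float] = {}
    clock = 0.0
    for sub_id, seconds in durations.items():  # dict order stands in for queue order
        clock += seconds
        finished[sub_id] = clock
    return finished


def per_lane_completion(durations: dict[str, float]) -> dict[str, float]:
    """One queue and one worker per subscriber: nobody waits on anyone else."""
    return dict(durations)


# Hypothetical attempt durations in seconds; "slow" hangs until a 30 s timeout.
durations = {"slow": 30.0, "a": 0.2, "b": 0.3}

shared = shared_queue_completion(durations)
lanes = per_lane_completion(durations)
print("shared queue:", shared)
print("per-lane:    ", lanes)
```

In the shared case every healthy subscriber finishes after the hung one; with lanes, only the hung subscriber pays for its own timeout.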

Prerequisites

Python 3.10+ with redis and rq (Redis Queue) installed, plus a reachable Redis instance. The examples also use fastapi and httpx, and the unit test uses fakeredis.
A subscriptions table or store mapping event types to subscriber endpoints and secrets.
An understanding of why this work is asynchronous, covered in when to use synchronous callbacks vs async webhooks.
A prior decision that push delivery is the right transport for these subscribers; webhooks vs polling vs WebSockets covers when a pull model costs less to run.
A retry policy in mind; this design defers the backoff details to resilient delivery & retry strategies.

Step 1: Persist the event once and return immediately

The ingest endpoint does the minimum: validate, store the event durably, and respond 202 Accepted. It must never block on delivery, because delivery to N subscribers can take arbitrarily long.

import json
import uuid
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import JSONResponse
import redis

app = FastAPI()
r = redis.Redis(decode_responses=True)

@app.post("/events")
async def ingest(request: Request):
    payload = await request.json()
    # Validate before storing: fan-out routes on a string "type" field.
    if not isinstance(payload, dict) or not isinstance(payload.get("type"), str):
        raise HTTPException(status_code=422, detail="event 'type' is required")
    event_id = str(uuid.uuid4())
    event = {"id": event_id, "type": payload["type"], "body": payload}
    # Store the canonical event once; deliveries reference it by id.
    r.set(f"event:{event_id}", json.dumps(event))
    fan_out(event)  # enqueue jobs; does not perform HTTP delivery
    return JSONResponse({"event_id": event_id}, status_code=202)

Step 2: Fan out to per-subscriber queues

Look up every subscriber for the event type and enqueue one delivery job per subscriber onto that subscriber’s own queue. The queue name is keyed by subscriber id, which is what gives each one an isolated lane.

import redis
from rq import Queue

# RQ stores pickled, compressed job data, so give it a connection that returns
# raw bytes; keep `r` (decode_responses=True) for the application's own keys.
rq_conn = redis.Redis()

def subscribers_for(event_type: str) -> list[dict]:
    # In production this reads your subscriptions store.
    raw = r.smembers(f"subs:{event_type}")
    return [json.loads(s) for s in raw]

def fan_out(event: dict) -> None:
    for sub in subscribers_for(event["type"]):
        # One queue per subscriber => no shared head-of-line.
        q = Queue(f"deliver:{sub['id']}", connection=rq_conn)
        q.enqueue(
            "delivery.deliver",          # worker function path
            event_id=event["id"],
            subscriber=sub,
            job_timeout=30,              # cap a single attempt
            retry=None,                  # no RQ-level retry; attempts are tracked on the delivery record
        )

The queue name is the whole isolation mechanism, so it is worth being explicit about what each part of it does and what the job actually carries. The job holds an event id rather than the event body: the payload stays in one place, and a queued job that is retried tomorrow still signs exactly the bytes that were stored today.

The subscriber id in the queue name is what buys isolation; everything else in the job is deliberately small so a backlog costs queue memory rather than payload copies.
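As a rough, back-of-the-envelope illustration of why the job carries only an id, the sketch below compares the argument bytes held by a backlog of jobs in both designs. The 4 KiB payload size and the 500,000-job backlog are assumptions chosen for the arithmetic, and the figures ignore the per-job metadata the queue itself adds.

```python
import uuid

JOBS = 500_000                     # assumed backlog size
PAYLOAD_BYTES = 4 * 1024           # assumed average JSON payload, 4 KiB
ID_BYTES = len(str(uuid.uuid4()))  # a UUID string is 36 characters

by_reference = JOBS * ID_BYTES     # each job carries only the event id
by_value = JOBS * PAYLOAD_BYTES    # each job carries a full payload copy

print(f"ids only:      {by_reference / 1e6:.0f} MB")
print(f"full payloads: {by_value / 1e9:.2f} GB")
```

Under these assumptions the id-only backlog stays in the tens of megabytes while payload copies reach gigabytes.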

Step 3: Deliver from isolated workers

The delivery function loads the stored event, signs it, and POSTs to the subscriber. Each subscriber’s queue is drained by its own worker process, so a hung endpoint consumes only its lane’s worker, never another subscriber’s.

# delivery.py
import hashlib
import hmac
import json
import httpx
import redis

r = redis.Redis(decode_responses=True)

def sign(secret: str, body: bytes) -> str:
    # Hex-encoded HMAC-SHA256 over the exact bytes that are sent.
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

def deliver(event_id: str, subscriber: dict) -> str:
    raw = r.get(f"event:{event_id}")
    if raw is None:
        # The stored event is missing, so no retry can succeed.
        raise LookupError(f"event {event_id} not found")
    event = json.loads(raw)
    # Serialise once; the signature covers exactly these bytes.
    body = json.dumps(event["body"]).encode()
    headers = {
        "Content-Type": "application/json",
        "X-Event-Id": event_id,                 # stable id => consumer can dedupe
        "X-Signature": sign(subscriber["secret"], body),
    }
    state_key = f"delivery:{event_id}:{subscriber['id']}"
    r.hincrby(state_key, "attempts", 1)
    # Mark the attempt in flight; a record left in "sending" means the job died mid-attempt.
    r.hset(state_key, "status", "sending")
    try:
        resp = httpx.post(subscriber["url"], content=body, headers=headers, timeout=10)
        resp.raise_for_status()
    except httpx.HTTPError:
        r.hset(state_key, "status", "failed")
        raise  # re-raise so RQ marks the job failed and a retry or dead-letter decision can follow
    r.hset(state_key, "status", "delivered")
    return "delivered"

Run one worker pool per subscriber lane. With RQ you point a worker at the specific queues it should drain:

# A worker dedicated to subscriber sub_42's lane.
rq worker deliver:sub_42 --url redis://localhost:6379

Two settings decide whether that worker behaves under stress. The first is the relationship between job_timeout and the HTTP timeout: the job deadline must be comfortably larger, because it has to cover DNS resolution, the TLS handshake, the request itself, and the bookkeeping around it. A sensible default is roughly three times the HTTP timeout — timeout=10 with job_timeout=30. Invert them and the worker is killed mid-request, which leaves the delivery record stuck in sending forever with no exception ever reaching the except branch. The observable symptom is a lane whose attempt counter climbs while its status field never changes, and it is easy to misread as a consumer problem when it is entirely self-inflicted.

The second is connection reuse. Calling httpx.post(...) for each delivery opens a fresh TCP connection and repeats the TLS handshake every time, which adds 80–150 ms to every attempt and accumulates sockets in TIME_WAIT. At a few hundred deliveries per second per host you will exhaust the ephemeral port range and start seeing connection failures that look like the subscriber refusing traffic. Build one httpx.Client per worker process at startup, give it a keep-alive pool sized to the lane’s concurrency, and reuse it for every job. On a warm connection the same delivery often completes in a third of the time, which directly reduces how long a slow subscriber occupies its lane.
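The timeout relationship described above is easy to invert by accident, so one option is to check it when a worker starts. The sketch below is a hypothetical guard, not part of RQ or httpx: the function name and the factor of three are assumptions taken from the rule of thumb in the text.

```python
def check_lane_timeouts(http_timeout: float, job_timeout: float, factor: float = 3.0) -> None:
    """Raise if the job deadline leaves too little room around the HTTP call."""
    required = http_timeout * factor
    if job_timeout < required:
        raise ValueError(
            f"job_timeout={job_timeout}s is below {required}s "
            f"({factor}x the HTTP timeout of {http_timeout}s)"
        )


# The values used in this guide pass the check.
check_lane_timeouts(http_timeout=10, job_timeout=30)
```

Failing fast at startup is cheaper than diagnosing records stuck in sending after the fact.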

The payoff shows up on a time axis. With job_timeout=30 on a subscriber whose endpoint hangs, that lane burns its full timeout and then retries, while the other lanes are already finished — the same event, three completely different completion times.

The stalled subscriber consumes only its own worker for the full timeout, so the healthy subscribers see delivery latency measured in milliseconds rather than in timeouts.

Step 4: Track per-subscriber delivery state

Because each subscriber is independent, delivery state is per (event_id, subscriber_id): attempts, last status, and whether it has been dead-lettered. This lets you retry or replay a single subscriber without touching the others, and lets you answer “who got event X?” precisely. Route a subscriber to a dead-letter queue only after its own attempts are exhausted; the other subscribers’ deliveries are unaffected.

Because that record is per subscriber rather than per event, each subscriber runs its own small state machine over the same event. Subscriber A can be sitting in failed awaiting a retry while subscriber B is already delivered, and dead-lettering A leaves B’s record untouched.

Each subscriber advances this machine independently, so dead-lettering one endpoint never changes the recorded outcome for any other subscriber of the same event.
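One way to picture the per-subscriber state machine is as a small transition table keyed by (event_id, subscriber_id). This is a sketch under assumptions: the pending state, the exact transition set, and the in-memory dict standing in for the delivery store are all illustrative, and operator replay of a dead-lettered record is not modelled.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"              # assumed initial state, not named in the text
    SENDING = "sending"
    FAILED = "failed"
    DELIVERED = "delivered"
    DEAD_LETTERED = "dead_lettered"


# Which states each state may move to; the terminal states allow nothing.
ALLOWED = {
    Status.PENDING: {Status.SENDING},
    Status.SENDING: {Status.DELIVERED, Status.FAILED},
    Status.FAILED: {Status.SENDING, Status.DEAD_LETTERED},
    Status.DELIVERED: set(),
    Status.DEAD_LETTERED: set(),
}


def advance(records: dict, event_id: str, subscriber_id: str, new_status: Status) -> None:
    """Move one (event, subscriber) record forward, rejecting illegal jumps."""
    key = (event_id, subscriber_id)
    current = records.get(key, Status.PENDING)
    if new_status not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {new_status.value} is not allowed")
    records[key] = new_status


records: dict = {}
# Subscriber "a" fails twice and is dead-lettered; "b" succeeds first time.
for step in (Status.SENDING, Status.FAILED, Status.SENDING, Status.FAILED, Status.DEAD_LETTERED):
    advance(records, "e1", "a", step)
for step in (Status.SENDING, Status.DELIVERED):
    advance(records, "e1", "b", step)
print(records)
```

Because the key includes the subscriber id, nothing that happens to subscriber a can touch the record for subscriber b.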

Sizing lanes: dedicated queues versus hashed shards

A queue per subscriber is the cleanest form of isolation and the least scalable. Redis itself is untroubled by tens of thousands of list keys, but each lane needs a worker to drain it, and an idle RQ worker still costs 30–80 MB of resident memory plus a Redis connection. At 5,000 subscribers that is 5,000 processes and somewhere between 150 and 400 GB of memory (about 250 GB at a 50 MB midpoint) doing almost nothing, so the design collapses long before Redis does. The break-even point in practice sits in the low hundreds of subscribers: below it, dedicated lanes are simple and worth it; above it, hash subscribers into a fixed number of shard lanes and accept isolation that is approximate rather than absolute.

Sharding trades exact isolation for a bounded worker count: a hang now delays the handful of subscribers sharing that bucket instead of the entire fleet.
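The memory argument can be checked with a few lines of arithmetic. The sketch below uses the 30 to 80 MB per idle worker range quoted above and assumes exactly one worker per lane; the function name is illustrative.

```python
def idle_worker_memory_gb(workers: int, mb_per_worker: float) -> float:
    """Resident memory for a fleet of idle workers, in decimal gigabytes."""
    return workers * mb_per_worker / 1000


for label, workers in (("dedicated lanes", 5_000), ("64 shard lanes", 64)):
    low = idle_worker_memory_gb(workers, 30)
    high = idle_worker_memory_gb(workers, 80)
    print(f"{label}: {workers} workers, {low:.1f} to {high:.1f} GB idle")
```

With one worker per lane, moving from 5,000 dedicated lanes to 64 shards takes the idle footprint from hundreds of gigabytes to a few.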

Sharding is a one-line change to the lane function, but the hash choice matters more than it looks. Python’s built-in hash() is salted per process unless PYTHONHASHSEED is pinned, so the same subscriber lands in different buckets on different workers and after every restart — which silently destroys both per-lane ordering and any monitoring keyed on the lane. Use an explicit, stable digest:

import hashlib

SHARD_COUNT = 64

def lane_for(subscriber_id: str) -> str:
    """Stable across processes, restarts and interpreter versions."""
    digest = hashlib.blake2b(subscriber_id.encode(), digest_size=8).digest()
    bucket = int.from_bytes(digest, "big") % SHARD_COUNT
    return f"deliver:shard{bucket:02d}"

Pick SHARD_COUNT from your worker budget rather than your subscriber count, and set it once: a change re-maps every subscriber, so jobs already queued under the old scheme keep their old lane while new jobs go elsewhere, and any ordering guarantee is void for the duration. If you expect to grow, start with a count comfortably above what you need (64 or 128 lanes is cheap) rather than resharding later. The remaining exposure is a noisy bucket: if a large subscriber and a chronically slow one hash together, the slow one delays the large one. Give known-slow endpoints an explicit override to their own dedicated lane and let the hash handle everyone else — the override list stays short precisely because slow endpoints are rare.
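The override described above can be layered on top of the hashed lane function. This is a minimal sketch: the DEDICATED_LANES set, the subscriber id in it, and the lane_with_override name are hypothetical, and in practice the override list would probably live in the subscriptions store rather than in code.

```python
import hashlib

SHARD_COUNT = 64
# Hypothetical override list: subscribers known to be slow get their own lane.
DEDICATED_LANES = {"sub_slow_1"}


def lane_for(subscriber_id: str) -> str:
    """Hashed shard lane, stable across processes and restarts."""
    digest = hashlib.blake2b(subscriber_id.encode(), digest_size=8).digest()
    bucket = int.from_bytes(digest, "big") % SHARD_COUNT
    return f"deliver:shard{bucket:02d}"


def lane_with_override(subscriber_id: str) -> str:
    """Dedicated lane for listed subscribers, hashed shard for everyone else."""
    if subscriber_id in DEDICATED_LANES:
        return f"deliver:{subscriber_id}"
    return lane_for(subscriber_id)


print(lane_with_override("sub_slow_1"))
print(lane_with_override("sub_42"))
```

Each dedicated lane still needs its own worker, which is one more reason to keep the override list short.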

What to watch per lane

Fan-out monitoring fails in a specific way: fleet-wide aggregates look perfect while an individual lane is entirely stuck, because a stalled lane produces no failures — it produces nothing at all. Every signal below is therefore computed per lane and alerted per lane.

Signal	Computed as	Investigate when	What it usually means
Lane backlog age	Now minus the enqueue time of the oldest job in the lane	Above 5 minutes	The endpoint is hanging rather than failing; nothing has errored yet
Lane depth trend	Jobs added minus jobs completed over 5 minutes	Positive for 15 minutes	Delivery is slower than production for that subscriber; the lane will never catch up on its own
Attempts per delivery	Total attempts ÷ delivered records, per subscriber	Above 1.5	The endpoint is flapping, usually rate limiting or an overloaded database behind it
Records stuck in `sending`	Count of delivery records in `sending` older than the job timeout	Any non-zero count	Workers are being killed mid-attempt, or `job_timeout` is shorter than the HTTP timeout
Fan-out ratio	Jobs enqueued ÷ events ingested, per event type	Drops below the number of subscribers matching that event type	Fan-out crashed partway through and some subscribers were skipped

The last row is the one worth building first. It is the only signal that catches a partially completed fan-out, and a partial fan-out is invisible from every other angle: the event is stored, the deliveries that were enqueued all succeed, and the dashboard is green while a subscriber never learns the event happened.
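A sweeper for partial fan-outs can be sketched as a set difference between the pairs that should exist and the pairs that were actually enqueued. The sketch assumes fan-out records each (event_id, subscriber_id) pair as it enqueues, which the code earlier in this guide does not yet do; the function and variable names are illustrative.

```python
def find_skipped(
    expected: dict[str, set[str]],
    enqueued: set[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return (event_id, subscriber_id) pairs that fan-out never enqueued."""
    skipped = []
    for event_id, subscriber_ids in expected.items():
        for sub_id in sorted(subscriber_ids):
            if (event_id, sub_id) not in enqueued:
                skipped.append((event_id, sub_id))
    return skipped


# Hypothetical data: event e1 matched three subscribers but only two jobs exist.
expected = {"e1": {"a", "b", "c"}}
enqueued = {("e1", "a"), ("e1", "b")}

ratio = len(enqueued) / len(expected)  # jobs enqueued per event ingested
skipped = find_skipped(expected, enqueued)
print(f"fan-out ratio {ratio:.1f}, skipped: {skipped}")
```

The ratio raises the alarm and the pair list tells the sweeper exactly what to re-enqueue.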

Verification

A unit test should confirm that one ingested event produces exactly one job per matching subscriber, on distinct queues.

from rq import Queue
import fakeredis, json

def test_fan_out_enqueues_one_job_per_subscriber():
    fake = fakeredis.FakeStrictRedis(decode_responses=True)
    fake.sadd("subs:order.created",
              json.dumps({"id": "a", "url": "http://a", "secret": "s"}),
              json.dumps({"id": "b", "url": "http://b", "secret": "s"}))
    # ... wire fan_out to `fake`, then:
    event = {"id": "e1", "type": "order.created", "body": {}}
    fan_out(event)
    assert Queue("deliver:a", connection=fake).count == 1
    assert Queue("deliver:b", connection=fake).count == 1

To prove isolation, point one subscriber at a sink that sleeps and confirm the other still receives promptly:

# Slow endpoint for sub A; healthy endpoint for sub B.
curl -fsS -X POST localhost:8000/events \
  -H 'content-type: application/json' \
  --data '{"type":"order.created","order_id":"ord_1"}'
# sub B's worker should mark "delivered" while sub A is still retrying.

Failure modes and gotchas

Shared queue defeats the whole design. If you enqueue all deliveries onto one queue and merely tag them with a subscriber id, a single slow endpoint blocks the head of the line for everyone. The queue name must be per subscriber.
Unbounded queue-per-subscriber sprawl. Tens of thousands of subscribers means tens of thousands of queues and workers. Above a few hundred, shard subscribers across a fixed pool of N queues using a stable digest of the subscriber id modulo N (not Python’s built-in hash(), which is salted per process), so isolation is approximate but worker count stays bounded.
Re-fetching a mutated event on retry. Workers load the event by id; if some other process mutates event:<id> between attempts, retries deliver different bytes and break signatures and consumer dedupe. Treat the stored event as immutable once written.
Enqueue after a partial crash. If ingest stores the event but crashes before all jobs are enqueued, some subscribers never receive it. Make fan-out idempotent and re-runnable from the stored event so a sweeper can re-enqueue missing (event_id, subscriber_id) pairs.
Fanning out inline for a broadcast event. Enqueueing is fast per job but not free: at roughly 0.4 ms of round trip per enqueue, a broadcast to 20,000 subscribers spends eight seconds inside the request handler, long past the client’s timeout. The client then retries an event that was already stored, and you fan the same event out twice. Enqueue a bounded first batch inline — a few hundred jobs — and hand the remainder to a single background fan-out job that resumes from the last subscriber id it processed.
Delivering to a subscriber that no longer exists. Jobs queued before a subscriber was deleted or disabled will still fire, because the subscriber record was copied into the job at enqueue time. An endpoint that was revoked for a security reason keeps receiving events until its lane drains, which is the worst possible time for that to be true. Re-read the subscription by id at delivery time and treat a missing or disabled record as a terminal, non-retryable outcome rather than a failure to retry.
Re-serialising the payload before signing. If the delivery worker rebuilds the JSON with json.dumps while the consumer verifies against the bytes it received, any difference in key order, whitespace, or unicode escaping produces a signature mismatch that looks exactly like a wrong secret. Store the canonical body as bytes at ingest, sign those bytes, and never let an intermediate layer pretty-print or re-encode them.
Synchronised retries when an endpoint recovers. A subscriber that was down for ten minutes has a lane full of jobs whose backoff timers all expire within the same second, so recovery is greeted with a burst that knocks it over again — and the second failure has a longer backoff, so the burst returns bigger. Add proportional jitter to every retry delay and cap the lane’s concurrency during recovery so the first minute after a breaker closes is a trickle rather than a flood.
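For the last item, proportional jitter can be sketched as a random spread around an exponential backoff. The base delay, cap, and jitter fraction below are illustrative assumptions rather than recommended values, and retry_delay is a hypothetical helper, not an RQ API.

```python
import random


def retry_delay(
    attempt: int,
    base: float = 5.0,
    cap: float = 3600.0,
    jitter: float = 0.5,
    rng=random,
) -> float:
    """Exponential backoff with proportional jitter; attempt is 1 for the first retry."""
    backoff = min(cap, base * 2 ** (attempt - 1))
    # Spread the delay over [backoff * (1 - jitter), backoff * (1 + jitter)].
    return backoff * (1 + rng.uniform(-jitter, jitter))


rng = random.Random(0)  # seeded only so the demonstration is repeatable
delays = [retry_delay(attempt, rng=rng) for attempt in range(1, 7)]
print([round(d, 1) for d in delays])
```

Because the spread scales with the backoff, jobs that failed in the same second come due at increasingly different times on later attempts. Note that the jitter is applied after the cap, so a delay can exceed the cap by up to the jitter fraction.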

Frequently Asked Questions

How many subscriber queues is too many?

The limit is worker processes rather than queues. Redis holds tens of thousands of list keys without noticing, but a dedicated worker per lane costs 30 to 80 MB of resident memory and a connection.

A few hundred lanes is where a machine starts to feel it; past that, map subscribers onto a bounded set of shard lanes and settle for isolation that is approximate rather than absolute.

Should the job carry the payload or just the event id?

Carry the id. A short reference keeps a 500,000-job backlog in the tens of megabytes instead of gigabytes, and it guarantees that a retry three hours later signs exactly the bytes that were stored at ingest.

The one cost is a read per attempt, which is cheap next to an HTTP round trip.

Does one worker per subscriber guarantee ordered delivery to that subscriber?

Only if the lane runs a single worker with a concurrency of one and a failed attempt blocks the lane. Two workers on the same queue, or one worker with more than one job in flight, will reorder events whenever the first attempt is retried.

Ordering therefore costs throughput, so enable it per subscriber rather than globally.

How do we stop a huge fan-out from starving small subscribers?

Bound the enqueue burst and reserve capacity. Chunking the enqueue keeps ingest responsive during a broadcast, and reserving a minimum worker slice per shard keeps a low-volume subscriber from waiting behind a mass broadcast that happens to hash into the same lane.
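Chunking the enqueue might look like the sketch below: a bounded first batch is enqueued inline and a single background job is handed a cursor to resume from. The two callables stand in for real queue calls, and the names and the limit of 500 are assumptions. At roughly 0.4 ms per enqueue, 500 inline jobs cost about 0.2 seconds.

```python
from typing import Callable, Iterable


def fan_out_bounded(
    subscriber_ids: Iterable[str],
    enqueue_delivery: Callable[[str], None],
    enqueue_remainder: Callable[[str], None],
    inline_limit: int = 500,
) -> int:
    """Enqueue the first batch inline; hand the rest to one background job."""
    if inline_limit < 1:
        raise ValueError("inline_limit must be at least 1")
    ordered = sorted(subscriber_ids)  # stable order so the cursor is meaningful
    head, tail = ordered[:inline_limit], ordered[inline_limit:]
    for sub_id in head:
        enqueue_delivery(sub_id)
    if tail:
        # The background job resumes after the last subscriber id handled inline.
        enqueue_remainder(head[-1])
    return len(head)


# Stand-ins for real queue calls: record what would have been enqueued.
delivered: list[str] = []
cursors: list[str] = []
subs = [f"sub_{i:05d}" for i in range(20_000)]
inline_count = fan_out_bounded(subs, delivered.append, cursors.append)
print(inline_count, cursors)
```

Sorting by subscriber id gives the background job a cursor it can resume from after a crash without enqueueing the same subscriber twice.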

Where should the retry policy live: the queue or the delivery code?

On the delivery record, as data. Queue-level retry defaults are invisible during an incident and differ between brokers, whereas an explicit attempt count and next-attempt timestamp can be inspected, paused for one subscriber, and replayed by an operator without redeploying anything.
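Keeping the policy as data means the scheduler only has to ask which records are due. The following is a minimal sketch with a hypothetical DeliveryRecord shape and an in-memory list in place of a real store; the field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DeliveryRecord:
    event_id: str
    subscriber_id: str
    attempts: int
    next_attempt_at: float  # Unix timestamp of the next allowed attempt


def due_for_retry(
    records: list[DeliveryRecord],
    now: float,
    paused: set[str],
) -> list[DeliveryRecord]:
    """Records whose timer has expired and whose subscriber is not paused."""
    return [
        rec for rec in records
        if rec.next_attempt_at <= now and rec.subscriber_id not in paused
    ]


records = [
    DeliveryRecord("e1", "a", attempts=2, next_attempt_at=100.0),
    DeliveryRecord("e1", "b", attempts=1, next_attempt_at=100.0),
    DeliveryRecord("e2", "a", attempts=1, next_attempt_at=500.0),
]
# An operator has paused subscriber "b"; no redeploy is involved.
due = due_for_retry(records, now=200.0, paused={"b"})
print([(rec.event_id, rec.subscriber_id) for rec in due])
```

Pausing a subscriber is then a change to a set, and replaying one is a matter of resetting next_attempt_at on its records.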

Can two events for the same subscriber be delivered concurrently?

Yes, whenever a lane runs more than one worker. A single RQ worker processes one job at a time, and running two workers on one lane roughly doubles throughput for a busy subscriber. Do it deliberately: concurrency within a lane is what makes a large subscriber keep up, and it is also what removes any ordering guarantee for that subscriber.

Record the choice on the subscription so the delivery worker, not the operator, enforces it.