Choosing a Webhook Delivery Guarantee Level

Every webhook system makes a delivery guarantee, whether it is chosen deliberately or not. The guarantee is the contract you offer consumers about how often each event arrives: never more than once, at least once, or exactly once in effect. Picking the tier is the most consequential decision you make about delivery guarantee levels, because it dictates your retry policy, your storage footprint, and how much work consumers must do. This guide gives you a concrete procedure and a comparison table for choosing, then shows how to encode the choice in code. For the deeper theory behind why exactly-once is effectively a deduplication problem, read at-least-once vs exactly-once delivery trade-offs.

The short version: at-least-once is the right default for almost all webhooks, paired with consumer-side idempotency. At-most-once and effectively-exactly-once are specializations you reach for only when a specific requirement forces them.

[Figure: Delivery guarantee decision tree. Is event loss acceptable? Yes: at-most-once (no retries). No: can the consumer dedupe? Yes: at-least-once (retry + idempotency). No: effectively exactly-once (dedup store).]
The decision tree: loss tolerance picks at-most-once; otherwise dedup capability decides between at-least-once and an effectively-exactly-once dedup layer.

The three guarantee levels compared

There is no true exactly-once over an unreliable network — “exactly-once” in practice means at-least-once delivery plus deduplication that makes reprocessing a no-op. The three operational tiers are:

| Dimension | At-most-once | At-least-once | Effectively-exactly-once |
| --- | --- | --- | --- |
| Duplicates | Never | Possible (must be tolerated) | Suppressed by a dedup store |
| Lost events | Possible on any failure | Not silently (retried until acked, then dead-lettered once retries are exhausted) | Not silently (same retry and dead-letter path) |
| Retries | None | Bounded retries + backoff | Bounded retries + backoff |
| Where dedup lives | N/A | Consumer (its responsibility) | Producer dedup store + consumer idempotency |
| Storage cost | Lowest | Low (delivery log only) | Highest (durable dedup keys with long TTL) |
| Latency overhead | Lowest | Low | Added lookup/write per event |
| Typical use | Metrics, presence pings, ephemeral signals | Most business events | Payments, billing, inventory |

The decisive trade-off is duplicates versus loss. You cannot eliminate both without unbounded cost, so you choose which one your consumers can tolerate cheaply. Most can absorb duplicates with a small idempotency check far more easily than they can recover from a silently dropped event.

Step 1: Classify each event’s loss tolerance

Run the decision per event type, not per system. A single dispatcher often carries metric.sampled (loss fine) alongside invoice.paid (loss catastrophic). For each type, ask: if this event vanishes and no one notices for an hour, what breaks?

from enum import Enum

class Guarantee(Enum):
    AT_MOST_ONCE = "at_most_once"
    AT_LEAST_ONCE = "at_least_once"
    EFFECTIVELY_EXACTLY_ONCE = "eeo"

EVENT_POLICY = {
    "metric.sampled":   Guarantee.AT_MOST_ONCE,      # cheap, replaceable
    "user.updated":     Guarantee.AT_LEAST_ONCE,     # dedupe on consumer
    "invoice.paid":     Guarantee.EFFECTIVELY_EXACTLY_ONCE,  # money
}

If loss is acceptable, choose at-most-once: fire once, no retries, no delivery log. Everything else continues to Step 2.

Step 2: Decide who owns deduplication

For events that must not be lost, the next question is whether the consumer can deduplicate. A consumer that writes through a unique idempotency key (e.g. an INSERT ... ON CONFLICT DO NOTHING keyed on the event ID) makes at-least-once safe with almost no extra machinery on the producer side. This is the sweet spot and your default:

def deliver_at_least_once(dispatcher, event, max_attempts=6):
    # Retry until acknowledged; the consumer dedupes on event["id"].
    for attempt in range(max_attempts):
        ok = dispatcher.post(event, headers={"X-Idempotency-Key": event["id"]})
        if ok:
            return "delivered"
        if attempt < max_attempts - 1:    # no point waiting after the final attempt
            dispatcher.backoff(attempt)   # exponential backoff + jitter
    dispatcher.dead_letter(event)
    return "exhausted"
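
The consumer half of this contract is the idempotent write described above. The following is a minimal sketch of it, assuming a SQLite store (3.24 or newer, for the ON CONFLICT clause); the table names, the apply_once helper, and the 'demo' account are hypothetical and only illustrate the pattern of committing the dedup marker and the side effect in one transaction.

```python
import sqlite3

def apply_once(conn: sqlite3.Connection, event_id: str, amount_cents: int) -> bool:
    """Apply the event's side effect only if event_id has not been seen before.

    Returns True when the effect was applied, False for a duplicate delivery.
    """
    with conn:  # one transaction: dedup marker and side effect commit together
        cur = conn.execute(
            "INSERT INTO processed_events (event_id) VALUES (?) "
            "ON CONFLICT(event_id) DO NOTHING",
            (event_id,),
        )
        if cur.rowcount == 0:
            return False  # the event ID already existed: duplicate, do nothing
        conn.execute(
            "UPDATE balances SET cents = cents + ? WHERE account = 'demo'",
            (amount_cents,),
        )
        return True

# Hypothetical schema for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('demo', 0)")
conn.commit()

first = apply_once(conn, "evt-1", 500)    # first delivery
second = apply_once(conn, "evt-1", 500)   # retried delivery of the same event
balance = conn.execute(
    "SELECT cents FROM balances WHERE account = 'demo'"
).fetchone()[0]
```

Keeping the marker and the effect in the same transaction is the design choice that matters here: if the two were committed separately, a crash between them could leave an event marked as processed without its effect, or applied without its marker.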

If you cannot rely on the consumer to dedupe — for example a third-party endpoint with side effects you don’t control — escalate to Step 3 and provide deduplication on the producer side.

Step 3: Cost the strongest tier before committing

Effectively-exactly-once is not free. It requires a durable dedup store whose key TTL must outlive your maximum retry window and any DLQ retention, so a replayed event from a dead-letter queue is still recognized as a duplicate weeks later. Budget for the storage and the per-event lookup latency:

import redis

class ExactlyOnceGate:
    def __init__(self, client: redis.Redis, ttl_days: int = 30):
        self.r = client
        self.ttl = ttl_days * 24 * 3600

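    def first_delivery(self, event_id: str) -> bool:
        # SET NX is the dedup primitive; returns True only the first time.
        return bool(self.r.set(f"eeo:{event_id}", "1", nx=True, ex=self.ttl))

    def release(self, event_id: str) -> None:
        # Drop the claim so a later DLQ replay is not suppressed as a duplicate.
        self.r.delete(f"eeo:{event_id}")

def deliver_eeo(dispatcher, gate: ExactlyOnceGate, event):
    if not gate.first_delivery(event["id"]):
        return "suppressed_duplicate"
    result = deliver_at_least_once(dispatcher, event)
    if result != "delivered":
        # The claim is taken before delivery; release it if delivery never succeeded.
        gate.release(event["id"])
    return result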

Only adopt this tier where a duplicate genuinely causes harm that the consumer cannot undo cheaply — double charges, double shipments, double-counted balances.
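
The TTL sizing rule from this step can be checked with simple arithmetic. The sketch below assumes a capped exponential backoff schedule, a 14-day DLQ retention, and a 1.5x safety margin; all three are illustrative values, and the function names are hypothetical rather than part of any library.

```python
def retry_window_seconds(max_attempts: int, base_delay: float, max_delay: float) -> float:
    """Worst-case total wait between the first and the last attempt.

    Assumes the wait before retry k (k = 0, 1, ...) is
    min(base_delay * 2**k, max_delay), with jitter taken at its upper bound.
    """
    # max_attempts attempts are separated by max_attempts - 1 waits.
    return sum(min(base_delay * 2 ** k, max_delay) for k in range(max_attempts - 1))

def min_dedup_ttl_seconds(retry_window: float, dlq_retention: float,
                          safety_factor: float = 1.5) -> float:
    """Smallest dedup-key TTL that outlives retries plus DLQ retention, with margin."""
    return (retry_window + dlq_retention) * safety_factor

# Assumed schedule: 6 attempts, 30 s base delay, 1 h cap.
window = retry_window_seconds(max_attempts=6, base_delay=30, max_delay=3600)
# Waits are 30 + 60 + 120 + 240 + 480 seconds.

# Assumed DLQ retention: 14 days.
ttl = min_dedup_ttl_seconds(window, dlq_retention=14 * 24 * 3600)
ttl_days = ttl / (24 * 3600)
```

Under these assumptions the required TTL comes to roughly 21 days, so a 30-day key TTL leaves headroom. If DLQ retention grows, the TTL has to grow with it.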

Step 4: Encode the choice in the dispatcher

Wire the per-event policy into one place so the guarantee is explicit and testable rather than emergent from scattered retry settings:

def dispatch(event, dispatcher, gate):
    level = EVENT_POLICY.get(event["type"], Guarantee.AT_LEAST_ONCE)
    if level is Guarantee.AT_MOST_ONCE:
        dispatcher.post(event)              # fire and forget, no retry
        return "sent_best_effort"
    if level is Guarantee.AT_LEAST_ONCE:
        return deliver_at_least_once(dispatcher, event)
    return deliver_eeo(dispatcher, gate, event)
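
Because dispatch falls back to at-least-once for any type missing from the policy table, a newly added money-moving event could quietly receive the default tier. One way to keep the choice explicit is a check that every known event type has been classified. This is a sketch only: KNOWN_EVENT_TYPES, the refund.issued type, and the unclassified helper are hypothetical, and the policy table is restated from Step 1 so the snippet runs on its own.

```python
from enum import Enum

class Guarantee(Enum):
    AT_MOST_ONCE = "at_most_once"
    AT_LEAST_ONCE = "at_least_once"
    EFFECTIVELY_EXACTLY_ONCE = "eeo"

# Restated from Step 1.
EVENT_POLICY = {
    "metric.sampled": Guarantee.AT_MOST_ONCE,
    "user.updated": Guarantee.AT_LEAST_ONCE,
    "invoice.paid": Guarantee.EFFECTIVELY_EXACTLY_ONCE,
}

def unclassified(event_types, policy):
    """Return, sorted, the event types that have no explicit guarantee."""
    return sorted(t for t in event_types if t not in policy)

# Hypothetical registry of every type the dispatcher can emit.
KNOWN_EVENT_TYPES = {"metric.sampled", "user.updated", "invoice.paid", "refund.issued"}

missing = unclassified(KNOWN_EVENT_TYPES, EVENT_POLICY)
# A CI check could fail the build whenever `missing` is non-empty.
```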

Verification

Assert that each tier behaves as contracted under a forced failure.

import fakeredis  # third-party in-memory stand-in for redis-py (pip install fakeredis)

def test_at_least_once_retries_then_delivers():
    calls = {"n": 0}
    class D:
        def post(self, e, headers=None):
            calls["n"] += 1
            return calls["n"] >= 3        # fail twice, then succeed
        def backoff(self, a): pass
        def dead_letter(self, e): pass
    assert deliver_at_least_once(D(), {"id": "e1"}) == "delivered"
    assert calls["n"] == 3

def test_eeo_suppresses_duplicate():
    gate = ExactlyOnceGate(fakeredis.FakeStrictRedis())
    assert gate.first_delivery("evt-9") is True
    assert gate.first_delivery("evt-9") is False   # second time: duplicate
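
The tests above cover the success paths. A further case worth asserting is exhaustion: when every attempt fails, the event should land in the dead-letter queue rather than disappear. The sketch below uses a hypothetical AlwaysFailing test double and a condensed restatement of the Step 2 function so that it runs on its own.

```python
def deliver_at_least_once(dispatcher, event, max_attempts=6):
    # Condensed restatement of the Step 2 function.
    for attempt in range(max_attempts):
        if dispatcher.post(event, headers={"X-Idempotency-Key": event["id"]}):
            return "delivered"
        dispatcher.backoff(attempt)
    dispatcher.dead_letter(event)
    return "exhausted"

class AlwaysFailing:
    """Test double: every POST fails; records attempts and dead-lettered IDs."""
    def __init__(self):
        self.posts = 0
        self.dead = []

    def post(self, event, headers=None):
        self.posts += 1
        return False

    def backoff(self, attempt):
        pass  # no sleeping in tests

    def dead_letter(self, event):
        self.dead.append(event["id"])

def test_exhaustion_dead_letters_instead_of_dropping():
    d = AlwaysFailing()
    assert deliver_at_least_once(d, {"id": "e2"}, max_attempts=4) == "exhausted"
    assert d.posts == 4          # bounded: exactly max_attempts tries
    assert d.dead == ["e2"]      # the event is parked, not lost

test_exhaustion_dead_letters_instead_of_dropping()
```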

Operationally, confirm the contract with a black-box probe: send the same event twice and inspect the consumer.

# Send a duplicate; an EEO consumer must show exactly one applied side effect.
curl -s -X POST "$ENDPOINT" -H 'Content-Type: application/json' -H 'X-Idempotency-Key: evt-9' -d @event.json
curl -s -X POST "$ENDPOINT" -H 'Content-Type: application/json' -H 'X-Idempotency-Key: evt-9' -d @event.json

Failure modes and gotchas