At-least-once vs exactly-once webhook delivery trade-offs

Every webhook platform must pick a delivery semantic, and the choice cascades into retry design, consumer complexity, and operational cost. This comparison builds on message ordering guarantees and sits alongside implementing strict ordering for financial webhooks, which shows ordering on top of these semantics. For the full taxonomy of guarantees a platform can offer, see the cross-cutting treatment in delivery guarantee levels.

The short version: true exactly-once delivery over an unreliable network is impossible, because the sender can never be certain whether a lost acknowledgement means the receiver got the message or not. What real systems ship is at-least-once delivery plus idempotent consumers, which together produce an effect that is observed exactly once. Understanding why is the difference between chasing an unattainable guarantee and building one that works.

[Figure: Lost ack forces a duplicate delivery. Sequence between Sender and Consumer: deliver event #1; ack (lost); timeout; retry event #1; dedupe by event id.]
A lost acknowledgement is indistinguishable from a lost message, so the sender retries; the consumer dedupes to make the effect observed once.

What at-least-once delivery guarantees

At-least-once means the sender keeps retrying until it receives an acknowledgement, so the consumer sees every event for as long as retries continue, but possibly more than once. It is the default for essentially every commercial webhook provider because it is the only semantic that survives network partitions without dropping events. The sender’s contract is simple: store the event durably, deliver it, and on any timeout or non-2xx response, retry with backoff.
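The sender's contract can be sketched in a few lines. This is a minimal illustration, not any provider's implementation: the names deliver_with_retry, send, max_attempts, and base_delay are hypothetical, the event is assumed to be stored durably already, and send is assumed to return an HTTP status code or raise on a timeout.

```python
import time
from typing import Callable, Optional


def deliver_with_retry(
    send: Callable[[dict], int],      # hypothetical transport: returns an HTTP status
    event: dict,
    max_attempts: int = 5,            # assumed limit, not from the article
    base_delay: float = 1.0,          # assumed first delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once a 2xx ack is seen, False if the attempts run out."""
    for attempt in range(max_attempts):
        status: Optional[int]
        try:
            status = send(event)
        except (TimeoutError, ConnectionError):
            # No ack arrived, so the outcome at the receiver is unknown.
            status = None
        if status is not None and 200 <= status < 300:
            return True
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

Injecting send and sleep keeps the loop testable without a network or real waiting. Note that a timeout is treated exactly like a failure, which is why the receiver may see the same event twice.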

The burden moves to the consumer, which must tolerate duplicates. That is the whole reason idempotency matters in webhooks; with an idempotent consumer, at-least-once delivery yields effectively-once processing.

Why exactly-once delivery is unattainable

Exactly-once delivery would require the sender to know, with certainty, that the consumer received and durably stored each event exactly once: never zero times, never twice. This is the Two Generals' Problem: after the consumer processes an event it sends an ack, but if that ack is lost the sender cannot distinguish “consumer never got it” from “consumer got it, ack vanished.” Its only safe move is to retry, which produces a duplicate. No additional acknowledgement round solves this; the last message in any finite exchange can always be the one that is lost.

So exactly-once is achievable only as exactly-once processing — the effect happens once even though the message may arrive several times. That is delivered by at-least-once transport plus a consumer that discards duplicates.
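A toy simulation makes the distinction between delivery and processing concrete. Everything here is illustrative: the Consumer class, the send_until_acked function, and the drop_acks parameter are hypothetical names, and the "network" is just a counter that discards the first few acks.

```python
class Consumer:
    """In-memory consumer that dedupes by event id (illustrative only)."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.deliveries = 0   # how many times a message arrived
        self.effects = 0      # how many times the side effect ran

    def receive(self, event: dict) -> bool:
        self.deliveries += 1
        if event["id"] not in self.seen:
            self.seen.add(event["id"])
            self.effects += 1
        return True  # always acks, whether new or duplicate


def send_until_acked(consumer: Consumer, event: dict, drop_acks: int) -> int:
    """Deliver until an ack gets through; drop_acks acks are lost in transit."""
    attempts = 0
    while True:
        attempts += 1
        ack = consumer.receive(event)
        if drop_acks > 0:
            drop_acks -= 1
            ack = False  # the consumer processed it, but the sender never hears
        if ack:
            return attempts


consumer = Consumer()
attempts = send_until_acked(consumer, {"id": "evt_1"}, drop_acks=1)
print(attempts, consumer.deliveries, consumer.effects)
```

With one lost ack the sender makes two attempts and the consumer receives two deliveries, yet the side effect runs once. The message was delivered twice; the effect was observed once.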

At-most-once for contrast

At-most-once delivery fires each event once and never retries: zero duplicates, but events are silently lost on any failure. It suits pure telemetry or best-effort notifications where a missed event costs nothing. It is the wrong choice for anything stateful, which is why it rarely appears in webhook platforms beyond fire-and-forget pings.

Comparison

| Dimension | At-most-once | At-least-once | Exactly-once (effective) |
| --- | --- | --- | --- |
| Duplicates | Never | Possible | Suppressed by consumer |
| Lost events | Possible | Never (with retries) | Never |
| Sender complexity | Trivial (fire and forget) | Retry + backoff + DLQ | Same as at-least-once |
| Consumer complexity | None | Must tolerate duplicates | Must be idempotent + store keys |
| Network-partition behavior | Drops events | Survives, retries later | Survives, retries later |
| Realistic to implement | Yes | Yes | Only as effectively-once |
| Typical fit | Telemetry, pings | The default for webhooks | Money movement, account state |

Choosing for a workload

Default to at-least-once delivery with an idempotent consumer for almost every webhook integration. It is the only combination that neither drops events nor double-applies them, and it degrades gracefully under partitions. Add a durable idempotency key when the side effect is irreversible, and back retries with exponential backoff and a dead-letter queue so a permanently failing consumer does not stall the pipeline.
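The backoff and dead-letter decision can be sketched as below. This is one possible shape under stated assumptions: the names backoff_delay and next_step, the 300-second cap, and the use of a plain list as the dead-letter queue are all hypothetical, and the randomized ("jittered") delay is an addition not mentioned above, included because it keeps many failing deliveries from retrying in lockstep.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0, rng=random) -> float:
    """Jittered exponential delay in seconds for a 0-based attempt number.

    rng is any object with a uniform(a, b) method; it defaults to the
    random module and can be a seeded random.Random in tests.
    """
    ceiling = min(cap, base * 2 ** attempt)
    return rng.uniform(0, ceiling)


def next_step(event: dict, attempts_made: int, max_attempts: int, dead_letters: list):
    """Decide what happens after a failed delivery attempt."""
    if attempts_made >= max_attempts:
        # Park the event so one failing consumer does not block the rest.
        dead_letters.append(event)
        return ("dead-letter", None)
    return ("retry", backoff_delay(attempts_made))
```

A dead-lettered event is not lost: it stays available for inspection and manual or scheduled replay, which preserves the at-least-once property once the consumer recovers.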

Pick at-most-once only for genuinely disposable signals. Never reach for it on anything that mutates state, because the lost-event mode is invisible until reconciliation surfaces the gap. When ordering also matters, layer the techniques from implementing strict ordering for financial webhooks on top of the at-least-once base.

Implementing effectively-once on the consumer

import hashlib
import json
import psycopg2

# A UNIQUE constraint on event_id turns at-least-once into effectively-once:
#   CREATE TABLE processed_events (
#     event_id TEXT PRIMARY KEY,
#     processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
#   );

def stable_event_id(headers: dict, payload: dict) -> str:
    # Prefer the provider's event id; it is stable across retries.
    if eid := headers.get("X-Event-Id"):
        return eid
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def handle(conn, headers: dict, payload: dict) -> str:
    eid = stable_event_id(headers, payload)
    with conn:  # one transaction: claim + side effect commit together
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO processed_events (event_id) VALUES (%s) "
                "ON CONFLICT (event_id) DO NOTHING RETURNING event_id",
                (eid,),
            )
            if cur.fetchone() is None:
                # The row already existed: this is a retry of a processed event.
                return "duplicate-ignored"
            apply_side_effect(cur, payload)  # runs in the same transaction
    return "processed"

def apply_side_effect(cur, payload: dict) -> None:
    ...  # the actual work, committed atomically with the claim

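Claiming the event_id and applying the side effect in one transaction is what makes it correct: either both commit or neither does, so a crash mid-flight is safely retried rather than leaving a claimed-but-unprocessed event. This holds only when the side effect is itself a write to the same database. An external action such as sending an email or charging a card cannot be rolled back with the transaction, so it needs its own idempotency key on the downstream call.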
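The crash behaviour is easy to check without a Postgres server. The sketch below swaps in the standard-library sqlite3 module and an in-memory database purely for illustration; the balances table, the handle_once function, and the crash flag are hypothetical, and INSERT OR IGNORE stands in for the ON CONFLICT ... DO NOTHING claim used above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER NOT NULL)")
    conn.execute("INSERT INTO balances VALUES ('acct_1', 0)")
    conn.commit()
    return conn


def handle_once(conn: sqlite3.Connection, event_id: str, cents: int, crash: bool = False) -> str:
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
        if cur.rowcount == 0:
            return "duplicate-ignored"  # the claim already existed
        conn.execute(
            "UPDATE balances SET cents = cents + ? WHERE account = 'acct_1'",
            (cents,),
        )
        if crash:
            # Simulate the process dying after the work but before commit.
            raise RuntimeError("simulated crash before commit")
    return "processed"


def balance(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT cents FROM balances WHERE account = 'acct_1'").fetchone()[0]
```

If the first call is made with crash=True, both the claim and the balance update roll back, so a retry of the same event_id is processed normally and a third call is ignored as a duplicate.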

Verification

def test_duplicate_is_ignored(conn):
    headers = {"X-Event-Id": "evt_123"}
    payload = {"amount": 100}
    assert handle(conn, headers, payload) == "processed"
    # An identical retry must not re-run the side effect.
    assert handle(conn, headers, payload) == "duplicate-ignored"

You can also force the duplicate path with curl by replaying the same delivery twice; the second call should be acknowledged with a 200 but leave state unchanged:

BODY='{"amount":100}'
for i in 1 2; do
  curl -fsS -X POST localhost:8000/webhooks \
    -H 'X-Event-Id: evt_123' -H 'content-type: application/json' --data "$BODY"
done
# Expect one state change in the database, two 200 responses.

Failure modes and gotchas