Idempotency keys vs deduplication windows for webhook consumers

Webhook providers deliver at-least-once, so a consumer will see the same event twice sooner or later. Two mechanisms suppress the duplicate effect: a durable idempotency key that records “this exact operation already ran,” and a time-bounded deduplication window that drops any event already seen within a sliding time horizon. This comparison builds on idempotency in webhooks and pairs with how to design idempotent webhook consumers, which walks through the key-extraction and atomic-store steps in depth. Here the goal is narrower: choose the right mechanism for a given workload.

The distinction matters because the two approaches fail differently. Keys give exact, permanent suppression at the cost of unbounded storage; windows give cheap, bounded storage at the cost of letting late duplicates through. Picking the wrong one shows up as either a runaway dedup table or a double-charged customer.

A key store suppresses duplicates forever but grows without bound; a window only remembers events inside its TTL and forgets the rest.

Prerequisites

To follow the code in this comparison you need:

Python 3.11 or newer, with the redis 5.x client library for the consumer code and fakeredis 2.x for the tests.
A Redis instance with AOF or RDB persistence enabled — an in-memory-only node loses durable keys on restart.
The provider’s documented maximum retry horizon, in hours, for every event type you subscribe to.
A durable store for the permanent key layer: Redis with persistence, or a Postgres table with a UNIQUE index.

How an idempotency key works

An idempotency key is a stable identifier — usually supplied by the provider as X-Idempotency-Key, or derived as a hash of immutable payload fields — that you record the first time you process an event. Storage is typically a UNIQUE constraint in your database or a Redis SET key NX. The first insert wins and runs the side effect; every later insert with the same key collides and returns the cached result. Suppression is exact and permanent: even a duplicate that arrives a year later is caught, because the key is still on record.

The cost is storage that grows with event volume and an explicit decision about when, if ever, to prune it. Pruning a key reopens the window for a duplicate of that exact event.
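
The "first insert wins" behaviour can be sketched with a UNIQUE index. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for the real durable store; the table name idempotency_keys and the helpers open_store and run_once are hypothetical names introduced for this example.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # Assumption: in-memory SQLite stands in for the production database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        " key TEXT PRIMARY KEY,"  # PRIMARY KEY enforces uniqueness
        " result TEXT NOT NULL)"
    )
    return conn


def run_once(conn: sqlite3.Connection, key: str, effect) -> tuple[str, bool]:
    """Return (result, first_time); the effect runs only for the first claim."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "INSERT INTO idempotency_keys (key, result) VALUES (?, ?)",
                (key, "pending"),
            )
            result = effect()
            conn.execute(
                "UPDATE idempotency_keys SET result = ? WHERE key = ?",
                (result, key),
            )
        return result, True
    except sqlite3.IntegrityError:
        # The key is already on record, so return the stored result instead.
        row = conn.execute(
            "SELECT result FROM idempotency_keys WHERE key = ?", (key,)
        ).fetchone()
        return row[0], False
```

If the effect raises, the transaction rolls back and the key is released, but an external side effect that already happened is not undone; that case needs its own handling.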

How a deduplication window works

A deduplication window stores recently seen event identifiers with a TTL and rejects any identifier already present. In Redis this is a SET event_id "1" NX EX <seconds>; the entry self-expires, so the store stays bounded regardless of total volume. The window is sized to the provider’s retry horizon: if a provider retries for up to 72 hours, a window of at least 72 hours catches every retry-driven duplicate.

The cost is that any duplicate arriving after the TTL expires slips through and re-runs the side effect. Windows trade exactness for bounded, self-cleaning storage.

The store stays the same size no matter how many events pass through it, which is exactly why a duplicate that outlives the TTL is invisible to the window.
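
The window semantics, including the late duplicate that slips through, can be sketched without Redis. This is an illustrative single-process stand-in for SET NX EX, not a replacement for it; the class name DedupWindow and its injectable clock are assumptions made so the expiry can be demonstrated deterministically.

```python
import time
from typing import Callable


class DedupWindow:
    """In-memory stand-in for SET key NX EX ttl (illustrative only)."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires_at: dict[str, float] = {}

    def claim(self, event_id: str) -> bool:
        """Return True on first sighting inside the window, else False."""
        now = self.clock()
        # Drop expired entries so the store stays bounded, as a TTL would.
        self._expires_at = {
            key: expiry for key, expiry in self._expires_at.items() if expiry > now
        }
        if event_id in self._expires_at:
            return False  # a failed claim does not refresh the TTL
        self._expires_at[event_id] = now + self.ttl
        return True
```

With a 72-hour TTL, a second claim one hour later is rejected, while the same identifier arriving at hour 73 is accepted again because the entry has already expired.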

Comparison

Dimension	Idempotency key	Deduplication window
Suppression guarantee	Exact and permanent	Only within the TTL horizon
Storage growth	Unbounded unless pruned	Bounded; entries self-expire
Late-duplicate handling	Always caught	Slips through after expiry
Implementation	UNIQUE constraint or SET NX + retention policy	SET NX EX <ttl>
Best fit	Money movement, account mutations, anything irreversible	High-volume idempotent-ish events, notifications, cache busts
Identifier source	Provider key or hash of immutable fields	Same, but only needs to be unique within the window
Operational risk	Table bloat, prune-too-early double-runs	TTL shorter than retry horizon lets duplicates through

When each one fits

Reach for a durable idempotency key when a duplicate side effect is expensive or irreversible: charging a card, transferring funds, provisioning a resource, or any operation where “ran twice” cannot be tolerated even months later. Pair the key with the atomic check-and-set described in how to design idempotent webhook consumers so concurrent retries cannot both win.

Reach for a deduplication window when events are high-volume and the cost of an occasional late duplicate is low — sending a notification, invalidating a cache, recomputing a derived value. The bounded, self-expiring store keeps Redis memory flat without a retention job.

Many production systems use both: a window absorbs the common case of rapid provider retries cheaply, while a durable key behind it guarantees exactness for the operations that truly cannot run twice.

Volume influences the cost of the store, but only the reversibility of the side effect decides whether a bounded window is safe on its own.

A combined implementation

The two-layer guard rolls out in four steps:

Derive a stable event identifier — prefer the provider-supplied key, and fall back to a hash of fields that never change across retries.
Claim the bounded dedup window — write the identifier with SET NX EX, using a TTL at least as long as the provider’s retry horizon.
Claim the durable idempotency key — write the same identifier with no TTL, so irreversible work is still blocked after the window expires.
Run the side effect once — execute the irreversible work only after both claims succeed, and return the cached acknowledgement on every later delivery.

import hashlib
import json
import redis

r = redis.Redis(decode_responses=True)

def idempotency_id(payload: dict, header_key: str | None) -> str:
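    # Prefer the provider-supplied key; fall back to a hash of the payload.
    if header_key:
        return header_key
    # Caveat: this hashes every field, so pass in only fields that are stable
    # across retries (a retry-rewritten timestamp would yield a fresh id).
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))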
    return hashlib.sha256(canonical.encode()).hexdigest()

def process_with_window_and_key(payload: dict, header_key: str | None) -> str:
    eid = idempotency_id(payload, header_key)

    # Layer 1: cheap, bounded dedup window catches rapid provider retries.
    # TTL must be >= the provider's documented retry horizon.
    if not r.set(f"dedup:{eid}", "1", nx=True, ex=72 * 3600):
        return "duplicate-suppressed-by-window"

    # Layer 2: durable key guarantees exactness for irreversible effects.
    # A failed SET NX (the Redis analogue of a UNIQUE insert) means it already ran.
    if not r.set(f"idem:{eid}", "done", nx=True):  # no TTL: permanent
        return "duplicate-suppressed-by-key"

    run_side_effect(payload)  # charge, transfer, provision, etc.
    return "processed"

def run_side_effect(payload: dict) -> None:
    ...  # the irreversible work

The window short-circuits the storm of identical retries that arrive within seconds; the TTL-less idem: entry is the permanent backstop for the events that must never double-run.

The window absorbs the retry storm so the durable store is only touched by traffic that survived the first filter, keeping the permanent key layer small and fast.

Verification

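The tests below exercise the two Redis primitives the guard relies on rather than process_with_window_and_key itself: a SET NX EX rejects an immediate second claim, and a TTL-less key still blocks a re-run after the window entry is gone.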

import fakeredis

def test_window_suppresses_immediate_duplicate():
    fake = fakeredis.FakeStrictRedis(decode_responses=True)
    # First call processes; identical second call is suppressed.
    assert fake.set("dedup:abc", "1", nx=True, ex=10) is True
    assert fake.set("dedup:abc", "1", nx=True, ex=10) is None

def test_durable_key_outlives_window():
    fake = fakeredis.FakeStrictRedis(decode_responses=True)
    fake.set("idem:abc", "done", nx=True)          # permanent
    fake.set("dedup:abc", "1", nx=True, ex=1)       # window
    fake.delete("dedup:abc")                         # simulate TTL expiry
    # Window is gone, but the durable key still blocks a re-run.
    assert fake.set("idem:abc", "done", nx=True) is None

Failure modes and gotchas

TTL shorter than the retry horizon. A window sized at 1 hour against a provider that retries for 24 hours will let a retry that arrives 2 hours late re-run the side effect. Always read the provider’s retry policy and size the window to its maximum, plus a margin.
Pruning durable keys too aggressively. Deleting old idempotency keys to reclaim space reopens the door for an exact duplicate of those events. If you must prune, archive instead and only delete keys older than any plausible duplicate.
Hashing mutable fields into the identifier. If the derived id includes a timestamp the provider rewrites on retry, every retry gets a fresh id and nothing is suppressed. Hash only fields that are stable across retries.
Race between window and key layers. With two stores, a crash between the window SET and the key SET can leave the window claimed but the effect un-run; a later retry is then suppressed by the window without ever executing. Make the durable key the source of truth and treat the window purely as an optimization, re-checking the key on any window miss.
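
The mutable-field gotcha above can be avoided by hashing an explicit allowlist of fields. This is a minimal sketch; the helper name derive_event_id and the field names in the usage note are hypothetical, and which fields are actually stable across retries depends on the provider.

```python
import hashlib
import json


def derive_event_id(payload: dict, stable_fields: tuple[str, ...]) -> str:
    """Hash only the fields the provider never rewrites between retries."""
    # A missing field raises KeyError on purpose: silently hashing a partial
    # subset could make two different events collide.
    subset = {name: payload[name] for name in stable_fields}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

For example, with stable_fields=("type", "object_id"), two deliveries that differ only in a delivered_at timestamp produce the same identifier, while a different object_id produces a different one.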

Frequently Asked Questions

What happens if the side effect raises after both claims have been written?

Both claims are already in place, so the provider's retry is suppressed and the work never runs — a silent loss that no counter in the sample would catch. Write the durable entry as a pending marker, run the effect, then mark it done, and remove the marker on an exception so a retry can win the claim again. The one case for keeping the claim after a failure is a side effect known to have partially applied, which needs reconciliation rather than a blind second attempt.

Can both layers live on the same Redis instance?

Only if the eviction policy cannot reach the permanent entries. A node configured with allkeys-lru will evict a durable claim under memory pressure and report nothing, so the next duplicate simply re-runs the side effect months later. Use volatile-lru so that only TTL-bearing window entries are eligible for eviction, or put the durable layer somewhere eviction is not a concept, such as a Postgres table with a unique index.
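
A startup check can refuse to run when the eviction policy could reach permanent entries. This is a small sketch; the function name permanent_keys_are_safe is hypothetical, and the policy names are the standard Redis maxmemory-policy values.

```python
# Policies under which Redis never evicts a key that has no TTL:
# noeviction evicts nothing, and the volatile-* family only considers
# keys that carry an expiry.
SAFE_FOR_PERMANENT_KEYS = frozenset(
    {
        "noeviction",
        "volatile-lru",
        "volatile-lfu",
        "volatile-random",
        "volatile-ttl",
    }
)


def permanent_keys_are_safe(maxmemory_policy: str) -> bool:
    """Return True if TTL-less idempotency keys cannot be evicted."""
    return maxmemory_policy.strip().lower() in SAFE_FOR_PERMANENT_KEYS
```

With redis-py the current value can usually be read via r.config_get("maxmemory-policy"), though managed Redis services may disable the CONFIG command, in which case the policy has to come from the service's own settings. Note that under these policies a full node rejects writes instead of evicting, so memory alerts still matter.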

How large does the durable layer get over time?

Budget roughly 100 bytes per entry in Redis once key and object overhead are counted, so a 64-character hex identifier at 10 million events a year lands near a gigabyte a year and never shrinks. That is trivial for money movement and indefensible for click tracking, which is the concrete reason the choice tracks reversibility rather than volume. If a year of claims will not sit comfortably in memory beside your other workload, the durable layer belongs in Postgres.
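
The estimate is simple arithmetic and can be rerun with your own numbers. The 100-byte default below is the rough budget quoted above, not a measured figure, and the function names are hypothetical.

```python
def durable_layer_bytes(
    events_per_year: int,
    years: float = 1.0,
    bytes_per_entry: int = 100,  # rough per-entry budget, an assumption
) -> float:
    """Estimate the size of a never-pruned key layer in bytes."""
    return events_per_year * years * bytes_per_entry


def to_gigabytes(n_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1_000_000_000


# 10 million events a year at about 100 bytes each is about 1 GB per year.
one_year_gb = to_gigabytes(durable_layer_bytes(10_000_000))
```

Measuring a sample of real keys on the actual instance is a better basis for capacity planning than this budget.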

Could a Bloom filter bound the durable layer instead?

Not for anything irreversible. A false positive reports a first delivery as already seen, so the event is dropped and the side effect never runs at all — a silent, unrecoverable failure that is strictly worse than the duplicate you were trying to prevent. A Bloom filter is defensible only as a pre-filter where a positive result triggers an authoritative lookup rather than a decision.
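
The pre-filter arrangement can be sketched as follows. This is a toy, single-process Bloom filter written for illustration; BloomFilter and is_first_delivery are hypothetical names, a Python set stands in for the authoritative durable store, and the check-then-record sequence is not atomic.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int, num_hashes: int) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def is_first_delivery(event_id: str, bloom: BloomFilter, durable_keys: set[str]) -> bool:
    # A negative is definitive, so the authoritative lookup is skipped.
    # A positive is only a hint: the durable store makes the decision.
    if bloom.might_contain(event_id) and event_id in durable_keys:
        return False
    durable_keys.add(event_id)
    bloom.add(event_id)
    return True
```

The filter only saves lookups for identifiers it has definitely never seen; it does not bound the durable layer, which still holds every key.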

How do you migrate an existing window-only consumer onto the two-layer guard?

The durable layer cannot be reconstructed for events already processed, because a window keeps no record past its TTL. Start writing the permanent claim on every delivery from the deploy forward, and accept that anything older than the current window is unprotected against a very late duplicate. When the provider exposes an event log API, backfill identifiers from it for the period that actually matters and let the two sources converge.

Should the durable key be claimed before the window rather than after?

Claiming it first eliminates the state where a window entry exists but nothing durable does, which is the ordering that can swallow an event outright. The price is that every delivery, retry storm included, now hits the durable store first and the cheap filter stops filtering anything. Window-first remains fine provided a successful window claim is always followed by the key check and the window is treated as advisory rather than authoritative.