Zero-downtime webhook secret rotation
Rotating a shared HMAC secret is the one routine security task most likely to cause an outage, because the naive approach — swap the secret on both ends at once — guarantees a window where the sender and receiver disagree and every delivery fails its signature check. This guide covers the dual-secret overlap pattern that eliminates that window: for a defined period the verifier accepts either the old or the new secret, so the sender can switch over at its own pace without a single rejected webhook. It extends the Key Rotation Strategies reference and complements the broader procedure in How to implement secure key rotation for webhooks. The scenario here is narrow and concrete: you verify inbound webhooks with an HMAC shared secret, and you need to replace that secret on a schedule with provably zero dropped deliveries.
The whole technique rests on one idea — never have exactly one valid secret during a change. You go from one secret, to two valid secrets (overlap), to one secret again. As long as the overlap window is open on the verifier before the sender starts using the new secret, and stays open until the sender has fully stopped using the old one, no in-flight request can land on a secret the verifier does not hold.
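The invariant above can be checked mechanically. The following is a minimal sketch, not part of the rotation code itself: the `Phase` type, the two timelines, and `firstUnsafePhase` are illustrative names, and the model deliberately ignores in-flight retries (those are covered by the overlap sizing in Step 4).

```typescript
// Illustrative model only: each phase records which secrets the verifier
// accepts and which secret the signer uses. All names are hypothetical.
type Phase = { name: string; verifierSecrets: string[]; signerSecret: string };

// The safe ordering: open the overlap, switch the signer, then close the overlap.
const safeRotation: Phase[] = [
  { name: 'steady state', verifierSecrets: ['old'], signerSecret: 'old' },
  { name: 'overlap opened', verifierSecrets: ['new', 'old'], signerSecret: 'old' },
  { name: 'signer switched', verifierSecrets: ['new', 'old'], signerSecret: 'new' },
  { name: 'old retired', verifierSecrets: ['new'], signerSecret: 'new' },
];

// The naive ordering: the signer moves before the verifier knows the new secret.
const naiveRotation: Phase[] = [
  { name: 'steady state', verifierSecrets: ['old'], signerSecret: 'old' },
  { name: 'signer switched', verifierSecrets: ['old'], signerSecret: 'new' },
  { name: 'verifier switched', verifierSecrets: ['new'], signerSecret: 'new' },
];

// Returns the name of the first phase in which the signer uses a secret the
// verifier does not hold, or null if every phase is safe.
function firstUnsafePhase(phases: Phase[]): string | null {
  for (const phase of phases) {
    if (!phase.verifierSecrets.includes(phase.signerSecret)) return phase.name;
  }
  return null;
}

console.log(firstUnsafePhase(safeRotation));  // null
console.log(firstUnsafePhase(naiveRotation)); // signer switched
```

The safe timeline never has a phase where the signer's secret is missing from the verifier's list; the naive one fails as soon as the signer moves first.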
Prerequisites
- Runtime: Node.js 20+ with TypeScript and node:crypto.
- A secret store (KMS, Vault, or env injection) that can hold an ordered list of secrets, not just one, e.g. WEBHOOK_SECRETS as a comma-separated or JSON list.
- Control over the rollout timing of both the verifier and the signer. If a third party signs and you only verify, you can still hold two secrets; you just coordinate the switch with them.
- A working single-secret HMAC verifier, such as the one in HMAC webhook validation in Node.js.
Step 1: Add the new secret alongside the old
Generate a strong new secret and load it into the verifier in addition to the existing one. The verifier reads a list of secrets, ordered newest-first, rather than a single value.
import crypto from 'node:crypto';

// Generate a 32-byte secret; store it, do not log it.
export const newSecret = () => crypto.randomBytes(32).toString('hex');

// Verifier reads an ordered list. During normal operation this has one entry;
// during rotation it has two. e.g. WEBHOOK_SECRETS="<new>,<old>"
export function loadSecrets(): string[] {
  return (process.env.WEBHOOK_SECRETS ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
}
Engineering Note: Deploy this list-aware verifier before you ever add a second secret. The first rotation should not also be the first time the multi-secret code path runs in production.
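The prerequisites allow WEBHOOK_SECRETS to be either a comma-separated or a JSON list, while loadSecrets handles only the comma-separated form. A minimal sketch of a parser that accepts both follows; `parseSecretList` is a hypothetical name, and the decision to drop duplicates is an assumption, not something the rotation pattern requires.

```typescript
// Hypothetical helper: parse a secret list from either a JSON array
// ('["<new>","<old>"]') or a comma-separated string ('<new>,<old>').
function parseSecretList(raw: string | undefined): string[] {
  const value = (raw ?? '').trim();
  if (value === '') return [];

  let candidates: unknown[];
  if (value.startsWith('[')) {
    // JSON.parse throws on malformed input, which surfaces a bad config early.
    const parsed: unknown = JSON.parse(value);
    if (!Array.isArray(parsed)) throw new Error('secret list JSON must be an array');
    candidates = parsed;
  } else {
    candidates = value.split(',');
  }

  const secrets = candidates
    .filter((s): s is string => typeof s === 'string')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  // Drop duplicates while preserving newest-first order, so list indexes
  // keep a stable meaning.
  return Array.from(new Set(secrets));
}
```

Usage would look like `parseSecretList(process.env.WEBHOOK_SECRETS)`. The comma-separated form is only safe when secrets cannot contain commas, which holds for the hex-encoded output of newSecret.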
Step 2: Verify against both secrets
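Accept the request if its signature validates against any secret in the list, using a constant-time comparison for each candidate. Because every candidate is evaluated on every request, list order does not affect verification time; keep the list newest-first anyway so that index 0 always refers to the new secret when you report which secret matched.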
import crypto from 'node:crypto';

function sign(rawBody: Buffer, secret: string): Buffer {
  return crypto.createHmac('sha256', secret).update(rawBody).digest();
}

export function verifyAgainstAny(rawBody: Buffer, signatureHex: string, secrets: string[]): boolean {
  const provided = Buffer.from(signatureHex, 'hex');
  let valid = false;
  for (const secret of secrets) {
    const expected = sign(rawBody, secret);
    // Evaluate every secret without early-return so timing does not reveal
    // which secret (if any) matched.
    const match =
      expected.length === provided.length && crypto.timingSafeEqual(expected, provided);
    valid = valid || match;
  }
  return valid;
}
Engineering Note: Do not return true on the first match. Iterating all secrets keeps total verification time independent of which secret matched, preserving the constant-time property across the whole list. With only two secrets the cost is negligible.
Step 3: Switch the sender to the new secret
Once every verifier instance is running with both secrets loaded (confirm via deploy rollout, not assumption), update the signing side to sign exclusively with the new secret.
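import crypto from 'node:crypto';

// Signer: after the verifier fleet accepts both, sign only with the new secret.
function signOutbound(rawBody: Buffer): string {
  const secret = process.env.WEBHOOK_ACTIVE_SECRET; // now the new value
  // Fail loudly instead of attempting to sign with a missing key.
  if (!secret) throw new Error('WEBHOOK_ACTIVE_SECRET is not set');
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}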
Engineering Note: The ordering between Step 1 and Step 3 is the entire safety guarantee. The verifier must accept the new secret before the sender emits anything signed with it. If you reverse them, you reintroduce the exact failure window this pattern exists to remove.
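One way to turn "confirm via deploy rollout, not assumption" into a check is to have each verifier instance report which secrets it has loaded and to gate the signer switch on that. The sketch below is a hedged illustration: the `InstanceReport` shape, the idea of a fingerprint-reporting endpoint, and every function name are assumptions, not part of any existing API. Fingerprints are truncated SHA-256 digests, which is only reasonable because the secrets are long and random.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical shape: each verifier instance reports fingerprints of the
// secrets it currently has loaded (for example via a health endpoint you own).
type InstanceReport = { instance: string; fingerprints: string[] };

// A short, non-reversible identifier for a secret. Never report the secret itself.
function fingerprint(secret: string): string {
  return createHash('sha256').update(secret).digest('hex').slice(0, 12);
}

// Returns the instances that do not yet hold the new secret.
function instancesMissingSecret(newSecret: string, reports: InstanceReport[]): string[] {
  const wanted = fingerprint(newSecret);
  return reports
    .filter((report) => !report.fingerprints.includes(wanted))
    .map((report) => report.instance);
}

// Step 3 may proceed only when the fleet is non-empty and no instance is missing
// the new secret. An empty report list is treated as "unknown", not "safe".
function safeToSwitchSigner(newSecret: string, reports: InstanceReport[]): boolean {
  return reports.length > 0 && instancesMissingSecret(newSecret, reports).length === 0;
}
```

Treating an empty fleet report as unsafe is a deliberate choice here: a monitoring failure should block the switch rather than silently allow it.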
Step 4: Retire the old secret
Leave the old secret in the verifier’s list until you are certain no traffic still uses it. At minimum, keep it longer than your delivery timeout plus retry budget, so that a request signed before the switch and retried afterward still finds the old secret loaded when it arrives. Then remove it from WEBHOOK_SECRETS and destroy it in the store.
// After the overlap window: WEBHOOK_SECRETS shrinks back to a single value.
// Confirm zero verifications matched the old secret before removing it (see Verification).
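The "delivery timeout plus retry budget" lower bound can be computed from the sender's delivery settings. The sketch below assumes a signature is computed once when the webhook is enqueued and reused on every retry; the `DeliveryPolicy` shape, the function names, and the example numbers are all hypothetical and should be replaced with your sender's real values.

```typescript
// All inputs are hypothetical; substitute your sender's real delivery settings.
type DeliveryPolicy = {
  requestTimeoutMs: number; // how long a single attempt may stay in flight
  retryDelaysMs: number[];  // wait before each retry, in order
};

// Worst case: the first attempt is signed just before the switch, every attempt
// runs to its timeout, and every retry delay is waited in full.
function maxRetryHorizonMs(policy: DeliveryPolicy): number {
  const attempts = policy.retryDelaysMs.length + 1;
  const totalDelay = policy.retryDelaysMs.reduce((sum, delay) => sum + delay, 0);
  return attempts * policy.requestTimeoutMs + totalDelay;
}

// Minimum time the old secret stays loaded after the signer switches,
// plus a margin for queue lag and clock skew.
function minOverlapMs(policy: DeliveryPolicy, marginMs: number): number {
  return maxRetryHorizonMs(policy) + marginMs;
}

const policy: DeliveryPolicy = {
  requestTimeoutMs: 10_000,
  retryDelaysMs: [60_000, 300_000, 1_800_000], // 1 min, 5 min, 30 min
};
console.log(minOverlapMs(policy, 600_000)); // 2800000 (about 47 minutes)
```

This number is a floor, not a schedule: it tells you the earliest moment retirement could be safe, and observed traffic should still make the final call.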
Engineering Note: Tag each verification with which secret matched and emit it as a metric. Retire the old secret only after that metric has read zero for a full overlap window — let observed traffic, not a wall-clock guess, gate the retirement.
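verifyAgainstAny returns only a boolean, so it cannot say which secret matched. A minimal sketch of a variant that can is shown below. `matchSecretIndex`, `verifyAndCount`, and the in-memory `matchCounts` map are hypothetical names; the map stands in for whatever metrics client you actually use.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical variant of verifyAgainstAny: returns the index of the matching
// secret (0 = newest) or -1 for no match, still evaluating every candidate.
function matchSecretIndex(rawBody: Buffer, signatureHex: string, secrets: string[]): number {
  const provided = Buffer.from(signatureHex, 'hex');
  let matched = -1;
  secrets.forEach((secret, index) => {
    const expected = createHmac('sha256', secret).update(rawBody).digest();
    const ok = expected.length === provided.length && timingSafeEqual(expected, provided);
    // Record the first (newest) match but keep iterating; there is no early exit.
    if (ok && matched === -1) matched = index;
  });
  return matched;
}

// In-memory stand-in for a real metrics client (an assumption for illustration).
// Key -1 counts rejected requests.
const matchCounts = new Map<number, number>();

function verifyAndCount(rawBody: Buffer, signatureHex: string, secrets: string[]): boolean {
  const index = matchSecretIndex(rawBody, signatureHex, secrets);
  matchCounts.set(index, (matchCounts.get(index) ?? 0) + 1);
  return index >= 0;
}
```

With the list ordered newest-first, the count under key 1 is the old-secret traffic that has to reach and stay at zero before retirement.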
Verification and testing
- Both-secret acceptance: sign one payload with the old secret and one with the new, and assert verifyAgainstAny returns true for both while both are loaded.
- Per-secret metric: instrument the verifier to count match_secret_index (0 = new, 1 = old). During overlap you should watch index 1 fall to zero before retiring it.
- Negative test: sign with a secret not in the list and assert rejection, proving the overlap accepts only the explicitly loaded secrets and nothing else.
- Retirement test: after removing the old secret, replay a payload signed with it and assert a 401.
# Watch which secret is matching in production logs before retiring the old one.
grep 'webhook_verified' /var/log/app/webhook.log | jq '.match_secret_index' | sort | uniq -c
A minimal unit test:
import { verifyAgainstAny } from './verify';
import crypto from 'node:crypto';

const OLD = 'old-secret', NEW = 'new-secret';
const body = Buffer.from('{"event":"test"}');
const sig = (s: string) => crypto.createHmac('sha256', s).update(body).digest('hex');

test('accepts both during overlap', () => {
  expect(verifyAgainstAny(body, sig(OLD), [NEW, OLD])).toBe(true);
  expect(verifyAgainstAny(body, sig(NEW), [NEW, OLD])).toBe(true);
});

test('rejects retired secret', () => {
  expect(verifyAgainstAny(body, sig(OLD), [NEW])).toBe(false);
});
Failure modes and gotchas
- Switching the sender before the verifier fleet is fully rolled out. During a rolling deploy, some verifier instances still hold only the old secret. If the sender switches while any such instance is live, requests routed there fail. Gate Step 3 on a completed verifier rollout, not on the first instance picking up the new secret.
- Early-return verification leaking timing. Returning on the first matching secret makes verification time depend on which secret matched, a side channel. Always evaluate every secret in the list before deciding.
- Retiring the old secret too soon. A webhook enqueued and signed before the switch may be retried minutes later. If the overlap window is shorter than your maximum retry horizon, those retries fail after retirement. Size the overlap to exceed the full retry budget and confirm zero old-secret traffic before removing it.
- Logging secrets during rotation. Rotation touches secret-handling code paths, which is exactly when an accidental console.log(secrets) slips in. Redact secret values in every log line and assert it in tests.
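For the last gotcha, redaction can be enforced with a small helper that is itself easy to test. This is a sketch under assumptions: `redactSecrets` and `describeSecrets` are hypothetical names, and plain substring replacement is only one possible approach; a structured logger with field-level redaction may suit a real service better.

```typescript
// Hypothetical helper: replace every occurrence of a loaded secret in a log message.
function redactSecrets(message: string, secrets: string[]): string {
  // Redact longer secrets first so a secret that contains another one
  // is not left partially visible.
  const ordered = secrets.slice().sort((a, b) => b.length - a.length);
  let out = message;
  for (const secret of ordered) {
    if (secret.length === 0) continue; // splitting on '' would mangle the message
    out = out.split(secret).join('[REDACTED]');
  }
  return out;
}

// Safe summary for logs: how many secrets are loaded, never their values.
function describeSecrets(secrets: string[]): string {
  return `secrets_loaded=${secrets.length}`;
}
```

A test can then build a log line that contains every loaded secret, pass it through redactSecrets, and assert that none of the secret values survive in the output.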