Simulating Webhook Traffic Spikes
A traffic spike is the scenario that breaks most webhook receivers: a provider that has been quiet suddenly replays a backlog or fans out a large event batch, opening many concurrent connections within a second or two. This page extends load testing webhook endpoints with a focused recipe for reproducing that thundering herd using k6, and it pairs naturally with consumer-driven contract tests for webhooks so the payloads you blast are also schema-valid. The goal is not a vanity throughput number; it is to observe exactly how your endpoint degrades the instant arrival rate outruns processing rate.
Prerequisites
- k6 installed locally (brew install k6, or the Docker image grafana/k6).
- A staging webhook endpoint that verifies signatures and enqueues work. Never spike production.
- The shared signing secret exported as an environment variable, plus knowledge of the provider’s exact HMAC scheme.
- A metrics view of the receiver’s queue depth and worker lag, not just HTTP status, so you can see backlog form.
- A defined latency budget (for example, p99 under 1 s) and an acceptable error rate (for example, under 1%).
Step-by-Step Implementation
1. Establish the baseline rate
First measure the steady rate your endpoint sustains comfortably. Run a short constant-arrival-rate test and find the highest rate that keeps p99 within budget and the queue flat. Call that BASE. Your spike will jump to a multiple of it (start with 10×).
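One way to run that baseline search is a constant-arrival-rate scenario whose rate you raise between runs. The sketch below builds such an options object. The helper name baselineOptions, the two-minute duration, and the VU counts are illustrative assumptions, not values taken from this guide.

```javascript
// Hypothetical helper: builds k6 options for one baseline probe at a fixed rate.
// In a k6 script you would use it as:
//   export const options = baselineOptions(Number(__ENV.RATE || 20));
function baselineOptions(ratePerSecond) {
  return {
    scenarios: {
      baseline: {
        executor: "constant-arrival-rate",
        rate: ratePerSecond, // iterations started per timeUnit
        timeUnit: "1s",
        duration: "2m", // assumed: long enough for queue depth to settle
        preAllocatedVUs: 50, // assumed headroom for a modest rate
        maxVUs: 200,
      },
    },
    thresholds: {
      http_req_duration: ["p(99)<1000"], // the example latency budget
      http_req_failed: ["rate<0.01"], // the example error budget
    },
  };
}
```

Rerun with a higher rate until p99 or queue depth starts to climb; the last clean rate is a reasonable BASE.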
2. Write the spike scenario
k6’s ramping-arrival-rate executor lets you express a near-instantaneous jump by making the ramp stage very short (1 s in the script below; a zero-duration stage makes the step instantaneous). preAllocatedVUs must be large enough to issue the spike rate even while responses are slow; under-allocating VUs silently caps the spike.
import http from "k6/http";
import crypto from "k6/crypto";
import { check } from "k6";
const SECRET = __ENV.WEBHOOK_SECRET;
const HOST = __ENV.HOST; // e.g. https://staging.example.com
const BASE = Number(__ENV.BASE || 20);
export const options = {
  scenarios: {
    spike: {
      executor: "ramping-arrival-rate",
      startRate: BASE,
      timeUnit: "1s",
      preAllocatedVUs: 600, // headroom so the spike is not VU-limited
      maxVUs: 1500,
      stages: [
        { target: BASE, duration: "30s" }, // baseline
        { target: BASE * 10, duration: "1s" }, // near-instant jump
        { target: BASE * 10, duration: "30s" }, // hold the herd
        { target: BASE, duration: "1s" }, // drop back
        { target: BASE, duration: "30s" }, // observe recovery
      ],
    },
  },
  thresholds: {
    http_req_duration: ["p(99)<1000"], // fail the run if p99 reaches 1 s
    http_req_failed: ["rate<0.01"], // fail the run if 1% or more of requests fail (by default, statuses outside 200-399)
  },
};
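A rough way to size preAllocatedVUs is Little's law: the number of VUs busy at once is approximately the arrival rate multiplied by the response time. The helper below is a sketch of that arithmetic only. The name requiredVUs and the 1.5 headroom factor are assumptions, not k6 settings, and the 2 s response time is a guess at how slow the endpoint might get under load.

```javascript
// Rough VU estimate from Little's law: concurrency ~ arrival rate x response time.
// headroom is an assumed safety factor, not a k6 option.
function requiredVUs(ratePerSecond, responseSeconds, headroom = 1.5) {
  return Math.ceil(ratePerSecond * responseSeconds * headroom);
}

// BASE = 20, spike = 10x = 200 req/s; suppose responses slow to 2 s under load.
const estimate = requiredVUs(20 * 10, 2);
console.log(estimate); // 600
```

Under these assumed numbers the estimate matches the 600 used in the scenario, though that is not necessarily how the value was chosen. If your endpoint can stall for longer than 2 s, scale the figure up accordingly.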
3. Sign each generated payload
Reproduce the provider’s signing scheme inside the default function so every request exercises the real verification path. Randomize the event ID so consumer-side deduplication does not absorb the load.
function signedHeaders(body) {
  const ts = Math.floor(Date.now() / 1000).toString();
  const mac = crypto.hmac("sha256", SECRET, `${ts}.${body}`, "hex");
  return {
    "Content-Type": "application/json",
    "X-Webhook-Signature": `t=${ts},v1=${mac}`,
  };
}
export default function () {
  const body = JSON.stringify({
    id: `evt_${Date.now()}_${__VU}_${__ITER}`,
    type: "order.created.v1",
    data: { amount: 4200, currency: "USD" },
  });
  const res = http.post(`${HOST}/webhooks/orders`, body, {
    headers: signedHeaders(body),
  });
  check(res, { "is 2xx": (r) => r.status >= 200 && r.status < 300 });
}
4. Run the spike
WEBHOOK_SECRET=$SECRET HOST=https://staging.example.com BASE=20 \
k6 run spike.js
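It helps to know roughly how many requests the scenario should produce, so the figure can be compared with the receiver's counts afterwards. The sketch below estimates that total by treating each stage as a linear ramp and summing the area under the rate curve. The helpers toSeconds and expectedIterations are hypothetical, and the result is only an estimate: the iterations and dropped_iterations values in the k6 summary remain the authoritative numbers.

```javascript
// Convert a simple k6 duration such as "30s" or "2m" to seconds.
// Only single-unit values are handled here; k6 itself accepts richer forms.
function toSeconds(duration) {
  const m = /^(\d+(?:\.\d+)?)(ms|s|m|h)$/.exec(duration);
  if (!m) throw new Error(`unsupported duration: ${duration}`);
  const factor = { ms: 0.001, s: 1, m: 60, h: 3600 }[m[2]];
  return Number(m[1]) * factor;
}

// Sum the area under the arrival-rate curve, one trapezoid per stage.
function expectedIterations(startRate, stages) {
  let rate = startRate;
  let total = 0;
  for (const stage of stages) {
    total += ((rate + stage.target) / 2) * toSeconds(stage.duration);
    rate = stage.target; // the next stage ramps from this stage's target
  }
  return Math.round(total);
}

const BASE = 20;
const total = expectedIterations(BASE, [
  { target: BASE, duration: "30s" },
  { target: BASE * 10, duration: "1s" },
  { target: BASE * 10, duration: "30s" },
  { target: BASE, duration: "1s" },
  { target: BASE, duration: "30s" },
]);
console.log(total); // 7420
```

For BASE=20 the stages contribute 600 + 110 + 6000 + 110 + 600 requests. A k6 iteration count well below that estimate suggests the generator could not keep up.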
Verification and Testing
A spike test is only meaningful if you confirm two things at the spike timestamp. First, check the http_req_duration p(99) and http_req_failed thresholds in the k6 summary: if either one is crossed, k6 exits non-zero, which makes the test usable as a CI gate. Second, correlate that moment against your receiver’s queue depth: a healthy endpoint shows the queue rising during the hold and draining smoothly afterward. Assert recovery explicitly by checking that the queue returns to near zero during the final baseline stage. A quick log assertion confirms no events were dropped:
# Count delivered vs. accepted; they must match.
grep -c '"accepted webhook"' receiver.log        # should equal total k6 requests
grep -cE '"dropped"|"queue full"' receiver.log   # must be 0
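The queue-drain check can be automated in the same spirit. The sketch below assumes you can export queue depth samples from your metrics system as objects with a t field (seconds since test start) and a depth field. That shape, the function name queueRecovered, the thresholds, and the sample values are all hypothetical illustrations.

```javascript
// Hypothetical sample shape: { t: seconds since test start, depth: queued events }.
// Returns true only if every sample in the last `tailSeconds` of the recovery
// stage is at or below `maxDepth`. Missing data counts as "not recovered".
function queueRecovered(samples, recoveryStart, recoveryEnd, tailSeconds = 10, maxDepth = 5) {
  const tailStart = Math.max(recoveryStart, recoveryEnd - tailSeconds);
  const tail = samples.filter((s) => s.t >= tailStart && s.t <= recoveryEnd);
  return tail.length > 0 && tail.every((s) => s.depth <= maxDepth);
}

// With the 30s/1s/30s/1s/30s stages, the recovery stage spans roughly 62 s to 92 s.
// These samples are made-up numbers for illustration.
const samples = [
  { t: 30, depth: 0 },
  { t: 60, depth: 1800 }, // backlog at the end of the hold
  { t: 75, depth: 400 }, // draining
  { t: 85, depth: 3 },
  { t: 92, depth: 0 },
];
console.log(queueRecovered(samples, 62, 92)); // true
```

Treating an empty sample window as a failure is deliberate: a gap in metrics during recovery should not pass the gate.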
Failure Modes and Gotchas
- VU starvation caps the spike. If preAllocatedVUs is too low, k6 cannot issue the requested rate once responses slow, and you will under-test. Watch the dropped_iterations metric: any nonzero value means the generator, not the endpoint, was the limit. Raise preAllocatedVUs/maxVUs.
- Connection limits on the generator host. A single machine can exhaust ephemeral ports or file descriptors during a 10× burst, producing EADDRNOTAVAIL that masquerades as endpoint errors. Raise ulimit -n, enable keep-alive, or run k6 distributed.
- Fast 2xx hiding backlog. An endpoint that returns 200 immediately and enqueues can pass HTTP thresholds while its worker pool falls hours behind. Always assert on queue drain in the recovery stage, and consider routing overflow to a dead-letter queue so spikes degrade gracefully instead of dropping events.
- Identical payloads. A static body lets caches and idempotency stores short-circuit processing, understating per-event cost. Always vary the event ID and meaningful fields per request.
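To vary more than the event ID, the payload can be built by a small function. The sketch below is one possible shape: buildEvent, the currency list, and the amount formula are hypothetical, and the vu and iter arguments stand in for k6's __VU and __ITER so the function can be checked outside k6.

```javascript
// Hypothetical payload builder: varies the event ID and the business fields per request.
// nowMs is passed in (rather than read from Date.now()) so the output is reproducible.
const CURRENCIES = ["USD", "EUR", "GBP"]; // assumed sample values

function buildEvent(vu, iter, nowMs) {
  return {
    id: `evt_${nowMs}_${vu}_${iter}`,
    type: "order.created.v1",
    data: {
      amount: 1000 + ((vu * 31 + iter * 17) % 9000), // deterministic spread, 1000..9999
      currency: CURRENCIES[(vu + iter) % CURRENCIES.length],
    },
  };
}
```

Inside the k6 default function the body would then be JSON.stringify(buildEvent(__VU, __ITER, Date.now())), signed exactly as before.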
Related
- Consumer-driven contract tests for webhooks — keep spike payloads schema-valid.
- Debugging failed webhook deliveries — diagnosing the failures a spike exposes.
- Load testing webhook endpoints — the parent guide on capacity testing.