Load Testing Webhook Endpoints
Load testing a webhook receiver means subjecting your ingestion path to synthetic delivery traffic so that capacity limits are discovered in a controlled environment rather than during a provider’s production fan-out. It belongs to the broader discipline of Webhook Testing & Local Development. This guide assumes you already operate a receiver that verifies signatures and enqueues work, and that you want to know its true throughput ceiling, its tail latency under stress, and the request rate at which it begins shedding or corrupting events. Readers should be comfortable with HTTP semantics, percentile latency, and a scripting language for the load generator.
Modeling Realistic Delivery Traffic
The single most common load-testing mistake is generating a smooth, constant request rate that no real provider ever produces. Production webhook traffic is bursty: a provider batches events, opens a pool of concurrent connections, and delivers a spike that decays over seconds. A test that ignores this shape will report a throughput number your endpoint cannot actually sustain when a real burst arrives. Build at least three traffic profiles and run each separately.
- Steady-state soak: a constant arrival rate held for 30–60 minutes to surface memory leaks, connection-pool exhaustion, and slow queue drain that only appear over time.
- Ramp to breaking point: a linearly increasing arrival rate that climbs until error rate crosses a threshold (for example, 1% non-2xx). The rate at the crossover is your endpoint’s saturation point.
- Burst / thundering herd: a near-instantaneous jump from baseline to many multiples of it, holding briefly, then dropping. This models a provider replaying a backlog and is covered in depth in simulating webhook traffic spikes.
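The three profiles above can be sketched as plain functions from elapsed seconds to a target arrival rate. This is a minimal illustration, not a k6 or Locust API: the function names, default rates, and burst timings are all assumed values chosen for the example.

```python
def steady(t: float, rate: float = 50.0) -> float:
    """Steady-state soak: the same arrival rate at every instant."""
    return rate


def ramp(t: float, start: float = 10.0, slope: float = 2.0) -> float:
    """Ramp to breaking point: the rate climbs by `slope` rps every second."""
    return start + slope * t


def burst(t: float, baseline: float = 20.0, multiple: float = 25.0,
          burst_start: float = 60.0, hold: float = 15.0) -> float:
    """Burst: jump to `multiple` x baseline at `burst_start`, hold, then drop."""
    in_burst = burst_start <= t < burst_start + hold
    return baseline * multiple if in_burst else baseline


def schedule(profile, duration_s: int) -> list[float]:
    """Sample a profile once per second to get a per-second request budget."""
    return [profile(float(t)) for t in range(duration_s)]


# The burst profile jumps from 20 rps to 500 rps at t = 60 s.
print(schedule(burst, 90)[58:62])  # [20.0, 20.0, 500.0, 500.0]
```

A schedule like this can then be translated into whatever stage or shape configuration your load tool expects.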
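Drive load with open-model arrival rates, not a fixed number of looping virtual users. A closed model (each user waits for a response before sending the next request) artificially throttles itself when the endpoint slows down, hiding the queueing collapse you are trying to find. k6’s constant-arrival-rate and ramping-arrival-rate executors express load as requests per second independent of response time, which is the behavior a fire-and-forget webhook sender exhibits. Locust’s constant_throughput wait time paces each user to a target number of task runs per second, but a user still waits for its own response, so it approximates an open model only while response time stays below the pacing interval. Run enough users at a low per-user rate that the aggregate arrival rate holds even when individual requests slow down.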
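The difference between the two models comes down to simple arithmetic, sketched below. The helper names and the numbers (100 users, a slowdown from 250 ms to 2 s, a degraded capacity of 50 rps) are assumptions picked for illustration.

```python
def closed_model_rps(users: int, response_s: float, think_s: float = 0.0) -> float:
    """Closed loop: a user sends only after its last response returns,
    so the offered rate is users divided by the per-request cycle time."""
    return users / (response_s + think_s)


def open_model_backlog(arrival_rps: float, capacity_rps: float, seconds: int) -> float:
    """Open loop: arrivals ignore latency, so any excess over capacity
    accumulates as unserved requests for as long as the overload lasts."""
    return max(0.0, arrival_rps - capacity_rps) * seconds


# 100 looping users against an endpoint that slows from 250 ms to 2 s.
healthy = closed_model_rps(100, 0.25)   # 400.0 rps offered
degraded = closed_model_rps(100, 2.0)   # 50.0 rps: the test throttled itself

# An open-model sender keeps offering 400 rps to the degraded endpoint.
backlog = open_model_backlog(400, 50, 30)  # 10500.0 requests behind after 30 s
print(healthy, degraded, backlog)
```

The closed-loop run quietly drops to the endpoint's degraded rate and reports no problem, while the open-loop run exposes the growing backlog a real provider would create.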
Choosing Between k6 and Locust
k6 scripts are written in JavaScript and executed by a JavaScript runtime embedded in a single Go binary, which produces very high request rates from one machine with low CPU overhead. That makes k6 a good fit when you need tens of thousands of requests per second and want first-class arrival-rate executors and percentile thresholds as pass/fail gates. Locust scripts are Python, which makes it trivial to reuse your existing signing code, generate complex payloads, and model stateful sequences; it scales horizontally across worker processes when a single node runs out of headroom. For signed-payload webhook tests where you must reproduce the provider’s exact HMAC scheme, Locust’s Python ergonomics often win; for raw ceiling-finding, k6 is leaner. Whichever you pick, the load generator must send a valid signature so the request exercises the real verification path. Testing against an endpoint with auth disabled measures a system you will never run.
Measuring p95, p99, and the Breaking Point
Averages lie. A receiver can show a mean of roughly 60 ms while one request in fifty takes two seconds because a connection waited behind a saturated pool. Providers retry on timeout, so tail latency directly drives duplicate deliveries and retry storms. The receiver must therefore acknowledge each delivery quickly, which is why the synchronous callbacks versus async webhooks decision matters: deferring work to a queue keeps the response fast. Track p95 and p99 response time, the full error-rate breakdown by status code, and, critically, the queue depth and worker lag behind the endpoint, not just the HTTP response. A receiver that returns 200 in 10 ms while its queue grows unbounded has not passed; it has merely deferred its failure. Define explicit thresholds in the test so the run fails automatically when p99 exceeds your budget or the error rate breaches 1%.
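As a rough sketch of how a mean hides the tail, the following computes nearest-rank percentiles over a synthetic sample in which one request in fifty takes two seconds, then applies pass/fail thresholds. The `percentile` and `check_thresholds` helpers and the 500 ms p99 budget are assumptions for illustration; k6 and Locust report percentiles with their own methods.

```python
import math
import statistics


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_thresholds(latencies_ms: list[float], errors: int,
                     p99_budget_ms: float = 500.0,
                     max_error_rate: float = 0.01) -> list[str]:
    """Return the violated thresholds; an empty list means the run passes."""
    violations = []
    p99 = percentile(latencies_ms, 99)
    if p99 > p99_budget_ms:
        violations.append(f"p99 {p99:.0f} ms exceeds {p99_budget_ms:.0f} ms budget")
    error_rate = errors / len(latencies_ms)
    if error_rate > max_error_rate:
        violations.append(f"error rate {error_rate:.2%} exceeds {max_error_rate:.2%}")
    return violations


# 980 fast requests at 20 ms, 20 slow ones at 2 s (one in fifty).
latencies = [20.0] * 980 + [2000.0] * 20
print(round(statistics.fmean(latencies), 1))  # 59.6
print(percentile(latencies, 95))              # 20.0
print(percentile(latencies, 99))              # 2000.0
print(check_thresholds(latencies, errors=0))
```

The mean and even p95 look healthy here; only p99 reveals the two-second stalls that would trigger provider retries.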
Failure Mode Analysis
| Failure mode | Impact | Mitigation |
|---|---|---|
| Closed-model VUs mask collapse | Reported throughput is unachievable under real bursts | Use open-model arrival-rate executors (k6 ramping-arrival-rate), or Locust constant_throughput with enough users that the aggregate rate holds when responses slow |
| Fast 2xx, unbounded queue growth | Endpoint “passes” while events back up and age out | Assert on queue depth and worker lag, not only HTTP latency |
| Load generator is the bottleneck | Plateau is the test rig’s limit, not the endpoint’s | Distribute load across nodes; monitor generator CPU and socket exhaustion |
| Signature verification skipped in test | Measured path differs from production hot path | Sign every synthetic request with the real HMAC scheme |
| Single fixed payload | Caches and dedup hide real per-event cost | Randomize event IDs and payload bodies per request |
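One way to assert on queue depth rather than HTTP latency alone is to fit a trend line through depth samples taken during the run. The sketch below assumes you can export `(seconds_since_start, queue_depth)` pairs from your broker's metrics; the function names and limits are hypothetical.

```python
def backlog_growth_per_s(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of queue depth over time, in messages per second.
    Needs at least two samples taken at different times."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den


def queue_is_healthy(samples: list[tuple[float, float]],
                     max_growth_per_s: float = 1.0,
                     max_depth: float = 10_000) -> bool:
    """Pass only if the backlog is not trending upward and never exceeds the cap."""
    peak = max(depth for _, depth in samples)
    return backlog_growth_per_s(samples) <= max_growth_per_s and peak <= max_depth


growing = [(0, 100), (10, 600), (20, 1100)]
print(backlog_growth_per_s(growing))  # 50.0
print(queue_is_healthy(growing))      # False
```

A run whose endpoint answers every request in 10 ms would still fail this check if workers fall 50 messages further behind each second.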
Runnable Implementation Example
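The following Python Locust file signs each synthetic delivery with the provider’s HMAC scheme and paces every user with constant_throughput, so the aggregate arrival rate stays roughly constant as the endpoint slows, provided there are enough users that none is blocked longer than its pacing interval.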
```python
import hashlib
import hmac
import json
import os
import time
import uuid

from locust import HttpUser, task, constant_throughput

SECRET = os.environ["WEBHOOK_SECRET"].encode()
# Per-user rate in task runs per second; aggregate load is users x TARGET_RPS.
TARGET_RPS = float(os.environ.get("TARGET_RPS", "1"))


def sign(body: bytes, timestamp: str) -> str:
    """Reproduce the provider's signing scheme: HMAC over timestamp + body."""
    message = f"{timestamp}.".encode() + body
    digest = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


class WebhookSender(HttpUser):
    # constant_throughput caps each user at TARGET_RPS task runs per second.
    # A user still waits for its own response, so keep the per-user rate low
    # and raise the user count to hold the aggregate rate under slowdowns.
    wait_time = constant_throughput(TARGET_RPS)

    @task
    def deliver(self):
        # Unique id per request defeats consumer-side caching/dedup.
        payload = {
            "id": str(uuid.uuid4()),
            "type": "order.created.v1",
            "data": {"amount": 4200, "currency": "USD"},
        }
        body = json.dumps(payload).encode()
        ts = str(int(time.time()))
        headers = {
            "Content-Type": "application/json",
            "X-Webhook-Signature": sign(body, ts),
        }
        with self.client.post(
            "/webhooks/orders",
            data=body,
            headers=headers,
            name="POST /webhooks/orders",
            catch_response=True,
        ) as resp:
            if resp.status_code >= 300:
                resp.failure(f"status {resp.status_code}")
            # Treat slow 2xx as a failure to surface tail latency in the report.
            elif resp.elapsed.total_seconds() > 1.0:
                resp.failure("over 1s budget")
```
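Run it with `locust -f load.py --headless -u 200 -r 20 --run-time 10m --host https://staging.example.com`, then watch the percentile table and, separately, your queue-depth dashboard. Because constant_throughput applies per user, the aggregate offered rate is the user count (`-u`) multiplied by TARGET_RPS, so set the two together to hit the rate you intend.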
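Before a long run, it can help to confirm that the generator's signatures would pass a verifier for the same scheme. The sketch below checks a `t=<unix>,v1=<hex>` header against an HMAC-SHA256 over `<t>.<body>`, matching the format the load script produces. The `verify` function and its 300-second timestamp tolerance are assumptions for illustration; your receiver's real verification code is the authority.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify(secret: bytes, body: bytes, header: str,
           tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Validate a 't=<unix>,v1=<hex>' signature header for the given body."""
    parts = dict(item.split("=", 1) for item in header.split(",") if "=" in item)
    timestamp, received = parts.get("t", ""), parts.get("v1", "")
    if not timestamp.isdigit():
        return False
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False  # stale or future-dated delivery (assumed replay window)
    message = f"{timestamp}.".encode() + body
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())


secret = b"test-secret"
body = b'{"id": "evt_1"}'
ts = str(int(time.time()))
digest = hmac.new(secret, f"{ts}.".encode() + body, hashlib.sha256).hexdigest()
print(verify(secret, body, f"t={ts},v1={digest}"))         # True
print(verify(secret, b"tampered", f"t={ts},v1={digest}"))  # False
```

If this check fails for the generator's own output, every request in the load test would be rejected at the signature step and the run would measure only the rejection path.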
Debugging Checklist
- Confirm the load generator itself is not CPU- or socket-bound before trusting any plateau.
- Verify every synthetic request carries a valid signature and unique event ID.
- Correlate the HTTP p99 spike with queue depth and worker lag at the same timestamp.
- Re-run the breaking-point ramp three times; the saturation rate should be stable within ~10%.
- Check for connection-pool exhaustion (look for `EADDRNOTAVAIL` or keep-alive churn) on both ends.
- Ensure the staging database and downstream services are sized like production, not scaled down.
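The repeat-run check in the list above can be automated with a small sketch like the following. It reads each ramp as `(offered_rps, error_rate)` steps, takes the first step that crosses the error threshold as the saturation rate, and compares runs against their mean. Both helper names and the choice to measure the ~10% band around the mean are assumptions, since the checklist does not specify how the band is computed.

```python
from typing import Optional


def saturation_rate(steps: list[tuple[float, float]],
                    max_error_rate: float = 0.01) -> Optional[float]:
    """Return the lowest offered rate whose error rate exceeds the threshold,
    or None if the ramp never crossed it."""
    for rate, error_rate in sorted(steps):
        if error_rate > max_error_rate:
            return rate
    return None


def is_stable(rates: list[float], tolerance: float = 0.10) -> bool:
    """True when every run's saturation rate lies within `tolerance`
    (as a fraction) of the mean across runs."""
    mean = sum(rates) / len(rates)
    return all(abs(rate - mean) <= tolerance * mean for rate in rates)


run = [(200, 0.0), (400, 0.002), (600, 0.004), (800, 0.03), (1000, 0.2)]
print(saturation_rate(run))             # 800
print(is_stable([1180, 1240, 1210]))    # True
print(is_stable([900, 1240, 1210]))     # False
```

An unstable result usually points at the test rig or a shared staging dependency rather than at the endpoint itself.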
Related
- Inspecting and replaying webhook deliveries — capture and re-send real traffic for repeatable load.
- Webhook contract testing — verify payload shape before you load the endpoint.
- Simulating webhook traffic spikes — modeling thundering-herd bursts in detail.
- Webhook Testing & Local Development — the broader testing discipline.