Load Testing Webhook Endpoints

Load testing a webhook receiver means subjecting your ingestion path to synthetic delivery traffic. The goal is to discover its capacity limits in a controlled environment rather than during a provider's production fan-out. It belongs to the broader discipline of Webhook Testing & Local Development. This guide assumes you already operate a receiver that verifies signatures and enqueues work, and that you want to know three things: its true throughput ceiling, its tail latency under stress, and the exact request rate at which it begins shedding or corrupting events. You should be comfortable with HTTP semantics, percentile latency, and a scripting language for the load generator.

[Figure: load generator (k6 / Locust, N virtual users) → webhook endpoint (verify + 2xx) → queue / worker (async processing) → metrics (p95 / p99 latency, error rate)]
The load generator drives concurrent signed requests into the endpoint and its async queue while a metrics pipeline records tail latency and error rate.

Modeling Realistic Delivery Traffic

The single most common load-testing mistake is generating a smooth, constant request rate that no real provider ever produces. Production webhook traffic is bursty: a provider batches events, opens a pool of concurrent connections, and delivers a spike that decays over seconds. A test that ignores this shape reports a throughput number your endpoint cannot sustain when a real burst arrives. Build at least three traffic profiles and run each separately. For example, use a steady baseline, a short burst that decays back to baseline, and a long soak at expected peak.
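One way to encode the burst profile is as a per-second schedule of target rates. The sketch below assumes an instant spike followed by exponential decay back to baseline. The function names, parameters, and decay shape are illustrative assumptions, not a model of any particular provider. The stages it emits use the `target`/`duration` shape of k6's ramping-arrival-rate executor, with `timeUnit` left at its 1s default. k6 ramps linearly within each stage, so a one-second stage only approximates the instantaneous jump.

```python
import json
import math
from typing import Dict, List


def burst_profile(baseline_rps: float, peak_rps: float, burst_at_s: int,
                  decay_s: float, duration_s: int) -> List[int]:
    """Per-second target rates: flat baseline, then a spike that decays exponentially."""
    rates = []
    for t in range(duration_s):
        if t < burst_at_s:
            rate = baseline_rps
        else:
            # Jump to peak at burst_at_s, then decay back toward the baseline.
            rate = baseline_rps + (peak_rps - baseline_rps) * math.exp(-(t - burst_at_s) / decay_s)
        rates.append(round(rate))
    return rates


def to_k6_stages(rates: List[int]) -> List[Dict[str, object]]:
    """One-second stages in the shape k6's ramping-arrival-rate executor accepts."""
    return [{"target": r, "duration": "1s"} for r in rates]


if __name__ == "__main__":
    # Hypothetical numbers: 20 rps baseline, 2000 rps spike at t=10 s, 5 s decay constant.
    rates = burst_profile(baseline_rps=20, peak_rps=2000, burst_at_s=10, decay_s=5, duration_s=40)
    print(json.dumps(to_k6_stages(rates)[:12]))
```

The same list of rates can drive a Locust run instead, for example by converting each rate into a user count at a fixed per-user pace.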

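Drive load with open-model arrival rates, not a fixed number of looping virtual users. In a closed model, each user waits for a response before sending the next request. Such a test throttles itself when the endpoint slows down, which hides the queueing collapse you are trying to find. A webhook sender behaves like an open model: deliveries arrive on the provider's schedule, not when your endpoint is ready for them. k6's constant-arrival-rate and ramping-arrival-rate executors express load as iterations per time unit, independent of response time. They start extra VUs as needed up to maxVUs and report dropped_iterations when they run out. Locust users are always closed-loop. Its constant_throughput wait time only approximates an open model, and only while each user's response time stays below its pacing interval, so run many users at a low per-user rate.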
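To make the difference concrete, here is a small stdlib `asyncio` simulation of both models against a stand-in endpoint that has slowed to 200 ms. It is not k6 or Locust. `open_model`, `closed_model`, and `slow_send` are hypothetical names, and `slow_send` substitutes a sleep for a real HTTP POST. The open model starts every request on its scheduled tick regardless of outstanding responses. The closed model's throughput is capped by users divided by latency.

```python
import asyncio
from typing import Awaitable, Callable

Send = Callable[[int], Awaitable[None]]


async def open_model(rate: float, duration: float, send: Send) -> int:
    """Start send(i) on a fixed schedule, never waiting for earlier responses."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    total = int(rate * duration)
    in_flight = []
    for i in range(total):
        # Sleep until request i's scheduled arrival time (start + i / rate).
        delay = start + i / rate - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        in_flight.append(asyncio.create_task(send(i)))
    await asyncio.gather(*in_flight)
    return total


async def closed_model(users: int, duration: float, send: Send) -> int:
    """Each user sends, waits for the response, then sends again until time runs out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + duration
    sent = 0

    async def user() -> None:
        nonlocal sent
        while loop.time() < deadline:
            sent += 1
            await send(sent)

    await asyncio.gather(*(user() for _ in range(users)))
    return sent


async def slow_send(i: int) -> None:
    # Stand-in for an HTTP POST to an endpoint whose latency has risen to 200 ms.
    await asyncio.sleep(0.2)


if __name__ == "__main__":
    print("open:", asyncio.run(open_model(40, 0.5, slow_send)))    # 20 requests started
    print("closed:", asyncio.run(closed_model(2, 0.5, slow_send)))  # at most 6
```

At the same nominal intent, the closed model sends a fraction of the load, and it sends even less as latency grows. This is why closed-model results overstate capacity.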

Choosing Between k6 and Locust

k6 scripts are written in JavaScript and execute on a JavaScript runtime embedded in a single Go binary, so one machine can produce very high request rates with modest CPU overhead. That makes k6 a good fit when you need tens of thousands of requests per second and want first-class arrival-rate executors and percentile thresholds as pass/fail gates. Locust scripts are Python, which makes it trivial to reuse your existing signing code, generate complex payloads, and model stateful sequences. In distributed mode Locust scales horizontally across worker processes when a single node runs out of headroom. For signed-payload webhook tests where you must reproduce the provider's exact HMAC scheme, Locust's Python ergonomics often win; for raw ceiling-finding, k6 is leaner. Whichever you pick, the load generator must send a valid signature so that every request exercises the real verification path. Testing against an endpoint with auth disabled measures a system you will never run.
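Before a long run, it is worth a preflight check that the generator's signatures actually verify. If they don't, every request is rejected early and you end up measuring a fast 401 path. The sketch below assumes a timestamped HMAC-SHA256 scheme with a `t=<unix>,v1=<hex>` header over `b"<timestamp>." + body`. That format is an assumption, so substitute your provider's. `sign` and `verify` are hypothetical helpers, and `verify` is meant to mirror whatever your receiver does.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumed scheme: HMAC-SHA256 over b"<timestamp>." + raw body, header "t=<unix>,v1=<hex>".


def sign(secret: bytes, body: bytes, timestamp: int) -> str:
    message = f"{timestamp}.".encode() + body
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


def verify(secret: bytes, body: bytes, header: str,
           tolerance_s: int = 300, now: Optional[int] = None) -> bool:
    """Receiver-side check: parse the header, reject stale timestamps, compare in constant time."""
    try:
        parts = dict(item.split("=", 1) for item in header.split(","))
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > tolerance_s:
        return False  # outside the replay window
    expected = hmac.new(secret, f"{timestamp}.".encode() + body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received)


if __name__ == "__main__":
    secret = b"test-secret"
    body = b'{"id":"evt_1"}'
    header = sign(secret, body, int(time.time()))
    print("signature verifies:", verify(secret, body, header))
```

The signature covers the exact bytes sent. Serialize the payload once and pass those same bytes to both the signer and the HTTP client, rather than re-serializing.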

Measuring p95, p99, and the Breaking Point

Averages lie. A receiver can show a 40 ms mean while one request in fifty takes two seconds because a connection waited behind a saturated pool. That 2% tail barely moves the mean, but it sets the p99. Providers retry on timeout, so tail latency directly drives duplicate deliveries and retry storms. The receiver should therefore acknowledge each delivery quickly and defer the real work to an async worker, the design choice covered under synchronous callbacks versus async webhooks. Track p95 and p99 response time and the full error-rate breakdown by status code. Critically, also track the queue depth and worker lag behind the endpoint, not just the HTTP response. A receiver that returns 200 in 10 ms while its queue grows unbounded has not passed; it has merely deferred its failure. Define explicit thresholds in the test so the run fails automatically when p99 exceeds your budget or the error rate exceeds 1%.
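If you export raw samples from the generator, the pass/fail gate can be expressed as a small post-processing step. The sketch below is a minimal version. It assumes you have per-request latency and status samples, plus queue-depth readings pulled from your broker's metrics. `Sample`, `evaluate_run`, and the threshold defaults are hypothetical. It uses the nearest-rank percentile definition, and your load tool may compute percentiles slightly differently.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Sample:
    latency_ms: float
    status: int


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_run(samples: List[Sample], queue_depths: List[int],
                 p99_budget_ms: float = 500.0, max_error_rate: float = 0.01,
                 max_final_queue_depth: int = 1000) -> Dict[str, object]:
    """Summarize a run and list every threshold it breached (empty list means pass)."""
    latencies = [s.latency_ms for s in samples]
    by_status = Counter(s.status for s in samples)
    # Anything outside 2xx counts as an error for the error-rate gate.
    errors = sum(n for code, n in by_status.items() if not 200 <= code < 300)
    p95, p99 = percentile(latencies, 95), percentile(latencies, 99)
    error_rate = errors / len(samples)
    failures = []
    if p99 > p99_budget_ms:
        failures.append(f"p99 {p99} ms exceeds {p99_budget_ms} ms")
    if error_rate > max_error_rate:
        failures.append(f"error rate {error_rate:.2%} exceeds {max_error_rate:.2%}")
    # A fast 2xx is not a pass if the backlog is still large when the run ends.
    if queue_depths and queue_depths[-1] > max_final_queue_depth:
        failures.append(f"final queue depth {queue_depths[-1]} exceeds {max_final_queue_depth}")
    return {"p95_ms": p95, "p99_ms": p99, "error_rate": error_rate,
            "by_status": dict(by_status), "failures": failures}
```

Checking the final queue depth rather than the peak tolerates a burst that the workers drain in time. It flags a backlog that is still growing when traffic stops.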

Failure Mode Analysis

Failure mode | Impact | Mitigation
--- | --- | ---
Closed-model VUs mask collapse | Reported throughput is unachievable under real bursts | Use open-model arrival-rate executors (k6 ramping-arrival-rate); in Locust, run many users at a low constant_throughput rate
Fast 2xx, unbounded queue growth | Endpoint "passes" while events back up and age out | Assert on queue depth and worker lag, not only HTTP latency
Load generator is the bottleneck | Plateau is the test rig's limit, not the endpoint's | Distribute load across nodes; monitor generator CPU and socket exhaustion
Signature verification skipped in test | Measured path differs from production hot path | Sign every synthetic request with the real HMAC scheme
Single fixed payload | Caches and dedup hide real per-event cost | Randomize event IDs and payload bodies per request
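For the last row, a payload factory can vary both identity and size per request while keeping runs reproducible. The sketch below draws from a seeded `random.Random`, so a given seed replays the same event stream. The event type names, field layout, and `make_payload` are hypothetical stand-ins for your provider's schema.

```python
import json
import random
import uuid

# Hypothetical event types; replace with the ones your provider actually sends.
EVENT_TYPES = ["order.created.v1", "order.updated.v1", "order.refunded.v1"]


def make_payload(rng: random.Random) -> bytes:
    """One synthetic event: unique id, random type, and a variable number of line items."""
    items = [
        {"sku": f"SKU-{rng.randint(1, 9999):04d}", "qty": rng.randint(1, 5)}
        for _ in range(rng.randint(1, 20))  # varies body size from event to event
    ]
    payload = {
        # Deterministic for a given seed, but still a well-formed version-4 UUID.
        "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "type": rng.choice(EVENT_TYPES),
        "data": {
            "amount": rng.randint(100, 100_000),
            "currency": rng.choice(["USD", "EUR", "GBP"]),
            "items": items,
        },
    }
    return json.dumps(payload, separators=(",", ":")).encode()


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        body = make_payload(rng)
        print(len(body), json.loads(body)["type"])
```

Give each generator worker its own seed so that workers do not emit identical event streams.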

Runnable Implementation Example

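The following Python Locust file signs each synthetic delivery with a timestamped HMAC-SHA256 scheme; adapt it to your provider's exact format. It paces each user with constant_throughput. Because Locust users are closed-loop, this approximates a constant open-model arrival rate only while every response finishes within the user's pacing interval. Keep the per-user rate low and scale load with the user count.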

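import hashlib
import hmac
import json
import os
import random
import time
import uuid

from locust import HttpUser, constant_throughput, task

SECRET = os.environ["WEBHOOK_SECRET"].encode()
# constant_throughput() paces each *user*, so aggregate load is roughly
# users x PER_USER_RPS. Keep the pacing interval (1 / PER_USER_RPS seconds)
# well above the latency budget; a user whose request runs longer than its
# interval fires back-to-back and the test degrades into a closed model.
PER_USER_RPS = float(os.environ.get("PER_USER_RPS", "0.5"))
LATENCY_BUDGET_S = 1.0


def sign(body: bytes, timestamp: str) -> str:
    """Reproduce the provider's signing scheme: HMAC-SHA256 over "timestamp.body".

    The "t=...,v1=..." header format is an example; match your provider exactly.
    """
    message = f"{timestamp}.".encode() + body
    digest = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


class WebhookSender(HttpUser):
    # Approximates open-model load while each response beats the pacing interval.
    wait_time = constant_throughput(PER_USER_RPS)

    @task
    def deliver(self):
        # Unique id and varied body per request defeat consumer-side caching/dedup.
        payload = {
            "id": str(uuid.uuid4()),
            "type": "order.created.v1",
            "data": {
                "amount": random.randint(100, 100_000),
                "currency": random.choice(["USD", "EUR", "GBP"]),
            },
        }
        # Sign the exact bytes that go on the wire.
        body = json.dumps(payload).encode()
        ts = str(int(time.time()))
        headers = {
            "Content-Type": "application/json",
            # Header name is an assumption; use the one your receiver verifies.
            "X-Webhook-Signature": sign(body, ts),
        }
        with self.client.post(
            "/webhooks/orders",
            data=body,
            headers=headers,
            name="POST /webhooks/orders",
            catch_response=True,
        ) as resp:
            # Check status first so a slow error is reported as an error.
            if not 200 <= resp.status_code < 300:
                resp.failure(f"status {resp.status_code}")
            # Treat a slow 2xx as a failure to surface tail latency in the report.
            elif resp.elapsed.total_seconds() > LATENCY_BUDGET_S:
                resp.failure(f"over {LATENCY_BUDGET_S}s budget")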

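Run it with locust -f load.py --headless -u 200 -r 20 --run-time 10m --host https://staging.example.com. Then watch the percentile table and, separately, your queue-depth dashboard. Aggregate load is roughly the user count times the per-user rate passed to constant_throughput. When you need more throughput, raise -u rather than the per-user rate.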
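Choosing -u and the per-user rate together is simple arithmetic. constant_throughput(r) gives each user a 1/r-second slot per task, so the slot should exceed the slowest latency you expect by some margin. The helper below is a hypothetical sketch. The `headroom` factor of 2 is an arbitrary default, and `worst_latency_s` is assumed to be the same budget the test already uses to fail slow requests.

```python
import math
from typing import Tuple


def plan_users(target_rps: float, worst_latency_s: float,
               headroom: float = 2.0) -> Tuple[int, float]:
    """Return (users, per_user_rps) whose pacing interval is `headroom` x the worst latency.

    A user paced at per_user_rps has 1 / per_user_rps seconds per request; as long as
    responses finish inside that window, aggregate load stays near users * per_user_rps.
    """
    if target_rps <= 0 or worst_latency_s <= 0 or headroom < 1:
        raise ValueError("target_rps and worst_latency_s must be positive; headroom >= 1")
    per_user_rps = 1.0 / (worst_latency_s * headroom)
    users = math.ceil(target_rps / per_user_rps)
    return users, per_user_rps


if __name__ == "__main__":
    users, rate = plan_users(target_rps=200, worst_latency_s=1.0)
    print(f"-u {users}, per-user rate {rate} req/s")  # -u 400, per-user rate 0.5 req/s
```

Doubling the target rate doubles the user count rather than tightening each user's interval. This keeps the arrival pattern close to open-model as the endpoint slows.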

Debugging Checklist