Instrumenting Webhooks with OpenTelemetry for End-to-End Tracing

When a webhook delivery is slow or fails intermittently, the most reliable way to find the culprit is to follow a single event across the producer and consumer as one trace. This guide builds exactly that, extending the broader patterns in Webhook Observability & Monitoring. We will wrap dispatch and delivery in OpenTelemetry spans, propagate the W3C Trace Context traceparent header from producer to consumer, and attach the span attributes that make the resulting trace actionable. Once spans exist, they become the substrate for the targets in defining SLOs for webhook delivery and the signals routed by alerting on webhook delivery failures.

[Figure: Trace context propagation sequence. The producer dispatch span injects traceparent (00-trace_id-span_id-01) into the POST on send, the consumer ingest span extracts it, and the 2xx response closes the producer span. Both spans share one trace_id.]
The producer injects trace context on dispatch; the consumer extracts it so both spans belong to one trace.
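
To make the header in the figure concrete, here is a minimal sketch that splits a version-00 traceparent value into its fields. It is meant for debugging and log inspection only; production code should keep using the propagator's inject and extract. The names parse_traceparent and TraceParent are hypothetical helpers, not part of OpenTelemetry.

```python
import re
from typing import NamedTuple, Optional

# Version "00" layout: 2-hex version, 32-hex trace id, 16-hex parent span id, 2-hex flags.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    """Hypothetical container for the parsed header fields."""
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero ids are invalid in W3C Trace Context.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceParent(trace_id, span_id, bool(int(flags, 16) & 0x01))


parsed = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
```

Both the producer span and the consumer span should report the same trace_id field; only the span id differs between them.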

Prerequisites

Step 1: Configure the Tracer and Exporter

Initialize a tracer provider with an OTLP exporter and, critically, set the global propagator to W3C Trace Context so traceparent is the wire format on both ends.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.propagate import set_global_textmap
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

resource = Resource.create({"service.name": "webhook-dispatcher"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="http://collector:4317")))
trace.set_tracer_provider(provider)

# Ensure traceparent (W3C) is the propagation format on both producer and consumer.
set_global_textmap(TraceContextTextMapPropagator())

tracer = trace.get_tracer("webhook.dispatch")

Step 2: Open a Dispatch Span and Inject Context

Wrap each delivery attempt in a span. Use inject to write the active context into the outgoing headers; never hand-format traceparent yourself.

import requests
from opentelemetry.propagate import inject

def deliver(event, endpoint, attempt):
    with tracer.start_as_current_span("webhook.deliver") as span:
        span.set_attribute("webhook.endpoint_id", endpoint["id"])
        span.set_attribute("webhook.event_type", event["type"])
        span.set_attribute("webhook.attempt", attempt)
        span.set_attribute("webhook.payload_bytes", len(event["body"]))

        headers = {"Content-Type": "application/json"}
        inject(headers)  # writes traceparent into headers from the active span

        resp = requests.post(endpoint["url"], data=event["body"], headers=headers, timeout=10)
        span.set_attribute("http.response.status_code", resp.status_code)
        if resp.status_code >= 300:
            span.set_status(trace.Status(trace.StatusCode.ERROR, f"status {resp.status_code}"))
        return resp.status_code

Step 3: Set Span Attributes That Make Traces Actionable

The attributes above — endpoint_id, event_type, attempt, payload_bytes, and http.response.status_code — are what let you filter traces to “attempt > 1 deliveries to endpoint X that returned 5xx.” Add a span event for each retry decision so the backoff schedule is visible inline. Avoid putting the full payload or any secret on the span; record a payload hash instead.

Step 4: Extract Context and Start a Consumer Span

On the consumer, extract the context from request headers before starting your handler span. This is the join that makes the consumer span a child of the producer span.

from fastapi import FastAPI, Request
from opentelemetry import trace
from opentelemetry.propagate import extract

app = FastAPI()
tracer = trace.get_tracer("webhook.consume")

@app.post("/webhooks")
async def receive(request: Request):
    ctx = extract(dict(request.headers))  # reads traceparent into a context
    with tracer.start_as_current_span("webhook.handle", context=ctx) as span:
        body = await request.body()
        span.set_attribute("webhook.payload_bytes", len(body))
        # verify_signature(...) then process; span auto-closes on exit
        return {"status": "ok"}

Step 5: Record Errors and Close Spans

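Set the span status to error on any non-2xx outcome or exception, and call record_exception so the stack trace rides on the span. Because the spans use context managers, they close automatically, and an exception that propagates out of start_as_current_span is recorded and marks the span as errored by default. An exception you catch inside the block gets neither, so never swallow one before recording it: an unrecorded error is an invisible failure.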

Verification and Testing

Run both services against a local collector and fire one event, then assert the trace joined correctly. A focused integration test extracts the context the producer would send and confirms the trace ID matches:

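from opentelemetry import trace
from opentelemetry.propagate import inject, extract
from opentelemetry.sdk.trace import TracerProvider

# The default no-op provider produces invalid span contexts, and inject()
# writes nothing for those, so the test uses a real SDK provider.
provider = TracerProvider()

def test_traceparent_round_trips():
    tracer = provider.get_tracer("test")
    with tracer.start_as_current_span("producer") as producer:
        headers = {}
        inject(headers)  # uses the active span set by the context manager
        assert "traceparent" in headers
        producer_trace_id = producer.get_span_context().trace_id

    ctx = extract(headers)
    # The extracted span context carries the producer's trace id.
    span_ctx = trace.get_current_span(ctx).get_span_context()
    assert span_ctx.trace_id == producer_trace_id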

You can also verify on the wire with curl and confirm that your handler logs the same trace ID it received:

curl -X POST http://localhost:8000/webhooks \
  -H 'Content-Type: application/json' \
  -H 'traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01' \
  -d '{"type":"payment.succeeded"}'

Failure Modes and Gotchas