
Delivery Recovery

This guide is for teams responsible for keeping a live integration healthy.

It explains how to reason about delivery state, dead letters, and replay without turning recovery into guesswork.

What this page is for

Use this page when you need to answer questions like:

Is this an authentication problem, a missing grant, or a delivery problem?
Are deliveries stuck, retrying, or dead-lettered?
Should I replay one item or recover a larger batch?
Is delivery draining normally?

Start with diagnosis, not replay

The first step is always to inspect the app’s operational state.

That inspection should tell you whether the primary problem is:

invalid app auth
missing delegated authority
delivery backlog
dead-letter accumulation
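A minimal sketch of this triage in Python follows. It assumes a hypothetical snapshot shape: `OperationalSnapshot`, its fields, and the `BACKLOG_AGE_LIMIT_S` threshold are illustrative and are not a documented API. The function checks the four causes in the order listed and reports the first one that applies. That ordering is itself an assumption: a broken credential or missing grant can also cause deliveries to pile up, so it seems sensible to rule those out first.

```python
from dataclasses import dataclass, field


@dataclass
class OperationalSnapshot:
    # Hypothetical fields; map them to whatever your platform's status view exposes.
    app_auth_valid: bool
    missing_grants: list[str] = field(default_factory=list)
    queued_count: int = 0
    oldest_queued_age_s: float = 0.0
    dead_letter_count: int = 0


# Assumed threshold: a queued delivery older than this counts as backlog.
BACKLOG_AGE_LIMIT_S = 300.0


def primary_problem(snap: OperationalSnapshot) -> str:
    """Return the first problem that applies, checked in the order listed."""
    if not snap.app_auth_valid:
        return "invalid app auth"
    if snap.missing_grants:
        return "missing delegated authority"
    if snap.queued_count > 0 and snap.oldest_queued_age_s > BACKLOG_AGE_LIMIT_S:
        return "delivery backlog"
    if snap.dead_letter_count > 0:
        return "dead-letter accumulation"
    return "no problem detected"


snap = OperationalSnapshot(app_auth_valid=True, dead_letter_count=12)
print(primary_problem(snap))  # dead-letter accumulation
```

This sketch reports only the primary problem. A real snapshot can show several at once, and in that case the other problems are worth recording too.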

When to replay a single delivery

Replay one delivery when:

the underlying bug is fixed
the failure was isolated
you know exactly which delivery should be retried

This is the safest recovery path because it limits duplicate downstream work.
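The three conditions can be expressed as a guard in front of the replay call. The following is a sketch only. `replay` is a hypothetical hook standing in for whatever re-send operation your platform provides. "Isolated" is treated here as exactly one affected delivery, which is an assumption you may want to loosen.

```python
from typing import Callable, Optional


def replay_single(
    delivery_id: Optional[str],
    fix_deployed: bool,
    affected_count: int,
    replay: Callable[[str], None],
) -> bool:
    """Replay exactly one delivery, and only when every condition holds."""
    if not fix_deployed:
        return False  # the underlying bug is not fixed yet
    if affected_count != 1:
        return False  # not isolated: consider batch recovery instead
    if not delivery_id:
        return False  # you do not know exactly which delivery to retry
    replay(delivery_id)  # hypothetical re-send hook
    return True


sent: list[str] = []
print(replay_single("dlv_123", fix_deployed=True, affected_count=1, replay=sent.append))  # True
print(replay_single("dlv_456", fix_deployed=True, affected_count=8, replay=sent.append))  # False
print(sent)  # ['dlv_123']
```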

When to replay dead letters in batch

Replay a batch only when the failure was systemic, for example:

the receiver was down
signature validation was broken
a parser bug affected many deliveries

Batch replay is a recovery tool, not a first diagnostic step.
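Even for a systemic failure, it helps to scope the batch to the cause you actually fixed. The sketch below keeps only dead letters that share that failure reason and failed inside the incident window, so unrelated failures are not swept into the replay. `DeadLetter`, its fields, and the reason strings are hypothetical; windowing by failure time is an assumed strategy, not a documented feature.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DeadLetter:
    # Hypothetical record shape; real dead-letter records may differ.
    delivery_id: str
    failure_reason: str
    failed_at: datetime


def select_batch(
    dead_letters: list[DeadLetter],
    systemic_reason: str,
    window_start: datetime,
    window_end: datetime,
) -> list[str]:
    """Return IDs of dead letters with the given cause that failed inside the window."""
    return [
        dl.delivery_id
        for dl in dead_letters
        if dl.failure_reason == systemic_reason
        and window_start <= dl.failed_at <= window_end
    ]


def at(hour: int) -> datetime:
    # Helper for the example data: a UTC timestamp on a fixed day.
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


letters = [
    DeadLetter("d1", "receiver_unavailable", at(10)),
    DeadLetter("d2", "receiver_unavailable", at(11)),
    DeadLetter("d3", "parse_error", at(11)),  # different cause: excluded
    DeadLetter("d4", "receiver_unavailable", at(15)),  # outside the outage window: excluded
]
print(select_batch(letters, "receiver_unavailable", at(9), at(12)))  # ['d1', 'd2']
```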

When to inspect delivery health

Inspect delivery health when you see:

growing retry counts
a rising dead-letter count
queued deliveries aging instead of draining
inconsistent downstream state after valid writes

Those are delivery signals, not application-state signals.
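The first three signals are trends, so one snapshot is not enough to see them. You need two snapshots taken some time apart. The sketch below compares two samples of hypothetical counters; `DeliveryHealth` and its fields are assumptions, not a documented API. The fourth signal, inconsistent downstream state, requires checking against your own records and is not covered here.

```python
from dataclasses import dataclass


@dataclass
class DeliveryHealth:
    # Hypothetical counters sampled from the same integration at two points in time.
    total_retries: int
    dead_letter_count: int
    queued_count: int
    oldest_queued_age_s: float


def delivery_signals(
    earlier: DeliveryHealth,
    later: DeliveryHealth,
    elapsed_s: float,
    tolerance_s: float = 1.0,
) -> list[str]:
    """Return the delivery signals visible between two samples taken elapsed_s apart."""
    signals = []
    if later.total_retries > earlier.total_retries:
        signals.append("growing retry counts")
    if later.dead_letter_count > earlier.dead_letter_count:
        signals.append("rising dead-letter count")
    # If the oldest queued item aged by (about) the whole interval, it is still the
    # same item at the head of the queue: nothing drained it.
    if later.queued_count > 0 and (
        later.oldest_queued_age_s + tolerance_s >= earlier.oldest_queued_age_s + elapsed_s
    ):
        signals.append("queued deliveries aging instead of draining")
    return signals


before = DeliveryHealth(total_retries=40, dead_letter_count=3, queued_count=120, oldest_queued_age_s=600.0)
after = DeliveryHealth(total_retries=55, dead_letter_count=3, queued_count=180, oldest_queued_age_s=900.0)
print(delivery_signals(before, after, elapsed_s=300.0))
# ['growing retry counts', 'queued deliveries aging instead of draining']
```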

Practical recovery order

1. Inspect the operational snapshot.
2. Decide whether the issue is auth, grants, or delivery.
3. Fix the underlying cause.
4. Replay one delivery if the failure was isolated.
5. Replay dead letters in batch only when the failure was systemic.
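The same order can be sketched as a function that returns the next step. The main point is that no replay branch is reachable until the cause has been diagnosed and fixed. `Cause` and `next_recovery_step` are illustrative names, not part of any documented API.

```python
from enum import Enum
from typing import Optional


class Cause(Enum):
    AUTH = "auth"
    GRANTS = "grants"
    DELIVERY = "delivery"


def next_recovery_step(cause: Optional[Cause], fixed: bool, isolated: bool) -> str:
    """Return the next recovery action; replay is only reachable after a fix."""
    if cause is None:
        # No diagnosis yet: start from the operational snapshot.
        return "Inspect the operational snapshot and decide: auth, grants, or delivery."
    if not fixed:
        # Never replay into a cause that is still present.
        return f"Fix the underlying {cause.value} problem before replaying anything."
    if isolated:
        return "Replay the one affected delivery."
    return "Replay dead letters in batch."


print(next_recovery_step(Cause.DELIVERY, fixed=True, isolated=False))  # Replay dead letters in batch.
```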
