A classic integration bug: the Finance module issues an invoice and sends a webhook to the Logistics module ("invoice paid, you can start shipping"), but webhook delivery fails — network blip, Logistics deploy in progress, whatever. Finance retries. Two packages ship. The customer gets two invoices. The complaint lands on the CTO's desk. In this article: the way out — outbox table + idempotency key + retry pattern, as we use it on Netorigo between Finance and Logistics.
Why not direct HTTP
The naive solution: Finance sends an HTTP POST directly to Logistics. On error, retry. But two problems:
- Transactional consistency is missing. Finance sends the webhook inside its DB transaction — if the transaction rolls back (because of some other error), the webhook has already gone out and Logistics kicks off processing for a non-existent invoice.
- Retries are not rolling. If Logistics is unreachable for 30 seconds, all of Finance's billing actions stall for that duration.
Next step: outbox table.
The outbox table
The Finance module records every event (e.g. InvoicePaidEvent) NOT via direct HTTP but with an INSERT into a finance_outbox table — IN THE SAME TRANSACTION as the one updating invoices.status = 'paid'. One SQL transaction, two INSERT/UPDATE: if anything rolls back, both roll back.
The table:
CREATE TABLE finance_outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
event_type TEXT NOT NULL, -- 'invoice.paid', 'invoice.refunded', etc.
payload JSONB NOT NULL,
idempotency_key TEXT NOT NULL UNIQUE,
delivery_status TEXT NOT NULL DEFAULT 'pending', -- pending | delivered | failed
delivery_attempts INT NOT NULL DEFAULT 0,
last_attempt_at TIMESTAMPTZ,
next_attempt_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
delivered_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX ON finance_outbox (delivery_status, next_attempt_at);
The idempotency_key is the stable identifier of the event — e.g. invoice.paid:<invoice_id>:<payment_id>. If Finance tries to send two paid events for the same (invoice_id, payment_id) pair, the UNIQUE constraint prevents it.
The delivery worker
A separate NestJS scheduler @Cron('*/5 * * * * *') (every 5 seconds) reads the outbox:
SELECT * FROM finance_outbox
WHERE delivery_status = 'pending'
AND next_attempt_at <= NOW()
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;
FOR UPDATE SKIP LOCKED ensures two worker instances don't try to send the same event. For each row the worker resolves the target webhook URL (https://logistics.netorigo.svc/webhooks/finance), POSTs the payload with X-Idempotency-Key header.
On 2xx: delivery_status = 'delivered', delivered_at = NOW(). On 5xx or timeout: delivery_attempts += 1, next_attempt_at = NOW() + exp_backoff(attempts) (5s, 30s, 5min, 30min, 4h). After 8 attempts: delivery_status = 'failed', alert on a Slack channel.
The consumer side: using the idempotency key
The Logistics module's webhook controller does this:
@Post('/webhooks/finance')
async handleFinanceWebhook(@Headers('x-idempotency-key') key: string, @Body() body: any) {
const existing = await this.processedEventsRepo.findOne({ idempotency_key: key });
if (existing) {
// Already processed, respond 200
return { status: 'ok', already_processed: true };
}
await this.processedEventsRepo.insert({
idempotency_key: key,
received_at: new Date(),
payload: body,
});
// Now process it
await this.shipmentService.handleInvoicePaid(body);
return { status: 'ok' };
}
logistics_processed_events.idempotency_key is UNIQUE — if the Finance worker sends the event twice (because the first 200 response was lost on timeout), the second attempt finds the row via processedEventsRepo.findOne and Logistics doesn't start shipping a second time.
This is the exactly-once at-the-consumer pattern: Finance sends at-least-once, Logistics processes exactly-once.
Cleanup
finance_outbox rows with delivered = true are archived (finance_outbox_archive) after 30 days, deleted after 1 year. logistics_processed_events rows are deleted after 90 days — by then the Finance worker won't be sending retries any more.
The full round trip
Finance writes invoice.paid → finance_outbox INSERT (same TX) → cron worker picks up the row → HTTP POST to Logistics with X-Idempotency-Key → Logistics processed_events check + INSERT (same TX) → shipment-state changes to waiting → 200 OK → Finance marks outbox row delivered.
If anything stalls — Logistics is offline, network is down, Finance restarts, Logistics restarts — the flow resumes on the next cron iteration with an exactly-once guarantee.
Closing
The webhook + outbox + idempotency-key pattern is the most reliable integration foundation for modular ERPs. A 30-row table + a 100-line scheduler + a 20-line consumer guard solve a problem that produces weekly support tickets in classic integrations. 18 months of production: zero duplicate invoices, zero lost shipments — the pattern works.