
Three lessons from a Stripe webhook idempotency bug we shipped

January 9, 2026 · 7 min read · By ErrorLens Team

We had an idempotency check that wasn’t actually idempotent. The result: duplicate plan upgrades, two customers refunded, an hour of incident time, and a few engineering principles we’re re-internalising.

In December, a Stripe webhook handler we’d shipped weeks earlier produced duplicate plan upgrades for two customers. We caught it in monitoring within an hour, refunded the duplicates, and rolled out a fix the same evening. The bug was small; the lessons are general enough to be worth sharing.

What we shipped

The handler looked, in shorthand, like this:

POST /api/billing/webhook
  verify Stripe signature
  if event.id already in stripe_events table: return 200 (duplicate)
  process the event (e.g. update user.plan)
  insert event.id into stripe_events
  return 200

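The intent was idempotency. Stripe retries a delivery with exponential backoff (for up to three days in live mode) whenever you don’t respond with a 2xx, and its docs warn that an endpoint can occasionally receive the same event more than once, so handling a duplicate has to be a non-event. The check at the top, “is this event already in our table?”, was supposed to catch redeliveries.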

Why it didn’t work

The problem: Stripe sometimes redelivers an event within milliseconds of the original delivery. If two deliveries arrive nearly simultaneously and our handler processes them in parallel (which Vercel’s serverless functions readily do), both pass the “already in table” check before either has finished inserting. Both then run the upgrade. Both insert the event id (with ON CONFLICT DO NOTHING, so the second insert is a no-op). Each affected customer’s plan was upgraded twice, and each was billed twice for the proration.

The check was idempotent in the sense of “eventually the row exists”. It was not idempotent in the sense of “the side effect runs at most once”. Those are different properties.
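To make the interleaving concrete, here is a small TypeScript simulation of the original check-then-act flow. It is a sketch under stated assumptions, not our production code: `FakeEventsTable` is a hypothetical in-memory stand-in for `stripe_events`, `sleep` mimics database round trips, and `evt_123` is a made-up event id. Two deliveries of the same event run concurrently, and both get past the check.

```typescript
// Simulated network/database round trip; the 5 ms delay is arbitrary.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical in-memory stand-in for the stripe_events table. Each method
// awaits before touching the Set; that await is what lets two concurrent
// requests interleave.
class FakeEventsTable {
  private ids = new Set<string>();

  async has(eventId: string): Promise<boolean> {
    await sleep(5);
    return this.ids.has(eventId);
  }

  // Mirrors INSERT ... ON CONFLICT DO NOTHING: re-inserting an id is a no-op.
  async insertIgnoringConflict(eventId: string): Promise<void> {
    await sleep(5);
    this.ids.add(eventId);
  }
}

// Runs the original check-then-act handler twice, concurrently, for the same
// event id and returns how many times the side effect (the upgrade) ran.
async function simulateCheckThenActRace(): Promise<number> {
  const table = new FakeEventsTable();
  let upgrades = 0;

  const handle = async (eventId: string): Promise<void> => {
    if (await table.has(eventId)) return;        // check: "already in table?"
    await sleep(5);                              // act: simulated plan upgrade
    upgrades += 1;
    await table.insertIgnoringConflict(eventId); // record the id last
  };

  // Two deliveries of the same event, processed in parallel.
  await Promise.all([handle("evt_123"), handle("evt_123")]);
  return upgrades;
}

simulateCheckThenActRace().then((n) => console.log(`upgrade ran ${n} times`));
// prints "upgrade ran 2 times"
```

Every `await` is a point where the other request can run; the gap between the `has` check and the final insert is the check-then-act window.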

The fix

Move the INSERT ... ON CONFLICT DO NOTHING to the top of the handler, right after signature verification, and gate the side effect on whether the insert returned a row:

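const inserted = await query(
  `INSERT INTO stripe_events (event_id, event_type)
   VALUES ($1, $2)
   ON CONFLICT (event_id) DO NOTHING
   RETURNING event_id`,
  [event.id, event.type]
);
if (inserted.rows.length === 0) {
  // Duplicate delivery: another request already inserted this event id.
  // Ack with a 2xx so Stripe stops retrying, and skip the side effect.
  return NextResponse.json({ received: true, duplicate: true });
}
// We are guaranteed to be the only handler that ever runs the side effect
// for this event id, because the database accepted exactly one INSERT.
// Await it: on a serverless platform, work still running after the
// response is returned may be suspended before it completes.
await processEvent(event);
return NextResponse.json({ received: true });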

The crucial change is using the database’s atomic INSERT ... ON CONFLICT as a mutex. Whichever request wins the insert gets to run the side effect; the loser gets rows.length === 0 and bails. There’s no “check, then act” window any more.
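The same kind of simulation with the claim moved to the front shows the difference. Again a hypothetical sketch rather than our handler: `insertIfAbsent` stands in for `INSERT ... ON CONFLICT DO NOTHING RETURNING event_id`, and its existence check and insert run in one synchronous step, which is the atomicity the unique index on `event_id` provides in Postgres.

```typescript
// Simulated round trip; the delay is arbitrary.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical in-memory stand-in for the stripe_events table. After the
// simulated round trip, the check and the insert run without any await in
// between, so no other request can interleave with them.
class FakeEventsTable {
  private ids = new Set<string>();

  // true  -> this call inserted the row (RETURNING produced a row)
  // false -> the id already existed (RETURNING produced nothing)
  async insertIfAbsent(eventId: string): Promise<boolean> {
    await sleep(5);
    if (this.ids.has(eventId)) return false;
    this.ids.add(eventId);
    return true;
  }
}

async function simulateClaimFirst(): Promise<{ upgrades: number; duplicates: number }> {
  const table = new FakeEventsTable();
  let upgrades = 0;
  let duplicates = 0;

  const handle = async (eventId: string): Promise<void> => {
    const claimed = await table.insertIfAbsent(eventId); // claim first
    if (!claimed) {
      duplicates += 1; // lost the race: ack and bail
      return;
    }
    await sleep(5); // only the winner runs the simulated plan upgrade
    upgrades += 1;
  };

  // Two deliveries of the same event, processed in parallel.
  await Promise.all([handle("evt_123"), handle("evt_123")]);
  return { upgrades, duplicates };
}

simulateClaimFirst().then((r) => console.log(r));
// prints { upgrades: 1, duplicates: 1 }
```

One trade-off worth weighing: because the claim is recorded before the side effect runs, a failure inside processEvent leaves the event id in the table, and Stripe’s retry will then be acked as a duplicate. When the side effect is itself a write to the same database (such as updating user.plan), doing the INSERT and the update in one transaction avoids that: in Postgres, a concurrent INSERT ... ON CONFLICT on the same key waits for the first transaction and only inserts if that transaction rolls back.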

Lesson 1: idempotency means “exactly once side effect”, not “eventually consistent state”

The original code looked idempotent. Read it linearly and it does the right thing. The bug was only visible if you imagined two copies running in parallel and asked “what happens if both pass the check before either inserts?”.

That kind of reasoning (“what if this code runs concurrently with itself?”) needs to be applied to every webhook, every cron job, and every fire-and-forget endpoint. We added it to our PR template.
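One way to turn that review question into a test is a small harness that fires many concurrent copies of a handler and counts side effects. This is a sketch, not any test framework’s API: `countSideEffects`, the two handler factories, and the fixed event id `evt_1` are hypothetical, and the `sleep` calls stand in for real I/O.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

type Handler = (recordSideEffect: () => void) => Promise<void>;

// Invokes `handler` `copies` times concurrently and counts how often the side
// effect ran. More than 1 means the handler is not safe to run concurrently
// with itself.
async function countSideEffects(handler: Handler, copies = 10): Promise<number> {
  let count = 0;
  const record = () => {
    count += 1;
  };
  await Promise.all(Array.from({ length: copies }, () => handler(record)));
  return count;
}

// Check-then-act: the await between the check and the act is the race window.
function makeCheckThenAct(): Handler {
  const seen = new Set<string>();
  return async (recordSideEffect) => {
    if (seen.has("evt_1")) return;
    await sleep(1);
    recordSideEffect();
    seen.add("evt_1");
  };
}

// Claim-then-act: the check and the claim happen in one synchronous step.
function makeClaimThenAct(): Handler {
  const seen = new Set<string>();
  return async (recordSideEffect) => {
    await sleep(1);
    if (seen.has("evt_1")) return;
    seen.add("evt_1");
    await sleep(1);
    recordSideEffect();
  };
}

countSideEffects(makeCheckThenAct()).then((n) => console.log("check-then-act:", n));
// logs "check-then-act: 10"
countSideEffects(makeClaimThenAct()).then((n) => console.log("claim-then-act:", n));
// logs "claim-then-act: 1"
```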

Lesson 2: the database is the only correct mutex

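It’s tempting to reach for an in-memory lock or a Redis SETNX. An in-memory lock only guards a single process, so it does nothing once two deliveries land on different serverless instances. A Redis SETNX does coordinate across instances, but it’s another system to keep in step with the data it protects, and a per-region Redis doesn’t survive a multi-region rollout. The database, with INSERT ... ON CONFLICT against a primary or unique key, gives you a mutex enforced by the same primary that stores the state: it survives restarts and any number of application instances, and a request that can’t reach the primary fails instead of running twice. And you already have it.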

Use it as the first line of defence on any operation that must run at most once.
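The in-memory lock’s failure mode is easy to see in a toy model. In this sketch, each hypothetical `Instance` represents a separate serverless instance with its own memory, and the shared `Set` stands in for a table with a unique key that every instance writes to. The same event is delivered once to each instance.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class Instance {
  private localLock = new Set<string>(); // in-memory lock: only this instance sees it
  private sharedClaims: Set<string>;     // stand-in for the shared database table

  constructor(sharedClaims: Set<string>) {
    this.sharedClaims = sharedClaims;
  }

  async withLocalLock(eventId: string, sideEffect: () => void): Promise<void> {
    if (this.localLock.has(eventId)) return;
    this.localLock.add(eventId);
    await sleep(1);
    sideEffect();
  }

  async withSharedClaim(eventId: string, sideEffect: () => void): Promise<void> {
    await sleep(1); // simulated round trip to the database
    if (this.sharedClaims.has(eventId)) return;
    this.sharedClaims.add(eventId); // atomic claim, like INSERT ... ON CONFLICT
    sideEffect();
  }
}

async function compareLocks(): Promise<{ local: number; shared: number }> {
  const sharedClaims = new Set<string>();
  const a = new Instance(sharedClaims);
  const b = new Instance(sharedClaims);
  let local = 0;
  let shared = 0;

  // The same event id reaches both instances concurrently.
  await Promise.all([
    a.withLocalLock("evt_123", () => { local += 1; }),
    b.withLocalLock("evt_123", () => { local += 1; }),
  ]);
  await Promise.all([
    a.withSharedClaim("evt_123", () => { shared += 1; }),
    b.withSharedClaim("evt_123", () => { shared += 1; }),
  ]);
  return { local, shared };
}

compareLocks().then((r) => console.log(r));
// prints { local: 2, shared: 1 }
```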

Lesson 3: replay the webhook once you’ve fixed the handler

Stripe’s dashboard has a “Resend” button on every event. After deploying the fix, we resent the original events for the affected customers and verified that the new code path identified each one as a duplicate and bailed. Then we ran the same exercise against three randomly chosen recent events to confirm we hadn’t broken the happy path.

This is mundane, but it’s the only way to know your fix actually fixed the thing. Logs tell you the new code ran; a real replay tells you the new code did the right thing.
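The dashboard replay itself can’t be automated away, but the cases it exercised (the original delivery, a replay of the same event, and a fresh event on the happy path) can be pinned down as a regression test so they stay covered. A sketch against a hypothetical in-memory version of the fixed handler: `makeFixedHandler`, `StripeEventLike`, and the `evt_`/`cus_` ids are illustrative, not Stripe SDK types.

```typescript
// Illustrative event shape; real Stripe events carry much more.
type StripeEventLike = { id: string; customer: string };
type WebhookResponse = { received: boolean; duplicate?: boolean };

// Hypothetical in-memory version of the fixed handler: a Set stands in for
// the stripe_events table, an array records which customers were upgraded.
function makeFixedHandler() {
  const claimedEventIds = new Set<string>();
  const upgradedCustomers: string[] = [];

  async function handle(event: StripeEventLike): Promise<WebhookResponse> {
    await Promise.resolve(); // stand-in for the INSERT round trip
    if (claimedEventIds.has(event.id)) {
      return { received: true, duplicate: true }; // duplicate: ack and bail
    }
    claimedEventIds.add(event.id);
    upgradedCustomers.push(event.customer); // stand-in for the plan upgrade
    return { received: true };
  }

  return { handle, upgradedCustomers };
}

async function replayExercise() {
  const { handle, upgradedCustomers } = makeFixedHandler();
  const original: StripeEventLike = { id: "evt_original", customer: "cus_A" };

  const first = await handle(original);  // original delivery
  const replay = await handle(original); // like pressing "Resend" on the same event
  const fresh = await handle({ id: "evt_new", customer: "cus_B" }); // happy path

  return { first, replay, fresh, upgradedCustomers };
}

replayExercise().then((r) => console.log(r.replay, r.upgradedCustomers));
// prints { received: true, duplicate: true } [ 'cus_A', 'cus_B' ]
```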

What we measure now

We added two alerts:

Both metrics together would have caught the original bug in minutes. We’ll know if we ship a similar shape of bug again, even if we don’t spot it in code review.

The root cause of every “ran twice” bug we’ve ever seen has been the same: a check-then-act sequence with a window in between. The fix is always the same: collapse the check and the act into a single atomic operation against shared state. The lesson keeps surfacing because the bug pattern is easy to write and hard to spot.
