Backend · 2026-W20 · 4 min read · by Shard

Idempotency in Scheduled Jobs — The Restart Problem

Scheduled jobs look sequential: one timer, one window, one run. They aren't. Pods restart, schedulers replay missed runs, and you get duplicate rows unless you guard at both the application and database layers.

The problem

You have an hourly job that computes a Merkle digest over the past hour's events and stores the result. The scheduler fires at :05. It runs, inserts a row, delivers to webhooks, commits. Done.

Then the process restarts at :06. APScheduler fires the missed job on startup. The job runs again for the same window. Now you have two digest rows for the same (org_id, window_start). Customers querying the digest endpoint see duplicate roots. Your tamper-evidence proof is ambiguous about which one is authoritative.

This isn't rare. Schedulers restart, Kubernetes evicts pods, and processes crash mid-run, so the retry has no way to know how far the first attempt got. The second run needs to be safe.
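A rerun can only be recognized as a rerun if it lands on the same (org_id, window_start) key as the original. A minimal sketch, assuming the job derives its window by flooring its scheduled fire time to the hour; `previous_hour_window` is a hypothetical helper, not part of APScheduler:

```python
from datetime import datetime, timedelta, timezone


def previous_hour_window(scheduled_for: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) hour window covered by a run scheduled at `scheduled_for`.

    Hypothetical helper: flooring to the hour means a replay shortly after the
    original run resolves to the same window, and therefore the same key.
    """
    end = scheduled_for.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end


first_run = datetime(2026, 5, 11, 10, 5, tzinfo=timezone.utc)  # normal :05 run
replayed = datetime(2026, 5, 11, 10, 6, tzinfo=timezone.utc)   # replay after restart at :06

assert previous_hour_window(first_run) == previous_hour_window(replayed)
print(previous_hour_window(first_run)[0].isoformat())  # 2026-05-11T09:00:00+00:00
```

Both the :05 run and the :06 replay resolve to a 09:00 window_start. If the job used something like "now minus one hour" as its window start, the replay would produce 09:06 rather than 09:05 and slip past any unique key on (org_id, window_start). Passing the originally scheduled time instead of the wall clock also keeps a long-delayed replay on the right window.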

The approach

Two layers. First, an application-level guard:

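from sqlalchemy import and_, select

# Runs inside the async job; assumes `db` is an AsyncSession.
# Await execute() first, then read the result (calling a method on the
# un-awaited coroutine would raise AttributeError).
result = await db.execute(
    select(OrgMerkleDigest).where(
        and_(OrgMerkleDigest.organization_id == org_id,
             OrgMerkleDigest.window_start == window_start)
    )
)
existing = result.scalar_one_or_none()
if existing is not None:
    return  # already processed this window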

Second, a database-level constraint:

UNIQUE (organization_id, window_start)
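In a real schema this clause sits in the table definition or a migration. The sketch below uses Python's built-in `sqlite3` in place of the article's async SQLAlchemy session, with a hypothetical `org_merkle_digests` table and `root` column, to show the database refusing a second row for the same window:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the UNIQUE clause is the part that matters.
conn.execute(
    """
    CREATE TABLE org_merkle_digests (
        id INTEGER PRIMARY KEY,
        organization_id TEXT NOT NULL,
        window_start TEXT NOT NULL,   -- ISO-8601 UTC hour, e.g. 2026-05-11T09:00:00Z
        root TEXT NOT NULL,
        UNIQUE (organization_id, window_start)
    )
    """
)

INSERT_SQL = (
    "INSERT INTO org_merkle_digests (organization_id, window_start, root) "
    "VALUES (?, ?, ?)"
)
row = ("org_1", "2026-05-11T09:00:00Z", "ab12")
conn.execute(INSERT_SQL, row)
conn.commit()

try:
    conn.execute(INSERT_SQL, row)  # same (organization_id, window_start)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the UNIQUE constraint fired

count = conn.execute("SELECT COUNT(*) FROM org_merkle_digests").fetchone()[0]
print(duplicate_rejected, count)  # True 1
```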

The application guard handles the common case fast and without a DB error. The unique constraint handles races — if two instances of the job run concurrently (split-brain scenario, or a bug in the guard), exactly one insert succeeds and the other gets a constraint violation, which the caller can treat as "already done."

Both layers are needed. The application guard alone fails under concurrent execution. The constraint alone turns normal restart behavior into an error you have to swallow. Together they make the job correct by default.
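Putting the two layers into one function might look like the following. `store_digest` is a hypothetical name, and `sqlite3` again stands in for the article's SQLAlchemy setup. The ordinary restart replay takes the guard path and never raises; only a true race reaches the constraint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema mirroring UNIQUE (organization_id, window_start).
conn.execute(
    "CREATE TABLE org_merkle_digests ("
    " organization_id TEXT NOT NULL, window_start TEXT NOT NULL, root TEXT NOT NULL,"
    " UNIQUE (organization_id, window_start))"
)


def store_digest(conn: sqlite3.Connection, org_id: str, window_start: str, root: str) -> str:
    """Both layers: cheap existence check first, UNIQUE constraint as the backstop."""
    # Layer 1: the application guard absorbs a normal replay without any error.
    exists = conn.execute(
        "SELECT 1 FROM org_merkle_digests WHERE organization_id = ? AND window_start = ?",
        (org_id, window_start),
    ).fetchone()
    if exists:
        return "skipped"
    # Layer 2: if a concurrent run slipped past the guard, the constraint rejects this insert.
    try:
        with conn:
            conn.execute(
                "INSERT INTO org_merkle_digests VALUES (?, ?, ?)",
                (org_id, window_start, root),
            )
    except sqlite3.IntegrityError:
        return "skipped (constraint)"
    return "inserted"


first = store_digest(conn, "org_1", "2026-05-11T09:00:00Z", "ab12")
replay = store_digest(conn, "org_1", "2026-05-11T09:00:00Z", "ab12")
rows = conn.execute("SELECT COUNT(*) FROM org_merkle_digests").fetchone()[0]
print(first, replay, rows)  # inserted skipped 1
```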

What I learned

Scheduled jobs are the easiest place to forget idempotency because they look sequential. One job, one timer, one run per window. But "sequential in the happy path" isn't "safe under restarts." The question to ask before shipping any scheduled job: "if this job runs twice for the same inputs, is the second run a no-op or a corruption?"
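That question can be asked mechanically in a test: run the job twice for the same inputs and check that the second run left the state unchanged. A minimal, framework-free sketch; `is_noop_on_rerun` and both toy jobs are hypothetical, standing in for the real digest job and a query over its table:

```python
from typing import Callable


def is_noop_on_rerun(job: Callable[[], None], snapshot: Callable[[], object]) -> bool:
    """Run `job` twice for the same inputs; True if the second run changed nothing."""
    job()
    after_first = snapshot()
    job()
    return snapshot() == after_first


# Not idempotent: appends a row on every run, like an unguarded insert.
appended: list[tuple] = []


def append_job() -> None:
    appended.append(("org_1", "2026-05-11T09:00:00Z", "ab12"))


# Idempotent: keyed by (org_id, window_start) and keeps the first value on a rerun.
keyed: dict[tuple, str] = {}


def keyed_job() -> None:
    keyed.setdefault(("org_1", "2026-05-11T09:00:00Z"), "ab12")


append_ok = is_noop_on_rerun(append_job, lambda: list(appended))
keyed_ok = is_noop_on_rerun(keyed_job, lambda: dict(keyed))
print(append_ok)  # False
print(keyed_ok)   # True
```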

If the answer is "corruption" (duplicated rows, doubled charges, duplicate deliveries), add the guard and the constraint before you ship. It's one SELECT, an early return, and a UNIQUE clause. Adding them after a production incident costs far more.
