Concurrency · 2026-W18 · 4 min read

Offloading Slow Background Work from an Async Web Server

A blocking 30-second LLM call inside an async route ties up the event loop and starves every other request. `ThreadPoolExecutor` + `run_in_executor` is the right tool, and the executor must be sized to the downstream resource's real concurrency ceiling, not to throughput.


The problem

A FastAPI endpoint receives a request, saves some data, and needs to kick off a slow background operation (in this case, an LLM call that takes 30–120 seconds). The response must return immediately; the slow work can happen whenever. The naive solution, asyncio.create_task(slow_coroutine()), only works if the slow work is a genuinely non-blocking coroutine. If it runs a subprocess synchronously or blocks on I/O, declaring it async def does not help: it still runs on the event loop thread and stalls every other request until it finishes.
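To make the failure mode concrete, here is a minimal standard-library sketch. It assumes time.sleep is a fair stand-in for a blocking subprocess or synchronous HTTP call, and it uses a hypothetical heartbeat task in place of "every other request". The longest gap between heartbeat ticks shows how long the loop was frozen.

```python
import asyncio
import time


async def heartbeat(stamps: list[float], stop: asyncio.Event) -> None:
    # Stands in for other requests: records a timestamp roughly every 50 ms.
    while not stop.is_set():
        stamps.append(time.monotonic())
        await asyncio.sleep(0.05)


async def blocking_job() -> None:
    # An async def that calls something synchronous; time.sleep stands in
    # for a blocking subprocess call or synchronous I/O.
    time.sleep(0.5)


async def main() -> float:
    stamps: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(stamps, stop))
    await asyncio.sleep(0.1)                   # let the heartbeat start ticking
    job = asyncio.create_task(blocking_job())  # the naive fire-and-forget
    await asyncio.sleep(0.7)
    stop.set()
    await hb
    await job
    # Largest gap between heartbeat ticks = how long the loop was frozen.
    return max(b - a for a, b in zip(stamps, stamps[1:]))


if __name__ == "__main__":
    print(f"longest stall: {asyncio.run(main()):.2f}s")
```

Because the blocking call runs on the loop thread, no other task can run until it returns, so the stall is at least as long as the sleep.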

The approach

ThreadPoolExecutor + loop.run_in_executor is the right tool: submit a regular synchronous function to a thread pool, get back a future, don't await it.

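```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi.responses import JSONResponse

# Module-level pool: at most 2 background jobs run at the same time.
_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="bg-worker")

async def my_endpoint(request):
    save_to_db(data)  # quick synchronous write (placeholder)
    loop = asyncio.get_running_loop()
    # Hand the blocking function to the pool; the returned future is not awaited.
    loop.run_in_executor(_executor, slow_sync_function, arg1, arg2)
    return JSONResponse({"ok": True})
```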

Two details that matter:

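Use asyncio.get_running_loop(), not asyncio.get_event_loop(). Inside an async function there is always a running loop, and get_running_loop() returns it directly; called outside one, it raises RuntimeError instead of guessing. get_event_loop() would also return the running loop here, but its fallback behavior when no loop is running has been deprecated since Python 3.10 (it emits a DeprecationWarning), and Python 3.14 raises RuntimeError when there is no current loop. The asyncio docs recommend get_running_loop() in coroutines and callbacks.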
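A quick sketch of the behavior described above; the two helper functions are hypothetical and exist only to show where get_running_loop() succeeds and where it fails.

```python
import asyncio


def loop_outside_coroutine() -> str:
    # No loop is running here, so get_running_loop() fails loudly
    # instead of guessing or creating a loop.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return "no running loop"
    return "unexpected"


async def loop_inside_coroutine() -> bool:
    # Inside a coroutine it returns the loop that is executing this code.
    loop = asyncio.get_running_loop()
    return loop.is_running()


print(loop_outside_coroutine())              # no running loop
print(asyncio.run(loop_inside_coroutine()))  # True
```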

Size the executor for concurrency, not throughput. max_workers=2 for an LLM background call means at most 2 simultaneous LLM subprocesses; further submissions wait in the executor's internal queue until a worker frees up. A larger pool would not make the calls finish sooner: it would just push more simultaneous invocations at the LLM, and beyond 2–3 concurrent calls that is usually counterproductive. Match the pool size to the real concurrency ceiling of the downstream resource.
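A small sketch of the cap in action, assuming a hypothetical fake_llm_call as a stand-in for one LLM invocation: six jobs are submitted at once to a two-worker pool, and a lock-protected counter records the peak number running simultaneously. The jobs are awaited with asyncio.gather only so the demo can inspect the results; the endpoint itself still fires and forgets.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="bg-worker")
_lock = threading.Lock()
_active = 0
peak = 0


def fake_llm_call(i: int) -> int:
    # Hypothetical stand-in for one LLM invocation; tracks how many run at once.
    global _active, peak
    with _lock:
        _active += 1
        peak = max(peak, _active)
    time.sleep(0.1)
    with _lock:
        _active -= 1
    return i


async def main() -> list[int]:
    loop = asyncio.get_running_loop()
    # Six jobs submitted at once: two run, the other four wait in the
    # executor's internal queue until a worker frees up.
    futures = [loop.run_in_executor(_executor, fake_llm_call, i) for i in range(6)]
    return await asyncio.gather(*futures)  # awaited here only to observe results


results = asyncio.run(main())
print(results, "peak concurrency:", peak)
```

Raising max_workers would raise the peak, and every extra concurrent call would land on the LLM rather than waiting in the pool's queue.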

The future returned by run_in_executor is discarded here intentionally: this is a fire-and-forget pattern. Since the sync function catches its own exceptions and writes results directly to a shared store (Redis), the discard is safe. If you needed error propagation, you would await the future or attach a done callback with add_done_callback().
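If you do want failures reported without awaiting, a done callback is one option. This is a minimal sketch; flaky_job, _report, and the failures list are hypothetical, and the asyncio.sleep at the end exists only so the demo's loop stays alive until the job finishes.

```python
import asyncio
import logging
import time
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("bg")
_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="bg-worker")
failures: list[BaseException] = []


def flaky_job() -> None:
    # Hypothetical job that fails without handling its own error.
    time.sleep(0.05)
    raise ValueError("LLM call failed")


def _report(fut: asyncio.Future) -> None:
    # Runs on the event loop thread once the job finishes. Calling
    # fut.exception() marks the exception as retrieved, so asyncio will not
    # later log "Future exception was never retrieved".
    if fut.cancelled():
        return
    exc = fut.exception()
    if exc is not None:
        failures.append(exc)
        log.error("background job failed", exc_info=exc)


async def fire_and_report() -> None:
    loop = asyncio.get_running_loop()
    fut = loop.run_in_executor(_executor, flaky_job)
    fut.add_done_callback(_report)  # still not awaited
    await asyncio.sleep(0.3)        # demo only: keep the loop alive until the job ends


asyncio.run(fire_and_report())
print(failures)
```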

What I learned

The run_in_executor fire-and-forget pattern is clean and easy to reason about, but it requires the sync function to be completely self-contained: catch your own exceptions, write results to a shared store, log failures internally. If an exception escapes into a discarded future, nobody sees it at the point of failure; at best it surfaces later as a "Future exception was never retrieved" message from the asyncio exception handler when the future is garbage-collected, with no link back to the request that started the job. The discipline of "if you fire and forget, handle your own errors" makes this pattern composable without surprises.
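One way to enforce that discipline is a small wrapper that every background job runs through. This is a sketch under assumptions, not the author's code: self_contained is a hypothetical helper, and a plain dict stands in for Redis. It records a status for every job and logs failures, so no ordinary exception ever reaches the discarded future.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("bg")
store: dict[str, dict[str, Any]] = {}  # stand-in for the shared store (e.g. Redis)


def self_contained(job_id: str, fn: Callable[..., Any], *args: Any) -> None:
    """Run fn(*args) so that no ordinary exception escapes into the future."""
    store[job_id] = {"status": "running"}
    try:
        result = fn(*args)
    except Exception as exc:  # catch any ordinary exception the job raises
        log.exception("background job %s failed", job_id)
        store[job_id] = {"status": "error", "error": repr(exc)}
    else:
        store[job_id] = {"status": "done", "result": result}


# Called directly here for the demo; in the endpoint it would be the
# function handed to run_in_executor.
self_contained("ok", lambda x: x * 2, 21)
self_contained("bad", lambda: 1 / 0)
print(store["ok"]["status"], store["bad"]["status"])  # done error
```

In the endpoint it would be submitted as loop.run_in_executor(_executor, self_contained, job_id, slow_sync_function, arg1, arg2), where job_id is a hypothetical key the client can later poll.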