Distributed Systems

Serverless Batch Processing at Scale

More Lambdas do not mean more throughput. At scale, the bottleneck is almost always somewhere else.

Serverless is a great fit for batch workloads that are bursty and embarrassingly parallel — processing invoices, documents, statements. You pay for work done, not for idle servers.

But the mental model that "more concurrent functions = more throughput" breaks quickly.

The trap

The naive design fans out one function per item and assumes it scales linearly.

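It doesn't. Past a certain point, adding concurrency makes things worse: the extra functions compete for the same downstream capacity, and the resulting throttling, timeouts, and retries add load without adding completed work.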

The bottleneck is rarely the compute

When throughput stalls, the limit is almost always downstream:

  • Database connections. Each concurrent function opens its own connection, so thousands of them exhaust the database's connection limit almost instantly. You need pooling or a proxy, not more functions.
  • Third-party APIs. ERPs, payment gateways, and CRMs have rate limits. Fan out past them and you get throttled or banned.
  • Queue backpressure. With SQS, visibility timeouts and redrive policies decide your real throughput more than function count does.
  • Rate limits everywhere else, including your LLM provider.

Lesson. Invoking more Lambdas does not equal more throughput. The bottleneck is usually outside Lambda. Tune for the slowest downstream dependency, not the function.
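
To make the lesson concrete, here is a minimal back-of-the-envelope sketch. It assumes each function handles one item at a time with a fixed latency, so the rate the functions can offer is concurrency divided by latency (Little's law), and the pipeline sustains the smaller of that rate and the downstream limit. The function name and the numbers are illustrative assumptions, not measurements from a real system.

```python
def effective_throughput(concurrency: int,
                         latency_s: float,
                         downstream_limit_per_s: float) -> float:
    """Items per second the pipeline can sustain under this simple model."""
    # Little's law: requests in flight = rate * latency,
    # so the rate the functions can offer is concurrency / latency.
    offered_per_s = concurrency / latency_s
    # The downstream dependency caps whatever the functions offer.
    return min(offered_per_s, downstream_limit_per_s)


# Illustrative numbers: 200 ms per item, downstream accepts 250 requests/s.
for concurrency in (10, 50, 100, 1000):
    rate = effective_throughput(concurrency,
                                latency_s=0.2,
                                downstream_limit_per_s=250.0)
    print(f"concurrency={concurrency:5d} -> {rate:.0f} items/s")
```

In this model, throughput stops growing once the offered rate reaches the downstream limit. The model is optimistic: it only flattens, whereas in practice throttling and retries past that point tend to push throughput down.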

What actually moves the needle

The biggest win I've had came from rethinking the processing model, not the function:

  • Batch deliberately. Process N items per invocation to amortize cold starts, connection setup, and per-call overhead — instead of one item per function.
  • Cap concurrency on purpose. Reserved concurrency on the function, or maximum concurrency on the SQS event source mapping, protects shared downstreams. A lower, steady concurrency often beats an uncapped spike.
  • Separate ingestion from processing so a burst of arrivals doesn't translate into a burst of downstream load.
  • Make each unit idempotent so retries and partial failures are safe.

Redesigning the batching model this way — right-sizing batch size and concurrency rather than maximizing fan-out — cut resource consumption by roughly 80% on a pipeline handling millions of invoices monthly, while improving throughput and reliability.
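
As a hedged illustration of where batch size and concurrency are set, the sketch below builds the keyword arguments for boto3's `create_event_source_mapping` call on an SQS trigger. The helper name, the ARN, the function name, and all numeric values are assumptions for illustration, not the settings used on the pipeline above. The validation reflects my understanding of the AWS limits at the time of writing (maximum concurrency between 2 and 1000, and a batching window of at least one second when the batch size exceeds 10), so check the current documentation before relying on it.

```python
from typing import Any


def sqs_mapping_settings(queue_arn: str,
                         function_name: str,
                         batch_size: int,
                         max_concurrency: int) -> dict[str, Any]:
    """Build kwargs for the Lambda create_event_source_mapping API call."""
    if not 2 <= max_concurrency <= 1000:
        raise ValueError("max_concurrency is assumed to be in 2..1000")
    settings: dict[str, Any] = {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        # Items handed to each invocation.
        "BatchSize": batch_size,
        # Cap on concurrent invocations driven by this queue.
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
        # Let the handler report failed items instead of failing the batch.
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
    }
    if batch_size > 10:
        # Larger batches need a batching window so Lambda can fill them.
        settings["MaximumBatchingWindowInSeconds"] = 1
    return settings


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("lambda").create_event_source_mapping(
#       **sqs_mapping_settings(
#           "arn:aws:sqs:us-east-1:123456789012:invoices",  # made-up ARN
#           "invoice-processor",                             # made-up name
#           batch_size=25,
#           max_concurrency=20,
#       )
#   )
```

Keeping both knobs in one place makes it easier to tune them together against the downstream limit rather than one at a time.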

Rule of thumb

Find the slowest thing your functions touch. Size everything else around it. Throughput is a property of the whole pipeline, not the function.
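
One way to apply the rule of thumb is to derive the concurrency cap from the slowest dependency instead of guessing it. The sketch below assumes each invocation pays a fixed overhead (cold start, connection setup) and then processes its batch sequentially, one downstream call per item. The function name, the `headroom` factor, and the numbers are illustrative assumptions.

```python
import math


def concurrency_cap(limit_per_s: float,
                    item_latency_s: float,
                    batch_size: int,
                    overhead_s: float,
                    headroom: float = 0.8) -> int:
    """Largest concurrency that keeps downstream load under its limit."""
    # Wall-clock time of one invocation: fixed overhead plus sequential items.
    invocation_s = overhead_s + batch_size * item_latency_s
    # Each function emits batch_size / invocation_s downstream calls per
    # second, so solve concurrency * that rate <= headroom * limit_per_s.
    cap = headroom * limit_per_s * invocation_s / batch_size
    return max(1, math.floor(cap))


# Illustrative: downstream allows 100 calls/s, 250 ms per item,
# 550 ms of per-invocation overhead.
one_per_call = concurrency_cap(100, 0.25, batch_size=1, overhead_s=0.55)
batched = concurrency_cap(100, 0.25, batch_size=10, overhead_s=0.55)
print(f"1 item per invocation:   cap = {one_per_call}")
print(f"10 items per invocation: cap = {batched}")
```

Under these assumptions, the batched configuration needs far fewer concurrent functions to apply the same downstream load, because the per-invocation overhead is paid once per batch rather than once per item.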