Serverless Batch Processing at Scale
More Lambdas do not mean more throughput. At scale, the bottleneck is almost always somewhere else.
Serverless is a great fit for batch workloads that are bursty and embarrassingly parallel — processing invoices, documents, statements. You pay for work done, not for idle servers.
But the mental model that "more concurrent functions = more throughput" breaks quickly.
The trap
The naive design fans out one function per item and assumes it scales linearly.
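It doesn't. Past a certain point, adding concurrency makes things worse: every extra function competes for the same downstream capacity, so throttling, timeouts, and retries grow faster than completed work.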
The bottleneck is rarely the compute
When throughput stalls, the limit is almost always downstream:
- Database connections. Thousands of concurrent functions exhaust the connection pool instantly. You need pooling or a proxy, not more functions.
- Third-party APIs. ERPs, payment gateways, and CRMs have rate limits. Fan out past them and you get throttled or banned.
- Queue backpressure. With SQS, settings such as visibility timeouts and redrive policies decide your real throughput more than function count does.
- Rate limits everywhere else. Any other metered service in the path counts, including your LLM provider.
Lesson. Invoking more Lambdas does not equal more throughput. The bottleneck is usually outside Lambda. Tune for the slowest downstream dependency, not the function.
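To make the lesson concrete, here is a minimal sketch of the arithmetic, not a measurement from any real pipeline. The function names, the downstream names, and the numbers are all assumptions chosen for illustration. It treats pipeline throughput as the minimum of what the functions can push and what each downstream accepts, and uses Little's law (in-flight work = rate × time per item) to estimate how much concurrency is actually useful.

```python
import math


def pipeline_throughput(concurrency: int, seconds_per_item: float,
                        downstream_limits: dict[str, float]) -> float:
    """Items per second the whole pipeline can sustain.

    The functions can push at most concurrency / seconds_per_item,
    but no faster than the slowest downstream limit (items per second).
    """
    compute_rate = concurrency / seconds_per_item
    return min([compute_rate, *downstream_limits.values()])


def max_useful_concurrency(slowest_limit_per_sec: float,
                           seconds_per_item: float) -> int:
    """Little's law: concurrency beyond rate * latency adds no throughput."""
    return max(1, math.floor(slowest_limit_per_sec * seconds_per_item))


# Hypothetical limits, in items per second.
limits = {"erp_api": 50.0, "database": 200.0}

print(pipeline_throughput(10, 0.5, limits))     # 20.0 (compute-bound)
print(pipeline_throughput(1000, 0.5, limits))   # 50.0 (capped by erp_api)
print(max_useful_concurrency(50.0, 0.5))        # 25
```

Under these assumed numbers, going from 10 to 1000 concurrent functions raises throughput only from 20 to 50 items per second, and anything past roughly 25 concurrent functions just generates throttled calls against the ERP API.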
What actually moves the needle
The biggest win I've had came from rethinking the processing model, not the function:
- Batch deliberately. Process N items per invocation to amortize cold starts, connection setup, and per-call overhead — instead of one item per function.
- Cap concurrency on purpose. Reserved concurrency on the function, or maximum concurrency on the SQS event source, protects shared downstreams. A lower, steady concurrency often beats an uncapped spike.
- Separate ingestion from processing so a burst of arrivals doesn't translate into a burst of downstream load.
- Make each unit idempotent so retries and partial failures are safe.
Redesigning the batching model this way — right-sizing batch size and concurrency rather than maximizing fan-out — cut resource consumption by roughly 80% on a pipeline handling millions of invoices monthly, while improving throughput and reliability.
Rule of thumb
Find the slowest thing your functions touch. Size everything else around it. Throughput is a property of the whole pipeline, not the function.
Idempotency Patterns
Making operations safe to retry — the single most important property in a payments or integration system. A retried payment must never become a double charge.
Multi-Tenant SaaS Architecture
Serving many customers from one system while making each feel like they have their own. The question is always how strongly tenants are isolated — and at what operational cost.