When you build on AWS, one of the first decisions you make is where your code runs. Three broad patterns have emerged over the last decade:
| Architecture | Core Idea | AWS Compute Primitive |
|---|---|---|
| Server-Based | You provision and manage servers. Code runs on always-on instances. | EC2, ECS/EKS (Fargate excluded) |
| Serverless | You write code, AWS runs it. No servers to think about. | Lambda, Fargate (often paired with API Gateway and DynamoDB, which are managed services rather than compute) |
| Hybrid | You use both: serverless for event-driven spikes, servers for sustained load. | A mix of the above |
None is universally “best.” The right choice depends on your workload’s traffic pattern, latency requirements, operational maturity, and cost profile.
A useful mental model is to think in terms of two user levels:
- Baseline users: the consistent, everyday load your system always handles. Serverful infrastructure is well-suited here because you can size servers for this level and pay a flat, predictable rate.
- Peak users: the burst above baseline that happens during launches, sales, or viral moments. Serverless suits this load because it scales out quickly to absorb the spike (within the provider's concurrency limits) and scales back to zero when the spike is over.
The gap between your baseline and peak load is often the deciding factor in which architecture — or combination — makes the most sense.
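To make the baseline-versus-peak trade-off concrete, here is a minimal cost sketch. The prices in `Pricing` are illustrative assumptions, not quoted AWS rates, and the model is deliberately simplified: it ignores free tiers and data transfer, and assumes a single always-on server can absorb the load. The names `Pricing`, `serverless_monthly_cost`, and `break_even_requests` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Illustrative prices only (assumptions); check current AWS pricing."""
    server_monthly_usd: float = 60.0           # hypothetical flat cost of one always-on server
    per_million_requests_usd: float = 0.20     # assumed per-request charge
    per_gb_second_usd: float = 0.0000166667    # assumed compute charge per GB-second


def serverless_monthly_cost(requests: int, avg_seconds: float,
                            memory_gb: float, p: Pricing) -> float:
    """Pay-per-use cost: a request charge plus a duration-times-memory charge."""
    request_cost = requests / 1_000_000 * p.per_million_requests_usd
    compute_cost = requests * avg_seconds * memory_gb * p.per_gb_second_usd
    return request_cost + compute_cost


def break_even_requests(avg_seconds: float, memory_gb: float, p: Pricing) -> float:
    """Monthly request count at which pay-per-use equals the flat server cost."""
    per_request = (p.per_million_requests_usd / 1_000_000
                   + avg_seconds * memory_gb * p.per_gb_second_usd)
    return p.server_monthly_usd / per_request
```

Below the break-even volume the pay-per-use model is cheaper under these assumptions; above it, the flat-rate server wins. A workload whose steady baseline sits above break-even but whose peaks are rare is a natural candidate for splitting across both models.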
Serverless
Serverless computing is a cloud-computing execution model where the cloud provider dynamically manages the allocation and provisioning of servers. In this model, developers can focus on writing code without worrying about the underlying infrastructure.
| Pros | Cons |
|---|---|
| No server management — infrastructure is handled by the provider | Cold starts can introduce latency on first invocation |
| Scales automatically from zero to peak demand | Execution time limits (e.g. 15 min max on Lambda) |
| Pay-per-use pricing: no cost when idle | Costs more than serverful for long-running or stateful workloads |
| Built-in high availability and fault tolerance | Debugging and local testing can be complex |
| Encourages small, focused functions | Cost can spike unexpectedly under high sustained traffic |
When to use serverless
| Use Serverless When… | Example |
|---|---|
| Traffic is spiky or unpredictable | A marketing campaign site that gets bursts of visitors |
| You need event-driven processing | Resize images when uploaded to S3 |
| You want to minimize operational overhead | A small startup with no dedicated ops team |
| Workloads are short-lived | Sending a welcome email after user signup |
| You’re building APIs with variable load | A REST API that sees low traffic at night and peaks during the day |
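As a rough illustration of the event-driven row above (resizing images uploaded to S3), the sketch below shows the general shape of a Python Lambda handler that reads an S3 event notification. The helper `extract_uploads` is a hypothetical name, and the resize step is left as a placeholder; a real function would fetch and process the object with an AWS SDK and an image library.

```python
from urllib.parse import unquote_plus


def extract_uploads(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    uploads = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in notifications (spaces become '+').
            uploads.append((bucket, unquote_plus(key)))
    return uploads


def handler(event, context):
    """Lambda entry point: invoked per event, with no server kept running."""
    uploads = extract_uploads(event)
    for bucket, key in uploads:
        # Placeholder: download, resize, and re-upload would happen here.
        print(f"would resize s3://{bucket}/{key}")
    return {"processed": len(uploads)}
```

Keeping the event parsing in a plain function makes the logic testable locally with a hand-written event dictionary, which softens the "debugging and local testing can be complex" drawback listed earlier.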
When NOT to use serverless
| Avoid Serverless When… | Example |
|---|---|
| Workloads run continuously | A real-time trading engine that must always be responsive |
| Latency is critical and cold starts hurt | A low-latency gaming backend or financial tick processor |
| Execution runs longer than provider limits | A video transcoding job that takes 30+ minutes |
Serverful
Serverful computing refers to traditional server-based architectures where developers provision and manage servers to run their applications. This model provides more control over the environment but requires more operational effort.
| Pros | Cons |
|---|---|
| Full control over the runtime, OS, and hardware configuration | Requires provisioning, patching, and maintaining servers |
| Predictable performance — no cold starts or execution time limits | Must over-provision capacity to handle traffic spikes |
| Cost-effective for sustained, high-throughput workloads | You pay for idle capacity even when traffic is low |
| Easier to run stateful applications and persistent connections | Scaling requires manual configuration or complex auto-scaling |
| Simpler local development and debugging experience | Higher operational overhead — needs DevOps expertise |
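The over-provisioning and idle-capacity drawbacks in the table above can be quantified with simple arithmetic. The sketch below is a minimal illustration; the function names, the headroom percentage, and the per-server throughput figure are assumptions chosen for the example, not measured values.

```python
import math


def fleet_for_peak(peak_rps: int, rps_per_server: int, headroom_pct: int = 20) -> int:
    """Servers needed to carry peak traffic plus a safety margin."""
    return math.ceil(peak_rps * (100 + headroom_pct) / (100 * rps_per_server))


def average_utilization(avg_rps: float, servers: int, rps_per_server: int) -> float:
    """Fraction of provisioned capacity actually used on average."""
    return avg_rps / (servers * rps_per_server)


# Example with assumed numbers: peaks of 1000 req/s, 200 req/s on average,
# and 100 req/s per server.
servers = fleet_for_peak(peak_rps=1000, rps_per_server=100)
idle_share = 1 - average_utilization(avg_rps=200, servers=servers, rps_per_server=100)
```

With these assumed inputs, a fleet sized for peak spends most of its time idle, which is the cost that auto-scaling or a serverless overflow path tries to recover.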
When to use serverful
| Use Serverful When… | Example |
|---|---|
| Workloads run continuously at high throughput | A high-traffic e-commerce platform with millions of daily users |
| Low and consistent latency is required | A multiplayer game server maintaining persistent player connections |
| Jobs run longer than serverless execution limits | Batch video transcoding or large ML model training jobs |
When NOT to use serverful
| Avoid Serverful When… | Example |
|---|---|
| Traffic is unpredictable or mostly idle | A seasonal campaign site that’s quiet 90% of the year |
| Workloads are short, event-triggered tasks | Sending notifications or processing webhook payloads |
Hybrid
A hybrid architecture combines serverful and serverless components within the same system. The idea is to run your baseline workload on always-on servers — where predictable cost and low latency matter — while offloading peak or event-driven workloads to serverless functions that scale automatically and cost nothing when idle.
For example, a streaming platform might run its core video delivery service on EC2 instances (serverful) for consistent low-latency playback, while using Lambda (serverless) to handle thumbnail generation, notification dispatch, and usage analytics — tasks that are triggered by events and don’t need to run continuously.
This approach lets you optimise for cost and performance at each layer rather than forcing every workload into a single model.
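The "when to use" tables in this article can be condensed into a rough rule of thumb for assigning each workload to a compute model. The sketch below is one possible encoding, not a definitive policy: `Workload`, `choose_compute`, and `plan` are hypothetical names, and the only hard number is the 15-minute Lambda limit mentioned earlier.

```python
from dataclasses import dataclass
from enum import Enum


class Compute(Enum):
    SERVERLESS = "serverless"
    SERVERFUL = "serverful"


LAMBDA_MAX_SECONDS = 15 * 60  # execution limit cited in the serverless cons table


@dataclass
class Workload:
    name: str
    max_duration_seconds: float
    runs_continuously: bool
    latency_critical: bool


def choose_compute(w: Workload) -> Compute:
    """Apply the article's heuristics in order of how hard the constraint is."""
    if w.max_duration_seconds > LAMBDA_MAX_SECONDS:
        return Compute.SERVERFUL   # job would exceed the execution limit
    if w.runs_continuously or w.latency_critical:
        return Compute.SERVERFUL   # sustained load or cold starts would hurt
    return Compute.SERVERLESS      # short, event-driven, tolerant of cold starts


def plan(workloads: list[Workload]) -> dict[str, Compute]:
    """Map each workload name to a compute model; mixed results imply hybrid."""
    return {w.name: choose_compute(w) for w in workloads}
```

Applied to the streaming example, video delivery (continuous, latency-critical) lands on servers while thumbnail generation and notifications land on serverless functions, so the overall plan is hybrid. Real decisions would also weigh cost and team experience, which this sketch leaves out.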