Scaling Serverless APIs: Lessons from 10M Monthly Requests
Practical techniques and patterns for building serverless APIs that scale gracefully to handle millions of requests.
When I first built a serverless API using AWS Lambda and API Gateway, I was amazed by how quickly I could deploy a working endpoint. But as my user base grew from hundreds to thousands to millions, I encountered scaling challenges that required rethinking my approach. Here’s what I learned while scaling a serverless API to handle over 10 million monthly requests.
Initial Architecture
My application started with a simple architecture:
- API Gateway as the entry point
- Lambda functions for business logic
- DynamoDB for data storage
- S3 for file storage
This worked beautifully until we hit about 1 million monthly requests.
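For context, a single endpoint in that initial setup looked roughly like the sketch below. It is simplified and illustrative: the data-access function is injected so the handler stays testable, where the real version wrapped a DynamoDB `GetItem` call behind API Gateway's proxy integration.

```typescript
// Shape of the API Gateway proxy event fields we use (simplified).
interface ApiEvent {
  pathParameters?: { userId?: string } | null;
}

interface ApiResult {
  statusCode: number;
  body: string;
}

// Injected data access: in the real API this wrapped a DynamoDB GetItem call.
type GetUser = (userId: string) => Promise<{ userId: string; name: string } | null>;

export const makeHandler =
  (getUser: GetUser) =>
  async (event: ApiEvent): Promise<ApiResult> => {
    const userId = event.pathParameters?.userId;
    if (!userId) {
      return { statusCode: 400, body: JSON.stringify({ error: "userId is required" }) };
    }
    const user = await getUser(userId);
    if (!user) {
      return { statusCode: 404, body: JSON.stringify({ error: "not found" }) };
    }
    return { statusCode: 200, body: JSON.stringify(user) };
  };
```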
Challenge 1: Cold Starts
Problem: As traffic increased, users occasionally experienced delays of 1-2 seconds when their requests hit a “cold” Lambda function.
Solution: I implemented a combination of strategies:
- **Provisioned concurrency:** For critical endpoints, I configured provisioned concurrency to keep functions warm at all times.
- **Optimized runtime:** Moving from a heavy Node.js framework to lightweight, single-purpose functions significantly reduced cold start times.
- **Streamlined dependencies:** I audited dependencies ruthlessly, removing anything non-essential and using tools like esbuild to minimize bundle sizes.
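Provisioned concurrency from the first point is configured per function version or alias; for example, via the AWS CLI (the function name, alias, and concurrency level here are illustrative):

```shell
# Keep 10 execution environments initialized for the checkout function's prod alias
aws lambda put-provisioned-concurrency-config \
  --function-name checkout-api \
  --qualifier prod \
  --provisioned-concurrent-executions 10
```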
Code example:
```typescript
// Before: Heavy dependencies
import * as lodash from "lodash"; // Imports the entire library
import { parse } from "date-fns";
import * as AWS from "aws-sdk"; // Imports the entire SDK

// After: Optimized imports
import pick from "lodash/pick"; // Imports only what's needed
import { parseISO } from "date-fns/fp"; // Smaller functional version
import { DynamoDB } from "@aws-sdk/client-dynamodb"; // Only import the specific service
```

Challenge 2: Database Scaling
Problem: As data volume grew, DynamoDB began throttling requests during peak traffic.
Solution: I refined our data access patterns with these techniques:
- **On-demand capacity:** Switched from provisioned to on-demand capacity to handle unpredictable traffic spikes.
- **Caching strategy:** Implemented a multi-level caching approach:
  - In-memory cache within Lambda functions for hot data
  - DAX (DynamoDB Accelerator) for frequently accessed items
  - ElastiCache for shared caching needs
- **Data partitioning:** Redesigned partition keys to distribute data more evenly and avoid hot partitions.
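The in-memory layer of that caching strategy can be sketched as a small TTL cache declared at module scope, so it survives across invocations while a Lambda execution environment stays warm (a simplified sketch; the TTL value is illustrative and there is no size-bounded eviction here):

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number;
}

// Module scope: the map persists across invocations for as long as
// this Lambda execution environment stays warm.
const cache = new Map<string, CacheEntry<unknown>>();

export const getCached = async <T>(
  key: string,
  loader: () => Promise<T>,
  ttlMs = 30_000
): Promise<T> => {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value as T; // Serve hot data without touching the data store
  }
  const value = await loader(); // Cache miss: fall through to the data store
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
};
```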
Before:

```typescript
// Problematic schema with a potential hot partition
const params = {
  TableName: "Users",
  Key: {
    userId: "123", // Many operations on the same user create a hot key
  },
};
```

After:

```typescript
// Improved schema with composite keys for better distribution
const params = {
  TableName: "UserActions",
  Key: {
    userId: "123",
    actionId: `${timestamp}#${uuid()}`, // Ensures even distribution
  },
};
```

Challenge 3: Cost Optimization
Problem: As scale increased, costs grew faster than expected, particularly with API Gateway and Lambda.
Solution: I implemented these optimizations:
- **Batch processing:** Instead of processing events one by one, I implemented batching where appropriate.
- **Lambda power tuning:** Used the AWS Lambda Power Tuning tool to find the optimal memory/CPU configuration for each function, sometimes finding that higher memory settings actually reduced costs because invocations finished faster.
- **API Gateway caching:** Enabled caching at the API Gateway level for frequently accessed, relatively static data.
- **GraphQL consolidation:** Replaced multiple REST endpoints with a single GraphQL endpoint, reducing the total number of Lambda invocations.
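As one concrete instance of the batching point above, DynamoDB's `BatchWriteItem` accepts at most 25 items per request, so queued writes can be grouped client-side before sending (a sketch; the `chunk` helper and `toBatchRequests` name are illustrative, not part of any SDK):

```typescript
// Split an array into groups of at most `size` items.
export const chunk = <T>(items: T[], size: number): T[][] => {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
};

// BatchWriteItem takes up to 25 items per request, so 60 queued
// events become 3 requests instead of 60 single-item writes.
export const toBatchRequests = <T>(events: T[]): T[][] => chunk(events, 25);
```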
Challenge 4: Monitoring and Observability
Problem: As complexity increased, it became difficult to identify performance bottlenecks and errors.
Solution: I built a comprehensive observability system:
- **Structured logging:** Standardized the logging format across all functions, with correlation IDs to track requests.
- **Custom metrics:** Created custom CloudWatch metrics for business-level monitoring.
- **Tracing:** Implemented AWS X-Ray tracing to identify latency issues across services.
- **Alerting:** Set up automated alerting based on error rates and latency thresholds.
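For the custom metrics point, one low-overhead option is CloudWatch's Embedded Metric Format (EMF): a specially structured log line that CloudWatch parses into a metric, with no `PutMetricData` call from the function. A sketch, where the `MyApi` namespace, dimension, and metric name are illustrative:

```typescript
// Build an Embedded Metric Format (EMF) log line that CloudWatch
// converts into a custom metric when it is written to the log stream.
export const emfMetric = (name: string, value: number, unit = "Count"): string =>
  JSON.stringify({
    _aws: {
      Timestamp: Date.now(),
      CloudWatchMetrics: [
        {
          Namespace: "MyApi", // Illustrative namespace
          Dimensions: [["Endpoint"]],
          Metrics: [{ Name: name, Unit: unit }],
        },
      ],
    },
    Endpoint: "GET /users", // Dimension value
    [name]: value, // Metric value
  });

// Emitting the metric is just a structured log line:
// console.log(emfMetric("CacheMiss", 1));
```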
Example structured logging:
```typescript
import { APIGatewayProxyEvent } from "aws-lambda";
import { v4 as uuidv4 } from "uuid";

const logger = (correlationId: string) => ({
  info: (message: string, data?: Record<string, unknown>) =>
    console.log(
      JSON.stringify({
        level: "INFO",
        timestamp: new Date().toISOString(),
        correlationId,
        message,
        ...data,
      })
    ),
  error: (message: string, error?: Error, data?: Record<string, unknown>) =>
    console.error(
      JSON.stringify({
        level: "ERROR",
        timestamp: new Date().toISOString(),
        correlationId,
        message,
        errorName: error?.name,
        errorMessage: error?.message,
        stackTrace: error?.stack,
        ...data,
      })
    ),
});

// Usage in a Lambda handler
export const handler = async (event: APIGatewayProxyEvent) => {
  const correlationId = event.headers["x-correlation-id"] || uuidv4();
  const log = logger(correlationId);

  // ...
};
```

Conclusion
Scaling serverless applications requires a shift in mindset from traditional server-based architectures. By understanding the nuances of cold starts, database access patterns, and cost structures, you can build highly scalable systems that handle millions of requests without breaking the bank.
What challenges have you faced when scaling serverless apps? Let me know in the comments!