An alarm tells you something is broken. Logs tell you what happened. But when a user reports “I clicked the button and nothing happened,” you need to trace that single request from the moment it hit API Gateway, through your Lambda function, into DynamoDB, and back. That means connecting log entries across services into a single story.
If you want AWS’s version of the monitoring stack nearby while you read, the CloudWatch overview is the official reference.
This is the difference between monitoring and observability. Monitoring tells you “errors went up.” Observability lets you ask “why did this specific request fail?” and follow the breadcrumbs to the answer. I find this distinction matters more than it sounds like it should.
flowchart LR
Browser["Browser request"] --> APIGateway["API Gateway"]
APIGateway --> Lambda["Lambda logs requestId"]
Lambda --> DynamoDB["DynamoDB operation"]
DynamoDB --> Lambda
Lambda --> Logs["Structured logs in CloudWatch"]
APIGateway --> Logs
Logs --> Insights["Logs Insights query by correlation ID"]
The Problem: Disconnected Logs
Right now, your application produces logs in multiple places:
- API Gateway logs the incoming HTTP request (method, path, status code, latency)
- Lambda logs whatever you write with console.log, and with structured logging from CloudWatch Log Groups and Structured Logging, each entry includes fields like level, message, and duration
- DynamoDB publishes metrics (latency, consumed capacity) but doesn’t produce per-request logs
The challenge: these logs live in separate log groups with no connection between them. API Gateway processed a request, Lambda ran a function, DynamoDB stored an item—but there’s nothing linking those three events together. If the Lambda function failed, you can’t tell which API Gateway request triggered it or which DynamoDB operation it was attempting.
Correlation IDs: The Connecting Thread
A correlation ID is a unique identifier that follows a request through every service it touches. API Gateway generates one for every incoming request—the requestId in the event object that Lambda receives. By including this ID in every log entry your Lambda function writes, you create a thread that ties everything together.
You saw this in the structured logging example from the previous lesson. Here’s a more complete version that logs at every stage of the request lifecycle:
import type { APIGatewayProxyHandlerV2 } from 'aws-lambda';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
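import { DynamoDBDocumentClient, GetCommand, PutCommand } from '@aws-sdk/lib-dynamodb';
// Explicit import so crypto.randomUUID() does not rely on the global crypto object, which older Node.js runtimes lack.
import * as crypto from 'node:crypto';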
const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = process.env.TABLE_NAME ?? 'my-frontend-app-data';
interface LogEntry {
level: string;
message: string;
requestId: string;
[key: string]: unknown;
}
function log(entry: LogEntry): void {
console.log(JSON.stringify(entry));
}
export const handler: APIGatewayProxyHandlerV2 = async (event) => {
const requestId = event.requestContext?.requestId ?? crypto.randomUUID();
// Note: the requestId from API Gateway becomes the correlation ID for this entire request.
const method = event.requestContext?.http?.method ?? 'UNKNOWN';
const path = event.rawPath ?? '/';
log({
level: 'INFO',
message: 'Request started',
requestId,
method,
path,
queryParams: event.queryStringParameters ?? {},
});
try {
const itemId = event.queryStringParameters?.id;
if (method === 'GET' && itemId) {
const dynamoStart = Date.now();
const result = await client.send(
new GetCommand({
TableName: TABLE_NAME,
Key: { id: itemId },
}),
);
const dynamoDuration = Date.now() - dynamoStart;
log({
level: 'INFO',
message: 'DynamoDB read completed',
requestId,
operation: 'GetItem',
tableName: TABLE_NAME,
itemId,
found: !!result.Item,
dynamoDuration,
});
return {
statusCode: result.Item ? 200 : 404,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(result.Item ?? { error: 'Not found' }),
};
}
if (method === 'POST') {
const body = JSON.parse(event.body ?? '{}');
const newItem = { id: crypto.randomUUID(), ...body, createdAt: new Date().toISOString() };
const dynamoStart = Date.now();
await client.send(
new PutCommand({
TableName: TABLE_NAME,
Item: newItem,
}),
);
const dynamoDuration = Date.now() - dynamoStart;
log({
level: 'INFO',
message: 'DynamoDB write completed',
requestId,
operation: 'PutItem',
tableName: TABLE_NAME,
itemId: newItem.id,
dynamoDuration,
});
return {
statusCode: 201,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(newItem),
};
}
return {
statusCode: 405,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ error: 'Method not allowed' }),
};
} catch (error) {
log({
level: 'ERROR',
message: 'Request failed',
requestId,
error: error instanceof Error ? error.message : String(error),
stack: error instanceof Error ? error.stack : undefined,
});
return {
statusCode: 500,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ error: 'Internal server error' }),
};
}
};
Every log entry includes the same requestId. A single request produces multiple log entries (“Request started,” “DynamoDB read completed,” and potentially “Request failed”), all linked by the correlation ID.
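Passing requestId by hand to every log call works, but it is easy to forget one. Here is a minimal sketch of a logger that binds the ID once. The names createRequestLogger, RequestLogger, and Sink are hypothetical helpers for illustration, not part of any AWS library, and the injectable sink exists only so the example can run without a Lambda environment.

```typescript
// Hypothetical helper: binds a correlation ID once so call sites cannot omit it.
type LogLevel = 'INFO' | 'ERROR';
type LogFields = Record<string, unknown>;
type Sink = (line: string) => void;

interface RequestLogger {
  info(message: string, fields?: LogFields): void;
  error(message: string, fields?: LogFields): void;
}

function createRequestLogger(requestId: string, sink: Sink = console.log): RequestLogger {
  const write = (level: LogLevel, message: string, fields: LogFields = {}): void => {
    // Caller fields are spread first so they cannot overwrite level, message, or requestId.
    sink(JSON.stringify({ ...fields, level, message, requestId }));
  };
  return {
    info: (message, fields) => write('INFO', message, fields),
    error: (message, fields) => write('ERROR', message, fields),
  };
}

// Usage: collect lines in memory instead of writing to CloudWatch.
const lines: string[] = [];
const logger = createRequestLogger('abc-123-def-456', (line) => lines.push(line));
logger.info('Request started', { method: 'GET', path: '/items' });
logger.info('DynamoDB read completed', { operation: 'GetItem', dynamoDuration: 42 });
```

In a handler you would create the logger right after reading requestId and drop the sink argument, so entries go to console.log as before.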
Querying by Correlation ID
When you need to trace a specific request, you run a Logs Insights query that filters by requestId:
fields @timestamp, level, message, requestId, operation, dynamoDuration, error
| filter requestId = "abc-123-def-456"
| sort @timestamp asc
This returns every log entry for that request in chronological order. You can see:
- When the request started
- What method and path were used
- Whether the DynamoDB operation succeeded
- How long DynamoDB took
- If and why the request failed
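If the request ID comes from a support ticket or a form field, you may want to build the query string in code rather than paste it by hand. The sketch below assembles the same query as above. buildTraceQuery and DEFAULT_FIELDS are hypothetical names, and the character check is a deliberately conservative guard I am assuming is sufficient; it rejects suspicious input instead of trying to escape it.

```typescript
// Fields match the trace query shown above.
const DEFAULT_FIELDS: readonly string[] = [
  '@timestamp', 'level', 'message', 'requestId', 'operation', 'dynamoDuration', 'error',
];

// Hypothetical helper: builds a Logs Insights query that traces one request.
function buildTraceQuery(requestId: string, fields: readonly string[] = DEFAULT_FIELDS): string {
  // Refuse empty IDs and IDs containing quotes, backslashes, or line breaks,
  // so untrusted input cannot change the shape of the query.
  if (requestId.length === 0 || /["\\\r\n]/.test(requestId)) {
    throw new Error('requestId is empty or contains characters unsafe to embed in a query');
  }
  return [
    `fields ${fields.join(', ')}`,
    `| filter requestId = "${requestId}"`,
    '| sort @timestamp asc',
  ].join('\n');
}

const traceQuery = buildTraceQuery('abc-123-def-456');
```

The resulting string can be pasted into the Logs Insights console or passed as the query string when starting a query from a script.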
Running the Query from the CLI
aws logs start-query \
--log-group-name /aws/lambda/my-frontend-app-api \
--start-time $(date -v-1H +%s) \
--end-time $(date +%s) \
--query-string 'fields @timestamp, level, message, requestId, operation, dynamoDuration, error | filter requestId = "abc-123-def-456" | sort @timestamp asc' \
--region us-east-1 \
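--output json
The date -v-1H form is BSD/macOS syntax; with GNU date on Linux, use date -d '1 hour ago' +%s instead. The command returns a queryId. Pass it to get-query-results to fetch the results: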
aws logs get-query-results \
--query-id "your-query-id-here" \
--region us-east-1 \
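--output json
Queries run asynchronously, so if the response reports a status of Running, wait a moment and fetch the results again.
Common Tracing Queries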
Beyond looking up a single request, Insights queries help you spot patterns.
Find the Slowest DynamoDB Operations
fields @timestamp, requestId, operation, dynamoDuration, itemId
| filter dynamoDuration > 0
| sort dynamoDuration desc
| limit 20
This shows you which requests had the slowest database interactions. If you see GetItem operations consistently taking 200 ms or more, you might have a hot partition key; revisit DynamoDB Tables and Keys for key design.
Find All Failed Requests in the Last Hour
fields @timestamp, requestId, message, error
| filter level = "ERROR"
| sort @timestamp desc
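| limit 50
The one-hour window comes from the time range you select for the query (or --start-time and --end-time on the CLI), not from the query text itself.
Correlate Errors with Specific Operations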
fields @timestamp, requestId, operation, error
| filter level = "ERROR" and operation = "PutItem"
| sort @timestamp desc
This shows you only the write failures. One possible cause: your Lambda function has insufficient permissions to write to DynamoDB (check the execution role you set up in Lambda Execution Roles and Permissions).
Calculate DynamoDB Latency Percentiles
filter message = "DynamoDB read completed"
| stats avg(dynamoDuration) as avgLatency,
pct(dynamoDuration, 95) as p95Latency,
pct(dynamoDuration, 99) as p99Latency
by bin(5m)
This gives you a time-series view of your database read latency, broken into 5-minute buckets. If p99 is climbing over time, something’s changing: maybe growing table size, maybe a shift in access patterns.
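If percentiles are new to you, it helps to compute one by hand. The sketch below uses the simple nearest-rank method on a made-up set of durations. It illustrates what p95 means and is not a claim about how Logs Insights computes pct internally; the percentile function and the sample numbers are assumptions for the example.

```typescript
// Nearest-rank percentile: the smallest value such that at least p% of samples are <= it.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) throw new Error('percentile of an empty set is undefined');
  if (p < 0 || p > 100) throw new RangeError('p must be between 0 and 100');
  const sorted = [...values].sort((a, b) => a - b);
  // Rank is 1-based; clamp to 1 so p = 0 returns the minimum.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up dynamoDuration samples (ms): nine fast reads and one slow outlier.
const durations = [12, 15, 18, 20, 22, 25, 30, 41, 42, 250];

const avgLatency = durations.reduce((sum, d) => sum + d, 0) / durations.length; // 47.5
const p50Latency = percentile(durations, 50); // 22
const p95Latency = percentile(durations, 95); // 250
```

The average (47.5 ms) suggests everything is roughly fine, while p95 (250 ms) surfaces the slow request that one user in this sample actually experienced. That is why the query above tracks percentiles alongside the average.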
Piecing Together the Full Request Path
Here’s what a traced request looks like when you put it all together. A user calls your API, and you reconstruct the full path from logs:
Step 1: API Gateway receives the request. You can see this in the API Gateway access logs (if enabled) or infer it from the Lambda log’s first entry. The requestId from API Gateway becomes your correlation ID.
Step 2: Lambda starts executing. Your first structured log entry fires: “Request started” with the requestId, method, path, and query parameters.
Step 3: Lambda calls DynamoDB. Your log entry captures the operation type, table name, and how long the call took.
Step 4: Lambda returns a response. If the request failed, the error entry captures the error message and stack trace. The handler above does not log anything on success, so to close the timeline, add a final “Request completed” entry with the status code and total duration before each return outside the catch block.
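One way to get a closing entry for every request without touching each return statement is to wrap the handler body. This is a minimal sketch under a few assumptions: withCompletionLog, HttpResponse, and the totalDuration field are hypothetical names, and the log function is passed in so the example runs on its own.

```typescript
interface HttpResponse {
  statusCode: number;
  headers?: Record<string, string>;
  body?: string;
}
type Log = (entry: Record<string, unknown>) => void;

// Hypothetical wrapper: runs the request logic and always emits one closing log entry.
async function withCompletionLog(
  requestId: string,
  log: Log,
  work: () => Promise<HttpResponse>,
): Promise<HttpResponse> {
  const startedAt = Date.now();
  try {
    const response = await work();
    log({
      level: 'INFO',
      message: 'Request completed',
      requestId,
      statusCode: response.statusCode,
      totalDuration: Date.now() - startedAt,
    });
    return response;
  } catch (error) {
    log({
      level: 'ERROR',
      message: 'Request failed',
      requestId,
      error: error instanceof Error ? error.message : String(error),
      stack: error instanceof Error ? error.stack : undefined,
      totalDuration: Date.now() - startedAt,
    });
    return {
      statusCode: 500,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ error: 'Internal server error' }),
    };
  }
}
```

Inside the handler, the GET and POST branches would move into the work callback, and the existing try/catch would no longer be needed because the wrapper covers it.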
A Logs Insights query that reconstructs this timeline:
fields @timestamp, level, message, operation, dynamoDuration, error
| filter requestId = "abc-123-def-456"
| sort @timestamp asc
The result might look like:
| @timestamp | level | message | operation | dynamoDuration | error |
|---|---|---|---|---|---|
| 14:30:00.100 | INFO | Request started | | | |
| 14:30:00.145 | INFO | DynamoDB read completed | GetItem | 42 | |
| 14:30:00.150 | INFO | Request completed | | | |
Three log entries, 50 milliseconds total, 42 of which were spent in DynamoDB. If this request had failed, the “Request completed” entry would be replaced by a “Request failed” entry with an error message and stack trace.
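The arithmetic in that paragraph is easy to automate once the rows are in hand. The sketch below takes the three entries from the table and computes the same numbers. summarizeTrace and TimelineEntry are hypothetical names, and the date in the timestamps is a placeholder I added, since the table shows only the time of day.

```typescript
interface TimelineEntry {
  timestamp: string; // ISO 8601
  message: string;
  dynamoDuration?: number; // milliseconds, present only on DynamoDB entries
}

// Hypothetical helper: how long did the trace span, and how much of that was DynamoDB?
function summarizeTrace(entries: readonly TimelineEntry[]): {
  totalMs: number;
  dynamoMs: number;
  otherMs: number;
} {
  if (entries.length === 0) throw new Error('no log entries to summarize');
  const times = entries.map((entry) => Date.parse(entry.timestamp));
  // Total is the gap between the first and last log entry, which approximates handler time.
  const totalMs = Math.max(...times) - Math.min(...times);
  const dynamoMs = entries.reduce((sum, entry) => sum + (entry.dynamoDuration ?? 0), 0);
  return { totalMs, dynamoMs, otherMs: totalMs - dynamoMs };
}

const summary = summarizeTrace([
  { timestamp: '2024-01-01T14:30:00.100Z', message: 'Request started' },
  { timestamp: '2024-01-01T14:30:00.145Z', message: 'DynamoDB read completed', dynamoDuration: 42 },
  { timestamp: '2024-01-01T14:30:00.150Z', message: 'Request completed' },
]);
// summary is { totalMs: 50, dynamoMs: 42, otherMs: 8 }
```

The remaining 8 ms is everything that is not the database call: parsing the event, serializing the response, and the logging itself.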
Beyond Manual Tracing
What you’ve built here—correlation IDs plus structured logging plus Insights queries—is the foundation of request tracing. AWS offers more sophisticated tracing through X-Ray, which automatically instruments SDK calls and produces visual service maps. X-Ray is beyond the scope of this course, but the structured logging patterns you’ve learned here work with or without it. Correlation IDs and structured logs are the minimum viable observability for any production application.
If your API returns the request ID to the client (in a response header or error body) and your frontend shows it to the user, support incidents become dramatically easier. The user says “I got error abc-123,” you run the Insights query, and you have the full trace in seconds.
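Here is a minimal sketch of what returning the ID could look like. The header name x-request-id is a common convention I am assuming, not something the handler above sets, and withRequestId, errorResponse, and HttpResponse are hypothetical names.

```typescript
interface HttpResponse {
  statusCode: number;
  headers?: Record<string, string>;
  body?: string;
}

// Assumed header name; any name works as long as the frontend reads the same one.
const REQUEST_ID_HEADER = 'x-request-id';

// Adds the correlation ID to an existing response without mutating it.
function withRequestId(response: HttpResponse, requestId: string): HttpResponse {
  return {
    ...response,
    headers: { ...response.headers, [REQUEST_ID_HEADER]: requestId },
  };
}

// Builds an error response that carries the ID in both the header and the JSON body,
// so the frontend can show it in an error message.
function errorResponse(statusCode: number, message: string, requestId: string): HttpResponse {
  return {
    statusCode,
    headers: { 'Content-Type': 'application/json', [REQUEST_ID_HEADER]: requestId },
    body: JSON.stringify({ error: message, requestId }),
  };
}

const okResponse = withRequestId(
  { statusCode: 200, headers: { 'Content-Type': 'application/json' }, body: '{}' },
  'abc-123-def-456',
);
const failedResponse = errorResponse(500, 'Internal server error', 'abc-123-def-456');
```

On the frontend, an error toast that reads "Something went wrong (reference abc-123-def-456)" gives the user something concrete to paste into a support ticket.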
You now have the complete monitoring stack: structured logs you can query, metrics dashboards you can glance at, alarms that tell you when things break, and correlation IDs that let you trace individual requests across services. In the exercise that follows, you’ll put the alarm skills to practice—creating error and duration alarms for your Lambda function, wiring them to SNS, and triggering them intentionally to verify the pipeline works end to end.