How Real-Time is Amazon Cloudwatch Logs Live Tail?
AWS released Amazon CloudWatch Logs Live Tail on June 6th, 2023. The name is on point even if it is a mouthful (though terse by AWS standards) - Live Tail is a service that helps developers follow (tail) cloudwatch logs in real-time. It also layers on additional goodies like dynamic filters and highlights.
The real-time part of the service intrigued me - cloudwatch logs, anecdotally, tend to have a latency between 5s to 20s for log ingestion. Could Live Tail do better? And how does it compare to the existing CLI workflow?
aws logs tail {LogGroupName} --follow --format json
NOTE:
aws logs tail
is not equivalent to Live Tail and has been available long before
Experiment time!
Setup
To run the test, I set up a serverless stack consisting of API Gateway + Lambda.
The experiment:
Use both Live Tail and the AWS CLI to tail the created cloudwatch logs
Send an HTTP request to API Gateway
Observe when logs showed up in live tail vs the CLI
To measure when logs showed up, I split my window into three panes.
Pane 1 represented the Live Tail console
Pane 2 represented the results of running the CLI
Pane 3 was opened to https://currentmillis.com which showed the current time in milliseconds
I recorded my screen to track when logs showed up in panes 1 and 2. I did this testing across two scenarios - the first from a lambda cold start (this first created a new cloudwatch log stream), the second from a warm start (this appended logs to the existing stream)
You can see a sample run below
I then compared this data to the output of the GetLogEvents API call. This returns events from a particular log group in the following format:
{
"events": [
{
"timestamp": 1687921318305,
"message": "INIT_START Runtime Version: nodejs:16.v14\tRuntime Version ARN: arn:aws:lambda:us-east-1::runtime:***\n",
"ingestionTime": 1687921325842,
},
...
}
timestamp
: the time when the event was emitted (from lambda)ingestionTime
: the time when the event was processed (in cloudwatch logs)
TIP: By subtracting the
ingestionTime
from thetimestamp
, you get the cloudwatch log delivery latency.
Observations
The initial call from cold start yielded the following timestamps (in milliseconds since epoch). Timestamps are space-delimited for readability.
call: 16879213 18 305,
inge: 16879213 25 842
tail: 16879213 27 193
foll: 16879213 27 193
The way to read the timestamps:
call: when API call was made (
timestamp
field)inge: when log was ingested by cloudwatch (
ingestionTime
)tail: when Live Tail started displaying the logs
foll: when the AWS CLI first started displaying the logs
Based on the initial data, you can see that Live Tail is downstream of cloudwatch logs. Ingestion of logs has a 7s latency. Tailing of the logs has a 2s additional latency. In this case, both tailing and following logs occurred within the same second.
Here is the result of another run, this time from a warm start.
call2: 16879213 38 725
inge2: 16879213 47 744
tail2: 16879213 50 096
foll2: 16879213 55 079
This time, there's a 9s ingestion latency followed by a 3s latency for Live Tail. The CLI has another additional 5s latency behind tailing.
I re-ran this experiment a few times and got similar results. Live Tail will stream logs ~2s from when it is first ingested in cloudwatch. Using the CLI, this can vary between 2s to 9s. Note that this is on top of the latency to ingest metrics into cloudwatch logs (usually between 5s-20s).
Implementation
Behind the scenes, I assumed that Live Tail was using websockets. This is hinted at by the pricing model (seconds of active usage) which charged for the number of active connections.
This can be verified by digging into the Chrome network inspector tab. At the start of a Live Tail session, the console makes the following request.
# NOTE: extraneous fields stripped for clarity
curl 'https://us-east-1.console.aws.amazon.com/cloudwatch/CloudWatch/data/auth.SdkSign/20230628033119884-***' \
-H 'origin: https://us-east-1.console.aws.amazon.com' \
...
--data-raw $'{
"Service":"logs","Endpoint":"logs.us-east-1.amazonaws.com","HttpMethod":"POST","Region":"us-east-1","Path":"/",
"Headers":{"X-Amz-User-Agent":"aws-sdk-js/2.1265.0 promise","X-Amz-Requested-Operation":"startLiveTail","Content-Type":"application/x-amz-json-1.1","X-Amz-Target":"Logs_20140328.StartLiveTail",...}'
Note the X-Amz-Target
set to Logs_20140328.StartLiveTail. This looked very much like an API call to start a WebSocket connection.
When inspective for websocket connections, you can see said connection
curl 'wss://us-east-1.prod.livetail.cloudwatchlogs.aws.dev/' \
-H 'Origin: https://us-east-1.console.aws.amazon.com' \
-H 'Sec-WebSocket-Key: **' \
-H 'Upgrade: websocket' \
-H 'Cache-Control: no-cache' \
-H 'Connection: Upgrade' \
-H 'Sec-WebSocket-Version: 13' \
-H 'Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits' \
Presumably, there is a real-time service in cloudwatch logs that will push new events through the Live Tail websocket as soon as it is ingested.
In this regard, Live Tail is unlike most AWS services. AWS has a reputation for being an API-first platform (Route53 launched with only API support back in 2010). It feels strange to see a new feature be launched that is console first - at the time of this writing, there is no way (that I know of) to integrate Live Tail via API.
Parting Thoughts
Live Tail is a service that works as advertised. It tails your cloudwatch logs in real-time with the caveat that it is subject to the same ingestion latency that is inherent in being downstream of cloudwatch logs.
While the CLI-based solution has existed for ages, Live Tail brings this to the AWS console with additional features (eg. the ability to tail multiple log groups, filtering, and syntax highlighting). The difference between Live Tail and the tail
cli command is that one is push-based and the other is pull-based. Live Tail can deliver logs from cloudwatch with consistent single-second latencies whereas, via the CLI, it is variable.
If you're already using the CLI, there's not much incentive to switch (unless you need the ability to tail multiple log groups at once). If you rely on the console for log diving, then Live Tail will be a welcome addition to your tool set.
Do note that Live Tail is not free. It has a heart palpitation-inducing per-second pricing model ($0.01 per minute) that is offset, in part, by the 1,800 free minutes per month free tier (that's 1h per day).
All in all, the addition of Live Tail fills the last gap in the cloudwatch suite of observability tooling. Just remember to not leave it running for too long!