Bit by Bit

Share this post

How Real-Time is Amazon Cloudwatch Logs Live Tail?

bit.kevinslin.com

Discover more from Bit by Bit

All things observability in the cloud
Continue reading
Sign in

How Real-Time is Amazon Cloudwatch Logs Live Tail?

Kevin Lin
Jun 29, 2023
2
Share this post

How Real-Time is Amazon Cloudwatch Logs Live Tail?

bit.kevinslin.com
Share
bullet time with light speed travel background , digital art

AWS released Amazon CloudWatch Logs Live Tail on June 6th, 2023. The name is on point even if it is a mouthful (though terse by AWS standards) - Live Tail is a service that helps developers follow (tail) cloudwatch logs in real-time. It also layers on additional goodies like dynamic filters and highlights.

The real-time part of the service intrigued me - cloudwatch logs, anecdotally, tend to have a latency between 5s to 20s for log ingestion. Could Live Tail do better? And how does it compare to the existing CLI workflow?

aws logs tail {LogGroupName} --follow --format json

NOTE: aws logs tail is not equivalent to Live Tail and has been available long before

Experiment time!

Setup

To run the test, I set up a serverless stack consisting of API Gateway + Lambda.

API Gateway to Lambda

The experiment:

  1. Use both Live Tail and the AWS CLI to tail the created cloudwatch logs

  2. Send an HTTP request to API Gateway

  3. Observe when logs showed up in live tail vs the CLI

To measure when logs showed up, I split my window into three panes.

  • Pane 1 represented the Live Tail console

  • Pane 2 represented the results of running the CLI

  • Pane 3 was opened to https://currentmillis.com which showed the current time in milliseconds

Live Tail and CLI Setup

I recorded my screen to track when logs showed up in panes 1 and 2. I did this testing across two scenarios - the first from a lambda cold start (this first created a new cloudwatch log stream), the second from a warm start (this appended logs to the existing stream)

You can see a sample run below

Loading video

I then compared this data to the output of the GetLogEvents API call. This returns events from a particular log group in the following format:

{
  "events": [
    {
      "timestamp": 1687921318305,
      "message": "INIT_START Runtime Version: nodejs:16.v14\tRuntime Version ARN: arn:aws:lambda:us-east-1::runtime:***\n",
      "ingestionTime": 1687921325842,
    },
    ...
}
  • timestamp: the time when the event was emitted (from lambda)

  • ingestionTime: the time when the event was processed (in cloudwatch logs)

TIP: By subtracting the ingestionTime from the timestamp, you get the cloudwatch log delivery latency.

Observations

The initial call from cold start yielded the following timestamps (in milliseconds since epoch). Timestamps are space-delimited for readability.

call: 16879213 18 305,
inge: 16879213 25 842
tail: 16879213 27 193
foll: 16879213 27 193

The way to read the timestamps:

  • call: when API call was made (timestamp field)

  • inge: when log was ingested by cloudwatch (ingestionTime)

  • tail: when Live Tail started displaying the logs

  • foll: when the AWS CLI first started displaying the logs

Based on the initial data, you can see that Live Tail is downstream of cloudwatch logs. Ingestion of logs has a 7s latency. Tailing of the logs has a 2s additional latency. In this case, both tailing and following logs occurred within the same second.

Here is the result of another run, this time from a warm start.

call2: 16879213 38 725 
inge2: 16879213 47 744 
tail2: 16879213 50 096
foll2: 16879213 55 079

This time, there's a 9s ingestion latency followed by a 3s latency for Live Tail. The CLI has another additional 5s latency behind tailing.

I re-ran this experiment a few times and got similar results. Live Tail will stream logs ~2s from when it is first ingested in cloudwatch. Using the CLI, this can vary between 2s to 9s. Note that this is on top of the latency to ingest metrics into cloudwatch logs (usually between 5s-20s).

Implementation

Behind the scenes, I assumed that Live Tail was using websockets. This is hinted at by the pricing model (seconds of active usage) which charged for the number of active connections.

This can be verified by digging into the Chrome network inspector tab. At the start of a Live Tail session, the console makes the following request.

# NOTE: extraneous fields stripped for clarity
curl 'https://us-east-1.console.aws.amazon.com/cloudwatch/CloudWatch/data/auth.SdkSign/20230628033119884-***' \
  -H 'origin: https://us-east-1.console.aws.amazon.com' \
  ...
  --data-raw $'{
    "Service":"logs","Endpoint":"logs.us-east-1.amazonaws.com","HttpMethod":"POST","Region":"us-east-1","Path":"/",
    "Headers":{"X-Amz-User-Agent":"aws-sdk-js/2.1265.0 promise","X-Amz-Requested-Operation":"startLiveTail","Content-Type":"application/x-amz-json-1.1","X-Amz-Target":"Logs_20140328.StartLiveTail",...}'

Note the X-Amz-Target set to Logs_20140328.StartLiveTail. This looked very much like an API call to start a WebSocket connection.

When inspective for websocket connections, you can see said connection

curl 'wss://us-east-1.prod.livetail.cloudwatchlogs.aws.dev/' \
  -H 'Origin: https://us-east-1.console.aws.amazon.com' \
  -H 'Sec-WebSocket-Key: **' \
  -H 'Upgrade: websocket' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: Upgrade' \
  -H 'Sec-WebSocket-Version: 13' \
  -H 'Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits' \

Presumably, there is a real-time service in cloudwatch logs that will push new events through the Live Tail websocket as soon as it is ingested.

In this regard, Live Tail is unlike most AWS services. AWS has a reputation for being an API-first platform (Route53 launched with only API support back in 2010). It feels strange to see a new feature be launched that is console first - at the time of this writing, there is no way (that I know of) to integrate Live Tail via API.

Parting Thoughts

Live Tail is a service that works as advertised. It tails your cloudwatch logs in real-time with the caveat that it is subject to the same ingestion latency that is inherent in being downstream of cloudwatch logs.

While the CLI-based solution has existed for ages, Live Tail brings this to the AWS console with additional features (eg. the ability to tail multiple log groups, filtering, and syntax highlighting). The difference between Live Tail and the tail cli command is that one is push-based and the other is pull-based. Live Tail can deliver logs from cloudwatch with consistent single-second latencies whereas, via the CLI, it is variable.

If you're already using the CLI, there's not much incentive to switch (unless you need the ability to tail multiple log groups at once). If you rely on the console for log diving, then Live Tail will be a welcome addition to your tool set.

Do note that Live Tail is not free. It has a heart palpitation-inducing per-second pricing model ($0.01 per minute) that is offset, in part, by the 1,800 free minutes per month free tier (that's 1h per day).

All in all, the addition of Live Tail fills the last gap in the cloudwatch suite of observability tooling. Just remember to not leave it running for too long!

Thanks for reading Bit by bit! Subscribe for free to receive new posts and support my work.

2
Share this post

How Real-Time is Amazon Cloudwatch Logs Live Tail?

bit.kevinslin.com
Share
Comments
Top
New
Community

No posts

Ready for more?

© 2023 Kevin Lin
Privacy ∙ Terms ∙ Collection notice
Start WritingGet the app
Substack is the home for great writing