You're Paying Too Much for (Cloudwatch) Logs
Reducing Cloudwatch Log Costs by 80% with Firehose, S3 and Athena
If you're using AWS, there's a good chance (>95%) you're using Cloudwatch Logs (CL). It is the de facto logging solution from AWS - CL lets you collect, store, and query log files from anywhere with little to no upfront work.
If you're using CL, there's also a good chance you're paying for more than you need.
Why Cloudwatch Logs
First, let's talk about why you might be using CL.
CL comes ready out of the box. You might not even be aware that you're using it, as it integrates directly with 30+ AWS services (API Gateway, Lambda, ECS, etc).
CL can also be extended into on-prem environments using the unified Cloudwatch agent - you can use the agent to collect both metrics and logs from the host machine.
CL covers the basic use cases for logging - querying (via Cloudwatch Logs Insights) and tailing (via live tail). It also lets you use filters to create metrics from logs and add subscribers that can forward your logs to other destinations.
CL's most compelling (and expensive) feature is that it can do all of this in "near real time" - logs usually show up ~5s-20s after publication (CL records both the emit and ingestion timestamps on every event, so you can do the calculation yourself - see the snippet below).
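If you want to measure that lag for your own log groups, here's a minimal sketch using boto3 - the log group name is a placeholder:

```python
import boto3

logs = boto3.client("logs")

# Fetch recent events and compare the event timestamp (when the log was
# emitted) against ingestionTime (when Cloudwatch actually received it).
# The log group name below is a placeholder.
resp = logs.filter_log_events(logGroupName="/my/app/log-group", limit=100)

for event in resp["events"]:
    lag_ms = event["ingestionTime"] - event["timestamp"]
    print(f"ingestion lag: {lag_ms / 1000:.1f}s")
```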
Why not Cloudwatch Logs
CL might not be for you if you don't need "near real time" logs and the cost of operation is too high.
This is because the near real time nature of CL comes at a premium - the ingestion fee per gigabyte ($0.50/GB) is one of the highest in the industry (for comparison, New Relic is $0.30/GB, Datadog is $0.10/GB, and S3 is free).
CL charges you based on ingestion and retention. Ingestion is $0.50/GB. Retention is $0.03/GB. Ideally, you'd want the costs to be flipped. When retention is expensive, you can reduce costs by shortening your retention policy and archiving older logs to S3. But when ingestion is expensive, your options are limited and generally involve not sending logs at all.
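To make the retention-side mitigation concrete, here's a hedged sketch (log group and bucket names are placeholders, and the destination bucket must grant Cloudwatch Logs write access) that caps retention and exports older data to S3 with boto3:

```python
import time
import boto3

logs = boto3.client("logs")

# Cap retention so Cloudwatch stops billing for old data.
logs.put_retention_policy(
    logGroupName="/my/app/log-group",
    retentionInDays=30,
)

# Archive an older window of logs to S3 before it expires.
now_ms = int(time.time() * 1000)
thirty_days_ms = 30 * 24 * 60 * 60 * 1000
logs.create_export_task(
    logGroupName="/my/app/log-group",
    fromTime=now_ms - 2 * thirty_days_ms,
    to=now_ms - thirty_days_ms,
    destination="my-log-archive-bucket",
    destinationPrefix="cloudwatch-archive",
)
```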
Comparing Cloudwatch Logs with Other Solutions
Let's compare CL to some other logging solutions in terms of cost. Specifically, we'll compare:
cost of using CL
cost of using Datadog (industry standard)
cost of a custom solution using other AWS services
We'll make the following assumptions to baseline our comparison:
Net new logs ingested per day: 500GB/day
Average size of a single log event: 256 Bytes
Retention period for logs: 90 days
Amount of data scanned per day: 3,500GB (7 days' worth of logs)
When calculating log costs, we'll consider the following factors:
ingestion fee: how much does it cost to import the logs
retention fee: how much does it cost to store the logs
querying fee: how much does it cost to search the logs
1. Cloudwatch Logs
Ingestion: $0.50/GB -> for 500GB/day, this is $250/day.
Retention: $0.03/GB -> for 500GB/day, this is $15/day.
Querying (via Cloudwatch Logs Insights): $0.005/GB scanned -> for 3500GB, this is $17.50/day
Total: $282.50/day
Note that the retention cost is less than 10% of ingestion cost. CL is expensive primarily because of the high ingest cost.
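Here's that arithmetic as a few lines of Python, using the rates and volumes from the assumptions above:

```python
GB_PER_DAY = 500
SCANNED_GB_PER_DAY = 3_500

ingestion = GB_PER_DAY * 0.50           # $0.50/GB ingested
retention = GB_PER_DAY * 0.03           # $0.03/GB retained
querying = SCANNED_GB_PER_DAY * 0.005   # $0.005/GB scanned by Logs Insights

print(ingestion, retention, querying, ingestion + retention + querying)
# 250.0 15.0 17.5 282.5
```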
2. Datadog
Ingestion: $0.10/GB -> $50/day
Retention: $1.70 per 1 million events -> assuming 256B per event, 500GB translates to 1,953,125,000 events, or $3,320.31/day
Querying: $0
Total: $3,370.31/day
Note that Datadog's pricing is flipped when compared to CL - ingestion is cheap but retention is very expensive (~220x CL's retention cost in this case). Your exact costs will vary depending on the size of your events, but in most cases you'll pay more for retention with Datadog than with any other vendor.
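Here's the same arithmetic for Datadog - the interesting step is converting GB into an event count, since that's what drives the retention fee:

```python
GB_PER_DAY = 500
BYTES_PER_EVENT = 256

# 500GB/day at 256 bytes per event -> 1,953,125,000 events/day
events_per_day = GB_PER_DAY * 1_000_000_000 / BYTES_PER_EVENT

ingestion = GB_PER_DAY * 0.10                   # $0.10/GB ingested
retention = events_per_day / 1_000_000 * 1.70   # $1.70 per million events

print(events_per_day, ingestion, retention, ingestion + retention)
# 1953125000.0 50.0 3320.3125 3370.3125
```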
3. Custom Solution using other AWS Services
There are many ways to build a logging platform on top of AWS. For today's comparison, we will use Firehose (managed streaming service), S3 (managed object storage), and Athena (managed query/analytics engine on top of S3). These are all mature serverless offerings provided by AWS that require no ongoing operational effort to run and are also cost effective at scale.
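To make the architecture concrete, here's a rough sketch of wiring Firehose to S3 with boto3 - the stream name, role, and bucket ARNs are placeholders, and your buffering settings will vary:

```python
import boto3

firehose = boto3.client("firehose")

# Create a delivery stream that batches incoming records and writes them
# to S3. All names and ARNs below are placeholders.
firehose.create_delivery_stream(
    DeliveryStreamName="app-logs",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-to-s3",
        "BucketARN": "arn:aws:s3:::my-log-archive-bucket",
        "Prefix": "logs/",
        # Flush whichever comes first: 1MB of buffered data or 60 seconds.
        "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
        # Compression shrinks the objects (and your storage bill) further.
        "CompressionFormat": "GZIP",
    },
)

# Applications then push log lines with put_record / put_record_batch.
firehose.put_record(
    DeliveryStreamName="app-logs",
    Record={"Data": b'{"level": "info", "msg": "hello"}\n'},
)
```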
Calculating the cost is a bit more involved and differs per service.
Ingestion:
Firehose: $0.029/GB -> $14.50/day
S3: Ingest is free but you do pay $0.005 per 1k PUT requests. Assuming you set the buffer at 1MB for Firehose, this works out to 500GB/0.001GB * $0.005/1000 or $2.50/day
Total: $17.00/day
Retention: $0.023/GB for S3 standard storage or $11.50/day
Querying:
Athena: $0.005/GB scanned or $17.50/day
S3: Athena issues GET requests to S3, which cost $0.0004 per 1k GET requests. Assuming an average object size of 1MB, this works out to 3500GB/0.001GB * $0.0004/1000 or $1.40/day
Total: $47.40/day
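Here's the full pipeline arithmetic in one place:

```python
GB_PER_DAY = 500
SCANNED_GB_PER_DAY = 3_500
OBJECT_SIZE_GB = 0.001  # 1MB Firehose buffer -> ~1MB S3 objects

firehose_ingest = GB_PER_DAY * 0.029                            # $0.029/GB
s3_puts = GB_PER_DAY / OBJECT_SIZE_GB * 0.005 / 1000            # $0.005/1k PUTs
s3_storage = GB_PER_DAY * 0.023                                 # $0.023/GB standard
athena_scan = SCANNED_GB_PER_DAY * 0.005                        # $0.005/GB scanned
s3_gets = SCANNED_GB_PER_DAY / OBJECT_SIZE_GB * 0.0004 / 1000   # $0.0004/1k GETs

total = firehose_ingest + s3_puts + s3_storage + athena_scan + s3_gets
print(firehose_ingest, s3_puts, s3_storage, athena_scan, s3_gets, total)
# 14.5 2.5 11.5 17.5 1.4 47.4
```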
Our home grown solution, made from combining different AWS services, stands out in a few respects compared to the other solutions. Most notably, it is nearly 6x cheaper than CL and over 98% cheaper than Datadog for the same amount of logs. More importantly, there is no premium on either ingestion or retention, which means that costs stay manageable even as you grow.
One caveat is that our home grown solution has a variable cost when it comes to querying data. How much this varies depends on how your data is stored, as Athena charges per GB scanned. This is impacted by your file format (eg. Parquet, Iceberg, etc), indexing, and partitioning strategy. It also depends on what kind of queries you run and how often you run them.
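As an illustration, here's a hedged sketch of a date-partitioned JSON table in Athena (database, table, bucket, and result locations are all placeholders) - queries that filter on the partition column only scan the matching S3 prefixes instead of the whole bucket:

```python
import boto3

athena = boto3.client("athena")

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS logs.app_logs (
  ts string,
  level string,
  msg string
)
PARTITIONED BY (day string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-log-archive-bucket/logs/'
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "logs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

# New day prefixes must be registered before they're queryable
# (ALTER TABLE ... ADD PARTITION, or partition projection).
# This query prunes to a single day's partition, cutting the GB scanned.
athena.start_query_execution(
    QueryString="SELECT * FROM logs.app_logs "
                "WHERE day = '2024-01-01' AND level = 'error'",
    QueryExecutionContext={"Database": "logs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```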
The other caveat is that the latency of streaming logs from Firehose to S3 can be on the order of minutes instead of seconds so it might not work as well if you need near real time logs.
Finally, this doesn't have the other nice features that CL has - mainly, metric filters and subscriptions.
Note that depending on your requirements, any and all of these caveats can be addressed. You can eliminate the query cost by using UltraWarm storage with OpenSearch. You can get near real time logs, metric filters, and subscriptions by using Kinesis Data Streams instead of Firehose. Each of these solutions has different cost/complexity tradeoffs but will, at scale, still be cheaper than running CL or Datadog.
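For instance, swapping in Kinesis Data Streams might look like this (stream name and payload are placeholders) - the key difference from Firehose is that multiple consumers can read the same stream concurrently:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Publish a log event to a Kinesis Data Stream. One consumer can batch
# events to S3, another can tail them in near real time, another can
# derive metrics - recovering the CL features we gave up.
kinesis.put_record(
    StreamName="app-logs",
    Data=json.dumps({"level": "error", "msg": "something broke"}).encode(),
    PartitionKey="host-1",
)
```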
Costs of Different Logging Solutions
Here's a table that shows the various cost calculations in one place.

Solution                  Ingestion    Retention    Querying    Total
Cloudwatch Logs           $250.00      $15.00       $17.50      $282.50/day
Datadog                   $50.00       $3,320.31    $0.00       $3,370.31/day
Firehose + S3 + Athena    $17.00       $11.50       $18.90      $47.40/day
Final Thoughts
When you're starting out, everything is cheap (even Datadog). You can and should use whatever works best for your team and business.
This article is not advocating that you not use CL - it's a fantastic service (with questionable UX) that takes a lot of toil out of logging. But if you find that log costs are getting you down and especially if you're not taking advantage of the near real time nature of CL, then know that there are better options.
Everything in AWS is about tradeoffs - if you take the time to optimize your usage, you will not find a more cost competitive solution at scale (without running your own data centers). The challenge is that it takes a long time to even know what all the options are, never mind actually putting them into practice (it doesn't help that AWS releases changes on a daily basis).
In the case of Cloudwatch Logs, understand the tradeoff you're implicitly making and if cost is a concern, make different tradeoffs.