In September 2021, Cloudflare took the tech world by storm by announcing R2: a storage service accessible through the ubiquitous S3 API but that didn’t charge any egress (“bandwidth”) fees. You only pay for storage and read/write/delete requests.
It was a major kick in the balls for the incumbents, who all charge extortionate fees for traffic leaving their networks. You don’t pay for traffic to and from an S3 bucket within an AWS region, but as soon as an object stored in an S3 bucket is accessed from outside its AWS region, you pay up to $0.09 per GB, which adds up fast, really fast.
One year later, in September 2022, Cloudflare announced the general availability of R2: it should be reliable enough to be used in production.
It’s now time to verify these claims and see how R2 compares to S3, AWS’ first service, which now powers a large share of the world’s companies and websites.
This article is an excerpt of my new book Cloudflare for Speed and Security where you will learn everything you need to know to build and deploy fast, secure and scalable full-stack applications, all while reducing your cloud bill.
There is a special discount during the early-access, so don’t wait before it’s too late 🙂
Price
As mentioned in the introduction, R2’s major “innovation” is free and unlimited ingress (inbound traffic, from the internet to the bucket) and egress (outbound traffic, from the bucket to the internet), which makes it cheaper than S3 for most scenarios. R2’s base prices are also lower than S3’s.
|  | S3 | R2 |
| --- | --- | --- |
| Storage | $0.023 / GB | $0.015 / GB |
| GET, SELECT and other requests | $0.40 / million (first 20,000 free) | $0.36 / million (first 10,000,000 free) |
| PUT, COPY, POST, LIST requests | $5 / million (first 2,000 free) | $4.50 / million (first 1,000,000 free) |
| Egress | $0.09 / GB | free |
As always, AWS’ pricing is actually more complex than that and requires a 10-person-month study to understand all the subtleties and plan accordingly, but you get the general idea: R2 is cheaper than S3, unless you are using S3 Intelligent-Tiering and a lot of your data is rarely accessed.
Examples:
You are storing and distributing public assets (such as AI model weights or video game assets):
|  | S3 | R2 |
| --- | --- | --- |
| Storage: 50 TB | $1,150 | $750 |
| Read requests: 50,000,000 | $20 | $14.40 |
| Write requests: 50,000 | $0.25 | free |
| Traffic: 500 TB | $29,491.20 | free |
| Total | $30,661.45 | $764.40 |
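These totals are easy to sanity-check. Here is a minimal sketch reproducing this first scenario, assuming AWS’ tiered internet egress rates ($0.09/GB for the first 10 TiB, $0.085 for the next 40 TiB, $0.07 for the next 100 TiB, then $0.05) and the list prices from the table above; note the figures use 1 TB = 1,000 GB for storage but 1 TiB = 1,024 GB for traffic:

```python
def s3_egress_usd(gb: float) -> float:
    """AWS internet egress is tiered per GB transferred in a month."""
    tiers = [(10 * 1024, 0.09), (40 * 1024, 0.085), (100 * 1024, 0.07), (float("inf"), 0.05)]
    cost, remaining = 0.0, gb
    for size, price in tiers:
        chunk = min(remaining, size)
        cost += chunk * price
        remaining -= chunk
        if remaining <= 0:
            break
    return cost

# Scenario: 50 TB stored, 50M reads, 50k writes, 500 TB of egress.
s3 = (
    50 * 1000 * 0.023            # storage
    + 50_000_000 / 1e6 * 0.40    # GET requests (the 20k free tier is negligible)
    + 50_000 / 1e6 * 5.0         # PUT requests
    + s3_egress_usd(500 * 1024)  # tiered egress
)
r2 = (
    50 * 1000 * 0.015                               # storage
    + max(0, 50_000_000 - 10_000_000) / 1e6 * 0.36  # reads beyond the free 10M
    + 0.0                                           # 50k writes < 1M free tier
    + 0.0                                           # egress is free
)
print(round(s3, 2), round(r2, 2))
```

Plugging your own numbers into the same formulas gives you a quick break-even check between the two services.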
You are using S3/R2 as a data warehouse and querying it from EC2 instances in the same region:
|  | S3 | R2 |
| --- | --- | --- |
| Storage: 10 TB | $230 | $150 |
| Read requests: 10,000,000 | $4 | free |
| Write requests: 10,000,000 | $50 | $40.50 |
| Traffic (in-region): 2 TB | free | free |
| Total | $284 | $190.50 |
You are using S3/R2 as a data warehouse and querying it from compute instances at another cloud provider:
|  | S3 | R2 |
| --- | --- | --- |
| Storage: 10 TB | $230 | $150 |
| Read requests: 10,000,000 | $4 | free |
| Write requests: 10,000,000 | $50 | $40.50 |
| Traffic: 2 TB | $180 | free |
| Total | $464 | $190.50 |
You are storing video-surveillance footage with S3 Intelligent-Tiering, where only a small portion of the data is ever retrieved:
|  | S3 | R2 |
| --- | --- | --- |
| Storage (R2 Standard): 500 TB | - | $7,500 |
| Storage (S3 Frequent Access): 1 TB | $23 | - |
| Storage (S3 Infrequent Access): 4 TB | $50 | - |
| Storage (S3 Archive Instant Access): 495 TB | $1,980 | - |
| Traffic: 5 TB | $450 | free |
| Total | $2,503 | $7,500 |
So, the only realistic scenario where S3 beats R2 on price is when a lot of your data is rarely accessed and you are using the S3 Intelligent-Tiering storage class.
Performance
As the oldest AWS service, S3 is extremely reliable, but because it is backed by hard disks (not SSDs), it is not that fast. As you will see below, you should not expect latencies under 30 ms, even in the same region. Note that AWS recently launched the new S3 Express One Zone storage class, which is less reliable but provides single-digit millisecond latency.
On the other hand, as mentioned multiple times on their blog, R2 is built on top of Cloudflare Workers and Durable Objects, and it seems to inherit the limitations of these products. As explained here, Durable Objects are single-threaded and thus inherently limited in the throughput they can offer.
During the open beta, in May 2022, an R2 bucket was limited to 1,000 GET operations per second and 100 PUT operations per second. Now that R2 is generally available, they unfortunately no longer publish the limits, and you will have to find out the hard way, in production.
One interesting thing to note: when announcing the open beta of Cache Reserve (which uses R2 under the hood) in November 2022, they mentioned that each zone’s assets are sharded across multiple R2 buckets to distribute the load, which may indicate that a single R2 bucket was not able to handle the load of user-facing traffic. Things may have improved since, though.
One very interesting finding I made is that R2’s performance varies greatly depending on how you access it.
If you access your bucket via the public URL (e.g. xxx.r2.dev or a custom domain), you get better performance than when using the S3 API (xxx.cloudflarestorage.com) with a pre-signed URL. I suspect that R2’s authentication / authorization layer is built on Cloudflare Workers / Durable Objects, which would explain why it performs poorly and why its performance is so unstable.
|  | average | p50 | p99 |
| --- | --- | --- | --- |
| S3 | 40 ms | 35 ms | 52 ms |
| r2.dev | 78 ms | 58 ms | 96 ms |
| cloudflarestorage.com | 89 ms | 71 ms | 123 ms |
These measurements were made from an EC2 instance, with the buckets in the same region, GETting a 50 KB object. The r2.dev and cloudflarestorage.com URLs point to the same bucket. The latencies include DNS resolution and TCP/TLS handshakes.
[Latency chart: r2.dev]
[Latency chart: cloudflarestorage.com with pre-signed URL]
The problem with cross-datacenter traffic
While Cloudflare and the servers hosting your compute may have a direct peering connection (a direct optical fiber between racks / datacenters) sufficiently provisioned to handle even the craziest peaks, all the routers and servers on the path from your compute to the R2 bucket in Cloudflare’s datacenters may not be able to handle such surges in load.
As we can see in the images above, the latency between an AWS EC2 instance and a Cloudflare R2 bucket is not very stable and increases during the working hours.
It can be explained by the following graph taken from this article, which shows “the CPU utilization of a typical Cloudflare metal”.
Cloudflare server load
During working hours, servers and routers are far busier than during the night, so the more servers and routers there are between your compute and your storage, the higher the latency peaks will be.
It’s important to keep that in mind if you plan to access your R2 buckets from compute instances hosted at a different cloud provider.
User experience
Generally, R2’s user experience is way better and simpler than S3’s. As always with AWS, you need 5 certifications and 3 months to securely deploy a bucket.
One example is API key management. Simplified API key management is especially important in a startup environment, where you want to move fast and may lack deep security expertise.
With Cloudflare, R2 buckets are private by default (which is great), and it only takes 3 clicks to get an API key scoped to a specific bucket with the correct permissions. With AWS, you need to create IAM policies, credentials, bucket policies and more, with everything scattered all over the place.
But Cloudflare R2 also has some major drawbacks.
First, R2 is not 100% compatible with the S3 API. I got some errors when using the official AWS SDKs and had to tweak my PutObject requests to work with R2.
Second, and what triggered me the most: you can’t choose the location of your R2 bucket! Cloudflare automagically decides where your bucket will be located depending on where the bucket creation request originates from. You can give a “location hint” (like “Western Europe”) that may or may not be followed… There are also Jurisdictional Restrictions to meet legal requirements and ensure that objects in a bucket are stored within a specific jurisdiction such as the EU or FedRAMP, but you just can’t choose a specific location for your bucket.
This is really, really, really annoying. Say you know that all your compute instances are in Paris, and you know that Cloudflare has a big datacenter in Paris, so you want your bucket to be in Paris; still, you can’t be 100% sure that it will end up there. If you are unlucky when creating your bucket, it will be placed in Warsaw or some other faraway place and you will pay a huge latency penalty on every request.
For me, this is the biggest downside of R2, and a recurrent problem with Cloudflare. I really hope that they will correct course. Too much “magic” is the complete opposite of good engineering.
When your boss comes asking why the website is slow, “I don’t know, it’s magic” won’t be accepted as an answer.
Some Closing Thoughts
I initially started this conclusion along the lines of “S3 strengths are… R2 strengths are … So now the decision is up to you”, but damn the blandness!
Honestly, I see only a few reasons to use S3 today: if 40ms of latency really matters, if you already have a mature AWS-only architecture and inter-region traffic fees are not making a dent in your profits, or if it’s really hard to bring another vendor into your infrastructure for organizational reasons. That’s all.
Maybe 90%+ of projects and organizations will be better served by Cloudflare R2.
It’s easy and free to get data in AWS, but they bleed you as soon as you try to go out, so f%#$K the cloud racketeers, and thank you Cloudflare for enabling the next generation of startups to build products without having to pay the Amazon tax that then trickles down to the entire economy.
Finally, I don’t want to end this article without talking about the upcoming R2 pipeline API announced in April 2024. Today the main way to interact with R2 is via the S3 API, but they are cooking something that looks really, really interesting: a managed pipeline that offers HTTP, WebSocket and Kafka endpoints and batches events/messages as they are ingested.
wrangler pipelines create clickstream-ingest-prod --batch-size="1MB" --batch-timeout-secs=120 --batch-on-json-key=".domain" --destination-bucket="prod-cs-data"
✅ Successfully created new pipeline “clickstream-ingest-prod”
📥 Created endpoints:
➡ HTTPS: https://d458dbe698b8eef41837f941d73bc5b3.pipelines.cloudflarestorage.com/clickstream-ingest-prod
➡ WebSocket: wss://d458dbe698b8eef41837f941d73bc5b3.pipelines.cloudflarestorage.com:8443/clickstream-ingest-prod
➡ Kafka: d458dbe698b8eef41837f941d73bc5b3.pipelines.cloudflarestorage.com:9092 (topic: clickstream-ingest-prod)
From the announcement, we can think of this as a managed AWS Kinesis + S3: you write events to these endpoints as they arrive, and Cloudflare Pipelines takes care of writing batches of events to R2.
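The ingestion API isn’t final, so the following is purely speculative: assuming the HTTP endpoint accepts JSON-encoded batches of events over POST, writing to the pipeline could look something like this (the URL is a placeholder for the one printed by wrangler):

```python
import json
from urllib import request

# Placeholder: the real URL is printed by `wrangler pipelines create`.
PIPELINE_URL = "https://ACCOUNT_ID.pipelines.cloudflarestorage.com/clickstream-ingest-prod"

events = [{"domain": "example.com", "path": "/pricing", "ts": 1714000000}]
body = json.dumps(events).encode("utf-8")

req = request.Request(
    PIPELINE_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req)  # commented out: requires a live pipeline endpoint
```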
Even better, Pipelines is directly integrated with Cloudflare Workers, enabling serverless and infinitely scalable ETL pipelines.
I’m not a fan of Cloudflare Workers for building general-purpose applications, where “good old” monoliths deployed in containers or microVMs are far superior, but for stateless endpoints and ingestion pipelines with highly variable load, it looks like a no-brainer.
Now, you may want to learn more about how Cloudflare can make your web applications faster, all while reducing your cloud bill, and I have good news! In my book Cloudflare for Speed and Security you will learn how to get the most out of the Cloudflare platform while avoiding the traps. It’s like a speed-run to deploying production-ready full-stack applications.
There is a special discount during the early access, so don’t wait before it’s too late 🙂