Which Cloud Storage To Use?
Apr 11, 2023
Part 2: Amazon Web Services
Vasko Viktorov is part of ROITI’s Cloud Software Engineering Team, and in a series of three articles he takes a look at each of the three most popular cloud providers: Azure, AWS, and Google. In the previous article, he looked into Azure’s common storage options. If you missed it and want the full picture of the storage options on the market, check it out while it’s hot. If you have already read it, or aren’t interested in Azure, read on, because next on the list is Amazon and its cloud platform, Amazon Web Services (AWS).
Amazon Web Services
Amazon Web Services was founded in 2004 as part of Amazon.com, but it has been functioning independently as its own company since November 2012. It needed only four years to become the market leader: AWS has held the top spot from 2016 to the present, with a 34% market share as of Q3 2022.
As one can guess, being the market leader for that long means setting the market standards. Accordingly, AWS offers an extensive line of storage options that should meet most, if not all, requirements. Each of those storage options can be accessed easily via the AWS Management Console website or the Command Line Interface (CLI).
There’s a lot of flexibility and plenty of options, so let’s check them out.
Elastic Compute Cloud (EC2) was one of the first services provided by AWS, launched in 2006. It provides virtual machines (instances) maintained by AWS, which can be launched from preconfigured templates (ready-to-use software for the instances) on a range of hardware configurations (depending on the instance type).
Although EC2 is not a storage service itself, it has a storage feature that lets an instance use one of the following:
- Temporary storage volume (instance store) – persists data only as long as the instance is not stopped, hibernated, or terminated. Because the volume is physically attached to the host that runs the instance, it can offer extremely high I/O speed to applications running on that instance. This storage is perfect for an application cache or buffer.
- Persistent storage volume – persists data on Elastic Block Store, which we will check out next.
Elastic Block Store (EBS) volumes behave like raw, unformatted block devices (virtual hard drives) that can be mounted on EC2 instances. How much EBS storage an instance can use depends on the number of volumes (SSD/HDD) attached to it and on the instance type, since different instance types have different volume limits. In most use cases, though, you shouldn’t be concerned with hitting those limits. EBS delivers high I/O performance, availability, and durability. It is designed as a fast and reliable block storage volume for a single instance; the exception is Multi-Attach, which applies only in very specific scenarios.
It is best used for demanding workloads that require long-term data persistence and a high frequency of random read/write operations. Also, keep in mind that you pay per month for the number of GB you provision, whether or not you fill them, plus any input/output operations beyond the included baseline.
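To make the pricing model above more tangible, here is a minimal sketch of the arithmetic. The function name and every price and IOPS figure in it are placeholders made up for illustration, not real AWS rates; check the current price list for your region and volume type before relying on any number.

```python
def estimate_ebs_monthly_cost(
    provisioned_gb: float,
    provisioned_iops: int,
    baseline_iops: int,
    price_per_gb_month: float,
    price_per_extra_iops_month: float,
) -> float:
    """Estimate the monthly bill for a single volume.

    All prices are caller-supplied placeholders, not real AWS rates.
    """
    # You pay for every provisioned GB, whether it is used or not.
    storage_cost = provisioned_gb * price_per_gb_month
    # Only IOPS above the included baseline are billed separately.
    extra_iops = max(0, provisioned_iops - baseline_iops)
    iops_cost = extra_iops * price_per_extra_iops_month
    return round(storage_cost + iops_cost, 2)


# Hypothetical numbers, chosen only to keep the arithmetic easy to follow.
cost = estimate_ebs_monthly_cost(
    provisioned_gb=500,
    provisioned_iops=5000,
    baseline_iops=3000,
    price_per_gb_month=0.10,
    price_per_extra_iops_month=0.005,
)
print(cost)  # 60.0
```

With these made-up rates, 500 GB contribute 50.00 and the 2,000 IOPS above the baseline contribute 10.00.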
Elastic File System (EFS) is a managed Network File System (NFS) service that provides serverless, fully elastic file storage, where you can share files without provisioning or managing storage capacity and performance. It is designed for use across different instances and Availability Zones. EFS storage is practically unlimited; the only notable limitation is the maximum size of a single file, 52.7 TB, which is quite big.
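Since many operating system tools report sizes in binary units, it can help to see what the single-file limit looks like in TiB. This is a small sketch that assumes the 52.7 TB figure is expressed in decimal terabytes; the helper name is ours.

```python
TB = 10**12   # decimal terabyte (assumed unit of the limit quoted above)
TIB = 2**40   # binary tebibyte, as shown by many OS tools


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB


efs_max_file_tib = tb_to_tib(52.7)
print(f"{efs_max_file_tib:.1f} TiB")  # 47.9 TiB
```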
Like EBS, it is best used for demanding workloads, but in cases where the storage needs to be shared between more than one instance and used across different Availability Zones.
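The choice between the three EC2-related options described so far can be summed up as a tiny decision helper. This is only a sketch of the rules of thumb from this article; the function and its parameter names are hypothetical, and real decisions will also weigh cost and performance.

```python
def pick_ec2_storage(must_survive_stop: bool, shared_across_instances: bool) -> str:
    """Map two simple requirements to one of the options discussed above."""
    if shared_across_instances:
        # Shared between instances and Availability Zones.
        return "EFS"
    if must_survive_stop:
        # Persistent block storage for a single instance.
        return "EBS"
    # Data may be lost on stop/hibernate/terminate: fine for cache or buffers.
    return "temporary volume"


print(pick_ec2_storage(must_survive_stop=False, shared_across_instances=False))
print(pick_ec2_storage(must_survive_stop=True, shared_across_instances=False))
print(pick_ec2_storage(must_survive_stop=True, shared_across_instances=True))
```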
Simple Storage Service (S3) is object cloud storage that provides high availability, scalability, reliability, and flexibility. Objects are stored in containers called buckets, which is why S3 storage is often simply called a bucket. The storage size is practically unlimited; the only limitation is the size of a single object, which is capped at 5 TB.
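Objects anywhere near that cap are normally uploaded in parts rather than in one request. The sketch below plans such an upload under a few assumptions that are not from this article: the multipart limits as we recall them from the S3 documentation (at most 10,000 parts, each between 5 MiB and 5 GiB) and a cap interpreted in binary units. The function name is hypothetical, so verify the limits before using this for anything real.

```python
MIB = 2**20
GIB = 2**30
TIB = 2**40

MAX_OBJECT_BYTES = 5 * TIB  # the 5 TB cap, taken here in binary units (assumption)
MAX_PARTS = 10_000          # assumed multipart limit
MIN_PART_BYTES = 5 * MIB    # assumed minimum part size
MAX_PART_BYTES = 5 * GIB    # assumed maximum part size


def plan_multipart(object_bytes: int) -> tuple[int, int]:
    """Return (part_size_bytes, part_count) for an object of the given size."""
    if not 0 < object_bytes <= MAX_OBJECT_BYTES:
        raise ValueError("object size is outside the supported range")
    # Smallest part size that still fits into MAX_PARTS (integer ceiling division).
    needed = -(-object_bytes // MAX_PARTS)
    part_size = max(MIN_PART_BYTES, needed)
    part_count = -(-object_bytes // part_size)
    return part_size, part_count


print(plan_multipart(1 * GIB))  # (5242880, 205)
print(plan_multipart(5 * TIB))  # (549755814, 10000)
```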
There are seven S3 storage classes, each optimized for a different scenario. A storage class is chosen per object, so a single bucket can mix them. They are listed by expected access frequency, from most frequent to least frequent, with a catch-all option at the end:
S3 Standard is the default storage class. It is best used for frequently accessed data. It has a higher storage cost than the other classes, but no retrieval fee.
S3 Standard-Infrequent Access (Standard-IA) is designed for long-lived and infrequently accessed data. The data is stored across multiple Availability Zones, which makes it resilient to the loss of a zone. Best used for backups and infrequently accessed data that still requires a millisecond response. It has a lower storage cost than S3 Standard, but charges a retrieval fee.
S3 One Zone-Infrequent Access (One Zone-IA) has the same use case as S3 Standard-IA but stores the data in a single Availability Zone. This makes it vulnerable to the loss of that zone, but it is the cheaper option.
S3 Glacier Instant Retrieval is used for rarely accessed objects that require a millisecond response. It has a lower storage cost, but a higher retrieval cost.
S3 Glacier Flexible Retrieval is used for rarely accessed long-term data that does not require immediate access. It has a minimum storage duration of 90 days. The free (bulk) retrieval takes 5 to 12 hours, but expedited retrieval can be requested for a fee and takes 1 to 5 minutes.
S3 Glacier Deep Archive is for long-term archiving and digital preservation, with retrieval in hours, at the lowest storage cost in the cloud. It has a minimum storage duration of 180 days. By default, retrieval takes up to 12 hours.
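The minimum storage durations of the two archive classes matter mostly when data is deleted early. A rough sketch, assuming that an object removed before the minimum is still billed for the full minimum duration (the helper and dictionary names are ours):

```python
# Minimum storage durations quoted above, in days.
MIN_STORAGE_DAYS = {
    "Glacier Flexible Retrieval": 90,
    "Glacier Deep Archive": 180,
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days of storage billed, assuming early deletes pay for the full minimum."""
    minimum = MIN_STORAGE_DAYS.get(storage_class, 0)
    return max(days_stored, minimum)


print(billable_days("Glacier Flexible Retrieval", 30))  # 90
print(billable_days("Glacier Deep Archive", 200))       # 200
```

In other words, parking data in an archive class for a month and then deleting it can cost more than expected.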
S3 Intelligent-Tiering is your option if none of the above hits the spot or you can’t make up your mind. S3 Intelligent-Tiering analyzes how often your data in S3 is accessed and optimizes costs for unknown or changing access patterns (for a small additional fee, of course).
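The seven classes above can be condensed into a simple decision helper. This is only a sketch of the rules of thumb in this article: the function and its parameters are hypothetical, and the returned strings are the storage class identifiers as we understand them from the S3 API, so double-check them against the official documentation.

```python
def suggest_storage_class(
    access: str,                   # "frequent", "infrequent", "rare", or "unknown"
    instant_access: bool = True,   # is a millisecond response required?
    single_zone_ok: bool = False,  # can the data live in one Availability Zone?
    deep_archive: bool = False,    # is retrieval within about 12 hours acceptable?
) -> str:
    """Suggest an S3 storage class from a coarse access pattern."""
    if access == "unknown":
        return "INTELLIGENT_TIERING"
    if access == "frequent":
        return "STANDARD"
    if access == "infrequent":
        return "ONEZONE_IA" if single_zone_ok else "STANDARD_IA"
    if access == "rare":
        if instant_access:
            return "GLACIER_IR"
        return "DEEP_ARCHIVE" if deep_archive else "GLACIER"
    raise ValueError(f"unknown access pattern: {access!r}")


print(suggest_storage_class("infrequent", single_zone_ok=True))
print(suggest_storage_class("rare", instant_access=False))
```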
Amazon File Cache is a fully managed, high-speed cache for file data processing. It is built on SSD storage, which makes it suited for low-latency, IOPS-intensive workloads that typically involve small, random file operations. It can be placed in front of on-premises file systems, AWS file systems, or AWS object stores, provided the file systems support the NFSv3 protocol.
Amazon FSx is a fully managed native file server service, where fully managed means that hardware and software setup, maintenance, and backups are taken care of by AWS. Another plus for FSx is its easy integration with other AWS services. Overall, if you are looking for a fast and easy solution that requires minimal to no code changes when migrating from on-premises to the cloud, this could be it.
There are currently four different types of FSx services:
FSx for Lustre is based on Lustre, a popular open-source file system. It supports many requirements of leadership-class High-Performance Computing (HPC) simulation environments. FSx for Lustre is best suited for compute-intensive workloads like machine learning and video rendering.
FSx for Windows File Server provides a native Microsoft Windows file system built on Windows Server. It is optimized for Windows-based applications.
FSx for OpenZFS is based on OpenZFS, a popular open-source file system. It is optimized for Linux-based applications.
FSx for NetApp ONTAP is built on NetApp’s popular ONTAP file system. It is accessible from Linux, Windows, and macOS compute instances running in AWS or on-premises. It is optimized for applications that already rely on NetApp ONTAP on-premises.
It is worth mentioning that AWS also provides a handy service called AWS Storage Gateway, which makes it easy to access and move data between on-premises environments and AWS cloud storage. So, if AWS is your choice, you should get familiar with this service.
This concludes our look at the common storage options provided by AWS. If none of them got you excited and you’re looking for something different, you can also check the database options that AWS offers for:
– Relational databases: SQL Server, Aurora (MySQL/PostgreSQL), Oracle, MariaDB, Redshift (PostgreSQL)
– Non-relational databases: Cassandra, MongoDB, Amazon Neptune, DynamoDB, Timestream
In the next part, we will check out the common storage options provided by Google.