After Moving the Data to the Cloud, the Next Step is Backup and Storage
Amazon S3 Storage Classes
Amazon S3 (Amazon Simple Storage Service) is an object storage service for the AWS cloud platform. It is designed for 99.999999999% (11 nines) durability. If you want to further improve the durability of your data, or place your files in a region closer to your customers to reduce latency, you can set up cross-region replication (CRR), which lets buckets in different AWS regions replicate objects automatically and asynchronously. Cross-region replicated buckets can be owned by the same AWS account or by different accounts.
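To put the durability figure in perspective, the short calculation below is a rough sketch. It assumes the 99.999999999% figure applies per object per year and that losses are independent; the function name `expected_annual_losses` and the object count of 10,000,000 are illustrative assumptions, not part of any AWS API.

```python
from fractions import Fraction

# Annual per-object durability quoted above, kept exact with Fraction
# so the tiny loss probability is not distorted by float rounding.
DURABILITY = Fraction("99.999999999") / 100


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year, assuming independent losses."""
    return object_count * (1 - DURABILITY)


if __name__ == "__main__":
    # Hypothetical bucket holding ten million objects.
    losses = expected_annual_losses(10_000_000)
    print(f"Expected losses per year: {float(losses)}")
    print(f"Roughly one object lost every {1 / losses} years")
```

Under these assumptions, the expected loss rate scales linearly with the number of stored objects, which is why the figure stays negligible even for very large buckets.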
You can choose the appropriate storage class on Amazon S3 according to how frequently the data is accessed, or you can use a lifecycle configuration to move objects originally stored in S3 Standard to S3 Intelligent-Tiering, S3 Standard-IA, S3 One Zone-IA, and S3 Glacier:
S3 Standard
S3 Standard provides low-latency, high-throughput object storage, so it suits a wide range of scenarios, such as static websites, mobile and gaming applications, and data lakes for big data analytics. You can set lifecycle rules on objects in S3 Standard so that they transition automatically to other storage classes as they age.
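As a minimal sketch of what such a lifecycle rule might look like, the helper below builds a configuration dict in the shape the S3 API uses for lifecycle transitions, which are keyed on object age in days. The function name `build_lifecycle_configuration`, the rule ID, the prefix, and the day counts are all hypothetical choices made for illustration.

```python
# Storage class identifiers used by the S3 API for lifecycle transitions.
ALLOWED_CLASSES = {
    "STANDARD_IA",
    "ONEZONE_IA",
    "INTELLIGENT_TIERING",
    "GLACIER",
    "DEEP_ARCHIVE",
}


def build_lifecycle_configuration(rule_id, prefix, transitions):
    """Build a lifecycle configuration dict from (days, storage_class) pairs.

    `transitions` must be ordered by strictly increasing object age in days.
    """
    previous_days = -1
    steps = []
    for days, storage_class in transitions:
        if storage_class not in ALLOWED_CLASSES:
            raise ValueError(f"unknown storage class: {storage_class}")
        if days <= previous_days:
            raise ValueError("transition days must be strictly increasing")
        steps.append({"Days": days, "StorageClass": storage_class})
        previous_days = days
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": steps,
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical rule: objects under logs/ move to Standard-IA after
    # 30 days and to Glacier after 90 days.
    config = build_lifecycle_configuration(
        "archive-logs", "logs/", [(30, "STANDARD_IA"), (90, "GLACIER")]
    )
    print(config)
```

With boto3 installed and credentials configured, a dict like this could be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call. The sketch deliberately stops short of making that call, so it runs without an AWS account.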
S3 Intelligent-Tiering
S3 Intelligent-Tiering works by storing objects in two access tiers: one optimized for frequent access and a lower-cost tier for infrequent access. It monitors every object stored in this storage class and moves objects that have not been accessed for 30 consecutive days to the infrequent access tier. If an object in the infrequent access tier is accessed, it is automatically moved back to the frequent access tier. With the S3 Intelligent-Tiering storage class, there is no additional charge when objects move between access tiers. This storage class is well suited to objects with unknown or unpredictable access patterns.
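The 30-day rule can be illustrated with a toy simulation of a single object. This is only a sketch of the behavior described above, not how S3 implements it; the function name `tier_history`, the tier labels, and the assumption that the object is uploaded (and counted as accessed) on day 0 are all hypothetical.

```python
def tier_history(access_days, total_days, threshold=30):
    """Return the access tier of one object at the end of each day.

    The result covers days 1..total_days. The object is assumed to be
    uploaded on day 0, which counts as its first access.
    """
    accessed = set(access_days)
    last_access = 0
    history = []
    for day in range(1, total_days + 1):
        if day in accessed:
            # Any access resets the idle counter and returns the object
            # to the frequent access tier.
            last_access = day
        idle_days = day - last_access
        history.append("INFREQUENT" if idle_days >= threshold else "FREQUENT")
    return history


if __name__ == "__main__":
    # Hypothetical object that is read once, on day 45.
    history = tier_history(access_days=[45], total_days=80)
    for day in (29, 30, 44, 45, 74, 75):
        print(day, history[day - 1])
```

In this model the object drops to the infrequent tier after 30 idle days, returns to the frequent tier on the day it is read, and drops again once another 30 idle days have passed.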
S3 Standard – IA (Infrequent Access)
If your data is accessed less frequently but requires rapid access when needed, S3 Standard-IA may be the right storage class. Its combination of low cost and high performance makes S3 Standard-IA well suited to long-term storage, backups, and use as a data store for disaster recovery.
S3 One Zone – IA (Infrequent Access)
Unlike other S3 storage classes, which store data in at least three availability zones, S3 One Zone-IA stores objects in a single availability zone and costs 20% less than S3 Standard-IA. If your data does not require high availability and resiliency, S3 One Zone-IA may be a good choice.
S3 Glacier
S3 Glacier provides long-term data storage at an extremely low cost, and it suits long-term archives such as the movies or news clips that the media industry may need to keep. It provides three retrieval options: expedited retrieval, which typically completes within 5 minutes; standard retrieval, for less time-critical data, which takes about 3-5 hours; and bulk retrieval, the lowest-cost option, which retrieves large amounts of data within 5-12 hours.
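One way to reason about the three retrieval options is to pick the cheapest one whose worst-case time still meets a deadline. The sketch below assumes the upper bounds quoted above (5 minutes, 5 hours, 12 hours) and that cost rises from the large-scale option to the fastest one. The tier names `Bulk`, `Standard`, and `Expedited` follow the naming used by the S3 restore API, and `pick_retrieval_tier` is a hypothetical helper.

```python
# (tier name, worst-case retrieval time in hours), ordered from lowest
# to highest retrieval cost. Times are the upper bounds quoted in the text.
RETRIEVAL_TIERS = [
    ("Bulk", 12.0),
    ("Standard", 5.0),
    ("Expedited", 5 / 60),
]


def pick_retrieval_tier(deadline_hours):
    """Return the cheapest tier whose worst-case time fits the deadline."""
    for name, worst_case_hours in RETRIEVAL_TIERS:
        if worst_case_hours <= deadline_hours:
            return name
    raise ValueError("no retrieval tier can meet this deadline")


if __name__ == "__main__":
    # Hypothetical deadlines, in hours.
    for deadline in (24, 6, 1):
        print(deadline, pick_retrieval_tier(deadline))
```

Planning against the worst case is a conservative choice: a request with a 6-hour deadline is sent to the standard option even though a bulk retrieval might occasionally finish in time.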
S3 Glacier Deep Archive
S3 Glacier Deep Archive is the storage class with the lowest storage cost in the S3 family. It is especially suitable for data that must be kept long term to meet compliance requirements, such as hospital medical records, public sector file retention, and financial services account information. S3 Glacier Deep Archive is also a cost-effective disaster recovery solution that can be used as an alternative to tape backup.
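The guidance for the S3 storage classes above can be condensed into a small decision helper. This is a deliberately coarse sketch: the function `suggest_storage_class` and its access pattern labels are hypothetical, and real choices would also weigh retrieval fees, minimum storage durations, and object sizes, which are not modeled here.

```python
def suggest_storage_class(access_pattern, multi_az=True):
    """Map a coarse access pattern to an S3 storage class identifier.

    access_pattern: "frequent", "unknown", "infrequent", "archive", or
        "compliance" (long-term retention that is rarely, if ever, read).
    multi_az: whether the data must survive the loss of an availability zone.
    """
    if access_pattern == "frequent":
        return "STANDARD"
    if access_pattern == "unknown":
        return "INTELLIGENT_TIERING"
    if access_pattern == "infrequent":
        # One Zone-IA trades resilience for lower cost.
        return "STANDARD_IA" if multi_az else "ONEZONE_IA"
    if access_pattern == "archive":
        return "GLACIER"
    if access_pattern == "compliance":
        return "DEEP_ARCHIVE"
    raise ValueError(f"unrecognized access pattern: {access_pattern!r}")


if __name__ == "__main__":
    print(suggest_storage_class("infrequent", multi_az=False))
    print(suggest_storage_class("compliance"))
```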
AWS Storage Gateway
AWS Storage Gateway provides a hybrid cloud storage service that connects on-premises data to the AWS cloud platform. It gives on-premises applications low-latency Network File System (NFS) access to Amazon S3 objects while allowing simultaneous access from any application that supports the Amazon S3 API. The file gateway configuration of AWS Storage Gateway can be used to implement hybrid IT architectures, such as tiered file system storage, data archiving, on-demand bursting workloads, and backups to the AWS platform.
Amazon EBS (Elastic Block Store) and EBS Snapshots
Amazon Elastic Block Store (EBS) provides block storage for Amazon EC2; once attached to EC2 instances, its volumes can host databases, file systems, and so on. Each Amazon EBS volume is automatically replicated within its availability zone. Amazon EBS offers different volume types so that you can balance cost and performance for your workload: Provisioned IOPS SSD for latency-sensitive transactional workloads, General Purpose SSD for a wide variety of transactional workloads, Throughput Optimized HDD for frequently accessed, throughput-intensive workloads, and Cold HDD for less frequently accessed data at lower cost.
Amazon EBS can take point-in-time snapshots of your volumes and store the snapshot data in Amazon S3. Snapshots are stored incrementally, that is, each snapshot stores, and is billed for, only the blocks that changed since the previous snapshot. When you delete a snapshot, only the data that no other snapshot needs is removed, and the time required to restore changed data to the working volume is the same for all snapshots. When you restore a snapshot to a new Amazon EBS volume, you don’t have to wait for all the data to be transferred: the volume can be used as soon as it is attached to an instance. You can also copy EBS snapshots across regions for disaster recovery and data center migration.
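The incremental behavior described above can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how EBS is implemented: the `SnapshotChain` class, its methods, and the representation of a volume as a dict of block index to contents are all hypothetical, and the model assumes blocks are only ever added or overwritten, never removed.

```python
class SnapshotChain:
    """Toy model of incremental snapshots over a block-addressed volume."""

    def __init__(self):
        # Ordered oldest to newest: (snapshot name, {block index: data}).
        self._snapshots = []
        self._last_state = {}

    def take(self, name, volume):
        """Store only blocks that changed since the previous snapshot.

        Returns the number of blocks stored (and, in this model, billed).
        """
        changed = {
            index: data
            for index, data in volume.items()
            if self._last_state.get(index) != data
        }
        self._snapshots.append((name, changed))
        self._last_state = dict(volume)
        return len(changed)

    def restore(self, name):
        """Rebuild the full volume as it was when `name` was taken."""
        state = {}
        for snapshot_name, blocks in self._snapshots:
            state.update(blocks)
            if snapshot_name == name:
                return state
        raise KeyError(name)

    def delete(self, name):
        """Delete a snapshot, keeping blocks that a later snapshot needs."""
        for position, (snapshot_name, blocks) in enumerate(self._snapshots):
            if snapshot_name == name:
                if position + 1 < len(self._snapshots):
                    successor = self._snapshots[position + 1][1]
                    for index, data in blocks.items():
                        # Hand over only blocks the successor did not overwrite.
                        successor.setdefault(index, data)
                del self._snapshots[position]
                return
        raise KeyError(name)


if __name__ == "__main__":
    chain = SnapshotChain()
    volume = {0: "a", 1: "b", 2: "c"}
    print(chain.take("snap-1", volume))  # first snapshot stores every block
    volume[1] = "B"
    print(chain.take("snap-2", volume))  # later snapshots store only changes
    volume[2] = "C"
    print(chain.take("snap-3", volume))
    chain.delete("snap-2")
    print(chain.restore("snap-3"))
```

Deleting the middle snapshot hands its still-needed block to the next snapshot, so the later snapshot continues to restore a complete volume.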
Amazon EFS (Elastic File System)
Amazon EFS provides an elastic, automatically scaling cloud file storage service. It is designed to provide massively parallel shared access to Amazon EC2 instances, and its distributed architecture helps protect data against access errors and interruptions. EFS also provides an infrequent access storage class (EFS IA). After you enable lifecycle management, files that have not been accessed for more than 30 days are moved to EFS IA, which can save you up to 85% of the storage cost.
To recover from accidental changes or deletions in a file system, you can use the EFS-to-EFS backup solution, which automatically copies files from one Amazon EFS file system (the source) to another (the backup).
The EFS-to-EFS backup solution automatically deploys the necessary AWS services, including Amazon CloudWatch and AWS Lambda, to create automatic incremental backups of an Amazon EFS file system on a schedule you define. You can use this solution to create daily, weekly, or monthly backups of your file system and keep those backups for as long as your business requirements demand.
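The retention side of such a schedule can be sketched in a few lines. The helper below is not the EFS-to-EFS solution's actual logic, only an illustration of pruning by age; the function name `backups_to_delete`, the dates, and the 7-day retention window are hypothetical.

```python
from datetime import date, timedelta


def backups_to_delete(backup_dates, today, retention_days):
    """Return the backup dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d < cutoff)


if __name__ == "__main__":
    # Hypothetical daily backups for the ten days up to 2024-03-31.
    today = date(2024, 3, 31)
    backups = [today - timedelta(days=n) for n in range(10)]
    for expired in backups_to_delete(backups, today, retention_days=7):
        print("expired:", expired.isoformat())
```

A backup taken exactly `retention_days` ago falls on the cutoff and is kept; only strictly older backups are returned for deletion.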
AWS Backup
AWS Backup is a centralized backup service in which you can configure backup policies and monitor backup activity in one place for resources including Amazon EFS, Amazon EBS snapshots, Amazon DynamoDB, Amazon RDS snapshots, and AWS Storage Gateway snapshots. By integrating with Storage Gateway, you can also use AWS Backup to back up your on-premises data.
Amazon FSx for Lustre / Windows File Server
Amazon FSx for Lustre provides a fully managed, high-performance Lustre file system that lets file-based applications access data at hundreds of gigabytes per second of throughput, with millions of IOPS and sub-millisecond latency. Amazon FSx works with Amazon S3 and on-premises storage: you can transparently access your S3 objects as files on Amazon FSx, run analyses that last from a few hours to several months, and then write the results back to S3. You can also access an FSx for Lustre file system through AWS Direct Connect or VPN, so you can burst data processing workloads from on-premises to AWS.
Amazon FSx for Windows File Server makes it easy for you to launch and use Windows shared file storage for Windows-based applications. Built on Windows Server, Amazon FSx provides a fully managed Windows File Server with the native compatibility and features that Windows-based applications rely on. Amazon FSx uses SSD storage to provide high levels of throughput and IOPS, and consistent sub-millisecond latency.