- S3 Standards (default):
- The objects are stored in at least 3 AZs
- Provides eleven nines (99.999999999%) of durability and 99.99% availability
- Replication uses MD5 checksums together with CRCs to detect object corruption
- When objects are stored in S3 using the API, an HTTP 200 OK response is returned once the object is durably stored
- Billing:
- GB/month of data stored in S3
- A per-GB charge for data transferred out (transfer in is free)
- Price per 1000 requests
- No specific retrieval fee, no minimum duration, no minimum size
- S3 standard makes data accessible immediately, can be used for static website hosting
- Should be used for data frequently accessed
- S3 Standard-IA:
- Shares most of the characteristics of S3 standard: objects are replicated in 3 AZs, durability is the same, availability is the same, first byte latency is the same, objects can be made publicly available
- Billing:
- Storage is more cost effective per GB than S3 Standard
- Data transfer fee is the same as S3 standard
- Retrieval fee: for every GB of data there is a retrieval fee, overall cost may increase with frequent data access
- Minimum duration charge: we will be billed for a minimum of 30 days, with a minimum billable object size of 128 KB (smaller objects are billed as if they were 128 KB)
- Should be used for long lived data where data access is infrequent
- S3 One Zone-IA:
- Similar to S3 Standard, but cheaper; also cheaper than S3 Standard-IA
- Data stored using this class is only stored in a single AZ
- Billing:
- Similar to S3 standard IA: similar minimum duration fee of 30 days, similar billing for smaller objects and also similar retrieval fee per GB
- Same level of durability (if the AZ does not fail)
- Data is replicated inside one AZ
- Since data is not replicated between AZs, this storage class is not HA. It should be used for non-critical data or for data that can be reproduced easily
- S3 Glacier Instant Retrieval:
- It is like S3 Standard-IA, but with cheaper storage, more expensive retrieval and longer minimums
- Recommended for data that is infrequently accessed (once per quarter), but it still needs to be retrieved instantly
- Minimum storage duration charge is 90 days
- S3 Glacier Flexible Retrieval (formerly known as S3 Glacier):
- Same data replication as S3 standard and S3 standard IA
- Same durability characteristics
- Storage cost is about 1/6 of S3 standard
- S3 objects stored in Glacier should be considered cold objects (should not be accessed frequently)
- Objects in Glacier class are just pointers to real objects and they can not be made public
- In order to retrieve them, we have to perform a retrieval process:
- A job that needs to be done to get access to objects
- Retrievals processes are billed
- When objects are retrieved from Glacier, they are temporarily stored in Standard-IA and removed after a while. We can also restore them permanently
- Retrieval job types:
- Expedited: objects are retrieved in 1-5 minutes, this being the most expensive retrieval option
- Standard: data is accessible in 3-5 hours
- Bulk: data is accessible in 5-12 hours at the lowest cost
- Glacier Flexible has a 40 KB minimum billable size and a 90-day minimum storage duration
- Glacier Flexible should be used for data archival (e.g. yearly access), where data can be retrieved in minutes to hours (a retrieval job can be started via the CLI, as sketched below)
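- A minimal AWS CLI sketch (hypothetical bucket/key names) of starting a Glacier retrieval job for an archived object:

```sh
# Start a retrieval job for an object in the Glacier Flexible Retrieval class;
# the restored copy stays available for 7 days (Expedited tier shown here)
aws s3api restore-object \
    --bucket my-example-bucket \
    --key archives/2015-report.csv \
    --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Expedited"}}'

# Check whether the restore has completed
aws s3api head-object --bucket my-example-bucket --key archives/2015-report.csv
```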
- S3 Glacier Deep Archive:
- Deep Archive represents data in a frozen state
- Has a 40 KB minimum billable data size and a 180-day minimum storage duration
- Objects can not be made publicly available, data access is similar to standard Glacier class
- Restore jobs are longer:
- Standard: up to 12 hours
- Bulk: up to 48 hours
- Should be used for archival which is very rarely accessed
- S3 Intelligent-Tiering:
- It is a single storage class containing 5 different storage tiers
- Objects that are accessed frequently are stored in the Frequent Access tier; less frequently accessed objects are stored in the Infrequent Access tier. Objects accessed very rarely are stored in the Archive or Deep Archive tiers
- We don't have to worry about moving objects between tiers; this is done automatically by the storage class
- Intelligent-Tiering can be configured; the archive tiers are optional and can be enabled/disabled
- There is no retrieval cost for moving data between the frequent and infrequent tiers; instead we are billed a monitoring and automation cost per 1,000 objects
- S3 Intelligent-Tiering is recommended for unknown or uncertain data access usage
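- A minimal AWS CLI sketch (hypothetical bucket/key names) of choosing a storage class at upload time:

```sh
# Upload directly into S3 Standard-IA using the high-level CLI
aws s3 cp ./report.csv s3://my-example-bucket/reports/report.csv \
    --storage-class STANDARD_IA

# Upload into S3 Glacier Instant Retrieval using the low-level API
aws s3api put-object \
    --bucket my-example-bucket \
    --key archives/report.csv \
    --body ./report.csv \
    --storage-class GLACIER_IR
```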
- Storage classes comparison:
| | S3 Standard | S3 Intelligent-Tiering | S3 Standard-IA | S3 One Zone-IA | S3 Glacier Instant | S3 Glacier Flexible | S3 Glacier Deep Archive |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Designed for durability | 99.999999999% (11 9's) | 99.999999999% (11 9's) | 99.999999999% (11 9's) | 99.999999999% (11 9's) | 99.999999999% (11 9's) | 99.999999999% (11 9's) | 99.999999999% (11 9's) |
| Designed for availability | 99.99% | 99.9% | 99.9% | 99.5% | 99.9% | 99.99% | 99.99% |
| Availability SLA | 99.9% | 99% | 99% | 99% | 99% | 99.9% | 99.9% |
| Availability Zones | ≥3 | ≥3 | ≥3 | 1 | ≥3 | ≥3 | ≥3 |
| Minimum capacity charge per object | N/A | N/A | 128 KB | 128 KB | 128 KB | 40 KB | 40 KB |
| Minimum storage duration charge | N/A | 30 days | 30 days | 30 days | 90 days | 90 days | 180 days |
| Retrieval fee | N/A | N/A | per GB retrieved | per GB retrieved | per GB retrieved | per GB retrieved | per GB retrieved |
| First byte latency | milliseconds | milliseconds | milliseconds | milliseconds | milliseconds | select minutes or hours | select hours |
| Storage type | Object | Object | Object | Object | Object | Object | Object |
| Lifecycle transitions | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
- We can create lifecycle rules on S3 buckets which can move objects between tiers or expire objects automatically
- A lifecycle configuration is a set of rules applied to a bucket or a group of objects in a bucket
- Rules consist of actions:
- Transition actions: move objects from one tier to another after a certain time
- Expiration actions: delete objects or versions of objects
- Lifecycle rules cannot move objects based on how frequently they are accessed (that is what Intelligent-Tiering does); they transition objects based on the time that has passed
- By moving objects from one tier to another we can save costs, expiring objects also will help saving costs
- Transitions between tiers:
- Considerations:
- Objects in One Zone-IA can transition to Glacier Flexible Retrieval or Deep Archive, NOT into Glacier Instant Retrieval
- Smaller objects cost more in Standard-IA, One Zone-IA, etc.
- An object needs to remain at least 30 days in the Standard tier before a rule can transition it to the infrequent access tiers (objects can still be uploaded directly into the infrequent access tiers)
- A single rule cannot transition objects from Standard into the infrequent access tiers and then straight on into the Glacier tiers: objects have to stay at least 30 days in the infrequent access tiers before the same rule can move them further. To overcome this, we can define 2 different rules (see the CLI sketch below)
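- A minimal sketch, assuming a hypothetical bucket name and `logs/` prefix, of a lifecycle configuration with transition and expiration actions applied via the CLI:

```sh
# lifecycle.json: transition log objects to Standard-IA after 30 days,
# to Glacier Flexible Retrieval after 90 days, and expire them after 365 days
cat > lifecycle.json << 'EOF'
{
  "Rules": [
    {
      "ID": "archive-then-expire-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
    --bucket my-example-bucket \
    --lifecycle-configuration file://lifecycle.json
```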
- 2 types of replication are supported by S3:
- Cross-Region Replication (CRR)
- Same-Region Replication (SRR)
- Both types of replication support same account replication and cross-account replication
- If we configure cross-account replication, we have to define a policy on the destination account to allow replication from the source account
- We can replicate all objects from a bucket, or we can create rules for a subset of objects. We can filter the objects to replicate based on prefix, tags, or both
- We can specify which storage class to use for an object in the destination bucket (default: use the same class)
- We can also define the ownership of the objects in the destination bucket. By default it will be the same as the owner in the source bucket
- Replication Time Control (RTC): if enabled, ensures objects are replicated within 15 minutes
- Replication considerations:
- By default replication is not retroactive: only newer objects are replicated after the replication is enabled
- Versioning needs to be enabled for both source and destination buckets
- Batch replication can be used to replicate existing objects, but it needs to be specifically configured; if it is not, replication won't be retroactive
- Replication is one-way only by default, source => destination. There is an option to use bi-directional replication, but this has to be configured
- Replication can handle objects encrypted with SSE-S3 and SSE-KMS (with extra configuration). SSE-C (customer-provided keys) is also supported; historically it was incompatible
- Replication requires the owner of the source bucket to have permissions on the objects which will be replicated
- System events will not be replicated, only user events
- Any objects in the Glacier and Glacier Deep Archive will not be replicated
- By default, deletions are not replicated; we can enable replication of delete markers
- Replication use cases:
- SRR:
- Log aggregation
- PROD and Test sync
- Resilience with strict sovereignty
- CRR
- Global resilience improvements
- Latency reduction
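- A minimal sketch, assuming hypothetical bucket names and an existing replication IAM role, of enabling a replication rule via the CLI (versioning must already be enabled on both buckets):

```sh
aws s3api put-bucket-replication \
    --bucket source-example-bucket \
    --replication-configuration '{
      "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
      "Rules": [
        {
          "ID": "replicate-logs",
          "Status": "Enabled",
          "Priority": 1,
          "Filter": { "Prefix": "logs/" },
          "DeleteMarkerReplication": { "Status": "Disabled" },
          "Destination": {
            "Bucket": "arn:aws:s3:::destination-example-bucket",
            "StorageClass": "STANDARD_IA"
          }
        }
      ]
    }'
```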
- Buckets aren't encrypted, objects inside buckets are encrypted
- Encryption at rest types:
- Client-Side encryption: data is encrypted before it leaves the client
- Server-Side encryption: data is encrypted on the server side; the client sends the data in plaintext (over the encrypted HTTPS connection)
- Both encryption types use encryption in-transit for communication
- Server-side encryption is mandatory, we cannot store data in S3 without being encrypted
- There are 3 types of server-side encryption supported:
- SSE-C: server-side encryption with customer-provided keys
- The customer is responsible for managing the keys, S3 manages the encryption/decryption
- When an object is put into S3, we need to provide the key utilized
- The object is encrypted with the key, and a hash of the key is generated and stored alongside the object
- The key is discarded after the encryption is done
- In case of object retrieval, we need to provide the key again
- SSE-S3 (default): server-side encryption with Amazon S3-managed keys
- AWS handles both the encryption/decryption and the key management
- When using this method, S3 creates a master key for the encryption process (handled entirely by S3)
- When an object is uploaded, a unique key is used for its encryption. After the encryption, this unique key is itself encrypted with the master key, the plaintext key is discarded, and the encrypted key is stored together with the object
- For most situations, this is the default type of encryption. It uses the AES-256 algorithm, and the key management is handled entirely by S3
- SSE-KMS: Server-side encryption with customer-managed keys stored in AWS Key Management Service (KMS)
- Similar to SSE-S3, but with this method KMS handles the keys
- When an object is uploaded for the first time, S3 communicates with KMS and creates a customer master key (CMK). This is the default key used in the future
- When new objects are uploaded AWS uses the CMK to generate individual keys for encryption (data encryption keys). The data encryption key will be stored along with the object in encrypted format
- We don't have to use the default CMK provided by AWS; we can use our own CMK, controlling the permissions on it and how it is used
- SSE-KMS provides role separation:
- We can specify who can access the CMK from KMS
- Administrators can administer buckets but may not have access to the KMS keys
- Default Bucket Encryption:
- When an object is uploaded, we can specify which server-side encryption to use by adding the `x-amz-server-side-encryption` header to the request. Values for the header:
- To use SSE-S3: `AES256`
- To use SSE-KMS: `aws:kms`
- All Amazon S3 buckets have encryption configured by default, and all new objects that are uploaded to an S3 bucket are automatically encrypted at rest
- Server-side encryption with Amazon S3 managed keys (SSE-S3) is the default encryption configuration for every bucket in Amazon S3; this can be overridden in a PUT request with the `x-amz-server-side-encryption` header (see the CLI sketch below)
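- A minimal AWS CLI sketch (hypothetical bucket/key names and KMS key ARN) of setting the encryption type per object at upload time:

```sh
# SSE-S3 (the default): sets x-amz-server-side-encryption: AES256
aws s3api put-object \
    --bucket my-example-bucket \
    --key data/object.txt \
    --body ./object.txt \
    --server-side-encryption AES256

# SSE-KMS with our own CMK: sets x-amz-server-side-encryption: aws:kms
aws s3api put-object \
    --bucket my-example-bucket \
    --key data/object.txt \
    --body ./object.txt \
    --server-side-encryption aws:kms \
    --ssekms-key-id arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID
```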
- Each object in a bucket uses a unique data encryption key (DEK)
- AWS uses the bucket's KMS key to generate this data-encryption key
- Calls to KMS have a cost and there are levels where throttling occurs: 5,500/10,000/50,000 requests per second depending on the region
- Bucket keys:
- A time limited bucket key is used to generate DEKs within S3
- KMS generates a bucket key and gives it to S3 to use to generate DEKs for each upload, offloading the load from KMS to S3
- Reduces the number of KMS API calls => reduces the costs/increases scalability
- Using bucket keys is not retroactive, it will only affect objects after bucket keys are enabled
- Things to keep in mind after enabling bucket keys:
- CloudTrail KMS event logs will show the bucket ARN instead of the object ARN
- Fewer CloudTrail events of KMS will be in the logs (since work is offloaded to S3)
- Bucket keys work with SRR and CRR; the object encryption settings are maintained
- If we replicate plaintext to a bucket using bucket keys, the object is encrypted at the destination side; this can result in ETag changes on the object
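- A minimal sketch (hypothetical bucket name and KMS key ARN) of configuring default bucket encryption with SSE-KMS and a bucket key:

```sh
aws s3api put-bucket-encryption \
    --bucket my-example-bucket \
    --server-side-encryption-configuration '{
      "Rules": [
        {
          "ApplyServerSideEncryptionByDefault": {
            "SSEAlgorithm": "aws:kms",
            "KMSMasterKeyID": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
          },
          "BucketKeyEnabled": true
        }
      ]
    }'
```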
- S3 presigned URLs are a way to give other people access to objects in our buckets using our credentials
- An IAM admin can generate a presigned URL for a specific object using their credentials. This URL will have an expiry date
- The presigned URL can be given to unauthenticated users in order to access the object
- The user will interact with S3 using the presigned URL as if it was the person who generated the presigned URL
- Presigned URLs can be used for downloads and for uploads
- Presigned URLs can be used for giving an application user direct access to private files, offloading load from the application. This approach requires a service account for the application, which generates the presigned URLs
- Presigned URL considerations:
- We can create a presigned URL for objects we don't have access to
- When the URL is used, the permissions match those of the identity which generated it. The permissions are evaluated at the moment the object is accessed (the identity may have had its permissions revoked in the meantime, in which case the URL won't grant access either)
- We should not generate presigned URLs from temporary credentials (e.g. when assuming an IAM role): when the temporary credentials expire, the presigned URL stops working as well. It is recommended to use long-term identities, such as an IAM user (see the sketch below)
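- A minimal AWS CLI sketch (hypothetical bucket/key names) of generating a presigned download URL valid for one hour:

```sh
# Generates a time-limited GET URL signed with the caller's credentials
aws s3 presign s3://my-example-bucket/private/report.pdf --expires-in 3600
```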
- S3 Select and Glacier Select are ways to retrieve parts of objects instead of entire objects
- S3 can store huge objects (up to 5 TB)
- Retrieving a huge object takes time and consumes transfer capacity
- S3/Glacier provide services to access partial objects using SQL-like statements to select parts of objects
- Both S3 Select and Glacier Select support the following formats: CSV, JSON, Parquet, and BZIP2 compression for CSV and JSON
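- A minimal sketch (hypothetical bucket, key and column names) of an S3 Select query via the CLI:

```sh
# Return only matching rows/columns of a CSV object instead of the whole object
aws s3api select-object-content \
    --bucket my-example-bucket \
    --key data/users.csv \
    --expression "SELECT s.name, s.email FROM S3Object s WHERE s.country = 'DE'" \
    --expression-type SQL \
    --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
    --output-serialization '{"CSV": {}}' \
    selected-rows.csv
```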
- Improves the manageability of objects when buckets are used for many different teams or they contain objects for a large amount of functions
- Access Points simplify the process of managing access to S3 buckets/objects
- Rather than having 1 bucket with 1 bucket policy, we can create many access points, each with a different policy
- Each access point can be limited in terms of where it can be accessed from, and each can have different network access controls
- Each access point has its own endpoint address
- We can create an access point using the console or the CLI: `aws s3control create-access-point --name <name> --account-id <account-id> --bucket <bucket-name>`
- Any permission defined on the access point needs to be defined on the bucket policy as well. We can do delegation, by defining wide access permissions in the bucket policy and granular permissions on the access point policy
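- A minimal sketch (hypothetical access point name, account ID and key) of addressing an object through an access point by using the access point ARN in place of the bucket name:

```sh
# Each access point has its own endpoint/ARN; the ARN can be used wherever a bucket name is expected
aws s3api get-object \
    --bucket arn:aws:s3:us-east-1:111122223333:accesspoint/team-a-ap \
    --key shared/data.csv \
    ./data.csv
```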
- The Amazon S3 Block Public Access feature provides settings for access points, buckets, and accounts to help manage public access to Amazon S3 resources
- The settings we can configure with the Block Public Access Feature are:
- BlockPublicAcls: this prevents any new ACLs being created or existing ACLs being modified which would enable public access to objects; existing public ACLs are not affected by this setting alone
- IgnorePublicAcls: any existing ACLs granting public access will be ignored; this does not prevent them being created, but prevents their effects
- BlockPublicPolicy: This prevents a bucket policy containing public actions from being created or modified on an S3 bucket, the bucket itself will still allow the existing policy
- RestrictPublicBuckets: this restricts access to a bucket with a public policy to AWS service principals and authorized users (such as an IAM user or role), blocking other public access
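- A minimal sketch (hypothetical bucket name) of enabling all four Block Public Access settings via the CLI:

```sh
aws s3api put-public-access-block \
    --bucket my-example-bucket \
    --public-access-block-configuration \
        BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```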
- S3 Select and Glacier Select: save on network and CPU cost by retrieving only the necessary data
- S3 Lifecycle Rules: transition objects between tiers
- Compress objects to save space
- S3 Requester Pays:
- In general, bucket owners pay for all Amazon S3 storage and data transfer costs associated with their bucket
- With Requester Pays buckets, the requester instead of the bucket owner pays the cost of the request and the data download from the bucket
- The bucket owner always pays the cost of storing data
- Helpful when we want to share large datasets with other accounts
- Requires a bucket policy
- If an IAM role is assumed, the owner account of that role pays for the request!
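- A minimal sketch (hypothetical bucket/key names) of enabling Requester Pays and accessing the bucket as a requester:

```sh
# Bucket owner enables Requester Pays
aws s3api put-bucket-request-payment \
    --bucket my-example-bucket \
    --request-payment-configuration Payer=Requester

# The requester must acknowledge the charges on each request
aws s3api get-object \
    --bucket my-example-bucket \
    --key datasets/large-file.parquet \
    --request-payer requester \
    ./large-file.parquet
```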
- Object Lock can be enabled on newly created S3 buckets; to enable Object Lock on existing buckets we have to contact AWS support
- Versioning will be also enabled when Object Lock is enabled
- Object Lock can not be disabled, versioning can not be suspended when Object Lock is active on the bucket
- Object Lock implements a Write-Once-Read-Many (WORM) model: once an object version is written, it cannot be modified. Individual versions of objects are locked
- There are 2 ways S3 manages object retention:
- Retention Period
- Legal Hold
- Object versions can have both a retention period and a legal hold, only one of them, or neither
- Object Lock retentions can be individually defined on object versions, a bucket can have default Object Lock settings
- When a retention period is enabled on an object version, we specify the period in days and years
- The object version is locked until the retention period ends
- There are 2 types of retention period modes:
- Compliance mode:
- Object versions cannot be adjusted, deleted or overwritten. The retention period cannot be reduced and the retention mode cannot be changed, not even by the account root user
- Should be used for compliance reasons
- Governance mode:
- Object versions cannot be adjusted, deleted or overwritten, but special permissions can be granted to some identities to allow the lock settings to be adjusted
- These identities need the `s3:BypassGovernanceRetention` permission
- The governance mode can be bypassed by passing the `x-amz-bypass-governance-retention: true` header (the console UI sends this header by default)
- Legal Hold:
- We don't set a retention period for a legal hold; Legal Hold is simply on or off for a specific version of an object
- We can't delete or overwrite an object version with a Legal Hold on it
- An extra permission, `s3:PutObjectLegalHold`, is required when we want to add or remove the Legal Hold on an object
- Legal Hold can be used for preventing accidental removals
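- A minimal sketch (hypothetical bucket/key names and date) of applying a governance-mode retention period and a legal hold to an object version:

```sh
# Lock an object version in GOVERNANCE mode until a given date
aws s3api put-object-retention \
    --bucket my-locked-bucket \
    --key records/invoice-001.pdf \
    --retention '{"Mode": "GOVERNANCE", "RetainUntilDate": "2027-01-01T00:00:00Z"}'

# Place a legal hold on the same object (independent of the retention period)
aws s3api put-object-legal-hold \
    --bucket my-locked-bucket \
    --key records/invoice-001.pdf \
    --legal-hold Status=ON
```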
- S3 Transfer Acceleration is used to transfer files into S3. It enables fast, easy, and secure transfers of files over long distances between our client and an S3 bucket
- Takes advantage of the globally distributed edge locations in Amazon CloudFront
- We might want to use Transfer Acceleration on a bucket for various reasons:
- We upload to a centralized bucket from all over the world
- We transfer gigabytes to terabytes of data on a regular basis across continents
- We can't use all of our available bandwidth over the internet when uploading to Amazon S3
- To use Transfer Acceleration, it must be enabled on the bucket (see the sketch below). After we enable Transfer Acceleration on a bucket, it might take up to 20 minutes before the data transfer speed to the bucket increases
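- A minimal sketch (hypothetical bucket name) of enabling Transfer Acceleration and making the CLI use the accelerate endpoint:

```sh
# Enable Transfer Acceleration on the bucket
aws s3api put-bucket-accelerate-configuration \
    --bucket my-example-bucket \
    --accelerate-configuration Status=Enabled

# Tell the CLI to use the s3-accelerate endpoint for S3 transfers
aws configure set default.s3.use_accelerate_endpoint true
aws s3 cp ./big-dataset.tar.gz s3://my-example-bucket/uploads/big-dataset.tar.gz
```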
- With Amazon S3 Object Lambda, we can add our own code to Amazon S3 `GET`, `LIST`, and `HEAD` requests to modify and process data as it is returned to an application
- S3 Object Lambda uses AWS Lambda functions to automatically process the output of standard S3 `GET`, `LIST`, or `HEAD` requests
- After we configure a Lambda function, we attach it to an S3 Object Lambda service endpoint, known as an Object Lambda Access Point
- The Object Lambda Access Point uses a standard S3 access point
- When we send a request to our Object Lambda Access Point, Amazon S3 automatically calls our Lambda function
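- A minimal sketch, assuming an existing standard access point and Lambda function (hypothetical ARNs), of creating an Object Lambda Access Point via the CLI:

```sh
aws s3control create-access-point-for-object-lambda \
    --account-id 111122223333 \
    --name my-object-lambda-ap \
    --configuration '{
      "SupportingAccessPoint": "arn:aws:s3:us-east-1:111122223333:accesspoint/team-a-ap",
      "TransformationConfigurations": [
        {
          "Actions": ["GetObject"],
          "ContentTransformation": {
            "AwsLambda": {
              "FunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:transform-objects"
            }
          }
        }
      ]
    }'
```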
- We can host a static site on S3
- To host a static site in a bucket we must enable static website hosting, configure an index document, and set permissions
- We also have to make the bucket content public:
- We have to turn off the Block Public Access settings
- We should attach a bucket policy which allows public read on the objects
- Amazon S3 website endpoints do not support HTTPS! If we want to use HTTPS, we can use Amazon CloudFront to serve a static website hosted on Amazon S3
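- A minimal sketch (hypothetical bucket name) of enabling static website hosting and allowing public reads via a bucket policy:

```sh
# Enable website hosting with index and error documents
aws s3 website s3://my-site-bucket/ \
    --index-document index.html \
    --error-document error.html

# Allow public read access to the objects (Block Public Access must be turned off first)
aws s3api put-bucket-policy \
    --bucket my-site-bucket \
    --policy '{
      "Version": "2012-10-17",
      "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-site-bucket/*"
      }]
    }'
```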