Exam guide
Multiple choice - 1 correct response, 3 incorrect responses
Multiple response - 2 correct responses from 5 options
Design Resilient Architectures - 30%
  Design a multi-tier architecture solution
  Design highly available and/or fault-tolerant architectures
  Design decoupling mechanisms using AWS services
  Choose appropriate resilient storage
Design High-Performing Architectures - 28%
  Identify elastic and scalable compute solutions for a workload
  Select high-performing and scalable storage solutions for a workload
  Select high-performing networking solutions for a workload
  Choose high-performing database solutions for a workload
Design Secure Applications and Architectures - 24%
  Design secure access to AWS resources
  Design secure application tiers
  Select appropriate data security options
Design Cost-Optimized Architectures - 18%
  Identify cost-effective storage solutions
  Identify cost-effective compute and database services
  Design cost-optimized network architectures
The result is a score from 100 to 1,000, and the minimum passing score is 720. In total, there are 65 questions with 130 minutes to complete.
Fundamentals
AWS consists of Regions and Availability Zones where:
Regions - Physical locations in the world, e.g., Frankfurt
Availability Zones (AZ) - One or more discrete data centers, i.e., buildings filled with servers.
One Region consists of 2 or more Availability Zones, which are located far enough apart from each other to count as separate AZs.
Edge Locations - Another kind of location: AWS endpoints that are used for caching content.
AWS offers a very large number of services, and the number keeps growing. However, to pass this exam, knowing the following services is sufficient.
Main services of AWS
How to process the information
EC2, Lambda, Elastic Beanstalk
How to save information
S3, EBS, EFS, FSx, Storage Gateway
How to store and retrieve information
RDS, DynamoDB, Redshift
How Compute, Storage, and Databases communicate with each other
VPC, Direct Connect, Route 53, API Gateway, AWS Global Accelerator
5 Pillars of well-architected framework
IAM
IAM = Identity and Access Management → manage users and their level of access to the AWS Console
Create users and grant permissions to those users
Control access to AWS resources
Root account
The email address used to sign up for AWS → full admin access
Therefore, this account must be secured by:
Enable multi-factor authentication on the root account
Create an admin group for your admins, and assign the appropriate permissions to this group
Create user accounts for your admins
Add your users to the admin group
Permission Control using IAM
Permissions are governed by policy documents in JSON format, which are assigned to Groups, Users, or Roles. IAM is global, so policies apply independently of Regions.
Example:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "*",
"Resource": "*"
}
]
}
Usually, policy documents are not attached directly to individual users, because that is hard to manage. Best practice is to create a group (even if it contains only that one user) and attach the policy document to the group.
Building Blocks
Users - a physical person
Groups - functions → admin, dev, etc.
Roles - internal usage within AWS
Principle of least privilege - only assign a user the minimum amount of privileges they need to do their job
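As a contrast to the allow-everything example, a least-privilege policy could look like the sketch below. It builds a policy document in Python that only allows listing and reading one bucket. The function name and the bucket name example-reports-bucket are made up for illustration; s3:ListBucket and s3:GetObject are standard S3 actions.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build a policy document that only allows listing and reading one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# Hypothetical bucket name, used only for illustration.
policy = read_only_bucket_policy("example-reports-bucket")
print(json.dumps(policy, indent=2))
```

The resulting JSON can be attached to a group in the same way as the example policy document.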
Tips
Access key IDs and secret access keys are used for programmatic authentication
Usernames and passwords are used for console login authentication
[Access key ID and secret access key] and [username and password] are not interchangeable
The secret access key can be viewed only once, when it is created. If it is lost, the key pair has to be regenerated
S3
Simple Storage Service (S3) - Object storage which is scalable and simple to use
Object storage → can store anything but cannot run OS or DB
Basics
S3 offers unlimited storage, but each object can be at most 5 TB
All objects are stored in folder-like containers called buckets, and each bucket name must be globally unique.
Object URL format:
https://{bucket-name}.s3.{region}.amazonaws.com/{key-name}
Example:
https://acloudguru.s3.us-east-1.amazonaws.com/Raphie.jpg
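The URL format above can be sketched as a small helper. This is only an illustration of the pattern: the function name s3_object_url is made up, and percent-encoding the key with urllib is an assumption about how special characters would be handled.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Assemble the virtual-hosted-style URL for an object."""
    # Percent-encode the key but keep "/" so prefixes stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


url = s3_object_url("acloudguru", "us-east-1", "Raphie.jpg")
print(url)
```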
When an upload to an S3 bucket succeeds, an HTTP 200 response is returned.
S3 Object
composed of:
Version ID - stores multiple versions of the object
Metadata - data about data (content-type, last-modified)
Access Control List (ACL) vs. Bucket Policy - an ACL governs access to an individual object, while the bucket policy controls access to all objects in the bucket. An ACL cannot override the bucket policy.
Versioning
Advantages:
All versions are stored in S3, even if the object is deleted
The object is already backed up
Once versioning is enabled, it cannot be disabled, only suspended
Can be integrated with lifecycle rules
Note that making an object public applies only to the latest version. Older versions require individual settings for public access.
Deleting a versioned object without specifying a version ID does not remove it; S3 only adds a delete marker as the newest version. To completely remove the file from the bucket, delete every version by its version ID, including the delete marker. Deleting only the delete marker restores the object.
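The interaction between versions and delete markers can be illustrated with a toy in-memory model. This is not the S3 API: the class VersionedBucket, its methods, and the version IDs are invented purely to show the behavior.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body), oldest first
        self._counter = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._counter)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key, version_id=None):
        if version_id is None:
            # Simple delete: nothing is removed, a delete marker becomes the latest version.
            return self.put(key, DELETE_MARKER)
        # Versioned delete: permanently remove exactly that version (or marker).
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]
        return version_id

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1][1] is DELETE_MARKER:
            return None  # the object appears to be gone
        return versions[-1][1]

    def list_versions(self, key):
        return [version_id for version_id, _ in self._versions.get(key, [])]


bucket = VersionedBucket()
v1 = bucket.put("report.txt", "first draft")
v2 = bucket.put("report.txt", "final")
marker = bucket.delete("report.txt")            # adds a delete marker
hidden = bucket.get("report.txt")               # object looks deleted
bucket.delete("report.txt", version_id=marker)  # removing the marker restores it
restored = bucket.get("report.txt")
for version_id in (v1, v2):                     # permanent removal needs every version
    bucket.delete("report.txt", version_id=version_id)
remaining = bucket.list_versions("report.txt")
```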
Storage classes
S3 Standard
High availability and durability - data is stored redundantly across multiple devices in multiple facilities (>=3 AZs)
Designed for frequent access - perfect for frequently accessed data
Suitable for most workloads - for websites, content distribution, mobile and gaming apps, big data analytics
S3 Standard-Infrequent Access (Standard-IA)
Used for data that is accessed less frequently but requires rapid access when needed
Low per-GB storage price, but with a per-GB retrieval fee
Great for long-term storage, backups, and as a data store for disaster recovery files
S3 One Zone-Infrequent Access
Like Standard-IA but costs 20% less
Data is stored redundantly within a single AZ
S3 Glacier
Retrieval time from 1 minute to 12 hours
S3 Glacier Deep Archive
Default retrieval time is 12 hours
S3 Intelligent Tiering
For data with unknown access patterns
Automatically moves the data to the most cost-effective tier based on how frequently it is accessed.
With lifecycle management, one can automatically move objects between storage tiers to save cost. This also works with versioning, e.g., archiving old versions of files to a cheaper storage class. Lifecycle rules can apply to both current and previous versions.
S3 Object Lock
An object with Object Lock is stored using a write once, read many (WORM) model, which prevents the object from being deleted or modified for a fixed amount of time (retention period) or indefinitely. This adds an additional layer of protection.
Governance mode - objects and their versions cannot be overwritten or deleted, and their lock settings cannot be altered, unless the user has special permissions.
Compliance mode - objects and their versions cannot be overwritten or deleted, and their lock settings cannot be altered, regardless of the user's permission level. Even the root user cannot touch objects in this mode until the retention period expires.
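The difference between the two Object Lock retention modes, governance and compliance, can be expressed as a small decision function. This is a simplified sketch: the function name and the has_bypass_permission flag are illustrative (the flag stands in for the special permission mentioned above), and legal holds are deliberately ignored.

```python
from datetime import datetime, timedelta, timezone


def can_delete_version(mode, retain_until, now, has_bypass_permission=False):
    """Return True if a locked object version may be deleted at time `now`."""
    if now >= retain_until:
        return True  # retention period has expired in either mode
    if mode == "GOVERNANCE":
        return has_bypass_permission  # only users with special permissions
    if mode == "COMPLIANCE":
        return False  # nobody, not even the root user
    raise ValueError(f"unknown Object Lock mode: {mode}")


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
retain_until = now + timedelta(days=30)

governance_plain = can_delete_version("GOVERNANCE", retain_until, now)
governance_bypass = can_delete_version("GOVERNANCE", retain_until, now, has_bypass_permission=True)
compliance_bypass = can_delete_version("COMPLIANCE", retain_until, now, has_bypass_permission=True)
compliance_expired = can_delete_version("COMPLIANCE", retain_until, retain_until + timedelta(days=1))
```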
Retention period vs Legal Holds
Both of these prevent an object from being overwritten or deleted, but:
Retention period - set as a duration. Can be changed in Governance mode by users with permissions.
Legal holds - remain in effect until removed. Can be placed and removed by any user who has the s3:PutObjectLegalHold permission
S3 Glacier Vault Lock
S3 Glacier Vault Lock allows you to deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy
You can specify controls, like WORM, in a vault lock policy and lock the policy from future edits. Once locked, the policy can no longer be changed.
Encryption
Encryption in Transit - sending objects to/from the bucket
Encryption at Rest: Server-Side Encryption (SSE) - encryption at S3
  SSE-S3: Keys are managed by S3; users do not need to manage anything
  SSE-KMS: Keys are managed by AWS Key Management Service
  SSE-C: Keys are provided and managed by the customer
Encryption at Rest: Client-Side Encryption - the user encrypts files before uploading to S3
Server-side encryption can be enforced by:
Selecting the encryption setting on the S3 bucket in the Console
Applying a bucket policy so that S3 rejects any PUT request that uploads a file without the x-amz-server-side-encryption parameter in the request header
Optimizing S3 Performance
An S3 object is accessed by a path such as:
bucket_name/folder1/subfolder1/file.txt
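The components between the bucket name bucket_name and the file name file.txt, here folder1/subfolder1, are called the prefix. S3 request limits apply per prefix (3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second), so spreading objects across more prefixes raises the achievable request rate.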
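A rough way to reason about prefixes is to extract them from object keys and multiply by the per-prefix request limits that S3 publishes (3,500 writes and 5,500 reads per second per prefix). The sketch below does exactly that; the helper names are made up, and the result is a theoretical ceiling, not a measured throughput.

```python
import posixpath

# Published per-prefix request limits for S3.
GET_PER_SECOND_PER_PREFIX = 5500
PUT_PER_SECOND_PER_PREFIX = 3500


def key_prefix(key: str) -> str:
    """Return everything in the object key before the file name."""
    return posixpath.dirname(key)


def max_read_rate(keys) -> int:
    """Theoretical GET/HEAD ceiling when reads are spread over all prefixes."""
    prefixes = {key_prefix(key) for key in keys}
    return len(prefixes) * GET_PER_SECOND_PER_PREFIX


# Keys are given without the bucket name.
keys = [
    "folder1/subfolder1/file.txt",
    "folder1/subfolder1/other.txt",
    "folder2/subfolder1/file.txt",
]
prefix = key_prefix(keys[0])
ceiling = max_read_rate(keys)
print(prefix, ceiling)
```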
Upload → multipart upload increases upload speed and is recommended for files over 100 MB. It must be used for any file over 5 GB.
Download → S3 byte-range fetches increase download speed by fetching parts of a file in parallel. They can also download only part of a file, e.g., just its header.
S3 Replication
Replicates objects from a bucket in one Region to a bucket in another Region
Objects already in the bucket are not replicated automatically when replication is turned on. Only objects uploaded afterward, including new versions, are replicated
Delete markers are not replicated by default.
EC2
Elastic Compute Cloud (EC2) - Secure, resizable compute capacity in the cloud → Virtual Machine hosted in AWS
Pricing options
On-Demand - pay by the hour or second, depending on your need
  Flexible - low cost and flexibility without upfront payment
  Short-term - applications with short-term, spiky, or unpredictable workloads
  Testing the water - applications being tested on EC2 for the first time
Reserved - commit to a 1- or 3-year contract to get up to 72% off the hourly charge
  Predictable usage - applications with steady state or predictable usage
  Specific capacity requirements - applications that require reserved capacity
  Pay up front - save more when paying upfront
  Standard Reserved Instances - up to 72% off the on-demand price
  Convertible Reserved Instances - up to 54% off the on-demand price, with the option to change to a different instance type of equal or greater value
  Scheduled Reserved Instances - launch instances within a predefined time window
Spot - purchase unused capacity at a discount of up to 90%. However, the price fluctuates with supply and demand
  Flexible - applications that have flexible start and end times
  Urgent capacity - users with an urgent need for large amounts of additional computing capacity
  Cost sensitive - applications that are only feasible at very low compute prices
Dedicated - a physical EC2 server dedicated to you. The most expensive option. If an exam question involves licensing, go straight to the Dedicated option
  Compliance - regulatory requirements that may not support multi-tenant virtualization
  On-Demand - can be purchased hourly on-demand
  Licensing - great for licensing that does not support multi-tenancy or cloud deployments
  Reserved - can be purchased as a reservation for up to 70% off the on-demand price
Roles
An identity that you can create in IAM that has specific permissions
Similar to a user → an AWS identity with permission policies that determine what it can and cannot do. However, a role is not tied to one person; it can be assumed by anyone who needs it.
Roles can be assumed by users, AWS architecture, system-level accounts, or even used for cross-account access.
Security Groups
Computers usually communicate with each other using ports such as 22 (SSH), 3389 (RDP), 80 (HTTP), and 443 (HTTPS).
Once an EC2 instance is created, a virtual firewall blocks all inbound traffic by default. To connect to the instance, open the correct port using Security Groups, or let everything in with 0.0.0.0/0. In production, however, open only ports 80 and 443 to the world, so that others cannot gain control of the instance via SSH or RDP.
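The production advice above can be turned into a simple check that flags inbound rules exposing SSH or RDP to the whole internet. The rule format used here (dicts with from_port, to_port, and cidr) is a hypothetical simplification, not the shape returned by any AWS API.

```python
import ipaddress

ADMIN_PORTS = {22: "SSH", 3389: "RDP"}


def risky_rules(rules):
    """Return (service, port, cidr) for every admin port open to the world."""
    findings = []
    for rule in rules:
        network = ipaddress.ip_network(rule["cidr"])
        if network.prefixlen != 0:
            continue  # restricted to a specific range, not open to everyone
        for port, name in ADMIN_PORTS.items():
            if rule["from_port"] <= port <= rule["to_port"]:
                findings.append((name, port, rule["cidr"]))
    return findings


# Hypothetical inbound rules for illustration.
inbound = [
    {"from_port": 80, "to_port": 80, "cidr": "0.0.0.0/0"},
    {"from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
    {"from_port": 22, "to_port": 22, "cidr": "0.0.0.0/0"},
    {"from_port": 3389, "to_port": 3389, "cidr": "203.0.113.0/24"},
]
findings = risky_rules(inbound)
print(findings)
```

Here ports 80 and 443 pass, RDP passes because it is limited to one address range, and only the world-open SSH rule is flagged.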
Bootstrap script
A script that runs once, when an instance first starts. Usually it is a shell script with commands for installations, updates, and settings.
Metadata
metadata = data about data → IP address, hostname, security groups, etc.
retrieve metadata from EC2
curl http://169.254.169.254/latest/meta-data/{variable_name}
retrieve user data from EC2
curl http://169.254.169.254/latest/user-data
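The same lookups can be done from Python with only the standard library. The sketch below assumes the instance uses the token-based metadata service (IMDSv2), so it first requests a session token and then sends it with the metadata request; the helper names and the 60-second token lifetime are illustrative choices. It only works when run on an EC2 instance.

```python
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest"


def metadata_url(path: str) -> str:
    """Build the URL for one metadata entry, e.g. 'local-ipv4'."""
    return f"{METADATA_BASE}/meta-data/{path.lstrip('/')}"


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value; must be called from inside an EC2 instance."""
    # Step 1: request a short-lived session token.
    token_request = urllib.request.Request(
        f"{METADATA_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()
    # Step 2: send the token along with the actual metadata request.
    request = urllib.request.Request(
        metadata_url(path),
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode()


print(metadata_url("local-ipv4"))
```

On an instance, fetch_metadata("local-ipv4") or fetch_metadata("hostname") would return the corresponding value.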
Virtual Networking in EC2
ENI (Elastic Network Interface) - basic day-to-day networking
  Create a management network
  Use network and security appliances in your VPC
  Private home network subnet
EN (Enhanced Networking) - uses single root I/O virtualization (SR-IOV) → high performance (10 Gbps - 100 Gbps)
  Higher bandwidth, higher packets per second (PPS), lower inter-instance latencies
  Enabled through either the Elastic Network Adapter (ENA) or the Intel 82599 Virtual Function (VF) interface - in any scenario question, always choose ENA over the VF interface
EFA (Elastic Fabric Adapter) - accelerates high performance computing (HPC) and ML applications
  Lower and more consistent latency and higher throughput than TCP transport
  When a question asks which network interface to use for high performance computing, go straight to EFA
  It uses OS-bypass to let HPC and ML applications run with lower latency. When a question asks about OS-bypass, go straight to EFA
Placement groups
Cluster Placement Groups - a group of instances within a single AZ → for applications that need low network latency, high network throughput, or both
Spread Placement Groups - a group of instances that are each placed on distinct hardware → for applications that need to be separated at the hardware level, e.g., for security or appliance reasons
Partition Placement Groups - instances are divided into partitions, each with its own network and power source → for isolation to reduce the impact of hardware failure
Spot Instances
EC2 lets users run on unused capacity in the cloud at a discount of up to 90% compared to the on-demand price.
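To put the "up to 90%" figure next to the other pricing options in these notes, the sketch below computes best-case hourly prices from the maximum discounts quoted here. The on-demand price of 0.10 per hour is an invented number, and real Spot prices fluctuate, so this is only an upper bound on the savings.

```python
from decimal import Decimal

# Maximum discounts as quoted in these notes ("up to" values, not guarantees).
MAX_DISCOUNT = {
    "on_demand": Decimal("0"),
    "standard_reserved": Decimal("0.72"),
    "convertible_reserved": Decimal("0.54"),
    "spot": Decimal("0.90"),
}


def best_case_hourly_price(on_demand_price: Decimal, option: str) -> Decimal:
    """Lowest possible hourly price for a pricing option, given its maximum discount."""
    return on_demand_price * (1 - MAX_DISCOUNT[option])


on_demand = Decimal("0.10")  # hypothetical hourly price, for illustration only
prices = {option: best_case_hourly_price(on_demand, option) for option in MAX_DISCOUNT}
for option, price in prices.items():
    print(f"{option}: {price}")
```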
When to use?
Applications using Spot Instances must be stateless, fault-tolerant, and flexible. For example:
containerized workloads (CI/CD)
high-performance computing
Spot instances are not good for: