Exam guide
Multiple choice - 1 correct response, 3 incorrect responses
Multiple response - 2 correct responses from 5 options
Design Resilient Architectures - 30%
- Design a multi-tier architecture solution
- Design highly available and/or fault-tolerant architectures
- Design decoupling mechanisms using AWS services
- Choose appropriate resilient storage

Design High-Performing Architectures - 28%
- Identify elastic and scalable compute solutions for a workload
- Select high-performing and scalable storage solutions for a workload
- Select high-performing networking solutions for a workload
- Choose high-performing database solutions for a workload

Design Secure Applications and Architectures - 24%
- Design secure access to AWS resources
- Design secure application tiers
- Select appropriate data security options

Design Cost-Optimized Architectures - 18%
- Identify cost-effective storage solutions
- Identify cost-effective compute and database services
- Design cost-optimized network architectures

The result is a score from 100 to 1,000, and the minimum to pass is 720. In total, there are 65 questions with 130 minutes to complete them.
Fundamentals
AWS consists of Regions and Availability Zones where:
Regions - Physical locations in the world, e.g., Frankfurt
Availability Zones (AZ) - data centers, i.e., buildings filled with servers.
1 region consists of 2 or more Availability Zones, located far enough away from each other to count as separate AZs.
Edge Location - another type of AWS location: endpoints used for caching content.
AWS offers a huge number of services, and the list keeps growing. However, knowing the following services is sufficient to pass this exam.
Main services of AWS
How to process the information
EC2, Lambda, Elastic Beanstalk
How to save information
S3, EBS, EFS, FSx, Storage Gateway
How to store and retrieve information
RDS, DynamoDB, Redshift
How Compute, Storage, and Databases communicate with each other
VPC, Direct Connect, Route 53, API Gateway, AWS Global Accelerator
5 Pillars of the Well-Architected Framework
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
IAM
IAM = Identity and Access Management → manage users and their level of access to the AWS Console
- Create users and grant permissions to those users
- Control access to AWS resources

Root account
The account created with the email address used to sign up for AWS → full admin access
Therefore, this account must be secured by:
- Enable multi-factor authentication on the root account
- Create an admin group for your admins, and assign the appropriate permissions to this group
- Create user accounts for your admins
- Add your users to the admin group

Permission Control using IAM
Permissions are governed by Policy Documents in JSON format, which are assigned to Groups, Users, or Roles and are independent of regions.
Example:
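A minimal illustrative sketch of what such a Policy Document looks like and how it can be attached to a group with boto3 (the policy and group names are hypothetical):

```python
import json
import boto3

# A Policy Document in JSON: who may do what, on which resources
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"}
    ],
}

iam = boto3.client("iam")
policy = iam.create_policy(
    PolicyName="AllowS3FullAccess",            # hypothetical name
    PolicyDocument=json.dumps(policy_document),
)
# Best practice: attach the policy to a group, not to individual users
iam.attach_group_policy(
    GroupName="developers",                    # hypothetical group
    PolicyArn=policy["Policy"]["Arn"],
)
```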
Usually, Policy Documents are not assigned directly to Users, as that quickly becomes hard to manage. The best practice is to create a group (even if it contains only that one user) and assign the Policy Document to the group.
Building Blocks
- Users - a physical person
- Groups - functions → admin, dev, etc.
- Roles - internal usage within AWS
- Principle of least privilege - only assign a user the minimum amount of privileges they need to do their job
Tips
- Access key ID and secret access keys are used for programmatic authentication
- Username and password are used for console login authentication
- [Access key ID and secret access key] and [username and password] are not the same
- Access key ID and secret access keys can be viewed only once. If lost, they have to be regenerated.

S3
Simple Storage Service (S3) - Object storage which is scalable and simple to use
Object storage → can store anything but cannot run OS or DB
Basics
S3 offers unlimited storage, but each object can be at most 5 TB.
All objects are stored in folder-like containers called Buckets, whose names must be globally unique.
Object URL format:
https://{bucket-name}.s3.{region}.amazonaws.com/{key-name}
Example:
https://acloudguru.s3.us-east-1.amazonaws.com/Raphie.jpg
When uploading to an S3 bucket, an HTTP 200 response is returned upon success.
S3 Object
An S3 object is composed of:
- Version ID - stores multiple versions of the object
- Metadata - data about data (content-type, last-modified)
- Access Control List (ACL) vs. Bucket Policy - an ACL governs accessibility of each individual object, while the Bucket Policy controls access to all objects in the bucket. An ACL cannot override the Bucket Policy.
Versioning
Advantages:
- All versions are stored in S3, even if the object is deleted
- The object is effectively already backed up
- Once versioning is enabled, it cannot be disabled
- Can be integrated with lifecycle rules
- Note that public access applies only to the latest version; older versions require individual public-access settings

Deleting a versioned object does not remove it; S3 only adds a delete marker. To completely remove a file, you must explicitly delete each of its versions (and the delete marker) from the bucket.
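A short boto3 sketch of this (the bucket name is hypothetical; the key reuses the example above): enabling versioning, then permanently removing every version of one object.

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning on a bucket (it cannot be disabled afterwards, only suspended)
s3.put_bucket_versioning(
    Bucket="my-notes-bucket",                          # hypothetical bucket
    VersioningConfiguration={"Status": "Enabled"},
)

# A plain delete only adds a delete marker; deleting specific VersionIds
# is what permanently removes the object from the bucket.
versions = s3.list_object_versions(Bucket="my-notes-bucket", Prefix="Raphie.jpg")
for v in versions.get("Versions", []):
    s3.delete_object(Bucket="my-notes-bucket", Key=v["Key"], VersionId=v["VersionId"])
```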
Storage classes
S3 Standard
- High availability and durability - data is stored redundantly across multiple devices in multiple facilities (>= 3 AZs)
- Designed for frequent access - perfect for frequently accessed data
- Suitable for most workloads - websites, content distribution, mobile and gaming apps, big data analytics

S3 Standard-Infrequent Access (Standard-IA)
- Used for data that is accessed less frequently but requires rapid access when needed
- Low per-GB storage price, but there is a per-GB retrieval fee
- Great for long-term storage, backups, and as a data store for disaster recovery files

S3 One Zone-Infrequent Access
- Like Standard-IA but costs 20% less
- Data is stored redundantly within a single AZ

S3 Glacier
- Retrieval time from 1 minute to 12 hours

S3 Glacier Deep Archive
- Default retrieval time is 12 hours

S3 Intelligent-Tiering
- For data with an unknown access pattern
- Automatically moves the data to the most cost-effective tier based on how frequently it is accessed

With lifecycle management, one can automatically move objects between storage tiers to save cost. This also applies to versioning, i.e., archiving old-version files to a cheaper storage class; lifecycle rules can target both current-version and previous-version files.
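A hedged boto3 sketch of such a lifecycle rule (bucket name, prefix, and day counts are hypothetical): current versions move to Standard-IA after 30 days and Glacier after 90, while previous versions go straight to Glacier.

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-notes-bucket",                          # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},         # hypothetical prefix
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```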
S3 Object Lock
An object with Object Lock is stored in a write once, read many (WORM) model to prevent it from being deleted or modified for a fixed amount of time (the retention period) or indefinitely, adding an extra layer of protection. There are two retention modes:
- Governance mode - objects and their versions cannot be overwritten or deleted, and their lock settings cannot be altered, except by users with special permissions.
- Compliance mode - objects and their versions cannot be overwritten or deleted, and their lock settings cannot be altered, regardless of the user's permission level. Even the root user cannot touch objects in this mode until the retention period expires.
Retention period vs Legal Holds
Both prevent an object from being overwritten or deleted, but:
- Retention period - set as a duration; in Governance mode it can be changed by users with the right permissions.
- Legal hold - remains in effect until removed; it can be placed and removed by any user with the s3:PutObjectLegalHold permission.

S3 Glacier Vault Lock
- S3 Glacier Vault Lock allows you to deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy
- You can specify controls, like WORM, in a vault lock policy and lock the policy against future edits. Once locked, the policy can no longer be changed.

Encryption
- Encryption in transit - while sending objects to/from the bucket
- Encryption at rest: Server-Side Encryption (SSE) - encryption performed by S3
  - SSE-S3: keys are managed by S3; users do not need to worry about anything
  - SSE-KMS: keys are managed by AWS Key Management Service
  - SSE-C: keys are managed by the customer
- Encryption at rest: Client-Side Encryption - the user encrypts files before uploading them to S3

Server-side encryption can be enforced by:
- Selecting the encryption setting on the S3 bucket in the Console
- A bucket policy that rejects any PUT request uploading a file without the x-amz-server-side-encryption parameter in the request header (see the sketch below)
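A hedged sketch of such a bucket policy applied with boto3 (the bucket name is hypothetical), followed by a compliant upload that carries the encryption header:

```python
import json
import boto3

s3 = boto3.client("s3")

# Deny any PutObject request that does not carry the
# x-amz-server-side-encryption header at all.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-notes-bucket/*",     # hypothetical bucket
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }
    ],
}
s3.put_bucket_policy(Bucket="my-notes-bucket", Policy=json.dumps(bucket_policy))

# A compliant upload: boto3 sets the header when SSE is requested.
s3.put_object(
    Bucket="my-notes-bucket",
    Key="Raphie.jpg",
    Body=open("Raphie.jpg", "rb"),
    ServerSideEncryption="AES256",     # SSE-S3; use "aws:kms" for SSE-KMS
)
```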
Optimizing S3 Performance

An object is accessed by a path such as my-notes-bucket/folder1/subfolder/file.txt. The parts that are neither the bucket name nor the file name (here /folder1/subfolder) are called the prefix. Spreading requests across more distinct prefixes gives higher request performance, because S3 scales request rates per prefix.

- Upload → multipart upload increases upload speed for files over 100 MB and should be used for any file over 5 GB, as sketched below.
- Download → S3 byte-range fetches increase download speed and allow partial downloads, e.g., fetching only the header of a file.
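A minimal boto3 sketch of both techniques (file, bucket, and key names are hypothetical); the high-level transfer API switches to multipart upload automatically above the configured threshold.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Multipart upload: anything above 100 MB is split into 100 MB parts
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
    max_concurrency=10,
)
s3.upload_file("backup.tar", "my-notes-bucket", "backups/backup.tar", Config=config)

# Byte-range fetch: download only the first KB of the object (e.g., its header)
head = s3.get_object(
    Bucket="my-notes-bucket",
    Key="backups/backup.tar",
    Range="bytes=0-1023",
)["Body"].read()
```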
S3 Replication

- Replicates objects from a bucket in one region to a bucket in another region
- Existing objects in the bucket are not replicated automatically when replication is enabled; upload a new version of an object to start replicating it
- Delete markers are not replicated by default
EC2
Elastic Compute Cloud (EC2) - Secure, resizable compute cloud → Virtual Machine hosted in AWS
Pricing options
On-Demand - pay by the hour or second, depending on your needs
- Flexible - low cost and flexibility without upfront payment
- Short-term - applications with short-term, spiky, or unpredictable workloads
- Testing the water - applications being tested on EC2 for the first time

Reserved - commit to a contract of 1 or 3 years to get up to a 72% discount on the hourly charge
- Predictable usage - applications with steady-state or predictable usage
- Specific capacity requirements - applications that require reserved capacity
- Pay up front - save more when paying upfront
- Standard Reserved Instances - up to 72% off the on-demand price
- Convertible Reserved Instances - up to 54% off the on-demand price, with the option to change to a different instance type of equal or greater value
- Scheduled Reserved Instances - launch instances within a predefined time window

Spot - purchase unused capacity at a discount of up to 90%; the price fluctuates with supply and demand
- Flexible - applications that have flexible start and end times
- Urgent capacity - users with an urgent need for large amounts of additional computing capacity
- Cost sensitive - applications that are only feasible at very low compute prices

Dedicated - a physical EC2 server dedicated to you; the most expensive option. If a question mentions licensing restrictions, go straight for the Dedicated option
- Compliance - regulatory requirements that may not support multi-tenant virtualization
- Licensing - great for licensing that does not support multi-tenancy or cloud deployments
- On-Demand - can be purchased hourly on demand
- Reserved - can be purchased as a reservation for up to 70% off the on-demand price

Roles
identity that you can create in IAM that has specific permissions
Similar to a user → an AWS identity with permission policies that determine what it can and cannot do, but a role is meant to be assumed by whoever needs it (including groups of users) rather than being tied to one person.
Roles can be assumed by users, AWS services, and system-level accounts, and can even be used for cross-account access.
Security Groups
Computers usually communicate with each other over well-known ports, e.g., SSH (22), RDP (3389), HTTP (80), HTTPS (443).
Once an EC2 instance is created, a virtual firewall blocks everything by default. To be able to connect to the EC2 instance, you need to open the correct ports using Security Groups, or let everything in with 0.0.0.0/0. In production, open only ports 80 and 443 so that others cannot gain control of the EC2 instance via SSH or RDP.
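A hedged boto3 sketch of opening only the web ports on a security group (the group ID is hypothetical):

```python
import boto3

ec2 = boto3.client("ec2")
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",   # hypothetical security group ID
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},   # HTTP from anywhere
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},   # HTTPS from anywhere
    ],
)
```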
Bootstrap script
A script that runs once when an instance first starts. Usually it is a shell script containing installation, update, and configuration commands.
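A hedged sketch of passing such a bootstrap script as user data when launching an instance with boto3 (the AMI ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

bootstrap = """#!/bin/bash
yum update -y
yum install -y httpd
systemctl enable --now httpd
"""

ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",   # placeholder: pick a real Amazon Linux 2 AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    UserData=bootstrap,                # boto3 base64-encodes this for you
)
```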
Metadata
metadata = data about data → IP address, hostname, security groups, etc.
- retrieve metadata from EC2 → http://169.254.169.254/latest/meta-data/
- retrieve user data from EC2 → http://169.254.169.254/latest/user-data
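A small sketch of querying both from inside the instance, assuming the classic IMDSv1 endpoint is enabled (IMDSv2 additionally requires a session token):

```python
import urllib.request

BASE = "http://169.254.169.254/latest"

def imds(path: str) -> str:
    """Query the instance metadata service from inside the instance."""
    with urllib.request.urlopen(f"{BASE}/{path}", timeout=2) as resp:
        return resp.read().decode()

print(imds("meta-data/instance-id"))   # the instance ID
print(imds("meta-data/local-ipv4"))    # private IP address
print(imds("user-data"))               # the bootstrap script passed at launch
```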
Virtual Networking in EC2
- ENI (Elastic Network Interface) - basic day-to-day networking
  - create a management network
  - use network and security appliances in your VPC
  - a private, home-network-style subnet
- EN (Enhanced Networking) - single-root I/O virtualization → high performance (10 Gbps - 100 Gbps)
  - higher bandwidth, higher packets per second (PPS), lower inter-instance latency
  - comes as either the Elastic Network Adapter (ENA) or the Intel 82599 Virtual Function (VF) interface; in any scenario question, always choose ENA over the VF interface
- EFA (Elastic Fabric Adapter) - accelerates high performance computing (HPC) and ML applications
  - lower and more consistent latency and higher throughput than TCP transport
  - when a question asks which network interface to use for high performance computing, go straight to EFA
  - it uses OS-bypass to speed up HPC and ML applications with lower latency; when a question mentions OS-bypass, go straight to EFA

Placement groups
- Cluster Placement Group - a group of instances within a single AZ → for applications that need low network latency, high network throughput, or both
- Spread Placement Group - a group of instances placed on separate hardware → for applications that need hardware separation, e.g., for security or appliance reasons
- Partition Placement Group - groups of instances where each partition has its own network and power source → isolation that reduces the impact of hardware failure

Spot Instances
EC2 allows users to consume unused capacity in the cloud at up to a 90% discount compared to the on-demand price.
When to use?
Applications using Spot instances must be stateless, fault-tolerant, and flexible. For example:
- containerized workloads
- CI/CD
- high-performance computing

Spot instances are not a good fit for workloads that are stateful or cannot tolerate interruption, such as persistent workloads and databases.
Prices
Spot instances remain available as long as the spot price stays below your predefined maximum price. The price varies across regions and AZs. If the price exceeds your maximum, you get 2 minutes to stop the instance (and resume it later when the spot price drops back below your max) or terminate it.
You can use a Spot Block to prevent your instances from being terminated when the price rises above your max price, but a block can currently only be set for up to 6 hours.
Spot requests
To start using Spot instances, one needs to define:
- max price - the maximum price you are willing to pay
- number of instances - how many instances to run
- launch spec - machine spec, AMI, etc.
- request type {one-time | persistent} - request capacity once, or keep requesting over a validity period. For a persistent request, it is critical to cancel the request before terminating its instances; otherwise the request will keep launching new instances every time you terminate them.
Spot Fleet
A combination of Spot instances and On-Demand instances that attempts to maintain a target capacity within your price constraints. There are several allocation strategies:
- capacityOptimized - Spot instances come from the pools with optimal capacity for the number of instances being launched
- diversified - Spot instances are distributed across all pools
- lowestPrice (default) - Spot instances come from the pool with the lowest price
- instancePoolsToUseCount - Spot instances are distributed across the number of Spot instance pools you specify; can only be used in combination with lowestPrice

Elastic Block Storage (EBS) and Elastic File System (EFS)
EBS
Storage volumes you can attach to EC2 instances - store the AMI, apps, databases, etc. → a virtual hard drive
EBS must be in the same AZ as the EC2 instance. It can be resized or have its type changed on the fly, without stopping or restarting the instance.
EBS volume types
- General Purpose SSD (gp2)
  - balance of price and performance
  - good for boot volumes or development and test applications that are not latency sensitive
- General Purpose SSD (gp3)
  - newer generation of gp2 with up to 4x the maximum throughput
  - for the exam you do not need to choose between gp2 and gp3; both are suitable boot volumes
- Provisioned IOPS SSD (io1)
  - high-performance SSD, for when you need more than 16,000 IOPS
  - I/O-intensive apps, large databases, latency-sensitive workloads, or apps that need a high level of durability
- Provisioned IOPS SSD (io2)
  - higher durability than io1
- Throughput Optimized HDD (st1)
  - frequently accessed, throughput-intensive workloads
  - for big data, data warehouses, ETL, and log processing
  - for apps that do not need SSD-level performance
IOPS vs Throughput
- IOPS - the number of read/write operations per second; matters for transactional, low-latency workloads → Provisioned IOPS SSD (io1/io2)
- Throughput - the amount of data transferred per second; matters for large, sequential I/O such as big data → Throughput Optimized HDD (st1)
Volumes
A virtual hard drive - you need at least 1 volume per EC2 instance → the root device volume.
Snapshot → a point-in-time copy of an EBS volume, stored in S3. Snapshots are incremental: each one stores only the blocks that changed since the previous snapshot, so no redundant data is kept.
Encryption
Volumes and snapshots can be encrypted using AWS Key Management Service (KMS) with AWS-managed or customer master keys (CMKs).
If a volume is unencrypted, it can be encrypted with the following steps:
1) Create a snapshot of the unencrypted volume
2) Copy the snapshot and select the encrypt option
3) Create an AMI from the encrypted snapshot
4) Use that AMI to launch new, encrypted instances
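The same steps as a hedged boto3 sketch (volume ID, region, and device names are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2")

# 1) Snapshot the unencrypted volume
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",   # hypothetical volume
                           Description="unencrypted source")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2) Copy the snapshot with encryption enabled (the default KMS key is used here)
copy = ec2.copy_snapshot(SourceSnapshotId=snap["SnapshotId"],
                         SourceRegion="us-east-1",
                         Encrypted=True)

# 3) Register an AMI from the encrypted snapshot ...
image = ec2.register_image(
    Name="encrypted-ami",
    VirtualizationType="hvm",
    RootDeviceName="/dev/xvda",
    BlockDeviceMappings=[{"DeviceName": "/dev/xvda",
                          "Ebs": {"SnapshotId": copy["SnapshotId"]}}],
)

# 4) ... and launch encrypted instances from it
ec2.run_instances(ImageId=image["ImageId"], InstanceType="t3.micro",
                  MinCount=1, MaxCount=1)
```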
Behaviors

Start an EC2 instance
Stop an instance
Terminate an instance
- The root device volume is deleted by default

Hibernate an instance
- Saves the state of RAM to EBS (to the root device first; if that does not work, to a data volume)

Start from hibernation
- The root volume is restored to its previous state
- Boots up much faster (RAM is already reloaded)
- Resumes previously running processes
- Reattaches data volumes and keeps the same instance ID
- Cannot stay hibernated for longer than 60 days

EFS
A network file system that can be mounted on multiple EC2 instances at once (also works with EC2 instances in multiple AZs).
- Compatible with Linux-based AMIs (not Windows)
- The file system scales automatically
- Pay per use (the most expensive storage option here)

Tiers
EFS has multiple tiers and lifecycle management to move data from one tier to another after x amount of days:
- Standard - for frequently accessed files
- Infrequently Accessed (IA) - for files that are not frequently accessed

FSx
fully managed native Windows file system
AMI
Amazon Machine Image (AMI) provides the information required to launch an instance. There are two categories
- Amazon EBS-backed - instances launched from an EBS snapshot provided by Amazon
- Instance Store-backed - instances launched from a template stored in S3 (ephemeral storage). These cannot be stopped; otherwise all data is lost.

Databases
Relational Databases Service (RDS)
A relational database stores datasets in tables made of rows and columns.
Database engines available on RDS: SQL Server, Oracle, MySQL, PostgreSQL, MariaDB, and Amazon Aurora.
RDS is generally used for Online Transaction Processing (OLTP) NOT Online Analytical Processing (OLAP)
RDS can be deployed in multi-AZ mode to ensure high availability in case of a failure in the primary AZ. This creates a standby replica of the RDS instance in another AZ as a backup. However, you cannot access the standby directly as long as the primary is still healthy.
A read replica is another way of copying an RDS instance, but the replica is read-only. It takes read load off the primary RDS instance, improving scalability. Read replicas can be created both in other AZs and in other regions.
Aurora
A MySQL- and PostgreSQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases.
- 5x the performance of MySQL
- 3x the performance of PostgreSQL

DynamoDB
A nonrelational (NoSQL) database from Amazon.

VPC
VPC = a virtual data center in the cloud → define your own network, including the IP address range, subnets, route tables, and network gateways.
CIDR
When setting up a VPC, a range of IP addresses has to be assigned to the resources inside it. CIDR notation defines how many IP addresses are available.
A typical question on this topic: given an amount of resources, how should the range be defined? The addresses are written as xxx.xxx.xxx.xxx/n, and n has to be identified. A quick and dirty calculation is to find the largest n (i.e., the smallest block) that satisfies:
2^(32-n) >= number of required addresses
For example, if 20 IPs are required, this leads to n = 27, so the range is xxx.xxx.xxx.xxx/27.
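A tiny Python sketch of that calculation (it ignores the 5 addresses AWS reserves in every subnet):

```python
import math

def smallest_block_prefix(required_addresses: int) -> int:
    """Largest /n (smallest CIDR block) with 2^(32-n) >= required_addresses."""
    return 32 - math.ceil(math.log2(required_addresses))

print(smallest_block_prefix(20))    # 27 -> a /27 block provides 32 addresses
print(smallest_block_prefix(200))   # 24 -> a /24 block provides 256 addresses
```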
Find more details in CIDR.xyz
Subnet
A VPC usually includes 3 tiers:
- web - public-facing subnet
- application - private subnet; talks only to the web tier and the database tier
- database - private subnet; talks only to the application tier

Keep in mind that 1 subnet always lives in exactly 1 AZ.
Network Address Translation (NAT) Gateway
Enables instances in a private subnet (if there is no private subnet, a NAT gateway is not required) to connect to the internet or other AWS services, while preventing the internet from initiating connections to those instances. Behind the scenes it acts like an additional managed instance in your public subnet that routes outbound connections from your private subnets. NAT gateways have the following properties:
- Redundant inside the AZ (AWS runs more than one instance behind the scenes to support this task within the AZ)
- Starts at 5 Gbps and scales up to 45 Gbps
- No patching needed → AWS does this job
- Not associated with security groups
- Automatically assigned a public IP address

Note: if resources in multiple AZs share one NAT gateway and that gateway's AZ goes down, resources in the other AZs lose internet access. → To create an AZ-independent architecture, create a NAT gateway in each AZ and configure your routing so resources use the NAT gateway in their own AZ.
Security Groups
Virtual firewalls for our resources in the subnet. By default, everything is blocked. To communicate over other channels, the correct ports must be opened. Port numbers to remember: SSH (22), RDP (3389), HTTP (80), HTTPS (443).
To let everything in: set 0.0.0.0/0
Network ACL
First line of defense (optional)
Controls inbound and outbound traffic. The default network ACL allows all inbound and outbound traffic; a newly created custom one denies all traffic until rules are added.
Route 53
Routes a domain name (DNS) to an IP address.
Routing Policies
- Simple Routing Policy - one record with multiple IP addresses. If a record contains multiple values, all values are returned to the user in random order.
- Weighted Routing Policy - splits traffic based on assigned weights.
- Failover Routing Policy - routes traffic to the active site and uses health checks to detect failure; if the active site fails, traffic is routed to the passive site.
- Geolocation Routing Policy - routes traffic based on the user's geographic location (where the DNS queries originate).
- Geoproximity Routing Policy (Traffic Flow) - routes traffic by combining the user's geographic location, resource availability, and latency, applying a bias before sending the user to a resource.
- Latency Routing Policy - routes traffic based on the lowest network latency for the user. Requires a latency resource record set for the EC2 instance or ELB in each region hosting the website.
- Multivalue Answer Routing Policy - similar to simple routing, but checks the health of each resource first and only returns healthy resources.

ELB
Automatically distributes incoming application traffic across multiple targets. Can be done across multiple AZs.
ELB Types
- Application Load Balancer - best suited for HTTP and HTTPS traffic
- Network Load Balancer - capable of handling extreme performance load balancing
- Classic Load Balancer - the legacy, previous-generation load balancer. If you need to know the user's IP address behind it, use the X-Forwarded-For header.

Monitoring
CloudWatch is a monitoring and observability platform to identify potential issues
CloudWatch features:
- System Metrics - metrics you get out of the box
- Application Metrics - more information from inside the EC2 instance, obtained by installing the CloudWatch agent
- Alarms - alert you when something goes wrong, or take an action such as stopping an instance

Standard monitoring delivers metrics every 5 minutes, while detailed monitoring delivers them every 1 minute.
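A hedged boto3 sketch of such an alarm (instance ID and SNS topic ARN are hypothetical): alert when average CPU stays above 80% for two 5-minute periods.

```python
import boto3

cw = boto3.client("cloudwatch")
cw.put_metric_alarm(
    AlarmName="high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical
    Statistic="Average",
    Period=300,                     # standard 5-minute metrics
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],       # hypothetical topic
)
```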
CloudWatch Logs
A tool to monitor, store, and access log files from a variety of sources. It supports SQL-like queries on the logs to find potential issues or relevant data. Important terms:
- Log Event - a record of what happened; contains a timestamp and the data
- Log Stream - a collection of Log Events from the same source
- Log Group - a collection of Log Streams

If the logs do not need further processing, send them straight to S3. Otherwise, send them to CloudWatch Logs.
If real-time logging service is required, choose Kinesis instead of CloudWatch Logs
High Availability and Scaling
- Vertical scaling - increase the performance of the instance
- Horizontal scaling - add more instances of the current size

3W of scaling
- What - what sort of resource are we going to scale, and how do we define the template?
- Where - where do we scale? Where does the model go: should we scale out the database or the web server?
- When - how do we know that we need more resources? CloudWatch alarms provide the data.

To scale out, one needs a Launch Template, which is more or less like a Launch Configuration but newer, versioned, and recommended.
Autoscaling Group
A collection of EC2 instances that scales according to a predetermined strategy with the following steps:
1) Define template
2) Networking and purchasing - don’t forget multiple AZs for high availability
3) ELB config - the Auto Scaling group can follow the load balancer's health checks
4) Set scaling policy - select min, max, desired capacity
5) Notification - set SNS to let us know when the event happens
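A hedged boto3 sketch tying these steps together (template, subnet, and target group identifiers are hypothetical):

```python
import boto3

asg = boto3.client("autoscaling")
asg.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-template",    # hypothetical template (step 1)
                    "Version": "$Latest"},
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",     # subnets in two AZs (step 2)
    TargetGroupARNs=[                                        # attach to the ELB (step 3)
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/0123456789abcdef"
    ],
    HealthCheckType="ELB",
    HealthCheckGracePeriod=300,
    MinSize=2, MaxSize=6, DesiredCapacity=2,                 # capacity bounds (step 4)
)
```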
Scaling Types:
- Reactive scaling - scale according to the measured load
- Scheduled scaling - scale according to a time schedule
- Predictive scaling - scale according to an AWS ML algorithm, re-evaluated every 24 hours to forecast the next 48 hours

Relational Database Scaling
- Vertical scaling - resize the database instance to a bigger or smaller size
- Scaling storage - increase storage size; once increased, it cannot be scaled back down
- Read replicas - create read-only copies of the data to spread out the read workload
- Aurora Serverless - AWS takes care of scaling; works really well for unpredictable workloads

Nonrelational Database Scaling (DynamoDB)
You set the capacity mode on the DynamoDB table:
- Provisioned - for predictable workloads
- On-Demand - pay as you go

Decoupling Workflows
Tightly coupled - components talk to each other directly, one-to-one.
Loosely coupled - components talk through an intermediary (e.g., a queue or load balancer), so they can fail and scale independently.
If one service fails in a tightly coupled architecture, the whole system fails. Therefore, always prefer a loosely coupled architecture over a tightly coupled one.
Simple Queue Service (SQS)
Poll-based messaging → an asynchronous messaging service: the writer puts a message into a queue and, when the reader is ready, it eventually comes to fetch the message.
SQS Settings
- Delivery Delay - how long readers must wait before they can read a message after it arrives in the queue. Default is 0; can be set up to 15 minutes.
- Message Size - a message in any format up to 256 KB; the maximum size can be configured smaller, but not bigger.
- Encryption - messages are encrypted in transit by default; at-rest encryption can be enabled optionally.
- Message Retention - how long a message stays in the queue. Default is 4 days; can be set between 1 minute and 14 days.
- Polling:
  - Short (default) - connect, check for a message, disconnect, then connect again, and so on. This burns CPU drastically.
  - Long (recommended) - connect, check for a message, and wait a while before disconnecting.
- Queue Depth - the number of messages in the queue; can be used as a trigger for autoscaling.
- Visibility Timeout - how long a fetched message stays hidden in the queue. If the reader fails while processing it, the message reappears in the queue so another reader can fetch it.

Dead-Letter Queue (DLQ)
If there is a bad message in the queue, a reader fetches it and fails, so the message reappears after the Visibility Timeout. This loop repeats until the Message Retention period is reached. → Solve this by creating another SQS queue, a Dead-Letter Queue (DLQ), to move such messages into.
It is also important to set up a CloudWatch alarm to monitor queue depth.
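A hedged boto3 sketch that sets several of these options at once (queue name and DLQ ARN are hypothetical): long polling, a 60-second visibility timeout, and a redrive policy that moves messages to the DLQ after 5 failed receives.

```python
import json
import boto3

sqs = boto3.client("sqs")

dlq_arn = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"   # hypothetical, created beforehand
queue = sqs.create_queue(
    QueueName="orders",
    Attributes={
        "ReceiveMessageWaitTimeSeconds": "20",   # long polling
        "VisibilityTimeout": "60",
        "MessageRetentionPeriod": "345600",      # 4 days (the default)
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        ),
    },
)

# Long poll: wait up to 20 s instead of hammering the queue with empty receives
messages = sqs.receive_message(
    QueueUrl=queue["QueueUrl"], WaitTimeSeconds=20, MaxNumberOfMessages=10
)
```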
SQS Message Ordering
- Standard queues - best-effort ordering, at-least-once delivery (duplicates possible), nearly unlimited throughput
- FIFO queues - messages are processed strictly in order and exactly once, but throughput is limited (300 messages per second without batching)
Simple Notification Service (SNS)
Push-based messaging → a proactive messaging service that sends notifications to subscribed endpoints.
SNS Settings
- Subscribers - who will receive the SNS messages
- Message Size - a message can be up to 256 KB
- DLQ Support - messages that fail to be delivered can be stored in an SQS DLQ
- FIFO or Standard - FIFO only supports SQS as a subscriber
- Encryption - messages are encrypted in transit by default; at-rest encryption can be enabled optionally
- Access Policy - a resource policy can be added, as with S3

API Gateway
A fully managed service that allows you to publish, create, maintain, monitor, and secure your APIs, acting as a "front door" to your applications.
Big Data
3V of Big Data
- Volume - ranges from TB to PB
- Variety - data from various sources and in various formats
- Velocity - data needs to be collected, stored, processed, and analyzed within a short period of time

Redshift
A relational database turned data warehouse service that can handle petabyte-scale data (but runs in only one AZ).
Redshift is not a standard relational database; do not use it just to replace RDS.
Elastic MapReduce (EMR)
a managed fleet of EC2 instances running open-source tools like Spark, Hive, HBase, Flink, Hudi, and Presto
Kinesis
A service for dealing with real-time data. There are 2 versions of Kinesis:
- Kinesis Data Streams - real-time ingestion; you write consumers to process the data, and data can be retained and replayed
- Kinesis Data Firehose - near real-time; delivers data directly to destinations such as S3, Redshift, or Elasticsearch without custom consumers
When looking for a message broker:
- SQS - a messaging broker that is simple to use and does not require much configuration, but it cannot deliver real-time data
- Kinesis - a bit more complicated to configure, but suitable for real-time applications

Athena
A serverless SQL solution that allows users to query data directly in S3 without loading it into a database.
Glue
a serverless data integration service to perform ETL workloads without managing underlying servers
QuickSight
a BI data visualization service to create and share dashboards
Elasticsearch
Amazon's managed offering of the open-source search engine Elasticsearch; works very well for log analytics (beyond CloudWatch Logs). It usually appears as part of the Elasticsearch, Logstash, and Kibana (ELK) stack.
Serverless Architecture
Serverless means focusing only on application code and letting the provider manage the compute infrastructure for us.
Lambda
The simplest form of serverless computing, where the user only has to:
- select a runtime - the environment the code runs in
- set permissions - attach a role if the function needs to make AWS API calls
- set networking - VPC, subnet, and security groups can also be configured for Lambda, but are not required
- set resources - define the amount of CPU and RAM (128 MB - 10 GB) and the timeout (up to 15 min) the function will need
- set a trigger - tells the function when to run
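A minimal sketch of what such a function's code looks like in the Python runtime; Lambda invokes the handler with the trigger's event payload:

```python
import json

def lambda_handler(event, context):
    """Entry point: 'event' carries the trigger's payload, 'context' runtime info."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }
```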
Container

A unit of software that packages up code and its dependencies so that it is flexible enough to run from one environment to another.
Terminologies
- Dockerfile - a text document that contains all the commands or instructions used to build an image
- Image - an immutable file that contains the code, libraries, dependencies, and config files needed to run an app
- Registry - storage and distribution for Docker images
- Container - a running copy of the image

Container Management Systems (ECS vs EKS)
AWS offers different services which are capable of managing multiple containers
Elastic Container Service (ECS)
The simplest choice; AWS manages everything. It can scale up to thousands of containers while remaining easy to use, and it integrates with ELB.
Elastic Kubernetes Service (EKS)
Kubernetes is an open-source platform for managing containers that can run on-premises and in the cloud. Since not everything is managed by AWS, some configuration is required. If a cross-cloud or hybrid setup is needed, EKS is the only choice.
Fargate
A serverless compute engine for running containers from ECS or EKS. This choice appears when selecting where the containers will be hosted (the alternative being your own EC2 instances).
EventBridge (CloudWatch Events)
A serverless event bus that passes events from a source to an endpoint; effectively the glue that holds serverless applications together.
Security
Distributed Denial of Service (DDoS) Attack
An attack intended to make our server fail to respond to real users.
- Layer 4 attacks - come through the TCP layer, e.g., SYN floods or NTP amplification attacks
- Layer 7 attacks - floods of GET/POST requests

CloudTrail
A tool that logs what is happening in our AWS account via API calls and stores the logs in an S3 bucket; it does not capture RDP or SSH traffic.
What is logged:
- metadata around the API calls
- the source IP address of the API caller
- the response elements returned by the service

AWS Shield → for Layer 4
Free protection against DDoS (Layer 3 and 4) attacks on ELB, Amazon CloudFront, and Route 53. For about $3,000 more per month, AWS Shield Advanced adds a dedicated 24/7 DDoS Response Team and protects your AWS bill against the extra ELB, CloudFront, and Route 53 charges caused by an attack.
AWS Web Application Firewall (WAF) → for Layer 7
A tool that monitors HTTP and HTTPS requests before they are forwarded to CloudFront or an ELB. It can be configured to:
- Allow all requests except the ones specified
- Block all requests except the ones specified
- Count the requests that match the specified properties

In general, WAF blocks Layer 7 attacks, e.g., HTTP DDoS, SQL injection, and cross-site scripting.
GuardDuty
A threat monitoring and detection service that uses AI. It takes 7-14 days to learn what "normal" behavior looks like, and it uses continuously updated external databases of known malicious domains. Afterwards, if GuardDuty detects a threat (based on sources such as CloudTrail logs, VPC Flow Logs, and DNS logs), it reports it in the GuardDuty console and in CloudWatch Events, which can in turn trigger a Lambda function to address the threat.
Macie
A service that uses AI to analyze your S3 buckets and alert you when you are storing Personally Identifiable Information (PII), Personal Health Information (PHI), or financial data.
Inspector
An automated security assessment service that helps improve the security and compliance of applications deployed on AWS. It scans two main areas: EC2 instances and VPCs.
Type of assessments:
- Network assessment - analyzes the network configuration to check whether ports are reachable from outside the VPC → no agent needed
- Host assessment - analyzes vulnerable software, host hardening, and security best practices → needs an agent

Key Management Service (KMS)
Amazon's centralized key management service: you create and manage Customer Master Keys (CMKs). It relies on Hardware Security Modules (HSMs) hosted by AWS; an HSM plays a role similar to a Trezor hardware wallet. With KMS alone, it is like sharing that Trezor with other AWS users (multi-tenant hardware). The CloudHSM service gives you a dedicated, single-tenant HSM instead. Note, however, that CloudHSM does not provide automatic key rotation.
Secrets Manager vs Parameter Store
- Secrets Manager - paid; stores secrets and can rotate them automatically (e.g., RDS credentials)
- Parameter Store - free; stores configuration values and secrets, but has no automatic rotation
Certificate Manager
A service to create, manage, and deploy public and private SSL certificates for use with other AWS services → ELB, CloudFront distributions, API Gateway. There is therefore no need to manually renew and update certificates. Also, it is free.
Automation
CloudFormation
A template in JSON or YAML used to deploy AWS services. It is composed of 3 sections:
- parameters - user-dependent questions to be filled in when the template is run
- mappings - values that fill themselves in based on conditions, such as the region being deployed to
- resources - all resources to deploy

Note: hard-coded values and resource IDs (e.g., a hard-coded AMI ID) can make the template fail in other regions.
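A minimal sketch of those three sections, written here as a Python dict and deployed with boto3 (stack name and AMI IDs are hypothetical); the mapping avoids the hard-coded AMI problem mentioned above.

```python
import json
import boto3

template = {
    "Parameters": {
        "InstanceType": {"Type": "String", "Default": "t3.micro"},
    },
    "Mappings": {
        "RegionAmi": {  # look up the right AMI per region instead of hard-coding one
            "us-east-1": {"ami": "ami-aaaaaaaa"},       # hypothetical AMI IDs
            "eu-central-1": {"ami": "ami-bbbbbbbb"},
        },
    },
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": {"Ref": "InstanceType"},
                "ImageId": {"Fn::FindInMap": ["RegionAmi", {"Ref": "AWS::Region"}, "ami"]},
            },
        },
    },
}

boto3.client("cloudformation").create_stack(
    StackName="demo-stack",
    TemplateBody=json.dumps(template),
)
```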
Elastic Beanstalk
A Platform-as-a-Service (PaaS): just bring your web app code and Beanstalk manages the EC2 architecture for you. It is not serverless, unfortunately.
Systems Manager
A set of tools to view, control, and automate your AWS architecture and on-premises resources (an agent must be installed on the instances). Exam questions usually refer not to Systems Manager itself but to its features, e.g., Automation Documents, Session Manager, or Parameter Store.
Caching
A solution that keeps copies of external content somewhere close to the consumers to improve overall performance.
CloudFront
Caches content at edge locations. Edge locations can only be selected by region, not by specific country.
ElastiCache vs DynamoDB Accelerator (DAX)
ElastiCache - a managed offering of the open-source caching engines Memcached and Redis. It is designed for caching in front of RDS-style databases.
DAX - designed specifically for caching DynamoDB.
Global Accelerator
Provides static IP addresses that sit between your users and the ELB, so clients that cache IP addresses do not break when endpoints change.
Governance
Organizations
A centralized tool to manage multiple AWS accounts. An important feature is Service Control Policies (SCPs), which globally limit permissions in member accounts, including their root users.
A Deny statement in an SCP means users cannot perform the specified actions. An Allow statement in an SCP means users can perform only the specified actions and nothing else.
Tip: whenever a question asks about granting access across accounts or to AWS services, a role is usually the answer.
Migration
Move data to AWS
- Normal internet → insecure, slow
- Direct Connect → secure, fast, but costly if used only for a short period of time
- Physical → ship drives to AWS

Snow Family
Physical devices that AWS ships to you for migration. Some of them also include compute capability.
Storage Gateway
A hybrid cloud storage service that merges on-premises resources with the cloud. It can be used for a one-time migration or for long-term hybrid storage.
DataSync
A one-time migration tool. An agent has to be installed on the source resources in order to migrate. The target can be S3, EFS, or FSx.