Executive Summary
Amazon Web Services remains the leading cloud provider in 2026 with 30% market share, offering 200+ services across compute, storage, databases, networking, security, and machine learning. While Azure continues to close the gap at 27%, AWS maintains the broadest service portfolio, the largest global infrastructure with 33+ regions and 600+ edge locations, and the deepest enterprise adoption. This guide covers the eight foundational AWS services that form the backbone of most cloud architectures.
- EC2 Graviton4 instances deliver 40% better price-performance than x86 alternatives. ARM-based workloads continue to grow as language runtimes and container images fully support arm64.
- S3 Intelligent-Tiering now covers 90% of archival use cases automatically. Most organizations no longer manually manage storage class transitions.
- Lambda SnapStart extends beyond Java to Node.js and Python, reducing cold starts to under 200ms for all major runtimes.
- Aurora Serverless v2 auto-scales from 0.5 to 256 ACUs in seconds, making it the default choice for variable-traffic database workloads.
Part 1: Cloud Market Landscape
AWS launched in 2006 with S3 and EC2 and has maintained market leadership ever since. In 2026, AWS holds approximately 30% of the global cloud infrastructure market, generating over $110 billion in annual revenue. Microsoft Azure is the closest competitor at 27%, followed by Google Cloud Platform at 12%. The overall cloud market continues growing at 20%+ annually as enterprises accelerate digital transformation.
AWS differentiates through breadth of services (200+), global infrastructure (33+ regions, 100+ availability zones), and deep integration between services. The company invests heavily in custom silicon (Graviton for compute, Trainium/Inferentia for ML, Nitro for virtualization) to deliver better price-performance than commodity hardware. In 2026, over 60% of new EC2 workloads run on Graviton processors.
Multi-cloud adoption has stabilized at around 75% of enterprises using two or more cloud providers. However, most organizations have a primary cloud provider (usually AWS or Azure) and use secondary providers for specific workloads. True multi-cloud (running the same workload across providers) remains rare due to complexity and cost. Kubernetes and Terraform have emerged as the primary tools for managing multi-cloud infrastructure.
[Figure: Cloud Infrastructure Market Share (2018-2026). Source: OnlineTools4Free Research]
Part 2: EC2 Compute
Amazon EC2 (Elastic Compute Cloud) provides resizable virtual servers in the cloud. With 750+ instance types across general purpose, compute optimized, memory optimized, storage optimized, and accelerated computing families, EC2 covers every workload from microservices to GPU-intensive machine learning training. Graviton4 (ARM) instances offer the best price-performance for most workloads in 2026.
EC2 pricing models include On-Demand (pay by the second, no commitment), Reserved Instances (1- or 3-year commitment, up to 72% savings), Savings Plans (flexible commitment to a dollar amount per hour), and Spot Instances (unused capacity at up to 90% discount, can be interrupted). Most production workloads use a combination: Reserved Instances or Savings Plans for baseline capacity, On-Demand for spikes above that baseline, and Spot for fault-tolerant batch processing.
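The purchasing mix described above can be sketched as a blended hourly cost. This is a minimal illustration, not a pricing tool: the `FleetHour` type, the `hourly_cost` function, and the $0.10 On-Demand rate are hypothetical, and the discounts are the "up to" figures quoted in the text, which real workloads rarely reach in full.

```python
from dataclasses import dataclass

# Maximum discounts quoted in the text; real discounts vary by instance
# family, region, term, and payment option.
MAX_COMMIT_DISCOUNT = 0.72
MAX_SPOT_DISCOUNT = 0.90


@dataclass
class FleetHour:
    """Instance counts for one hour, split by purchasing model."""
    committed: int   # covered by Reserved Instances or Savings Plans
    on_demand: int   # pay-as-you-go capacity for spikes
    spot: int        # interruptible capacity for fault-tolerant work


def hourly_cost(fleet: FleetHour, on_demand_rate: float,
                commit_discount: float = MAX_COMMIT_DISCOUNT,
                spot_discount: float = MAX_SPOT_DISCOUNT) -> float:
    """Return the blended hourly cost of the fleet in dollars."""
    committed = fleet.committed * on_demand_rate * (1 - commit_discount)
    on_demand = fleet.on_demand * on_demand_rate
    spot = fleet.spot * on_demand_rate * (1 - spot_discount)
    return committed + on_demand + spot


# Hypothetical fleet of 20 instances at an assumed $0.10/hour On-Demand rate.
fleet = FleetHour(committed=10, on_demand=4, spot=6)
blended = hourly_cost(fleet, on_demand_rate=0.10)
all_on_demand = 20 * 0.10
print(f"blended: ${blended:.3f}/h, all On-Demand: ${all_on_demand:.2f}/h")
```

Keeping the discounts as parameters makes it easy to rerun the comparison with the rates that actually apply to a given instance family and term.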
Auto Scaling Groups (ASGs) automatically adjust the number of EC2 instances based on demand, health checks, or schedules. Target tracking policies hold a chosen metric at a target value (for example, average CPU at 60% or a set request count per target). Step scaling adds or removes instances at defined thresholds. Predictive scaling uses machine learning to forecast demand and pre-provision capacity. Launch Templates define the instance configuration (AMI, instance type, security groups, user data).
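The idea behind target tracking can be sketched with a simple proportional model. This is an assumption-laden approximation for intuition only: it is not the algorithm AWS runs, it ignores cooldowns and warm-up, and the function name `target_tracking_desired` is hypothetical.

```python
import math


def target_tracking_desired(current_instances: int, metric_value: float,
                            target_value: float, min_size: int,
                            max_size: int) -> int:
    """Estimate the capacity that brings the per-instance metric back to target.

    Simplified proportional model: if 4 instances average 90% CPU and the
    target is 60%, the same load spread over 6 instances averages 60%.
    """
    if current_instances <= 0 or target_value <= 0:
        raise ValueError("current_instances and target_value must be positive")
    desired = math.ceil(current_instances * metric_value / target_value)
    # Never scale outside the group's configured bounds.
    return max(min_size, min(max_size, desired))


print(target_tracking_desired(4, metric_value=90, target_value=60,
                              min_size=2, max_size=10))
```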
EC2 networking has evolved significantly. Enhanced Networking provides up to 200 Gbps bandwidth with SR-IOV. Elastic Fabric Adapter (EFA) enables HPC and ML training with inter-node communication at near bare-metal performance. Placement Groups control instance placement: Cluster (low latency within an AZ), Spread (across AZs for HA), and Partition (large distributed systems like Hadoop/Cassandra).
EC2 Instance Families
| Family | Category | vCPUs | Memory | Use Case | Price/Hour |
|---|---|---|---|---|---|
| t3 | General Purpose | 2-8 | 1-32 GB | Burstable workloads, dev/test, small databases | $0.0104-$0.1664 |
| m7i | General Purpose | 1-192 | 4-768 GB | Web servers, application servers, enterprise apps | $0.0504-$9.6768 |
| c7g | Compute Optimized | 1-64 | 2-128 GB | Batch processing, ML inference, gaming, HPC | $0.0361-$2.3104 |
| r7g | Memory Optimized | 1-64 | 8-512 GB | In-memory databases, real-time analytics | $0.0667-$4.2688 |
| i4i | Storage Optimized | 2-128 | 16-1024 GB | NoSQL databases, data warehouses, Elasticsearch | $0.156-$12.4992 |
| p5 | Accelerated (GPU) | 192 | 2048 GB | ML training, generative AI, scientific simulation | $98.32 |
| g5 | Accelerated (GPU) | 4-96 | 16-768 GB | ML inference, graphics rendering, video encoding | $1.006-$16.288 |
| hpc7g | HPC | 64 | 128 GB | CFD, weather modeling, molecular dynamics | $1.6832 |
Part 3: S3 Storage
Amazon S3 (Simple Storage Service) is object storage with 99.999999999% (11 nines) durability. S3 stores trillions of objects for millions of applications. It serves as the foundation for data lakes, backup and restore, content distribution, static website hosting, and application data storage. S3 automatically distributes data across a minimum of three Availability Zones within a region, except for the One Zone storage classes, which keep data in a single AZ.
S3 security follows a defense-in-depth approach. Block Public Access prevents accidental public exposure at the account or bucket level. Bucket policies define resource-based permissions using IAM policy language. S3 Object Lock provides WORM (Write Once Read Many) compliance for regulatory requirements. Server-side encryption with AWS KMS keys (SSE-KMS) or S3-managed keys (SSE-S3) encrypts data at rest. S3 Access Points simplify managing access for shared datasets.
S3 performance has improved dramatically. Multipart uploads parallelize large file transfers. S3 Transfer Acceleration uses CloudFront edge locations for faster uploads across regions. S3 Express One Zone provides single-digit millisecond latency for frequently accessed data. Batch Operations run large-scale operations (copy, tag, restore) across billions of objects. S3 Event Notifications trigger Lambda functions, SQS messages, or EventBridge events when objects are created, modified, or deleted.
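To make the multipart idea concrete, the sketch below splits an object into byte ranges that could be uploaded in parallel. It only plans the parts and makes no AWS calls. The limits used (5 MiB minimum part size except for the last part, 5 GiB maximum part size, 10,000 parts per upload) are S3's documented multipart limits; the function name `plan_parts` and the 8 MiB preferred size are assumptions for illustration.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB          # S3 minimum for every part except the last
MAX_PART_SIZE = 5 * 1024 * MIB   # S3 maximum part size (5 GiB)
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(object_size: int,
               preferred_part_size: int = 8 * MIB) -> list[tuple[int, int]]:
    """Return (offset, length) byte ranges for each part of a multipart upload."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size if the preferred size would need more than MAX_PARTS.
    part_size = max(preferred_part_size, MIN_PART_SIZE,
                    math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object too large for a single multipart upload")
    return [(offset, min(part_size, object_size - offset))
            for offset in range(0, object_size, part_size)]


for offset, length in plan_parts(20 * MIB):
    print(f"part at byte {offset}: {length} bytes")
```

Each range can then be handed to a separate worker, which is where the parallel speedup comes from.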
S3 Storage Classes Comparison
| Storage Class | Availability | Min Duration | Cost/GB/Month | Best For |
|---|---|---|---|---|
| S3 Standard | 99.99% | None | $0.023 | Frequently accessed data, websites, content distribution |
| S3 Intelligent-Tiering | 99.9% | None | $0.0025-$0.023 | Unknown or changing access patterns |
| S3 Standard-IA | 99.9% | 30 days | $0.0125 | Infrequent access, backups, disaster recovery |
| S3 One Zone-IA | 99.5% | 30 days | $0.01 | Reproducible infrequent data, thumbnails |
| S3 Glacier Instant | 99.9% | 90 days | $0.004 | Long-lived archive with instant access |
| S3 Glacier Flexible | 99.99% | 90 days | $0.0036 | Archives with retrieval in minutes to hours |
| S3 Glacier Deep Archive | 99.99% | 180 days | $0.00099 | Long-term compliance archives, 7-10 year retention |
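As a rough illustration of how much the storage class matters, the sketch below applies the per-GB prices from the table to a hypothetical 10,000 GB dataset. It deliberately ignores request charges, retrieval fees, monitoring fees, and minimum-duration charges, so treat the output as a storage-only lower bound. The function name `monthly_storage_cost` is an assumption.

```python
# Per-GB monthly storage prices copied from the table above.
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 Standard-IA": 0.0125,
    "S3 One Zone-IA": 0.01,
    "S3 Glacier Instant": 0.004,
    "S3 Glacier Flexible": 0.0036,
    "S3 Glacier Deep Archive": 0.00099,
}


def monthly_storage_cost(size_gb: float, storage_class: str) -> float:
    """Storage-only monthly cost in dollars (no request or retrieval fees)."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


dataset_gb = 10_000  # hypothetical dataset size
for storage_class in PRICE_PER_GB_MONTH:
    cost = monthly_storage_cost(dataset_gb, storage_class)
    print(f"{storage_class:<26} ${cost:>8.2f}/month")
```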
Part 4: Lambda Serverless
AWS Lambda runs code without provisioning or managing servers. You upload your function code, define triggers (API Gateway, S3, SQS, EventBridge, DynamoDB Streams), and Lambda handles scaling from zero to thousands of concurrent executions automatically. You pay only for the compute time consumed, billed in 1ms increments. The free tier includes 1 million requests and 400,000 GB-seconds per month.
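The billing model above can be turned into a quick estimate. The free-tier figures come from the paragraph and the $0.20 per million requests from this guide's pricing table; the per-GB-second price is an assumption (the commonly published x86 rate for us-east-1) and should be checked against current pricing. The function name `lambda_monthly_cost` is hypothetical.

```python
REQUEST_PRICE_PER_MILLION = 0.20   # from this guide's pricing table
GB_SECOND_PRICE = 0.0000166667     # assumption: x86 rate in us-east-1
FREE_REQUESTS = 1_000_000          # monthly free tier, from the text
FREE_GB_SECONDS = 400_000          # monthly free tier, from the text


def lambda_monthly_cost(invocations: int, avg_duration_ms: float,
                        memory_mb: int) -> float:
    """Estimate the monthly Lambda bill in dollars after the free tier."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = max(0, invocations - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    return (billable_requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
            + billable_gb_seconds * GB_SECOND_PRICE)


# Hypothetical workload: 5M invocations, 200 ms average, 512 MB memory.
print(f"${lambda_monthly_cost(5_000_000, 200, 512):.2f} per month")
```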
Lambda cold starts occur when a new execution environment is created. For Node.js and Python, cold starts are typically 100-300ms. For Java and .NET, cold starts can reach 2-5 seconds without optimization. SnapStart (available for Java, Node.js, and Python in 2026) snapshots the initialized environment, reducing cold starts to under 200ms. Provisioned Concurrency pre-warms a specified number of execution environments, eliminating cold starts entirely at a fixed hourly cost.
Lambda integrates with over 200 AWS services. Common patterns include: API backends (API Gateway + Lambda), event processing (S3/SQS/EventBridge + Lambda), real-time stream processing (Kinesis/DynamoDB Streams + Lambda), scheduled tasks (EventBridge Scheduler + Lambda), and file processing (S3 trigger + Lambda for image resize, PDF generation, video transcoding). Lambda Destinations route successful and failed invocations to different targets without writing error handling code.
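The S3-triggered file processing pattern could start from a handler like the one below. It is a minimal sketch that only parses the standard S3 event notification shape and decides which objects to process; the actual resize or transcode step is left out, and the image-extension filter is an assumption for illustration.

```python
from urllib.parse import unquote_plus

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")  # hypothetical filter


def handler(event, context):
    """Lambda entry point for S3 event notifications."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (for example, spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.lower().endswith(IMAGE_SUFFIXES):
            continue  # ignore non-image uploads
        # Real processing (resize, transcode, ...) would happen here.
        processed.append({"bucket": bucket, "key": key})
    return {"processed": processed}


sample_event = {"Records": [
    {"s3": {"bucket": {"name": "uploads"},
            "object": {"key": "photos/summer+trip.jpg"}}},
    {"s3": {"bucket": {"name": "uploads"},
            "object": {"key": "docs/notes.txt"}}},
]}
print(handler(sample_event, None))
```

Because the handler takes a plain dictionary, it can be exercised locally with a sample event before being wired to a bucket notification.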
Lambda@Edge and CloudFront Functions extend serverless to the edge. Lambda@Edge runs at CloudFront edge locations for request/response manipulation, A/B testing, and authentication. CloudFront Functions handle lightweight operations (URL rewrites, header manipulation) at 1/6th the cost. Lambda SnapStart and arm64 architecture (Graviton2) reduce both latency and cost for production Lambda workloads.
Lambda Configuration Parameters
| Parameter | Range | Default | Notes |
|---|---|---|---|
| Memory | 128 MB - 10,240 MB | 128 MB | CPU scales proportionally with memory. 1,769 MB = 1 vCPU equivalent. |
| Timeout | 1 sec - 900 sec | 3 sec | Maximum execution time. Use Step Functions for longer workflows. |
| Ephemeral Storage | 512 MB - 10,240 MB | 512 MB | Temporary storage in /tmp. Persists across warm invocations. |
| Concurrency | 1 - 1,000+ | Unreserved | Reserved concurrency guarantees capacity. Provisioned eliminates cold starts. |
| Architecture | x86_64, arm64 | x86_64 | Graviton2 (arm64) is 20% cheaper and often faster. |
| Runtime | Node 22, Python 3.13, Java 21, .NET 8, Go, Ruby 3.3, Custom | N/A | Managed runtimes receive security patches automatically. |
| Layers | 0-5 layers | 0 | Shared libraries and dependencies across functions. Max 250 MB unzipped total. |
| SnapStart | On/Off | Off | Snapshots initialized execution environment. Reduces cold starts to <200ms for supported runtimes. |
Part 5: RDS and Aurora
Amazon RDS (Relational Database Service) manages relational databases in the cloud. It handles provisioning, patching, backups, recovery, and scaling for MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server. RDS automates administrative tasks that consume 30-40% of a DBA team's time. Multi-AZ deployments provide automatic failover with a synchronous standby replica in a different Availability Zone.
Amazon Aurora is a cloud-native relational database designed by AWS. It is compatible with MySQL 8.0 and PostgreSQL 16 but offers significantly better performance: 5x throughput of standard MySQL and 3x throughput of standard PostgreSQL. Aurora stores data across three AZs with six copies, provides up to 15 read replicas with sub-10ms replication lag, and supports up to 128 TB of storage that grows automatically.
Aurora Serverless v2 auto-scales compute capacity in fine-grained increments (0.5 ACU steps) based on application demand. It scales from 0.5 ACU to 256 ACU in seconds, making it ideal for variable or unpredictable workloads. Aurora Global Database provides cross-region replication with sub-second latency, enabling disaster recovery and low-latency global reads. Aurora Machine Learning integrates with SageMaker and Comprehend for in-database ML inference.
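The capacity range and step size described above can be illustrated with a small helper that snaps a required capacity to the next 0.5 ACU step within configured bounds. This is only a model of the stated range and granularity, not Aurora's actual scaling algorithm, and the function name `target_acus` is hypothetical.

```python
import math

ACU_STEP = 0.5  # scaling granularity stated in the text


def target_acus(required_acus: float, min_acus: float = 0.5,
                max_acus: float = 256.0) -> float:
    """Round required capacity up to the next 0.5 ACU step, within bounds."""
    snapped = math.ceil(required_acus / ACU_STEP) * ACU_STEP
    return max(min_acus, min(max_acus, snapped))


for demand in (0.1, 3.2, 1000):
    print(f"need {demand} ACU -> provision {target_acus(demand)} ACU")
```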
RDS Engine Comparison
| Engine | Performance | Max Storage | Read Replicas | Serverless | Best For |
|---|---|---|---|---|---|
| Aurora MySQL | 5x MySQL | 128 TB | 15 | v2 | High-performance MySQL workloads |
| Aurora PostgreSQL | 3x PostgreSQL | 128 TB | 15 | v2 | High-performance PostgreSQL workloads |
| RDS MySQL | Baseline | 64 TB | 5 | No | Standard MySQL with full compatibility |
| RDS PostgreSQL | Baseline | 64 TB | 5 | No | Standard PostgreSQL with extensions |
| RDS SQL Server | Baseline | 16 TB | 5 | No | Microsoft ecosystem, .NET apps |
| RDS Oracle | Baseline | 64 TB | 5 | No | Enterprise Oracle migrations |
| RDS MariaDB | Baseline | 64 TB | 5 | No | Open-source MySQL alternative |
Part 6: DynamoDB
Amazon DynamoDB is a fully managed NoSQL database that delivers single-digit millisecond performance at any scale. It supports key-value and document data models. DynamoDB automatically manages infrastructure scaling, replication, and patching. It is the backbone of many AWS services (IAM, S3 metadata, Lambda, API Gateway) and powers some of the world's highest-traffic applications.
DynamoDB data modeling differs fundamentally from relational databases. Instead of normalizing data across tables with joins, you denormalize and store data in the access patterns your application needs. A single table can serve multiple access patterns using composite sort keys, GSIs (Global Secondary Indexes), and sparse indexes. The Single Table Design pattern stores multiple entity types in one table to minimize the number of requests per page load.
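A small in-memory model can make the single-table idea concrete. The sketch below is not DynamoDB and uses no AWS SDK: it mimics a partition key plus a sorted sort key, and a query that filters by sort-key prefix. The key formats (`CUSTOMER#42`, `ORDER#<date>#<id>`) and the `SingleTable` class are hypothetical conventions chosen for illustration.

```python
from collections import defaultdict


class SingleTable:
    """Toy model of one table holding several entity types."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # PK -> {SK: item}

    def put(self, pk: str, sk: str, **attributes):
        self._partitions[pk][sk] = {"PK": pk, "SK": sk, **attributes}

    def query(self, pk: str, sk_prefix: str = "") -> list[dict]:
        """Return items in one partition, ordered by sort key."""
        items = self._partitions.get(pk, {})
        return [items[sk] for sk in sorted(items) if sk.startswith(sk_prefix)]


table = SingleTable()
table.put("CUSTOMER#42", "PROFILE", name="Ada")
table.put("CUSTOMER#42", "ORDER#2026-02-03#1002", total=80)
table.put("CUSTOMER#42", "ORDER#2026-01-15#1001", total=120)

# One request fetches the customer's orders, already sorted by date.
orders = table.query("CUSTOMER#42", sk_prefix="ORDER#")
# One request fetches the profile and all orders for a page load.
everything = table.query("CUSTOMER#42")
print([o["SK"] for o in orders], len(everything))
```

Putting the date first in the sort key is what makes "orders by date" a single prefix query rather than a scan.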
DynamoDB Streams capture a time-ordered sequence of item-level changes in a table. Each stream record can contain the old and new item images. Streams power event-driven architectures: trigger Lambda functions for real-time aggregation, replicate data to Elasticsearch for full-text search, or synchronize with other databases. DynamoDB Global Tables replicate data across up to five AWS regions with active-active write capability, providing sub-second replication latency and 99.999% availability.
DynamoDB Capacity Modes
| Feature | On-Demand | Provisioned | Provisioned + Auto Scaling |
|---|---|---|---|
| Pricing Model | Pay per request | Pay per hour for capacity units | Pay per hour with auto-adjustment |
| Read Cost | $0.25 per million RRU | $0.00013 per RCU/hour | Same as provisioned |
| Write Cost | $1.25 per million WRU | $0.00065 per WCU/hour | Same as provisioned |
| Scaling Speed | Instant | Manual | 1-5 minutes |
| Throttling Risk | Very Low | High if under-provisioned | Low |
| Cost Predictability | Variable | Fixed | Semi-predictable |
| Best For | Unpredictable traffic, new apps | Steady, predictable workloads | Predictable with spikes |
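One way to choose between the modes is to compute the utilization at which provisioned capacity becomes cheaper than on-demand. The sketch below does this for writes only, using the write prices from the table and assuming one WCU sustains one standard write per second for items up to 1 KB. The function names are hypothetical.

```python
ON_DEMAND_PER_MILLION_WRU = 1.25    # from the table above
PROVISIONED_PER_WCU_HOUR = 0.00065  # from the table above
SECONDS_PER_HOUR = 3600


def on_demand_hourly_cost(writes_per_second: float) -> float:
    """Hourly on-demand cost for a steady write rate (items up to 1 KB)."""
    writes = writes_per_second * SECONDS_PER_HOUR
    return writes / 1_000_000 * ON_DEMAND_PER_MILLION_WRU


def provisioned_hourly_cost(provisioned_wcu: int) -> float:
    """Hourly cost of provisioned write capacity, used or not."""
    return provisioned_wcu * PROVISIONED_PER_WCU_HOUR


def break_even_utilization() -> float:
    """Fraction of provisioned WCUs that must be used to beat on-demand."""
    return PROVISIONED_PER_WCU_HOUR / on_demand_hourly_cost(1.0)


print(f"break-even utilization: {break_even_utilization():.1%}")
```

With these prices, provisioned capacity wins once average utilization stays above roughly one seventh of what is provisioned, which is why steady workloads favor it and spiky ones favor on-demand.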
Part 7: CloudFront CDN
Amazon CloudFront is a global content delivery network (CDN) with 600+ edge locations across 90+ cities in 49 countries. CloudFront caches static assets (images, CSS, JavaScript, videos) at edge locations closest to users, reducing latency from hundreds of milliseconds to single digits. It also accelerates dynamic content through persistent connections to the origin, TCP/TLS optimization, and intelligent routing.
CloudFront supports multiple origin types: S3 buckets, Application Load Balancers, EC2 instances, API Gateway, MediaStore, and any custom HTTP origin. Origin Groups provide automatic failover between a primary and secondary origin. Origin Access Control (OAC) restricts S3 bucket access to CloudFront only, preventing direct S3 URL access. Cache behaviors define different settings for different URL path patterns (e.g., /api/* no-cache, /static/* cache for 1 year).
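Cache behaviors are evaluated in order, and the first matching path pattern wins, with a default behavior as the fallback. The sketch below models that selection using Python's `fnmatchcase` as a stand-in for CloudFront's wildcard matching, which is an approximation (`fnmatch` also understands bracket expressions). The TTL values for `/api/*` and `/static/*` mirror the example in the text; the one-day default TTL is a hypothetical value.

```python
from fnmatch import fnmatchcase

# Ordered by precedence; the catch-all default behavior comes last.
BEHAVIORS = [
    ("/api/*", {"ttl_seconds": 0}),               # no caching
    ("/static/*", {"ttl_seconds": 31_536_000}),   # one year
    ("*", {"ttl_seconds": 86_400}),               # hypothetical default
]


def match_behavior(path: str, behaviors=BEHAVIORS):
    """Return the first (pattern, settings) pair whose pattern matches path."""
    for pattern, settings in behaviors:
        if fnmatchcase(path, pattern):
            return pattern, settings
    raise LookupError(f"no behavior matches {path!r}")


for path in ("/api/users", "/static/app.js", "/index.html"):
    pattern, settings = match_behavior(path)
    print(f"{path} -> {pattern} (TTL {settings['ttl_seconds']}s)")
```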
Edge compute capabilities let you run code at CloudFront edge locations. CloudFront Functions execute in under 1ms for lightweight tasks like URL rewrites, header manipulation, and request routing. Lambda@Edge handles heavier tasks like authentication, A/B testing, and dynamic content generation. CloudFront also integrates with AWS WAF for web application security and AWS Shield for DDoS protection. Real-time logs stream to Kinesis Data Streams for monitoring and analytics.
CloudFront Features
| Feature | Value | Description |
|---|---|---|
| Edge Locations | 600+ | Points of presence worldwide for low-latency content delivery |
| Regional Edge Caches | 13 | Intermediate cache layer between edge and origin for less popular content |
| Lambda@Edge | Node.js, Python | Run code at edge locations for request/response manipulation |
| CloudFront Functions | JavaScript | Lightweight functions for URL rewrites, header manipulation, A/B testing |
| Origin Shield | Optional | Additional caching layer to reduce origin load |
| Real-time Logs | Kinesis Data Streams | Stream access logs in real-time for monitoring and analytics |
| Field-Level Encryption | RSA | Encrypt specific POST fields at the edge before forwarding to origin |
| WebSocket Support | Yes | Persistent WebSocket connections through CloudFront |
| HTTP/3 (QUIC) | Yes | Faster connections with UDP-based HTTP/3 protocol support |
| Continuous Deployment | Staging Distribution | Test configuration changes with a percentage of traffic before full rollout |
Part 8: IAM and Security
AWS Identity and Access Management (IAM) controls who (authentication) can do what (authorization) on which AWS resources. IAM is free and foundational to every AWS account. Every API call to AWS is authenticated and authorized through IAM. The principle of least privilege means granting only the minimum permissions needed for a task, and removing permissions that are no longer used.
IAM Roles are the preferred way to grant permissions in AWS. Unlike IAM Users with long-lived access keys, roles provide temporary credentials via the Security Token Service (STS). EC2 instance profiles attach roles to instances. Lambda execution roles grant functions access to other services. Cross-account roles enable secure access between AWS accounts. Service-linked roles are managed by AWS services for their internal operations.
IAM Access Analyzer identifies resources shared with external entities and unused permissions. It analyzes resource-based policies on S3 buckets, IAM roles, KMS keys, Lambda functions, and SQS queues. Policy generation creates least-privilege policies based on CloudTrail activity. Permission boundaries set maximum permissions for IAM entities, useful for delegating IAM administration to developers without giving them full IAM access.
AWS Organizations manages multiple AWS accounts centrally. Service Control Policies (SCPs) set permission guardrails across accounts. AWS Control Tower provides a landing zone with pre-configured governance, security, and compliance controls. AWS IAM Identity Center (formerly AWS SSO) provides single sign-on access to multiple accounts and applications using existing identity providers (Okta, Microsoft Entra ID (formerly Azure AD), Google Workspace).
IAM Policy Concepts
| Concept | Description | Example |
|---|---|---|
| Principal | The entity (user, role, service) making the request. Specified in resource-based policies. | "Principal": {"AWS": "arn:aws:iam::123456789012:role/MyRole"} |
| Action | The specific API operation being allowed or denied. | "Action": ["s3:GetObject", "s3:PutObject"] |
| Resource | The AWS resource(s) the policy applies to. Uses ARN format. | "Resource": "arn:aws:s3:::my-bucket/*" |
| Effect | Whether the statement allows or denies access. Deny always wins. | "Effect": "Allow" or "Effect": "Deny" |
| Condition | Optional constraints like IP range, MFA, time, or tag values. | "Condition": {"IpAddress": {"aws:SourceIp": "10.0.0.0/8"}} |
| Policy Types | Identity-based (attached to users/roles), Resource-based (on resources), SCPs (organization-wide), Permission boundaries. | Identity policy on role + resource policy on S3 bucket |
| Least Privilege | Grant only the minimum permissions needed. Use IAM Access Analyzer to identify unused permissions. | Start with ReadOnly, add specific write actions as needed |
| Roles vs Users | Roles provide temporary credentials via STS. Preferred over long-lived user access keys. | EC2 instance role, Lambda execution role, cross-account role |
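The way Effect, Action, and Resource combine can be sketched with a toy evaluator: access is denied by default, an Allow statement grants it, and an explicit Deny overrides any Allow. This is a heavy simplification for intuition only. Real IAM evaluation also considers principals, conditions, SCPs, permission boundaries, and resource-based policies, and matches action names case-insensitively. The policy document, bucket name, and function names are hypothetical.

```python
from fnmatch import fnmatchcase

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::my-bucket/*"},
        {"Effect": "Deny",
         "Action": "s3:*",
         "Resource": "arn:aws:s3:::my-bucket/secret/*"},
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(patterns, value: str) -> bool:
    """True if value matches any wildcard pattern (simplified matching)."""
    return any(fnmatchcase(value, pattern) for pattern in _as_list(patterns))


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Default deny; Allow grants access; explicit Deny always wins."""
    allowed = False
    for statement in policy["Statement"]:
        if (_matches(statement["Action"], action)
                and _matches(statement["Resource"], resource)):
            if statement["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


print(is_allowed(POLICY, "s3:GetObject", "arn:aws:s3:::my-bucket/report.csv"))
print(is_allowed(POLICY, "s3:GetObject", "arn:aws:s3:::my-bucket/secret/key.pem"))
```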
Part 9: VPC Networking
Amazon VPC (Virtual Private Cloud) provides an isolated virtual network for your AWS resources. You define the IP address range (CIDR block), create subnets across Availability Zones, configure route tables, and control traffic with security groups and network ACLs. Every AWS account gets a default VPC in each region, but production workloads should use custom VPCs with deliberate network design.
A typical VPC architecture includes public subnets (with Internet Gateway access for load balancers and NAT gateways), private subnets (for application servers and databases), and isolated subnets (no internet access, for sensitive workloads). Each subnet spans a single Availability Zone. Deploy across at least two AZs for high availability. Use /24 CIDR blocks (256 addresses, of which 251 are usable because AWS reserves five per subnet) as a starting point, with room to grow.
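The subnet layout described above can be planned with Python's standard `ipaddress` module before anything is created. The sketch carves a 10.0.0.0/16 VPC into /24 subnets for three tiers across two AZs. The AZ names and the function name `plan_subnets` are hypothetical, and the count of five reserved addresses per subnet reflects AWS's documented behavior.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # addresses AWS reserves in every subnet


def plan_subnets(vpc_cidr: str, azs: list[str],
                 tiers=("public", "private", "isolated"), prefix: int = 24):
    """Assign one subnet per (tier, AZ) pair, in order, from the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=prefix)  # generator of /24 blocks
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[(tier, az)] = next(blocks)
    return plan


plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
for (tier, az), subnet in plan.items():
    usable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
    print(f"{tier:<9} {az}  {subnet}  ({usable} usable)")
```

Six /24 blocks use only a small slice of the /16, which leaves the room to grow that the text recommends.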
VPC connectivity options include: VPC Peering for direct connections between two VPCs, Transit Gateway for hub-and-spoke multi-VPC architectures, Site-to-Site VPN for encrypted connections to on-premises networks, Direct Connect for dedicated private fiber connections (1/10/100 Gbps), and PrivateLink for private access to AWS services and third-party SaaS without internet traversal. VPC Flow Logs capture IP traffic metadata for security analysis and troubleshooting.
DNS resolution in VPCs uses Route 53 Resolver. Private hosted zones enable custom DNS names for internal resources. Route 53 Resolver endpoints forward DNS queries between VPCs and on-premises networks. VPC DHCP Options Sets configure DNS servers, domain names, and NTP servers for instances. IPv6 support enables dual-stack networking with both IPv4 and IPv6 addresses on the same resources.
VPC Architecture Components
| Component | Description | Typical Config | Default Limit |
|---|---|---|---|
| VPC | Isolated virtual network. Define IP range (CIDR block), subnets, routing, and security. | 10.0.0.0/16 (65,536 IPs) | 5 per region (adjustable) |
| Subnet | IP address range within a VPC. Can be public (internet access) or private. | /24 (256 IPs) per AZ | 200 per VPC |
| Internet Gateway | Connects VPC to the internet. Attach to VPC, add route in public subnet route table. | 1 per VPC | 1 per VPC |
| NAT Gateway | Allows private subnet instances to access internet without inbound access. Managed, HA within AZ. | 1 per AZ | 5 per AZ |
| Security Group | Stateful firewall at instance level. Allow rules only (no deny). Evaluates all rules before deciding. | 3-5 per instance | 2,500 per VPC |
| Network ACL | Stateless firewall at subnet level. Allow and deny rules. Evaluated in order by rule number. | 1 per subnet | 200 per VPC |
| Route Table | Rules that determine where network traffic is directed. Each subnet associates with one route table. | 1 public + 1 private per AZ | 200 per VPC |
| VPC Peering | Direct networking connection between two VPCs. Non-transitive. Can span accounts and regions. | Hub-spoke topology | 50 per VPC |
| Transit Gateway | Central hub connecting multiple VPCs and on-premises networks. Transitive routing. | 1 per region | 5,000 attachments |
| PrivateLink / VPC Endpoint | Private connectivity to AWS services or third-party services without internet traversal. | Interface or Gateway type | 50 per VPC |
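The difference between how network ACLs and security groups evaluate rules can be sketched in a few lines. A NACL checks rules in ascending number order and stops at the first match; a security group allows traffic if any rule matches. This toy model covers inbound rules for a single port only and does not model statefulness. The `Rule` type, the rule numbers, and the addresses (taken from documentation-reserved ranges) are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int        # NACL rule number; ignored for security groups
    cidr: str
    port: int
    allow: bool = True


def _matches(rule: Rule, source_ip: str, port: int) -> bool:
    return (rule.port == port and
            ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule.cidr))


def nacl_allows(rules, source_ip: str, port: int) -> bool:
    """First matching rule by ascending number decides; otherwise deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if _matches(rule, source_ip, port):
            return rule.allow
    return False


def security_group_allows(rules, source_ip: str, port: int) -> bool:
    """Any matching rule permits the traffic; there are no deny rules."""
    return any(_matches(rule, source_ip, port) for rule in rules)


nacl = [Rule(100, "203.0.113.0/24", 443, allow=False),
        Rule(200, "0.0.0.0/0", 443, allow=True)]
group = [Rule(0, "0.0.0.0/0", 443)]

print(nacl_allows(nacl, "203.0.113.7", 443))             # blocked by rule 100
print(security_group_allows(group, "203.0.113.7", 443))  # allowed by the group
```

Blocking a specific address range is therefore a job for the NACL, since a security group has no way to express a deny.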
Part 10: Pricing and Cost Optimization
AWS pricing follows a pay-as-you-go model with discounts for committed usage. Understanding pricing is critical because cloud costs can spiral without governance. The three pillars of cost optimization are: right-sizing (use the smallest instance that meets requirements), commitment discounts (Reserved Instances and Savings Plans for predictable workloads), and waste elimination (turn off unused resources, use lifecycle policies).
Savings Plans offer the best balance of flexibility and savings. Compute Savings Plans provide up to 66% savings and apply automatically to EC2, Lambda, and Fargate usage regardless of instance family, size, OS, or region. EC2 Instance Savings Plans offer up to 72% savings but are locked to a specific instance family in a region. Both require a 1- or 3-year commitment to a consistent dollar amount per hour of usage.
AWS Cost Explorer visualizes spending trends, forecasts future costs, and identifies savings opportunities. AWS Budgets sends alerts when spending exceeds thresholds. AWS Compute Optimizer analyzes utilization metrics and recommends right-sized instance types. Trusted Advisor checks for idle resources, underutilized instances, and unused EBS volumes. Tags enable cost allocation by team, project, and environment for chargeback and showback reporting.
AWS Service Pricing Comparison
| Service | Unit | On-Demand | 1-Year RI | 3-Year RI | Savings |
|---|---|---|---|---|---|
| EC2 (m7i.large) | per hour | $0.1008 | $0.0634 | $0.0406 | 37-60% |
| Lambda | per 1M requests | $0.20 | N/A | N/A | N/A (pay per use) |
| S3 Standard | per GB/month | $0.023 | N/A | N/A | Use lifecycle policies |
| RDS (db.r7g.large) | per hour | $0.260 | $0.164 | $0.105 | 37-60% |
| DynamoDB | per 1M WRU | $1.25 | Use provisioned | Use provisioned | Up to 77% |
| CloudFront | per TB transfer | $0.085 | N/A | N/A | Volume discounts |
| EKS | per cluster/hour | $0.10 | N/A | N/A | N/A |
| ElastiCache (r7g.large) | per hour | $0.252 | $0.159 | $0.102 | 37-60% |
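A commitment only pays off if the resource runs for enough hours, because the committed rate is billed whether or not the instance is in use. The sketch below computes that break-even point using the m7i.large rates from the table. The 730-hour month is a common approximation, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # approximation: 8,760 hours per year / 12


def break_even_hours(on_demand_rate: float, committed_rate: float,
                     hours_in_period: int = HOURS_PER_MONTH) -> float:
    """Hours of use per period above which the commitment is cheaper."""
    return committed_rate * hours_in_period / on_demand_rate


def cheaper_option(hours_used: float, on_demand_rate: float,
                   committed_rate: float,
                   hours_in_period: int = HOURS_PER_MONTH) -> str:
    on_demand_cost = hours_used * on_demand_rate
    committed_cost = hours_in_period * committed_rate  # billed even when idle
    return "commitment" if committed_cost < on_demand_cost else "on-demand"


# m7i.large rates from the table: On-Demand vs 1-Year RI.
hours = break_even_hours(0.1008, 0.0634)
print(f"break-even: {hours:.0f} of {HOURS_PER_MONTH} hours per month")
print(cheaper_option(300, 0.1008, 0.0634))
print(cheaper_option(730, 0.1008, 0.0634))
```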
Part 11: Architecture Patterns
The three-tier architecture remains the most common pattern on AWS: presentation layer (CloudFront + S3 for static assets, ALB for dynamic), application layer (EC2/ECS/Lambda for business logic), and data layer (RDS/DynamoDB for persistence, ElastiCache for caching). This pattern works for most web applications and provides clear separation of concerns, independent scaling, and straightforward security boundaries.
Event-driven architecture uses EventBridge as a central event bus. Producers (applications, AWS services, SaaS) publish events. Rules route events to consumers (Lambda, SQS, Step Functions, API destinations). This pattern provides loose coupling, independent scaling, and resilience. Dead letter queues capture failed events for retry. Event replay enables reprocessing historical events after deploying bug fixes.
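The routing step can be illustrated with a simplified pattern matcher. In EventBridge-style patterns, each field lists its acceptable values and nested objects are matched recursively. The sketch below implements only that exact-value subset, with no prefix, numeric, or anything-but operators and no array handling, so it is a teaching model rather than a reimplementation. The rule set and target names are hypothetical.

```python
def matches(pattern: dict, event: dict) -> bool:
    """True if every field in the pattern is satisfied by the event."""
    for field, expected in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: a list of acceptable values, any one of which may match.
            return False
    return True


def route(event: dict, rules) -> list[str]:
    """Return the targets of every rule whose pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]


RULES = [
    ({"source": ["orders"], "detail-type": ["OrderPlaced"]}, "billing-queue"),
    ({"source": ["orders"]}, "audit-log"),
    ({"source": ["orders"], "detail": {"region": ["eu"]}}, "eu-compliance"),
    ({"source": ["inventory"]}, "restock-function"),
]

event = {"source": "orders", "detail-type": "OrderPlaced",
         "detail": {"region": "us", "total": 42}}
print(route(event, RULES))
```

One event can fan out to several consumers, and producers never need to know which consumers exist, which is the loose coupling the pattern is valued for.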
Microservices on AWS typically run on ECS Fargate or EKS with an Application Load Balancer for routing. Service discovery uses AWS Cloud Map or Kubernetes service discovery. Inter-service communication uses synchronous REST/gRPC (via ALB) or asynchronous messaging (SQS/SNS/EventBridge). AWS App Mesh provides a service mesh for traffic management, observability, and security between microservices.
Serverless architecture combines API Gateway, Lambda, DynamoDB, and S3 for zero-infrastructure applications. Step Functions orchestrate complex workflows across multiple Lambda functions. This pattern eliminates server management, scales automatically, and charges only for actual usage. It works well for APIs, event processing, and backends for mobile and web applications. Limitations include cold starts, the 15-minute Lambda timeout, and vendor lock-in.
Part 12: DevOps and Infrastructure as Code
Infrastructure as Code (IaC) is essential for managing AWS resources reliably. AWS CDK (Cloud Development Kit) is the recommended IaC tool for teams comfortable with TypeScript, Python, Java, or Go. CDK lets you define infrastructure using familiar programming constructs (loops, conditionals, classes), then synthesizes to CloudFormation templates. L2 constructs provide smart defaults and best practices built-in.
AWS CodePipeline automates CI/CD with CodeCommit (or GitHub/GitLab) for source, CodeBuild for build and test, and CodeDeploy for deployment. However, many teams use GitHub Actions or GitLab CI/CD instead of AWS-native tools. Deployment strategies include rolling updates (gradually replace instances), blue/green (deploy to new environment, switch traffic), and canary (route small percentage of traffic to new version, monitor, then expand).
AWS Systems Manager provides operational management for EC2 instances and on-premises servers. Session Manager enables SSH/RDP access without opening inbound ports. Parameter Store and Secrets Manager store configuration values and secrets. Patch Manager automates OS patching. Run Command executes scripts across fleets of instances. State Manager ensures consistent configuration through association documents.
Monitoring and observability on AWS centers around CloudWatch for metrics and logs, X-Ray for distributed tracing, and CloudTrail for API audit logging. CloudWatch Logs Insights provides a SQL-like query language for log analysis. CloudWatch Contributor Insights identifies top contributors to operational issues. Amazon Managed Grafana and Amazon Managed Service for Prometheus provide open-source observability tools as managed services.
Glossary (43 Terms)
Region
InfrastructureA geographic area containing multiple isolated Availability Zones. AWS has 33+ regions worldwide. Choose based on latency, compliance, and service availability.
Availability Zone (AZ)
InfrastructureOne or more discrete data centers with redundant power, networking, and connectivity within a region. Designing across AZs provides high availability.
Edge Location
InfrastructureA site that CloudFront uses to cache copies of content closer to users. 600+ edge locations globally for low-latency delivery.
ARN
CoreAmazon Resource Name. Unique identifier for AWS resources. Format: arn:aws:service:region:account-id:resource-type/resource-id.
IAM
SecurityIdentity and Access Management. Controls who can access what in AWS. Users, groups, roles, and policies define permissions.
STS
SecuritySecurity Token Service. Provides temporary, limited-privilege credentials for IAM roles. Used by EC2 instance profiles, Lambda, and cross-account access.
VPC
NetworkingVirtual Private Cloud. Isolated virtual network where you launch AWS resources. Define IP ranges, subnets, route tables, and gateways.
CIDR
NetworkingClassless Inter-Domain Routing. Notation for IP address ranges (e.g., 10.0.0.0/16). Used to define VPC and subnet address spaces.
Security Group
NetworkingVirtual firewall for EC2 instances. Stateful (return traffic automatically allowed). Only allow rules, no deny. Evaluated as a set.
NACL
NetworkingNetwork Access Control List. Stateless firewall at the subnet level. Supports allow and deny rules. Rules evaluated in order by number.
EC2
ComputeElastic Compute Cloud. Virtual servers in the cloud. Choose instance type, AMI, storage, and networking. On-demand, reserved, or spot pricing.
AMI
ComputeAmazon Machine Image. Template containing OS, application server, and applications for launching EC2 instances. Can be public, shared, or private.
Auto Scaling Group
ComputeCollection of EC2 instances managed as a unit. Automatically scales in/out based on demand, health checks, and scheduling policies.
Lambda
ComputeServerless compute service. Run code without provisioning servers. Pay per request and compute time. Scales automatically from 0 to thousands of instances.
ECS
ContainersElastic Container Service. Run and manage Docker containers. Supports Fargate (serverless) and EC2 launch types. Integrates with ALB and service discovery.
EKS
ContainersElastic Kubernetes Service. Managed Kubernetes control plane. Run Kubernetes workloads on EC2 or Fargate. Supports Karpenter for node auto-scaling.
Fargate
ContainersServerless compute engine for containers. No EC2 instances to manage. Pay for vCPU and memory used by containers. Works with ECS and EKS.
S3
StorageSimple Storage Service. Object storage with 99.999999999% durability. Unlimited storage. Lifecycle policies, versioning, replication, and encryption.
EBS
Storage: Elastic Block Store. Persistent block storage for EC2 instances. Types: gp3 (general purpose), io2 (high IOPS), st1 (throughput-optimized), sc1 (cold).
EFS
Storage: Elastic File System. Managed NFS file system. Shared storage across multiple EC2 instances and AZs. Grows and shrinks automatically.
RDS
Database: Relational Database Service. Managed relational databases: MySQL, PostgreSQL, Oracle, SQL Server, MariaDB. Automated backups, patching, and Multi-AZ.
Aurora
Database: AWS-designed relational database compatible with MySQL and PostgreSQL. Up to 5x the throughput of standard MySQL, according to AWS. 128 TB max storage. Serverless v2 for auto-scaling.
DynamoDB
Database: Fully managed NoSQL database. Single-digit millisecond latency at any scale. Key-value and document data models. Global tables for multi-region.
ElastiCache
Database: Managed in-memory data store. Supports Redis and Memcached. Sub-millisecond latency for caching, session storage, and real-time analytics.
CloudFront
Networking: Content Delivery Network (CDN). 600+ edge locations. Caches static and dynamic content. Lambda@Edge and CloudFront Functions for edge compute.
Route 53
Networking: DNS service. Domain registration, DNS routing, and health checking. Supports latency-based, geolocation, weighted, and failover routing policies.
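Of the routing policies listed for Route 53, weighted routing is the easiest to reason about numerically: each record receives traffic in proportion to its weight divided by the sum of all weights. The sketch below simulates that rule locally; the function names and hostnames are hypothetical, and it assumes at least one nonzero weight.

```python
import random


def traffic_shares(weights):
    """Fraction of queries each record should receive: weight / sum of weights."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("sketch assumes a nonzero total weight")
    return {name: weight / total for name, weight in weights.items()}


def pick_record(weights, rng):
    """Choose one record at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical blue/green split: 90% to the current stack, 10% to the new one.
weights = {"blue.example.com": 90, "green.example.com": 10}
shares = traffic_shares(weights)
chosen = pick_record(weights, random.Random(42))
```

Shifting the weights gradually (90/10, then 50/50, then 0/100) is a common way to roll out a new stack while limiting blast radius.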
ALB
Networking: Application Load Balancer. Layer 7 load balancing. Path-based and host-based routing. WebSocket, HTTP/2, gRPC support. Integrates with WAF.
NLB
Networking: Network Load Balancer. Layer 4 load balancing. Ultra-low latency. Static IP addresses. Millions of requests per second. TCP, UDP, TLS.
CloudFormation
DevOps: Infrastructure as Code service. Define AWS resources in JSON/YAML templates. Stack-based deployment with rollback on failure.
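As an illustration of what a CloudFormation template looks like, the snippet below builds a minimal one as a Python dictionary and serializes it to JSON. The logical ID `ExampleBucket` and the description are hypothetical; treat this as a sketch of the template shape rather than a production-ready stack.

```python
import json

# Minimal template: a single S3 bucket with versioning enabled.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: one versioned S3 bucket",
    "Resources": {
        "ExampleBucket": {  # hypothetical logical ID
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        }
    },
}

# JSON text that could be saved to a file and deployed as a stack.
template_json = json.dumps(template, indent=2)
```

The same structure can be written directly in YAML; generating it from code is mainly useful when templates are assembled programmatically.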
CDK
DevOps: Cloud Development Kit. Define infrastructure using programming languages (TypeScript, Python, Java, C#, Go). Synthesizes to CloudFormation templates.
CloudWatch
Monitoring: Monitoring and observability service. Metrics, logs, alarms, dashboards. Custom metrics, Logs Insights queries, and anomaly detection.
X-Ray
Monitoring: Distributed tracing service. Trace requests across microservices. Identify performance bottlenecks and errors in distributed applications.
SNS
Integration: Simple Notification Service. Pub/sub messaging. Push notifications to SQS, Lambda, HTTP, email, SMS. Fan-out pattern for event-driven architectures.
SQS
Integration: Simple Queue Service. Fully managed message queuing. Standard (at-least-once delivery, best-effort ordering) and FIFO (exactly-once processing, strict ordering) queues.
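The difference FIFO queues make can be shown with a toy in-memory model of deduplication and ordering. This is not the SQS API: the `FifoQueueSketch` class and its methods are hypothetical, and the sketch ignores real-world details such as the time-limited deduplication window and per-message-group ordering.

```python
from collections import deque


class FifoQueueSketch:
    """Toy FIFO queue: drops repeated deduplication IDs, preserves send order."""

    def __init__(self):
        self._messages = deque()
        self._seen_ids = set()

    def send(self, body, dedup_id):
        """Accept the message unless its deduplication ID was already seen."""
        if dedup_id in self._seen_ids:
            return False  # duplicate (for example, a producer retry)
        self._seen_ids.add(dedup_id)
        self._messages.append(body)
        return True

    def receive(self):
        """Return the oldest message, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None


queue = FifoQueueSketch()
queue.send("order-1 created", dedup_id="evt-1")
queue.send("order-1 created", dedup_id="evt-1")  # retry, dropped
queue.send("order-1 paid", dedup_id="evt-2")
received = [queue.receive(), queue.receive(), queue.receive()]
```

A Standard queue offers neither guarantee, so consumers of Standard queues should be idempotent and tolerate out-of-order delivery.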
EventBridge
Integration: Serverless event bus. Route events between AWS services, SaaS apps, and custom applications. Schema registry and event replay.
Step Functions
Integration: Serverless orchestration service. Visual workflows coordinating Lambda, ECS, and other services. Standard and Express workflow types.
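Step Functions workflows are defined in Amazon States Language, a JSON document with a `StartAt` state and a map of `States`. The sketch below builds a two-step definition and checks that every transition points at a state that exists. The state names, the ARNs (with a placeholder account ID), and the `dangling_targets` helper are all hypothetical.

```python
import json

# Hypothetical two-step workflow; the Lambda ARNs are placeholders.
state_machine = {
    "Comment": "Hypothetical order workflow",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
            "Next": "ChargeCard",
        },
        "ChargeCard": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
            "End": True,
        },
    },
}


def dangling_targets(definition):
    """Return StartAt/Next targets that do not name a defined state."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    targets += [s["Next"] for s in states.values() if "Next" in s]
    return sorted(t for t in targets if t not in states)


definition_json = json.dumps(state_machine, indent=2)
missing = dangling_targets(state_machine)  # empty list when all targets exist
```

A local check like this catches typos in state names before a deployment is attempted.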
KMS
Security: Key Management Service. Create and manage encryption keys. Integrated with 100+ AWS services. Automatic key rotation. HSM-backed.
WAF
Security: Web Application Firewall. Protect web applications from common exploits. Rate limiting, IP blocking, SQL injection and XSS protection. Managed rules.
Cognito
Security: User authentication and authorization service. User pools (sign-up/sign-in) and identity pools (temporary AWS credentials). OAuth2/OIDC support.
Cost Explorer
Cost: Visualize and analyze AWS spending. Forecast future costs. Identify savings opportunities with Reserved Instances and Savings Plans.
Savings Plans
Cost: Flexible pricing model offering up to 72% savings over On-Demand. Commit to a consistent amount of usage ($/hour) for 1 or 3 years. Available as Compute Savings Plans or EC2 Instance Savings Plans.
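The commitment arithmetic behind a Savings Plan can be worked through with a short calculation. This sketch assumes perfectly steady usage, 365-day years, and a hypothetical 30% discount chosen only for illustration (the entry above gives 72% as the upper bound); the function name is also an assumption.

```python
HOURS_PER_YEAR = 24 * 365


def savings_plan_summary(on_demand_hourly, discount, term_years):
    """Compare steady on-demand spend with an equivalent hourly commitment."""
    if term_years not in (1, 3):
        raise ValueError("terms are 1 or 3 years")
    committed_hourly = on_demand_hourly * (1 - discount)
    hours = HOURS_PER_YEAR * term_years
    return {
        "committed_hourly": committed_hourly,
        "total_commitment": committed_hourly * hours,
        "total_savings": (on_demand_hourly - committed_hourly) * hours,
    }


# Hypothetical workload: $10/hour on demand, 30% discount, 1-year term.
summary = savings_plan_summary(10.0, 0.30, 1)
# Roughly $7/hour committed, $61,320 total commitment, $26,280 saved.
```

The commitment is owed whether or not the capacity is used, so the hourly figure is usually sized to the steady baseline rather than to peak demand.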
Well-Architected Framework
Architecture: Best practices across six pillars: operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability.
Shared Responsibility Model
Security: AWS secures the cloud infrastructure (hardware, software, networking). Customer secures what runs in the cloud (data, applications, IAM, encryption).
