Building a Cloud-Native Container Platform from Scratch - Part 3
So far, we’ve covered the “why” and “what” of building your internal platform. Now it’s time to build the foundation — and we’ll do it the right way: with Infrastructure as Code (IaC) from day one.
No console clicking. No snowflake environments. Just declarative, reviewable, auditable infrastructure that scales with your team.
In this post, we’ll focus on bootstrapping your EKS-based platform in AWS using Terraform, laying down networking, identity, and the Kubernetes cluster itself.
Full series
- Part 1: Why Build a Self-Service Container Platform
- Part 2: Choosing Your Platform’s Building Blocks
- Part 3: Bootstrapping Your Infrastructure with Terraform (you are here)
- Part 4: Installing Core Platform Services
- Part 5: Crafting the Developer Experience Layer
- Part 6: Scaling the Platform — Multi-Tenancy, Environments, and Governance
- Part 7: Day-2 Operations and Platform Maturity
- Part 8: The Future of Your Internal Platform
What We’re Building in This Phase
Your platform needs more than just a Kubernetes control plane. Here’s what we’ll provision in this foundational step:
- S3 & DynamoDB – For remote Terraform state storage and state locking
- VPC & Subnets – Isolated and segmented for public/private workloads
- IAM Roles & Policies – Secure access for Kubernetes nodes and GitOps
- EKS Cluster – With managed node groups and minimal friction
- Security Groups & Networking – To control traffic flow
Each of these layers will set you up for secure multi-environment management, Git-based automation, and controlled network flow — all critical for a production-grade platform.
Step 1: Set Up Secure Remote State
Before creating any resources, we need to store Terraform state somewhere stable. This ensures that the state file, which tracks resource configurations and dependencies, is secure, consistent, and accessible for future updates or team collaboration. A Terraform state lock prevents conflicting concurrent deployments.
Use an S3 bucket for the state file, and a DynamoDB table for locking:
resource "aws_s3_bucket" "tf_state" {
bucket = "my-platform-terraform-state"
acl = "private"
}
resource "aws_dynamodb_table" "tf_lock" {
name = "terraform-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
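Since the state file can contain sensitive values, it is worth hardening the bucket too. Here is a minimal sketch, assuming the bucket resource above and AWS provider v4 or later, which split versioning and encryption into their own resources:

# Keep old state versions so a bad apply can be rolled back.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest; swap in a KMS key if you need tighter control.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}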
Run a terraform apply to create these foundation resources before we continue building the EKS cluster.
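If you are bootstrapping from an empty directory, the sequence is roughly the following; note that the state for these two resources stays local until the backend below exists:

terraform init      # download the AWS provider
terraform plan      # review the bucket and lock table to be created
terraform apply     # create them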
Now create a new Terraform module and set your Terraform backend config:
terraform {
  backend "s3" {
    bucket         = "my-platform-terraform-state"
    key            = "eks/base.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
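In the new module directory, run terraform init to configure the backend before any plan or apply; Terraform will verify access to the bucket and the lock table at this point. For example (the directory name is just a placeholder):

cd platform-eks/    # placeholder for wherever this module lives
terraform init      # connects to the S3 backend and DynamoDB lock table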
Step 2: Define Your VPC and Subnets
We’ll split the VPC into public and private subnets across multiple AZs:
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
name = "platform-vpc"
cidr = "10.0.0.0/16"
azs = ["eu-west-1a", "eu-west-1b"]
public_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
private_subnets = ["10.0.11.0/24", "10.0.12.0/24"]
enable_nat_gateway = true
}
Public vs. Private Subnets
Public Subnet: A public subnet is a network segment that has direct access to the internet. Resources in a public subnet, such as load balancers or NAT gateways, typically have public IP addresses and can send/receive traffic directly from the internet. These subnets are often used for components that need to be externally accessible.
Private Subnet: A private subnet is isolated from direct internet access. Resources in a private subnet, such as EKS worker nodes or databases, do not have public IP addresses and rely on NAT gateways or other mechanisms to access the internet indirectly. These subnets are used for components that should remain secure and inaccessible from the public internet.
In your cloud-native container platform:
- Public Subnets: Host load balancers to distribute traffic to your application and NAT gateways to allow private resources to access the internet securely.
- Private Subnets: Host EKS worker nodes to run your containerised workloads, ensuring they are protected from direct internet exposure.
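One EKS-specific detail worth noting: the AWS load balancer integration discovers subnets by tag, so public subnets are usually tagged for internet-facing load balancers and private subnets for internal ones. A sketch of how that might look using the VPC module's tagging inputs (the kubernetes.io/role/* keys are the standard conventions; check the module docs for your version):

module "vpc" {
  # ... settings from above ...

  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"            # used by internet-facing load balancers
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"   # used by internal load balancers
  }
}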
Step 3: IAM for EKS and Workers
EKS needs IAM roles with precise permissions: one for the control plane and one for the worker nodes. The community-maintained IAM module below creates an assumable role and attaches the policies you define (note that the EKS module used in Step 4 can also create these roles for you, and exact input names vary by module version):
module "eks_iam" {
source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
name = "eks-cluster-role"
trusted_role_arns = ["arn:aws:iam::${var.account_id}:root"]
create_policy = true
policy = data.aws_iam_policy_document.eks_policy.json
}
Worker nodes get their own role with access to pull from registries, write logs, etc.
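If you want to manage the node role yourself rather than letting the EKS module create it (which recent versions do by default for managed node groups), a minimal sketch with plain IAM resources and the AWS-managed policies nodes typically need could look like this:

resource "aws_iam_role" "node" {
  name = "eks-node-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Standard AWS-managed policies for EKS worker nodes:
# node registration, pod networking (CNI), and registry pulls.
resource "aws_iam_role_policy_attachment" "node" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])
  role       = aws_iam_role.node.name
  policy_arn = each.value
}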
Step 4: Spin Up the EKS Cluster
Use the terraform-aws-modules/eks/aws module to deploy the EKS cluster. This module abstracts much of the complexity involved in setting up an EKS cluster, including networking, IAM roles, and managed node groups.
Here’s an example configuration:
module "eks" {
source = "terraform-aws-modules/eks/aws"
cluster_name = "my-platform-cluster"
cluster_version = "1.29"
subnets = module.vpc.private_subnets
vpc_id = module.vpc.vpc_id
eks_managed_node_groups = {
default = {
desired_capacity = 2
max_capacity = 4
min_capacity = 1
instance_types = ["t3.medium"]
}
}
enable_irsa = true
}
This configuration sets up an EKS cluster with managed node groups and enables IAM Roles for Service Accounts (IRSA) for secure pod-level permissions. Adjust desired_size, max_size, and instance_types to fit your workload requirements.
EKS managed node groups are groups of EC2 instances (nodes) that AWS manages for you. When you use managed node groups, AWS automatically handles tasks like:
- Launching and updating EC2 instances for your Kubernetes cluster
- Patching the underlying operating system
- Replacing unhealthy nodes automatically
This means you don’t have to manually provision, update, or scale the worker nodes yourself. You simply define the desired size and instance types, and AWS takes care of the rest. Managed node groups make it easier and safer to run Kubernetes workloads on EKS, especially for teams new to managing infrastructure.
IRSA (IAM Roles for Service Accounts) lets us bind cloud permissions directly to pods later, with no node-wide credentials or kube-admin hacks required. IRSA provides credentials to your applications in much the same way that EC2 instance profiles provide credentials to EC2 instances: instead of distributing AWS credentials to your containers or relying on the node's instance role, you associate an IAM role with a Kubernetes service account and configure your Pods to use that service account.
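As a preview of how that looks in Terraform, here is a sketch using the community iam-assumable-role-with-oidc submodule; the role name, namespace, service account, and application policy are placeholders, and the OIDC outputs assume a recent version of the EKS module:

module "app_irsa" {
  source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"

  create_role  = true
  role_name    = "my-app-irsa"               # placeholder name
  provider_url = module.eks.oidc_provider    # cluster issuer URL without https://

  # Only pods using this exact service account can assume the role.
  oidc_fully_qualified_subjects = [
    "system:serviceaccount:my-namespace:my-app"   # placeholder namespace/SA
  ]

  role_policy_arns = [aws_iam_policy.app.arn]     # hypothetical app policy
}

# The Kubernetes ServiceAccount is then annotated with:
#   eks.amazonaws.com/role-arn: <module.app_irsa.iam_role_arn>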
Step 5: Validate the Cluster
Once all resources are applied successfully, configure your local Kubernetes context to interact with the cluster:
aws eks --region eu-west-1 update-kubeconfig --name my-platform-cluster
kubectl get nodes
You should now see your worker nodes in a Ready state, indicating that the cluster is operational. At this point, you can begin deploying workloads securely to your new platform.
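A quick smoke test, assuming your kubeconfig now points at the new cluster, is to run a throwaway pod and confirm it schedules onto one of the managed nodes:

kubectl run smoke-test --image=nginx --restart=Never
kubectl get pod smoke-test -o wide     # should be Running on one of your worker nodes
kubectl delete pod smoke-test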
What’s Next?
With the core infrastructure in place, we now have a secure and repeatable foundation for your internal platform.
In Part 4, we’ll install core platform services:
- GitOps (ArgoCD)
- Ingress controller (e.g. AWS Load Balancer Controller or NGINX)
- Certificate management
- Observability stack (Prometheus, Grafana, Loki)
- Secrets integration
From here, your Kubernetes cluster starts to transform into a true platform.