Building a Cloud-Native Container Platform from Scratch - Part 3
So far, we’ve covered the “why” and “what” of building your internal platform. Now it’s time to build the foundation — and we’ll do it the right way: with Infrastructure as Code (IaC) from day one.
No console clicking. No snowflake environments. Just declarative, reviewable, auditable infrastructure that scales with your team.
In this post, we’ll focus on bootstrapping your EKS-based platform in AWS using Terraform, laying down networking, identity, and the Kubernetes cluster itself.
Full series
- Part 1: Why Build a Self-Service Container Platform
- Part 2: Choosing Your Platform’s Building Blocks
- Part 3: Bootstrapping Your Infrastructure with Terraform (you are here)
- Part 4: Installing Core Platform Services
- Part 5: Crafting the Developer Experience Layer
- Part 6: Scaling the Platform — Multi-Tenancy, Environments, and Governance
- Part 7: Day-2 Operations and Platform Maturity
- Part 8: The Future of Your Internal Platform
What We’re Building in This Phase
Your platform needs more than just a Kubernetes control plane. Here’s what we’ll provision in this foundational step:
- S3 & DynamoDB – For remote Terraform state storage and state locking
- VPC & Subnets – Isolated and segmented for public/private workloads
- IAM Roles & Policies – Secure access for Kubernetes nodes and GitOps
- EKS Cluster – With managed node groups and minimal friction
- Security Groups & Networking – To control traffic flow
Each of these layers will set you up for secure multi-environment management, Git-based automation, and controlled network flow — all critical for a production-grade platform.
Step 1: Set Up Secure Remote State
Before creating any resources, we need to store Terraform state somewhere stable. This ensures that the state file, which tracks resource configurations and dependencies, is secure, consistent, and accessible for future updates or team collaboration. A Terraform state lock prevents conflicting concurrent deployments.
Use an S3 bucket for the state file, and a DynamoDB table for locking:
resource "aws_s3_bucket" "tf_state" {
bucket = "my-platform-terraform-state"
acl = "private"
}
resource "aws_dynamodb_table" "tf_lock" {
name = "terraform-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
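Since the state file can contain sensitive values, it is worth hardening the bucket too. Here is a minimal sketch, assuming the bucket resource above and AWS provider v4 or later, which split versioning and encryption into their own resources:

# Keep old state versions so a bad apply can be rolled back.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest; swap in a KMS key if you need tighter control.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}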
Run a terraform apply to create these foundation resources before we continue building the EKS cluster.
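If you are bootstrapping from an empty directory, the sequence is roughly the following; note that the state for these two resources stays local until the backend below exists:

terraform init      # download the AWS provider
terraform plan      # review the bucket and lock table to be created
terraform apply     # create them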
Now create a new Terraform module and set your Terraform backend config:
terraform {
  backend "s3" {
    bucket         = "my-platform-terraform-state"
    key            = "eks/base.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
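In the new module directory, run terraform init to configure the backend before any plan or apply; Terraform will verify access to the bucket and the lock table at this point. For example (the directory name is just a placeholder):

cd platform-eks/    # placeholder for wherever this module lives
terraform init      # connects to the S3 backend and DynamoDB lock table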
Step 2: Define Your VPC and Subnets
We’ll split the VPC into public and private subnets across multiple AZs:
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
name = "platform-vpc"
cidr = "10.0.0.0/16"
azs = ["eu-west-1a", "eu-west-1b"]
public_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
private_subnets = ["10.0.11.0/24", "10.0.12.0/24"]
enable_nat_gateway = true
}
Public vs. Private Subnets
Public Subnet: A public subnet is a network segment that has direct access to the internet. Resources in a public subnet, such as load balancers or NAT gateways, typically have public IP addresses and can send/receive traffic directly from the internet. These subnets are often used for components that need to be externally accessible.
Private Subnet: A private subnet is isolated from direct internet access. Resources in a private subnet, such as EKS worker nodes or databases, do not have public IP addresses and rely on NAT gateways or other mechanisms to access the internet indirectly. These subnets are used for components that should remain secure and inaccessible from the public internet.
In your cloud-native container platform:
- Public Subnets: Host load balancers to distribute traffic to your application and NAT gateways to allow private resources to access the internet securely.
- Private Subnets: Host EKS worker nodes to run your containerised workloads, ensuring they are protected from direct internet exposure.
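One EKS-specific detail worth noting: the AWS load balancer integration discovers subnets by tag, so public subnets are usually tagged for internet-facing load balancers and private subnets for internal ones. A sketch of how that might look using the VPC module's tagging inputs (the kubernetes.io/role/* keys are the standard conventions; check the module docs for your version):

module "vpc" {
  # ... settings from above ...

  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"            # used by internet-facing load balancers
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"   # used by internal load balancers
  }
}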
Step 3: IAM for EKS and Workers
EKS needs IAM roles with precise permissions: one for the control plane and one for the worker nodes. The community-maintained IAM module below creates an assumable role and attaches the policies you define (note that the EKS module used in Step 4 can also create these roles for you, and exact input names vary by module version):
module "eks_iam" {
source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
name = "eks-cluster-role"
trusted_role_arns = ["arn:aws:iam::${var.account_id}:root"]
create_policy = true
policy = data.aws_iam_policy_document.eks_policy.json
}
Worker nodes get their own role with access to pull from registries, write logs, etc.
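If you want to manage the node role yourself rather than letting the EKS module create it (which recent versions do by default for managed node groups), a minimal sketch with plain IAM resources and the AWS-managed policies nodes typically need could look like this:

resource "aws_iam_role" "node" {
  name = "eks-node-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Standard AWS-managed policies for EKS worker nodes:
# node registration, pod networking (CNI), and registry pulls.
resource "aws_iam_role_policy_attachment" "node" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])
  role       = aws_iam_role.node.name
  policy_arn = each.value
}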
Step 4: Spin Up the EKS Cluster
Use the terraform-aws-modules/eks/aws module to deploy the EKS cluster. This module abstracts much of the complexity involved in setting up an EKS cluster, including networking, IAM roles, and managed node groups.
Here’s an example configuration:
module "eks" {
source = "terraform-aws-modules/eks/aws"
cluster_name = "my-platform-cluster"
cluster_version = "1.29"
subnets = module.vpc.private_subnets
vpc_id = module.vpc.vpc_id
eks_managed_node_groups = {
default = {
desired_capacity = 2
max_capacity = 4
min_capacity = 1
instance_types = ["t3.medium"]
}
}
enable_irsa = true
}
This configuration sets up an EKS cluster with managed node groups and enables IAM Roles for Service Accounts (IRSA) for secure pod-level permissions. Adjust desired_size, max_size, and instance_types to fit your workload requirements.
EKS managed node groups are groups of EC2 instances (nodes) that AWS manages for you. When you use managed node groups, AWS automatically handles tasks like:
- Launching and updating EC2 instances for your Kubernetes cluster
- Patching the underlying operating system
- Replacing unhealthy nodes automatically
This means you don’t have to manually provision, update, or scale the worker nodes yourself. You simply define the desired size and instance types, and AWS takes care of the rest. Managed node groups make it easier and safer to run Kubernetes workloads on EKS, especially for teams new to managing infrastructure.
IRSA (IAM Roles for Service Accounts) lets us bind cloud permissions directly to pods later, with no node-wide credentials or kube-admin hacks required. IRSA provides credentials to your applications in much the same way that EC2 instance profiles provide credentials to EC2 instances: instead of distributing AWS credentials to your containers or relying on the node's instance role, you associate an IAM role with a Kubernetes service account and configure your Pods to use that service account.
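As a preview of how that looks in Terraform, here is a sketch using the community iam-assumable-role-with-oidc submodule; the role name, namespace, service account, and application policy are placeholders, and the OIDC outputs assume a recent version of the EKS module:

module "app_irsa" {
  source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"

  create_role  = true
  role_name    = "my-app-irsa"               # placeholder name
  provider_url = module.eks.oidc_provider    # cluster issuer URL without https://

  # Only pods using this exact service account can assume the role.
  oidc_fully_qualified_subjects = [
    "system:serviceaccount:my-namespace:my-app"   # placeholder namespace/SA
  ]

  role_policy_arns = [aws_iam_policy.app.arn]     # hypothetical app policy
}

# The Kubernetes ServiceAccount is then annotated with:
#   eks.amazonaws.com/role-arn: <module.app_irsa.iam_role_arn>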
Step 5: Validate the Cluster
Once all resources are applied successfully, configure your local Kubernetes context to interact with the cluster:
aws eks --region eu-west-1 update-kubeconfig --name my-platform-cluster
kubectl get nodes
You should now see your worker nodes in a Ready state, indicating that the cluster is operational. At this point, you can begin deploying workloads securely to your new platform.
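A quick smoke test, assuming your kubeconfig now points at the new cluster, is to run a throwaway pod and confirm it schedules onto one of the managed nodes:

kubectl run smoke-test --image=nginx --restart=Never
kubectl get pod smoke-test -o wide     # should be Running on one of your worker nodes
kubectl delete pod smoke-test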
What’s Next?
With the core infrastructure in place, we now have a secure and repeatable foundation for your internal platform.
In Part 4, we’ll install core platform services:
- GitOps (ArgoCD)
- Ingress controller (e.g. AWS Load Balancer Controller or NGINX)
- Certificate management
- Observability stack (Prometheus, Grafana, Loki)
- Secrets integration
From here, your Kubernetes cluster starts to transform into a true platform.