Building a Cloud-Native Container Platform from Scratch - Part 6
By now, you’ve built a robust internal platform with a clean developer experience. But as your organisation grows (more teams, more services, more environments), you’ll quickly encounter new challenges:
- How do we isolate teams securely?
- How do we manage staging vs. production?
- How do we roll this out across multiple regions or accounts?
This post tackles the organisational scaling layer of your self-service container platform.
Full series
- Part 1: Why Build a Self-Service Container Platform
- Part 2: Choosing Your Platform’s Building Blocks
- Part 3: Bootstrapping Your Infrastructure with Terraform
- Part 4: Installing Core Platform Services
- Part 5: Crafting the Developer Experience Layer
- Part 6: Scaling the Platform — Multi-Tenancy, Environments, and Governance (you are here)
- Part 7: Day-2 Operations and Platform Maturity
- Part 8: The Future of Your Internal Platform
What Does Multi-Tenancy Really Mean?
In Kubernetes, “multi-tenancy” can mean many things, but in a platform context, we usually want:
- Soft multi-tenancy (namespaces with RBAC, quotas, and policies)
- Hard multi-tenancy (separate clusters or AWS accounts per business unit, compliance boundary, or region)
We’ll use a hybrid model:
- Namespaces for most teams
- Separate clusters/accounts for staging vs. prod, or for regulated data
This keeps cognitive overhead low while still enforcing strong isolation where needed.
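To make the soft side concrete, here’s a minimal sketch of what a tenant namespace could look like: a namespace, a quota, and a RoleBinding that scopes a team to it. The team name (checkout) and the IdP group (team-checkout-devs) are placeholders for whatever your SSO setup provides.

```yaml
# Sketch: one tenant namespace with a resource quota and a RoleBinding
# that grants the team's SSO group edit rights in this namespace only.
apiVersion: v1
kind: Namespace
metadata:
  name: team-checkout
  labels:
    team: checkout
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-checkout-quota
  namespace: team-checkout
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    pods: "50"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-checkout-edit
  namespace: team-checkout
subjects:
  - kind: Group
    name: team-checkout-devs   # placeholder group name from your IdP
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                   # built-in ClusterRole, bound per namespace
  apiGroup: rbac.authorization.k8s.io
```

Stamping these three resources out from a template per team keeps onboarding consistent and auditable.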
Designing for Multiple Environments
Each workload typically moves through environments: dev → staging → prod.
Here’s a model that works well with GitOps:
- One Git repo per app, with separate folders per environment:
```
app-repo/
└── envs/
    ├── dev/
    ├── staging/
    └── prod/
```
- Or, better, one GitOps repo per environment, using Argo CD ApplicationSets to sync apps into each environment’s cluster (see the sketch after the list below).
This separation allows you to:
- Enforce stricter policies in production
- Use different Helm/Kustomize values
- Add promotion gates (manual approval, testing)
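For the repo-per-environment approach, an ApplicationSet can discover every app folder in the environment’s GitOps repo and generate one Argo CD Application each. A minimal sketch follows; the repo URL, cluster endpoint, and apps/ layout are assumptions, not something established earlier in the series.

```yaml
# Sketch: one ApplicationSet per environment, discovering apps from a
# per-environment GitOps repo. Repo URL and server address are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: prod-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/example-org/gitops-prod.git
        revision: main
        directories:
          - path: apps/*
  template:
    metadata:
      name: "{{path.basename}}-prod"
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/gitops-prod.git
        targetRevision: main
        path: "{{path}}"
      destination:
        server: https://prod-cluster.example.com   # placeholder API endpoint
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Adding an app to production then becomes a pull request against the prod repo, which is exactly where your promotion gates live.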
Regional and Account Isolation
For global teams or regulated workloads, split clusters by:
- AWS account: Use one account per environment or business unit (staging, prod, sandbox, regulated)
- Region: Deploy EKS clusters in us-east-1, eu-west-1, etc.
Terraform modules can help replicate infrastructure with environment-specific settings.
Use shared tooling (like a central Argo CD instance or federation tools) to manage apps across clusters, or go cluster-local and treat each one as an independent platform instance.
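If you go the central-Argo CD route, remote clusters are registered declaratively as Kubernetes Secrets in the Argo CD namespace. The sketch below assumes an EKS cluster reached via an IAM role; the endpoint, account ID, and role name are all placeholders.

```yaml
# Sketch: registering a remote EKS cluster with a central Argo CD
# instance. Endpoint, account ID, and role ARN are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: prod-eu-west
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: prod-eu-west
  server: https://prod-eu-west.eks.example.com   # placeholder API endpoint
  config: |
    {
      "awsAuthConfig": {
        "clusterName": "prod-eu-west",
        "roleARN": "arn:aws:iam::111122223333:role/argocd-deployer"
      },
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca-cert>"
      }
    }
```

Because the registration is just a manifest, it can live in Git alongside everything else the platform provisions.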
Governance at Scale
As your platform scales, governance matters more. Bake it in early with:
- Policy as Code (OPA/Gatekeeper or Kyverno): Reject non-compliant manifests before they’re admitted to the cluster
- Audit logging: Capture who deployed what and when
- Access controls: Use SSO and fine-grained IAM to separate platform admin from app teams
You can also implement:
- Image policies (only signed/trusted images can be used)
- Namespace controls (e.g., production workloads can’t use :latest tags)
These aren’t just compliance checkboxes; they help teams ship faster with less risk.
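As one example, here’s a minimal sketch of the :latest rule as a Kyverno ClusterPolicy. It assumes production namespaces carry an environment: production label, which your namespace templates would apply.

```yaml
# Sketch: block ":latest" image tags in production namespaces.
# Assumes namespaces are labelled environment: production.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-pinned-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaceSelector:
                matchLabels:
                  environment: production   # assumed namespace label
      validate:
        message: "Production images must use a pinned tag, not :latest."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
```

The same mechanism covers image-signing checks and most of the other guardrails listed above.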
Automating Cluster Lifecycle
Managing 2–3 clusters manually is annoying. Managing 10+ that way? Impossible.
Use tools like:
- Terraform + Terragrunt for environment-specific infra
- Argo CD with the App of Apps pattern to bootstrap each cluster (see the sketch after this list)
- Crossplane or Cluster API for cluster lifecycle automation, if needed
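The App of Apps pattern itself is just an Argo CD Application whose source is a folder of other Application manifests, so syncing one resource pulls in the whole platform stack. A minimal bootstrap sketch, with repo URL and path as placeholders:

```yaml
# Sketch: the "App of Apps" root Application for one cluster. The path
# points at a folder of Application manifests, one per platform component.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-bootstrap
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-gitops.git
    targetRevision: main
    path: bootstrap/prod-eu-west   # one bootstrap folder per cluster
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```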
You can build an internal tool or CLI to create a new environment with a single command:
```
./platform create-environment prod-eu-west
```
This should provision:
- The EKS cluster
- IAM, VPC, secrets
- Argo CD apps
- Monitoring stack
- Namespace templates
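One way to keep such a tool predictable is to drive it from a declarative environment definition checked into Git. The file below is purely hypothetical (there is no standard schema here); it just illustrates capturing everything the command provisions in a single reviewable document.

```yaml
# Hypothetical input for "./platform create-environment" -- not a real
# schema, just an illustration of a declarative environment definition.
name: prod-eu-west
region: eu-west-1
account: "111122223333"        # placeholder AWS account ID
cluster:
  version: "1.29"
  nodeGroups:
    - name: general
      instanceType: m6i.large
      min: 3
      max: 12
addons:
  - argo-cd
  - monitoring-stack
  - namespace-templates
policies:
  tier: production             # selects the stricter policy bundle
```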
Platform as a Product
With multiple tenants and environments, your platform becomes a product with:
- SLAs for uptime and support
- Roadmaps and changelogs
- Onboarding docs and training
- Feedback loops and support channels
The platform team should track usage, outages, and satisfaction — and continuously iterate on templates, tooling, and documentation.
What You’ve Built So Far
You’re now operating at scale with:
- Isolated, secure team workspaces
- Production-ready environments with compliance boundaries
- Automated cluster provisioning and GitOps delivery
- Central governance and policy enforcement
This is what a modern internal platform looks like — not just Kubernetes with a UI, but a holistic system of people, tooling, and automation.
Coming Up in Part 7
Next, we’ll focus on day-2 operations and platform maturity:
- Platform observability
- Cost controls and chargeback
- Backup and disaster recovery
- Upgrades and long-term maintenance