Building a Cloud-Native Container Platform from Scratch - Part 6
By now, you’ve built a robust internal platform with a clean developer experience. But as your organisation grows (more teams, more services, more environments), you’ll quickly encounter new challenges:
- How do we isolate teams securely?
- How do we manage staging vs. production?
- How do we roll this out across multiple regions or accounts?
This post tackles the organisational scaling layer of your self-service container platform.
Full series
- Part 1: Why Build a Self-Service Container Platform
- Part 2: Choosing Your Platform’s Building Blocks
- Part 3: Bootstrapping Your Infrastructure with Terraform
- Part 4: Installing Core Platform Services
- Part 5: Crafting the Developer Experience Layer
- Part 6: Scaling the Platform — Multi-Tenancy, Environments, and Governance (you are here)
- Part 7: Day-2 Operations and Platform Maturity
- Part 8: The Future of Your Internal Platform
What Does Multi-Tenancy Really Mean?
In Kubernetes, “multi-tenancy” can mean many things, but in a platform context, we usually want:
- Soft multi-tenancy (namespaces with RBAC, quotas, and policies)
- Hard multi-tenancy (separate clusters or AWS accounts per business unit, compliance boundary, or region)
We’ll use a hybrid model:
- Namespaces for most teams
- Separate clusters/accounts for staging vs. prod, or for regulated data
This keeps cognitive overhead low while still enforcing strong isolation where needed.
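To make the soft side concrete, here’s a minimal sketch of what a tenant namespace could look like: a namespace, a quota, and a RoleBinding that scopes a team to it. The team name (checkout) and the IdP group (team-checkout-devs) are placeholders for whatever your SSO setup provides.

```yaml
# Sketch: one tenant namespace with a resource quota and a RoleBinding
# that grants the team's SSO group edit rights in this namespace only.
apiVersion: v1
kind: Namespace
metadata:
  name: team-checkout
  labels:
    team: checkout
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-checkout-quota
  namespace: team-checkout
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    pods: "50"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-checkout-edit
  namespace: team-checkout
subjects:
  - kind: Group
    name: team-checkout-devs   # placeholder group name from your IdP
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                   # built-in ClusterRole, bound per namespace
  apiGroup: rbac.authorization.k8s.io
```

Stamping these three resources out from a template per team keeps onboarding consistent and auditable.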
Designing for Multiple Environments
Each workload typically moves through environments: dev → staging → prod.
Here’s a model that works well with GitOps:
- One Git repo per app, with separate folders per environment:
```
app-repo/
└── envs/
    ├── dev/
    ├── staging/
    └── prod/
```
- Or, better, one GitOps repo per environment, using Argo CD ApplicationSets to sync apps into each environment’s cluster (see the sketch after the list below).
This separation allows you to:
- Enforce stricter policies in production
- Use different Helm/Kustomize values
- Add promotion gates (manual approval, testing)
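For the repo-per-environment approach, an ApplicationSet can discover every app folder in the environment’s GitOps repo and generate one Argo CD Application each. A minimal sketch follows; the repo URL, cluster endpoint, and apps/ layout are assumptions, not something established earlier in the series.

```yaml
# Sketch: one ApplicationSet per environment, discovering apps from a
# per-environment GitOps repo. Repo URL and server address are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: prod-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/example-org/gitops-prod.git
        revision: main
        directories:
          - path: apps/*
  template:
    metadata:
      name: "{{path.basename}}-prod"
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/gitops-prod.git
        targetRevision: main
        path: "{{path}}"
      destination:
        server: https://prod-cluster.example.com   # placeholder API endpoint
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Adding an app to production then becomes a pull request against the prod repo, which is exactly where your promotion gates live.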
Regional and Account Isolation
For global teams or regulated workloads, split clusters by:
- AWS account: Use one account per environment or business unit (staging, prod, sandbox, regulated)
- Region: Deploy EKS clusters in us-east-1, eu-west-1, etc.
Terraform modules can help replicate infrastructure with environment-specific settings.
Use shared tooling (like a central Argo CD instance or federation tools) to manage apps across clusters, or go cluster-local and treat each one as an independent platform instance.
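If you go the central-Argo CD route, remote clusters are registered declaratively as Kubernetes Secrets in the Argo CD namespace. The sketch below assumes an EKS cluster reached via an IAM role; the endpoint, account ID, and role name are all placeholders.

```yaml
# Sketch: registering a remote EKS cluster with a central Argo CD
# instance. Endpoint, account ID, and role ARN are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: prod-eu-west
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: prod-eu-west
  server: https://prod-eu-west.eks.example.com   # placeholder API endpoint
  config: |
    {
      "awsAuthConfig": {
        "clusterName": "prod-eu-west",
        "roleARN": "arn:aws:iam::111122223333:role/argocd-deployer"
      },
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca-cert>"
      }
    }
```

Because the registration is just a manifest, it can live in Git alongside everything else the platform provisions.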
Governance at Scale
As your platform scales, governance matters more. Bake it in early with:
- Policy as Code (OPA/Gatekeeper or Kyverno): Reject non-compliant manifests before they’re admitted to the cluster
- Audit logging: Capture who deployed what and when
- Access controls: Use SSO and fine-grained IAM to separate platform admin from app teams
You can also implement:
- Image policies (only signed/trusted images can be used)
- Namespace controls (e.g., production workloads can’t use :latest tags)
These aren’t just compliance checkboxes; they help teams ship faster with less risk.
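As one example, here’s a minimal sketch of the :latest rule as a Kyverno ClusterPolicy. It assumes production namespaces carry an environment: production label, which your namespace templates would apply.

```yaml
# Sketch: block ":latest" image tags in production namespaces.
# Assumes namespaces are labelled environment: production.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-pinned-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaceSelector:
                matchLabels:
                  environment: production   # assumed namespace label
      validate:
        message: "Production images must use a pinned tag, not :latest."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
```

The same mechanism covers image-signing checks and most of the other guardrails listed above.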
Automating Cluster Lifecycle
Managing 2–3 clusters manually is annoying. Managing 10+ that way? Impossible.
Use tools like:
- Terraform + Terragrunt for environment-specific infra
- Argo CD with the App of Apps pattern to bootstrap each cluster (see the sketch after this list)
- Crossplane or Cluster API for cluster lifecycle automation, if needed
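The App of Apps pattern itself is just an Argo CD Application whose source is a folder of other Application manifests, so syncing one resource pulls in the whole platform stack. A minimal bootstrap sketch, with repo URL and path as placeholders:

```yaml
# Sketch: the "App of Apps" root Application for one cluster. The path
# points at a folder of Application manifests, one per platform component.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-bootstrap
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-gitops.git
    targetRevision: main
    path: bootstrap/prod-eu-west   # one bootstrap folder per cluster
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```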
You can build an internal tool or CLI to create a new environment with a single command:
```
./platform create-environment prod-eu-west
```
This should provision:
- The EKS cluster
- IAM, VPC, secrets
- Argo CD apps
- Monitoring stack
- Namespace templates
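One way to keep such a tool predictable is to drive it from a declarative environment definition checked into Git. The file below is purely hypothetical (there is no standard schema here); it just illustrates capturing everything the command provisions in a single reviewable document.

```yaml
# Hypothetical input for "./platform create-environment" -- not a real
# schema, just an illustration of a declarative environment definition.
name: prod-eu-west
region: eu-west-1
account: "111122223333"        # placeholder AWS account ID
cluster:
  version: "1.29"
  nodeGroups:
    - name: general
      instanceType: m6i.large
      min: 3
      max: 12
addons:
  - argo-cd
  - monitoring-stack
  - namespace-templates
policies:
  tier: production             # selects the stricter policy bundle
```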
Platform as a Product
With multiple tenants and environments, your platform becomes a product with:
- SLAs for uptime and support
- Roadmaps and changelogs
- Onboarding docs and training
- Feedback loops and support channels
The platform team should track usage, outages, and satisfaction — and continuously iterate on templates, tooling, and documentation.
What You’ve Built So Far
You’re now operating at scale with:
- Isolated, secure team workspaces
- Production-ready environments with compliance boundaries
- Automated cluster provisioning and GitOps delivery
- Central governance and policy enforcement
This is what a modern internal platform looks like — not just Kubernetes with a UI, but a holistic system of people, tooling, and automation.
Coming Up in Part 7
Next, we’ll focus on day-2 operations and platform maturity:
- Platform observability
- Cost controls and chargeback
- Backup and disaster recovery
- Upgrades and long-term maintenance