
By now, your platform has all the infrastructure essentials: a secure Kubernetes foundation, GitOps-based delivery, ingress, observability, and secrets management. But to truly empower developers — and reduce cognitive load — we need to build something more than infrastructure.

We need a developer experience (DevEx) layer.

In this part, we’ll walk through how to provide self-service capabilities that abstract away Kubernetes complexity and make the platform usable by product teams.

What Developers Really Want

Ask most product engineers how they want to deploy services, and you won’t hear “Helm values” or “pod specs.” They want:

  • A place to push code
  • A repeatable CI/CD pipeline
  • A clear way to request infrastructure (databases, secrets, DNS)
  • Logs, metrics, and health checks at their fingertips

So our goal is to build opinionated golden paths while still keeping things modular and flexible.

Step 1: Namespace-as-a-Service

Start by creating an automated way for teams to request and manage their own isolated namespaces, complete with:

  • Pre-provisioned RBAC (read/write for devs, read-only for others)
  • Resource quotas and network policies
  • Tooling: metrics dashboards, secrets access, CI/CD integration

This can be automated with GitOps (Argo CD) or through a portal where teams submit a request and a PR is opened automatically against the platform config repo.

Here’s a typical structure in Git:

namespaces/
  team-a/
    namespace.yaml
    rbac.yaml
    resource-quota.yaml
    application.yaml

Example contents:

namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    owner: team-a

A Kubernetes namespace provides the scope for Pods, Services, and Deployments in the cluster. Users interacting with one namespace do not see the content in another namespace.

rbac.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-developers
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io

A RoleBinding grants the permissions defined in a role to a user or set of users. It holds a list of subjects (users, groups, or service accounts), and a reference to the role being granted. A RoleBinding grants permissions within a specific namespace whereas a ClusterRoleBinding grants that access cluster-wide.
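
To cover the "read-only for others" part of the RBAC bullet earlier, you could add a second binding to the built-in view ClusterRole in the same namespace. This is a sketch; the platform-readers group name is an assumption and would map to whatever your identity provider exposes:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-readers
  namespace: team-a
subjects:
  # Assumed group of people outside team-a who only need to look, not touch
  - kind: Group
    name: platform-readers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io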

resource-quota.yaml

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    cpu: "4"
    memory: 8Gi
    pods: "10"

A ResourceQuota provides constraints that limit aggregate resource consumption per namespace.
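
The namespace bundle also promised network policies. Here's a minimal sketch of a default-deny ingress policy you might ship alongside the quota; the file name network-policy.yaml and the ingress-only scope are assumptions, and you would normally layer explicit allow rules on top:

network-policy.yaml

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  # An empty podSelector matches every pod in the namespace
  podSelector: {}
  policyTypes:
    - Ingress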

application.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: team-a-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/team-a-app
    targetRevision: main
    path: deploy
  destination:
    server: https://kubernetes.default.svc
    namespace: team-a-development
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Application is an Argo CD CRD (Custom Resource Definition) representing a deployed application instance in an environment.

  • source references the desired state in Git (repository, revision, and path).
  • destination references the target cluster and namespace for deployment.
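
Hand-writing one Application per team works at small scale; you can also generate them from the Git layout shown earlier. Here's a sketch using an Argo CD ApplicationSet with a Git directory generator; the platform-config repository URL is an assumption:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: team-namespaces
  namespace: argocd
spec:
  generators:
    - git:
        # Assumed platform config repo containing the namespaces/ tree above
        repoURL: https://github.com/org/platform-config
        revision: main
        directories:
          - path: namespaces/*
  template:
    metadata:
      name: '{{path.basename}}-namespace'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/platform-config
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true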

Step 2: Standardised Workload Templates

Here’s an example of a simple Kubernetes Deployment and Service for a Node.js web app:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodejs-web
  namespace: team-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nodejs-web
  template:
    metadata:
      labels:
        app: nodejs-web
    spec:
      containers:
        - name: nodejs-web
          image: ghcr.io/org/nodejs-web:1.0.0
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: nodejs-web
  namespace: team-a
spec:
  selector:
    app: nodejs-web
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP

This example includes resource requests/limits, liveness/readiness probes, and exposes the app on port 80 within the cluster.

Not every team wants to figure out how to configure liveness probes, memory requests, or TLS by hand.

Suppose instead you provide a Helm chart for a standard web service. Here's a sample generic web-app Helm template:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.service.name }}
  namespace: {{ .Values.service.namespace }}
spec:
  replicas: {{ .Values.replicas | default 2 }}
  selector:
    matchLabels:
      app: {{ .Values.service.name }}
  template:
    metadata:
      labels:
        app: {{ .Values.service.name }}
    spec:
      containers:
        - name: {{ .Values.service.name }}
          image: {{ .Values.service.image.repository }}:{{ .Values.service.image.tag }}
          ports:
            - containerPort: {{ .Values.service.port }}
          resources:
            requests:
              cpu: {{ .Values.resources.requests.cpu }}
              memory: {{ .Values.resources.requests.memory }}
            limits:
              cpu: {{ .Values.resources.limits.cpu }}
              memory: {{ .Values.resources.limits.memory }}
          livenessProbe:
            httpGet:
              path: /healthz
              port: {{ .Values.service.port }}
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: {{ .Values.service.port }}
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.service.name }}
  namespace: {{ .Values.service.namespace }}
spec:
  selector:
    app: {{ .Values.service.name }}
  ports:
    - protocol: TCP
      port: 80
      targetPort: {{ .Values.service.port }}
  type: ClusterIP

Teams then only need to fill in a simple values.yaml to supply these variables:

service:
  name: my-app
  namespace: team-a
  image:
    repository: ghcr.io/org/my-app
    tag: "1.2.3"
  port: 8080

resources:
  requests:
    cpu: "200m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "1Gi"

ingress:
  enabled: true
  host: my-app.dev.example.com
  tls: true

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70

monitoring:
  enabled: true
  path: /metrics

This abstracts away the boilerplate and ensures every service gets best-practice defaults for security, scaling, and observability.
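
For instance, the autoscaling block would typically drive a HorizontalPodAutoscaler. Here's a sketch of what the chart might render for the values above (the HPA template itself isn't shown here):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: team-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70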

Provide these as curated templates, delivered as versioned Helm charts or scaffolded through a portal such as Backstage (more on that below).

Each template might include:

  • A deployment with sane defaults
  • Autoscaling enabled
  • TLS and ingress pre-configured
  • Service monitors for Prometheus
  • External secret references

Then teams just need to tweak a few inputs (repo URL, image tag, port).
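
As an example of "TLS and ingress pre-configured", here's roughly what the chart might render when ingress.enabled is true. This is a sketch: the nginx ingress class and the cert-manager ClusterIssuer annotation are assumptions about your cluster setup.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: team-a
  annotations:
    # Assumes cert-manager with a ClusterIssuer named letsencrypt
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - my-app.dev.example.com
      secretName: my-app-tls
  rules:
    - host: my-app.dev.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80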

Step 3: CI/CD Integration

CI/CD should be plug-and-play. Use tools like:

  • GitHub Actions, GitLab CI, or Tekton to build and push images
  • Argo CD to deploy from Git branches or tags
  • Optional integration with Backstage for a one-click “Create Component” flow

Tip: Define a reusable CI template that teams can extend with their own tests:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build & Push
        run: |
          docker build -t $REGISTRY/$SERVICE:$SHA .
          docker push $REGISTRY/$SERVICE:$SHA
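
The snippet above is inlined for clarity; one way to make it genuinely reusable in GitHub Actions is a workflow_call workflow kept in a shared repository. The sketch below assumes a hypothetical org/platform-workflows repo, a service input, and a GHCR push token; adjust to your registry and secret names.

# .github/workflows/build-and-push.yml in the shared repo (name assumed)
name: build-and-push
on:
  workflow_call:
    inputs:
      service:
        required: true
        type: string
    secrets:
      registry_token:
        required: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Log in to GHCR
        run: echo "${{ secrets.registry_token }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
      - name: Build & Push
        run: |
          docker build -t ghcr.io/org/${{ inputs.service }}:${{ github.sha }} .
          docker push ghcr.io/org/${{ inputs.service }}:${{ github.sha }}

A team's own pipeline then shrinks to a short caller job:

jobs:
  build:
    uses: org/platform-workflows/.github/workflows/build-and-push.yml@main
    with:
      service: my-app
    secrets:
      registry_token: ${{ secrets.GHCR_TOKEN }}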

Step 4: Internal Developer Portal (Optional, but Powerful)

To tie it all together, consider exposing this experience through a developer portal, like Backstage.

Features include:

  • Templates for new services (with built-in CI/CD, observability, secrets)
  • One-click deploys to dev or staging
  • Service catalog with docs, APIs, ownership
  • Direct links to logs, metrics, and alerts

The portal becomes the front door to your platform, giving developers confidence and autonomy.
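
For the service catalog entries, each repository typically carries a small descriptor that Backstage ingests. Here's a minimal catalog-info.yaml sketch; the lifecycle, owner, and TechDocs annotation are illustrative assumptions:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-app
  description: Example web service owned by team A
  annotations:
    # Points TechDocs at docs kept in the same repository
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  lifecycle: production
  owner: team-a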

What You’ve Achieved

At this point, you’ve shifted from “a Kubernetes cluster” to a true internal platform. Developers can:

  • Get a namespace with built-in guardrails
  • Deploy services using templated, approved paths
  • Push code and see it deployed automatically
  • Access logs, metrics, and alerts without digging through YAML

All while your platform team maintains central governance, security, and consistency.

Coming Up in Part 6

In the next part, we’ll zoom out and talk about multi-tenancy, environment management, and scaling your platform for teams and regions.

Part 6: Scaling the Platform — Multi-Tenancy, Environments, and Governance
