
If you run AKS long enough, you eventually meet the ghosts of deprecated features past.
This is the story of how AAD Pod Identity, long deprecated but still lurking in many clusters, silently broke Flux image automation, and how to diagnose and fix it when it happens to you.

This post documents the symptoms, the root cause, and the exact steps required to recover a cluster stuck in this half‑installed, half‑removed state.


The Setup

This cluster was:

  • Running AKS with the old AAD Pod Identity Helm chart still installed
  • Missing MIC (Managed Identity Controller)
  • Still running NMI (Node Managed Identity) on every node
  • Running Flux v2 with image automation enabled
  • Using ACR with a node‑assigned managed identity

This combination is more common than you’d think, especially in clusters that have been upgraded in place over several years.


The Symptoms

Flux’s image-reflector-controller began failing with IMDS authentication errors:

failed to configure authentication options: failed to create provider access token for the controller: ManagedIdentityCredential: context deadline exceeded

NMI logs showed:

failed to get matching identities for pod: flux-system/image-reflector-controller…

Flux could not:

  • Query ACR
  • Populate its tag database
  • Resolve image policies

Image automation was effectively dead.


Diagnosis: The Smoking Gun

Running kubectl get pods -n kube-system | grep nmi revealed dozens of NMI pods, all running for hundreds of days.

kubectl get crd | grep aadpodidentity revealed AAD Pod Identity CRDs still installed.

But running kubectl get pods -n kube-system | grep mic showed no MIC pods at all.

This is the broken state:

  • NMI intercepts IMDS calls
  • MIC is missing, so identities are never assigned
  • Every IMDS call from a pod fails
  • Flux cannot authenticate to ACR

This is where the cluster was stuck.
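
The three checks above can be folded into one small script. This is a minimal sketch, not a definitive tool: the function names are hypothetical, and it uses the same loose substring matching on pod names as the grep commands above, so a pattern like mic may need tightening if other pod names in kube-system happen to contain it.

```shell
# Count pods whose name (first column) contains a substring.
# Reads `kubectl get pods --no-headers` output on stdin.
count_pods_matching() {
  awk -v pat="$1" 'index($1, pat) > 0 { n++ } END { print n + 0 }'
}

# Classify the AAD Pod Identity state from the NMI and MIC pod counts.
# $1 = number of NMI pods, $2 = number of MIC pods.
classify_pod_identity_state() {
  nmi_count="$1"
  mic_count="$2"
  if [ "$nmi_count" -gt 0 ] && [ "$mic_count" -eq 0 ]; then
    echo "broken: NMI is running but MIC is missing"
  elif [ "$nmi_count" -gt 0 ]; then
    echo "installed: NMI and MIC are both running"
  else
    echo "absent: no NMI pods found"
  fi
}

# Hypothetical wrapper that queries the cluster; requires kubectl access.
check_pod_identity_state() {
  pods="$(kubectl get pods -n kube-system --no-headers)"
  classify_pod_identity_state \
    "$(printf '%s\n' "$pods" | count_pods_matching nmi)" \
    "$(printf '%s\n' "$pods" | count_pods_matching mic)"
}
```

Calling check_pod_identity_state against a cluster in the state described here should report the broken case: NMI present, MIC absent.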

Why This Breaks Flux

Flux’s controllers authenticate to ACR using the node’s managed identity via IMDS.

But NMI intercepts IMDS calls and tries to look up a pod’s assigned identity.

With MIC missing, NMI always returns:

no AzureAssignedIdentity found

Flux never reaches IMDS → never reaches ACR → never sees image tags.


The Fix: AzurePodIdentityException

The solution is to tell NMI:

“Do not intercept IMDS calls for Flux pods.”

This is done using an AzurePodIdentityException.

But here’s the twist:
AAD Pod Identity has multiple CRD schemas depending on the version, and this cluster was running a very old one.

Attempts using modern fields like:

spec:
  podSelector:

or

spec:
  PodLabels:

were rejected.

The correct schema, discovered by inspecting existing exceptions (kubectl get azurepodidentityexception -n flux-system -o yaml), was:

spec:
  podLabels:
    <label>: <value>

I also found a couple of existing exceptions for Flux, but neither covered the label used by the image-reflector-controller pod:

spec:
  podLabels:
    app.kubernetes.io/name: flux-extension
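
When the schema is uncertain, one way to try candidate field names without changing the cluster is a server-side dry run. The sketch below assumes the rejection comes from the API server validating the manifest against the installed CRD; validate_manifest is a hypothetical helper name, and the file names in the usage note are placeholders.

```shell
# Hypothetical helper: ask the API server to validate a manifest
# without persisting it. Validation errors from kubectl go to stderr.
validate_manifest() {
  if kubectl apply --dry-run=server -f "$1" >/dev/null; then
    echo "accepted: $1"
  else
    echo "rejected: $1"
  fi
}
```

For example, validate_manifest exception-podselector.yaml followed by validate_manifest exception-podlabels.yaml shows which variant the installed CRD version accepts.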

Creating the Correct Exception

I inspected the labels of the pods in the flux-system namespace:

kubectl get pods -n flux-system --show-labels

Flux controllers all share this label:

app.kubernetes.io/name=microsoft.flux

So the correct exception is:

apiVersion: aadpodidentity.k8s.io/v1
kind: AzurePodIdentityException
metadata:
  name: flux-system-exception
  namespace: flux-system
spec:
  podLabels:
    app.kubernetes.io/name: microsoft.flux

Apply it:

kubectl apply -f flux-system-exception.yaml

Then restart Flux:

kubectl rollout restart deploy -n flux-system
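
Because the earlier exceptions failed precisely by not matching the right label, it can be worth confirming that no pod in the namespace falls outside the new one. A minimal sketch, with pods_missing_label as a hypothetical helper name; it relies on the != label selector, which also matches pods that do not have the key at all.

```shell
# Print the names of pods in a namespace that do NOT carry the given
# label value (including pods that lack the label key entirely).
# $1 = namespace, $2 = label key, $3 = label value.
pods_missing_label() {
  namespace="$1"
  key="$2"
  value="$3"
  kubectl get pods -n "$namespace" -l "${key}!=${value}" -o name
}
```

Running pods_missing_label flux-system app.kubernetes.io/name microsoft.flux should print nothing if every Flux pod is covered by the exception.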

Verifying the Fix

Check NMI logs:

kubectl logs -n kube-system -l app.kubernetes.io/component=nmi --tail=200

Before the fix:

failed to get matching identities for pod flux-system/image-reflector-controller…

After the fix, NMI no longer logs this error for the Flux pods.

Flux logs then showed healthy behaviour:

successful scan: found 1 tags

Latest image tag … resolved to prod-8552309f0-20260512T090128

Image automation was back.
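
The NMI log check above can be turned into a number rather than something to eyeball. This is a sketch under the assumption that the error text matches the message quoted in this post; count_identity_failures is a hypothetical helper name.

```shell
# Count NMI log lines on stdin that report a failed identity lookup
# for pods in the given namespace ($1).
# grep -c exits non-zero when the count is 0, so the status is ignored.
count_identity_failures() {
  grep -c "failed to get matching identities for pod.*$1/" || true
}
```

Piping the kubectl logs command shown earlier into count_identity_failures flux-system should give a non-zero count before the exception is applied, and 0 for log lines written after the Flux pods have restarted.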

Final Thoughts

This issue is obscure, poorly documented, and easy to miss. If you’re running Flux on AKS and see IMDS or ACR authentication failures, check for:

  • NMI pods still running
  • MIC missing
  • Old AAD Pod Identity CRDs
  • Missing or incorrect AzurePodIdentityException objects

Hopefully this post saves someone else the hours of digging it took to unravel this.
