When AAD Pod Identity Breaks Flux: A Deep Dive into a Hidden AKS Failure Mode
If you run AKS long enough, you eventually meet the ghosts of deprecated features past.
This is the story of how AAD Pod Identity, long deprecated but still lurking in many clusters, silently broke Flux image automation, and how to diagnose and fix it when it happens to you.
This post documents the symptoms, the root cause, and the exact steps required to recover a cluster stuck in this half‑installed, half‑removed state.
The Setup
This cluster was:
- Running AKS with the old AAD Pod Identity Helm chart still installed
- Missing MIC (Managed Identity Controller)
- Still running NMI (Node Managed Identity) on every node
- Running Flux v2 with image automation enabled
- Using ACR with a node‑assigned managed identity
This combination is more common than you’d think, especially in clusters that have been upgraded over several years.
The Symptoms
Flux’s image-reflector-controller began failing with IMDS authentication errors:
failed to configure authentication options: failed to create provider access token for the controller: ManagedIdentityCredential: context deadline exceeded
NMI logs showed:
failed to get matching identities for pod: flux-system/image-reflector-controller…
Flux could not:
- Query ACR
- Populate its tag database
- Resolve image policies
Image automation was effectively dead.
Diagnosis: The Smoking Gun
Running kubectl get pods -n kube-system | grep nmi revealed dozens of NMI pods (one per node, since NMI runs as a DaemonSet), all of them running for hundreds of days.
kubectl get crd | grep aadpodidentity revealed AAD Pod Identity CRDs still installed.
But running kubectl get pods -n kube-system | grep mic showed no MIC pods at all.
This is the broken state:
- NMI intercepts IMDS calls
- MIC is missing, so identities are never assigned
- Every IMDS call from a pod fails
- Flux cannot authenticate to ACR
This is where the cluster was stuck.
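The three checks above can be folded into one small function. This is a hypothetical helper, not part of any tool: it takes pod and CRD names you have already collected and applies the same checks as the grep commands. One deliberate difference is that it matches "nmi" and "mic" as whole dash-separated segments of the pod name, because a bare grep mic also matches pods whose names start with "microsoft". That assumes the chart's usual naming, where the pods are called something like <release>-mic-<hash> and <release>-nmi-<hash>.

```python
"""Hypothetical helper: spot the NMI-without-MIC state from collected names.

It does not call kubectl; pass in pod names from kube-system and CRD names.
"""


def _has_segment(pod_name: str, segment: str) -> bool:
    # Match a whole dash-separated segment, so "microsoft-..." is not "mic".
    return segment in pod_name.split("-")


def diagnose(kube_system_pods: list[str], crds: list[str]) -> list[str]:
    """Return human-readable findings for the half-removed AAD Pod Identity state."""
    nmi = [p for p in kube_system_pods if _has_segment(p, "nmi")]
    mic = [p for p in kube_system_pods if _has_segment(p, "mic")]
    # Same idea as `kubectl get crd | grep aadpodidentity`.
    aad_crds = [c for c in crds if "aadpodidentity" in c]

    findings = []
    if nmi:
        findings.append(f"{len(nmi)} NMI pod(s) still running")
    if aad_crds:
        findings.append(f"{len(aad_crds)} AAD Pod Identity CRD(s) installed")
    if nmi and not mic:
        findings.append("MIC missing: NMI intercepts IMDS but no identities get assigned")
    return findings
```

Fed with the names from a cluster in this state, it reports all three findings. A cluster where both NMI and MIC are running only reports the NMI pods.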
Why This Breaks Flux
Flux’s controllers authenticate to ACR using the node’s managed identity via IMDS.
But NMI intercepts IMDS calls and tries to look up a pod’s assigned identity.
With MIC missing, NMI always returns:
no AzureAssignedIdentity found
Flux never reaches IMDS → never reaches ACR → never sees image tags.
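The failure path can be summarised as a tiny decision function. This is a simplified model of the behaviour described above, not NMI's actual source. The assumptions are that NMI forwards requests from excepted pods straight to IMDS, and that for every other pod it needs an AzureAssignedIdentity, which only MIC creates.

```python
"""Simplified model of NMI's handling of a pod's IMDS token request.

Not NMI's real code; it only encodes the behaviour described in this post.
"""
from dataclasses import dataclass


@dataclass
class ClusterState:
    mic_running: bool
    pod_has_exception: bool


def token_request_outcome(state: ClusterState) -> str:
    # Excepted pods bypass the identity lookup and reach IMDS, which returns
    # a token for the node's managed identity (what Flux uses for ACR).
    if state.pod_has_exception:
        return "forwarded to IMDS"
    # Without MIC no AzureAssignedIdentity is ever created for the pod,
    # so NMI has nothing to match and the request fails.
    if not state.mic_running:
        return "no AzureAssignedIdentity found"
    # With MIC present, the result depends on AzureIdentityBindings.
    return "depends on AzureIdentityBinding (not modelled here)"
```

In this cluster, MIC was gone and Flux had no matching exception, so every request fell into the middle branch. An exception moves Flux into the first branch, whatever the state of MIC.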
The Fix: AzurePodIdentityException
The solution is to tell NMI:
“Do not intercept IMDS calls for Flux pods.”
This is done using an AzurePodIdentityException.
But here’s the twist:
AAD Pod Identity has multiple CRD schemas depending on the version, and this cluster was running a very old one.
Attempts using modern fields like:
spec:
  podSelector:
or
spec:
  PodLabels:
were rejected.
The correct schema, discovered by inspecting existing exceptions (kubectl get azurepodidentityexception -n flux-system -o yaml), was:
spec:
  podLabels:
    <label>: <value>
I found a couple of existing exceptions for Flux, but they didn’t cover the label used by the image-reflector-controller pod:
spec:
  podLabels:
    app.kubernetes.io/name: flux-extension
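Whether an exception applies comes down to label matching, so the gap can be checked mechanically. The sketch below is hypothetical and rests on an assumption: an exception matches a pod in its own namespace when every label in spec.podLabels appears on the pod with the same value. For single-label exceptions like the ones here, that gives the same answer as matching on any one label. An empty podLabels map is treated as matching nothing, which is also an assumption.

```python
"""Hypothetical check: does any AzurePodIdentityException cover a pod?

Assumes subset matching on spec.podLabels within the pod's namespace.
"""


def covers(exception_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    # Every key/value pair in the exception must appear on the pod;
    # an empty podLabels map is assumed to match nothing.
    return bool(exception_labels) and all(
        pod_labels.get(k) == v for k, v in exception_labels.items()
    )


def uncovered(pod_labels: dict[str, str], exceptions: list[dict[str, str]]) -> bool:
    # True when no exception in the namespace matches the pod.
    return not any(covers(e, pod_labels) for e in exceptions)
```

With the Flux pod labelled app.kubernetes.io/name=microsoft.flux and the only existing exception keyed on flux-extension, uncovered returns True. That is exactly the gap NMI was hitting.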
Creating the Correct Exception
I inspected the labels of the pods in the flux-system namespace:
kubectl get pods -n flux-system --show-labels
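With many pods, it is easier to filter that output than to read it. This hypothetical helper parses the text printed by kubectl get pods --show-labels. It relies on LABELS being the last column and containing no spaces, so the last whitespace-separated field is taken as the labels. This still works when the RESTARTS column contains something like "1 (3h ago)".

```python
"""Hypothetical parser for `kubectl get pods --show-labels` text output."""


def parse_show_labels(output: str) -> dict[str, dict[str, str]]:
    pods = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        name, raw = fields[0], fields[-1]  # NAME first, LABELS last
        labels = {}
        if raw != "<none>":
            for pair in raw.split(","):
                key, _, value = pair.partition("=")
                labels[key] = value
        pods[name] = labels
    return pods


def pods_with_label(output: str, key: str, value: str) -> list[str]:
    # Names of pods whose labels include key=value.
    return [n for n, labels in parse_show_labels(output).items() if labels.get(key) == value]
```

Passing the captured output with key app.kubernetes.io/name and value microsoft.flux lists the pods that a single exception on that label would cover.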
Flux controllers all share this label:
app.kubernetes.io/name=microsoft.flux
So the correct exception is:
apiVersion: aadpodidentity.k8s.io/v1
kind: AzurePodIdentityException
metadata:
  name: flux-system-exception
  namespace: flux-system
spec:
  podLabels:
    app.kubernetes.io/name: microsoft.flux
Save it as flux-system-exception.yaml and apply it:
kubectl apply -f flux-system-exception.yaml
Then restart Flux:
kubectl rollout restart deploy -n flux-system
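If the same exception is needed in several clusters or namespaces, the manifest can also be generated rather than hand-edited. The sketch below is a hypothetical helper that builds the same object as the YAML above and prints it as JSON, which kubectl apply also accepts. It hard-codes the lower-case spec.podLabels field that this cluster accepted.

```python
"""Hypothetical generator for the AzurePodIdentityException manifest.

Emits JSON, which `kubectl apply -f -` accepts just like YAML.
"""
import json


def build_exception(name: str, namespace: str, pod_labels: dict[str, str]) -> dict:
    if not pod_labels:
        raise ValueError("podLabels must not be empty")
    return {
        "apiVersion": "aadpodidentity.k8s.io/v1",
        "kind": "AzurePodIdentityException",
        "metadata": {"name": name, "namespace": namespace},
        # Lower-case podLabels: the spelling this cluster's CRD accepted.
        "spec": {"podLabels": dict(pod_labels)},
    }


if __name__ == "__main__":
    manifest = build_exception(
        "flux-system-exception",
        "flux-system",
        {"app.kubernetes.io/name": "microsoft.flux"},
    )
    print(json.dumps(manifest, indent=2))
```

Saved under any name (gen_exception.py, say), it can be piped straight into the cluster with python gen_exception.py | kubectl apply -f -, followed by the same rollout restart.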
Verifying the Fix
Check NMI logs:
kubectl logs -n kube-system -l app.kubernetes.io/component=nmi --tail=200
Before the fix:
failed to get matching identities for pod flux-system/image-reflector-controller…
After the fix, that error no longer appears.
Flux logs then showed healthy behaviour:
successful scan: found 1 tags
Latest image tag … resolved to prod-8552309f0-20260512T090128
Image automation was back.
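These before-and-after signals are easy to check in saved log output, for example after a restart. This hypothetical helper only looks for the two messages quoted in this post: the NMI error naming a flux-system pod and the image-reflector-controller's "successful scan" line. A clean result is therefore a good hint rather than proof.

```python
"""Hypothetical log checks for the before/after signals quoted above."""
import re

NMI_ERROR = "failed to get matching identities for pod"
SCAN_RE = re.compile(r"successful scan: found (\d+) tags")


def flux_pods_still_failing(nmi_logs: str, namespace: str = "flux-system") -> int:
    # Count NMI error lines that mention a pod in the given namespace.
    return sum(
        1 for line in nmi_logs.splitlines()
        if NMI_ERROR in line and f"{namespace}/" in line
    )


def scanned_tag_counts(flux_logs: str) -> list[int]:
    # Extract N from every "successful scan: found N tags" message.
    return [int(n) for n in SCAN_RE.findall(flux_logs)]
```

A healthy result is zero NMI errors for flux-system and at least one scan that found tags.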
Final Thoughts
This issue is obscure, poorly documented, and easy to miss. If you’re running Flux on AKS and see IMDS or ACR authentication failures, check for:
- NMI pods still running
- MIC missing
- Old AAD Pod Identity CRDs
- Missing or incorrect AzurePodIdentityException objects
Hopefully this post saves someone else the hours of digging it took to unravel this.