Building a Centralised Azure Container Registry: A Platform Engineering Guide
When building container-based applications on Azure, you’ll inevitably need to decide how to manage your container images. Azure Container Registry (ACR) is the natural choice for Azure workloads, but the real question is whether to create multiple registries across teams and environments, or build a single, centralised registry that serves your entire organisation.
In this comprehensive guide, I’ll walk you through building a centralised Azure Container Registry that serves as the single source of truth for container images across your organisation. We’ll explore the architectural considerations, implement the infrastructure using Terraform with security best practices baked in, and create GitHub Actions workflows that integrate seamlessly with your registry.
Why Centralise Your Container Registry?
Before we dive into the implementation, it’s worth understanding why a centralised approach often makes more sense than having multiple registries scattered across your Azure subscriptions.
The most compelling reason is cost efficiency. Each registry carries its own daily SKU charge on top of storage and network egress, so consolidating images into a single registry reduces duplication and management overhead. When multiple teams build similar base images or share common dependencies, a centralised registry with proper namespace organisation prevents storing the same layers multiple times.
Security and governance become significantly simpler with centralisation. Rather than managing access policies, network rules, and compliance requirements across multiple registries, you have a single point of control. This doesn’t mean sacrificing isolation. Azure Container Registry supports repository-scoped permissions that allow you to maintain strict boundaries between teams whilst sharing the infrastructure.
From an operational perspective, centralisation makes your platform easier to reason about. Your development teams know exactly where to push and pull images. Your security team has one place to scan for vulnerabilities. Your cost reporting becomes clearer. The cognitive load decreases as the architecture becomes more predictable.
Architectural Considerations
When designing a centralised container registry, several architectural decisions will shape your implementation. These aren't merely technical choices; they reflect how your organisation operates and how your platform serves its users.
Network Topology and Private Endpoints
The first decision revolves around network accessibility. Whilst Azure Container Registry can operate with public endpoints, production environments almost universally require private network access. Using Azure Private Link, your ACR becomes accessible only through private endpoints within your virtual networks, ensuring container images never traverse the public internet.
This creates an interesting challenge: how do GitHub Actions runners, which operate outside your Azure network, push images to a private registry? The answer involves a hybrid approach where you maintain controlled public access for specific operations whilst keeping the registry predominantly private. We’ll implement this using network rules that allow GitHub’s IP ranges for push operations whilst restricting pulls to your private network.
Multi-Tenancy and Repository Organisation
A centralised registry serves multiple teams, and establishing a clear organisational structure from the start prevents chaos as adoption grows. The pattern I recommend involves using repository namespaces that mirror your organisational structure. For example, images might follow a structure like teamname/application/service:tag rather than a flat namespace.
This provides natural isolation and makes it immediately clear who owns which images. Combined with repository-scoped permissions (ACR scope maps and tokens, or repository-level role assignments where your tenant supports them), you can ensure teams can only push to their own namespaces whilst potentially allowing broader pull access across the organisation.
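One way to enforce that boundary today is with ACR scope maps and tokens. The sketch below uses hypothetical registry and repository names and grants a team read/write access to a single namespace; treat it as an illustration rather than the only mechanism.
# Hypothetical names: registry acrorganisationprod, namespace data-team/
# Scope map limiting actions to one team's repositories
az acr scope-map create \
  --name data-team-rw \
  --registry acrorganisationprod \
  --repository data-team/ingestion-worker content/read content/write \
  --description "Push and pull limited to the data-team namespace"

# Token bound to that scope map, for systems that cannot authenticate with Azure AD
az acr token create \
  --name data-team-token \
  --registry acrorganisationprod \
  --scope-map data-team-rw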
SKU Selection and Geo-Replication
Azure Container Registry offers three SKUs: Basic, Standard, and Premium. For a production centralised registry, Premium is almost always the right choice. Beyond the increased storage and throughput, Premium unlocks critical features like geo-replication, customer-managed keys, and dedicated data endpoints.
Geo-replication deserves particular attention. If your AKS clusters or Container Apps exist in multiple Azure regions, replicating your registry to those regions dramatically improves image pull performance and resilience. The registry automatically routes pull requests to the nearest replica, reducing latency and eliminating cross-region data egress charges for pulls.
Infrastructure as Code with Terraform
Let’s translate these architectural principles into actual infrastructure. We’ll build this incrementally, starting with the core registry and progressively adding security layers.
Core Registry Infrastructure
Our foundation begins with a Premium SKU registry configured for high availability. Create a new file called acr.tf:
# Resource group for the centralised container registry
resource "azurerm_resource_group" "acr" {
name = "rg-acr-${var.environment}-${var.location}"
location = var.location
tags = {
environment = var.environment
managed_by = "terraform"
purpose = "centralised-container-registry"
}
}
# Premium ACR with enhanced features enabled
resource "azurerm_container_registry" "main" {
name = "acr${var.organisation}${var.environment}"
resource_group_name = azurerm_resource_group.acr.name
location = azurerm_resource_group.acr.location
sku = "Premium"
admin_enabled = false
# Anonymous pull disabled - all pulls must authenticate (enable only if you intend to serve public base images)
anonymous_pull_enabled = false
# Network access default action
public_network_access_enabled = true
network_rule_bypass_option = "AzureServices"
# Enable zone redundancy for high availability
zone_redundancy_enabled = true
# Encryption with customer-managed keys
encryption {
enabled = true
key_vault_key_id = azurerm_key_vault_key.acr_encryption.id
identity_client_id = azurerm_user_assigned_identity.acr_encryption.client_id
}
# Enable the retention policy for untagged manifests
retention_policy {
days = 7
enabled = true
}
# Trust policy for content trust
trust_policy {
enabled = true
}
# Quarantine policy to scan images before making them available
quarantine_policy {
enabled = true
}
identity {
type = "UserAssigned"
identity_ids = [
azurerm_user_assigned_identity.acr_encryption.id
]
}
tags = {
environment = var.environment
managed_by = "terraform"
}
depends_on = [
azurerm_key_vault_access_policy.acr_encryption
]
}
# Geo-replication to additional regions
resource "azurerm_container_registry_replication" "replicas" {
for_each = toset(var.replication_regions)
name = each.value
container_registry_name = azurerm_container_registry.main.name
resource_group_name = azurerm_resource_group.acr.name
location = each.value
zone_redundancy_enabled = true
regional_endpoint_enabled = true
tags = {
environment = var.environment
managed_by = "terraform"
}
}
This configuration establishes several important security baselines. Admin credentials are explicitly disabled, forcing all authentication to use Azure AD identities. Zone redundancy ensures the registry remains available even if an availability zone fails. The retention policy automatically cleans up untagged manifests after seven days, preventing storage bloat from ephemeral CI builds.
Encryption and Key Management
Security-conscious organisations require customer-managed encryption keys rather than Microsoft-managed keys. This gives you complete control over key rotation and access. Create encryption.tf:
# User-assigned managed identity for ACR encryption
resource "azurerm_user_assigned_identity" "acr_encryption" {
name = "id-acr-encryption-${var.environment}"
resource_group_name = azurerm_resource_group.acr.name
location = azurerm_resource_group.acr.location
tags = {
environment = var.environment
managed_by = "terraform"
}
}
# Key Vault for storing encryption keys
resource "azurerm_key_vault" "acr" {
name = "kv-acr-${var.environment}-${random_string.key_vault_suffix.result}"
location = azurerm_resource_group.acr.location
resource_group_name = azurerm_resource_group.acr.name
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "premium"
soft_delete_retention_days = 90
purge_protection_enabled = true
# Network rules to restrict access
network_acls {
bypass = "AzureServices"
default_action = "Deny"
ip_rules = var.admin_ip_allowlist
}
tags = {
environment = var.environment
managed_by = "terraform"
}
}
# Random suffix to ensure Key Vault name uniqueness
resource "random_string" "key_vault_suffix" {
length = 4
special = false
upper = false
}
# Access policy for the ACR managed identity
resource "azurerm_key_vault_access_policy" "acr_encryption" {
key_vault_id = azurerm_key_vault.acr.id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_user_assigned_identity.acr_encryption.principal_id
key_permissions = [
"Get",
"UnwrapKey",
"WrapKey"
]
}
# Access policy for Terraform service principal
resource "azurerm_key_vault_access_policy" "terraform" {
key_vault_id = azurerm_key_vault.acr.id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = data.azurerm_client_config.current.object_id
key_permissions = [
"Get",
"Create",
"Delete",
"List",
"Purge",
"Recover",
"GetRotationPolicy",
"SetRotationPolicy"
]
}
# Customer-managed encryption key
resource "azurerm_key_vault_key" "acr_encryption" {
name = "acr-encryption-key"
key_vault_id = azurerm_key_vault.acr.id
key_type = "RSA"
key_size = 2048
key_opts = [
"decrypt",
"encrypt",
"sign",
"unwrapKey",
"verify",
"wrapKey"
]
rotation_policy {
automatic {
time_before_expiry = "P30D"
}
expire_after = "P90D"
notify_before_expiry = "P29D"
}
depends_on = [
azurerm_key_vault_access_policy.terraform
]
}
data "azurerm_client_config" "current" {}
This setup demonstrates defence in depth. The Key Vault itself sits behind network restrictions, accepting connections only from Azure services and specified admin IPs. Soft delete with purge protection prevents accidental key deletion from destroying your encrypted data. The rotation policy expires keys after 90 days and rotates them automatically 30 days beforehand, without manual intervention.
Network Security and Private Endpoints
Now we implement the network security layer that restricts registry access to your private networks whilst allowing controlled access from GitHub Actions. Create networking.tf:
# Private DNS zone for ACR
resource "azurerm_private_dns_zone" "acr" {
name = "privatelink.azurecr.io"
resource_group_name = azurerm_resource_group.acr.name
tags = {
environment = var.environment
managed_by = "terraform"
}
}
# Link DNS zone to hub VNet
resource "azurerm_private_dns_zone_virtual_network_link" "acr_hub" {
name = "acr-hub-link"
resource_group_name = azurerm_resource_group.acr.name
private_dns_zone_name = azurerm_private_dns_zone.acr.name
virtual_network_id = var.hub_vnet_id
registration_enabled = false
tags = {
environment = var.environment
managed_by = "terraform"
}
}
# Link DNS zone to AKS spoke VNets
resource "azurerm_private_dns_zone_virtual_network_link" "acr_aks_spokes" {
for_each = var.aks_spoke_vnet_ids
name = "acr-aks-${each.key}-link"
resource_group_name = azurerm_resource_group.acr.name
private_dns_zone_name = azurerm_private_dns_zone.acr.name
virtual_network_id = each.value
registration_enabled = false
tags = {
environment = var.environment
managed_by = "terraform"
}
}
# Subnet for ACR private endpoint
resource "azurerm_subnet" "acr_endpoints" {
name = "snet-acr-endpoints"
resource_group_name = var.hub_vnet_resource_group
virtual_network_name = var.hub_vnet_name
address_prefixes = [var.acr_endpoint_subnet_cidr]
private_endpoint_network_policies_enabled = false
}
# Private endpoint for ACR
resource "azurerm_private_endpoint" "acr" {
name = "pe-acr-${var.environment}"
location = azurerm_resource_group.acr.location
resource_group_name = azurerm_resource_group.acr.name
subnet_id = azurerm_subnet.acr_endpoints.id
private_service_connection {
name = "acr-private-connection"
private_connection_resource_id = azurerm_container_registry.main.id
is_manual_connection = false
subresource_names = ["registry"]
}
private_dns_zone_group {
name = "acr-dns-zone-group"
private_dns_zone_ids = [azurerm_private_dns_zone.acr.id]
}
tags = {
environment = var.environment
managed_by = "terraform"
}
}
# Network rules for ACR - allow GitHub Actions.
# The azurerm provider has no standalone network rule resource for ACR; the
# rule set is a block on the registry itself, so add this to the
# azurerm_container_registry.main resource in acr.tf:
network_rule_set {
  default_action = "Deny"
  # Allow GitHub Actions IP ranges for image push
  dynamic "ip_rule" {
    for_each = var.github_actions_ip_ranges
    content {
      action   = "Allow"
      ip_range = ip_rule.value
    }
  }
  # Trusted Azure services are already admitted via the registry's
  # network_rule_bypass_option setting
}
This network configuration creates a crucial security boundary. The private endpoint places your registry inside your virtual network, making it accessible to AKS clusters and Container Apps through private IP addresses. The private DNS zone ensures that when your applications resolve the registry’s hostname, they receive the private IP rather than the public one.
The network rule set demonstrates the hybrid approach I mentioned earlier. By allowing GitHub’s IP ranges whilst denying everything else by default, we enable CI/CD workflows to push images whilst preventing unauthorised access. In production, you might further restrict this by using self-hosted GitHub Actions runners within your network, eliminating the need for public access entirely.
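GitHub publishes its hosted-runner egress ranges through its meta API, so the Terraform variable can be kept current with a small script. A sketch, assuming jq is available and that github_actions_ip_ranges is fed from an auto-loaded tfvars file; the published list is long, so check it against your registry's IP rule limit before applying.
# Fetch current GitHub Actions egress ranges (IPv4 only) into a tfvars file
curl -s https://api.github.com/meta \
  | jq '{github_actions_ip_ranges: [.actions[] | select(contains(":") | not)]}' \
  > github-ips.auto.tfvars.json

# Sanity check: a very long list is a strong hint to use self-hosted runners instead
jq '.github_actions_ip_ranges | length' github-ips.auto.tfvars.json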
Identity and Access Management
Proper access control ensures teams can push to their namespaces whilst the registry remains secure. Create iam.tf:
# Service principal for GitHub Actions with OIDC federated credentials
resource "azuread_application" "github_actions" {
display_name = "sp-acr-github-actions-${var.environment}"
}
resource "azuread_service_principal" "github_actions" {
client_id = azuread_application.github_actions.client_id
}
# Federated identity credential for GitHub OIDC (main branch)
resource "azuread_application_federated_identity_credential" "github_actions_main" {
application_id = azuread_application.github_actions.id
display_name = "github-actions-main-branch"
description = "Federated credential for GitHub Actions main branch"
audiences = ["api://AzureADTokenExchange"]
issuer = "https://token.actions.githubusercontent.com"
subject = "repo:${var.github_org}/${var.github_repo}:ref:refs/heads/main"
}
# Federated identity credential for GitHub OIDC (pull requests)
resource "azuread_application_federated_identity_credential" "github_actions_pr" {
application_id = azuread_application.github_actions.id
display_name = "github-actions-pull-requests"
description = "Federated credential for GitHub Actions pull requests"
audiences = ["api://AzureADTokenExchange"]
issuer = "https://token.actions.githubusercontent.com"
subject = "repo:${var.github_org}/${var.github_repo}:pull_request"
}
# Federated identity credential for GitHub OIDC (develop branch)
resource "azuread_application_federated_identity_credential" "github_actions_develop" {
application_id = azuread_application.github_actions.id
display_name = "github-actions-develop-branch"
description = "Federated credential for GitHub Actions develop branch"
audiences = ["api://AzureADTokenExchange"]
issuer = "https://token.actions.githubusercontent.com"
subject = "repo:${var.github_org}/${var.github_repo}:ref:refs/heads/develop"
}
# Role assignment for GitHub Actions - push access
resource "azurerm_role_assignment" "github_actions_push" {
scope = azurerm_container_registry.main.id
role_definition_name = "AcrPush"
principal_id = azuread_service_principal.github_actions.object_id
}
# Managed identities for AKS clusters
resource "azurerm_user_assigned_identity" "aks_clusters" {
for_each = var.aks_clusters
name = "id-aks-${each.key}-${var.environment}"
resource_group_name = azurerm_resource_group.acr.name
location = azurerm_resource_group.acr.location
tags = {
environment = var.environment
managed_by = "terraform"
cluster = each.key
}
}
# Role assignment for AKS - pull access
resource "azurerm_role_assignment" "aks_pull" {
for_each = azurerm_user_assigned_identity.aks_clusters
scope = azurerm_container_registry.main.id
role_definition_name = "AcrPull"
principal_id = each.value.principal_id
}
# Managed identities for Container Apps
resource "azurerm_user_assigned_identity" "container_apps" {
for_each = var.container_app_environments
name = "id-containerapp-${each.key}-${var.environment}"
resource_group_name = azurerm_resource_group.acr.name
location = azurerm_resource_group.acr.location
tags = {
environment = var.environment
managed_by = "terraform"
app_env = each.key
}
}
# Role assignment for Container Apps - pull access
resource "azurerm_role_assignment" "container_apps_pull" {
for_each = azurerm_user_assigned_identity.container_apps
scope = azurerm_container_registry.main.id
role_definition_name = "AcrPull"
principal_id = each.value.principal_id
}
# Custom role for team-specific repository access (example)
resource "azurerm_role_definition" "team_repository_contributor" {
name = "ACR Team Repository Contributor"
scope = azurerm_container_registry.main.id
permissions {
actions = [
"Microsoft.ContainerRegistry/registries/pull/read",
"Microsoft.ContainerRegistry/registries/push/write",
"Microsoft.ContainerRegistry/registries/artifacts/delete"
]
data_actions = [
"Microsoft.ContainerRegistry/registries/*/read",
"Microsoft.ContainerRegistry/registries/*/write",
"Microsoft.ContainerRegistry/registries/*/delete"
]
}
assignable_scopes = [
azurerm_container_registry.main.id
]
}
This IAM configuration follows the principle of least privilege. GitHub Actions receives only push permissions, not the ability to delete or modify the registry itself. AKS clusters and Container Apps receive pull-only access since they never need to push images. The custom role definition provides a template for team-specific access that can be scoped to particular repositories.
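Assigning that custom role to a team's Azure AD group is then a single command; the group object ID below is a placeholder.
# Resolve the registry's resource ID
ACR_ID=$(az acr show --name acrorganisationprod --query id --output tsv)

# Grant the team's Azure AD group the custom role on the registry
az role assignment create \
  --assignee-object-id "00000000-0000-0000-0000-000000000000" \
  --assignee-principal-type Group \
  --role "ACR Team Repository Contributor" \
  --scope "$ACR_ID"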
Monitoring and Diagnostics
Observability is crucial for a centralised service that multiple teams depend upon. Create monitoring.tf:
# Log Analytics workspace for ACR diagnostics
resource "azurerm_log_analytics_workspace" "acr" {
name = "log-acr-${var.environment}"
location = azurerm_resource_group.acr.location
resource_group_name = azurerm_resource_group.acr.name
sku = "PerGB2018"
retention_in_days = 30
tags = {
environment = var.environment
managed_by = "terraform"
}
}
# Diagnostic settings for ACR
resource "azurerm_monitor_diagnostic_setting" "acr" {
name = "acr-diagnostics"
target_resource_id = azurerm_container_registry.main.id
log_analytics_workspace_id = azurerm_log_analytics_workspace.acr.id
enabled_log {
category = "ContainerRegistryRepositoryEvents"
}
enabled_log {
category = "ContainerRegistryLoginEvents"
}
metric {
category = "AllMetrics"
enabled = true
}
}
# Alert for failed authentication attempts
resource "azurerm_monitor_metric_alert" "auth_failures" {
name = "acr-auth-failures-${var.environment}"
resource_group_name = azurerm_resource_group.acr.name
scopes = [azurerm_container_registry.main.id]
description = "Alert when ACR authentication failures exceed threshold"
severity = 2
frequency = "PT5M"
window_size = "PT15M"
criteria {
metric_namespace = "Microsoft.ContainerRegistry/registries"
metric_name = "TotalPullCount"
aggregation = "Total"
operator = "GreaterThan"
threshold = 100
dimension {
name = "StatusCode"
operator = "Include"
values = ["401", "403"]
}
}
action {
action_group_id = var.security_alert_action_group_id
}
}
# Alert for storage usage
resource "azurerm_monitor_metric_alert" "storage_usage" {
name = "acr-storage-usage-${var.environment}"
resource_group_name = azurerm_resource_group.acr.name
scopes = [azurerm_container_registry.main.id]
description = "Alert when ACR storage usage exceeds 80%"
severity = 3
frequency = "PT1H"
window_size = "PT1H"
criteria {
metric_namespace = "Microsoft.ContainerRegistry/registries"
metric_name = "StorageUsed"
aggregation = "Average"
operator = "GreaterThan"
threshold = var.storage_alert_threshold_bytes
}
action {
action_group_id = var.platform_alert_action_group_id
}
}
# Workbook for ACR usage analytics
# (the workbook's Azure resource name must be a GUID; display_name is what appears in the portal)
resource "random_uuid" "acr_usage_workbook" {}

resource "azurerm_application_insights_workbook" "acr_usage" {
  name                = random_uuid.acr_usage_workbook.result
  resource_group_name = azurerm_resource_group.acr.name
  location            = azurerm_resource_group.acr.location
  display_name        = "ACR Usage Analytics"
data_json = jsonencode({
version = "Notebook/1.0"
items = [
{
type = 1
content = {
json = "## Container Registry Usage Analytics\n\nThis workbook provides insights into registry usage, image pulls, authentication events, and storage consumption."
}
},
{
type = 3
content = {
version = "KqlItem/1.0"
query = "ContainerRegistryRepositoryEvents\n| where TimeGenerated > ago(7d)\n| summarize PullCount = countif(OperationName == 'Pull'), PushCount = countif(OperationName == 'Push') by bin(TimeGenerated, 1h), Repository\n| render timechart"
size = 0
title = "Pull and Push Operations by Repository (Last 7 Days)"
queryType = 0
resourceType = "microsoft.operationalinsights/workspaces"
}
}
]
})
tags = {
environment = var.environment
managed_by = "terraform"
}
}
These monitoring resources provide comprehensive visibility into registry operations. Authentication failures might indicate misconfigured applications or potential security incidents. Storage usage alerts prevent unexpected cost increases. The workbook gives platform teams an at-a-glance view of which repositories are most active and how the registry is being used.
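The same logs can be queried ad hoc when you're investigating an incident. A sketch, assuming the resource names produced by the Terraform above and that the az CLI's Log Analytics query command is available on your workstation:
# Workspace GUID for the workspace created in monitoring.tf
WORKSPACE_ID=$(az monitor log-analytics workspace show \
  --resource-group rg-acr-prod-uksouth \
  --workspace-name log-acr-prod \
  --query customerId --output tsv)

# Top ten repositories by pull volume over the last 24 hours
az monitor log-analytics query \
  --workspace "$WORKSPACE_ID" \
  --analytics-query "ContainerRegistryRepositoryEvents
    | where TimeGenerated > ago(24h) and OperationName == 'Pull'
    | summarize Pulls = count() by Repository
    | top 10 by Pulls desc" \
  --output table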
Variables and Outputs
To make this Terraform configuration reusable, create variables.tf:
variable "environment" {
description = "Environment name (e.g., prod, staging)"
type = string
}
variable "location" {
description = "Primary Azure region"
type = string
default = "uksouth"
}
variable "organisation" {
description = "Organisation name for resource naming"
type = string
}
variable "replication_regions" {
description = "Additional regions for ACR geo-replication"
type = list(string)
default = ["ukwest", "northeurope"]
}
variable "hub_vnet_id" {
description = "Resource ID of hub VNet"
type = string
}
variable "hub_vnet_name" {
description = "Name of hub VNet"
type = string
}
variable "hub_vnet_resource_group" {
description = "Resource group of hub VNet"
type = string
}
variable "acr_endpoint_subnet_cidr" {
description = "CIDR block for ACR private endpoint subnet"
type = string
}
variable "aks_spoke_vnet_ids" {
description = "Map of AKS spoke VNet IDs for DNS zone linking"
type = map(string)
default = {}
}
variable "aks_clusters" {
description = "Map of AKS cluster names for managed identity creation"
type = map(string)
default = {}
}
variable "container_app_environments" {
description = "Map of Container App environment names"
type = map(string)
default = {}
}
variable "github_actions_ip_ranges" {
description = "IP ranges for GitHub Actions runners"
type = list(string)
default = [
# These are example ranges - use actual GitHub IP ranges
"140.82.112.0/20",
"143.55.64.0/20",
"185.199.108.0/22",
"192.30.252.0/22"
]
}
variable "admin_ip_allowlist" {
description = "IP addresses allowed to access Key Vault"
type = list(string)
}
variable "security_alert_action_group_id" {
description = "Action group ID for security alerts"
type = string
}
variable "platform_alert_action_group_id" {
description = "Action group ID for platform alerts"
type = string
}
variable "storage_alert_threshold_bytes" {
description = "Storage threshold for alerts in bytes"
type = number
default = 1099511627776 # 1TB
}
variable "github_org" {
description = "GitHub organisation name"
type = string
}
variable "github_repo" {
description = "GitHub repository name"
type = string
}
And create outputs.tf:
output "registry_name" {
description = "Name of the container registry"
value = azurerm_container_registry.main.name
}
output "registry_id" {
description = "Resource ID of the container registry"
value = azurerm_container_registry.main.id
}
output "registry_login_server" {
description = "Login server URL for the registry"
value = azurerm_container_registry.main.login_server
}
output "github_actions_client_id" {
description = "Client ID for GitHub Actions service principal"
value = azuread_application.github_actions.client_id
}
output "github_actions_tenant_id" {
description = "Azure AD tenant ID"
value = data.azurerm_client_config.current.tenant_id
}
output "github_actions_subscription_id" {
description = "Azure subscription ID"
value = data.azurerm_client_config.current.subscription_id
}
output "aks_identity_ids" {
description = "Map of AKS managed identity resource IDs"
value = {
for k, v in azurerm_user_assigned_identity.aks_clusters : k => v.id
}
}
output "container_app_identity_ids" {
description = "Map of Container App managed identity resource IDs"
value = {
for k, v in azurerm_user_assigned_identity.container_apps : k => v.id
}
}
output "private_endpoint_ip" {
description = "Private IP address of the ACR endpoint"
value = azurerm_private_endpoint.acr.private_service_connection[0].private_ip_address
}
GitHub Actions Integration
With the infrastructure in place, let’s build GitHub Actions workflows that push images to our centralised registry. These workflows demonstrate production-ready practices including multi-stage builds, security scanning, and proper tagging strategies.
Reusable Docker Build and Push Workflow
Create .github/workflows/docker-build-push.yml as a reusable workflow:
name: Build and Push Docker Image
on:
workflow_call:
inputs:
image_name:
description: 'Name of the Docker image (without registry prefix)'
required: true
type: string
dockerfile_path:
description: 'Path to Dockerfile'
required: false
type: string
default: './Dockerfile'
context_path:
description: 'Build context path'
required: false
type: string
default: '.'
platforms:
description: 'Target platforms for multi-arch builds'
required: false
type: string
default: 'linux/amd64,linux/arm64'
push:
description: 'Whether to push the image'
required: false
type: boolean
default: true
scan_image:
description: 'Whether to scan for vulnerabilities'
required: false
type: boolean
default: true
secrets:
AZURE_CLIENT_ID:
required: true
AZURE_TENANT_ID:
required: true
AZURE_SUBSCRIPTION_ID:
required: true
ACR_LOGIN_SERVER:
required: true
outputs:
image_tag:
description: 'The full image tag that was built and pushed'
value: ${{ jobs.build.outputs.image_tag }}
image_digest:
description: 'The image digest SHA'
value: ${{ jobs.build.outputs.image_digest }}
permissions:
contents: read
security-events: write
id-token: write
jobs:
build:
runs-on: ubuntu-latest
outputs:
# first fully-qualified tag and digest produced by the build steps
image_tag: ${{ fromJSON(steps.meta.outputs.json).tags[0] }}
image_digest: ${{ steps.build.outputs.digest }}
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Azure Login via OIDC
uses: azure/login@v1
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
with:
platforms: ${{ inputs.platforms }}
driver-opts: |
image=moby/buildkit:latest
network=host
- name: Log in to Azure Container Registry
run: |
az acr login --name $(echo ${{ secrets.ACR_LOGIN_SERVER }} | cut -d'.' -f1)
- name: Extract metadata for Docker
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ secrets.ACR_LOGIN_SERVER }}/${{ inputs.image_name }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=semver,pattern={{major}}
type=sha,prefix={{branch}}-
type=raw,value=latest,enable={{is_default_branch}}
labels: |
org.opencontainers.image.title=${{ inputs.image_name }}
org.opencontainers.image.description=Built by GitHub Actions
org.opencontainers.image.vendor=${{ github.repository_owner }}
- name: Build and push Docker image
id: build
uses: docker/build-push-action@v5
with:
context: ${{ inputs.context_path }}
file: ${{ inputs.dockerfile_path }}
platforms: ${{ inputs.platforms }}
push: ${{ inputs.push }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=registry,ref=${{ secrets.ACR_LOGIN_SERVER }}/${{ inputs.image_name }}:buildcache
cache-to: type=registry,ref=${{ secrets.ACR_LOGIN_SERVER }}/${{ inputs.image_name }}:buildcache,mode=max
provenance: true
sbom: true
# BUILD_DATE uses the push commit timestamp and may be empty on pull_request events
build-args: |
BUILD_DATE=${{ github.event.head_commit.timestamp }}
VERSION=${{ steps.meta.outputs.version }}
REVISION=${{ github.sha }}
- name: Run Trivy vulnerability scanner
if: inputs.scan_image
uses: aquasecurity/trivy-action@master
with:
image-ref: ${{ secrets.ACR_LOGIN_SERVER }}/${{ inputs.image_name }}@${{ steps.build.outputs.digest }}
format: 'sarif'
output: 'trivy-results.sarif'
severity: 'CRITICAL,HIGH'
ignore-unfixed: true
scanners: 'vuln,secret,config'
- name: Upload Trivy results to GitHub Security
if: inputs.scan_image && !cancelled()
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: 'trivy-results.sarif'
- name: Generate build summary
if: always()
run: |
echo "## Docker Build Summary" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "**Image:** \`$/$\`" >> $GITHUB_STEP_SUMMARY
echo "**Tags:** $" >> $GITHUB_STEP_SUMMARY
echo "**Digest:** \`$\`" >> $GITHUB_STEP_SUMMARY
echo "**Platforms:** $" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "**Build Context:** \`$\`" >> $GITHUB_STEP_SUMMARY
echo "**Dockerfile:** \`$\`" >> $GITHUB_STEP_SUMMARY
This reusable workflow encapsulates all the best practices for building container images. The metadata action generates semantic tags automatically based on git references and semantic versioning. BuildKit’s cache management significantly speeds up subsequent builds by storing layer caches in the registry itself. The workflow generates both provenance attestations and SBOMs (Software Bill of Materials), which are increasingly important for supply chain security.
The Trivy scanner catches vulnerabilities before they reach production. By uploading results to GitHub Security, you centralise vulnerability management alongside your code. The build summary provides immediate feedback in the workflow run, making it easy to verify what was built.
Application-Specific Workflow
Now create a workflow that uses this reusable workflow for a specific application, .github/workflows/api-service.yml:
name: API Service - Build and Deploy
on:
push:
branches:
- main
- develop
paths:
- 'services/api/**'
- '.github/workflows/api-service.yml'
pull_request:
branches:
- main
- develop
paths:
- 'services/api/**'
workflow_dispatch:
inputs:
environment:
description: 'Deployment environment'
required: true
type: choice
options:
- development
- staging
- production
permissions:
contents: read
security-events: write
id-token: write
pull-requests: write
jobs:
build:
name: Build API Service Image
uses: ./.github/workflows/docker-build-push.yml
with:
image_name: 'platform/api-service'
dockerfile_path: './services/api/Dockerfile'
context_path: './services/api'
platforms: 'linux/amd64'
push: ${{ github.event_name != 'pull_request' }}
scan_image: true
secrets:
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
ACR_LOGIN_SERVER: ${{ secrets.ACR_LOGIN_SERVER }}
deploy-dev:
name: Deploy to Development
needs: build
if: github.ref == 'refs/heads/develop' && github.event_name == 'push'
runs-on: ubuntu-latest
environment: development
steps:
- name: Azure Login
uses: azure/login@v1
with:
creds: ${{ secrets.AZURE_CREDENTIALS }}
- name: Deploy to Container Apps
uses: azure/container-apps-deploy-action@v1
with:
containerAppName: ca-api-service-dev
resourceGroup: rg-platform-dev
imageToDeploy: ${{ needs.build.outputs.image_tag }}
deploy-staging:
name: Deploy to Staging
needs: build
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
runs-on: ubuntu-latest
environment: staging
steps:
- name: Azure Login
uses: azure/login@v1
with:
creds: ${{ secrets.AZURE_CREDENTIALS }}
- name: Deploy to AKS
uses: azure/k8s-deploy@v4
with:
namespace: api-services
manifests: |
services/api/k8s/deployment.yaml
services/api/k8s/service.yaml
images: ${{ needs.build.outputs.image_tag }}
deploy-prod:
name: Deploy to Production
needs: [build, deploy-staging]
if: github.event.inputs.environment == 'production'
runs-on: ubuntu-latest
environment: production
steps:
- name: Azure Login
uses: azure/login@v1
with:
creds: ${{ secrets.AZURE_CREDENTIALS }}
- name: Deploy to AKS
uses: azure/k8s-deploy@v4
with:
namespace: api-services
manifests: |
services/api/k8s/deployment.yaml
services/api/k8s/service.yaml
images: ${{ needs.build.outputs.image_tag }}
- name: Create deployment record
run: |
echo "Deployed ${{ needs.build.outputs.image_tag }} to production" >> deployment-log.txt
This workflow demonstrates a complete CI/CD pipeline. Pull requests trigger builds without pushing to verify the image builds successfully. Pushes to develop automatically deploy to the development environment. The main branch deploys to staging, whilst production deployments require manual approval through the workflow dispatch trigger.
Multi-Service Monorepo Workflow
For organisations managing multiple services in a monorepo, create .github/workflows/monorepo-build.yml:
name: Monorepo - Build All Services
on:
push:
branches:
- main
pull_request:
branches:
- main
permissions:
contents: read
security-events: write
id-token: write
jobs:
detect-changes:
name: Detect Service Changes
runs-on: ubuntu-latest
outputs:
api: ${{ steps.filter.outputs.api }}
web: ${{ steps.filter.outputs.web }}
worker: ${{ steps.filter.outputs.worker }}
steps:
- uses: actions/checkout@v4
- uses: dorny/paths-filter@v2
id: filter
with:
filters: |
api:
- 'services/api/**'
- 'shared/**'
web:
- 'services/web/**'
- 'shared/**'
worker:
- 'services/worker/**'
- 'shared/**'
build-api:
name: Build API Service
needs: detect-changes
if: needs.detect-changes.outputs.api == 'true'
uses: ./.github/workflows/docker-build-push.yml
with:
image_name: 'platform/api-service'
dockerfile_path: './services/api/Dockerfile'
context_path: '.'
platforms: 'linux/amd64'
push: ${{ github.event_name != 'pull_request' }}
secrets:
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
ACR_LOGIN_SERVER: ${{ secrets.ACR_LOGIN_SERVER }}
build-web:
name: Build Web Service
needs: detect-changes
if: needs.detect-changes.outputs.web == 'true'
uses: ./.github/workflows/docker-build-push.yml
with:
image_name: 'platform/web-service'
dockerfile_path: './services/web/Dockerfile'
context_path: '.'
platforms: 'linux/amd64'
push: ${{ github.event_name != 'pull_request' }}
secrets:
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
ACR_LOGIN_SERVER: ${{ secrets.ACR_LOGIN_SERVER }}
build-worker:
name: Build Worker Service
needs: detect-changes
if: needs.detect-changes.outputs.worker == 'true'
uses: ./.github/workflows/docker-build-push.yml
with:
image_name: 'platform/worker-service'
dockerfile_path: './services/worker/Dockerfile'
context_path: '.'
platforms: 'linux/amd64'
push: ${{ github.event_name != 'pull_request' }}
secrets:
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
ACR_LOGIN_SERVER: ${{ secrets.ACR_LOGIN_SERVER }}
This workflow uses path filtering to detect which services have changed, building only what’s necessary. This dramatically reduces CI/CD time for large monorepos where a change in one service doesn’t affect others.
Integrating with AKS Clusters
Your AKS clusters need proper configuration to pull from the centralised registry. The key is using managed identities rather than image pull secrets, which eliminates credential management headaches.
AKS Cluster Configuration
When creating your AKS cluster, attach the managed identity we created earlier:
resource "azurerm_kubernetes_cluster" "main" {
name = "aks-${var.cluster_name}-${var.environment}"
location = var.location
resource_group_name = var.resource_group_name
dns_prefix = "aks-${var.cluster_name}"
default_node_pool {
name = "system"
node_count = 3
vm_size = "Standard_D4s_v5"
# Critical for private registry access
vnet_subnet_id = var.aks_subnet_id
}
# Use the managed identity that has AcrPull permissions
identity {
type = "UserAssigned"
identity_ids = [var.acr_pull_identity_id]
}
# Attach to the ACR
kubelet_identity {
client_id = var.acr_pull_identity_client_id
object_id = var.acr_pull_identity_object_id
user_assigned_identity_id = var.acr_pull_identity_id
}
network_profile {
network_plugin = "azure"
network_policy = "calico"
dns_service_ip = "10.2.0.10"
service_cidr = "10.2.0.0/16"
load_balancer_sku = "standard"
}
# Enable private cluster for enhanced security
private_cluster_enabled = true
}
With this configuration, your pods can reference images directly without any image pull secrets:
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-service
namespace: api-services
spec:
replicas: 3
selector:
matchLabels:
app: api-service
template:
metadata:
labels:
app: api-service
spec:
containers:
- name: api
image: acrorganisationprod.azurecr.io/platform/api-service:v1.2.3
ports:
- containerPort: 8080
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
The kubelet automatically authenticates to ACR using the managed identity, pulling images seamlessly over the private endpoint.
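You can verify the whole chain (DNS, network path and the kubelet identity's AcrPull assignment) with a single command; the cluster and registry names below are placeholders.
# Validates that the cluster can resolve, reach and authenticate to the registry
az aks check-acr \
  --resource-group rg-platform-prod \
  --name aks-platform-prod \
  --acr acrorganisationprod.azurecr.io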
Integrating with Azure Container Apps
Container Apps integration is similarly straightforward when using managed identities:
resource "azurerm_container_app" "api" {
name = "ca-api-service-${var.environment}"
container_app_environment_id = var.container_app_environment_id
resource_group_name = var.resource_group_name
revision_mode = "Single"
identity {
type = "UserAssigned"
identity_ids = [var.acr_pull_identity_id]
}
registry {
server = var.acr_login_server
identity = var.acr_pull_identity_id
}
template {
container {
name = "api-service"
image = "${var.acr_login_server}/platform/api-service:latest"
cpu = 0.5
memory = "1Gi"
env {
name = "ASPNETCORE_ENVIRONMENT"
value = var.environment
}
}
min_replicas = 1
max_replicas = 10
}
ingress {
external_enabled = true
target_port = 8080
traffic_weight {
latest_revision = true
percentage = 100
}
}
}
The Container App uses the managed identity to authenticate with ACR automatically. Because the Container App environment is connected to your VNet, it pulls images through the private endpoint.
Security Best Practices in Practice
Throughout this implementation, we’ve embedded security best practices at every layer. Let me highlight the critical security decisions and why they matter.
Defence in Depth
The architecture implements multiple security layers. Even if an attacker breaches one layer, others remain intact. The registry sits behind network restrictions and private endpoints. Access requires Azure AD authentication with role-based access control. Encryption protects data at rest using customer-managed keys. Each layer independently contributes to security.
Least Privilege Access
No identity receives more permissions than absolutely necessary. GitHub Actions can push but not delete. AKS clusters can pull but not push. Teams can access only their namespaces. The principle of least privilege minimises the blast radius if credentials are compromised.
Immutable Infrastructure
Whilst the Terraform code doesn’t explicitly show this, consider implementing repository locks and retention policies that prevent tag overwrites. Once you tag an image as v1.2.3, that tag should never change. This immutability ensures reproducible deployments and prevents malicious tag replacement.
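ACR exposes this as per-image lock attributes. A sketch, with placeholder names, that freezes a released tag:
# Prevent the released tag from being overwritten or deleted
az acr repository update \
  --name acrorganisationprod \
  --image platform/api-service:v1.2.3 \
  --write-enabled false \
  --delete-enabled false

# Confirm the attributes were applied
az acr repository show \
  --name acrorganisationprod \
  --image platform/api-service:v1.2.3 \
  --query changeableAttributes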
Audit Logging
Every operation against the registry is logged to Log Analytics. Authentication attempts, image pulls, image pushes, configuration changes—all are recorded with timestamps and identity information. These logs are crucial for security investigations and compliance requirements.
Vulnerability Scanning
The GitHub Actions workflow scans every image before it enters the registry. Additionally, consider enabling Azure Defender for Container Registries, which continuously scans images even after they’re pushed and alerts you to newly discovered vulnerabilities.
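Defender is enabled at the subscription level rather than in the registry itself. A sketch; the plan name here is the current Defender for Containers plan and may appear as ContainerRegistry on older subscriptions.
# Enable Microsoft Defender for Containers on the active subscription
az security pricing create --name Containers --tier Standard

# Confirm the plan is active
az security pricing show --name Containers --query pricingTier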
Network Isolation
The private endpoints ensure registry traffic never touches the public internet when accessed from your Azure workloads. This prevents eavesdropping and man-in-the-middle attacks. The hybrid model for GitHub Actions is a pragmatic compromise, and self-hosted runners inside your network would eliminate that public exposure entirely.
Operational Considerations
Building the infrastructure is one thing; operating it successfully requires ongoing attention to several areas.
Image Lifecycle Management
Container registries grow quickly as CI/CD pipelines push images continuously. Without lifecycle management, you’ll accumulate thousands of untagged manifests and old images that nobody uses. The retention policy we configured helps, but consider implementing more sophisticated cleanup:
# Image retention task to clean up old images
resource "azurerm_container_registry_task" "cleanup" {
name = "cleanup-old-images"
container_registry_id = azurerm_container_registry.main.id
platform {
os = "Linux"
}
encoded_step {
task_content = base64encode(<<-EOT
version: v1.1.0
steps:
- cmd: acr purge --filter 'platform/.*:.*' --ago 90d --untagged
disableWorkingDirectoryOverride: true
timeout: 3600
EOT
)
}
timer_trigger {
name = "weekly"
schedule = "0 0 * * 0"
enabled = true
}
}
This ACR Task runs weekly, purging tags under the platform namespace whose images were last updated more than 90 days ago, along with untagged manifests. Adjust the filters and retention periods based on your organisation's needs.
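Before trusting the schedule, preview what a purge would remove. The acr purge command accepts a dry-run flag and can be run on demand through az acr run:
# Preview the weekly clean-up without deleting anything
az acr run \
  --registry acrorganisationprod \
  --cmd "acr purge --filter 'platform/.*:.*' --ago 90d --untagged --dry-run" \
  /dev/null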
Cost Management
Container registries can become expensive if not monitored. The primary costs come from storage and geo-replication data egress. Monitor your storage usage trends and investigate unexpected growth. Consider whether you need geo-replication to all regions or if strategic placement in key regions suffices.
Enable Azure Cost Management alerts to notify you when registry costs exceed expected thresholds. Tag images with team or project information so you can attribute costs accurately.
Disaster Recovery
Your container registry becomes a critical dependency. If it's unavailable, deployments fail and existing pods that need to pull images can't start. Geo-replication provides regional redundancy, but it doesn't protect against accidental deletion of the registry, a repository, or a tag, and Azure Backup has no native support for ACR. Two practical safeguards are a CanNotDelete resource lock on the registry and a scheduled job that copies business-critical images into a standby registry in another subscription or region with az acr import.
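A minimal sketch of both safeguards, assuming a standby registry named acrorganisationdr and the placeholder resource names used earlier:
# Guard against accidental deletion of the registry itself
az lock create \
  --name acr-no-delete \
  --lock-type CanNotDelete \
  --resource-group rg-acr-prod-uksouth \
  --resource-name acrorganisationprod \
  --resource-type Microsoft.ContainerRegistry/registries

# Copy a business-critical image into the standby registry
az acr import \
  --name acrorganisationdr \
  --source acrorganisationprod.azurecr.io/platform/api-service:v1.2.3 \
  --image platform/api-service:v1.2.3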
Test your disaster recovery procedures regularly. Can you restore the registry if it’s accidentally deleted? How quickly can you recover? Document the procedures so any team member can execute them under pressure.
Performance Optimisation
Image pull performance directly affects deployment speed and pod startup time. Several factors influence performance:
Geo-replication places registry replicas close to your workloads, reducing latency. The regional endpoint feature we enabled routes requests to the nearest replica automatically.
Layer caching in your Dockerfiles minimises rebuild time. Structure your Dockerfiles so frequently changing layers appear near the end, allowing Docker to reuse cached layers for earlier steps.
Multi-stage builds reduce final image size by excluding build tools and intermediate artefacts from the runtime image. Smaller images pull faster and consume less storage.
Consider implementing a registry cache or pull-through cache for external base images. This prevents pulling the same base image from Docker Hub repeatedly, reducing external bandwidth usage and improving reliability.
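ACR's artifact cache feature provides exactly this. A sketch, assuming a public upstream image; private upstreams additionally need a credential set attached to the rule:
# Cache docker.io/library/python inside the registry under cached/python
az acr cache create \
  --registry acrorganisationprod \
  --name python-cache \
  --source-repo docker.io/library/python \
  --target-repo cached/python

# Workloads then reference the cached path, e.g.
# acrorganisationprod.azurecr.io/cached/python:3.12-slim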
Team Adoption and Documentation
A centralised registry serves multiple teams, and successful adoption requires clear documentation and communication.
Developer Documentation
Create comprehensive documentation that answers common questions developers will have. Document the registry URL, authentication methods, naming conventions, and how to troubleshoot common issues. Include examples for different use cases.
The key constraint to communicate clearly is that our registry is configured for maximum security with admin credentials disabled and network access restricted to private endpoints. This means local development workflows differ from traditional registry patterns.
For local development and testing:
With our security-first configuration (private endpoints, network deny-by-default), developers working from their local machines cannot directly access the registry. The network rules only permit GitHub Actions IP ranges and private endpoint access from within Azure.
This is intentional and represents a security best practice, but it requires developers to adapt their workflows:
Option 1: Build and test locally without registry access (recommended)
# Build locally with a temporary tag
docker build -t myapp:local-test .
# Test locally
docker run -p 8080:8080 myapp:local-test
# When ready, commit and push code
git add .
git commit -m "feat: add new feature"
git push origin feature/my-branch
# Let GitHub Actions build and push to ACR
This is the standard workflow for secure production registries. Developers never interact with the registry directly—they build and test locally, then CI/CD handles all registry operations.
Option 2: VPN access for pulling images
For developers who need to pull production images for debugging or testing, provide VPN access to your Azure network:
# 1. Connect to corporate VPN that routes to Azure VNet
# 2. Authenticate using Azure AD
az login
az acr login --name acrorganisationprod
# 3. Pull images (pulling works, pushing still requires appropriate RBAC)
docker pull acrorganisationprod.azurecr.io/platform/api-service:latest
# 4. Run locally for debugging
docker run -p 8080:8080 acrorganisationprod.azurecr.io/platform/api-service:latest
Once connected via VPN, developers can access the registry through the private endpoint. However, even with VPN access, pushing requires both network access and appropriate RBAC permissions (AcrPush role), which should remain restricted to CI/CD service principals.
Option 3: Temporary IP allowlist (not recommended)
For exceptional circumstances, platform teams could temporarily add a developer’s IP to the network rules:
# Temporary addition to variables.tf
variable "developer_ip_allowlist" {
description = "Temporary developer IPs (remove after use)"
type = list(string)
default = []
}
# In acr.tf, widen the dynamic ip_rule block on the registry's network_rule_set
dynamic "ip_rule" {
  for_each = concat(var.github_actions_ip_ranges, var.developer_ip_allowlist)
  content {
    action   = "Allow"
    ip_range = ip_rule.value
  }
}
This defeats the purpose of network security controls and should only be used as a last resort for troubleshooting.
The key message for developers: build and test locally, let CI/CD handle registry pushes. This pattern aligns with security best practices whilst maintaining developer productivity.
For CI/CD pipelines:
GitHub Actions workflows authenticate using workload identity federation with OIDC, which is far more secure than traditional client secrets. This approach uses short-lived tokens issued by GitHub that are trusted by Azure AD through federated credentials.
The authentication happens automatically through the azure/login action:
- name: Azure Login via OIDC
uses: azure/login@v1
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Log in to Azure Container Registry
run: |
az acr login --name acrorganisationprod
Setting up GitHub repository secrets:
After deploying the Terraform configuration, add these secrets to your GitHub repository (Settings → Secrets and variables → Actions):
# These values come from Terraform outputs
AZURE_CLIENT_ID: <output from terraform>
AZURE_TENANT_ID: <output from terraform>
AZURE_SUBSCRIPTION_ID: <output from terraform>
ACR_LOGIN_SERVER: acrorganisationprod.azurecr.io
The Terraform configuration creates federated identity credentials for your main branch, develop branch, and pull requests. If you need additional branches or environments, add more azuread_application_federated_identity_credential resources.
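These can be added in Terraform as shown earlier, or ad hoc with the CLI. A sketch for a GitHub environment named production; the application object ID and repository are placeholders.
# Add a federated credential for jobs that target the 'production' environment
az ad app federated-credential create \
  --id <application-object-id> \
  --parameters '{
    "name": "github-actions-production-environment",
    "issuer": "https://token.actions.githubusercontent.com",
    "subject": "repo:my-org/my-repo:environment:production",
    "audiences": ["api://AzureADTokenExchange"]
  }'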
Naming Conventions
Establish and document clear naming conventions. A suggested structure:
{registry}.azurecr.io/{team|org-unit}/{application}/{component}:{tag}
Examples:
acrorganisationprod.azurecr.io/platform/api-service:v1.2.3
acrorganisationprod.azurecr.io/data-team/ingestion-worker:2024-01-15-abc123
acrorganisationprod.azurecr.io/mobile/ios-backend:release-candidate
Consistent naming makes it easy to understand image ownership, apply automation, and manage access policies.
Onboarding Process
Create a streamlined onboarding process for new teams or projects. This might involve:
- Requesting a namespace in the registry (e.g., teamname/)
- Granting the team push access to their namespace
- Providing GitHub Actions secrets or service principal credentials
- Adding the team to monitoring and alerting for their images
Automate this process where possible. A simple internal portal or infrastructure-as-code template that provisions these resources reduces friction and ensures consistency.
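A minimal sketch of such a template in script form, with hypothetical team and resource names; in practice you would express the same steps in Terraform alongside the rest of the configuration.
#!/usr/bin/env bash
set -euo pipefail

TEAM="data-team"
REGISTRY="acrorganisationprod"
RG="rg-acr-prod-uksouth"
ACR_ID=$(az acr show --name "$REGISTRY" --query id --output tsv)

# 1. Managed identity the team's workloads will use to pull images
az identity create --resource-group "$RG" --name "id-${TEAM}-pull"
PRINCIPAL_ID=$(az identity show --resource-group "$RG" --name "id-${TEAM}-pull" \
  --query principalId --output tsv)
# (allow a short delay for the new identity to propagate before assigning roles)
az role assignment create --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal --role AcrPull --scope "$ACR_ID"

# 2. Push access for the team's Azure AD group (AcrPush is registry-wide;
#    use scope maps as shown earlier if you need namespace-level restriction)
TEAM_GROUP_ID=$(az ad group show --group "$TEAM" --query id --output tsv)
az role assignment create --assignee-object-id "$TEAM_GROUP_ID" \
  --assignee-principal-type Group --role AcrPush --scope "$ACR_ID"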
Troubleshooting Common Issues
Even with careful implementation, teams will encounter issues. Here are solutions to common problems.
Authentication Failures
Symptom: unauthorized: authentication required or unauthorized: access denied
Causes:
- Expired service principal credentials
- Incorrect RBAC assignments
- Network rules blocking the request
- Managed identity not properly attached to AKS or Container Apps
Resolution: Check the diagnostic logs in Log Analytics to see the specific authentication failure. Verify that the identity being used has the appropriate role assignment. For AKS, ensure the kubelet identity is correctly configured. For Container Apps, verify the registry identity matches the pull identity.
Image Pull Failures from AKS
Symptom: Pods stuck in ImagePullBackOff state
Causes:
- Private endpoint DNS resolution failing
- Network connectivity issues
- Image doesn’t exist or tag is wrong
- Kubelet identity lacks permissions
Resolution: First, verify DNS resolution from a pod:
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup acrorganisationprod.azurecr.io
The hostname should resolve to a private IP address from the 10.x.x.x range, not a public IP. If it resolves to a public IP, the private DNS zone linking is misconfigured.
Check pod events for specific error messages:
kubectl describe pod <pod-name>
Test pulling the image manually from a node:
# SSH to a node (or use kubectl debug)
sudo crictl pull acrorganisationprod.azurecr.io/platform/api-service:latest
Slow Image Pulls
Symptom: Deployments take minutes to pull images
Causes:
- Large image sizes
- Pulling across regions
- Network bandwidth constraints
- Missing layer cache
Resolution: Investigate image size and optimise:
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
Images over 1GB warrant investigation. Use multi-stage builds, alpine base images, and .dockerignore files to reduce size.
Verify geo-replication is working:
az acr replication list --registry acrorganisationprod --output table
Ensure replicas exist in the same regions as your workloads.
GitHub Actions Push Failures
Symptom: denied: client with IP not allowed or similar network errors
Causes:
- GitHub Actions IP ranges changed
- Network rule set too restrictive
- Service principal credentials expired
Resolution: GitHub occasionally adds new IP ranges. Check the current ranges:
curl https://api.github.com/meta | jq .actions
Update your network rules if necessary. Consider using self-hosted runners in your Azure network to avoid this issue entirely.
Conclusion
Building a centralised Azure Container Registry is an investment in your platform’s future. Done correctly, it becomes invisible infrastructure that just works. Developers push images, deployments pull them, and the registry quietly manages terabytes of container layers across regions.
The architecture we’ve built embeds security at every layer, from customer-managed encryption keys to network isolation to comprehensive audit logging. The Terraform code is production-ready, implementing Azure best practices that satisfy security teams whilst remaining operationally practical.
The GitHub Actions workflows demonstrate modern CI/CD patterns: semantic versioning, multi-architecture builds, vulnerability scanning, provenance attestations, and efficient layer caching. These aren't just nice-to-haves; they're essential practices for maintaining secure, reliable container deployments.
Perhaps most importantly, this centralised approach scales with your organisation. Whether you're supporting ten services or a thousand, the architectural patterns remain consistent. Teams gain autonomy through namespaced repositories whilst platform engineering maintains governance through centralised policies.
Container registries are foundational infrastructure. Build them well, secure them thoroughly, and they’ll serve your organisation reliably for years. The initial investment in proper architecture, security controls, and operational procedures pays dividends every single day as your teams deploy with confidence, knowing their images are stored securely and delivered efficiently wherever they’re needed.