Designing a Scalable DNS Schema for Large Distributed Systems

In large distributed systems, a well-designed DNS schema isn’t just about naming; it’s about controlling access, enforcing boundaries, and supporting automation.

In this post, I’ll walk through how to design a DNS naming convention that supports least privilege, environmental separation, and application scoping, with practical examples.

Why DNS Naming Conventions Matter

DNS is often your first layer of abstraction in distributed systems. A good schema:

Communicates who owns what
Enables least privilege access policies
Helps tools and teams automate safely
Makes it easier to route traffic by environment, application, or region
Reduces risk of accidental or malicious cross-boundary access

Core Elements of the DNS Schema

We’ll use a schema that includes these elements, in this order:

<resource>.<app>.<env>.<region>.<domain>

For example:

api.orders.prod.us-east-1.company.com
web.inventory.staging.eu-west-1.company.com
db.users.dev.us-west-2.company.com
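To make the schema concrete, here is a minimal sketch of a parser that splits a name into its elements. The `DnsName` type and `parse_name` function are illustrative names rather than part of any library, and the sketch assumes the region element may be absent (as with the "global" records described later).

```python
from typing import NamedTuple, Optional


class DnsName(NamedTuple):
    resource: str
    app: str
    env: str
    region: Optional[str]  # None when the name has no region element
    domain: str


def parse_name(fqdn: str, domain: str = "company.com") -> DnsName:
    """Split <resource>.<app>.<env>[.<region>].<domain> into its elements."""
    name = fqdn.rstrip(".").lower()
    suffix = "." + domain
    if not name.endswith(suffix):
        raise ValueError(f"{fqdn!r} is not under {domain!r}")
    labels = name[: -len(suffix)].split(".")
    if len(labels) == 4:
        resource, app, env, region = labels
    elif len(labels) == 3:
        (resource, app, env), region = labels, None
    else:
        raise ValueError(f"{fqdn!r} does not follow the schema")
    return DnsName(resource, app, env, region, domain)
```

For example, `parse_name("api.orders.prod.us-east-1.company.com").app` evaluates to `"orders"`, which is the kind of value tooling can key ownership and policy decisions on.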

1. `<resource>` — Type of Service or Role

This identifies the component’s function, e.g.:

api: public or internal APIs
web: frontend services
db: backend databases
cache: Redis, Memcached, etc.
queue: messaging components (e.g. Kafka, SQS)

Example:

api.orders.prod.us-east-1.company.com

Here, api indicates this is the Orders service’s API endpoint.

2. `<app>` — Application or System Name

This is your logical service boundary — it enables clear ownership and access control.

orders, inventory, users, payments
Often aligned with microservices or bounded contexts

Helps restrict access policies to *.orders.* or *.users.* without catching unrelated services.
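As a rough illustration of that kind of pattern, the sketch below uses Python's standard `fnmatch` module to test a name against `*.<app>.*`. The `matches_app` helper is a hypothetical name, and a real policy engine will have its own matching syntax.

```python
from fnmatch import fnmatchcase


def matches_app(fqdn: str, app: str) -> bool:
    """Return True if the name matches the glob *.<app>.*"""
    name = fqdn.rstrip(".").lower()
    return fnmatchcase(name, f"*.{app}.*")
```

Because the pattern requires a dot on both sides of the app label, a name such as api.preorders.prod.us-east-1.company.com does not match `orders`. A stricter check would compare the second label exactly rather than rely on a glob.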

3. `<env>` — Environment

A critical boundary for least privilege. Common values:

dev
test
staging
prod

You could choose to omit the <env> element for the production environment, which is a nice cosmetic enhancement for domain names presented to end users. For internal service communication, where the domain name is neither part of the branding nor typed manually into a browser address bar, I would prefer to keep the DNS schema consistent and include .prod. in the domain name.

Policies should enforce that services in dev can’t talk to prod:

*.dev.*.company.com ⛔ access to *.prod.*.company.com

Allowing development services to communicate with production can expose sensitive data, introduce instability, and increase the risk of accidental changes or outages in critical systems. Strict separation ensures that experiments or untested code in dev cannot impact the reliability, security, or compliance of production environments.
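The rule can be sketched as a small check on the environment label of the source and target names. This is only an illustration: `env_of` and `is_allowed` are hypothetical helpers, and the sketch assumes that every non-prod environment (not just dev) is blocked from calling prod.

```python
def env_of(fqdn: str, domain: str = "company.com") -> str:
    """Return the <env> label of <resource>.<app>.<env>[.<region>].<domain>."""
    name = fqdn.rstrip(".").lower()
    suffix = "." + domain
    if not name.endswith(suffix):
        raise ValueError(f"{fqdn!r} is not under {domain!r}")
    labels = name[: -len(suffix)].split(".")
    if len(labels) < 3:
        raise ValueError(f"{fqdn!r} does not follow the schema")
    return labels[2]


def is_allowed(source: str, target: str) -> bool:
    """Deny calls from any non-prod source to a prod target (assumed rule)."""
    return not (env_of(source) != "prod" and env_of(target) == "prod")
```

In practice this decision would live in a network policy, service mesh, or firewall rule rather than application code, but the naming convention is what makes the rule easy to express.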

To enforce least privilege for infrastructure configuration, configure your CI/CD pipelines so that write access to production DNS zones is only permitted from the main branch after peer review. This ensures all production changes are vetted. In contrast, non-production DNS zones (e.g. dev, test, staging) can be updated during development, enabling faster iteration and testing without risking production stability.
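A pipeline gate along these lines might look like the following sketch. The function name, the `peer_reviewed` flag, and the set of non-production environments are assumptions for illustration; most CI/CD systems express the same rule through branch protection and scoped credentials instead.

```python
NON_PROD_ENVS = {"dev", "test", "staging"}


def may_write_zone(env: str, branch: str, peer_reviewed: bool) -> bool:
    """Decide whether a pipeline run may change DNS records for an environment."""
    if env == "prod":
        # Production changes only from main, and only after review.
        return branch == "main" and peer_reviewed
    # Non-production zones can be updated from any branch.
    return env in NON_PROD_ENVS
```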

4. `<region>` — Geographic or Cloud Region

This helps with:

Traffic routing
Reducing cross-region latency
Isolating blast radius

Examples:

us-east-1
eu-west-1
ap-south-1

This element is only required for regionally hosted services. You will typically also have a ‘global’ DNS record without the region element that will then route to the regional equivalents using some performance routing or load balancing criteria.

Example:

A request querying api.orders.prod.company.com might route to api.orders.prod.us-east-1.company.com or api.orders.prod.eu-west-1.company.com based on the latency between the client and the regional datacentre.

For AWS customers, Amazon Route 53 latency-based routing makes this performance-based routing straightforward: you create multiple DNS records with the same name, each associated with an AWS Region. When a Route 53 nameserver receives a DNS query for your domain, it checks which AWS Regions you’ve created latency records for, determines which of those Regions gives the user the lowest latency, and then selects a latency record for that Region. Route 53 responds with the value from the selected record, such as the IP address of a web server.
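As a sketch of what those records look like, the function below builds a change batch in the shape accepted by the Route 53 `ChangeResourceRecordSets` API, with one latency record per Region sharing the same name. The function name is hypothetical, the IP addresses in the usage note are documentation placeholders, and no AWS call is made here.

```python
from typing import Dict


def latency_change_batch(name: str, regional_ips: Dict[str, str], ttl: int = 60) -> dict:
    """Build one latency-routed A record per Region, all with the same name."""
    changes = []
    for region, ip in sorted(regional_ips.items()):
        changes.append(
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    # SetIdentifier distinguishes records that share a name.
                    "SetIdentifier": f"{name}-{region}",
                    # Region marks the record as latency-routed.
                    "Region": region,
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip}],
                },
            }
        )
    return {"Comment": f"Latency records for {name}", "Changes": changes}
```

A call such as `latency_change_batch("api.orders.prod.company.com", {"us-east-1": "192.0.2.10", "eu-west-1": "192.0.2.20"})` yields two changes. With boto3, the result could be passed as the `ChangeBatch` argument of `change_resource_record_sets` along with your hosted zone ID.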

You could use a policy that only allows web.*.*.us-east-1 to communicate with api.*.*.us-east-1. This avoids the additional latency and egress charges of internal cross-region requests. A health check on the website could detect when the API is unavailable and disable the web DNS record for that region, automatically re-routing users to the website in another region.

Alternatively, if you have a large system and don’t want to take the whole region offline when one API is unavailable, you could have web.*.*.us-east-1 target api.*.*.* and fail over that single dependency, accepting the additional latency of cross-region requests.

Or you could simply make a feature of the website unavailable when the API it requires is unavailable in the same region.
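The three options can be compared side by side in a small sketch. The `Strategy` and `Decision` types and the `resolve_api` function are hypothetical names; the cross-region option here assumes the region-less "global" name is used as the fallback target.

```python
from enum import Enum
from typing import NamedTuple, Optional, Set


class Strategy(Enum):
    FAIL_REGION = "fail_region"          # withdraw the web record for the region
    CROSS_REGION = "cross_region"        # fall back to the API in another region
    DEGRADE_FEATURE = "degrade_feature"  # keep serving, disable the feature


class Decision(NamedTuple):
    api_target: Optional[str]   # name the website should call, if any
    withdraw_web_region: bool   # True if the web health check should fail


def resolve_api(web_region: str, healthy_api_regions: Set[str], strategy: Strategy,
                app: str = "orders", env: str = "prod",
                domain: str = "company.com") -> Decision:
    """Pick the API target for a website instance under a given strategy."""
    if web_region in healthy_api_regions:
        # Normal case: stay in-region to avoid latency and egress charges.
        return Decision(f"api.{app}.{env}.{web_region}.{domain}", False)
    if strategy is Strategy.FAIL_REGION:
        return Decision(None, True)
    if strategy is Strategy.CROSS_REGION and healthy_api_regions:
        # Use the region-less name and let DNS routing choose a region.
        return Decision(f"api.{app}.{env}.{domain}", False)
    return Decision(None, False)
```

Which strategy fits depends on how many dependencies a region hosts and how tolerant users are of added latency versus a missing feature.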

5. `<domain>` — Internal/External Organisation Domain

I would recommend segregating internal and public facing DNS with different root domains like:

company.net for internal service communication
company.com for public-facing DNS

This approach makes it easy to manage and understand internal vs external DNS structure.

Split-Horizon

Split-horizon DNS is a common alternative to using different root domains. It is a technique where the same DNS name resolves to different IP addresses depending on where the query originates (e.g. internal vs. external networks), so internal and external users are presented with different DNS records for the same domain name.

Example:

Internal users querying api.orders.prod.us-east-1.company.com might receive a private IP (e.g., 10.0.1.5).
External users querying the same name would receive a public IP (e.g., 203.0.113.42).

Why Not Favour Split-Horizon?

Complexity: Managing multiple views of DNS increases operational complexity and risk of misconfiguration.
Security: Mistakes can expose internal services to the public internet.
Auditing: Harder to audit and reason about access boundaries, since the same name can mean different things in different contexts.
Automation: Tools and scripts may behave unpredictably if they resolve different addresses based on their network location.

For these reasons, using distinct root domains for internal and external services is often simpler, safer, and easier to automate.

Least Privilege DNS Ownership with Delegated Zones

DNS zone delegation is a powerful mechanism that lets you break your DNS hierarchy into separately controlled units, where each unit (zone) can be managed independently. This makes it well suited to enforcing least privilege access.

What Is a Delegated DNS Zone?

A delegated DNS zone is a subdomain whose administrative control is delegated to a separate set of DNS servers (or teams), allowing:

Decentralised management of DNS entries
Access control per environment, team, or application
Isolation to prevent accidental or malicious changes to unrelated services

Example DNS Structure

Assume your base domain is:

company.com

You can delegate zones like:

orders.dev.us-east-1.company.com
inventory.prod.eu-west-1.company.com

Each of these is its own DNS zone and can be managed independently.

How This Enables Least Privilege

Only give each team or automation pipeline write access to their own delegated zone:

| Team | Zone Delegated | Access Level |
|------|----------------|--------------|
| Orders | orders.dev.*.company.com | Full |
| Orders | orders.prod.*.company.com | Restricted |
| Platform | *.company.com | Full |

This means:

The orders team can manage only their records in their env/region
No team can modify prod DNS for another team
Platform team manages only delegation, not records
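One way to picture this is a lookup from a record name to its delegated zone, followed by a check against the zones a principal may write to. The grant table and principal names below are hypothetical, and the sketch assumes each zone is `<app>.<env>.<region>.<domain>`, i.e. the record name with the `<resource>` label removed.

```python
from typing import Dict, Set

# Hypothetical write grants: principal -> delegated zones it may modify.
WRITE_GRANTS: Dict[str, Set[str]] = {
    "orders-dev-pipeline": {"orders.dev.us-east-1.company.com"},
    "orders-prod-pipeline": {"orders.prod.us-east-1.company.com"},
}


def delegated_zone(fqdn: str) -> str:
    """Return the zone that owns a record by dropping the <resource> label."""
    labels = fqdn.rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError(f"{fqdn!r} has no parent zone")
    return ".".join(labels[1:])


def can_write(principal: str, fqdn: str) -> bool:
    """True if the principal holds a grant on the record's delegated zone."""
    return delegated_zone(fqdn) in WRITE_GRANTS.get(principal, set())
```

Here `can_write("orders-dev-pipeline", "api.orders.prod.us-east-1.company.com")` is false, because the dev pipeline holds no grant on the prod zone.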

Implementation Steps

Delegate Zones from the Root

In the parent zone company.com, you can delegate control using NS records:

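; Delegate orders.dev.us-east-1.company.com
orders.dev.us-east-1.company.com. IN NS ns1.orders.dev.us-east-1.company.com.
orders.dev.us-east-1.company.com. IN NS ns2.orders.dev.us-east-1.company.com.
; Glue records: needed because these nameservers sit inside the delegated zone (placeholder addresses)
ns1.orders.dev.us-east-1.company.com. IN A 192.0.2.53
ns2.orders.dev.us-east-1.company.com. IN A 192.0.2.54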

These NS records tell DNS resolvers that this subdomain is handled by different nameservers.

Control Access via IAM or Role-Based Permissions

In cloud-based DNS systems (e.g. AWS Route 53, GCP Cloud DNS, Azure DNS), you can set:

IAM policies to restrict which teams or services can modify which zones
Service roles that grant limited access in CI/CD pipelines

Example (AWS IAM policy for Route 53):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets"
      ],
      "Resource": "arn:aws:route53:::hostedzone/ZONE-ID-FOR-orders.dev.us-east-1.company.com"
    }
  ]
}
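If you manage many delegated zones, generating these policies avoids copy-and-paste mistakes. The sketch below builds the same kind of policy document for a given hosted zone ID. The function name and the zone ID in the usage line are placeholders, not real identifiers.

```python
import json


def zone_write_policy(hosted_zone_id: str) -> dict:
    """Build an IAM policy document scoped to a single Route 53 hosted zone."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "route53:ChangeResourceRecordSets",
                    "route53:ListResourceRecordSets",
                ],
                "Resource": f"arn:aws:route53:::hostedzone/{hosted_zone_id}",
            }
        ],
    }


if __name__ == "__main__":
    # "Z0000000EXAMPLE" is a placeholder hosted zone ID.
    print(json.dumps(zone_write_policy("Z0000000EXAMPLE"), indent=2))
```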

Example (Azure RBAC for Azure DNS):

To grant a team access to manage only their delegated DNS zone in Azure, assign the DNS Zone Contributor role scoped to the specific DNS zone resource:

# Assign DNS Zone Contributor role to a user/group/service principal
az role assignment create \
    --assignee <user-or-service-principal-id> \
    --role "DNS Zone Contributor" \
    --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Network/dnszones/orders.dev.us-east-1.company.com"

This ensures the assignee can only manage DNS records within the specified zone, not other zones or parent domains.

Whichever provider you use, grant this permission only to the dev pipeline for orders.

Benefits of This Model

| Benefit | How It Helps |
|---------|--------------|
| Separation of Duties | Different teams manage only their own zone |
| Environment Isolation | Prevents dev/test systems from accidentally interfering with prod |
| Tighter Access Control | IAM or RBAC policies applied per zone |
| Scalability | Teams can self-manage without bottlenecks on a central DNS team |
| Blast Radius Reduction | Misconfigurations affect only a single zone |

Best Practices

Automate delegation via IaC tools like Terraform
Document each delegated zone: who owns it, who has access
Don’t use wildcard delegations
- Wildcard delegations (e.g. *.dev.company.com) can unintentionally grant broad control over large portions of your DNS hierarchy. This increases the risk of accidental or malicious changes affecting unrelated services, weakens access boundaries, and makes auditing more difficult. Always delegate explicit, narrowly scoped zones to maintain clear ownership and least privilege.
Use DNSSEC if you’re exposing public zones, so that validating resolvers can check the authenticity and integrity of responses. This helps protect against:
- DNS spoofing / cache poisoning
- Man-in-the-middle (MITM) tampering with DNS responses
- Redirection to malicious services
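Two of these practices (documented ownership and no wildcard delegations) lend themselves to an automated check. The following is a minimal sketch that assumes your delegation inventory is available as a mapping of zone name to owner; the `audit_delegations` name and the inventory format are illustrative.

```python
from typing import Dict, List, Optional


def audit_delegations(zones: Dict[str, Optional[str]]) -> List[str]:
    """Return findings for wildcard zones and zones with no documented owner."""
    findings = []
    for zone, owner in sorted(zones.items()):
        if "*" in zone:
            findings.append(f"{zone}: wildcard delegation")
        if not owner:
            findings.append(f"{zone}: no documented owner")
    return findings
```

Running a check like this in the same pipeline that applies your IaC changes keeps the inventory and the actual delegations from drifting apart.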

Glen Thomas
