Designing a Scalable DNS Schema for Large Distributed Systems
In large distributed systems, a well-designed DNS schema isn’t just about naming; it’s about controlling access, enforcing boundaries, and supporting automation.
In this post, I’ll walk through how to design a DNS naming convention that supports least privilege, environmental separation, and application scoping, with practical examples.
Why DNS Naming Conventions Matter
DNS is often your first layer of abstraction in distributed systems. A good schema:
- Communicates who owns what
- Enables least privilege access policies
- Helps tools and teams automate safely
- Makes it easier to route traffic by environment, application, or region
- Reduces risk of accidental or malicious cross-boundary access
Core Elements of the DNS Schema
We’ll use a schema that includes these elements, in this order:
<resource>.<app>.<env>.<region>.<domain>
For example:
api.orders.prod.us-east-1.company.com
web.inventory.staging.eu-west-1.company.com
db.users.dev.us-west-2.company.com
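To make the schema concrete, here is a minimal Python sketch that splits a name into its elements. The `ServiceName` type, the `parse_service_name` function and the `KNOWN_ENVS` set are hypothetical names introduced for illustration, and the region is treated as optional on the assumption that some records are global.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed environment names, for illustration; adjust to your own conventions.
KNOWN_ENVS = {"dev", "test", "staging", "prod"}


@dataclass(frozen=True)
class ServiceName:
    resource: str
    app: str
    env: str
    region: Optional[str]  # None for a global (region-less) record
    domain: str


def parse_service_name(fqdn: str, domain: str = "company.com") -> ServiceName:
    """Split <resource>.<app>.<env>[.<region>].<domain> into its elements."""
    name = fqdn.rstrip(".").lower()
    suffix = "." + domain
    if not name.endswith(suffix):
        raise ValueError(f"{fqdn!r} is not under {domain!r}")
    labels = name[: -len(suffix)].split(".")
    if len(labels) not in (3, 4):
        raise ValueError(f"{fqdn!r} does not match the schema")
    resource, app, env = labels[:3]
    if env not in KNOWN_ENVS:
        raise ValueError(f"unknown environment {env!r} in {fqdn!r}")
    region = labels[3] if len(labels) == 4 else None
    return ServiceName(resource, app, env, region, domain)
```

A check like this could run in CI to reject record names that drift from the convention before they reach a zone.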
1. <resource>: Type of Service or Role
This identifies the component’s function, e.g.:
- api: public or internal APIs
- web: frontend services
- db: backend databases
- cache: Redis, Memcached, etc.
- queue: messaging components (e.g. Kafka, SQS)
Example:
api.orders.prod.us-east-1.company.com
Here, api indicates this is the Orders service’s API endpoint.
2. <app>: Application or System Name
This is your logical service boundary — it enables clear ownership and access control.
- orders, inventory, users, payments
- Often aligned with microservices or bounded contexts
This helps you scope access policies to .orders. or .users. without catching unrelated services.
3. <env>: Environment
A critical boundary for least privilege. Common values:
- dev
- test
- staging
- prod
DNS policies should enforce that services in dev can’t talk to prod:
*.*.dev.*.company.com ⛔ access to *.*.prod.*.company.com
To enforce least privilege, configure your CI/CD pipelines so that write access to production DNS zones is only permitted from the main branch after peer review. This ensures all production changes are vetted. In contrast, non-production DNS zones (e.g. dev, test, staging) can be updated during development, enabling faster iteration and testing without risking production stability.
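The two rules above can be sketched as plain predicates. This is only an illustration under stated assumptions: `env_of`, `may_connect` and `may_write_zone` are hypothetical helpers, the branch name `main` comes from the text, and real enforcement would live in your network policy and CI/CD tooling rather than in a script like this.

```python
PROD = "prod"


def env_of(fqdn: str, domain: str = "company.com") -> str:
    """Return the <env> label of <resource>.<app>.<env>[.<region>].<domain>."""
    name = fqdn.rstrip(".").lower()
    suffix = "." + domain
    if not name.endswith(suffix):
        raise ValueError(f"{fqdn!r} is not under {domain!r}")
    labels = name[: -len(suffix)].split(".")
    if len(labels) < 3:
        raise ValueError(f"{fqdn!r} does not match the schema")
    return labels[2]


def may_connect(source: str, destination: str) -> bool:
    """Deny any non-production source that targets a production name."""
    return not (env_of(destination) == PROD and env_of(source) != PROD)


def may_write_zone(env: str, branch: str, reviewed: bool) -> bool:
    """Production zones accept writes only from a reviewed main branch."""
    if env == PROD:
        return branch == "main" and reviewed
    return True  # dev, test and staging can be updated during development
```

A pipeline step could call `may_write_zone` with the target environment, the current branch and the review status, and fail the job when it returns `False`.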
4. <region>: Geographic or Cloud Region
This helps with:
- Traffic routing
- Reducing cross-region latency
- Isolating blast radius
Examples:
- us-east-1
- eu-west-1
- ap-south-1
This element is only required for regionally hosted services. You will typically also have a global DNS record, without the region element, that routes to the regional equivalents based on performance or load-balancing criteria.
Example:
A request querying api.orders.prod.company.com might route to api.orders.prod.us-east-1.company.com or api.orders.prod.eu-west-1.company.com, based on the latency between the client and the regional datacentre.
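As a rough illustration of that routing decision, the sketch below maps a global name to the regional name with the lowest latency. The function `pick_regional_name` and the latency figures are hypothetical; in practice a managed DNS service makes this decision for you using its own measurements.

```python
from typing import Dict


def pick_regional_name(
    global_name: str,
    latencies_ms: Dict[str, float],
    domain: str = "company.com",
) -> str:
    """Insert the lowest-latency region between <env> and <domain>."""
    name = global_name.rstrip(".").lower()
    suffix = "." + domain
    if not name.endswith(suffix):
        raise ValueError(f"{global_name!r} is not under {domain!r}")
    prefix = name[: -len(suffix)]  # e.g. "api.orders.prod"
    region = min(latencies_ms, key=latencies_ms.get)
    return f"{prefix}.{region}.{domain}"


# Made-up latency measurements from one client to each region.
target = pick_regional_name(
    "api.orders.prod.company.com",
    {"us-east-1": 82.0, "eu-west-1": 14.5},
)
```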
You could use policy enforcement to allow web.*.*.us-east-1 to talk only to api.*.*.us-east-1. This avoids the additional latency and egress charges of cross-region requests.
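One way to evaluate patterns like these is to match label by label, so that `*` stands for exactly one DNS label rather than any run of characters. The sketch below assumes that interpretation; `matches`, `is_allowed` and the `ALLOWED` list are hypothetical names, and the patterns carry the full domain for clarity.

```python
def matches(pattern: str, name: str) -> bool:
    """Label-wise match: '*' stands for exactly one DNS label."""
    p_labels = pattern.lower().split(".")
    n_labels = name.rstrip(".").lower().split(".")
    if len(p_labels) != len(n_labels):
        return False
    return all(p == "*" or p == n for p, n in zip(p_labels, n_labels))


# Hypothetical allow-list of (source pattern, destination pattern) pairs.
ALLOWED = [
    ("web.*.*.us-east-1.company.com", "api.*.*.us-east-1.company.com"),
]


def is_allowed(source: str, destination: str) -> bool:
    """True when some rule permits traffic from source to destination."""
    return any(matches(s, source) and matches(d, destination) for s, d in ALLOWED)
```

Matching per label avoids a common pitfall of shell-style globbing, where `*` would also swallow the dots between labels and match more names than intended.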
5. <domain>: Internal/External Organisation Domain
I would recommend segregating internal and public-facing DNS under different root domains, such as:
- company.net for internal service communication
- company.com for public-facing DNS
This approach makes it easy to manage and understand internal vs external DNS structure.
Split-Horizon
Split-horizon DNS is a common alternative to using different root domains. It is a technique where the same DNS name resolves to different IP addresses depending on where the query originates (e.g. internal vs. external networks), so internal and external users can be given different records for the same domain name.
Example:
- Internal users querying api.orders.prod.us-east-1.company.com might receive a private IP (e.g., 10.0.1.5).
- External users querying the same name would receive a public IP (e.g., 203.0.113.42).
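The behaviour in this example can be modelled in a few lines of Python. This is a toy model, not how any particular DNS server implements views: the `INTERNAL_NETWORKS` range, the `VIEWS` table and both functions are assumptions made for illustration.

```python
import ipaddress
from typing import Optional

# Hypothetical internal range and per-view answers, for illustration only.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
VIEWS = {
    "internal": {"api.orders.prod.us-east-1.company.com": "10.0.1.5"},
    "external": {"api.orders.prod.us-east-1.company.com": "203.0.113.42"},
}


def view_for(client_ip: str) -> str:
    """Choose the view from the network the query came from."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in INTERNAL_NETWORKS):
        return "internal"
    return "external"


def resolve(name: str, client_ip: str) -> Optional[str]:
    """Answer the same name differently depending on the client's view."""
    return VIEWS[view_for(client_ip)].get(name.rstrip(".").lower())
```

Note that the answer depends on an input the caller never sees, the source address, which is what makes split-horizon setups harder to reason about.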
Why Not Favour Split-Horizon?
- Complexity: Managing multiple views of DNS increases operational complexity and risk of misconfiguration.
- Security: Mistakes can expose internal services to the public internet.
- Auditing: Harder to audit and reason about access boundaries, since the same name can mean different things in different contexts.
- Automation: Tools and scripts may behave unpredictably if they resolve different addresses based on their network location.
For these reasons, using distinct root domains for internal and external services is often simpler, safer, and easier to automate.
Least Privilege DNS Access with Delegated Zones
DNS zone delegation is a powerful mechanism that lets you break your DNS hierarchy into separately controlled units, where each unit (zone) can be managed independently. That makes it well suited to enforcing least privilege access.
What Is a Delegated DNS Zone?
A delegated DNS zone is a subdomain whose administrative control is delegated to a separate set of DNS servers (or teams), allowing:
- Decentralised management of DNS entries
- Access control per environment, team, or application
- Isolation to prevent accidental or malicious changes to unrelated services
Example DNS Structure
Assume your base domain is:
company.com
You can delegate zones like:
orders.dev.us-east-1.company.com
inventory.prod.eu-west-1.company.com
Each of these is its own DNS zone and can be managed independently.
How This Enables Least Privilege
Only give each team or automation pipeline write access to their own delegated zone:
Team | Zone Delegated | Access Level |
---|---|---|
Orders | orders.dev.*.company.com | Full |
Orders | orders.prod.*.company.com | Restricted |
Platform | *.company.com | Full |
This means:
- The orders team can manage only their records in their env/region
- No team can modify prod DNS for another team
- Platform team manages only delegation, not records
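The ownership rule above can be expressed as a longest-suffix lookup: find the most specific delegated zone that contains a record, then check who owns it. The `ZONE_OWNERS` inventory and the helper names are hypothetical; cloud IAM would normally enforce this, and a check like this is more useful as a pre-merge guard.

```python
from typing import Optional

# Hypothetical inventory mapping each delegated zone to its owning team.
ZONE_OWNERS = {
    "orders.dev.us-east-1.company.com": "orders",
    "orders.prod.us-east-1.company.com": "orders",
    "inventory.prod.eu-west-1.company.com": "inventory",
}


def zone_for(record: str) -> Optional[str]:
    """Return the most specific delegated zone containing the record."""
    name = record.rstrip(".").lower()
    candidates = [z for z in ZONE_OWNERS if name == z or name.endswith("." + z)]
    return max(candidates, key=len) if candidates else None


def may_write(team: str, record: str) -> bool:
    """A team may write a record only inside a zone it owns."""
    zone = zone_for(record)
    return zone is not None and ZONE_OWNERS[zone] == team
```

Records that fall outside every delegated zone are denied by default, which keeps the check aligned with least privilege.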
Implementation Steps
Delegate Zones from the Root
In the parent zone company.com, you delegate control like this:
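; Delegate orders.dev.us-east-1.company.com
orders.dev.us-east-1.company.com. IN NS ns1.orders.dev.us-east-1.company.com.
orders.dev.us-east-1.company.com. IN NS ns2.orders.dev.us-east-1.company.com.
; Glue records, needed because these nameservers sit inside the delegated zone (placeholder addresses)
ns1.orders.dev.us-east-1.company.com. IN A 192.0.2.53
ns2.orders.dev.us-east-1.company.com. IN A 192.0.2.54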
These NS records tell DNS resolvers that this subdomain is handled by different nameservers.
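If you maintain many delegations, generating these lines from a list is less error-prone than writing them by hand. The helper below is a small sketch with a hypothetical name; it only renders NS lines in zone-file syntax and does not talk to any DNS provider.

```python
from typing import List


def delegation_records(zone: str, nameservers: List[str]) -> List[str]:
    """Render one NS line per nameserver for an explicitly named zone."""
    if "*" in zone:
        raise ValueError("wildcard delegations are not supported")
    if not nameservers:
        raise ValueError("a delegation needs at least one nameserver")
    owner = zone.rstrip(".") + "."  # fully qualified, with trailing dot
    return [f"{owner} IN NS {ns.rstrip('.')}." for ns in nameservers]


records = delegation_records(
    "orders.dev.us-east-1.company.com",
    [
        "ns1.orders.dev.us-east-1.company.com",
        "ns2.orders.dev.us-east-1.company.com",
    ],
)
```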
Control Access via IAM or Role-Based Permissions
In cloud-based DNS systems (e.g. AWS Route 53, GCP Cloud DNS, Azure DNS), you can set:
- IAM policies to restrict which teams or services can modify which zones
- Service roles that grant limited access in CI/CD pipelines
Example (AWS IAM policy for Route 53):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets"
      ],
      "Resource": "arn:aws:route53:::hostedzone/ZONE-ID-FOR-orders.dev.us-east-1.company.com"
    }
  ]
}
Only the dev pipeline for orders gets this permission.
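With one policy per zone, it can help to generate the documents from the hosted zone ID rather than copy and paste them. The following sketch builds a policy dictionary with the same two actions; `zone_write_policy` is a hypothetical helper and the zone ID in the usage line is a made-up placeholder.

```python
import json
from typing import Any, Dict


def zone_write_policy(hosted_zone_id: str) -> Dict[str, Any]:
    """Build an IAM policy document scoped to a single hosted zone."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "route53:ChangeResourceRecordSets",
                    "route53:ListResourceRecordSets",
                ],
                "Resource": f"arn:aws:route53:::hostedzone/{hosted_zone_id}",
            }
        ],
    }


# Placeholder zone ID, for illustration only.
policy_json = json.dumps(zone_write_policy("Z0123456789EXAMPLE"), indent=2)
```

The resulting JSON can be passed to whatever IaC tool you use to attach the policy to the pipeline's role.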
Benefits of This Model
Benefit | How It Helps |
---|---|
Separation of Duties | Different teams manage only their own zone |
Environment Isolation | Prevents dev/test systems from accidentally interfering with prod |
Tighter Access Control | IAM or RBAC policies applied per zone |
Scalability | Teams can self-manage without bottlenecks on a central DNS team |
Blast Radius Reduction | Misconfigurations affect only a single zone |
Best Practices
- Automate delegation via IaC tools like Terraform
- Document each delegated zone: who owns it, who has access
- Don’t use wildcard delegations. Wildcard delegations (e.g. *.dev.company.com) can unintentionally grant broad control over large portions of your DNS hierarchy. This increases the risk of accidental or malicious changes affecting unrelated services, weakens access boundaries, and makes auditing more difficult. Always delegate explicit, narrowly scoped zones to maintain clear ownership and least privilege.
- Use DNSSEC to validate integrity if you’re exposing public zones. This helps protect against:
- DNS spoofing / cache poisoning
- Man-in-the-middle (MITM) tampering with DNS responses
- Redirection to malicious services
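The documentation and wildcard practices above lend themselves to a simple automated check. The sketch below lints a hypothetical inventory that maps each delegated zone to its documented owner; the function name and the inventory shape are assumptions, and it could run alongside your IaC plan step.

```python
from typing import Dict, List


def lint_delegations(delegations: Dict[str, str]) -> List[str]:
    """Return the problems found in a {zone: owner} inventory."""
    problems = []
    for zone, owner in delegations.items():
        if "*" in zone:
            problems.append(f"{zone}: wildcard delegations are not allowed")
        if not owner:
            problems.append(f"{zone}: no documented owner")
    return problems


# Example inventory with one valid entry and one that breaks both rules.
issues = lint_delegations(
    {
        "orders.dev.us-east-1.company.com": "orders",
        "*.dev.company.com": "",
    }
)
```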