Introduction

This post explains how our Authoritative DNS (AuthDNS) system works. It is a moderately technical document, and its target audience is the enthusiast interested in how we run a DNS implementation at a medium-sized ISP. If you are looking for an explanation of how DNS works in general, http://www.howstuffworks.com/dns.htm is a good starting point.

Backstory

Our current AuthDNS system has run pretty much untouched since about 1998. The system has been robust and has served upwards of a million records. While it has served us well over the years, the design has a number of scalability issues and is not really optimal for use today. As time goes on, more of our tech-savvy users are noticing its shortfalls.

We are aware that our AuthDNS system is not the best implementation, and part of the reason it has remained unchanged is that it is reliable. It has also had many changes made to it over the years, and as a result the implementation is quite complex. One example of this is that our AuthDNS system also answers recursive queries, i.e. it will act as a caching DNS service. This is a hangover from the time when it was actually our caching DNS service as well. For several years the addresses of these machines have no longer been given out via RADIUS, and we have posted in the past that they would be removed, so this functionality will be dropped in the new implementation.

In addition, reverse DNS is largely automated and fixed. Any changes to the standard records are made manually by technical support, which is not ideal and leaves us open to issues such as reverse DNS persisting on an IP address after a customer has left (there is no automated cleanup process for the manual system).

Current Implementation

Your DNS records are stored in our Core database (the main database system we use to store customer data).
These records are changed in the database when an account is created or destroyed, when you change them via the portal, or when support staff change them through Workplace (our internal CRM software). Similarly, reverse DNS records are automatically generated via the Workplace system when an account is created or destroyed. There is also an override file, maintained by our support staff, that allows for custom reverse DNS records.

A number of scripts running on our backend systems take these records from the database and write them out in BIND zonefile format. These are kept as text files on a shared storage system, and are available over NFS to our AuthDNS servers.

We have a pair of machines, authdns01 and authdns02, spread over two sites in Sheffield. They run a modern version of BIND, a popular DNS implementation. The setup is quite complex but involves running a DNS master and slave on each machine (on different IP addresses).

There is a set of scripts running on the AuthDNS machines. These take the files from the shared storage, "localise" them (customise them with machine-specific information), and copy them into the correct place to be loaded by the DNS daemon. Every 4 hours a script runs that reloads the zones that the DNS system already knows about (this picks up changes made via the portal or Workplace). Overnight, the DNS daemon refreshes its configuration entirely, which picks up new zones that have been added since the last refresh and drops ones that have been removed.

A set of load balancers sits in front of the AuthDNS machines, and these hold the VIP (virtual IP) addresses that you may know of today (e.g. 188.8.131.52 and 184.108.40.206). They accept requests on TCP and UDP port 53 and, based on a load-balancing algorithm, forward the DNS requests to the various DNS daemons running on the AuthDNS machines. These machines then answer the queries.
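
To make the export step described above a little more concrete (database records written out as BIND zonefiles), here is a minimal sketch in Python. It is not our actual script: the `Record` shape, the `render_zone` function, the SOA timer values, and the example.com data are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One DNS record as it might come out of a database row (hypothetical shape)."""
    name: str       # owner name relative to the zone, "@" for the zone apex
    rtype: str      # record type, e.g. "A", "MX", "NS"
    value: str      # record data, already in zonefile presentation format
    ttl: int = 3600


def render_zone(origin: str, primary_ns: str, contact: str, serial: int,
                records: list[Record]) -> str:
    """Render a zone as BIND zonefile text (illustrative timer values)."""
    lines = [
        f"$ORIGIN {origin}",
        "$TTL 3600",
        # SOA fields in order: serial, refresh, retry, expire, negative-cache TTL.
        f"@ IN SOA {primary_ns} {contact} ( {serial} 3600 600 604800 3600 )",
    ]
    for rec in records:
        lines.append(f"{rec.name} {rec.ttl} IN {rec.rtype} {rec.value}")
    return "\n".join(lines) + "\n"


# Example data only; a real export would read these rows from the database.
zone_text = render_zone(
    origin="example.com.",
    primary_ns="ns1.example.com.",
    contact="hostmaster.example.com.",
    serial=2024010101,
    records=[
        Record("@", "NS", "ns1.example.com."),
        Record("www", "A", "192.0.2.10"),
        Record("@", "MX", "10 mail.example.com."),
    ],
)
print(zone_text)
```

The point of the sketch is that every change means regenerating and reloading a text file per zone, which is where the delays described in this section come from.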
If the query is for a zone that the server is authoritative for, the server replies directly with the requested information. If the query is for a zone that the server is not authoritative for, it retrieves the records recursively before returning them to the customer.

Due to the size of some of these zonefiles, it takes the DNS daemon a long time to reload its configuration (BIND loads every zone into memory before starting to answer queries).

New Implementation

The new implementation moves from BIND to PowerDNS (http://www.powerdns.com). PowerDNS offers a database-backed AuthDNS system. The software is modern and under active development. Because we have to interface with the existing systems, the interaction with the Core database will remain the same. At a later date our development department may take advantage of the extra functionality available.

The system comprises four DNS frontends (called adnsfeXX, running PowerDNS) and four DNS backends (called adnsbeXX, running MySQL). These are again split across two sites in Sheffield (though we are considering moving some of the system to London).

The DNS frontends, as before, sit behind two pairs of load balancers (one pair in each site). The DNS software will answer authoritative queries only (non-authoritative queries will be sent a SERVFAIL message). As queries come into the adnsfe machines, the answers are stored in memory for speed. These cached answers are periodically checked against the backend for changes. If an adnsfe machine does not have a zone in its cache, it queries the backend machines for the answer. If they know, the records are returned.

The DNS backends sit behind two pairs of load balancers (one pair in each site). These machines answer questions from the adnsfe machines via MySQL. They can safely be kept behind a load balancer as they are never written to by the adnsfe machines. The databases are set up in a replication chain.
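
The frontend lookup path described in this section (answer from memory where possible, otherwise ask a backend, and refuse anything we are not authoritative for) can be sketched roughly as below. This is an illustration of the idea only, not how PowerDNS implements its caches: the `CachingFrontend` class, the `max_age` expiry, and the in-memory `ZONE_DATA` standing in for MySQL are all assumptions, and the sketch simplifies by treating any name the backend does not hold as non-authoritative.

```python
import time
from typing import Callable, Optional

# A backend lookup returns the record data, or None if it does not hold the name.
Lookup = Callable[[str, str], Optional[list[str]]]


class CachingFrontend:
    """Toy frontend: serve from memory, fall back to a read-only backend."""

    def __init__(self, backend: Lookup, max_age: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._backend = backend
        self._max_age = max_age      # seconds before an entry is re-checked
        self._clock = clock
        self._cache: dict[tuple[str, str], tuple[float, list[str]]] = {}

    def resolve(self, name: str, rtype: str) -> tuple[str, list[str]]:
        key = (name.lower(), rtype.upper())
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._max_age:
            return "NOERROR", hit[1]             # fresh cached answer
        answer = self._backend(*key)             # read-only backend query
        if answer is None:
            return "SERVFAIL", []                # not ours: never recurse
        self._cache[key] = (now, answer)
        return "NOERROR", answer


# Hypothetical stand-in for the MySQL backends.
ZONE_DATA = {("www.example.com.", "A"): ["192.0.2.10"]}
backend_calls = []


def backend(name: str, rtype: str) -> Optional[list[str]]:
    backend_calls.append((name, rtype))
    return ZONE_DATA.get((name, rtype))


frontend = CachingFrontend(backend)
print(frontend.resolve("www.example.com.", "A"))   # fetched from the backend
print(frontend.resolve("www.example.com.", "A"))   # served from the cache
print(frontend.resolve("www.example.org.", "A"))   # not authoritative
print(len(backend_calls), "backend lookups")
```

Because the frontends only ever read from the backends, any backend behind the load balancer can serve any lookup, which is what makes the pooled arrangement safe.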
One of the databases will be the master, and will receive writes from admin scripts (and eventually, possibly, Workplace and the portals). These changes are then replicated out to the other databases, making updates much quicker than in the previous system. Reverse DNS lives in the database alongside the normal records, so updating these internally is far easier, and creating a system to manage rDNS records (and indeed all of the DNS records) will be much simpler.

Finally, all interaction with the database is done through stored procedures to ensure a consistent interface to the backend. We would be able to move the underlying data structures around as long as we keep the interface consistent; this should buy the setup some longevity if our administrative systems change over time.

The adnsfe machines begin to answer queries the second they are started, which is preferable to BIND's behaviour. We can also scale the system sideways and add more machines into the load-balanced pools if we need to.

Conclusion

The new AuthDNS system should provide us with scalability for at least the next few years, and should have enough flexibility to provide new features as time goes on. It puts in place a framework for our Application Development team to utilise DNS in their Portal applications.

If you have any feedback, feel free to send me a PM or email. I'm happy to discuss most things Internet or PlusNet related.

Bye for now,
Gricey.