
Tracert's to ntp.plus.net

Anotherone
Champion
Posts: 19,107
Thanks: 457
Fixes: 21
Registered: ‎31-08-2007

Re: Tracert's to ntp.plus.net

Thanks for trying to explain a bit more Bob, I understand what you are saying.
I'm now on pcl-ag04, with identical routing to pcl-ag01. If I'm interpreting the tracerts correctly the load balancer is at hop 6 except for 212.159.13.50 where it's at hop 7 because of the extra hop.
So in the example I quoted about xe-10-0-0.ptw-cr01.plus.net falling over, what you seem to be saying is that it has sufficient resiliency built in, making it less likely to fall over completely, but should it do so, traffic would somehow (beyond your knowledge) go to another physical unit, cr02, if I've understood that correctly.
Which is fine, but I suppose the underlying query, which none of this answers is why many of us see random periodic fails in DNS lookups - it's not errors on my connection to the exchange before someone leaps on that one!

npr
Pro
Posts: 1,898
Thanks: 119
Fixes: 9
Registered: ‎21-01-2013

Re: Tracert's to ntp.plus.net

Quote from: Bob
I think BGP or something similar is used to route you to the closest cluster which is the basis for the question jelv asked

Thanks for the explanation Bob, it all sounds very good and from this you would expect the Plusnet name servers to be the fastest around for PN users. Unfortunately my tests using DNS Benchmark frequently show PN's name servers to be the slowest for cached lookups.
Is there a reason for this and are there any plans to improve?

Townman
Superuser
Posts: 23,039
Thanks: 9,622
Fixes: 160
Registered: ‎22-08-2007

Re: Tracert's to ntp.plus.net

Quote from: Bob
The IPs you see in traceroutes are not those of the physical caching DNS servers themselves. They are virtual IPs. The physical caching servers are clustered across two sites for resiliency.

Bob,
Thank you for the clear explanation which has not been so well given in previous discussions on this topic.
Quote from: Bob
I'm going to bow out now before I get out of my depth, however if this thread is a question of whether or not we've sufficient network redundancy then I don't think people have anything to worry about Wink

...and take the rest of us with you.  I suggest that the questions have arisen out of numerous recent issues where DNS name resolution has repeatedly failed completely or in part or (as noted by NPR) not been particularly fast.
On that subject I'm sat at home (rather than on my business site) and am finding some issues with DNS resolution... but that might be related to the connectivity issues I'm waiting for BTOR to come and investigate.
Kevin

Superusers are not staff, but they do have a direct line of communication into the business in order to raise issues, concerns and feedback from the community.

npr
Pro
Posts: 1,898
Thanks: 119
Fixes: 9
Registered: ‎21-01-2013

Re: Tracert's to ntp.plus.net

Quote from: townman

On that subject I'm sat at home (rather than on my business site) and am finding some issues with DNS resolution... but that might be related to the connectivity issues I'm waiting for BTOR to come and investigate.
Kevin

You could run your own caching DNS resolver; I gave up on all ISP DNS resolvers years ago.  Cheesy
I can recommend Unbound DNS: it is very easy to install and has been 100% reliable during the years I've run it.
Details in the usual place Wink

bobpullen
Community Gaffer
Posts: 16,887
Thanks: 4,979
Fixes: 316
Registered: ‎04-04-2007

Re: Tracert's to ntp.plus.net

Quote from: Anotherone
So in the example I quoted about xe-10-0-0.ptw-cr01.plus.net falling over, what you seem to be saying is that it has sufficient resiliency built in, making it less likely to fall over completely, but should it do so, traffic would somehow (beyond your knowledge) go to another physical unit, cr02, if I've understood that correctly.

Yes, you've understood correctly Smiley
Quote from: Anotherone
Which is fine, but I suppose the underlying query, which none of this answers is why many of us see random periodic fails in DNS lookups.

Do you? Huh
I certainly wasn't aware of any widespread complaints and I've not personally noticed any problems myself. Whilst other servers might be quicker responding, ours should rarely fail unless there's a service wide problem of some description.
Quote from: npr
Quote from: Bob
I think BGP or something similar is used to route you to the closest cluster which is the basis for the question jelv asked

Thanks for the explanation Bob, it all sounds very good and from this you would expect the Plusnet name servers to be the fastest around for PN users. Unfortunately my tests using DNS Benchmark frequently show PN's name servers to be the slowest for cached lookups.
Is there a reason for this and are there any plans to improve?

That's the slowest of the fastest percentile though I'm guessing? I very much doubt our servers are slower to respond than the majority of publicly accessible resolvers. Using a default namebench install I get the following which shows our resolvers to be up there with the quickest of them.

I've just downloaded and optimised DNS Benchmark too. Out of thousands of resolvers it claims to have selected the fastest 50 or so. Granted, if I specify our server address by comparison then it doesn't look good, but Google doesn't seem to fare much better either. Both are in the bottom five:

Location isn't everything and there are clearly other factors that need considering, such as the architecture of the physical boxes themselves and the software being used. Whilst the round trip time to our caches should definitely be quicker than to other DNS servers, that doesn't guarantee the quickest average DNS response time.
In summary, our caching DNS servers should be perfectly fine to use. There are reasons you may want to specify others though, e.g. for resiliency and if you roam (I think our servers are locked to our IP ranges). You might even want to replace your primary server with a non-Plusnet server. Personally though I don't think the milliseconds it purportedly saves make much of a difference Wink
To answer your other question, I don't think there are any immediate plans to overhaul the DNS platform.

Bob Pullen
Plusnet Product Team
If I've been helpful then please give thanks ⤵

Anotherone
Champion
Posts: 19,107
Thanks: 457
Fixes: 21
Registered: ‎31-08-2007

Re: Tracert's to ntp.plus.net

Quote from: Bob
Quote from: Anotherone
Which is fine, but I suppose the underlying query, which none of this answers is why many of us see random periodic fails in DNS lookups.

Do you? Huh
I certainly wasn't aware of any widespread complaints and I've not personally noticed any problems myself. Whilst other servers might be quicker responding, ours should rarely fail unless there's a service wide problem of some description.

I'm not referring specifically to anything that's just cropped up. There have been many threads over lengthy periods where we've experienced these things, and you have responded to some of them IIRC. Using an alternative DNS server always remedies the problem, so it's difficult to draw any other conclusion; however, I am (as always) prepared to be educated on these things  Smiley

ejs
Aspiring Hero
Posts: 5,442
Thanks: 631
Fixes: 25
Registered: ‎10-06-2010

Re: Tracert's to ntp.plus.net

There is also the issue that different DNS servers return different IP addresses for things like Akamai-hosted websites. Not that this appears to help; in fact the opposite: more likely I'll be stuck "Waiting for i.microsoft.com...", or an image on the BBC news site doesn't load, etc.
I have noticed a few issues in the past with Plusnet's DNS servers. Past issues were usually down to how Plusnet's DNS servers appeared to handle certain misconfigured domains differently to other DNS servers, which managed to resolve the domain.
The most recent DNS issue was again an inability of Plusnet's DNS to resolve certain domains, but that time it was very unlikely that the ripe.net and isc.org domains were misconfigured somehow. I noticed that Plusnet's DNS servers now don't respond to type ANY queries, e.g. "dig @212.159.6.9 plus.net ANY", but I can't remember if that was always the case or not.

npr
Pro
Posts: 1,898
Thanks: 119
Fixes: 9
Registered: ‎21-01-2013

Re: Tracert's to ntp.plus.net

Quote from: Bob

That's the slowest of the fastest percentile though I'm guessing? I

You could put it that way Wink
IMO PN's servers should easily be the fastest for cached lookups, they have the home advantage.
I can ping ntp.plus.net in under 16ms yet PN's cached name lookups take on average 30 to 40 ms -- there's a lot of time being lost somewhere.
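One way to see where the time goes is to time the raw UDP exchange yourself rather than rely on a benchmark tool. The sketch below is only an illustration: `build_query` and `time_lookup` are made-up helper names, it asks for an A record only, and it does no retries and no validation of the response.

```python
import socket
import struct
import time


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (only "recursion desired" set), 1 question,
    # 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def time_lookup(server: str, name: str, timeout: float = 2.0) -> float:
    """Send one UDP query to `server` and return the round trip in milliseconds."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, 53))
        sock.recvfrom(512)  # the reply is not inspected, only timed
        return (time.perf_counter() - start) * 1000.0
```

Calling `time_lookup("212.159.6.9", "www.kernel.org")` a few times in a row and comparing the figures with the ping time should give a rough idea of how much of the 30 to 40 ms is spent inside the resolver rather than on the wire.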
Quote
"In summary, our caching DNS servers should be perfectly fine to use."

I don't doubt that, I'm just questioning whether they should / could be faster.

Anotherone
Champion
Posts: 19,107
Thanks: 457
Fixes: 21
Registered: ‎31-08-2007

Re: Tracert's to ntp.plus.net

pcl-ag06 now
>tracert 212.159.13.49
Tracing route to cdns01.plus.net [212.159.13.49]
over a maximum of 30 hops:
 1    <1 ms    <1 ms    <1 ms  dsldevice.lan [192.168.1.254]
 2    39 ms    38 ms    37 ms  lo0-central10.pcl-ag06.plus.net [195.166.128.187]
 3    38 ms    37 ms    38 ms  link5-central10.pcl-gw01.plus.net [84.93.249.168]
 4    37 ms    38 ms    37 ms  176.core.access.plus.net [212.159.0.176]
 5    37 ms    37 ms    36 ms  po2.pcl-gw01.plus.net [195.166.129.41]
 6    38 ms    38 ms    38 ms  vl63.pcl-lb02.plus.net [212.159.2.253]
 7    39 ms    37 ms    37 ms  cdns01.plus.net [212.159.13.49]
Trace complete.
>tracert 212.159.13.50
Tracing route to cdns02.plus.net [212.159.13.50]
over a maximum of 30 hops:
 1    <1 ms    <1 ms    <1 ms  dsldevice.lan [192.168.1.254]
 2    38 ms    37 ms    38 ms  lo0-central10.pcl-ag06.plus.net [195.166.128.187]
 3    37 ms    39 ms    37 ms  link11-central10.pcl-gw01.plus.net [84.93.249.180]
 4    37 ms    36 ms    37 ms  176.core.access.plus.net [212.159.0.176]
 5    38 ms    37 ms    37 ms  ae1.ptw-cr01.plus.net [195.166.129.0]
 6    38 ms    38 ms    37 ms  te9-4.ptn-gw01.plus.net [195.166.129.33]
 7    37 ms    37 ms    37 ms  vl55.ptn-lb02.plus.net [212.159.2.125]
 8    37 ms    37 ms    37 ms  cdns02.plus.net [212.159.13.50]
Trace complete.
Tracerts to 212.159.6.9 & 10: hops 4, 5 & 6 are identical to those for 212.159.13.49.
So what is it about the pcl gateways so far that gives 8 hops to 212.159.13.50, whereas on ptw-ag04 there were 7 hops and hops 4, 5 & 6 were identical to the other cdns, whereas the pcl ones are different?
Edit: ptn-ag02, same story as ptw-ag04 and all are 7 hops, also ptw-ag01 & ptn-ag03
        pcl-ag05, same story as the other pcl's 8 hops to 212.159.13.50  also pcl-ag07 & pcl-ag04
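For anyone wanting to compare traces from different gateways without eyeballing them, here is a rough Python sketch. It assumes Windows tracert output in the format pasted above, and it assumes (as guessed earlier in the thread) that a hostname containing "-lb" followed by digits is a load balancer. The function names are invented for the example, and hop 1 (the home router) is left out of the sample data for brevity.

```python
import re

# Captures the hop number and the trailing "hostname [ip]" of a tracert line.
HOP_RE = re.compile(r"^\s*(\d+)\s+.*?\s(\S+)\s+\[([\d.]+)\]\s*$")

TRACE_49 = """\
 2    39 ms    38 ms    37 ms  lo0-central10.pcl-ag06.plus.net [195.166.128.187]
 3    38 ms    37 ms    38 ms  link5-central10.pcl-gw01.plus.net [84.93.249.168]
 4    37 ms    38 ms    37 ms  176.core.access.plus.net [212.159.0.176]
 5    37 ms    37 ms    36 ms  po2.pcl-gw01.plus.net [195.166.129.41]
 6    38 ms    38 ms    38 ms  vl63.pcl-lb02.plus.net [212.159.2.253]
 7    39 ms    37 ms    37 ms  cdns01.plus.net [212.159.13.49]
"""

TRACE_50 = """\
 2    38 ms    37 ms    38 ms  lo0-central10.pcl-ag06.plus.net [195.166.128.187]
 3    37 ms    39 ms    37 ms  link11-central10.pcl-gw01.plus.net [84.93.249.180]
 4    37 ms    36 ms    37 ms  176.core.access.plus.net [212.159.0.176]
 5    38 ms    37 ms    37 ms  ae1.ptw-cr01.plus.net [195.166.129.0]
 6    38 ms    38 ms    37 ms  te9-4.ptn-gw01.plus.net [195.166.129.33]
 7    37 ms    37 ms    37 ms  vl55.ptn-lb02.plus.net [212.159.2.125]
 8    37 ms    37 ms    37 ms  cdns02.plus.net [212.159.13.50]
"""


def parse_tracert(text):
    """Return a list of (hop number, hostname, ip) for every hop line found."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), match.group(3)))
    return hops


def load_balancer_hop(hops):
    """Return the first hop whose name looks like a load balancer, else None."""
    for number, host, _ip in hops:
        if re.search(r"-lb\d+", host):
            return number
    return None


def differing_hops(a, b):
    """Return hop numbers where the hostnames differ or only one trace has a hop."""
    hosts_a = {number: host for number, host, _ip in a}
    hosts_b = {number: host for number, host, _ip in b}
    return sorted(
        n for n in hosts_a.keys() | hosts_b.keys() if hosts_a.get(n) != hosts_b.get(n)
    )


trace_a = parse_tracert(TRACE_49)
trace_b = parse_tracert(TRACE_50)
print(load_balancer_hop(trace_a), load_balancer_hop(trace_b))  # prints: 6 7
print(differing_hops(trace_a, trace_b))  # prints: [3, 5, 6, 7, 8]
```

On the two traces above this puts the presumed load balancer at hop 6 for 212.159.13.49 and hop 7 for 212.159.13.50, and shows that the routes already diverge at hop 5 (pcl-gw01 versus ptw-cr01).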
ejs
Aspiring Hero
Posts: 5,442
Thanks: 631
Fixes: 25
Registered: ‎10-06-2010

Re: Tracert's to ntp.plus.net

Regarding DNS performance, something I noticed yesterday evening and repeated today is that you seem to get more uncached results from Plusnet DNS servers. I've attached an example graph of repeatedly looking up www.kernel.org at 1 second intervals 90 times on Plusnet 212.159.6.9 and OpenDNS 208.67.222.222. Plusnet's uncached result was faster, Plusnet's cached result was faster, but there were 12 uncached answers from Plusnet vs only 1 uncached answer from OpenDNS.

npr
Pro
Posts: 1,898
Thanks: 119
Fixes: 9
Registered: ‎21-01-2013

Re: Tracert's to ntp.plus.net

@ejs,
You may be on to something there. I've just repeatedly used "dig  kernel.org @212.159.6.9" (Plusnet's DNS resolver).
The TTL starts at 600 but doesn't count down properly with repeat tests.
E.g. 3 tests gave a TTL of 600, then:
546
557
546
539
536
417
573
600
Also the query time suggests some were not coming from the cache.
Repeating the test using OpenDNS and Google DNS, the TTL counted down with each repeat test as expected.
IMO there's something strange about the cache on PN's DNS resolver, it's as though each look up is to a different resolver.
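The countdown check can be made mechanical. With a single cache the TTL should only fall between repeat queries until the record expires, so any rise while the previous TTL was still well above zero hints that a different cache answered. A minimal sketch using the figures above (the function name `ttl_rises` and the `expiry_slack` parameter are just for illustration):

```python
def ttl_rises(ttls, expiry_slack=1):
    """Return (previous, current) pairs where the TTL went up too early.

    A rise is only reported when the previous TTL was still above
    `expiry_slack` seconds, i.e. when a normal expiry-and-refetch on a
    single cache cannot explain it.
    """
    return [
        (prev, cur)
        for prev, cur in zip(ttls, ttls[1:])
        if cur > prev and prev > expiry_slack
    ]


# TTLs reported by successive dig runs against 212.159.6.9.
observed = [600, 600, 600, 546, 557, 546, 539, 536, 417, 573, 600]
print(ttl_rises(observed))  # prints: [(546, 557), (417, 573), (573, 600)]
```

Three rises in eleven lookups, none of them anywhere near expiry, is what you would expect if the answers were coming from more than one cache.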
spraxyt
Resting Legend
Posts: 10,063
Thanks: 674
Fixes: 75
Registered: ‎06-04-2007

Re: Tracert's to ntp.plus.net

Quote from: npr
… it's as though each look up is to a different resolver.

I wonder if it is - the effect of load balancing, and each resolver maintains its own cache?
David

Townman
Superuser
Posts: 23,039
Thanks: 9,622
Fixes: 160
Registered: ‎22-08-2007

Re: Tracert's to ntp.plus.net

Quote from: npr
You may be on to something there. I've just repeatedly used "dig  kernel.org @212.159.6.9" (Plusnet's DNS resolver).

If I correctly understood Bob's reply (reply #14) then the IP addresses we see are virtual IP addresses which front any number of similarly functioning servers.  Consequently we have no means of knowing that the same server serviced all of the requests apparently sent to the same IP address.  For the test illustrated to be meaningful, one would need to know the addresses of the raw servers, not that of their load balancer.
In summary I now believe that PN have a multitude (number not known) of DNS servers hosted behind 4 virtual IP addresses.  We have no means of knowing which specific server serviced any particular enquiry.  As such, the configuration might at times deliver varied results and performance, but sounds highly resilient to single points of failure.
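That picture can be illustrated with a toy model. The sketch below is purely hypothetical: it assumes each back-end server keeps its own independent cache and that the load balancer may send consecutive queries to different servers. The class and function names are invented, and nothing here says how Plusnet's platform is actually built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachingResolver:
    """One back-end server with a private cache holding a single record."""

    record_ttl: int                      # TTL handed out by the authoritative server
    expires_at: Optional[float] = None   # when the cached copy runs out

    def lookup(self, now: float):
        """Return (remaining TTL, served_from_cache) for a query at time `now`."""
        if self.expires_at is None or now >= self.expires_at:
            # Cache miss: fetch from upstream and start a fresh countdown.
            self.expires_at = now + self.record_ttl
            return self.record_ttl, False
        return int(self.expires_at - now), True


def simulate(schedule, n_backends, interval=1.0, record_ttl=600):
    """Send one query per `interval` seconds; `schedule` lists which back end
    (by index) the load balancer picks for each query."""
    backends = [CachingResolver(record_ttl) for _ in range(n_backends)]
    return [backends[b].lookup(i * interval) for i, b in enumerate(schedule)]


# One server behind the address: a single miss, then a steady countdown.
print(simulate([0, 0, 0, 0], n_backends=1))
# prints: [(600, False), (599, True), (598, True), (597, True)]

# Three servers: every first visit to a server is a miss and the TTL jumps back up.
print(simulate([0, 1, 0, 1, 2], n_backends=3))
# prints: [(600, False), (600, False), (598, True), (598, True), (600, False)]
```

Under those assumptions, both the extra uncached answers and the TTL going back up fall out naturally: from the outside it looks like one resolver with a misbehaving cache, when it is really several resolvers each behaving normally.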
Kevin


bobpullen
Community Gaffer
Posts: 16,887
Thanks: 4,979
Fixes: 316
Registered: ‎04-04-2007

Re: Tracert's to ntp.plus.net

Quote from: ejs
The most recent DNS issue was again an inability for Plusnet's DNS to resolve certain domains, but that time it was very unlikely for the ripe.net and isc.org domains to be misconfigured somehow.

Missed that thread entirely. There were a few out of hours callouts concerning DNS around that time and the load balancer work that Matt referenced towards the end of the thread. I'm not overly familiar with what happened though as I was on leave 17th-20th.
Quote
I noticed that Plusnet's DNS servers now don't respond to type ANY queries, e.g. "dig @212.159.6.9 plus.net ANY", but I can't remember if that was always the case or not.

I'm not sure whether or not they ever have but I'll agree it's odd. I'll query the situation with our engineers. If I had to force a guess I'd say it was intended as a security measure. I'll let you know what I manage to find out.
Quote from: townman
In summary I now believe that PN have a multitude (number not known) of DNS servers hosted behind 4 virtual IP addresses.

I believe there are 8 servers in two clusters across two sites and yes, I expect it's this and the load balancing that are behind the odd TTL behaviour.
I still need to look into why certain traces to the resolvers cross data centres. Again, I'll report back once I've anything to share ...


Anotherone
Champion
Posts: 19,107
Thanks: 457
Fixes: 21
Registered: ‎31-08-2007

Re: Tracert's to ntp.plus.net

Quote from: townman
If I correctly understood Bob's reply (reply #14) then the IP addresses we see are virtual IP addresses which front any number of similarly functioning servers.

I think this is confusing, see my reply #12 in the first instance and then reply #15
Quote from: Anotherone
If I'm interpreting the tracerts correctly the load balancer is at hop 6 except for 212.159.13.50 where it's at hop 7 because of the extra hop.

The real virtual IP addresses are the ones we don't see in the last hop, as indeed you quoted from Bob's reply #14 - nothing like mixed metaphors to confuse the picture more!
I do think ejs has hit on something and npr has picked up on a possible cause of the oddity.
But I also wonder if this has something to do with oddities I see when I ping ntp.plus.net which I'll cover in detail in another thread so as not to clog this with ping results.
Whilst this is something I've seen before, I'd not spotted the patterns that I've just seen by testing whilst writing this.
I'm currently on pcl-ag01 and, as noted earlier in the thread, tracerts to 212.159.13.50 have an extra hop on pcl gateways.
First I did a ping -t to ntp.plus.net and because of DNS caching on my machine this went to 212.159.6.10, and every 3rd result in a group of 4 gave a longer ping time. Repeating the test a bit later it was every 4th result in a group of 4. The TTL for all these was 250.
It should be noted that at off-peak times I can get results where all ping values are the same.
A ping -t to 212.159.13.50 gave a TTL of 249 for several tests.
It will certainly be interesting to discover why tracerts to 212.159.13.50 have an extra hop from pcl gateways.
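The two ping TTLs line up with the tracerts if you assume the replies start out with a TTL of 255 (64, 128 and 255 are the usual defaults, but what the far end actually uses here is a guess). A small sketch of the arithmetic, with an invented helper name:

```python
# Initial TTL values commonly used by operating systems and network kit.
COMMON_INITIAL_TTLS = (64, 128, 255)


def hops_from_reply_ttl(reply_ttl):
    """Guess the sender's initial TTL and how many routers decremented it.

    Takes the smallest common initial value that is not below the observed
    TTL. This describes the return path, which need not match the outbound
    path shown by tracert. TTL is an 8-bit field, so reply_ttl is at most 255.
    """
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= reply_ttl)
    return initial, initial - reply_ttl


print(hops_from_reply_ttl(250))  # prints: (255, 5)
print(hops_from_reply_ttl(249))  # prints: (255, 6)
```

So a reply TTL of 249 rather than 250 is consistent with one more router on the way back from 212.159.13.50, matching the extra hop seen in the tracerts from the pcl gateways.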
Edit: ping thread is here