
Sudden loss of internet access - Possible Gateway problem

mikeb
Rising Star
Posts: 463
Thanks: 15
Registered: ‎10-06-2007

Re: Sudden loss of internet access - Possible Gateway problem

Sorry, but I've been having way more important non-IT nightmares to try and resolve recently. Suffice it to say that this problem is still very much present.
You're dead right Mr.EJS, there is no sensible way of ever determining whether the LCP commands allegedly being sent were in fact sent rather than presumably just not being received at the 'right' place or being ignored. However, the circumstantial evidence is fairly convincing to say the least ! There are just way too many indications of loss of data connections various to implicate the modem as being directly responsible for some problem and/or suggest that it is telling porkies. Suggesting that the modem somehow loses upstream LCP for certain commands only and/or loses data connectivity at certain times of the day while doing certain things under certain conditions and often only affecting data from certain sources ... but it's basically 100% OK at all other times ... is a somewhat unlikely scenario to say the very least. Add to that the fact the 'hang' issue didn't happen at any time during 2012 but has been occurring very frequently since mid January and has also occurred very frequently during similar very specific conditions and dates several years ago. I don't dispute that it is no doubt technically possible but it's a bit like telling a customer that his poor speedtest results are because his modem isn't requesting the test data (or acknowledging the received test data) fast enough and that's why downloads are being slowed right down isn't it !!! Possibly true but highly bl**dy unlikely.
The data connection IS without doubt being lost for periods of time sometimes measured almost in minutes rather than seconds and it is always when the connection is being used in anger but there are never any associated transmission errors. i.e. the fundamental connection is very stable but various network problems are apparently occurring. The strange thing is that they're not entirely consistent. Sometimes (mostly) it's unexplained breaks in the data connection to certain sources e.g. Akamai servers various.  Sometimes it's unexplained breaks in the data connection to all sources i.e. no browsing to anywhere and no TB pings being received resulting in PPP being terminated. Sometimes the upstream LCP is apparently lost (resulting in the secondary problem with a 'hang' situation) but sometimes it isn't. Pretty much all the colours in all the sizes and all that ! BUT everything indicates unexplained network connection problems somewhere down the line rather than unexplained local problems or unexplained modem errors. There is no indication of it being DNS related so far as I can see.
The other major thing of note is that none of what's going on is a new problem. I've been here done this and already have a drawer full of tee shirts from several years ago. The problem ultimately went away of its own accord last time and hasn't been seen for some years. I've had a War & Peace stylee post written for ages with some more interesting stuff but I'm not entirely sure that it's really worth the effort posting it let alone anyone reading it given the known problems there still appear to be which may or may not be behind what I'm seeing here. In any case I have always strongly suspected that it's potentially much more of a BT issue rather than being exclusively a PN issue. I'd certainly put a very large sum of money on BT having one or more fingers in this dodgy pie anyway ! I think there is something very strange going on between the exchange and PN that's for sure and there are definitely similarities between what I'm seeing and historic problems in general with BT screwing up PPP connections to PN.
Here are some interesting things collected over the last week or so, seeing as it's happening virtually every time I try and do anything that makes full use of my connection:
The following netmeter graphs show multiple downloads from so-called high speed high bandwidth always reliable content delivery servers at various times. None of what is shown represents separate downloads; several downloads were all started before the graphed data and all either completed or were terminated after the graphed data. The graphs show the combined data transfer for several files part way through and they're mostly a bunch of iplayer downloads I think with very little if any other significant activity:



What they show is the VERY unusual and somewhat 'digital' data transfer rate often being experienced from such servers if not in general. All the individual downloads are either running at full speed (linespeed/n) or zero, nothing much in between. There was perhaps a lot more 'jitter' on the individual file download speeds than I'm used to seeing. For the last case in particular, only Akamai (and possibly also google-related) connections were affected. Other sites appeared mostly OK albeit slow and unresponsive and there was no obvious loss of connection indicated on the TB graph for instance either. On this occasion the data flow resumed after each break, well sort of anyway, but it never returned to anything even remotely 'normal' and eventually just gave up completely. Because TB pings were still coming in fairly regularly there was no reason for the modem to drop PPP so no chance of a 'hang' or finding out whether upstream LCP was affected. It was only a partial loss of data connection not a complete loss.
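For anyone who wants to check their own traffic logs for the same 'digital' pattern, here is a rough Python sketch of what I mean. It is only an illustration under my own assumptions: the function names, the per-sample rates in kbps and the 10% tolerance are all made up for the example and have nothing to do with how netmeter works internally.

```python
def classify_samples(rates_kbps, line_speed_kbps, tolerance=0.1):
    """Label each throughput sample as 'zero', 'full' or 'between'.

    tolerance is an assumed fraction of line speed: anything within
    10% of zero counts as 'zero', anything within 10% of line speed
    counts as 'full'.
    """
    labels = []
    for rate in rates_kbps:
        if rate <= tolerance * line_speed_kbps:
            labels.append("zero")
        elif rate >= (1 - tolerance) * line_speed_kbps:
            labels.append("full")
        else:
            labels.append("between")
    return labels


def digital_fraction(rates_kbps, line_speed_kbps, tolerance=0.1):
    """Fraction of samples that are either stalled or at full speed."""
    labels = classify_samples(rates_kbps, line_speed_kbps, tolerance)
    if not labels:
        return 0.0
    on_off = sum(1 for label in labels if label != "between")
    return on_off / len(labels)
```

A healthy connection with several downloads sharing the line would normally give plenty of 'between' samples. A fraction close to 1.0 would be the all-or-nothing behaviour described above.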
At (most) other times, the loss of data connection is more widespread and affects everything or at least everything I try and do in the short time between noting the problem myself and the modem terminating PPP. The following TB graphs for instance demonstrate what's happened on several recent occasions but the last one is the most interesting:



What they show is the customary complete loss of data connection, as has been demonstrated on various previous occasions, occurring very shortly after I started using the connection in anger. This is fairly typical behaviour, but it doesn't happen every single time, as can be seen on the last graph above where the data connection was lost after quite some time. What does appear to be the case almost all the time just recently though is that my always-on always very lightly in use connection is almost always VERY sluggish as soon as sudden demands are placed on it. With regards to the last graph, the initial loss of data connection obviously resulted in a loss of upstream LCP and manual intervention was required. What happened subsequent to that was that there were several unexplained breaks in the download data transfer and two further complete losses of data connection that affected both Akamai servers AND the TB pings but on these two occasions upstream LCP wasn't lost. The modem sent 1 or more Echo Request(s) during the breaks, received an appropriate response and the data connection ultimately became active again in due course.
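To make the Echo Request logic concrete, here is a minimal Python sketch of the general keepalive idea: the link is only declared dead after a run of consecutive unanswered Echo Requests, and any reply resets the count. The function name and the threshold of 3 are assumptions purely for illustration (similar in spirit to pppd's lcp-echo-failure option); I am not claiming this is what my modem's firmware actually does.

```python
def first_link_failure(replies, max_failures=3):
    """Return the index of the Echo Request at which the link would be
    declared dead, or None if it never is.

    replies is a sequence of booleans, one per Echo Request sent:
    True if an Echo Reply came back, False if it went unanswered.
    max_failures is an assumed threshold of consecutive misses.
    """
    missed = 0
    for index, answered in enumerate(replies):
        if answered:
            missed = 0  # any reply proves the LCP path is still alive
        else:
            missed += 1
            if missed >= max_failures:
                return index  # PPP would be terminated at this point
    return None
```

On the two later breaks in the last graph the replies did come back, so by this logic the counter would have reset and PPP would have stayed up even though no data was flowing.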
I have absolutely stacks of detailed archive data here going back to at least 2005 and plenty of current indications of this problem occurring. It's not a one-off kinda thing and I think it's pretty obviously related to full or partial loss of data connection somewhere way upstream from me. I can even give you full router stats, tell you which gateway I was connected to and even which IPs I was connected to on practically every second of every day for the best part of 10 years ! I just KNOW that this situation is relatively unusual but I equally know that it has happened during at least one isolated but extended period of time in the past and it started happening again in mid January. Whilst, for instance, I know that the data connection was randomly lost 55 times in total during 2012 and PPP was terminated to resolve the situation, I also know that the 'hang' issue didn't happen on any one of those occasions. It happened just once during what appears to be a middle-of-the-night maintenance period leading to a relatively lengthy loss of connection due to what seemed to be no access to PN Radius. Exactly the same sort of 'hang' as I'm seeing now needless to say. Every other instance was resolved fully automatically and EXACTLY as it should have been. Interesting to see that January 2012 was a particularly bad time with around 75% of the occasions of random loss of data connection being in January.  However, every time data connections have been lost in January/February 2013, the 'hang' has also occurred. If I had the time and could be bothered to look back far enough I could find out exactly when the 'hang' issue was first seen (i.e. several years ago) and how long before it just stopped happening.  I also already know that VERY similar things often tend to happen when BT are doing maintenance, be that planned or otherwise. 
I even think there are historic service.status posts to show there is/was a known problem with customers being unable to automatically reconnect to PN following BT maintenance periods. BT do seem to like 'hanging' connections when they're tinkering around and all that.  However, this clearly isn't a maintenance period issue in general but it is suggesting network and/or network control/configuration issues between BT and PN similar to those which can and do often occur during maintenance periods when parts of the network are taken out of service. What do BT/PN tend to do in January/February that seems like it regularly screws things up ?
I suspect there is probably more than one problem here and I definitely think the 'hang' issue is not only a 100% side-issue but is most likely BT related. But as I keep on saying, it's not the 'hang' that's the real problem here. It's just a consequence that wouldn't ever happen if the data connection hadn't been lost in the first place and no matter who or what is actually causing the 'hang' it's highly unlikely that it will ever be addressed and/or fixed except accidentally while sorting something else out perhaps.  The real problem here is why am I intermittently losing some or all data connectivity and why am I apparently also experiencing similar if not the same problems as lots of other customers are seeing during peak times ... but I'm seeing them during what are pretty d@mn obviously very off-peak times !!!
If anyone is running a book then I'll put my money on this issue mysteriously disappearing, just as mysteriously as it appeared in January, if and when BT/PN fully resolve all of the known problems and/or resolve what appears to be the gross overloading somewhere probably as a direct result of selling fibre and unlimited services without having the infrastructure necessary to supply them.


B T Plusnet, a bit kinda like P T Barnum ...

... but quite often appears to feature more clowns
Anotherone
Champion
Posts: 19,107
Thanks: 457
Fixes: 21
Registered: ‎31-08-2007

Re: Sudden loss of internet access - Possible Gateway problem

Mike, I sent you a PM yesterday.