PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

tijara33
Pro
Posts: 1,360
Thanks: 50
Fixes: 6
Registered: ‎22-06-2012

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

Quote
I'm having no issues with my TG582 whatsoever

Neither have I jelv, I binned mine 2 years ago!!  Roll_eyes
ejs
Aspiring Hero
Posts: 5,442
Thanks: 631
Fixes: 25
Registered: ‎10-06-2010

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

I suppose you don't think it's a problem then that the technicolor firmware sends each DNS query to both DNS servers, needlessly increasing the load on the DNS servers. I hope there aren't any other aspects of the technicolor firmware that waste the time of external systems.
Townman
Superuser
Posts: 23,055
Thanks: 9,642
Fixes: 160
Registered: ‎22-08-2007

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

@ejs,
That's really useful input.
@PlusNet,
Are your product people aware of this characteristic?  Can you please raise this with TG with a view to getting it rectified?  Reducing the load on the DNS servers has to be beneficial to all.

Superusers are not staff, but they do have a direct line of communication into the business in order to raise issues, concerns and feedback from the community.

npr
Pro
Posts: 1,898
Thanks: 119
Fixes: 9
Registered: ‎21-01-2013

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

Quote from: ejs
I suppose you don't think it's a problem then that the technicolor firmware sends each DNS query to both DNS servers,

I rather suspect that happens because plusnet assigns the same metric to the two servers.
ejs
Aspiring Hero
Posts: 5,442
Thanks: 631
Fixes: 25
Registered: ‎10-06-2010

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

Is it actually possible for different metrics to be assigned as part of the PPP setup process? As far as I can see, there is no facility for any metric value to be specified as part of the data exchanged while setting up the PPP connection. The metric value assigned to both can be configured as part of the router settings. The problem is that the router assigns the same metric value (from its own configuration) to the primary and secondary DNS server.
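
To illustrate the point about metrics, here is a minimal sketch of how a forwarder might pick which servers to query. This is an assumption about the selection logic, not the actual Technicolor firmware code; the `DnsServer` type, the `servers_to_query` function and the addresses (taken from the RFC 5737 documentation range) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsServer:
    address: str
    metric: int  # lower value = more preferred (assumed convention)


def servers_to_query(servers):
    """Return the address of every server that shares the lowest metric.

    Assumed behaviour: a forwarder sends each query to all servers tied
    for the best metric, so equal metrics mean every query goes to both.
    """
    if not servers:
        return []
    best = min(s.metric for s in servers)
    return [s.address for s in servers if s.metric == best]


# Placeholder addresses, not Plusnet's real resolvers.
same_metric = [DnsServer("192.0.2.1", 10), DnsServer("192.0.2.2", 10)]
split_metric = [DnsServer("192.0.2.1", 10), DnsServer("192.0.2.2", 20)]

print(servers_to_query(same_metric))   # both servers receive the query
print(servers_to_query(split_metric))  # only the primary receives it
```

Under that assumption, giving the secondary a higher metric in the router's own configuration would leave it as a fallback rather than a duplicate target.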
KevinG
Rising Star
Posts: 998
Thanks: 7
Fixes: 1
Registered: ‎05-11-2008

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

Quote from: Townman
@KevinG,
I expect that in trying to get the TG582n to reconnect you did a power off sequence as requested by PN?  What befuddles me is this: if the connection cannot be re-established even after powering off the TG582n, how can that be thought to be caused by the TG, unless the other end is throwing some extreme exception event which kicks the TG in the nether region?  If this is the case, it needs to be profiled by PlusNet and fed back to the product owners for rectification.  People simply calling the TG a pile of crud does not move anyone forwards.

I did everything except a factory reset, which I wanted to avoid as I would have had to redo customisations. I don't understand the problem at all. I managed to connect after having been down for about 6 hours following the outage, but the next morning it had gone again and nothing would make it reconnect, hence my decision to change routers and try again a day later. For reasons I won't go into I have to use the TG582n at the moment, so I am glad it is back up and working again.
kitz
Aspiring Pro
Posts: 833
Thanks: 55
Registered: ‎08-06-2007

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

Quote from: Bob
Quote from: Townman on 19/09/2014, 15:04
RADIUS servers - is that PlusNet's or BT Wholesale's?
In this latest instance I suspect the latter contributed most significantly.

Whilst the BTw RADIUS would be responsible for giving those customers the BTw IPs, ultimately it would only do so because:-
    (a) Plusnet's RADIUS servers were unable to cope with the deluge of users all trying to reconnect at once, OR
    (b) There were insufficient sessions available on the Plusnet gateways.
I'm getting rather a vague sense of deja vu when it comes to (b) Sad
Whilst you may or may not have sufficient bandwidth, I suspect that many of the problems the other night centre around subscriber session limits.
You were already steering sessions to certain gateways...  then something happened just before 6pm and you seemed to drop a pipe.
At this point you would have had circa 30k users competing to reconnect. Because you were session steering to certain endpoints, these would quickly have filled, leaving some with nowhere to go.
According to your graphs you have a max of 750k available sessions, yet on average 741k users on-line.  That leaves only 9k spare sessions, which isn't enough to cope if one gateway goes down Sad
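
As a rough sketch of that arithmetic, assuming the 750k figure on the graphs really is a hard session limit (which is an interpretation, not something Plusnet has confirmed) and that the failed gateway's own capacity is gone while it is down. The function names are made up for illustration.

```python
def spare_sessions(max_sessions, online):
    """Headroom left across all gateways, treating max_sessions as a hard cap."""
    return max_sessions - online


def unplaced_users(displaced, spare):
    """Users from a failed gateway that cannot fit into the remaining headroom."""
    return max(0, displaced - spare)


spare = spare_sessions(750_000, 741_000)
print(spare)                          # 9000
print(unplaced_users(30_000, spare))  # 21000
```

So on those figures at least 21k of the circa 30k displaced users would have nowhere to land, and more if some of the 9k headroom sat on the gateway that dropped.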
I'd been sat on a nice quiet shiny new endpoint for the past week (central12.ptw-bng01) and I wasn't affected in the first wave.  However, I got knocked offline bang on 6:30pm, implying that ptw-bng01 had very suddenly become overloaded and one of the following occurred:-
(a) The gateway couldn't cope when it reached the max amount of sessions (ie the Juniper)
(b) It was policed
(c) Someone at Plusnet flicked the switch in a manual attempt to re-balance the network.

I've seen lots of mentions about the first wave of outages, but nothing about what happened later at 6:30.  Just what was going on to cause the 2nd bng gateway to go down too?
Bearing in mind that the bngs appear to have larger hostlinks bandwidth-wise and therefore will have more available user sessions... Is it remotely possible that the bng gateways have a hissy when they reach their configured max sessions?
The reason I ask is I remember the JunOS problems years ago with max sessions, during the periods of upgrading from 155Mb to 622Mb centrals, and insufficient user sessions.  I don't even know if you still use ERXs, but I tried to google their session limits these days. Interestingly, I came across something that seemed to imply that if there were more sessions on a JunOS gateway than directed via the RADIUS server values then the whole lot would drop.
kitz
Aspiring Pro
Posts: 833
Thanks: 55
Registered: ‎08-06-2007

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

Quote
timing out the 172.*.*.* connections sounds like a "smart idea"

That is unlikely to happen; there are valid reasons for the 172 IP addresses. It's the fail-over for when the ISP's RADIUS fails, otherwise thousands of routers would endlessly retry connecting.
There are cases when the 172 IPs are needed, i.e. you need to authenticate on the BTw RADIUS to perform additional diagnostic tests.
Perhaps a case could be made to reduce the attempts - I'm not sure if that is under the control of BT or the ISP now... but I do seem to recall it was BTw - Gawd, the joys of coming back to BTw.. I forgot a pile of stuff I learnt years ago and I'm now digging deep into my memory banks again. Crazy
Quote
It would be interesting to know if anyone using a non TG router suffered the same reconnection difficulties?

Yep, my TP-Link got its knickers in a twist and wasn't able to reconnect.  I put my Zyxel back on in the end as it has better monitoring.  I strongly doubt that this would be a problem isolated to the TGs.
TBH when on LLU I never had a problem with the TG582n..  it's rock steady DSL-wise and would sit at 3dB for months and months on end, no problems with it...  and I do mean months - like 9 months plus. Yeah OK, the wifi isn't the best.. but it's far from the worst.  I used a (PN) TG582n with my FTTC modem for many months without issue too.
The thing that most people seem to forget when bitching about its wifi range is that it has a pretty decent DSL chipset which is nice and stable.  If I were an ISP I'd be wanting to supply a router with a known decent DSL chipset and adsl_phy...  you know, the important bit of keeping your ADSL connection stable.  I keep saying, if it really were such a PoS, do you honestly think the likes of Zen and AAISP would also be supplying them?  They are now a bit long in the tooth and it is perhaps time for Plusnet to start looking at a new unit purely for the wifi side..  but this is going to push costs up..  and I would really hate it if Plusnet chose a router based on wifi capabilities over and above DSL chipset.  That would really be fun  Lips_are_sealed
kitz
Aspiring Pro
Posts: 833
Thanks: 55
Registered: ‎08-06-2007

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

Sorry to add to this
Quote
According to your graphs you have a max of 750k available sessions, yet on average 741k users on-line.  That leaves only 9k spare sessions, which isn't enough to cope if one gateway goes down

...  but someone has just brought it to my attention elsewhere that they thought the available sessions on the bngs may have been reduced.  When I went to look..  yep, they've been reduced today.  I'm pretty certain that at least 2 of the gateways yesterday supposedly supported circa 66k concurrent sessions.
Why on earth would Plusnet reduce the session limits?  This doesn't make sense to me unless it really was as I theorised above, that the gateways may fall over and drop all users when they reach a maximum number of concurrent sessions.    Huh

Red Herring - ignore this post!
Oldjim
Resting Legend
Posts: 38,460
Thanks: 787
Fixes: 63
Registered: ‎15-06-2007

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

I hadn't read the data that way
I believe that the maximum number of sessions is the maximum seen on that gateway not the limit.
Also, when the bng gateways were recovering, the maximum number of sessions was way lower than it is now, so my supposition is that it is the maximum seen over a limited time frame
kitz
Aspiring Pro
Posts: 833
Thanks: 55
Registered: ‎08-06-2007

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

Ah, thanks OldJim..  I guess it is a matter of interpretation, and if that's the case - which it probably is, reading your explanation -  then I apologise for jumping to the conclusion that they haven't got enough spare sessions.
.....but something still seems up with gateways falling over the other night when they reached a certain figure, or else why would bng2 have also suddenly shed its load at 6:30..  which goes back to point (a) (b) or (c)
... and also if they did have sufficient spare sessions, why would so many users be unable to reconnect?  The PN RADIUS should be able to cope with a dropped pipe, but obviously couldn't... were they steering too aggressively towards the BNGs Sad
Oldjim
Resting Legend
Posts: 38,460
Thanks: 787
Fixes: 63
Registered: ‎15-06-2007

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

I take your point about a single dropped pipe, but it looked to me as though they lost all the bngs, as none of them are back up to the 66,000 they had before, and that is a very high number of reconnections hitting the RADIUS servers
Obviously if I am correct there is a major problem with cascade failures which, I assume, they are investigating
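
A toy model of what such a cascade could look like, purely as a sketch. It assumes the theory floated earlier in the thread (a gateway pushed past its session limit drops everyone on it) and assumes displaced sessions are spread evenly over the survivors. Neither assumption is confirmed by Plusnet, and the gateway names and loads below are made up; only the 66,000 figure comes from the graphs.

```python
def simulate_cascade(loads, limit, first_failure):
    """Fail one gateway and follow the knock-on effects.

    loads: dict of gateway name -> current sessions (hypothetical values).
    limit: assumed per-gateway session limit; exceeding it drops all sessions.
    Returns (gateways failed in order, sessions left with nowhere to go).
    """
    loads = dict(loads)                # work on a copy
    failed = [first_failure]
    pending = loads.pop(first_failure)
    while pending and loads:
        # Spread the displaced sessions evenly over the surviving gateways.
        share, extra = divmod(pending, len(loads))
        for i, name in enumerate(sorted(loads)):
            loads[name] += share + (1 if i < extra else 0)
        pending = 0
        # Any survivor now over its limit drops everything it holds.
        for name in sorted(loads):
            if loads[name] > limit:
                pending += loads.pop(name)
                failed.append(name)
    return failed, pending


busy = {"bng01": 60_000, "bng02": 60_000, "ag01": 30_000, "ag02": 30_000}
quiet = {"gw-a": 30_000, "gw-b": 30_000, "gw-c": 30_000}

print(simulate_cascade(busy, 66_000, "bng01"))
print(simulate_cascade(quiet, 66_000, "gw-a"))
```

With the busy set, losing one gateway takes the rest down in turn; with the quiet set, the survivors absorb the load. That matches the difference between this outage and the earlier single-gateway drop, at least under these assumptions.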
ejs
Aspiring Hero
Posts: 5,442
Thanks: 631
Fixes: 25
Registered: ‎10-06-2010

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

I don't think it's clear if the intended number of users on each bng gateway has been reduced, or it's just that lots of people connected to the ag gateways during the outage and still remain on them now. I see my line dropped (probably when the phone was answered) and I've ended up on pcl-bng02.
ejs
Aspiring Hero
Posts: 5,442
Thanks: 631
Fixes: 25
Registered: ‎10-06-2010

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

Also, a single gateway dropping all its users did occur about two days before this outage, but didn't cause so much of a problem - in that instance, all of the other gateways kept going.
ericgripp
Grafter
Posts: 182
Registered: ‎26-04-2013

Re: PPP dropped unable to authenticate - 18.00 on 17th Sept 2014

Quote from: chrcoluk
speedtest on bng2 central10

Since the outage there is congestion every night.
Just hopped back onto an IPv6 gw
central10 ag04

Is central10 on ag04 the same pipe as central10 bng02?

Sorry to use caps here, but after being fobbed off so many times:
PLUSNET WHEN WILL YOU FIX THE BNG GATEWAYS TO GIVE END USERS THE CORRECT LINE SPEED ??????????
After this incident I spent ages getting back onto a non bng to recover the 5 meg that I lose when connected to a bng.
I can consistently reproduce the issue: when I'm on a bng I lose 5 meg throughput. On a non bng I get my full speed.
I have tried talking to Plusnet support about this but they deny it and fob you off, saying that the BNG gateways don't have an issue. I once spoke to someone in support who was borderline rude about this, and if the situation with the new BNG doesn't improve I'm off at the end of my contract.
Personally I think they've acquired the bng gateway kit from BT and, well, we all know how rubbish BT broadband/fibre is.