Automating gateway hopping

KevinG
Rising Star
Posts: 998
Thanks: 7
Fixes: 1
Registered: ‎05-11-2008

Re: Automating gateway hopping

I get the same and my BQM shows it - I suffer from disconnections every few days and if it reconnects to a different gateway, the Minimum Latency normally changes by a small amount, in a range between about 11 and 14 ms. I haven't kept a record of the gateways involved. I'm currently on ptw-ag03, which is a "good" one.
jimbof
Grafter
Posts: 348
Thanks: 2
Registered: ‎02-05-2013

Re: Automating gateway hopping

Modified my script to log the BRAS node and store the traceroutes.  It is always the first hop (from my router to the gateway) which is slow.  I'm always connecting to the same BRAS node (as far as the log is concerned).
I have connected to a "slow" one purposefully now and will leave it for a while.  It is also my experience that they never recover by themselves.
My hunch for now is that the issue is down to how BT route the connection to Plusnet, and this changes every time you create a new PPPoE session, but lasts for the duration of the session.  Hence why the same gateway can appear good or bad - it depends on the route taken to get to the gateway.  I have no idea how to prove this theory though. 
This might also explain why the "pcl" gateways appear more prone to it for me - perhaps BT are more likely to route you to those via an unhelpful route.  Perhaps related to where in the country I am, too, so maybe users in the rest of the UK see different issues with different gateways.
Question for Plusnet / anyone else who knows - are the gateway IPs actually a single machine?
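A minimal sketch of pulling the first-hop time out of a traceroute, as one way of logging the comparison described in this post. The function name first_hop_ms and the sample hostnames and addresses are made up for illustration; this is not the script actually used in this thread.

```shell
# Sketch only: first_hop_ms is a hypothetical helper name.
# Reads traceroute output on stdin and prints the first round-trip time
# (in ms) reported for hop 1, or nothing if hop 1 did not answer.
first_hop_ms() {
    awk '$1 == "1" {
        for (i = 2; i <= NF; i++)
            if ($i == "ms") { print $(i - 1); exit }
    }'
}

# Example with canned output (documentation addresses, not a real gateway):
printf '%s\n' \
    'traceroute to 80.249.99.164 (80.249.99.164), 30 hops max, 38 byte packets' \
    ' 1  gw.example (192.0.2.1)  11.532 ms  11.210 ms  11.870 ms' \
    ' 2  core.example (198.51.100.1)  12.104 ms  12.066 ms  12.310 ms' |
    first_hop_ms    # prints 11.532
```

On the router this could be fed live data with something like traceroute -n -m 1 "$HOST" | first_hop_ms, with the result passed to logger alongside the session details.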
Kelly
Hero
Posts: 5,497
Thanks: 380
Fixes: 9
Registered: ‎04-04-2007

Re: Automating gateway hopping

Each gateway has multiple end points terminating at it.  Each will be from one of the three BT nodes we have connectivity to: Colindale, Stepney and Faraday.  We have between 1 and 3 end points from each node per gateway.
Kelly Dorset
Ex-Broadband Service Manager
jimbof
Grafter
Posts: 348
Thanks: 2
Registered: ‎02-05-2013

Re: Automating gateway hopping

I figure it must be something like that, that there are routes which are slower / further / more congested.  The only other thought was that somehow something like interleaving was being turned on or off each time I connected up a PPPoE session, but it doesn't seem likely.  I don't have any intention of messing around with the modem at the moment, so I can't see line stats for VDSL.
So is there any way to figure out the details of exactly which endpoint / node I'm coming in from?
I've already hacked up a rough and ready script, run from /etc/hotplug.d/, which ping-tests the WAN when it comes up and, if it is "slow", drops the connection and grabs another one. I need to improve it to make it more robust (so it only tries a few times before giving up, etc).
jimbof
Grafter
Posts: 348
Thanks: 2
Registered: ‎02-05-2013

Re: Automating gateway hopping

For anyone interested, here is a script which works for me on OpenWRT Attitude Adjustment.  Create a file called /etc/hotplug.d/iface/40-checklatency and put the following shell script code into it.  Fiddle with the variables at the top to your heart's content.  As configured here, it pings thinkbroadband's server (HOST); fails a connection if the average ping comes in at 10 ms or worse (MAXPING; the 9 includes anything up to 9.999 ms, because the decimals are dropped); calculates the ping as an average over 3 attempts (PINGS); and takes the interface down and up at most 10 times in search of a good ping (MAXCYCLES).  Before testing, it loops for up to 60 attempts, sleeping 1 second between checks, while it waits for the interface to actually be passing traffic (MAXLOOPS / TIMETOSLEEP).
MAXPING=9 might be very aggressive for a lot of connections, but seems to work for me.
Hope it is useful to someone, feel free to improve it! :)
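# Hotplug script: re-dial the WAN if the average ping to HOST is too slow.
NAME=checklatency
HOST=80.249.99.164              # thinkbroadband's server
MAXPING=9                       # worst acceptable average in whole ms (allows up to 9.999)
PINGS=3                         # pings averaged per test
MAXLOOPS=60                     # checks to wait for the link to pass traffic
TIMETOSLEEP=1                   # seconds between those checks
TESTFILE=/tmp/checklatencyfails # one line per attempt in the current run
MAXCYCLES=10                    # interface cycles before giving up
SUCCESS=0
[ "$ACTION" = "ifup" ] && [ "$INTERFACE" = "wan" ] && {
 # Record this attempt; the file survives between hotplug runs.
 date >> "$TESTFILE"
 CYCLE=$(wc -l "$TESTFILE" | cut -d ' ' -f 1)
 echo "PING interface cycle attempt $CYCLE" | logger -t "$NAME"
 if [ "$CYCLE" -gt "$MAXCYCLES" ] ; then
   echo "PING interface max cycles $MAXCYCLES - exiting" | logger -t "$NAME"
   rm -f "$TESTFILE"
   exit 1
 fi
 # Wait for the interface to actually pass traffic.
 for TRY in $(seq "$MAXLOOPS"); do
   # Redirect stdout first, then stderr, so both are discarded.
   if ping -c 1 -W 1 "$HOST" > /dev/null 2>&1; then
     SUCCESS=1
     break
   fi
   sleep "$TIMETOSLEEP"
 done
 if [ "$SUCCESS" -eq 1 ] ; then
   echo "PING OK after $TRY attempts" | logger -t "$NAME"
 else
   echo "PING never returned despite $TRY attempts" | logger -t "$NAME"
   exit 1
 fi
 # Average ping, truncated to whole ms: 4th "/"-separated field of the
 # summary line as printed by BusyBox ping.
 PING=$(ping -c "$PINGS" -W 1 "$HOST" | tail -n 1 | cut -d '/' -f 4 | cut -d '.' -f 1)
 # If no whole number could be parsed, stop rather than compare garbage.
 case "$PING" in
   ''|*[!0-9]*)
     echo "PING to $HOST gave no usable average" | logger -t "$NAME"
     exit 1 ;;
 esac
 if [ "$PING" -gt "$MAXPING" ] ; then
   echo "PING to $HOST avg $PING ms, greater than $MAXPING ms, bad connection, dropping" | logger -t "$NAME"
   ifdown wan
   ifup wan
 else
   echo "PING to $HOST avg $PING ms, good connection" | logger -t "$NAME"
   rm -f "$TESTFILE"
 fi
}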
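One thing to watch in the script above: taking the 4th "/"-separated field matches the BusyBox summary line ("round-trip min/avg/max = ..."), but the iputils ping prints "rtt min/avg/max/mdev = ...", where the 4th field is something else. A rough sketch of a parser that copes with both layouts follows; the function name avg_ms_from_summary is made up for illustration and is not part of the script above.

```shell
# Sketch only: avg_ms_from_summary is a hypothetical helper name.
# Turns a ping summary line into a whole-millisecond average.
avg_ms_from_summary() {
    # Drop everything up to "=", leaving e.g. "11.123/12.456/14.789 ms".
    # The average is then the second "/"-separated number in both layouts.
    avg=$(printf '%s\n' "$1" | sed 's/^.*= *//' | cut -d '/' -f 2 | cut -d '.' -f 1)
    # Reject anything that is not a plain whole number (e.g. no replies at all).
    case "$avg" in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo "$avg"
}

# BusyBox-style summary line
avg_ms_from_summary 'round-trip min/avg/max = 11.123/12.456/14.789 ms'   # prints 12
# iputils-style summary line
avg_ms_from_summary 'rtt min/avg/max/mdev = 11.1/12.4/14.7/0.5 ms'       # prints 12
```

Because the decimals are simply cut off, 9.999 ms becomes 9 and still passes a MAXPING of 9, while 10.0 ms becomes 10 and fails it.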
jimbof
Grafter
Posts: 348
Thanks: 2
Registered: ‎02-05-2013

Re: Automating gateway hopping

The automated gateway hopper is working well for me, so a result! :)
Kelly, you've piqued my interest now with this discussion of the BT nodes; I was intrigued to see that Faraday and Colindale handle almost twice as much traffic as Stepney... how come?
I also see that some of the connections obtained are a bit "hairier" than others from a latency point of view - I wonder if this is down to gateway or BT nodes?  Like the one I connected to this morning here:
Kelly
Hero
Posts: 5,497
Thanks: 380
Fixes: 9
Registered: ‎04-04-2007

Re: Automating gateway hopping

Stepney is a newer node than the others, so has had fewer orders into it to build out its connectivity.  I think we've got some more coming from there.
It's actually very difficult at the moment for us to tell which node you are connecting through.  When someone says "I was connected to PCL-AG01 last night and it was rubbish", that could mean they were on any of 6-8 different end points!  The only way we can tell is by connecting to the gateway itself and looking at the current connection.
If they've disconnected, we can't tell.  This is why it's helpful if you have found a poorly performing gateway to stay connected while we check.  But there is obviously a balance to strike there between your experience and the need for our debug.
Re your graph, is your usage of the connection any different?  As you stress the router and the connection you'll see it reflected on the graph.  I'm absolutely willing to accept there may be differences between node performances though, so willing to look further.
Kelly Dorset
Ex-Broadband Service Manager
jimbof
Grafter
Posts: 348
Thanks: 2
Registered: ‎02-05-2013

Re: Automating gateway hopping

Usage wasn't much different.  I will give you a shout next time I notice it, but won't have time to play for a week or so.  In fairness, these differences in performance are, for me, observed rather than experienced: although they are measurable, I don't game.  I just find these kinds of differences interesting from the point of view of understanding them.