Automating gateway hopping

KevinG · ‎05-11-2008

I get the same and my BQM shows it - I suffer from disconnections every few days and if it reconnects to a different gateway, the Minimum Latency normally changes by a small amount, in a range between about 11 and 14 ms. I haven't kept a record of the gateways involved. I'm currently on ptw-ag03, which is a "good" one.

jimbof · ‎02-05-2013

Modified my script to log the BRAS node and store the traceroutes. It is always the first hop (from my router to the gateway) which is slow. I'm always connecting to the same BRAS node (as far as the log is concerned).
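Logging that first hop could look something like the sketch below. It is not the script actually used here: `first_hop` is a hypothetical helper name, the log path in the usage comment is made up, and it assumes the usual traceroute layout where each hop line starts with the hop number and lists RTTs followed by "ms".

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Reads traceroute output on stdin and prints
# "<first hop name or address> <first RTT in ms>".
# Prints nothing if the first hop timed out ("1  * * *").
first_hop() {
  awk '$1 == "1" {
    for (i = 3; i <= NF; i++)
      if ($i == "ms") { print $2, $(i - 1); exit }
  }'
}

# Possible use on the router (assumed log path, not run here):
#   traceroute -m 3 80.249.99.164 | tee -a /tmp/traceroutes.log \
#     | first_hop | logger -t checklatency
```

Keeping the full traceroute in a file and only the first-hop summary in syslog makes it easy to compare "good" and "slow" sessions afterwards.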
I have connected to a "slow" one purposefully now and will leave it for a while. It is also my experience that they never recover by themselves.
My hunch for now is that the issue is down to how BT route the connection to Plusnet, and this changes every time you create a new PPPoE session, but lasts for the duration of the session. Hence why the same gateway can appear good or bad - it depends on the route taken to get to the gateway. I have no idea how to prove this theory though.
This might also explain why the "pcl" gateways appear more prone to it for me - perhaps BT are more likely to route you to those via an unhelpful route. Perhaps related to where in the country I am, too, so maybe users in the rest of the UK see different issues with different gateways.
Question for Plusnet / anyone else who knows - are the gateway IPs actually a single machine?

Kelly · ‎04-04-2007

Each gateway has multiple end points terminating at it. Each will be from one of the 3 BT nodes we have connectivity to: Colindale, Stepney and Faraday. We have between 1 and 3 end points from each node per gateway.

Kelly Dorset
Ex-Broadband Service Manager

jimbof · ‎02-05-2013

I figure it must be something like that, that there are routes which are slower / further / more congested. The only other thought was that somehow something like interleaving was being turned on or off each time I connected up a PPPoE session, but it doesn't seem likely. I don't have any intention of messing around with the modem at the moment, so I can't see line stats for VDSL.
So is there any way to figure out the details of exactly which endpoint / node I'm coming in from?
I've already hacked up a rough and ready script being run from /etc/hotplug.d/ which on the WAN coming up ping tests it and if it is "slow" drops the connection and grabs another one, but I need to improve it to make it more robust (so it only tries a few times before giving up, etc).

jimbof · ‎02-05-2013

For anyone interested, here is a script which works for me on OpenWRT Attitude Adjustment. You need to create a file called /etc/hotplug.d/iface/40-checklatency and put the following shell script code into it. Fiddle with the variables at the top to suit. I have it configured here to ping thinkbroadband's server (HOST); to fail a connection if the average ping comes in at 10 ms or worse (MAXPING; the average is truncated to whole milliseconds, so 9 allows anything up to 9.999 ms); and to calculate the ping as an average over 3 attempts (PINGS). At most it will take the interface down / up 10 times in search of a good ping (MAXCYCLES). Initially it loops for up to 60 attempts, with a sleep of 1 second between checks, waiting for the interface to actually be passing traffic (MAXLOOPS / TIMETOSLEEP).
MAXPING=9 might be very aggressive for a lot of connections, but seems to work for me.
Hope it is useful to someone, feel free to improve it!

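#!/bin/sh
# /etc/hotplug.d/iface/40-checklatency
# On WAN ifup, measure latency and bounce the interface if it is too high.

NAME=checklatency                # syslog tag
HOST=80.249.99.164               # host to ping (thinkbroadband)
MAXPING=9                        # highest acceptable whole-ms average
PINGS=3                          # number of pings to average over
MAXLOOPS=60                      # attempts while waiting for traffic to pass
TIMETOSLEEP=1                    # seconds between those attempts
TESTFILE=/tmp/checklatencyfails  # one line per ifup attempt in this run
MAXCYCLES=10                     # give up after this many attempts
SUCCESS=0

[ "$ACTION" = "ifup" ] && [ "$INTERFACE" = "wan" ] && {
  # Count this attempt; the line count of TESTFILE is the cycle number.
  date >> "$TESTFILE"
  CYCLE=$(wc -l "$TESTFILE" | cut -d ' ' -f 1)
  echo "PING interface cycle attempt $CYCLE" | logger -t "$NAME"
  if [ "$CYCLE" -gt "$MAXCYCLES" ]; then
    echo "PING interface max cycles $MAXCYCLES - exiting" | logger -t "$NAME"
    rm "$TESTFILE"
    exit 1
  fi

  # Wait until the interface is actually passing traffic.
  for TRY in $(seq "$MAXLOOPS"); do
    if ping -c 1 -W 1 "$HOST" > /dev/null 2>&1; then
      SUCCESS=1
      break
    fi
    sleep "$TIMETOSLEEP"
  done
  if [ "$SUCCESS" -eq 1 ]; then
    echo "PING OK after $TRY attempts" | logger -t "$NAME"
  else
    echo "PING never returned despite $TRY attempts" | logger -t "$NAME"
    exit 1
  fi

  # Average RTT, truncated to whole ms, from the summary line
  # ("round-trip min/avg/max = a/b/c ms").
  PING=$(ping -c "$PINGS" -W 1 "$HOST" | tail -n 1 | cut -d '/' -f 4 | cut -d '.' -f 1)
  if [ -z "$PING" ]; then
    echo "PING to $HOST gave no average, leaving connection alone" | logger -t "$NAME"
    exit 1
  fi
  if [ "$PING" -gt "$MAXPING" ]; then
    echo "PING to $HOST avg $PING ms, greater than $MAXPING ms, bad connection, dropping" | logger -t "$NAME"
    # Bouncing the interface triggers a fresh ifup event, which re-runs this script.
    ifdown wan
    ifup wan
  else
    echo "PING to $HOST avg $PING ms, good connection" | logger -t "$NAME"
    rm "$TESTFILE"
  fi
}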
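Because the script above truncates the average to whole milliseconds, the threshold can only be set in 1 ms steps. If someone wants a finer cut-off (say 9.5 ms), a rough sketch along these lines might do. `ping_avg` and `is_slow` are hypothetical names, not part of the script above, and the parsing assumes the summary line has the form "... min/avg/max = a/b/c ms" (or "min/avg/max/mdev = a/b/c/d ms").

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Read ping output on stdin and print the average RTT in ms,
# keeping the decimal part.
ping_avg() {
  awk -F'= ' '/min\/avg\/max/ { split($2, a, "/"); print a[2] }'
}

# is_slow AVG MAX: succeeds (exit 0) if AVG is greater than MAX.
# Both may be fractional. An empty AVG (no measurement) counts as slow.
is_slow() {
  [ -n "$1" ] || return 0
  awk -v avg="$1" -v max="$2" 'BEGIN { exit !(avg + 0 > max + 0) }'
}

# Possible use inside the hotplug script (not run here):
#   AVG=$(ping -c "$PINGS" -W 1 "$HOST" | ping_avg)
#   if is_slow "$AVG" 9.5; then ifdown wan; ifup wan; fi
```

awk does the comparison because the shell's own `[ -gt ]` test only handles integers.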

jimbof · ‎02-05-2013

The automated gateway hopper is working well for me, so a result!

Kelly, you've piqued my interest now with this discussion of the BT nodes; I was intrigued to see that Faraday and Colindale handle almost twice as much traffic as Stepney... how come?
I also see that some of the connections obtained are a bit "hairier" than others from a latency point of view - I wonder if this is down to gateway or BT nodes? Like the one I connected to this morning here:

Kelly · ‎04-04-2007

Stepney is a newer node than the others, so has had fewer orders into it to build out its connectivity. I think we've got some more coming from there.
It's actually very difficult atm for us to tell which node you are connecting through. When someone says "I was connected to PCL-AG01 last night and it was rubbish", that could mean they were on any of 6-8 different end points! The only way we can tell is by connecting to the gateway itself and looking at the current connection.
If they've disconnected, we can't tell. This is why it's helpful if you have found a poorly performing gateway to stay connected while we check. But there is obviously a balance to strike there between your experience and the need for our debug.
Re your graph, is your usage of the connection any different? As you stress the router and the connection you'll see it reflected on the graph. I'm absolutely willing to accept there may be differences between node performances though, so willing to look further.

Kelly Dorset
Ex-Broadband Service Manager

jimbof · ‎02-05-2013

Usage wasn't much different. I will give you a shout next time I notice it, but won't have time to play for a week or so. In fairness, these differences in performance are, for me, observed rather than experienced: they are measurable, but I don't game. I just find these kinds of differences interesting from the point of view of understanding them.
