
Smarter Network Balancing

petejackson
Grafter
Posts: 691
Registered: ‎12-04-2007

Smarter Network Balancing

I’d like to share with you a new development on our broadband platform.  I’ll apologise right away for the geek speak, but the changes introduced have proved so effective I’m sure many of you’ll find it interesting. 
We’ve successfully automated some of the session management that keeps our network in balance.  It basically means that the service we provide to you is now very much more consistent and better able to cope with both scheduled maintenance and major service outages (MSOs) than it was just a couple of weeks ago.
Up until now keeping the broadband network in balance has been a manual process, with our Network Operations team tasked with keeping an eye on the number of sessions (each session being a single connected customer) on each of our broadband network ‘endpoints’. Simply put, the total amount of bandwidth we need to provide Internet connections to all our customers is split across 108 endpoints on our network; 92 of these are on WBC (Wholesale Broadband Connect) and 16 on IPSC (IP Stream Connect) at the time of writing – these being the wholesale products we buy from BT Wholesale.
We allocate a percentage of our overall bandwidth to each endpoint and then distribute our customers evenly across them so that no single endpoint becomes either over- or under-subscribed. Customers connected to an over-subscribed endpoint would be more likely to see slow-downs on certain traffic-managed protocols at peak periods, as demand forces the total volume of data transfer up to a threshold limit we set, whereas bandwidth assigned to an under-subscribed one is effectively going to waste.
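As a rough illustration of that idea (not the actual Plusnet tooling, which isn't shown in this post), the sketch below works out a target session count for each endpoint from its bandwidth share and flags endpoints that sit outside a tolerance band. The endpoint names, the numbers, the `tolerance` value and both function names are made up for the example.

```python
def target_sessions(bandwidth_share, total_sessions):
    """Split the total session count in proportion to each endpoint's bandwidth share."""
    total_share = sum(bandwidth_share.values())
    return {
        name: total_sessions * share / total_share
        for name, share in bandwidth_share.items()
    }


def classify(sessions, bandwidth_share, tolerance=0.05):
    """Label each endpoint 'over', 'under' or 'balanced' relative to its target.

    tolerance is a hypothetical band (5% here) inside which an endpoint
    counts as balanced.
    """
    targets = target_sessions(bandwidth_share, sum(sessions.values()))
    status = {}
    for name, target in targets.items():
        actual = sessions.get(name, 0)
        if actual > target * (1 + tolerance):
            status[name] = "over"
        elif actual < target * (1 - tolerance):
            status[name] = "under"
        else:
            status[name] = "balanced"
    return status


# Illustrative numbers only: three endpoints with equal bandwidth shares.
example_sessions = {"ep-a": 1200, "ep-b": 800, "ep-c": 1000}
example_shares = {"ep-a": 1, "ep-b": 1, "ep-c": 1}
example_status = classify(example_sessions, example_shares)
```

With equal shares and 3,000 sessions in total, each endpoint's target is 1,000, so `ep-a` is flagged as over-subscribed, `ep-b` as under-subscribed and `ep-c` as balanced.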
Keeping the networks in balance isn’t so easy for a number of reasons. WBC is managed ‘in-house’ here at Plusnet, but IPSC is managed by BT (we’re in the process of moving this over so we can automate that too), so for now I’ll just talk about WBC.
Imagine you turn your router off and on again. When you reconnect, which endpoint are you likely to connect to? You’d have an equal chance of connecting to any one (either WBC or IPSC depending on availability in your area) unless we ‘steer’ the session to a particular endpoint. We’d want to do that if it was under-subscribed, for example. Almost by definition though, customers who turn their router on to do some ‘Internetting’ are more likely to turn it off again afterwards.  This makes rebalancing by hand quite difficult.
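One way to picture ‘steering’ is as a weighted random choice: each endpoint gets a weight, and a reconnecting session lands on an endpoint with probability proportional to that weight. The sketch below is an assumption about how such weights could be derived (from each endpoint's shortfall against a target), not a description of the real mechanism; the function names and figures are hypothetical.

```python
import random


def steering_weights(sessions, targets):
    """Weight each endpoint by how far it is below its target session count.

    Endpoints at or above target get weight 0. If nothing is below target,
    fall back to equal weights so every endpoint is equally likely.
    """
    deficits = {
        name: max(targets[name] - sessions.get(name, 0), 0)
        for name in targets
    }
    if sum(deficits.values()) == 0:
        return {name: 1.0 for name in targets}
    return deficits


def pick_endpoint(weights, rng=random):
    """Choose one endpoint at random, in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Illustrative numbers only: ep-b has just dropped all of its sessions.
example_weights = steering_weights(
    sessions={"ep-a": 1000, "ep-b": 0},
    targets={"ep-a": 500, "ep-b": 500},
)
```

Here `ep-a` is above its target and gets weight 0, so every new session is steered to `ep-b` until the counts even out, at which point the weights fall back to an equal chance for each endpoint.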
The two graphs below show how manual ‘rebalancing’ can be quite ‘bumpy’.  Each coloured line represents an endpoint and ideally each of these would hold a similar number of customers; so the tighter together the lines, the better.  What tends to happen though is that after the balancing is done, the network drifts out of balance once more as the more active customers disconnect and reconnect to different endpoints.
This graph shows quite well how lots of attention to manual balancing can keep on top of things but how quickly it can slip out of balance, as it does from Tuesday midnight:

Manual rebalancing after maintenance is even trickier, as the graph below shows. The endpoints labelled 1 & 2 drop all of their sessions in the early hours of the morning (the vertical lines dropping to zero).  You can see an immediate corresponding lift on the other endpoints as everyone’s routers reconnect.  Bringing the dropped endpoints back online requires new connections to be ‘steered’ to them.  The problem with that is you’ll likely end up with a disproportionate number of customers on them who tend to disconnect/reconnect, meaning they’re going to go out of balance quite quickly afterwards.  You can actually see that pattern in the graph from Tuesday afternoon to midnight (circled) as some sessions drop away again.

What our Network Operations guys would try to do was ‘lift’ the sessions on the dropped endpoints above the rest, to allow them to drop back into balance as customers disconnected. No wonder we had such bumpy graphs!
So, how are things now we’ve automated the balancing? Well, the graphs speak for themselves. All the bumpiness has been removed and balance is consistently tight. So far though we’ve only automated WBC. As I mentioned earlier, IPSC is managed by BT but we’re looking at bringing that in-house and automating that too.
The graph below is great. Circled is the endpoint pcl-ag08, which the previous evening had suffered a partial ‘drop’ (about 3,000 sessions had been disconnected). You can see though how smoothly it’s been brought back into balance with the rest.


What next? I’ve mentioned IPSC; after this is brought under our direct management we can automate that too.  Further down the line we want to balance our endpoints not on the number of sessions but by the amount of data being transferred. Right now, even with perfect balance, it’s entirely possible that a particular endpoint could have a higher percentage of customers transferring a lot of data and placing a lot of demand on it. Our traffic management will protect customer experience as much as possible in such circumstances, but on the very heaviest of days it would be much better to spread that localised demand across the whole network.
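To make the difference between the two measures concrete, here is a small sketch with invented figures: the same two endpoints look perfectly balanced by session count yet badly out of balance by throughput. The `imbalance` metric (spread divided by mean) is just one simple choice made for illustration, not a measure taken from our systems.

```python
def imbalance(load):
    """Return (max - min) / mean for a mapping of endpoint -> load.

    0.0 means every endpoint carries the same load; larger values mean
    a wider spread. Returns 0.0 when there is no load at all.
    """
    values = list(load.values())
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return (max(values) - min(values)) / mean


# Invented figures: equal session counts, unequal data transfer.
sessions = {"ep-a": 1000, "ep-b": 1000}
throughput_mbps = {"ep-a": 900, "ep-b": 300}

by_sessions = imbalance(sessions)           # 0.0: balanced by session count
by_throughput = imbalance(throughput_mbps)  # 1.0: spread equals the mean load
```

Balancing on the second measure would steer new sessions away from `ep-a` even though its session count looks fine.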
I hope you’ve found this interesting. We’ve got a chap called Richard in our Network Operations team to thank for creating the automation script. It worked first time, which I think is highly commendable. But as ever there’s no time to rest on our laurels and the work on IPSC is already in progress.
12 REPLIES
spraxyt
Resting Legend
Posts: 10,063
Thanks: 674
Fixes: 75
Registered: ‎06-04-2007

Re: Smarter Network Balancing

Thanks Pete, an interesting read.
Well done to the team that designed and developed this system and brought it into operation. Smiley
David
Anonymous
Not applicable

Re: Smarter Network Balancing

Will this new balancing make it more difficult to intentionally gateway hop if the script 'steers' new sessions towards under-subscribed endpoints, especially if you wanted to hop AWAY from an under-subscribed endpoint?
Chris
Legend
Posts: 17,724
Thanks: 600
Fixes: 169
Registered: ‎05-04-2007

Re: Smarter Network Balancing

I'm not sure why you'd want to move off an under-subscribed end-point though?
It's similar to what the networks guys used to do manually, just a lot easier and more efficient.
Former Plusnet Staff member. Posts after 31st Jan 2020 are not on behalf of Plusnet.
jimbof
Grafter
Posts: 348
Thanks: 2
Registered: ‎02-05-2013

Re: Smarter Network Balancing

Out of interest; why would you want to move away from an undersubscribed endpoint?  I'd imagine that with the amount of connect / disconnects that must happen it would probably be undersubscribed for a very short time?
Anonymous
Not applicable

Re: Smarter Network Balancing

Quote from: Chris
I'm not sure why you'd want to move off an under-subscribed end-point though?

For example, a few weeks ago there was some technical problem which appeared to happen only on "pcl" gateways and not on "ptw" gateways, and therefore the advice was to hop gateways until you hit a "ptw".
So what happens now if there is one under-subscribed "pcl" gateway? Rather than hopping randomly until getting to a desired gateway, are we going to get stuck on the most under-subscribed gateway until the network comes into balance, at which point the randomness will return due to the natural churn of other sessions connecting and disconnecting?
Does the new script have any means of landing new sessions on a different gateway, rather than re-landing on the gateway of the previous session?

Another example: in the thread New traffic management hardware, you asked us to hop to "PCL-AG01" to see if we could spot any problems with the new configuration.  If "PCL-AG01" was not the most under-subscribed gateway, would we be able to hop to it as requested?
Kelly
Hero
Posts: 5,497
Thanks: 380
Fixes: 9
Registered: ‎04-04-2007

Re: Smarter Network Balancing

It hasn't changed the mechanism.  If we had an undersubscribed endpoint/gateway, we'd already be steering people to it.  This just does it without people having to fiddle all the time to keep them in balance.
Kelly Dorset
Ex-Broadband Service Manager
njay
Grafter
Posts: 185
Registered: ‎05-04-2013

Re: Smarter Network Balancing

I took the original post to say:
We used to manually work out which endpoints were under-utilised and steer people towards them. We have automated that process and also upped the frequency of checking.
So even before, when people gateway hopped, you were still being steered towards certain gateways based on what had been manually decided.
Kelly
Hero
Posts: 5,497
Thanks: 380
Fixes: 9
Registered: ‎04-04-2007

Re: Smarter Network Balancing

Yep!
Kelly Dorset
Ex-Broadband Service Manager
jimbof
Grafter
Posts: 348
Thanks: 2
Registered: ‎02-05-2013

Re: Smarter Network Balancing

Must admit to being slightly amazed this had been being done in such a manual fashion for so long - I mean, it has all the characteristics of a task which lends itself well to automation. 
What will you peeps do with yourselves at the NOC now?!
Strat
Community Veteran
Posts: 31,320
Thanks: 1,609
Fixes: 565
Registered: ‎14-04-2007

Re: Smarter Network Balancing

See attached  Wink
Windows 10 Firefox 109.0 (64-bit)
To argue with someone who has renounced the use of reason is like administering medicine to the dead - Thomas Paine
Kelly
Hero
Posts: 5,497
Thanks: 380
Fixes: 9
Registered: ‎04-04-2007

Re: Smarter Network Balancing

Quote from: jimbof
Must admit to being slightly amazed this had been being done in such a manual fashion for so long - I mean, it has all the characteristics of a task which lends itself well to automation. 
What will you peeps do with yourselves at the NOC now?!

It wasn't manually intensive.  It was about adjusting a bunch of weightings basically.  It's just far more efficient automated and means we don't need someone looking at it overnight or at weekends.
Kelly Dorset
Ex-Broadband Service Manager
npr
Pro
Posts: 1,898
Thanks: 119
Fixes: 9
Registered: ‎21-01-2013

Re: Smarter Network Balancing

Quote from: Kelly
and means we don't need someone looking at it overnight or at weekends.

That's when it will go wrong -- Murphy's law.  Wink