Managing our network under abnormal Load
Like other ISPs, we deliver our broadband service over a network that is shared among our customers. That means we see variations in demand from day to day and month to month, and one-off events can also drive demand. It's important that the network can anticipate and cope with these events, and this page describes our approach.
Our aims
We've developed an automated network management system backed up with high quality infrastructure which will continue to evolve as the market evolves. We believe we have developed the most sophisticated system in the UK that offers the best experience for customers. Our aims are:
- To make sure that time-critical applications like VoIP and gaming are always prioritised
- To protect interactive applications like web-browsing and VPN from non-time sensitive download traffic
- To flex the network under demand to cope with normal peaks and troughs from day-to-day and month-to-month
- To flex the network more gracefully than other ISPs in the event of unusual demands in traffic
- To cope with disaster situations more gracefully than other ISPs and be able to continue to offer basic functionality in the event of a major network failure
Three network configurations
We have three settings for our network management solution according to the conditions the network is under: normal operation, high demand operation and disaster situation. The network will change its behaviour and will therefore alter the customer experience according to the different demand situations. These are outlined below:
Plan A (Normal operation)
This is how we expect to run the service for the majority of the time. Most customers will see very little or no slowdown on interactive applications at peak time. There may be speed reductions on non-interactive traffic at busy times.
Some of our products have rate limits built-in at certain times of the day on non-interactive traffic, to ensure a fair spread of the available bandwidth. The rate limits and applications assigned to each queue are as specified on our speeds page. The rate limits can be adjusted based on the network load on any given day and any given hour.
We have a standard design for each day of the week that bases the rate limits around the traffic we expect to see on a "normal" day. For example, a Saturday night is generally the quietest evening of the week, so we would normally expect to be able to increase some of the rate limits on Peer-to-Peer and USENET after about 7pm. On the other hand, a Monday evening is generally the busiest night of the week, so the rate limits will be at their lowest. The rest of the week will be somewhere in between, generally getting gradually quieter as the week goes on.
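As a rough illustration of such a weekly design, the sketch below scales an evening rate limit according to the expected load for each day. Only the ordering (Monday busiest, Saturday quietest) comes from the description above; the load figures, the limits in kb/s and the function name are hypothetical and do not reflect our actual configuration.

```python
# Assumed relative evening load per day (1.0 = busiest). Only the ordering
# "Monday busiest, Saturday quietest" comes from the text; the numbers,
# including the Sunday value, are invented for illustration.
EXPECTED_LOAD = {
    "Mon": 1.00, "Tue": 0.95, "Wed": 0.90, "Thu": 0.85,
    "Fri": 0.80, "Sat": 0.60, "Sun": 0.90,
}

MIN_LIMIT_KBPS = 500    # hypothetical limit on the busiest evening
MAX_LIMIT_KBPS = 2000   # hypothetical limit on the quietest evening


def evening_limit_kbps(day):
    """Interpolate linearly between the minimum and maximum limit."""
    busiest = max(EXPECTED_LOAD.values())
    quietest = min(EXPECTED_LOAD.values())
    # 0.0 on the busiest day, 1.0 on the quietest day.
    headroom = (busiest - EXPECTED_LOAD[day]) / (busiest - quietest)
    return round(MIN_LIMIT_KBPS + headroom * (MAX_LIMIT_KBPS - MIN_LIMIT_KBPS))


for day in EXPECTED_LOAD:
    print(day, evening_limit_kbps(day))
```

A lookup table per day keeps the "standard design" easy to review, and an adjustment for a known event such as a monthly software update could be expressed by raising that day's expected load.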
Sometimes we will see slightly higher than normal traffic for short periods. Some of these periods we are able to plan for, and some may be slightly unexpected. As an example, Microsoft releases updates on the second Tuesday of every month, so we can expect to see a higher than normal amount of HTTP downloads on that Tuesday evening, Wednesday morning and Wednesday evening as people download the files. Larger files (for example, a large service pack) would cause this traffic to be higher and to last longer. As Patch Tuesday falls on the same day every month, we can build it into our traffic management plans.
Plan B (High demand operation)
This configuration would be encountered in situations where unusual events drive demand. Examples of this situation could include major concerts, streaming coverage of live sporting events or even severe weather that increases the number of customers online and downloading, or using VPN to work from home.
In these scenarios the overall objective would be to protect the experience of time-critical and interactive applications. To achieve this, less bandwidth would be made available to other applications. The limits for each category will vary from those specified on our speeds page, as non-real-time applications are de-prioritised. To maintain a usable experience for interactive applications, there may be tactical rate limits applied to certain traffic types. Restrictions for higher usage customers on management levels might also be tightened as a short-term measure.
For example, if a major snow storm causes a significant increase in people working from home, we may apply rate limits on Peer-to-Peer and USENET traffic between 9am and midday in order to protect the experience of VPN traffic and browsing.
If provisioned capacity is delayed, we may need to set lower rate limits on non-interactive traffic at the busiest times of the week.
The Plan B configuration is flexible to match the exact situation. The severe weather setting would need to be different to an evening concert setting, but both may fall under the same category. The severe weather configuration might only be used for one or two days between 9am and 6pm, while the concert configuration might only be used on the day of the concert between 4pm and midnight.
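One way to picture these situation-specific settings is as named profiles, each with its own time window. The sketch below uses the two windows from the examples above; the profile names, the data layout and the function are hypothetical and are not a description of our management system.

```python
from datetime import time

# Time windows taken from the examples in the text. The profile names and
# the use of None to mean "until midnight" are assumptions for this sketch.
PLAN_B_PROFILES = {
    "severe_weather": (time(9, 0), time(18, 0)),
    "evening_concert": (time(16, 0), None),
}


def profile_active(name, now):
    """Return True if the named Plan B profile applies at clock time `now`."""
    start, end = PLAN_B_PROFILES[name]
    if end is None:
        # Window runs to the end of the day.
        return now >= start
    return start <= now < end


print(profile_active("severe_weather", time(10, 30)))   # inside 9am-6pm
print(profile_active("evening_concert", time(15, 0)))   # before 4pm
```

Keeping each profile separate means a severe weather day and a concert day can share the Plan B label while applying limits at different hours.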
We would expect the Plan B situation to be in effect for no more than 5% of the year, with the actual lower rate limits being in effect only when they are required to protect the performance of interactive traffic.
Plan C (Disaster situation)
There is a third configuration which we would move to in the event of a large scale network failure. It may also need to be used in the event of a major news story. In these instances we might have to block all advanced protocols on all accounts and rate limit all accounts for all other traffic, to provide at least a bare minimum real-time service for as many people as possible. For example, we may block all Peer-to-Peer, USENET and FTP traffic and rate limit all other traffic to 512kb/s. All being well, we'd expect to be in this situation less than 0.1% of the year.
The Plan C situation is unlikely to be something we can predict, so the exact details of what we would need to do are also difficult to predict. A loss of central capacity may only last for a couple of hours before being restored. At that point we can switch to Plan B while the network comes back into balance, and then back to Plan A.
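To put the percentages quoted for Plan B (no more than 5% of the year) and Plan C (less than 0.1% of the year) into hours, the short calculation below assumes a 365-day year. The function name is purely illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def hours_in_year(percent):
    """Convert a percentage of the year into hours."""
    return HOURS_PER_YEAR * percent / 100


plan_b_hours = hours_in_year(5)     # 438 hours, a little over 18 days
plan_c_hours = hours_in_year(0.1)   # just under 9 hours

print(f"Plan B ceiling: {plan_b_hours:.0f} hours per year")
print(f"Plan C ceiling: {plan_c_hours:.2f} hours per year")
```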
We have added a "Broadband Network Capacity" category to Service Status. This gives our customers a clear view of what level of service to expect. A green light indicates Plan A, amber Plan B, and red Plan C. Through the service status pages, customers can also see a history of when we have changed between configurations and what events caused the changes. Our intention, when updating this status category, is to include not only the reason for the change of configuration but also, where possible, the expected performance changes that have been made.
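The mapping between configurations and status lights can be sketched as a small lookup. The colours and plan names come from the description above; the class, the function and the field names are hypothetical and do not describe how Service Status is actually implemented.

```python
from enum import Enum


class Plan(Enum):
    A = "normal operation"
    B = "high demand operation"
    C = "disaster situation"


# Colours as described in the text.
STATUS_LIGHT = {Plan.A: "green", Plan.B: "amber", Plan.C: "red"}


def status_entry(plan, reason, expected_impact=None):
    """Build one history entry; the field names are assumptions."""
    entry = {"plan": plan.name, "light": STATUS_LIGHT[plan], "reason": reason}
    if expected_impact is not None:
        # Included "where possible", as the text puts it.
        entry["expected_impact"] = expected_impact
    return entry


print(status_entry(Plan.B, "severe weather",
                   "lower Peer-to-Peer limits, 9am to midday"))
```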
If we have to switch to a Plan B or Plan C situation, we use the following hierarchy of product and protocol combinations to decide where additional rate limits or deprioritisation are applied:
- In a Plan B situation: product/protocol combinations ranked lowest will see additional rate limits before those ranked low, and those ranked low before those ranked medium. In Plan B we will not add rate limits to product/protocol combinations ranked high or highest.
- In a Plan C situation: in addition to the Plan B rate limits we may also rate limit those product/protocol combinations ranked high and highest.
|Product set||Peer-to-Peer||Usenet||FTP||Unidentified||HTTP download servers||HTTP download sites||Progressive download streaming sites||VPN||Web||Streaming|
|Residential products (Standard and Fibre optic)||Lowest / Low||Lowest / Low||Lowest / Low||Low||Low||Low||Medium||Medium||Medium / High||Medium / High|
|Business products (Standard and Fibre optic)||Lowest||Lowest||High||Low||Medium||Medium||Low||High||High||Medium|
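The hierarchy can be read as an ordering rule. The sketch below applies it to the business products row of the table, which has a single rank per protocol. The function name and data layout are hypothetical, and placing high and highest after medium in Plan C is an assumption, since the text only says those ranks may also be limited.

```python
# Ranks from least to most protected, as used in the hierarchy above.
RANK_ORDER = ["lowest", "low", "medium", "high", "highest"]

# Business products row, copied from the table.
BUSINESS_RANKS = {
    "Peer-to-Peer": "lowest",
    "Usenet": "lowest",
    "FTP": "high",
    "Unidentified": "low",
    "HTTP download servers": "medium",
    "HTTP download sites": "medium",
    "Progressive download streaming sites": "low",
    "VPN": "high",
    "Web": "high",
    "Streaming": "medium",
}


def limit_order(ranks, plan):
    """List protocols in the order extra rate limits would reach them."""
    if plan == "B":
        eligible = RANK_ORDER[:3]   # high and highest are left alone
    elif plan == "C":
        eligible = RANK_ORDER       # assumption: high/highest come last
    else:
        raise ValueError("extra limits only apply in Plan B or Plan C")
    candidates = [p for p, r in ranks.items() if r in eligible]
    # sorted() is stable, so table order is kept within each rank.
    return sorted(candidates, key=lambda p: RANK_ORDER.index(ranks[p]))


print(limit_order(BUSINESS_RANKS, "B"))
```

For the business row, Plan B reaches Peer-to-Peer and Usenet first and never touches FTP, VPN or Web, which only become candidates under Plan C.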