Update: the post I quote below seems to have been removed as of 29/10, but it’s still available in Google’s cache.
Unless you’ve had your head stuck in the sand with your fingers in your ears for the past month, you will have heard about Radiohead making their latest album available on a pay-as-much-as-you-like basis. It wasn’t until reading this critique of the site’s usability that I decided to check it out. More…
The Community Site has existed for over six months now, and in that time it has grown from nothing into a thriving hub of activity. In this post I present our plans for the next six months. We have so many ideas for improving the site – from live chat to OpenID to social networking – and in the roadmap below we have tried to balance the perceived benefit to the community against our time and resource constraints.
More…
For those who aren’t aware, PC Pro are currently running a series of awards which are due to be announced in December. We are very happy to announce that both PlusNet and Force9 have been nominated in the Broadband ISP category and PlusNet has also been nominated for a web host award.
We’re thrilled to announce that Computer Shopper has awarded us their “Best Broadband Provider” award, which is testament to the improvements that we have been making, especially with our continued recruitment drive within the Customer Support Centre.
Think you could do better than Nigella? A new startup has just launched allowing anyone to put together the cookbook of their dreams. Sounds like a neat idea.
More…
Mobile phone technology, power packs and content management combine to allow instant publication of news: a new armoury for the paps, courtesy of Nokia and Reuters. More…
*Please note that the Critical Path trial has now ended, so the information below should be read in context*
If you’re reading this blog post then there’s a good chance that you’ve already seen the Service Status announcement about the work we’ll be doing on our email platform next week. For those who haven’t, you can see a basic overview of the work here.
For those about to continue reading, be warned as this is a fairly lengthy post and not for the faint-of-heart! (although hopefully you’ll find the information it contains useful!)
Since the Webmail Incident we’ve been working hard to improve our spam detection capabilities. There’s been the new Manage My Mail API, the ability to turn off email to virtual domains, improved spam detection rates, and more intuitive handling of spam messages at the server level to name but a few.
Whilst these things have certainly helped, they all still tie up resources across our mail delivery platform. We routinely see problems with email delays and more often than not it’s due to issues that have stemmed from the sheer amount of (often junk) email our mail servers are having to process and deliver.
We’ve attempted ACL blocking, made a multitude of Exim configuration changes and altered/upgraded our spam/virus processing. We’ve been fighting with the mail platform for too long now and we’re only too aware of the negative impact the ensuing problems are having on our customers.
Spam isn’t going to stop. In fact far from it, it’s going to get worse. If the previous years are anything to go by then as we approach Christmas things are going to get particularly nasty. We’re already seeing a significant rise in the volumes of spam reported and we absolutely must take proactive steps to avoid the worst happening.
As has been mentioned in the Planned Maintenance announcement, we’re going to be re-deploying the Critical Path appliances in front of the customer mail platform next week. This will form part of a trial that is expected to last at least three weeks if successful. In addition to re-trialling Critical Path, we’re also continuing to look at alternative/additional solutions. Whilst Critical Path may well become a permanent thing, it does not mean we are bound to exclusively using Critical Path for spam protection and does not deter us from the work we’re doing elsewhere.
Now it’s no secret that we have twice before attempted to introduce the Critical Path anti-abuse appliances in front of the customer mail platform and on both occasions our efforts have resulted in negative repercussions for our customers.
The first time we ended up losing emails and the second time we were chastised for poor advance communication and the subsequent email delays that arose.
It’s very important to note that the problems we encountered back then were mainly caused by the interaction between Critical Path’s equipment and ours, failure to follow procedural guidelines and a poorly defined set of roll-back criteria.
We’ve been working very hard over the last month alongside Critical Path’s most senior technical staff and we’re now confident that we have fully addressed and overcompensated for the things that bit us last time. We’ve very much got the customer at the centre of all of this and we’ll be rolling any changes back at the first hint of any trouble.
So what exactly happened last time?
OK, it makes sense at this point to elaborate on what caused the problems last time. This will help you understand what we’ve done to safeguard against similar things happening again.
The main problems with the previous implementations can be summarised as follows:
Critical Path were on site during the last trial and they saw the pain that was born from the problems that were encountered. They left that day with a conviction to help us resolve what had gone wrong, and as has already been mentioned we’ve been working closely alongside their most senior platform architects ever since.
How are we going to make sure it doesn’t happen again?
We’ve been careful to ensure that all of the above points have been addressed as follows:
The above changes have been tested by both ourselves and Critical Path and both parties are confident that the issues have been resolved.
Last week we also performed a full stress-test on a single sunmxcore mail server in an isolated environment. During this test 750,000 emails were successfully processed during a three hour period. None of the aforementioned issues were encountered.
On average a single sunmxcore server in its present state will process approximately 1.2 million emails a day. If you consider what we achieved during the above test then you should have an idea as to why we’re so eager for this to work.
During testing, we also managed to max the CPU on the sunmxcore (there was still plenty of processing potential remaining on the Critical Path appliance). We managed 240 concurrent connections. We only managed 8 the last time we implemented these changes so this is a good indication that there are no longer issues feeding messages from the CP appliances to our platform.
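To put those test figures side by side, here is a rough back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the variable and function names are purely illustrative, and comparing a three-hour stress test (where the CPU was maxed out) against a 24-hour daily average is an approximation rather than a like-for-like benchmark.

```python
# Figures quoted in the post.
TEST_EMAILS = 750_000             # emails processed during the stress test
TEST_HOURS = 3                    # duration of the stress test
TYPICAL_DAILY_EMAILS = 1_200_000  # approximate daily load on one sunmxcore today
CONNECTIONS_NOW = 240             # concurrent connections reached in testing
CONNECTIONS_BEFORE = 8            # concurrent connections reached last time


def hourly_rate(emails, hours):
    """Average number of emails handled per hour."""
    return emails / hours


test_hourly = hourly_rate(TEST_EMAILS, TEST_HOURS)
typical_hourly = hourly_rate(TYPICAL_DAILY_EMAILS, 24)

# How much faster the test rate was than the present-day average rate.
throughput_ratio = test_hourly / typical_hourly

# How many times more concurrent connections were sustained this time.
connection_ratio = CONNECTIONS_NOW / CONNECTIONS_BEFORE

print(f"Stress test: {test_hourly:,.0f} emails/hour")
print(f"Typical day: {typical_hourly:,.0f} emails/hour")
print(f"Throughput ratio: {throughput_ratio:.1f}x")
print(f"Connection ratio: {connection_ratio:.0f}x")
```

On these figures the test server handled mail at roughly five times its present average hourly rate, with thirty times as many concurrent connections as during the previous attempt.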
The roll-out
The roll-out is currently scheduled for Tuesday next week (30th October) and will last for several days, depending on whether or not certain success criteria are met.
We will start by replacing one mx.core with a Critical Path device. All traffic from this device will be routed to the removed mx.core server which will then handle the final delivery.
After the first server goes live the platform will be closely monitored. Graphs showing the latency and queues on the Critical Path devices alongside the queues on the sunmxcores will be made available to customers via an isolated portal page that will be visible here following the roll-out.
If all success criteria are met and no problems are encountered then we will introduce a second server on Wednesday, a third server on Thursday and a fourth on Friday.
Once we have reached this point, a decision will be made regarding our deployment to the remaining servers the following week (there are 22 servers in total). No more servers will be added over the weekend and there will be a dedicated resource monitoring the platform throughout this time.
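The staged schedule described above can be sketched in a few lines of Python. This is only an illustration: the `criteria_met` check is a hypothetical placeholder for the real success criteria, and it assumes the 22 servers quoted above are the full set that would eventually sit behind Critical Path.

```python
TOTAL_SERVERS = 22  # total number of servers, as quoted in the post
ROLLOUT_DAYS = ["Tuesday", "Wednesday", "Thursday", "Friday"]


def staged_plan(days, total, criteria_met=lambda day: True):
    """Return (day, servers_live, servers_remaining) for each completed step.

    criteria_met is a hypothetical stand-in for the real success criteria.
    It is consulted before every step after the first, and the roll-out
    stops at the first day on which it returns False.
    """
    plan = []
    live = 0
    for day in days:
        if live > 0 and not criteria_met(day):
            break  # hold here; no further servers are added
        live += 1
        plan.append((day, live, total - live))
    return plan


for day, live, remaining in staged_plan(ROLLOUT_DAYS, TOTAL_SERVERS):
    print(f"{day}: {live} behind Critical Path, {remaining} still to migrate")
```

If every step succeeds, four servers are live by Friday, which on this reading leaves 18 for the decision the following week.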
There will be a Critical Path employee on site throughout the trial, and we will also be in contact with a further two senior engineers based in Germany and Ireland.
Roll-back
A decision to roll-back will be arrived at should any of the following criteria be met:
The proposed maintenance work that will be carried out should any of these conditions be met is as follows:
* These steps are to allow for the collation of statistics for post roll-out analysis.
The values above are not arbitrary as it took just one hour for a single Critical Path appliance to accumulate a queue of 100,000 emails the last time we rolled it to the live platform. By taking such a cautious, staged approach we’re hoping to protect customers.
What will the Critical Path boxes do?
There are a number of things the Critical Path boxes will do once they are live in front of the mail delivery servers:
Risks?
There are two risks associated with this work that are worth mentioning. These are what we based our roll-back criteria on and are the reason we’ve allowed for tweaking of the connection limit in the load balancer as part of the test plan.
What next?
As previously mentioned, we’re still exploring the possibility of using other vendors/suppliers. We’ve been working with a number of other third parties and hope to announce details regarding future trials before long.
We’re all hoping for a successful roll-out next week and are confident we’ve done all we can to safeguard our customers from any potential upset. Ultimately we hope this work proves to be a large step towards overcoming the problems spam email causes us and stabilising the platform for our customers once more.
Any questions, feedback or concerns regarding this work are welcomed as always over on our Community Site discussion forums.
Regards,
Bob Pullen.
Disconnections, reconnections, projects, operational duties and cups of tea. Another day in the life of your favourite ISP. More…
End of another day and it’s absolutely flown! I just looked at my watch and hadn’t realised quite how quickly this afternoon had disappeared. More…
End of another day and it’s absolutely flown! I just looked at my watch and realised quite how quickly this afternoon had disappeared. Winter is definitely starting to rear its ugly head and it’s almost dark outside, but here we are with our next set of daily updates.
Here at Plusnet we're always trying to use clever open source things to make our lives easier. Sometimes we write our own and make other people's lives easier too!
We sell broadband, phone, VoIP and more to homes and businesses in the UK. Winner of 9 out of 11 Categories in the 2008 USwitch survey. Winner of "Best Consumer ISP" at 2008 ISPA awards. Voted number 1 in the Broadband Choices 2008 survey.