In light of recent email problems, particularly delays, I thought I'd provide a quick update on the current state of play.
The good news is that we're no longer suffering from email delays, and all customers should be able to send and receive email in a timely fashion :)
A number of things have contributed to the delays over the last week or so, and it's been pretty tricky trying to keep abreast of them all. Here's a quick summary:
Critical Path trial - On the 22nd August we encountered an unfortunate problem that led to extended email delays and some customers' email being incorrectly rejected as spam. The last Service Status post can be seen here and there's a detailed incident report regarding the problem that you can find here.
Outbound email delays - Late last week we started seeing reports in the forums of customers whose email was being delayed on our outgoing relay servers. This was narrowed down to file system errors that we found in our mail logs. Moving the database from disk storage to a separate RAM drive soon saw this problem resolved. This was last reported on Service Status here.
Inbound email delays - We encountered two separate issues this week that had the potential to delay some messages for customers. One was an unforeseen result of the work to debug the problems we experienced with the Critical Path boxes. These issues were last reported on Service Status here and have since been resolved. We also identified a problem, which we suspect has always existed, with one of the spam filtering processes on the delivery servers. This was fixed this morning following the introduction of a new housekeeping script as announced here.
Housekeeping - We will always encounter problems that we have to respond to reactively. That's only half of it though. It's important that we run regular reports to proactively identify those customers who have the potential to start negatively impacting the service for others. Over recent weeks we've been running daily reports showing the top users of our relay servers by IP address. These are normally populated with customers who have a virus or a misconfigured mail server, and most if not all appreciate us getting in touch to let them know. Today's top offender had sent in excess of 53,000 emails over our relay server in a 24-hour period. Now that's a lot of email!
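For the curious, here's a rough idea of what a "top users by IP address" report boils down to. This is only a minimal sketch in Python and not the script we actually run: the log format (timestamp, client IP, message ID separated by spaces), the function name top_senders and the sample lines are all made up for illustration.

```python
from collections import Counter


def top_senders(log_lines, limit=10):
    """Count relayed messages per client IP and return the busiest senders.

    Assumes a hypothetical log format of "<timestamp> <client_ip> <message_id>".
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1  # second field is the client IP (assumed)
    # most_common() sorts by message count, highest first
    return counts.most_common(limit)


# Made-up sample data using documentation-only IP ranges
sample = [
    "2000-01-01T10:00:01 192.0.2.10 msg-001",
    "2000-01-01T10:00:02 192.0.2.10 msg-002",
    "2000-01-01T10:00:03 198.51.100.7 msg-003",
    "",
]

for ip, count in top_senders(sample):
    print(f"{ip}\t{count}")
```

Run daily over a day's worth of relay logs, a report like this puts the heaviest senders at the top of the list, which is where infected machines and misconfigured mail servers tend to show up.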
Not only have we been beavering away at the above, but we've also seized the opportunity to increase the capacity of our relay servers. Yesterday we added two more high-end servers to the platform, bringing the total to 8. We've seen no problems so far and the reduction in load on the platform since their deployment is very promising indeed.
Hopefully we've seen the last of email delays for a while, but make sure you let our support team know if you see any problems, or give one of us Comms folk a prod over on the forums if you suspect anything is awry ;)
Don't forget that you can keep up to date with all the latest Service Status information by subscribing to the Usertool's RSS or Email Feed.