Here at PlusNet we have various procedures in place for resolving system problems, be they internal or external issues.
Anyone at PlusNet can raise a problem using our internal Problem Tool, and the problem is then passed to our QA department to be verified before being assigned to the relevant department for resolution.
When a problem is raised, it is assigned a priority from 1 to 3. Each priority level is described below:
A Priority 1 problem is defined by ANY of the following:
- A customer-facing service is not working for customers using it (urgency to be judged by Comms/HoDs)
- Any problem affecting sign up or billing
- An issue flagged as critical in Nagios (if affecting customer-facing services)
- A problem which causes considerable disruption to the workflow of any department (for the CSC this is defined as a problem which is estimated to cause 100 contacts)
A Priority 2 problem is defined by ANY of the following:
- A service-wide issue which does not directly impact the customer or for which there is a readily available workaround
- A problem affecting a department's efficiency, at any level
A Priority 3 problem is defined by ANY of the following:
- A single user issue (internal or external)
- An issue which requires no immediate action or presents no significant risk to the business or customers
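As a rough illustration only (this is not how the Problem Tool is actually implemented), the priority rules above could be sketched as a simple "first match wins" check. The `Problem` fields and the `assign_priority` function are hypothetical names made up for this example, and it assumes the 100-contact figure for the CSC means "100 or more". Workflow disruption in other departments is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """Hypothetical problem record; the fields are illustrative only."""
    customer_facing_service_down: bool = False
    affects_signup_or_billing: bool = False
    nagios_critical_customer_facing: bool = False
    estimated_csc_contacts: int = 0
    service_wide_no_direct_impact: bool = False  # or a workaround exists
    affects_department_efficiency: bool = False


def assign_priority(problem: Problem) -> int:
    """Return 1, 2 or 3; ANY matching condition is enough for a level."""
    # Priority 1: customer-facing outage, sign up/billing, critical Nagios
    # alert, or (assumed threshold) 100 or more estimated CSC contacts.
    if (
        problem.customer_facing_service_down
        or problem.affects_signup_or_billing
        or problem.nagios_critical_customer_facing
        or problem.estimated_csc_contacts >= 100
    ):
        return 1
    # Priority 2: service-wide issue with no direct customer impact (or a
    # workaround), or anything affecting a department's efficiency.
    if problem.service_wide_no_direct_impact or problem.affects_department_efficiency:
        return 2
    # Priority 3: single-user issues and anything needing no immediate action.
    return 3
```

For example, `assign_priority(Problem(affects_signup_or_billing=True))` gives 1, while a default `Problem()` with nothing flagged falls through to 3.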
Once QA have replicated/validated a problem, it is passed to the relevant department (Networks, Development, Content, Finance, Products) for resolution.
Priority 1 problems are published to the portal, and can be tracked by customers until resolution. You can also add your username to any P1 problem if you are affected, without having to go through the CSC department.
Here in Comms, we took ownership of the problem raising and management process from a CSC perspective a few months ago. Since then we have been involved in many behind-the-scenes changes to improve the process and turn problems around more quickly (along with our Network and Development colleagues, of course).
One of these is something we call the Problem Hopper, which means we can escalate the status of a problem which is categorised as a P2 or P3 but is causing customer pain, or is an issue which CSC receive many contacts about. When in the Hopper, a problem sits somewhere between a P1 and a P2, and is worked on urgently by our problem team until resolved. You may have seen posts in the forums from Comms Team members advising they have prioritised or escalated a problem, which usually means it has entered the Hopper.
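One way to picture where the Hopper sits is as a sort key for the work queue. The sketch below is an assumption for illustration, not our actual tooling: the `work_order_key` function and the 1.5 ranking value are invented here simply to place Hopper problems after P1s and ahead of ordinary P2s.

```python
def work_order_key(priority: int, in_hopper: bool) -> float:
    """Lower values are worked first."""
    # A P2 or P3 in the Hopper is treated as sitting between P1 and P2.
    # The value 1.5 is an arbitrary illustrative choice.
    if in_hopper and priority in (2, 3):
        return 1.5
    return float(priority)


# Hypothetical problems: (name, priority, in_hopper)
problems = [
    ("A", 3, False),
    ("B", 2, True),
    ("C", 1, False),
    ("D", 2, False),
]

ordered = sorted(problems, key=lambda p: work_order_key(p[1], p[2]))
names = [name for name, _, _ in ordered]
print(names)  # ['C', 'B', 'D', 'A']
```

Here problem B is only a P2, but because it is in the Hopper it is picked up straight after the P1 and before the other P2.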
When CSC agents raise a problem these days, it is passed to the Comms team for initial validation and prioritisation. This means we can test more thoroughly than CSC agents are often able to, and helps us pick up on more widespread issues much more quickly, ensuring that the correct priority is set for the issue. This is valuable, as a problem seen by one CSC agent can easily be missed or raised as an individual ticket to a different department; however, if four CSC agents raise the same problem to the Comms team, it's clear that there is an issue that needs to be addressed. Agents can also use our internal forums or alerting procedures to advise other agents to watch for certain issues, which is very valuable for other teams coming on shift.
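The "four agents, one problem" idea can be sketched as a simple count of distinct agents per symptom. This is an illustration under assumptions, not a description of our tools: the `widespread_issues` function, the free-text symptom matching, and the use of four as a fixed threshold (taken from the example above) are all hypothetical.

```python
from collections import defaultdict


def widespread_issues(reports, threshold=4):
    """Return symptoms reported by at least `threshold` distinct agents.

    `reports` is an iterable of (agent, symptom) pairs. Symptoms are
    matched naively on lower-cased, whitespace-trimmed text.
    """
    agents_by_symptom = defaultdict(set)
    for agent, symptom in reports:
        agents_by_symptom[symptom.strip().lower()].add(agent)
    return sorted(
        symptom
        for symptom, agents in agents_by_symptom.items()
        if len(agents) >= threshold
    )


# Hypothetical reports from CSC agents
reports = [
    ("agent1", "Webmail login fails"),
    ("agent2", "webmail login fails"),
    ("agent3", "Webmail login fails "),
    ("agent4", "webmail login fails"),
    ("agent1", "Portal page slow"),
]
flagged = widespread_issues(reports)
print(flagged)  # ['webmail login fails']
```

Counting distinct agents rather than raw reports means one agent raising the same problem several times does not trigger the threshold on their own.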
Dealing with all problems here in Comms gives us a much better grasp of which problems are affecting our customers. It also means our input into projects and improvements is better tailored to the wider customer base, and that current lower-priority issues can be resolved as part of project work where possible.
Each of our departments has a dedicated resource to work problems, particularly Development and Networks, who have between 4 and 10 people working on problems at any one time (dependent on how many are assigned to that department at the time). These departments work problems in order of priority, and each individual takes ownership and sees the same problem through to resolution where possible (minimising duplication of the investigation stages, and ensuring the problem is fixed as quickly as possible).
The main area we focus on when dealing with problems is Development, both because more problems tend to be raised relating to development issues, and because these are the problems which traditionally require more testing.
We've spent a lot of time in the last few weeks grouping problems together, working closely with the development team, in order to create mini-projects to resolve the most prevalent issues. This means we will have dedicated development resource to work on grouped problems, and will be able to feed back throughout the resolution process. It will also save lots of investigation time, as it is likely that at least some of the issues are related, yet present different symptoms to the user.
One area we are focusing on currently is email, as there are a few email problems open at present which are affecting customers. We are hoping to get some timescales in place for this later this week, and of course I'll keep you all posted. ;)
If this process proves successful we will take it forward and use it on other problem groups, but time will tell!
As always, we'd like to hear any feedback you have on the information above, and will change the process as required to ensure the best possible allocation of our available resources and priorities.