Incident Management Playbook
Houston, we have a problem ...
Defining a clear incident management process is key to resolving incidents faster and reducing costs. IT support teams work most efficiently when they follow a clear incident management process built on best practices. The benefits of such a process include:
- Faster incident resolution and improved MTTR (Mean Time to Resolution)
- Reduced costs and impact on revenue for the business
- Better internal and external communication during incident management
- Continuous improvement and learning
- Improved customer experience
Organizations rarely design an incident management process from scratch. Most draw on industry best practices and adapt them to their own needs. Before diving deeper into incident management, we need to define a few important terms.
An incident is any unplanned event that disrupts the normal operation of a service or degrades its quality. Anything from a service outage to a slow web server can be categorized as an incident.
Incidents are often confused with problems, but an incident is the unplanned event itself, whereas a problem is the underlying cause behind one or more incidents. Incident management focuses on returning the service to normal operation as quickly as possible. Problem management focuses on identifying the root cause of the incident to prevent it from happening again.
Identifying an Incident
Ideally, monitoring and alerting tools will detect an incident and inform our team before our customers even notice. Sometimes, though, we first learn about an incident from customer support tickets.
No matter how the incident is detected, our first step should be to ensure that the incident is recorded for tracking purposes.
It is better to declare an incident early, find a simple fix, and close it out than to spin up the incident management framework hours into a growing problem.
Outside of tickets, if the answer to any of the following questions is yes, the event is an incident:
- Do you need to involve a second team in fixing the problem?
- Is the outage visible to customers?
- Is the issue unsolved even after an hour’s concentrated analysis?
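The checklist above can be sketched as a small function. This is only an illustration: the `EventAssessment` fields and the `is_incident` name are hypothetical, and the 60-minute threshold is one reading of "an hour's concentrated analysis".

```python
from dataclasses import dataclass


@dataclass
class EventAssessment:
    """Answers to the three checklist questions (field names are hypothetical)."""
    needs_second_team: bool      # Do we need a second team to fix it?
    visible_to_customers: bool   # Can customers see the outage?
    minutes_unsolved: int        # Minutes of concentrated analysis so far


# Assumption: "an hour's concentrated analysis" is read as 60 minutes.
UNSOLVED_THRESHOLD_MINUTES = 60


def is_incident(event: EventAssessment) -> bool:
    """Return True if any one of the checklist conditions holds."""
    return (
        event.needs_second_team
        or event.visible_to_customers
        or event.minutes_unsolved >= UNSOLVED_THRESHOLD_MINUTES
    )
```

Because the conditions are joined with `or`, a single "yes" is enough to declare an incident, which matches the advice to declare early rather than late.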
Anyone can identify and report an incident: an employee, a customer, or one of our partners. Reports can arrive via an automatic alert, ticket, text message, email, or phone call. Upon receiving a report, we record it and determine whether it is an incident or a service request, because each one is handled differently.
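As a rough sketch of what recording a report might look like, the following models the reporting channels listed above and the incident versus service request distinction. All class, field, and queue names here are hypothetical, and the classification itself is assumed to be made by whoever receives the report.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Channel(Enum):
    """Ways a report can arrive, as listed in the text."""
    ALERT = "automatic alert"
    TICKET = "ticket"
    TEXT = "text message"
    EMAIL = "email"
    PHONE = "phone call"


class ReportKind(Enum):
    INCIDENT = "incident"
    SERVICE_REQUEST = "service request"


@dataclass
class Report:
    summary: str
    channel: Channel
    reported_by: str
    kind: ReportKind  # decided by the person receiving the report
    # Timestamp in UTC so records from different sources are comparable.
    received_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def route(report: Report) -> str:
    """Pick a handling queue; the queue names are hypothetical."""
    if report.kind is ReportKind.INCIDENT:
        return "incident-response"
    return "service-desk"
```

For example, `route(Report("Checkout page returns errors", Channel.ALERT, "monitoring", ReportKind.INCIDENT))` would select the incident queue, while a password reset logged as a service request would go to the service desk.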