Note: Best Practice in IT Major Incident Management and its components are a framework. Consider your organisation’s other existing processes when incorporating Major Incident Management into Operations.
Whilst the primary objective of Major Incident Management is to restore normal service to End Users, the process has three phases, each with sub-objectives that contribute to that primary objective.
The three phases of the Best Practice Major Incident Management Process:
- The initial 15 minutes (of major incident identification)
- The post 15 minutes (n.b. this can last hours or sometimes days)
- The resolution (and closure of the major incident)
The initial 15 minutes phase
In the initial 15 minutes of major incident identification the key objectives are:
- Validate that there is a genuine major incident
- Make all stakeholders aware of the major incident and its impact
- Engage the relevant Technical Resolving Groups
The focus of this first phase of a major incident is identification and communication: ensuring that End Users, Management and Operational staff are aware of the Major Incident.
Side note:
Although this article discusses the three phases at a high level rather than the individual actions required within each part of the process, it is important to note that guiding End User and Service Desk behaviour at this point can make or break the effectiveness of your Incident Management as well as your Major Incident Management.
Ideally, End Users should still log an Incident through self-service portals or by contacting the Service Desk. The Service Desk assesses each Incident and, if it is linked to the Major Incident, a child / sub Incident is created and linked to the Major Incident in your Service Management tool. Typically, the service level agreement (SLA) is suspended on the newly created child incident, and the SLA of the major incident is used as the measurement instead.
Some IT Operations may have constraints on the volume of incoming Incidents they can handle. It is important to understand this and to build into your Major Incident Process a mechanism, suited to your Operation, for capturing incidents from End Users that are associated with the major incident. If the Service Desk becomes overrun with inbound contact, it may not be able to deal with incidents and requests that are unrelated to the major incident or that affect End Users in other locations, reducing the quality of service to those End Users and risking SLA breaches.
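As an illustration only, the parent / child linking described above could be sketched as follows. This is a minimal sketch, not the data model of any particular Service Management tool: the names Incident, MajorIncident, link_child and triage are hypothetical, and the decision on whether an Incident is related is assumed to be made by the Service Desk and passed in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    """An Incident logged by an End User (hypothetical model)."""
    ref: str
    summary: str
    sla_suspended: bool = False
    parent_ref: Optional[str] = None  # set when linked to a Major Incident


@dataclass
class MajorIncident:
    """The parent record whose SLA is the one being measured."""
    ref: str
    summary: str
    children: List[Incident] = field(default_factory=list)

    def link_child(self, incident: Incident) -> None:
        # Link the Incident as a child and suspend its own SLA,
        # so that only the Major Incident's SLA is measured.
        incident.parent_ref = self.ref
        incident.sla_suspended = True
        self.children.append(incident)


def triage(incident: Incident, major: MajorIncident, is_related: bool) -> bool:
    """Service Desk assessment: link related Incidents, leave others alone.

    `is_related` stands in for the Service Desk's judgement (an assumption).
    Returns True if the Incident was linked to the Major Incident.
    """
    if is_related:
        major.link_child(incident)
        return True
    return False
```

Unrelated Incidents are left untouched, so they keep their own SLA and continue through normal Incident Management.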
The post 15 minutes phase
In the post 15 minutes phase of a major incident the key objectives are:
- Identify and implement workarounds
- Maintain stakeholder confidence and communication
The focus in the second phase of a major incident is to define clear action plans, identify workarounds and implement the workaround or fix.
This phase can last minutes, hours or occasionally days in some organisations.
The resolution and closure phase
In the resolution and closure phase of a major incident the key objectives are:
- Validate the restoration of normal service operation
- Complete administration, reporting and Post Major Incident Review
The focus in this phase is on communication and administration to enable the Operation to learn and evolve, making performance improvements by learning from the
The benefits:
There are many benefits to thinking of a major incident as three phases, but the greatest is the ability to pinpoint, with greater clarity and speed, where the process is working and where it is breaking down. Knowing where and how to improve the process means faster evolution and more effective service delivery to your End Users.
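One way to make this pinpointing concrete is to measure how long each major incident spent in each phase. The sketch below is illustrative only: the function name phase_durations and the three timestamps are hypothetical, and it assumes that the resolution and closure phase begins when normal service is restored, which your own process may define differently.

```python
from datetime import datetime, timedelta

# Length of the initial phase, taken from the phase name.
INITIAL_WINDOW = timedelta(minutes=15)


def phase_durations(identified_at: datetime,
                    restored_at: datetime,
                    closed_at: datetime) -> dict:
    """Split a major incident's lifetime into the three phases.

    Assumption: the resolution and closure phase starts at service
    restoration. If service is restored inside the first 15 minutes,
    the post 15 minutes phase has zero duration.
    """
    initial_end = min(identified_at + INITIAL_WINDOW, restored_at)
    return {
        "initial_15_minutes": initial_end - identified_at,
        "post_15_minutes": restored_at - initial_end,
        "resolution_and_closure": closed_at - restored_at,
    }
```

Comparing these durations across several major incidents would show which phase most often overruns, and therefore where improvement effort is best spent.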