When critical IT services fail or are severely degraded, the consequences can be very serious, if not catastrophic. Major incidents have been known to cause millions of pounds’ worth of damage through lost sales, lost productivity and reputational harm.
Best practice IT Major Incident Management ensures that the right people, processes and resources are primed and aligned to resolve any business-critical service outage as quickly as possible, minimising downtime and the associated costs and losses.
Major Incident Management’s focus is on restoring the normal service operation of business-critical services as quickly as possible.
It is worth noting that whilst Major Incident Management provides a workaround or temporary fix to restore services, it does not necessarily involve identifying the root cause of the problem or permanently fixing the affected infrastructure.
Root cause analysis is the concern of Problem Management, a different function entirely. However, it is not unusual for the quickest solution to a major IT incident to also be the appropriate permanent fix, so the two functions can overlap.
Examples of the types of impact a business may experience when a major incident occurs:
- Financial loss
- Productivity loss
- Reputational damage
- Failure to meet legal obligations
- Potential life-and-death situations
When you consider the wider business and its dependency on technology, the value of Major Incident Management becomes obvious. This is especially the case for large IT managed service providers – where a single major incident may cost a customer millions of pounds in loss of earnings or reputational damage.
Example 1: The value of Major Incident Management for large organisations
Take a major retailer as an example…
Many well-known major retailers rely on self-service point of sale devices and point of sale terminals to take payment for goods. In busy periods they may take around £4,000 of transactions per second. That equates to £240,000 per minute and £14,400,000 per hour. That is a substantial loss of earnings for every hour that the point of sale devices and terminals are down.
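The arithmetic above can be sketched as a short calculation. This is an illustration only: the function name `outage_cost` and its parameters are hypothetical, and the sketch assumes a constant transaction rate for the whole outage, which real trading would not follow exactly.

```python
def outage_cost(rate_per_second: float, outage_seconds: float) -> float:
    """Estimate lost transaction value, assuming a constant rate (a simplification)."""
    return rate_per_second * outage_seconds


# Busy-period figure taken from the retailer example above.
RATE_GBP_PER_SECOND = 4_000

per_minute = outage_cost(RATE_GBP_PER_SECOND, 60)
per_hour = outage_cost(RATE_GBP_PER_SECOND, 60 * 60)

print(f"£{per_minute:,.0f} per minute")
print(f"£{per_hour:,.0f} per hour")
```

Changing the rate or the outage length shows how quickly the direct loss grows, before any indirect impact is taken into account.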
Of course, the retailer won’t just lose millions of pounds’ worth of sales. The impact of such an outage is far greater. Frustrated customers can quickly generate negative press and social media coverage, which can seriously damage the company’s brand and reputation. The retailer may also lose some valued customers, which affects sales in the long term.
The damage resulting from that one-hour outage includes:
- Productivity loss (staff cannot do their job)
- Financial loss (£14.4m per hour)
- Reputational damage (with both immediate customers and shareholders or the market)
- Regulatory failings (failing to meet regulatory standards or maintain data records)
Example 2: The value of Major Incident Management for smaller organisations
Let us look at the potential impact of a major incident on a smaller organisation. Take, for example, a professional services organisation that has three office locations in Europe and experiences a complete loss of network access. As a result, 1,000 employees across the three locations cannot log in or access the network.
Each employee costs the business around £145 per day. If the network remains down all day, then the direct outage cost is £145,000 per day. Of course the impact is far wider: project deadlines may be missed, and there may be a negative long-term impact on customers, partners and suppliers.
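The direct cost figure can be reproduced with a small sketch. The function name `direct_outage_cost` and the half-day case are hypothetical additions for illustration, and the sketch assumes the cost scales linearly with the number of idle employees and the length of the outage.

```python
def direct_outage_cost(employees: int, cost_per_employee_per_day: float,
                       days: float = 1.0) -> float:
    """Direct staff cost of an outage, assuming linear scaling (a simplification)."""
    return employees * cost_per_employee_per_day * days


# Figures from the professional services example: 1,000 staff at £145 per day.
full_day = direct_outage_cost(1_000, 145)
# Hypothetical variation: the same outage lasting half a working day.
half_day = direct_outage_cost(1_000, 145, days=0.5)

print(f"Full-day outage: £{full_day:,.0f}")
print(f"Half-day outage: £{half_day:,.0f}")
```

This covers only the direct staff cost; the wider effects on deadlines, customers, partners and suppliers are not modelled.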
These two examples clearly demonstrate the huge impact that a major incident can have on any business.
When Best Practice IT Major Incident Management processes, people and infrastructure are aligned, a major incident is dealt with quickly and efficiently, and services are swiftly restored to end users and customers.
With a best practice Major Incident Management framework in place, key stakeholders, including customers and end users, are immediately made aware of the major incident and, through effective communications, are given confidence that the IT operation is managing a workaround or resolution.
The benefits of a structured, documented and agreed Major Incident process
Without a structured Major Incident Management process, a major IT incident can result in complete chaos. The Service Desk may become overrun with calls from end users and, in turn, send a huge number of incidents to different Technical Resolving Groups.
The Technical Resolving Groups then all work on diagnosis in isolation, which risks duplicated effort among support staff. In short, nobody knows what has happened, when it happened or what needs to happen next. Resolution is slow, and the damage and losses are prolonged.