Major Incident Roles and Responsibilities – Who is involved and what do they do?


Major Incidents are never resolved by one individual alone.

 

High-performing Major Incident functions rely on coordinated leadership, technical expertise, stakeholder management, and operational control, all working together under pressure.

The organisations that manage Major Incidents effectively understand one thing: clarity of roles drives speed, collaboration, and better decision-making.

Below are the key roles commonly involved in Major Incident Management and the part each plays during high-impact service disruption.

Every organisation operates differently, so this should be viewed as a framework for operational excellence rather than a rigid structure.

 

The Service Desk

The Service Desk is often the first team to become aware of a developing Major Incident.

During service disruption, it acts as the operational frontline between the business and IT — managing user communications, logging related incidents, and helping reduce confusion during periods of uncertainty.

In mature Major Incident environments, the Service Desk plays a critical role in:

 

  • identifying impact trends quickly
  • supporting stakeholder communication
  • reducing duplicate demand on technical teams
  • maintaining confidence across the user base

 

Effective communication from the Service Desk can significantly improve stakeholder experience during high-pressure incidents.

 

Technical Resolution Groups

Technical Resolution Groups provide the specialist expertise required to diagnose, contain, and resolve Major Incidents.

Complex outages rarely sit within a single technology domain. Multiple resolver teams often need to collaborate under pressure to identify the root cause, implement workarounds, and restore service quickly.

The effectiveness of these groups is heavily influenced by:

  • clarity of leadership
  • quality of coordination
  • speed of decision-making
  • access to accurate operational data

Without structure, technical effort can quickly become fragmented, increasing downtime and stakeholder impact.

 

Technical Lead Manager

When multiple technical teams are involved, Technical Lead Managers (TLMs) help centralise and coordinate technical activity.

The Major Incident Manager relies on the TLM to provide technical leadership, align investigation efforts, and ensure technical teams remain focused on the agreed recovery strategy.

In complex incidents, Technical Lead Managers also play a key role in translating highly technical information into clear operational updates that support effective stakeholder communication and decision-making.

Strong technical leadership helps reduce confusion, duplication, and delays during critical outages.

 

Service Continuity Manager

The Service Continuity Manager becomes critical when service recovery through normal Major Incident processes is no longer possible.

Their focus is business continuity and disaster recovery — ensuring the organisation can continue operating during severe disruption.

Examples may include:

  • data centre failures
  • environmental incidents
  • physical site access issues
  • cyber or infrastructure events requiring recovery site activation

In high-impact scenarios, the Service Continuity Manager helps coordinate recovery strategies that protect operational resilience and minimise long-term business disruption.

 

Service Manager / Service Director

Within Managed Service Providers (MSPs), Service Managers and Service Directors often own the strategic relationship with the client organisation.

In internal IT functions, they are typically accountable for service performance, governance, and supplier management.

During Major Incidents, their role may include:

  • supporting stakeholder engagement
  • providing operational or client-specific insight
  • coordinating access to key contacts or locations
  • supporting escalation management
  • ensuring communication remains aligned with business expectations

Their involvement helps maintain confidence and continuity throughout high-pressure incidents.

 

Third Parties

Third parties include any external suppliers, vendors, or partners that contribute to the organisation’s technology services.

 

During Major Incidents, these organisations may provide:

  • specialist technical expertise
  • infrastructure support
  • vendor escalation
  • access to critical systems or services

In many environments, resolution success depends on how effectively third parties are coordinated alongside internal technical teams.

Clear governance, accountability, and communication are essential to avoiding delays during critical incidents.

 

Director / Head of IT / Head of Service

The Director or Head of IT retains overall accountability for the Major Incident capability, including its people, processes, governance, and operational effectiveness.

In mature organisations, their role during live incidents is typically strategic rather than operational — focused on:

  • stakeholder assurance
  • executive visibility
  • business impact management
  • escalation support
  • organisational risk oversight

 

Following major outages, senior IT leadership may also be responsible for communicating outcomes, recovery actions, and improvement strategies to executive stakeholders or clients.

In less mature environments, senior IT leaders may become heavily involved in live incident coordination, particularly where dedicated Major Incident capability does not yet exist.

 

Change Manager

The Change Manager helps ensure changes to the IT estate are implemented in a controlled and governed way.

 

During Major Incidents, emergency fixes and workarounds are often required at speed, which means rapid restoration has to be balanced against operational risk.

 

Effective Change Management during Major Incidents supports:

  • rapid approval processes
  • controlled implementation of fixes
  • governance 
  • reduced risk of additional disruption

In mature organisations, emergency and retrospective change processes are designed to support recovery efforts without creating unnecessary operational delay.

 

Problem Manager

Problem Managers focus on identifying underlying causes and preventing repeat incidents.

Although Major Incident Management is focused on rapid restoration, long-term operational improvement depends heavily on effective Problem Management.

Following high-impact incidents, Problem Managers may:

  • lead root cause investigations
  • identify recurring failure patterns
  • coordinate permanent fixes
  • support post-incident reviews
  • drive continual service improvement

Without this activity, organisations risk repeatedly experiencing the same operational failures.

 

Major Incident Manager

The Major Incident Manager is responsible for the end-to-end coordination and leadership of Major Incidents.

Their role extends far beyond technical coordination. Effective Major Incident Managers bring structure, pace, communication, and control to highly pressured situations.

Key responsibilities commonly include:

  • coordinating technical and operational teams
  • managing stakeholder communication
  • driving decision-making and accountability
  • forming clear action plans and recovery strategies
  • coordinating third parties and resolver groups
  • managing business impact and escalation
  • maintaining operational governance and reporting
  • supporting continual improvement activities

 

Strong Major Incident Managers combine leadership, communication, operational awareness, and decision-making under pressure.

They are often the difference between controlled recovery and operational chaos.

Major Incident Management Is a Team Capability

Technology alone does not resolve Major Incidents.

Clear leadership, defined responsibilities, operational maturity, and coordinated decision-making are what separate reactive organisations from operationally resilient ones.

The most effective organisations understand that Major Incident excellence is not built around individual effort; it is built through structured collaboration, capability development, and leadership under pressure.

 

MIM® helps organisations develop the people, processes, and operational capability required to deliver Major Incident excellence at scale.

 

MIM® is the Global Best Practice IT Major Incident Management Training and Certification provider.