Major Incident Roles and Responsibilities – Who is involved and what do they do?

Explore the key roles and responsibilities involved in Major Incident Management, from Service Desk and Technical Resolution Groups through to Change, Problem, and Major Incident Managers.

The organisations that manage Major Incidents effectively understand one thing: clarity of roles drives speed, collaboration, and better decision-making.

High-performing Major Incident functions are never built around one individual. They rely on coordinated leadership, technical expertise, stakeholder management, and operational control, all working together under pressure.

This article outlines the key roles commonly involved in Major Incident Management and the part each plays during high-impact service disruption. Every organisation operates differently, so this should be used as a framework for operational excellence rather than a rigid structure.

The Service Desk

The Service Desk is often the first team to become aware of a developing Major Incident.

During service disruption, it acts as the operational frontline between the business and IT, managing user communications, logging related incidents, and helping reduce confusion during periods of uncertainty.

In mature Major Incident environments, the Service Desk plays a critical role in identifying impact trends quickly, supporting stakeholder communication, reducing duplicate demand on technical teams, and maintaining confidence across the user base.

Effective communication from the Service Desk can significantly improve stakeholder experience during high-pressure incidents.

Technical Resolution Groups

Technical Resolution Groups provide the specialist expertise required to diagnose, contain, and resolve Major Incidents.

Complex outages rarely sit within a single technology domain. Multiple resolver teams often need to collaborate under pressure to identify the root cause, implement workarounds, and restore service quickly.

The effectiveness of these groups is heavily influenced by clarity of leadership, quality of coordination, speed of decision-making, and access to accurate operational data.

Without structure, technical effort can quickly become fragmented, increasing downtime and stakeholder impact.

Technical Lead Manager

When multiple technical teams are involved, Technical Lead Managers, or TLMs, help centralise and coordinate technical activity.

The Major Incident Manager relies on the Technical Lead Manager to provide technical leadership, align investigation efforts, and ensure technical teams remain focused on the agreed recovery strategy.

In complex incidents, Technical Lead Managers also play a key role in translating highly technical information into clear operational updates that support effective stakeholder communication and decision-making.

Strong technical leadership helps reduce confusion, duplication, and delays during critical outages.

Service Continuity Manager

The Service Continuity Manager becomes critical when service recovery through normal Major Incident processes is no longer possible.

Their focus is business continuity and disaster recovery, ensuring the organisation can continue operating during severe disruption.

Examples may include data centre failures, environmental incidents, physical site access issues, or cyber and infrastructure events requiring recovery site activation.

In high-impact scenarios, the Service Continuity Manager helps coordinate recovery strategies that protect operational resilience and minimise long-term business disruption.

Service Manager / Service Director

Within Managed Service Providers, Service Managers and Service Directors often own the strategic relationship with the client organisation.

In internal IT functions, they are typically accountable for service performance, governance, and supplier management.

During Major Incidents, their role may include supporting stakeholder engagement, providing operational or client-specific insight, coordinating access to key contacts or locations, supporting escalation management, and ensuring communication remains aligned with business expectations.

Their involvement helps maintain confidence and continuity throughout high-pressure incidents.

Third Parties

Third parties include any external suppliers, vendors, or partners that contribute to the organisation’s technology services.

During Major Incidents, these organisations may provide specialist technical expertise, infrastructure support, vendor escalation, and access to critical systems or services.

In many environments, resolution success depends on how effectively third parties are coordinated alongside internal technical teams.

Clear governance, accountability, and communication are essential to avoiding delays during critical incidents.

Director / Head of IT / Head of Service

The Director or Head of IT retains overall accountability for the Major Incident capability, including its people, processes, governance, and operational effectiveness.

In mature organisations, their role during live incidents is typically strategic rather than operational, focused on stakeholder assurance, executive visibility, business impact management, escalation support, and organisational risk oversight.

Following major outages, senior IT leadership may also be responsible for communicating outcomes, recovery actions, and improvement strategies to executive stakeholders or clients.

In less mature environments, senior IT leaders may become heavily involved in live incident coordination, particularly where dedicated Major Incident capability does not yet exist.

Change Manager

The Change Manager helps ensure changes to the IT estate are implemented in a controlled and governed way.

During Major Incidents, emergency fixes and workarounds are often required at speed. This creates a tension between rapid restoration and operational risk that has to be managed.

Effective Change Management during Major Incidents supports rapid approval processes, controlled implementation of fixes, governance and auditability, and reduced risk of additional disruption.

In mature organisations, emergency and retrospective change processes are designed to support recovery efforts without creating unnecessary operational delay.

Problem Manager

Problem Managers focus on identifying underlying causes and preventing repeat incidents.

Although Major Incident Management is focused on rapid restoration, long-term operational improvement depends heavily on effective Problem Management.

Following high-impact incidents, Problem Managers may lead root cause investigations, identify recurring failure patterns, coordinate permanent fixes, support post-incident reviews, and drive continual service improvement.

Without this activity, organisations risk repeatedly experiencing the same operational failures.

Major Incident Manager

The Major Incident Manager is responsible for the end-to-end coordination and leadership of Major Incidents.

Their role extends far beyond technical coordination. Effective Major Incident Managers bring structure, pace, communication, and control to highly pressured situations.

Key responsibilities commonly include leading Major Incident calls and investigations, coordinating technical and operational teams, managing stakeholder communication, driving decision-making and accountability, and forming clear action plans and recovery strategies. They also include coordinating third parties and resolver groups, managing business impact and escalation, maintaining operational governance and reporting, and supporting continual improvement activities.

Strong Major Incident Managers combine leadership, communication, operational awareness, and decision-making under pressure.

They are often the difference between controlled recovery and operational chaos.

Major Incident Management Is a Team Capability

Technology alone does not resolve Major Incidents.

Clear leadership, defined responsibilities, operational maturity, and coordinated decision-making are what separate reactive organisations from operationally resilient ones.

The most effective organisations understand that Major Incident excellence is not built around individual effort; it is built through structured collaboration, capability development, and leadership under pressure.