Setting Up an Emergency Response Team Structure

A technical guide for DAOs and protocol teams on structuring a dedicated incident response team with clear roles, communication protocols, and on-chain escalation paths.

SECURITY

Setting Up an Emergency Response Team Structure

A structured Emergency Response Team (ERT) is critical for managing security incidents in Web3 projects. This guide outlines the core roles, responsibilities, and operational framework for an effective team.

An Emergency Response Team (ERT) is a predefined group of individuals responsible for coordinating the detection, analysis, containment, and resolution of security incidents. In the context of blockchain protocols, smart contracts, and decentralized applications, these incidents can range from the discovery of a critical bug or an active exploit to a governance attack or a severe protocol failure. Unlike traditional incident response, Web3 ERTs must operate with extreme transparency, often under public scrutiny, while managing immutable code and decentralized treasury assets. The primary goal is to minimize damage, protect user funds, and restore normal operations as swiftly as possible.

The foundation of an effective ERT is a clear, role-based structure. A typical structure includes: a Lead Coordinator who oversees the entire response and serves as the primary decision-maker; Technical Analysts who investigate the incident's root cause, often through blockchain forensics and smart contract review; a Communications Lead responsible for internal alerts and public disclosures via official channels like Twitter, Discord, or governance forums; and a Legal/Compliance Officer to navigate regulatory implications. For smaller projects, individuals may wear multiple hats, but responsibilities must be explicitly assigned before an incident occurs.

Establishing Standard Operating Procedures (SOPs) is the next critical step. These are documented playbooks that the team follows when an incident is declared. A basic SOP includes: 1) Incident Triage & Declaration: Defining clear severity levels (e.g., Critical, High, Medium) and the threshold for activating the full ERT. 2) Communication Protocols: Defining internal communication tools (e.g., a private Signal or Telegram group) and templates for public announcements. 3) Containment Actions: Outlining technical steps like pausing vulnerable contracts using upgradeable proxies or emergency multisig functions. 4) Post-Mortem Process: Mandating a transparent report, like those published by Compound or Polygon, detailing the cause, response, and remediation.

For technical implementation, many protocols codify emergency powers into their smart contract architecture. A common pattern is a time-locked multisig wallet or a governance module with specialized emergency functions. For example, an upgradeable contract might have a pause() function that is only callable by a 4-of-7 multisig held by the ERT. The OpenZeppelin library provides Pausable and AccessControl contracts that form the basis for such systems. It is crucial that these emergency mechanisms are thoroughly tested on a testnet and their activation process is rehearsed by the team.
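
As a minimal sketch of that pattern, the contract below combines OpenZeppelin's Pausable and AccessControl so that only an address holding PAUSER_ROLE (for example, the ERT's 4-of-7 multisig) can halt user-facing functions. Import paths assume OpenZeppelin Contracts v4.x, and the contract and variable names are illustrative rather than taken from any specific protocol.

solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Assumes OpenZeppelin Contracts v4.x import paths (Pausable moved to utils/ in v5.x).
import "@openzeppelin/contracts/security/Pausable.sol";
import "@openzeppelin/contracts/access/AccessControl.sol";

contract EmergencyPausableVault is Pausable, AccessControl {
    bytes32 public constant PAUSER_ROLE = keccak256("PAUSER_ROLE");

    constructor(address ertMultisig, address governance) {
        _grantRole(DEFAULT_ADMIN_ROLE, governance); // governance administers roles
        _grantRole(PAUSER_ROLE, ertMultisig);       // e.g. the ERT's 4-of-7 multisig
    }

    // The ERT halts state-changing entry points immediately.
    function pause() external onlyRole(PAUSER_ROLE) {
        _pause();
    }

    // Stricter setups can gate unpausing behind full governance instead.
    function unpause() external onlyRole(PAUSER_ROLE) {
        _unpause();
    }

    // Example user-facing function guarded by the pause switch.
    function deposit(uint256 amount) external whenNotPaused {
        // deposit logic would go here
    }
}

Because the role is managed through AccessControl, governance can revoke the ERT's pause power with a normal proposal, without upgrading the contract.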

Finally, the ERT must be prepared for the post-incident phase. This involves conducting a blameless post-mortem analysis to identify systemic failures, updating the SOPs based on lessons learned, and, if necessary, executing a remediation plan such as a treasury-funded reimbursement for affected users. Transparency in this phase is non-negotiable for maintaining community trust. The entire structure—from defined roles to tested smart contract safeguards—ensures that when a crisis hits, the team can move from panic to a coordinated, effective response, ultimately safeguarding the protocol and its users.

prerequisites
PREREQUISITES AND TEAM FOUNDATION

Setting Up an Emergency Response Team Structure

A formalized on-call structure is a prerequisite for effective incident management. This guide outlines how to establish roles, responsibilities, and communication protocols for a Web3 security team.

An emergency response team (ERT) is a dedicated group responsible for managing security incidents, from smart contract exploits to governance attacks. Unlike traditional DevOps teams, Web3 ERTs must operate in a public, adversarial environment where transactions are irreversible and time is measured in block confirmations. The core team structure typically includes a Primary On-Call Engineer, a Secondary/Backup Engineer, a Communications Lead, and an Escalation Manager. Each role has defined responsibilities and clear handoff procedures documented in a runbook.

The Primary On-Call Engineer is the first responder. Their immediate duties include incident triage, initial investigation using tools like Tenderly or Etherscan, and executing time-critical mitigations such as pausing a vulnerable contract. They must have deep protocol knowledge and secure access to the signing keys needed for emergency multisig transactions. The Secondary Engineer provides support, reviews proposed actions, and is prepared to take over if the primary is unavailable. This redundancy is critical for 24/7 coverage.

A Communications Lead manages external and internal messaging. For public protocols, this involves drafting alerts for Twitter, Discord, or governance forums to maintain transparency without causing panic. Internally, they coordinate updates using a dedicated war room channel (e.g., in Slack or Telegram). The Escalation Manager, often a senior engineer or project lead, is notified for severe incidents (Severity 1/2) to make high-stakes decisions, such as authorizing a defensive drain of at-risk funds or coordinating with legal counsel.

Establishing clear communication protocols is non-negotiable. Designate a primary, encrypted channel for internal coordination (e.g., Keybase, Signal) and a separate, auditable channel for logging all actions and decisions. All communication should follow the Situation-Background-Assessment-Recommendation (SBAR) framework to ensure clarity. For example: "Situation: Withdrawals are failing on VaultContract. Background: The _processWithdrawal function reverts due to an overflow. Assessment: This is a Severity 2 exploit in progress. Recommendation: Pause the contract via multisig and investigate the root cause."

Finally, document this structure, contact information, and escalation paths in a readily accessible Incident Response Runbook. This living document should be stored in a secure, version-controlled repository like GitHub or Notion. Conduct regular tabletop exercises to simulate incidents (e.g., "a flash loan attack on the lending pool") to test response times, decision-making, and tooling. These drills validate your team's readiness and expose gaps in your procedures before a real crisis occurs.

key-roles
EMERGENCY RESPONSE

Core Team Roles and Responsibilities

A structured on-call team is critical for handling security incidents, protocol failures, and governance crises in a live Web3 environment.

01

Incident Commander

The Incident Commander is the central decision-maker during a crisis. This role is responsible for:

  • Declaring the start and end of an incident response.
  • Coordinating all team leads (technical, communications, legal).
  • Making final decisions on mitigation steps, such as pausing contracts or initiating upgrades.
  • Maintaining the official incident timeline and log.

This role requires deep protocol knowledge and the authority to execute under pressure.

02

Technical Lead / On-Call Engineer

This role owns the technical investigation and remediation. Key responsibilities include:

  • Triage: Analyzing on-chain data, logs, and monitoring alerts to confirm and scope the incident.
  • Mitigation: Executing pre-authorized emergency functions (e.g., pausing a pool) or deploying hotfixes.
  • Forensics: Working with blockchain analysts to trace fund flows and identify root cause.
  • Tooling: Managing access to incident response dashboards, private RPC nodes, and multi-sig wallets.

Example: During a flash loan attack, this lead would identify the exploiter's contract and recommend a pause.

03

Communications Lead

Manages all internal and external messaging to maintain trust and transparency. Duties cover:

  • Internal Comms: Alerting core team, investors, and key partners via secure channels (e.g., Telegram, War Room).
  • Public Comms: Drafting and publishing updates on Twitter, Discord, and the project blog. Must balance transparency with operational security.
  • Stakeholder Updates: Providing regular, factual briefings to token holders, DAO members, and liquidity providers.
  • Post-Mortem: Leading the public disclosure process after resolution, adhering to a responsible disclosure policy.

04

Legal & Compliance Lead

Provides guidance on regulatory and contractual obligations during a crisis. This role focuses on:

  • Assessing regulatory reporting requirements (e.g., in jurisdictions where the protocol operates).
  • Managing communications with law enforcement and blockchain intelligence firms like Chainalysis or TRM Labs.
  • Reviewing all public statements for legal risk.
  • Advising on liability and potential recovery actions, including working with white-hat hackers or negotiating with exploiters.
  • Ensuring the response adheres to the protocol's own terms of service and governance framework.

05

Establishing an On-Call Rotation

A sustainable emergency response requires a formal on-call rotation. Key elements include:

  • Primary & Secondary Responders: Designate a primary on-call engineer and a backup for each shift (e.g., weekly rotations).
  • Escalation Paths: Define clear triggers for escalating from automated alerts to the Incident Commander (e.g., funds at risk exceeding $X).
  • Tooling Access: Ensure on-call personnel have immediate, secure access to necessary private keys (in hardware wallets or MPC), admin panels, and communication tools.
  • Compensation: Compensate team members for on-call duty, as it requires constant readiness.

06

Runbooks & Preparedness

Runbooks are pre-written, step-by-step guides for known failure modes; preparedness means building and rehearsing them before an incident occurs. This involves:

  • Creating Scenario Playbooks: For events like oracle failure, governance attack, or critical smart contract bug. Example: "Runbook: Responding to a Stablecoin Depeg on our AMM."
  • Regular Drills: Conducting tabletop exercises or simulations quarterly to test communication and decision-making.
  • Tooling Checklist: Maintaining a verified list of emergency contacts, multi-sig signer availability, and critical contract addresses.
  • Post-Incident Reviews: Mandating a blameless post-mortem process to update runbooks and improve response for future incidents.

RESPONSE FRAMEWORK

Incident Severity and Escalation Matrix

Defines severity levels, response timelines, and escalation paths for on-chain security incidents.

SEV-1: Critical
  • Description & Examples: Protocol insolvency, governance attack, >$10M exploit, total network halt.
  • Initial Response Time: < 15 minutes
  • Escalation Path: All team leads → Executive team → Legal counsel

SEV-2: High
  • Description & Examples: Major contract bug, partial downtime, $1M-$10M exploit, critical front-end compromise.
  • Initial Response Time: < 1 hour
  • Escalation Path: Team leads → CTO/Head of Security → Executive team

SEV-3: Medium
  • Description & Examples: Minor contract bug, UI/UX failure, <$1M exploit, significant performance degradation.
  • Initial Response Time: < 4 hours
  • Escalation Path: On-call engineer → Team lead → Head of Security

SEV-4: Low
  • Description & Examples: Minor UI bug, non-critical API failure, informational security alert, community confusion.
  • Initial Response Time: < 24 hours
  • Escalation Path: On-call engineer → Team lead

False Positive / Info
  • Description & Examples: Non-critical alert, expected behavior, external protocol incident with no direct impact.
  • Initial Response Time: As needed
  • Escalation Path: Logged for review by the security team

OPERATIONS

Implementing On-Call Rotations and PagerDuty

A structured on-call rotation is critical for Web3 protocol reliability. This guide covers how to design a rotation schedule and configure PagerDuty for effective incident response.

On-call rotations ensure a protocol's core services—like node operation, smart contract monitoring, and API availability—have dedicated responders 24/7. A poorly managed rotation leads to alert fatigue, burnout, and slower incident resolution. For Web3 teams, the stakes are high: a missed critical alert can result in halted bridges, failed transactions, or exploited vulnerabilities. The primary goal is to create a predictable, fair schedule that distributes responsibility while maintaining clear escalation paths for different severity levels, from Sev-1 (service down) to Sev-4 (minor bug).

Designing the rotation starts with defining roles and responsibilities. A common structure includes a Primary responder who handles all initial alerts, a Secondary or backup engineer, and an Escalation Manager for critical incidents. Rotations can follow different patterns: a weekly rotation is simple but can be taxing, while a follow-the-sun model leverages team members in different time zones for continuous coverage. Use tools like Google Calendar or dedicated scheduling software to create the schedule, ensuring it accounts for vacations and time-off requests well in advance.

PagerDuty is a leading platform for orchestrating this process. After creating a team, you define Services representing your critical components (e.g., Mainnet RPC Endpoint, Bridge Monitoring). Each service has Integrations that connect to your alert sources. For Web3, common integrations include Prometheus for node metrics, Sentry for application errors, and custom webhooks from blockchain monitoring tools like Tenderly or Chainscore. Configuring these integrations correctly ensures alerts flow from your infrastructure into PagerDuty's incident management system.

The core of PagerDuty configuration is the Escalation Policy. This policy dictates the alert routing: it first notifies the Primary on-call engineer via phone, SMS, and push notification. If they don't acknowledge the alert within a set timeframe (e.g., 5 minutes for Sev-1), it escalates to the Secondary, and eventually to the manager. You can create different policies for different severity levels. It's crucial to integrate PagerDuty with your communication stack, such as Slack or Microsoft Teams, to keep the entire team informed of ongoing incidents through dedicated channels.

Effective on-call requires more than just tools; it requires clear processes. Document runbooks for common alerts (e.g., "RPC node is down" or "High gas price spike") in a wiki like Notion or Confluence. These runbooks should provide immediate diagnostic steps and remediation actions. After each significant incident, conduct a blameless post-mortem to identify root causes and improve systems. Regularly review and tune your alert thresholds to reduce noise—aim for a high signal-to-noise ratio where every page requires immediate human intervention.

EMERGENCY RESPONSE

Secure Communication and Coordination Protocols

Establishing a clear, secure, and resilient communication structure is critical for managing on-chain incidents. This guide covers the essential protocols and tools for Web3 teams.

01

Establish a Secure War Room

A dedicated, private communication channel is the first step in any incident response. Key requirements include:

  • End-to-end encryption for all sensitive discussions.
  • Access control with strict, role-based permissions.
  • Audit logging for post-mortem analysis.

Use platforms like Keybase or Element (Matrix) with verified teams. For time-critical coordination, establish a clear on-call rotation and escalation matrix.

03

Create a Pre-Approved Incident Playbook

Reduce decision latency with documented procedures for common threats. A playbook should include:

  • Trigger conditions (e.g., oracle failure, governance attack).
  • Step-by-step response flows for technical and communication teams.
  • Pre-written templates for public announcements.

Store this playbook in a private, version-controlled repository (e.g., GitHub Gist with limited access) and conduct quarterly tabletop exercises to test it.

05

Define Clear Internal and External Communication Protocols

Maintain trust by controlling the narrative. Establish two parallel communication tracks:

  • Internal (Secure): Immediate, technical details for the response team only.
  • External (Public): Timely, accurate, and calming updates for users and stakeholders.

Designate official spokespeople and use pre-established channels like the project's official Twitter account and Discord announcement channel. Never discuss mitigation strategies in public forums during an active incident.

06

Conduct Post-Mortems and Update Procedures

Every incident, whether mitigated or not, is a learning opportunity. A formal post-mortem process should:

  • Reconstruct the timeline from logs and chat history.
  • Identify root causes and response bottlenecks.
  • Produce actionable items to improve tools, playbooks, and training.

Publish a sanitized version of the post-mortem to the community to demonstrate accountability and reinforce security practices. Update all related documentation immediately.

SECURITY OPERATIONS

Setting Up an Emergency Response Team Structure

A structured emergency response team (ERT) is critical for coordinating with external security researchers during a crisis. This guide outlines how to build a cross-functional team, define clear roles, and establish communication protocols to manage vulnerabilities effectively.

An effective Emergency Response Team (ERT) is a predefined, cross-functional group activated during a security incident, such as the discovery of a critical vulnerability by an external auditor or whitehat hacker. The core team should include representatives from technical, legal, communications, and executive leadership. Key technical roles include a Lead Developer to assess and implement fixes, a Security Lead to coordinate with the finder and analyze the threat, and a DevOps Engineer to manage deployments. Having these roles assigned in advance prevents chaotic scrambling and ensures a swift, organized response when minutes count.

Clear internal and external communication protocols are the backbone of crisis management. Internally, establish a dedicated, private channel (e.g., a Slack channel or War Room) for the ERT. Externally, provide a secure, standard method for researchers to report issues, such as a security@ email, a dedicated form, or a platform like Immunefi or HackerOne. The process must be documented in your project's SECURITY.md file. Upon receiving a report, the Security Lead should acknowledge it within 24-48 hours, providing the researcher with a single point of contact to prevent confusion and information leaks.

The ERT must operate with a pre-defined decision-making framework and escalation matrix. Create a severity classification system (e.g., Critical, High, Medium) based on potential impact, often outlined in your bug bounty program. This classification triggers specific response playbooks. For a Critical bug that could lead to fund loss or protocol shutdown, the playbook should include immediate steps: validating the bug, preparing a patch, coordinating a mainnet pause if necessary, and notifying key stakeholders. The legal and communications leads work in parallel to draft public statements and manage the bounty payout process in accordance with your published policy.

ON-CHAIN GOVERNANCE

Setting Up an Emergency Response Team Structure

A structured Emergency Response Team (ERT) is critical for managing protocol crises, from critical bugs to governance attacks. This guide outlines how to formally integrate an ERT into your on-chain governance framework.

An Emergency Response Team (ERT) is a pre-authorized group of experts empowered to execute time-sensitive actions to protect a protocol. Unlike standard governance, which operates on proposal timelines of days or weeks, an ERT can act within hours. This structure is essential for mitigating risks like a vulnerability in a core Pool contract, a malfunctioning oracle, or a malicious governance proposal attempting to drain the treasury. The ERT's authority and limitations must be explicitly codified in the protocol's smart contracts and governance charter.

The core mechanism is a multisig wallet or a specialized module with elevated permissions. For example, a 5-of-9 Gnosis Safe multisig could hold the power to pause specific contract functions, upgrade proxy implementations, or execute a pre-defined emergency shutdown. These powers are not arbitrary; they are scoped to a whitelisted set of functions defined in the protocol's EmergencyExecutor contract. This setup ensures the ERT can act decisively while being constrained from accessing user funds or altering core economic parameters without broader approval.
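
A minimal sketch of that scoping is shown below: governance whitelists specific (target, function selector) pairs, and the ERT multisig can only route calls through that whitelist. The storage layout and checks here are illustrative assumptions rather than any particular protocol's implementation.

solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

contract EmergencyExecutor {
    address public immutable ertMultisig;   // e.g. a 5-of-9 Gnosis Safe
    address public immutable governance;    // the DAO's timelock or executor

    // target contract => function selector => allowed for emergency use
    mapping(address => mapping(bytes4 => bool)) public allowed;

    constructor(address _ertMultisig, address _governance) {
        ertMultisig = _ertMultisig;
        governance = _governance;
    }

    // Governance decides which emergency calls (e.g. pause()) the ERT may make.
    function setAllowed(address target, bytes4 selector, bool isAllowed) external {
        require(msg.sender == governance, "only governance");
        allowed[target][selector] = isAllowed;
    }

    // The ERT can only execute calls that fall inside the whitelist.
    function execute(address target, bytes calldata data) external returns (bytes memory) {
        require(msg.sender == ertMultisig, "only ERT");
        require(data.length >= 4 && allowed[target][bytes4(data[:4])], "not whitelisted");
        (bool ok, bytes memory result) = target.call(data);
        require(ok, "emergency call failed");
        return result;
    }
}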

Integrating the ERT requires clear on-chain and social consensus. Technically, the governance DAO must pass a proposal to grant the EMERGENCY_EXECUTOR_ROLE to the ERT's multisig address. Socially, the community must ratify a publicly available ERT charter detailing its members, selection process, activation triggers, and post-action review requirements. Transparency about the team's composition, often including core developers, security researchers, and legal advisors, builds trust that this 'break-glass' mechanism won't be abused.
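
For reference, the helper below sketches what that on-chain step can look like with an OpenZeppelin Governor: a single-action proposal that calls grantRole on the contract that administers the role. The accessManager parameter is an assumption standing in for your protocol's own role-managing contract; grantRole(bytes32,address) is the standard AccessControl signature.

solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import "@openzeppelin/contracts/governance/IGovernor.sol";

// Hypothetical helper that submits the role-grant proposal described above.
contract GrantEmergencyRole {
    bytes32 public constant EMERGENCY_EXECUTOR_ROLE = keccak256("EMERGENCY_EXECUTOR_ROLE");

    function propose(IGovernor governor, address accessManager, address ertMultisig)
        external
        returns (uint256 proposalId)
    {
        address[] memory targets = new address[](1);
        uint256[] memory values = new uint256[](1);
        bytes[] memory calldatas = new bytes[](1);

        targets[0] = accessManager; // contract that owns role administration
        values[0] = 0;
        calldatas[0] = abi.encodeWithSignature(
            "grantRole(bytes32,address)", EMERGENCY_EXECUTOR_ROLE, ertMultisig
        );

        // Standard OpenZeppelin Governor proposal flow; voting and execution follow.
        proposalId = governor.propose(
            targets, values, calldatas,
            "Grant EMERGENCY_EXECUTOR_ROLE to the ERT multisig"
        );
    }
}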

Post-emergency accountability is non-negotiable. Every action taken by the ERT must trigger an automatic, on-chain event that creates a mandatory review proposal. For instance, after pausing a lending market, the ERT must submit a governance proposal within 7 days explaining their actions. The broader DAO then votes to either ratify the action or, in a fail-safe, revert the changes. This creates a check-and-balance, ensuring the ERT is a tool for protection, not a centralized override of decentralized governance.
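
One way to encode that accountability on-chain, sketched below under assumed names, is to record a ratification deadline whenever the ERT acts and to block further emergency actions until the DAO has ratified or rejected the previous one.

solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Sketch: every emergency action opens a 7-day window for DAO ratification.
contract AccountableEmergencyActions {
    address public immutable ertMultisig;
    address public immutable dao;
    uint256 public constant REVIEW_WINDOW = 7 days;

    uint256 public pendingActionId; // 0 when nothing awaits review
    uint256 public reviewDeadline;

    event EmergencyActionTaken(uint256 indexed actionId, bytes data, uint256 deadline);
    event ActionRatified(uint256 indexed actionId, bool approved);

    constructor(address _ertMultisig, address _dao) {
        ertMultisig = _ertMultisig;
        dao = _dao;
    }

    // ERT records the action; further actions are blocked until the DAO reviews it.
    function recordEmergencyAction(uint256 actionId, bytes calldata data) external {
        require(msg.sender == ertMultisig, "only ERT");
        require(pendingActionId == 0, "previous action awaiting ratification");
        pendingActionId = actionId;
        reviewDeadline = block.timestamp + REVIEW_WINDOW;
        emit EmergencyActionTaken(actionId, data, reviewDeadline);
    }

    // DAO ratifies or rejects; a rejection would be paired with a reverting proposal.
    function ratify(uint256 actionId, bool approved) external {
        require(msg.sender == dao, "only DAO");
        require(actionId == pendingActionId, "unknown action");
        pendingActionId = 0;
        emit ActionRatified(actionId, approved);
    }
}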

To implement, start with a time-locked, cancelable design. Use OpenZeppelin's TimelockController as the ERT executor, where actions have a short delay (e.g., 2-6 hours) that allows public scrutiny and gives the DAO a final window to cancel the action via a standard vote. This model, used by protocols like Compound and Uniswap, provides a critical safety buffer. The contract code below shows a simplified structure for an emergency pauser module that only the ERT timelock can activate.

solidity
pragma solidity ^0.8.20;

interface IPausable { function pause() external; }

contract EmergencyPauser {
    // Timelock controlled by the ERT; both addresses are fixed at deployment.
    address public immutable emergencyTimelock;
    IPausable public immutable targetContract;

    constructor(address _emergencyTimelock, IPausable _targetContract) {
        emergencyTimelock = _emergencyTimelock;
        targetContract = _targetContract;
    }
    // Only the ERT timelock can trigger the pause, after its public delay elapses.
    function executePause() external {
        require(msg.sender == emergencyTimelock, "Unauthorized");
        targetContract.pause();
    }
}

Ultimately, a well-designed ERT strengthens decentralization by providing a predictable, community-approved path for crisis management. It prevents panic-driven, ad-hoc decisions and replaces them with a transparent process. Document your structure clearly in the protocol's documentation, conduct regular tabletop exercises with the team, and ensure the smart contract permissions are audited. This turns a potential single point of failure into a robust, accountable component of your on-chain governance system.

CRITICAL INFRASTRUCTURE

Emergency Response Tool Stack Comparison

Comparison of communication, monitoring, and coordination tools for Web3 security incident response.

Discord + Custom Bots
  • On-chain Monitoring Integration: Custom webhooks
  • Role-Based Access Control (RBAC): Manual via roles
  • Incident War Room Creation: Manual channel
  • Audit Logging & Compliance: Limited
  • Multi-chain Alert Aggregation: Custom logic required
  • Estimated Monthly Cost (Team of 10): $0-50

Telegram + Tenderly
  • On-chain Monitoring Integration: Tenderly Alerts
  • Incident War Room Creation: Manual group
  • Multi-chain Alert Aggregation: EVM chains only
  • Estimated Monthly Cost (Team of 10): $100-400

PagerDuty + OpenZeppelin Defender
  • On-chain Monitoring Integration: Defender Sentinels
  • Incident War Room Creation: Auto-generated
  • Audit Logging & Compliance: SOC2 compliant
  • Multi-chain Alert Aggregation: EVM & Solana via API
  • Estimated Monthly Cost (Team of 10): $500+

INCIDENT RESPONSE

Setting Up an Emergency Response Team Structure

A well-defined emergency response team (ERT) is the critical first line of defense for any Web3 protocol. This guide outlines the roles, responsibilities, and operational structure needed to manage security incidents effectively.

The core of an effective ERT is a clear RACI matrix (Responsible, Accountable, Consulted, Informed). For a blockchain protocol, key roles include the Incident Commander, who has ultimate authority and makes final decisions; Technical Leads for smart contracts, node infrastructure, and frontends; a Communications Lead to manage internal and public messaging; and a Legal/Compliance Officer. Each role must have at least one primary and one backup person, with 24/7 availability via defined on-call schedules using tools like PagerDuty or Opsgenie.

Establishing clear escalation paths and communication channels is non-negotiable. The primary command center should be a private, auditable channel like a dedicated Slack workspace or Discord server with strict access controls. Define severity levels (e.g., SEV-1: Protocol funds at risk, SEV-2: Critical functionality degraded) with corresponding response time SLAs. For example, a SEV-1 incident might require the full team to be paged and assembled in a war room within 15 minutes, while a SEV-3 bug may only require notification during business hours.

Preparation involves creating and maintaining pre-approved action playbooks for common scenarios. These are step-by-step guides that the team can execute under pressure. Examples include a governance pause for a vulnerable smart contract, a frontend takedown in case of a DNS hijack, or a coordinated disclosure process with security researchers. Each playbook should list required permissions (e.g., multi-sig signers), tools, and verification steps. Regularly test these playbooks through tabletop exercises that simulate real incidents like an oracle failure or a governance attack.

The ERT must integrate with the protocol's technical stack and governance. This means ensuring team members have the necessary access to multi-sig wallets (via Gnosis Safe), node RPC endpoints, monitoring dashboards (like Tenderly or Blocknative), and blockchain explorers. For decentralized protocols, define the process for triggering emergency measures encoded in the smart contracts, specifying which multi-sig signers or DAO votes are required. Document all access credentials securely in a tool like 1Password or HashiCorp Vault, with regular key rotation.

Continuous improvement is driven by post-incident reviews. After resolving any incident, the ERT should conduct a blameless retrospective to document the timeline, root cause, and effectiveness of the response. Key questions to answer: Was the detection time acceptable? Did communication channels work? Were the playbooks followed and were they adequate? Use these insights to update runbooks, refine team composition, and identify areas for technical mitigation, such as adding new monitoring alerts or implementing circuit breakers in the protocol code.
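
As one example of such a technical mitigation, the sketch below implements a simple outflow circuit breaker: withdrawals are tracked per time window, and the breaker trips automatically once a configured limit is exceeded, giving the ERT a clear on-chain signal to investigate. The contract name, window length, and limit are illustrative assumptions.

solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Sketch of an automatic outflow circuit breaker for a vault-style contract.
abstract contract OutflowCircuitBreaker {
    uint256 public constant WINDOW = 1 hours;     // accounting window
    uint256 public immutable maxOutflowPerWindow; // e.g. a fraction of TVL, set at deploy

    uint256 public windowStart;
    uint256 public windowOutflow;
    bool public tripped;

    event CircuitBreakerTripped(uint256 outflow, uint256 limit);

    constructor(uint256 _maxOutflowPerWindow) {
        maxOutflowPerWindow = _maxOutflowPerWindow;
        windowStart = block.timestamp;
    }

    modifier whenNotTripped() {
        require(!tripped, "circuit breaker tripped");
        _;
    }

    // Call from withdrawal paths; trips the breaker so the ERT can investigate
    // before any further outflows are processed.
    function _recordOutflow(uint256 amount) internal {
        if (block.timestamp > windowStart + WINDOW) {
            windowStart = block.timestamp; // start a fresh window
            windowOutflow = 0;
        }
        windowOutflow += amount;
        if (windowOutflow > maxOutflowPerWindow) {
            tripped = true;
            emit CircuitBreakerTripped(windowOutflow, maxOutflowPerWindow);
        }
    }
}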

EMERGENCY RESPONSE

Frequently Asked Questions

Common questions and troubleshooting steps for establishing and operating a Web3 security emergency response team (ERT).

What is a Web3 Emergency Response Team, and why is one needed?

A Web3 Emergency Response Team (ERT) is a dedicated, cross-functional group responsible for managing security incidents for a protocol or DAO. Unlike traditional IT security, Web3 ERTs handle unique threats like smart contract exploits, governance attacks, oracle manipulation, and bridge hacks. The need stems from the immutable and financially critical nature of blockchain systems where a single bug can lead to irreversible losses exceeding hundreds of millions of dollars. An ERT provides a structured framework for rapid detection, analysis, containment, and communication during a crisis, which is essential for mitigating damage and maintaining user trust in a decentralized environment.
