A red team vs. blue team exercise is a controlled, adversarial simulation where one group (Red Team) attacks a system, and another (Blue Team) defends it. In Web3, this methodology is critical for stress-testing smart contracts, oracles, governance mechanisms, and economic models beyond automated audits. The primary goal is to uncover complex, multi-step attack vectors and systemic weaknesses that automated tools or individual code reviews might miss, providing a realistic assessment of a protocol's resilience under pressure.
How to Structure a Red Team vs. Blue Team Security Exercise
A structured guide to planning and executing effective red team vs. blue team exercises for Web3 protocols, focusing on smart contract and protocol-level vulnerabilities.
The first step is scoping and objective setting. Define the attack surface: will the Red Team target the core protocol logic, the token economics, the front-end dApp, or the underlying infrastructure like RPC nodes? Establish clear Rules of Engagement (RoE) specifying allowed techniques (e.g., social engineering, front-running, governance manipulation) and off-limits actions (e.g., attacking third-party dependencies). Objectives should be specific, such as "drain 30% of the main liquidity pool" or "pass a malicious governance proposal."
Next, assemble and brief the teams. The Red Team should consist of experienced security researchers or ethical hackers familiar with common Web3 exploit patterns like reentrancy, price oracle manipulation, and flash loan attacks. The Blue Team typically includes the protocol's core developers, DevOps engineers, and monitoring specialists. A neutral White Team acts as referee, ensuring RoE compliance, injecting scenario events, and adjudicating disputes. Provide both teams with the same documentation, codebase access, and initial system state.
Execution follows a defined timeline, often 24-72 hours. The Red Team actively probes and exploits, while the Blue Team monitors on-chain activity, internal logs, and alerting systems (like OpenZeppelin Defender or Forta) to detect, analyze, and respond to incidents. The Blue Team's response might involve pausing contracts, executing emergency governance, or deploying patches. All actions, findings, and communications should be logged in a dedicated channel or platform for post-exercise analysis.
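To make the Blue Team's detection layer concrete, here is a minimal sketch of a Forta-style detection bot using the forta-agent SDK. It flags unusually large token transfers from a monitored contract; the vault token address and the alert threshold are hypothetical placeholders, not values from any real deployment.

```typescript
import {
  Finding,
  FindingSeverity,
  FindingType,
  HandleTransaction,
  TransactionEvent,
} from "forta-agent";

// Hypothetical monitored token and alert threshold -- replace with real values.
const VAULT_TOKEN = "0x0000000000000000000000000000000000000001";
const THRESHOLD = 1_000_000n * 10n ** 18n; // 1M tokens, assuming 18 decimals

const TRANSFER_EVENT =
  "event Transfer(address indexed from, address indexed to, uint256 value)";

const handleTransaction: HandleTransaction = async (txEvent: TransactionEvent) => {
  const findings: Finding[] = [];
  // Match Transfer events emitted by the monitored token contract.
  for (const log of txEvent.filterLog(TRANSFER_EVENT, VAULT_TOKEN)) {
    if (BigInt(log.args.value.toString()) >= THRESHOLD) {
      findings.push(
        Finding.fromObject({
          name: "Large vault outflow",
          description: `Transfer of ${log.args.value} from ${log.args.from}`,
          alertId: "EXERCISE-LARGE-TRANSFER-1",
          severity: FindingSeverity.High,
          type: FindingType.Suspicious,
        })
      );
    }
  }
  return findings;
};

export default { handleTransaction };
```

During an exercise, a bot like this gives the Blue Team a concrete alert to triage the moment the Red Team moves funds, and its false positive rate becomes part of the post-exercise metrics.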
The most critical phase is the post-mortem and remediation. After the exercise, all parties convene for a debrief. The White Cell presents a timeline of events. Each team discusses their strategies, detection gaps, and response effectiveness. Every successful exploit is documented as a vulnerability report, complete with a proof-of-concept and a severity assessment. The protocol team then prioritizes these findings and creates a remediation plan, turning the exercise's lessons into concrete security improvements, such as upgraded contract logic or enhanced monitoring rules.
Prerequisites and Team Composition
A structured red team vs. blue team exercise is a critical method for stress-testing blockchain protocols and smart contracts. This guide outlines the prerequisites and team composition needed to run an effective security simulation.
Before initiating an exercise, define a clear scope and objectives. The scope should specify the target system (e.g., a specific smart contract, a bridge protocol, or a validator client), the assets in play (testnet tokens, specific keys), and the rules of engagement. Objectives might include testing incident response procedures, validating monitoring alerts, or discovering unknown vulnerabilities. A well-defined Rules of Engagement (RoE) document is mandatory, outlining permitted and prohibited attack vectors, such as forbidding attacks on underlying infrastructure or third-party services.
Assembling the right teams is foundational. The Red Team acts as the adversarial force, simulating sophisticated attackers. This team requires deep expertise in blockchain internals, smart contract vulnerabilities (like reentrancy or logic errors), and common exploit techniques. They should operate with a threat intelligence mindset, crafting plausible attack narratives. The Blue Team is the defensive unit, typically composed of protocol developers, DevOps engineers, and security analysts. Their role is to detect, analyze, and respond to the Red Team's activities using monitoring tools, logs, and on-chain analytics.
A critical, often overlooked role is the White Team or exercise control. This neutral party acts as the referee and facilitator. They manage the exercise timeline, ensure adherence to the RoE, inject scenario events to increase realism, and serve as the communication hub between Red and Blue teams. The White Team is also responsible for the post-exercise debrief, where findings are discussed without attribution to foster a blameless culture focused on systemic improvement.
Technical prerequisites must be established in a controlled environment. This almost always means deploying the target system on a private testnet or staging environment that mirrors mainnet conditions. Both teams need access to necessary tooling: the Red Team requires frameworks like Foundry for exploit development and fuzzing, while the Blue Team needs monitoring stacks (e.g., Tenderly, Blocknative, custom indexers) and incident management platforms. All participants should have documented, role-specific access to these systems before the exercise clock starts.
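As an illustration of the "mirrors mainnet" requirement, a Hardhat configuration along these lines pins a mainnet fork at a fixed block so both teams exercise against identical state. The block number and environment variable name are assumptions for the sketch.

```typescript
import { HardhatUserConfig } from "hardhat/config";

// Minimal fork setup: both teams run against the same pinned mainnet state.
const config: HardhatUserConfig = {
  solidity: "0.8.24",
  networks: {
    hardhat: {
      forking: {
        url: process.env.MAINNET_RPC_URL ?? "", // archive-capable RPC endpoint
        blockNumber: 19_000_000, // pin for a reproducible exercise state
      },
    },
  },
};

export default config;
```

Pinning the block number matters: without it, each `npx hardhat node` session forks from the latest block, and the two teams can end up probing subtly different states.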
Finally, establish clear success metrics and documentation procedures. Success isn't just about whether the Red Team "wins" by exploiting a flaw; it's measured by the Blue Team's Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR), the quality of forensic reports, and the number of validated security improvements identified. All actions, from attack payloads to alert logs, should be meticulously recorded to create an actionable report that drives concrete security enhancements to the protocol.
Step 1: Define Scope and Objectives
The success of a blockchain security exercise hinges on a meticulously defined scope and clear objectives. This initial phase establishes the rules of engagement, preventing scope creep and ensuring measurable outcomes.
Begin by explicitly stating the exercise's primary goal. Is the focus on testing the resilience of a new smart contract deployment, evaluating the incident response of a DeFi protocol's security team, or assessing the security of cross-chain bridge operations? A goal like "assess the oracle manipulation resistance of our lending protocol's liquidation logic" is far more actionable than a vague aim to "test security." This clarity directs all subsequent planning and resource allocation.
Next, define the in-scope assets and systems. This is a critical boundary that protects both the red team (attackers) and the blue team (defenders). The scope must specify the exact contract addresses, blockchain networks (e.g., Ethereum Mainnet, Arbitrum Sepolia testnet), front-end applications, and internal monitoring tools that are fair game. Crucially, it must also list out-of-scope elements, such as production databases, third-party infrastructure not under your control, or specific user funds. Documenting these limits in a Rules of Engagement (RoE) document is a security best practice.
Finally, establish success criteria and metrics. How will you measure the exercise's effectiveness? Objectives should be SMART: Specific, Measurable, Achievable, Relevant, and Time-bound. Examples include: "Identify at least three critical vulnerabilities (CVSS >= 7.0) in the new vault contract," "Achieve a mean time to detection (MTTD) of under 15 minutes for simulated attacks," or "Validate that all high-severity findings from the previous audit have been remediated." These metrics provide a concrete framework for the post-exercise analysis and report.
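One way to keep these criteria honest is to encode them as machine-checkable targets. The sketch below simply restates the example objectives above as data; the field names and thresholds are illustrative, not a standard.

```typescript
// Hypothetical exercise targets mirroring the SMART objectives above.
interface ExerciseMetrics {
  criticalFindings: number;          // validated findings with CVSS >= 7.0
  mttdMinutes: number;               // mean time to detection for simulated attacks
  priorAuditFindingsRemediated: boolean;
}

const targets = { minCriticalFindings: 3, maxMttdMinutes: 15 };

function meetsObjectives(m: ExerciseMetrics): boolean {
  return (
    m.criticalFindings >= targets.minCriticalFindings &&
    m.mttdMinutes <= targets.maxMttdMinutes &&
    m.priorAuditFindingsRemediated
  );
}
```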
Step 2: Develop Attack Scenarios
Effective security exercises require realistic threat models. This step focuses on crafting specific attack vectors that challenge your protocol's defenses.
Define the Attack Surface
Map all potential entry points for an attacker. This includes:
- Smart contract functions (e.g., mint, swap, stake)
- Admin privileges and upgradeable contracts
- Oracles and external data dependencies
- User interfaces and front-end integrations
- Cross-chain bridges and third-party dependencies

Start by auditing your system's architecture to identify every component an adversary could target.
Craft Economic Attack Vectors
Simulate attacks that exploit financial incentives. Common scenarios include:
- Flash loan attacks: Borrowing large sums to manipulate oracle prices or liquidity pool ratios.
- Governance attacks: Acquiring voting power to pass malicious proposals.
- MEV extraction: Front-running or sandwiching user transactions for profit.
- Liquidation cascades: Triggering mass liquidations by manipulating collateral prices.

Model the capital requirements and potential profit for each attack to assess feasibility, as in the sketch below.
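A back-of-the-envelope feasibility model for a flash loan scenario might look like this. The fee, extraction, and gas figures are illustrative inputs, not market data; some lenders charge flash loan fees on the order of a few basis points.

```typescript
// Rough single-block flash loan feasibility check.
interface FlashLoanScenario {
  loanAmountUsd: number;         // capital borrowed for the attack
  flashLoanFeeBps: number;       // e.g. ~5 bps on some lending markets
  expectedExtractionUsd: number; // value captured via the manipulation
  gasCostUsd: number;
}

function netProfitUsd(s: FlashLoanScenario): number {
  const fee = (s.loanAmountUsd * s.flashLoanFeeBps) / 10_000;
  return s.expectedExtractionUsd - fee - s.gasCostUsd;
}

// Example: a $50M loan at 5 bps costs $25k in fees; the attack is only
// rational if expected extraction clears fees plus gas.
console.log(netProfitUsd({
  loanAmountUsd: 50_000_000,
  flashLoanFeeBps: 5,
  expectedExtractionUsd: 400_000,
  gasCostUsd: 5_000,
})); // 370000
```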
Design Technical Exploits
Create scenarios targeting code vulnerabilities. Focus on:
- Reentrancy: Exploiting state changes during external calls.
- Logic errors: Flaws in business logic, like incorrect fee calculations or access control.
- Integer overflows/underflows: Manipulating arithmetic operations.
- Signature replay attacks: Reusing signed messages across different chains or contexts.

Use tools like Slither or Foundry's fuzzing to help discover these weaknesses; the sketch below illustrates the signature replay case.
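To make the signature replay scenario concrete, this ethers sketch shows why an EIP-712 domain separator that includes the chainId blocks cross-chain replay: the same signature recovers a different address under a different domain. The Withdraw type and vault address are invented for illustration.

```typescript
import { Wallet, verifyTypedData } from "ethers";

async function main() {
  const signer = Wallet.createRandom();

  // Hypothetical typed message a vault might accept.
  const types = {
    Withdraw: [
      { name: "to", type: "address" },
      { name: "amount", type: "uint256" },
    ],
  };
  const value = { to: signer.address, amount: 1000n };
  const mainnetDomain = {
    name: "ExampleVault",
    version: "1",
    chainId: 1,
    verifyingContract: "0x0000000000000000000000000000000000000001",
  };

  const sig = await signer.signTypedData(mainnetDomain, types, value);

  // Recovers the signer on the chain it was signed for...
  console.log(verifyTypedData(mainnetDomain, types, value, sig) === signer.address); // true

  // ...but recovers a different address under another chainId, so the
  // signature cannot simply be replayed on a second chain.
  const l2Domain = { ...mainnetDomain, chainId: 10 };
  console.log(verifyTypedData(l2Domain, types, value, sig) === signer.address); // false
}

main();
```

A contract that omits chainId (or any domain binding) from its signed messages is the version of this scenario the Red Team should hunt for.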
Simulate Operational & Social Attacks
Test non-technical weaknesses in your organization.
- Private key compromise: Simulate a team member's key being leaked.
- Malicious insider: An employee with privileged access acts adversarially.
- Supply chain attack: A compromised dependency (like an NPM library) is introduced.
- Phishing campaign: Attempt to trick team members into revealing credentials.

These exercises test incident response plans and operational security (OpSec) protocols.
Prioritize with Impact/Likelihood Matrix
Rank your scenarios to focus testing efforts. Use a simple matrix:
- High Impact, High Likelihood: Critical vulnerabilities (e.g., a flaw in a core vault). Prioritize these first.
- High Impact, Low Likelihood: "Black swan" events (e.g., a major oracle failure). Plan contingency measures.
- Low Impact, High Likelihood: Nuisance attacks (e.g., spam). Implement automated mitigations.
- Low Impact, Low Likelihood: Monitor but deprioritize.

This ensures efficient allocation of your red team's resources; a minimal encoding of the matrix follows below.
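A prioritization like this is trivial to encode so scenario lists stay sorted consistently across exercises. A minimal sketch, with made-up scenario names:

```typescript
type Level = "low" | "high";

interface Scenario {
  name: string;
  impact: Level;
  likelihood: Level;
}

// Lower number = test first, mirroring the matrix above.
function priority(s: Scenario): number {
  if (s.impact === "high" && s.likelihood === "high") return 1;
  if (s.impact === "high") return 2;
  if (s.likelihood === "high") return 3;
  return 4;
}

const scenarios: Scenario[] = [
  { name: "Spam transactions", impact: "low", likelihood: "high" },
  { name: "Core vault logic flaw", impact: "high", likelihood: "high" },
  { name: "Major oracle failure", impact: "high", likelihood: "low" },
];

const ranked = [...scenarios].sort((a, b) => priority(a) - priority(b));
console.log(ranked.map((s) => s.name));
// ["Core vault logic flaw", "Major oracle failure", "Spam transactions"]
```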
Document Scenario Playbooks
Create a detailed runbook for each major attack scenario. Each playbook should include:
- Attack narrative: A story describing the attacker's goal and steps.
- Technical prerequisites: Required tools, capital, and access.
- Expected system behavior: How the protocol should react if defenses work.
- Success criteria: Clear metrics for whether the attack succeeded.
- Blue team clues: What logs, events, or anomalies the defense team should look for.

This documentation is essential for consistent, repeatable exercises; a possible schema is sketched below.
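Codifying the playbook as a shared schema keeps entries consistent across scenarios and lets the White Team validate completeness before the exercise starts. One possible shape, with field names as suggestions rather than a standard:

```typescript
interface AttackPlaybook {
  id: string;                     // e.g. "PLAYBOOK-ORACLE-01"
  narrative: string;              // attacker's goal and step-by-step story
  prerequisites: {
    tools: string[];              // e.g. Foundry scripts, custom bots
    capitalUsd: number;           // capital required to mount the attack
    access: string[];             // keys, roles, or infrastructure needed
  };
  expectedSystemBehavior: string; // how the protocol should react if defenses hold
  successCriteria: string[];      // measurable conditions for a successful attack
  blueTeamClues: string[];        // logs, events, or anomalies defenders should see
}
```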
Step 3: Assign Red and Blue Team Roles
Core responsibilities, required skills, and team composition for Red and Blue Teams in a security exercise.
| Role / Attribute | Red Team (Attackers) | Blue Team (Defenders) | White Team (Facilitators) |
|---|---|---|---|
| Primary Objective | Simulate real-world attacks to find vulnerabilities | Detect, analyze, and respond to simulated attacks | Coordinate the exercise and ensure knowledge transfer |
| Core Responsibilities | Reconnaissance, exploitation, lateral movement, persistence | Threat monitoring, alert triage, incident response, forensics | Rule-setting, scenario design, post-exercise debrief, gap analysis |
| Key Skills Required | Penetration testing, social engineering, exploit development | SIEM/SOAR operation, log analysis, threat hunting, containment | Project management, communication, threat modeling, risk assessment |
| Typical Team Size | 2-5 specialists | 5-10+ analysts and engineers | 1-3 coordinators |
| Tools & Environment | C2 frameworks (e.g., Cobalt Strike), vulnerability scanners, custom payloads | EDR/XDR, SIEM, IDS/IPS, firewall logs, ticketing systems | Exercise platform, communication channels, reporting templates |
| Success Metrics | Critical vulnerabilities discovered, mean time to compromise (MTTC) | Mean time to detect (MTTD), mean time to respond (MTTR), false positive rate | Exercise completion, actionable findings documented, lessons learned implemented |
| Post-Exercise Output | Detailed attack narrative, proof-of-concept exploits, access artifacts | Incident report, updated detection rules, improved playbooks | Formal after-action report, risk register updates, training recommendations |
Step 4: Execute the Simulation
This phase transforms your planning into action, where the Red Team actively attacks and the Blue Team defends the target system.
The execution phase begins with a formal kickoff, where the facilitator confirms all teams are ready, reiterates the rules of engagement (ROE), and starts the official timer. The Red Team initiates their attack sequence based on their pre-defined playbook, which may include steps like initial reconnaissance, social engineering attempts, or exploiting a known vulnerability in a smart contract. Concurrently, the Blue Team activates their monitoring tools—such as blockchain explorers, security information and event management (SIEM) systems, and custom alerting scripts—to detect anomalous activity. All actions, from attack payloads to defensive countermeasures, must be meticulously logged with timestamps for later analysis.
During the live simulation, the facilitator's primary role is to orchestrate the exercise and ensure it remains within the established scope. They monitor the ROE for compliance, answer procedural questions, and manage any injects—pre-planned scenario twists like a simulated exchange hack or a sudden drop in token price—to test the teams' adaptability. Communication is often restricted to simulated channels (e.g., a dedicated Slack channel for 'internal alerts') to mimic real-world constraints. The goal is to create a high-fidelity, pressurized environment that tests both technical response and team coordination under stress.
A critical execution concept is the contained test environment. Attacks should target forked mainnet networks (using tools like Hardhat or Anvil), dedicated testnets, or isolated sandboxes—never production systems. For example, a Red Team might deploy a malicious, copycat token contract on a forked Ethereum network to attempt a phishing scam, while the Blue Team practices identifying the fraudulent contract address and blacklisting it in their front-end. This safety-first approach allows for aggressive testing without financial loss or network disruption.
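On a local Anvil fork, the Red Team can even act as a protocol admin without holding real keys, since Anvil supports account impersonation via its custom RPC methods. A minimal sketch; the RPC URL and admin address are placeholders.

```typescript
import { JsonRpcProvider } from "ethers";

// Assumes a fork already running locally, e.g.:
//   anvil --fork-url $MAINNET_RPC_URL
const provider = new JsonRpcProvider("http://127.0.0.1:8545");

// Hypothetical protocol admin / multisig address on the forked network.
const ADMIN = "0x0000000000000000000000000000000000000002";

async function main() {
  // Anvil-specific RPC method: subsequent txs can be sent "as" this account.
  await provider.send("anvil_impersonateAccount", [ADMIN]);
  const admin = await provider.getSigner(ADMIN);

  // The Red Team can now replay admin-only calls (pause, upgrade, etc.)
  // against the fork to exercise the Blue Team's detection and response.
  console.log("Impersonating:", await admin.getAddress());
}

main();
```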
Documentation is non-negotiable. The Red Team should maintain a detailed attack log noting each step, tool used, and outcome. The Blue Team's defense log should record every alert investigated, decision made, and action taken, such as pausing a mint function or upgrading a contract. These logs are the raw data for the post-mortem. The simulation typically runs for a fixed duration (e.g., 4-8 hours) or until a specific success condition is met, such as the exfiltration of a specific amount of test funds or the Blue Team successfully isolating the threat.
Post-Exercise Analysis and Reporting
The final, critical phase of a security exercise where findings are synthesized into actionable intelligence for protocol improvement.
The post-exercise phase transforms raw attack and defense data into a structured After-Action Report (AAR). This document is the primary deliverable, detailing the exercise timeline, attack vectors exploited, defensive measures tested, and the overall resilience of the system. It should objectively catalog both true positives (successful attacks) and false positives (defensive actions against non-threats), providing a complete picture of the security posture. The goal is not to assign blame but to create a shared, factual baseline for all stakeholders.
A robust analysis follows a standard methodology. Start with a timeline reconstruction, mapping the sequence of Red Team actions and Blue Team responses. For each critical event, perform root cause analysis to determine why a vulnerability existed or a detection failed. Was it a logic error in a smart contract, a misconfigured monitoring alert, or a gap in incident response procedures? Quantify the impact using metrics like Time to Detection (TTD) and Time to Resolution (TTR) to measure the Blue Team's operational efficiency.
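Given timestamped entries from the attack and defense logs, TTD and TTR reduce to simple averages over incidents. A minimal computation, assuming every Red Team action was eventually detected and resolved:

```typescript
interface Incident {
  attackAt: Date;    // first malicious action, from the Red Team log
  detectedAt: Date;  // first corresponding alert, from the Blue Team log
  resolvedAt: Date;  // containment or fix deployed
}

const minutesBetween = (a: Date, b: Date) => (b.getTime() - a.getTime()) / 60_000;

function meanTimes(incidents: Incident[]) {
  const n = incidents.length;
  const mttd =
    incidents.reduce((sum, i) => sum + minutesBetween(i.attackAt, i.detectedAt), 0) / n;
  const mttr =
    incidents.reduce((sum, i) => sum + minutesBetween(i.attackAt, i.resolvedAt), 0) / n;
  return { mttdMinutes: mttd, mttrMinutes: mttr };
}
```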
The final report must prioritize findings. Use a risk-rating framework like DREAD (Damage, Reproducibility, Exploitability, Affected Users, Discoverability) or CVSS (Common Vulnerability Scoring System) to score each vulnerability. This creates a clear roadmap for remediation, distinguishing critical consensus-layer flaws from lower-severity UI issues. Include specific, verifiable evidence for each finding, such as transaction hashes, contract addresses, log excerpts, or screenshots from tools like Tenderly or Etherscan.
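DREAD scoring, for example, is commonly computed as the average of five 0-10 component ratings. A small helper makes the rubric explicit; the example figures are invented.

```typescript
// Each component rated 0-10; the overall score is the simple average.
interface DreadRating {
  damage: number;
  reproducibility: number;
  exploitability: number;
  affectedUsers: number;
  discoverability: number;
}

const dreadScore = (r: DreadRating): number =>
  (r.damage + r.reproducibility + r.exploitability + r.affectedUsers + r.discoverability) / 5;

// Example: a reliably reproducible vault-drain bug scores near the top.
console.log(dreadScore({
  damage: 10,
  reproducibility: 9,
  exploitability: 8,
  affectedUsers: 10,
  discoverability: 6,
})); // 8.6
```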
Beyond the written report, conduct a hotwash or debriefing session with all participants. This facilitated discussion allows the Red Team to explain their tactics, the Blue Team to share their internal decision-making, and developers to clarify system intent. It surfaces insights that raw data misses and fosters a blameless culture focused on systemic improvement. Record key takeaways and action items from this session directly in the report.
The ultimate output is a set of actionable recommendations tied to each finding. Recommendations should be specific and assigned: e.g., "Audit and refactor the withdraw() function in Contract X to prevent reentrancy (Owner: Dev Team, Due: Q3)." Track these recommendations in a project management system like Jira or Linear. The exercise's value is realized only when findings are fixed, monitoring is improved, and playbooks are updated, completing the security feedback loop.
Vulnerability Findings Template
A structured template for documenting security vulnerabilities discovered during a red team exercise.
| Field | Red Team (Offensive) | Blue Team (Defensive) | Auditor (Neutral) |
|---|---|---|---|
| Vulnerability ID | RTE-2024-001 | BTR-2024-001 | AUD-2024-001 |
| Severity | Critical | High | Critical |
| Attack Vector | Remote | Network | Remote |
| Proof of Concept | | | |
| Impact Description | Full contract drain | Temporary DoS | Full contract drain |
| Fix Deadline | 24 hours | 7 days | 48 hours |
| CWE Reference | CWE-862 | CWE-400 | CWE-862 |
| Status | Exploited | Mitigated | Confirmed |
Tools and Resources
Practical tools, frameworks, and artifacts for running a red team vs. blue team security exercise that produces measurable improvements rather than ad hoc findings.
Threat Modeling and Scope Definition
Start by defining what the red team is allowed to break and why. A poorly scoped exercise produces noise or real damage.
Key steps:
- Identify crown jewels: private keys, admin multisigs, upgrade proxies, CI/CD secrets
- Define in-scope assets: smart contracts, RPC endpoints, off-chain services, monitoring stack
- Specify out-of-scope actions: mainnet funds movement, social engineering of executives, irreversible state changes
- Document rules of engagement: attack window, disclosure process, emergency stop
For Web3 teams, include chain-specific constraints like fork testing vs. mainnet simulation, and whether MEV-style attacks are permitted. Treat this document as a signed contract between red and blue teams to avoid ambiguity during exploitation.
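A lightweight way to remove that ambiguity is to check the RoE into version control as data both teams can reference programmatically. A hypothetical example; every value below is a placeholder:

```typescript
// Hypothetical RoE document kept in version control and signed off
// by both team leads before the exercise clock starts.
export const rulesOfEngagement = {
  exercise: "vault-v2-redteam-2024-q3",
  attackWindow: { start: "2024-09-02T09:00Z", end: "2024-09-04T17:00Z" },
  inScope: [
    "VaultV2 contracts on the staging fork",
    "oracle adapter services",
    "monitoring and alerting stack",
  ],
  outOfScope: [
    "any mainnet funds movement",
    "social engineering of executives",
    "third-party RPC providers",
  ],
  mevStyleAttacksPermitted: true,
  emergencyStop: "post HALT in #exercise-control; all activity ceases immediately",
} as const;
```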
Post-Exercise Reporting and Remediation
End the exercise with a joint red-blue debrief, not just a vulnerability list.
A useful report includes:
- Attack timeline correlated with logs and alerts
- Missed detections and root causes
- Defensive controls that worked as intended
- Clear remediation tasks with owners and deadlines
Translate findings into backlog items such as new alerts, contract invariants, runbooks, or access controls. Schedule a follow-up purple team validation to confirm fixes are effective.
Without structured remediation and retesting, red team vs. blue team exercises degrade into one-time demos rather than continuous security improvement.
Frequently Asked Questions
Common questions about structuring effective Red Team vs. Blue Team exercises for blockchain and smart contract security.
What are Red Teams and Blue Teams in Web3 security?

In Web3 security, Red Teams are offensive security experts who simulate real-world attackers. Their goal is to find and exploit vulnerabilities in a system, such as smart contracts, governance mechanisms, or node infrastructure, using the same tools and techniques as malicious actors.
Blue Teams are defensive security experts responsible for protecting the system. They monitor, detect, and respond to the Red Team's attacks, hardening defenses and improving incident response procedures. This adversarial simulation creates a continuous feedback loop that measurably improves an organization's security posture; when the two teams instead work side by side and share findings in real time, the engagement is called a purple team exercise.
Conclusion and Next Steps
A structured Red Team vs. Blue Team exercise is a powerful tool for hardening your protocol's security posture. This guide has outlined the core components: defining objectives, assembling teams, planning the attack lifecycle, and establishing a feedback loop. The final step is to operationalize these learnings and build a continuous security culture.
The immediate next step is to conduct a formal post-mortem analysis with all participants. This session should be blameless and focus on systemic issues, not individual performance. Document every finding: successful exploits, detection gaps, and procedural failures. Convert these findings into actionable tickets in your project management system, categorized by severity and assigned to specific owners. For example, a finding like "Flash loan oracle manipulation was not detected" should lead to tasks for implementing price sanity checks or circuit breakers.
To institutionalize security, integrate these exercises into your development lifecycle. Consider adopting a bug bounty program on platforms like Immunefi or Hats Finance to scale your Red Team efforts with external researchers. For the Blue Team, implement continuous monitoring tools such as Forta Network for on-chain anomaly detection or Tenderly for real-time transaction simulation and alerting. The goal is to shift security left, making it a continuous process rather than a periodic event.
Finally, measure your progress. Establish key security metrics (KPIs) to track over time, such as mean time to detect (MTTD) an attack, mean time to respond (MTTR), or the percentage of critical vulnerabilities discovered internally versus externally. Regularly scheduled exercises—quarterly for major protocol updates or biannually for established systems—ensure that security keeps pace with development. By treating security as an iterative, measurable discipline, you build more resilient and trustworthy decentralized systems.