A red team vs. blue team exercise is a controlled, adversarial simulation where one group (Red Team) attacks a system, and another (Blue Team) defends it. In Web3, this methodology is critical for stress-testing smart contracts, oracles, governance mechanisms, and economic models beyond automated audits. The primary goal is to uncover complex, multi-step attack vectors and systemic weaknesses that automated tools or individual code reviews might miss, providing a realistic assessment of a protocol's resilience under pressure.
How to Structure a Red Team vs. Blue Team Security Exercise
A structured guide to planning and executing effective red team vs. blue team exercises for Web3 protocols, focusing on smart contract and protocol-level vulnerabilities.
The first step is scoping and objective setting. Define the attack surface: will the Red Team target the core protocol logic, the token economics, the front-end dApp, or the underlying infrastructure like RPC nodes? Establish clear Rules of Engagement (RoE) specifying allowed techniques (e.g., social engineering, front-running, governance manipulation) and off-limits actions (e.g., attacking third-party dependencies). Objectives should be specific, such as "drain 30% of the main liquidity pool" or "pass a malicious governance proposal."
Next, assemble and brief the teams. The Red Team should consist of experienced security researchers or ethical hackers familiar with common Web3 exploit patterns like reentrancy, price oracle manipulation, and flash loan attacks. The Blue Team typically includes the protocol's core developers, DevOps engineers, and monitoring specialists. A neutral White Team acts as referee, ensuring RoE compliance, injecting scenario events, and adjudicating disputes. Provide both teams with the same documentation, codebase access, and initial system state.
Execution follows a defined timeline, often 24-72 hours. The Red Team actively probes and exploits, while the Blue Team monitors on-chain activity, internal logs, and alerting systems (like OpenZeppelin Defender or Forta) to detect, analyze, and respond to incidents. The Blue Team's response might involve pausing contracts, executing emergency governance, or deploying patches. All actions, findings, and communications should be logged in a dedicated channel or platform for post-exercise analysis.
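To make the Blue Team's detection layer concrete, here is a minimal sketch of a Forta-style detection bot using the forta-agent SDK. It flags unusually large token transfers from a monitored contract; the vault token address and the alert threshold are hypothetical placeholders, not values from any real deployment.

```typescript
import {
  Finding,
  FindingSeverity,
  FindingType,
  HandleTransaction,
  TransactionEvent,
} from "forta-agent";

// Hypothetical monitored token and alert threshold -- replace with real values.
const VAULT_TOKEN = "0x0000000000000000000000000000000000000001";
const THRESHOLD = 1_000_000n * 10n ** 18n; // 1M tokens, assuming 18 decimals

const TRANSFER_EVENT =
  "event Transfer(address indexed from, address indexed to, uint256 value)";

const handleTransaction: HandleTransaction = async (txEvent: TransactionEvent) => {
  const findings: Finding[] = [];
  // Match Transfer events emitted by the monitored token contract.
  for (const log of txEvent.filterLog(TRANSFER_EVENT, VAULT_TOKEN)) {
    if (BigInt(log.args.value.toString()) >= THRESHOLD) {
      findings.push(
        Finding.fromObject({
          name: "Large vault outflow",
          description: `Transfer of ${log.args.value} from ${log.args.from}`,
          alertId: "EXERCISE-LARGE-TRANSFER-1",
          severity: FindingSeverity.High,
          type: FindingType.Suspicious,
        })
      );
    }
  }
  return findings;
};

export default { handleTransaction };
```

During an exercise, a bot like this gives the Blue Team a concrete alert to triage the moment the Red Team moves funds, and its false positive rate becomes part of the post-exercise metrics.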
The most critical phase is the post-mortem and remediation. After the exercise, all parties convene for a debrief. The White Cell presents a timeline of events. Each team discusses their strategies, detection gaps, and response effectiveness. Every successful exploit is documented as a vulnerability report, complete with a proof-of-concept and a severity assessment. The protocol team then prioritizes these findings and creates a remediation plan, turning the exercise's lessons into concrete security improvements, such as upgraded contract logic or enhanced monitoring rules.
Prerequisites and Team Composition
A structured red team vs. blue team exercise is a critical method for stress-testing blockchain protocols and smart contracts. This guide outlines the prerequisites and team composition needed to run an effective security simulation.
Before initiating an exercise, define a clear scope and objectives. The scope should specify the target system (e.g., a specific smart contract, a bridge protocol, or a validator client), the assets in play (testnet tokens, specific keys), and the rules of engagement. Objectives might include testing incident response procedures, validating monitoring alerts, or discovering unknown vulnerabilities. A well-defined Rules of Engagement (RoE) document is mandatory, outlining permitted and prohibited attack vectors, such as forbidding attacks on underlying infrastructure or third-party services.
Assembling the right teams is foundational. The Red Team acts as the adversarial force, simulating sophisticated attackers. This team requires deep expertise in blockchain internals, smart contract vulnerabilities (like reentrancy or logic errors), and common exploit techniques. They should operate with a threat intelligence mindset, crafting plausible attack narratives. The Blue Team is the defensive unit, typically composed of protocol developers, DevOps engineers, and security analysts. Their role is to detect, analyze, and respond to the Red Team's activities using monitoring tools, logs, and on-chain analytics.
A critical, often overlooked role is the White Team or exercise control. This neutral party acts as the referee and facilitator. They manage the exercise timeline, ensure adherence to the RoE, inject scenario events to increase realism, and serve as the communication hub between Red and Blue teams. The White Team is also responsible for the post-exercise debrief, where findings are discussed without attribution to foster a blameless culture focused on systemic improvement.
Technical prerequisites must be established in a controlled environment. This almost always means deploying the target system on a private testnet or staging environment that mirrors mainnet conditions. Both teams need access to necessary tooling: the Red Team requires frameworks like Foundry for exploit development and fuzzing, while the Blue Team needs monitoring stacks (e.g., Tenderly, Blocknative, custom indexers) and incident management platforms. All participants should have documented, role-specific access to these systems before the exercise clock starts.
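As an illustration of the "mirrors mainnet" requirement, a Hardhat configuration along these lines pins a mainnet fork at a fixed block so both teams exercise against identical state. The block number and environment variable name are assumptions for the sketch.

```typescript
import { HardhatUserConfig } from "hardhat/config";

// Minimal fork setup: both teams run against the same pinned mainnet state.
const config: HardhatUserConfig = {
  solidity: "0.8.24",
  networks: {
    hardhat: {
      forking: {
        url: process.env.MAINNET_RPC_URL ?? "", // archive-capable RPC endpoint
        blockNumber: 19_000_000, // pin for a reproducible exercise state
      },
    },
  },
};

export default config;
```

Pinning the block number matters: without it, each `npx hardhat node` session forks from the latest block, and the two teams can end up probing subtly different states.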
Finally, establish clear success metrics and documentation procedures. Success isn't just about whether the Red Team "wins" by exploiting a flaw; it's measured by the Blue Team's Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR), the quality of forensic reports, and the number of validated security improvements identified. All actions, from attack payloads to alert logs, should be meticulously recorded to create an actionable report that drives concrete security enhancements to the protocol.
Step 1: Define Scope and Objectives
The success of a blockchain security exercise hinges on a meticulously defined scope and clear objectives. This initial phase establishes the rules of engagement, preventing scope creep and ensuring measurable outcomes.
Begin by explicitly stating the exercise's primary goal. Is the focus on testing the resilience of a new smart contract deployment, evaluating the incident response of a DeFi protocol's security team, or assessing the security of cross-chain bridge operations? A goal like "assess the oracle manipulation resistance of our lending protocol's liquidation logic" is far more actionable than a vague aim to "test security." This clarity directs all subsequent planning and resource allocation.
Next, define the in-scope assets and systems. This is a critical boundary that protects both the red team (attackers) and the blue team (defenders). The scope must specify the exact contract addresses, blockchain networks (e.g., Ethereum Mainnet, Arbitrum Sepolia testnet), front-end applications, and internal monitoring tools that are fair game. Crucially, it must also list out-of-scope elements, such as production databases, third-party infrastructure not under your control, or specific user funds. Documenting these limits in a Rules of Engagement (RoE) document is a security best practice.
Finally, establish success criteria and metrics. How will you measure the exercise's effectiveness? Objectives should be SMART: Specific, Measurable, Achievable, Relevant, and Time-bound. Examples include: "Identify at least three critical vulnerabilities (CVSS >= 7.0) in the new vault contract," "Achieve a mean time to detection (MTTD) of under 15 minutes for simulated attacks," or "Validate that all high-severity findings from the previous audit have been remediated." These metrics provide a concrete framework for the post-exercise analysis and report.
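One way to keep these criteria honest is to encode them as machine-checkable targets. The sketch below simply restates the example objectives above as data; the field names and thresholds are illustrative, not a standard.

```typescript
// Hypothetical exercise targets mirroring the SMART objectives above.
interface ExerciseMetrics {
  criticalFindings: number;          // validated findings with CVSS >= 7.0
  mttdMinutes: number;               // mean time to detection for simulated attacks
  priorAuditFindingsRemediated: boolean;
}

const targets = { minCriticalFindings: 3, maxMttdMinutes: 15 };

function meetsObjectives(m: ExerciseMetrics): boolean {
  return (
    m.criticalFindings >= targets.minCriticalFindings &&
    m.mttdMinutes <= targets.maxMttdMinutes &&
    m.priorAuditFindingsRemediated
  );
}
```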
Step 2: Develop Attack Scenarios
Effective security exercises require realistic threat models. This step focuses on crafting specific attack vectors that challenge your protocol's defenses.
Define the Attack Surface
Map all potential entry points for an attacker. This includes:
- Smart contract functions (e.g., mint, swap, stake)
- Admin privileges and upgradeable contracts
- Oracles and external data dependencies
- User interfaces and front-end integrations
- Cross-chain bridges and third-party dependencies

Start by auditing your system's architecture to identify every component an adversary could target.
Craft Economic Attack Vectors
Simulate attacks that exploit financial incentives. Common scenarios include:
- Flash loan attacks: Borrowing large sums to manipulate oracle prices or liquidity pool ratios.
- Governance attacks: Acquiring voting power to pass malicious proposals.
- MEV extraction: Front-running or sandwiching user transactions for profit.
- Liquidation cascades: Triggering mass liquidations by manipulating collateral prices.

Model the capital requirements and potential profit for each attack to assess feasibility, as in the sketch below.
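A back-of-the-envelope feasibility model for a flash loan scenario might look like this. The fee, extraction, and gas figures are illustrative inputs, not market data; some lenders charge flash loan fees on the order of a few basis points.

```typescript
// Rough single-block flash loan feasibility check.
interface FlashLoanScenario {
  loanAmountUsd: number;         // capital borrowed for the attack
  flashLoanFeeBps: number;       // e.g. ~5 bps on some lending markets
  expectedExtractionUsd: number; // value captured via the manipulation
  gasCostUsd: number;
}

function netProfitUsd(s: FlashLoanScenario): number {
  const fee = (s.loanAmountUsd * s.flashLoanFeeBps) / 10_000;
  return s.expectedExtractionUsd - fee - s.gasCostUsd;
}

// Example: a $50M loan at 5 bps costs $25k in fees; the attack is only
// rational if expected extraction clears fees plus gas.
console.log(netProfitUsd({
  loanAmountUsd: 50_000_000,
  flashLoanFeeBps: 5,
  expectedExtractionUsd: 400_000,
  gasCostUsd: 5_000,
})); // 370000
```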
Design Technical Exploits
Create scenarios targeting code vulnerabilities. Focus on:
- Reentrancy: Exploiting state changes during external calls.
- Logic errors: Flaws in business logic, like incorrect fee calculations or access control.
- Integer overflows/underflows: Manipulating arithmetic operations.
- Signature replay attacks: Reusing signed messages across different chains or contexts.

Use tools like Slither or Foundry's fuzzing to help discover these weaknesses; the sketch below illustrates the signature replay case.
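To make the signature replay scenario concrete, this ethers sketch shows why an EIP-712 domain separator that includes the chainId blocks cross-chain replay: the same signature recovers a different address under a different domain. The Withdraw type and vault address are invented for illustration.

```typescript
import { Wallet, verifyTypedData } from "ethers";

async function main() {
  const signer = Wallet.createRandom();

  // Hypothetical typed message a vault might accept.
  const types = {
    Withdraw: [
      { name: "to", type: "address" },
      { name: "amount", type: "uint256" },
    ],
  };
  const value = { to: signer.address, amount: 1000n };
  const mainnetDomain = {
    name: "ExampleVault",
    version: "1",
    chainId: 1,
    verifyingContract: "0x0000000000000000000000000000000000000001",
  };

  const sig = await signer.signTypedData(mainnetDomain, types, value);

  // Recovers the signer on the chain it was signed for...
  console.log(verifyTypedData(mainnetDomain, types, value, sig) === signer.address); // true

  // ...but recovers a different address under another chainId, so the
  // signature cannot simply be replayed on a second chain.
  const l2Domain = { ...mainnetDomain, chainId: 10 };
  console.log(verifyTypedData(l2Domain, types, value, sig) === signer.address); // false
}

main();
```

A contract that omits chainId (or any domain binding) from its signed messages is the version of this scenario the Red Team should hunt for.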
Simulate Operational & Social Attacks
Test non-technical weaknesses in your organization.
- Private key compromise: Simulate a team member's key being leaked.
- Malicious insider: An employee with privileged access acts adversarially.
- Supply chain attack: A compromised dependency (like an NPM library) is introduced.
- Phishing campaign: Attempt to trick team members into revealing credentials.

These exercises test incident response plans and operational security (OpSec) protocols.
Prioritize with Impact/Likelihood Matrix
Rank your scenarios to focus testing efforts. Use a simple matrix:
- High Impact, High Likelihood: Critical vulnerabilities (e.g., a flaw in a core vault). Prioritize these first.
- High Impact, Low Likelihood: "Black swan" events (e.g., a major oracle failure). Plan contingency measures.
- Low Impact, High Likelihood: Nuisance attacks (e.g., spam). Implement automated mitigations.
- Low Impact, Low Likelihood: Monitor but deprioritize.

This ensures efficient allocation of your red team's resources; a minimal encoding of the matrix follows below.
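A prioritization like this is trivial to encode so scenario lists stay sorted consistently across exercises. A minimal sketch, with made-up scenario names:

```typescript
type Level = "low" | "high";

interface Scenario {
  name: string;
  impact: Level;
  likelihood: Level;
}

// Lower number = test first, mirroring the matrix above.
function priority(s: Scenario): number {
  if (s.impact === "high" && s.likelihood === "high") return 1;
  if (s.impact === "high") return 2;
  if (s.likelihood === "high") return 3;
  return 4;
}

const scenarios: Scenario[] = [
  { name: "Spam transactions", impact: "low", likelihood: "high" },
  { name: "Core vault logic flaw", impact: "high", likelihood: "high" },
  { name: "Major oracle failure", impact: "high", likelihood: "low" },
];

const ranked = [...scenarios].sort((a, b) => priority(a) - priority(b));
console.log(ranked.map((s) => s.name));
// ["Core vault logic flaw", "Major oracle failure", "Spam transactions"]
```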
Document Scenario Playbooks
Create a detailed runbook for each major attack scenario. Each playbook should include:
- Attack narrative: A story describing the attacker's goal and steps.
- Technical prerequisites: Required tools, capital, and access.
- Expected system behavior: How the protocol should react if defenses work.
- Success criteria: Clear metrics for whether the attack succeeded.
- Blue team clues: What logs, events, or anomalies the defense team should look for.

This documentation is essential for consistent, repeatable exercises; a possible schema is sketched below.
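Codifying the playbook as a shared schema keeps entries consistent across scenarios and lets the White Team validate completeness before the exercise starts. One possible shape, with field names as suggestions rather than a standard:

```typescript
interface AttackPlaybook {
  id: string;                     // e.g. "PLAYBOOK-ORACLE-01"
  narrative: string;              // attacker's goal and step-by-step story
  prerequisites: {
    tools: string[];              // e.g. Foundry scripts, custom bots
    capitalUsd: number;           // capital required to mount the attack
    access: string[];             // keys, roles, or infrastructure needed
  };
  expectedSystemBehavior: string; // how the protocol should react if defenses hold
  successCriteria: string[];      // measurable conditions for a successful attack
  blueTeamClues: string[];        // logs, events, or anomalies defenders should see
}
```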
Step 3: Assign Red and Blue Team Roles
Core responsibilities, required skills, and team composition for Red and Blue Teams in a security exercise.
| Role / Attribute | Red Team (Attackers) | Blue Team (Defenders) | White Team (Facilitators) |
|---|---|---|---|
| Primary Objective | Simulate real-world attacks to find vulnerabilities | Detect, analyze, and respond to simulated attacks | Coordinate the exercise and ensure knowledge transfer |
| Core Responsibilities | Reconnaissance, exploitation, lateral movement, persistence | Threat monitoring, alert triage, incident response, forensics | Rule-setting, scenario design, post-exercise debrief, gap analysis |
| Key Skills Required | Penetration testing, social engineering, exploit development | SIEM/SOAR operation, log analysis, threat hunting, containment | Project management, communication, threat modeling, risk assessment |
| Typical Team Size | 2-5 specialists | 5-10+ analysts and engineers | 1-3 coordinators |
| Tools & Environment | C2 frameworks (e.g., Cobalt Strike), vulnerability scanners, custom payloads | EDR/XDR, SIEM, IDS/IPS, firewall logs, ticketing systems | Exercise platform, communication channels, reporting templates |
| Success Metrics | Critical vulnerabilities discovered, mean time to compromise (MTTC) | Mean time to detect (MTTD), mean time to respond (MTTR), false positive rate | Exercise completion, actionable findings documented, lessons learned implemented |
| Post-Exercise Output | Detailed attack narrative, proof-of-concept exploits, access artifacts | Incident report, updated detection rules, improved playbooks | Formal after-action report, risk register updates, training recommendations |
Step 4: Execute the Simulation
This phase transforms your planning into action, where the Red Team actively attacks and the Blue Team defends the target system.
The execution phase begins with a formal kickoff, where the facilitator confirms all teams are ready, reiterates the rules of engagement (ROE), and starts the official timer. The Red Team initiates their attack sequence based on their pre-defined playbook, which may include steps like initial reconnaissance, social engineering attempts, or exploiting a known vulnerability in a smart contract. Concurrently, the Blue Team activates their monitoring tools—such as blockchain explorers, security information and event management (SIEM) systems, and custom alerting scripts—to detect anomalous activity. All actions, from attack payloads to defensive countermeasures, must be meticulously logged with timestamps for later analysis.
During the live simulation, the facilitator's primary role is to orchestrate the exercise and ensure it remains within the established scope. They monitor the ROE for compliance, answer procedural questions, and manage any injects—pre-planned scenario twists like a simulated exchange hack or a sudden drop in token price—to test the teams' adaptability. Communication is often restricted to simulated channels (e.g., a dedicated Slack channel for 'internal alerts') to mimic real-world constraints. The goal is to create a high-fidelity, pressurized environment that tests both technical response and team coordination under stress.
A critical execution concept is the contained test environment. Attacks should target forked mainnet networks (using tools like Hardhat or Anvil), dedicated testnets, or isolated sandboxes—never production systems. For example, a Red Team might deploy a malicious, copycat token contract on a forked Ethereum network to attempt a phishing scam, while the Blue Team practices identifying the fraudulent contract address and blacklisting it in their front-end. This safety-first approach allows for aggressive testing without financial loss or network disruption.
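On a local Anvil fork, the Red Team can even act as a protocol admin without holding real keys, since Anvil supports account impersonation via its custom RPC methods. A minimal sketch; the RPC URL and admin address are placeholders.

```typescript
import { JsonRpcProvider } from "ethers";

// Assumes a fork already running locally, e.g.:
//   anvil --fork-url $MAINNET_RPC_URL
const provider = new JsonRpcProvider("http://127.0.0.1:8545");

// Hypothetical protocol admin / multisig address on the forked network.
const ADMIN = "0x0000000000000000000000000000000000000002";

async function main() {
  // Anvil-specific RPC method: subsequent txs can be sent "as" this account.
  await provider.send("anvil_impersonateAccount", [ADMIN]);
  const admin = await provider.getSigner(ADMIN);

  // The Red Team can now replay admin-only calls (pause, upgrade, etc.)
  // against the fork to exercise the Blue Team's detection and response.
  console.log("Impersonating:", await admin.getAddress());
}

main();
```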
Documentation is non-negotiable. The Red Team should maintain a detailed attack log noting each step, tool used, and outcome. The Blue Team's defense log should record every alert investigated, decision made, and action taken, such as pausing a mint function or upgrading a contract. These logs are the raw data for the post-mortem. The simulation typically runs for a fixed duration (e.g., 4-8 hours) or until a specific success condition is met, such as the exfiltration of a specific amount of test funds or the Blue Team successfully isolating the threat.
Post-Exercise Analysis and Reporting
The final, critical phase of a security exercise where findings are synthesized into actionable intelligence for protocol improvement.
The post-exercise phase transforms raw attack and defense data into a structured After-Action Report (AAR). This document is the primary deliverable, detailing the exercise timeline, attack vectors exploited, defensive measures tested, and the overall resilience of the system. It should objectively catalog both true positives (successful attacks) and false positives (defensive actions against non-threats), providing a complete picture of the security posture. The goal is not to assign blame but to create a shared, factual baseline for all stakeholders.
A robust analysis follows a standard methodology. Start with a timeline reconstruction, mapping the sequence of Red Team actions and Blue Team responses. For each critical event, perform root cause analysis to determine why a vulnerability existed or a detection failed. Was it a logic error in a smart contract, a misconfigured monitoring alert, or a gap in incident response procedures? Quantify the impact using metrics like Time to Detection (TTD) and Time to Resolution (TTR) to measure the Blue Team's operational efficiency.
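Given timestamped entries from the attack and defense logs, TTD and TTR reduce to simple averages over incidents. A minimal computation, assuming every Red Team action was eventually detected and resolved:

```typescript
interface Incident {
  attackAt: Date;    // first malicious action, from the Red Team log
  detectedAt: Date;  // first corresponding alert, from the Blue Team log
  resolvedAt: Date;  // containment or fix deployed
}

const minutesBetween = (a: Date, b: Date) => (b.getTime() - a.getTime()) / 60_000;

function meanTimes(incidents: Incident[]) {
  const n = incidents.length;
  const mttd =
    incidents.reduce((sum, i) => sum + minutesBetween(i.attackAt, i.detectedAt), 0) / n;
  const mttr =
    incidents.reduce((sum, i) => sum + minutesBetween(i.attackAt, i.resolvedAt), 0) / n;
  return { mttdMinutes: mttd, mttrMinutes: mttr };
}
```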
The final report must prioritize findings. Use a risk-rating framework like DREAD (Damage, Reproducibility, Exploitability, Affected Users, Discoverability) or CVSS (Common Vulnerability Scoring System) to score each vulnerability. This creates a clear roadmap for remediation, distinguishing critical consensus-layer flaws from lower-severity UI issues. Include specific, verifiable evidence for each finding, such as transaction hashes, contract addresses, log excerpts, or screenshots from tools like Tenderly or Etherscan.
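DREAD scoring, for example, is commonly computed as the average of five 0-10 component ratings. A small helper makes the rubric explicit; the example figures are invented.

```typescript
// Each component rated 0-10; the overall score is the simple average.
interface DreadRating {
  damage: number;
  reproducibility: number;
  exploitability: number;
  affectedUsers: number;
  discoverability: number;
}

const dreadScore = (r: DreadRating): number =>
  (r.damage + r.reproducibility + r.exploitability + r.affectedUsers + r.discoverability) / 5;

// Example: a reliably reproducible vault-drain bug scores near the top.
console.log(dreadScore({
  damage: 10,
  reproducibility: 9,
  exploitability: 8,
  affectedUsers: 10,
  discoverability: 6,
})); // 8.6
```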
Beyond the written report, conduct a hotwash or debriefing session with all participants. This facilitated discussion allows the Red Team to explain their tactics, the Blue Team to share their internal decision-making, and developers to clarify system intent. It surfaces insights that raw data misses and fosters a blameless culture focused on systemic improvement. Record key takeaways and action items from this session directly in the report.
The ultimate output is a set of actionable recommendations tied to each finding. Recommendations should be specific and assigned: e.g., "Audit and refactor the withdraw() function in Contract X to prevent reentrancy (Owner: Dev Team, Due: Q3)." Track these recommendations in a project management system like Jira or Linear. The exercise's value is realized only when findings are fixed, monitoring is improved, and playbooks are updated, completing the security feedback loop.
Vulnerability Findings Template
A structured template for documenting security vulnerabilities discovered during a red team exercise.
| Field | Red Team (Offensive) | Blue Team (Defensive) | Auditor (Neutral) |
|---|---|---|---|
| Vulnerability ID | RTE-2024-001 | BTR-2024-001 | AUD-2024-001 |
| Severity | Critical | High | Critical |
| Attack Vector | Remote | Network | Remote |
| Proof of Concept | | | |
| Impact Description | Full contract drain | Temporary DoS | Full contract drain |
| Fix Deadline | 24 hours | 7 days | 48 hours |
| CWE Reference | CWE-862 | CWE-400 | CWE-862 |
| Status | Exploited | Mitigated | Confirmed |
Tools and Resources
Practical tools, frameworks, and artifacts for running a red team vs. blue team security exercise that produces measurable improvements rather than ad hoc findings.
Threat Modeling and Scope Definition
Start by defining what the red team is allowed to break and why. A poorly scoped exercise produces noise or real damage.
Key steps:
- Identify crown jewels: private keys, admin multisigs, upgrade proxies, CI/CD secrets
- Define in-scope assets: smart contracts, RPC endpoints, off-chain services, monitoring stack
- Specify out-of-scope actions: mainnet funds movement, social engineering of executives, irreversible state changes
- Document rules of engagement: attack window, disclosure process, emergency stop
For Web3 teams, include chain-specific constraints like fork testing vs. mainnet simulation, and whether MEV-style attacks are permitted. Treat this document as a signed contract between red and blue teams to avoid ambiguity during exploitation.
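A lightweight way to remove that ambiguity is to check the RoE into version control as data both teams can reference programmatically. A hypothetical example; every value below is a placeholder:

```typescript
// Hypothetical RoE document kept in version control and signed off
// by both team leads before the exercise clock starts.
export const rulesOfEngagement = {
  exercise: "vault-v2-redteam-2024-q3",
  attackWindow: { start: "2024-09-02T09:00Z", end: "2024-09-04T17:00Z" },
  inScope: [
    "VaultV2 contracts on the staging fork",
    "oracle adapter services",
    "monitoring and alerting stack",
  ],
  outOfScope: [
    "any mainnet funds movement",
    "social engineering of executives",
    "third-party RPC providers",
  ],
  mevStyleAttacksPermitted: true,
  emergencyStop: "post HALT in #exercise-control; all activity ceases immediately",
} as const;
```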
Post-Exercise Reporting and Remediation
End the exercise with a joint red-blue debrief, not just a vulnerability list.
A useful report includes:
- Attack timeline correlated with logs and alerts
- Missed detections and root causes
- Defensive controls that worked as intended
- Clear remediation tasks with owners and deadlines
Translate findings into backlog items such as new alerts, contract invariants, runbooks, or access controls. Schedule a follow-up purple team validation to confirm fixes are effective.
Without structured remediation and retesting, red team vs. blue team exercises degrade into one-time demos rather than continuous security improvement.
Frequently Asked Questions
Common questions about structuring effective Red Team vs. Blue Team exercises for blockchain and smart contract security.
What are Red Teams and Blue Teams in Web3 security?

In Web3 security, Red Teams are offensive security experts who simulate real-world attackers. Their goal is to find and exploit vulnerabilities in a system, such as smart contracts, governance mechanisms, or node infrastructure, using the same tools and techniques as malicious actors.
Blue Teams are defensive security experts responsible for protecting the system. They monitor, detect, and respond to the Red Team's attacks, hardening defenses and improving incident response procedures. This adversarial simulation creates a continuous feedback loop that measurably improves an organization's security posture; when the two teams instead work side by side and share findings in real time, the engagement is called a purple team exercise.
Conclusion and Next Steps
A structured Red Team vs. Blue Team exercise is a powerful tool for hardening your protocol's security posture. This guide has outlined the core components: defining objectives, assembling teams, planning the attack lifecycle, and establishing a feedback loop. The final step is to operationalize these learnings and build a continuous security culture.
The immediate next step is to conduct a formal post-mortem analysis with all participants. This session should be blameless and focus on systemic issues, not individual performance. Document every finding: successful exploits, detection gaps, and procedural failures. Convert these findings into actionable tickets in your project management system, categorized by severity and assigned to specific owners. For example, a finding like "Flash loan oracle manipulation was not detected" should lead to tasks for implementing price sanity checks or circuit breakers.
To institutionalize security, integrate these exercises into your development lifecycle. Consider adopting a bug bounty program on platforms like Immunefi or Hats Finance to scale your Red Team efforts with external researchers. For the Blue Team, implement continuous monitoring tools such as Forta Network for on-chain anomaly detection or Tenderly for real-time transaction simulation and alerting. The goal is to shift security left, making it a continuous process rather than a periodic event.
Finally, measure your progress. Establish key security metrics (KPIs) to track over time, such as mean time to detect (MTTD) an attack, mean time to respond (MTTR), or the percentage of critical vulnerabilities discovered internally versus externally. Regularly scheduled exercises—quarterly for major protocol updates or biannually for established systems—ensure that security keeps pace with development. By treating security as an iterative, measurable discipline, you build more resilient and trustworthy decentralized systems.