Maximal Extractable Value (MEV) represents profit that can be extracted by reordering, including, or censoring transactions within blocks. An effective MEV detection system is a critical tool for researchers, searchers, and protocol developers to understand and quantify this phenomenon. At its core, such a system continuously monitors the public mempool and on-chain state to identify arbitrage, liquidations, and other profitable transaction bundles before they are mined. Architecting this system requires a modular approach, typically involving data ingestion, simulation, and strategy identification components.
How to Architect a MEV (Maximal Extractable Value) Detection System
A technical guide to building a system that identifies Maximal Extractable Value opportunities by analyzing the mempool and blockchain state.
The foundational layer is mempool data acquisition. You need a reliable connection to Ethereum (or other EVM chain) nodes via the JSON-RPC eth_subscribe method to stream pending transactions. For comprehensive coverage, connect to multiple nodes across different geographic regions, as mempool views can vary. Each transaction should be parsed to extract key fields: from, to, gasPrice, maxPriorityFeePerGas, input data, and value. Tools like Ethers.js or Viem can handle this decoding. It's crucial to timestamp each transaction upon receipt to analyze latency and propagation.
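The parsing, timestamping, and multi-node deduplication described above can be sketched as a small in-memory tracker. This is a minimal illustration, not a production design: `createMempoolTracker`, the record shape, and the node identifiers are hypothetical names, and the input is assumed to be a JSON-RPC style transaction object with hex-encoded quantities.

```javascript
// Sketch: normalize pending transactions reported by several nodes.
// createMempoolTracker and the record fields are hypothetical names.
function createMempoolTracker() {
  const firstSeen = new Map(); // tx hash -> normalized record

  // rawTx mimics a JSON-RPC transaction object (hex-encoded quantities).
  function observe(rawTx, nodeId, receivedAtMs = Date.now()) {
    const hash = rawTx.hash.toLowerCase();
    const existing = firstSeen.get(hash);
    if (existing) {
      // Already known: only note when this node reported it.
      existing.seenBy[nodeId] = receivedAtMs;
      return { record: existing, isNew: false };
    }
    const record = {
      hash,
      from: rawTx.from.toLowerCase(),
      to: rawTx.to ? rawTx.to.toLowerCase() : null, // null for contract creation
      value: BigInt(rawTx.value ?? '0x0'),
      gasPrice: rawTx.gasPrice ? BigInt(rawTx.gasPrice) : null,
      maxPriorityFeePerGas: rawTx.maxPriorityFeePerGas
        ? BigInt(rawTx.maxPriorityFeePerGas)
        : null,
      input: rawTx.input ?? '0x',
      firstSeenAtMs: receivedAtMs,
      seenBy: { [nodeId]: receivedAtMs },
    };
    firstSeen.set(hash, record);
    return { record, isNew: true };
  }

  // Delay between the first and last node to report a given hash.
  function propagationSpreadMs(hash) {
    const record = firstSeen.get(hash.toLowerCase());
    if (!record) return null;
    const times = Object.values(record.seenBy);
    return Math.max(...times) - Math.min(...times);
  }

  return { observe, propagationSpreadMs };
}
```

Keeping the first-seen timestamp separate from the per-node timestamps lets downstream workers process each transaction once while still measuring propagation delay across regions.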
With raw transaction data, the next step is state simulation. This involves locally executing transactions against a recent state to predict their outcome. Use a forking provider from frameworks like Hardhat or Anvil to create a local copy of the chain. Before simulating a potential MEV bundle, you must first simulate the existing pending transactions in your proposed block order to establish a valid context. The goal is to calculate the state diff and profit (often in ETH) a searcher would gain from a specific transaction ordering. This requires managing a local EVM instance and tracking changes to token balances and pool reserves.
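Once a simulation has produced balances before and after a candidate ordering, the state diff and profit calculation reduces to simple arithmetic. The sketch below assumes balances are already available as BigInt values keyed by token symbol; `balanceDiff`, `netValueInWei`, and the price table format (wei per smallest token unit, scaled by 1e18) are hypothetical choices made for illustration.

```javascript
// Sketch: derive a searcher's balance diff and value it in wei.
// Function names and the price-table format are illustrative assumptions.
const SCALE = 10n ** 18n;

// before/after: { [token]: BigInt balance in the token's smallest unit }
function balanceDiff(before, after) {
  const diff = {};
  const tokens = new Set([...Object.keys(before), ...Object.keys(after)]);
  for (const token of tokens) {
    const delta = (after[token] ?? 0n) - (before[token] ?? 0n);
    if (delta !== 0n) diff[token] = delta;
  }
  return diff;
}

// priceTable: { [token]: wei per smallest token unit, scaled by 1e18 }
function netValueInWei(diff, priceTable) {
  let total = 0n;
  for (const [token, delta] of Object.entries(diff)) {
    const price = priceTable[token];
    if (price === undefined) throw new Error(`no price for ${token}`);
    total += (delta * price) / SCALE;
  }
  return total;
}

// Example: spend 1 ETH, receive 2100 USDC (6 decimals) at 2000 USDC per ETH.
const diff = balanceDiff(
  { ETH: 5n * SCALE, USDC: 0n },
  { ETH: 4n * SCALE, USDC: 2100n * 10n ** 6n }
);
const prices = { ETH: SCALE, USDC: 5n * 10n ** 26n };
const profitWei = netValueInWei(diff, prices);
```

In this example `profitWei` corresponds to 0.05 ETH, before gas is subtracted.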
Detection logic identifies specific MEV opportunities. Common categories include: DEX arbitrage (price differences between Uniswap, Sushiswap, etc.), liquidations (on lending platforms like Aave or Compound), and sandwich trades. For arbitrage, your system must monitor price quotes from multiple DEX pools for the same token pair and flag discrepancies exceeding a threshold after factoring in gas costs. Liquidations require tracking users' health factors on lending protocols and watching for transactions that trigger a liquidation threshold. Each strategy module outputs a potential bundle of transactions and its estimated profit.
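As a rough illustration of the arbitrage check, the sketch below prices a two-pool round trip using the constant-product formula with a 0.3% fee, as used by Uniswap V2 style pools. The pool objects, `twoPoolArbProfit`, and the flat `gasCostWei` input are hypothetical; pools with other pricing curves (Uniswap V3, Curve) need their own quoting logic.

```javascript
// Constant-product quote with a 0.3% fee (Uniswap V2 style pools).
function getAmountOut(amountIn, reserveIn, reserveOut) {
  const amountInWithFee = amountIn * 997n;
  return (amountInWithFee * reserveOut) / (reserveIn * 1000n + amountInWithFee);
}

// Hypothetical helper: buy the token on poolA with WETH, sell it on poolB.
// Pools are { wethReserve, tokenReserve } with BigInt reserves.
function twoPoolArbProfit(amountInWei, poolA, poolB, gasCostWei) {
  const tokenOut = getAmountOut(amountInWei, poolA.wethReserve, poolA.tokenReserve);
  const wethBack = getAmountOut(tokenOut, poolB.tokenReserve, poolB.wethReserve);
  return wethBack - amountInWei - gasCostWei; // negative means unprofitable
}

const ETH = 10n ** 18n;
const cheapPool = { wethReserve: 100n * ETH, tokenReserve: 200000n * ETH };
const dearPool = { wethReserve: 100n * ETH, tokenReserve: 180000n * ETH };
const gasCostWei = ETH / 100n; // assume 0.01 ETH of gas for the round trip

const profit = twoPoolArbProfit(1n * ETH, cheapPool, dearPool, gasCostWei);
const flagged = profit > 0n; // threshold of zero; raise it to filter noise
```

Because each leg moves the pool price, profit is not linear in trade size; a real detector would search over `amountInWei` rather than test a single value.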
Finally, the system needs orchestration and output. A robust architecture uses a message queue (e.g., RabbitMQ, Redis) to decouple the data ingestion, simulation, and strategy workers. Opportunities should be logged to a database with fields for strategy_type, profit_wei, transaction_hashes_involved, and timestamp. For real-time alerts, integrate with a notification service. Remember, operating this system at scale requires significant infrastructure, as simulating every pending transaction is computationally expensive. Many teams use optimized, low-level EVM implementations like revm for faster simulation cycles.
Building your own MEV detection system provides unparalleled insight into blockchain dynamics. Start with a single chain and one strategy (like simple two-pool arbitrage) before scaling. Open-source projects like Flashbots' mev-inspect and Blocknative's Mempool Explorer offer references for data models and techniques. As you expand, consider the ethical and legal implications of acting on detected opportunities. This architecture forms the basis for advanced MEV research, searcher tools, and protective measures for decentralized applications.
Prerequisites and System Requirements
Building a system to detect Maximal Extractable Value (MEV) requires a robust technical foundation. This guide outlines the essential components, software, and infrastructure needed before you write your first line of detection logic.
A functional MEV detection system is built on a multi-layered stack. At its core, you need reliable access to blockchain data. This is typically achieved by running your own archive node (e.g., Geth, Erigon, or Nethermind) or subscribing to a node provider's WebSocket streams (like Alchemy, Infura, or QuickNode). An archive node is non-negotiable for historical analysis, as it stores the full state history, allowing you to reconstruct past transactions and their effects. For real-time detection, a node with pending transaction subscription (eth_subscribe) is critical to see transactions before they are mined.
The next layer is the data processing and indexing engine. You will need a backend service, often written in Go, Rust, or TypeScript/Node.js, to ingest the raw blockchain data. This service must decode transaction calldata using ABI definitions, parse event logs, and calculate state diffs. For efficiency, you will likely store processed data in a time-series database like TimescaleDB or a high-performance OLAP database. This allows for complex queries, such as identifying arbitrage opportunities across multiple blocks or tracking specific smart contract interactions over time.
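To make the calldata decoding step concrete, the sketch below identifies a swap by its 4-byte function selector and manually decodes the arguments of `swapExactTokensForTokens`. The selectors listed are those of the Uniswap V2 router functions; `decodeSwap` and `word` are hypothetical helpers, and in practice an ABI decoder from ethers.js or viem would replace this hand-rolled parsing.

```javascript
// Selectors of common Uniswap V2 router swap functions.
const KNOWN_SELECTORS = {
  '0x38ed1739': 'swapExactTokensForTokens',
  '0x7ff36ab5': 'swapExactETHForTokens',
  '0x18cbafe5': 'swapExactTokensForETH',
};

// Return the index-th 32-byte word of hex data (without the 0x prefix).
function word(data, index) {
  return data.slice(index * 64, (index + 1) * 64);
}

// Hypothetical decoder: fully decodes swapExactTokensForTokens only.
function decodeSwap(input) {
  const selector = input.slice(0, 10).toLowerCase();
  const name = KNOWN_SELECTORS[selector] ?? null;
  if (name !== 'swapExactTokensForTokens') return { selector, name };

  const data = input.slice(10);
  const amountIn = BigInt('0x' + word(data, 0));
  const amountOutMin = BigInt('0x' + word(data, 1));
  // Word 2 holds the byte offset of the dynamic address[] path.
  const pathStart = Number(BigInt('0x' + word(data, 2)) / 32n);
  const pathLength = Number(BigInt('0x' + word(data, pathStart)));
  const path = [];
  for (let i = 0; i < pathLength; i++) {
    // Addresses occupy the low 20 bytes of each 32-byte word.
    path.push('0x' + word(data, pathStart + 1 + i).slice(24));
  }
  return { selector, name, amountIn, amountOutMin, path };
}
```

Filtering on the selector first is cheap and lets the service discard most transactions before any heavier decoding or simulation.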
Your detection logic requires a deep understanding of DeFi mechanics. You must model common MEV strategies: arbitrage (DEX price differences), liquidations (undercollateralized loans), and sandwich attacks (frontrunning user transactions). This involves maintaining real-time price feeds from oracles and DEX pools, tracking loan health in protocols like Aave or Compound, and simulating transaction execution using tools like Tenderly or a local EVM fork to estimate profitability before an opportunity is mined.
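Tracking loan health can be illustrated with an Aave-style health factor: collateral value weighted by each asset's liquidation threshold, divided by total debt, with positions below 1 eligible for liquidation. The sketch assumes values are already priced in a common base currency as BigInt amounts; `healthFactor` and `isLiquidatable` are hypothetical helper names, and other protocols such as Compound define account health differently.

```javascript
const WAD = 10n ** 18n; // health factor is returned scaled by 1e18

// collaterals: [{ valueBase: BigInt, liquidationThresholdBps: number }]
// totalDebtBase: BigInt, in the same base currency as valueBase.
function healthFactor(collaterals, totalDebtBase) {
  if (totalDebtBase === 0n) return null; // no debt, nothing to liquidate
  let weighted = 0n;
  for (const { valueBase, liquidationThresholdBps } of collaterals) {
    weighted += (valueBase * BigInt(liquidationThresholdBps)) / 10000n;
  }
  return (weighted * WAD) / totalDebtBase;
}

function isLiquidatable(hf) {
  return hf !== null && hf < WAD;
}

// A position exactly at the threshold, then after a 10% collateral price drop.
const atThreshold = healthFactor(
  [{ valueBase: 10000n, liquidationThresholdBps: 8000 }],
  8000n
);
const afterDrop = healthFactor(
  [{ valueBase: 9000n, liquidationThresholdBps: 8000 }],
  8000n
);
```

Recomputing this value for watched positions whenever an oracle update is seen in the mempool shows which loans would become liquidatable once that update lands.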
Finally, consider the operational requirements. MEV detection is latency-sensitive; your system must process and react to pending transactions in milliseconds. This demands a low-latency infrastructure, potentially in a geographically optimized cloud region. You will also need monitoring and alerting (e.g., Prometheus/Grafana) to track system health, missed opportunities, and false positives. Setting up this foundation correctly is more critical than the detection algorithms themselves, as a slow or unreliable data pipeline will render even the smartest strategy ineffective.
How to Architect a MEV Detection System
A practical guide to building a system that identifies and analyzes Maximal Extractable Value opportunities on Ethereum and other EVM chains.
A MEV detection system is a specialized data pipeline that monitors the mempool and blockchain state to identify profitable transaction ordering opportunities. The core architecture consists of three layers: a data ingestion layer that streams raw transactions and blocks, a processing and simulation layer that models transaction outcomes, and an analysis and alerting layer that surfaces profitable bundles or arbitrage paths. The primary challenge is processing high-volume, low-latency data to identify fleeting opportunities before they are included in a block by other searchers or builders.
The data ingestion layer is the system's foundation. It requires a reliable connection to a node's JSON-RPC endpoint: eth_getBlockByNumber and eth_getLogs for block and log data, plus an eth_subscribe subscription to newPendingTransactions for real-time mempool access. For production systems, use your own node or a service like Alchemy or QuickNode with WebSocket support, since subscriptions need a persistent connection and HTTP polling adds latency and quickly hits rate limits. This layer must parse and normalize incoming data, handling chain reorganizations and managing the backlog of pending transactions in a local mempool representation.
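A local mempool representation that survives reorganizations can be sketched as follows. The class name `LocalMempool` and its methods are hypothetical; the idea is simply to remember which pending transactions each block removed, so they can be restored if that block is later orphaned.

```javascript
// Sketch of a reorg-aware local mempool (names are illustrative).
class LocalMempool {
  constructor() {
    this.pending = new Map(); // tx hash -> tx
    this.minedByBlock = new Map(); // block hash -> txs removed by that block
  }

  addPending(tx) {
    this.pending.set(tx.hash, tx);
  }

  // Called for each new block with the hashes of its transactions.
  onBlock(blockHash, txHashes) {
    const removed = [];
    for (const hash of txHashes) {
      const tx = this.pending.get(hash);
      if (tx) {
        this.pending.delete(hash);
        removed.push(tx);
      }
    }
    this.minedByBlock.set(blockHash, removed);
  }

  // Called when a previously seen block is orphaned by a reorg.
  // Returns how many transactions were put back into the pending set.
  onReorg(orphanedBlockHash) {
    const txs = this.minedByBlock.get(orphanedBlockHash) ?? [];
    for (const tx of txs) this.pending.set(tx.hash, tx);
    this.minedByBlock.delete(orphanedBlockHash);
    return txs.length;
  }
}
```

A production version would also prune `minedByBlock` once blocks are deep enough to be considered final, and evict pending transactions that have been replaced or dropped.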
The processing layer is where opportunity detection happens. It involves simulating the outcome of potential transactions. For a simple arbitrage bot, this means fetching real-time price feeds from DEXes like Uniswap, calculating potential profit after gas costs, and constructing a candidate transaction. More advanced systems use a sandboxed EVM (like EthereumJS VM or Foundry's forge in script mode) to simulate complex multi-contract interactions and bundle proposals. This simulation must account for gas usage, slippage, and the state changes from other pending transactions in the mempool.
Here is a simplified code snippet demonstrating the core loop for monitoring pending transactions and checking for a simple DEX arbitrage opportunity between two pools:
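```javascript
// Written against the web3.js 1.x API.
const Web3 = require('web3');

const web3 = new Web3('wss://your-node-provider.com/ws');

// Uniswap V2 Router02 on Ethereum mainnet; replace with the router(s) you track.
const UNISWAP_ROUTER_ADDRESS = '0x7a250d5630B4cF539739dF2C5dAcb4c659F2488D';

const subscription = web3.eth.subscribe('pendingTransactions', (error, txHash) => {
  if (error) {
    console.error('subscription error:', error);
    return;
  }
  web3.eth
    .getTransaction(txHash)
    .then((tx) => {
      // tx is null if the node has already dropped or mined the transaction.
      // Compare addresses case-insensitively (checksummed vs. lowercase).
      if (tx && tx.to && tx.to.toLowerCase() === UNISWAP_ROUTER_ADDRESS.toLowerCase()) {
        return analyzeTransactionForArbitrage(tx);
      }
    })
    .catch((err) => console.error('failed to fetch transaction', txHash, err));
});

async function analyzeTransactionForArbitrage(pendingTx) {
  // 1. Decode transaction input to understand the swap.
  // 2. Fetch current reserves from the involved liquidity pools.
  // 3. Simulate the swap's price impact.
  // 4. Check for a profitable back-run arbitrage on a different DEX.
  // 5. If profitable, construct and submit a bundle via a relay.
}
```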
The final analysis and alerting layer filters simulated opportunities based on profitability thresholds and risk parameters. It must decide whether to act by submitting a transaction bundle to a block builder, either directly or through a relay such as the Flashbots relay. (Flashbots Protect RPC, by contrast, is aimed at sending individual transactions privately rather than submitting searcher bundles.) This layer should include logging, metrics collection (e.g., opportunities seen vs. acted upon), and alerting for system health. For research purposes, the output can be stored in a time-series database for later analysis of MEV trends and strategy backtesting.
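For bundle submission, relays that follow the Flashbots bundle format accept an eth_sendBundle JSON-RPC request containing signed raw transactions and a target block. The sketch below only builds that request body; `buildSendBundleRequest` is a hypothetical helper, and the request signing and HTTP transport a real relay requires are deliberately left out.

```javascript
// Build the JSON-RPC body for eth_sendBundle (Flashbots-style relays).
// signedTxs: array of signed raw transactions as 0x-prefixed hex strings.
// targetBlockNumber: the block the bundle should be included in (number).
function buildSendBundleRequest(signedTxs, targetBlockNumber, id = 1) {
  if (signedTxs.length === 0) {
    throw new Error('bundle must contain at least one transaction');
  }
  return {
    jsonrpc: '2.0',
    id,
    method: 'eth_sendBundle',
    params: [
      {
        txs: signedTxs,
        blockNumber: '0x' + targetBlockNumber.toString(16),
      },
    ],
  };
}

// Placeholder raw transactions; real ones come from signing with your key.
const request = buildSendBundleRequest(['0x02f8aa', '0x02f8bb'], 17000000);
```

Transaction order inside `txs` is the execution order, which is what lets a back-run be placed directly after the transaction it depends on.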
Key considerations for a robust architecture include resilience to chain reorgs, efficient state management to avoid redundant RPC calls, and modular design to plug in different detection strategies (e.g., liquidations, NFT arbitrage). Open-source frameworks like ethers.js and viem provide essential utilities, while running a local archive node or using a specialized MEV data provider like Blocknative can significantly improve data access and system performance.
Key MEV Attack Vectors to Detect
Building a robust MEV detection system starts with understanding the adversarial patterns to monitor. This guide covers the core attack vectors that your architecture must identify.
Liquidation Arbitrage
Attackers compete to be the first to liquidate undercollateralized positions on lending protocols like Aave or Compound for a profit.
- Detection Signal: Rapid, profitable transactions triggered immediately after an oracle price update that pushes a loan below its health factor.
- Key Data: Track oracle price feeds and monitor for liquidation function calls within 1-2 blocks of a price change. High gas bidding is a strong indicator.
- Architecture Note: Your system needs real-time access to protocol-specific health factor data and mempool monitoring for liquidation bots.
NFT Marketplace Sniping
Bots monitor for mispriced NFT listings (e.g., below floor price) on marketplaces like Blur or OpenSea and instantly purchase them before the listing can be corrected.
- Detection Signal: A transaction buying an NFT within the same block it was listed, especially at a price significantly below the collection's current floor.
- Key Data: Index NFT listing events from marketplace contracts and cross-reference with real-time floor price data from aggregators. The time delta between Listing and Purchase events is critical.
- Example: A Bored Ape listed for 70 ETH (floor: 75 ETH) is bought by a bot in under 3 seconds.
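The listing-to-purchase check can be sketched as a join between two event streams. The event shapes, `findSnipes`, and the `maxBlockDelta` parameter are hypothetical; the sketch assumes listing and purchase events have already been indexed with a block number and a price in wei.

```javascript
// Flag purchases that follow a below-floor listing within maxBlockDelta blocks.
// listings/purchases: [{ collection, tokenId, blockNumber, priceWei }]
function findSnipes(listings, purchases, floorPriceWei, maxBlockDelta = 0) {
  const listingByToken = new Map();
  for (const listing of listings) {
    listingByToken.set(`${listing.collection}:${listing.tokenId}`, listing);
  }

  const snipes = [];
  for (const purchase of purchases) {
    const listing = listingByToken.get(`${purchase.collection}:${purchase.tokenId}`);
    if (!listing) continue;
    const blockDelta = purchase.blockNumber - listing.blockNumber;
    const fastEnough = blockDelta >= 0 && blockDelta <= maxBlockDelta;
    if (fastEnough && listing.priceWei < floorPriceWei) {
      snipes.push({
        ...purchase,
        blockDelta,
        discountWei: floorPriceWei - listing.priceWei,
      });
    }
  }
  return snipes;
}

// The example above: listed at 70 ETH against a 75 ETH floor, bought same block.
const ETHER = 10n ** 18n;
const snipes = findSnipes(
  [{ collection: 'BAYC', tokenId: 1, blockNumber: 100, priceWei: 70n * ETHER }],
  [{ collection: 'BAYC', tokenId: 1, blockNumber: 100, priceWei: 70n * ETHER }],
  75n * ETHER
);
```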
How to Architect a MEV Detection System
A technical guide to designing and implementing a system that identifies Maximal Extractable Value opportunities on-chain.
A MEV detection system is a specialized data pipeline that monitors the public mempool and blockchain state for profitable transaction ordering opportunities. The core architecture consists of three layers: a data ingestion layer that streams raw transactions and block data, a heuristic processing layer that applies detection logic, and an execution layer that formulates and submits profitable bundles. The system must operate with sub-second latency, as opportunities like arbitrage or liquidations are usually captured within the same 12-second slot in which they appear. Key infrastructure choices include a high-performance JSON-RPC provider like Alchemy or QuickNode, a stream-processing framework like Apache Flink or a simple event loop in Node.js, and a connection to a block builder or relay, such as the Flashbots relay, to submit bundles privately.
The heuristic layer is the intelligence core. It applies specific rules to identify MEV. Common heuristics include: liquidation detection by monitoring loan health factors on Aave or Compound against oracle prices; arbitrage detection by simulating swaps across DEX pools (Uniswap V3, Curve) to find price discrepancies exceeding gas costs; and sandwich detection by identifying large pending DEX swaps that can be front-run and back-run. Each heuristic requires simulating transaction outcomes. For Ethereum, you use the eth_call RPC method with a modified state. For example, to check an arbitrage, you simulate a swap on DEX A, then the resulting tokens on DEX B, calculating profit in ETH. Libraries like ethers.js and viem provide simulation utilities, but complex multi-step logic may require a custom EVM tracer.
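Simulating with eth_call against modified state can be illustrated by building the JSON-RPC request by hand. The third parameter used here is the state override set supported by Geth; other clients and hosted providers may not accept it, so treat that as an assumption to verify. `buildSimulatedCall` is a hypothetical helper that overrides only the caller's ETH balance.

```javascript
// Build an eth_call request that pretends `from` holds a given ETH balance.
// The state override (third param) is a Geth feature; support varies elsewhere.
function buildSimulatedCall({ from, to, data }, balanceOverrideWei, id = 1) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'eth_call',
    params: [
      { from, to, data },
      'latest',
      { [from]: { balance: '0x' + balanceOverrideWei.toString(16) } },
    ],
  };
}

// Placeholder addresses and calldata for illustration only.
const simulated = buildSimulatedCall(
  {
    from: '0x' + '11'.repeat(20),
    to: '0x' + '22'.repeat(20),
    data: '0x',
  },
  10n ** 18n
);
```

Overriding balances this way lets the detector test whether a path would be profitable without needing the capital on hand, which is useful when a flash loan would fund the real trade.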
Implementing a detection bot requires careful state and event management. You must track the mempool for new transactions and the new block head event to reset state. A basic Node.js detector for DEX arbitrage might:
1. Subscribe to pendingTransactions via a WebSocket provider.
2. Filter for transactions to known DEX routers.
3. For each candidate, decode the calldata to get the swap path and amount.
4. Simulate the swap and a counter-trade on another DEX using eth_call.
5. If the simulated profit exceeds a threshold (e.g., 0.1 ETH after gas), construct a bundle transaction.

False positives are common, so simulations must account for slippage, gas costs (using eth_estimateGas), and block base fee predictions. The code must also handle chain reorgs and failed simulations gracefully.
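The base fee prediction mentioned above is deterministic under EIP-1559: the next block's base fee depends only on the parent block's base fee, gas used, and gas limit. The sketch below follows the update rule from the EIP using BigInt arithmetic; `nextBaseFee` is an illustrative name.

```javascript
// EIP-1559 base fee update rule.
const ELASTICITY_MULTIPLIER = 2n;
const BASE_FEE_MAX_CHANGE_DENOMINATOR = 8n;

// All arguments are BigInt values taken from the parent block header.
function nextBaseFee(parentBaseFee, parentGasUsed, parentGasLimit) {
  const gasTarget = parentGasLimit / ELASTICITY_MULTIPLIER;
  if (parentGasUsed === gasTarget) return parentBaseFee;

  if (parentGasUsed > gasTarget) {
    const delta =
      (parentBaseFee * (parentGasUsed - gasTarget)) /
      gasTarget /
      BASE_FEE_MAX_CHANGE_DENOMINATOR;
    // The fee rises by at least 1 wei when the block is above target.
    return parentBaseFee + (delta > 1n ? delta : 1n);
  }

  const delta =
    (parentBaseFee * (gasTarget - parentGasUsed)) /
    gasTarget /
    BASE_FEE_MAX_CHANGE_DENOMINATOR;
  return parentBaseFee - delta;
}

const GWEI = 10n ** 9n;
const afterFullBlock = nextBaseFee(100n * GWEI, 30000000n, 30000000n);
const afterEmptyBlock = nextBaseFee(100n * GWEI, 0n, 30000000n);
```

A full block raises the base fee by 12.5% and an empty block lowers it by 12.5%, so a detector can bound the gas cost of a bundle targeting the next block before the current one is even sealed.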
Beyond simple heuristics, advanced systems use machine learning and composability. ML models can predict optimal gas bids or identify novel MEV patterns from historical data. Composability involves chaining heuristics; a large loan liquidation might create a downstream arbitrage opportunity on the collateral asset. Detection systems often run in a closed feedback loop: submitted bundles are tracked, and their success or failure is used to refine heuristic parameters. Security is critical; your system's private key must be protected, and bundle construction must avoid triggering reverts that waste gas. Finally, consider the ethical and legal landscape; while arbitrage is generally neutral, sandwich attacking user transactions has negative externalities and may be restricted by some block builders.
MEV Type Detection Signatures
Key on-chain and mempool patterns used to identify and classify common MEV extraction strategies.
| Detection Signal | Arbitrage | Liquidations | Sandwich Attacks | NFT MEV |
|---|---|---|---|---|
| Mempool Priority Fee Spike | | | | |
| Identical Token Pair Swaps Across DEXs | | | | |
| Flash Loan Utilization | | | | |
| Liquidation Function Call Sequence | | | | |
| Backrun of Large User Swap | | | | |
| Bundled Mint-List Transactions | | | | |
| Average Profit per Event | $500 - $5k+ | $1k - $50k+ | $100 - $2k | $10 - $10k+ |
| Time Window for Execution | < 1 block | 1-3 blocks | Same block | 1-10 blocks |
How to Architect a MEV Detection System
This guide details the core components and code patterns for building a system to detect Maximal Extractable Value opportunities on Ethereum and other EVM chains.
A robust MEV detection system operates as a real-time data pipeline. It ingests raw blockchain data—primarily from the mempool and newly mined blocks—and processes it to identify profitable transaction orderings. The foundational architecture consists of three layers: a data ingestion layer using WebSocket connections to node providers like Alchemy or QuickNode, a simulation and analysis layer that models transaction outcomes, and an opportunity classification layer that flags specific MEV strategies such as arbitrage or liquidations. This modular design allows for scaling individual components, such as adding new blockchain networks or detection algorithms.
The data ingestion layer is critical. You must subscribe to pending transactions via eth_subscribe('newPendingTransactions') and to new blocks via eth_subscribe('newHeads'). For high-frequency detection, consider running your own node or using a specialized mempool data provider such as Blocknative. Keep in mind that transactions sent through private channels such as Flashbots Protect RPC bypass the public mempool and will not appear in this stream. The following Node.js snippet demonstrates a basic WebSocket listener using the ethers.js library to capture pending transactions, which are the primary source for frontrunning and sandwich attack opportunities.
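```javascript
// Written against the ethers.js v5 API (ethers.providers.*).
const { ethers } = require('ethers');

const provider = new ethers.providers.WebSocketProvider('YOUR_WS_ENDPOINT');

provider.on('pending', async (txHash) => {
  try {
    const tx = await provider.getTransaction(txHash);
    // tx is null when the node no longer has the transaction.
    if (tx) {
      // Analyze transaction for MEV potential
      analyzeTransaction(tx);
    }
  } catch (err) {
    console.error('failed to fetch pending transaction', txHash, err);
  }
});

// Placeholder: replace with real detection logic.
function analyzeTransaction(tx) {
  console.log('pending tx', tx.hash, 'to', tx.to);
}
```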
Once you have transaction data, the core detection logic involves simulating state changes. For an arbitrage opportunity between Uniswap and Sushiswap, your system must simulate a two-leg path: buy the asset on the pool where it is cheaper, then sell it on the other. Run the simulation in a local EVM on a forked network with Hardhat or Anvil (ganache-core was once common for this but is no longer maintained). The simulation checks whether the final output amount exceeds the input after accounting for gas. A simplified check involves calculating the expected return using on-chain pool reserves fetched via the getReserves() function on the pair contracts, but accurate detection requires full execution simulation to account for fees, slippage, and intermediate contract logic.
Classifying and prioritizing opportunities is the final step. Not all detected MEV is actionable; your system must filter by profit potential, gas costs, and success probability. Implement a scoring engine that estimates net profit in ETH (Revenue - Gas Cost). For example, a profitable arbitrage must yield more than its gas cost, that is, the gas used multiplied by the sum of the current base fee and the priority fee paid for the bundle. You can use relay APIs such as the Flashbots relay's eth_callBundle to simulate a bundle and estimate its gas usage before submission. Store classified opportunities in a time-series database like TimescaleDB for historical analysis and to identify recurring patterns, which can improve your detection algorithms over time through heuristic tuning.
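A minimal scoring engine along these lines might look as follows. The field names, `scoreOpportunity`, and the use of a success probability to compute an expected value are illustrative assumptions rather than a standard interface.

```javascript
// Score one opportunity. All wei and gas amounts are BigInt values;
// successProbability is a plain number between 0 and 1.
function scoreOpportunity({ revenueWei, gasUsed, baseFeeWei, priorityFeeWei, successProbability }) {
  const gasCostWei = gasUsed * (baseFeeWei + priorityFeeWei);
  const netProfitWei = revenueWei - gasCostWei;
  // Convert the probability to basis points so the math stays in BigInt.
  const probabilityBps = BigInt(Math.round(successProbability * 10000));
  const expectedWei = (netProfitWei * probabilityBps) / 10000n;
  return { gasCostWei, netProfitWei, expectedWei, actionable: netProfitWei > 0n };
}

// Order scored opportunities from highest to lowest expected value.
function rankOpportunities(scored) {
  return [...scored].sort((a, b) =>
    a.expectedWei < b.expectedWei ? 1 : a.expectedWei > b.expectedWei ? -1 : 0
  );
}

const GWEI_IN_WEI = 10n ** 9n;
const score = scoreOpportunity({
  revenueWei: 5n * 10n ** 16n, // 0.05 ETH gross
  gasUsed: 200000n,
  baseFeeWei: 30n * GWEI_IN_WEI,
  priorityFeeWei: 2n * GWEI_IN_WEI,
  successProbability: 0.5,
});
```

Here the bundle costs 0.0064 ETH in gas, leaving 0.0436 ETH net and an expected value of 0.0218 ETH at a 50% chance of inclusion.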
Essential Tools and Libraries
Building a robust MEV detection system requires specialized tools for data ingestion, simulation, and analysis. This guide covers the core components.
How to Architect a MEV Detection System
A technical guide to building a system that identifies and measures Maximal Extractable Value (MEV) on Ethereum and other blockchains.
Architecting a MEV detection system requires a modular approach to process blockchain data and identify profit extraction patterns. The core architecture typically consists of three layers: a data ingestion layer that streams raw blockchain data from nodes or services like Erigon or Alchemy, a processing layer that applies detection heuristics to transaction and block data, and a storage/analysis layer that aggregates and serves the results. The system must be designed for low-latency processing to keep pace with block production, which is approximately every 12 seconds on Ethereum. Using a message queue like Apache Kafka or RabbitMQ can decouple data ingestion from processing, ensuring reliability during network congestion.
The detection logic is the system's intelligence, relying on heuristics to spot common MEV strategies. Key patterns to detect include sandwich attacks, where a bot front-runs and back-runs a victim's DEX trade; arbitrage opportunities across decentralized exchanges like Uniswap and SushiSwap; and liquidations in lending protocols such as Aave. For sandwich attacks, the system scans for transaction pairs where a bot's trade appears immediately before and after a large user swap in the same block. Arbitrage detection involves monitoring price discrepancies for the same asset pair across multiple liquidity pools, flagging transactions that profit from the difference.
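The sandwich heuristic can be expressed as a scan over the decoded swaps of a single block. The swap record shape and `findSandwiches` are hypothetical, and the sketch assumes swaps have already been extracted from logs or traces. It looks for a front-run and a reversing back-run from the same sender on the same pool, with at least one other sender trading in the front-run's direction in between.

```javascript
// swaps: [{ txIndex, sender, pool, tokenIn, tokenOut }] from one block.
function findSandwiches(swaps) {
  const ordered = [...swaps].sort((a, b) => a.txIndex - b.txIndex);
  const found = [];

  for (let i = 0; i < ordered.length; i++) {
    const front = ordered[i];
    // Start at i + 2 so there is room for at least one victim in between.
    for (let k = i + 2; k < ordered.length; k++) {
      const back = ordered[k];
      const reverses =
        back.sender === front.sender &&
        back.pool === front.pool &&
        back.tokenIn === front.tokenOut &&
        back.tokenOut === front.tokenIn;
      if (!reverses) continue;

      const victims = ordered.slice(i + 1, k).filter(
        (s) =>
          s.sender !== front.sender &&
          s.pool === front.pool &&
          s.tokenIn === front.tokenIn &&
          s.tokenOut === front.tokenOut
      );
      if (victims.length > 0) found.push({ front, victims, back });
      break; // pair each front-run with its first reversing trade only
    }
  }
  return found;
}
```

Matching on sender alone misses bots that use different addresses for each leg, so production heuristics often also compare amounts and the contracts being called.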
To quantify the extracted value accurately, the system must calculate profit in USD terms. This involves tracing the flow of assets within a detected MEV bundle, converting token amounts to their USD value at the exact block timestamp using a price oracle or historical price feed from Dune Analytics or a subgraph. For example, if a sandwich attack nets 10 ETH, the system would fetch the ETH/USD price from that block to determine the dollar value. It's crucial to account for gas costs, which can significantly reduce net profit, especially during periods of high network activity. Subtracting the gas paid (also converted to USD) from the gross profit yields the net extracted value.
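The conversion to net USD value is straightforward once the block's ETH/USD price is known. The sketch below keeps everything in integer arithmetic by expressing the price in cents; `netExtractedValue` and its field names are hypothetical, and the price itself is assumed to come from whichever historical feed you use.

```javascript
const WEI_PER_ETH = 10n ** 18n;

// All inputs are BigInt values. ethUsdCents is the ETH price at the block, in cents.
function netExtractedValue({ grossProfitWei, gasUsed, effectiveGasPriceWei, ethUsdCents }) {
  const gasCostWei = gasUsed * effectiveGasPriceWei;
  const netWei = grossProfitWei - gasCostWei;
  const toCents = (wei) => (wei * ethUsdCents) / WEI_PER_ETH;
  return {
    gasCostWei,
    netWei,
    grossUsdCents: toCents(grossProfitWei),
    gasUsdCents: toCents(gasCostWei),
    netUsdCents: toCents(netWei),
  };
}

// The 10 ETH sandwich from the text, assuming ETH at $2,000 and
// 300,000 gas paid at an effective price of 50 gwei.
const result = netExtractedValue({
  grossProfitWei: 10n * WEI_PER_ETH,
  gasUsed: 300000n,
  effectiveGasPriceWei: 50n * 10n ** 9n,
  ethUsdCents: 200000n,
});
```

With these assumed inputs the gross value is $20,000, gas costs $30, and the net extracted value is $19,970.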
Effective MEV metrics go beyond simple profit sums. A robust system should track temporal patterns (e.g., MEV volume per hour), actor concentration (identifying the most active searchers or builders), and victim impact (the total value lost by regular users). Storing this data in a time-series database like TimescaleDB or InfluxDB enables powerful trend analysis. Furthermore, correlating MEV activity with on-chain events—such as a major NFT mint or a new protocol launch—can reveal how external catalysts influence extraction opportunities. These metrics are essential for researchers analyzing market efficiency and for developers building protective tools like MEV-aware wallets.
Finally, deploying and scaling the system requires careful infrastructure choices. For production, run the ingestion and processing modules as containerized services using Docker and Kubernetes for easy scaling. Implement comprehensive logging and monitoring with tools like Prometheus and Grafana to track system health and detection accuracy. The codebase should be modular, allowing new detection strategies (e.g., for NFT arbitrage or JIT liquidity) to be added as plugins. Open-source projects like EigenPhi's analysis tools and the Flashbots MEV-Explore dataset provide valuable references and validation points for your own detection logic and metrics.
Frequently Asked Questions
Common technical questions about building systems to detect and analyze Maximal Extractable Value (MEV) opportunities on Ethereum and other EVM chains.
A basic MEV detection system typically follows a three-tier architecture:
- Data Ingestion Layer: Connects to blockchain nodes (e.g., Geth, Erigon) via JSON-RPC to stream new blocks and pending transactions from the mempool. This requires a low-latency connection.
- Simulation & Analysis Engine: The core component. It simulates transaction bundles locally using an EVM execution client (like a forked ganache instance) to calculate potential profit from arbitrage, liquidations, or other strategies. Tools like Tenderly's API or eth_call simulations are used here.
- Alerting/Execution Layer: Formats profitable opportunities into a standardized data structure (e.g., a Bundle object with transactions, target block, etc.) and sends them to a searcher's bot or a dashboard.
Key libraries include ethers.js/web3.py for interaction, and viem for type-safe operations.
Further Reading and Resources
These resources cover the core primitives, datasets, and reference implementations needed to design and validate a production-grade MEV detection system. Each card focuses on a concrete component you can integrate or study directly.
Ethereum Execution Tracing and Debug APIs
Most accurate MEV detection systems rely on execution traces, not just transaction metadata. Ethereum clients expose debug and trace APIs that allow reconstruction of value flows inside a block.
Core components to study:
- debug_traceTransaction and debug_traceBlockByNumber
- Call-level state diffs, internal transfers, and opcode execution
- Identification of profit extraction points within a transaction
- Correlating multiple transactions inside the same block
These APIs are available in clients like Geth, Erigon, and Nethermind, often requiring archive nodes. Mastering trace-based analysis is critical for detecting MEV that is invisible at the mempool layer.
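As an illustration of trace-based analysis, the sketch below walks a call tree in the shape returned by Geth's built-in callTracer (nested frames with type, from, to, value, and calls) and collects the internal ETH transfers. The helper names are hypothetical, and counting only CALL, CREATE, and CREATE2 frames as value-moving is an assumption chosen to avoid double counting delegate calls.

```javascript
// Frame types assumed to actually move ETH between accounts.
const VALUE_MOVING = new Set(['CALL', 'CREATE', 'CREATE2']);

// Recursively collect ETH transfers from a callTracer-style frame.
function collectValueTransfers(frame, depth = 0, out = []) {
  const value = frame.value ? BigInt(frame.value) : 0n;
  if (VALUE_MOVING.has(frame.type) && value > 0n) {
    out.push({ from: frame.from, to: frame.to, value, depth });
  }
  for (const child of frame.calls ?? []) {
    collectValueTransfers(child, depth + 1, out);
  }
  return out;
}

// Net ETH received (positive) or sent (negative) by an address.
function netEthFlow(transfers, address) {
  let net = 0n;
  for (const t of transfers) {
    if (t.to === address) net += t.value;
    if (t.from === address) net -= t.value;
  }
  return net;
}

// Request body for tracing one transaction with the call tracer.
function buildTraceRequest(txHash, id = 1) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'debug_traceTransaction',
    params: [txHash, { tracer: 'callTracer' }],
  };
}
```

Summing these flows per address across all transactions in a block is one way to locate the profit extraction point, since the address with a positive net flow after gas is usually the extractor or its payout account.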
Conclusion and Next Steps
This guide has outlined the core components for building a production-ready MEV detection system. Here's a summary of key takeaways and resources for further development.
Building a robust MEV detection system requires integrating several specialized components: a high-performance mempool listener (using a WebSocket eth_subscribe stream or Geth's txpool API), a simulation engine (via Tenderly, Foundry's cast, or a local EVM), and a profitability calculator that accounts for gas, slippage, and fees. The system's effectiveness hinges on low-latency data ingestion and the precision of its transaction simulation to identify arbitrage, liquidations, and sandwich attack opportunities before they are included in a block.
For next steps, focus on system optimization and scaling. Implement parallel simulation to handle multiple candidate transactions simultaneously. Integrate with a bundle-submission library such as Flashbots' ethers-provider-flashbots-bundle, or a searcher framework such as Paradigm's Artemis, to bundle and submit profitable opportunities to builders or relays. Crucially, you must develop robust risk management logic to avoid failed transactions from incorrect simulations, which can waste significant gas. Continuously backtest your strategy against historical mempool data to refine your detection heuristics.
To deepen your understanding, explore these key resources: the Flashbots Docs for relay and bundle specifications, EigenPhi for real-time MEV dashboard and attack analysis, and academic papers like "Flash Boys 2.0" for foundational concepts. Engaging with the community on forums like the Flashbots Discord and reviewing open-source searcher repositories on GitHub will provide practical insights and keep you updated on evolving MEV tactics and defensive measures like SUAVE.