A MEV detection framework is a system that analyzes blockchain data to identify transactions where value is extracted by searchers, validators, or bots beyond standard block rewards and gas fees. The core challenge is distinguishing legitimate arbitrage, liquidations, and ordinary DEX trades from exploitative front-running, sandwich attacks, and time-bandit attacks. Effective detection requires monitoring the mempool for pending transactions and comparing them against the final state changes in executed blocks. The primary data sources are an Ethereum node (or node provider API) for real-time mempool streaming and a blockchain indexer for historical state analysis.
How to Design a MEV Detection Framework
A practical guide to architecting a system for identifying and analyzing Maximal Extractable Value (MEV) activity on Ethereum and other EVM chains.
The architectural design typically involves several key components. First, a mempool listener subscribes to pending transactions via the eth_subscribe JSON-RPC method. Second, a block subscriber captures each new block and its full transaction list. The core logic is a correlation engine that matches pending transactions from the mempool with their final execution outcome in the block. This allows you to detect whether a profitable opportunity visible in the mempool was taken by another transaction that paid a higher gas fee to be included first, which is a hallmark of front-running (a sandwich attack adds a second, trailing transaction from the same actor). You can implement this in a language like Python or TypeScript using libraries such as web3.py or ethers.js.
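A minimal sketch of the correlation step, assuming the listener has already recorded when each pending hash was first seen and the block subscriber has delivered an ordered list of transaction hashes. The names `Inclusion` and `correlate_block` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Inclusion:
    tx_hash: str
    block_number: int
    index: int                     # position inside the block
    seen_pending: bool             # False: never appeared in our mempool view
    wait_seconds: Optional[float]  # first sighting -> block timestamp


def correlate_block(first_seen: dict, block_number: int,
                    block_timestamp: float, block_tx_hashes: list) -> list:
    """Match every transaction in a block against earlier mempool sightings."""
    results = []
    for index, tx_hash in enumerate(block_tx_hashes):
        seen_at = first_seen.get(tx_hash)
        results.append(Inclusion(
            tx_hash=tx_hash,
            block_number=block_number,
            index=index,
            seen_pending=seen_at is not None,
            wait_seconds=None if seen_at is None else block_timestamp - seen_at,
        ))
    return results


# Hypothetical data: two hashes seen in the mempool, three in the block.
first_seen = {"0xaaa": 100.0, "0xbbb": 103.5}
block = correlate_block(first_seen, 19_000_000, 112.0, ["0xccc", "0xaaa", "0xbbb"])
never_seen = [i.tx_hash for i in block if not i.seen_pending]
```

Transactions that land in a block without ever being seen as pending are candidates for privately submitted order flow, although a single node's mempool view is incomplete, so this signal is only suggestive.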
To identify specific MEV patterns, you must define detection heuristics. For a sandwich attack, look for a victim's DEX trade that is preceded by a buy transaction and followed by a sell transaction from the same attacker address, all within the same block. Arbitrage is detected by finding a sequence of trades across multiple DEXs (e.g., Uniswap, SushiSwap) executed in one transaction where the ending token balance is greater than the starting balance, factoring in gas costs. Liquidation bots on protocols like Aave or Compound can be spotted by transactions that call the protocol's liquidation function (liquidationCall on Aave, liquidateBorrow on Compound v2) and immediately sell the seized collateral. Storing these patterns as modular rules allows your framework to be extended.
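The sandwich heuristic can be sketched as a pure function over the decoded swaps of one block. This is an illustrative simplification, assuming swaps have already been decoded into a hypothetical `Swap` record and that each pool holds exactly two tokens; real detectors also compare amounts and prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Swap:
    tx_index: int   # position of the transaction in the block
    sender: str
    pool: str
    token_in: str
    token_out: str


def find_sandwiches(swaps: list) -> list:
    """Return (front, victim, back) triples found among one block's swaps."""
    ordered = sorted(swaps, key=lambda s: s.tx_index)
    found = []
    for i, front in enumerate(ordered):
        for k in range(i + 2, len(ordered)):
            back = ordered[k]
            # Back-run: same sender and pool, opposite direction to the front-run.
            if (back.sender, back.pool) != (front.sender, front.pool):
                continue
            if (back.token_in, back.token_out) != (front.token_out, front.token_in):
                continue
            # Victim: a different sender trading the same direction in between.
            for victim in ordered[i + 1:k]:
                if (victim.sender != front.sender and victim.pool == front.pool
                        and victim.token_in == front.token_in):
                    found.append((front, victim, back))
            break  # pair each front-run with its first matching back-run
    return found


pool = "WETH/USDC"
block_swaps = [
    Swap(0, "0xbot", pool, "WETH", "USDC"),
    Swap(1, "0xuser", pool, "WETH", "USDC"),
    Swap(2, "0xbot", pool, "USDC", "WETH"),
    Swap(3, "0xother", "WETH/DAI", "WETH", "DAI"),
]
sandwiches = find_sandwiches(block_swaps)
```

Keeping each pattern behind a function with the same input shape is one way to get the modular rules described above: new detectors can be added without touching the ingestion code.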
Implementing the framework requires careful data handling. You should decode transaction inputs using ABI definitions to understand the exact function calls. Tracking internal transactions and event logs is crucial, as much of the profit extraction happens via token transfers within smart contract executions. For performance, consider using a dedicated database to store raw transactions, decoded events, and your analysis results. Existing tools like the EigenPhi API or Flashbots MEV-Share schemas can provide reference models for transaction classification and profit calculation.
Finally, no detection framework is complete without a method for calculating the extracted value. This involves simulating the transaction's effect on the attacker's token balances, using pre- and post-block state queries for precise ERC-20 holdings. Convert all profits into a common denomination like ETH or USD using historical price oracles. By quantifying the impact, you can rank incidents by severity and generate actionable alerts. This systematic approach transforms raw blockchain data into intelligible insights on network activity and economic security.
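As a rough illustration of the value calculation, the sketch below nets pre- and post-block token balances, converts each delta to ETH, and subtracts gas. The function name, the price map, and the sample numbers are all assumptions; in practice the balances would come from state queries and the prices from a historical oracle.

```python
from decimal import Decimal


def extracted_value_eth(pre: dict, post: dict, price_in_eth: dict,
                        decimals: dict, gas_used: int,
                        effective_gas_price_wei: int) -> Decimal:
    """Net profit in ETH from raw token balance changes, minus gas paid."""
    gross = Decimal(0)
    for token in set(pre) | set(post):
        delta_raw = post.get(token, 0) - pre.get(token, 0)
        delta = Decimal(delta_raw) / Decimal(10) ** decimals[token]
        gross += delta * price_in_eth[token]
    gas_cost = Decimal(gas_used * effective_gas_price_wei) / Decimal(10) ** 18
    return gross - gas_cost


# Hypothetical attacker balances around one block (raw token units).
pre = {"WETH": 10 * 10**18, "USDC": 0}
post = {"WETH": 10 * 10**18, "USDC": 500 * 10**6}
profit = extracted_value_eth(
    pre, post,
    price_in_eth={"WETH": Decimal(1), "USDC": Decimal("0.0004")},
    decimals={"WETH": 18, "USDC": 6},
    gas_used=200_000,
    effective_gas_price_wei=50 * 10**9,
)
```

`Decimal` is used instead of floats so that 18-decimal token amounts do not lose precision when converted.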
Prerequisites and System Requirements
Before building a MEV detection framework, you need the right tools, data sources, and understanding of the blockchain execution layer.
A robust MEV detection framework requires a deep understanding of the Ethereum Virtual Machine (EVM) and the mempool. You should be comfortable with concepts like transaction lifecycle, gas mechanics, and block structure. Familiarity with MEV concepts such as arbitrage, liquidations, and sandwich attacks is essential. This guide assumes you have intermediate knowledge of blockchain fundamentals and basic proficiency in a programming language like Python, Go, or TypeScript for implementing detection logic.
Your primary technical requirement is reliable, low-latency access to blockchain data. This includes a full node or archive node (e.g., Geth, Erigon) for raw data, or a specialized provider like Chainscore for enriched, real-time mempool and block streams. You will also need access to a mempool data feed to observe pending transactions before they are mined. For historical analysis, services like Google's BigQuery public datasets or The Graph can be useful, but real-time detection demands a direct WebSocket connection to a node's transaction pool.
The core of detection is analyzing transaction sequences. You'll need to process and decode transactions using libraries like ethers.js, web3.py, or viem. Setting up a local database (e.g., PostgreSQL, TimescaleDB) is crucial for storing and querying detected events and patterns. For scalable event processing, consider a stream-processing framework like Apache Kafka or Flink. Your development environment should support these tools, and you'll need sufficient system resources—a machine with at least 16GB RAM and a multi-core CPU is recommended for handling high-throughput data.
Security and testing are non-negotiable. You should run your framework on a testnet first (such as Sepolia; Goerli has been deprecated) to validate logic without financial risk. Use forked mainnet environments with tools like Hardhat or Foundry to simulate complex MEV scenarios. Implement comprehensive logging and monitoring (e.g., Prometheus, Grafana) to track system performance and detection accuracy. Finally, ensure you understand the legal and ethical considerations of MEV research, as interacting with live mempools on mainnet carries inherent risks.
How to Design a MEV Detection Framework
A modular framework for identifying and analyzing Maximal Extractable Value opportunities across blockchain networks.
A robust MEV detection framework is a multi-layered system designed to monitor, parse, and analyze blockchain data in real-time to identify profitable transaction orderings. The core architecture typically consists of three primary layers: a data ingestion layer that streams raw blocks and mempool data, a processing and analysis layer that applies detection heuristics, and a strategy and execution layer that formulates actionable opportunities. This separation of concerns allows for scalability, as each component can be optimized independently—for instance, using Go or Rust for high-throughput data ingestion and Python for complex analysis logic.
The data ingestion layer is the foundation. It must connect to reliable node providers (e.g., using WebSocket subscriptions to newHeads and newPendingTransactions). Note that privately submitted order flow, such as transactions sent through Flashbots Protect RPC, never appears in the public mempool; it is only partially observable, for example through the hints published on the MEV-Share event stream. This layer is responsible for normalizing data from different sources (Ethereum, Arbitrum, Base) into a common internal format. Critical tasks include parsing transaction calldata to decode function calls, tracking nonces and gas prices, and maintaining a low-latency connection to avoid missing fleeting arbitrage or liquidation opportunities that may exist for only a few blocks.
At the heart of the framework is the analysis engine. This is where predefined detection modules and custom heuristics scan the normalized data stream. Common detection modules include: a sandwich detector looking for user DEX swaps surrounded by larger orders, a liquidator monitoring lending protocols like Aave for undercollateralized positions, and an arbitrageur searching for price discrepancies across DEXs like Uniswap and Curve. Each module emits standardized MEVOpportunity events containing details like target transactions, expected profit, and required capital. The complexity here lies in simulating transaction outcomes accurately using tools like Tenderly or a local forked node (e.g., Anvil or Hardhat) before flagging an opportunity.
Designing for resilience is non-negotiable. The framework must handle chain reorgs, RPC node failures, and false positives gracefully. Implement a state management system to track the lifecycle of each detected opportunity from PENDING to EXECUTED or EXPIRED. Use circuit breakers and rate limiting to prevent spam during network congestion. Furthermore, consider implementing a priority queue system for opportunities, ranking them by metrics like profit-per-gas (PPG) or success probability, to ensure the execution layer focuses on the most viable targets first when resources are constrained.
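One possible shape for the lifecycle tracking and priority queue described above is sketched here with the standard library's `heapq`. The `Opportunity` fields and the expiry-by-block rule are illustrative assumptions, not a prescribed schema.

```python
import heapq
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    EXECUTED = "EXECUTED"
    EXPIRED = "EXPIRED"


@dataclass(order=True)
class Opportunity:
    # Only sort_key takes part in ordering; heapq is a min-heap, so it holds
    # the negated profit-per-gas and the best opportunity pops first.
    sort_key: float = field(init=False, repr=False)
    opp_id: str = field(compare=False)
    profit_wei: int = field(compare=False)
    gas_estimate: int = field(compare=False)
    expires_at_block: int = field(compare=False)
    status: Status = field(default=Status.PENDING, compare=False)

    def __post_init__(self):
        self.sort_key = -self.profit_wei / self.gas_estimate


def next_viable(queue: list, current_block: int):
    """Pop the best non-expired opportunity, marking stale ones EXPIRED."""
    while queue:
        opp = heapq.heappop(queue)
        if opp.expires_at_block < current_block:
            opp.status = Status.EXPIRED
            continue
        return opp
    return None


queue = []
a = Opportunity("a", 3 * 10**16, 300_000, expires_at_block=105)
b = Opportunity("b", 5 * 10**16, 200_000, expires_at_block=99)
c = Opportunity("c", 2 * 10**16, 100_000, expires_at_block=110)
for opp in (a, b, c):
    heapq.heappush(queue, opp)
best = next_viable(queue, current_block=100)
```

Here `b` has the highest profit-per-gas but is already past its expiry block, so it is marked EXPIRED and `c` is returned instead.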
Finally, the strategy and execution layer receives validated opportunities. This component must manage private key security, gas estimation, and transaction bundling. For Ethereum, integration with a bundle-accepting block builder or a service like Flashbots is essential to submit bundles privately to block builders, avoiding public mempool exposure (the older mev-geth client predates proof of stake and has been superseded by MEV-Boost). The architecture should allow for backtesting strategies against historical data and include comprehensive logging and metrics (e.g., opportunities detected per hour, win rate, average profit) to iteratively refine detection algorithms and improve the framework's profitability over time.
Essential Data Sources and Tools
Building a robust MEV detection system requires access to specialized data and analytical tools. This guide covers the core components for identifying and analyzing transaction-level arbitrage, liquidations, and sandwich attacks.
Step 1: Ingesting and Parsing Mempool Data
The first step in building a MEV detection system is establishing a reliable data pipeline. This involves connecting to blockchain nodes to access the mempool, the staging area for pending transactions, and structuring this raw data for analysis.
The mempool is a node's collection of unconfirmed transactions broadcast to the network. For MEV detection, you need a real-time, low-latency feed of this data. The most direct method is to run your own Ethereum execution client (like Geth or Nethermind) with WebSocket RPC enabled. This allows you to subscribe to the newPendingTransactions stream, receiving transaction hashes as they arrive. For higher throughput and reliability, specialized services like Chainscore's Mempool Stream API or Blocknative provide normalized, enriched feeds that aggregate data from multiple global nodes, reducing the risk of missing opportunities due to network latency or node-specific filtering.
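The subscription itself is a small JSON-RPC exchange. The sketch below only builds the request and parses notifications, leaving the WebSocket transport out so that it stays library-agnostic; the helper names are hypothetical.

```python
import json


def subscribe_request(request_id: int = 1) -> str:
    """JSON-RPC message that asks a node to stream pending transaction hashes."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_subscribe",
        "params": ["newPendingTransactions"],
    })


def parse_notification(raw: str, subscription_id: str):
    """Return the transaction hash from a notification, or None if unrelated."""
    msg = json.loads(raw)
    if msg.get("method") != "eth_subscription":
        return None  # e.g. the initial response carrying the subscription id
    params = msg.get("params", {})
    if params.get("subscription") != subscription_id:
        return None
    return params.get("result")


# Example messages as a node would send them (values are made up).
ack = '{"jsonrpc":"2.0","id":1,"result":"0x9ce59a13"}'
note = ('{"jsonrpc":"2.0","method":"eth_subscription",'
        '"params":{"subscription":"0x9ce59a13","result":"0xdeadbeef"}}')
sub_id = json.loads(ack)["result"]
tx_hash = parse_notification(note, sub_id)
```

Any WebSocket client can send the string from `subscribe_request` and feed incoming frames to `parse_notification`.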
Once you have a stream of transaction hashes, you must fetch the full transaction data. This is done by calling the eth_getTransactionByHash RPC method. The raw transaction object contains critical fields for MEV analysis: from, to, gasPrice or maxPriorityFeePerGas, maxFeePerGas, value, input (the calldata), and nonce. Parsing the input data is essential, as it reveals the target smart contract function and its arguments. For ERC-20 token transfers, this is where you decode the recipient address and transfer amount. For complex DeFi interactions, you need the contract's Application Binary Interface (ABI) to decode the function call.
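For the simplest case, an ERC-20 `transfer(address,uint256)` call, the calldata layout is fixed and can be decoded without an ABI library: a 4-byte selector (`0xa9059cbb`) followed by two 32-byte words. The function below is a minimal sketch that returns `None` for anything else; the strict length check is a design assumption.

```python
TRANSFER_SELECTOR = "a9059cbb"  # first 4 bytes of keccak256("transfer(address,uint256)")


def decode_erc20_transfer(input_hex: str):
    """Return (recipient, amount) for an ERC-20 transfer call, else None."""
    data = input_hex[2:] if input_hex.startswith("0x") else input_hex
    # 8 hex chars of selector plus two 32-byte words (64 hex chars each).
    if len(data) != 8 + 64 * 2 or data[:8].lower() != TRANSFER_SELECTOR:
        return None
    try:
        # The address occupies the last 20 bytes of the first word.
        recipient = "0x" + data[8 + 24:8 + 64].lower()
        int(recipient, 16)  # validates that the address is hex
        amount = int(data[8 + 64:8 + 128], 16)
    except ValueError:
        return None  # malformed calldata is expected; do not raise
    return recipient, amount


calldata = (
    "0xa9059cbb"
    + "0" * 24 + "1111111111111111111111111111111111111111"
    + format(1_000_000, "064x")
)
decoded = decode_erc20_transfer(calldata)
```

More complex DeFi calls need the contract ABI and a decoder from a library such as web3.py, as described above.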
Effective parsing requires structuring this data. A common approach is to create an internal data model, often a PendingTransaction object. This model should include the raw fields, a decoded function signature (e.g., swapExactTokensForETH), parsed arguments, and derived metrics like potential profit. For swaps, calculate the input/output token amounts and use an on-chain or off-chain price feed to estimate the USD value. This normalized data structure is what your downstream detection logic will analyze. It's crucial to handle decoding errors gracefully, as malformed calldata or unknown ABIs are common.
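A possible internal model is sketched below. The field names are illustrative assumptions; the only external contract relied on is that `eth_getTransactionByHash` returns numeric fields as hex strings and that legacy transactions carry `gasPrice` instead of the EIP-1559 fee fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PendingTransaction:
    tx_hash: str
    sender: str
    to: Optional[str]                      # None for contract creation
    nonce: int
    value_wei: int
    max_fee_per_gas_wei: int               # legacy gasPrice is stored here too
    max_priority_fee_wei: Optional[int]    # None for legacy transactions
    input_data: str
    function_name: Optional[str] = None    # filled in later by an ABI decoder
    args: dict = field(default_factory=dict)
    decode_error: Optional[str] = None     # recorded instead of raising

    @classmethod
    def from_rpc(cls, tx: dict) -> "PendingTransaction":
        """Build the model from an eth_getTransactionByHash result."""
        tip = tx.get("maxPriorityFeePerGas")
        fee = tx.get("maxFeePerGas") or tx["gasPrice"]
        return cls(
            tx_hash=tx["hash"],
            sender=tx["from"].lower(),
            to=tx["to"].lower() if tx.get("to") else None,
            nonce=int(tx["nonce"], 16),
            value_wei=int(tx["value"], 16),
            max_fee_per_gas_wei=int(fee, 16),
            max_priority_fee_wei=int(tip, 16) if tip else None,
            input_data=tx["input"],
        )


# Hypothetical legacy contract-creation transaction.
legacy = PendingTransaction.from_rpc({
    "hash": "0xabc", "from": "0xAbC0000000000000000000000000000000000001",
    "to": None, "nonce": "0x5", "value": "0xde0b6b3a7640000",
    "gasPrice": "0x4a817c800", "input": "0x",
})
```

Storing `decode_error` on the record keeps undecodable transactions in the pipeline, so downstream logic can still use their raw fields.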
Latency is a critical bottleneck. The time between a transaction appearing in the mempool and being included in a block is often just seconds. Your ingestion pipeline must minimize delay at every stage: WebSocket connection stability, RPC call speed, and decoding efficiency. Implementing concurrent processing, using efficient JSON-RPC libraries, and caching frequently accessed ABIs can significantly improve performance. Monitoring metrics like ingestion-to-decode latency and missed transactions is necessary to tune the system.
Finally, consider data enrichment. Raw transaction data often lacks context. Enrichment involves cross-referencing addresses with labels (e.g., known arbitrage bots, DeFi protocols), calculating token prices at the time of the transaction, and simulating the transaction's state change using a tool like Tenderly or a local eth_call. This enriched context transforms a simple transaction into a candidate for MEV opportunity classification, setting the stage for the next step: identifying patterns and building detection heuristics.
Step 2: Detecting Common MEV Patterns
Learn how to build a systematic framework to identify and analyze prevalent MEV strategies on-chain.
A robust MEV detection framework operates by analyzing pending transactions in the mempool and comparing them to the resulting state changes on-chain. The core principle is to identify transaction sequences where the economic outcome for the searcher is only possible due to their privileged position in the transaction ordering. Your framework needs to monitor for specific on-chain footprints, such as sudden large swaps followed by immediate profit extraction, or the creation and liquidation of positions within a single block. Tools like Etherscan's transaction decoder and Tenderly's simulation API are essential for reconstructing these sequences.
To detect sandwich attacks, your system should scan for user swap transactions with high slippage tolerance. Look for a pair of surrounding transactions from the same entity: a front-running swap that moves the pool price, followed by a back-running swap that returns it. The profit is captured from the user's inflated price impact. For liquidations, monitor lending protocols like Aave or Compound for positions nearing their health factor threshold. Detect when a searcher's transaction repays part of a vulnerable position's debt (often funded by a flash loan), seizes the borrower's collateral at the liquidation bonus, and then swaps that collateral back to cover the repayment, all atomically.
Arbitrage detection focuses on price discrepancies across decentralized exchanges (DEXs). Your framework should track identical asset pairs (e.g., WETH/USDC) on multiple venues like Uniswap, Balancer, and Curve. An arbitrage opportunity is confirmed when a transaction executes a profitable cycle: buying low on one DEX and selling high on another, with the profit held in a net gain of the input token. This often requires analyzing complex multi-hop swap routes. Jito-style bundles on Solana or Flashbots bundles on Ethereum are common delivery mechanisms for these strategies, where searchers submit a package of transactions for validators to include as a unit.
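The "profitable cycle" check can be approximated from the token transfers emitted inside a single transaction. The sketch below assumes transfers have already been extracted from event logs into `(token, from, to, amount)` tuples and ignores gas; the function names are hypothetical.

```python
from collections import defaultdict


def net_deltas(transfers: list, address: str) -> dict:
    """Net raw token balance change of `address` across one transaction."""
    deltas = defaultdict(int)
    for token, src, dst, amount in transfers:
        if src == address:
            deltas[token] -= amount
        if dst == address:
            deltas[token] += amount
    return {token: d for token, d in deltas.items() if d != 0}


def looks_like_arbitrage(transfers: list, address: str) -> bool:
    """True when the address traded at least two tokens and lost none of them."""
    touched = {t for t, src, dst, _ in transfers if address in (src, dst)}
    deltas = net_deltas(transfers, address)
    return len(touched) >= 2 and bool(deltas) and all(d > 0 for d in deltas.values())


bot = "0xbot"
cycle = [
    ("WETH", bot, "0xpoolA", 10 * 10**18),
    ("USDC", "0xpoolA", bot, 30_500 * 10**6),
    ("USDC", bot, "0xpoolB", 30_500 * 10**6),
    ("WETH", "0xpoolB", bot, 1005 * 10**16),   # 10.05 WETH back
]
plain_swap = cycle[:2]
```

In the cycle, the intermediate USDC nets to zero and the bot ends with 0.05 WETH more than it started with, whereas a plain swap leaves a negative delta in the input token and is rejected.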
Implementing detection requires accessing raw data. Use a node provider (Alchemy, QuickNode) for real-time mempool streaming and block data. For historical analysis, the Google BigQuery public datasets for Ethereum and Solana are invaluable. A simple detection script for a sandwich attack might: 1) Listen for pending swap transactions, 2) Simulate their price impact, 3) Search for surrounding swaps sent by the same address or routed through the same bot contract, and 4) Calculate if the middle transaction suffered worse execution and the surrounding trades profited. Open-source bots like the EigenPhi Inspector provide a reference for these logic flows.
Beyond single patterns, advanced detection involves clustering addresses and analyzing long-term behavior. Searchers often use contract factories or proxy contracts for each operation. By grouping transactions by funding source (e.g., a central EOA that provides gas) or withdrawal address, you can map the activity of a single entity. This helps in measuring the total extracted value and the sophistication of the searcher. Remember, your framework's goal is not to prevent MEV but to illuminate the landscape, providing data for users, developers, and researchers to build fairer systems.
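Grouping by funding source is a connected-components problem, which a small union-find handles well. This is a sketch under the assumption that you already have `(funder, funded)` pairs, for example from each address's first incoming ETH transfer; `cluster_by_funder` is a hypothetical name.

```python
def cluster_by_funder(funding_edges: list) -> list:
    """Group addresses that are linked by funding transfers (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for funder, funded in funding_edges:
        root_a, root_b = find(funder), find(funded)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


edges = [("0xfund", "0xbot1"), ("0xfund", "0xbot2"), ("0xother", "0xbot3")]
clusters = sorted(cluster_by_funder(edges), key=len, reverse=True)
```

Shared funders such as exchange hot wallets link many unrelated addresses, so they should be filtered out before clustering.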
MEV Strategy Classification Matrix
A framework for classifying MEV extraction strategies based on key operational characteristics.
| Classification Dimension | Arbitrage | Liquidation | Sandwich Trading | Long-tail (NFT/DeFi) |
|---|---|---|---|---|
| Primary Profit Source | Price discrepancies across venues | Under-collateralized loan positions | Latency advantage over pending trades | Protocol-specific logic exploits |
| Time Horizon | < 1 second | Seconds to minutes | < 500 milliseconds | Minutes to hours |
| Required Capital | High | Very High | Medium | Low to Variable |
| Automation Complexity | High (cross-DEX routing) | Medium (oracle monitoring) | Extreme (mempool sniping) | High (protocol-specific) |
| Predictability | High (mathematical) | High (oracle-based) | Medium (behavioral) | Low (opportunistic) |
| On-Chain Footprint | Large (multiple swaps) | Large (repay + seize) | Targeted (front/back-run) | Variable (often complex) |
| Main Risk | Slippage & gas auction | Gas auction & bad debt | Detection & retaliation | Smart contract risk & failure |
| Typical Profit Range | 0.1% - 0.5% of volume | 5% - 10% of position | 0.5% - 2.0% of victim trade | Variable, often >100% ROI |
Step 3: Quantifying MEV Extraction and Impact
This section details the methodology for measuring MEV, from identifying opportunities to calculating their financial impact and systemic effects on the blockchain.
A robust MEV detection framework requires moving from qualitative identification to quantitative measurement. The core metrics are extracted value and impact. Extracted value is the direct profit captured by searchers or validators, typically measured in ETH or USD. This is calculated by analyzing the state change in an actor's balance before and after a transaction sequence, accounting for gas costs. For example, a profitable arbitrage is quantified as (Output Asset Value) - (Input Asset Value + Gas Fees). Tools like the Flashbots MEV-Explore dashboard and EigenPhi provide aggregated data on these extracted profits across various MEV categories like arbitrage and liquidations.
Beyond simple profit, you must quantify the negative externalities and systemic impact of MEV. Key impact metrics include gas price spikes caused by bidding wars, latency sensitivity that disadvantages regular users, and chain reorganization (reorg) risk from validator manipulation. For instance, a series of profitable sandwich attacks might extract $50,000 in value but also increase the average gas price for the block by over 100 gwei, imposing significant costs on all other network participants. Measuring this requires comparing block-level statistics like average gas price and inclusion times against a baseline during periods of low MEV activity.
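A simple way to express the baseline comparison is the difference between a block's median gas price and the median over a set of quiet reference blocks. The function name and the sample figures below are illustrative assumptions only.

```python
from statistics import median


def gas_price_uplift_gwei(block_prices_gwei: list, baseline_blocks: list) -> float:
    """Median gas price of a block minus the median of baseline block medians."""
    baseline = median(median(prices) for prices in baseline_blocks)
    return median(block_prices_gwei) - baseline


# Made-up gas prices (gwei) for two quiet blocks and one contested block.
quiet = [[20, 22, 25], [18, 21, 30]]
contested = [25, 140, 150, 160, 30]
uplift = gas_price_uplift_gwei(contested, quiet)
```

Medians are used rather than means so that a single extreme bid does not dominate either side of the comparison.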
Implementing detection requires processing raw blockchain data. Start by streaming pending transactions from a node's mempool and finalized blocks from an archive node. For each block, reconstruct the transaction order and simulate state changes to identify profitable opportunities that were executed. A simple Python snippet using web3.py might calculate arbitrage profit from the transaction receipt, since gasUsed and effectiveGasPrice are receipt fields rather than transaction fields: profit = (post_trade_balance - pre_trade_balance) - receipt["gasUsed"] * receipt["effectiveGasPrice"], with all terms in wei. For sandwich attacks, you need to detect user transactions that were front-run and back-run by the same entity, checking for matching token pairs and direction.
To analyze the data, segment MEV by type (e.g., arbitrage, liquidation, sandwiching) and actor (e.g., searcher, validator, bot). Temporal analysis is also crucial: plot extracted value over time to correlate with market volatility or specific protocol launches. Furthermore, assess capture rate: what percentage of the theoretically available MEV in a block was actually extracted? This reveals the efficiency of the searcher network. Long-term, tracking these metrics helps research the MEV supply chain and informs the design of mitigations like Fair Sequencing Services or encrypted mempools.
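The segmentation and capture-rate metrics reduce to a grouped sum and a ratio. The record layout below is a hypothetical example, and the "available" figure is assumed to come from your own simulation of what could have been extracted.

```python
from collections import defaultdict


def extracted_by_type(records: list) -> dict:
    """Total extracted value (ETH) per MEV category."""
    totals = defaultdict(float)
    for record in records:
        totals[record["type"]] += record["profit_eth"]
    return dict(totals)


def capture_rate(extracted_eth: float, available_eth: float):
    """Share of theoretically available MEV that was actually extracted."""
    if available_eth <= 0:
        return None  # undefined when nothing was available
    return min(extracted_eth / available_eth, 1.0)


records = [
    {"type": "arbitrage", "actor": "0xbot1", "profit_eth": 1.5},
    {"type": "arbitrage", "actor": "0xbot2", "profit_eth": 0.5},
    {"type": "sandwich", "actor": "0xbot1", "profit_eth": 0.25},
]
totals = extracted_by_type(records)
rate = capture_rate(sum(totals.values()), available_eth=4.5)
```

The same grouping can be keyed by actor or by time bucket to produce the temporal views mentioned above.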
Finally, contextualize your findings within broader ecosystem health. High levels of extractable MEV can indicate market inefficiencies or protocol design flaws. Persistent sandwich attacks on a specific DEX, for example, may signal the need for better slippage protection mechanisms. By quantifying both extraction and impact, your framework provides the empirical foundation needed to evaluate proposed solutions, from protocol-level changes (like CoW Swap's batch auctions) to infrastructure upgrades (like MEV-Boost's relay architecture), ultimately aiming to reduce MEV's negative externalities.
Advanced Detection and ML Techniques
Building a robust MEV detection framework requires understanding transaction lifecycle analysis, data sourcing, and classification models. This section covers the core components and practical tools.
Step 4: Visualizing and Alerting on MEV Activity
After detecting MEV transactions, the next step is to build a system for real-time visualization and alerting to monitor network activity and respond to threats.
A MEV detection framework is only as useful as its output. The goal of this step is to transform raw transaction data into actionable intelligence through dashboards and notifications. Effective visualization helps researchers identify patterns—like the rise of a new sandwich attack bot—while alerting enables protocol teams to react to malicious transactions in real-time, such as pausing a vulnerable contract. This layer turns passive observation into active monitoring and defense.
For visualization, tools like Grafana connected to your indexed database (e.g., PostgreSQL with TimescaleDB) are industry standard. You can create panels tracking key metrics: total extracted value per block, the most active searcher addresses, dominant MEV strategies (arbitrage vs. liquidation), and affected protocols. Visualizing fee spikes or unusual success rates for certain transaction bundles can reveal the onset of generalized frontrunning. Public explorers like EigenPhi offer examples of such aggregated views.
Alerting requires defining specific, high-signal triggers. Common rules include: detecting a transaction bundle that extracts value above a threshold (e.g., >5 ETH), identifying a surge in failed arbitrage attempts from a single address (potential griefing), or spotting a complex interaction path that touches a newly deployed, unaudited contract. These alerts can be routed via PagerDuty, Slack webhooks, or Telegram bots. The logic can be implemented as a separate service that queries your database or listens to your enriched event stream.
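The triggers above can be kept as a table of named predicates so that new rules are added without touching the dispatch code. The event fields and thresholds here (other than the 5 ETH example from the text) are assumptions made for illustration.

```python
from typing import Callable, Dict, List

Event = dict
Rule = Callable[[Event], bool]

# Each rule inspects one enriched event; missing fields never trigger a rule.
RULES: Dict[str, Rule] = {
    "large_extraction": lambda e: e.get("profit_eth", 0) > 5,
    "failed_arb_surge": lambda e: e.get("failed_attempts_last_hour", 0) >= 20,
    "new_contract_touched": lambda e: any(
        age < 7200 for age in e.get("contract_ages_blocks", [])
    ),
}


def triggered_rules(event: Event, rules: Dict[str, Rule] = RULES) -> List[str]:
    """Names of all rules that fire for this event, in definition order."""
    return [name for name, rule in rules.items() if rule(event)]


event = {
    "tx_hash": "0xabc",
    "profit_eth": 7.2,
    "contract_ages_blocks": [10, 500_000],
}
fired = triggered_rules(event)
```

The returned rule names can be used as routing keys, for example sending `large_extraction` to a paging channel and the others to chat.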
Here is a simplified code snippet for a Python-based alert service that checks for large value extraction:
```python
import os

import psycopg2
import requests

# Connection settings are read from the environment (variable names are assumptions)
DATABASE_URL = os.environ["DATABASE_URL"]
SLACK_WEBHOOK = os.environ["SLACK_WEBHOOK"]


def check_mev_alerts():
    """Query for high-value MEV in recent blocks and post each hit to Slack."""
    conn = psycopg2.connect(DATABASE_URL)
    try:
        cur = conn.cursor()
        cur.execute("""
            SELECT block_number, tx_hash, extractor, profit_eth
            FROM mev_transactions
            WHERE profit_eth > 5
              AND inserted_at > NOW() - INTERVAL '5 minutes'
            ORDER BY profit_eth DESC
        """)
        for block, tx, extractor, profit in cur.fetchall():
            message = (
                f"Large MEV Alert: {profit} ETH extracted in block {block} "
                f"by {extractor} (tx {tx})"
            )
            # Send to Slack; the timeout keeps a slow webhook from stalling the job
            requests.post(SLACK_WEBHOOK, json={'text': message}, timeout=10)
        cur.close()
    finally:
        conn.close()
```
This service could be run periodically via a cron job or triggered by new database entries.
Integrating with on-chain defense systems is the advanced frontier of MEV alerting. Upon detecting a malicious pattern, your system could automatically route user transactions through Flashbots Protect RPC to keep them out of the public mempool, or submit a transaction to update a protocol's risk parameters. The final architecture (detection, a data pipeline, a visualization layer, and an alerting system) creates a closed loop for monitoring and mitigating MEV-related risks across the ecosystem.
MEV Detection Framework FAQ
Common questions and technical clarifications for developers building or integrating MEV detection systems.
MEV detection is the process of identifying and analyzing potential MEV opportunities by monitoring the mempool and blockchain state. It is a passive, observational process. MEV extraction is the active process of capitalizing on those opportunities by submitting transactions, often via bots, to capture the value.
A detection framework like Chainscore's API scans pending transactions to find patterns like arbitrage, liquidations, or sandwich attacks. It outputs data and alerts. Extraction requires a separate execution layer that submits transactions, manages gas, and handles private transaction relays like Flashbots Protect to avoid frontrunning.
Key Distinction: Detection is about seeing the opportunity; extraction is about seizing it. Most frameworks focus on detection to provide data for researchers, dashboards, or risk analysis tools.
Further Resources and Code Repositories
These tools, datasets, and research repositories help you design, validate, and extend a MEV detection framework in production or research environments. Each resource focuses on a concrete layer of the MEV detection stack, from raw mempool access to labeling and simulation.