The Future of Benchmarks: Measuring Scalability Under Adversarial Load
A technical analysis arguing that true L2 scalability is defined by performance during spam attacks and MEV events, not ideal lab conditions. We examine real-world data from Arbitrum, Optimism, and zkSync to expose the critical metrics that matter.
Benchmarks are marketing tools. Published TPS and gas cost figures measure optimal, synthetic workloads, not the congested, spam-filled state of mainnet.
Introduction
Current blockchain benchmarks fail to measure performance under the adversarial conditions that define real-world usage.
Real load is adversarial. Users compete with MEV bots, arbitrageurs, and spam transactions, creating a non-linear performance collapse that synthetic benchmarks ignore.
The industry lacks a standard. Individual stacks have their own scaling primitives (Solana's Turbine, Avalanche's Subnets), but there is no common framework for measuring how consensus and execution layers degrade under attack.
Evidence: A network claiming 10,000 TPS in a lab often processes under 100 TPS during an NFT mint or a Uniswap token launch, where demand is real and malicious.
Executive Summary
Current blockchain benchmarks measure performance in a sterile lab; the future demands testing under the adversarial load of a live, multi-billion dollar ecosystem.
The Problem: Synthetic Benchmarks Are a Lie
TPS and gas price metrics under ideal conditions are meaningless. Real-world performance collapses under network congestion, MEV bot spam, and oracle update storms.
- Real Gap: A chain claiming 10k TPS often delivers <500 TPS during a mempool flood.
- Hidden Cost: Latency spikes from 200ms to 5+ seconds under load, breaking DeFi arbitrage.
The Solution: Adversarial Load Generators
Simulate the worst-case traffic of Uniswap v4 hook auctions, NFT mints, and liquidations hitting the network simultaneously. This requires stateful, intelligent bots, not simple transaction blasters.
- Key Metric: State Growth Rate under sustained spam (MB/sec).
- True Benchmark: Finality time consistency during a simulated exploit hunt.
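As a concrete illustration of "stateful, intelligent bots," here is a minimal Python sketch of a load generator that keeps per-bot nonces and deliberately concentrates traffic on a few hot contracts. All names (AdversarialBot, run_load, the target labels) are hypothetical; a real harness would sign and submit these via an RPC client.

```python
import random
from dataclasses import dataclass, field

@dataclass
class AdversarialBot:
    """A stateful bot that targets shared state (one hot pool, one hot
    mint) instead of blasting independent transfers."""
    bot_id: int
    nonce: int = 0
    # Bots deliberately share a tiny set of "hot" contracts to force
    # state contention, the condition synthetic blasters never create.
    hot_targets: list = field(default_factory=lambda: ["POOL_A", "MINT_B", "VAULT_C"])

    def next_tx(self) -> dict:
        self.nonce += 1
        kind = random.choices(["swap", "mint", "liquidation"],
                              weights=[0.6, 0.3, 0.1])[0]
        return {
            "sender": self.bot_id,
            "nonce": self.nonce,
            "kind": kind,
            # Every tx touches a hot target -> concurrent write contention.
            "target": random.choice(self.hot_targets),
            # Heavy-tailed priority fees mimic bots escalating under congestion.
            "priority_fee": random.paretovariate(2.0),
        }

def run_load(bots: int, tx_count: int) -> list[dict]:
    """Generate a burst of contending traffic; a real harness would submit
    these via an RPC client and record inclusion latency per transaction."""
    swarm = [AdversarialBot(i) for i in range(bots)]
    return [random.choice(swarm).next_tx() for _ in range(tx_count)]

if __name__ == "__main__":
    txs = run_load(bots=50, tx_count=10_000)
    hot = sum(1 for t in txs if t["target"] == "POOL_A")
    print(f"{len(txs)} txs generated, {hot} contending on POOL_A")
```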
The New Standard: Economic Throughput
Measure value secured per second, not transactions. A chain processing $50M in stablecoin transfers is more scalable than one processing 1M NFT approvals. This aligns with L2Beat's TVL and EigenLayer's restaking metrics.
- Core Metric: Adjusted TVL/sec = (Value Secured) / (Time to Finality).
- Real Test: Can the chain's economic capacity support the next $10B+ DeFi protocol?
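The Adjusted TVL/sec formula above is simple enough to compute directly. A minimal sketch with illustrative, not measured, numbers:

```python
def adjusted_tvl_per_sec(value_secured_usd: float, time_to_finality_s: float) -> float:
    """Economic throughput: dollars of value reaching finality per second,
    per the Adjusted TVL/sec definition above."""
    return value_secured_usd / time_to_finality_s

# Hypothetical comparison: a chain settling $50M of stablecoin transfers
# with 12 s finality vs. one settling $100k of NFT approvals with 2 s finality.
stablecoin_chain = adjusted_tvl_per_sec(50_000_000, 12)  # ~ $4.2M/sec
nft_chain = adjusted_tvl_per_sec(100_000, 2)             # ~ $50k/sec
print(f"${stablecoin_chain:,.0f}/sec vs ${nft_chain:,.0f}/sec")
```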
The Arbiter: MEV-Resilient Latency
Scalability is useless if latency is unpredictable. Under adversarial load, proposer-builder separation (PBS) and encrypted mempools must be stress-tested. Compare Flashbots SUAVE vs. native chain ordering.
- Key Insight: Jitter (latency variance) is more critical than average latency.
- Failure Point: Can a $100M arbitrage opportunity be captured, or will it be front-run?
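To see why jitter matters more than the mean, here is a small sketch that reports tail latency alongside the average. The samples are hypothetical: a calm mempool versus a liquidation wave.

```python
import statistics

def latency_report(samples_ms: list[float]) -> dict:
    """Jitter (variance and tail spread) matters more than the mean: an
    arbitrageur can price a stable 800 ms, not a 200 ms median that
    spikes to 5 s under load."""
    ordered = sorted(samples_ms)
    p50 = ordered[len(ordered) // 2]
    p99 = ordered[int(len(ordered) * 0.99)]
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "p50_ms": p50,
        "p99_ms": p99,
        "jitter_stdev_ms": statistics.stdev(samples_ms),
        "tail_ratio": p99 / p50,  # a large ratio signals MEV-driven congestion
    }

# Hypothetical samples: calm mempool vs. a liquidation wave.
calm = [200, 210, 195, 205, 198, 202] * 20
storm = [200, 250, 4800, 300, 5200, 220] * 20
print(latency_report(calm)["tail_ratio"])   # ~1x: predictable
print(latency_report(storm)["tail_ratio"])  # ~17x: same chain, unusable
```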
Thesis: The Lab is a Lie
Current blockchain benchmarks measure performance in sterile, non-adversarial conditions, creating a dangerous gap between marketing claims and production reality.
Benchmarks are synthetic environments that fail to model real-world adversarial load. They test isolated chains with simple token transfers, ignoring the congestion from complex MEV bots, flash loan arbitrage, and NFT minting wars that define live networks.
Adversarial load exposes different bottlenecks. Solana parallelizes execution but still serializes writes to hot accounts, so it stalls under concurrent write contention, while parallel EVMs like Monad or Sei must prove their state access patterns under real arbitrage pressure.
The industry lacks a standard adversarial benchmark. We need a public mempool stress test that simulates the coordinated attack patterns seen during major airdrops or high-frequency DEX events on Uniswap and Aave.
Evidence: The Solana network congestion crisis of April 2024, where real TPS collapsed under bot spam despite a theoretical 65k TPS, proves the lab is a lie.
The Current State of Benchmarks
Current benchmarks fail to measure scalability under the adversarial conditions that break real-world systems.
Benchmarks measure ideal conditions. They test isolated, sequential transactions, ignoring the network effects and state contention that dominate mainnet performance. This creates a gap between lab results and user experience.
Adversarial load is the missing metric. Real users submit spam, arbitrage bots flood mempools, and MEV searchers create congestion. Benchmarks from Solana devnet or Arbitrum Nitro ignore this chaotic, parallel execution environment.
The industry lacks a standard. Projects self-report theoretical TPS using optimal transactions, while tools like Blockscout or Dune Analytics track real, degraded throughput. This discrepancy misinforms architectural decisions and capital allocation.
Evidence: A network claiming 100k TPS for simple transfers will process under 5k TPS when handling complex, conflicting operations like those on Uniswap during a market crash or an NFT mint.
The Two Adversarial Loads That Matter
Current TPS benchmarks are marketing fluff. Real scalability is defined by performance under two specific, hostile conditions.
The Problem: Synthetic Spam
Networks fail when flooded with worthless transactions from a single, well-funded actor. This tests state bloat and mempool management, not just raw throughput.
- Key Metric: Sustained TPS under a $1M+ spam attack
- Real Consequence: Congestion for real users, fee market failure
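A back-of-the-envelope model of "sustained TPS under a $1M+ spam attack": assuming a fixed effective fee and a fixed attacker share of blockspace (both hypothetical parameters), it estimates how long the attacker can sustain the flood and what throughput honest users retain.

```python
def spam_attack_model(budget_usd: float, fee_per_tx_usd: float,
                      chain_capacity_tps: float, attacker_share: float) -> dict:
    """Crude model: the attacker buys `attacker_share` of blockspace at the
    prevailing fee; honest users keep the remainder until the budget runs out."""
    attack_tps = chain_capacity_tps * attacker_share
    burn_rate_usd_per_s = attack_tps * fee_per_tx_usd
    duration_s = budget_usd / burn_rate_usd_per_s
    return {
        "attack_duration_hours": duration_s / 3600,
        "honest_tps_during_attack": chain_capacity_tps * (1 - attacker_share),
    }

# Hypothetical: $1M budget, $0.05 effective fee per tx, a 3,000 TPS chain,
# attacker filling 90% of blocks -> roughly a 2-hour outage for honest users.
print(spam_attack_model(1_000_000, 0.05, 3_000, 0.9))
# Note the fee-market feedback this model omits: as blocks fill, fees rise,
# shortening the attack but also pricing out honest users.
```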
The Solution: Economic Finality
Measure the time and cost to achieve un-reorgable settlement under load, not just probabilistic inclusion. This is the only metric for DeFi and bridges.
- Key Metric: Time to $1B Cost-to-Censor threshold
- Real Consequence: Security for protocols like Uniswap, Aave, and LayerZero
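A minimal sketch of the "time to $1B cost-to-censor" metric, assuming each block commits a fixed amount of stake behind the fork choice. That linearity is a simplification; real attestation weight varies block to block.

```python
def time_to_censor_threshold(threshold_usd: float,
                             stake_at_risk_per_block_usd: float,
                             block_time_s: float) -> float:
    """Economic finality model: each block commits additional stake that an
    attacker would need to burn to reorg the chain. Returns seconds until
    the cumulative cost-to-censor crosses the threshold."""
    blocks_needed = threshold_usd / stake_at_risk_per_block_usd
    return blocks_needed * block_time_s

# Hypothetical: $25M of stake lands behind each block, 12 s block time,
# $1B threshold -> ~8 minutes to economic finality.
seconds = time_to_censor_threshold(1_000_000_000, 25_000_000, 12.0)
print(f"{seconds / 60:.1f} minutes to the $1B cost-to-censor threshold")
```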
The Problem: MEV-Driven Congestion
Arbitrage and liquidation bots create bursty, high-value traffic that exploits block space allocation. This tests transaction ordering fairness and fee predictability.
- Key Metric: Latency variance for a $10k priority fee transaction
- Real Consequence: Unstable costs, frontrunning, failed liquidations
The Solution: Contention-Weighted Throughput
Benchmark TPS when a significant percentage of transactions are competing for the same state slot (e.g., an NFT mint, a popular token pool).
- Key Metric: TPS with >30% state contention
- Real Consequence: Measures real-world bottlenecks, not ideal conditions
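An Amdahl's-law-style sketch of contention-weighted throughput: the rated TPS assumes fully parallel execution lanes, while transactions contending on one state slot serialize onto a single lane. The lane count and rated TPS are hypothetical.

```python
def contention_weighted_tps(base_tps: float, contention_share: float,
                            lanes: int) -> float:
    """Amdahl-style model: rated TPS assumes all `lanes` execute in
    parallel; txs contending on one state slot serialize onto one lane.
    Effective TPS = base_tps / ((1 - s) + s * lanes) for serial fraction s."""
    s = contention_share
    return base_tps / ((1 - s) + s * lanes)

# Hypothetical 8-lane parallel EVM rated at 10,000 TPS on independent
# transfers, re-benchmarked as contention on one hot slot rises.
for share in (0.0, 0.3, 0.6):
    print(f"{share:.0%} contention -> "
          f"{contention_weighted_tps(10_000, share, 8):,.0f} TPS")
# 0% -> 10,000 TPS; 30% -> ~3,200 TPS; 60% -> ~1,900 TPS
```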
Entity: Solana's Adversarial Load Test
The Solana network stress test is the industry's only real-world benchmark, exposing true limits under synthetic spam. It revealed critical QUIC and fee market flaws.
- Key Lesson: Throughput is meaningless without client diversity and local fee markets
- Result: Drove development of Agave, Jito, and Firedancer
The New Benchmark Stack
Future frameworks like Chainscore must simulate coordinated adversaries. This requires custom clients and economic modeling, not just load generators.
- Key Component: MEV bot simulation with $10M+ capital
- Output: A breakdown curve showing performance vs. adversarial spend
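A sketch of what the breakdown-curve output could look like, using an assumed degradation shape; a real framework would measure the curve empirically per chain rather than assume it.

```python
def sustained_tps(capacity_tps: float, adversarial_spend_usd: float,
                  half_degradation_usd: float) -> float:
    """Toy degradation model: throughput available to honest users halves
    when adversarial spend reaches `half_degradation_usd`. The shape and
    the $250k half-point below are assumptions, not measurements."""
    return capacity_tps / (1 + adversarial_spend_usd / half_degradation_usd)

# Sweep adversarial budgets to produce the breakdown curve:
# honest-user performance (y) vs. attacker spend (x).
for spend in (0, 10_000, 100_000, 1_000_000, 10_000_000):
    print(f"${spend:>10,} -> {sustained_tps(3_000, spend, 250_000):>7,.0f} TPS")
```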
Adversarial Load Test: A Comparative Snapshot
Comparing how leading blockchain scaling solutions perform under coordinated, malicious traffic designed to degrade performance.
| Adversarial Metric | Monolithic L1 (e.g., Solana) | Optimistic Rollup (e.g., Arbitrum) | ZK Rollup (e.g., zkSync Era) | Modular DA (e.g., Celestia + Rollup) |
|---|---|---|---|---|
| Peak TPS Under Spam (Sustained) | ~3,000 | ~300 | ~600 | |
| State Growth Attack Mitigation | | | | |
| Sequencer Censorship Resistance | Low | Medium (7d delay) | High (ZK validity) | High (Multiple sequencers) |
| Cost of 1-Hr 50% Fill Attack | $50k | $200k | $500k | |
| Time to Finality Under Load | < 1 sec | ~1 min + 7 days | ~10 min | ~20 min |
| Data Availability Guarantee | On-chain | On L1 (expensive) | On L1 (compressed) | External (e.g., Celestia, EigenDA) |
| MEV Extraction Surface | High | Medium (Sequencer-dependent) | Low (ZK-provable batches) | Variable (Depends on rollup impl.) |
Deep Dive: The Architecture of Resilience
Scalability metrics must evolve to measure performance under adversarial load, not just ideal conditions.
Adversarial load testing is the new benchmark standard. Current TPS figures from Solana or Arbitrum measure optimal throughput, ignoring the reality of mempool spam and MEV bots. True resilience is defined by a system's performance when its most expensive resource is saturated.
The resource exhaustion attack is the universal stress test. For an L1, this is compute; for a rollup, it's data availability via Celestia or EigenDA; for a bridge like Across, it's message capacity. Each layer has a unique breaking point that benchmarks must target.
Intent-based architectures inherently resist congestion. Protocols like UniswapX and CowSwap shift computation off-chain, making their throughput independent of chain load. This decouples user experience from base layer failures, a metric traditional benchmarks miss entirely.
Evidence: The 2022 Solana outages demonstrated that a theoretical 65k TPS capacity collapsed under a flood of bot-driven spam. A resilient benchmark would measure the sustained TPS after triggering the state growth or compute limit.
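One way to operationalize "sustained TPS after triggering the limit" is a ramp harness: increase offered load until observed throughput decouples from it, then record what survives. The fake_chain stub below is a stand-in for a real devnet under load.

```python
def find_breakdown(measure_tps, start_tps: float = 100,
                   step: float = 1.5, tolerance: float = 0.9) -> dict:
    """Ramp offered load geometrically until observed throughput falls below
    `tolerance` x offered, then report the post-saturation sustained rate.
    `measure_tps` is a stand-in for a harness that floods a devnet and
    counts actually-included transactions."""
    offered = start_tps
    while True:
        observed = measure_tps(offered)
        if observed < tolerance * offered:
            return {"saturation_offered_tps": round(offered),
                    "sustained_tps_after_limit": observed}
        offered *= step

# Hypothetical chain: honors load up to 4,000 TPS, then collapses to 800
# once its compute limit is hit (the Solana-outage failure shape).
fake_chain = lambda offered: offered if offered <= 4_000 else 800
print(find_breakdown(fake_chain))
```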
Protocol Spotlight: Who's Built for Battle?
Real scalability is defined under maximum stress, not in a lab. These protocols are pioneering the metrics and mechanisms for the next generation of benchmarks.
Solana: The Throughput Baseline
The problem: Legacy benchmarks like TPS are meaningless under spam. The solution: Solana's real-world adversarial load from memecoins and arbitrage bots provides the industry's most brutal, public stress test.
- Real Metric: Sustained 3k-5k TPS with ~400ms finality under live network spam.
- Key Benefit: The Firedancer client aims to push this toward 1M TPS, testing how far a single chain can scale.
EigenLayer & Restaking: The Economic Security Benchmark
The problem: How do you measure and scale cryptoeconomic security? The solution: EigenLayer creates a marketplace for pooled security (restaking), allowing AVSs to lease Ethereum's $50B+ staked ETH.
- Real Metric: Restaked TVL becomes the key KPI for shared security capacity.
- Key Benefit: Enables scalable security for hundreds of rollups and oracles (e.g., EigenDA, Hyperlane) without issuing new inflationary tokens.
Celestia & Modular Data Availability
The problem: Monolithic chains collapse under their own data bloat. The solution: Celestia decouples execution from data availability, allowing rollups to scale independently while inheriting security.
- Real Metric: Data bandwidth per second and cost per byte are the new scalability constraints.
- Key Benefit: Enables ~$0.001 settlement costs for high-throughput rollups, making adversarial spam economically non-viable.
Arbitrum Nitro & Fraud Proofs Under Load
The problem: Optimistic rollups have a vulnerability window; can they defend it at scale? The solution: Arbitrum's Nitro architecture is battle-tested with $15B+ TVL, featuring multi-round fraud proofs and a WASM-based prover.
- Real Metric: Challenge resolution time and cost of false assertion under maximum congestion.
- Key Benefit: 7-day window secured by massive economic stake, making attacks financially irrational even at scale.
Sui & Move: Parallel Execution Frontier
The problem: Sequential execution is the primary bottleneck. The solution: Sui's object-centric model and Move language enable parallel execution of independent transactions, a fundamental shift.
- Real Metric: CPU core utilization and conflict-free transaction rate define theoretical max throughput.
- Key Benefit: Achieves >100k TPS in controlled, adversarial benchmarks where most transactions are independent transfers.
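A toy illustration of the object-centric scheduling idea (the concept behind Sui's model, not its actual engine): transactions touching disjoint objects share a batch and can run in parallel, while conflicting ones serialize into later batches.

```python
def schedule_parallel(txs: list[dict]) -> list[list[dict]]:
    """Greedy sketch of object-centric scheduling: a tx joins the earliest
    batch that has not yet claimed any of its objects; conflicting txs
    fall through to a later (serialized) batch."""
    batches: list[tuple[list[dict], set]] = []
    for tx in txs:
        for batch, claimed in batches:
            if not (set(tx["objects"]) & claimed):
                batch.append(tx)
                claimed.update(tx["objects"])
                break
        else:
            batches.append(([tx], set(tx["objects"])))
    return [batch for batch, _ in batches]

# Independent transfers parallelize; two txs fighting over the same hot
# object serialize into separate batches.
txs = [{"id": 1, "objects": ["coin_a"]},
       {"id": 2, "objects": ["coin_b"]},
       {"id": 3, "objects": ["coin_a"]}]  # conflicts with tx 1
print([[t["id"] for t in b] for b in schedule_parallel(txs)])
# -> [[1, 2], [3]]
```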
The Benchmark Itself: Redefining the Stack
The problem: Old metrics (TPS, Finality) are insufficient. The solution: The new benchmark stack measures adversarial throughput, time-to-censorship-resistance, and cost-of-attack.
- Real Metric: Local testnets like Foundry's Anvil and Kurtosis packages simulate real-world spam and MEV attacks.
- Key Benefit: Forces protocols to optimize for the worst-case scenario, not the happy path, separating production-ready chains from testnet heroes.
Counter-Argument: The Case for Optimism
Adversarial load testing, while a useful stress test, is not the sole determinant of a network's practical scalability or economic viability.
Adversarial load is artificial. The synthetic spam transactions used in benchmarks like Superscalar Papyrus or Solana's 100k TPS tests rarely reflect real economic activity. Real-world demand is bursty and heterogeneous, not a sustained, uniform flood of identical operations.
Economic security is the real throttle. A network's sustainable throughput is gated by validator/staker economics, not raw hardware. A chain that pays $1M daily in rewards cannot justify $10M in hardware costs, making extreme adversarial TPS figures economically irrelevant.
Optimistic Rollups already scale. Arbitrum and Optimism process orders of magnitude more complex, valuable transactions than their L1s under normal load. Their scalability is proven by real user adoption and TVL, not lab-based spam tests.
Evidence: The Ethereum L1 handles ~15 TPS but secures over $50B in L2 assets. This demonstrates that economic security and decentralization, not peak TPS under attack, are the foundational metrics for a scalable ecosystem.
Takeaways: The New Benchmarking Framework
Adversarial load testing moves beyond synthetic benchmarks to measure how systems truly fail.
The Problem: Sybil-Resistance is a Latency Tax
Current benchmarks ignore the overhead of real-world consensus and fraud proofs. Measuring TPS in a vacuum is useless if the system chokes under a coordinated spam attack.
- Real Cost: Proof-of-Stake finality adds ~2-12s latency vs. theoretical speeds.
- Adversarial Metric: Must measure time-to-finality under >30% malicious validator load.
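A toy model of "time-to-finality under >30% malicious validator load," assuming a BFT-style 2/3 quorum and finality latency that inflates as the honest voting margin shrinks. Both the quorum and the inflation shape are simplifying assumptions.

```python
def finality_under_attack(base_finality_s: float, malicious_stake: float,
                          quorum: float = 2 / 3):
    """Toy model: finality needs `quorum` of stake attesting. Malicious
    validators withhold votes, so honest participation is (1 - f). If that
    still clears quorum, finality slows roughly in proportion to the lost
    margin; below quorum, the chain cannot finalize at all (returns None)."""
    honest = 1 - malicious_stake
    if honest < quorum:
        return None  # liveness failure until stake is slashed or leaked away
    margin_full = 1 - quorum      # slack when everyone is honest
    margin_now = honest - quorum  # remaining slack under attack
    return base_finality_s * margin_full / margin_now

for f in (0.0, 0.1, 0.3, 0.34):
    print(f"{f:.0%} malicious -> {finality_under_attack(12.0, f)}")
# 0% -> 12s, 10% -> ~17s, 30% -> ~120s, 34% -> None (no finality)
```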
The Solution: Chaos Engineering for L2s
Inject failures like sequencer downtime, data availability (DA) layer outages, and multi-block reorgs. Systems like Arbitrum, Optimism, and zkSync must be stress-tested beyond happy paths.
- Key Test: Worst-case time-to-escape hatch during a 7-day DA challenge.
- Benchmark: Cost of forced inclusion during congestion vs. normal operation (100x+ fee spikes).
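A minimal sketch of a chaos schedule for such tests. Fault names, durations, and metric labels are illustrative; a real setup would drive Kurtosis or similar orchestration tooling rather than print statements.

```python
import random

# Hypothetical chaos schedule for an L2 test stack.
FAULTS = [
    {"name": "sequencer_down", "duration_s": 600, "weight": 3},
    {"name": "da_layer_outage", "duration_s": 1800, "weight": 2},
    {"name": "l1_reorg", "depth_blocks": 12, "weight": 1},
]

# Metrics to record while each fault is active.
METRICS_UNDER_FAULT = [
    "time_to_escape_hatch_s",      # worst-case exit during a DA challenge
    "forced_inclusion_cost_gwei",  # vs. normal operation (watch for 100x spikes)
    "reorg_recovery_s",
]

def next_fault(rng: random.Random) -> dict:
    """Pick the next injection, weighted toward the most common failure mode."""
    return rng.choices(FAULTS, weights=[f["weight"] for f in FAULTS], k=1)[0]

rng = random.Random(42)  # seeded so a chaos run is reproducible
for _ in range(3):
    fault = next_fault(rng)
    print(f"inject {fault['name']}, then record {METRICS_UNDER_FAULT}")
```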
The New KPI: Economic Throughput
Measure value secured per second, not just transactions. A system processing $10B in DeFi settlements at 200 TPS is more robust than one processing memecoins at 10,000 TPS.
- Adversarial Load: Simulate flash loan attacks and MEV extraction waves on Uniswap and Aave.
- Real Metric: TVL retained during a 30% market crash event with maximal extractable value (MEV) bots active.
The Problem: Cross-Chain Benchmarks Are Fiction
Benchmarking LayerZero, Axelar, or Wormhole in isolation misses the systemic risk of chain reorganization (reorg) attacks. A fast bridge is a fragile bridge if it doesn't account for source chain finality.
- Critical Gap: No standard for measuring bridge latency under a deep reorg (50+ blocks).
- Real Cost: $2B+ in bridge hacks from ignoring adversarial network states.
The Solution: Adversarial Interop Suites
Test cross-domain messaging under simulated chain halts and governance attacks. How does Celestia's data availability affect Ethereum L2 safety? How does Polygon's checkpoint system fail?
- Key Test: Time-to-fault-proof activation across a multi-L2 stack.
- Benchmark: Message passing success rate during a correlated validator failure across >3 chains.
The New Standard: Nakamoto Coefficient Under Load
The classic Nakamoto Coefficient is static. The new benchmark measures how it degrades under economic attack (e.g., stake slashing, validator churn). A chain with a coefficient of 50 at rest might drop to 5 under a $500M bribe attack.
- Adversarial Metric: Cost-to-corrupt the consensus (in USD) during market volatility.
- Real Data: Ethereum's coefficient shifts from ~4 (client diversity) to ~2 under extreme MEV conditions.
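A sketch of the static Nakamoto coefficient next to a bribe-degraded variant. The stake distribution and the assumption that the cheapest validators defect first are both illustrative, not empirical.

```python
def nakamoto_coefficient(stakes: list[float], threshold: float = 1 / 3) -> int:
    """Smallest number of validators whose combined stake exceeds the
    consensus-halting threshold (1/3 for BFT-style finality)."""
    total = sum(stakes)
    running = 0.0
    for i, s in enumerate(sorted(stakes, reverse=True), start=1):
        running += s
        if running > total * threshold:
            return i
    return len(stakes)

def coefficient_under_bribe(stakes: list[float], bribe_musd: float) -> int:
    """Toy 'under economic attack' variant: an attacker first buys off the
    cheapest validators, then we count how many MORE of the largest honest
    validators must collude to cross 1/3. Stakes and bribe are in $M."""
    budget, bought, honest = bribe_musd, 0.0, []
    for s in sorted(stakes):  # cheapest to corrupt first
        if budget >= s:
            budget -= s
            bought += s
        else:
            honest.append(s)
    need = sum(stakes) / 3 - bought
    count = 0
    for s in sorted(honest, reverse=True):
        if need <= 0:
            break
        need -= s
        count += 1
    return count

stakes = [10.0] * 50                         # hypothetical: $10M per validator
print(nakamoto_coefficient(stakes))          # 17 at rest
print(coefficient_under_bribe(stakes, 100))  # 7 once a $100M bribe lands
```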