How to Organize Execution for High Throughput

Introduction to High Throughput Execution

Traditional monolithic blockchains, where a single node executes every transaction sequentially, inherently limit throughput. High throughput is achieved by parallelizing this work: the key is to identify which transactions are independent and can be processed simultaneously without causing state conflicts. This requires a shift from sequential to parallel execution, where multiple CPU cores or even separate machines work on different parts of the transaction load concurrently. Blockchains like Solana and Sui have pioneered this approach, demonstrating that parallel execution is a practical requirement for scaling beyond a few hundred TPS.
High throughput execution is the capability of a blockchain to process and confirm a large number of transactions per second (TPS). This guide explains the core architectural patterns for organizing execution to achieve this goal.
Organizing execution for high throughput involves two main components: a scheduling mechanism and a state access model. The scheduler's job is to quickly determine dependencies between transactions. It does this by analyzing which parts of the blockchain's state—such as specific accounts, smart contract storage slots, or objects—each transaction plans to read and write. Transactions that touch disjoint sets of state can be scheduled in parallel. This is implemented either with software transactional memory (STM) concepts, which detect conflicts during execution, or with directed acyclic graph (DAG)-based schedulers, which map dependencies before execution begins.
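To make the scheduling idea concrete, here is a minimal sketch of a read/write-set scheduler in Python. The tuple-of-sets representation of a transaction and the greedy batching strategy are illustrative assumptions, not any production chain's actual algorithm.

```python
# Sketch of a read/write-set scheduler (illustrative, not any chain's real API).
# Each transaction is modeled as (read_set, write_set) of state keys; two
# transactions conflict if one writes a key the other reads or writes.

def conflicts(tx_a, tx_b):
    """True if tx_a and tx_b cannot run in the same parallel batch."""
    a_reads, a_writes = tx_a
    b_reads, b_writes = tx_b
    return bool(a_writes & (b_reads | b_writes)) or bool(b_writes & a_reads)

def schedule(txs):
    """Greedily pack transactions into batches of mutually non-conflicting txs."""
    batches = []
    for tx in txs:
        for batch in batches:
            if not any(conflicts(tx, other) for other in batch):
                batch.append(tx)
                break
        else:
            batches.append([tx])
    return batches

# Three transfers: A->B and C->D touch disjoint accounts; B->C overlaps both.
txs = [
    ({"A"}, {"A", "B"}),   # reads/writes accounts A, B
    ({"C"}, {"C", "D"}),   # disjoint: can run alongside the first
    ({"B"}, {"B", "C"}),   # overlaps both: must wait in its own batch
]
print([len(b) for b in schedule(txs)])  # → [2, 1]
```

The disjoint transfers land in one parallel batch; the conflicting one is serialized behind them, which is exactly the dependency mapping described above.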
A critical design choice is the state model, which dictates how dependencies are identified. The account-based model (used by Solana) treats each account as an independent unit; transactions modifying different accounts can run in parallel. The object-based model (used by Sui and Aptos) allows for finer granularity, where each distinct smart contract object is a parallelizable unit. In contrast, the Ethereum Virtual Machine (EVM) uses a shared global state, which is harder to parallelize without modifications such as those explored by parallel EVM implementations like Monad and Sei.
To implement this, developers must structure their applications with parallelism in mind. For example, in an AMM, swaps involving different liquidity pool pairs are independent. A well-designed scheduler would recognize that a transaction swapping ETH/USDC and another swapping SOL/USDT do not conflict and dispatch them to different execution threads. State rent or storage fees can also incentivize developers to create smaller, more granular data structures (objects) rather than monolithic contracts, naturally increasing the potential for parallel execution.
The final step is deterministic execution and commitment. All parallel execution paths must produce a result that is identical to a hypothetical sequential execution. After parallel processing, the results are validated, and a deterministic, totally-ordered list of state changes is committed to the blockchain. This ensures network consensus is maintained. Frameworks like the Block-STM parallel execution engine, used by Aptos, demonstrate this by using a multi-version data structure and re-executing transactions only when conflicts are detected, optimizing for the common case where most transactions are independent.
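The optimistic execute-then-validate flow can be sketched in a few lines of Python. This is a deliberately simplified model in the spirit of Block-STM (one round of speculation against a pre-block snapshot, then ordered validation with re-execution on conflict); the function names and state representation are illustrative assumptions.

```python
# Minimal sketch of optimistic parallel execution in the spirit of Block-STM.
# Greatly simplified: a single speculation round, then in-order validation
# that re-executes only transactions whose reads turned out to be stale.

def run_block(state, txs):
    # Phase 1: speculate — run every tx against the pre-block snapshot.
    snapshot = dict(state)
    speculative = [tx(snapshot) for tx in txs]   # each tx returns (reads, writes)

    reexecutions = 0
    # Phase 2: validate in block order; re-execute on stale reads.
    for i, tx in enumerate(txs):
        reads, writes = speculative[i]
        if any(state.get(k) != snapshot.get(k) for k in reads):
            reads, writes = tx(state)            # conflict: re-run on latest state
            reexecutions += 1
        state.update(writes)                     # commit deterministically, in order
    return state, reexecutions

def transfer(src, dst, amount):
    def tx(view):
        reads = {src, dst}
        writes = {src: view[src] - amount, dst: view[dst] + amount}
        return reads, writes
    return tx

state = {"alice": 100, "bob": 100, "carol": 100, "dave": 100}
txs = [transfer("alice", "bob", 10),   # independent: speculation succeeds
       transfer("carol", "dave", 5),   # independent: speculation succeeds
       transfer("bob", "carol", 20)]   # reads keys written above: re-executed
final, reruns = run_block(state, txs)
print(final["bob"], reruns)  # → 90 1
```

Only the third transaction pays the re-execution cost, matching the common case the text describes: most transactions are independent and commit on the first try.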
Understanding the architectural patterns that enable high transaction throughput is essential for building scalable blockchain applications.
High throughput in blockchain execution is achieved by parallelizing transaction processing. Unlike sequential execution, which processes transactions one by one, parallel execution identifies independent transactions that can be processed simultaneously without causing state conflicts. This requires a deterministic scheduler that can analyze a transaction's read and write sets—the specific storage keys it accesses and modifies—before or during execution. Modern execution environments like Aptos' Block-STM and Sui's object-centric model implement this principle to scale performance nearly linearly with available CPU cores.
The core challenge is managing state contention. Two transactions that write to the same storage location (e.g., the same ERC-20 token balance) cannot be executed in parallel. Effective organization involves designing your application's state to minimize these hot spots. Strategies include sharding state by user (e.g., user-specific vault contracts), using unrelated storage slots, and batching operations. For example, an NFT minting contract that increments a single global counter will bottleneck throughput, whereas one that uses a per-user nonce can process many mints in parallel.
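The global-counter-versus-per-user-nonce contrast can be quantified with a small model. Treating each mint as the set of storage keys it writes (an illustrative simplification), we can count how many sequential rounds the two designs force:

```python
# Illustrative comparison of a "hot" global counter versus per-user state.
# Each mint is modeled by the set of storage keys it writes; mints with
# disjoint write sets share a parallel round, overlapping ones serialize.

def parallel_rounds(write_sets):
    """Number of sequential rounds needed when conflicting writes serialize."""
    rounds = []
    for ws in write_sets:
        for r in rounds:
            if all(ws.isdisjoint(other) for other in r):
                r.append(ws)
                break
        else:
            rounds.append([ws])
    return len(rounds)

users = [f"user{i}" for i in range(8)]

# Design A: every mint bumps one global counter -> fully serialized.
global_counter = [{"total_supply", u} for u in users]
# Design B: each mint touches only the minter's own nonce -> fully parallel.
per_user = [{f"nonce/{u}", u} for u in users]

print(parallel_rounds(global_counter), parallel_rounds(per_user))  # → 8 1
```

Eight mints collapse into one parallel round once the shared hot spot is removed, which is the whole point of sharding state by user.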
To leverage parallel execution, developers must structure their smart contracts accordingly. In Move-based chains like Aptos, you define independent state as resources—structs with the key ability—stored under separate accounts or objects. In Solana, you explicitly declare every account a transaction will touch (e.g., via Anchor's Accounts struct), allowing the runtime to schedule non-overlapping transactions concurrently. The key is to maximize the number of transactions whose read-write sets do not intersect. Load-generation tools such as the Aptos transaction emitter can help identify contention points in your dApp's workflow.
Beyond smart contract design, off-chain sequencing and rollup architectures are critical for organizing execution at the protocol level. A sequencer (like in Arbitrum or Optimism) orders transactions off-chain before submitting a compressed batch to L1, drastically increasing throughput. Rollups execute transactions in a dedicated, high-performance environment and only use the base layer for consensus and data availability. Choosing the right execution layer—be it an EVM L2, a Solana program, or an Aptos module—fundamentally dictates how you organize your application for scale.
Implementing high-throughput execution requires a shift in mindset from serial to concurrent programming. Start by auditing your contract for shared global state, refactor it into isolated objects or accounts, and utilize your chosen chain's parallel execution APIs. Testing under load with realistic transaction patterns is non-negotiable to validate that your organization strategy effectively minimizes contention and unlocks the promised performance gains.
Achieving high transaction throughput in blockchain systems requires deliberate architectural choices. This guide explores the core concepts of parallel execution, state management, and consensus separation that enable scalable performance.
The primary bottleneck in traditional blockchains like Ethereum is sequential execution. Transactions are processed one after another, even when they don't conflict, wasting computational resources. High-throughput architectures adopt parallel execution engines that process non-conflicting transactions simultaneously. This is analogous to multi-core processors versus single-core. Key implementations include Solana's Sealevel, Aptos' Block-STM, and Sui's object-centric model. These engines require a method to detect which transactions can be run in parallel, typically by analyzing their access patterns to shared state.
Effective parallel execution depends on intelligent state management. Transactions that read or write to the same memory location (e.g., the same Account or Storage slot) create a conflict and must be sequenced. Systems use a dependency graph or conflict detection to identify these relationships. For example, Aptos' Block-STM uses software transactional memory to optimistically execute all transactions in parallel, then re-execute only those with conflicts. The granularity of state access—whether at the account, contract, or key-value level—directly impacts the degree of achievable parallelism.
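A dependency graph of this kind can be reduced to execution "waves": each wave contains transactions whose dependencies all sit in earlier waves, so an entire wave can be dispatched to worker threads at once. The following sketch is an illustrative model (the wave computation and conflict rules are assumptions, not a specific chain's scheduler):

```python
# Sketch: derive execution "waves" from read/write-set dependencies.
# txs is a list of (read_set, write_set) in block order; tx i depends on an
# earlier tx j if j writes something i reads or writes, or i writes
# something j reads (write-after-read).

def execution_waves(txs):
    n = len(txs)
    wave = [0] * n
    for i in range(n):
        ri, wi = txs[i]
        for j in range(i):
            rj, wj = txs[j]
            if wj & (ri | wi) or wi & rj:
                wave[i] = max(wave[i], wave[j] + 1)
    return wave

txs = [
    ({"a"}, {"a"}),       # wave 0
    ({"b"}, {"b"}),       # wave 0: disjoint from the first
    ({"a", "b"}, {"c"}),  # wave 1: reads the outputs of both above
    ({"d"}, {"d"}),       # wave 0: independent of everything
]
print(execution_waves(txs))  # → [0, 0, 1, 0]
```

The number of waves is the critical path of the block: finer-grained state access (key-value rather than whole-contract) shrinks the conflict sets and flattens the wave structure, which is exactly the granularity point made above.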
To maximize throughput, the consensus layer can be decoupled from execution. In a modular architecture, consensus nodes (validators) agree only on the ordering of transactions, not their outcome. Dedicated execution nodes then process the ordered batch in parallel. This separation, seen in rollups built on data-availability layers such as Celestia and Avail, allows execution to scale horizontally. The consensus layer provides a canonical transaction log, while execution becomes a computation layer that can be replicated and sharded.
Sharding is the logical extension of parallel execution, dividing the network state into multiple partitions or shards. Each shard processes its transactions independently and in parallel with others. Ethereum's roadmap with Danksharding and Near's Nightshade are prominent examples. The challenge lies in secure cross-shard communication and maintaining composability. Effective sharding requires a robust cross-shard messaging protocol and a beacon chain or relay mechanism to coordinate finality across all shards.
Implementing these concepts requires careful trade-offs. Parallel execution increases hardware requirements (CPU cores, RAM). Decoupling consensus adds latency for cross-domain communication. Sharding can fragment liquidity and complicate developer experience. The optimal architecture depends on the application: a high-frequency DEX prioritizes low-latency parallel execution within a single shard, while a global payment network might prioritize sharding for maximum total throughput. Tools like parallel EVMs (Monad, Sei) aim to bring these benefits to existing ecosystems by modifying client software.
Execution Model Comparison
Comparison of core execution models for high-throughput blockchain applications.
| Feature | Monolithic | Modular Execution | Parallel Execution |
|---|---|---|---|
| Throughput (TPS) | 1,000-5,000 | 10,000-50,000+ | 50,000-100,000+ |
| Latency (Finality) | 5-15 sec | 2-5 sec | < 2 sec |
| State Access | Sequential | Sharded/Partitioned | Concurrent |
| Developer Complexity | Low | Medium | High |
| Cross-Shard/Core TX | N/A | Required | Required |
| Gas Fee Volatility | High | Low | Very Low |
| Example Protocols | Ethereum L1 | Ethereum L2s (Arbitrum, Optimism), Celestia | Solana, Aptos, Sui, Monad |
Parallel Execution Strategies
Techniques for organizing transaction processing to maximize throughput and minimize latency in blockchain systems.
Optimistic Concurrency Control
A database-inspired technique used in blockchains like Monad.
- Similar to Block-STM but often implemented at the EVM bytecode level with enhanced state access tracking.
- It employs pipelining across multiple stages: fetching state, executing, validating, and committing.
- Advanced implementations use a deterministic just-in-time (JIT) compiler to optimize execution paths and reordering.
Optimizing State Management
High-throughput blockchain applications require efficient state organization to avoid bottlenecks and reduce gas costs. This guide covers strategies for structuring your smart contract's execution and data flow.
High-throughput applications, such as decentralized exchanges (DEXs) or gaming protocols, process thousands of transactions. Inefficient state management is the primary bottleneck, leading to high gas fees and slow execution. The core challenge is minimizing on-chain storage operations—SSTORE and SLOAD—which are the most expensive EVM opcodes. Effective organization focuses on data locality, batching updates, and gas-efficient data structures to reduce the frequency and cost of these operations.
Adopt a state machine pattern to organize execution flow. Define clear states (e.g., PENDING, EXECUTING, FINALIZED) and restrict state-modifying functions to specific transitions. This prevents invalid state changes and reduces redundant checks. For example, an order-matching engine should only update a trade's status from OPEN to FILLED within a dedicated executeTrade function, ensuring atomic and predictable state updates. Use require statements or custom modifiers to enforce these guards.
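The guard pattern above can be mirrored outside Solidity. Here is a minimal Python sketch (in Solidity this would be an enum plus a modifier; the statuses and class names are illustrative):

```python
# Sketch of the state-machine guard pattern: every mutation must follow a
# declared transition, the equivalent of a `require(...)` check in a modifier.

class InvalidTransition(Exception):
    pass

# Allowed transitions: current status -> set of permitted next statuses.
TRANSITIONS = {
    "OPEN":      {"FILLED", "CANCELLED"},
    "FILLED":    {"SETTLED"},
    "CANCELLED": set(),
    "SETTLED":   set(),
}

class Order:
    def __init__(self):
        self.status = "OPEN"

    def _transition(self, new_status):
        # Reject any status change not in the transition table.
        if new_status not in TRANSITIONS[self.status]:
            raise InvalidTransition(f"{self.status} -> {new_status}")
        self.status = new_status

    def execute_trade(self):
        self._transition("FILLED")   # only valid from OPEN

order = Order()
order.execute_trade()
print(order.status)  # → FILLED
```

Calling execute_trade a second time raises InvalidTransition, so a double-fill is impossible by construction rather than by scattered ad-hoc checks.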
Batch state updates to amortize fixed transaction costs. Instead of writing individual user balances in a loop, aggregate changes off-chain and submit a single transaction with a merkle root or a diff. Layer 2 solutions like Optimistic Rollups and zk-Rollups exemplify this by executing batches off-chain and posting compressed proofs on-chain. For on-chain batching, consider using mappings with incremental counters or bitmaps to track changes within a single block or transaction.
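The Merkle-root approach can be shown in a few lines. This is a generic sketch (SHA-256 and the leaf encoding are illustrative choices; production systems pick their own hash and domain separation):

```python
# Sketch: commit a batch of balance updates as a single Merkle root instead
# of N individual storage writes. Hash and leaf format are illustrative.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree; an odd node is paired with itself."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# A batch of (user, new_balance) diffs aggregated off-chain.
updates = [f"user{i}:{1000 + i}".encode() for i in range(100)]
root = merkle_root(updates)
print(len(root))  # → 32: one 32-byte commitment replaces 100 storage writes
```

On-chain, a contract stores only the 32-byte root; individual balances are later proven against it with Merkle inclusion proofs, which is the same amortization trick rollups apply at larger scale.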
Optimize data structures for frequent access patterns. Pack multiple small fields into a single 256-bit storage slot (struct packing), use mapping for O(1) lookups, and account for EIP-2929's warm/cold access costs on storage slots. For iterable data, combine a mapping with an array (e.g., mapping(address => UserData) users; address[] userList). Store frequently accessed, unchanging data in immutable or constant variables, which live in contract bytecode rather than storage. Use events for data that doesn't need on-chain retrieval.
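The packing arithmetic behind struct packing is worth seeing explicitly. The sketch below models one 256-bit storage word in Python integers; the field sizes and layout are illustrative assumptions:

```python
# Sketch of struct packing: four small fields share one 256-bit storage slot,
# so updating all of them costs a single SSTORE. Layout is illustrative:
# low bits first — balance (128) | timestamp (64) | flags (32) | tier (8).

def pack(balance_u128, last_update_u64, flags_u32, tier_u8):
    assert balance_u128 < 2**128 and last_update_u64 < 2**64
    assert flags_u32 < 2**32 and tier_u8 < 2**8
    return (balance_u128
            | last_update_u64 << 128
            | flags_u32 << 192
            | tier_u8 << 224)

def unpack(slot):
    return (slot & (2**128 - 1),
            (slot >> 128) & (2**64 - 1),
            (slot >> 192) & (2**32 - 1),
            (slot >> 224) & (2**8 - 1))

slot = pack(10**18, 1_700_000_000, 0b101, 3)
print(unpack(slot))  # → (1000000000000000000, 1700000000, 5, 3)
assert slot < 2**256  # fits in one EVM storage word
```

In Solidity the compiler performs this packing automatically when adjacent struct members fit in one slot (e.g., uint128, uint64, uint32, uint8 declared together), which is why member ordering matters for gas.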
Implement access control and validation at the state layer. Use function modifiers like onlyOwner or whenNotPaused to prevent unauthorized state mutations. For complex validation, compute proofs off-chain (e.g., with Merkle trees) and verify them on-chain with a single function call. Libraries like OpenZeppelin's ReentrancyGuard and Pausable provide standardized patterns for secure state transitions. Always audit state changes for reentrancy and front-running vulnerabilities.
Profile and test your gas usage with tools like Hardhat's gasReporter, eth-gas-reporter, or Foundry's forge snapshot. Compare different state organization strategies (e.g., struct packing vs. separate variables) using real transaction simulations. Monitor key metrics: gas per transaction, storage slot usage, and calldata size. Reference established patterns from high-throughput protocols like Uniswap V3 (concentrated liquidity ticks) or Aave (interest rate indices) for proven state management architectures.
Implementation Tools and Frameworks
Tools and architectural patterns to structure your application for maximum transaction throughput and efficient state management.
Platform-Specific Implementations
Optimistic & ZK-Rollup Strategies
High-throughput execution on Ethereum is primarily achieved via Layer 2 rollups. Optimistic Rollups (Arbitrum, Optimism) batch transactions off-chain, posting only state roots and compressed calldata to L1. They rely on a 7-day fraud proof window for security, prioritizing compatibility with the EVM.
ZK-Rollups (zkSync Era, Starknet, Polygon zkEVM) generate cryptographic validity proofs (ZK-SNARKs/STARKs) for each batch, enabling near-instant finality on L1. This requires specialized virtual machines (e.g., zkEVM) and proving hardware, trading some developer convenience for superior throughput and security.
Key Implementation Focus:
- Use calldata compression (e.g., Brotli) to minimize L1 data costs.
- Architect for sequencer/prover decentralization.
- Design contracts with batchability in mind to maximize L2 efficiency.
Common Implementation Mistakes
Optimizing for high throughput requires more than just a fast execution client. Common architectural and configuration mistakes can create bottlenecks that severely limit performance.
A common symptom is slow block processing despite apparently idle hardware. This often indicates a synchronization bottleneck where components wait on each other: the execution client (e.g., Geth, Erigon) may be processing blocks faster than the consensus client (e.g., Lighthouse, Prysm) can validate them, or vice versa.
Common causes:
- Mismatched hardware specs: An overpowered execution client paired with an underpowered consensus client (or vice-versa).
- I/O bottlenecks: Using a slow disk (HDD) for the chain database, causing the execution client to stall on state reads/writes.
- Resource contention: Running multiple resource-intensive services (RPC, validator, MEV-boost) on the same machine without adequate CPU/memory isolation.
How to diagnose:
- Monitor individual process CPU usage (`htop`, `docker stats`).
- Check logs for repeated warnings about "behind chain head" or "syncing".
- Use client-specific metrics (Prometheus/Grafana) to track processing queue depths.
Frequently Asked Questions
Common questions and solutions for developers optimizing blockchain transaction execution for speed and scale.
What is the main bottleneck limiting transactions per second?

The primary bottleneck is the block gas limit, not block time. Each block can only process a finite amount of computational work, measured in gas. For example, the Ethereum mainnet gas limit is ~30 million gas per block. A standard ERC-20 transfer uses ~65,000 gas, theoretically limiting a block to ~460 such transfers. High-throughput execution focuses on gas efficiency—doing more work per unit of gas—and designing systems that minimize on-chain operations through techniques like state channels, validity proofs, or optimized contract logic that reduces storage writes and expensive opcode usage.
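The back-of-envelope arithmetic in that answer is easy to reproduce (the gas figures are the approximate values cited above, and the 12-second slot time is an assumption about current Ethereum mainnet):

```python
# Rough throughput ceiling for ERC-20 transfers on Ethereum mainnet.
block_gas_limit = 30_000_000   # approx. mainnet gas limit per block
erc20_transfer_gas = 65_000    # typical ERC-20 transfer cost
block_time_s = 12              # post-merge slot time (assumption)

transfers_per_block = block_gas_limit // erc20_transfer_gas
tps = transfers_per_block / block_time_s
print(transfers_per_block, round(tps, 1))  # → 461 38.4
```

So even a block packed exclusively with simple token transfers tops out around 38 TPS, which is why throughput work targets gas per operation rather than block frequency.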
Further Resources
These resources focus on execution-layer design patterns, tooling, and research used by high-throughput blockchains. They help developers understand how to organize transaction execution, reduce contention, and scale compute without breaking determinism.
Parallel Transaction Execution Models
Parallel execution is the primary lever for increasing throughput on stateful blockchains. Instead of executing transactions sequentially, systems analyze read and write dependencies to run non-conflicting transactions concurrently.
Key approaches used in production chains:
- Optimistic parallelism: Execute transactions in parallel, then re-run conflicts. Used by Aptos' Block-STM and Monad.
- Deterministic parallelism: Pre-declare access lists to guarantee conflict-free scheduling. Used by Solana's Sealevel runtime (and its Firedancer client).
- Static vs dynamic dependency graphs: Static graphs rely on declared storage access. Dynamic graphs detect conflicts at runtime.
Developers building high-throughput execution layers must understand:
- State access granularity (account-level vs storage-slot-level)
- Cost of conflict resolution and rollbacks
- How block builders batch transactions to maximize parallelism
This model determines CPU utilization, validator hardware requirements, and worst-case execution latency.
Batching, Mempools, and Execution Pipelines
High throughput depends as much on transaction ingestion as on execution. Efficient systems treat block production as a multi-stage pipeline.
Critical pipeline stages:
- Mempool sharding to avoid global contention
- Batch formation optimized for shared state locality
- Priority scheduling based on fees or application-specific rules
- Pre-execution simulation to detect conflicts early
Well-designed pipelines:
- Increase cache locality during execution
- Reduce failed or reverted transactions
- Maximize parallelizable work per block
Advanced builders implement multiple mempools or execution queues for different transaction types. This approach is especially relevant for high-frequency protocols, sequencers, and custom block builders targeting sustained throughput rather than peak benchmarks.
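The pipeline stages described in this section can be sketched end to end. The shard count, batching-by-contract heuristic, and nonce-based revert check below are all illustrative assumptions, not a real builder's implementation:

```python
# Sketch of a staged block-production pipeline:
# ingest (sharded mempool) -> batch by state locality -> simulate -> seal.
from collections import defaultdict

def shard_mempool(txs, shards=4):
    """Stage 1: shard ingestion by sender to avoid one global queue."""
    pools = defaultdict(list)
    for tx in txs:
        pools[hash(tx["sender"]) % shards].append(tx)
    return pools

def form_batches(pools):
    """Stage 2: group by touched contract so batches share state locality."""
    batches = defaultdict(list)
    for pool in pools.values():
        for tx in pool:
            batches[tx["contract"]].append(tx)
    return list(batches.values())

def simulate(batch, state):
    """Stage 3: pre-execute to drop txs that would revert (stale nonce here)."""
    ok = []
    for tx in sorted(batch, key=lambda t: t["nonce"]):
        if tx["nonce"] == state.get(tx["sender"], 0):
            state[tx["sender"]] = tx["nonce"] + 1
            ok.append(tx)
    return ok

txs = [{"sender": "alice", "contract": "amm", "nonce": 0},
       {"sender": "alice", "contract": "amm", "nonce": 0},   # duplicate: dropped
       {"sender": "bob",   "contract": "nft", "nonce": 0}]
state = {}
block = [tx for batch in form_batches(shard_mempool(txs)) for tx in simulate(batch, state)]
print(len(block))  # → 2
```

The duplicate transaction is filtered in the simulation stage before it can waste execution capacity, illustrating why pre-execution reduces reverted transactions per block.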
Conclusion and Next Steps
This guide has outlined the core architectural patterns for achieving high-throughput execution in blockchain applications. The next step is to implement these strategies systematically.
Organizing for high throughput requires a systematic approach that integrates the concepts discussed:

- Parallel execution via optimistic concurrency or sharding
- State management through off-chain computation and caching
- Asynchronous processing with message queues and event-driven architectures

Start by profiling your application's bottlenecks, for example by profiling RPC calls or analyzing transaction traces on a local testnet. Identify whether your constraint is compute, I/O, or consensus latency.
For implementation, begin with the data layer. Structure your smart contracts to minimize on-chain storage and computation. Use patterns like storing only cryptographic commitments (e.g., Merkle roots) on-chain, with proofs submitted for verification. Offload complex logic to dedicated off-chain executors or Layer 2 rollups. For Ethereum-based apps, consider frameworks like Starknet's Cairo or zkSync's zkEVM for scaling. On Solana, leverage its native parallel execution by designing independent program-derived accounts (PDAs).
Next, architect your backend services. Implement a dispatcher service that receives user transactions, validates them against current state (using a cached view from an indexer like The Graph), and routes them to appropriate worker pools. Use a message broker like RabbitMQ or Apache Kafka to decouple ingestion from processing. Ensure idempotency in your workers to handle retries safely. A reference architecture might involve: 1. A load-balanced API gateway, 2. A transaction sequencer, 3. A cluster of stateless executors, and 4. A finality watcher to submit batches on-chain.
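Idempotency in the workers is the piece most often gotten wrong, so here is a minimal sketch. The in-memory dedup set and message shape are illustrative; a production worker would persist processed IDs in Redis or a database:

```python
# Sketch of an idempotent worker: a processed-ID set makes broker redelivery
# safe, because a retried message is recognized and not executed twice.

class Worker:
    def __init__(self):
        self.processed = set()   # dedup store keyed by transaction ID
        self.executed = []       # side effects actually performed

    def handle(self, msg):
        tx_id = msg["tx_id"]
        if tx_id in self.processed:
            return "skipped"     # duplicate delivery: no double execution
        self.executed.append(msg["payload"])
        self.processed.add(tx_id)
        return "executed"

worker = Worker()
results = [worker.handle({"tx_id": "tx-1", "payload": "swap"}),
           worker.handle({"tx_id": "tx-1", "payload": "swap"}),   # broker retry
           worker.handle({"tx_id": "tx-2", "payload": "mint"})]
print(results)  # → ['executed', 'skipped', 'executed']
```

With this guard in place, at-least-once delivery from RabbitMQ or Kafka degrades gracefully: retries are cheap no-ops instead of duplicate on-chain submissions.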
Finally, establish a continuous performance testing regimen. Simulate load using tools like Hardhat Network or Foundry's forge to benchmark transactions per second (TPS) and identify new bottlenecks under stress. Monitor key metrics:

- End-to-end latency from user signature to on-chain confirmation
- Queue depth in your message broker
- Gas efficiency of your contract calls

Iteratively refine your architecture based on this data. The goal is a system that scales horizontally, maintaining low latency even as user demand grows.
To dive deeper, explore the documentation for parallel execution runtimes (Aptos Move, Sui Move), advanced rollup stacks (Arbitrum Nitro, Optimism Bedrock), and high-performance clients (Erigon, Reth). The field evolves rapidly; engaging with research from teams like EigenLayer (restaking for decentralized sequencers) and Espresso Systems (shared sequencers) will keep your approach at the cutting edge.