How to Plan Full and Archive State for Blockchain Nodes

NODE OPERATION

Introduction to Node State Planning

A guide to understanding and planning for the different types of blockchain state your node will manage, from recent blocks to the complete historical archive.

Running a blockchain node requires managing different types of data, primarily categorized as full state and archive state. The full state (or recent state) contains the information necessary to validate new blocks and interact with the current network. Depending on the chain, this means the state for roughly the most recent 128 blocks on Ethereum (Geth's default retention), the unspent transaction output (UTXO) set for Bitcoin, or the working state tree for other chains. In contrast, an archive state contains the complete historical record: every block, transaction, and the state of every account at every point in the chain's history. Planning which state to maintain is a critical decision impacting storage requirements, synchronization time, and the node's utility.

The choice between a full node and an archive node depends on your use case. A full node is sufficient for most applications: validating transactions, broadcasting new transactions, and participating in network consensus. It requires significantly less storage, often ranging from hundreds of gigabytes to a few terabytes. An archive node is essential for specialized tasks that require historical data queries, such as block explorers, advanced analytics platforms, historical balance checks for audits, or indexing services like The Graph. Archive nodes can require 10TB or more of fast storage, depending on the chain's age and activity.

To plan your node's state, start by defining your requirements. Ask: Do you need to query any historical state? If not, a pruned full node is optimal. Next, research the specific storage demands for your target chain and client software. For example, an Ethereum Geth full node with default settings requires about 650GB, while an archive node exceeds 12TB. Sync modes such as Geth's snap sync or Erigon's staged sync can accelerate initial synchronization. Always allocate a storage buffer of 20-30% for future chain growth, and keep the chaindata directory on an SSD to ensure fast read/write operations during block processing.
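
The buffer rule above is simple arithmetic, and a minimal sketch makes it concrete. The function name and the choice to round up to whole gigabytes are illustrative assumptions; the 650GB and 12TB inputs are the figures quoted in this guide, not measurements.

```python
import math


def provisioned_capacity_gb(current_size_gb: float, buffer_percent: int = 25) -> int:
    """Return the disk size to provision, in whole GB, after adding a growth buffer.

    buffer_percent is the extra space on top of the current size
    (this guide suggests 20-30).
    """
    if current_size_gb <= 0:
        raise ValueError("current_size_gb must be positive")
    if buffer_percent < 0:
        raise ValueError("buffer_percent must not be negative")
    # Multiply before dividing so round percentages stay exact.
    return math.ceil(current_size_gb * (100 + buffer_percent) / 100)


if __name__ == "__main__":
    print(provisioned_capacity_gb(650, 25))     # full node figure from this guide
    print(provisioned_capacity_gb(12_000, 30))  # archive node figure from this guide
```

Treat the result as a floor rather than a target: it covers the buffer only, not the temporary overhead that pruning or database compaction can add.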

Implementation involves configuring your client software during the initial sync. For a Geth full node, you would use flags like --syncmode snap. To run an archive node, you must use --syncmode full --gcmode archive. For Bitcoin Core, the -prune= flag controls state retention. A crucial planning step is estimating sync time, which can take days or weeks for an archive node on a high-activity chain. Using trusted checkpoints or snapshots from the client community can reduce this time from weeks to days by providing a recent verified state to bootstrap from.
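
As a rough illustration of the flag choices above, the sketch below assembles command lines for each node type. The helper names and data directory paths are hypothetical. The Geth flags are the ones named in this guide, and the 550 MiB lower bound reflects Bitcoin Core's documented minimum for automatic pruning; verify both against the documentation for your client version.

```python
def geth_args(node_type: str, datadir: str) -> list[str]:
    """Build a Geth command line for a 'full' or 'archive' node."""
    base = ["geth", "--datadir", datadir]
    if node_type == "full":
        # Snap sync: download recent state, then follow the chain.
        return base + ["--syncmode", "snap"]
    if node_type == "archive":
        # Replay every block and never garbage-collect old state.
        return base + ["--syncmode", "full", "--gcmode", "archive"]
    raise ValueError(f"unknown node type: {node_type!r}")


def bitcoind_args(datadir: str, prune_mib: int = 0) -> list[str]:
    """Build a Bitcoin Core command line; prune_mib=0 keeps all blocks."""
    if prune_mib not in (0, 1) and prune_mib < 550:
        raise ValueError("automatic pruning requires a target of at least 550 MiB")
    return ["bitcoind", f"-datadir={datadir}", f"-prune={prune_mib}"]


if __name__ == "__main__":
    print(" ".join(geth_args("archive", "/data/geth")))
    print(" ".join(bitcoind_args("/data/bitcoin", 550)))
```

Building the argument list in one place makes it harder to start an archive node with a full-node configuration by accident, which matters because the mistake may only surface after days of syncing.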

Long-term maintenance is part of the plan. Even full nodes require periodic maintenance as the chain grows. Monitor your disk usage and understand your client's garbage collection process. For archive nodes, implement a robust backup strategy, as re-syncing is prohibitively time-consuming. Consider whether a hybrid approach fits your needs: running a primary full node for daily operations and a separate, slower-syncing archive node for occasional historical queries. This decouples performance from deep historical access. Proper state planning ensures your node remains reliable, performant, and fit for its intended purpose over the long term.

PREREQUISITES AND PLANNING CONSIDERATIONS

How to Plan Full and Archive State

A strategic approach to managing blockchain state is essential for node operators and developers. This guide covers the key considerations for planning full and archive node deployments.

A full node stores the current state of the blockchain, enabling it to validate new blocks and transactions independently. An archive node retains the entire historical state, including all intermediate states for every block. The core difference is storage: a full node might require 1-2 TB for Ethereum, while an archive node can demand 10+ TB. This decision impacts your node's capabilities, operational costs, and the types of queries it can support, such as historical balance checks or complex analytics.

Before deployment, assess your hardware requirements. For a mainnet Ethereum archive node, plan for a fast NVMe SSD (12+ TB), 32+ GB of RAM, and a multi-core CPU. Network bandwidth must sustain initial sync traffic over a period of days or weeks. Consult the Ethereum Foundation's documentation and your client's own documentation for current specifications. If you need full validation but not full history, consider a pruned node configuration, which significantly reduces storage needs.

Your choice of client software (Geth, Erigon, Nethermind, Besu) dictates sync strategies and resource efficiency. For example, Erigon uses a staged sync optimized for archive nodes, while Geth's snap sync is faster for full nodes. You must also plan for maintenance: regular client updates, monitoring disk health, and managing log rotation. Automated backup solutions for the node's data directory and validator keys (if applicable) are non-negotiable for production systems.

The initial synchronization strategy is critical. A full sync from genesis is the most resource-intensive but most reliable method. For faster results, most clients support snapshot sync modes that download a recent state and verify the chain backward. For archive nodes, anticipate the sync taking weeks, not days. During this period, ensure stable power and internet; interruptions can cause costly re-syncing. Allocate sufficient time and resources for this one-time, intensive process.

Finally, define your operational goals. Are you running the node for an RPC service, a block explorer, a validator, or personal development? This determines if you need an archive node's deep history. For most dApp backends, a full node suffices. Factor in ongoing costs: cloud storage for an archive node can exceed $300/month, while a local setup requires upfront hardware investment. A clear plan balancing performance, cost, and data requirements is the foundation of a reliable node operation.

NODE OPERATION

Key Concepts: Full vs. Archive State

Understanding the distinction between full and archive nodes is fundamental for developers building on-chain applications and for node operators managing infrastructure.

In blockchain networks like Ethereum, a full node stores and validates the current state of the blockchain. This includes all block headers, transactions, and the most recent world state: the set of all account balances, smart contract code, and storage slots as of the latest block. Running a full node allows you to independently verify transactions and interact with the network without trusting a third-party provider. However, a full node keeps state only for a recent window of blocks (about 128 with Geth's defaults). Older state is pruned, so you cannot query historical states, such as an account's balance at block #12,000,000.

An archive node is a superset of a full node. It retains the entire history of the chain's state: after processing each block, it persists the resulting world state instead of discarding superseded entries. This creates an indexed archive of every single state change since genesis. This capability is essential for services that require deep historical data: block explorers need it to show balances and contract storage at past blocks, analytics platforms use it for on-chain metrics over time, and smart contract auditors may need to reconstruct state for forensic analysis. The trade-off is significant: an Ethereum archive node requires over 12 TB of storage, compared to roughly 1-2 TB for a full node.

Choosing which type to run depends on your application's needs. For a wallet service or a DEX frontend that only needs to broadcast transactions and check current balances, a full node is sufficient and far more resource-efficient. For building a The Graph subgraph that indexes event history, a decentralized application (dApp) with complex historical queries, or any off-chain reporting tool, you will need access to an archive node. Many developers use managed RPC services like Alchemy or Infura that provide archive data via API endpoints, avoiding the operational overhead of maintaining their own archive node.

From a technical perspective, the difference is in the state trie pruning. Full nodes use a pruning mechanism to discard old state data that is no longer needed to validate new blocks, keeping only the current state and recent history for efficiency. Archive nodes disable this pruning entirely. In Geth, this is controlled by the --gcmode flag (archive vs. full). In execution clients like Erigon, the architecture is designed to provide archive queries more efficiently by default, though it still requires substantial storage.

When planning your infrastructure, consider the data access patterns of your application. Ask: Do you need to call a view function on a contract as it existed six months ago? Do you need to calculate total value locked (TVL) in a protocol at a specific historical block? If the answer is yes, you require archive state. For most real-time interactions—submitting transactions, listening for new events, or fetching current token prices—a full node connection is perfectly adequate and recommended for cost and performance.
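
The questions above can be turned into a rough check. The sketch below estimates which block corresponds to "six months ago" and whether a pruned node would still hold its state. It assumes Ethereum's 12-second post-Merge slot time and the 128-block retention figure quoted in this guide. Missed slots make the block estimate approximate, and the function names are illustrative.

```python
SECONDS_PER_SLOT = 12        # Ethereum post-Merge slot time (assumed constant)
RETAINED_STATE_BLOCKS = 128  # pruned-node retention figure quoted in this guide


def approx_block_at(head_block: int, seconds_ago: int) -> int:
    """Estimate the block height `seconds_ago` before the head block."""
    return max(0, head_block - seconds_ago // SECONDS_PER_SLOT)


def needs_archive(target_block: int, head_block: int,
                  retained: int = RETAINED_STATE_BLOCKS) -> bool:
    """True if state at target_block is older than the pruned retention window."""
    return head_block - target_block >= retained


if __name__ == "__main__":
    head = 20_000_000                   # hypothetical current head
    six_months = 182 * 24 * 60 * 60     # roughly six months, in seconds
    target = approx_block_at(head, six_months)
    print(target, needs_archive(target, head))
```

Anything beyond roughly 25 minutes of history (128 blocks at 12 seconds each) falls outside the window, which is why nearly every historical query ends up needing archive state.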

DATA STORAGE

Full Node vs. Archive Node: Technical Comparison

A detailed comparison of the hardware requirements, functionality, and operational trade-offs between full and archive nodes.

Feature / Metric	Full Node	Archive Node
Historical State Data	Pruned (recent blocks only)	Complete since genesis
Initial Sync Time	~5-15 hours	~2-7 days
Minimum Storage Required	650 GB - 1 TB	12 TB+
Typical RAM	8-16 GB	32 GB+
Block Validation Speed	< 1 sec	< 1 sec
State Query Capability	Latest state only	Any historical state
Infrastructure Cost (Monthly)	$50 - $150	$300 - $800+
Primary Use Case	Validating & relaying transactions	Block explorers, analytics, debugging

BLOCKCHAIN INFRASTRUCTURE

State Storage and Pruning Strategies

A guide to managing the trade-offs between data availability and node resource requirements through full and archive state configurations.

A blockchain's state is the complete set of data—account balances, smart contract code, and storage variables—needed to validate new transactions. For networks like Ethereum, this state is stored in a Merkle Patricia Trie, where each block header contains a root hash that commits to the entire state. Running a full node means downloading and verifying every block and transaction, then reconstructing this state locally. However, storing the entire historical state from genesis is resource-intensive, often requiring multiple terabytes of storage. This is where strategic pruning becomes essential for node operators.

State pruning is the process of deleting historical state data that is no longer necessary for validating new blocks. A standard pruned full node, such as one run with Geth's default settings, only retains the most recent 128 blocks of state. Older state trie nodes are discarded once they are no longer referenced. This reduces storage needs to roughly 500GB-1TB for Ethereum, enabling broader participation. In contrast, an archive node retains all historical state, allowing it to answer arbitrary queries about the chain's past, such as an account's balance at block #10,000,000. Archive nodes are crucial for block explorers, analytics platforms, and certain indexers.
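
To make the retention window concrete, here is a toy model of a state store that keeps only the most recent N block states. It is a deliberate simplification: real clients prune individual trie nodes once they are no longer referenced rather than dropping whole per-block snapshots, and the class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class PrunedStateStore:
    """Toy store mapping block number to state root, with optional pruning."""

    def __init__(self, retained: Optional[int] = 128) -> None:
        # retained=None models an archive node: nothing is ever discarded.
        self.retained = retained
        self._states: "OrderedDict[int, str]" = OrderedDict()

    def commit(self, block_number: int, state_root: str) -> None:
        """Record the state for a new block and drop states outside the window."""
        self._states[block_number] = state_root
        while self.retained is not None and len(self._states) > self.retained:
            self._states.popitem(last=False)  # evict the oldest retained state

    def state_at(self, block_number: int) -> Optional[str]:
        """Return the state root for a block, or None if it has been pruned."""
        return self._states.get(block_number)


if __name__ == "__main__":
    full, archive = PrunedStateStore(128), PrunedStateStore(None)
    for n in range(200):
        full.commit(n, f"root-{n}")
        archive.commit(n, f"root-{n}")
    print(full.state_at(10), archive.state_at(10))
```

After 200 blocks, the pruned store can answer queries only for blocks 72 through 199, while the archive store can still answer for block 10.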

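Planning your node's state strategy depends on your use case. For transaction validation and most DApp RPC needs, a pruned full node is sufficient. To run an archive node with Geth, you would use the flag --gcmode=archive. Erigon, another Ethereum execution client, works the other way around in its 2.x releases: it keeps archive data by default, and the --prune flag selects what to discard, with h for state history, r for receipts, t for the transaction index, and c for call traces. Passing --prune=hrtc therefore yields a pruned node, while omitting the flag keeps the full archive. Flag names have changed between Erigon releases, so check the documentation for your version. The key trade-off is between storage cost and query capability. An archive node on Ethereum Mainnet can require over 12TB of fast SSD storage, while a pruned node requires significantly less.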

Beyond simple pruning, advanced strategies involve offloading state. Erigon's staged sync, for instance, can bootstrap a node from pre-processed snapshots of historical data, drastically reducing sync time. Furthermore, the concept of stateless clients and Verkle trees (planned for Ethereum) aims to revolutionize state management. In a stateless paradigm, validators would only need a small cryptographic witness to validate blocks, pushing the burden of storing full state onto specialized archive services. Planning your infrastructure requires understanding these evolving protocols and the specific data your application requires from the chain.

NODE OPERATIONS

Essential Tools and Documentation

Planning full and archive state requires precise choices around execution clients, pruning modes, disk growth, and snapshot strategies. These tools and guides help operators estimate hardware requirements, understand state growth mechanics, and avoid data loss during long-term operation.

Ethereum Execution Clients and State Modes

Execution clients determine how full and archive state is stored, pruned, and queried. Each client implements different defaults for state retention and database layout, which directly affects disk usage and reorg safety.

Key differences to review before deployment:

Geth: Supports full and archive nodes via --gcmode=full and --gcmode=archive. Archive mode disables state pruning and increases disk growth by hundreds of GB per year.
Nethermind: Uses configurable pruning layers with options like --Pruning.FullPruningTrigger and can selectively retain ancient state.
Erigon: Optimized for historical access using flat files and snapshot-first sync. Archive-like access is available without classic archive bloat.

Planning steps:

Identify whether you need historical state at every block or only historical blocks and receipts.
Map client state mode to concrete use cases like MEV research, fork simulation, or compliance queries.
Test disk growth over 7 to 14 days before committing hardware.

Client choice can reduce storage costs by multiple TB over a year.

State Pruning and Garbage Collection Mechanics

State pruning defines how old contract storage and account data is removed once it is no longer needed for consensus. Misconfigured pruning is the most common cause of unexpected disk exhaustion.

Important concepts:

Pruning horizon: Number of blocks for which full state is retained. Shorter horizons reduce disk but limit historical queries.
Garbage collection cycles: Background processes that reclaim disk space. These can temporarily increase IO and CPU usage.
Ancient data separation: Block bodies and receipts are often moved to append-only files that are not pruned.

Operational guidance:

Monitor disk usage continuously during the first full GC cycle, which can take days on large nodes.
Avoid disabling pruning unless archive access is explicitly required.
Schedule node restarts after GC completion to stabilize memory usage.

Understanding pruning internals allows operators to plan realistic disk limits instead of reacting to sudden failures.

Snapshot Sync and Historical Access Tradeoffs

Snapshot synchronization accelerates initial node setup by downloading recent state snapshots instead of replaying the entire chain. While this reduces sync time, it changes how historical state can be reconstructed.

Key tradeoffs:

Faster time to readiness: Snapshot sync can reduce initial sync from days to hours on mainnet.
Limited historical state: Older state must be reconstructed on demand, which may be slow or unsupported depending on client.
Archive incompatibility: True archive nodes often cannot use snapshot sync without additional backfills.

Planning considerations:

For analytics workloads, test whether eth_call at older block heights meets latency requirements.
Verify how your client rebuilds missing trie nodes when querying deep history.
Avoid snapshot sync for nodes used in forensic analysis or long-range replay.

Snapshot sync is ideal for infrastructure nodes but rarely sufficient for deep historical research.

Disk Sizing and Growth Forecasting

Disk planning for full and archive state must account for compaction overhead, state growth, and temporary spikes during reorgs or GC.

Core components to size:

Active state database: Grows with contract deployment and storage usage.
Ancient data: Block headers, bodies, and receipts grow linearly with chain height.
GC overhead: Pruning can temporarily require 10 to 30 percent extra disk.

Best practices:

Provision at least 30 percent free headroom beyond projected one-year growth.
Prefer NVMe SSDs with sustained write performance; state DBs are write-heavy.
Track weekly growth rates rather than relying on static estimates.

Operators running archive state without growth forecasting frequently hit hard disk limits, resulting in forced resyncs that can take days.
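
The practices above (measure growth, then add headroom) can be sketched as a small forecasting helper. The sample format, function names, and straight-line growth model are assumptions made for illustration; real growth varies with network activity, so re-run the forecast as new samples arrive.

```python
import math


def weekly_growth_gb(samples: list[tuple[int, float]]) -> float:
    """Average weekly growth from (day_index, size_gb) samples ordered by day."""
    (first_day, first_size), (last_day, last_size) = samples[0], samples[-1]
    if last_day <= first_day:
        raise ValueError("samples must span at least one day")
    return (last_size - first_size) / (last_day - first_day) * 7


def required_capacity_gb(current_gb: float, weekly_gb: float,
                         headroom_percent: int = 30, weeks: int = 52) -> int:
    """Projected size after `weeks`, plus free headroom, rounded up to whole GB."""
    projected = current_gb + weekly_gb * weeks
    return math.ceil(projected * (100 + headroom_percent) / 100)


def weeks_until_full(capacity_gb: float, current_gb: float, weekly_gb: float) -> float:
    """Weeks of growth left before the disk is full (inf if the data is not growing)."""
    if weekly_gb <= 0:
        return math.inf
    return (capacity_gb - current_gb) / weekly_gb


if __name__ == "__main__":
    observed = [(0, 1000.0), (7, 1007.0), (14, 1014.0)]  # hypothetical 14-day test
    rate = weekly_growth_gb(observed)
    print(rate, required_capacity_gb(1014.0, rate), weeks_until_full(2000, 1014.0, rate))
```

The 14-day sample mirrors the test window suggested earlier in this section. Longer windows smooth out short bursts of activity and give a steadier rate.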

NODE OPERATION

Hardware and Infrastructure Requirements

Choosing the right hardware is critical for running a reliable blockchain node. This guide details the specifications needed for different node types, from full nodes to archive nodes.

Running a blockchain node requires hardware capable of processing and storing the network's state. A full node stores the current state, allowing you to verify transactions and blocks independently. An archive node (or archival node) stores the complete historical state for every block, enabling deep historical queries. The key difference is storage: a full node can prune old state data, while an archive node must retain everything. For example, an Ethereum full node requires about 1-2 TB of SSD storage, whereas an archive node needs over 12 TB.

The primary hardware components to consider are CPU, RAM, Storage, and Network bandwidth. A modern multi-core CPU (e.g., 4+ cores) is essential for block validation and state computation. RAM is crucial for holding the in-memory state trie; 16 GB is a minimum for many chains, with 32 GB or more recommended for performance. Storage must be a fast SSD; HDDs are too slow for the random I/O patterns of blockchain data. Synchronization also demands significant network bandwidth, with initial sync often downloading hundreds of gigabytes.

For a concrete example, here are recommended specs for running an Ethereum execution client (e.g., Geth, Nethermind) as of 2024:

Full Node: 4+ core CPU, 16 GB RAM, 2 TB NVMe SSD, 25+ Mbps internet.
Archive Node: 8+ core CPU, 32 GB RAM, 12+ TB NVMe SSD, 100+ Mbps internet.

These requirements scale with chain activity. Networks like Polygon PoS or Arbitrum have lower initial storage needs but grow over time. Always check your specific client's documentation for the latest recommendations.

Beyond raw specs, system configuration is vital. Ensure your OS and filesystem are optimized for high I/O. A Geth full node prunes old state by default (--gcmode=full), so no separate pruning flag is needed; on databases that use the older hash-based state scheme, the offline geth snapshot prune-state command can reclaim additional space. Snap sync (--syncmode snap) is the standard way to sync a full node, while --syncmode full replays every block from genesis. For an archive node, you would use --syncmode full --gcmode archive. Monitoring tools like Prometheus and Grafana help track resource usage (CPU, memory, disk I/O) to prevent bottlenecks. A stable power supply and reliable internet connection are non-negotiable for maintaining node uptime.

Planning for growth is essential. Blockchain state grows continuously. Implement a monitoring alert for disk usage (e.g., alert at 80% capacity). For archive nodes, consider a scalable storage solution. Cloud providers offer high-performance instances with scalable block storage, but be mindful of egress costs. Self-hosting requires planning for hardware upgrades. The initial sync is the most resource-intensive phase; using a snapshot or trusted checkpoint from the community can reduce sync time from weeks to days, saving significant bandwidth and wear on your SSD.
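
A disk-usage alert like the one described above needs very little code. The sketch below uses Python's standard shutil.disk_usage; the function names and the idea of running it from a scheduler are assumptions, and in production you would more likely wire the same threshold into Prometheus alerting.

```python
import shutil


def should_alert(used_bytes: int, total_bytes: int, threshold: float = 0.80) -> bool:
    """True when the used fraction of the disk reaches the alert threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes >= threshold


def check_path(path: str, threshold: float = 0.80) -> bool:
    """Check the filesystem that holds `path` (for example the node's data directory)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return should_alert(usage.used, usage.total, threshold)


if __name__ == "__main__":
    # "." is a placeholder; point this at your node's data directory.
    if check_path("."):
        print("Disk usage is at or above the alert threshold")
    else:
        print("Disk usage is below the alert threshold")
```

Keeping the threshold logic in a pure function separates the decision from the filesystem call, so the rule can be tested without a nearly full disk.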

RECOMMENDED HARDWARE

Sample Infrastructure Specifications by Network

Minimum hardware specifications for running a full or archive node on major EVM networks.

Resource	Ethereum Mainnet	Polygon PoS	Arbitrum One	Optimism
CPU Cores	4+ cores	4+ cores	8+ cores	8+ cores
RAM	16 GB	16 GB	32 GB	32 GB
SSD Storage (Full Node)	2 TB NVMe	1.5 TB NVMe	1 TB NVMe	1 TB NVMe
SSD Storage (Archive Node)	12+ TB NVMe	6+ TB NVMe	3+ TB NVMe	3+ TB NVMe
Network Bandwidth	25+ Mbps	25+ Mbps	25+ Mbps	25+ Mbps
Sync Time (Full, Fast Sync)	~15 hours	~8 hours	~6 hours	~5 hours
Initial Sync RAM Usage	16-32 GB	8-16 GB	16-32 GB	16-32 GB

ARCHITECTURE

How to Plan Full and Archive State Synchronization

Synchronizing a node's state is a critical operation that requires careful planning. This guide outlines the steps for implementing both full and archive state sync, focusing on the architectural decisions and practical considerations for developers.

Planning begins with a clear definition of your synchronization goals. A full state sync downloads the current state of the blockchain, allowing a node to validate new blocks. An archive state sync downloads the entire historical state, enabling queries of any account balance or contract storage at any past block. The choice depends on your node's purpose:

Validators/RPC nodes typically require a full sync.
Indexers, explorers, and analytics platforms require an archive sync.

The resource requirements differ drastically; an archive node for Ethereum Mainnet requires over 12 TB of SSD storage, while a full node requires around 1 TB.

The next step is selecting a synchronization strategy. For a full sync, most current clients use snap sync, the successor to the older fast sync mode (warp sync, which you may still see mentioned, belonged to the now-deprecated OpenEthereum client). Snap sync, used by Geth and Nethermind, downloads the state trie in chunks for recent blocks, dramatically reducing sync time. Archive syncs are almost always performed as an initial block download (IBD) from genesis, sequentially processing every transaction to build the complete historical state. You must configure your client accordingly; for example, in Geth, you would use the --syncmode snap flag for a full sync or --syncmode full --gcmode archive for an archive sync.

Implementation requires configuring hardware and client software. Key hardware considerations include:

Storage: NVMe SSDs are mandatory for archive nodes due to high IOPS requirements.
Memory: At least 16-32 GB RAM is recommended to handle state caching.
CPU: A multi-core processor aids in parallel transaction execution during sync.

On the software side, you must set the correct data directory, ensure the client is on the latest stable release, and configure pruning settings. For an archive node, pruning must be disabled entirely.

The actual synchronization process involves monitoring and optimization. Start the client with your chosen sync mode and monitor logs for progress and errors. Use tools like geth attach or client-specific admin APIs to check sync status (e.g., eth.syncing). For archive syncs, the process can take weeks. Optimization techniques include:

Increasing the number of peer connections (--maxpeers).
Using a trusted checkpoint or a snapshot from the community to bootstrap the initial state.
Ensuring your firewall allows traffic on the P2P port (typically 30303 for Ethereum).
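
For scripted monitoring, the sync status check above can be reduced to a single number. The sketch below interprets the result of the eth_syncing JSON-RPC method, which returns false when the node is not syncing and otherwise an object with hex-encoded startingBlock, currentBlock, and highestBlock fields. The function name is illustrative, and how you fetch the result (HTTP, IPC, or geth attach) is left out.

```python
from typing import Union


def sync_progress(result: Union[bool, dict]) -> float:
    """Return block-level sync progress in [0, 1] from an eth_syncing result.

    Caveat: `False` means "not currently syncing", which is also what a node
    reports before it has found any peers, so check peer count separately.
    """
    if result is False:
        return 1.0
    start = int(result["startingBlock"], 16)
    current = int(result["currentBlock"], 16)
    highest = int(result["highestBlock"], 16)
    if highest <= start:
        return 0.0
    return (current - start) / (highest - start)


if __name__ == "__main__":
    sample = {"startingBlock": "0x0", "currentBlock": "0x64", "highestBlock": "0xc8"}
    print(f"{sync_progress(sample):.0%}")
```

Block numbers alone understate the remaining work during snap sync, because state download proceeds separately from block import. Treat the figure as a coarse indicator and rely on the client's logs for detail.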

Post-sync, you must verify data integrity and establish maintenance procedures. After the sync reaches the chain tip, run a state verification if your client supports it (e.g., Geth's geth snapshot verify-state command). For an archive node, test historical queries using the eth_getBalance or eth_getStorageAt RPC methods with an old block number. Maintenance involves planning for continuous growth; archive storage will expand with each new block. Implement monitoring for disk space and client health, and have a documented disaster recovery plan, including how to re-sync from a backup or snapshot if corruption occurs.
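
The historical query test described above amounts to sending a JSON-RPC request with an explicit block number. The sketch below only builds the request bodies, so it makes no network calls. The helper names and the zero address are placeholders; the parameter order follows the standard Ethereum JSON-RPC definitions of eth_getBalance and eth_getStorageAt.

```python
import json

ZERO_ADDRESS = "0x" + "00" * 20  # placeholder; substitute an address you care about


def historical_balance_request(address: str, block_number: int, request_id: int = 1) -> dict:
    """JSON-RPC body for eth_getBalance at a specific block height."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        "params": [address, hex(block_number)],  # quantities are hex-encoded
    }


def historical_storage_request(address: str, slot: int, block_number: int,
                               request_id: int = 1) -> dict:
    """JSON-RPC body for eth_getStorageAt at a specific block height."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getStorageAt",
        "params": [address, hex(slot), hex(block_number)],
    }


if __name__ == "__main__":
    print(json.dumps(historical_balance_request(ZERO_ADDRESS, 12_000_000)))
```

POST the body to your node's HTTP RPC endpoint. An archive node should return a hex balance, while a pruned node typically responds with an error saying the requested state is unavailable.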

STATE MANAGEMENT

Frequently Asked Questions

Common questions and troubleshooting for planning full and archive node states in blockchain infrastructure.

A full node stores the current state of the blockchain, allowing it to verify new transactions and blocks. It typically prunes historical state data to save disk space.

An archive node retains the complete historical state for every block since genesis. This includes all intermediate states, which are required for complex queries like an account's balance at a specific historical block height or for deep chain analysis.

Key Distinction:

Full Node: ~1-2 TB for Ethereum, verifies live chain.
Archive Node: ~10+ TB for Ethereum, enables historical state queries.

Archive nodes are essential for services like block explorers, analytics platforms, and certain developer tools that need to replay or inspect past events.

NODE OPERATIONS

How to Plan Full and Archive State

A strategic approach to managing your node's data footprint is critical for long-term performance and cost efficiency. This guide covers the differences between full and archive states and how to plan for them.

A full state node stores only the current state of the blockchain—the latest account balances, contract storage, and nonces. It's sufficient for most applications like transaction validation, RPC services, and block production. An archive state (or archival) node retains the complete historical state for every single block since genesis. This is required for services like block explorers, complex analytics, or querying historical data (e.g., an account's balance at block #1,000,000). The storage difference is immense: an Ethereum full node requires ~1-2 TB, while an archive node can exceed 12 TB.

Your choice dictates hardware and operational costs. For a full node, plan for a fast NVMe SSD (2-4 TB is a safe starting point) and monitor growth. Chains with high throughput, like Polygon or Arbitrum, have faster state growth. Use your client's built-in pruning (e.g., Geth's --gcmode=archive vs. --gcmode=full) to manage this. For an archive node, you need significantly larger and often more expensive storage, potentially requiring a multi-drive setup or cloud object storage with a caching layer. Operational costs for cloud-based archive nodes can be 5-10x higher than for full nodes.

Implement a monitoring strategy to track state growth. Use metrics like the size of the chaindata directory, and client-specific locations such as Geth's chaindata/ancient directory for historical data. Set alerts for when storage reaches 70-80% capacity. For archive nodes, consider a tiered architecture: a hot archive on fast storage for recent blocks (e.g., last 100k blocks) and a cold archive on cheaper object storage for older data. Erigon's split between immutable snapshot files and its live database follows a similar hot/cold idea. Plan your snapshot and backup strategy around these state types, as restoring an archive node is a multi-day process.
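
The tiered layout described above implies a routing rule: recent blocks go to the hot tier, older ones to the cold tier. A minimal sketch of that rule follows. The 100k-block window is the example figure from this section, and the function and tier names are hypothetical rather than part of any client.

```python
HOT_WINDOW_BLOCKS = 100_000  # example window from this guide


def storage_tier(block_number: int, head_block: int,
                 hot_window: int = HOT_WINDOW_BLOCKS) -> str:
    """Return 'hot' for blocks within the recent window, otherwise 'cold'."""
    if block_number < 0 or block_number > head_block:
        raise ValueError("block_number must be between 0 and head_block")
    return "hot" if head_block - block_number < hot_window else "cold"


if __name__ == "__main__":
    head = 20_000_000  # hypothetical current head
    for block in (19_950_000, 19_900_000, 1_000_000):
        print(block, storage_tier(block, head))
```

A request router in front of the two backends can apply the same rule to the block parameter of each RPC call, keeping slow cold-tier reads away from latency-sensitive traffic.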

ARCHIVAL STRATEGY

Conclusion and Next Steps

A summary of key concepts for planning full and archive node states, with actionable steps for implementation.

Planning for full and archive node states is a foundational task for developers and node operators building resilient Web3 infrastructure. The full state provides immediate access to recent blockchain data for transaction execution and validation, while the archive state serves as a complete historical ledger for deep analysis, auditing, and advanced queries. Understanding this distinction is critical for designing systems that balance performance, cost, and data availability. For example, a DeFi protocol's frontend might query a full node for real-time gas prices, while a blockchain explorer relies on an archive node to display historical token transfers.

To implement this architecture, start by defining your data access requirements. Ask: Do you need to query balances at a past block? Replay old transactions? Audit smart contract state changes? Tools like Erigon for Ethereum or Substrate's chain storage offer efficient archive solutions. For a full node, consider client software like Geth (execution) and Lighthouse (consensus). The operational cost difference is significant; an Ethereum archive node can require over 12 TB of SSD storage with Geth (Erigon's archive layout is considerably smaller), whereas a pruned full node may need less than 1 TB. Plan your hardware and cloud budget accordingly.

Your next step is to choose a deployment strategy. You can run both node types yourself, use managed services like Alchemy or QuickNode for archive data, or employ a hybrid approach. For development, use a testnet such as Sepolia (Goerli has been deprecated) to experiment with state queries without mainnet costs. Implement robust monitoring for sync status, disk space, and RPC endpoint health. Finally, document your node's capabilities and access patterns for your team, ensuring everyone understands which service to use for historical analysis versus real-time operations. This planning turns a complex infrastructure decision into a scalable, maintainable system.