How to Deploy an Archival Node: A Step-by-Step Guide

BLOCKCHAIN INFRASTRUCTURE

What is an Archival Node?

An archival node is a type of blockchain node that stores the complete historical state of a network, including every transaction and block since genesis. This guide explains its purpose, technical requirements, and how to deploy one.

A blockchain archival node is the most resource-intensive type of node, maintaining a full copy of the entire ledger's history. Unlike a full node, which keeps the block history but only the recent state needed to validate new transactions, or a light node, which relies on others for data, an archival node preserves the complete historical state. This includes every transaction, smart contract interaction, and account balance change from the genesis block to the present. Services like The Graph, block explorers like Etherscan, and on-chain analytics platforms depend on archival nodes to query historical state that standard nodes prune.

Running an archival node requires significant hardware resources. For networks like Ethereum, this typically means a machine with at least 16-32 GB of RAM, a multi-core CPU, and several terabytes of fast SSD storage. The storage requirement is critical and depends heavily on the client: a Geth archive using the hash-based state scheme requires over 12 TB to store the full history, while Erigon's more compact storage layout needs only a few terabytes. Synchronization can take days or weeks, as the node must download and execute every block. Software clients like Geth (Go-Ethereum), Erigon, or Besu for Ethereum, or their equivalents for other chains like Solana or Polygon, are used for this process, each with different performance and storage optimizations.

Deploying an archival node involves several key steps. First, choose your client software and target blockchain. For Ethereum using Geth, you start the client with geth --syncmode full --gcmode archive; together these flags instruct the client to execute every block from genesis and retain all historical state. Confirm that your system meets the storage and memory requirements before beginning. The initial sync is the most demanding phase, requiring stable internet and uninterrupted runtime. Many operators use a service manager like systemd, or a Docker restart policy, to keep the node running persistently and to restart it automatically after reboots.

Once synchronized, the archival node provides an invaluable resource. Developers use it to query historical events for decentralized applications (dApps), researchers analyze on-chain trends, and auditors verify transaction histories. Access is typically provided via the node's JSON-RPC endpoint, allowing programmatic queries for data like an account's balance at a specific block height or all transactions involving a particular smart contract. Maintaining the node requires ongoing diligence: applying client updates for security patches, monitoring disk space, and keeping the system online so it does not fall behind the chain tip, which would force a lengthy catch-up sync.
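
As a rough illustration of such a query, the sketch below builds an eth_getBalance request pinned to a specific block height and decodes the reply. The endpoint URL (http://localhost:8545) and the helper names are assumptions made for this example, not part of any client's tooling.

```python
import json
import urllib.request

RPC_URL = "http://localhost:8545"  # assumed local endpoint; adjust to your node


def balance_at_block_request(address: str, block_number: int, request_id: int = 1) -> dict:
    """Build a JSON-RPC payload asking for an account balance at a block height."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        # The block parameter is a hex quantity; a pruned node usually cannot
        # answer this for old heights because the state is gone.
        "params": [address, hex(block_number)],
    }


def send(payload: dict, url: str = RPC_URL) -> dict:
    """POST the payload to the node and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wei_from_response(response: dict) -> int:
    """Decode the hex-encoded balance (in wei) from a successful response."""
    return int(response["result"], 16)
```

Calling wei_from_response(send(balance_at_block_request(address, height))) against a synced archival node should return the balance at that height; the same call against a pruned node is expected to fail with a missing-state error for old blocks.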

ARCHIVAL NODE SETUP

Prerequisites for Deployment

Before deploying a full archival node, ensure your system meets the hardware, software, and network requirements for reliable, long-term blockchain data storage.

Deploying an archival node requires a significant hardware commitment. Unlike a standard full node that only stores recent state, an archival node retains the complete history of the blockchain, including the state at every block. For networks like Ethereum, this means provisioning a machine with at least 32 GB of RAM, a multi-core CPU (8+ cores recommended), and, most critically, high-speed SSD storage sized for your client: 12+ TB for a Geth hash-based archive, considerably less for Erigon. The initial sync can take weeks, and insufficient I/O performance is the most common cause of failure.

The software stack is equally important. You will need a compatible execution client (e.g., Geth, Erigon, Nethermind) configured for archive mode and, on post-Merge Ethereum, a consensus client (e.g., Lighthouse, Prysm, Teku) paired with it. Ensure your operating system is up-to-date (Ubuntu 22.04 LTS is a common choice). If you build clients from source rather than using release binaries, install the latest stable toolchain your client requires, such as Go for Geth and Erigon or Rust for Lighthouse. Docker can simplify deployment but requires understanding container networking and volume management.

Network and infrastructure setup is crucial for stability. A stable, unmetered internet connection with high upload/download bandwidth is non-negotiable. You must configure your router to forward the clients' P2P ports over both TCP and UDP (e.g., 30303 for Geth and 9000 for Lighthouse; other consensus clients use different defaults, such as 13000/TCP and 12000/UDP for Prysm) and consider using a static IP or dynamic DNS service. For production deployments, implement monitoring with tools like Grafana and Prometheus to track sync status, disk usage, and memory consumption. Finally, house your hardware in a secure, climate-controlled environment to support 24/7 uptime.

FOUNDATION

Step 1: Hardware and OS Setup

Deploying a reliable archival node requires a robust hardware foundation and a stable operating system. This step details the minimum and recommended specifications to ensure your node can handle the full blockchain history and serve data efficiently.

An archival node stores the complete history of a blockchain, including every transaction and state change from the genesis block. This is far more resource-intensive than a standard full node, which only needs recent state to validate new blocks. For Ethereum, this means storing anywhere from a few terabytes (Erigon) to 12-20 TB or more (Geth's hash-based archive, as of early 2025) and requiring significant processing power to sync and serve historical queries. The primary hardware constraints are storage I/O speed, CPU performance, and RAM capacity.

We recommend using a dedicated server or high-performance cloud instance. For a production-grade Ethereum archival node, aim for:

CPU: 8+ physical cores (e.g., Intel Xeon or AMD EPYC)
RAM: 32 GB minimum, 64 GB recommended for better caching
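Storage: NVMe SSD sized for your client's archive database (a few TB for Erigon, 12+ TB for a Geth hash-based archive), optionally with a large HDD (e.g., 16+ TB) for Geth's ancient block data (--datadir.ancient) and backups. Fast NVMe storage is critical for sync performance.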
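Network: 1 Gbps dedicated connection with high monthly data transfer limits. A full sync downloads block data and recomputes state locally, so budget for several terabytes of transfer during the initial sync plus steady peer traffic afterwards.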

For the operating system, a Linux distribution is strongly advised due to its stability, performance, and extensive tooling support. Ubuntu 22.04 LTS or 24.04 LTS are popular choices with long-term support and large communities. You will need to be comfortable with the command line. Before installing any node software, ensure your system is updated (sudo apt update && sudo apt upgrade -y) and that you have essential build tools installed, like git, curl, and build-essential.
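
Before installing anything, it can help to confirm that the host actually meets these numbers. The following is a minimal preflight sketch using only the Python standard library; the default thresholds (12 TB free, 8 cores) are taken from this guide's Ethereum figures, and the function names are hypothetical. RAM is not checked here because the standard library has no portable call for it.

```python
import os
import shutil
from typing import List, Optional


def preflight_problems(
    free_bytes: int,
    cpu_count: Optional[int],
    min_free_tb: float = 12.0,  # assumed target: Geth-style Ethereum archive
    min_cores: int = 8,
) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    free_tb = free_bytes / 1e12  # decimal terabytes, as drive vendors count them
    if free_tb < min_free_tb:
        problems.append(f"only {free_tb:.1f} TB free, need {min_free_tb:.1f} TB")
    cores = cpu_count or 0  # os.cpu_count() may return None
    if cores < min_cores:
        problems.append(f"only {cores} CPU cores, need {min_cores}")
    return problems


def check_host(mount_point: str = "/") -> List[str]:
    """Run the checks against the filesystem that will hold the data directory."""
    usage = shutil.disk_usage(mount_point)
    return preflight_problems(usage.free, os.cpu_count())
```

Pass the mount point of the intended data drive (for example check_host("/mnt/ssd")) and lower min_free_tb if you plan to run a more storage-efficient client.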

SOFTWARE SETUP

Step 2: Install and Configure Client

This step covers downloading, installing, and performing the initial configuration of the Ethereum execution client software, which is the core engine of your archival node.

The first decision is selecting your execution client. For an archival node, the most common and robust choices are Geth (Go-Ethereum) or Erigon. Geth is the original and most widely used client, known for its stability. Erigon is a newer, more resource-efficient alternative that uses a different database structure, which can significantly reduce storage requirements for archival data. Both are written in Go and both are excellent choices; your selection may depend on your storage budget, the RPC namespaces you need, or specific performance goals. Download the latest stable release from the official GitHub repository for your chosen client.

Installation varies by operating system. For Linux, you typically download a pre-compiled binary, make it executable (chmod +x), and move it to a directory in your PATH. On macOS, you can install Geth with Homebrew (brew tap ethereum/ethereum, then brew install ethereum). For Windows, Geth provides an installer on its downloads page. After installation, verify it works by opening a terminal and running geth version or erigon --version. The output should show the client version, confirming the binary is correctly installed and accessible from your command line.

Before the first run, you must create a basic configuration. This is done via command-line flags or a configuration file passed with --config. For Geth, the critical flags are --syncmode full together with --gcmode archive, which disable state pruning and retain all historical state. Erigon has historically run as an archive by default, so the key there is to omit its --prune flags, which enable pruning rather than disable it; newer Erigon 3 releases select the mode with --prune.mode=archive, so check the documentation for your version. You must also specify your data directory with --datadir /path/to/your/chaindata. This folder will eventually contain multiple terabytes of data, so ensure it's on a drive with sufficient capacity and good I/O performance, like an NVMe SSD.

A minimal startup command for an archival Geth node would be: geth --syncmode full --gcmode archive --datadir /mnt/ssd/ethereum. For Erigon, you would use: erigon --datadir /mnt/ssd/erigon --chain mainnet. On post-Merge Ethereum, the execution client also needs a paired consensus client before it can follow the chain. At this stage, the node will attempt to connect to the Ethereum peer-to-peer network and begin syncing. The initial sync for an archival node is a lengthy process, taking days or weeks, as it downloads and executes every block and state change since the genesis block. Ensure your machine has a stable internet connection and is left running continuously.
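
If you launch the client from a script, the Geth archive flags discussed in this step can be assembled in one place so that none is forgotten. This is a sketch under stated assumptions: the helper name is hypothetical, the JWT secret path is a placeholder, and the HTTP settings (localhost only, eth/net/web3 namespaces) are an illustrative choice rather than a recommendation from the client's authors.

```python
import shlex
from typing import List


def geth_archive_command(datadir: str, jwt_secret_path: str) -> List[str]:
    """Assemble a Geth argument list for an archive node (illustrative sketch)."""
    return [
        "geth",
        "--syncmode", "full",      # execute every block from genesis
        "--gcmode", "archive",     # never prune historical state
        "--datadir", datadir,
        "--authrpc.jwtsecret", jwt_secret_path,  # shared with the consensus client
        "--http",                  # enable the JSON-RPC HTTP server
        "--http.addr", "127.0.0.1",
        "--http.api", "eth,net,web3",
    ]


def as_shell_line(command: List[str]) -> str:
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(command)
```

as_shell_line(geth_archive_command("/mnt/ssd/ethereum", "/path/to/jwt.hex")) produces a single line suitable for a terminal or a service unit's start command.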

HOW TO DEPLOY AN ARCHIVAL NODE

Step 3: Synchronization Strategies

Choosing the right synchronization mode is critical for node performance and data availability. This step explains the trade-offs between different strategies.

When initializing your archival node, you must select a synchronization mode. The modes you will encounter are full sync, snap sync, and the legacy fast sync. A full sync processes every block from genesis, verifying all transactions and executing all state transitions. This is the most secure and complete method, and the only one that produces every historical state, but it is extremely slow, often taking weeks for mature chains like Ethereum. Fast sync downloaded block data without executing it, fetched a recent state, and then switched to full block processing near the chain head, significantly reducing initial sync time; Geth has since removed it in favor of snap sync.

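Snap sync is the recommended strategy for ordinary full nodes, but it cannot by itself produce a complete archive. Instead of walking the state trie node by node, snap sync downloads contiguous ranges of recent state data from peers and verifies them against the state root, so it never executes the older blocks whose state an archive must retain. Clients such as Geth, Nethermind, and Besu implement it; Erigon uses its own staged sync instead. With Geth you enable it with the flag --syncmode snap during initial startup, and a mainnet sync can complete in days instead of weeks. The node will first sync the block headers, then fetch the state, and finally switch to full block processing. A node started this way with --gcmode archive only retains state from the sync point onward, so a complete archive still requires --syncmode full from genesis.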

The choice impacts your node's resource profile. A node syncing with snap sync requires substantial disk I/O and bandwidth during the initial phase but uses less CPU than a full sync, which is dominated by transaction execution and state writes. Ensure your hardware, particularly your SSD's write endurance and speed, can handle the sustained data ingestion, often hundreds of gigabytes written per day. After the initial sync, an archive-configured node keeps importing new blocks and retains the state for each one indefinitely.

You must also configure pruning settings, which are distinct from sync mode. An archival node retains all historical state, so you disable pruning entirely. In Geth, this means explicitly setting --gcmode=archive, because the default (--gcmode=full) prunes older state trie nodes. Misconfiguring this is a common error that results in a node that cannot serve historical data, defeating the purpose of an archival deployment.

For chains using consensus clients (like Ethereum's Beacon Chain), synchronization is separate. Your execution client (e.g., Geth) handles historical chain data and state, while your consensus client (e.g., Lighthouse) syncs the beacon chain. The consensus client usually performs a checkpoint sync, downloading a recent finalized state from a trusted endpoint, which is fast and secure. You then configure the two clients to communicate via the authenticated Engine API on localhost, using a JWT secret that both clients share. This parallel synchronization is essential for post-Merge Ethereum networks.
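
The Engine API connection between the two clients is authenticated with a shared secret: 32 random bytes, hex-encoded, stored in a file that both clients read. A minimal sketch for generating that file follows; the helper name and any path you pass in are assumptions for illustration.

```python
import os
import secrets
from pathlib import Path


def write_jwt_secret(path: str) -> str:
    """Create a hex-encoded 32-byte secret file for the Engine API and return it."""
    secret = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(secret)
    os.chmod(target, 0o600)  # restrict access to the owning user
    return secret
```

Point both clients at the same file, for example with --authrpc.jwtsecret on Geth and --execution-jwt on Lighthouse; other clients use their own flag names for the same purpose.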

Finally, monitor your sync progress using client logs and RPC methods. Call eth_syncing via RPC to check status. For Geth, logs will show block import speed and state download percentages. Initial sync is complete when eth_syncing returns false. Remember, an archival node's sync is never truly 'finished'—it must continuously import new blocks. Proper strategy selection balances initial deployment time with the long-term requirement to serve the entire history of the chain.
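
The eth_syncing check can be turned into a simple progress figure. The sketch below interprets the method's result, which is either false or an object with hex-encoded currentBlock and highestBlock fields; the function name is hypothetical, and fetching the result from the node is left out.

```python
from typing import Optional, Union


def sync_fraction(result: Union[bool, dict]) -> Optional[float]:
    """Return None when eth_syncing reports false, else currentBlock / highestBlock."""
    if result is False:
        return None  # the node does not consider itself to be syncing
    current = int(result["currentBlock"], 16)
    highest = int(result["highestBlock"], 16)
    if highest == 0:
        return 0.0  # no sync target known yet
    return current / highest
```

A false result can also appear before the node has found peers, so it is worth pairing this with a comparison of the node's head block against an independent source.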

MINIMUM VS. RECOMMENDED

Archival Node Hardware Requirements

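Hardware specifications for running a full archival node on networks like Ethereum, Polygon, and Solana. Storage capacities assume a storage-efficient client such as Erigon; a Geth hash-based archive of Ethereum mainnet needs 12 TB or more.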

Component	Minimum Spec	Recommended Spec	Enterprise / High-Performance
CPU Cores	4 Cores	8 Cores	16+ Cores
RAM	16 GB	32 GB	64 GB+
Storage Type	SSD (SATA)	NVMe SSD	NVMe SSD RAID 0
Storage Capacity	2 TB	4 TB	8 TB+
Network Bandwidth	100 Mbps	1 Gbps	10 Gbps
Uptime SLA
Estimated Sync Time	2-4 weeks	1-2 weeks	< 1 week
Annual Storage Growth	~1 TB	~1 TB	~1 TB

HOW TO DEPLOY AN ARCHIVAL NODE

Ongoing Maintenance and Monitoring

Deploying an archival node is a significant commitment requiring continuous oversight. This guide covers the essential maintenance tasks and monitoring strategies to ensure long-term stability and data integrity.

An archival node stores the complete history of a blockchain, including every transaction and state change. Unlike full nodes that prune old state, archival nodes are essential for services like block explorers, analytics platforms, and historical data queries. Deploying one requires robust hardware: typically a server with multiple terabytes of fast SSD storage (12+ TB for a Geth archive of Ethereum), 32+ GB of RAM, and a multi-core CPU. The initial sync can take days or weeks, consuming significant bandwidth and compute resources. Proper configuration of your client software (e.g., Geth's --syncmode full --gcmode archive for Ethereum) is the critical first step.

Ongoing maintenance is non-negotiable. This includes applying security patches to your operating system and node client, monitoring disk usage to prevent a full drive from crashing the node, and managing the growing chain data. For Ethereum, a Geth archive exceeds 12 TB and grows by roughly 150 GB per month; measure your own node's growth rate rather than relying on a fixed figure. Implement log rotation to manage client output and set up automated alerts for common failure states like stalled synchronization, high memory usage, or process crashes. A basic monitoring stack using Prometheus and Grafana can track metrics like peer count, block height, and resource utilization.
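
The disk-usage concern reduces to simple arithmetic: remaining usable space divided by the monthly growth rate. The sketch below works through that calculation; the function name and the 10% free-space reserve are assumptions for illustration, and the inputs in the usage note reuse the figures quoted above.

```python
def months_until_full(
    capacity_tb: float,
    used_tb: float,
    growth_gb_per_month: float,
    reserve_fraction: float = 0.1,  # assumed headroom kept free for compaction
) -> float:
    """Estimate months of runway before the data volume hits its usable limit."""
    usable_gb = capacity_tb * 1000 * (1 - reserve_fraction)  # decimal units
    remaining_gb = usable_gb - used_tb * 1000
    if remaining_gb <= 0:
        return 0.0  # already at or past the usable limit
    return remaining_gb / growth_gb_per_month
```

With a 16 TB volume, 12 TB used, and 150 GB of growth per month, months_until_full(16, 12, 150) gives about 16 months; setting reserve_fraction to 0 raises that to roughly 26.7. An alert tied to this number gives more warning than a fixed percentage threshold.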

Data integrity checks are vital. Periodically verify your node's chain data against public checkpoints or other trusted nodes. Separately, Geth's debug.chaindbCompact() console method compacts the underlying database to reclaim space; it optimizes storage but does not verify data. Ensure your backups are functional; while the chain data itself is reproducible, your node's configuration and validator keys (if applicable) must be securely backed up. Consider the operational costs: archival nodes have higher cloud hosting fees or electricity costs for on-premise hardware, and they require dedicated technical oversight. The Ethereum Foundation's documentation provides client-specific guidance for these tasks.

For production reliability, implement a structured response plan. Define procedures for client upgrades, which occasionally involve database migrations and, in rare cases, a re-sync. Use process managers like systemd or supervisor to automatically restart your node client after a crash or server reboot. Monitor your node's JSON-RPC endpoint for responsiveness if it serves API traffic. Finally, join community channels for your specific client to stay informed about urgent updates or known issues. A well-maintained archival node provides a reliable, sovereign source of blockchain truth, forming the backbone for higher-level applications and research.

GUIDE RESOURCES

Essential Resources and Documentation

These resources cover the concrete steps, software choices, and operational requirements for deploying an archival node. Each card focuses on a specific decision or setup task developers face when storing complete blockchain state and history.

Client Software for Archival Nodes

Running an archival node requires using a full client configured to retain all historical state, not just recent data. Most Ethereum-compatible clients support this mode but require explicit configuration.

Execution clients like Geth, Nethermind, Erigon, and Besu support archival modes with different tradeoffs in disk usage and sync speed
Consensus clients like Lighthouse, Prysm, Teku, or Nimbus must be paired correctly for post-Merge networks
Example: Geth requires --gcmode=archive at startup, while Erigon runs in archival mode by default
Client choice affects storage footprint, API performance, and resync complexity

This decision should be made before syncing, as switching to archival later often requires a full resync from genesis.

Hardware and Storage Requirements

Archival nodes are storage-intensive and I/O-bound, especially on Ethereum and other high-activity chains.

Ethereum archival nodes typically require 15–20 TB of NVMe SSD storage depending on client and pruning strategy
Minimum practical specs include 64–128 GB RAM and high sustained disk write throughput
Cloud instances often become cost-prohibitive unless storage-optimized bare metal is used
RAID configurations or redundant volumes reduce downtime during disk failures

Under-provisioned hardware leads to slow RPC responses, frequent crashes, and corrupted databases. Planning for long-term growth is critical since historical state grows continuously.

Genesis Sync and Snapshot Options

Archival nodes can sync from genesis or bootstrap from verified snapshots, each with tradeoffs.

Genesis sync verifies every block and state transition but can take weeks on large networks
Snapshot-based sync dramatically reduces initial setup time but introduces trust assumptions
Some clients provide official snapshot tooling, while others rely on community mirrors
Always verify snapshot checksums and match client versions exactly

For production or research-critical environments, teams often combine snapshot bootstrapping with state verification to balance speed and security.
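
Checksum verification of a downloaded snapshot can be scripted in a few lines. This sketch assumes the provider publishes a SHA-256 digest alongside the archive, which varies by provider; the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-terabyte snapshots fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Run the check before extracting the snapshot and discard the download if it returns False. A matching checksum only proves the file is the one the provider published, so it does not remove the trust assumption noted above.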

RPC Configuration and API Exposure

Archival nodes are commonly used to serve historical RPC queries that pruned nodes cannot answer.

Enable JSON-RPC modules like eth, debug, and trace based on use case
Restrict exposed APIs using firewall rules or authenticated gateways
High-cost methods like debug_traceTransaction, trace_filter, or eth_getLogs over wide block ranges can saturate resources
Load balancers and rate limits are recommended for shared environments

Misconfigured RPC exposure is one of the most common causes of node instability and unplanned downtime.
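
One way to restrict exposed APIs at a gateway is a namespace allowlist applied before a request reaches the node. The sketch below shows the core check only; the allowed set, the function names, and the choice of error code (-32601, the standard JSON-RPC "method not found" code) are assumptions for illustration, and a real gateway would add authentication and rate limiting around it.

```python
from typing import Optional, Tuple

# Assumed public set; debug and trace namespaces stay internal in this sketch.
ALLOWED_NAMESPACES: Tuple[str, ...] = ("eth", "net", "web3")


def is_method_allowed(method: str, allowed: Tuple[str, ...] = ALLOWED_NAMESPACES) -> bool:
    """Accept only methods of the form <namespace>_<name> with an allowed namespace."""
    namespace, separator, name = method.partition("_")
    return bool(separator and name) and namespace in allowed


def filter_request(payload: dict) -> Optional[dict]:
    """Return None to forward the request, or a JSON-RPC error to send back."""
    if is_method_allowed(str(payload.get("method", ""))):
        return None
    return {
        "jsonrpc": "2.0",
        "id": payload.get("id"),
        "error": {"code": -32601, "message": "method not allowed"},
    }
```

Filtering by namespace keeps expensive tracing calls away from public traffic while leaving ordinary eth_ queries untouched.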

Monitoring, Backups, and Maintenance

Long-lived archival nodes require ongoing operational maintenance to remain reliable.

Monitor disk usage, database growth rate, and peer count continuously
Schedule regular offline backups or filesystem-level snapshots
Track client release notes for consensus changes and breaking database upgrades
Test recovery procedures on backup data periodically

Without proactive monitoring and backups, a single disk failure or corruption event can result in weeks of resync time and data loss.

ARCHIVAL NODE DEPLOYMENT

Frequently Asked Questions

Common technical questions and solutions for deploying and maintaining a blockchain archival node.

What is the difference between a full node and an archival node?

A full node validates new blocks and transactions, storing the block history plus the current state (e.g., account balances) and a short window of recent states (typically the last 128 blocks on Ethereum). An archival node retains the complete historical data, including all past states for every block. This enables queries about the chain's state at any historical point, which is essential for block explorers, analytics platforms, and certain developer tools.

Key differences:

Storage: Full nodes require ~1-2 TB; archival nodes can require 10+ TB and grow continuously.
Function: Both validate the chain; full nodes serve current-state queries, while archival nodes additionally serve historical state at any block.
Use Case: Use a full node for validating transactions. Use an archival node for debugging old contracts, complex analytics, or running services like The Graph.

POST-DEPLOYMENT

Next Steps and Verification

After your archival node is running, you must verify its health, configure it for production, and set up monitoring to ensure long-term reliability and data integrity.

First, confirm your node is fully synchronized and serving data correctly. Use the client's built-in JSON-RPC endpoint to query the latest block number and compare it against a public block explorer like Etherscan for Ethereum or PolygonScan for Polygon. A significant lag indicates a sync issue. You should also test core RPC methods such as eth_getBlockByNumber and eth_getBalance at old block heights, plus tracing endpoints if enabled (trace_* on Erigon and Nethermind, debug_trace* on Geth), to ensure historical data is accessible. Consistent failures or timeouts may point to insufficient disk I/O, memory constraints, or a corrupted database.
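
The block-height comparison can be expressed as a small check on the two hex values returned by eth_blockNumber, one from your node and one from a reference source. The function names and the five-block tolerance are assumptions; pick a tolerance that suits the chain's block time.

```python
def head_lag(local_head_hex: str, reference_head_hex: str) -> int:
    """Blocks by which the local node trails the reference (negative if ahead)."""
    return int(reference_head_hex, 16) - int(local_head_hex, 16)


def is_in_sync(local_head_hex: str, reference_head_hex: str, max_lag_blocks: int = 5) -> bool:
    """Treat a small lag as normal propagation delay rather than a sync fault."""
    return head_lag(local_head_hex, reference_head_hex) <= max_lag_blocks
```

A lag that keeps growing between successive checks is a stronger signal of trouble than any single reading.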

For production readiness, implement essential configurations. This includes setting up a firewall to restrict RPC access to trusted IPs, configuring your client (e.g., Geth, Erigon, Nethermind) to rotate logs and tune cache size for optimal performance, and enabling metrics export for Prometheus. If your node will serve public traffic, consider placing it behind a load balancer and using a reverse proxy like Nginx for SSL termination and rate limiting. Ensure your systemd service file is configured to restart on failure and that its output goes to the journal, where you can inspect it with journalctl.

Proactive monitoring is non-negotiable for maintaining an archival node. Set up alerts for key metrics: disk space (archival chains can grow by a terabyte or more per year), memory usage, CPU load, and sync status. Tools like Grafana with Prometheus or the Tenderduty alerting daemon for Tendermint chains can automate this. Also monitor peer count and network latency. Regularly verify data integrity by running a light client or a second "verifier" node in a different availability zone to cross-check block hashes, though this is resource-intensive.
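
The cross-check against a verifier node amounts to sampling block heights and comparing the hash each node reports. The sketch below keeps the comparison separate from the transport: the two fetch callables are hypothetical and would typically wrap eth_getBlockByNumber and return the block's hash field.

```python
from typing import Callable, Iterable, List


def mismatched_heights(
    heights: Iterable[int],
    fetch_primary: Callable[[int], str],
    fetch_verifier: Callable[[int], str],
) -> List[int]:
    """Return the sampled heights at which the two nodes report different hashes."""
    return [
        height
        for height in heights
        if fetch_primary(height).lower() != fetch_verifier(height).lower()
    ]
```

Sampling a few hundred heights spread across the chain's history is far cheaper than a full comparison. Any mismatch well below the chain head deserves investigation, while mismatches within a few blocks of the tip may simply reflect a short-lived fork.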

Finally, establish a maintenance routine. Client software receives frequent updates for security patches and performance improvements. Plan for scheduled downtime to apply these upgrades. Have a documented disaster recovery process, including tested backups of your node's data directory or, at minimum, a recorded source and checksum for a trusted snapshot. A healthy archival node is characterized by consistent uptime, accurate data, and the ability to serve complex historical queries without degrading performance for the wider network.