An archive client (or archive node) is a full node that retains the entire historical state of a blockchain, as opposed to a pruned node which discards old state data to save space. This means it stores every single block, transaction, and crucially, the world state (account balances, smart contract storage, etc.) at every point in the chain's history. This comprehensive data persistence is essential for services requiring deep historical access, such as block explorers, advanced analytics platforms, and certain developer tools that need to query the state of the network at any arbitrary block height.
Archive Client
What is an Archive Client?
An archive client is a specialized type of blockchain node that maintains a complete, unpruned historical record of the network, including all past states and transactions, enabling deep historical queries and analysis.
The primary function of an archive client is to serve historical data queries that are impossible for standard full nodes. For example, answering questions like "What was the balance of this address at block 15,000,000?" or "What was the internal state of this smart contract three months ago?" requires access to the historical state trie. Running an archive node demands significantly more storage and computational resources; for major networks like Ethereum, this can require multiple terabytes of SSD storage, making it an infrastructure choice typically reserved for data providers, exchanges, and institutional analysts rather than individual users.
Key technical components of an archive client include the state trie and its historical roots. While a full node only needs the current state (plus a short window of recent states) to validate new blocks, an archive node keeps the trie data behind every historical state root, allowing it to reconstruct any past state. This is typically enabled through an archive mode in clients such as Geth (--gcmode=archive) or Erigon. The historical data is kept in the client's on-disk database and can be retrieved via JSON-RPC methods such as eth_getBalance or eth_getStorageAt by passing a specific block number as the block parameter.
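As a minimal sketch of such a query, the Python below builds an eth_getBalance request pinned to a past block and decodes the reply. It assumes a standard Ethereum JSON-RPC endpoint; the helper names are hypothetical, and sending the payload (for example with an HTTP POST) is left out so the example stays offline.

```python
import json


def build_get_balance_request(address: str, block_number: int, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 payload for eth_getBalance at a specific past block."""
    if block_number < 0:
        raise ValueError("block_number must be non-negative")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        # The block parameter is a hex-encoded quantity, e.g. 15000000 -> "0xe4e1c0".
        "params": [address, hex(block_number)],
    }


def parse_balance_response(raw: str) -> int:
    """Return the balance in wei, or raise if the node answered with an error.

    A pruned node asked about an old block usually replies with an error object,
    because the state for that block is no longer on disk.
    """
    response = json.loads(raw)
    if "error" in response:
        raise RuntimeError(f"node error: {response['error'].get('message')}")
    return int(response["result"], 16)


request = build_get_balance_request("0x" + "00" * 20, 15_000_000)
print(json.dumps(request))
```

Block numbers and balances are hex-encoded quantities in Ethereum JSON-RPC, which is why the sketch converts with hex() and int(..., 16).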
The utility of archive nodes extends to several critical use cases:
- Blockchain explorers (like Etherscan) rely on them to display historical transactions and states.
- Analytics and indexing services (like Dune Analytics and The Graph) use them to build comprehensive datasets.
- Auditors and forensic analysts need them to investigate events or verify historical claims.
- Some DeFi protocols may require historical state access for specific functions or dispute resolution.
Without archive nodes, historical state could only be recovered by replaying the chain from genesis, making it far harder to audit or analyze the network's complete history.
It's important to distinguish an archive client from other node types. A light client syncs only block headers for basic verification. A full node validates all transactions and maintains recent state for validation. An archive node is a superset of a full node, adding the persistent historical state. As blockchain data grows, networks and services are exploring alternative ways to serve historical data: Ethereum's Portal Network aims to distribute historical data across a peer-to-peer network, while hosted providers (e.g., Infura) offer archive data access through their RPC endpoints, so users do not need to run their own resource-intensive node.
How Does an Archive Client Work?
An archive client (or archive node) is a specialized type of full node that retains the complete historical state of a blockchain at every single block. Unlike a standard full node, which only stores recent state data to validate new transactions, an archive node preserves the entire history, including the state (account balances, contract storage, etc.) for every block height since genesis. This is achieved by persistently storing all intermediate state roots and tries (like Ethereum's Merkle Patricia Trie) rather than pruning them. The primary function is to serve complex historical queries, such as "What was the balance of this address at block 5,000,000?" which a pruned node cannot answer.
The operational mechanism involves two key components: the execution client (e.g., Geth, Erigon) and the consensus client. The execution client processes transactions and manages the state trie. In archive mode, it is configured to disable state pruning entirely, writing every state change to its database. This results in dramatically larger storage requirements that keep growing with chain history, often more than ten terabytes depending on the client, compared to a few hundred gigabytes to about a terabyte for a pruned node. Services like block explorers (Etherscan), analytics platforms (Dune Analytics), and certain indexers rely on archive nodes to fetch historical data for their APIs and dashboards, as they require access to state information from any point in the chain's history.
Deploying and maintaining an archive node presents significant infrastructure challenges. The massive storage footprint requires high-performance SSDs or specialized hardware to manage input/output operations. Synchronization from genesis in archive mode is an extremely slow process that can take weeks, leading many operators to use snapshots from trusted providers to bootstrap. Furthermore, the resource intensity makes running a personal archive node impractical for most users, creating a reliance on centralized infrastructure providers. This centralization concern is partially addressed by decentralized RPC networks and services that pool access to archive data, though the underlying node operation remains resource-intensive.
The distinction between archive nodes and other node types is critical for developers. A full node validates the latest chain state and recent history but prunes older state data. A light client only downloads block headers for verification, relying on full nodes for data. An archive node is the only type that provides the complete historical state. For applications like auditing, complex DeFi analytics, or recalculating historical token distributions, direct access to an archive node's RPC endpoint is often essential. Without it, developers must rely on third-party APIs, which can introduce latency, cost, and points of failure.
In the Ethereum ecosystem post-Merge, archive functionality is typically provided by execution clients like Erigon (which uses a flat storage model optimized for historical queries) and Nethermind. Users interact with them via standard JSON-RPC methods such as eth_getBalance or eth_getStorageAt with a specific block number parameter. The emergence of Ethereum's Portal Network aims to create a more decentralized way to access historical data, potentially reducing the infrastructural burden of traditional archive nodes. However, for the foreseeable future, dedicated archive clients remain the backbone for any service requiring guaranteed, low-level access to the blockchain's entire historical record.
Key Features of an Archive Client
Complete Historical State
Unlike a full node, which only stores recent state to validate new blocks, an archive client retains the entire state history (account balances, contract storage, etc.) for every single block since genesis. This enables querying the state of the blockchain at any past block height.
- Example: Finding an account's ETH balance on January 1, 2021.
- Requirement: Massive storage, often multiple terabytes.
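Turning a calendar date such as January 1, 2021 into a block height is a separate step. One common approach, sketched here under the assumption that a hypothetical get_timestamp callback returns a block's header timestamp (e.g., read from eth_getBlockByNumber), is a binary search over block numbers. Header timestamps are available from any full node; the archive node is only needed for the subsequent state query at the block found.

```python
from typing import Callable


def last_block_at_or_before(target_ts: int, head: int,
                            get_timestamp: Callable[[int], int]) -> int:
    """Binary-search for the highest block whose timestamp is <= target_ts.

    get_timestamp(n) is a hypothetical callback returning block n's header
    timestamp. The search relies on timestamps increasing with block number.
    """
    if get_timestamp(0) > target_ts:
        raise ValueError("target timestamp precedes the genesis block")
    lo, hi = 0, head
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always narrows the range
        if get_timestamp(mid) <= target_ts:
            lo = mid  # mid is a valid candidate; look for a later one
        else:
            hi = mid - 1  # mid is too late; discard it and everything after it
    return lo


def fake_timestamps(n: int) -> int:
    # Hypothetical chain for illustration: genesis at t=1000, one block every 12 seconds.
    return 1000 + 12 * n


print(last_block_at_or_before(1605, head=100, get_timestamp=fake_timestamps))  # prints 50
```

For the example above, the target would be the Unix timestamp 1609459200 (2021-01-01 00:00:00 UTC).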
State Trie Pruning Disabled
To save space, standard nodes use state trie pruning, deleting old state data that is no longer needed for validating new blocks. An archive client disables this pruning mechanism. It maintains all intermediate Merkle Patricia Trie nodes, allowing it to cryptographically prove any historical state.
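The difference between pruning and archive retention can be illustrated with a deliberately simplified model. The sketch below is a toy (it copies a flat dictionary per block rather than maintaining a Merkle Patricia Trie); retain=None mimics an archive node, while retain=128 mimics a node that keeps only the most recent 128 blocks of state.

```python
from typing import Dict, Optional


class ToyStateStore:
    """Toy model of per-block state retention (a flat dict per block, not a trie).

    retain=None keeps every block's state, like an archive node.
    retain=N keeps only the N most recent blocks of state, like a pruned node.
    """

    def __init__(self, retain: Optional[int] = None) -> None:
        if retain is not None and retain < 1:
            raise ValueError("retain must be at least 1")
        self.retain = retain
        self.snapshots: Dict[int, Dict[str, int]] = {}
        self.head = -1

    def commit(self, block: int, changes: Dict[str, int]) -> None:
        # The new state is the previous head state with this block's changes applied.
        previous = self.snapshots.get(self.head, {})
        self.snapshots[block] = {**previous, **changes}
        self.head = block
        if self.retain is not None:
            # Pruning: discard snapshots that fall outside the retention window.
            for old in [b for b in self.snapshots if b <= block - self.retain]:
                del self.snapshots[old]

    def balance_at(self, account: str, block: int) -> int:
        if block not in self.snapshots:
            raise KeyError(f"state for block {block} is not available (pruned)")
        return self.snapshots[block].get(account, 0)


archive, pruned = ToyStateStore(), ToyStateStore(retain=128)
for n in range(300):
    archive.commit(n, {"alice": n})
    pruned.commit(n, {"alice": n})
print(archive.balance_at("alice", 10))  # historical query succeeds on the archive store
```

Asking the pruned store for block 10 raises KeyError, which is the toy counterpart of a pruned node rejecting a historical query.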
Enabler for Advanced Indexing
Archive nodes are the foundational data source for block explorers, analytics platforms, and indexing services like The Graph. They allow these services to efficiently answer complex historical questions without needing to replay the entire chain from scratch.
- Use Case: Calculating total DEX volume for a specific token over a 6-month period.
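Many historical analyses start by reading a value at evenly spaced past blocks. The sketch below assumes a hypothetical read_at(block) callback (for instance, wrapping eth_getBalance for a pool contract) and reports the change between samples; for blocks outside a pruned node's recent window, that callback needs an archive node behind it.

```python
from typing import Callable, List, Tuple


def sample_history(start: int, end: int, step: int,
                   read_at: Callable[[int], int]) -> List[Tuple[int, int, int]]:
    """Read a value at evenly spaced blocks and report the change between samples.

    read_at(block) is a hypothetical callback, e.g. an eth_getBalance call pinned
    to `block`. Returns (block, value, change_since_previous_sample) tuples.
    """
    if step <= 0 or end < start:
        raise ValueError("need step > 0 and end >= start")
    samples: List[Tuple[int, int, int]] = []
    previous = None
    for block in range(start, end + 1, step):
        value = read_at(block)
        change = 0 if previous is None else value - previous
        samples.append((block, value, change))
        previous = value
    return samples


def fake_read(block: int) -> int:
    # Stand-in for a historical balance lookup: grows by 2 units per block.
    return 2 * block


print(sample_history(0, 10, 5, fake_read))
```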
High Resource Requirements
Running an archive node demands significant and growing resources.
- Storage: Can exceed 10 TB for mature chains like Ethereum, depending on the client.
- Memory: Requires ample RAM for efficient state access.
- Sync Time: Initial synchronization can take weeks, as it processes every transaction in history.
JSON-RPC Endpoints for History
Archive clients expose the same JSON-RPC API as other nodes; the difference lies in which queries they can answer. The eth_getBalance, eth_getStorageAt, and eth_call methods accept a block number parameter, and an archive node can execute them against a block from the distant past, returning the state as it was at that time. A pruned node typically returns an error for blocks outside its recent-state window.
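As an illustration, the three methods can be sent together as a JSON-RPC 2.0 batch pinned to the same block, so the results describe one consistent historical state. This Python sketch only builds the payload; the addresses, storage slot, and call data are placeholders chosen for illustration (0x18160ddd is the selector of the ERC-20 totalSupply() function).

```python
import json
from typing import Any, Dict, List


def historical_batch(block_number: int, account: str, contract: str,
                     slot: int, call_data: str) -> List[Dict[str, Any]]:
    """Build a JSON-RPC 2.0 batch that reads several values at one past block."""
    block = hex(block_number)  # every request is pinned to the same block
    requests = [
        ("eth_getBalance", [account, block]),
        # Storage positions are hex-encoded as well.
        ("eth_getStorageAt", [contract, hex(slot), block]),
        # eth_call runs a read-only call against the state at `block`.
        ("eth_call", [{"to": contract, "data": call_data}, block]),
    ]
    return [
        {"jsonrpc": "2.0", "id": i, "method": method, "params": params}
        for i, (method, params) in enumerate(requests, start=1)
    ]


# Placeholder addresses; 0x18160ddd is the ERC-20 totalSupply() selector.
batch = historical_batch(12_000_000, "0x" + "aa" * 20, "0x" + "bb" * 20, 0, "0x18160ddd")
print(json.dumps(batch, indent=2))
```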
Comparison: Full vs. Archive
Full Node:
- Validates new blocks and transactions.
- Stores only recent state (pruned).
- ~500 GB - 1 TB storage.
Archive Node:
- Validates new blocks and transactions.
- Stores all historical state (unpruned).
- 2 TB - 15+ TB storage.
- Enables deep historical queries.
Archive Client vs. Full Node vs. Light Client
A comparison of the three primary node types in Ethereum, defined by their data storage and validation capabilities.
| Feature / Metric | Archive Client | Full Node | Light Client |
|---|---|---|---|
| Data Storage | Entire history (all states) | Block data plus state for the most recent ~128 blocks (older state pruned) | Block headers only |
| Initial Sync Time | Weeks (5+ TB) | Days (~650 GB) | Minutes (< 1 GB) |
| Hardware Requirements | High (16+ GB RAM, fast SSD) | Moderate (8+ GB RAM, fast SSD) | Low (mobile device capable) |
| Network Validation | Full historical validation | Full recent validation | Probabilistic validation |
| Serves Historical Data | Yes (state at any block) | Recent state only | No |
| Trust Assumption | Trustless (self-validating) | Trustless (self-validating) | Trusts a full node for data |
| Primary Use Case | Block explorers, analytics, indexers | dApp infrastructure, staking | Mobile wallets, quick queries |
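The storage row above implies a simple rule of thumb for deciding which node a state query needs. The sketch below assumes the ~128-block recent-state window used in the table; the real window differs between clients and configurations, so the threshold is a parameter rather than a constant, and the function name is hypothetical.

```python
def node_needed_for_state_query(head: int, target: int, recent_window: int = 128) -> str:
    """Return "full" if a pruned full node should still hold state for `target`,
    otherwise "archive".

    Assumes the node keeps state for the `recent_window` most recent blocks
    (head included); actual retention depends on the client and its settings.
    """
    if not 0 <= target <= head:
        raise ValueError("target must be between genesis and the current head")
    return "full" if head - target < recent_window else "archive"


print(node_needed_for_state_query(head=20_000_000, target=15_000_000))  # prints archive
```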
Primary Use Cases
Historical Data Analysis & Auditing
Enables forensic analysis of on-chain activity by providing access to the complete historical state. This is essential for:
- Auditing smart contracts and tracking fund flows over time.
- Compliance reporting for regulatory requirements.
- Investigating security incidents by reconstructing the exact state of the blockchain at any past block.
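For incident investigation, one basic step is comparing contract storage just before and just after a suspicious block. In this sketch, read_slot(slot, block) is a hypothetical wrapper around eth_getStorageAt for a single contract; because the state "at block N" is the state after block N executes, the comparison reads blocks N - 1 and N.

```python
from typing import Callable, Dict, Iterable, Tuple


def storage_diff(read_slot: Callable[[int, int], str], slots: Iterable[int],
                 incident_block: int) -> Dict[int, Tuple[str, str]]:
    """Compare storage slots before and after `incident_block` executed.

    read_slot(slot, block) is a hypothetical wrapper around eth_getStorageAt for
    one contract. Returns {slot: (before, after)} for every slot that changed.
    """
    changed: Dict[int, Tuple[str, str]] = {}
    for slot in slots:
        before = read_slot(slot, incident_block - 1)  # state before the block ran
        after = read_slot(slot, incident_block)       # state after the block ran
        if before != after:
            changed[slot] = (before, after)
    return changed


# Hypothetical storage snapshots standing in for archive-node responses.
snapshots = {99: {0: "0x1", 1: "0x5"}, 100: {0: "0x1", 1: "0x0"}}


def fake_read_slot(slot: int, block: int) -> str:
    return snapshots[block].get(slot, "0x0")


print(storage_diff(fake_read_slot, [0, 1, 2], incident_block=100))
```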
Advanced Blockchain Indexing
Powers data infrastructure for applications requiring complex historical queries. Indexers and APIs (like The Graph) rely on archive nodes to:
- Build and serve historical data feeds for dApps.
- Enable queries for user balances or contract interactions at any point in history.
- Support analytics platforms and blockchain explorers with deep historical data.
Developer Tooling & Testing
Critical for developers building and debugging decentralized applications. Provides the ability to:
- Fork the mainnet at a specific historical block for testing in a local environment (e.g., using Hardhat or Ganache).
- Accurately simulate complex transactions that depend on past state.
- Verify the behavior of smart contracts against historical events.
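With Hardhat, re-forking at a specific block can be requested through the hardhat_reset JSON-RPC method of a running Hardhat Network node. The Python sketch below only builds that request; the archive endpoint URL is a placeholder, and forking at an old block generally requires that endpoint to be an archive node.

```python
import json
from typing import Any, Dict


def hardhat_fork_request(archive_rpc_url: str, block_number: int) -> Dict[str, Any]:
    """Build a hardhat_reset request that re-forks the chain at a past block."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "hardhat_reset",
        "params": [{
            "forking": {
                "jsonRpcUrl": archive_rpc_url,  # placeholder archive endpoint
                "blockNumber": block_number,    # block to fork from
            }
        }],
    }


request = hardhat_fork_request("https://archive.example.invalid", 14_000_000)
print(json.dumps(request))
```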
Research & Protocol Development
Supports academic and protocol-level research by offering a verifiable, complete dataset. Researchers use archive clients to:
- Analyze long-term network metrics, fee markets, and usage patterns.
- Model and test proposed protocol upgrades (EIPs) against real historical data.
- Conduct economic studies of DeFi protocols and token distributions from genesis.
Data Archival & Preservation
Serves as a complete, verifiable copy of the blockchain's entire history. This function is vital for:
- Network resilience and decentralization, ensuring historical data isn't lost.
- Creating permanent backups of chain state for disaster recovery.
- Enabling future state pruning experiments on full nodes, knowing a complete archive exists elsewhere.
Comparison to Full & Light Nodes
Highlights the specialized role of an archive client versus other node types.
- Full Node: Stores recent state to validate new blocks; prunes old state to save space.
- Light Node: Stores only block headers; relies on full nodes for current state data.
- Archive Node (Client): Stores all historical state generated since genesis, requiring significantly more storage (e.g., 10+ TB for Ethereum).
Ecosystem Usage & Providers
Core Function: Full Historical State
Unlike a standard full node, which stores block data but retains only recent state, an archive client maintains the complete historical state for every single block since genesis. This includes the balance, code, and storage of every account at any point in history, enabling complex queries like "What was the balance of this address at block 15,000,000?"
Primary Use Cases
Archive nodes are essential infrastructure for services requiring deep historical data:
- Block Explorers: To display transaction history and state changes for any block.
- Analytics Platforms: For calculating historical metrics, token flows, and protocol growth.
- Developer Tools: To test smart contracts against past states or debug historical transactions.
- Indexers: As the data source for building off-chain indexes (e.g., The Graph).
Technical Trade-offs: Storage & Sync
Running an archive client requires massive storage (often multiple terabytes) and a lengthy initial synchronization period that can take weeks. For Ethereum, an archive Geth node requires over 12 TB of SSD storage. This is the primary reason most developers and projects use managed RPC providers instead of self-hosting.
Ethereum Client Examples
The major Ethereum execution clients can be run in archive mode:
- Geth: Use the --gcmode archive flag (together with --syncmode full).
- Nethermind: Configured via the Sync.SnapSync and Pruning settings (with state pruning disabled).
- Erigon: Designed for efficient archive storage, using a "flat" database model to reduce the footprint.
- Besu: Configured for full sync with Forest storage (--sync-mode=FULL and --data-storage-format=FOREST); the default Bonsai format retains only recent state.
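As a sketch, the command lines for two of these clients can be assembled as follows. The Geth flags are --syncmode full and --gcmode archive; for Besu, the sketch assumes full sync with Forest storage, which is how Besu's documentation describes running an archive node. Flags change between releases, so check each client's current documentation before relying on them; Nethermind and Erigon are omitted because their archive options have varied across versions.

```python
import shlex
from typing import List

# Archive-related flags per client, as assumed in the lead-in above.
ARCHIVE_FLAGS = {
    "geth": ["--syncmode", "full", "--gcmode", "archive"],
    "besu": ["--sync-mode=FULL", "--data-storage-format=FOREST"],
}

# Each client names its data-directory option differently.
DATADIR_FLAG = {"geth": "--datadir", "besu": "--data-path"}


def archive_command(client: str, datadir: str) -> List[str]:
    """Return an argv list for starting `client` in archive mode."""
    if client not in ARCHIVE_FLAGS:
        raise ValueError(f"no archive flags recorded for {client!r}")
    return [client, DATADIR_FLAG[client], datadir, *ARCHIVE_FLAGS[client]]


print(shlex.join(archive_command("geth", "/data/geth")))
```

The returned list can be passed directly to subprocess.run to launch the client.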
Comparison: Full vs. Archive vs. Light
- Full Node: Stores block data but retains state only for recent blocks (~128 for Ethereum). Can verify new transactions.
- Archive Node: A full node plus the entire historical state. Can answer any historical query.
- Light Client: Stores only block headers. Relies on full nodes for data. Minimal resource use.
Archive nodes are the most resource-intensive but offer the highest data completeness.
Archive Client
A specialized node software designed for historical data retrieval and long-term storage of the entire blockchain state.
An archive client is a full node that retains the complete historical state of a blockchain, including the state (account balances, contract storage) for every block since genesis, rather than pruning this data to save disk space. This makes it an essential infrastructure component for services requiring deep historical queries, such as block explorers, analytics platforms, and certain developer tools that need to verify or analyze past states without replaying the entire chain. Unlike a standard full node, which may only keep recent state data, an archive node's storage requirements grow linearly with the chain's age and activity.
The implementation of an archive client involves maintaining a persistent state trie (e.g., a Merkle Patricia Trie in Ethereum) and storing all intermediate state roots. When a block is processed, the client does not discard the previous state but preserves it, indexed by its block hash or number. This is computationally and storage-intensive, often requiring terabytes of space and significant I/O resources. Clients like Geth (run with --syncmode full --gcmode archive), Erigon, and Nethermind offer archive modes, each with different optimizations for data retrieval and storage efficiency.
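Keeping every historical state does not mean storing a full copy per block: trie nodes are addressed by their hashes, so unchanged data is shared between states. The toy below illustrates that idea with a flat list of content-addressed leaves per block; it is not a Merkle Patricia Trie, and it uses SHA-256 instead of Ethereum's Keccak-256 purely to stay within the Python standard library.

```python
import hashlib
import json
from typing import Dict, List


class ContentAddressedArchive:
    """Toy archive store: per-block states built from shared, hash-addressed leaves.

    Not a Merkle Patricia Trie; each block's "root" is a hash over a flat list of leaves.
    """

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {}          # leaf hash -> encoded leaf
        self.leaf_sets: Dict[str, List[str]] = {}  # root -> ordered leaf hashes
        self.roots: Dict[int, str] = {}            # block number -> root

    def _put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.nodes.setdefault(key, data)  # identical content is stored only once
        return key

    def commit(self, block: int, state: Dict[str, int]) -> str:
        leaves = [self._put(json.dumps([k, v]).encode()) for k, v in sorted(state.items())]
        root = hashlib.sha256("".join(leaves).encode()).hexdigest()
        self.leaf_sets[root] = leaves
        self.roots[block] = root  # nothing is pruned: every block keeps its root
        return root

    def state_at(self, block: int) -> Dict[str, int]:
        return dict(json.loads(self.nodes[h]) for h in self.leaf_sets[self.roots[block]])


store = ContentAddressedArchive()
store.commit(1, {"alice": 5, "bob": 7})
store.commit(2, {"alice": 5, "bob": 9})  # only bob's leaf changes
print(len(store.nodes), store.state_at(1))
```

Two blocks produce only three stored leaves because alice's unchanged record is shared, yet both historical states remain readable.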
A primary use case for an archive client is enabling direct queries for an account's balance or a smart contract's storage slot at any arbitrary block height in the past, which is impossible on a pruned node. They are critical for indexing services, historical analytics, and dispute resolution in layer-2 systems that require cryptographic proofs of past states. Running an archive node is often a prerequisite for operating services like The Graph's indexing nodes or for developers needing to test complex interactions against historical mainnet data in a local environment.
From a network health perspective, archive nodes serve as a decentralized backbone for historical data availability, ensuring the blockchain's full history remains accessible and verifiable. While not required for consensus or basic transaction propagation, they provide a public good for the ecosystem. Users typically interact with archive nodes indirectly through RPC endpoints provided by infrastructure services like Infura, Alchemy, or QuickNode, which abstract away the complexity and cost of maintaining such nodes.
Frequently Asked Questions (FAQ)
Common questions about archive clients, their critical role in blockchain infrastructure, and how they differ from other node types.
An archive client is a type of blockchain node that maintains a complete historical record of the network's state for every single block since genesis. Unlike a full node, which only stores recent state to validate new blocks, an archive node retains the entire history, including all intermediate states, account balances, and contract storage at every point in time. This makes it essential for services requiring deep historical data analysis, such as block explorers, advanced analytics platforms, and certain developer tools that need to query past states. Running an archive node requires significantly more storage and resources than a standard full node.