How to Set Up Pruning Strategies for Blockchain Nodes

A practical guide to implementing pruning strategies for Ethereum, Bitcoin, and other nodes to manage disk space and improve sync times.

Blockchain pruning is the process of selectively removing historical data from a node's storage while preserving the information needed to validate new transactions and blocks. A full archival node stores the entire history of the chain. For Ethereum that runs to several terabytes (over 12 TB for a Geth archive node), and for Bitcoin the raw block data is over 500 GB. Pruning lets a node run as a pruned full node that keeps the current state and recent data, often cutting storage requirements by 80-90%. This matters for validators, RPC providers, and developers who need a fully verifying node without the cost of storing history indefinitely.

To implement pruning, configure your node client with the appropriate flags. Geth, the most widely used Ethereum execution client, prunes state by default (--gcmode=full); starting it with --gcmode=archive disables pruning and keeps all historical state. On a Geth full node that uses the hash-based state scheme, stale state accumulates on disk over time, and you can reclaim that space with the offline command geth snapshot prune-state. For Bitcoin Core, pruning is activated by setting prune= in bitcoin.conf or by passing the -prune command-line option, with the target size in MiB (for example, prune=550, the minimum). Once pruning is enabled on Bitcoin Core, you cannot switch back to a full archival node without redownloading the chain.
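
The Bitcoin Core rules above can be checked before you edit bitcoin.conf. The sketch below assumes Bitcoin Core's documented meaning of prune= (0 disables pruning, 1 allows manual pruning through the pruneblockchain RPC, and 550 or more sets an automatic target in MiB); the function names are hypothetical and not part of Bitcoin Core.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Bitcoin Core's prune= setting.
# Assumed semantics: 0 = pruning off, 1 = manual pruning via the pruneblockchain
# RPC, >= 550 = automatic target in MiB; values 2..549 are rejected at startup.

classify_prune_target() {
  local target="$1" n
  if ! [[ "$target" =~ ^[0-9]+$ ]]; then
    echo "invalid"
    return 1
  fi
  n=$((10#$target))  # force base 10 so a value like 0550 is not read as octal
  if (( n == 0 )); then
    echo "disabled"
  elif (( n == 1 )); then
    echo "manual"
  elif (( n >= 550 )); then
    echo "automatic"
  else
    echo "invalid"
    return 1
  fi
}

# Write a single prune= line into bitcoin.conf, replacing any existing one.
# Nothing is changed if Bitcoin Core would reject the value. The rewritten file
# comes from mktemp, so it ends up readable only by its owner.
set_prune_target() {
  local conf="$1" target="$2" tmp
  classify_prune_target "$target" >/dev/null || return 1
  touch "$conf"
  tmp="$(mktemp)"
  grep -v '^prune=' "$conf" > "$tmp" || true  # grep exits 1 if no lines remain
  printf 'prune=%d\n' "$((10#$target))" >> "$tmp"
  mv "$tmp" "$conf"
}
```

For example, set_prune_target ~/.bitcoin/bitcoin.conf 10000 sets a 10,000 MiB target in the default Linux data directory; restart bitcoind for it to take effect.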

Different pruning strategies target different data. State pruning removes old state trie nodes that are no longer referenced by recent blocks (Geth keeps state for the latest 128 blocks). Block pruning discards old block bodies and receipts, keeping only headers and the most recent N blocks. Transaction-index pruning disables or limits the index used to look up transactions by hash (in recent Geth releases, --history.transactions). You do not need an archival sync in order to prune later: sync in pruned mode from the start, and wait until the initial sync, including Geth's snapshot generation, has finished before running offline tools such as geth snapshot prune-state. Always back up your chaindata directory before major pruning operations.

For advanced use cases, consider client-specific tools. Nethermind (another Ethereum execution client) has built-in pruning configured through its Pruning.* options. On the consensus layer, beacon clients such as Lighthouse and Prysm also limit how much historical beacon state they store; the relevant flags differ by client and version, so check each client's documentation. Monitoring is essential: track the size of the chaindata directory, the database metrics your client exports (Geth exposes metrics when started with --metrics), and sync status. A node that prunes state cannot answer historical state queries such as eth_getBalance or eth_call at an old block number, and if it also prunes receipts, eth_getLogs over old block ranges fails too. Make sure your application's requirements match the chosen strategy; services like Infura or Alchemy can fill gaps when you occasionally need archive data.

Prerequisites

Pruning is a critical node operation that deletes historical blockchain data to conserve disk space while preserving network functionality. This guide explains the core concepts and provides actionable steps for configuring pruning on popular clients.

Blockchain nodes store a complete history of all transactions and state changes, which can grow to several terabytes. Pruning allows a node to delete this historical data after it is no longer needed for validating new blocks, significantly reducing storage requirements. A pruned node maintains only the most recent state and a limited history, remaining a fully functional participant in consensus. This is distinct from archival nodes, which retain the entire chain history for services like block explorers. Common clients like Geth, Erigon, and Besu offer configurable pruning modes.

Before implementing pruning, understand the key data types involved. The blockchain itself is the sequential record of blocks. The state is a snapshot of all account balances and smart contract storage at a given block. The state trie is the data structure (a Merkle Patricia Trie) that cryptographically commits to this state. Depending on the client, pruning removes stale state trie nodes, old block bodies and receipts, or both. State pruning must be done carefully: deleting a trie node that the current state still references corrupts the database and can leave the node unable to process new blocks.

To set up pruning, first choose a client and mode. Geth does not use a --prune flag; how it prunes depends on its state scheme. With the path-based scheme (--state.scheme=path, the default for new databases in recent releases), stale state is pruned continuously while the node runs. With the older hash-based scheme, run geth snapshot prune-state against a stopped node to prune offline. Erigon prunes as part of its staged sync, according to its prune flags. In Besu, the storage format determines pruning: the default Bonsai format keeps only recent state, while the legacy Forest format relied on --pruning-enabled and --pruning-blocks-retained. Always make sure your node is fully synced before attempting offline pruning.

A critical prerequisite is a complete, recent backup of your chain data, because pruning operations are irreversible. For mainnet, consider testing your pruning strategy on a testnet or a copy of your database first. Commands and free-space requirements vary by client; Geth's offline prune, for example, needs extra free disk space while it runs, so do not start it on a nearly full volume. Monitor the logs for messages such as "Pruning state data". Afterward, verify node health by checking that it serves recent block data and that the chaindata directory has shrunk as expected.
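
A pre-flight check along these lines can catch a nearly full volume before an offline prune starts, and can record the database size for comparison afterward. This is a minimal sketch built on df and du; the function names are hypothetical, and the amount of free space to require is an assumption you should take from your client's documentation.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for an offline prune. How much free space to
# require is your own choice; check your client's documentation for guidance.

# Free space (whole GiB) on the filesystem holding the given path.
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeed only if at least $2 GiB are free on the filesystem holding $1.
require_free_gib() {
  local have
  have="$(free_gib "$1")" || return 1
  [[ -n "$have" ]] && (( have >= $2 ))
}

# Size of a directory in KiB, e.g. to compare chaindata before and after pruning.
dir_size_kib() {
  local out
  out="$(du -sk "$1")" || return 1
  echo "${out%%[[:space:]]*}"
}
```

Run dir_size_kib on your chaindata directory before and after pruning to confirm the reduction.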

Advanced strategies involve tuning for specific use cases. A validator node may prioritize fast access to recent state and prune aggressively. An RPC endpoint provider might need a longer history for query flexibility. Erigon's per-component prune flags and Nethermind's Pruning.Mode settings (None, Memory, Full, Hybrid) offer fine-grained control. Once pruned, historical data can only be recovered by resyncing (for example, an archive sync from genesis) or by restoring a backup. Proper pruning configuration is essential for sustainable, performant node infrastructure.

Key Concepts and Configuration

Pruning is a critical node operation that deletes historical state data to reduce storage requirements. This guide covers the practical steps for configuring different pruning strategies on EVM-compatible nodes.

Pruning strategies determine which historical data your node retains. The core trade-off is between storage efficiency and data availability. A full archive node keeps all historical state, so it can answer queries about any account balance or contract storage slot at any past block. A pruned node deletes this historical state and keeps only recent data plus the current state trie, which can reduce storage by over 90%. Common strategies keep state for a fixed number of recent blocks (for example, Geth's 128), prune each data type by block distance or cutoff block (Erigon's --prune.*.older and --prune.*.before flags), or periodically copy the live state into a fresh database (Nethermind's full pruning). Your choice depends on whether you need historical data for indexing, RPC services, or block production.

For Geth, pruning is configured through command-line flags. The default sync mode (--syncmode snap) downloads only recent state, and the default --gcmode full keeps pruning it as the chain advances; --gcmode archive, the only other value, disables pruning. With the path-based state scheme, --history.state sets how many recent blocks of state history Geth retains for rolling back the chain (default 90,000). A typical pruned configuration is: geth --syncmode snap --gcmode full. Nethermind uses a Pruning section in its network config file (such as configs/mainnet.cfg), with settings like Mode (None, Memory, Full, Hybrid) and FullPruningTrigger (Manual, StateDbSize, or VolumeFreeSpace). Erigon does not need a separate pruning pass: it deletes old data according to its prune flags as part of staged sync.

When implementing a strategy, monitor key metrics. Track the disk usage growth rate so you can schedule pruning before the volume fills. In an emergency, Geth's debug.setHead RPC method can rewind the chain head (for example, to recover from a bad block), but it does not reclaim disk space. Always verify your backups before major pruning operations. For networks serving many historical queries, consider a hybrid setup: a pruned node for syncing and a separate dedicated archive node. Once pruned, historical state cannot be recovered from that node, so align your strategy with your application's requirements for data provenance and auditability.
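
Turning a growth rate into a pruning deadline is simple arithmetic. This sketch assumes you already sample free space and daily growth yourself (as whole GB); days_until_full and prune_due are hypothetical names. With 300 GB free and 15 GB of growth per day, the volume fills in about 20 days.

```bash
#!/usr/bin/env bash
# Hypothetical helpers to turn disk-growth measurements into a pruning deadline.
# Inputs are whole numbers in GB; measure growth over your own sampling window.

# Whole days until the volume fills at the current growth rate.
days_until_full() {
  local free_gb="$1" growth_gb_per_day="$2"
  if (( growth_gb_per_day <= 0 )); then
    echo "never"
  else
    echo $(( free_gb / growth_gb_per_day ))
  fi
}

# Succeed when the remaining runway is shorter than the lead time (in days)
# you need to schedule and complete a pruning window.
prune_due() {
  local free_gb="$1" growth_gb_per_day="$2" lead_days="$3" runway
  runway="$(days_until_full "$free_gb" "$growth_gb_per_day")"
  [[ "$runway" != "never" ]] && (( runway < lead_days ))
}
```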

Pruning Support by Node Client

Pruning configuration methods and capabilities across major Ethereum execution and consensus clients.

Feature / Client	Geth	Nethermind	Besu	Lighthouse
Pruning Modes	full (default), archive	Hybrid (default), Memory, Full, None	Bonsai (default), Forest	Prunes finalized states
Config Flag	--gcmode	--Pruning.Mode	--data-storage-format	--slots-per-restore-point
Default History Retained	90,000 blocks of state history (--history.state, path scheme)	~90k blocks	512 blocks (Bonsai)	~2048 epochs
Disk Space (approx.)	650 GB	550 GB	700 GB	1.2 TB (with execution)
Memory Cache for Pruning	4 GB default on mainnet (--cache)	Configurable via --Pruning.CacheMb	Configurable via --Xbonsai-cache-size	Uses --freezer.epoch-state-cache-size

How to Prune a Geth Node

Pruning a Geth node removes historical state data to reclaim significant disk space while maintaining full node functionality for the current and recent blockchain state.

Geth's state is the complete set of account balances, contract code, and storage slots at a given block. On a full node using the hash-based state scheme, stale trie nodes from past blocks accumulate on disk, so the database keeps growing well beyond what is needed to validate new blocks. Pruning deletes that stale state while keeping the data required to validate new blocks and serve recent chain data. The command for this is geth snapshot prune-state. It is an offline operation: Geth must be stopped first, and the prune is I/O intensive and can take several hours. Because it permanently deletes data, take a backup first.

Before pruning, make sure you are running Geth v1.10 or later (offline pruning was introduced in v1.10) with snapshots enabled, which is the default; the snapshot must be fully generated before pruning can start. Offline pruning applies only to the hash-based state scheme; nodes using the path-based scheme prune continuously and do not need it. The procedure is: stop Geth cleanly, run the prune command, then restart the node. A typical command is: geth --datadir /path/to/datadir snapshot prune-state. It uses the snapshot to identify the target state (by default, the state roughly 128 blocks below the head) and deletes every trie node that is not part of it.
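
The stop, prune, restart sequence can be scripted as follows. This is a sketch, not an official Geth tool: it assumes Geth runs as a systemd unit (named geth by default, a hypothetical name) and only prints the commands unless DRY_RUN=0 is set. Make sure the unit's stop timeout (TimeoutStopSec) is long enough for Geth to shut down cleanly, because an unclean stop before pruning defeats the purpose.

```bash
#!/usr/bin/env bash
# Sketch of Geth's offline prune for a hash-based full node.
# Assumptions (hypothetical): Geth runs as a systemd unit (default name "geth")
# and the data directory is passed as the first argument.
# Commands are only printed unless DRY_RUN=0 is set.

run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

prune_geth_offline() {
  local datadir="$1" unit="${2:-geth}"
  # 1. Stop Geth cleanly; prune-state must not run while the node is live.
  run sudo systemctl stop "$unit" || return 1
  if [[ "${DRY_RUN:-1}" != "1" ]] && pgrep -x geth >/dev/null; then
    echo "geth is still running; aborting" >&2
    return 1
  fi
  # 2. Delete stale trie nodes, keeping the snapshot's target state.
  if ! run geth --datadir "$datadir" snapshot prune-state; then
    echo "prune-state failed; leaving the node stopped for inspection" >&2
    return 1
  fi
  # 3. Restart the node; it resumes syncing from where it stopped.
  run sudo systemctl start "$unit"
}
```

Run it once with the default dry run to review the commands, then with DRY_RUN=0 during your maintenance window.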

For maintenance, you can automate pruning. Because the operation is heavy, schedule it during off-peak hours. A script can check disk usage and, if it exceeds a threshold (for example, 90%), stop Geth cleanly, run the prune-state command, and restart the service. Always keep a recent backup of your chaindata directory before major operations. Pruning does not change the node's role: a full node remains a full node that has simply dropped the stale state it had accumulated, and it continues to sync and validate new blocks normally. As before, it cannot serve historical state queries beyond the retained depth.
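
For the automated variant, the threshold check itself can be a small df wrapper. This sketch assumes a POSIX df; the function names and the 90% figure are illustrative choices, not Geth settings.

```bash
#!/usr/bin/env bash
# Hypothetical disk-usage check for a cron-driven pruning job.
# Point it at the volume holding your Geth datadir; the threshold is your choice.

# Percentage of the filesystem holding $1 that is in use (integer, no % sign).
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Succeed when usage on the filesystem holding $1 is at or above $2 percent.
over_threshold() {
  local pct
  pct="$(disk_usage_pct "$1")"
  [[ "$pct" =~ ^[0-9]+$ ]] && (( pct >= $2 ))
}
```

A nightly cron job could run over_threshold /path/to/datadir 90 and, only if it succeeds, call your stop, prune, and restart routine.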

How to Configure Pruning in Erigon

Pruning is a critical configuration for managing Erigon's storage footprint. This guide explains the available strategies and how to implement them.

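Erigon's prune flags control which historical data is retained after the node syncs. Erigon uses a staged sync that stores each kind of data separately, which allows granular pruning. The flags described here apply to Erigon 2; Erigon 3 replaced them with presets selected through --prune.mode (such as archive, full, and minimal). In Erigon 2, the primary flags are --prune and the component-specific --prune.* flags (for example, --prune.h.older). The --prune flag takes a string of letters naming the components to prune: h for history (state change sets and history indices), r for receipts, t for the transaction lookup index, and c for call traces. For example, --prune=hrtc prunes all four components, which is the usual full-node configuration; running without --prune keeps everything, as an archive node.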
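
To double-check what a --prune value will delete, a small decoder helps. This sketch hard-codes the Erigon 2 letter meanings described above; describe_prune is a hypothetical name and is not part of Erigon.

```bash
#!/usr/bin/env bash
# Hypothetical helper: explain an Erigon 2 --prune value, one component per line.
# Unknown letters are reported on stderr and make the function fail.

describe_prune() {
  local mode="$1" i c
  for (( i = 0; i < ${#mode}; i++ )); do
    c="${mode:i:1}"
    case "$c" in
      h) echo "h: state history (change sets and history indices)" ;;
      r) echo "r: receipts and log indices (used by eth_getLogs)" ;;
      t) echo "t: transaction lookup index (tx hash to block)" ;;
      c) echo "c: call traces (used by trace_filter)" ;;
      *) echo "unknown component: $c" >&2; return 1 ;;
    esac
  done
}
```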

For more precise control, use the component-specific flags to set retention distances. These values count blocks, not days: --prune.h.older=90000 keeps state history for the most recent 90,000 blocks and deletes anything older (--prune=h is shorthand for exactly this). Alternatively, --prune.h.before=15000000 prunes history before block 15 million. --prune.r.older, --prune.t.older, and --prune.c.older work the same way for receipts, transaction indexes, and call traces. Set these flags before the initial sync; Erigon stores the prune settings in its database, and changing them afterward requires a resync.
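
Because the --prune.*.older flags count blocks, a retention period in days has to be converted. The sketch below assumes one block per 12-second slot on post-Merge Ethereum mainnet (missed slots mean real coverage runs slightly longer); blocks_for_days and erigon_prune_args are hypothetical helpers, and the output targets Erigon 2's flag syntax.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a retention period in days into the block
# distance that Erigon 2's --prune.*.older flags expect.
# Assumes one block per 12-second slot unless another block time is given.

blocks_for_days() {
  local days="$1" block_time="${2:-12}"
  echo $(( days * 86400 / block_time ))
}

# Print prune flags that keep h (history) for $1 days and r (receipts) for $2
# days; c (call traces) falls back to Erigon's default distance.
erigon_prune_args() {
  printf -- '--prune=hrc --prune.h.older=%d --prune.r.older=%d\n' \
    "$(blocks_for_days "$1")" "$(blocks_for_days "$2")"
}
```

For example, erigon_prune_args 30 90 prints --prune=hrc --prune.h.older=216000 --prune.r.older=648000, which you can splice into the erigon command line.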

Choosing a strategy depends on your node's purpose. An archive node is simply Erigon 2 run without --prune. A full node that serves only recent data typically uses --prune=hrtc, which keeps the last 90,000 blocks of each component. For a node focused on recent state with tighter limits, shorten individual distances, for example --prune=hrtc --prune.h.older=10000. Always check disk space: an unpruned Erigon archive node can exceed 3 TB for Ethereum mainnet, while aggressive pruning can bring this under 500 GB. Monitor sync progress through the logs or the eth_syncing RPC method to confirm that pruning runs as configured.

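Here is a sample command to start an Erigon 2 node that prunes state history older than roughly 30 days and receipts older than roughly 90 days, while keeping all transaction indexes. Because the --prune.*.older flags take block counts, the periods are converted at one block per 12-second slot (30 days is about 216,000 blocks; 90 days is about 648,000):

```bash
erigon --chain mainnet --prune=hrc --prune.h.older=216000 --prune.r.older=648000 --datadir /path/to/data
```

Because t is absent from --prune and no --prune.t.older is set, transaction indexes are kept indefinitely; call traces (c) are pruned at the default 90,000-block distance. The --datadir volume needs enough IOPS for pruning, which is most intensive during the initial sync.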

If you need to change your pruning strategy, plan for a resync: Erigon records its prune settings in the database and will not start with different ones. Stop Erigon, delete the chaindata directory inside your --datadir (the snapshots directory can usually be kept, which avoids redownloading block history), and restart with the new prune flags. For production nodes, use the --torrent.upload.rate and --torrent.download.rate flags to manage bandwidth while snapshots are downloaded.

Pruning in Besu and Nethermind

Pruning is a critical storage optimization technique for Ethereum execution clients like Besu and Nethermind. This guide explains how to configure and manage pruning strategies to reduce node disk usage while maintaining full functionality.

Pruning removes historical state data that is no longer necessary for processing new blocks, significantly reducing the storage footprint of an Ethereum node. For comparison, a fully synced Geth archive node can require over 12 TB, while a pruned full node needs roughly a tenth of that. Besu (Hyperledger Besu) and Nethermind both prune old trie nodes that are no longer reachable from recent state. Pruning runs automatically in the background, but its behavior is configurable through command-line flags and configuration files.

In Besu, how pruning works depends on the storage format. The legacy FOREST format used a background pruner controlled by --pruning-enabled and --pruning-blocks-retained (default 1024), which has been deprecated in recent releases. The default BONSAI format needs no separate pruner: it keeps only the current world state on disk plus trie logs that let Besu reconstruct the state of recent blocks, and --bonsai-historical-block-limit (default 512) sets how many recent blocks remain queryable. A larger window uses more disk but lets the node answer state queries further back. To keep about 5,000 blocks of state, you might start Besu with: besu --sync-mode=SNAP --data-storage-format=BONSAI --bonsai-historical-block-limit=5000.

Nethermind also supports configurable pruning, set in the Pruning section of its config file or on the command line. Pruning.Mode takes one of four values: Memory (prunes state in memory before it is written to disk), Full (periodically copies the live state into a fresh database and deletes the old one), Hybrid (the default, combining both), or None (no pruning, for archive nodes). Pruning.CacheMb sets the memory used for in-memory pruning, and Pruning.PersistenceInterval controls how often state is persisted to disk. To have full pruning start automatically when free space runs low, you might use: --Pruning.Mode Hybrid --Pruning.FullPruningTrigger VolumeFreeSpace --Pruning.CacheMb 2048.

When planning your pruning strategy, consider your node's purpose. For an RPC endpoint serving recent block data, aggressive pruning is ideal. For developers who need some historical state for debugging, a larger retention window (for example, 10,000 or more blocks) is better. Monitor disk usage with tools like du, and watch the client logs for pruning events. Pruning consumes significant CPU and disk I/O; where the client lets you trigger it (such as Nethermind's full pruning), running it during off-peak hours minimizes the impact on block processing.

Both clients prune automatically, and Nethermind also lets you start a full-pruning run manually with the admin_prune JSON-RPC method. Besu does not expose an RPC call for manual pruning. Always take a recent backup before experimenting with pruning settings, because a misconfiguration can force a resync. For current parameters, consult the official documentation: Besu Pruning Docs and Nethermind Configuration.
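
A manual full-pruning run in Nethermind can be started with a plain JSON-RPC request. This sketch assumes the endpoint URL shown (a placeholder), that Pruning.Mode is Full or Hybrid so full pruning is available, and that the Admin module is enabled in JsonRpc.EnabledModules, which may not be the case by default; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers to trigger Nethermind's full pruning over JSON-RPC.
# Assumes the endpoint is reachable at the URL you pass and that the Admin
# module is enabled in JsonRpc.EnabledModules.

# Build a JSON-RPC 2.0 request body for a method that takes no parameters.
rpc_body() {
  printf '{"jsonrpc":"2.0","method":"%s","params":[],"id":1}' "$1"
}

# Ask the node to start a full-pruning run and print the raw JSON reply.
nethermind_prune() {
  local url="${1:-http://127.0.0.1:8545}"
  curl -s -X POST -H 'Content-Type: application/json' \
    --data "$(rpc_body admin_prune)" "$url"
}
```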

Troubleshooting Common Pruning Issues

Pruning is essential for managing blockchain node storage, but misconfiguration can lead to sync failures, data loss, or performance issues. This guide addresses frequent problems and their solutions.

A node failing to sync after enabling pruning often indicates a configuration mismatch or a corrupted database. The most common causes are:

Incorrect Pruning Settings: Using pruning = "everything" on an RPC or indexing node that must answer historical queries. For archival services, use pruning = "nothing".
Database Corruption: An unclean shutdown during the pruning process can corrupt the state. You may need to resync from genesis or a trusted snapshot.
Insufficient Disk Space: Pruning needs free space to rewrite the database. As a rule of thumb, keep at least 20-30% of the drive free.
Chain-Specific Settings: Cosmos SDK-based networks configure pruning in app.toml through the pruning, pruning-keep-recent, and pruning-interval options. Check the chain's documentation.

First Step: Check your node logs for errors related to IAVL, state, or panic during commit. Verify your app.toml (Cosmos) or command-line flags match your node's purpose.
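
When diagnosing a Cosmos SDK node, the first check is often which pruning mode app.toml actually contains. This sketch assumes the standard app.toml layout, where pruning options are top-level lines of the form key = "value"; app_toml_value and check_pruning_mode are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: read a top-level key from a Cosmos SDK app.toml and
# check that the pruning mode is one of the documented values.

app_toml_value() {
  local file="$1" key="$2"
  awk -F'=' -v k="$key" '
    $1 ~ "^[ \t]*" k "[ \t]*$" {
      v = $2
      sub(/^[ \t]*"?/, "", v)   # strip leading blanks and the opening quote
      sub(/"?[ \t]*$/, "", v)   # strip the closing quote and trailing blanks
      print v
      exit
    }' "$file"
}

check_pruning_mode() {
  local mode
  mode="$(app_toml_value "$1" pruning)"
  case "$mode" in
    default|nothing|everything|custom) echo "$mode" ;;
    *) echo "unknown pruning mode: '$mode'" >&2; return 1 ;;
  esac
}
```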

Pruning Strategy Trade-offs

A comparison of common pruning strategies for blockchain nodes, focusing on storage, performance, and operational requirements.

Metric	Full Archive	Pruned (Default)	Light Client
Storage Required	Several TB (12+ TB for a Geth archive node)	~800 GB-1 TB	< 10 GB
Block History	All blocks and all historical states	All blocks; state for recent blocks only (~128 in Geth)	Block headers only
Historical State Queries	Yes	No (recent state only)	No
Sync Time (Initial)	5-7 days	1-2 days	< 6 hours
CPU/Memory Usage	High	Medium	Low
RPC Endpoint Support	Full	Full for recent state	Limited
Suitable For	Indexers, block explorers, analysts	Validators, DApps	Wallets, mobile apps
Ethereum Execution Client Example	Geth (--gcmode=archive), Erigon (archive)	Geth (default), Besu (Bonsai)	Helios

Resources and Documentation

Documentation and tooling references for configuring pruning strategies on blockchain nodes. These resources focus on reducing disk usage, improving state access performance, and aligning pruning settings with operational requirements.

Ethereum Node Pruning with Geth

Geth supports state pruning through its built-in garbage collection and sync modes. Most production Ethereum nodes rely on pruning to keep disk usage under control while serving RPC traffic.

Key concepts and actions:

Use snap sync (default) to avoid storing historical intermediate state
Configure with --syncmode=snap and --gcmode=full (both are the defaults) for a pruned full node
Disable archive behavior unless you explicitly need historical state via --gcmode=archive
Expect disk usage of ~800GB to 1TB for a fully synced pruned mainnet node as of late 2024

The official documentation explains how pruning interacts with ancient data, state tries, and fast resyncs. It also covers tradeoffs between pruned full nodes and archive nodes when serving APIs or indexing pipelines.

Erigon Pruning and Storage Optimizations

Erigon is designed around aggressive pruning and low storage overhead, making it popular for infrastructure providers. Its architecture separates state, history, and receipts into independently prunable components.

Practical steps:

Choose the pruning mode explicitly: Erigon 2 runs as an archive node unless --prune is set, while Erigon 3 runs as a pruned full node by default (--prune.mode=full)
Control history retention using flags like --prune=htr (history, txs, receipts)
Retain only recent blocks for RPC workloads while preserving state consistency
Typical disk usage is 300–600GB for Ethereum mainnet, significantly lower than legacy clients

The docs explain how pruning affects RPC methods such as eth_getLogs and why indexers often pair Erigon with external archival databases. This is essential reading before deploying Erigon in production.

Bitcoin Core Pruning Configuration

Bitcoin Core supports block pruning, allowing nodes to discard old block data while remaining fully validating. This is suitable for wallets and internal services that do not need full historical blocks.

Implementation details:

Enable pruning with prune=<MiB> in bitcoin.conf
Minimum pruning target is 550 MiB; common setups use 5–10 GB
Pruned nodes still validate all blocks and enforce consensus rules
Pruning removes old block files but keeps the full UTXO set

The official guide details limitations, including that pruned nodes cannot serve historical blocks to peers, cannot be combined with -txindex, and must redownload the chain to reindex. It also explains why pruning is unsuitable for public archival services.

Cosmos SDK State Pruning Options

Cosmos SDK chains expose explicit state pruning strategies at the application layer. Validators and RPC providers must choose between disk usage and query flexibility.

Supported strategies:

default: keeps the last 362880 states and prunes at 10-block intervals (recent Cosmos SDK versions)
nothing: archive mode, stores all historical states
everything: keeps only the 2 latest states, pruning at 10-block intervals
custom: define pruning-keep-recent and pruning-interval

Configuration is handled in app.toml and directly impacts ABCI query capabilities, IAVL tree size, and snapshot generation. The documentation explains how pruning affects state sync, upgrades, and forensic analysis. This resource is critical when operating validators or public RPC endpoints in Cosmos-based networks.
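
Switching to the custom strategy means editing three keys in app.toml. The sketch below rewrites only keys that already exist (the standard template includes all three); set_app_toml_value and apply_custom_pruning are hypothetical helpers, and the values in the usage example are illustrative, not recommendations.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: switch a Cosmos SDK node to custom pruning by rewriting
# existing top-level keys in app.toml. Missing keys are not added, because
# appending at the end of the file would place them inside the last [section].

set_app_toml_value() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  if awk -F'=' -v k="$key" -v v="$value" '
      !done && $1 ~ "^[ \t]*" k "[ \t]*$" { print k " = \"" v "\""; done = 1; next }
      { print }
      END { exit(done ? 0 : 1) }' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place to keep the file's permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Keep the most recent $2 heights and prune every $3 blocks.
apply_custom_pruning() {
  local file="$1" keep_recent="$2" interval="$3"
  set_app_toml_value "$file" pruning custom &&
    set_app_toml_value "$file" pruning-keep-recent "$keep_recent" &&
    set_app_toml_value "$file" pruning-interval "$interval"
}
```

For example, apply_custom_pruning "$NODE_HOME/config/app.toml" 100 10, where NODE_HOME is your chain's home directory, then restart the node.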

Frequently Asked Questions

Common questions and troubleshooting for implementing node pruning to manage blockchain storage.

Blockchain pruning is the process of selectively removing old, non-essential data from a node's storage while preserving the ability to validate new transactions and blocks. It is necessary because an archival node's storage grows indefinitely: Bitcoin's block data is over 500 GB, and Ethereum's archive state runs to several terabytes. Pruning reduces this to a manageable size (often under 50 GB for Bitcoin) by deleting old raw block and undo data on Bitcoin, while the UTXO set is kept in full, and old historical state on Ethereum. This conserves disk space and lowers operational costs without compromising the node's core function of verifying the current chain state.

Conclusion and Next Steps

You have configured a node pruning strategy to manage disk usage. Here are the final considerations and how to proceed.

Your chosen pruning strategy is now active. Archive nodes keep the full history, which is essential for services like block explorers or indexers. Pruned nodes need far less storage, which makes running a node more accessible. Remember that pruning is a trade-off: you gain efficiency but lose the ability to serve historical data queries. Verify that the node is fully operational with the new configuration by checking its sync status, for example with the standard eth_syncing JSON-RPC method on Ethereum clients.
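
The sync check can be scripted against any Ethereum execution client's JSON-RPC endpoint. This sketch assumes HTTP JSON-RPC is enabled at the given URL (for Geth, --http, which listens on port 8545 by default); the helper names are hypothetical. A false result only means the node is not currently syncing, so pair it with a check that the head block keeps advancing.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: query eth_syncing and classify the reply.

eth_syncing_reply() {
  local url="${1:-http://127.0.0.1:8545}"
  curl -s -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' "$url"
}

# eth_syncing returns false when the node is not syncing, or an object with
# progress fields while it is. Prints "synced", "syncing" or "unknown".
sync_state() {
  local reply="$1"
  if grep -Eq '"result"[[:space:]]*:[[:space:]]*false' <<<"$reply"; then
    echo "synced"
  elif grep -Eq '"result"[[:space:]]*:[[:space:]]*\{' <<<"$reply"; then
    echo "syncing"
  else
    echo "unknown"
  fi
}
```

Typical use is sync_state "$(eth_syncing_reply)".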

The next step is monitoring. Use tools like du and df on Linux to track disk usage over time. For automated alerts, integrate with a monitoring stack such as Prometheus and Grafana, using client metrics or exporters. For example, track the chaindata directory size and the head block number (Geth, when started with --metrics, exports it as the chain/head/block gauge). If you run a pruned node as an RPC endpoint, add rate limiting and watch request patterns so performance stays stable as the chain grows.
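
For Prometheus and Grafana, one simple route is node_exporter's textfile collector, which reads *.prom files from the directory given by --collector.textfile.directory. The sketch assumes that collector is enabled; the metric name chaindata_size_bytes and the helper name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export the chaindata size for Prometheus through
# node_exporter's textfile collector. The metric name is an assumption.

write_chaindata_metric() {
  local dir="$1" out_dir="$2" du_out kib tmp
  du_out="$(du -sk "$dir")" || return 1
  kib="${du_out%%[[:space:]]*}"
  # Write a temporary file in the same directory, then rename it, so the
  # collector never reads a half-written file.
  tmp="$(mktemp "$out_dir/.chaindata.XXXXXX")" || return 1
  {
    echo '# HELP chaindata_size_bytes Size of the node data directory in bytes.'
    echo '# TYPE chaindata_size_bytes gauge'
    printf 'chaindata_size_bytes{path="%s"} %d\n' "$dir" "$((kib * 1024))"
  } > "$tmp"
  chmod 0644 "$tmp"  # mktemp creates 0600 files; the exporter must read it
  mv "$tmp" "$out_dir/chaindata.prom"
}
```

Run it from cron every few minutes and alert when the metric approaches your volume size.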

To deepen your understanding, explore advanced configurations. For Ethereum clients, look at Geth's --cache options to tune memory usage alongside pruning. For Cosmos SDK chains, examine the pruning-keep-recent and pruning-interval parameters in app.toml. Test different strategies on a testnet before applying them to mainnet. The official documentation for your client, such as the Geth documentation or the Cosmos SDK documentation, is the definitive resource for these settings.

Finally, stay informed about protocol upgrades. Hard forks and network upgrades can sometimes change state storage formats or pruning compatibility. Subscribe to your client's development channels on GitHub or Discord. Pruning is a foundational skill for node operators; mastering it allows you to run efficient, reliable infrastructure that supports the broader blockchain network.