
How to Design State Retention Policies

A technical guide for developers on implementing state retention strategies for blockchain nodes. Covers policy design, trade-offs, and code examples for EVM and SVM chains.
Chainscore © 2026
introduction
BLOCKCHAIN SCALABILITY

How to Design State Retention Policies

State retention policies determine what data a blockchain node must store, directly impacting scalability and decentralization. This guide explains the trade-offs and design patterns.

Blockchain state is the complete set of data—account balances, smart contract code, and storage variables—that defines the network's current condition. A state retention policy dictates how much of this data a node is required to keep to participate in consensus and validate new blocks. The core challenge is balancing data availability for security against the storage burden on node operators. Policies range from full archival (storing all historical state) to pruned (storing only recent state) and stateless (storing no persistent state). The choice influences who can run a node, sync times, and the feasibility of historical data queries.

The most common policy is running a full archival node. This node stores the entire history of the chain, including every state trie for every block. It's essential for services like block explorers, indexers, and some RPC providers that need arbitrary historical data. However, storage requirements grow linearly with chain usage; a Geth archive node on Ethereum, for instance, requires over 12 TB. A pruned node reduces this load by deleting old state data once it's no longer needed for validating new blocks, typically keeping state for only roughly the last 128 blocks. Clients like Geth (in its default full mode, with snap sync) and Erigon implement sophisticated pruning, cutting storage needs by over 90% while maintaining full validation capability.
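As an illustrative sketch (plain Python, not any client's actual code), the pruned-node rule above reduces to a simple age check against the retention window:

```python
RETENTION_WINDOW = 128  # blocks of recent state a pruned node keeps, per the text above

def is_prunable(state_block: int, head_block: int, window: int = RETENTION_WINDOW) -> bool:
    """State older than the retention window is eligible for deletion."""
    return head_block - state_block > window

# State from 200 blocks ago can go; state from 50 blocks ago must stay.
assert is_prunable(state_block=9_800, head_block=10_000)
assert not is_prunable(state_block=9_950, head_block=10_000)
```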

For extreme scalability, stateless and partially stateless clients represent an advanced frontier. In a stateless model, validators don't store the state locally. Instead, they verify blocks using cryptographic proofs (witnesses) provided with each block, as envisioned for Ethereum's Verkle tree transition. A lighter variant is the partially stateless client, which stores only a minimal, recent subset of state. Designing these policies requires careful protocol changes: defining the witness format, ensuring proof sizes are manageable, and creating incentives for proof providers. The goal is to enable ultra-light clients that can validate with near-zero storage.

When designing a policy, you must analyze your chain's specific needs. Ask: Is historical data a public good (e.g., for audits and analytics), or can it be outsourced? What is the target hardware profile for a node operator? Solutions often involve layered storage. Core consensus nodes may use aggressive pruning, while specialized archive services store full history. Data can be moved to decentralized storage networks like Filecoin or Arweave for long-term retention, with the blockchain storing only content-addressed references. This hybrid approach, used by networks like Celestia for data availability, separates consensus from data storage duties.
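The hybrid approach above can be sketched as follows: the chain stores only a content hash, the payload lives off-chain, and any retriever can verify what it fetched (illustrative Python; a production system would use the storage network's own CID scheme):

```python
import hashlib

def content_address(blob: bytes) -> str:
    """The digest is all the chain stores; the blob lives on Filecoin/Arweave/etc."""
    return hashlib.sha256(blob).hexdigest()

def verify_retrieval(blob: bytes, onchain_digest: str) -> bool:
    """Anyone fetching the blob from off-chain storage can check it against the reference."""
    return content_address(blob) == onchain_digest

payload = b"historical state snapshot, epoch 42"
ref = content_address(payload)  # this short digest is the only on-chain footprint
assert verify_retrieval(payload, ref)
assert not verify_retrieval(b"tampered blob", ref)
```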

Implementation requires configuring your client software. For example, using Besu, you set --pruning-enabled=true and --pruning-blocks-retained=1000. With Geth, you choose a sync mode: full (archive), snap (pruned), or light. For a custom chain, you might modify the client's state database logic to prune based on a custom epoch or implement a module that offloads old state to an external service via an RPC hook. Always test pruning logic thoroughly to ensure it doesn't accidentally delete state needed for reorg handling or certain transaction types.
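A custom epoch-based pruning pass, as described above, might look like this sketch (the epoch length, retention count, and offload hook are hypothetical policy parameters; real clients operate on trie nodes, not a flat map):

```python
EPOCH_LENGTH = 32_000   # blocks per pruning epoch (hypothetical policy parameter)
EPOCHS_RETAINED = 3     # keep the last 3 epochs of state locally

def prune_pass(state_db: dict, head_block: int, offload) -> None:
    """Offload, then delete, state entries older than the retained epochs."""
    cutoff_epoch = head_block // EPOCH_LENGTH - EPOCHS_RETAINED
    for block, entry in list(state_db.items()):
        if block // EPOCH_LENGTH < cutoff_epoch:
            offload(block, entry)  # e.g. ship to an external archive service first
            del state_db[block]

db = {b: f"state@{b}" for b in range(0, 200_000, 10_000)}
archived = []
prune_pass(db, head_block=200_000, offload=lambda b, e: archived.append(b))
assert all(b >= 96_000 for b in db)  # epochs 0-2 (blocks below 96,000) were pruned
```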

Ultimately, your state retention policy is a foundational scaling decision. A well-designed policy reduces node operation costs, improves network decentralization by lowering barriers to entry, and defines the ecosystem's data landscape. The trend is toward modularity: letting the execution layer handle state minimally while dedicated layers manage availability and long-term storage. By explicitly planning for state growth from the outset, you can build a chain that remains scalable and accessible for years to come.

prerequisites
PREREQUISITES AND CORE CONCEPTS

How to Design State Retention Policies

A state retention policy defines what data a blockchain node stores, for how long, and why. This guide explains the trade-offs and technical considerations for designing these policies.

A state retention policy is a set of rules governing how a node manages its local copy of the blockchain's state. The state includes account balances, smart contract storage, and other data derived from transaction execution. Unlike the immutable transaction history, the state is mutable and can grow infinitely, creating a significant storage burden. The core decision is whether to run an archive node (storing all historical states) or a pruned node (discarding old state data). This choice impacts a node's resource requirements, its ability to serve historical queries, and its role in the network.

Designing a policy requires balancing three key constraints: storage cost, data availability, and sync time. An archive node on Ethereum Mainnet currently requires over 12 TB of SSD storage, while a pruned node can operate with less than 1 TB. However, a pruned node cannot answer queries about an account's balance at an arbitrary past block. Policies are often implemented via pruning algorithms. For example, Geth's snapshot-based pruning or Erigon's state and history tables allow for different retention granularities. The policy must align with the node's purpose—whether for block production, RPC service, or data analysis.
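Storage cost, the first of these constraints, can be budgeted with simple arithmetic. A hedged sketch, where the growth rate is an operator estimate rather than a measured figure:

```python
def years_until_full(disk_tb: float, current_tb: float, growth_tb_per_year: float) -> float:
    """How long before state growth exhausts the disk; all inputs are estimates."""
    return (disk_tb - current_tb) / growth_tb_per_year

# Illustrative figures: a 4 TB disk, ~0.65 TB of pruned state today, ~0.3 TB/yr growth.
assert round(years_until_full(4.0, 0.65, 0.3), 1) == 11.2
```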

The technical implementation involves configuring your client's database and understanding its garbage collection mechanisms. In Geth, you use flags like --gcmode (full or archive) and --snapshot to control state retention. For a custom policy, you might need to modify the client's state trie management logic, which handles the Merkle Patricia Trie structure storing all state data. Developers should also consider the state growth trends of their target chain; networks with high contract deployment or NFT minting activity experience faster state bloat, necessitating more aggressive pruning schedules.

Beyond full nodes, light clients and stateless clients represent alternative models with minimal state retention. Light clients store only block headers, relying on full nodes for state proofs via protocols like LES (the Light Ethereum Subprotocol). The Verkle tree design in Ethereum's future upgrades aims to enable truly stateless validation, where nodes require only a small proof to verify transactions without storing any state. Designing for these paradigms shifts the policy focus from local storage management to efficient proof generation and verification.

When deploying infrastructure, document your retention policy clearly. Specify the retention window (e.g., "last 128,000 blocks"), the pruning trigger (disk usage threshold or block interval), and any archival backup strategy for critical historical data. Use monitoring tools to track state size growth and prune job success. A well-designed policy ensures node stability, controls operational costs, and meets the data requirements of your application—whether it's a block explorer needing full history or a validator prioritizing sync speed and disk longevity.
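The documented policy fields above translate naturally into a machine-readable config. A sketch (field names are illustrative, not any client's schema):

```python
from dataclasses import dataclass

@dataclass
class RetentionPolicy:
    retention_blocks: int      # e.g. "last 128,000 blocks"
    disk_threshold_pct: float  # prune trigger: fire when disk usage crosses this
    block_interval: int        # ...or at latest every N blocks
    archive_backend: str       # where critical history is backed up

def should_prune(p: RetentionPolicy, disk_pct: float, blocks_since_prune: int) -> bool:
    """Either trigger condition starts a prune job."""
    return disk_pct >= p.disk_threshold_pct or blocks_since_prune >= p.block_interval

policy = RetentionPolicy(128_000, 80.0, 10_000, "arweave")
assert should_prune(policy, disk_pct=85.0, blocks_since_prune=500)
assert not should_prune(policy, disk_pct=40.0, blocks_since_prune=500)
```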

policy-objectives
STATE MANAGEMENT

Key Policy Objectives and Trade-offs

Designing state retention policies involves balancing storage costs, data availability, and application logic. These cards outline the core objectives and inherent compromises.

01

Minimizing On-Chain Storage Costs

The primary cost driver is storing data on-chain. Key strategies include:

  • State pruning: Periodically deleting old, non-essential data (e.g., expired bids in an auction).
  • State rent: Charging accounts for persistent storage, as in Solana's rent model, where accounts must hold a rent-exempt balance to persist.
  • Statelessness: Moving state off-chain and using cryptographic proofs (like zk-SNARKs) for verification.

Trade-off: Increased complexity and potential for data loss if off-chain storage fails.
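The expired-bid case from the pruning bullet above, sketched in Python (application-level logic, not consensus code):

```python
def prune_expired_bids(bids: dict, now: int, auction_end: int) -> dict:
    """After settlement, only the winning bid is worth retaining."""
    if now <= auction_end:
        return bids                    # auction live: every bid is still essential
    winner = max(bids, key=bids.get)   # settle, then drop the losing bids
    return {winner: bids[winner]}

bids = {"alice": 100, "bob": 250, "carol": 180}
assert prune_expired_bids(bids, now=5, auction_end=10) == bids
assert prune_expired_bids(bids, now=11, auction_end=10) == {"bob": 250}
```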

02

Ensuring Data Availability & Liveness

Applications must guarantee that critical state is accessible when needed. This involves:

  • Data availability layers: Using networks like Celestia or EigenDA to store transaction data cheaply and verifiably.
  • State expiry with resurrection proofs: Archiving state after a period but allowing users to submit proofs to restore it, part of Ethereum's state expiry roadmap (the related EIP-4444 covers expiry of historical block data).
  • Interoperability standards: Ensuring archived state can be accessed by other chains via protocols like Inter-Blockchain Communication (IBC).

Trade-off: Higher latency for retrieving archived data and reliance on external systems.

03

Optimizing for Execution Speed

Frequent state access slows down transaction processing. Optimizations include:

  • Warm/Cold state separation: Keeping frequently accessed "warm" state in memory (like MPT nodes in Ethereum clients) while "cold" state resides on disk.
  • Parallel execution: Designing state dependencies to allow for concurrent transaction processing, as seen in Solana and Sui.
  • State channels: Moving transactional state off-chain for instant finality, settling only the net result on-chain.

Trade-off: Requires careful contract design and can increase hardware requirements for nodes.
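The warm/cold separation described above is essentially an LRU cache in front of disk. A minimal sketch, with a dict standing in for the cold store:

```python
from collections import OrderedDict

class TieredState:
    """Warm state in memory (LRU-evicted), cold state on 'disk'."""
    def __init__(self, warm_capacity: int):
        self.warm = OrderedDict()
        self.cold = {}
        self.capacity = warm_capacity

    def read(self, key):
        if key in self.warm:
            self.warm.move_to_end(key)  # refresh recency on the fast path
            return self.warm[key]
        value = self.cold.pop(key)      # slow path: promote from disk
        self.warm[key] = value
        if len(self.warm) > self.capacity:
            old_key, old_val = self.warm.popitem(last=False)
            self.cold[old_key] = old_val  # demote least-recently-used entry
        return value

s = TieredState(warm_capacity=2)
s.cold = {"a": 1, "b": 2, "c": 3}
s.read("a"); s.read("b"); s.read("c")   # reading "c" evicts "a" back to cold
assert list(s.warm) == ["b", "c"] and "a" in s.cold
```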

04

Maintaining Protocol-Level Security

State policies must not introduce new attack vectors. Critical considerations are:

  • Replay protection: Ensuring state transitions from archived data cannot be replayed maliciously.
  • Denial-of-service resistance: Preventing spam attacks that fill state with junk data to increase costs for others.
  • Consensus finality: Guaranteeing that once state is pruned or expired, it does not affect the canonical chain's security.

Trade-off: More conservative policies can lead to higher baseline storage costs for all users.

05

Enabling Developer & User Experience

Policies should not overburden application logic or end-users. This requires:

  • Predictable cost models: Clear gas schedules for state operations so developers can budget (e.g., Ethereum's SSTORE opcode cost).
  • Automated state management: Tools like The Graph for indexing or Pocket Network for RPC access abstract state complexity.
  • Graceful degradation: Applications should handle missing state gracefully, perhaps by prompting users to submit a recovery proof.

Trade-off: Abstraction layers can introduce centralization points or additional fees.

ARCHIVAL VS. FULL VS. LIGHT

Node Type Comparison: Storage and Capabilities

Comparison of storage requirements, data availability, and operational capabilities for different Ethereum node types, informing state retention policy decisions.

| Feature / Metric | Archive Node | Full Node | Light Node |
| --- | --- | --- | --- |
| State History Storage | ~12 TB | ~650 GB | < 100 MB |
| Block History Storage | Complete (from genesis) | Complete blocks; state for only recent ~128 blocks | Headers only |
| State Query Capability | Any historical state | Latest state only | Limited, via trust |
| Serves RPC Requests | Yes (full history) | Yes (recent state) | Limited |
| Initial Sync Time | 5-10 days | 1-3 days | < 1 hour |
| Hardware Requirements | High (32 GB+ RAM, fast SSD) | Medium (16 GB RAM, SSD) | Low (mobile capable) |
| Network Bandwidth | High (>50 Mbps) | Medium (~25 Mbps) | Low (<5 Mbps) |
| Suitable for Validating | Yes | Yes | No |

evm-implementation
ARCHIVE NODE CONFIGURATION

Implementing State Retention Policies for EVM Chains (Geth, Erigon)

State retention policies determine how much historical blockchain data your node stores, balancing disk usage with data availability for applications.

An Ethereum node's state is the complete set of accounts, balances, and smart contract storage. As the chain grows, storing the entire history becomes infeasible for most operators. A state retention policy defines pruning rules to delete old data while preserving the node's core functionality. For Geth and Erigon, this is configured via command-line flags that control the pruning window—the number of recent blocks for which full state data is kept. Data outside this window is either archived or discarded.

In Geth, the primary flag is --gcmode. Setting --gcmode=archive disables pruning entirely, creating a full archive node. The default, --gcmode=full, keeps only recent state (roughly the last 128 blocks' tries) and lets garbage collection discard older trie nodes automatically; the retention window is not directly tunable via a flag. To reclaim space from an existing pruned node's datadir, Geth also provides the offline geth snapshot prune-state command.

Erigon uses a different architecture, separating historical data into its main database and immutable snapshot files, and its pruning is more granular. Use --prune flags to specify what to remove: in --prune=hrtc, the letters stand for history, receipts, the transaction index, and call traces. The retention distance is set with flags like --prune.h.older (e.g., --prune=hrtc --prune.h.older=100000). Geth, for its part, supports --datadir.ancient, which moves very old block data (the "freezer") to a separate, cheaper storage location, a form of tiered retention.

Choosing a policy depends on your node's purpose. A JSON-RPC endpoint serving recent queries may only need a window of around 128,000 blocks of state. An indexer covering all transaction history requires an archive node or a pruned node with a very large window. A validator focused on consensus can usually prune aggressively. Consider that some DeFi protocols or block explorers may need access to state several months old for arbitrage analysis or user reporting.
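To size such a window, convert the required time horizon into blocks. A sketch assuming Ethereum's 12-second slot time and ignoring missed slots:

```python
ETH_BLOCK_TIME_S = 12  # post-merge slot time

def window_blocks(retention_days: float, block_time_s: int = ETH_BLOCK_TIME_S) -> int:
    """Blocks to retain for a desired time horizon."""
    return int(retention_days * 86_400 / block_time_s)

# Keeping ~3 months of state for historical reporting:
assert window_blocks(90) == 648_000
# For reference, a 128,000-block window covers about 17.8 days.
```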

To implement, first estimate your storage. A full Geth archive node requires roughly 12 TB or more, while a pruned node typically needs under 1 TB. Update your service file (e.g., systemd). For Geth: ExecStart=/usr/bin/geth --syncmode snap --gcmode full. For Erigon: ExecStart=/usr/bin/erigon --prune=hrtc --prune.h.older=100000. Always test policy changes on a testnet, and keep a backup of your data directory, to avoid irreversible data loss.

Monitor disk usage and sync performance after applying a new policy. Tools like du and client-specific logs (geth.log, erigon.log) show pruning operations. Remember that pruning occurs during sync and block processing; an existing archive node must be resynced to apply pruning. For the most current flags and best practices, always refer to the official documentation for Geth and Erigon.

svm-implementation
GUIDE

Implementing Policies for Solana (SVM)

This guide explains how to design and implement state retention policies for Solana programs, a critical consideration for managing on-chain storage costs and program lifecycle.

On Solana, account state is persistent storage that programs rent via lamports. The network's rent-exemption model requires accounts to hold a minimum balance to avoid being purged. A state retention policy defines the rules for how your program manages this storage lifecycle—determining when data is created, archived, or deleted. Unlike EVM chains where storage is tied to a contract's address, Solana's account model separates data (accounts) from logic (programs), making explicit policy design essential for cost efficiency and data hygiene.

Designing a policy starts with categorizing your data's lifespan. Use PDA-derived accounts for user-specific state that should persist indefinitely, ensuring rent exemption is calculated and funded on creation. For temporary or cache-like data, consider using ephemeral accounts created by the client, which can be closed to reclaim rent. The close instruction is a fundamental tool, allowing a program to transfer an account's lamports to a beneficiary and mark its data as invalid, effectively deleting it from the chain while refunding the rent.
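Rent-exemption funding can be budgeted up front. This sketch mirrors Solana's default rent parameters (3,480 lamports per byte-year, a 2-year exemption threshold, 128 bytes of account overhead); for authoritative values, query the cluster via the getMinimumBalanceForRentExemption RPC method:

```python
LAMPORTS_PER_BYTE_YEAR = 3_480   # Solana's default rent rate
EXEMPTION_THRESHOLD_YEARS = 2.0  # balance must cover two years of rent
ACCOUNT_OVERHEAD_BYTES = 128     # metadata charged on top of the data length

def rent_exempt_minimum(data_len: int) -> int:
    """Lamports an account must hold at creation to be rent-exempt."""
    return int((ACCOUNT_OVERHEAD_BYTES + data_len)
               * LAMPORTS_PER_BYTE_YEAR * EXEMPTION_THRESHOLD_YEARS)

assert rent_exempt_minimum(0) == 890_880  # the familiar zero-data account minimum
```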

Implementing a close function is a security-critical component. A robust implementation must:

  • Validate the correct signer is authorizing the closure.
  • Ensure the correct destination account receives the reclaimed lamports.
  • Properly zero-out or invalidate the account data to prevent stale state resurrection. Here is a simplified Anchor framework example:
```rust
pub fn close_account(ctx: Context<CloseAccount>) -> Result<()> {
    let dest_account_info = &ctx.accounts.destination;
    let account_to_close = &ctx.accounts.account_to_close;
    let signer = &ctx.accounts.signer;
    // ... authorization logic: verify `signer` is the account's authority ...
    // Transfer all remaining lamports to the destination, then zero the source
    **dest_account_info.try_borrow_mut_lamports()? += account_to_close.lamports();
    **account_to_close.try_borrow_mut_lamports()? = 0;
    // In production, also zero or invalidate the account data; Anchor's `close`
    // constraint handles the lamport transfer and data invalidation for you.
    Ok(())
}
```

For data that must be retained but accessed infrequently, consider state compression or off-chain indexing. Store minimal identifiers or hashes on-chain and keep the full dataset in a decentralized storage solution like Arweave or IPFS, or in an off-chain database indexed by a service like Helius. This hybrid storage model significantly reduces on-chain costs. Alternatively, use account versioning; design your account structs with discriminators to allow for graceful upgrades and migrations, leaving obsolete data accounts eligible for closure under the new program logic.

Your policy should be documented in the program's specification and tested thoroughly. Write integration tests that simulate long-running scenarios: account creation, rent epoch passage, and closure operations. Tools like the Solana Test Validator and Anchor's test framework are indispensable. A well-defined state retention policy prevents resource leaks, reduces operational costs for users, and ensures your Solana program remains sustainable and efficient as usage scales.

tooling-resources
IMPLEMENTATION GUIDE

Tools and Monitoring for State Retention Policies

Designing effective state retention policies requires specialized tools for data pruning, archival, and real-time monitoring. This guide covers the essential software and frameworks used by leading protocols.

ARCHITECTURE COMPARISON

Retention Strategy Decision Matrix

Comparison of core approaches for managing state retention in blockchain applications.

| Strategy | Stateless Clients | State Channels | Plasma / Rollups | Full On-Chain |
| --- | --- | --- | --- | --- |
| Data Availability | Client-managed | Bilateral off-chain | On-chain proofs | Fully on-chain |
| Withdrawal Finality | Instant | Challenge period (1-7 days) | ~1 week | Immediate |
| User Custody | User holds data | Smart contract escrow | Smart contract escrow | None required |
| Gas Cost for Exit | None | High (dispute) | Moderate (proof) | N/A |
| Trust Assumptions | None | Counterparty honesty | Operator liveness | Chain security |
| Best For | Light clients, wallets | High-frequency micropayments | Scaling general apps | Maximum security |
| Implementation Complexity | Low | High | Very High | Low |
| Example Protocols | Ethereum statelessness, Celestia | Lightning Network, Raiden | Optimism, Arbitrum, zkSync | Base Ethereum L1 |

advanced-optimizations
ADVANCED OPTIMIZATIONS AND CUSTOM PRUNING

How to Design State Retention Policies

Learn to implement custom state retention rules to optimize node storage and performance while preserving critical historical data.

A state retention policy defines which historical blockchain data a node stores and for how long. Full nodes typically retain all data, but this requires significant storage. Pruning is the process of selectively deleting old data, such as spent transaction outputs or historical state trie nodes. Designing a custom policy involves balancing storage costs against the need for data accessibility for services like block explorers, indexers, or specific smart contract queries. The core challenge is identifying which data is archive-critical versus prunable.

The first step is to audit your node's data usage. Tools like Geth's debug namespace or Erigon's stage reports can show storage distribution. Key metrics include the size of the chaindata, ancient data, and state trie. For Ethereum, you might find that blocks older than 128 epochs (approximately 13.6 hours) are rarely accessed for consensus but are needed for some historical RPC calls. A policy could be: prune transaction receipts and intermediate state roots after 100,000 blocks, but keep all block headers and final state roots indefinitely. This preserves the ability to verify proofs while drastically reducing storage.
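Such a policy can be expressed as a predicate over data kind and block age (illustrative Python; real clients key on trie nodes and database tables, not strings):

```python
KEEP_FOREVER = {"header", "final_state_root"}  # needed to verify proofs indefinitely
PRUNE_AFTER = 100_000                          # blocks, per the example policy above

def prunable(kind: str, block: int, head: int) -> bool:
    """Receipts and intermediate state roots age out; headers never do."""
    if kind in KEEP_FOREVER:
        return False
    return head - block > PRUNE_AFTER

assert not prunable("header", block=1, head=20_000_000)
assert prunable("receipt", block=1, head=20_000_000)
assert not prunable("receipt", block=19_950_000, head=20_000_000)
```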

Implementation varies by client. For Geth, you configure pruning via the --gcmode flag (archive or full). For custom logic, you would modify the client's state pruner (in Geth, the logic under core/state/pruner), perhaps adding a filter based on block number or contract address. In Erigon, which uses a staged sync, you can adjust the prune stages in the stages sequence. A practical example is creating a whitelist of important smart contract addresses (e.g., major DeFi protocols, ENS) whose full state history must be archived, while pruning all others after a set interval.

When designing policies, consider the proof requirements of your applications. Light clients and bridges often need Merkle proofs from recent blocks, requiring corresponding state. If your node serves a zk-rollup sequencer, you must archive all state related to its bridge contract. Test your policy on a testnet or a snapshot of mainnet data. Use the client's RPC methods (like eth_getProof) to verify that required historical data is still accessible after pruning. Incorrect pruning can silently break applications that assume full data availability.

Advanced strategies involve tiered storage. Keep hot data (last 10k blocks) on SSDs, warm data (up to 100k blocks) on high-capacity HDDs, and archive the rest to decentralized storage like Arweave or Filecoin, indexed by block hash. Projects like Erigon's "external tx lookup" exemplify this by separating transaction data. The final policy should be documented and versioned. As network upgrades (like Ethereum's Verkle trees) change state structure, your retention rules must be re-evaluated to maintain node functionality and service reliability.
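The tiering rule above reduces to an age threshold per storage class. A sketch using the thresholds from the text (the block-age cutoffs are policy choices, not protocol constants):

```python
def storage_tier(block: int, head: int) -> str:
    """Hot data on SSD, warm on HDD, everything older archived off-node."""
    age = head - block
    if age <= 10_000:
        return "ssd"
    if age <= 100_000:
        return "hdd"
    return "decentralized-archive"  # e.g. Arweave/Filecoin, indexed by block hash

assert storage_tier(19_995_000, 20_000_000) == "ssd"
assert storage_tier(19_950_000, 20_000_000) == "hdd"
assert storage_tier(1_000_000, 20_000_000) == "decentralized-archive"
```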

DESIGN AND IMPLEMENTATION

Frequently Asked Questions on State Retention

Common developer questions on designing and implementing state retention policies for blockchain applications, covering gas costs, security, and best practices.

State retention is the practice of strategically preserving data on-chain across multiple transactions or contract calls, rather than repeatedly writing and deleting it. It's a core gas optimization technique because writing to storage (SSTORE) is one of the most expensive operations on the Ethereum Virtual Machine (EVM).

For example, instead of storing a user's temporary calculation result and then clearing it (a fresh SSTORE to an empty slot costs ~20,000 gas, while clearing it refunds only 4,800 gas post-London), you retain the value in a state variable and overwrite it on subsequent calls, avoiding the expensive fresh-write cost on every interaction. Effective state retention can reduce a contract's gas consumption by 15-30% for state-heavy operations, directly lowering user transaction fees.
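The arithmetic behind that comparison, using approximate post-London figures (these ignore cold-access surcharges from EIP-2929, so treat them as back-of-envelope numbers):

```python
SSTORE_NEW = 20_000    # zero -> nonzero write (EIP-2200 base cost)
CLEAR_REFUND = 4_800   # nonzero -> zero refund (EIP-3529 era)
WARM_UPDATE = 2_900    # nonzero -> nonzero write to a warm slot

def write_then_clear(interactions: int) -> int:
    """Naive pattern: fresh write plus clear-refund on every interaction."""
    return (SSTORE_NEW - CLEAR_REFUND) * interactions

def retained(interactions: int) -> int:
    """Retention pattern: one fresh write, then cheap overwrites."""
    return SSTORE_NEW + WARM_UPDATE * (interactions - 1)

assert write_then_clear(5) == 76_000
assert retained(5) == 31_600   # less than half the gas over five interactions
```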

conclusion
IMPLEMENTATION

Conclusion and Next Steps

Designing effective state retention policies requires balancing data availability, cost, and application logic. This guide has outlined the core principles and strategies.

A well-designed state retention policy is not a one-size-fits-all solution. It is a deliberate architectural choice based on your application's specific needs. Key factors include the frequency of state access (hot vs. cold data), the cost of on-chain storage (e.g., Ethereum mainnet vs. Layer 2), and the trust assumptions for off-chain data availability. For example, a high-frequency DeFi trading pair's liquidity pool state must be fully on-chain and immediately verifiable, while an NFT project's historical metadata for past collections can be efficiently archived on decentralized storage like Arweave or IPFS, referenced by a permanent content identifier (CID) stored on-chain.

To implement these strategies, developers should leverage established patterns and tools. For on-chain pruning, consider using Ethereum's SSTORE2 for immutable data or custom mappings with deletion logic in upgradeable contracts. For Layer 2 solutions, understand the specific data availability guarantees of your chosen chain—Optimism's fault proofs rely on different data publishing than Arbitrum's BOLD protocol. When using decentralized storage, integrate libraries like web3.storage or lighthouse.storage to pin data and always store the proof (like an IPFS CID) in an immutable on-chain event. Testing your policy's resilience to state expiry or pruning is crucial.

Your next steps should involve prototyping and auditing. Start by mapping your smart contract's data into categories: critical consensus state, frequently accessed application state, and historical logs. Implement a minimal viable policy for a testnet deployment and simulate long-term operation. Tools like Hardhat and Foundry can help you fork mainnet state and test pruning functions. Finally, consider engaging a smart contract auditing firm to review your state management logic, as improper data handling can lead to permanent data loss or broken contract functionality. Continue your research with resources like the Ethereum Foundation's documentation on state management and the technical papers for data availability layers like Celestia or EigenDA.
