How to Understand Data Availability Sampling

A technical guide explaining the mechanics of Data Availability Sampling (DAS), its role in scaling blockchains, and how to interact with DAS protocols using code examples.
BLOCKCHAIN SCALING

What is Data Availability Sampling?

Data Availability Sampling (DAS) is a cryptographic technique that allows light nodes to verify that all transaction data for a new block is available without downloading the entire block.

In blockchain scaling solutions like rollups and sharding, a core challenge is ensuring that the massive amounts of data posted to a base layer (like Ethereum) are actually accessible to anyone who wants to verify the chain's state. If this data is withheld, it becomes impossible to detect invalid transactions or reconstruct the chain's history. Data Availability Sampling solves this by turning a monolithic download into a probabilistic check. Instead of downloading a 2 MB block, a node can download a few dozen random 1 KB samples. If all samples are successfully retrieved, the node can be statistically confident the entire data is available.

The protocol relies on erasure coding, a method that expands the original data with redundant pieces. For example, a block's data is encoded so that any 50% of the new, larger dataset can reconstruct the original 100%. Light clients then randomly select and request small chunks of this encoded data. If a requested chunk cannot be retrieved, the client treats the data as unavailable and rejects the block. This puts malicious block producers in a bind: to make any part of the data unrecoverable, they must withhold more than 50% of the encoded blob, and withholding that much becomes exponentially easier to detect as more nodes perform random sampling.

The concept was formalized in the Fraud and Data Availability Proofs research by Mustafa Al-Bassam, Alberto Sonnino, and Vitalik Buterin, and Dankrad Feist's Danksharding design makes it a cornerstone of Ethereum's roadmap, beginning with Proto-Danksharding (EIP-4844). In this design, validators are not required to store all blob data permanently. Instead, data is kept available for a short window (about 18 days under EIP-4844), after which it can be pruned, relying on other parties like Data Availability Committees (DACs) or decentralized storage for long-term archival. This separation of availability from permanent storage is key to achieving scalable data layers without imposing unsustainable storage burdens on all network participants.

Implementing DAS requires specific cryptographic primitives. A common approach uses KZG polynomial commitments (as in EIP-4844) or Reed-Solomon codes with Merkle proofs. With KZG commitments, the data is treated as a polynomial, and the commitment allows for efficient generation of proofs for any sampled point. A node requests a random coordinate and receives a proof that the sample is consistent with the block's commitment. This is far more efficient than transmitting Merkle branches for each sample. Libraries such as gnark-crypto and c-kzg-4844 provide the necessary tools for developers to work with these commitments.
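
To make the shape of this concrete, the sketch below shows the verifier-side check a light client runs on a single sample. The kzg_verify routine is a placeholder standing in for a real pairing-based verifier (such as those exposed by the libraries above); the data structures and names are illustrative only, not an actual library API.

```python
# Sketch of the verifier-side flow for one DAS sample against a KZG commitment.
# `kzg_verify` is a stand-in for a real pairing-based verification routine and
# is NOT a real API; use a maintained KZG library in practice.

from dataclasses import dataclass

@dataclass
class Sample:
    index: int     # sampled coordinate (evaluation point)
    value: bytes   # 32-byte field element returned by the network
    proof: bytes   # constant-size KZG opening proof (48 bytes on BLS12-381)

def kzg_verify(commitment: bytes, index: int, value: bytes, proof: bytes) -> bool:
    """Placeholder: a real implementation checks the pairing equation
    e(proof, [tau - index]_2) == e(commitment - [value]_1, [1]_2)."""
    raise NotImplementedError("plug in a real KZG library here")

def check_sample(commitment: bytes, sample: Sample) -> bool:
    # A light client accepts the sample only if the opening proof verifies
    # against the block's published commitment.
    return kzg_verify(commitment, sample.index, sample.value, sample.proof)
```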

For developers building on scalable L2s, understanding DAS is crucial for trust assumptions. When you use an Optimistic Rollup, you rely on the underlying L1's data availability for the fraud proof window. A ZK-Rollup similarly posts validity proofs and data. If that data is unavailable, the rollup's state cannot be challenged or updated. Tools like the Ethereum Portal Network aim to provide robust peer-to-peer data availability layers. By grasping DAS, you can better evaluate the security models of different scaling solutions and understand the trade-offs between data availability, decentralization, and scalability in the modular blockchain stack.

FOUNDATIONAL CONCEPTS

Prerequisites

Before diving into Data Availability Sampling (DAS), you need a solid understanding of the underlying blockchain scaling and data concepts.

Data Availability Sampling is a core component of modern blockchain scaling solutions like Ethereum's danksharding and Celestia. Its primary function is to allow light nodes to securely verify that all transaction data for a new block is published and available, without downloading the entire dataset. This is the foundation for data availability, a critical security property that prevents malicious block producers from hiding data and creating invalid blocks. Understanding DAS requires familiarity with several key areas of blockchain architecture and cryptography.

First, you should understand the modular blockchain thesis. This paradigm separates the core functions of a blockchain—execution, settlement, consensus, and data availability—into distinct layers. DAS operates at the data availability layer, which is responsible for guaranteeing that transaction data is published. Projects like Celestia and EigenDA are built as specialized data availability layers. Contrast this with monolithic chains like early Ethereum, where all functions are bundled together, creating a bottleneck for scalability.

You must also grasp the basics of erasure coding, specifically Reed-Solomon codes. This is the coding-theory technique that makes DAS possible. A block's data is expanded into coded chunks, such that the original data can be recovered even if up to 50% of the chunks are missing. In DAS, light nodes randomly sample a small number of these chunks. If all samples are available, they can be statistically confident the entire data is available. This is far more efficient than downloading the full block.
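
The following toy example, written over a deliberately small prime field, demonstrates the property that matters for DAS: data encoded with a Reed-Solomon-style extension can be rebuilt from any half of the coded chunks. The field, chunk encoding, and parameters are simplified for illustration and are not how production DA layers encode data.

```python
"""Toy Reed-Solomon extension over a prime field, illustrating the DAS property
that any k of the 2k extended chunks suffice to rebuild the original k chunks.
Real systems use large fields (e.g. BLS12-381 scalars) and 32-byte elements."""

import random

P = 65537  # small prime field, for illustration only

def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial of degree < len(points)
    passing through the given (xi, yi) pairs, over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def extend(data):
    """Systematic encoding: data[i] is the evaluation at x=i; the extension
    doubles the chunk count by evaluating the same polynomial at x=k..2k-1."""
    k = len(data)
    pts = list(enumerate(data))
    return data + [lagrange_eval(pts, x) for x in range(k, 2 * k)]

def reconstruct(available, k):
    """Rebuild the original k chunks from ANY k available (index, value) pairs."""
    assert len(available) >= k, "fewer than 50% of chunks available: unrecoverable"
    pts = available[:k]
    return [lagrange_eval(pts, x) for x in range(k)]

if __name__ == "__main__":
    original = [random.randrange(P) for _ in range(8)]   # k = 8 data chunks
    coded = extend(original)                              # 2k = 16 coded chunks
    # The adversary withholds half of the coded chunks; any surviving half works.
    surviving = random.sample(list(enumerate(coded)), 8)
    assert reconstruct(surviving, 8) == original
    print("recovered original data from an arbitrary 50% of the coded chunks")
```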

Familiarity with light client protocols is essential. These are nodes that do not store the full blockchain state or history. Their security traditionally relied on full nodes, but DAS empowers them to independently verify data availability. Concepts like Merkle proofs and Merkle roots are crucial here, as they allow for efficient verification that a small piece of data belongs to a larger set. In DAS, each data chunk is accompanied by a Merkle proof against the block's data root.
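
The sketch below shows a bare-bones version of this check: building a Merkle root over data chunks and verifying that one sampled chunk belongs to it. Production data roots (for example, Celestia's namespaced Merkle trees) add namespacing and domain separation that are omitted here.

```python
"""Minimal Merkle commitment over data chunks: the kind of check a DAS light
client runs on each sampled chunk. This is a bare sketch, not a production tree."""

import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root for the chunk at `index`."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append(level[index ^ 1])     # sibling differs only in the last bit
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_chunk(root, chunk, index, proof):
    """What a light client does with each sampled chunk: hash up to the root."""
    node = h(chunk)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

if __name__ == "__main__":
    chunks = [bytes([i + 1]) * 32 for i in range(16)]   # 16 fake 32-byte chunks
    root = merkle_root(chunks)
    proof = merkle_proof(chunks, 5)
    assert verify_chunk(root, chunks[5], 5, proof)
    assert not verify_chunk(root, b"\x00" * 32, 5, proof)   # tampered chunk fails
    print("sampled chunk verified against the data root")
```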

Finally, a working knowledge of consensus mechanisms (especially Proof-of-Stake) and the block production process is important. You need to understand the roles of validators/proposers, how blocks are constructed and propagated, and what it means for a block to be "withheld." The security guarantee of DAS is probabilistic: by increasing the number of samples, light nodes can reduce the chance of a malicious block producer successfully hiding data to an infinitesimally small level, making the chain secure for light clients.

BLOCKCHAIN SCALING

The Data Availability Problem

Data availability is the guarantee that all data for a block is published to the network, enabling nodes to verify transaction validity. This guide explains why it's a core bottleneck for scaling and how Data Availability Sampling provides a solution.

In a blockchain, full nodes download and validate every transaction in every block. This ensures security but limits throughput, as block size is constrained by the processing power of the weakest node. Data availability refers to the requirement that all data in a new block must be made available for download. If a block producer withholds even a single transaction, the network cannot verify whether hidden transactions are invalid, opening the door to fraud. This creates a fundamental tension: increasing block size to process more transactions per second (TPS) makes it harder for regular nodes to keep up, potentially centralizing the network.

Data Availability Sampling (DAS) is a cryptographic technique that allows light clients to verify data availability without downloading an entire block. It relies on erasure coding, where block data is expanded into coded chunks with redundancy, typically using Reed-Solomon codes. The block producer then commits to the full encoded data with a KZG polynomial commitment or a Merkle root. Light clients randomly sample a small number of these chunks. If all sampled chunks are available, they can be statistically confident the entire block is available. This enables secure block validation with sub-linear data downloads.

The practical implementation of DAS is central to Ethereum's danksharding roadmap and Celestia's modular blockchain architecture. In Ethereum's Proto-Danksharding (EIP-4844), blobs of data are attached to Beacon Chain blocks with a short persistence window; full sampling by validators and light clients arrives with complete Danksharding. Celestia uses a two-dimensional Reed-Solomon encoding scheme where light nodes sample rows and columns of a data matrix. For developers, this means layer-2 rollups can post their transaction data to a dedicated data availability layer, significantly reducing costs compared to Ethereum mainnet calldata while maintaining robust security guarantees.

Understanding DAS requires familiarity with key components. The Data Availability Committee (DAC) is a trusted group that signs off on data availability, a simpler but more centralized model used by some early rollups. The Data Availability Layer is a specialized blockchain, like Celestia or Avail, whose primary function is to order and guarantee data. Fraud proofs or validity proofs (ZK-proofs) for rollups depend entirely on this data being available for verification. If the data is withheld, proofs cannot be generated, and the system's security fails.

For builders, the choice of data availability solution impacts security, cost, and decentralization. Using Ethereum for data (via blobs) offers the highest security but at a premium. A modular DA layer like Celestia can reduce data posting costs by over 99%. The trade-off involves assessing the trust assumptions of external DA layers versus the economic security of Ethereum. Code-wise, posting data involves sending transactions to a target chain; for example, an Optimism rollup contract might store a data root on Ethereum, while a rollup on Celestia would submit data to Celestia's namespace.
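
As a rough illustration of the Celestia path, the snippet below submits a payload to a local celestia-node over its JSON-RPC interface. The method name (blob.Submit), the default port, the auth token handling, and the parameter layout are assumptions based on celestia-node's documented API and can differ between versions, so treat this as a sketch rather than a drop-in integration.

```python
"""Sketch: posting rollup data to Celestia through a local celestia-node
JSON-RPC endpoint. Endpoint, auth, and parameter shapes are assumptions that
may differ between celestia-node versions; check your node's documentation."""

import base64
import json
import os
import urllib.request

CELESTIA_RPC = "http://localhost:26658"          # assumed default celestia-node RPC port
AUTH_TOKEN = os.environ.get("CELESTIA_NODE_AUTH_TOKEN", "")

def submit_blob(namespace_hex: str, payload: bytes, gas_price: float = 0.002):
    blob = {
        "namespace": base64.b64encode(bytes.fromhex(namespace_hex)).decode(),
        "data": base64.b64encode(payload).decode(),
        "share_version": 0,
    }
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "blob.Submit",
        "params": [[blob], gas_price],
    }
    req = urllib.request.Request(
        CELESTIA_RPC,
        data=json.dumps(request).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {AUTH_TOKEN}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # The JSON-RPC response's "result" field carries the inclusion height.
        return json.load(resp)

if __name__ == "__main__":
    # 29-byte namespace (version byte + ID); this value is purely illustrative.
    print(submit_blob("00" * 28 + "01", b"rollup batch #42"))
```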

DATA AVAILABILITY

Key Concepts for DAS

Data Availability Sampling (DAS) is a cryptographic technique that allows light nodes to verify that transaction data is published without downloading it entirely. This is a core scaling innovation for Ethereum and other modular blockchains.

01

What is Data Availability?

Data availability is the guarantee that all data for a block (transactions, state diffs) is published and accessible to the network. Without it, a block producer could withhold the data needed to construct fraud proofs, making them impossible. In a rollup-centric ecosystem, ensuring the data for L2 batches is available on L1 is the primary challenge DAS solves.

02

How Sampling Works

Instead of downloading a full 2 MB block, a light client performs Data Availability Sampling (DAS) by:

  • Requesting multiple small, random chunks of the block, which the producer has redundantly encoded with an erasure code (such as Reed-Solomon).
  • Verifying each returned chunk against the block's published commitment.
  • Statistically concluding that enough chunks are available to reconstruct the whole. A client sampling 30 random chunks can achieve over 99.9% certainty that all data is available.
03

Erasure Coding (Reed-Solomon)

Erasure coding is the pre-processing step that makes sampling possible. The original data is expanded into, for example, 2x the number of chunks. The key property is that any 50% of the extended chunks can reconstruct the original 100%. This redundancy means that to make any part of the data unrecoverable, an attacker must withhold a large fraction of the extended chunks, and random sampling detects withholding on that scale with high probability.

04

KZG Commitments & Proofs

To trustlessly verify that sampled data is correct, DAS uses KZG polynomial commitments. A prover commits to the data polynomial. For each sampled chunk, they provide a tiny KZG proof that the chunk is consistent with the overall commitment. This allows verifiers to check data correctness without knowing the full dataset, a requirement for stateless validation.

05

The Data Availability Problem

The core problem DAS addresses: in a modular blockchain, who guarantees the sequencer's batch data is available for verification? Solutions exist on a spectrum:

  • Ethereum Consensus (Rollups): Uses L1 for DA.
  • EigenDA & Celestia: Provide external, scalable DA layers.
  • Validiums: Use off-chain DA committees for higher throughput but different security assumptions. Choosing a DA layer is a fundamental security trade-off.
BLOCKCHAIN SCALING

How Data Availability Sampling Works Step-by-Step

Data Availability Sampling (DAS) is a cryptographic technique that allows light nodes to verify that block data is available without downloading it entirely. This guide explains the step-by-step process.

Data Availability Sampling (DAS) solves a core scaling problem: how can nodes with limited resources trust that all data for a new block is published and accessible? In traditional blockchains, full nodes download everything, creating a bottleneck. DAS, a key component of data availability layers like Celestia and Ethereum's danksharding roadmap, enables light clients to perform probabilistic checks. Instead of 100% verification, they sample small, random pieces of the data. If the data is available, samples will succeed; if not, they will fail, signaling a problem.

The process relies on erasure coding and commitments. First, a block's data is expanded using an erasure code like Reed-Solomon. This creates redundant chunks, so the original data can be recovered even if up to 50% is missing. A commitment to this extended data, typically a Merkle root or a KZG polynomial commitment, is then published to the blockchain. This commitment acts as a cryptographic promise of what the full data should be. Light nodes only have this commitment, not the data itself.

A light node begins sampling by randomly selecting multiple indices (e.g., 30 random numbers). For each index, it asks the network for the corresponding data chunk and a proof that the chunk is correct relative to the published commitment. This is a Merkle proof or a KZG evaluation proof. The node downloads these small samples—perhaps a few kilobytes each—instead of megabytes of full block data. It then cryptographically verifies each proof against the known commitment.

If all sampled chunks are returned and their proofs are valid, the node gains high statistical confidence that the entire data is available. The probability of missing data evading detection drops exponentially with more samples. For instance, if 25% of data is withheld, 30 random samples have a (0.75)^30 chance of missing it—a probability less than 0.02%. If a node cannot retrieve a valid sample, it concludes data is unavailable and rejects the block, preventing acceptance of a fraudulent state transition.
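
These figures are easy to reproduce. The few lines below compute the probability that every sample misses the withheld portion, assuming sampling with replacement, which is a close approximation when blocks contain thousands of chunks.

```python
"""Check the sampling math quoted above: the chance that s random samples all
miss a withheld fraction f of the chunks is (1 - f)^s."""

def miss_probability(withheld_fraction: float, samples: int) -> float:
    return (1.0 - withheld_fraction) ** samples

# 25% withheld, 30 samples: probability every sample lands on available data.
print(f"{miss_probability(0.25, 30):.6%}")   # ~0.0179%, i.e. below 0.02%

# An attacker using a rate-1/2 code must withhold over 50% to block
# reconstruction; detection confidence then climbs much faster.
for s in (10, 20, 30):
    print(s, "samples ->", f"{1 - miss_probability(0.5, s):.10%}", "detection")
```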

In practice, networks coordinate sampling to increase efficiency. Protocols like Celestia use a 2D Reed-Solomon encoding scheme arranged in a matrix, allowing samples to be taken from both rows and columns. Projects like EigenDA take a different approach, securing availability through a restaking-based operator set that attests to holding erasure-coded chunks rather than through permissionless light-client sampling. The end result is that light nodes can securely follow the chain with minimal resources, enabling truly scalable, secure blockchain architectures where data availability is guaranteed without full data download.

COMPARISON

DAS Protocol Implementations

A technical comparison of major data availability sampling implementations, their architectures, and key performance characteristics.

| Feature / Metric | Celestia | EigenDA | Avail |
| --- | --- | --- | --- |
| Core Architecture | Modular DA Layer | Restaking-based AVS | Modular DA Layer |
| Data Availability Sampling (DAS) | Yes | No | Yes |
| Data Availability Proofs (KZG) | No (fraud proofs over Merkle roots) | Yes | Yes |
| Blob Transaction Support | | | |
| Throughput (MB/s) | ~150 | ~10 | ~70 |
| Cost per MB (approx.) | $0.003 | $0.001 | $0.002 |
| Finality Time | ~12 sec | ~10 min | ~20 sec |
| Native Light Client Support | Yes | No | Yes |
| Primary Consensus | Tendermint | Ethereum | Nominated Proof-of-Stake |

DATA AVAILABILITY SAMPLING

Code Examples and Interactions

Practical resources for developers to implement, test, and understand Data Availability Sampling (DAS) through code, simulations, and interactive tools.
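
Since the sections above describe the mechanics in prose, here is a self-contained simulation that ties them together: a simplified one-dimensional block of extended chunks, a producer that withholds a chosen fraction, and a population of light clients that each sample a handful of random indices and reject the block on any failed retrieval. All parameters are illustrative and chosen for readability, not taken from any production network.

```python
"""Self-contained simulation of Data Availability Sampling under a data
withholding attack. Simplified 1D model with illustrative parameters."""

import random

def run_trial(num_chunks: int, withheld: float, samples: int, clients: int) -> int:
    """Return how many of `clients` independent light clients detect the attack."""
    hidden = set(random.sample(range(num_chunks), int(num_chunks * withheld)))
    detections = 0
    for _ in range(clients):
        indices = random.sample(range(num_chunks), samples)   # without replacement
        if any(i in hidden for i in indices):
            detections += 1
    return detections

if __name__ == "__main__":
    N, S, CLIENTS, TRIALS = 4096, 30, 1000, 20
    for withheld in (0.01, 0.25, 0.51):
        per_client = []
        none_detected = 0
        for _ in range(TRIALS):
            d = run_trial(N, withheld, S, CLIENTS)
            per_client.append(d / CLIENTS)
            if d == 0:
                none_detected += 1
        print(f"withheld={withheld:.0%}  "
              f"avg per-client detection={sum(per_client) / TRIALS:.1%}  "
              f"trials where nobody noticed={none_detected}/{TRIALS}")
    # With >50% withheld (the level needed to actually block reconstruction
    # under a rate-1/2 code), essentially every client rejects the block, so
    # the network as a whole misses the attack with only negligible probability.
```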

DATA AVAILABILITY SAMPLING

Security Assumptions and Guarantees

Data Availability Sampling (DAS) is a cryptographic technique that allows light nodes to probabilistically verify that all data for a block is published, without downloading it entirely. This is the security foundation for scaling solutions like danksharding and modular blockchains.

At its core, Data Availability Sampling solves the data availability problem: how can a node be sure that all the data for a newly proposed block is actually published to the network? Malicious block producers could withhold transaction data, making it impossible for honest validators to reconstruct the block and detect invalid transactions. DAS enables light clients to perform multiple random checks on small portions of the data. If all sampled chunks are available, the client can be statistically confident the entire data is available. This shifts the security model from "trust a majority" to "verify probabilistically."

The process relies on erasure coding and commitments. First, the block data is expanded using an erasure code (like Reed-Solomon), creating redundant chunks. A commitment to this extended data, typically a KZG polynomial commitment or a Merkle root, is published in the block header. Light clients then randomly select a few indices and request the data chunks at those positions from the network. By checking these chunks against the commitment, they can detect if data is missing. The probability of missing withheld data decreases exponentially with the number of samples.

The primary security guarantee of DAS is availability: it ensures that honest validators can always reconstruct the full block data, as long as enough samples pass across the network. A key assumption is that there is at least one honest full node in the sampling network that stores and serves the complete data, and that enough light clients sample to collectively cover the block. The system is designed so that a producer withholding enough of the extended data to actually prevent reconstruction (more than 50% under a rate-1/2 code) is detected with near-certainty after 30-40 random samples. This creates a strong disincentive for data withholding attacks.

In practice, implementations like the one in Ethereum's danksharding (EIP-4844 and beyond) use a 2D KZG commitment scheme. Data is arranged in a matrix, and commitments are made to each row and column. This allows for more efficient sampling and recovery. Light clients might sample a handful of points, while full nodes reconstruct the data if needed. The security parameters—such as the coding rate and number of required samples—are tuned to achieve a target security level, like 1-in-a-trillion chance of failure.
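
The tuning works in the direction sketched below: given the smallest fraction an attacker must withhold to prevent reconstruction (about half of the extended data for a one-dimensional rate-1/2 code, and roughly a quarter of the extended square in the 2D scheme), solve for the number of samples that pushes the probability of an undetected withholding below the target. The thresholds and target here are assumptions for illustration.

```python
"""How security parameters are tuned: find the number of samples that pushes
the chance of an undetected, reconstruction-blocking withholding below a
target such as 1e-12. Thresholds are illustrative assumptions."""

import math

def samples_needed(min_withheld_fraction: float, target_failure: float) -> int:
    # (1 - f)^s <= target  =>  s >= log(target) / log(1 - f)
    return math.ceil(math.log(target_failure) / math.log(1 - min_withheld_fraction))

for label, f in (("1D, rate-1/2", 0.5), ("2D, rate-1/2 per axis", 0.25)):
    print(label, "->", samples_needed(f, 1e-12), "samples for a 1-in-a-trillion bound")
```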

Understanding DAS is crucial for evaluating modular blockchain stacks. A rollup's security ultimately depends on the data availability layer it uses. If that layer uses DAS correctly, the rollup inherits strong liveness guarantees. If the DA layer is compromised, the rollup can become stuck. Therefore, when using a rollup or a validity-proof system, you must audit its dependency on the underlying DA layer's sampling assumptions and the honesty of its data availability committees or full nodes.

DATA AVAILABILITY SAMPLING

Frequently Asked Questions

Common questions from developers and researchers about Data Availability Sampling (DAS), its implementation, and its role in scaling blockchains.

Data Availability Sampling (DAS) is a cryptographic technique that allows light nodes to verify that all data for a block is published and available without downloading the entire block. It solves the data availability problem, a key challenge for scaling blockchains with large blocks (e.g., in rollups or sharding).

Without DAS, a malicious block producer could withhold transaction data, making it impossible for others to verify or reconstruct the block's state, leading to consensus failures. DAS enables nodes to perform multiple random checks on small pieces (chunks) of the block. If enough samples pass, they can be statistically confident the entire data is available, allowing for secure scaling beyond the storage and bandwidth limits of individual nodes.

KEY TAKEAWAYS

Conclusion and Next Steps

Data Availability Sampling (DAS) is a foundational scaling technology that enables secure, high-throughput blockchains by allowing light nodes to probabilistically verify data availability without downloading entire blocks.

The core principle of DAS is that a network of light clients can collectively guarantee that all block data is published and available by each sampling a small, random subset. This is made possible by the use of erasure coding and 2D Reed-Solomon encoding, which spreads data redundancy across a matrix. Because of that redundancy, a block producer must withhold a large fraction of the extended data to make any of it unrecoverable, and at that scale each random sample has a substantial chance of hitting a missing piece. As hundreds of nodes perform independent sampling, the chance of a data withholding attack going undetected becomes astronomically low. This probabilistic security model is the breakthrough that separates DAS from previous data availability solutions.

For developers and researchers, the next step is to understand the practical implementations. Ethereum's Proto-Danksharding (EIP-4844) introduced blob-carrying transactions as a precursor to full DAS, increasing data capacity for L2 rollups. The full Danksharding roadmap will implement a peer-to-peer sampling network in which validators and light clients request random chunks of blob data from their peers. To experiment, you can query blob data through the beacon-APIs, or run a light client implementation like Helios, or Nimbus in its light-client mode, to observe light-client verification mechanics in a testnet environment.
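
If you have access to a consensus client's REST endpoint, the snippet below fetches blob sidecars for a block via the standard beacon-APIs route GET /eth/v1/beacon/blob_sidecars/{block_id}. The node URL and the exact response fields are assumptions that may vary by client and spec version.

```python
"""Inspecting EIP-4844 blob data through the beacon-APIs endpoint
GET /eth/v1/beacon/blob_sidecars/{block_id}. Assumes a reachable consensus
client; response field names follow the Deneb spec and may evolve."""

import json
import urllib.request

BEACON_NODE = "http://localhost:5052"   # typical consensus-client REST port; adjust

def get_blob_sidecars(block_id: str = "head"):
    url = f"{BEACON_NODE}/eth/v1/beacon/blob_sidecars/{block_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["data"]

if __name__ == "__main__":
    for sidecar in get_blob_sidecars("head"):
        print("blob index:", sidecar["index"],
              "commitment:", sidecar["kzg_commitment"][:20] + "...",
              "blob bytes:", len(sidecar["blob"]) // 2 - 1)   # hex string length -> bytes
```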

To deepen your expertise, engage with the following resources and communities. Study the foundational paper, Fraud and Data Availability Proofs, which introduced the sampling construction. Follow the implementation progress in the Ethereum Consensus Layer (CL) specifications on GitHub. For hands-on learning, set up a local testnet using Kurtosis, or use the Holesky testnet to experiment with blob transactions. Contributing to client diversity by running a minority client, or participating in the Ethereum R&D Discord channels, is a concrete way to support the ecosystem's move towards a scalable, secure, and decentralized future powered by Data Availability Sampling.