In blockchain scaling, a fundamental problem is data availability: how can network participants be sure that all the data for a new block has been published? Full nodes download everything, but this becomes impractical at scale. Light clients, which only download block headers, cannot perform this check, making them vulnerable to data withholding attacks. In such an attack, a malicious block producer could publish a block header but withhold some of the transaction data, potentially hiding invalid transactions that would be rejected if seen. Data Availability Sampling (DAS) solves this by allowing light nodes to probabilistically verify data availability with minimal resource requirements.
How to Use Data Availability Sampling
Introduction to Data Availability Sampling
Data Availability Sampling (DAS) is a cryptographic technique that allows light nodes to securely verify that all data for a block is available without downloading it entirely, a core innovation for scaling blockchains.
The core mechanism relies on erasure coding and random sampling. First, the block data is expanded using an erasure code (like Reed-Solomon), creating redundant data chunks. This encoding ensures the original data can be reconstructed even if a significant portion (e.g., 50%) of the chunks are missing. Light nodes then perform DAS by randomly selecting a small, fixed number of these chunks (e.g., 30 samples) and requesting them from the network. If the data is fully available, all samples will be retrieved successfully. If a malicious actor is withholding data, there's a high statistical probability that a randomly selected sample will hit a missing chunk, revealing the fraud.
The security guarantee is probabilistic but can be made arbitrarily strong. The probability of a light node failing to detect missing data decreases exponentially with the number of samples it requests. For example, if 50% of the extended data is withheld and a node makes 30 random queries, the chance that every query misses the withheld portion is less than one in a billion. This allows a large network of light nodes to collectively provide an availability guarantee comparable to that of a full node, enabling secure sharding and high-throughput designs such as Ethereum's danksharding roadmap. The process is trust-minimized and adds no new consensus assumptions beyond enough honest nodes serving and storing the sampled data.
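As a quick sanity check of that figure, the probability that every sample misses the withheld portion can be computed directly (plain Python, no dependencies):

```python
# Chance that all samples land on available chunks even though half the
# extended data has been withheld (treating samples as independent draws).
withheld_fraction = 0.5
num_samples = 30

miss_probability = (1 - withheld_fraction) ** num_samples
print(f"P(all {num_samples} samples miss the withheld data) = {miss_probability:.2e}")
# ~9.3e-10, i.e. less than one in a billion
```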
Implementing DAS requires specific peer-to-peer network protocols. Light nodes use a Distributed Hash Table (DHT) or a similar gossip network to discover and connect to peers storing the data. When sampling, they request specific chunks by index and verify each response against the block's Merkle root or KZG polynomial commitment, which acts as a concise cryptographic fingerprint for the entire data. Projects like Celestia pioneered DAS for modular blockchains, while Ethereum's Proto-Danksharding (EIP-4844) introduced blob-carrying transactions for Layer 2 rollups as a stepping stone toward full data availability sampling. The efficiency of DAS is what makes storing data off-chain but verifying its availability on-chain a viable scaling strategy.
For developers, interacting with DAS often involves working with these new data structures. Instead of querying for full transactions, you request data by blob index or chunk coordinate. A simple conceptual check in pseudocode might look like:
```python
# Pseudocode for a sampling check
for i in range(NUM_SAMPLES):
    chunk_id = random_sample(block_header.data_root)
    chunk = network.fetch_data_chunk(chunk_id)
    if chunk is None:
        # Trigger fraud proof or reject block
        raise DataUnavailableError
    if not verify_merkle_proof(chunk, block_header.data_root):
        raise InvalidDataError
```
This shift requires new client software and standards, but it unlocks orders-of-magnitude greater throughput while preserving decentralized security.
Prerequisites
Before implementing Data Availability Sampling (DAS), ensure you have a solid foundation in blockchain architecture and distributed systems.
Data Availability Sampling is a cryptographic technique that allows light nodes to verify the availability of large data blocks without downloading them entirely. It's a core component of Ethereum's scaling roadmap (Danksharding) and other modular blockchain designs like Celestia. To understand DAS, you need familiarity with erasure coding (specifically Reed-Solomon codes), Merkle proofs, and the concept of data availability versus data validity. A node's goal is to answer one question with high probability: "Is this block's data published and retrievable by the network?"
You should be comfortable with the client-server model in a P2P context. In DAS, a node acts as a sampling client, requesting small, random chunks of data from the network. Each successful sample increases the node's statistical confidence that the entire dataset is available. Because of the erasure coding, a block producer must withhold a large fraction of the extended data to prevent reconstruction, and withholding that much is detected quickly by random sampling. Key parameters to understand are the sampling rate and the confidence level, which determine how many queries are needed to achieve a target security guarantee (e.g., 99.9% confidence).
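The sketch below inverts that relationship: given a target confidence and an assumed withheld fraction, it estimates how many samples are needed (an approximation that treats samples as independent draws; the function name and parameters are illustrative):

```python
import math

def samples_needed(target_confidence: float, withheld_fraction: float) -> int:
    """Minimum number of independent random samples so that withheld data is
    detected with probability at least `target_confidence`."""
    # P(detect) = 1 - (1 - withheld_fraction) ** s, solved for s
    miss_allowed = 1.0 - target_confidence
    return math.ceil(math.log(miss_allowed) / math.log(1.0 - withheld_fraction))

print(samples_needed(0.999, 0.5))      # 10 samples for 99.9% confidence
print(samples_needed(0.999999, 0.5))   # 20 samples for 99.9999% confidence
```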
For practical implementation, you'll need a development environment capable of handling cryptographic operations. Proficiency in a systems language like Go (used in Ethereum's Prysm client and in celestia-node) or Rust (used in Sigma Prime's Lighthouse) is essential. You should understand how to work with KZG polynomial commitments or Merkle trees for constructing proofs. Familiarity with libp2p or a similar networking stack is crucial for the peer discovery and data request/response protocol that underpins the sampling process.
Finally, set up a test environment. Use a local testnet from a client implementation (like a Danksharding devnet) or the Celestia network's testnet. This allows you to experiment with the sampling workflow: connecting to peers, requesting data chunks by their coordinates (row, column), and verifying the responses against a known commitment. Tools like Ethereum's Portal Network client or Celestia's celestia-node provide concrete codebases to study. Start by running a light node and instrumenting it to log its sampling attempts and results.
How Data Availability Sampling Works
Data Availability Sampling (DAS) is a cryptographic technique that allows light nodes to securely verify that block data is available without downloading it entirely. This is a core innovation for scaling blockchains with data sharding, like Ethereum's danksharding roadmap.
In a traditional blockchain, a full node must download and store the entire block data to verify its validity and availability. This creates a scalability bottleneck, as block size is limited by the storage and bandwidth of the weakest node. Data Availability Sampling solves this by enabling light clients to perform probabilistic checks. Instead of downloading a 2 MB block, a light client might only need to download a few kilobytes of randomly sampled data to achieve high confidence that the full data exists and is retrievable.
The system relies on erasure coding the block data. The original data is expanded into a larger set of coded chunks, such that any sufficiently large subset of these chunks can reconstruct the original. A block producer commits to this extended data using a polynomial commitment scheme, like a KZG commitment. Light clients then randomly select a small number of these chunk indices and request proofs from the network that those specific pieces are part of the committed data. If a malicious block producer withholds even a small portion of the data, the probability of a sampling client detecting this absence increases exponentially with each sample.
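The core idea can be illustrated in one dimension with a toy prime field: the n original chunks are interpreted as evaluations of a degree-(n-1) polynomial, n more evaluations are appended, and any n surviving chunks recover the original. This is a pedagogical sketch, not a production Reed-Solomon codec, and the field modulus is an arbitrary choice:

```python
P = 2**31 - 1  # toy prime modulus; real systems use e.g. the BLS12-381 scalar field

def lagrange_eval(points, x):
    """Evaluate the unique polynomial passing through `points` at `x`, mod P."""
    result = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        result = (result + yi * num * pow(den, -1, P)) % P
    return result

def extend(chunks):
    """Erasure-code n chunks into 2n by evaluating the same polynomial at new points."""
    points = list(enumerate(chunks))  # chunk i is the polynomial's value at x = i
    n = len(chunks)
    return chunks + [lagrange_eval(points, x) for x in range(n, 2 * n)]

def reconstruct(known_points, n):
    """Recover the n original chunks from any n surviving (index, value) pairs."""
    return [lagrange_eval(known_points[:n], x) for x in range(n)]

original = [11, 22, 33, 44]
extended = extend(original)                            # 8 chunks total
survivors = [(i, extended[i]) for i in (1, 3, 5, 6)]   # pretend the rest were withheld
assert reconstruct(survivors, len(original)) == original
```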
The Sampling Process
A typical DAS workflow involves several steps. First, the block builder erasure codes the block data, creating 2n chunks from n original chunks. They then generate a KZG commitment to the entire extended data. Light clients query the network for, say, 30 randomly chosen chunks. For each query, a network node (often a dedicated storage or sampling node) returns the chunk data together with an opening proof (a KZG or Merkle proof) linking it to the commitment. If all samples are returned and verify successfully, the client can be statistically confident (e.g., >99.9%) that the full data is available. If any sample fails, the client rejects the block header.
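On the client side, the selection step for a 2D data square might look like the following sketch (the constants and layout are illustrative, not taken from any particular client):

```python
import random

EXTENDED_SIZE = 128   # side length of the extended data square (illustrative)
NUM_SAMPLES = 30      # samples per block

def pick_sample_coordinates() -> list[tuple[int, int]]:
    """Choose NUM_SAMPLES distinct (row, col) coordinates in the extended square.

    Local randomness keeps the queries unpredictable to the block producer,
    which is what makes withholding data risky for them."""
    rng = random.SystemRandom()
    flat = rng.sample(range(EXTENDED_SIZE * EXTENDED_SIZE), NUM_SAMPLES)  # no repeats
    return [(idx // EXTENDED_SIZE, idx % EXTENDED_SIZE) for idx in flat]

for row, col in pick_sample_coordinates():
    # Fetch the share at (row, col) from peers and verify its proof against
    # the block's commitment before counting it as a successful sample.
    pass
```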
Implementing DAS requires a robust peer-to-peer network for serving data chunks. Projects like Celestia have pioneered this architecture, where a network of full storage nodes holds all the data and light nodes perform sampling. The Ethereum ecosystem is developing this through PeerDAS (EIP-7594), currently being exercised on devnets, where consensus nodes (validators) are also responsible for storing and serving blob data to sampling clients. This design shifts the trust assumption from "all data is stored by everyone" to "enough data is stored somewhere on the network to satisfy all sampling requests."
For developers, interacting with DAS often means working with new APIs and client libraries. For instance, an Ethereum client can fetch blob sidecars for a block through the standard Beacon API (the blob_sidecars endpoint introduced with Deneb); each sidecar includes the blob data along with its KZG commitment and proof. Consensus client libraries implementing EIP-4844 handle the underlying cryptography for verifying KZG commitments and proofs. The core security guarantee is that it is computationally infeasible for an adversary to produce valid proofs for sampled chunks if the corresponding data is not actually stored and available on the network.
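For illustration, blob sidecars and their commitments can be pulled from a consensus client's Beacon API; the blob_sidecars route is part of the standard Beacon API spec for Deneb, while the node URL and port below are assumptions about your local setup:

```python
import requests

# Assumes a local consensus client exposing the standard Beacon API
# (the URL and port are illustrative).
BEACON_URL = "http://localhost:5052"

def fetch_blob_sidecars(block_id: str = "head") -> list[dict]:
    """Fetch the blob sidecars for a block via the Beacon API."""
    resp = requests.get(f"{BEACON_URL}/eth/v1/beacon/blob_sidecars/{block_id}", timeout=10)
    resp.raise_for_status()
    return resp.json()["data"]

for sidecar in fetch_blob_sidecars("head"):
    # Each sidecar carries the blob plus its KZG commitment and proof, which a
    # client verifies before treating the sample as successful.
    print(sidecar["index"], sidecar["kzg_commitment"])
```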
Core Protocols and Tools
Data Availability Sampling (DAS) is a core primitive for scaling modular blockchains and rollups. These cards explain how DAS works in practice, which protocols implement it, and how developers can verify data availability without downloading full blocks.
Data Availability Sampling Fundamentals
Data Availability Sampling (DAS) allows light clients to verify that block data is available without downloading the entire block. Instead, clients randomly sample small pieces of encoded data and use probabilistic guarantees.
Key mechanics:
- Erasure coding expands the original block data (typically 2x) so that any sufficient subset can reconstruct the full block
- Random sampling of encoded shares detects missing data with high probability
- Availability is established through statistical guarantees from sampling rather than full downloads
If even a small fraction of encoded shares is unavailable, light clients will detect it with overwhelming probability after a few dozen samples. This is what makes DAS viable for high-throughput chains and rollups where full nodes cannot scale linearly with data throughput.
DAS shifts the trust model:
- Full nodes ensure correctness and execution
- Light nodes ensure availability through sampling
This separation is foundational for modular execution and settlement layers.
Implementing DAS in Your Own Protocol
If you are designing a custom DA layer or research prototype, implementing Data Availability Sampling requires careful parameter selection.
Core implementation steps:
- Choose an erasure coding scheme (Reed-Solomon is standard)
- Define share size and block matrix dimensions
- Set sampling thresholds for light clients
Design trade-offs:
- Larger blocks require more samples but improve throughput
- More aggressive sampling increases detection probability but raises client costs
- Network assumptions matter, especially under adversarial conditions
DAS implementations must be paired with:
- Peer discovery mechanisms for share retrieval
- Gossip protocols optimized for data chunks
- Clear economic or cryptographic incentives to serve data
This approach is best suited for research chains or modular stacks experimenting beyond Ethereum-aligned designs.
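To make the parameter selection above concrete, here is a minimal sketch of the bookkeeping such a prototype might start from; the values and field names are illustrative assumptions, not recommendations:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DAParams:
    share_size_bytes: int       # payload bytes per share
    original_square_width: int  # original data square is width x width shares
    extension_factor: int       # 2 => extended square is (2 * width)^2 shares
    samples_per_client: int     # samples each light client performs per block

    @property
    def max_block_bytes(self) -> int:
        return self.share_size_bytes * self.original_square_width ** 2

    @property
    def extended_shares(self) -> int:
        return (self.extension_factor * self.original_square_width) ** 2

    def detection_probability(self, withheld_fraction: float) -> float:
        """Chance a single client detects withholding of `withheld_fraction` of shares."""
        return 1.0 - (1.0 - withheld_fraction) ** self.samples_per_client

params = DAParams(share_size_bytes=512, original_square_width=64,
                  extension_factor=2, samples_per_client=20)
print(params.max_block_bytes)              # 2_097_152 bytes (~2 MiB) of original data
print(params.extended_shares)              # 16_384 shares in the extended square
print(params.detection_probability(0.5))   # ~0.999999 for a single client
```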
DAS Protocol Comparison
Key technical and economic differences between leading Data Availability Sampling implementations.
| Feature | Celestia | EigenDA | Avail |
|---|---|---|---|
| Core Architecture | Modular DA Layer | Restaking-based AVS | Modular DA & Consensus |
| Data Encoding | 2D Reed-Solomon | KZG Commitments | 2D Reed-Solomon |
| Sampling Security | Light Client Sampling | Proof of Custody | KZG + Validity Proofs |
| Throughput (MB/s) | ~100 | ~10 | ~70 |
| Finality Time | < 1 min | ~10 min | < 20 sec |
| Cost per MB | $0.10-0.50 | $0.01-0.10 | $0.05-0.30 |
| Native Token Required | Yes (TIA) | No (fees paid in ETH) | Yes (AVAIL) |
| EVM Compatibility | | | |
Implementing DAS with Celestia-Node
A practical guide to using Celestia's Data Availability Sampling (DAS) client to verify data availability on a modular blockchain network.
Data Availability Sampling (DAS) is the core innovation that allows Celestia to scale data availability securely. Instead of downloading an entire block to verify its data is available, a light client running celestia-node downloads only a small, random subset of encoded data chunks. By performing this sampling repeatedly across many blocks, the client gains cryptographic confidence that the entire block data is available, without the resource cost of full node validation. This enables trust-minimized bridging and execution layer verification.
To begin, you need to run a Celestia light node. After installing celestia-node, initialize it for your desired network (e.g., the Mocha testnet) and start it in light mode: celestia light start. Once synchronized, your node connects to the Celestia P2P network and begins its sampling duties automatically. The node relies on a two-dimensional (2D) Reed-Solomon encoding of the block data, which produces a square matrix of data and parity shares that light nodes sample from.
The sampling process is continuous. For each new block header it receives, your light node randomly selects a set of coordinates within the data square and requests those specific shares from full nodes on the network. If a requested share is unavailable, the node detects a potential data withholding attack. After successfully sampling a configured number of shares (e.g., 20), the node considers the block data available. This result is logged locally and can be queried via the node's RPC endpoints.
You can interact with the DAS process programmatically. Use the node's JSON-RPC interface (typically on port 26658) to query the sampling status. A key method is das.SamplingStats, which reports the node's sampling progress. For developers building a sovereign rollup or bridge, you can subscribe to the node's header feed and trigger your application logic only after DAS confirms data availability for a given block height, ensuring your system acts on provably available data.
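A minimal sketch of querying that method over celestia-node's JSON-RPC interface; the port and auth-token handling follow the celestia-node documentation, but verify the details against the current docs for your version:

```python
import requests

# celestia-node serves JSON-RPC on port 26658 by default; most methods require
# a bearer token (e.g. one generated with `celestia light auth read`).
NODE_URL = "http://localhost:26658"
AUTH_TOKEN = "<your-node-auth-token>"

def sampling_stats() -> dict:
    """Ask the local light node for its DAS progress via das.SamplingStats."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "das.SamplingStats", "params": []}
    resp = requests.post(
        NODE_URL,
        json=payload,
        headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["result"]

print(sampling_stats())  # sampling progress, workers, errors, etc.
```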
For advanced implementation, consider the sampling parameters. You can adjust the das.sample-time and concurrency settings to balance between verification speed and network load. The security model relies on a sufficient number of light nodes performing independent sampling. In practice, running your own light node provides the strongest guarantee, as you are not relying on a third party's sampling results. This self-verification is crucial for trust assumptions in modular architectures.
The primary output of a DAS light node is local confidence. Unlike a full node that stores all data, the light node's role is to produce a verifiable claim: "with high probability, the data for block X is available." This claim can be used by rollup nodes to proceed with state execution or by bridges to release locked funds. By implementing DAS, you directly contribute to and leverage the security of the Celestia network, enabling scalable blockchain infrastructure without compromising on decentralized verification.
Using EigenDA for Data Availability
A practical guide to implementing Data Availability Sampling (DAS) with EigenDA, the modular data availability layer for Ethereum L2s.
Data Availability Sampling (DAS) is the core mechanism that allows light nodes to securely verify that data is available without downloading an entire block. In EigenDA, blob data is erasure coded and committed to with KZG commitments, and nodes randomly sample small pieces of the encoded data and check them against those commitments. If enough samples are collected successfully, the node can be statistically confident the full data is available. This is critical for fraud proofs in optimistic rollups and validity proofs in ZK-rollups, as the proof cannot be constructed without the underlying transaction data.
To use EigenDA, developers integrate its smart contracts and node APIs. The key on-chain contracts are the BLSApkRegistry, which tracks operator keys, and the EigenDAServiceManager, which records confirmed batches. The workflow begins by encoding your data (e.g., a batch of L2 transactions) into a blob and submitting it to the EigenDA disperser. Once the operator set has attested to the blob and the batch has been confirmed on-chain, you receive a Data Availability Certificate (DACert). This certificate, containing the KZG commitment and attestation data, is your proof that the data was accepted and will be made available by the EigenDA network.
Here is a simplified TypeScript example using the EigenDA SDK to post data:
```typescript
import { EigenDaClient } from '@eigenlayer/da-client';

const client = new EigenDaClient('https://api.eigenda.xyz');
const data = Buffer.from('your raw batch data');

const { commitment, proof, dataPointer } = await client.postData(data);
// Store `commitment` & `proof` on-chain as your DACert
```
After posting, Dispersers (EigenDA nodes) erasure-code the data and distribute the chunks to Attesters. Your rollup contract should store the KZG commitment, which light clients will use to perform sampling queries against the network.
For verification, a light client or a rollup verifier contract needs to validate data availability. It does this by querying multiple EigenDA nodes for random samples of the data referenced by the KZG commitment. Using the EigenDA Light Client Library, you can generate random sampling indices and request the corresponding data chunks and proofs. The library verifies each sample against the on-chain commitment. If a configurable threshold of samples (e.g., 30 out of 50) is valid, availability is confirmed. This process ensures security without the overhead of full data download.
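Expressed generically, and in Python for consistency with the other sketches in this guide, that threshold policy amounts to the following; fetch_chunk and verify_chunk are hypothetical stand-ins for whatever client library actually serves and checks the data:

```python
import random

SAMPLE_COUNT = 50       # samples requested per blob (illustrative)
VALID_THRESHOLD = 30    # minimum valid samples needed to accept availability

def availability_confirmed(commitment, total_chunks, fetch_chunk, verify_chunk) -> bool:
    """Accept availability once enough sampled chunks verify against the commitment.

    `fetch_chunk(index)` should return the chunk bytes or None on failure;
    `verify_chunk(chunk, index, commitment)` should check the chunk's proof."""
    valid = 0
    for index in random.sample(range(total_chunks), SAMPLE_COUNT):
        chunk = fetch_chunk(index)
        if chunk is not None and verify_chunk(chunk, index, commitment):
            valid += 1
    return valid >= VALID_THRESHOLD
```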
Key operational considerations include cost, which is paid in ETH for the EigenDA service, and latency for data finality. You must also monitor the churn and stake of the EigenDA operator set, as security relies on a decentralized, honest majority. For production rollups, implement a fallback mechanism, such as the ability to fall back to posting calldata on Ethereum L1, in case the EigenDA network experiences downtime. Always refer to the latest EigenDA documentation for contract addresses and API updates.
Common Implementation Mistakes
Data Availability Sampling (DAS) is a core scaling technology for modular blockchains, but its implementation is nuanced. Developers often encounter specific pitfalls that can compromise security, performance, or correctness. This guide addresses the most frequent mistakes and their solutions.
A failure to reconstruct a block from sampled data typically indicates a data availability (DA) failure, meaning the full data is not actually available. This is the security guarantee of DAS in action. However, implementation bugs can cause false positives, where a block is flagged as unavailable even though its data was published.
Common causes:
- Insufficient samples: Not sampling enough unique 2D Reed-Solomon (RS) encoded shares. For a k-of-n erasure coding scheme, you need at least k unique, valid shares. Failing to track sampled indices can lead to redundant queries.
- Faulty sampling logic: Sampling from an incorrect KZG commitment root or a mismatched extension field. Ensure your client uses the commitment from the block header and the correct field parameters (e.g., BLS12-381 for Ethereum).
- Network layer issues: Not properly handling timeouts or peer disconnections during sampling. Implement retry logic with different peers.
Debugging steps:
- Log the count and indices of successfully fetched shares.
- Verify the Merkle proofs for each share against the published DA root.
- Confirm the reconstruction algorithm (e.g., Lagrange interpolation) is correctly implemented for the 2D polynomial.
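Putting the index-tracking and retry advice above into code, a sketch of the sampling loop might look like this; peers, request_share, and verify_share are hypothetical stand-ins for the real networking and proof layers:

```python
import random

MAX_PEER_TRIES = 3   # peers to try per share before giving up on it

def sample_block(total_shares, num_samples, peers, request_share, verify_share):
    """Sample `num_samples` unique shares, retrying each across several peers.

    `request_share(peer, index)` and `verify_share(share, index)` stand in for
    the real network call and proof check; request_share should return None on
    a timeout or disconnect."""
    indices = random.sample(range(total_shares), num_samples)  # no duplicate queries
    fetched = {}
    for index in indices:
        for peer in random.sample(peers, min(MAX_PEER_TRIES, len(peers))):
            share = request_share(peer, index)
            if share is not None and verify_share(share, index):
                fetched[index] = share
                break
        else:
            # Every peer we tried failed for this index: treat the data as possibly withheld.
            return False, fetched
    return True, fetched
```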
Frequently Asked Questions
Common technical questions and troubleshooting for developers implementing or interacting with Data Availability Sampling (DAS) protocols.
Data Availability Sampling (DAS) is a cryptographic technique that allows light nodes to probabilistically verify that all data for a block is published and available, without downloading the entire dataset. It's a core component of scaling solutions like Ethereum's danksharding and Celestia.
Here's the core workflow:
- The block producer erasure codes the block data, expanding it (e.g., from 1 MB to 2 MB).
- The data is split into fixed-size data blobs and arranged in a matrix.
- Each blob is committed to via a KZG polynomial commitment or a Merkle root.
- A light node randomly selects a small number of these blobs (e.g., 30 out of thousands) and requests them from the network.
- If the node can successfully retrieve all sampled blobs, it can be statistically confident (e.g., >99.9%) that the entire data is available. If a sample is missing, the node raises an alarm, signaling a potential data withholding attack.
Conclusion and Next Steps
Data Availability Sampling (DAS) is a foundational scaling technology for modular blockchains. This guide has covered its core concepts, mechanics, and practical integration steps.
Successfully implementing DAS requires a methodical approach. Start by selecting a compatible data availability layer like Celestia, EigenDA, or Avail. Each has distinct trade-offs in throughput, cost, and security assumptions. Your choice will dictate the client library you integrate, such as celestia-node or the EigenDA SDK. The core workflow involves your rollup or application submitting transaction data blobs to the DA layer, receiving a commitment (like a KZG polynomial commitment or Merkle root), and then having light clients or validators perform sampling to verify data availability.
For developers, the next step is to experiment in a testnet environment. Deploy a simple rollup smart contract on a testnet (e.g., Sepolia) and configure it to post its transaction data to a DA layer testnet. Use the layer's API to post data and retrieve proofs. A critical implementation detail is setting the correct blob size and sampling parameters; for example, ensuring the 2D Reed-Solomon extension is generated correctly so that light clients can achieve high security with a manageable number of samples (e.g., around 30 samples for well over 99.99% confidence).
To deepen your understanding, explore the following resources: the original Celestia research paper on DAS, the Ethereum Portal Network specs for light client communication, and the KZG ceremony documentation. Participating in these ecosystems' developer forums and Discord channels is invaluable for troubleshooting and staying updated on protocol changes.
The landscape of data availability is rapidly evolving. Keep an eye on emerging techniques like Proof of Sampling and Volition models, which allow applications to choose between on-chain and off-chain DA. As a builder, your mastery of DAS enables you to design scalable, secure, and cost-effective decentralized applications, moving beyond the limitations of monolithic blockchain architectures.