What is an IPFS CID? Content Identifier Explained

BLOCKCHAIN GLOSSARY

What is an InterPlanetary File System (IPFS) CID?

A technical definition of the unique identifier for content on the decentralized IPFS network.

An InterPlanetary File System (IPFS) Content Identifier (CID) is a self-describing identifier, built around a cryptographic hash, that uniquely identifies content stored on the IPFS distributed network. Unlike traditional location-based addresses (like URLs), which point to where data is stored, a CID is a content-addressed identifier derived from the data itself. Given the same import settings (chunking strategy, hash function, codec, and CID version), any piece of content, whether a document, image, or dataset, always generates the same CID. This allows secure, verifiable, and deduplicated storage across the peer-to-peer network.

The structure of a CID is highly versatile, encoding several pieces of information within its string representation: a version number, the codec describing the data format (e.g., dag-pb for IPFS files, dag-cbor for structured data), and a multihash that names the cryptographic hash function used (e.g., SHA-256) alongside the digest. The string form adds a multibase prefix indicating the text encoding (like b for base32), which is why CIDv1 strings for dag-pb data hashed with SHA-256 start with bafy.... This self-contained design, known as self-description, ensures that any system can interpret the CID correctly without external context.

In blockchain and Web3 applications, CIDs are fundamental for linking to off-chain data in a trust-minimized way. A smart contract or an NFT's metadata often stores only a CID, which points to the actual JSON file or media asset stored on IPFS. This creates an immutable link: if the underlying data changes, its CID changes entirely, so any modification is detectable and leaves a clear audit trail. The link guarantees integrity rather than availability; the data stays retrievable only as long as some node continues to store it. This mechanism is crucial for data integrity and for letting decentralized applications (dApps) reference data without relying on centralized servers.

From a technical perspective, generating a CID for a file involves constructing a Merkle DAG (Directed Acyclic Graph) representation of the data and then hashing its root node. Large files are split into smaller blocks, each with its own hash, which are linked together by parent nodes. This architecture enables efficient deduplication (identical blocks have identical hashes, so a node stores each unique block only once, and two users who add the same file with the same settings obtain the same CID) and supports partial sharing, where peers can fetch different pieces of the same file from multiple sources simultaneously.
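
The chunk-and-link process can be sketched in a few lines of Python. This is a minimal illustration, not the IPFS importer: it assumes fixed-size chunks, SHA-256 digests, and a hypothetical root node that simply concatenates the child digests (real IPFS encodes files as UnixFS over dag-pb, so these values will not match real CIDs). The names add_file and CHUNK_SIZE are illustrative.

```python
import hashlib

# Tiny chunk size so the example stays readable; real importers use far larger chunks.
CHUNK_SIZE = 4


def add_file(data: bytes, store: dict[bytes, bytes], chunk_size: int = CHUNK_SIZE) -> bytes:
    """Chunk `data`, store each block under its SHA-256 digest, and return the digest
    of a simplified root node.

    Assumption: the root node is just the concatenated child digests. Real IPFS encodes
    it as a dag-pb (UnixFS) node, so these digests will not match real IPFS CIDs.
    """
    child_digests = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        store.setdefault(digest, chunk)  # an identical block is stored only once
        child_digests.append(digest)
    root_node = b"".join(child_digests)
    root_digest = hashlib.sha256(root_node).digest()
    store.setdefault(root_digest, root_node)
    return root_digest


store: dict[bytes, bytes] = {}
root_a = add_file(b"abcdabcdabcdXYZ", store)  # chunks: abcd, abcd, abcd, XYZ
root_b = add_file(b"abcdabcdabcdXYZ", store)  # a second user adds the same file
```

After both calls, root_a equals root_b and the store holds three blocks: the repeated abcd chunk is kept once, alongside XYZ and the root node.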

The evolution of the CID specification, notably from CIDv0 to the more flexible CIDv1, has increased its interoperability across different decentralized protocols. CIDs are not exclusive to IPFS; they are a core component of the broader IPLD (InterPlanetary Linked Data) stack and are used by systems like Filecoin for storage deals and libp2p for content routing. This universality makes the CID a foundational primitive for the decentralized web.

CONTENT HASHING

How an IPFS CID Works

An IPFS Content Identifier (CID) is a unique cryptographic fingerprint for data, enabling content-addressed storage and retrieval on the decentralized web.

An InterPlanetary File System Content Identifier (IPFS CID) is a self-describing content-addressed identifier that uniquely and immutably represents a piece of data on the IPFS network. Unlike location-based addresses (URLs), which point to where data is stored, a CID is derived from the content itself using cryptographic hashing. This means that identical data, imported with the same settings, always produces the same CID, enabling deduplication and verifiable integrity. The CID contains a multihash, which specifies the cryptographic hash function used (e.g., SHA-256), the digest length, and the hash digest itself, along with a codec identifying the content's encoding format.

The structure of a CID is versioned and extensible. CIDv0 is the original format, recognizable as a Base58-encoded SHA-256 multihash starting with Qm. CIDv1 supports multiple hash functions, codecs (like dag-pb for files or dag-cbor for structured data), and a flexible binary format that can be represented in various bases (Base32, Base58, etc.). This versioning allows the system to evolve while maintaining backward compatibility. The core innovation is that the CID itself tells any recipient everything needed to decode and verify the content, independent of any specific server or location; finding which peers hold the content is a separate step handled by content routing, such as the IPFS DHT.

When you add a file to IPFS, the importer segments it into blocks, hashes each block, and links them in a Merkle Directed Acyclic Graph (DAG) structure. The CID of the DAG's root node becomes the file's CID. To retrieve the data, a node asks the network for the content by its CID. Any node that holds the blocks, whether it has pinned them or merely cached them, can provide them. The requesting node verifies each received block by recalculating its hash and checking it against the CID that referenced it (the root CID for the root block, and the links inside parent blocks for everything below), guaranteeing the content is authentic and unaltered.
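
Block-by-block verification can be sketched as follows. The sketch assumes a simplified root format (a concatenation of 32-byte SHA-256 child digests) rather than real dag-pb, and the function fetch_verified and the honest/malicious peer dictionaries are hypothetical stand-ins for network retrieval.

```python
import hashlib

DIGEST_LEN = 32  # SHA-256 digest size in bytes


def fetch_verified(root_digest: bytes, peer_blocks: dict[bytes, bytes]) -> bytes:
    """Reassemble content from an untrusted peer, checking every block against the
    digest that referenced it. Assumes the simplified root format used here
    (concatenated child digests), not real dag-pb."""
    root = peer_blocks[root_digest]
    if hashlib.sha256(root).digest() != root_digest:
        raise ValueError("root block does not match the requested CID")
    content = bytearray()
    for i in range(0, len(root), DIGEST_LEN):
        link = root[i:i + DIGEST_LEN]
        block = peer_blocks[link]
        if hashlib.sha256(block).digest() != link:
            raise ValueError("child block does not match its link in the root")
        content += block
    return bytes(content)


# Build a two-block example, then simulate an honest peer and a malicious one.
chunks = [b"hello ", b"world"]
links = [hashlib.sha256(c).digest() for c in chunks]
root_node = b"".join(links)
root_digest = hashlib.sha256(root_node).digest()
honest = {root_digest: root_node, links[0]: chunks[0], links[1]: chunks[1]}
malicious = dict(honest)
malicious[links[1]] = b"w0rld"  # altered bytes served under the original digest

try:
    fetch_verified(root_digest, malicious)
    tampering_detected = False
except ValueError:
    tampering_detected = True
```

With the honest peer the content reassembles to b"hello world"; the malicious peer's altered block is rejected because its hash no longer matches the link stored in the root node.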

CONTENT IDENTIFIER

Key Features of an IPFS CID

An IPFS Content Identifier (CID) is a self-describing identifier, built around a cryptographic hash, that uniquely and immutably addresses content in a distributed system. Its structure encodes the version, the codec for interpreting the data, the hash function used, and the hash digest itself.

01

Cryptographic Hash-Based

At its core, a CID contains a cryptographic hash (digest) of the content it represents. This creates a content-addressed system where the identifier is derived from the data itself, not its location. Key properties include:

Verifiability: Anyone can recompute the hash to verify the data's integrity.
Immutable Reference: If the data changes, the CID changes completely.
Deduplication: Identical data blocks generate the same CID, enabling efficient storage.

02

Self-Describing (Multiformat)

A CID is a self-describing format that embeds metadata about how to interpret the hash. CIDv1 uses multiformat prefixes to declare:

Hash function: the algorithm used, e.g., sha2-256, sha3-512, or blake3.
Codec: the format of the underlying data (e.g., raw, dag-pb for Protobuf-encoded nodes, dag-cbor).
Version: CIDv0 or CIDv1 (CIDv0 carries no explicit version or codec; both are implied).
This allows systems to decode and verify the content without external context.
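
To make the prefixes concrete, here is a minimal sketch of how a CIDv1 string is assembled: a varint version, a varint codec, a multihash (hash code, digest length, digest), and finally a multibase prefix in front of the base32 text. It assumes sha2-256 and a single block; the numeric codes come from the multicodec table, and the helper names are illustrative rather than a library API.

```python
import base64
import hashlib

# Codes from the multicodec table (subset).
RAW = 0x55       # raw bytes, no further structure
DAG_PB = 0x70    # Protobuf-encoded DAG node (UnixFS files and directories)
DAG_CBOR = 0x71  # CBOR-encoded IPLD data
SHA2_256 = 0x12  # multihash code for SHA-256


def varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash_sha256(block: bytes) -> bytes:
    """Multihash layout: varint hash code, varint digest length, then the digest."""
    digest = hashlib.sha256(block).digest()
    return varint(SHA2_256) + varint(len(digest)) + digest


def cid_v1(block: bytes, codec: int = RAW) -> str:
    """CIDv1 of one encoded block, rendered with the base32 multibase prefix 'b'.
    `block` must already be encoded in `codec` (e.g. a dag-pb node for DAG_PB)."""
    binary = varint(1) + varint(codec) + multihash_sha256(block)
    # Multibase 'b' = RFC 4648 base32, lowercase, without '=' padding.
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")
```

Calling cid_v1(b"") yields a 59-character string beginning with bafkrei. With the raw codec the digest covers the bytes directly; for dag-pb, the input would have to be an encoded dag-pb node rather than raw file bytes.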

03

Versioning (CIDv0 vs. CIDv1)

CIDs have evolved to support greater flexibility.

CIDv0: The original version, starting with Qm. It is a Base58btc-encoded SHA-256 multihash of a dag-pb node, with the hash function and codec implied rather than stated.
CIDv1: The current standard, featuring an explicit version byte and multiformat prefixes for the codec and multihash. It supports multiple bases (e.g., Base32, bafy...) and is more future-proof. CIDv1 is the recommended format for new systems.

04

Human-Readable Encoding

The binary CID is encoded into a string for practical use. For CIDv1, the encoding is declared by a multibase prefix, the first character of the string.

Common Encodings:
- base58btc (CIDv0: Qm...). CIDv0 strings carry no multibase prefix; base58btc is implied.
- base32 (CIDv1: bafy...), prefix b, the default string encoding for CIDv1 in IPFS tooling.
Because the multibase prefix (like b for base32) lets the string declare its own encoding, CIDv1 strings are portable. The lowercase base32 form is case-insensitive and URL-safe, so it can also be used in DNS names.
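
Going the other way, the self-describing layout means a CID string can be taken apart without any outside information. The sketch below handles only base32 CIDv1 strings (multibase prefix b) and recognizes a handful of multicodec codes; parse_cid_v1 and its output dictionary are hypothetical names, not a library API.

```python
import base64
import hashlib

# A few entries from the multicodec table (codecs and hash functions share one table).
MULTICODEC_NAMES = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor", 0x12: "sha2-256"}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return value, pos
        shift += 7


def parse_cid_v1(cid: str) -> dict:
    """Split a base32 CIDv1 string into version, codec, hash function, and digest."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles the base32 multibase prefix 'b'")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # multibase strips RFC 4648 padding; restore it
    binary = base64.b32decode(body)
    version, pos = read_varint(binary, 0)
    codec, pos = read_varint(binary, pos)
    hash_code, pos = read_varint(binary, pos)
    digest_len, pos = read_varint(binary, pos)
    return {
        "version": version,
        "codec": MULTICODEC_NAMES.get(codec, hex(codec)),
        "hash": MULTICODEC_NAMES.get(hash_code, hex(hash_code)),
        "digest": binary[pos:pos + digest_len],
    }


# Build an example by hand: version 1, raw codec, sha2-256 multihash of b"hello".
raw_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(b"hello").digest()
example = "b" + base64.b32encode(raw_bytes).decode("ascii").lower().rstrip("=")
info = parse_cid_v1(example)
```

For the example, info reports version 1, codec raw, hash sha2-256, and a 32-byte digest. A CIDv0 string would need a separate path, since it has no multibase prefix and is plain base58btc.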

05

Persistence & Decentralization

A CID provides a persistent link to content that is independent of any single server or location. This enables decentralized storage and retrieval.

Location-Independent: The CID remains valid regardless of where the data is stored (IPFS node, Filecoin, Arweave, a local disk).
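Pinning: To ensure data persists on the IPFS network, at least one node must pin the CID, which exempts its blocks from that node's garbage collection.
Discovery: The CID (specifically its multihash) is the key used to find providers of the data via a Distributed Hash Table (DHT).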
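
The relationship between pinning and garbage collection can be modeled with a toy in-memory blockstore. This is an assumption-laden sketch: ToyBlockstore is hypothetical, and it pins single blocks, whereas real implementations such as Kubo pin recursively through a whole DAG by default. It is meant only to show why unpinned data can vanish from a node.

```python
import hashlib


class ToyBlockstore:
    """Hypothetical model of a node's local storage: pinned blocks survive garbage
    collection, while blocks that were only cached are removed."""

    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}
        self.pins: set[bytes] = set()

    def put(self, data: bytes) -> bytes:
        key = hashlib.sha256(data).digest()
        self.blocks[key] = data
        return key

    def pin(self, key: bytes) -> None:
        if key not in self.blocks:
            raise KeyError("cannot pin a block this node does not hold")
        self.pins.add(key)

    def gc(self) -> int:
        """Drop every unpinned block and return how many were removed."""
        unpinned = [k for k in self.blocks if k not in self.pins]
        for k in unpinned:
            del self.blocks[k]
        return len(unpinned)


node = ToyBlockstore()
kept = node.put(b"nft-image-bytes")
cached = node.put(b"something fetched in passing")
node.pin(kept)
removed = node.gc()
```

After gc(), only the pinned block remains on this node; the block fetched in passing is gone, which is why content that no node pins can eventually disappear from the network.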

06

Related: Content Addressing vs. Location Addressing

CIDs exemplify content addressing, a paradigm shift from the web's dominant location addressing (URLs).

Location Addressing (URL): https://server.com/file.pdf points to a location. The content at that location can silently change or disappear (link rot, 404 errors).
Content Addressing (CID): bafybeig... points to specific content. Anyone holding the data can serve it, and the recipient can verify that it received exactly the requested content. This is foundational for a verifiable web and for data integrity.

IPFS CONTENT IDENTIFIERS

CIDv0 vs. CIDv1

A technical comparison of the two primary versions of Content Identifiers (CIDs) used in the InterPlanetary File System (IPFS) and related decentralized protocols.

CIDv0 and CIDv1 are distinct versions of Content Identifiers (CIDs), the core addressing system for content in IPFS. A CID is a self-describing content-addressed identifier, but the two versions differ in their encoding format, flexibility, and future-proofing. CIDv0 is the original, legacy format, recognizable as a Base58-encoded hash starting with Qm. CIDv1 is the modern, extensible standard that supports multiple bases, hash functions, and codecs through a structured binary format defined in the multiformats project.

The primary technical distinction lies in their structure. A CIDv0 is simply a multihash (a self-describing hash) encoded in Base58btc. It is inflexible, implicitly using the SHA-256 hash function and the dag-pb codec. In contrast, a CIDv1 is a fully self-describing binary structure consisting of an explicit version number, a codec identifier, and the multihash; its string form adds a multibase prefix. This structure allows CIDv1 to support a wide range of content types (e.g., raw, dag-cbor, dag-json) and cryptographic hash functions (e.g., sha2-256, sha3-512, blake3) beyond the limitations of v0.

For compatibility, a CIDv1 can be converted to a CIDv0 string only under specific conditions: the codec must be dag-pb and the multihash must be a 32-byte SHA-256 digest; the result is then encoded in Base58btc. The reverse always works: any CIDv0 can be rewritten as a CIDv1 by prefixing its multihash with the version and the dag-pb codec. The two strings look completely different, but they carry the same multihash and identify the same block. Newer IPFS tooling and services increasingly default to CIDv1 while still accepting and resolving CIDv0. The shift to v1 is critical for the ecosystem's evolution, enabling support for new data structures and interoperability across IPFS, Filecoin, and other IPLD-based systems.
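
The conversion rules can be sketched in Python. The sketch assumes a sha2-256 multihash, hand-rolls base58btc, and hashes stand-in bytes in place of a real encoded dag-pb node; the function names b58encode, b58decode, v0_to_v1, and v1_to_v0 are illustrative, not a specific library's API.

```python
import base64
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Base58btc encoding (Bitcoin alphabet)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out  # each leading zero byte becomes "1"


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    leading_ones = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_ones + n.to_bytes((n.bit_length() + 7) // 8, "big")


def v0_to_v1(cid_v0: str) -> str:
    multihash = b58decode(cid_v0)  # a CIDv0 is nothing but a base58btc multihash
    if multihash[:2] != b"\x12\x20" or len(multihash) != 34:
        raise ValueError("expected a 32-byte sha2-256 multihash")
    binary = b"\x01\x70" + multihash  # version 1, dag-pb codec
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


def v1_to_v0(cid_v1: str) -> str:
    if not cid_v1.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings")
    body = cid_v1[1:].upper()
    binary = base64.b32decode(body + "=" * (-len(body) % 8))
    # Header must be: version 1, dag-pb (0x70), sha2-256 (0x12), 32-byte digest (0x20).
    if binary[:4] != b"\x01\x70\x12\x20" or len(binary) != 36:
        raise ValueError("only dag-pb + sha2-256 CIDv1s have a CIDv0 form")
    return b58encode(binary[2:])


# Stand-in bytes: a real CIDv0 hashes an encoded dag-pb node, not arbitrary data.
multihash = b"\x12\x20" + hashlib.sha256(b"stand-in for a dag-pb node").digest()
v0 = b58encode(multihash)
v1 = v0_to_v1(v0)
```

In the example, v0 is a 46-character string starting with Qm, v1 starts with bafybei, and converting v1 back reproduces v0 exactly, because both strings wrap the same multihash.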

CONTENT ADDRESSING

Ecosystem Usage: Where CIDs Are Used

The Content Identifier (CID) is the fundamental unit of data addressing in decentralized systems, enabling verifiable, location-independent data retrieval across a wide range of protocols and applications.

01

Decentralized Storage (IPFS & Filecoin)

CIDs are the native addressing system for the InterPlanetary File System (IPFS) and Filecoin. They serve as immutable pointers to content stored across a peer-to-peer network, enabling data to be retrieved from any node that has a copy, not just a single server. This provides content-addressed storage, where the data's integrity is cryptographically guaranteed by its CID.

IPFS: Uses CIDs for retrieving files and website assets.
Filecoin: Uses CIDs as the key identifier for data stored in long-term, incentivized storage deals.

02

Blockchain Data & NFTs

Blockchains use CIDs to reference off-chain data in a secure, verifiable manner. Storing large files like images or metadata directly on-chain is prohibitively expensive. Instead, applications store the CID on-chain, where it acts as a cryptographic commitment to the data stored elsewhere (e.g., on IPFS): anyone can check that the data they retrieve matches it.

NFT Metadata: The tokenURI (ERC-721) or uri (ERC-1155) of an NFT often points to a JSON metadata file via its CID, typically written as an ipfs://<CID> URI.
Smart Contract Data: CIDs can reference datasets, documentation, or legal agreements associated with a contract.
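
Because most browsers cannot fetch ipfs:// URIs natively, applications often rewrite a tokenURI into an HTTP gateway URL. A minimal sketch, assuming a path-style gateway at https://ipfs.io (any gateway or a local node could be substituted); the function name and the placeholder CID bafybeiexample are hypothetical.

```python
from urllib.parse import urlparse


def ipfs_uri_to_gateway(uri: str, gateway: str = "https://ipfs.io") -> str:
    """Rewrite ipfs://<CID>/<path> as a path-style gateway URL.
    The default gateway is an assumption; any gateway or a local node would do."""
    parsed = urlparse(uri)
    if parsed.scheme != "ipfs" or not parsed.netloc:
        raise ValueError("expected a URI of the form ipfs://<CID>/<path>")
    # netloc keeps its original case, which matters for case-sensitive CIDv0 strings.
    return f"{gateway}/ipfs/{parsed.netloc}{parsed.path}"


# "bafybeiexample" is a placeholder, not a real CID.
url = ipfs_uri_to_gateway("ipfs://bafybeiexample/1.json")
```

A plain HTTP gateway is trusted to return the right bytes, so clients that need end-to-end guarantees should still verify the response against the CID.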

03

Decentralized Web (dWeb) & Hosting

CIDs are the backbone of the decentralized web, enabling websites and applications that are resistant to censorship and single points of failure. Hosting services such as Fleek publish static sites by CID, and public gateways such as ipfs.io and dweb.link serve that content over HTTP.

dWeb Hosting: A website's entire bundle of HTML, CSS, and JS files is added to IPFS as a directory, which yields a single root CID, and is then pinned to the network.
Access: Users can access the site via an IPFS gateway (using the CID) or directly through an IPFS-enabled browser.
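
Gateways come in two URL styles: path-style (https://<gateway>/ipfs/<CID>/...) and subdomain-style, where the CID becomes a DNS label. Because DNS labels are case-insensitive and limited to 63 characters, the subdomain form needs a lowercase encoding such as base32 CIDv1. The sketch below assumes the dweb.link public gateway and uses a placeholder CID; the function name is hypothetical.

```python
def subdomain_gateway_url(cid: str, domain: str = "dweb.link", path: str = "/") -> str:
    """Build a subdomain-style gateway URL such as https://<cid>.ipfs.dweb.link/.
    DNS labels are case-insensitive and at most 63 characters, so the CID must be a
    lowercase-encoded CIDv1 (e.g. base32); mixed-case CIDv0 strings are rejected."""
    if cid != cid.lower():
        raise ValueError("convert the CID to a lowercase base32 CIDv1 first")
    if len(cid) > 63:
        raise ValueError("CID is too long for a single DNS label")
    return f"https://{cid}.ipfs.{domain}{path}"


# "bafybeiexample" is a placeholder, not a real CID.
url = subdomain_gateway_url("bafybeiexample")
```

Subdomain gateways can redirect CIDv0 requests to the equivalent base32 CIDv1 for this reason; this sketch simply rejects mixed-case input instead.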

04

Data Provenance & Scientific Datasets

In data-intensive fields, CIDs provide immutable data provenance. Any modification to a dataset generates a new CID, creating a verifiable audit trail. This is critical for reproducibility in scientific research, machine learning model versioning, and regulatory compliance.

Research Data: Datasets published with a CID ensure the exact version used in a paper can be retrieved.
Supply Chain: Recording a document's CID on a blockchain can timestamp it and lets anyone later verify the integrity of documents like certificates of origin or inspection reports.

05

Decentralized Applications (dApps)

dApps leverage CIDs to manage user-generated content, application state, and media in a decentralized manner. This moves data ownership away from centralized platform servers.

Social Media: Posts, profiles, and media files can be stored with CIDs, allowing users to port their data.
Gaming: In-game assets, maps, or player histories can be referenced via CIDs on-chain or on IPFS.
Tools: Services like Pinata and web3.storage provide infrastructure for dApp developers to pin and retrieve data using CIDs.

06

Interoperable Protocol Bridges

CIDs act as a universal content addressing format that bridges different decentralized protocols. The same CID can be used to locate data across multiple networks and layers.

IPFS ↔ Filecoin: A CID can point to data retrievable from the IPFS network or from a Filecoin storage provider.
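Layer 2 & Sidechains: Some Layer 2 and sidechain projects keep bulky data on IPFS and post only its CID to a base chain, although mainstream Ethereum rollups publish their transaction batches as calldata or blobs rather than as CIDs.
Other Networks: Protocols like Arweave (with its own transaction IDs and ar:// scheme) and Sia do not use CIDs as their native identifiers, so connecting them to CID-based systems requires gateways or mapping layers.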

TECHNICAL FOUNDATION

CIDs in NFT Metadata and Provenance

An explanation of how Content Identifiers (CIDs) from the InterPlanetary File System (IPFS) provide the immutable, verifiable backbone for NFT data and ownership history.

An InterPlanetary File System (IPFS) Content Identifier (CID) is a self-describing, hash-based identifier that uniquely and immutably identifies a piece of content stored on the decentralized IPFS network, serving as the foundational data pointer for Non-Fungible Token (NFT) metadata and digital assets. Unlike a traditional URL that points to a location, a CID points to the content itself, ensuring that the data referenced by an NFT, be it a JPEG, an animation, or a JSON metadata file, cannot be swapped out without the change being detectable. This shift from location-based to content-based addressing is what underpins the concept of true digital ownership and provenance in Web3.

In a typical NFT's architecture, the on-chain token contract returns a tokenURI containing an IPFS CID, not a centralized web server address. This CID points to a JSON metadata file that itself contains descriptive attributes and, crucially, another IPFS CID linking to the actual image or media file. This creates a verifiable chain of hashes: the on-chain token points to a metadata CID, which in turn points to an asset CID. Any alteration to the underlying image or metadata would generate a completely different CID, so the altered data no longer matches the CID referenced above it and the tampering is detectable, thus guaranteeing the NFT's provenance and authenticity.
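
The chain of hashes can be illustrated with a small sketch. It assumes each file fits in a single raw block, so its CIDv1 is just a raw-codec, sha2-256 header plus the digest of the bytes (files imported as UnixFS/dag-pb would get different CIDs); the metadata fields and the helper names raw_cid_v1 and mint_chain are hypothetical.

```python
import base64
import hashlib
import json


def raw_cid_v1(data: bytes) -> str:
    """CIDv1 (raw codec 0x55, sha2-256, base32) of a single block.
    Assumption: each file is stored as one raw block; UnixFS/dag-pb imports differ."""
    binary = bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(data).digest()
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


def mint_chain(image: bytes) -> tuple[str, str]:
    """Return (image CID, metadata CID) for a hypothetical NFT whose metadata
    links to the image by CID."""
    image_cid = raw_cid_v1(image)
    metadata = {"name": "Example #1", "image": f"ipfs://{image_cid}"}
    # sort_keys and fixed separators make the JSON bytes, and thus the CID, deterministic.
    metadata_bytes = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode()
    return image_cid, raw_cid_v1(metadata_bytes)


original = mint_chain(b"original pixels")
tampered = mint_chain(b"tampered pixels")
```

Changing the image bytes changes the image CID, and because the metadata embeds that CID, the metadata CID (the one a tokenURI would reference) changes as well.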

The robustness of this system relies on content addressing and decentralized storage. When you add a file to IPFS, it is split into chunks, hashed, and given a CID; pinning it then tells a node to keep those blocks indefinitely rather than garbage-collecting them. The network's participants store and serve these chunks. As long as at least one reachable node on the IPFS network hosts the data associated with a CID, it remains accessible. This makes NFTs more resilient than those relying on traditional web hosting, which are vulnerable to link rot if a company's server goes offline. Filecoin (through incentivized storage deals) and pinning services such as Pinata help keep CIDs retrievable long-term.

For developers and collectors, verifying an NFT's integrity involves checking the CIDs in its metadata. Tools can fetch the metadata from the tokenURI, retrieve the linked image file, and recompute its CID for comparison with the CID stored in the metadata. A plain hash of the file bytes matches only when the CID uses the raw codec; files imported as UnixFS (dag-pb) must be re-imported with the same settings (chunker, CID version, raw-leaves option) or verified block by block. A match confirms the asset is authentic. This process highlights a critical best practice: NFT creators should upload the image first and obtain its CID before creating the metadata file that references it, so that the entire data chain is immutable from the point of minting. This prevents mismatches and preserves the intended artwork for the token's lifetime.
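
A verification step along these lines can be sketched for the simple case where the image's CID is a base32 CIDv1 with the raw codec and sha2-256, so the digest covers the file bytes directly. verify_image and the sample metadata are hypothetical; for dag-pb/UnixFS CIDs the sketch deliberately refuses rather than giving a misleading answer.

```python
import base64
import hashlib
import json

RAW_SHA256_HEADER = bytes([0x01, 0x55, 0x12, 0x20])  # CIDv1, raw, sha2-256, 32 bytes


def verify_image(metadata_json: str, image: bytes) -> bool:
    """Check fetched image bytes against the CID named in NFT metadata.
    Assumes the "image" field holds ipfs://<base32 CIDv1> with raw codec + sha2-256."""
    uri = json.loads(metadata_json)["image"]
    cid = uri.removeprefix("ipfs://")
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings")
    body = cid[1:].upper()
    binary = base64.b32decode(body + "=" * (-len(body) % 8))
    if binary[:4] != RAW_SHA256_HEADER:
        # A dag-pb/UnixFS CID hashes an encoded DAG, not the file bytes; verifying it
        # means re-importing with the same settings or checking each block.
        raise NotImplementedError("not a raw-codec sha2-256 CIDv1")
    return binary[4:] == hashlib.sha256(image).digest()


# Hypothetical metadata for a single-block image.
image = b"original pixels"
image_cid_bytes = RAW_SHA256_HEADER + hashlib.sha256(image).digest()
image_cid = "b" + base64.b32encode(image_cid_bytes).decode("ascii").lower().rstrip("=")
metadata = json.dumps({"name": "Example #1", "image": f"ipfs://{image_cid}"})
```

verify_image(metadata, image) returns True, while any altered bytes return False. The sketch needs Python 3.9 or later for str.removeprefix.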

IPFS CID

Technical Details & Structure

An InterPlanetary File System (IPFS) Content Identifier (CID) is a self-describing, cryptographically verifiable label for content, forming the backbone of decentralized data addressing.

An IPFS Content Identifier (CID) is a self-describing content-addressed identifier that uniquely and verifiably points to a piece of data on the IPFS network. It works by applying a cryptographic hash function (like SHA-256) to the content itself, generating a unique fingerprint. This fingerprint, combined with metadata about the hash function and encoding format, forms the CID. The system ensures that the same content, imported with the same settings, always produces the same CID, and any alteration to the content results in a completely different identifier, guaranteeing data integrity and enabling decentralized, trustless retrieval.

IPFS

Common Misconceptions About CIDs

Content Identifiers (CIDs) are fundamental to decentralized storage, but their behavior is often misunderstood. This section clarifies the most frequent points of confusion regarding IPFS CIDs.

A CID is not a location pointer; it is an identifier derived from the cryptographic hash of the data itself. Unlike a traditional URL (e.g., https://example.com/file.pdf), which points to a specific server location, a CID (e.g., bafybeigdyr...) points to the content, regardless of where it is stored. The IPFS network locates the data by finding peers that have announced they can provide the content for that specific hash. This means the same CID always refers to the exact same immutable data, providing verifiable integrity.

IPFS CID

Frequently Asked Questions (FAQ)

A Content Identifier (CID) is the foundational addressing system of the InterPlanetary File System (IPFS). These questions cover its core mechanics, versions, and practical use cases.

An IPFS CID (Content Identifier) is a self-describing identifier that uniquely and immutably identifies content on the IPFS network. It works by applying a cryptographic hash function (like SHA-256) to the content itself, creating a unique fingerprint. This process, known as content addressing, means the CID is derived from the data's content, not its location. The CID contains a multihash, which specifies the hash function used and the hash digest, along with a codec identifier (e.g., dag-pb for files, dag-cbor for structured data) and, in CIDv1, an explicit version number (in CIDv0 these are implied). When you request content by its CID, the IPFS network finds nodes that have announced they provide that content, enabling decentralized, verifiable retrieval.

Related Terms & Concepts

Content Addressing

Multihash

Multicodec

Multibase

IPFS Gateway

IPNS (InterPlanetary Name System)