An IPFS Content Identifier (CID) is a self-describing content-addressed identifier that uniquely and permanently points to a piece of data on the InterPlanetary File System. Unlike location-based addressing (e.g., a URL), which tells you where to find data, a CID is derived from the content's cryptographic hash, telling you what the data is. This creates immutable links; if the content changes, its CID changes completely, guaranteeing data integrity and enabling verifiable, permanent references.
IPFS CID (Content Identifier)
What is IPFS CID (Content Identifier)?
An IPFS CID is the unique cryptographic fingerprint for any piece of content stored on the InterPlanetary File System.
A CID is a multiformat string, typically starting with Qm for CIDv0 or b for CIDv1, which encodes two critical pieces of information: the cryptographic hash of the content (e.g., SHA2-256) and the codec, which specifies how to interpret the underlying data (e.g., dag-pb for IPFS data structures or raw for raw bytes). This self-describing nature allows systems to understand how to process the data without external context. CIDs are the foundational building block for content-addressed storage and verifiable data structures across decentralized networks.
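As a rough illustration of how the hash and the codec end up in one string, the sketch below builds a CIDv1 for a single block of raw bytes using only the Python standard library. The function name `cid_v1_raw` is hypothetical, and the sketch assumes the `raw` codec (0x55), SHA2-256 (0x12), and base32 encoding. A real `ipfs add` may chunk larger files and wrap them in dag-pb, so it can yield a different CID for the same bytes.

```python
import base64
import hashlib

RAW_CODEC = 0x55   # multicodec code for "raw" bytes
SHA2_256 = 0x12    # multihash code for sha2-256


def cid_v1_raw(data: bytes) -> str:
    """Return a base32 CIDv1 (raw codec, sha2-256) for one block of bytes."""
    digest = hashlib.sha256(data).digest()
    # multihash = <hash function code><digest length><digest>
    # (all values here are below 0x80, so each varint fits in one byte)
    multihash = bytes([SHA2_256, len(digest)]) + digest
    # binary CIDv1 = <version><content codec><multihash>
    cid_bytes = bytes([0x01, RAW_CODEC]) + multihash
    # multibase: the "b" prefix marks lowercase, unpadded base32
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32


print(cid_v1_raw(b"hello world"))
```

Calling the function twice with the same bytes returns the same string, which is the property that content addressing relies on.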
In practice, developers interact with CIDs when pinning files, retrieving data via ipfs get <CID>, or referencing assets in smart contracts and NFTs. For example, an NFT's metadata JSON file is often stored on IPFS, and its CID is recorded on-chain, creating a permanent, tamper-proof link to the asset's data. This architecture decouples data storage from specific servers, enabling persistent, decentralized access as long as at least one node on the IPFS network hosts the content referenced by the CID.
How an IPFS CID Works
An IPFS CID (Content Identifier) is a self-describing cryptographic hash that uniquely and permanently addresses content on the InterPlanetary File System.
An IPFS CID (Content Identifier) is a unique label derived from the cryptographic hash of the content itself, ensuring that identical data produces the same CID anywhere in the network. This process, known as content addressing, contrasts with traditional location addressing (like URLs), which points to a specific server. The CID is the foundational mechanism that enables IPFS's decentralized, peer-to-peer file sharing by allowing users to retrieve content from any node that possesses it, verified by its immutable hash.
A CID is not a single hash but a structured format that encodes several pieces of information. The most common version, CIDv1, includes a version number, a multicodec prefix (indicating the data format, e.g., dag-pb for IPFS files), a multihash (the actual cryptographic digest like SHA2-256, tagged with its hash function and length), and, in its string form, a multibase prefix (specifying the encoding, like base32). This self-describing structure allows systems to interpret the CID without external context. For example, in the CID bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi, the leading b marks base32 encoding, and the decoded bytes declare CID version 1, the dag-pb codec, and a SHA2-256 multihash.
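The structure described above can be checked by decoding the example CID by hand. The following is a minimal sketch, not a replacement for a real CID library: `read_varint` and `decode_cid_v1` are hypothetical helper names, and only base32 (`b`) strings are handled.

```python
import base64


def read_varint(buf: bytes, pos: int) -> tuple:
    """Decode one unsigned varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def decode_cid_v1(cid: str) -> dict:
    """Split a base32 CIDv1 string into version, codec, hash code and digest."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 ('b') CIDs")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)   # restore the padding that multibase omits
    raw = base64.b32decode(body)
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    return {
        "version": version,
        "codec": codec,
        "hash_code": hash_code,
        "digest": raw[pos:pos + digest_len],
    }


info = decode_cid_v1("bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi")
print(info["version"], hex(info["codec"]), hex(info["hash_code"]), len(info["digest"]))
# prints: 1 0x70 0x12 32
```

Here 0x70 is the multicodec code for dag-pb and 0x12 is the multihash code for SHA2-256.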
The generation of a CID begins with the InterPlanetary Linked Data (IPLD) model, where data is structured into Merkle DAGs (Directed Acyclic Graphs). When a file is added to IPFS, it is chunked, and each chunk is hashed. These hashes are then organized into a tree structure, with the root hash becoming the file's CID. This architecture enables efficient deduplication—if two files share identical chunks, those chunks are stored only once, referenced by the same hash, optimizing storage across the network.
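The chunk-and-link idea can be sketched with a toy content-addressed store. This is a deliberately simplified model: it uses plain SHA-256 hex digests instead of real CIDs, a newline-joined list instead of a dag-pb node, and a tiny `CHUNK_SIZE` chosen only to keep the example readable. The names `add_file` and `store` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, so the example stays easy to follow


def add_file(data: bytes, store: dict) -> str:
    """Chunk data, store each chunk under its hash, and return a root hash."""
    chunk_hashes = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        store[key] = chunk          # an identical chunk maps to the same key
        chunk_hashes.append(key)
    # The root node lists its children's hashes; hashing that list gives the root id.
    root_body = "\n".join(chunk_hashes).encode("ascii")
    root = hashlib.sha256(root_body).hexdigest()
    store[root] = root_body
    return root


store = {}
root_a = add_file(b"AAAABBBBCCCC", store)
root_b = add_file(b"AAAABBBBDDDD", store)
print(root_a != root_b, len(store))
```

The two files share the chunks `AAAA` and `BBBB`, so the store ends up holding four distinct chunks plus two root nodes rather than six chunks plus two roots.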
A critical property of CIDs is their immutability: the identifier is inextricably linked to the content's bits. If the data changes by even one byte, its cryptographic hash, and therefore its CID, changes completely. This guarantees integrity, as anyone fetching data with a specific CID can verify it matches the expected hash. However, this also means updating content requires generating and distributing a new CID, which is why mutable pointers like IPNS (InterPlanetary Name System) or DNSLink are often layered on top for stable, updatable references, with DNSLink also making them human-readable.
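The integrity check can be illustrated for the simplest case, a single raw block. The sketch below assumes a base32 CIDv1 with a SHA2-256 multihash whose header fields each fit in one byte; `raw_cid` and `verify_block` are hypothetical helpers, not part of any IPFS library. For files stored as dag-pb, the bytes that get hashed are the encoded DAG node rather than the original file, so this check applies per block.

```python
import base64
import hashlib


def raw_cid(data: bytes) -> str:
    """Build a base32 CIDv1 (raw codec 0x55, sha2-256 0x12) for one block."""
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([0x01, 0x55, 0x12, len(digest)]) + digest
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def verify_block(cid: str, data: bytes) -> bool:
    """Return True if data hashes to the sha2-256 digest embedded in the CID."""
    if not cid.startswith("b"):
        raise ValueError("only base32 ('b') CIDv1 strings are handled here")
    body = cid[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    # Assumed layout: version, codec, hash code, digest length (1 byte each), digest
    version, hash_code, length = raw[0], raw[2], raw[3]
    if version != 1 or hash_code != 0x12 or length != 32:
        raise ValueError("this sketch supports CIDv1 with sha2-256 only")
    return hashlib.sha256(data).digest() == raw[4:4 + length]


cid = raw_cid(b"hello")
print(verify_block(cid, b"hello"))   # True: the bytes match the CID
print(verify_block(cid, b"hellp"))   # False: one changed byte breaks the match
```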
In practice, developers interact with CIDs through IPFS client libraries or CLI commands like ipfs add. The system handles the complex hashing and DAG construction automatically. Understanding CIDs is essential for building decentralized applications (dApps), implementing secure data provenance, and leveraging the core promise of IPFS: permanent, verifiable, and location-independent access to information.
Key Features of IPFS CIDs
An IPFS Content Identifier (CID) is a self-describing content-addressed identifier that provides verifiable, permanent references to data in decentralized networks.
Content Addressing
A CID is derived from a cryptographic hash of the data itself, not from its location. Because the identifier comes from the content, it is verifiable and immutable. Two identical files produce the same CID regardless of where or when they are added, provided they are imported with the same settings (chunking, hash function, and CID version).
- Key Property: The link is to the data, not a server.
- Example: A CID for a document will remain the same even if the document is moved between different IPFS nodes.
Self-Describing Format
A CID contains metadata that describes how to interpret the data it points to. This includes the multihash (hash type and length), multicodec (data format, e.g., dag-pb for IPLD), and multibase (encoding, e.g., base58btc).
- Key Property: The CID itself tells you what hash function was used and the format of the linked data.
- Benefit: Enables systems to process CIDs without external context or configuration.
CID Versions (v0 & v1)
IPFS CIDs have evolved through versions with different characteristics.
- CIDv0: The original version, starting with Qm.... It is a Base58-encoded SHA-256 multihash and is less flexible.
- CIDv1: The current standard, featuring a forward-compatible structure that includes explicit multicodec and multibase prefixes. It supports multiple hash functions and encodings (like bafy... for Base32).
- Migration: Many modern tools generate CIDv1 by default, though v0 remains supported for backward compatibility.
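Because a CIDv0 is just a base58btc-encoded SHA-256 multihash with an implied dag-pb codec, it can be re-expressed as a CIDv1 without touching the underlying data. The sketch below shows that conversion using only the standard library. `b58decode` and `cid_v0_to_v1` are hypothetical helper names, and the input string is used purely as an example CIDv0.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58btc string into bytes."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading "1" in base58btc stands for one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def cid_v0_to_v1(cid_v0: str) -> str:
    """Re-encode a CIDv0 as a base32 CIDv1 with the dag-pb codec."""
    multihash = b58decode(cid_v0)
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError("not a CIDv0 (expected a 34-byte sha2-256 multihash)")
    cid_bytes = bytes([0x01, 0x70]) + multihash   # version 1, dag-pb (0x70)
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


print(cid_v0_to_v1("QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR"))
```

Both forms carry the same digest, so they refer to the same content. Only the framing around the digest changes.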
Immutability & Persistence
Because a CID is a cryptographic hash, the referenced data cannot be changed without altering the CID. This creates a cryptographic commitment to the exact content.
- Immutability: Any change to the underlying data produces a completely different CID.
- Persistence Challenge: The CID guarantees what the data is, but not that it is stored. Data persistence depends on the peer-to-peer network of nodes choosing to host (pin) the content.
IPLD & Merkle Structures
CIDs are the core identifier for the InterPlanetary Linked Data (IPLD) model. They can link to chunks of data that themselves contain other CIDs, forming Merkle DAGs (Directed Acyclic Graphs).
- Key Property: Enables efficient representation of large or complex data structures (like file directories or blockchain states).
- Use Case: A CID for a large file points to a root block, which links to CIDs of its constituent chunks.
Multiformats & Future-Proofing
CIDs are built on the Multiformats project, a collection of self-describing protocol suites. This design makes them extensible and future-proof.
- Multihash: Allows upgrading cryptographic hash functions (e.g., from SHA-256 to Blake3) without breaking the system.
- Multicodec: Can identify new data serialization formats as they are invented.
- Multibase: Supports different string encodings for different environments (URL-safe, case-insensitive, etc.).
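A small sketch of the hash agility that multihash provides: the digest is prefixed with a code for the hash function and its length, so a reader can tell which function to use without outside configuration. The codes 0x12 (sha2-256) and 0x13 (sha2-512) come from the multicodec table. The `HASHES` mapping and the `multihash` and `describe` helpers are hypothetical names for this example.

```python
import hashlib

# name -> (multihash code, hash constructor)
HASHES = {
    "sha2-256": (0x12, hashlib.sha256),
    "sha2-512": (0x13, hashlib.sha512),
}


def multihash(data: bytes, name: str) -> bytes:
    """Return <code><digest length><digest> for the chosen hash function."""
    code, constructor = HASHES[name]
    digest = constructor(data).digest()
    # Codes and lengths used here are below 0x80, so each is a one-byte varint.
    return bytes([code, len(digest)]) + digest


def describe(mh: bytes) -> str:
    """Read the self-describing prefix back out of a multihash."""
    names = {code: name for name, (code, _) in HASHES.items()}
    return f"{names[mh[0]]}, {mh[1]}-byte digest"


for name in HASHES:
    print(describe(multihash(b"same content", name)))
```

Supporting an additional hash function in this sketch would mean adding one entry to `HASHES`, while existing multihashes stay readable.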
CIDv0 vs. CIDv1: A Comparison
Key technical differences between the original and current IPFS Content Identifier specifications.
| Feature | CIDv0 | CIDv1 |
|---|---|---|
| Prefix | Qm... (implicit base58btc) | Explicit multibase prefix (e.g., bafy..., zb2...) |
| Multibase Support | No | Yes |
| Multicodec Support | No (dag-pb implied) | Yes |
| Default Hash Function | SHA-256 | Flexible (specified in the multihash) |
| String Representation | Base58btc only | Any multibase encoding (Base32 default) |
| Binary Representation (CID-in-CBOR) | Not self-describing | Self-describing (includes version, codec, multihash) |
| IPFS Path Compatibility | Fully compatible | Fully compatible |
| Future-proofing for new codecs/hashes | No | Yes |
Where IPFS CIDs Are Used
IPFS CIDs provide a permanent, verifiable address for content, enabling new paradigms for data storage and linking across decentralized applications and protocols.
Technical Anatomy of a CID
An in-depth look at the structure and components that make up a Content Identifier, the fundamental unit of addressability in content-addressed systems like IPFS.
An IPFS CID (Content Identifier) is a self-describing content-addressed identifier composed of a multicodec prefix, a multihash, and an optional multibase prefix, which together uniquely and verifiably point to a piece of content on the decentralized web. The CID specification, maintained as part of the Multiformats project, ensures that the identifier itself contains all the information needed to interpret the data it references, including the format of the content and the cryptographic hash function used. This design makes CIDs immutable, portable, and future-proof, as they are not tied to a specific location or protocol.
The core of a CID is its multihash, a self-describing hash digest that specifies the hash function (e.g., sha2-256) and the digest length. This is prefixed by a multicodec identifier that indicates the format of the target data, such as dag-pb for IPLD Protobuf nodes or raw for raw bytes. Finally, a multibase prefix (like b for base32) can be added to the entire string, encoding it for safe use in various contexts like URLs or filenames. A CIDv1 in its textual form typically appears as bafybeig... where the b denotes base32 encoding.
CIDs exist in two primary versions: CIDv0 and CIDv1. CIDv0 is the legacy format, starting with Qm, which implicitly assumes SHA-256 hashing and Base58BTC encoding. CIDv1 is the extensible, self-describing format that explicitly includes the multicodec and can use any multibase encoding. The migration to CIDv1 was crucial for supporting diverse hash functions (like blake3) and data formats beyond the original IPFS Merkle DAG, enabling broader interoperability across content-addressed systems.
Under the hood, a CIDv1 is a compact binary structure: a version number, a multicodec code for the content (such as dag-cbor or dag-pb), and the multihash, with the numeric fields encoded as unsigned varints. The string form is this byte sequence passed through a multibase encoding. When developers work with CIDs programmatically, using libraries such as multiformats in JavaScript or go-cid in Go, they parse this binary structure to extract the hash, verify data integrity, and resolve the content. This technical anatomy is what allows a single CID to reliably and permanently identify the same content across different networks, storage systems, and applications, forming the backbone of verifiable data exchange.
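To make the distinction between a CID's bytes and its string form concrete, the sketch below decodes a base32 CID to bytes and re-encodes the same bytes as base16, whose multibase prefix is `f`. The helper names `cid_to_bytes` and `bytes_to_cid` are hypothetical, and only these two encodings are handled. Production code would normally rely on a maintained CID library instead.

```python
import base64


def cid_to_bytes(cid: str) -> bytes:
    """Decode a multibase CID string (base32 'b' or base16 'f') to raw bytes."""
    prefix, body = cid[0], cid[1:]
    if prefix == "b":
        padded = body.upper() + "=" * (-len(body) % 8)
        return base64.b32decode(padded)
    if prefix == "f":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported multibase prefix: {prefix!r}")


def bytes_to_cid(raw: bytes, base: str = "b") -> str:
    """Encode CID bytes as a multibase string in base32 ('b') or base16 ('f')."""
    if base == "b":
        return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    if base == "f":
        return "f" + raw.hex()
    raise ValueError(f"unsupported multibase prefix: {base!r}")


cid = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
raw = cid_to_bytes(cid)
hex_form = bytes_to_cid(raw, "f")
print(hex_form[:9])                                  # f01701220
print(bytes_to_cid(cid_to_bytes(hex_form)) == cid)   # True
```

The hex form makes the header visible: 01 is the version, 70 is dag-pb, 12 is SHA2-256, and 20 is the 32-byte digest length.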
Frequently Asked Questions about IPFS CIDs
A Content Identifier (CID) is the foundational unit of content-addressing in the IPFS ecosystem. These questions cover its core mechanics, versions, and practical applications.
An IPFS CID (Content Identifier) is a self-describing cryptographic hash that uniquely and permanently identifies content on the InterPlanetary File System (IPFS). It works by applying a cryptographic hash function (like SHA-256) to the content itself, creating a unique fingerprint. This fingerprint is then encoded with metadata about the hash function and format into a single string using multihash, multicodec, and multibase protocols. The key principle is content-addressing: you request data by what it is (its hash) rather than where it is (a server URL). This ensures that identical content will always produce the same CID, enabling deduplication and verifiable integrity.