IPFS CID: Content Identifier Explained for Blockchain

DECENTRALIZED STORAGE

What is IPFS CID (Content Identifier)?

An IPFS CID is the unique cryptographic fingerprint for any piece of content stored on the InterPlanetary File System.

An IPFS Content Identifier (CID) is a self-describing content-addressed identifier that uniquely and permanently points to a piece of data on the InterPlanetary File System. Unlike location-based addressing (e.g., a URL), which tells you where to find data, a CID is derived from the content's cryptographic hash, telling you what the data is. This creates immutable links; if the content changes, its CID changes completely, guaranteeing data integrity and enabling verifiable, permanent references.

A CID is a multiformat string, typically starting with Qm for CIDv0 or b (the base32 multibase prefix) for CIDv1, which encodes two critical pieces of information: the cryptographic hash of the content (e.g., SHA2-256) and the codec, which specifies how to interpret the underlying data (e.g., dag-pb for IPFS data structures or raw for raw bytes). This self-describing nature allows systems to understand how to process the data without external context. CIDs are the foundational building block for content-addressed storage and verifiable data structures across decentralized networks.
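
To make the encoding concrete, here is a minimal sketch that builds a CIDv1 for a single raw block using only the Python standard library. It assumes the raw codec (0x55), SHA2-256 (0x12), and base32 output; the function names are illustrative and not part of any IPFS library. Files that are chunked or wrapped in dag-pb nodes will get a different CID than this single-block case.

```python
import base64
import hashlib

RAW_CODEC = 0x55   # multicodec code for "raw" bytes
SHA2_256 = 0x12    # multihash code for sha2-256


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def cid_v1_raw(data: bytes) -> str:
    """Return the base32 CIDv1 string for one raw block (illustrative helper)."""
    digest = hashlib.sha256(data).digest()
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    cid_bytes = encode_varint(1) + encode_varint(RAW_CODEC) + multihash
    # Multibase base32: lowercase RFC 4648 alphabet, no padding, "b" prefix.
    body = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + body


print(cid_v1_raw(b"hello world"))
# bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e
```

The bafkrei... prefix is what the fixed header bytes (version 1, raw, sha2-256, 32-byte digest) look like in base32; a dag-pb CID starts with bafybei... instead.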

In practice, developers interact with CIDs when pinning files, retrieving data via ipfs get <CID>, or referencing assets in smart contracts and NFTs. For example, an NFT's metadata JSON file is often stored on IPFS, and its CID is recorded on-chain, creating a permanent, tamper-proof link to the asset's data. This architecture decouples data storage from specific servers, enabling persistent, decentralized access as long as at least one node on the IPFS network hosts the content referenced by the CID.

CONTENT IDENTIFIER

How an IPFS CID Works

An IPFS CID (Content Identifier) is a self-describing cryptographic hash that uniquely and permanently addresses content on the InterPlanetary File System.

An IPFS CID (Content Identifier) is a unique label derived from the cryptographic hash of the content itself, ensuring that identical data produces the same CID anywhere in the network. This process, known as content addressing, contrasts with traditional location addressing (like URLs), which points to a specific server. The CID is the foundational mechanism that enables IPFS's decentralized, peer-to-peer file sharing by allowing users to retrieve content from any node that possesses it, verified by its immutable hash.

A CID is not a single hash but a structured format that encodes several pieces of information. The most common version, CIDv1, includes a version number, a multicodec prefix (indicating the data format, e.g., dag-pb for IPFS files), a multihash (the hash function identifier, digest length, and the actual cryptographic digest, e.g., SHA2-256), and, in its string form, a multibase prefix (specifying the text encoding, like base32). This self-describing structure allows systems to interpret the CID without external context. For example, in the CID bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi, the leading b signals base32, and the decoded bytes declare CIDv1, the dag-pb codec, and a 32-byte SHA2-256 digest.
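
The structure can be inspected by decoding the string by hand. The following is a rough sketch, assuming a base32 ("b"-prefixed) CIDv1; parse_cid_v1 and read_varint are illustrative names, not functions from an IPFS library.

```python
import base64
from typing import Dict, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_cid_v1(cid: str) -> Dict[str, object]:
    """Split a base32 CIDv1 string into its self-describing fields."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 ('b') CIDv1 strings")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding multibase strips
    raw = base64.b32decode(body)
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    digest = raw[pos:]
    if version != 1 or len(digest) != digest_len:
        raise ValueError("malformed CIDv1")
    return {
        "version": version,
        "codec": codec,          # 0x70 = dag-pb, 0x55 = raw
        "hash_code": hash_code,  # 0x12 = sha2-256
        "digest_len": digest_len,
        "digest": digest,
    }


info = parse_cid_v1("bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi")
print(info["version"], hex(info["codec"]), hex(info["hash_code"]), info["digest_len"])
# 1 0x70 0x12 32
```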

The generation of a CID begins with the InterPlanetary Linked Data (IPLD) model, where data is structured into Merkle DAGs (Directed Acyclic Graphs). When a file is added to IPFS, it is chunked, and each chunk is hashed. These hashes are then organized into a tree structure, with the root hash becoming the file's CID. This architecture enables efficient deduplication—if two files share identical chunks, those chunks are stored only once, referenced by the same hash, optimizing storage across the network.
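
The chunk-and-link idea can be sketched with a deliberately simplified model. This is not the UnixFS/dag-pb format that IPFS actually uses: chunks are keyed by plain SHA-256 hex strings, the "root node" is just a newline-separated list of chunk hashes, and the chunk size is tiny so the behavior is visible. All names are hypothetical.

```python
import hashlib
from typing import Dict

CHUNK_SIZE = 4  # unrealistically small, chosen only for the demonstration


def add_file(data: bytes, store: Dict[str, bytes]) -> str:
    """Chunk data, store each chunk under its hash, and return the root hash."""
    chunk_hashes = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        store[key] = chunk  # an identical chunk maps to the same key: stored once
        chunk_hashes.append(key)
    root_node = "\n".join(chunk_hashes).encode("ascii")
    root = hashlib.sha256(root_node).hexdigest()
    store[root] = root_node
    return root


def read_file(root: str, store: Dict[str, bytes]) -> bytes:
    """Follow the links in the root node and reassemble the file."""
    links = store[root].decode("ascii").split()
    return b"".join(store[link] for link in links)


store: Dict[str, bytes] = {}
root_a = add_file(b"AAAABBBBCCCC", store)
root_b = add_file(b"AAAABBBBDDDD", store)
print(len(store))  # 6 entries: 4 distinct chunks + 2 root nodes
```

The two files share the AAAA and BBBB chunks, so the store holds four chunks rather than six, while the differing final chunk gives each file its own root hash.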

A critical property of CIDs is their immutability: the identifier is inextricably linked to the content's bits. If the data changes by even one byte, its cryptographic hash, and therefore its CID, changes completely. This guarantees integrity, as anyone fetching data with a specific CID can verify it matches the expected hash. However, this also means updating content requires generating and distributing a new CID, which is why mutable pointers like IPNS (InterPlanetary Name System) or DNSLink are often layered on top for stable, updatable references (human-readable ones, in the case of DNSLink).
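
The relationship between immutable identifiers and a mutable pointer layer can be illustrated with a toy model. This sketch does not implement IPNS or DNSLink (there are no keys, signatures, or DNS records); NameRegistry is a hypothetical class that only shows the indirection, with SHA-256 hex digests standing in for CIDs.

```python
import hashlib
from typing import Dict


class NameRegistry:
    """Toy stand-in for a mutable pointer layer on top of immutable content."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}  # content id -> bytes (never changes)
        self.names: Dict[str, str] = {}     # name -> current content id (mutable)

    def publish(self, name: str, data: bytes) -> str:
        """Store data under its hash and repoint name at the new content id."""
        content_id = hashlib.sha256(data).hexdigest()
        self.blocks[content_id] = data
        self.names[name] = content_id
        return content_id

    def resolve(self, name: str) -> bytes:
        """Look up the current content id for name and return its bytes."""
        return self.blocks[self.names[name]]


registry = NameRegistry()
first = registry.publish("my-site", b"<h1>version 1</h1>")
second = registry.publish("my-site", b"<h1>version 2</h1>")
print(first != second)               # True: new content, new identifier
print(registry.resolve("my-site"))   # b'<h1>version 2</h1>'
print(registry.blocks[first])        # b'<h1>version 1</h1>' is still addressable
```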

In practice, developers interact with CIDs through IPFS client libraries or CLI commands like ipfs add. The system handles the complex hashing and DAG construction automatically. Understanding CIDs is essential for building decentralized applications (dApps), implementing secure data provenance, and leveraging the core promise of IPFS: permanent, verifiable, and location-independent access to information.

ARCHITECTURE

Key Features of IPFS CIDs

An IPFS Content Identifier (CID) is a self-describing content-addressed identifier that provides verifiable, permanent references to data in decentralized networks.

01

Content Addressing

A CID is derived from a cryptographic hash of the data itself, not its location. This means the identifier comes from the content, enabling verifiability and immutability. Two identical files will always produce the same CID, regardless of where or when they are created, provided they are added with the same settings (chunking, hash function, codec, and CID version).

Key Property: The link is to the data, not a server.
Example: A CID for a document will remain the same even if the document is moved between different IPFS nodes.

02

Self-Describing Format

A CID contains metadata that describes how to interpret the data it points to. This includes the multihash (hash type and length), multicodec (data format, e.g., dag-pb for IPLD), and multibase (encoding, e.g., base58btc).

Key Property: The CID itself tells you what hash function was used and the format of the linked data.
Benefit: Enables systems to process CIDs without external context or configuration.

03

CID Versions (v0 & v1)

IPFS CIDs have evolved through versions with different characteristics.

CIDv0: The original version, starting with Qm.... It is a Base58-encoded SHA-256 multihash and is less flexible.
CIDv1: The current standard, featuring a forward-compatible structure that includes explicit multicodec and multibase prefixes. It supports multiple hash functions and encodings (like bafy... for Base32).
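Migration: Many newer tools and libraries generate CIDv1 by default, although some (Kubo's ipfs add, for example, has long defaulted to CIDv0) still emit v0 unless told otherwise. v0 remains supported for backward compatibility.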
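
Because a CIDv0 is just a base58btc-encoded SHA-256 multihash with dag-pb implied, upgrading it to CIDv1 only requires making the implicit parts explicit. The sketch below assumes that layout and uses a hand-written base58 decoder so it stays within the standard library; cid_v0_to_v1 is an illustrative name, not a library function.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58btc string (Bitcoin alphabet) into bytes."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)  # ValueError on invalid chars
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading "1" character stands for one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def cid_v0_to_v1(cid_v0: str) -> str:
    """Re-express a CIDv0 (Qm...) as a base32 CIDv1 with the dag-pb codec."""
    multihash = b58decode(cid_v0)
    # CIDv0 is always sha2-256 (0x12) with a 32-byte (0x20) digest.
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError("not a CIDv0: expected a 34-byte sha2-256 multihash")
    cid_bytes = b"\x01\x70" + multihash  # version 1, dag-pb (0x70)
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
```

Both strings identify the same block: the digest is unchanged, and only the version, codec, and text encoding are spelled out. The result always begins with bafybei, which is how the fixed header bytes render in base32.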

04

Immutability & Persistence

Because a CID is a cryptographic hash, the referenced data cannot be changed without altering the CID. This creates a cryptographic commitment to the exact content.

Immutability: Any change to the underlying data produces a completely different CID.
Persistence Challenge: The CID guarantees what the data is, but not that it is stored. Data persistence depends on the peer-to-peer network of nodes choosing to host (pin) the content.
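
Both points can be seen in a small sketch: integrity is checkable by anyone, but availability depends on someone actually holding the bytes. The code assumes a base32 CIDv1 with the raw codec and SHA2-256, and models providers as plain dictionaries; fetch_verified and the provider dictionaries are hypothetical, not an IPFS API.

```python
import base64
import hashlib
from typing import Dict, List

RAW_SHA256_HEADER = b"\x01\x55\x12\x20"  # CIDv1, raw codec, sha2-256, 32 bytes


def expected_digest(cid: str) -> bytes:
    """Extract the SHA-256 digest a raw-codec base32 CIDv1 commits to."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)
    cid_bytes = base64.b32decode(body)
    if len(cid_bytes) != 36 or cid_bytes[:4] != RAW_SHA256_HEADER:
        raise ValueError("this sketch only handles raw + sha2-256 CIDs")
    return cid_bytes[4:]


def fetch_verified(cid: str, providers: List[Dict[str, bytes]]) -> bytes:
    """Return the first provider response whose hash matches the CID."""
    want = expected_digest(cid)
    for provider in providers:
        data = provider.get(cid)
        if data is not None and hashlib.sha256(data).digest() == want:
            return data
    # A valid CID does not imply that anyone is still hosting the content.
    raise LookupError("no provider returned content matching the CID")


cid = "bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e"
tampered = {cid: b"hello w0rld"}
honest = {cid: b"hello world"}
print(fetch_verified(cid, [tampered, honest]))  # b'hello world'
```

The tampered response is rejected without trusting either provider; if no provider has pinned the content, the lookup fails even though the CID itself is perfectly valid.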

05

IPLD & Merkle Structures

CIDs are the core identifier for the InterPlanetary Linked Data (IPLD) model. They can link to chunks of data that themselves contain other CIDs, forming Merkle DAGs (Directed Acyclic Graphs).

Key Property: Enables efficient representation of large or complex data structures (like file directories or blockchain states).
Use Case: A CID for a large file points to a root block, which links to CIDs of its constituent chunks.

06

Multiformats & Future-Proofing

CIDs are built on the Multiformats project, a collection of self-describing protocol suites. This design makes them extensible and future-proof.

Multihash: Allows upgrading cryptographic hash functions (e.g., from SHA-256 to Blake3) without breaking the system.
Multicodec: Can identify new data serialization formats as they are invented.
Multibase: Supports different string encodings for different environments (URL-safe, case-insensitive, etc.).
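
As a hedged illustration of the multihash point, the sketch below tags digests with their hash-function code so that a verifier can dispatch on the prefix instead of assuming SHA-256. The codes (0x12, 0x13, 0xb220) are taken from the multicodec table; the helper names are illustrative. BLAKE2b-256 is used as the alternative function because it ships with Python's hashlib, whereas Blake3 does not.

```python
import hashlib
from typing import Tuple

# name -> (multicodec code, digest function)
HASH_TABLE = {
    "sha2-256": (0x12, lambda data: hashlib.sha256(data).digest()),
    "sha2-512": (0x13, lambda data: hashlib.sha512(data).digest()),
    "blake2b-256": (0xB220, lambda data: hashlib.blake2b(data, digest_size=32).digest()),
}
CODE_TO_NAME = {code: name for name, (code, _) in HASH_TABLE.items()}


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def read_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one unsigned varint; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def make_multihash(name: str, data: bytes) -> bytes:
    """Return <hash code><digest length><digest> for the named function."""
    code, digest_fn = HASH_TABLE[name]
    digest = digest_fn(data)
    return encode_varint(code) + encode_varint(len(digest)) + digest


def verify_multihash(multihash: bytes, data: bytes) -> bool:
    """Pick the hash function from the prefix, then compare digests."""
    code, pos = read_varint(multihash)
    length, pos = read_varint(multihash, pos)
    digest_fn = HASH_TABLE[CODE_TO_NAME[code]][1]
    return digest_fn(data) == multihash[pos:pos + length]


old = make_multihash("sha2-256", b"data")
new = make_multihash("blake2b-256", b"data")
print(old[:2].hex(), new[:4].hex())  # 1220 a0e40220
print(verify_multihash(old, b"data"), verify_multihash(new, b"data"))  # True True
```

The same verify_multihash call handles both identifiers, which is the sense in which a hash function can be upgraded without breaking existing references.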

IPFS CONTENT IDENTIFIERS

CIDv0 vs. CIDv1: A Comparison

Key technical differences between the original and current IPFS Content Identifier specifications.

Feature	CIDv0	CIDv1
Prefix	Qm... (implicit base58btc)	Explicit multibase prefix (e.g., bafy..., zb2...)
Multibase Support	No	Yes
Multicodec Support	No (dag-pb implied)	Yes
Default Hash Function	SHA-256	Flexible (specified in multihash)
String Representation	Base58btc only	Any multibase encoding (Base32 default)
Binary Representation (CID-in-CBOR)	Not self-describing	Self-describing (includes version, codec, multihash)
IPFS Path Compatibility	Fully compatible	Fully compatible
Future-proofing for new codecs/hashes	No	Yes

APPLICATIONS

Where IPFS CIDs Are Used

IPFS CIDs provide a permanent, verifiable address for content, enabling new paradigms for data storage and linking across decentralized applications and protocols.

01

Decentralized Storage & NFTs

IPFS CIDs are the standard for storing NFT metadata and assets (images, videos) off-chain. Platforms like OpenSea and Rarible use CIDs to ensure immutable provenance and permanent links to the digital asset, preventing link rot. The CID is stored on-chain, while the content is fetched from the IPFS network.

Example: An NFT's tokenURI often points to an ipfs:// URI or an IPFS gateway URL containing the asset's CID.
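
Applications that read a tokenURI usually need to normalize it before fetching. The following sketch handles two common shapes, an ipfs:// URI and a path-style gateway URL (https://host/ipfs/CID/...), and rewrites either one for a chosen gateway. The default gateway https://ipfs.io and the function names are assumptions for illustration; subdomain-style gateway URLs are not handled.

```python
from urllib.parse import urlparse


def extract_ipfs_path(token_uri: str) -> str:
    """Return 'CID/optional/path' from an ipfs:// URI or path-style gateway URL."""
    parsed = urlparse(token_uri)
    if parsed.scheme == "ipfs":
        # ipfs://<cid>/<path>: urlparse puts the CID in netloc
        return (parsed.netloc + parsed.path).rstrip("/")
    if parsed.scheme in ("http", "https") and parsed.path.startswith("/ipfs/"):
        return parsed.path[len("/ipfs/"):].rstrip("/")
    raise ValueError("not a recognized IPFS reference: " + token_uri)


def to_gateway_url(token_uri: str, gateway: str = "https://ipfs.io") -> str:
    """Rewrite any recognized IPFS reference to use the given gateway."""
    return gateway.rstrip("/") + "/ipfs/" + extract_ipfs_path(token_uri)


uri = "ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi/metadata.json"
print(to_gateway_url(uri))
# https://ipfs.io/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi/metadata.json
```

Storing the ipfs:// form on-chain and resolving it at read time keeps the reference independent of any single gateway operator.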

02

Decentralized Web (dWeb) & Websites

Entire static websites can be hosted on IPFS, with each page and asset referenced by its CID. This creates censorship-resistant and permanently accessible sites. Services like Fleek and Pinata help deploy and pin websites to IPFS. Visitors access the site via a public gateway (often at a <CID>.ipfs.<gateway-host> subdomain) or through a decentralized name such as an ENS domain.

03

Software Distribution & Package Management

CIDs enable verifiable, decentralized software distribution. Package managers can use IPFS to fetch dependencies via their CIDs, guaranteeing integrity and availability without relying on a central registry. Projects like IPFS Cluster and Berty use this for peer-to-peer updates. This mitigates risks like registry compromise or the "left-pad" incident.

04

Blockchain Data & Oracles

Blockchains use CIDs as efficient pointers to large datasets stored off-chain, a pattern known as content-addressed storage. Oracles like Chainlink can store data proofs on IPFS and reference the CID on-chain. This allows smart contracts to access verifiable external data (e.g., weather data, legal documents) without bloating the chain.
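
One way to keep the on-chain footprint small, sketched here as an assumption rather than a description of any particular protocol, is to store only the 32-byte digest (which fits a bytes32 slot) and rebuild the full CID off-chain. This only works when the version, codec, and hash function are fixed by convention; the sketch assumes CIDv1, dag-pb, and SHA2-256, and the function names are hypothetical.

```python
import base64

# Fixed by convention in this sketch: CIDv1, dag-pb, sha2-256, 32-byte digest.
DAG_PB_SHA256_HEADER = bytes([0x01, 0x70, 0x12, 0x20])


def cid_to_bytes32(cid: str) -> bytes:
    """Strip the fixed header and return the 32-byte digest for on-chain storage."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)
    cid_bytes = base64.b32decode(body)
    if len(cid_bytes) != 36 or cid_bytes[:4] != DAG_PB_SHA256_HEADER:
        raise ValueError("CID does not match the assumed dag-pb/sha2-256 layout")
    return cid_bytes[4:]


def bytes32_to_cid(digest: bytes) -> str:
    """Rebuild the base32 CIDv1 string from a stored 32-byte digest."""
    if len(digest) != 32:
        raise ValueError("expected exactly 32 bytes")
    cid_bytes = DAG_PB_SHA256_HEADER + digest
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


cid = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
stored = cid_to_bytes32(cid)
print(len(stored))                    # 32
print(bytes32_to_cid(stored) == cid)  # True
```

The trade-off is that the stored value is no longer self-describing, so a contract using this pattern has to document the assumed header.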

05

Data Archiving & Scientific Datasets

Research institutions and archives use IPFS CIDs to preserve large, immutable datasets. The CID provides a permanent checksum, ensuring data integrity over decades. Projects like arXiv and the InterPlanetary Scientific Database (IPSD) leverage this for replicating petabytes of scientific data across a global peer-to-peer network.

06

Decentralized Identity & Verifiable Credentials

CIDs anchor Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs). Identity documents can be stored on IPFS, with the CID referenced in a DID document on a blockchain. This creates a self-sovereign identity system where credentials are portable, cryptographically verifiable, and controlled by the user, not a central authority.

IPFS

Technical Anatomy of a CID

An in-depth look at the structure and components that make up a Content Identifier, the fundamental unit of addressability in content-addressed systems like IPFS.

An IPFS CID (Content Identifier) is a self-describing content-addressed identifier composed of a version number, a multicodec prefix, a multihash, and (in its string form) a multibase prefix, which together uniquely and verifiably point to a piece of content on the decentralized web. The CID specification, maintained as part of the Multiformats project, ensures that the identifier itself contains all the information needed to interpret the data it references, including the format of the content and the cryptographic hash function used. This design makes CIDs immutable, portable, and future-proof, as they are not tied to a specific location or protocol.

The core of a CID is its multihash, a self-describing hash digest that specifies the hash function (e.g., sha2-256) and the digest length. This is prefixed by a multicodec identifier that indicates the format of the target data, such as dag-pb for IPLD Protobuf nodes or raw for raw bytes. Finally, a multibase prefix (like b for base32) can be added to the entire string, encoding it for safe use in various contexts like URLs or filenames. A CIDv1 in its textual form typically appears as bafybeig... where the b denotes base32 encoding.
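
Since the multibase prefix only affects the textual form, the same CID bytes can be re-encoded without changing what they identify. The sketch below supports just two multibase encodings, base32 (prefix b) and lowercase base16 (prefix f); the function names are illustrative. In the base16 form the individual fields are easy to read off.

```python
import base64


def decode_multibase(text: str) -> bytes:
    """Decode a multibase string (only 'b' base32 and 'f' base16 supported)."""
    prefix, body = text[0], text[1:]
    if prefix == "b":
        padded = body.upper() + "=" * (-len(body) % 8)
        return base64.b32decode(padded)
    if prefix == "f":
        return bytes.fromhex(body)
    raise ValueError("unsupported multibase prefix: " + prefix)


def encode_multibase(prefix: str, data: bytes) -> str:
    """Encode bytes with the chosen multibase prefix ('b' or 'f')."""
    if prefix == "b":
        return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")
    if prefix == "f":
        return "f" + data.hex()
    raise ValueError("unsupported multibase prefix: " + prefix)


cid_b32 = "bafkreifzjut3te2nhyekklss27nh3k72ysco7y32koao5eei66wof36n5e"
cid_hex = encode_multibase("f", decode_multibase(cid_b32))
print(cid_hex)
# f01551220b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
```

Reading the hex form left to right: f is the multibase prefix, 01 the CID version, 55 the raw codec, 12 the sha2-256 code, 20 the digest length (32 bytes), and the remainder is the digest itself.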

CIDs exist in two primary versions: CIDv0 and CIDv1. CIDv0 is the legacy format, starting with Qm, which implicitly assumes SHA-256 hashing and Base58BTC encoding. CIDv1 is the extensible, self-describing format that explicitly includes the multicodec and can use any multibase encoding. The migration to CIDv1 was crucial for supporting diverse hash functions (like blake3) and data formats beyond the original IPFS Merkle DAG, enabling broader interoperability across content-addressed systems.

Under the hood, a CID is a compact binary structure: a sequence of unsigned varints giving the version and the multicodec code, followed by the multihash bytes. Codecs such as dag-cbor or dag-pb describe how the referenced data is serialized, not the CID itself, although those formats can embed CIDs as links. When developers work with CIDs programmatically, using libraries such as multiformats in JavaScript or go-cid in Go, they parse this binary structure to extract the hash, verify data integrity, and resolve the content. This technical anatomy is what allows a single CID to reliably and permanently identify the same content across different networks, storage systems, and applications, forming the backbone of verifiable data exchange.

GLOSSARY

Frequently Asked Questions about IPFS CIDs

A Content Identifier (CID) is the foundational unit of content-addressing in the IPFS ecosystem. These questions cover its core mechanics, versions, and practical applications.

An IPFS CID (Content Identifier) is a self-describing cryptographic hash that uniquely and permanently identifies content on the InterPlanetary File System (IPFS). It works by applying a cryptographic hash function (like SHA-256) to the content itself, creating a unique fingerprint. This fingerprint is then encoded with metadata about the hash function and format into a single string using multihash, multicodec, and multibase protocols. The key principle is content-addressing: you request data by what it is (its hash) rather than where it is (a server URL). This ensures that identical content will always produce the same CID, enabling deduplication and verifiable integrity.
