IPFS Hash (CID): Decentralized Content Identifier

DECODING CONTENT IDENTIFIERS

What is an IPFS Hash (CID)?

A technical breakdown of the cryptographic fingerprint at the heart of the InterPlanetary File System.

An IPFS hash, formally known as a Content Identifier (CID), is a cryptographic fingerprint that identifies a piece of content (such as a file, directory, or data block) on the InterPlanetary File System (IPFS). Unlike location-based addresses (e.g., URLs), which point to where data is stored, a CID is a content-addressed identifier derived from the content itself using a cryptographic hash function. The same content, imported with the same settings (hash function, codec, and chunking), always produces the same CID, which enables deduplication, tamper-evident verification, and location-independent addressing across a decentralized network.

The structure of a modern CID is self-describing, containing several key components within its encoded string. It specifies the multihash (the actual hash digest and the algorithm used, like SHA-256), the multicodec (the format of the data, e.g., raw bytes, dag-pb for IPFS objects, or dag-cbor), and the multibase prefix (the encoding, such as base58btc or base32, for readability). For example, a CIDv1 like bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi tells the network everything needed to fetch and interpret the content. This self-contained design ensures CIDs remain usable even as hash algorithms evolve.
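To make the self-describing layout concrete, the following minimal sketch splits the example CID above into its fields using only the Python standard library. The function names and the small lookup tables are illustrative assumptions, not part of any IPFS library, and the tables cover only a handful of codes.

```python
import base64

# Small, hand-picked subsets of the multicodec and multihash tables
# (assumption: enough for this illustration, far from complete).
CODECS = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor"}
HASHES = {0x12: "sha2-256", 0x13: "sha2-512"}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned varint; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def decode_cid_v1_base32(cid: str) -> dict:
    """Split a base32 CIDv1 string into version, codec, and multihash."""
    if not cid.startswith("b"):
        raise ValueError("expected multibase prefix 'b' (base32, lowercase)")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that multibase omits
    raw = base64.b32decode(body)

    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    digest = raw[pos:pos + digest_len]
    return {
        "version": version,
        "codec": CODECS.get(codec, hex(codec)),
        "hash": HASHES.get(hash_code, hex(hash_code)),
        "digest_len": digest_len,
        "digest": digest,
    }


if __name__ == "__main__":
    fields = decode_cid_v1_base32(
        "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
    )
    for name, value in fields.items():
        print(name, value.hex() if isinstance(value, bytes) else value)
```

Everything after the leading b is plain base32, so the version, codec, and hash function can be read without any network access.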

In practice, when a file is added to IPFS, it is split into blocks, each receiving its own CID. These blocks are then organized into a Merkle Directed Acyclic Graph (DAG) structure, with a root CID representing the entire file or directory. This architecture enables powerful features: deduplication (identical blocks are stored once), tamper-evidence (any change alters the CID), and efficient versioning. CIDs are fundamental to the decentralized web, forming the backbone of protocols like IPFS and Filecoin, and are widely used in blockchain applications for storing NFTs, static website assets, and immutable data.
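The block-and-root idea can be sketched with a toy model. This is a deliberate simplification: it uses hex SHA-256 digests in place of real CIDs, a tiny fixed chunk size, and a newline-joined list of child hashes as the root node, none of which matches the actual UnixFS/dag-pb encoding. The names add_file, read_file, and CHUNK_SIZE are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose; real importers use far larger chunks


def digest(data: bytes) -> str:
    """Stand-in identifier: hex SHA-256 digest rather than a full CID."""
    return hashlib.sha256(data).hexdigest()


def add_file(data: bytes, store: dict[str, bytes]) -> str:
    """Chunk the data, store each unique block once, return the root hash."""
    leaf_ids = []
    for start in range(0, len(data), CHUNK_SIZE):
        block = data[start:start + CHUNK_SIZE]
        block_id = digest(block)
        store[block_id] = block  # an identical block maps to the same key
        leaf_ids.append(block_id)
    # The root node is simply the ordered list of child hashes.
    root_node = "\n".join(leaf_ids).encode("ascii")
    root_id = digest(root_node)
    store[root_id] = root_node
    return root_id


def read_file(root_id: str, store: dict[str, bytes]) -> bytes:
    """Follow the root node's links and reassemble the original bytes."""
    node = store[root_id]
    leaf_ids = node.decode("ascii").split("\n") if node else []
    return b"".join(store[leaf_id] for leaf_id in leaf_ids)


if __name__ == "__main__":
    store: dict[str, bytes] = {}
    root = add_file(b"aaaabbbbaaaa", store)
    print("root:", root)
    print("stored entries:", len(store))  # two unique leaves plus the root
    print("round trip ok:", read_file(root, store) == b"aaaabbbbaaaa")
```

Because the root hash covers the child hashes, changing any block changes the root, while repeated blocks occupy a single entry in the store.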

CONTENT ADDRESSING

How Does an IPFS Hash (CID) Work?

An IPFS hash, known as a Content Identifier (CID), is a cryptographic fingerprint that uniquely and permanently identifies content on the InterPlanetary File System (IPFS).

A Content Identifier (CID) is a self-describing content-addressed identifier. It is generated by applying a cryptographic hash function (like SHA-256) to the content's data, creating a unique, fixed-size string. This process, known as content addressing, means the CID is derived directly from the data itself. If the data changes even by a single bit, the resulting CID will be completely different. This ensures immutability and verifiability, as anyone can hash the data to confirm it matches the CID.
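As a minimal sketch of this derivation, the snippet below builds a CIDv1 by hand for a single block of bytes, assuming the raw codec and SHA-256. It does not reproduce what an importer does for large files, which are chunked and linked through a DAG. The function name cid_v1_raw_sha256 is hypothetical.

```python
import base64
import hashlib


def cid_v1_raw_sha256(data: bytes) -> str:
    """Build a base32 CIDv1 for one block, using the raw codec and SHA-256."""
    digest = hashlib.sha256(data).digest()
    # version 1, codec raw (0x55), hash sha2-256 (0x12), digest length 32 (0x20)
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # 'b' is the multibase prefix for lowercase base32


if __name__ == "__main__":
    original = b"hello world"
    # Flip the lowest bit of the first byte to simulate a one-bit change.
    flipped = bytes([original[0] ^ 0x01]) + original[1:]
    print(cid_v1_raw_sha256(original))
    print(cid_v1_raw_sha256(flipped))
```

The two printed identifiers share the same header prefix, since version, codec, and hash function are unchanged, but the digest portion differs completely.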

The structure of a modern CID (version 1, or CIDv1) is more than just a hash. It is a multiformat string that encodes several pieces of metadata in a self-describing way using Multicodec, Multihash, and Multibase prefixes. For example, a CID like bafybeigdyr... tells you the hash function used (SHA2-256), the length of the hash, and the base encoding (Base32). This layered structure allows the IPFS protocol to evolve, supporting new hash functions and data formats without breaking compatibility.

When you request content by its CID, the IPFS network uses a Distributed Hash Table (DHT) to locate network peers who have announced they are storing the data blocks associated with that fingerprint. The system retrieves and reassembles these blocks. This mechanism decouples content from location; instead of asking "where is the file?" (like https://server.com/file.jpg), you ask "who has the data matching this hash?" This makes content resilient, as it can be retrieved from any node that has a copy, not just a single server.
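The lookup-then-verify flow can be illustrated with a toy in-memory model. This is only a sketch: a plain dictionary stands in for the DHT's provider records, hex SHA-256 digests stand in for CIDs, and the ToyNetwork class and peer names are invented for the example.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Stand-in identifier: hex SHA-256 digest rather than a full CID."""
    return hashlib.sha256(data).hexdigest()


class ToyNetwork:
    """In-memory stand-in for provider records plus peers' local storage."""

    def __init__(self) -> None:
        self.providers: dict[str, set[str]] = {}      # content id -> peer names
        self.peers: dict[str, dict[str, bytes]] = {}  # peer name -> its blocks

    def announce(self, peer: str, data: bytes) -> str:
        """Store data on a peer and record that the peer provides it."""
        cid = content_id(data)
        self.peers.setdefault(peer, {})[cid] = data
        self.providers.setdefault(cid, set()).add(peer)
        return cid

    def fetch(self, cid: str) -> bytes:
        """Ask each known provider in turn; accept only bytes that hash to cid."""
        for peer in sorted(self.providers.get(cid, ())):
            data = self.peers.get(peer, {}).get(cid)
            if data is not None and content_id(data) == cid:
                return data
        raise LookupError(f"no provider returned valid data for {cid}")


if __name__ == "__main__":
    net = ToyNetwork()
    cid = net.announce("peer-b", b"a photo")
    # A misbehaving peer claims to provide the same content but serves other bytes.
    net.peers["peer-a"] = {cid: b"tampered"}
    net.providers[cid].add("peer-a")
    print(net.fetch(cid))  # the tampered copy is rejected, peer-b's copy is used
```

The requester never needs to trust a particular peer, because any response that does not hash to the requested identifier is discarded.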

CONTENT ADDRESSING

Key Features of IPFS CIDs

A Content Identifier (CID) is a self-describing cryptographic hash that uniquely and permanently identifies content on the IPFS network and other decentralized systems.

01

Cryptographic Hashing

At its core, a CID is generated by applying a cryptographic hash function (like SHA-256) to the content's data. This creates a unique, fixed-size fingerprint. Even a single bit change in the original data produces a completely different CID, ensuring data integrity and tamper-evidence.

02

Self-Describing Format

A CID is not just a hash; it's a structured identifier that describes itself. It encodes:

The multihash (the hash digest and the function used).
The multicodec (the format of the data, e.g., raw bytes, dag-pb, dag-cbor).
The multibase prefix (the text encoding, e.g., b for base32 or z for base58btc). CIDv0 strings, which start with Qm, are always base58btc and carry no prefix.

This allows systems to know how to interpret the data without external context.

03

Versioning (CIDv0 vs CIDv1)

IPFS CIDs have evolved:

CIDv0: The original format, starting with Qm. It is a Base58-encoded SHA-256 hash of a protobuf. Limited flexibility.
CIDv1: The current standard, more flexible with explicit multicodec and multibase prefixes (e.g., bafybei...). It supports future-proofing with different hash functions and data formats. Most new systems use CIDv1.

04

Content vs. Location Addressing

This is the fundamental shift CIDs enable.

Location Addressing: Points to where data is (e.g., https://server.com/file.jpg). If the server moves, the link breaks.
Content Addressing: Points to what the data is via its CID (e.g., ipfs://bafy...). The data can be retrieved from any node on the IPFS network that has a copy, providing persistence and redundancy.

05

Immutability & Deduplication

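A CID guarantees immutability: the content behind a given CID can never change, and the same content imported with the same settings always yields the same CID. This enables automatic deduplication. If two users add the same 1 GB file, both get the same CID, so a node never stores those blocks twice and the network treats every copy as the same content, which saves storage and bandwidth.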

06

Interoperability & The Multiformats Project

CIDs are built using Multiformats—a collection of self-describing protocols. This makes them usable far beyond IPFS. The same CID format is used by Filecoin for storage deals, IPLD for linked data structures, and other decentralized protocols, creating a universal language for content-addressed data.

IPFS HASH (CID)

CID Versions: v0 vs. v1

A technical comparison of the two primary Content Identifier (CID) formats used in the InterPlanetary File System (IPFS) and related decentralized protocols.

A Content Identifier (CID) is a self-describing content-addressed identifier for data stored on the InterPlanetary File System (IPFS). The evolution from CIDv0 to CIDv1 represents a fundamental shift towards a more flexible, future-proof, and multi-protocol addressing scheme. CIDv0 is the original format, essentially a Base58-encoded SHA-256 multihash (e.g., Qm...), while CIDv1 is an extensible format that includes a version prefix, multicodec identifier, and multihash within a binary structure, typically represented as a CIDv1 string like bafybeig....

The primary technical distinction lies in their structure and encoding. CIDv0 is a legacy format that is implicitly version 0 and uses the dag-pb multicodec for IPLD data. It is restricted to the SHA-256 hash function and Base58 encoding, making it recognizable by its Qm prefix. In contrast, CIDv1 is explicitly versioned, includes a field to specify the content type (e.g., dag-cbor, raw), and can accommodate any hash function defined in the multihash table. This allows CIDv1 to represent a wider array of data structures and cryptographic commitments beyond IPFS's original scope.

A key practical difference is CIDv1's support for multiple textual representations. While CIDv0 only exists in Base58, a CIDv1 can be represented in various bases identified by the Multibase prefix, such as Base32 (b...) for case-insensitive environments or Base64. The common bafy... string is a Base32-encoded CIDv1. This flexibility makes CIDv1 more suitable for web applications, DNS, and filenames. Importantly, converting a CIDv0 to CIDv1 does not change the underlying hash: the two strings are different encodings of the same dag-pb, SHA-256 multihash.
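The following sketch re-encodes one CID in several bases to show that only the text form changes. It assumes the multibase prefixes f (base16, lowercase), b (base32, lowercase, unpadded), and u (base64url, unpadded); the helper names are hypothetical.

```python
import base64


def decode_base32_cid(cid: str) -> bytes:
    """Return the binary CID behind a 'b'-prefixed base32 string."""
    if not cid.startswith("b"):
        raise ValueError("expected multibase prefix 'b' (base32, lowercase)")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that multibase omits
    return base64.b32decode(body)


def encode_multibase(cid_bytes: bytes, base: str) -> str:
    """Encode binary CID bytes in one of three multibase encodings."""
    if base == "base16":
        return "f" + cid_bytes.hex()
    if base == "base32":
        return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    if base == "base64url":
        return "u" + base64.urlsafe_b64encode(cid_bytes).decode("ascii").rstrip("=")
    raise ValueError(f"unsupported base: {base}")


if __name__ == "__main__":
    cid = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
    cid_bytes = decode_base32_cid(cid)
    for base in ("base16", "base32", "base64url"):
        print(base, encode_multibase(cid_bytes, base))
```

All three strings decode to the same bytes, so they address the same content; the prefix tells a parser which alphabet to use.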

For developers, the choice is increasingly straightforward: CIDv1 is the recommended format for new projects. Tooling defaults vary, however. Kubo (the Go implementation) has historically produced CIDv0 from ipfs add unless the --cid-version=1 option is set, so check which version your tools emit. The network remains fully compatible with CIDv0, but new projects should adopt CIDv1 for its extensibility. A CIDv0 can be losslessly converted to its CIDv1 equivalent, but the reverse is not universally true, as CIDv1 can express constructs that CIDv0 cannot, such as content using the SHA3-256 hash or the dag-json codec.
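The conversion in both directions can be sketched with the standard library alone. The example assumes that a CIDv0 is exactly a base58btc-encoded, 34-byte SHA-256 multihash and that its CIDv1 equivalent uses the dag-pb codec (0x70). The helper names are hypothetical, and the base58 routines are a minimal hand-written implementation rather than a library API.

```python
import base64

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58btc (no multibase prefix)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58[rem] + out
    pad = len(data) - len(data.lstrip(b"\0"))  # leading zero bytes become '1'
    return "1" * pad + out


def b58decode(text: str) -> bytes:
    """Decode a base58btc string back to bytes."""
    n = 0
    for ch in text:
        n = n * 58 + B58.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\0" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def v0_to_v1(cid_v0: str) -> str:
    """Wrap a CIDv0 multihash as a base32 CIDv1 with the dag-pb codec."""
    multihash = b58decode(cid_v0)
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError("not a CIDv0 (expected a 34-byte sha2-256 multihash)")
    cid_bytes = b"\x01\x70" + multihash  # version 1, codec dag-pb
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def v1_to_v0(cid_v1: str) -> str:
    """Convert back; only possible for dag-pb with a sha2-256 multihash."""
    if not cid_v1.startswith("b"):
        raise ValueError("expected multibase prefix 'b' (base32, lowercase)")
    body = cid_v1[1:].upper()
    body += "=" * (-len(body) % 8)
    cid_bytes = base64.b32decode(body)
    if len(cid_bytes) != 36 or cid_bytes[:4] != b"\x01\x70\x12\x20":
        raise ValueError("only dag-pb + sha2-256 CIDv1 has a CIDv0 form")
    return b58encode(cid_bytes[2:])


if __name__ == "__main__":
    v1 = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
    v0 = v1_to_v0(v1)
    print(v0)
    print(v0_to_v1(v0) == v1)
```

The guard in v1_to_v0 reflects the asymmetry described above: any other codec or hash function has no CIDv0 representation.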

Understanding the version difference is crucial for interoperability across the decentralized web stack. Protocols like IPLD (InterPlanetary Linked Data), Filecoin, and libp2p rely on CIDv1's ability to precisely describe the data being referenced. The version, multicodec, and multihash components work together to ensure that data is not only located but also correctly interpreted upon retrieval, forming the backbone of content-addressed verifiability in Web3 systems.

IPFS HASH (CID)

Examples & Ecosystem Usage

The Content Identifier (CID) is the cornerstone of IPFS's content-addressed storage model. These examples illustrate its practical applications across the decentralized ecosystem.

01

NFT Metadata & Media Storage

Many NFTs store their artwork and metadata on IPFS, using a CID as the permanent pointer. This ensures the digital asset is immutable, and it stays accessible as long as the content is pinned. For example, an NFT's tokenURI might point to ipfs://QmXoypiz.../metadata.json, which itself contains a CID for the image file.
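Applications that cannot resolve ipfs:// URIs natively often rewrite them to an HTTP gateway URL. Below is a minimal sketch of that rewrite, assuming a path-style gateway layout of /ipfs/<cid>/<path> and using https://ipfs.io as the default gateway. The function name is hypothetical, and the metadata.json path under the example CID is purely illustrative.

```python
def ipfs_uri_to_gateway_url(uri: str, gateway: str = "https://ipfs.io") -> str:
    """Rewrite ipfs://<cid>/<path> to <gateway>/ipfs/<cid>/<path>."""
    prefix = "ipfs://"
    if not uri.startswith(prefix):
        raise ValueError("expected a URI starting with ipfs://")
    cid_and_path = uri[len(prefix):]
    if not cid_and_path:
        raise ValueError("URI has no CID")
    return f"{gateway.rstrip('/')}/ipfs/{cid_and_path}"


if __name__ == "__main__":
    # Hypothetical token URI built from the example CID used in this article.
    token_uri = (
        "ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
        "/metadata.json"
    )
    print(ipfs_uri_to_gateway_url(token_uri))
```

A gateway is a convenience layer: the CID in the URL still lets a client verify what the gateway returns.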

02

Decentralized Application (dApp) Frontends

Projects like Uniswap and Aragon host their application interfaces on IPFS to achieve censorship resistance. Users interact with a frontend served via a CID, guaranteeing they always access the verified, untampered code. This removes reliance on centralized web servers that could be taken down.

03

Software Distribution & Package Management

CIDs enable verifiable software distribution. Package managers can use IPFS to fetch dependencies, ensuring cryptographic integrity—the downloaded code exactly matches the author's original. This mitigates supply chain attacks. Projects like IPFS Cluster use CIDs to coordinate the pinning of large datasets across many nodes.

04

Data Archiving & Scientific Datasets

Research institutions and archives use CIDs to publish and preserve large datasets. The CID acts as a permanent, versioned citation. Anyone can retrieve the exact dataset used in a published paper, enabling reproducible research. The InterPlanetary Linked Data (IPLD) model builds upon CIDs to create complex, linked data structures.

05

Decentralized File Sharing & Collaboration

Hosting and pinning services like Fleek and Pinata, and storage networks like Filecoin (which uses CIDs for storage deals), leverage IPFS hashes for secure file sharing and backup. Users can share a single CID to distribute folders, documents, or videos, with the guarantee that the content cannot be altered without changing the identifier.

06

Blockchain State & Merkle Proofs

Blockchains like Ethereum and Polygon PoS use cryptographic hashes structurally similar to CIDs in their Merkle Patricia Tries to represent state. While not native IPFS CIDs, the principle is identical: a root hash (like a CID) uniquely identifies the entire state, and CIDs can be used to store and reference blockchain data off-chain.

CONTENT ADDRESSING COMPARISON

CID vs. Traditional URL Addressing

A technical comparison of content-addressed identifiers (CIDs) and location-addressed URLs, highlighting their core architectural differences.

Feature	Content Identifier (CID)	Uniform Resource Locator (URL)
Addressing Method	Content-based (cryptographic hash of the data)	Location-based (path to a server and file)
Data Integrity	Yes (data is checked against the hash in the CID)	No (the address says nothing about the content)
Data Immutability	Yes (changed data yields a different CID)	No (content behind a URL can change at any time)
Decentralization	Yes (any node holding the data can serve it)	No (tied to a specific host)
Persistence	Data persists as long as one node hosts it	Link breaks if the hosting server changes or goes offline
Verification	Any node can verify the data matches the CID	Client must trust the server to serve correct data
Deduplication	Automatic (identical content = identical CID)	None (identical content can have infinite URLs)
Performance (Cached Data)	Near-instant from local or peer cache	Latency depends on origin server and network path

IPFS HASH (CID)

Security & Permanence Considerations

An IPFS Content Identifier (CID) is a self-describing cryptographic hash that uniquely addresses content on the InterPlanetary File System. Its design has profound implications for data integrity, availability, and long-term persistence.

01

Content-Addressed vs. Location-Addressed

A CID is a content-addressed identifier, meaning it is derived from the content's cryptographic hash. This contrasts with location-addressed systems (like HTTP URLs) that point to a server location. The key security benefit is immutability: if the data changes, its CID changes, guaranteeing the data you fetch is exactly what was originally stored. This prevents tampering and ensures verifiable integrity.

02

The Pinning Problem

IPFS is a peer-to-peer network; data is not stored permanently by default. Pinning is the mechanism that tells an IPFS node to retain and serve specific CIDs indefinitely. Security risk: If no node pins the data, it can be garbage-collected and become unavailable. Long-term permanence requires running your own pinning node, using a pinning service (like Pinata or Infura), or paying for storage on a network such as Filecoin, so that at least one reliable host keeps the data behind your CIDs.
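The relationship between pinning and garbage collection can be sketched with a toy node. This is not how any real IPFS implementation manages its block store: the ToyNode class, its methods, and the use of hex SHA-256 digests as keys are all assumptions made for illustration.

```python
import hashlib


class ToyNode:
    """Toy model of a node's block store with pins and garbage collection."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.pins: set[str] = set()

    def cache(self, data: bytes) -> str:
        """Keep a block locally without pinning it (for example after a fetch)."""
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key

    def pin(self, key: str) -> None:
        """Mark a locally held block as protected from garbage collection."""
        if key not in self.blocks:
            raise KeyError(f"cannot pin unknown block {key}")
        self.pins.add(key)

    def collect_garbage(self) -> list[str]:
        """Delete every block that is not pinned; return the removed keys."""
        removed = [key for key in self.blocks if key not in self.pins]
        for key in removed:
            del self.blocks[key]
        return removed


if __name__ == "__main__":
    node = ToyNode()
    kept = node.cache(b"important dataset")
    dropped = node.cache(b"page viewed once")
    node.pin(kept)
    print("removed:", node.collect_garbage())
    print("still stored:", list(node.blocks))
```

The identifier of the dropped block remains valid after collection; the node simply no longer holds the bytes, which is the availability risk described above.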

03

CID Inherent Properties

A CID is self-describing and versioned. It encodes:

The cryptographic hash of the content (e.g., SHA2-256).
The codec (e.g., dag-pb, raw) describing the data format.
The multihash identifier, specifying the hash function used.

This structure allows any system to independently verify the data's integrity and understand how to interpret it without external context, a core feature for trustless systems.
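Independent verification can be sketched as follows: the verifier reads the hash function code out of the CID itself, recomputes the digest, and compares. The sketch assumes single-block content with the raw codec, a base32 CIDv1, and header fields that each fit in one byte. The function names and the two-entry hash table are hypothetical.

```python
import base64
import hashlib

# Multihash codes mapped to hashlib constructors (a small subset).
HASH_FUNCTIONS = {0x12: hashlib.sha256, 0x13: hashlib.sha512}
RAW_CODEC = 0x55


def make_raw_cid(data: bytes, hash_code: int = 0x12) -> str:
    """Build a base32 CIDv1 (raw codec) so the example has something to verify."""
    digest = HASH_FUNCTIONS[hash_code](data).digest()
    cid_bytes = bytes([0x01, RAW_CODEC, hash_code, len(digest)]) + digest
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def verify_block(cid: str, data: bytes) -> bool:
    """Recompute the digest with the function named in the CID and compare."""
    if not cid.startswith("b"):
        raise ValueError("expected multibase prefix 'b' (base32, lowercase)")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that multibase omits
    cid_bytes = base64.b32decode(body)
    # Assumption: each header field fits in one byte, true for the codes used here.
    version, _codec, hash_code, digest_len = cid_bytes[:4]
    if version != 1:
        raise ValueError("only CIDv1 is handled in this sketch")
    hasher = HASH_FUNCTIONS.get(hash_code)
    if hasher is None:
        raise ValueError(f"unsupported hash function code {hash_code:#x}")
    expected = cid_bytes[4:4 + digest_len]
    return hasher(data).digest() == expected


if __name__ == "__main__":
    cid = make_raw_cid(b"hello")
    print(verify_block(cid, b"hello"))
    print(verify_block(cid, b"hellp"))
```

Because the hash function is named inside the identifier, the same verify_block call handles both SHA-256 and SHA-512 CIDs without being told which one to expect.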

04

Sybil & Eclipse Attacks

While the CID itself is secure, the IPFS peer-to-peer routing layer can be vulnerable. Sybil attacks (creating many malicious nodes) or eclipse attacks (isolating a node from honest peers) can prevent a user from finding the correct peers hosting the desired CID. This doesn't corrupt the data (the CID remains valid) but can create a denial-of-service by hiding its availability. Mitigations include using trusted peers or DHT security extensions.

05

Permanence Through Incentives (Filecoin)

IPFS provides the addressing layer; Filecoin builds an incentive layer on top for verifiable, provable storage. Users pay miners to store CIDs via storage deals. Miners provide cryptographic proofs (Proof-of-Replication, Proof-of-Spacetime) to the blockchain, proving they are storing the data continuously. This creates a cryptoeconomic guarantee of storage for the duration of the deal, making it costly for a miner to lose or censor data.

06

CIDv1 & Future-Proofing

Early CIDv0 was base58-encoded and limited. CIDv1 is the current standard, featuring:

Multibase prefixes (e.g., bafy...) for encoding flexibility.
Explicit version byte.
Support for multiple hash functions (future-proofing against cryptographic breaks).

Migrating to CIDv1 is critical for long-term archival, as it ensures addresses remain interpretable and resolvable even as underlying protocols evolve.

IPFS HASH (CID)

Frequently Asked Questions (FAQ)

A Content Identifier (CID) is the core addressing mechanism of the InterPlanetary File System (IPFS), providing a unique, verifiable fingerprint for any piece of content.

A Content Identifier (CID) is a self-describing content-addressed identifier that uniquely and verifiably points to data stored on the InterPlanetary File System (IPFS). It works by applying a cryptographic hash function (like SHA-256) to the content itself, generating a unique string of characters. This CID is not a location-based address (like a URL); instead, it is derived from the content's data. Any change to the data results in a completely different CID. The system uses this hash to locate and retrieve the content from the distributed IPFS network, ensuring data integrity and persistence.

Key components of a CID include:

Multihash: Specifies the hash function used and the hash digest.
Multicodec: Indicates the format of the target data (e.g., raw, dag-pb, dag-cbor).
Multibase: The encoding prefix (like b for base32 or z for base58btc) for the string representation.

Example CIDv1: bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi


Related Terms

Content Addressing

Multihash

IPFS (InterPlanetary File System)

Decentralized Storage

Merkle DAG (Directed Acyclic Graph)

IPNS (InterPlanetary Name System)
