
How to Implement a Decentralized Identity for Data Provenance

A developer tutorial for creating immutable provenance trails for scientific data using Decentralized Identifiers (DIDs), Content Identifiers (CIDs), and on-chain attestations.
INTRODUCTION


This guide explains how to use decentralized identifiers (DIDs) and verifiable credentials to create a tamper-proof record of data origin and ownership.

Data provenance—the ability to trace the origin, ownership, and history of data—is critical for trust in digital systems. Traditional centralized models rely on single points of failure and opaque verification. Decentralized Identity (DID) offers a solution by giving users control over their identifiers and credentials through cryptographic proofs stored on a blockchain or other distributed ledger. This creates an immutable, verifiable chain of custody for any digital asset, from AI training data to supply chain records.

The core components are the Decentralized Identifier (DID), a globally unique URI controlled by the subject (e.g., did:ethr:0xabc123), and Verifiable Credentials (VCs), which are tamper-evident claims issued about that DID. A VC is signed by an issuer's DID and can represent anything from a person's name to a dataset's hash and license terms. The Verifiable Data Registry, often a blockchain like Ethereum or Polygon, stores the DIDs and their associated public keys, enabling anyone to resolve and verify credentials without a central authority.
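
To make this concrete, here is an abridged, illustrative example of the DID document a resolver might return for a did:ethr identifier (field names follow the W3C DID Core vocabulary; exact contents vary by DID method and resolver version):

javascript
// Abridged DID document for a did:ethr identifier (illustrative values only)
const didDocument = {
  '@context': ['https://www.w3.org/ns/did/v1'],
  id: 'did:ethr:0xabc123...',
  verificationMethod: [{
    id: 'did:ethr:0xabc123...#controller',
    type: 'EcdsaSecp256k1RecoveryMethod2020',
    controller: 'did:ethr:0xabc123...',
    blockchainAccountId: 'eip155:1:0xabc123...'
  }],
  authentication: ['did:ethr:0xabc123...#controller'],
  assertionMethod: ['did:ethr:0xabc123...#controller']
};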

To implement this, you first choose a DID method like did:ethr for Ethereum-compatible chains or did:key for simpler use cases. You then create a DID document containing public keys and service endpoints. For data provenance, a common pattern is to issue a VC containing a cryptographic hash (e.g., SHA-256) of the dataset, metadata about its creation, and the issuer's DID. This credential is signed and can be presented alongside the data to prove its authenticity and lineage.

Here's a conceptual snippet using the ethr-did and did-jwt-vc libraries to create an issuer DID and sign a claim about a dataset:

javascript
import { EthrDID } from 'ethr-did';
import { createVerifiableCredentialJwt } from 'did-jwt-vc';

// An EthrDID constructed with a private key can act as a did-jwt-vc issuer
const issuer = new EthrDID({ identifier: '0xIssuerAddress', privateKey: '0xIssuerPrivateKey' });

// Create a Verifiable Credential for a dataset hash
const vcJwt = await createVerifiableCredentialJwt({
  sub: 'did:ethr:0xDataOwnerAddress',
  nbf: Math.floor(Date.now() / 1000),
  vc: {
    '@context': ['https://www.w3.org/2018/credentials/v1'],
    type: ['VerifiableCredential'],
    credentialSubject: { hash: '0xabc...', license: 'CC-BY-4.0', creationDate: '2024-01-01' }
  }
}, issuer);
// The signed VC (vcJwt) can now be stored or transmitted with the data.

Practical applications are vast. In AI/ML, DIDs can attest to training data sources, model authorship, and usage rights. For supply chains, each product batch can have a DID with VCs for origin, temperature logs, and certifications. Academic publishing can use DIDs to create citable, unforgeable records of research data. The system's power lies in selective disclosure; you can prove a specific claim (e.g., "this data is licensed for commercial use") without revealing the entire credential or your full identity.

When implementing, consider key management (using hardware security modules or MPC wallets for private keys), the cost and finality of your chosen blockchain for the registry, and interoperability standards from the W3C Decentralized Identifiers (DIDs) v1.0 and Verifiable Credentials Data Model. Frameworks like SpruceID's Sign-in with Ethereum and Microsoft's ION on Bitcoin provide robust tooling. The result is a user-centric, cryptographically secure foundation for auditing data history and establishing trust across decentralized applications.

SETUP & FOUNDATIONS

Prerequisites

Before building a decentralized identity system for data provenance, you need to establish the core technical and conceptual foundations. This section outlines the essential knowledge and tools required.

To implement a decentralized identity (DID) system for data provenance, you must first understand the core components. A DID is a cryptographically verifiable identifier (e.g., did:ethr:0xabc123...) controlled by the user, not a central authority. For provenance, this identity anchors the origin and history of data. You'll need a working knowledge of public key infrastructure (PKI), where a user's private key signs claims, and their public key, resolvable via their DID, allows anyone to verify those signatures. This creates an immutable chain of custody for any digital asset.
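
As a minimal sketch of that signing loop, assuming an ordinary secp256k1 key pair and ethers v5 (the claim string is illustrative), the code below signs a claim and recovers the signer's address; in a full DID system the verifier would obtain the expected address by resolving the signer's DID rather than already knowing it:

javascript
import { ethers } from 'ethers';

// Minimal sign/verify round trip underpinning DID-based provenance (ethers v5)
const wallet = ethers.Wallet.createRandom();
const claim = 'sha256:9f86d081... created 2024-01-01';

// The data producer signs the claim with their private key
const signature = await wallet.signMessage(claim);

// Anyone can recover the signing address and compare it to the address in the producer's DID
const recovered = ethers.utils.verifyMessage(claim, signature);
console.log('Claim signed by expected key:', recovered === wallet.address);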

Your development environment must be configured with specific tools. Essential software includes Node.js (v18+) and a package manager like npm or yarn. You will interact with a blockchain; for learning, a local Hardhat or Foundry development chain is ideal. Familiarity with the relevant Ethereum standards is crucial, particularly EIP-712 for structured data signing and ERC-1056 (Lightweight Identity), the registry standard underpinning the did:ethr method. You should also have a basic wallet like MetaMask installed for key management and signing operations.

The implementation relies on specific libraries and protocols. The Verifiable Credentials (VC) Data Model is the W3C standard for expressing claims, and you'll use libraries like did-jwt-vc or veramo to create and verify them. For on-chain components, you need a smart contract to act as a DID Registry—you can deploy your own or use an existing one like the EthereumDIDRegistry. Understanding how this registry emits events to log DID document updates is key for provenance tracking.

Finally, grasp the data flow. A provenance system using DIDs typically follows this pattern: 1) An issuer (e.g., a sensor or company) creates a Verifiable Credential about a data point and signs it with their DID. 2) This VC is stored off-chain (e.g., IPFS) with its content identifier (CID) recorded on-chain. 3) A verifier can fetch the DID document, resolve the public key, and cryptographically verify the credential's origin and integrity. Your code will need to handle each of these stages.

DECENTRALIZED IDENTITY

Core Components

A decentralized identity (DID) system for data provenance requires several foundational building blocks. This section details the key protocols and standards developers need to implement.


Verifiable Data Registries (VDRs)

A Verifiable Data Registry is the trusted system where DIDs and their public key material are anchored. Blockchains like Ethereum, Polygon, and Solana are commonly used as VDRs due to their immutability and decentralization. Key functions include:

  • DID Anchoring: Writing the hash of a DID Document or its public keys to the chain.
  • Key Rotation: Updating the state of a DID to revoke old keys and add new ones via on-chain transactions.
  • Event Logs: Providing an immutable audit trail of all changes to a DID's state, which is critical for provenance tracking.
Ethereum is the primary VDR for the did:ethr method; anchoring or updating a DID on its L2s typically costs under $1.
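
As a concrete sketch of key rotation, the snippet below calls changeOwner on the ERC-1056 EthereumDIDRegistry directly via ethers v5; the registry address, RPC URL, and keys are placeholders, and higher-level libraries such as ethr-did wrap the same call:

javascript
import { ethers } from 'ethers';

// Key rotation against the ERC-1056 EthereumDIDRegistry (placeholder address, URL, and keys)
const REGISTRY_ABI = ['function changeOwner(address identity, address newOwner)'];

const provider = new ethers.providers.JsonRpcProvider('https://sepolia.infura.io/v3/YOUR_KEY');
const currentOwner = new ethers.Wallet('0xCurrentOwnerPrivateKey', provider);
const registry = new ethers.Contract('0xRegistryAddress', REGISTRY_ABI, currentOwner);

// Transfers control of the DID to a new key; the resulting DIDOwnerChanged event
// becomes part of the identity's immutable audit trail.
const tx = await registry.changeOwner(currentOwner.address, '0xNewOwnerAddress');
await tx.wait();
console.log('Key rotated in tx', tx.hash);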

Selective Disclosure & Zero-Knowledge Proofs

For privacy-preserving provenance, systems must support selective disclosure. This allows a holder to prove a specific claim from a credential without revealing the entire document. Zero-Knowledge Proofs (ZKPs) are the advanced cryptographic tool for this.

  • BBS+ Signatures: Allow for deriving multiple, unlinkable proofs from a single VC.
  • zk-SNARKs/zk-STARKs: Enable proving complex statements (e.g., "I am over 18") without revealing the underlying data (your birth date). Frameworks like iden3's circom and libraries for BBS+ signatures are essential for implementing this layer.

Provenance-Specific Schemas & Ontologies

To track data lineage, you need standardized data models. This involves creating or adopting JSON schemas for VCs that define provenance-specific fields.

  • Provenance Attributes: createdBy, derivedFrom, timestamp, actionPerformed, toolUsed.
  • Linked Data Vocabularies: Using ontologies like PROV-O (W3C Provenance Ontology) or schema.org to ensure semantic interoperability.
  • Immutable Timestamps: Anchoring credential issuance or data creation events on-chain provides a globally-verifiable timestamp, a cornerstone of auditability.
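
As an illustration only (these field names mirror the attributes listed above rather than a ratified schema), a provenance-oriented credentialSubject might look like:

javascript
// Illustrative provenance credentialSubject; align names with PROV-O or your own JSON schema
const credentialSubject = {
  id: 'urn:uuid:550e8400-e29b-41d4-a716-446655440000', // the data asset being described
  createdBy: 'did:ethr:0xLabInstrumentAddress',
  derivedFrom: 'ipfs://bafy...parentDatasetCid',        // CID of the source dataset
  actionPerformed: 'normalization',
  toolUsed: 'pandas 2.1.0',
  timestamp: '2024-01-01T12:00:00Z'
};
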
IDENTITY FOUNDATION

Step 1: Create a Decentralized Identifier (DID)

A Decentralized Identifier (DID) is the cryptographic anchor for your on-chain identity, enabling verifiable data provenance without centralized registries.

A Decentralized Identifier (DID) is a globally unique, persistent identifier that an individual, organization, or device controls. Unlike traditional identifiers (like an email address issued by Google), a DID is self-sovereign. It is created, owned, and managed by the entity it identifies, without reliance on a central authority. This is foundational for data provenance, as it allows you to cryptographically sign and claim ownership of data across different systems and blockchains. The standard format, defined by the W3C, is did:method:method-specific-id.

To create a DID, you first choose a DID method, which defines the specific blockchain or network and the rules for creating, reading, updating, and deactivating the DID. Popular methods for developers include did:ethr (Ethereum, Polygon), did:pkh (for any blockchain public key), and did:key (a simple, self-contained method). For data provenance on EVM chains, did:ethr is a common choice as it maps directly to an Ethereum account. The creation process involves generating a cryptographic key pair and registering the public key on the chosen blockchain to form the DID document.

Here is a practical example using the ethr-did library with an Ethereum provider. This code generates a DID linked to a new Ethereum wallet:

javascript
import { EthrDID } from 'ethr-did';
import { ethers } from 'ethers';

// 1. Generate a new Ethereum key pair
const wallet = ethers.Wallet.createRandom();
const privateKey = wallet.privateKey;

// 2. Create an EthrDID instance for the Sepolia testnet
const providerConfig = { name: 'sepolia', rpcUrl: 'https://sepolia.infura.io/v3/YOUR_KEY' };
const did = new EthrDID({
  identifier: wallet.address,
  privateKey,
  chainNameOrId: 'sepolia',
  provider: new ethers.providers.JsonRpcProvider(providerConfig.rpcUrl)
});

console.log('DID created:', did.did); // e.g., did:ethr:sepolia:0xAbc...

This creates a DID like did:ethr:sepolia:0xabc123.... The associated DID Document is not stored as a file on-chain; it is a JSON-LD document that resolvers construct from the registry's on-chain state (for did:ethr, the ERC-1056 EthereumDIDRegistry), and it contains the public key, authentication methods, and service endpoints, enabling verification.

Once created, this DID becomes your verifiable identity anchor. You use the corresponding private key to sign Verifiable Credentials—cryptographic attestations about your data. For provenance, this means you can sign a statement like "I, did:ethr:..., attest that I generated dataset hash:0xfea3... at timestamp 173042...." Any verifier can then resolve your DID to fetch the public key from the blockchain and verify the signature's authenticity, establishing a tamper-proof chain of custody without needing to trust a central database.
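
A verifier-side resolution sketch using the did-resolver and ethr-did-resolver packages (the RPC URL and DID are placeholders) could look like this:

javascript
import { Resolver } from 'did-resolver';
import { getResolver } from 'ethr-did-resolver';

// Resolve a did:ethr identifier to its DID document
const resolver = new Resolver(
  getResolver({
    networks: [{ name: 'sepolia', chainId: 11155111, rpcUrl: 'https://sepolia.infura.io/v3/YOUR_KEY' }]
  })
);

const { didDocument } = await resolver.resolve('did:ethr:sepolia:0xAbc...');
console.log(didDocument.verificationMethod); // public keys used to verify the issuer's signatures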

DATA INTEGRITY

Step 2: Generate a CID for Your Dataset

Create a unique, content-addressed fingerprint for your data using IPFS to establish an immutable foundation for your decentralized identity.

A Content Identifier (CID) is the cornerstone of data provenance. It is a self-describing cryptographic hash that uniquely identifies your dataset's content on the InterPlanetary File System (IPFS). Unlike a traditional URL that points to a location, a CID points to the data itself. Any change to the data—even a single byte—produces a completely different CID. This property makes CIDs perfect for creating tamper-evident records, as the CID recorded in your decentralized identity will only remain valid for the exact, original dataset.

To generate a CID, you must first prepare your data. For a single file, you can hash it directly. For a dataset consisting of multiple files or a directory structure, you should create a DAG (Directed Acyclic Graph) using a tool like ipfs-unixfs. This structure preserves the arrangement of your files. You can then generate the CID using the IPFS command-line tool: ipfs add -r --cid-version=1 /path/to/dataset. The --cid-version=1 flag ensures you get a modern, future-proof CIDv1. The output will include the root CID for your entire dataset, which you will use in the next step.

For programmatic integration, libraries like helia (the successor to js-ipfs) or ipfs-http-client (for calling a remote IPFS node's HTTP API) are commonly used. The core process involves reading your data, passing it through the chosen IPFS library's add function, and capturing the returned CID string. This CID is now your dataset's permanent, verifiable fingerprint. Store this CID securely; it is the primary reference you will anchor to a blockchain or a decentralized identifier (DID) document to prove data ownership and integrity without needing to store the data on-chain.
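
A minimal sketch with Helia and its UnixFS helper, assuming an in-memory node (for production you would pin the data to a persistent node or pinning service):

javascript
import { createHelia } from 'helia';
import { unixfs } from '@helia/unixfs';

// Generate a CIDv1 for a small dataset using an in-memory Helia node
const helia = await createHelia();
const fs = unixfs(helia);

const bytes = new TextEncoder().encode('temperature,humidity\n21.4,0.55\n');
const cid = await fs.addBytes(bytes);

console.log('Dataset CID:', cid.toString()); // e.g. bafkrei...
await helia.stop();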

IMPLEMENTATION

Step 3: Create and Sign a Provenance Attestation

This step details the core action of your decentralized identity system: generating a cryptographically signed statement that links a data asset to its creator and its history.

A provenance attestation is a structured claim that binds a specific data asset to its origin and chain of custody. Think of it as a digital notarization. In the context of Verifiable Credentials (VCs), this attestation is the credential itself. The core data structure, defined by the W3C Verifiable Credentials Data Model, includes essential fields: the issuer (your DID), the credentialSubject (the data asset's metadata and its current holder's DID), a proof section for the signature, and a type such as ProvenanceCredential. This structure creates an immutable, machine-readable record of who created the data, what it is, and who currently possesses the right to use it.

Signing this attestation is what gives it trust and verifiability. Using the private key associated with your DID (from Step 1), you generate a cryptographic signature over the entire credential data. This process typically uses JSON Web Signatures (JWS) or Linked Data Signatures (LD-Proofs). For example, using the did:key method and the Ed25519 signature suite, a library like @digitalbazaar/ed25519-signature-2020 would hash the credential, sign it with your private key, and embed the resulting signature in the proof field. This signature allows anyone to verify the attestation's authenticity using your public DID Document without needing to contact you directly.

Here is a simplified code example of creating and signing a provenance attestation using a hypothetical VC library. Note that key management and secure signing would occur in a secure environment, like a backend service or a hardware wallet module.

javascript
import { signCredential, Ed25519Signature2020 } from 'vc-js-lib'; // hypothetical VC library; see @digitalbazaar/vc for a production equivalent

// 1. Define the unsigned provenance credential
const unsignedCredential = {
  "@context": ["https://www.w3.org/2018/credentials/v1"],
  "type": ["VerifiableCredential", "ProvenanceCredential"],
  "issuer": "did:key:z6Mkf5rGM...", // Your DID
  "issuanceDate": "2023-10-26T15:23:48Z",
  "credentialSubject": {
    "id": "did:ethr:0xabc...", // Holder's DID
    "dataAsset": {
      "id": "urn:uuid:550e8400-e29b-41d4-a716-446655440000",
      "name": "Climate_Dataset_Q3_2023.csv",
      "hash": "0x9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
    }
  }
};

// 2. Sign the credential with your private key
const signedCredential = await signCredential({
  credential: unsignedCredential,
  suite: new Ed25519Signature2020({ key: issuerPrivateKey }),
  documentLoader: customLoader // Resolves DIDs and contexts
});

// `signedCredential` now contains a `proof` signature.

The signed attestation is now a Verifiable Credential. It can be shared with the data recipient, stored on-chain in a registry (like Ethereum Attestation Service or Ceramic Network), or held in a user's identity wallet. The critical property is that its integrity and origin are cryptographically verifiable. Any alteration to the data within the credential—changing the file hash, the issuer DID, or the issuance date—will cause the signature verification to fail, immediately exposing tampering. This creates a strong, cryptographic basis for trust in the data's provenance.

Best practices for attestation design include using standard vocabularies (schema.org, W3C VC types) for interoperability, including a timestamp for establishing a timeline, and clearly defining the rights or license being attested to. For high-value assets, consider anchoring the credential's hash on a public blockchain like Ethereum or IPFS to provide a decentralized timestamp and existence proof, making the attestation itself resilient and independently verifiable across the ecosystem.

EXECUTION

Step 4: Record the Attestation On-Chain

This step finalizes the decentralized identity process by permanently anchoring a verifiable credential's proof to a blockchain, creating an immutable record of data provenance.

On-chain attestation involves writing a cryptographic proof of your verifiable credential to a public ledger. This proof is not the credential data itself, which should remain private, but a commitment or digital fingerprint that can be used to verify its authenticity and state without revealing the underlying information. Common methods include storing a hash of the credential (e.g., using keccak256), a Merkle root of a batch of credentials, or a zero-knowledge proof. This creates a tamper-evident anchor; any subsequent alteration to the original off-chain credential will break the link to this on-chain proof.

The choice of blockchain and smart contract is critical. For Ethereum and EVM-compatible chains (like Polygon or Arbitrum), you would deploy or interact with a registry contract. A minimal registry contract for recording a hash might look like:

solidity
contract ProvenanceRegistry {
    mapping(address => bytes32) public registry;
    event Attested(address indexed subject, bytes32 credentialHash, address indexed attester);
    function attest(bytes32 _credentialHash, address _subject) public {
        registry[_subject] = _credentialHash;
        emit Attested(_subject, _credentialHash, msg.sender);
    }
}

Other ecosystems use their own frameworks: Solana programs, Cosmos modules, or dedicated attestation protocols like Ethereum Attestation Service (EAS) or Verax. Consider factors like transaction cost, finality time, and the ecosystem of verifiers when selecting your chain.

After broadcasting the transaction, you must capture the transaction hash and block number. These details are essential for anyone who wants to verify your attestation later. The verifier's process is the inverse: they hash the presented credential data using the same algorithm, fetch the stored proof from the blockchain using your transaction details, and compare the two hashes. A match proves the data is identical to what was originally attested and has not been altered since the block was mined. This mechanism is foundational for supply chain logs, academic credential verification, and audit trails where proof of existence at a specific time is required.
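
A verifier-side sketch of that comparison, assuming the minimal ProvenanceRegistry contract above, ethers v5, and plain JSON.stringify as the agreed canonical form (both parties must hash exactly the same bytes):

javascript
import { ethers } from 'ethers';

const REGISTRY_ABI = ['function registry(address subject) view returns (bytes32)'];

// Recompute the hash of a presented credential and compare it with the on-chain record
async function checkProvenance(presentedCredential, subjectAddress) {
  const provider = new ethers.providers.JsonRpcProvider('https://sepolia.infura.io/v3/YOUR_KEY');
  const registry = new ethers.Contract('0xRegistryAddress', REGISTRY_ABI, provider);

  const presentedHash = ethers.utils.keccak256(
    ethers.utils.toUtf8Bytes(JSON.stringify(presentedCredential))
  );
  const storedHash = await registry.registry(subjectAddress);
  return presentedHash === storedHash; // true only if the credential is byte-for-byte unchanged
}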

For advanced use cases, consider attestation schemas. Rather than storing a single hash, you can register a schema definition on-chain that specifies the structure of your credential (e.g., field names and types). Subsequent attestations then reference this schema ID, enabling standardized verification across applications. Platforms like EAS provide this infrastructure. Furthermore, you can attest to revocation by writing a nullifier or updating a registry entry, providing a decentralized mechanism to invalidate a credential without relying on the original issuer to be online.

Finally, integrate this on-chain proof back into your verifiable credential. The credential's metadata should include a proof property pointing to the blockchain evidence. A W3C Verifiable Credential might have an extension like:

json
"evidence": [{
  "type": "BlockchainTransaction",
  "chainId": 1,
  "transactionHash": "0x...",
  "blockNumber": 19283476
}]

This creates a closed loop: the off-chain credential contains a pointer to its on-chain verification anchor, and the on-chain anchor uniquely identifies the credential. This completes the data provenance lifecycle, making the credential self-sovereign, independently verifiable, and permanently timestamped by the blockchain's consensus.

DECENTRALIZED IDENTITY STACKS

Protocol Comparison for Data Provenance

Comparison of core protocols for anchoring and verifying data provenance with decentralized identifiers.

| Feature / Metric | Verifiable Credentials (W3C) | IPFS + Ethereum Attestations | Ceramic Network |
| --- | --- | --- | --- |
| Data Anchoring Method | Linked Data Proofs (JWT, JSON-LD) | CID stored in on-chain registry | StreamID in decentralized log |
| Decentralized Identifier (DID) Support | | | |
| Off-Chain Data Storage | Holder's agent or cloud | IPFS (content-addressed) | Ceramic stream tiles (mutable) |
| Selective Disclosure | | | |
| Revocation Mechanism | Status List / Registry | On-chain attestation expiry | Stream controller update |
| Average Anchor Cost | $2-10 (L1) | $5-20 (L1 gas) | < $0.01 (L2) |
| Schema Flexibility | High (JSON-LD contexts) | Medium (custom structs) | High (composable streams) |
| Primary Use Case | Portable user credentials | Asset provenance (NFTs) | Mutable application data |

DECENTRALIZED IDENTITY

Troubleshooting Common Issues

Common challenges and solutions for developers implementing decentralized identity (DID) systems for data provenance.

A verifier rejecting your Verifiable Credential typically indicates a failure in one of the core verification checks. The most common issues are:

  • Invalid Proof: The cryptographic signature on the VC or its associated Verifiable Presentation (VP) is incorrect or was signed with a revoked key. Verify the signing process using libraries like did-jwt-vc or vc-js.
  • Expired Credential: Check the expirationDate field. Credentials are time-bound by design.
  • Unsatisfied Proof Purpose: The VC's proofPurpose (e.g., authentication, assertionMethod) must match what the verifier expects for the specific interaction.
  • Schema/Context Mismatch: The verifier's policy may require credentials conforming to a specific schema (e.g., W3C's VerifiableCredentialsSchema2018). Ensure your VC's @context and type fields are correct.

Always test with the verifier's public API or sandbox environment first.
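
For JWT-encoded credentials, you can also reproduce the verifier's core checks locally before debugging against a remote service; a sketch using did-jwt-vc with an ethr-did-resolver (the RPC URL and JWT are placeholders):

javascript
import { verifyCredential } from 'did-jwt-vc';
import { Resolver } from 'did-resolver';
import { getResolver } from 'ethr-did-resolver';

// Locally verify a JWT-encoded VC: signature, validity window, and issuer DID resolution
const resolver = new Resolver(
  getResolver({
    networks: [{ name: 'sepolia', chainId: 11155111, rpcUrl: 'https://sepolia.infura.io/v3/YOUR_KEY' }]
  })
);

const vcJwt = 'eyJhbGciOi...'; // the JWT-encoded credential under test
try {
  await verifyCredential(vcJwt, resolver);
  console.log('Credential verified');
} catch (err) {
  console.error('Verification failed:', err.message); // maps to the failure modes listed above
}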

DID IMPLEMENTATION

Frequently Asked Questions

Common technical questions and solutions for developers building with Decentralized Identifiers (DIDs) for data provenance.

What is the difference between a DID and a blockchain address?

A Decentralized Identifier (DID) is a globally unique, persistent identifier that is controlled by its subject (a person, organization, or thing) without reliance on a central registry. It is defined by the W3C standard. A blockchain address (e.g., 0x...) is a specific type of identifier, but a DID is more abstract and flexible.

Key Differences:

  • Control: A DID's control is proven via cryptographic keys, independent of any single blockchain. An address is tied to a specific network.
  • Format: DIDs follow a URI scheme: did:method:method-specific-id. For example, did:ethr:0xab... or did:key:z6Mk....
  • Portability: A DID can be resolved to a DID Document containing public keys and service endpoints, enabling verifiable interactions across systems. An address typically only references a wallet.
  • Use Case: DIDs are designed for verifiable credentials and data provenance, linking an entity to claims about data. Addresses are primarily for asset transfers.
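
To make the difference concrete, the snippet below shows the same account expressed as a bare address and as two DID formats (the address is illustrative; did:pkh uses CAIP-10 account identifiers):

javascript
// One secp256k1 account, three representations (address is illustrative)
const address = '0xab16a96D359eC26a11e2C2b3d8f8B8942d5Bfcdb';

const didEthr = `did:ethr:${address}`;         // EVM-anchored DID, resolved via the ERC-1056 registry
const didPkh = `did:pkh:eip155:1:${address}`;  // chain-agnostic DID built from a CAIP-10 account ID

console.log({ address, didEthr, didPkh });
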
IMPLEMENTATION SUMMARY

Conclusion and Next Steps

This guide has outlined the core components for building a decentralized identity system for data provenance. The next step is to integrate these concepts into a production-ready application.

You now have a functional blueprint for a Decentralized Identifier (DID) system using Verifiable Credentials (VCs). The core architecture involves:

  • An issuer (e.g., a sensor or API) signing data with a private key to create a VC.
  • Storing the credential's cryptographic proof (like a digital signature) on-chain for immutable timestamping, while keeping the data payload off-chain.
  • A verifier using the issuer's public DID document to cryptographically verify the data's origin and integrity.

This model ensures data provenance without relying on a central authority.

For a production implementation, consider these next steps. First, choose a DID method suited to your chain, such as did:ethr for EVM chains or did:ion for scalable Sidetree-based systems. Integrate a VC library like Veramo (formerly DAF) to handle credential issuance and verification logic. Your smart contract for anchoring proofs should be gas-optimized, emitting events with credential hashes (e.g., keccak256(credentialProof)) rather than storing full data. Implement revocation registries using smart contracts or the W3C Status List 2021 standard to manage credential status.

Finally, design the user experience. Build a holder wallet (a web or mobile app) where users can store their VCs. Create a verifier portal where third parties can submit a VC id to check its on-chain proof and issuer signature. For advanced use, explore Zero-Knowledge Proofs (ZKPs) using frameworks like Circom or SnarkJS to allow verification of data properties (e.g., "temperature > 100") without revealing the raw data itself, enhancing privacy. Start with a pilot on a testnet like Sepolia or Polygon Amoy before mainnet deployment to refine security and usability.
