DataModel
What is a DataModel?
A formal specification defining the structure, relationships, and constraints of data within a system.
In blockchain contexts, a DataModel acts as the schema or blueprint for how data—such as transactions, smart contract states, and on-chain events—is organized, stored, and validated. It establishes the rules for data integrity and consistency, ensuring all nodes in a decentralized network interpret the ledger's state identically. This is distinct from a data structure, which is a specific implementation for organizing data in memory or storage.
The core components of a blockchain DataModel typically include entities (e.g., Account, Transaction, Block), their attributes (e.g., balance, timestamp, hash), and the relationships between them (e.g., a Transaction spends from an Output). Constraints, such as a cryptographic signature being valid for a specific input, are enforced by consensus rules. This model is often represented in code via class definitions in object-oriented languages or structs in languages like Rust or C, forming the basis for a node's internal state management.
For developers, a well-defined DataModel is critical for building reliable applications. It dictates how to query data (e.g., using an indexer), how to serialize data for network transmission or storage, and how to interpret low-level bytecode from a smart contract. For example, the Ethereum DataModel defines accounts as having fields for nonce, balance, storageRoot, and codeHash. Analysts and CTOs rely on these models to understand the system's capabilities, audit data flows, and design scalable infrastructure that correctly mirrors the chain's canonical state.
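For illustration, those four Ethereum account fields could be modeled as follows. This is a minimal sketch: the TypeScript types are illustrative, and clients actually store these values RLP-encoded in the state trie.

```typescript
// Sketch of the per-account state the Ethereum DataModel defines.
interface EthereumAccount {
  nonce: bigint;        // count of transactions sent (EOAs) or contracts created
  balance: bigint;      // wei held by the account
  storageRoot: string;  // root hash of the account's own storage trie
  codeHash: string;     // keccak256 of the contract bytecode (hash of empty code for EOAs)
}
```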
How a DataModel Works
A DataModel is the foundational schema that defines how raw blockchain data is structured, transformed, and made queryable for applications.
A DataModel is a declarative schema that defines the structure of transformed blockchain data, specifying entities, their properties, and the relationships between them for efficient querying. It acts as a blueprint, instructing an indexing engine on how to process raw, on-chain data—such as transaction logs and block headers—into a structured, application-ready format. This process, known as indexing, converts low-level hexadecimal data into high-level entities like User, TokenTransfer, or SwapEvent, which can be easily queried using GraphQL.
The core components of a DataModel are Entities and Fields. An Entity represents a distinct type of object you want to track, such as a Pool or a Vote. Each Entity is defined by its Fields, which are the specific data points or attributes of that object, like pool_address, total_liquidity, or timestamp. Crucially, Fields can define relationships, linking one Entity to another (e.g., a Transaction entity can have a from field that links to a User entity), creating a connected graph of data that mirrors real-world interactions.
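As a minimal sketch, the Entities, Fields, and relationship described above might be declared like this; the schema is written in GraphQL style and embedded in a TypeScript string, and the User and Transaction shapes are illustrative rather than any specific indexer's schema.

```typescript
// Illustrative entity definitions with a relationship between them.
const schema = `
  type User {
    id: ID!
    address: String!
    transactions: [Transaction!]!  # reverse side of the relationship
  }

  type Transaction {
    id: ID!
    hash: String!
    timestamp: Int!
    from: User!                    # Field linking one Entity to another
  }
`;
```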
To populate these Entities, a DataModel relies on Event Handlers or Block Handlers. These are functions mapped to specific on-chain events or blocks. When the indexing engine detects a matching Swap event on a DEX contract, for example, the corresponding handler executes. It extracts data from the event logs, performs any necessary calculations or transformations, and then creates or updates the relevant Entities in the database according to the DataModel's schema. This automated pipeline ensures the queryable database is always synchronized with the blockchain.
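A sketch of such a handler is shown below, assuming a hypothetical SwapLog shape and store interface rather than any particular indexing SDK.

```typescript
// Hypothetical shapes; real indexers generate these from the contract ABI.
interface SwapLog {
  txHash: string;
  pool: string;
  sender: string;
  amountIn: bigint;
  amountOut: bigint;
  timestamp: number;
}

interface Store {
  save(entity: string, id: string, data: Record<string, unknown>): Promise<void>;
}

// Maps a raw Swap event into a SwapEvent entity per the DataModel's schema.
async function handleSwap(log: SwapLog, store: Store): Promise<void> {
  await store.save("SwapEvent", log.txHash, {
    pool: log.pool,
    sender: log.sender,
    amountIn: log.amountIn.toString(),   // store large values as strings
    amountOut: log.amountOut.toString(),
    timestamp: log.timestamp,
  });
}
```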
The primary output of a working DataModel is a GraphQL API. Once indexed, the structured data is exposed through a GraphQL endpoint, allowing developers to write precise, nested queries to fetch exactly the data their dApp needs. Instead of parsing raw transactions, an application can simply query for "all token swaps by a specific user in the last 24 hours" and receive a clean JSON response. This abstraction is what enables complex analytics dashboards, portfolio trackers, and advanced DeFi applications to be built efficiently on top of blockchain data.
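The "swaps by a specific user in the last 24 hours" query might look like the following sketch; the field names are illustrative and depend on the deployed schema.

```typescript
// A GraphQL query string an application would send to the indexer's endpoint.
const query = `
  query RecentSwaps($user: String!, $since: Int!) {
    swapEvents(where: { sender: $user, timestamp_gte: $since }) {
      pool
      amountIn
      amountOut
      timestamp
    }
  }
`;

const variables = {
  user: "0x1234...abcd",                          // illustrative wallet address
  since: Math.floor(Date.now() / 1000) - 86_400,  // 24 hours ago, in seconds
};
```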
In practice, creating a DataModel requires a deep understanding of the smart contract Application Binary Interface (ABI) and the business logic of the protocol being indexed. The modeler must decide which events are crucial, how to normalize data across different contracts, and how to structure relationships for optimal query performance. A well-designed DataModel balances completeness with efficiency, ensuring that common query patterns are fast while maintaining the flexibility to support unforeseen analytical needs as an application evolves.
Key Features of a DataModel
A DataModel is a structured schema that defines how on-chain data is organized, queried, and aggregated for analysis. It is the core abstraction powering Chainscore's analytics engine.
Schema Definition
A DataModel is defined by a schema that specifies the entities (e.g., User, Transaction, Pool) and their fields (e.g., address, amount, timestamp). This schema acts as a blueprint, mapping raw blockchain logs and traces into a structured, relational format that is optimized for analytical queries.
SQL Interface
The primary interface for querying a DataModel is SQL (Structured Query Language). Analysts write standard SQL SELECT statements against the defined entities and fields, abstracting away the complexity of interacting directly with low-level blockchain data structures like event logs. This enables complex joins, aggregations, and filters.
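As an illustration, a join-plus-aggregation over hypothetical token_transfers and pools entities might look like this; the table and column names are assumptions, shown as a string a standard SQL client would execute.

```typescript
// 30-day DEX volume per protocol, joining transfers to the pools they touched.
const dailyDexVolume = `
  SELECT
    p.protocol,
    DATE_TRUNC('day', t.block_time) AS day,
    SUM(t.amount_usd)               AS volume_usd
  FROM token_transfers t
  JOIN pools p ON p.pool_address = t.to_address
  WHERE t.block_time >= NOW() - INTERVAL '30 days'
  GROUP BY p.protocol, day
  ORDER BY day DESC, volume_usd DESC;
`;
```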
Incremental Computation
DataModels are designed for incremental computation. Instead of re-processing the entire blockchain history for each query, the system processes only new blocks, updating materialized views and aggregates. This is critical for maintaining performance and providing real-time or near-real-time analytics on fast-moving chains.
Declarative Logic
Transformation logic from raw data to the model is defined declaratively. Developers specify what the final data shape should be (e.g., "a table of token transfers") and the rules to derive it, rather than writing imperative code to process each block. This simplifies maintenance and ensures consistency.
Materialized Views
To optimize query performance, DataModels often rely on materialized views. These are pre-computed query results (like daily transaction volumes or user balances) that are persisted and updated incrementally. Queries run against these views, delivering sub-second latency for complex analytical questions.
Cross-Chain Abstraction
A core feature is the abstraction of chain-specific details. A well-designed DataModel for a concept like DEX swaps can have a unified schema, while the underlying transformation logic handles the differences between protocols on Ethereum, Arbitrum, or Solana. This allows for consistent cross-chain analysis.
Examples of DataModels
A DataModel is a structured schema defining how on-chain data is organized, queried, and served. These examples illustrate different architectural approaches and their primary use cases.
Chainscore Indexing Pipeline
A configurable real-time ETL DataModel defined via a YAML configuration file. It specifies data sources, transformations, and destinations. The pipeline:
- Ingests raw block data or event streams
- Transforms it using JavaScript functions
- Loads results into a destination like PostgreSQL or Kafka

This model offers granular control for custom indexing logic.
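A minimal sketch of one such transform function follows, assuming a hypothetical RawLog input shape rather than Chainscore's actual configuration format; the example decodes an ERC-20 Transfer log.

```typescript
// Hypothetical input shape for a raw EVM event log.
interface RawLog {
  address: string;   // emitting contract
  topics: string[];  // topic 0 is the event signature hash
  data: string;      // ABI-encoded unindexed arguments
  blockNumber: number;
}

// Row shape ready for a PostgreSQL destination.
interface TransferRow {
  contract: string;
  from: string;
  to: string;
  amount: string;
  block: number;
}

// Indexed addresses occupy the low 20 bytes of topics 1 and 2;
// the unindexed `value` argument lives in the data field.
function transformTransfer(log: RawLog): TransferRow {
  return {
    contract: log.address,
    from: "0x" + log.topics[1].slice(26),
    to: "0x" + log.topics[2].slice(26),
    amount: BigInt(log.data).toString(),
    block: log.blockNumber,
  };
}
```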
DataModels in ComposeDB
A DataModel is the core schema definition that structures and governs the creation, querying, and composability of user-owned data on the Ceramic network.
A DataModel is a GraphQL-based schema that defines a specific type of structured data, such as a user profile, a blog post, or a social graph connection, within the ComposeDB protocol. It acts as a blueprint, specifying the fields, data types, and relationships for a composable data stream. Each DataModel is assigned a globally unique StreamID on the Ceramic network, making it a portable, reusable, and interoperable component that different applications can read from and write to, forming the foundation of a decentralized data ecosystem.
The structure of a DataModel is defined using GraphQL Schema Definition Language (SDL), enhanced with custom ComposeDB directives like @createModel and @accountReference. This allows developers to specify not only the data fields but also critical permissions, such as which decentralized identifiers (DIDs) are authorized to create or update records. By enforcing these rules at the protocol level, DataModels ensure data integrity and user sovereignty, as control remains with the entity that holds the signing keys for the authorized DID.
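A small sketch of such a schema follows. @createModel is named above; the Post model itself and the @documentAccount and @string directives are shown as assumptions about a typical ComposeDB composite.

```typescript
// GraphQL SDL for a hypothetical user-owned Post model, held in a string
// that would be deployed to a Ceramic node.
const compositeSchema = `
  type Post @createModel(accountRelation: LIST, description: "A simple post") {
    author: DID! @documentAccount
    title: String! @string(maxLength: 100)
    body: String! @string(maxLength: 5000)
  }
`;
```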
A key innovation of DataModels is their inherent composability. Applications are not monolithic databases but are assembled from multiple, interoperable DataModels. For example, a social media app might compose independent DataModels for UserProfile, Post, and Like. This allows a user's profile data created in one app to be seamlessly read and utilized by another, enabling true user-centric data portability and unlocking network effects across the decentralized web.
From a technical perspective, each instance of data created from a DataModel is a ComposeDB Runtime Model Instance. These instances are stored as CIP-25 Streams on Ceramic, which are mutable data structures updated via signed commits. The DataModel's StreamID is used as the family or controller for these instances, grouping them logically on the network and enabling efficient indexing and querying through ComposeDB's GraphQL API.
In practice, using a DataModel involves two main steps: definition and deployment. A developer writes the schema, then uses the ComposeDB CLI to deploy it to a Ceramic node, which registers it on the network. Once deployed, client applications can use the model's GraphQL API to create, query, and update data instances, with all interactions cryptographically verified and permissioned according to the schema's rules, ensuring a secure and consistent data layer.
DataModel vs. Traditional Database Schema
Key differences between a blockchain-native DataModel and a conventional relational database schema.
| Feature | DataModel (Blockchain-Native) | Traditional Database Schema (RDBMS) |
|---|---|---|
| Primary Purpose | Defines on-chain state structure and business logic | Defines storage structure for application data |
| Data Mutability | Append-only; committed state is immutable | Fully mutable; rows can be updated or deleted |
| Access Control Logic | Embedded in the model via smart contracts | Managed externally by the application layer |
| State Transition Validation | Enforced by consensus and smart contract code | Enforced by application logic and database constraints |
| Data Provenance | Immutable, cryptographically verifiable history | Mutable, typically no built-in cryptographic audit trail |
| Query Language | Contract calls, event indexing, subgraphs | SQL (Structured Query Language) |
| Data Ownership | Users (via private keys) | Application or database administrator |
| Typical Latency for Writes | 2-60 seconds (block confirmation) | < 1 second |
Ecosystem Usage & Protocols
A Data Model is a formal specification of the structure, relationships, and constraints of data within a system. In blockchain, it defines how on-chain state is organized, accessed, and updated.
Account-Based vs. UTXO
The two primary data models for tracking ownership. Account-based models (e.g., Ethereum) maintain a global state of account balances and smart contract storage. UTXO (Unspent Transaction Output) models (e.g., Bitcoin) treat assets as discrete, chainable outputs that are consumed to create new ones. The choice impacts transaction design, privacy, and scalability.
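The structural difference can be sketched with illustrative TypeScript types:

```typescript
// Account-based (e.g., Ethereum): one mutable record per address.
interface Account {
  address: string;
  nonce: number;   // replay protection
  balance: bigint; // global, in-place balance
}

// UTXO-based (e.g., Bitcoin): discrete outputs, consumed to create new ones.
interface UTXO {
  txid: string;          // transaction that created this output
  vout: number;          // output index within that transaction
  value: bigint;         // amount locked in this output
  scriptPubKey: string;  // spending condition
}
```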
State Trie & Merkle Proofs
A core data structure for verifiable state. Ethereum uses a Merkle Patricia Trie to store all accounts, balances, and contract data. This cryptographic tree allows any node to generate a concise Merkle proof that a specific piece of data (e.g., a user's balance) is part of the current, agreed-upon state without needing the entire dataset.
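The core idea can be sketched with a plain binary Merkle tree; Ethereum's Merkle Patricia Trie is more involved, and this example assumes SHA-256 rather than Keccak.

```typescript
import { createHash } from "node:crypto";

function sha256(data: Buffer): Buffer {
  return createHash("sha256").update(data).digest();
}

// Recomputes the root from a leaf plus sibling hashes; a match proves the
// leaf is part of the committed state without needing the full dataset.
function verifyProof(
  leaf: Buffer,
  proof: { hash: Buffer; left: boolean }[],
  root: Buffer,
): boolean {
  let node = sha256(leaf);
  for (const sibling of proof) {
    node = sibling.left
      ? sha256(Buffer.concat([sibling.hash, node]))
      : sha256(Buffer.concat([node, sibling.hash]));
  }
  return node.equals(root);
}
```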
Event Logs & Indexing
A mechanism for off-chain data access. Smart contracts emit structured event logs during execution, which are stored in a bloom-filtered, indexed data structure on-chain. These logs are not directly queryable by contracts but are the primary source for off-chain indexers (like The Graph) to build queryable databases of historical contract activity.
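A sketch of how an off-chain consumer fetches such logs, here using ethers.js v6; the RPC URL and token address are placeholders.

```typescript
import { ethers } from "ethers";

const provider = new ethers.JsonRpcProvider("https://rpc.example.com");

// keccak256 of the event signature identifies ERC-20 Transfer logs (topic 0).
const transferTopic = ethers.id("Transfer(address,address,uint256)");

// eth_getLogs is served from the bloom-filtered log index described above.
async function fetchRecentTransfers(token: string) {
  const latest = await provider.getBlockNumber();
  return provider.getLogs({
    address: token,
    topics: [transferTopic],
    fromBlock: latest - 1000,
    toBlock: latest,
  });
}
```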
Storage Layout in EVM
The specific data model for smart contract storage. In the EVM, each contract has a persistent storage area, a key-value store mapping 256-bit words to 256-bit words. Variables are packed according to the Solidity ABI specification. Understanding this layout is crucial for low-level operations, gas optimization, and building state access tools.
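For example, the slot holding one entry of a mapping(address => uint256) can be derived off-chain. This sketch uses ethers.js and assumes the mapping is declared at storage slot 0; the actual slot depends on the contract's variable layout.

```typescript
import { ethers } from "ethers";

const coder = ethers.AbiCoder.defaultAbiCoder();

// Solidity stores mapping values at keccak256(abi.encode(key, declarationSlot)).
function balanceSlot(holder: string, declarationSlot = 0n): string {
  return ethers.keccak256(coder.encode(["address", "uint256"], [holder, declarationSlot]));
}

// The returned slot can then be read directly with eth_getStorageAt.
```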
Data Availability & Sampling
A model for scaling via data separation. Data Availability (DA) layers, like Celestia or Ethereum's danksharding, decouple the consensus on transaction ordering from the availability of the underlying data. Light clients use Data Availability Sampling (DAS) to probabilistically verify that all data for a block is published, without downloading it entirely.
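The guarantee is probabilistic in a simple way: assuming the block is erasure-coded so that blocking reconstruction requires withholding at least half of the extended data, each uniformly random sample detects withholding with probability at least 1/2, so the chance that k independent samples all miss it is at most (1/2)^k. Even 20 samples push that below one in a million.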
Developer Benefits
The Chainscore DataModel provides a structured, composable framework for analyzing on-chain activity, enabling developers to build powerful analytics and risk applications.
Composable Entity Framework
The DataModel treats on-chain actors—wallets, smart contracts, and protocols—as first-class entities with defined relationships. This allows developers to build complex queries by chaining entities, such as tracing all interactions from a wallet to a specific DeFi protocol, without writing custom parsing logic for each chain.
Standardized Risk & Behavior Signals
Pre-computed metrics like wallet age, transaction volume, gas spending patterns, and counterparty diversity are standardized across the model. Developers can instantly assess user behavior and risk profiles, enabling features like:
- Sybil resistance for airdrops
- Creditworthiness scoring for undercollateralized lending
- Anomaly detection for security monitoring
Cross-Chain Abstraction
The DataModel normalizes semantic differences between blockchains (e.g., EVM vs. Solana account models). A query for "NFT mints by a wallet" returns consistent results whether the activity occurred on Ethereum, Polygon, or another supported chain, drastically reducing integration complexity for multi-chain applications.
Real-Time Event Stream
Beyond historical analysis, the model provides a real-time stream of structured events—token transfers, contract calls, and liquidity changes. Developers can subscribe to specific event patterns (e.g., large withdrawals from a lending pool) to trigger alerts, update dashboards, or execute logic in their dApps with sub-second latency.
Reduced Data Engineering Overhead
By providing cleaned, indexed, and relationally linked on-chain data, the DataModel eliminates the need for teams to:
- Build and maintain indexers for raw blockchain data
- Develop ETL pipelines to transform log data
- Create entity resolution systems to link addresses to real-world actors

This allows developers to focus on application logic rather than data infrastructure.
Enhanced Query Performance
The structured schema is optimized for analytical queries, enabling complex joins and aggregations that would be prohibitively slow on raw blockchain data. Developers can perform cohort analysis, calculate total value locked (TVL) trends, or identify top traders across protocols with performance measured in milliseconds, not minutes.
Frequently Asked Questions
A data model defines the structure, relationships, and constraints of data within a system. In blockchain, it specifies how state, transactions, and accounts are organized and validated.
What is a data model in blockchain?
A data model in blockchain is the formal specification of how the system's state, transactions, and accounts are structured, related, and validated. It defines the core entities (like accounts, balances, smart contract storage) and the rules governing state transitions. For example, Ethereum's data model is based on accounts (Externally Owned and Contract) that hold state, while Bitcoin's uses the Unspent Transaction Output (UTXO) model, tracking discrete pieces of unspent currency. The data model is enforced by the network's consensus rules, ensuring all nodes maintain a consistent view of the ledger's state.