Why The Graph Will Fragment in a Modular Blockchain World


THE DATA FRAGMENTATION PROBLEM

Introduction

Modular blockchains solve scaling but create a new, critical bottleneck: fragmented and inaccessible data.

Modular architectures fragment data. Separating execution, settlement, and data availability across layers like Arbitrum, Celestia, and EigenDA breaks the monolithic database model, making holistic data queries impossible for a single node.

The indexing layer is now critical infrastructure. Applications need a unified view across rollups and chains, which turns indexing tools such as The Graph's subgraphs and Substreams from optional tooling into mandatory data plumbing for user-facing apps.

Real-time indexing defines performance. Ethereum's 12-second block time sets a floor on data freshness, yet users expect sub-second updates. That gap forces indexers to consume streams directly from fast-finality layers such as Espresso or from data availability layers such as Avail.


THE DATA LAYER

The Core Argument: The Indexing Stack Must Modularize

Monolithic indexing is a scaling bottleneck; the future is a modular stack of specialized components.

Monolithic indexing architectures fail because they bundle data ingestion, processing, and querying. This creates a single point of failure and prevents scaling individual components, as seen with The Graph's subgraph syncing delays during high-throughput events.

Modularization separates concerns into distinct layers: a data availability layer (Celestia, EigenDA), a compute/execution layer (RISC Zero, Jolt), and a query layer. This mirrors the L2 scaling playbook, applying it to the data access problem.

Specialization unlocks performance. A dedicated proving layer for indexing logic, like using RISC Zero, allows verifiable computation. A separate query engine can then serve cached, proven results at sub-second latency without re-executing logic.
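
The separation of ingestion, processing, and querying can be sketched in a few lines. This is a minimal illustration, not any protocol's actual design: the names `ingest`, `transform`, and `QueryCache`, and the block and transfer shapes, are hypothetical, and the proving step is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass(frozen=True)
class Block:
    number: int
    transfers: tuple  # tuple of (sender, receiver, amount)


def ingest(raw_blocks: Iterable[dict]) -> list:
    """Ingestion stage: normalize raw payloads, no business logic."""
    return [
        Block(b["number"], tuple((t["from"], t["to"], t["amount"]) for t in b["transfers"]))
        for b in raw_blocks
    ]


def transform(blocks: Iterable[Block]) -> dict:
    """Processing stage: a pure function from blocks to derived net balances."""
    balances: dict = {}
    for block in blocks:
        for sender, receiver, amount in block.transfers:
            balances[sender] = balances.get(sender, 0) - amount
            balances[receiver] = balances.get(receiver, 0) + amount
    return balances


@dataclass
class QueryCache:
    """Query stage: serves precomputed results and never re-runs the transform."""
    results: dict = field(default_factory=dict)

    def load(self, derived: dict) -> None:
        self.results = dict(derived)

    def balance_of(self, account: str) -> int:
        return self.results.get(account, 0)


raw = [
    {"number": 1, "transfers": [{"from": "alice", "to": "bob", "amount": 5}]},
    {"number": 2, "transfers": [{"from": "bob", "to": "carol", "amount": 2}]},
]
cache = QueryCache()
cache.load(transform(ingest(raw)))
print(cache.balance_of("bob"))  # 3
```

Because each stage only depends on the previous stage's output, any one of them can be scaled or replaced without touching the others.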

The precedent is established. Just as rollups separated execution from consensus, the indexing stack must follow. Protocols like Hyperliquid and dYdX v4 building their own app-chains prove the demand for sovereign, performant data access.


THE MODULAR DATA CHALLENGE

Key Trends Driving Indexing Fragmentation

As monolithic chains shatter into specialized layers (execution, settlement, data availability), the unified data index is dead. Here's what's replacing it.

The L2 Data Silos Problem

Every new L2 (Arbitrum, Optimism, Base) is a new API. Querying cross-chain state requires stitching data from dozens of isolated RPC endpoints. This kills developer velocity and user experience.

Result: Developers spend >40% of time on data plumbing, not product logic.
Trend: The number of active L2s/L3s is projected to grow from ~50 to 500+ in 2 years.

Active L2s/L3s: 50+
Dev Time Wasted: >40%
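
The stitching work described above looks roughly like the following fan-out. It is a sketch under stated assumptions: `FAKE_ENDPOINTS` is an in-memory stand-in for separate per-chain RPC endpoints, the balances are invented, and `fetch_balance` simulates a network call rather than making one.

```python
import asyncio

# Hypothetical per-chain balances standing in for isolated RPC endpoints.
FAKE_ENDPOINTS = {
    "arbitrum": {"0xabc": 120},
    "optimism": {"0xabc": 30},
    "base": {"0xabc": 0, "0xdef": 7},
}


async def fetch_balance(chain: str, account: str) -> tuple:
    """Simulate one round trip to a single chain's endpoint."""
    await asyncio.sleep(0)  # placeholder for real network I/O
    return chain, FAKE_ENDPOINTS[chain].get(account, 0)


async def unified_balance(account: str) -> dict:
    """Query every chain concurrently and merge the answers into one view."""
    pairs = await asyncio.gather(*(fetch_balance(c, account) for c in FAKE_ENDPOINTS))
    per_chain = dict(pairs)
    return {"account": account, "per_chain": per_chain, "total": sum(per_chain.values())}


view = asyncio.run(unified_balance("0xabc"))
print(view["total"])  # 150
```

Every additional chain adds another endpoint, another failure mode, and another response format to normalize, which is the plumbing cost the section refers to.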

The Specialized Data Appchain

Protocols are spinning up dedicated chains for indexing (e.g., The Graph's L2, Subsquid) to escape mainnet congestion and cost. This moves indexing from a shared resource to a sovereign, performance-optimized service.

Benefit: Sub-second finality for index updates vs. Ethereum's ~12-second block time.
Trade-off: Introduces new trust assumptions and data availability coordination layers.

Index Finality: <1s
Query Cost: -90%

The Rise of Intent-Centric Queries

Users and dApps don't want raw logs; they want answers. Systems like Goldsky and Covalent are shifting from "fetch block X" to "show me the best liquidity route" or "calculate my ROI." This requires indexing layers to embed business logic.

Driver: The success of intent-based architectures in DeFi (UniswapX, CowSwap).
Outcome: Indexers become verifiable compute engines, not simple databases.

Query Complexity: 10x
Response SLA: ~500ms

The Verifiability Gap

Centralized data providers (Alchemy, Infura) are a single point of failure and trust. The future is cryptographically verifiable indexing, where a proof of the data's correctness can be checked on-chain. This is the core thesis behind Brevis, Herodotus, and Lagrange.

Mechanism: Using ZK proofs or optimistic fraud proofs for state commitments.
Impact: Enables trust-minimized cross-chain apps and on-chain automation.

Data Verifiability: 100%
Trust Assumption
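
To show what "checking data against a commitment" means in the simplest case, here is a Merkle inclusion proof. This is deliberately a stand-in: it is neither a ZK proof nor a fraud proof, and it is not how Brevis, Herodotus, or Lagrange work. The row format and function names are hypothetical, and the leaf list is assumed to be non-empty.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Root hash over the leaves; an odd level duplicates its last node."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list, index: int) -> list:
    """Sibling hashes from leaf to root; the flag is True when the sibling is on the right."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and its siblings."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root


rows = [b"alice:10", b"bob:3", b"carol:2"]
root = merkle_root(rows)       # what an indexer would commit to
proof = merkle_proof(rows, 1)  # returned alongside the query result
print(verify(b"bob:3", proof, root))    # True
print(verify(b"bob:300", proof, root))  # False
```

The client needs only the committed root and a short proof, not the full dataset, which is the property verifiable indexing schemes build on.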

The Modular Stack Tax

In a modular world (Celestia, EigenDA, Arbitrum Orbit), data is published to one layer, settled on another, and executed elsewhere. Indexers must now aggregate and reconcile state from 3+ independent networks, each with its own latency and cost profile.

Complexity: A single user transaction can generate events across multiple DA layers and settlement chains.
Consequence: Indexing infrastructure must become as modular as the chains it serves.

Networks to Query
Data Pipeline Complexity: +300%
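
Reconciling state across layers amounts to joining records that live in different systems. A minimal sketch, assuming a hypothetical model in which each executed transaction maps to a batch ID and the DA and settlement layers each report which batch IDs they have seen:

```python
def reconcile(execution_txs: dict, da_batches: set, settled_batches: set) -> dict:
    """Derive a per-transaction status from three independently queried layers.

    execution_txs maps tx hash -> batch id (from the rollup),
    da_batches holds batch ids published to the DA layer,
    settled_batches holds batch ids accepted by the settlement layer.
    """
    statuses = {}
    for tx_hash, batch_id in execution_txs.items():
        published = batch_id in da_batches
        settled = batch_id in settled_batches
        if settled and not published:
            statuses[tx_hash] = "inconsistent"  # layers disagree; needs investigation
        elif settled:
            statuses[tx_hash] = "settled"
        elif published:
            statuses[tx_hash] = "data_available"
        else:
            statuses[tx_hash] = "executed_only"
    return statuses


statuses = reconcile(
    execution_txs={"0x01": 7, "0x02": 8, "0x03": 9},
    da_batches={7, 8},
    settled_batches={7},
)
print(statuses)
# {'0x01': 'settled', '0x02': 'data_available', '0x03': 'executed_only'}
```

Each input comes from a different network with its own latency, so the join result changes over time even when no new transactions arrive.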

The Real-Time Finance Imperative

DeFi, gaming, and on-chain social demand sub-second data freshness. Traditional blockchain indexing, which waits for finality, is too slow. This forces a shift to speculative indexing based on mempool data and fast finality layers (like Solana or high-performance L2s).

Benchmark: <100ms update latency for perpetual swaps and prediction markets.
Players: Pyth Network for prices, Clockwork for automation, and custom solutions for high-frequency dApps.

Update Latency: <100ms
TVL Dependent: $10B+
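
Speculative indexing means applying unconfirmed data immediately and being able to undo it. The class below is an illustrative sketch only; `SpeculativeIndex` and its method names are hypothetical, and real systems must also handle reorgs of already-confirmed blocks.

```python
class SpeculativeIndex:
    """Applies unconfirmed blocks right away and rolls them back on a reorg."""

    def __init__(self) -> None:
        self.confirmed: dict = {}
        self.pending: list = []  # list of (block_number, balance deltas)

    def apply_pending(self, block_number: int, deltas: dict) -> None:
        self.pending.append((block_number, dict(deltas)))

    def finalize(self, up_to: int) -> None:
        """Fold pending blocks up to and including `up_to` into confirmed state."""
        still_pending = []
        for number, deltas in self.pending:
            if number <= up_to:
                for account, delta in deltas.items():
                    self.confirmed[account] = self.confirmed.get(account, 0) + delta
            else:
                still_pending.append((number, deltas))
        self.pending = still_pending

    def rollback(self, from_block: int) -> None:
        """Drop speculative blocks at or above `from_block`."""
        self.pending = [(n, d) for n, d in self.pending if n < from_block]

    def read(self, account: str) -> tuple:
        """Return (value, is_final); is_final is False if pending data contributed."""
        value = self.confirmed.get(account, 0)
        speculative = False
        for _, deltas in self.pending:
            if account in deltas:
                value += deltas[account]
                speculative = True
        return value, not speculative


idx = SpeculativeIndex()
idx.apply_pending(100, {"alice": 5})
idx.apply_pending(101, {"alice": -2})
print(idx.read("alice"))  # (3, False)
idx.rollback(101)         # block 101 was reorged out
print(idx.read("alice"))  # (5, False)
idx.finalize(100)
print(idx.read("alice"))  # (5, True)
```

Returning the finality flag with every read lets the application decide whether a speculative value is good enough for the action at hand.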

FEATURED SNIPPETS

The Cost of Universality: Indexing Latency & Cost Matrix

A first-principles comparison of data indexing architectures, quantifying the trade-offs between universal coverage and specialized performance.

Core Metric / Capability	Universal Indexer (The Graph)	Specialized RPC (Alchemy, QuickNode)	Application-Specific Indexer (dYdX, Uniswap)
Indexing Latency (Block to Query)	2-12 seconds	< 1 second	< 500 milliseconds
Cost per 1M Queries (Approx.)	$5-15	$50-200	$0 (Sunk Dev Cost)
Multi-Chain Coverage (EVM, Solana, Cosmos)
Subgraph Deployment & Maintenance
Guaranteed State Consistency
Custom Business Logic at Index Layer
Time to New Chain Integration	Weeks (Subgraph Dev)	Days (RPC Node Spin-up)	Months (Full Stack Dev)
Protocol Example	The Graph, Goldsky	Alchemy, QuickNode, Chainstack	dYdX v4, Uniswap Labs API
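
To make the cost row concrete, here is a small worked calculation. The per-million price ranges come from the table above; the 25M queries per month volume is an assumed figure chosen only for illustration.

```python
# Price ranges in USD per 1M queries, copied from the matrix above.
COST_PER_MILLION_USD = {
    "universal_indexer": (5, 15),
    "specialized_rpc": (50, 200),
}


def monthly_cost_range(queries_per_month: int, option: str) -> tuple:
    """Return the (low, high) monthly cost in USD for a given query volume."""
    low, high = COST_PER_MILLION_USD[option]
    millions = queries_per_month / 1_000_000
    return low * millions, high * millions


print(monthly_cost_range(25_000_000, "universal_indexer"))  # (125.0, 375.0)
print(monthly_cost_range(25_000_000, "specialized_rpc"))    # (1250.0, 5000.0)
```

At this assumed volume the specialized option costs roughly an order of magnitude more, which is the price paid for the lower latency shown in the first row.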


THE DATA LAYER

Deep Dive: The Technical Inevitability of Fragmentation

Modular architecture fragments application state, making traditional indexers obsolete and creating a new market for decentralized data infrastructure.

Fragmentation is a feature of modular blockchains. Separating execution from consensus and data availability forces application logic to span multiple specialized layers like Arbitrum, Celestia, and EigenDA. This architectural shift breaks the monolithic database model where a single node indexes all state.

Traditional indexers like The Graph fail in this environment. Their subgraph model assumes a single, queryable chain. A modular app's state exists across rollups, DA layers, and co-processors, creating a coordination problem that monolithic indexers cannot solve.

The solution is a new data mesh. Indexing becomes a network of specialized adapters—one for each execution environment and data availability layer. Projects like Subsquid and Goldsky are building this, treating each rollup as a distinct data source to be aggregated.
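
The adapter idea can be sketched as one small class per source, each translating its native records into a shared event shape. Everything here is hypothetical, including the `Event` schema and the input field names; it does not reflect the actual interfaces of Subsquid or Goldsky.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass(frozen=True)
class Event:
    source: str
    height: int
    kind: str
    payload: dict


class SourceAdapter(Protocol):
    name: str

    def events(self) -> Iterator[Event]: ...


class RollupAdapter:
    """Normalizes rollup logs (assumed shape: blockNumber, event, args)."""

    def __init__(self, name: str, logs: list) -> None:
        self.name = name
        self._logs = logs

    def events(self) -> Iterator[Event]:
        for log in self._logs:
            yield Event(self.name, log["blockNumber"], log["event"], log["args"])


class DaLayerAdapter:
    """Normalizes DA-layer blobs (assumed shape: height, namespace)."""

    def __init__(self, name: str, blobs: list) -> None:
        self.name = name
        self._blobs = blobs

    def events(self) -> Iterator[Event]:
        for blob in self._blobs:
            yield Event(self.name, blob["height"], "blob_published", {"namespace": blob["namespace"]})


def aggregate(adapters: list) -> list:
    """Merge all sources into one list ordered by (source, height)."""
    merged = [event for adapter in adapters for event in adapter.events()]
    return sorted(merged, key=lambda e: (e.source, e.height))


events = aggregate([
    RollupAdapter("arbitrum", [
        {"blockNumber": 12, "event": "Swap", "args": {"amount": 5}},
        {"blockNumber": 10, "event": "Mint", "args": {}},
    ]),
    DaLayerAdapter("celestia", [{"height": 7, "namespace": "0x01"}]),
])
print([(e.source, e.height, e.kind) for e in events])
```

Supporting a new execution environment or DA layer then means writing one more adapter rather than changing the aggregator.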

This creates a market for data proofs. Simply aggregating data is insufficient; users need cryptographic guarantees of correctness across domains. Future indexers will integrate zk-proofs or optimistic verification to become trust-minimized data oracles for cross-chain state.

Evidence: The total value locked across the top 10 rollups exceeds $20B, but no existing indexer provides a unified view of liquidity and positions across Arbitrum, Optimism, and Base. This gap defines the product-market fit.


THE FUTURE OF DATA INDEXING

Protocol Spotlight: Early Movers in the New Stack

As blockchains fragment into modular layers, the old query model is breaking. These protocols are building the new data infrastructure.

The Graph: From Monolith to Supernet

The incumbent is pivoting from a monolithic L1 indexer to a network of application-specific subgraphs (Substreams) on a dedicated L2 rollup. This modularizes indexing logic, enabling real-time data streams and massive parallelization.

Key Benefit: Unlocks sub-second latency for high-frequency dApps like perps.
Key Benefit: Cost predictability via rollup-based execution, decoupling from mainnet gas.

Stream Latency: ~500ms
Subgraphs: 30k+

Goldsky: The Real-Time Data Firehose

Built on Substreams, Goldsky bypasses traditional RPC polling to deliver real-time event streams directly to applications. It's the infrastructure for the intent-based future, powering UX for protocols like UniswapX and CowSwap.

Key Benefit: Sub-100ms data delivery, enabling instant UI updates.
Key Benefit: Declarative data pipelines that developers configure, not code.

Event Delivery: <100ms
Dev Speed: 10x

The Problem: RPCs Are Not Indexers

Standard JSON-RPC endpoints are state-query machines, not designed for complex historical queries or aggregations. Asking an RPC for "all Uniswap swaps by wallet X" is like using a screwdriver as a hammer.

Consequence: DApp frontends become bloated, performing client-side aggregation which fails at scale.
Consequence: Creates centralization pressure as teams rely on a single Infura/Alchemy node for complex logic.

RPC Calls/UI: 10k+
UI Lag: 5-10s
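
The difference between the two access patterns is easy to see in miniature. In this sketch the log records and field names are invented; the point is only the contrast between re-scanning raw logs per question and building an index once at ingest time.

```python
from collections import defaultdict

# Hypothetical decoded swap logs, as a client might assemble them from raw RPC calls.
SWAP_LOGS = [
    {"block": 1, "wallet": "0xaaa", "pair": "ETH/USDC", "amount_in": 10},
    {"block": 2, "wallet": "0xbbb", "pair": "ETH/DAI", "amount_in": 4},
    {"block": 3, "wallet": "0xaaa", "pair": "WBTC/ETH", "amount_in": 1},
]


def swaps_by_wallet_scan(logs: list, wallet: str) -> list:
    """RPC-style: re-read every log for each question (linear work per query)."""
    return [log for log in logs if log["wallet"] == wallet]


def build_wallet_index(logs: list) -> dict:
    """Indexer-style: one pass at ingest time, then direct lookups per wallet."""
    index = defaultdict(list)
    for log in logs:
        index[log["wallet"]].append(log)
    return index


index = build_wallet_index(SWAP_LOGS)
print(len(index["0xaaa"]))  # 2
print(index["0xaaa"] == swaps_by_wallet_scan(SWAP_LOGS, "0xaaa"))  # True
```

Both paths return the same rows; the scan repeats its full cost on every request, while the index pays that cost once.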

The Solution: Decoupled Execution & Proving

The new stack separates data ingestion, computation, and proof generation. Protocols like EigenLayer AVSs (e.g., Hyperbolic) and RISC Zero allow indexers to prove their query results are correct without re-executing every block.

Key Benefit: Verifiable data APIs that apps can trust without running a node.
Key Benefit: Horizontal scaling of indexing workloads across cheap cloud compute.

Compute Cost: -90%
Verification: ZK-Proofs

Storage Layers Are The New Source of Truth

With data availability layers like Celestia, EigenDA, and Avail, the canonical chain is no longer the primary data source. Indexers must now ingest from multiple DA layers and rollups, creating a multi-chain indexing problem.

Key Benefit: Enables universal data queries across any modular chain.
Key Benefit: Future-proofs infrastructure against the rise of sovereign rollups.

DA Sources: 10+
Rollup Scale: Unlimited

Who Wins? The Orchestrator

The winning protocol won't just index faster; it will orchestrate a marketplace of specialized indexers. Think Across Protocol but for data, routing queries to the optimal indexer (Goldsky for speed, The Graph for breadth, a ZK-prover for security).

Key Benefit: Best-in-class performance for every query type via intelligent routing.
Key Benefit: Economic efficiency through competitive indexing markets, not fixed staking.

Model: Marketplace
Routing: Intent-Based
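
A router of this kind reduces to filtering indexers by hard constraints and then picking the cheapest match. The sketch below is hypothetical throughout: the profile names, latencies, chain coverage, and prices are made-up placeholders, not measurements of any real provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexerProfile:
    name: str
    latency_ms: int
    chains: frozenset
    verifiable: bool
    price_per_query: float


@dataclass(frozen=True)
class QueryIntent:
    chain: str
    max_latency_ms: int
    require_proof: bool = False


def route(intent: QueryIntent, indexers: list) -> Optional[IndexerProfile]:
    """Pick the cheapest indexer that satisfies the intent, or None if none qualifies."""
    candidates = [
        i for i in indexers
        if intent.chain in i.chains
        and i.latency_ms <= intent.max_latency_ms
        and (i.verifiable or not intent.require_proof)
    ]
    return min(candidates, key=lambda i: i.price_per_query, default=None)


# Placeholder profiles with invented numbers.
INDEXERS = [
    IndexerProfile("fast-stream", 80, frozenset({"arbitrum", "base"}), False, 0.0004),
    IndexerProfile("broad-network", 3000, frozenset({"ethereum", "arbitrum", "base", "optimism"}), False, 0.0001),
    IndexerProfile("zk-prover", 2500, frozenset({"ethereum", "arbitrum"}), True, 0.002),
]

print(route(QueryIntent("arbitrum", 200), INDEXERS).name)                       # fast-stream
print(route(QueryIntent("arbitrum", 5000), INDEXERS).name)                      # broad-network
print(route(QueryIntent("arbitrum", 5000, require_proof=True), INDEXERS).name)  # zk-prover
print(route(QueryIntent("optimism", 100), INDEXERS))                            # None
```

The same query on the same chain lands on three different indexers depending on the latency and proof requirements attached to it.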


THE LEGACY CONSTRAINT

Counter-Argument: Can't The Graph Just Adapt?

The Graph's monolithic architecture is fundamentally misaligned with the modular execution and data availability demands of modern blockchains.

The Graph's monolithic architecture is its core constraint. Its design assumes a single, unified data source, which is incompatible with the fragmented data availability landscape of rollups and Layer 2s like Arbitrum and Optimism.

Adapting requires a full-stack rebuild. To index from Celestia or EigenDA, The Graph must re-architect its node software, consensus, and economic model. This is a multi-year engineering challenge, not a simple upgrade.

The economic model breaks. Indexers stake on Ethereum mainnet but must pay for data from external DA layers. This creates a capital efficiency and settlement mismatch that native, chain-specific indexers avoid entirely.

Evidence: Market share erosion. Emerging chains like Solana and Sui are building their own indexing stacks. The Graph's subgraph deployment growth on new L2s lags behind its Ethereum mainnet dominance.


THE FRAGMENTATION TRAP

Risk Analysis: What Could Go Wrong?

Modular blockchains solve scaling but create a data indexing nightmare for applications.

The Data Availability Black Box

Indexers must now trust external Data Availability (DA) layers like Celestia, EigenDA, or Avail. If a DA layer censors or loses data, the indexer's state becomes corrupted, breaking downstream applications. This creates systemic risk for protocols like The Graph or Subsquid that rely on historical data integrity.

Risk: Unrecoverable state forks from DA failures.
Impact: Breaks DeFi oracles and NFT provenance.

State Corruption: 100%
DA Finality Lag: ~2-3s

Cross-Chain Indexing Latency Arbitrage

In a modular stack, finality is asynchronous across execution, settlement, and DA layers. Fast indexers reading from an execution layer (e.g., Arbitrum) could serve data that is not yet final, and may still be reverted, before the settlement layer (e.g., Ethereum) confirms the rollup's proof. This opens a multi-layer MEV attack vector where arbitrage bots exploit timing gaps in indexed data feeds.

Risk: Front-running based on indexing speed differentials.
Vector: Exploits gap between L2 inclusion and L1 finality.

L2->L1 Delay: ~12s
MEV Opportunity: $M+
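
One mitigation is for the indexer to label every record with its finality level and let consumers set a minimum. The three levels and the watermark parameters below are a hypothetical simplification, not the exact semantics of any particular rollup.

```python
from enum import IntEnum
from typing import Optional


class Finality(IntEnum):
    SOFT = 0       # accepted by the sequencer only
    POSTED = 1     # batch data published to the settlement layer
    FINALIZED = 2  # settlement layer has finalized the batch


def classify(l2_block: int, posted_up_to: int, finalized_up_to: int) -> Finality:
    """Map an L2 block number to a finality level using two watermarks."""
    if l2_block <= finalized_up_to:
        return Finality.FINALIZED
    if l2_block <= posted_up_to:
        return Finality.POSTED
    return Finality.SOFT


def serve(record: dict, required: Finality, posted_up_to: int, finalized_up_to: int) -> Optional[dict]:
    """Return the record tagged with its finality, or None if it is below the required level."""
    level = classify(record["l2_block"], posted_up_to, finalized_up_to)
    if level < required:
        return None
    return {**record, "finality": level.name}


price_update = {"l2_block": 1050, "pair": "ETH/USDC", "price": 3000}
ui_view = serve(price_update, Finality.SOFT, posted_up_to=1020, finalized_up_to=990)
liquidation_view = serve(price_update, Finality.FINALIZED, posted_up_to=1020, finalized_up_to=990)
print(ui_view["finality"])  # SOFT
print(liquidation_view)     # None
```

A UI can accept soft data for responsiveness, while anything that moves funds can refuse to act until the record reaches the level it requires.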

The Interoperability Tax on Query Cost

Indexing a user's activity across Ethereum + 5 L2s + a DA layer requires aggregating data from multiple RPC endpoints and proving data consistency. This multiplies infrastructure costs and query complexity. Projects like Goldsky or Covalent face 10-100x cost inflation versus indexing a single chain, making real-time cross-chain queries economically non-viable for most dApps.

Risk: Indexing becomes a capital-intensive oligopoly.
Result: Kills long-tail dApp innovation.

Cost Increase: 10-100x
Endpoints/User

Sequencer-Level Censorship

Centralized sequencers (e.g., in Optimism, Arbitrum) control transaction ordering and data publication. They can withhold or reorder transaction data before it reaches the DA layer, making it impossible for decentralized indexers to build a canonical timeline. This gives sequencers the power to manipulate indexed states for DeFi apps, akin to Flashbots on steroids.

Risk: Indexers see only the sequencer's curated reality.
Control Point: Single entity dictates historical record.

Central Point of Failure
Data Control: 100%


THE MODULAR DATA STACK

Future Outlook: The 2024-2025 Indexing Landscape

Indexing infrastructure will fragment to serve specialized data needs across modular execution, settlement, and DA layers.

Specialized indexers win. Generic subgraphs on The Graph fail for high-throughput rollups and novel VM states. Projects like EigenLayer AVSs and RISC Zero will index and prove specific data streams, creating a market for verifiable off-chain compute.

Data availability dictates architecture. Indexers for Celestia or EigenDA require different sync patterns than Ethereum archival nodes. This creates layer-specific tooling, fragmenting the monolithic indexer model into a composable service mesh.

The query layer commoditizes. Competition between The Graph, Subsquid, and Goldsky pushes query cost toward zero. Value accrues to the proving layer: services that generate ZK proofs of query correctness become the premium product.

Evidence: The Graph's support for a new chain like Base often lags by weeks, while dedicated RPC providers like Alchemy serve custom data in real time, proving the demand for specialization.


THE FUTURE OF DATA INDEXING

Key Takeaways for Builders and Investors

In a modular world where execution, settlement, and data availability are disaggregated, the indexing layer becomes the critical abstraction for application logic.

The Graph's Subgraphs Are a Legacy Monolith

Subgraphs are monolithic, chain-specific indexing definitions (a manifest, a schema, and mapping handlers) that must be redeployed for every chain. This creates fragmented data silos and ~$100k+ annual costs for multi-chain dApps.

Problem: No cross-chain querying, forcing developers to manage N+1 subgraphs.
Solution: Next-gen indexers like Goldsky and Subsquid use a schema-first, multi-chain data lake approach, enabling a single query across Ethereum, Arbitrum, and Polygon.

Dev Ops: -80%
Chains Indexed: 10+

Intent-Based Queries Will Eat Batch Processing

Traditional indexing is a push model: index everything, filter later. This wastes ~70% of compute on unused data.

Problem: Inefficient for real-time, user-specific intents (e.g., "find my best liquidity route").
Solution: PropellerHeads and RISC Zero are pioneering ZK-proof-based query engines. Users submit an intent, and a prover generates a ZK-proof of the query result in ~2 seconds, consuming only the necessary data.

Efficiency Gain: 100x
Proof Time: ~2s

The Indexer is the New RPC

As applications demand richer data (historical states, event correlations), simple JSON-RPC calls are insufficient. The indexing layer becomes the primary data gateway.

Problem: RPCs like Alchemy and Infura offer low-level chain data, not application-ready abstractions.
Solution: Indexers like Covalent and Blockpour provide unified APIs that return structured, business-logic-ready data, abstracting away the underlying Celestia, EigenDA, or Avail data availability layer.

All Chains: 1 API
P95 Latency: 500ms

Indexing is the Ultimate MEV Surface

Who controls the indexer controls the data lens, and the arbitrage opportunities. Fast, proprietary indexing is a competitive moat for DeFi protocols.

Problem: Public indexers create a level playing field, revealing opportunities to everyone simultaneously.
Solution: Protocols like Uniswap (with UniswapX) and Aave are building internal, ultra-low-latency indexers to power their own intent-based systems, capturing ~$1B+ in annual MEV that would otherwise leak to searchers.

MEV Captured: $1B+
Alpha Window: <100ms

Decentralization is a Scaling Trade-Off

Fully decentralized indexing networks (e.g., The Graph) suffer from higher latency and cost volatility due to tokenomics and coordination overhead.

Problem: ~5-10 second query latency is unacceptable for high-frequency trading or gaming applications.
Solution: Hybrid models are winning: use a centralized, performant indexer for real-time queries (Goldsky, Subsquid) and a decentralized network for censorship-resistant archival data and verification, similar to the Ethereum execution and EigenLayer restaking security model.

Decentralized Latency: 5-10s
Hybrid Latency: <1s

The Vertical Integration Play: Indexing-As-A-Settlement

The logical endpoint is for rollups and app-chains to bundle a native indexer as part of their state transition function.

Problem: External indexers add latency and are a trust assumption outside the chain's security model.
Solution: Fuel Network with its native Sway language and Movement Labs with MoveVM are architecting state models where indexing is a first-class primitive. This enables native intent matching and ~$0.001 query costs baked into transaction fees.

Query Cost: $0.001
Native to VM

