Graph Node vs Substreams: Choosing The Graph's Indexing Runtime
A data-driven comparison of The Graph's Graph Node and StreamingFast's Substreams, the two dominant paradigms for building and querying blockchain data.
Introduction: The Evolution of On-Chain Indexing
Graph Node excels at providing a robust, battle-tested query layer for decentralized applications. Its mature ecosystem, with thousands of deployed subgraphs and a proven track record powering major protocols such as Uniswap and Aave, offers reliability and developer familiarity. Its GraphQL API provides a flexible interface for complex queries, which makes it well suited to applications that need rich, relational data exploration directly from a frontend. However, this power comes with operational overhead: running a performant indexer requires significant infrastructure management.
Substreams takes a fundamentally different approach by treating blockchain data as a high-performance stream. Its architecture splits historical processing across many workers in parallel, so it can process history orders of magnitude faster than sequential indexing, often finishing full-chain indexing in hours instead of weeks. Because results are delivered to sinks such as a SQL database, flat files, or a subgraph, indexing is decoupled from serving, which gives data pipelines more flexibility. The trade-off is a steeper initial learning curve and a less mature ecosystem of pre-built modules compared to The Graph's library of existing subgraphs.
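To make the decoupling of indexing from serving concrete, here is a minimal Python sketch under stated assumptions. It does not use any Substreams API: `module_output` is a hypothetical stand-in for a Substreams map module, the block dictionary shape is invented for illustration, and the two sinks are toy in-memory consumers. The transformation code never references storage, and each sink independently decides what to do with the same records.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (block number, receiver address, amount).
Row = Tuple[int, str, int]

def module_output(blocks: Iterable[dict]) -> Iterator[Row]:
    """Indexing side: turn raw blocks into records without knowing where they go."""
    for block in blocks:
        for _sender, receiver, amount in block["transfers"]:
            yield (block["number"], receiver, amount)

def fan_out(rows: Iterable[Row], sinks: List[Callable[[Row], None]]) -> None:
    """Delivery side: hand every record to each independent sink."""
    for row in rows:
        for sink in sinks:
            sink(row)

# Two toy sinks: an append-only row log and a per-block volume aggregate.
row_log: List[Row] = []
per_block_volume: Dict[int, int] = {}

def volume_sink(row: Row) -> None:
    block_number, _receiver, amount = row
    per_block_volume[block_number] = per_block_volume.get(block_number, 0) + amount

blocks = [
    {"number": 1, "transfers": [("a", "b", 50), ("a", "c", 10)]},
    {"number": 2, "transfers": [("c", "b", 60)]},
]
fan_out(module_output(blocks), [row_log.append, volume_sink])
```

Adding a third sink (say, a file writer) requires no change to `module_output`, which is the flexibility the paragraph describes.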
The key trade-off: If your priority is developer velocity and a proven, end-to-end query solution for a dApp, choose Graph Node. If you prioritize massive-scale data processing, custom ETL pipelines, or real-time analytics and can handle more initial setup, choose Substreams.
TL;DR: Core Differentiators
Key architectural strengths and trade-offs at a glance.
The Graph Node: Maturity & Ecosystem
Established Decentralized Network: 600+ Indexers securing subgraphs for protocols like Uniswap, Aave, and Compound. This matters for production applications requiring battle-tested reliability and decentralized data provenance.
Substreams: Unmatched Speed & Scale
Parallelized Firehose Architecture: Reads blocks from the Firehose and splits historical ranges across parallel workers, while new blocks are delivered as a high-throughput stream with sub-second processing. This matters for high-frequency dashboards, real-time arbitrage bots, and processing entire chain history in hours, not weeks.
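A rough sketch of why range-splitting helps: when each block can be processed independently, a history can be cut into contiguous segments, processed concurrently, and the partial results merged. This is a conceptual Python illustration, not Substreams' actual scheduler; `SEGMENT_SIZE`, `fake_transfer_volume`, and the other names are hypothetical, and the merge step is only valid here because per-segment results are combined with an associative operation (addition).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

SEGMENT_SIZE = 1_000  # hypothetical number of blocks per parallel segment

def segments(start: int, stop: int, size: int) -> List[range]:
    """Split the half-open block range [start, stop) into contiguous segments."""
    return [range(s, min(s + size, stop)) for s in range(start, stop, size)]

def fake_transfer_volume(block_number: int) -> int:
    """Deterministic stand-in for decoding one block's transfer volume."""
    return block_number % 7

def process_segment(blocks: range) -> int:
    """A segment needs only its own blocks, so segments can run concurrently."""
    return sum(fake_transfer_volume(n) for n in blocks)

def parallel_backfill(start: int, stop: int, workers: int = 4) -> int:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_segment, segments(start, stop, SEGMENT_SIZE)))
    # Addition is associative, so merging per-segment partials gives the same
    # answer as a single sequential pass over the whole range.
    return sum(partials)

def sequential_backfill(start: int, stop: int) -> int:
    return sum(fake_transfer_volume(n) for n in range(start, stop))
```

Threads keep the sketch short; because of Python's GIL they will not speed up CPU-bound work, and a real deployment would spread segments across processes or machines.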
Head-to-Head Feature Comparison
Direct comparison of key architectural and operational metrics for blockchain indexing solutions.
| Metric | The Graph (Graph Node) | Substreams |
|---|---|---|
| Data Processing Throughput | ~1,000 blocks/hour | ~100,000 blocks/hour |
| Indexing Latency (Block to Query) | ~1-5 minutes | ~1-5 seconds |
| Data Output Format | GraphQL API | Protobuf streams |
| Parallel Processing | No (sequential, block by block) | Yes (parallel block ranges) |
| Incremental Data Updates | | |
| Primary Use Case | Application APIs (dApps) | High-frequency analytics, backfills |
| Deployment Model | Decentralized Network / Self-hosted | Self-hosted on Firehose / Hosted endpoints |
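Using the table's own approximate throughput figures, backfill time is simply total blocks divided by blocks per hour. The 1,000,000-block range below is a hypothetical workload; actual rates vary with the chain, handler or module complexity, and the data source.

```python
# Nominal throughput figures from the comparison table (blocks per hour).
THROUGHPUT_BLOCKS_PER_HOUR = {
    "graph_node": 1_000,
    "substreams": 100_000,
}

def backfill_hours(total_blocks: int, runtime: str) -> float:
    """Hours to index `total_blocks` at the table's nominal rate for `runtime`."""
    return total_blocks / THROUGHPUT_BLOCKS_PER_HOUR[runtime]

for name in THROUGHPUT_BLOCKS_PER_HOUR:
    hours = backfill_hours(1_000_000, name)
    print(f"{name}: {hours:,.0f} h (~{hours / 24:.1f} days)")
# graph_node: 1,000 h (~41.7 days)
# substreams: 10 h (~0.4 days)
```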
Graph Node vs Substreams
Key strengths and trade-offs for building and maintaining blockchain data pipelines.
Graph Node: Maturity & Ecosystem
Established Production Standard: Powers thousands of subgraphs for protocols holding $50B+ in DeFi TVL (e.g., Uniswap, Aave, Lido). This matters for teams requiring a battle-tested, audited indexing solution with a vast library of existing subgraphs to fork and build upon.
Substreams: Performance & Scale
Parallelized, Firehose-Based: Processes blockchain data as a stream and parallelizes historical processing across block ranges, enabling far higher indexing throughput than sequential, block-by-block processing. This matters for high-frequency data use cases like real-time analytics, arbitrage bots, or indexing entire chains (e.g., Ethereum full history) in hours, not weeks.
Graph Node: Limitations
Sequential Bottlenecks: Processes blocks one at a time, in order, leading to slower sync times for deep history. Operational Overhead: Requires running graph-node together with a Postgres database, an IPFS node, and an RPC (or Firehose) endpoint for each indexed chain. This matters for projects needing ultra-fast historical backfills or wanting a serverless, low-ops data layer.
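One reason blocks are processed in order: a typical subgraph handler loads an entity, mutates it, and saves it, so each block's result depends on the state left by earlier blocks. The Python sketch below imitates that load-mutate-save pattern with a plain dictionary; it is not the graph-ts API, and `Account`, `handle_transfer`, and the event tuples are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Account:
    id: str
    first_seen_block: int
    transfer_count: int = 0

def handle_transfer(store: Dict[str, Account], block: int, account_id: str) -> None:
    """Load the entity (creating it on first sight), mutate it, and save it back."""
    account = store.get(account_id)
    if account is None:
        account = Account(id=account_id, first_seen_block=block)
    account.transfer_count += 1
    store[account_id] = account

def index(events: List[Tuple[int, str]]) -> Dict[str, Account]:
    """Apply the handler to (block, account) events in the order given."""
    store: Dict[str, Account] = {}
    for block, account_id in events:
        handle_transfer(store, block, account_id)
    return store

events = [(10, "0xabc"), (20, "0xabc"), (30, "0xdef")]
in_order = index(events)
out_of_order = index(list(reversed(events)))
```

Here `in_order["0xabc"].first_seen_block` is 10, while `out_of_order` reports 20: the same events applied in a different order produce different entity state, which is why stateful handlers cannot simply be split across parallel workers.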
Substreams: Limitations
Steeper Learning Curve: Requires knowledge of Protocol Buffers, Rust, and stream processing concepts. Younger Ecosystem: Fewer pre-built modules compared to subgraphs. This matters for smaller teams prioritizing speed-to-market over ultimate performance or those lacking Rust expertise.
Graph Node vs Substreams: Strengths and Limitations
Key architectural differences and trade-offs for blockchain indexing, from development velocity to production scaling.
Graph Node: Developer Experience
Declarative Mapping: Developers declare entities in a GraphQL schema and map contract events to handlers in a manifest (subgraph.yaml); the handlers themselves are written in AssemblyScript, a TypeScript-like language. This matters for rapid prototyping and teams familiar with web2 development patterns. Subgraph Studio provides a managed build/test/deploy pipeline, reducing initial DevOps overhead.
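The manifest-driven routing can be pictured as a lookup from event signature to handler. In a real subgraph this mapping lives in the `eventHandlers` section of `subgraph.yaml` and the handlers are AssemblyScript; the Python below is only an analogy, with hypothetical handler names and a dictionary standing in for the entity store.

```python
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, object]
Store = Dict[str, Entity]

def handle_transfer(store: Store, params: dict) -> None:
    """Hypothetical handler: save one Transfer entity keyed by the event's unique id."""
    store[params["id"]] = {
        "type": "Transfer",
        "from": params["from"],
        "to": params["to"],
        "value": params["value"],
    }

def handle_approval(store: Store, params: dict) -> None:
    """Hypothetical handler: save one Approval entity keyed by the event's unique id."""
    store[params["id"]] = {
        "type": "Approval",
        "owner": params["owner"],
        "spender": params["spender"],
        "value": params["value"],
    }

# Analogous to a manifest's eventHandlers list: event signature -> handler.
EVENT_HANDLERS: Dict[str, Callable[[Store, dict], None]] = {
    "Transfer(indexed address,indexed address,uint256)": handle_transfer,
    "Approval(indexed address,indexed address,uint256)": handle_approval,
}

def dispatch(store: Store, logs: List[Tuple[str, dict]]) -> None:
    """Route each decoded log to its declared handler; undeclared events are skipped."""
    for signature, params in logs:
        handler = EVENT_HANDLERS.get(signature)
        if handler is not None:
            handler(store, params)

store: Store = {}
dispatch(store, [
    ("Transfer(indexed address,indexed address,uint256)",
     {"id": "0x01-0", "from": "0xa", "to": "0xb", "value": 5}),
    ("Paused(address)", {"id": "0x02-0"}),  # no handler declared, so it is ignored
])
```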
Substreams: Data Composability
Modular Data Pipeline: Substreams modules, distributed as packages (.spkg files), can be chained and reused, turning raw blocks into refined data streams. This matters for complex data transformations and teams building interdependent data products. A single Substreams execution can feed multiple downstream services (e.g., a database and a real-time API).
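A toy version of map → store → map chaining, in plain Python under stated assumptions: the stage names, the block shape, and the single final read of the store are hypothetical simplifications, not the Substreams Rust API. The point is that each stage depends only on the previous stage's output, so stages can be swapped or reused independently.

```python
from typing import Dict, Iterable, List, Tuple

Transfer = Tuple[str, str, int]  # (sender, receiver, amount); hypothetical shape

def map_transfers(block: dict) -> List[Transfer]:
    """Map stage: decode one raw block into transfers (stateless)."""
    return list(block["transfers"])

def store_received(transfers: List[Transfer], store: Dict[str, int]) -> None:
    """Store stage: accumulate total amount received per address across blocks."""
    for _sender, receiver, amount in transfers:
        store[receiver] = store.get(receiver, 0) + amount

def map_large_receivers(store: Dict[str, int], threshold: int) -> List[str]:
    """Downstream map stage: read the store, emit addresses at or above a threshold."""
    return sorted(addr for addr, total in store.items() if total >= threshold)

def run_pipeline(blocks: Iterable[dict], threshold: int) -> List[str]:
    store: Dict[str, int] = {}
    for block in blocks:
        store_received(map_transfers(block), store)
    return map_large_receivers(store, threshold)

blocks = [
    {"number": 1, "transfers": [("a", "b", 50), ("a", "c", 10)]},
    {"number": 2, "transfers": [("c", "b", 60)]},
]
large = run_pipeline(blocks, 100)
```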
Graph Node: Limitations
Sequential Processing: Indexing is block-by-block, which can lead to hours of sync time for new subgraphs on long chains. This matters for rapid iteration or indexing entire chain history. Complex subgraphs can hit performance ceilings, requiring manual optimization or sharding.
Substreams: Limitations
Steeper Learning Curve: Requires knowledge of Rust and protocol buffer schemas. This matters for broader engineering teams without systems programming expertise. The ecosystem of pre-built modules is younger than The Graph's subgraph library, potentially requiring more custom development.
Decision Framework: When to Use Each
The Graph Node for Protocol Architects
Verdict: The established standard for production-ready, complex subgraphs. Strengths: Battle-tested with a massive ecosystem of existing subgraphs (e.g., Uniswap, Aave, Lido). It offers a mature, declarative GraphQL API that is ideal for frontend applications and public data services. The hosted service has been sunset, and subgraphs are now published to The Graph Network, which provides a decentralized marketplace for indexers. Use it when you need a robust, community-audited data layer with predictable query patterns and extensive tooling like Subgraph Studio.
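As a sketch of the kind of relational query a frontend sends, the snippet below builds a standard GraphQL-over-HTTP request body (a JSON object with `query` and `variables`). The `first`, `orderBy`, `orderDirection`, and `where` arguments follow Graph Node's query conventions, but the entities and fields (`pools`, `swaps`, `totalValueLockedUSD`) are hypothetical, loosely modeled on a DEX subgraph; no endpoint is called.

```python
import json

# Hypothetical schema: `pools`, `swaps`, and the field names are illustrative only.
TOP_POOLS_QUERY = """
query TopPools($first: Int!, $minTvl: BigDecimal!) {
  pools(first: $first, orderBy: totalValueLockedUSD, orderDirection: desc,
        where: { totalValueLockedUSD_gt: $minTvl }) {
    id
    token0 { symbol }
    token1 { symbol }
    swaps(first: 5, orderBy: timestamp, orderDirection: desc) {
      amountUSD
      timestamp
    }
  }
}
"""

def build_graphql_request(query: str, variables: dict) -> bytes:
    """Encode a GraphQL-over-HTTP POST body as UTF-8 JSON."""
    return json.dumps({"query": query, "variables": variables}).encode("utf-8")

body = build_graphql_request(TOP_POOLS_QUERY, {"first": 10, "minTvl": "1000000"})
```

Sending `body` as a POST with `Content-Type: application/json` to a subgraph's query URL would return pools together with their recent swaps in a single round trip; the URL is deployment-specific and omitted here.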
Substreams for Protocol Architects
Verdict: The high-performance engine for real-time data pipelines and custom business logic. Strengths: Unmatched speed and efficiency for processing historical and real-time blockchain data. Its modular, Rust-based streaming architecture allows for powerful data transformations and delivery to multiple sinks (e.g., gRPC streams, PostgreSQL, S3). It's superior for building proprietary analytics, MEV strategies, or low-latency dashboards where you control the entire pipeline. Choose Substreams for high-throughput, custom ETL jobs that Graph Node's GraphQL layer cannot efficiently serve.
Technical Deep Dive: Architecture and Data Flow
A technical comparison of The Graph's Graph Node and StreamingFast's Substreams, focusing on their underlying architectures, data processing models, and implications for developers building indexing solutions.
The core difference is sequential, block-by-block processing versus parallel streaming. Graph Node uses a pull-based architecture: it fetches blocks from an RPC node (or Firehose), runs subgraph handlers on them one block at a time in chain order, and writes the resulting entities to Postgres. Substreams uses a push-based, streaming-first architecture that consumes blockchain data from the Firehose, executes modules over block ranges in parallel, and streams the results to consumers in real time. This fundamental shift allows Substreams to treat data transformations as independent, composable modules.
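The pull/push distinction comes down to who drives iteration. In this hypothetical Python sketch, the pull-based indexer repeatedly asks a source for the next block, while the push-based server walks the chain itself and invokes a consumer callback for each output. `CHAIN`, `get_block`, and the callback are illustrative stand-ins, not Graph Node or Substreams interfaces.

```python
from typing import Callable, List, Optional

# Hypothetical five-block chain; tx_count is a placeholder payload.
CHAIN = [{"number": n, "tx_count": n % 3} for n in range(5)]

def get_block(number: int) -> Optional[dict]:
    """Source lookup used by the pull model; None means 'no such block yet'."""
    return CHAIN[number] if 0 <= number < len(CHAIN) else None

def pull_indexer(start: int) -> List[int]:
    """Pull model: the indexer requests each block in turn and processes it."""
    counts: List[int] = []
    number = start
    while (block := get_block(number)) is not None:
        counts.append(block["tx_count"])
        number += 1
    return counts

def push_server(start: int, on_output: Callable[[dict], None]) -> None:
    """Push model: the server drives iteration and emits each output to the consumer."""
    for block in CHAIN[start:]:
        on_output({"block": block["number"], "tx_count": block["tx_count"]})

received: List[dict] = []
push_server(0, received.append)
```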
Final Verdict and Strategic Recommendation
Choosing between Graph Node and Substreams is a foundational decision that hinges on your protocol's data architecture and operational philosophy.
The Graph Node excels at providing a robust, general-purpose indexing layer for decentralized applications because of its mature ecosystem and battle-tested reliability. For example, it powers subgraphs for major DeFi protocols such as Uniswap and Aave, and thousands of subgraphs have been deployed on The Graph's hosted service and decentralized network. Its pull-based model and GraphQL API offer developers a familiar, flexible query interface for complex on-chain data.
Substreams takes a radically different approach by focusing on high-throughput, composable data streams. This results in a trade-off: you gain exceptional performance (capable of processing 100+ blocks per second on high-volume chains like Solana) and powerful modularity, but you must adopt a Firehose-first, push-based architecture and work primarily with gRPC and Protobuf. This paradigm is optimized for building new data products and backends, not for serving ad-hoc frontend queries.
The key trade-off: If your priority is serving rich, customizable data to a dApp frontend with a mature toolchain, choose The Graph Node. If you prioritize building high-performance, specialized data pipelines, indexers, or analytics engines where raw throughput and modularity are critical, choose Substreams. For many teams, the optimal strategy is a hybrid: using Substreams to efficiently transform and sink data into a structured store, which is then queried via a traditional API.
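A compact illustration of that hybrid pattern under stated assumptions: a generator stands in for a Substreams module's output, an in-memory SQLite database stands in for the structured store, and an ordinary SQL query plays the role of the traditional API. Table and function names are hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str, str, int]  # (block_number, sender, receiver, amount)

def transform(blocks: Iterable[dict]) -> Iterator[Row]:
    """Stand-in for a streaming module: emit one row per decoded transfer."""
    for block in blocks:
        for sender, receiver, amount in block["transfers"]:
            yield (block["number"], sender, receiver, amount)

def sink_to_sql(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Sink stage: persist transformed rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transfers "
        "(block_number INTEGER, sender TEXT, receiver TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def top_receivers(conn: sqlite3.Connection, limit: int) -> List[Tuple[str, int]]:
    """Serving stage: an ordinary SQL query over the sunk data."""
    return conn.execute(
        "SELECT receiver, SUM(amount) AS total FROM transfers "
        "GROUP BY receiver ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()

blocks = [
    {"number": 1, "transfers": [("a", "b", 50), ("a", "c", 10)]},
    {"number": 2, "transfers": [("c", "b", 60)]},
]
conn = sqlite3.connect(":memory:")
sink_to_sql(conn, transform(blocks))
```

The transformation and the query never touch each other directly, so the store can be swapped (for example, for a production Postgres instance) without changing `transform`.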