Setting Up a Rollup Monitoring and Analytics Dashboard

A comprehensive guide to building a real-time dashboard for tracking the health, performance, and security of your rollup.

Rollups are complex systems where sequencers, provers, and data availability layers must operate in sync. Without proper observability, issues like transaction backlogs, proof generation delays, or data availability failures can go unnoticed, leading to degraded performance or security risks. A dedicated monitoring dashboard provides a single pane of glass to track these critical metrics, enabling proactive maintenance and informed decision-making. This guide walks through building a dashboard using open-source tools like Prometheus, Grafana, and custom data collectors.
The foundation of any monitoring system is metrics collection. For a rollup, you need to gather data from multiple sources: the sequencer node (e.g., transaction throughput, pending queue size, gas usage), the prover (proof generation time, success rate, hardware utilization), and the underlying L1 (data posting costs, confirmation times, bridge state). Custom Prometheus exporters can be written to scrape this data from node APIs, RPC endpoints, and smart contracts. For example, you might track rollup_blocks_produced_total or da_batch_submission_cost_eth.
Once metrics are collected, you need a visualization layer. Grafana is the industry standard for this, allowing you to create dashboards with graphs, gauges, and alerts. You can build panels to visualize real-time TPS, average transaction latency over time, the cost of posting data to Ethereum, and the status of the bridge's withdrawal queue. Setting meaningful alert rules in Prometheus or Grafana is crucial; you should be notified if proof generation fails consecutively or if the sequencer's mempool exceeds a dangerous threshold.
For deeper analytics beyond basic metrics, consider integrating a specialized indexer or building custom queries. You might want to analyze user growth by tracking unique active addresses, monitor the economic security by calculating the total value locked (TVL) in bridges, or audit sequencer censorship resistance. Services like The Graph for subgraphs or direct queries to an indexed database (e.g., using Dune Analytics or a self-hosted Postgres instance) can power these insights. This analytical layer transforms raw data into actionable business intelligence.
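As an illustration of that analytical layer, the sketch below computes daily active addresses from a self-hosted Postgres index. The transactions table and its from_address and block_timestamp columns are assumptions about your indexer's schema, named here only for illustration.

```javascript
// Sketch: daily active addresses from a self-hosted Postgres index.
// The `transactions` table and its columns are assumptions about your indexer's schema.
const { Client } = require('pg');

async function dailyActiveAddresses(days = 30) {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();
  const { rows } = await db.query(
    `SELECT date_trunc('day', block_timestamp) AS day,
            COUNT(DISTINCT from_address)       AS active_addresses
       FROM transactions
      WHERE block_timestamp > now() - make_interval(days => $1)
      GROUP BY 1
      ORDER BY 1`,
    [days]
  );
  await db.end();
  return rows;
}
```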
Finally, the dashboard must be actionable. Integrate alert notifications to Slack, PagerDuty, or Telegram so your team can respond immediately to incidents. Use Grafana annotations to mark deployments or major network events, creating a timeline for post-mortem analysis. Regularly review and update your dashboard as the rollup evolves—new contracts, upgrade mechanisms, and fraud proof systems will introduce new critical metrics to monitor. A well-maintained dashboard is not a static project but a core component of your rollup's operational infrastructure.
Prerequisites
Before building a rollup monitoring dashboard, you need the right tools, infrastructure, and data sources. This section covers the essential components to have in place.
A functional rollup monitoring system requires a solid foundation. You'll need access to the rollup's execution client (e.g., an OP Stack node, Arbitrum Nitro node, or zkSync Era server) and its connected L1 data availability layer (like Ethereum). This provides the raw transaction data, block headers, and state roots. Simultaneously, you must configure access to the rollup's sequencer RPC endpoint, which is the primary gateway for submitting transactions and querying the latest state. For historical analysis, you'll also need access to an indexed archive node or a service like The Graph to efficiently query past events and transactions.
The core of your dashboard is the data pipeline. You will need to set up a process to extract, transform, and load (ETL) data from the sources mentioned above. This typically involves writing scripts or using a framework to listen for new blocks and events, parse the data into a structured format (like converting hex values to decimals), and load it into a time-series database. Popular choices for this backend include PostgreSQL with TimescaleDB, InfluxDB, or ClickHouse. These databases are optimized for storing and querying sequential metric data, which is essential for tracking trends like TPS, gas fees, and active addresses over time.
Your development environment must include the necessary libraries and SDKs. For most EVM-compatible rollups, you will need Node.js (v18+) or Python (3.9+) with the Ethers.js or Web3.py library to interact with RPC endpoints and smart contracts. To build the dashboard frontend, a framework like React or Next.js is common, paired with a charting library such as Recharts, Chart.js, or Apache ECharts for visualization. You should also be familiar with the rollup's specific bridge contracts and precompiles, as monitoring cross-chain message passing and proving activity is a critical function.
Finally, establish your alerting and logging infrastructure. Integrate a service like Prometheus to scrape metrics from your application and Grafana to build the visual dashboard and set up alerts. For logging errors and tracking application health, configure structured logging with a service like Loki or an APM tool. Ensure you have the rollup's contract addresses (e.g., L1 and L2 bridge addresses, sequencer inbox) and RPC URLs documented and accessible to your scripts. With these components ready, you can proceed to implement the specific data collectors and visualizations for your dashboard.
Key Monitoring Concepts
Essential metrics, tools, and frameworks for building a comprehensive rollup observability stack.
A production-grade rollup requires comprehensive observability to track performance, security, and user activity. This guide details the architecture for a real-time monitoring dashboard.
A robust rollup monitoring system aggregates data from multiple sources. The core components are: a data ingestion layer pulling from the rollup's sequencer, L1 settlement contracts, and RPC nodes; a time-series database like Prometheus or TimescaleDB for metrics storage; and a visualization layer such as Grafana. Key metrics to capture include transaction throughput (TPS), batch submission latency to L1, gas costs, and sequencer health status. For example, an Optimism rollup would monitor ovm_sequencer_tx_count and gas_used for each L1 batch.
Instrumenting your rollup node is the first implementation step. For a custom rollup client, you must expose metrics via an HTTP endpoint, typically /metrics. Using a library like the Prometheus client for Go (client_golang) or prom-client for Node.js, you can instrument key functions. Track counters for transactions processed, gauges for mempool size, and histograms for block production time. Existing stacks such as Arbitrum Nitro already expose Prometheus metrics once metrics are enabled in the node configuration. You then configure Prometheus to scrape these targets at a defined interval, such as every 15 seconds.
The dashboard must visualize both chain state and system health. Create Grafana panels for: Sequencer Performance (live TPS, pending queue), L1 Settlement (batch submission frequency, confirmation time, gas spend), and RPC Service (request latency, error rates). Implement alerts for critical failures, like sequencer downtime or a spike in failed transactions. Use subgraphs from The Graph to index and query complex event data, such as daily active addresses or popular smart contract interactions, enriching your analytics beyond basic node metrics.
For advanced analysis, integrate a dedicated analytics database. Stream raw transaction data and event logs to ClickHouse or Apache Pinot using a service like Vector or Fluentd. This enables complex SQL queries for business intelligence, like calculating user retention cohorts or identifying the most gas-intensive contract calls. Architecturally, this forms a second data pipeline separate from the real-time operational metrics, ensuring analytical queries don't impact monitoring system performance.
Security and access control are critical for a production dashboard. Secure Grafana and Prometheus endpoints behind authentication. Use Grafana's built-in roles or integrate with OAuth2 providers. For team access, consider exposing the dashboard via a secure tunnel like Cloudflare Tunnel instead of public IPs. Regularly back up your Grafana dashboards as JSON files to version control. This setup provides the single pane of glass needed to maintain 99.9% uptime and make data-driven decisions for rollup optimization.
Step 1: Build the Data Collector Service
The data collector is the foundational backend service that fetches, processes, and stores raw blockchain data from your rollup for analysis.
A robust data collector service is the backbone of any monitoring dashboard. Its primary function is to ingest raw data from your rollup's RPC endpoints and sequencer, then structure it into a queryable format. For an OP Stack rollup like Base or Optimism, you would typically poll the eth_getBlockByNumber and eth_getLogs JSON-RPC methods. This service runs on a schedule (e.g., every 12 seconds) to capture new blocks, transactions, and contract events, ensuring your analytics reflect near real-time chain activity.
You need to architect the service for reliability and idempotency. This means implementing robust error handling for RPC timeouts, managing chain reorganizations (reorgs), and ensuring no data is duplicated if the service restarts. A common pattern is to track the last processed block height in a database like PostgreSQL. The service logic should: 1) Poll for the latest block, 2) Fetch all transactions and logs for new blocks, 3) Parse and normalize the data, and 4) Commit it to your datastore in a single atomic transaction to maintain consistency.
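A minimal sketch of that loop, assuming a hypothetical sync_status checkpoint table and a hypothetical processBlock helper that does the actual fetching and inserting:

```javascript
// Sketch of the collector loop. `sync_status` and `processBlock` are placeholders
// for your own checkpoint table and per-block ingestion logic.
const { ethers } = require('ethers');
const { Client } = require('pg');

const provider = new ethers.JsonRpcProvider(process.env.ROLLUP_RPC_URL);
const db = new Client({ connectionString: process.env.DATABASE_URL });

async function runCollector() {
  await db.connect();
  while (true) {
    const head = await provider.getBlockNumber();
    const { rows } = await db.query('SELECT last_block FROM sync_status LIMIT 1');
    let next = rows.length ? Number(rows[0].last_block) + 1 : head;

    for (; next <= head; next++) {
      await db.query('BEGIN');
      try {
        await processBlock(next);                    // fetch + insert block, txs, logs
        await db.query('UPDATE sync_status SET last_block = $1', [next]);
        await db.query('COMMIT');                    // atomic: data + checkpoint together
      } catch (err) {
        await db.query('ROLLBACK');
        console.error(`block ${next} failed, will retry`, err);
        break;                                       // retry on the next poll instead of skipping
      }
    }
    await new Promise((r) => setTimeout(r, 12_000)); // poll roughly once per L1 slot
  }
}
```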
For processing, you'll extract key metrics from the raw data. Essential datasets to collect include:
- Transaction Volumes: Count and aggregate value per block.
- Gas Usage: Track gasUsed to monitor network congestion and fee trends.
- Contract Interactions: Decode event logs (e.g., Transfer events for ERC-20 tokens) to track token flows and popular dApps.
- Sequencer Metrics: Monitor batch submission transactions to L1 for latency and cost analysis.

Structuring this data into relational tables (e.g., blocks, transactions, events) is crucial for efficient querying in later steps.
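A minimal Postgres schema sketch for those tables is shown below; column names are illustrative and should be adapted to the data you actually collect.

```javascript
// Sketch: minimal schema for the collector (column names are illustrative).
const { Client } = require('pg');

async function migrate() {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();
  await db.query(`
    CREATE TABLE IF NOT EXISTS blocks (
      number       BIGINT PRIMARY KEY,
      hash         TEXT NOT NULL,
      parent_hash  TEXT,
      timestamp    BIGINT NOT NULL,      -- unix seconds, as returned by the RPC
      tx_count     INTEGER NOT NULL
    );
    CREATE TABLE IF NOT EXISTS transactions (
      hash          TEXT PRIMARY KEY,
      block_number  BIGINT REFERENCES blocks(number),
      from_address  TEXT,
      to_address    TEXT,
      value_wei     NUMERIC,
      gas_used      BIGINT
    );
    CREATE TABLE IF NOT EXISTS events (
      tx_hash    TEXT REFERENCES transactions(hash),
      log_index  INTEGER,
      address    TEXT,
      topic0     TEXT,
      data       TEXT,
      PRIMARY KEY (tx_hash, log_index)
    );
  `);
  await db.end();
}
```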
Here is a simplified code snippet in Node.js using Ethers.js to fetch and store block data. This example assumes you have a PostgreSQL client (pg) and a blocks table set up.
```javascript
const { ethers } = require('ethers');
const { Client } = require('pg');

const rpcUrl = 'https://your-rollup-rpc-url.com';
const provider = new ethers.JsonRpcProvider(rpcUrl);
const dbClient = new Client({ connectionString: process.env.DATABASE_URL });

async function collectBlockData(blockNumber) {
  const block = await provider.getBlock(blockNumber);

  // In ethers v6, block.transactions is an array of transaction hashes.
  const txs = await Promise.all(
    block.transactions.map((txHash) => provider.getTransaction(txHash))
  );

  // Insert into database (dbClient.connect() must have been called beforehand).
  await dbClient.query(
    'INSERT INTO blocks(number, hash, timestamp, tx_count) VALUES($1, $2, $3, $4) ON CONFLICT DO NOTHING',
    [block.number, block.hash, block.timestamp, txs.length]
  );
  console.log(`Collected block #${block.number}`);
}
```
This basic collector must be extended to handle reorgs by checking parent hashes and to process transaction receipts for logs.
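A sketch of that reorg check, reusing the dbClient and blocks table from the snippet above; the rewind strategy shown is one possible approach, not the only one.

```javascript
// Sketch: before storing block N, verify its parentHash matches the stored hash of N-1.
async function detectReorg(block) {
  const { rows } = await dbClient.query(
    'SELECT hash FROM blocks WHERE number = $1',
    [block.number - 1]
  );
  if (rows.length > 0 && rows[0].hash !== block.parentHash) {
    // A reorg happened: drop the stale branch so the collector re-fetches it.
    await dbClient.query('DELETE FROM blocks WHERE number >= $1', [block.number - 1]);
    return true; // caller should rewind its cursor to block.number - 1
  }
  return false;
}
```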
Finally, consider scaling and maintenance. As transaction volume grows, batch inserts and connection pooling become essential. For production, containerize the service with Docker and orchestrate it using Kubernetes or a process manager like PM2. Implement comprehensive logging (e.g., with Winston or Pino) and alerting for failed data-fetching cycles. The output of this service—a clean, timestamped dataset—is what powers all subsequent analytical models and dashboard visualizations.
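For reference, a docker-compose sketch that wires the collector to Postgres, Prometheus, and Grafana might look like the following; image tags, service names, and credentials are placeholders.

```yaml
# docker-compose.yml — example stack for the collector and monitoring backends
services:
  collector:
    build: .
    environment:
      DATABASE_URL: postgres://rollup:rollup@db:5432/rollup
      ROLLUP_RPC_URL: https://your-rollup-rpc-url.com
    depends_on: [db]
    restart: unless-stopped

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: rollup
      POSTGRES_PASSWORD: rollup
      POSTGRES_DB: rollup
    volumes:
      - pgdata:/var/lib/postgresql/data

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"

volumes:
  pgdata:
```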
Step 2: Expose Metrics to Prometheus
Configure your rollup node to export structured performance and operational data for collection.
Prometheus operates on a pull-based model, meaning it periodically scrapes HTTP endpoints for metrics. Your rollup node must expose these metrics in Prometheus's specific text-based exposition format. For most blockchain clients built in Go, Rust, or Node.js, this is achieved by integrating a dedicated metrics library. The prom-client for Node.js, prometheus crate for Rust, and client_golang for Go are the standard choices. These libraries handle the formatting and provide a /metrics HTTP endpoint that serves the current snapshot of all registered metrics.
You must instrument your application code to track the data you care about. Common rollup-specific metrics include: rollup_blocks_proposed_total, rollup_transactions_processed, sequencer_batch_size_bytes, l1_submission_duration_seconds, and state_root_calculation_time. Each metric should be labeled with dimensions like chain_id or batch_type for granular analysis. For example, tracking l1_gas_used per transaction batch helps optimize cost efficiency. Avoid exposing sensitive information like private keys or raw transaction data in labels.
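A minimal instrumentation sketch with prom-client, using two of the metric names above; the chain_id value and the update call sites are illustrative.

```javascript
// Sketch: instrumenting a Node.js rollup service with prom-client.
const client = require('prom-client');
const http = require('http');

const register = new client.Registry();
client.collectDefaultMetrics({ register });

const blocksProposed = new client.Counter({
  name: 'rollup_blocks_proposed_total',
  help: 'Total L2 blocks proposed by this node',
  labelNames: ['chain_id'],
  registers: [register],
});

const l1SubmissionDuration = new client.Histogram({
  name: 'l1_submission_duration_seconds',
  help: 'Time taken to submit a batch to L1',
  labelNames: ['batch_type'],
  buckets: [1, 5, 15, 30, 60, 120],
  registers: [register],
});

// Serve the /metrics endpoint on a dedicated port.
http.createServer(async (req, res) => {
  if (req.url === '/metrics') {
    res.setHeader('Content-Type', register.contentType);
    res.end(await register.metrics());
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(process.env.METRICS_PORT || 9090);

// Elsewhere in the application (example values):
// blocksProposed.inc({ chain_id: '42069' });
// l1SubmissionDuration.observe({ batch_type: 'calldata' }, 12.4);
```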
The metrics endpoint must be served on a dedicated port, separate from the node's main RPC API, for security and isolation. Configure your node's startup command or configuration file to enable metrics and define the bind address (e.g., --metrics, --metrics.addr 0.0.0.0, --metrics.port 6060). Use environment variables for port configuration to maintain flexibility across deployment environments (development, staging, production). Ensure this port is accessible from your Prometheus server's network.
Finally, verify the setup by querying the endpoint directly. Run your node and navigate to http://<your-node-ip>:<metrics-port>/metrics in a browser or use curl. You should see a plain-text response beginning with # HELP and # TYPE directives, followed by metric lines like rollup_blocks_proposed_total 142. This confirms your node is correctly exposing data. The next step is to configure Prometheus to scrape this target.
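On the Prometheus side, the scrape configuration for that target might look like the following sketch; job names, targets, and labels are examples.

```yaml
# prometheus.yml — scrape the rollup node's metrics endpoint (targets are examples)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: sequencer
    static_configs:
      - targets: ['rollup-node:6060']
        labels:
          chain_id: '42069'
  - job_name: collector
    static_configs:
      - targets: ['collector:9090']
```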
Step 3: Create the Grafana Dashboard
Import and configure a pre-built dashboard to visualize your rollup's real-time health and performance metrics.
With Prometheus scraping your node's metrics, you now need a visualization layer. Grafana dashboards transform raw time-series data into actionable charts and alerts. For this guide, we'll import the OP Stack Node Dashboard (Dashboard ID 18602), a community-maintained template designed specifically for OP Stack-based rollups. This dashboard provides a comprehensive view of key performance indicators (KPIs) like block production, transaction throughput, and system resource usage.
To import the dashboard, log into your Grafana instance (typically at http://localhost:3000). In the left sidebar, navigate to Dashboards > New > Import. In the Import via grafana.com field, enter the dashboard ID 18602 and click Load. You will be prompted to select a Prometheus data source; choose the prometheus source you configured in the previous step. Finally, click Import to create the dashboard.
The imported dashboard is organized into logical panels. Key sections to monitor include:
- Chain Data: Tracks l2geth block height, transaction count per block, and gas usage.
- System Metrics: Monitors CPU, memory, disk I/O, and network usage of your node's host machine.
- Batch Submitter/Proposer: Shows the health of the components that submit data and proofs to L1 (critical for sequencer operation).
- Database Performance: Graphs LevelDB read/write latencies, which can become a bottleneck.
You should customize the dashboard for your specific deployment. Edit the dashboard and modify panel queries to match your node's job labels (e.g., job="op-node"). Set up alert rules directly in Grafana or in Prometheus to notify you of critical issues, such as the block height stalling or system memory exceeding 90%. For production, configure persistent storage for Grafana and set up authentication.
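For production setups that keep dashboards as JSON in version control, Grafana's file-based provisioning can load them from disk automatically; a minimal provider definition might look like this sketch (paths are examples).

```yaml
# /etc/grafana/provisioning/dashboards/rollup.yaml — example provisioning config
apiVersion: 1

providers:
  - name: rollup-dashboards
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /var/lib/grafana/dashboards
```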
This dashboard provides the foundational observability needed to operate a rollup node reliably. By correlating chain activity with system resource graphs, you can diagnose performance bottlenecks, verify that batches are being submitted on schedule, and ensure overall network health. The next step is to configure alerting rules so that deviations from these baselines trigger notifications automatically.
Step 4: Configure Alerting Rules
Define conditions that trigger notifications when your rollup's health or performance deviates from expected baselines.
Alerting rules transform raw metrics into actionable intelligence. They are conditional statements evaluated at regular intervals by your monitoring system (like Prometheus with the Alertmanager). When a rule's condition is true for a sustained period, it fires an alert, which is then routed to configured channels like Slack, PagerDuty, or email. Effective rules act as an early warning system for issues like sequencer downtime, transaction backlogs, or liquidity depletion before they impact end-users.
Key Alert Categories for Rollups
You should configure rules across several critical dimensions:
- Sequencer Health: Alerts for sequencer process downtime, high error rates on RPC endpoints, or failure to submit batches to L1.
- Transaction Pipeline: Alerts for a growing mempool (pending_transactions), a sudden drop in transactions per second (TPS), or spikes in the failed transaction ratio.
- L1 Settlement: Alerts for missed batch submission deadlines, abnormally high L1 gas costs per batch, or failures in state root updates.
- Financial & State: Alerts for bridge contract balance thresholds, validator/staker slashing events, or a stalled state root finalization.
Here is an example Prometheus alerting rule for a sequencer heartbeat, written in YAML format for a prometheus-rules.yaml file:
```yaml
groups:
  - name: rollup_sequencer
    rules:
      - alert: SequencerDown
        expr: up{job="sequencer"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Sequencer instance {{ $labels.instance }} is down."
          description: "The sequencer has been unreachable for over 1 minute."
```
This rule uses the up metric, which is 1 for a healthy target and 0 when down. The for: 1m clause creates a waiting period to prevent false alarms from transient network blips.
Configure alert severity levels (e.g., warning, critical) and routing appropriately. A critical sequencer-down alert might page an on-call engineer, while a warning for elevated gas costs might only go to a Slack channel. Use annotations to include diagnostic information in notifications, such as the affected instance, current metric value, and a link to the relevant dashboard. Always test your rules by intentionally triggering the failure condition in a staging environment to validate the entire pipeline—from detection to notification.
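An Alertmanager routing sketch for that severity split; receiver names, the webhook URL, and the routing key are placeholders.

```yaml
# alertmanager.yml — route critical alerts to PagerDuty, warnings to Slack (examples)
route:
  receiver: slack-warnings
  group_by: ['alertname', 'instance']
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall

receivers:
  - name: slack-warnings
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE_ME
        channel: '#rollup-alerts'
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: REPLACE_ME
```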
Avoid alert fatigue by tuning thresholds carefully. Start with conservative values and adjust based on historical data. Implement alert grouping and silencing in Alertmanager to prevent notification storms during a widespread incident. Finally, document each alert's purpose, expected response, and escalation path in a runbook. This ensures anyone on call understands the urgency and procedure when an alert fires.
Essential Rollup Monitoring Metrics
Critical metrics to track for operational health, security, and performance of a rollup.
| Metric Category | Specific Metric | Target / Healthy Range | Why It Matters |
|---|---|---|---|
| Sequencer Health | Block Production Interval | < 2 seconds | Indicates sequencer liveness and network stability. |
| Sequencer Health | Pending Queue Size | < 100 transactions | Shows transaction backlog; high values cause delays. |
| Data Availability | L1 Data Posting Latency | < Ethereum block time (12s) | Measures delay in publishing proofs, affecting finality. |
| Data Availability | Calldata Cost per Batch | $50 - $200 (varies with L1 gas) | Primary operational cost driver for the rollup. |
| State & Synchronization | Node Sync Time (Full) | < 1 hour | Time for a new node to sync, indicating chain efficiency. |
| State & Synchronization | State Growth Rate | Track weekly trend | Unbounded growth impacts node hardware requirements. |
| Financials & Fees | Avg Transaction Fee (USD) | $0.10 - $1.00 | Direct user cost and network congestion indicator. |
| Financials & Fees | Sequencer Profit Margin | | Sustainability metric for rollup operation. |
| Security & Decentralization | Active Proposers / Challengers | | Measures the health of the fraud/validity proof system. |
| Security & Decentralization | Time to Finality (L1 confirmation) | ~30 minutes to 1 week | Time for a withdrawal to be considered fully secure. |
Troubleshooting Common Issues
Common problems encountered when building or using rollup monitoring dashboards, with solutions for developers.
A blank dashboard is often caused by incorrect RPC endpoint configuration or a stalled data ingestion pipeline.
Check these points first:
- RPC Endpoint Health: Verify your L1 and L2 RPC URLs are correct and responsive. Test the endpoint with curl: curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' YOUR_RPC_URL
- Indexer Status: If using The Graph, check the subgraph's syncing status in the hosted service dashboard or your Graph Node logs for errors.
- Time Range Filter: Ensure your dashboard's default time filter isn't set to a future date or a period before the rollup launched.
- Data Source Permissions: For cloud-based solutions like Google BigQuery or AWS, confirm the service account has the necessary read permissions on the data tables.
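To distinguish a stalled ingestion pipeline from an RPC problem, a quick script can compare the chain head against the newest indexed block; this sketch assumes the blocks table from Step 1.

```javascript
// Sketch: compare the RPC head with the newest indexed block to spot a stalled pipeline.
const { ethers } = require('ethers');
const { Client } = require('pg');

async function checkIngestionLag() {
  const provider = new ethers.JsonRpcProvider(process.env.ROLLUP_RPC_URL);
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();

  const head = await provider.getBlockNumber();
  const { rows } = await db.query('SELECT MAX(number) AS last FROM blocks');
  const lastIndexed = Number(rows[0].last ?? 0);

  console.log(`RPC head: ${head}, last indexed: ${lastIndexed}, lag: ${head - lastIndexed}`);
  await db.end();
}

checkIngestionLag().catch(console.error);
```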
Tools and Resources
These tools are commonly used to build a production-grade rollup monitoring and analytics dashboard, each covering a concrete component: metrics collection, tracing, chain-level observability, and transaction-level debugging.
Frequently Asked Questions
Common questions and solutions for developers building and maintaining rollup monitoring dashboards.
Which metrics matter most for a rollup? Focus on sequencer health, data availability, and cost efficiency. Key metrics include:
- Sequencer Metrics: Block production rate, transaction inclusion latency, and uptime.
- L1-L2 Bridge: Deposit/withdrawal confirmation times, bridge transaction success rate, and gas costs.
- Data Availability: Calldata posted to L1 per batch, data availability layer latency (e.g., Celestia, EigenDA).
- Network State: Transactions per second (TPS), active addresses, and total value locked (TVL).
- Costs: Average transaction fee in USD and gas, and cost per batch posted to L1.
Track these using tools like Prometheus for collection and Grafana for visualization to identify bottlenecks.