Setting Up Rollup Monitoring Systems

A technical guide for developers to implement monitoring for EVM-compatible rollups. Covers key metrics, alerting, and tools for Arbitrum, Optimism, and zkSync.
A PRACTICAL GUIDE

Setting Up Rollup Monitoring Systems

A step-by-step tutorial for developers to implement comprehensive monitoring for rollup infrastructure, covering key metrics, tools, and alerting strategies.

Rollup monitoring is essential for maintaining the health, security, and performance of your Layer 2 network. Unlike monolithic chains, rollups introduce unique components like sequencers, provers, and bridges that require specialized oversight. A robust monitoring system tracks data availability, state commitment latency, transaction finality, and bridge security. Without it, you risk silent failures, degraded user experience, and potential security vulnerabilities. The goal is to achieve observability—not just collecting logs, but deriving actionable insights into system behavior.

The monitoring stack typically consists of three layers: data collection, processing/aggregation, and visualization/alerting. For collection, you'll need agents to scrape metrics from your sequencer node (e.g., Geth or Erigon fork), the prover service, and the bridge contracts. Key metrics include rollup_sequenced_batches, rollup_l1_submission_delay, prover_batch_proof_time, and bridge_total_value_locked. Tools like Prometheus are standard for pulling and storing this time-series data. Logs from these services should be aggregated using Loki or a similar service for tracing specific transaction journeys.

Here's a basic Prometheus configuration snippet to scrape a rollup node's metrics endpoint:

yaml
scrape_configs:
  - job_name: 'rollup_sequencer'
    static_configs:
      - targets: ['sequencer-host:9090']
    metrics_path: '/metrics'

You must instrument your rollup node's code to expose these custom metrics. For an OP Stack chain, you would monitor the op-node and op-geth health endpoints. For a zkRollup like zkSync Era, you would track the server and prover components. The processing layer often uses Grafana for dashboards and Alertmanager to route alerts based on threshold rules, such as a sequencer being down for more than 5 minutes.

Critical alerts should be configured for core failure scenarios. These include: SequencerIsDown, HighL1SubmissionDelay (e.g., >30 minutes), DataAvailabilityError from the DAC or L1, ProverQueueBacklog exceeding a safe limit, and BridgeActivityAnomaly indicating a potential exploit. Alerts should be routed to appropriate channels like PagerDuty, Slack, or OpsGenie. It's also crucial to monitor the economic security of the system by tracking the bond size of validators/provers and the challenge period status for optimistic rollups.
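
As a sketch of how two of these alerts could be expressed as Prometheus rules (the metric names rollup_l1_submission_delay and prover_queue_depth are assumptions; substitute whatever your exporters actually expose):

yaml
groups:
  - name: rollup-critical
    rules:
      - alert: HighL1SubmissionDelay
        # assumes an exporter publishing the delay in seconds
        expr: rollup_l1_submission_delay > 1800
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Batch submission to L1 has been delayed for more than 30 minutes"
      - alert: ProverQueueBacklog
        # hypothetical queue-depth metric and limit; tune to your prover's capacity
        expr: prover_queue_depth > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Prover queue backlog exceeds the configured safe limit"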

Finally, effective monitoring extends beyond infrastructure to the user experience. Implement synthetic transactions that periodically send test transfers through the bridge and measure the end-to-end confirmation time. Use blockchain explorers like Blockscout (for your rollup) and Etherscan (for L1) as external data sources to verify state consistency. By combining low-level system metrics with high-level application checks, you create a defense-in-depth monitoring strategy that can identify issues from the hardware layer all the way to the end-user transaction.
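
A minimal synthetic-transaction probe can be sketched in Python with web3.py (v6+). This simpler variant measures L2 inclusion time for a zero-value self-transfer; extending it to a bridge deposit follows the same pattern. The endpoint, account, and key below are placeholders:

python
from web3 import Web3
import time

# Placeholders: replace with your rollup RPC, a dedicated test account, and its key.
L2_RPC = "https://your-rollup-rpc.example"
TEST_ACCOUNT = "0xYourTestAccount"
PRIVATE_KEY = "0x..."  # test key only, never an account holding real funds

w3 = Web3(Web3.HTTPProvider(L2_RPC))

def synthetic_transfer_latency():
    """Send a zero-value self-transfer and return the time to inclusion in seconds."""
    tx = {
        "to": TEST_ACCOUNT,
        "value": 0,
        "gas": 21000,
        "gasPrice": w3.eth.gas_price,
        "nonce": w3.eth.get_transaction_count(TEST_ACCOUNT),
        "chainId": w3.eth.chain_id,
    }
    signed = w3.eth.account.sign_transaction(tx, PRIVATE_KEY)
    start = time.time()
    # older web3.py releases expose this attribute as signed.rawTransaction
    tx_hash = w3.eth.send_raw_transaction(signed.raw_transaction)
    w3.eth.wait_for_transaction_receipt(tx_hash, timeout=300)
    return time.time() - start

if __name__ == "__main__":
    print(f"End-to-end confirmation time: {synthetic_transfer_latency():.2f}s")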

ROLLUP MONITORING

Prerequisites and Setup

Essential tools and configurations required to build a robust monitoring system for rollup networks.

Effective rollup monitoring requires a foundational stack of tools and services before you begin writing custom alerts or dashboards. The core components are a blockchain node (either an execution client for the L1 or a sequencer RPC for the L2), a time-series database for storing metrics, and a visualization/alerting platform. For production systems, you'll need dedicated infrastructure for each component to ensure reliability and data isolation. Popular stacks include running a Geth or Erigon node for Ethereum, Prometheus for metrics collection, and Grafana for dashboards and alerting.

The first critical step is establishing reliable data ingestion. You must run or have access to a node with the appropriate RPC endpoints. For monitoring an Optimism or Arbitrum rollup, you need a connection to the sequencer's RPC (e.g., https://mainnet.optimism.io for OP Mainnet) and a connection to the L1 (e.g., Ethereum Mainnet) to track bridge contracts and dispute events. Use tools like Prometheus Node Exporter for system metrics and a custom exporter (often written in Go or Python) to query the node's JSON-RPC API and convert blockchain data into Prometheus metrics, such as rollup_block_height, pending_transactions, and gas_price.
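
A minimal custom exporter along these lines, sketched in Python with prometheus_client and plain JSON-RPC calls (the port and metric names are illustrative):

python
from prometheus_client import Gauge, start_http_server
import requests
import time

RPC_URL = "http://localhost:8545"  # sequencer or L1 RPC endpoint

ROLLUP_BLOCK_HEIGHT = Gauge("rollup_block_height", "Latest block height reported by the RPC")
GAS_PRICE_WEI = Gauge("rollup_gas_price_wei", "Current gas price reported by the RPC, in wei")

def rpc_call(method, params=None):
    """Issue a single JSON-RPC request and return the result field."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params or [], "id": 1}
    return requests.post(RPC_URL, json=payload, timeout=10).json()["result"]

def collect():
    ROLLUP_BLOCK_HEIGHT.set(int(rpc_call("eth_blockNumber"), 16))
    GAS_PRICE_WEI.set(int(rpc_call("eth_gasPrice"), 16))

if __name__ == "__main__":
    start_http_server(9101)  # add this port as a Prometheus scrape target
    while True:
        try:
            collect()
        except Exception as exc:
            print(f"metric collection failed: {exc}")
        time.sleep(15)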

Configuration is key to a maintainable system. Your Prometheus scrape_configs must define jobs for your node exporter and custom blockchain exporter. A typical alert rule in Prometheus YAML might watch for a stalled sequencer: expr: increase(rollup_block_height[5m]) == 0. For visualizing this data, Grafana dashboards should be built to show real-time chain health, including blocks per second, transaction pool size, and bridge finalization delays. Always secure these endpoints; use firewalls, VPNs, or authentication proxies for Prometheus and Grafana interfaces exposed to the internet.

Beyond the base setup, consider integrating log aggregation with Loki or ELK Stack to parse node logs for errors, and set up alert managers like Alertmanager to route notifications to Slack, PagerDuty, or email. For teams not wanting to manage this infrastructure, third-party services like Chainstack, Blockdaemon, or Tenderly provide managed nodes with enhanced APIs and built-in monitoring features, which can significantly reduce initial setup time while providing production-ready reliability and uptime guarantees.
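
If you run Alertmanager yourself, a minimal routing configuration might look like this (the Slack webhook and PagerDuty routing key are placeholders):

yaml
route:
  receiver: slack-default
  group_by: ['alertname']
  routes:
    - matchers:
        - severity = critical
      receiver: pagerduty-oncall
receivers:
  - name: slack-default
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder webhook URL
        channel: '#rollup-alerts'
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_ROUTING_KEY'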

KEY MONITORING CONCEPTS

Setting Up Rollup Monitoring Systems

A practical guide to building observability for rollup infrastructure, covering essential metrics, data sources, and alerting strategies.

Rollup monitoring requires a multi-layered approach, as you must track both the health of the underlying L1 settlement layer and the internal state of the rollup's own execution environment. At a minimum, your system should monitor sequencer health, data availability, state commitment finality, and cross-chain messaging. For example, an Optimism or Arbitrum node operator needs to track the sequencer_pending_tx_count to detect transaction processing backlogs, while also verifying that batch submissions to Ethereum are succeeding and not exceeding gas limits. This dual-layer visibility is non-negotiable for maintaining user trust and system reliability.

The primary data sources for monitoring are node RPC endpoints, blockchain explorers, and dedicated indexers. You should instrument your rollup node's JSON-RPC API to collect metrics like eth_blockNumber propagation delay and net_peerCount. For Ethereum L1 dependencies, use services like Alchemy or Infura, or your own archival node, to monitor contract events from the rollup's Inbox and Bridge contracts. Prometheus is the industry-standard tool for scraping and storing these time-series metrics, while Grafana provides the visualization layer. A critical alert might trigger if the rollup_state_root_lag exceeds 100 blocks, indicating a potential halt in state progression.
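
For the L1 side, a simple poller that watches logs emitted by the rollup's bridge or inbox contract can be sketched with web3.py (the L1 endpoint and contract address are placeholders; consult your rollup's documentation for the real addresses):

python
from web3 import Web3
import time

L1_RPC = "https://ethereum-rpc.example"          # your own node, Alchemy, or Infura
BRIDGE_ADDRESS = "0xYourRollupBridgeOrInbox"     # placeholder; rollup-specific

w3 = Web3(Web3.HTTPProvider(L1_RPC))
last_seen = w3.eth.block_number

while True:
    head = w3.eth.block_number
    if head > last_seen:
        # Fetch all logs emitted by the contract in the new block range.
        logs = w3.eth.get_logs({
            "address": Web3.to_checksum_address(BRIDGE_ADDRESS),
            "fromBlock": last_seen + 1,
            "toBlock": head,
        })
        print(f"L1 blocks {last_seen + 1}-{head}: {len(logs)} bridge/inbox events")
        last_seen = head
    time.sleep(12)  # roughly one L1 slot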

Effective alerting separates operational noise from genuine incidents. Configure alerts based on thresholds (e.g., sequencer downtime > 2 minutes), absences (e.g., no new batches for 10 minutes), and anomalies (e.g., a 300% spike in failed transactions). Use a tool like Alertmanager to route alerts to Slack, PagerDuty, or email. For instance, a key alert for a zkSync Era validator would monitor the frequency of zkSync_proof_submissions to Ethereum; a missed window could stall withdrawals. Always include contextual information in alerts, such as the affected chain ID and the last known good block hash, to accelerate diagnosis.

Beyond basic uptime, you must monitor for economic security and data integrity. Track the rollup's bond or stake on the L1 to ensure it's sufficiently collateralized. Monitor the cost and latency of forced inclusion transactions, a user's escape hatch if the sequencer censors them. For optimistic rollups, alert on the challenge period status and any submitted fraud proofs. For ZK rollups, verify the validity proof submission latency and verification success rate. These metrics guard against liveness failures and ensure the system's cryptographic guarantees are functioning as designed.

Finally, implement structured logging and distributed tracing for deep diagnostics. Logs from your rollup node's geth or reth instance should be ingested into a system like Loki or Elasticsearch. Trace individual transaction journeys from user submission through mempool, sequencing, batch creation, L1 submission, and finalization. This trace data is invaluable when debugging issues like a transaction that is finalized on L1 but not appearing in the rollup's state. A robust monitoring setup is not a one-time task; it requires continuous refinement of dashboards and alerts as the network upgrades and usage patterns evolve.

ROLLUP MONITORING

Essential Monitoring Tools

A robust monitoring stack is critical for rollup security and performance. These tools provide the observability needed to track sequencer health, bridge activity, and fraud proofs.


Economic Security Dashboards

Monitor the economic security of the rollup, particularly for Optimistic Rollups. Track the total value bonded in the fraud proof system and the value locked in the bridge. A significant drop in bonded value relative to bridge TVL increases security risk. Dashboards should also track the challenger set's health and activity.

  • Vital Statistic: Ratio of bonded ETH to bridge TVL.
  • Alert Threshold: Bonded value falls below a predefined safety multiple of bridge TVL.
Standard Challenge Window: 7 days
MONITORING FOCUS

Core Metrics by Rollup Type

Key performance indicators and operational data points to track for different rollup architectures.

| Metric / Event | ZK Rollups | Optimistic Rollups | Validiums |
|---|---|---|---|
| State Finality Time | ~10 min | ~7 days | ~10 min |
| Data Availability Layer | On-chain | On-chain | Off-chain (DAC/Celestia) |
| Proof/Dispute Submission Interval | Every batch | Only if fraud is suspected | Every batch |
| Primary Cost Driver | ZK proof generation | L1 calldata & bond posting | Off-chain data & ZK proof |
| Critical Monitoring Alert | Proof verification failure on L1 | State root challenge initiated | Data availability challenge or proof failure |
| Gas Fee Tracking Complexity | Medium (L1 verify + batch) | High (L1 dispute windows) | Medium (L1 verify + DA proof) |
| Sequencer Liveness Check |  |  |  |
| Required Trust Assumption | Cryptographic (validity proof) | Economic (fraud proof bond) | Cryptographic + Data Committee |

ROLLUP MONITORING

Implementation: Setting Up Prometheus and Grafana

A step-by-step guide to deploying a robust monitoring stack for rollup node operators, enabling real-time visibility into system health, performance, and consensus metrics.

Effective rollup node operation requires comprehensive monitoring to ensure high availability, performance stability, and consensus participation. A Prometheus and Grafana stack provides this visibility by collecting, storing, and visualizing time-series metrics. Prometheus acts as the metrics collection and storage engine, pulling data from instrumented services like your rollup client (e.g., OP Stack, Arbitrum Nitro) and the underlying execution and consensus layer clients. Grafana serves as the visualization layer, allowing you to build dashboards that display key performance indicators (KPIs) such as block production latency, transaction throughput, peer counts, and system resource usage.

The first step is installing and configuring Prometheus. After downloading the latest release from the official website, you define a prometheus.yml configuration file. This file specifies which targets to scrape (your nodes) and how often. A crucial configuration is setting up service discovery for dynamic environments, though for a static setup, you list targets directly. For a rollup sequencer, you would typically scrape metrics from ports like :7300 for the rollup client's metrics endpoint, :6060 for the execution client (e.g., Geth), and :8080 for the consensus client (e.g., Lighthouse).
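
Putting those targets together, a static prometheus.yml for a single host might look like this (ports follow the examples above; adjust to your clients' actual flags):

yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'rollup-node'
    static_configs:
      - targets: ['localhost:7300']   # rollup client metrics (e.g., op-node)
  - job_name: 'execution-client'
    static_configs:
      - targets: ['localhost:6060']   # execution client metrics (e.g., Geth)
  - job_name: 'consensus-client'
    static_configs:
      - targets: ['localhost:8080']   # consensus client metrics (e.g., Lighthouse)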

Next, you must expose metrics from your rollup node software. Most modern rollup clients have built-in Prometheus support. For an OP Stack node, you enable metrics by setting the --metrics.enabled flag and specifying a port (--metrics.port=7300). Similarly, ensure your execution and consensus clients are configured to expose their metrics endpoints. The key is verifying that the /metrics HTTP endpoint on each service returns data. You can test this with a simple curl localhost:7300/metrics command. Prometheus will then periodically HTTP GET this endpoint to collect the data.

With data flowing into Prometheus, you deploy Grafana to create actionable dashboards. After installation, you add your Prometheus server as a data source within Grafana's UI. The power lies in crafting PromQL queries to extract meaningful insights. For example, to monitor sequencer health, you might track rollup_sequencer_blocks_proposed to ensure continuous block production, or increase(rollup_sequencer_tx_processed_total[5m]) to visualize transaction throughput. For system health, use node exporter metrics like node_memory_MemAvailable_bytes and node_cpu_seconds_total. Grafana allows you to plot these queries on graphs, set up alert rules based on thresholds, and organize them into a cohesive dashboard.
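
If you prefer configuration-as-code to clicking through the UI, Grafana can also load the Prometheus data source from a provisioning file, roughly like this (the URL assumes Prometheus runs on the same host):

yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true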

To move from monitoring to alerting, configure Prometheus Alertmanager. This involves defining rule files (referenced via rule_files in prometheus.yml) that contain conditions which, when met, trigger alerts. A critical rule for a rollup operator might fire when up{job="rollup-node"} == 0 holds for one minute, which indicates the metrics endpoint is unreachable. Alertmanager then handles routing, grouping, and silencing of these alerts, sending notifications via channels like email, Slack, or PagerDuty. This creates a proactive system where operators are notified of issues like high memory usage, stalled block production, or peer connection loss before they impact network service.
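
In the YAML rule format used by current Prometheus releases, that sequencer-down condition could be written roughly as follows:

yaml
groups:
  - name: rollup-operator
    rules:
      - alert: SequencerDown
        expr: up{job="rollup-node"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Rollup node metrics endpoint has been unreachable for 1 minute"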

Finally, consider advanced configurations for production resilience. Run Prometheus and Grafana in Docker containers or orchestrate them with Kubernetes for easy management and scaling. Implement long-term storage for metrics by integrating Prometheus with remote write targets like Thanos or Cortex, which is essential for analyzing historical performance trends. Regularly update your dashboards and alerting rules to match new versions of your rollup software and incorporate community best practices. A well-tuned monitoring stack is not a set-and-forget tool but a critical component of operational excellence for any rollup node operator.
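
A minimal Docker Compose sketch for the stack (image tags and volume layout are illustrative; pin versions for production):

yaml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"

volumes:
  prometheus-data:
  grafana-data: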

ROLLUP MONITORING

Code Snippets for Custom Metrics

Implement custom monitoring dashboards for rollups using Prometheus, Grafana, and the Chainscore API to track performance, security, and economic health.

Rollups require specialized monitoring beyond standard node metrics. Key custom metrics include sequencer health (block production latency, batch submission success rate), data availability layer status (DA submission latency, blob confirmation time), prover performance (proof generation time, success rate), and economic security (sequencer bond value, fraud proof/challenge window status). These metrics provide early warnings for liveness failures, congestion, and security degradation. Tools like Prometheus for metric collection and Grafana for visualization form the core of a robust monitoring stack.

To collect custom metrics, you need to instrument your rollup node software. Below is a Python example using the prometheus_client library to expose a gauge for sequencer batch submission latency. This script simulates measuring the time between batch creation and its successful inclusion on the L1.

python
from prometheus_client import Gauge, start_http_server
import time
import random

# Define a custom Prometheus Gauge
BATCH_SUBMISSION_LATENCY = Gauge('rollup_batch_submission_latency_seconds',
                                 'Latency of batch submission to L1 in seconds')

def simulate_batch_submission():
    """Simulates a batch submission and records its latency."""
    start_time = time.time()
    # Simulate network delay and L1 confirmation time
    time.sleep(random.uniform(2.0, 10.0))
    latency = time.time() - start_time
    # Set the gauge value
    BATCH_SUBMISSION_LATENCY.set(latency)
    print(f"Batch submitted with latency: {latency:.2f}s")

if __name__ == '__main__':
    # Start Prometheus metrics HTTP server on port 8000
    start_http_server(8000)
    print("Metrics server started on port 8000")
    # Simulate periodic batch submissions
    while True:
        simulate_batch_submission()
        time.sleep(30)

Run this script and point a Prometheus scrape job at http://localhost:8000. The rollup_batch_submission_latency_seconds gauge will then be available for graphing in Grafana.

For L1 state and on-chain data, integrate the Chainscore API. This provides verified metrics like sequencer bond balances, fraud proof window status, and bridge activity without requiring complex event indexing. The following snippet fetches the current economic security metrics for a specified rollup, which you can feed into your Prometheus instance.

javascript
// Node.js example using axios to fetch Chainscore API data
const axios = require('axios');
const { Gauge, Registry } = require('prom-client');

// Create a custom Prometheus registry and gauge
const registry = new Registry();
const sequencerBondGauge = new Gauge({
  name: 'rollup_sequencer_bond_eth',
  help: 'Sequencer bond value in ETH',
  registers: [registry],
});

async function updateChainscoreMetrics() {
  try {
    // Replace with your actual API key and rollup identifier
    const response = await axios.get(
      'https://api.chainscore.dev/v1/rollups/optimism/metrics/economic-security',
      { headers: { 'x-api-key': 'YOUR_API_KEY' } }
    );
    const { sequencerBondEth } = response.data;
    // Update the Prometheus gauge with the live value
    sequencerBondGauge.set(parseFloat(sequencerBondEth));
    console.log(`Updated sequencer bond gauge: ${sequencerBondEth} ETH`);
  } catch (error) {
    console.error('Failed to fetch Chainscore metrics:', error.message);
  }
}

// Update metrics every 60 seconds
setInterval(updateChainscoreMetrics, 60000);

// Expose metrics endpoint for Prometheus
require('http').createServer(async (req, res) => {
  if (req.url === '/metrics') {
    res.setHeader('Content-Type', registry.contentType);
    res.end(await registry.metrics());
  }
}).listen(8080);

In Grafana, create dashboards using your custom Prometheus metrics. Key panels to build include: a time-series graph for rollup_batch_submission_latency_seconds with alerts for spikes over 30 seconds; a stat panel for rollup_sequencer_bond_eth with a warning threshold; and a heartbeat panel for prover status. Use Grafana Alerting to configure notifications to Slack, PagerDuty, or email when critical metrics breach thresholds, such as sequencer downtime or a significant drop in bond value. This end-to-end pipeline—custom export, external API integration, visualization, and alerting—creates a production-grade monitoring system tailored to your rollup's specific risks.

ROLLUP MONITORING

Troubleshooting Common Issues

Common problems encountered when setting up monitoring for rollups, with solutions for developers.

A failing sequencer health check typically indicates a connectivity or state issue. Common causes include:

  • RPC Endpoint Issues: The monitoring service cannot reach your sequencer's RPC endpoint (http://localhost:8545). Verify the node is running and the port is open.
  • Chain ID Mismatch: Your monitoring tool is configured for the wrong chain ID. Confirm the CHAIN_ID in your rollup config matches the one in your monitoring dashboard.
  • Block Production Halted: The sequencer has stopped producing blocks. Check sequencer logs for errors and verify the batcher and proposer components are functioning.
  • High Latency: Response time from the sequencer exceeds the health check threshold (often 5-10 seconds). This can be due to high load or system resource constraints.

First, run a manual curl command to the RPC endpoint: curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' http://localhost:8545. If this fails, the issue is with your node, not the monitor.
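
If the RPC responds but you suspect stalled block production, a quick follow-up check is to compare eth_blockNumber over a short interval; a minimal Python sketch, assuming the same local endpoint:

python
import requests
import time

RPC = "http://localhost:8545"

def block_number():
    payload = {"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 1}
    return int(requests.post(RPC, json=payload, timeout=5).json()["result"], 16)

first = block_number()
time.sleep(30)
second = block_number()

if second == first:
    print(f"Sequencer appears stalled at block {first}; check batcher/proposer logs")
else:
    print(f"Sequencer advanced {second - first} blocks in 30s")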

ROLLUP MONITORING

Recommended Alert Rules and Thresholds

Critical alert configurations for detecting anomalies in sequencer, prover, and bridge operations.

| Alert Type | Severity | Recommended Threshold | Action Required |
|---|---|---|---|
| Sequencer Liveness | Critical | 3 missed slots | Immediate PagerDuty |
| Proving Latency | High | 15 minutes | Investigate within 1 hour |
| State Root Finality Delay | High | 30 minutes | Investigate within 1 hour |
| Bridge Deposit/Withdrawal Failure Rate | High | 5% over 1 hour | Investigate within 2 hours |
| L1 Gas Price Spike | Medium | 200% baseline | Monitor and adjust batch size |
| RPC Error Rate (5xx) | Medium | 2% over 5 minutes | Check node health |
| Batch Submission Cost | Informational | $50 per batch | Review gas optimization |

ROLLUP MONITORING

Frequently Asked Questions

Common questions and troubleshooting steps for developers implementing rollup monitoring and alerting systems.

Monitoring a rollup requires observing two distinct layers. L1 (Ethereum) monitoring tracks the canonical state and security guarantees, focusing on:

  • Batch/State root submissions to the L1 bridge contract.
  • Challenge periods and fraud proof windows.
  • Sequencer status via L1 contract calls.

L2 (Rollup) monitoring tracks the execution environment and user experience, including:

  • Sequencer health (RPC endpoint availability, block production).
  • Transaction lifecycle (queueing, execution, finality).
  • Cross-chain message delivery (L1->L2 and L2->L1).

A complete system must correlate events across both layers to detect failures like a sequencer producing blocks but failing to post them to L1.
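
A rough correlation check for that specific failure mode might look like the following Python sketch. It assumes web3.py, placeholder RPC endpoints, and a rollup whose batch or state submissions emit logs on a known L1 contract; rollups that post batches as calldata to an inbox address need a transaction-based check instead.

python
from web3 import Web3
import time

l2 = Web3(Web3.HTTPProvider("https://your-rollup-rpc.example"))   # placeholder L2 endpoint
l1 = Web3(Web3.HTTPProvider("https://ethereum-rpc.example"))      # placeholder L1 endpoint
BATCH_CONTRACT = "0xYourBatchOrStateCommitmentContract"           # placeholder; rollup-specific

CHECK_INTERVAL = 600  # 10 minutes

l2_start = l2.eth.block_number
l1_start = l1.eth.block_number
time.sleep(CHECK_INTERVAL)

l2_advanced = l2.eth.block_number > l2_start
batch_logs = l1.eth.get_logs({
    "address": Web3.to_checksum_address(BATCH_CONTRACT),
    "fromBlock": l1_start,
    "toBlock": "latest",
})

if l2_advanced and not batch_logs:
    print("WARNING: L2 is producing blocks but no batch activity was observed on L1 in the last 10 minutes")
else:
    print(f"L2 advanced: {l2_advanced}, L1 batch events seen: {len(batch_logs)}")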

IMPLEMENTATION GUIDE

Conclusion and Next Steps

You have now configured a foundational monitoring system for your rollup. This guide covered the essential components: data collection, alerting, and visualization.

A robust monitoring stack is not a one-time setup but an evolving system. Your next step should be to define Service Level Objectives (SLOs) and Service Level Indicators (SLIs). For a rollup, key SLIs include sequencer liveness, batch submission latency, L1 confirmation time, and state root finality. Tools like Prometheus can calculate error budgets and alert you when you're at risk of violating an SLO, shifting monitoring from reactive to proactive management.
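
As one possible starting point, an SLI for sequencer availability can be recorded and compared against an objective directly in Prometheus (the 99.9% target and 30-day window are illustrative):

yaml
groups:
  - name: rollup-slo
    rules:
      # SLI: fraction of time the sequencer metrics endpoint was up over 30 days
      - record: sli:sequencer_availability:ratio_30d
        expr: avg_over_time(up{job="rollup-node"}[30d])
      # Alert when the SLI drops below a 99.9% objective
      - alert: SequencerAvailabilitySLOBreach
        expr: sli:sequencer_availability:ratio_30d < 0.999
        labels:
          severity: warning
        annotations:
          summary: "Sequencer availability has fallen below the 99.9% SLO"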

To deepen your observability, integrate distributed tracing using Jaeger or Tempo. This is critical for debugging cross-layer transactions. You can instrument your sequencer, prover, and node software with OpenTelemetry to trace a user transaction from its submission on L2, through batch creation and proof generation, to its finalization on the L1. Correlating logs, metrics, and traces provides a complete picture of system behavior and failure points.
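
A minimal OpenTelemetry instrumentation sketch in Python, exporting spans to the console for illustration (in production you would swap in an OTLP exporter pointed at Jaeger or Tempo; the span and attribute names are assumptions):

python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Configure a tracer provider for the sequencer service.
provider = TracerProvider(resource=Resource.create({"service.name": "rollup-sequencer"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Trace one batch lifecycle: sequencing, batch creation, L1 submission.
with tracer.start_as_current_span("sequence_batch") as span:
    span.set_attribute("batch.tx_count", 42)      # illustrative attribute
    with tracer.start_as_current_span("build_batch"):
        pass  # batch construction logic goes here
    with tracer.start_as_current_span("submit_to_l1"):
        pass  # L1 submission logic goes here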

Finally, consider automating responses to common alerts. Using the Prometheus Alertmanager with webhook integrations, you can create runbooks that automatically restart a stalled service, failover to a backup sequencer, or post detailed incident summaries to a team channel. The goal is to reduce mean time to resolution (MTTR). Regularly review and test your alerting rules to prevent alert fatigue and ensure they remain relevant as your rollup's architecture evolves.
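
A bare-bones webhook receiver for such automation, sketched with Python's standard library (the alert name and restart command are assumptions; wire it to Alertmanager via webhook_configs and keep any automated action idempotent and logged):

python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed restart command; replace with whatever supervises your sequencer.
RESTART_CMD = ["systemctl", "restart", "rollup-sequencer"]

class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for alert in payload.get("alerts", []):
            name = alert.get("labels", {}).get("alertname")
            if name == "SequencerDown" and alert.get("status") == "firing":
                print("SequencerDown firing; attempting automated restart")
                subprocess.run(RESTART_CMD, check=False)
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9095), AlertHandler).serve_forever()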