Blockchain network monitoring is the practice of collecting, analyzing, and alerting on key performance indicators (KPIs) from nodes, smart contracts, and the network layer. Unlike traditional web services, blockchains introduce unique metrics like finality time, gas usage, and validator set health. Effective monitoring requires setting up signals that provide early warnings for issues such as transaction backlogs, consensus failures, or smart contract vulnerabilities, enabling proactive incident response.
Setting Up Network Monitoring Signals
Learn how to configure and interpret key signals to monitor the health and performance of blockchain networks.
The first step is instrumenting your node software. For Ethereum clients like Geth or Nethermind, you can enable metrics export via their built-in APIs. A common approach is to use Prometheus to scrape these metrics. For example, configuring Geth with --metrics and --metrics.addr 0.0.0.0 exposes a /debug/metrics/prometheus endpoint. You should monitor core signals like chain_head_block for syncing status, p2p_peers for network connectivity, and txpool_pending for mempool size.
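Before wiring up Prometheus, it can help to confirm the metrics endpoint is reachable and returning data. The following is a minimal sketch, assuming Geth is running locally with --metrics enabled on the default port 6060; exact metric names vary between clients and versions.

```typescript
// Minimal check of Geth's Prometheus metrics endpoint (assumes --metrics is enabled).
// Metric names are illustrative and may differ between client versions.
const METRICS_URL = "http://127.0.0.1:6060/debug/metrics/prometheus";

async function readMetric(name: string): Promise<string | undefined> {
  const res = await fetch(METRICS_URL);
  const body = await res.text();
  // Prometheus exposition format: one "metric_name value" pair per line.
  const line = body.split("\n").find((l) => l.startsWith(name + " "));
  return line?.split(" ")[1];
}

async function main() {
  console.log("head block:", await readMetric("chain_head_block"));
  console.log("peers:", await readMetric("p2p_peers"));
  console.log("pending txs:", await readMetric("txpool_pending"));
}

main().catch(console.error);
```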
Beyond node health, you must monitor on-chain activity and smart contracts. This involves tracking events emitted by your dApp's contracts, watching for failed transactions (status 0), and measuring gas consumption patterns. Tools like The Graph for indexing or direct RPC calls to eth_getLogs can be used. Setting alerts for anomalous gas spikes or a sudden drop in successful transactions can signal a contract exploit or a configuration error in your application layer.
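As a rough illustration, the sketch below uses ethers.js v6 to inspect recent transactions sent to a contract and flag any that reverted. RPC_URL and CONTRACT_ADDRESS are placeholders; a production watcher would run continuously and feed an alerting pipeline rather than scan a single block.

```typescript
import { ethers } from "ethers";

// Sketch: flag failed transactions (status 0) sent to a contract and log gas used.
// RPC_URL and CONTRACT_ADDRESS are placeholders; swap in your own endpoint and dApp contract.
const provider = new ethers.JsonRpcProvider(process.env.RPC_URL);
const CONTRACT_ADDRESS = "0xYourContractAddress".toLowerCase();

async function scanBlock(blockNumber: number) {
  // Fetch the block with full transaction objects prefetched.
  const block = await provider.getBlock(blockNumber, true);
  if (!block) return;

  for (const tx of block.prefetchedTransactions) {
    if (tx.to?.toLowerCase() !== CONTRACT_ADDRESS) continue;

    const receipt = await provider.getTransactionReceipt(tx.hash);
    if (!receipt) continue;

    if (receipt.status === 0) {
      console.warn(`failed tx to contract: ${tx.hash}`);
    } else {
      console.log(`ok tx ${tx.hash}, gas used: ${receipt.gasUsed}`);
    }
  }
}

provider.getBlockNumber().then(scanBlock).catch(console.error);
```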
For network-level oversight, public block explorers and specialized services provide vital signals. Monitor metrics like average block time (target is ~12s for Ethereum, ~2s for Polygon PoS) and network hash rate (for PoW) or total stake (for PoS). A significant deviation can indicate network stress. Services like Chainscore Labs aggregate these signals, offering insights into cross-chain bridge volumes, stablecoin depegging events, and overall DeFi ecosystem health, which are crucial for risk management.
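A simple derived signal of this kind is average block time, which can be estimated directly from block timestamps. The sketch below is illustrative only; it assumes an ethers.js v6 provider, a placeholder RPC_URL, and an Ethereum-like chain with a ~12s target.

```typescript
import { ethers } from "ethers";

// Sketch: estimate average block time from the last N block timestamps.
const provider = new ethers.JsonRpcProvider(process.env.RPC_URL);

async function averageBlockTime(sampleSize = 20): Promise<number> {
  const latest = await provider.getBlock("latest");
  if (!latest) throw new Error("could not fetch latest block");
  const earlier = await provider.getBlock(latest.number - sampleSize);
  if (!earlier) throw new Error("could not fetch reference block");
  // Timestamps are in seconds; divide the elapsed time by the number of blocks.
  return (latest.timestamp - earlier.timestamp) / sampleSize;
}

averageBlockTime().then((t) => {
  console.log(`avg block time over sample: ${t.toFixed(2)}s`);
  // Alert if this drifts well above the chain's target (e.g., ~12s on Ethereum mainnet).
  if (t > 15) console.warn("block times above target: possible network stress");
});
```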
Finally, establish a clear alerting and dashboard strategy. Use Grafana to visualize the Prometheus metrics from your nodes. Create dashboards with panels for block propagation time, peer count, and CPU/Memory usage. Set up alert rules in Prometheus Alertmanager or a cloud service to notify your team via Slack or PagerDuty when critical thresholds are breached, such as a node falling behind by more than 100 blocks or peer count dropping below a minimum for network security.
Prerequisites and System Requirements
Before implementing network monitoring signals, ensure your infrastructure meets the necessary technical and operational prerequisites.
Effective blockchain monitoring begins with a reliable data source. You must have access to a full node or a node provider API (like Alchemy, Infura, or QuickNode) for each network you intend to monitor. For real-time signals, a WebSocket connection is essential. Your system should also have a stable internet connection and sufficient storage for logs and indexed data. Basic familiarity with command-line interfaces and your operating system's process management is required.
The core software requirement is a programming environment for your monitoring agent. This guide uses Node.js (v18+) and TypeScript, but the principles apply to Python, Go, or Rust. You will need npm or yarn for package management. Essential libraries include an Ethereum client like ethers.js v6 or viem for interacting with the blockchain, and a framework for structuring your application, such as a simple Express server or a dedicated background job processor like BullMQ.
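As a starting point, a monitoring agent can be as small as a WebSocket subscription to new blocks. The sketch below assumes ethers.js v6 and a placeholder WS_RPC_URL environment variable; viem provides equivalent block watchers.

```typescript
import { ethers } from "ethers";

// Minimal monitoring-agent skeleton: subscribe to new blocks over WebSocket
// and log how far behind real time each block arrives. WS_RPC_URL is a placeholder.
const provider = new ethers.WebSocketProvider(process.env.WS_RPC_URL ?? "wss://example.invalid");

provider.on("block", async (blockNumber: number) => {
  const block = await provider.getBlock(blockNumber);
  if (!block) return;

  const lagSeconds = Math.floor(Date.now() / 1000) - block.timestamp;
  console.log(`block ${blockNumber} observed, ${lagSeconds}s after its timestamp`);

  // A persistent lag here is an early signal that the node or provider is falling behind.
  if (lagSeconds > 60) console.warn("node appears to be lagging the chain head");
});
```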
For monitoring specific on-chain events, you need the Application Binary Interface (ABI) of the smart contracts you're tracking. This is typically found in the project's GitHub repository or on block explorers like Etherscan. You must also identify the precise event signatures and the contract addresses on the relevant networks (Mainnet, Arbitrum, Optimism, etc.). Incorrect addresses or ABIs will result in missed signals.
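With the ABI in hand, your library can derive event topic hashes for filtering logs; without it, you can compute them from the event signature alone. The sketch below shows both approaches with ethers.js v6, using the standard ERC-20 Transfer event purely as an example.

```typescript
import { ethers } from "ethers";

// Derive the topic0 hash for an event so you can filter logs without a full ABI.
// The Transfer signature is just an example; use your own contract's events.
const transferTopic = ethers.id("Transfer(address,address,uint256)");
console.log(transferTopic);
// 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef

// With the ABI available, the same hash (and log decoding) comes from an Interface.
const iface = new ethers.Interface([
  "event Transfer(address indexed from, address indexed to, uint256 value)",
]);
console.log(iface.getEvent("Transfer")?.topicHash === transferTopic); // true
```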
Operational readiness involves setting up alerting channels. Configure a service like Discord Webhooks, Telegram Bot API, or PagerDuty to receive notifications. For persistent storage of alert states or historical data, provision a database. A simple SQLite instance works for development, while production systems may require PostgreSQL or Redis. Ensure your environment variables are securely managed using a .env file or a secrets manager.
Finally, consider the scope of your monitoring. Define clear objectives: are you tracking wallet balances, specific transaction types, contract event emissions, or validator health? Start with a single, high-priority signal (e.g., "large transfer from treasury") to validate your pipeline before scaling to complex multi-contract, multi-chain logic. This iterative approach helps isolate configuration issues early.
Monitoring Architecture Overview
A robust monitoring system is the central nervous system for any Web3 application, transforming raw blockchain data into actionable signals for developers and operators.
At its core, a Web3 monitoring architecture ingests data from multiple sources—blockchain RPC nodes, indexers, and subgraphs—and processes it into a unified stream of events. These events are then evaluated against predefined alert rules and signal definitions to detect specific on-chain conditions. The resulting signals, such as a large token transfer or a failed contract interaction, are delivered to configured endpoints like Slack, Discord, or PagerDuty. This pipeline enables real-time awareness of application health, user activity, and potential security incidents.
The architecture is typically composed of three logical layers. The Data Ingestion Layer is responsible for connecting to data sources and normalizing the information, often using tools like Chainscore's Signal Engine or custom indexers. The Processing & Rules Layer applies logic to this data stream, filtering for relevant transactions and calculating derived metrics. Finally, the Alerting & Delivery Layer formats and routes the resulting alerts. A critical design principle is decoupling; each layer should be independently scalable and the rules should be defined as code (e.g., in TypeScript or YAML) for version control and easy updates.
For example, monitoring a decentralized exchange (DEX) requires tracking several key signals. You would configure rules to watch for:

- Unusual liquidity pool withdrawals
- Failed swap transactions exceeding a rate threshold
- Governance proposal submissions

A rule for a large withdrawal might be defined as: IF event == "Withdraw" AND pool == "USDC/ETH" AND amount > 100000 USDC THEN severity = "high". Implementing this requires listening to the specific pool's contract events via an RPC websocket or a subgraph.
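A minimal listener for that rule might look like the following sketch. The Withdraw event signature and pool address are hypothetical placeholders; substitute the real ABI and address of the pool you monitor.

```typescript
import { ethers } from "ethers";

// Sketch of the large-withdrawal rule above, using a hypothetical Withdraw event
// and a placeholder pool address; adapt the ABI and threshold to the DEX you monitor.
const WS_RPC_URL = process.env.WS_RPC_URL ?? "wss://example.invalid";
const POOL_ADDRESS = "0xUsdcEthPoolAddress"; // placeholder
const THRESHOLD = 100_000n * 10n ** 6n;      // 100,000 USDC (6 decimals)

const provider = new ethers.WebSocketProvider(WS_RPC_URL);
const pool = new ethers.Contract(
  POOL_ADDRESS,
  ["event Withdraw(address indexed user, uint256 amount)"], // hypothetical event shape
  provider,
);

pool.on("Withdraw", (user: string, amount: bigint) => {
  if (amount > THRESHOLD) {
    // In a real pipeline this would enqueue a high-severity alert instead of logging.
    console.warn(`HIGH severity: ${user} withdrew ${ethers.formatUnits(amount, 6)} USDC`);
  }
});
```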
Choosing the right tools depends on your stack and needs. For teams building from scratch, combining The Graph for indexing with an alerting service like Chainscore or Tenderly can accelerate development. For more control, you can run your own event listener using Ethers.js or Viem and process logs in a dedicated service. The key is to start with critical user journeys—like deposit or swap flows—and instrument them first. This ensures you detect outages or exploits that directly impact users before expanding to more granular operational metrics.
Ultimately, a well-architected monitoring system does more than just send alerts; it provides a telemetry backbone for your application. Correlated signals can feed into dashboards to visualize total value locked (TVL) or transaction success rates. They can also trigger automated responses, such as pausing a minting contract if anomalous behavior is detected. By treating on-chain signals as a first-class data source, teams can build more resilient, transparent, and user-aware decentralized applications.
Key Metrics and Signals to Monitor
Effective monitoring requires tracking specific, actionable data points. These are the essential metrics for assessing blockchain network health and performance.
Step 1: Configure Prometheus for Node Metrics
This guide details how to configure Prometheus to scrape and store metrics from your blockchain node, creating the foundation for a robust monitoring system.
Prometheus is a powerful open-source monitoring and alerting toolkit that operates on a pull-based model. It periodically scrapes metrics from configured targets via HTTP. For blockchain node monitoring, you will configure Prometheus to connect to your node's metrics endpoint, which is typically exposed by the client software (e.g., Geth, Erigon, Prysm, Lighthouse). The core configuration file, prometheus.yml, defines these scrape targets, collection intervals, and data retention policies.
A standard prometheus.yml for a local Ethereum execution client like Geth would include a scrape_configs job. The target is the node's IP and port where metrics are exposed (default localhost:6060 for Geth's metrics). The metrics_path is usually /debug/metrics/prometheus. It's crucial to label your jobs clearly (e.g., job: "geth-mainnet") for easy identification in dashboards. Here is a basic configuration snippet:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "geth-execution"
    static_configs:
      - targets: ["localhost:6060"]
        labels:
          job: "geth-mainnet"
          network: "mainnet"
```
After saving the configuration, start Prometheus with the --config.file flag pointing to your YAML file. Verify the setup by navigating to the Prometheus web UI (default http://localhost:9090) and using the Graph or Targets page. The Targets page should show your geth-execution job as UP. If the status is DOWN, check that your node is running with the --metrics and --metrics.addr flags enabled and that no firewall is blocking the port. Successful configuration means Prometheus is now collecting time-series data like geth_chain_head_block, geth_p2p_peers, and process_cpu_seconds_total.
For production environments with multiple nodes, you can expand the static_configs list or use service discovery mechanisms like DNS SRV records or file-based discovery. Setting appropriate scrape_interval values (e.g., 15s for high-priority chains, 60s for archive nodes) balances data granularity with system load. Remember to configure retention policies (--storage.tsdb.retention.time) based on your storage capacity and alerting needs. This configured Prometheus instance becomes the single source of truth for your node's operational metrics, ready to be visualized with Grafana or used for alerting rules.
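Once Prometheus is scraping, you can also verify collection programmatically through its HTTP query API instead of the web UI. This is a rough sketch assuming a local instance on port 9090; adjust the metric and job names to what your setup actually exposes.

```typescript
// Sketch: confirm Prometheus is collecting node metrics by hitting its HTTP query API.
// Assumes a local Prometheus at :9090; metric and job names may differ in your setup.
const PROM_URL = "http://localhost:9090";

async function queryProm(expr: string) {
  const res = await fetch(`${PROM_URL}/api/v1/query?query=${encodeURIComponent(expr)}`);
  const body = await res.json();
  if (body.status !== "success") throw new Error(`query failed: ${expr}`);
  return body.data.result; // array of { metric, value: [timestamp, value] }
}

async function main() {
  const up = await queryProm('up{job=~"geth.*"}');
  console.log("scrape target up:", up[0]?.value?.[1] === "1");

  const head = await queryProm("chain_head_block"); // name depends on client/version
  console.log("latest head block sample:", head[0]?.value?.[1]);
}

main().catch(console.error);
```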
Step 2: Build Grafana Dashboards for Visualization
Transform raw blockchain data into actionable insights by creating custom Grafana dashboards for network monitoring.
Grafana is the industry-standard platform for visualizing time-series data from sources like Prometheus. After setting up your data collection in Step 1, you'll use Grafana to build dashboards that display key network health signals. Start by connecting Grafana to your Prometheus data source using the connection URL (e.g., http://localhost:9090). This allows you to query the metrics you've exposed, such as chainscore_block_height, chainscore_peer_count, and chainscore_transaction_pool_size.
Effective dashboards answer specific operational questions. Create panels for core infrastructure metrics: Node Synchronization Status (tracking chainscore_block_height vs. network tip), Network Connectivity (monitoring chainscore_peer_count for churn), and System Resources (CPU, memory, and disk usage of your node). Use Graph panels for historical trends and Stat panels for current values with color-coded thresholds (e.g., red for < 5 peers). Always set meaningful Y-axis labels and units.
For blockchain-specific monitoring, build panels around transaction flow and consensus. Visualize chainscore_transaction_pool_size to gauge network congestion. Plot chainscore_block_propagation_time_seconds to detect latency issues. A crucial panel is one that tracks finality or confirmation times using metrics related to block finalization events. Use Grafana's Transform feature to calculate derivatives or rates, such as transactions per second (rate(chainscore_transactions_total[5m])).
Implement alerting directly within Grafana to proactively manage your node. Define alert rules on critical panels; for example, trigger a notification if chainscore_peer_count drops below 10 for 5 minutes, or if chainscore_block_height stops increasing, indicating a stall. Configure alert channels to send notifications to Slack, email, or PagerDuty. This turns passive monitoring into an active defense system, ensuring you're notified of issues before they impact service.
Organize your dashboard logically. Group related panels into rows (e.g., "Network Health," "System Performance," "Transaction Metrics"). Use Text panels to add explanations and links to your runbooks. Finally, make your dashboard dynamic by adding template variables at the top. For multi-node setups, create a variable like $instance that lets you filter all panels to view data from a specific node IP or hostname, enabling quick troubleshooting across your deployment.
Step 3: Define Alerting Rules in Prometheus
Learn how to create and manage Prometheus alerting rules to trigger notifications for critical network events, such as missed blocks or validator downtime.
Prometheus alerting rules are defined in YAML files, typically named rules.yml, and loaded via the rule_files directive in your main prometheus.yml configuration. These rules are evaluated at regular intervals, and when a rule's expression evaluates to true for a configured duration, it fires an alert. This alert is then sent to the Alertmanager service for routing and notification. The core structure of a rule file groups related alerts under a groups key, with each group containing a list of individual rules.
A basic alert rule for a Cosmos validator might monitor the cosmos_validator_missed_blocks metric. The rule expression uses Prometheus's PromQL query language to check if the number of missed blocks in the last 10 minutes exceeds a threshold. For example, cosmos_validator_missed_blocks{chain="osmosis"} > 5 would fire if your validator missed more than 5 blocks on the Osmosis chain. The for clause adds a delay, requiring the condition to be true for a period (e.g., 2m) before firing, preventing false positives from transient spikes.
Each rule requires descriptive labels and annotations. Labels like severity: "critical" are used by Alertmanager to route the alert to the correct team or channel. Annotations provide human-readable context in notifications, using Go templating to inject metric values. For instance, summary: "Validator {{ $labels.instance }} is jailed" and description: "Validator on {{ $labels.chain }} has been jailed for {{ $value }} seconds." make alerts actionable. You should define rules for key failure modes: validator jailed, missed blocks, node syncing status, and RPC endpoint health.
After defining your rules, validate the YAML syntax and PromQL expressions using the promtool utility: promtool check rules ./rules.yml. Reload Prometheus to apply the new rules by sending a SIGHUP signal (kill -HUP <pid>) or via the HTTP reload endpoint if enabled. You can then verify active alerts in the Prometheus web UI under the Alerts tab, which shows each rule's current state (inactive, pending, firing).
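The same alert states can also be read from Prometheus's HTTP API, which is convenient for scripts or CI checks. A minimal sketch, assuming a local instance on port 9090:

```typescript
// Sketch: programmatically list active alerts instead of checking the Alerts tab by hand.
async function listActiveAlerts() {
  const res = await fetch("http://localhost:9090/api/v1/alerts");
  const body = await res.json();

  for (const alert of body.data.alerts) {
    // state is "pending" while the `for` duration is running, then "firing".
    console.log(`${alert.labels.alertname} [${alert.state}]`, alert.annotations?.summary ?? "");
  }
}

listActiveAlerts().catch(console.error);
```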
For production reliability, structure your rules into logical groups (e.g., validator_health, node_infrastructure). Use recording rules to pre-compute expensive expressions that are reused across multiple alerts, improving evaluation performance. Always document the purpose and threshold rationale for each alert within the rule file using YAML comments. This practice is crucial for maintaining clarity as your monitoring setup grows in complexity across multiple chains or node types.
Step 4: Route Alerts with Alertmanager and Webhooks
Learn how to configure Prometheus Alertmanager to process, group, and route alerts to external services like Slack, PagerDuty, or custom webhooks for effective incident response.
Prometheus scrapes metrics and evaluates alerting rules, but it does not handle notifications. This is the role of Alertmanager, a separate service that de-duplicates, groups, and routes alerts to various receivers. After Prometheus fires an alert, it pushes it to the Alertmanager's API endpoint, typically http://alertmanager:9093. The core configuration file, alertmanager.yml, defines routing logic, notification templates, and integrations with external systems like Slack, email, PagerDuty, or generic webhooks.
The routing logic is controlled by route and receiver blocks. A top-level route acts as the entry point, with child routes allowing for hierarchical grouping. You can route alerts based on labels like severity, job, or alertname. For example, you might send all severity: critical alerts to a PagerDuty receiver, while routing severity: warning alerts to a Slack channel for visibility. This prevents alert fatigue by ensuring the right notifications reach the right teams.
For Web3 infrastructure, a common setup is to send alerts to a Slack workspace. This requires configuring a Slack receiver in alertmanager.yml with your incoming webhook URL and channel. A more flexible approach is using a generic webhook receiver, which sends a JSON payload to a specified HTTP endpoint. This allows you to build custom integrations, such as triggering an on-chain transaction, updating a dashboard, or paging an on-call engineer via services like Opsgenie.
Here is a basic example of an alertmanager.yml configuration that routes critical RPC node alerts to a webhook and warnings to Slack:
```yaml
route:
  group_by: ['alertname', 'chain']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'slack-warnings'
  routes:
    - match:
        severity: critical
      receiver: 'webhook-critical'

receivers:
  - name: 'slack-warnings'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts-web3'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ .CommonAnnotations.description }}'
  - name: 'webhook-critical'
    webhook_configs:
      - url: 'http://your-service:8080/alert'
        send_resolved: true
```
This configuration groups alerts by alertname and chain, waits 30 seconds to group similar alerts, and sends different severity levels to distinct receivers.
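For the webhook-critical receiver, you need a small HTTP service listening at the configured URL. The following is a rough sketch of such a receiver using Express; the handling logic is a placeholder for whatever custom integration you build.

```typescript
import express from "express";

// Sketch of a generic webhook receiver for Alertmanager (the 'webhook-critical'
// receiver above posts here). The payload follows Alertmanager's webhook format.
const app = express();
app.use(express.json());

app.post("/alert", (req, res) => {
  const { status, alerts = [] } = req.body ?? {};

  for (const alert of alerts) {
    const name = alert.labels?.alertname ?? "unknown";
    const summary = alert.annotations?.summary ?? "";
    console.log(`[${status}] ${name}: ${summary}`);
    // Custom handling goes here: page on-call, update a status page, etc.
  }

  res.sendStatus(200);
});

app.listen(8080, () => console.log("alert webhook listening on :8080"));
```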
After configuring alertmanager.yml, you must update your Prometheus configuration to point to the Alertmanager instance. In your prometheus.yml, add the alerting section:
```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
```
Reload or restart both services. Test the pipeline by triggering a known alert condition, like stopping a monitored Geth node, and verify the notification appears in your configured receiver. Effective alert routing is critical for maintaining high uptime and rapid response to issues in decentralized infrastructure.
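To exercise the routing without waiting for a real incident, you can also push a synthetic alert directly to Alertmanager's v2 API. A minimal sketch, assuming Alertmanager listens on localhost:9093:

```typescript
// Sketch: post a synthetic alert to Alertmanager's v2 API to exercise the
// routing and notification pipeline end to end.
async function sendTestAlert() {
  const res = await fetch("http://localhost:9093/api/v2/alerts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify([
      {
        labels: { alertname: "PipelineTest", severity: "critical", chain: "mainnet" },
        annotations: { summary: "Synthetic alert to verify routing" },
      },
    ]),
  });
  console.log("Alertmanager accepted test alert:", res.ok);
}

sendTestAlert().catch(console.error);
```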
Signal Severity and Response Matrix
Recommended actions based on the severity and type of a triggered network monitoring signal.
| Signal Severity | Example Trigger | Immediate Action | Investigation Priority | Escalation Path |
|---|---|---|---|---|
| Critical | Block production halted for > 5 minutes | PagerDuty alert to on-call engineer | P0 - Highest | Engineering lead & Incident Commander |
| High | Validator slashed or jailed | Slack channel alert, begin diagnostic checks | P1 - High | Protocol team & DevOps |
| Medium | RPC endpoint latency > 2 seconds | Create ticket, review logs and metrics | P2 - Medium | SRE team |
| Low | Peer count drops below threshold | Log event for trend analysis | P3 - Low | Monitoring dashboard |
| Informational | New validator joins the set | None required | P4 - None | N/A |
Troubleshooting Common Monitoring Issues
Resolve common challenges when setting up network monitoring signals for blockchain nodes and infrastructure.
Alerts may fail to fire due to misconfigured thresholds, incorrect data sources, or notification channel issues.
Common causes and fixes:
- Thresholds are too high/low: Verify your alert rules against normal baseline metrics for your node type (e.g., Geth vs Erigon memory usage).
- Data source is down: Confirm your monitoring agent (Prometheus, Datadog agent) is running and scraping metrics from the node's exposed ports (e.g., localhost:6060 for Go metrics). Check firewall rules.
- Alertmanager/PagerDuty misconfiguration: Ensure your notification pipeline (e.g., Alertmanager routes, Slack webhook URLs) is correctly configured and tested. Use amtool to verify the Alertmanager config.
- Silence rules: Check for active silences in Alertmanager that may be suppressing notifications.
Tools and Further Resources
These tools and resources help teams set up reliable network monitoring signals for blockchain infrastructure, smart contracts, and protocol health, with a focus on actionable ways to detect failures, attacks, or abnormal behavior early.
Frequently Asked Questions
Common questions and troubleshooting for setting up real-time monitoring signals for your blockchain nodes and infrastructure.
Monitoring an Ethereum node requires tracking several key health and performance metrics to prevent downtime and ensure reliable RPC service.
Essential signals include:
- Sync Status: Monitor eth_syncing to ensure your node is fully synced with the network. A lagging node provides stale data.
- Peer Count: Track active peer connections (net_peerCount). A low count (< 10) can hinder block propagation and data availability.
- Memory & CPU Usage: High resource consumption can cause crashes, especially during periods of high network activity.
- Disk I/O and Space: Full disks are a common cause of node failure. Monitor write latency and available storage.
- Gas Price & Block Propagation Time: Sudden spikes can indicate network congestion, affecting transaction inclusion for your applications.
Setting alerts for deviations in these metrics is the first step to proactive node management.
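As a starting point, the sketch below polls the sync and peer signals listed above with raw JSON-RPC calls through ethers.js v6. RPC_URL is a placeholder, and note that hosted providers may not expose net_peerCount.

```typescript
import { ethers } from "ethers";

// Sketch: poll basic node-health signals (sync status and peer count) every 30 seconds.
const provider = new ethers.JsonRpcProvider(process.env.RPC_URL);

async function checkNodeHealth() {
  // eth_syncing returns false when fully synced, or a sync-progress object otherwise.
  const syncing = await provider.send("eth_syncing", []);
  if (syncing !== false) console.warn("node is still syncing:", syncing);

  // net_peerCount returns a hex-encoded peer count.
  const peersHex: string = await provider.send("net_peerCount", []);
  const peers = parseInt(peersHex, 16);
  if (peers < 10) console.warn(`low peer count: ${peers}`);

  console.log({ synced: syncing === false, peers });
}

setInterval(() => checkNodeHealth().catch(console.error), 30_000);
```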