A performance benchmarking suite is a critical tool for developers and node operators to quantitatively measure the impact of a network upgrade. Post-upgrade, it's essential to validate that changes to consensus mechanisms, virtual machines, or network layers perform as expected under real-world conditions. This involves tracking key metrics like transaction throughput (TPS), block propagation times, gas fee volatility, and node synchronization speed. Without a baseline and ongoing measurements, performance regressions or bottlenecks can go unnoticed, leading to degraded user experience and potential network instability.
Setting Up a Post-Upgrade Performance Benchmarking Suite
A systematic guide to establishing a robust performance monitoring framework for blockchain networks after a major protocol upgrade.
The core of any benchmarking suite is its telemetry and data collection layer. This typically involves instrumenting nodes (e.g., Geth, Erigon, Prysm) to export metrics to a time-series database like Prometheus. You'll monitor low-level system resources (CPU, memory, disk I/O) alongside chain-specific data such as eth_blockNumber polling latency and eth_getLogs query performance. For Ethereum clients, the built-in metrics endpoints (e.g., --metrics flag in Geth) provide a wealth of data. The goal is to create a dashboard, often using Grafana, that visualizes pre- and post-upgrade performance side-by-side.
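As a minimal sketch of that collection layer, the snippet below scrapes a Geth-style Prometheus metrics endpoint and keeps selected metric families for later comparison. It assumes a node launched with --metrics --metrics.addr 127.0.0.1 --metrics.port 6060; the prefix filter is illustrative, since metric names vary by client and version.

```python
import requests

# Assumed endpoint for a Geth node started with:
#   geth --metrics --metrics.addr 127.0.0.1 --metrics.port 6060
METRICS_URL = "http://127.0.0.1:6060/debug/metrics/prometheus"

def snapshot(prefixes=("chain_", "p2p_")):
    """Fetch Prometheus-format metrics and keep families matching the prefixes."""
    text = requests.get(METRICS_URL, timeout=5).text
    out = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        name, _, value = line.rpartition(" ")
        if name.startswith(prefixes):
            try:
                out[name] = float(value)
            except ValueError:
                pass  # ignore non-numeric samples
    return out

# Take one snapshot pre-upgrade and one post-upgrade, then diff the dicts.
```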
To simulate realistic load, your suite should include a transaction load generator. Tools like Blockchain Performance Benchmarking Framework (BPBF) or custom scripts using web3.js or ethers.js can deploy smart contracts, send token transfers, and execute common DApp interactions. For example, you might script a series of ERC-20 transfers and Uniswap V3 swaps to measure how the upgrade handles complex, gas-intensive operations. It's crucial to run these tests on a testnet or private development network that mirrors the mainnet upgrade before and after the hard fork.
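A hedged web3.py sketch of such a generator is shown below: it fires a batch of ERC-20 transfers with sequential nonces. The RPC URL, private key, and token address are placeholders, and newer eth-account releases rename rawTransaction to raw_transaction.

```python
from web3 import Web3
from eth_account import Account

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))   # hypothetical testnet RPC
acct = Account.from_key("0x" + "11" * 32)                # placeholder funded test key
TOKEN = Web3.to_checksum_address("0x0000000000000000000000000000000000001234")  # placeholder

# Minimal ABI: just the transfer(address,uint256) function
ERC20_ABI = [{"name": "transfer", "type": "function", "stateMutability": "nonpayable",
              "inputs": [{"name": "to", "type": "address"},
                         {"name": "value", "type": "uint256"}],
              "outputs": [{"name": "", "type": "bool"}]}]
token = w3.eth.contract(address=TOKEN, abi=ERC20_ABI)

def send_transfers(recipient, amount, count=50):
    """Sign and broadcast `count` legacy-style transfers back to back."""
    nonce = w3.eth.get_transaction_count(acct.address)
    hashes = []
    for i in range(count):
        tx = token.functions.transfer(recipient, amount).build_transaction({
            "from": acct.address, "nonce": nonce + i, "gas": 60_000,
            "gasPrice": w3.eth.gas_price, "chainId": w3.eth.chain_id,
        })
        signed = acct.sign_transaction(tx)
        # .rawTransaction is .raw_transaction in newer eth-account versions
        hashes.append(w3.eth.send_raw_transaction(signed.rawTransaction))
    return hashes
```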
Establishing a reliable baseline is the most important preparatory step. You must collect performance data for a significant period (e.g., 1-2 weeks) on the pre-upgrade software version under various network conditions. This baseline becomes your reference point. After the upgrade activates, you run identical load tests and compare results. Look for statistically significant changes in p95/p99 latency, error rates, and resource utilization. A 10% increase in block processing time or a spike in orphaned blocks could indicate a critical issue requiring immediate attention from client teams.
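A standard-library helper like the one below can compute those tail percentiles from raw latency samples and report the pre/post delta; it assumes you record samples in milliseconds, but any consistent unit works.

```python
import statistics

def percentile(samples: list[float], k: int) -> float:
    """k-th percentile (k in 1..99) from raw samples."""
    return statistics.quantiles(samples, n=100)[k - 1]

def compare(baseline: list[float], post_upgrade: list[float]) -> None:
    for k in (95, 99):
        b, p = percentile(baseline, k), percentile(post_upgrade, k)
        print(f"p{k}: {b:.1f} -> {p:.1f} ms ({(p - b) / b * 100:+.1f}%)")

# compare(baseline_latencies_ms, post_upgrade_latencies_ms)
```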
Finally, automate the entire process. Use CI/CD pipelines (GitHub Actions, GitLab CI) to schedule regular benchmark runs and alert on anomalies. The suite should produce clear, actionable reports. By integrating performance benchmarking into your development and operations workflow, you transition from reactive troubleshooting to proactive network health management, ensuring upgrades deliver their intended scalability and efficiency improvements.
Before analyzing the impact of a network upgrade, you need a robust benchmarking environment. This guide covers the essential tools and setup required to measure performance changes accurately.
A performance benchmarking suite is a collection of scripts, tools, and configurations designed to execute a standardized set of operations against a blockchain node. Its primary function is to establish a baseline before an upgrade and measure key metrics—like transaction throughput, block propagation time, and state sync speed—after the upgrade is applied. For Ethereum clients like Geth or Erigon, this involves running a local testnet, deploying a suite of smart contracts, and simulating user load. You'll need a development environment with Docker, Go 1.21+ (for client compilation), and Python 3.10+ for scripting and data analysis.
The core of your setup is the node software itself. You must be able to compile the client from source for both the pre-upgrade and post-upgrade versions. For example, to benchmark Geth, clone the repository and check out the specific git tags (e.g., v1.13.0 for a pre-Dencun baseline and v1.13.14, the Dencun-ready release, for post-Dencun). Use make geth to build. It's critical to run these nodes with identical hardware specifications and initial state to ensure a fair comparison. A common practice is to use a pre-mined genesis block and a standardized set of initial accounts funded with test ETH to eliminate variables.
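A small driver script can make the dual builds reproducible; this sketch assumes git and make are on PATH, and the tag names mirror the example above.

```python
import pathlib
import subprocess

REPO = "https://github.com/ethereum/go-ethereum.git"
TAGS = ["v1.13.0", "v1.13.14"]  # pre- and post-upgrade tags from the example above

def build(tag: str) -> pathlib.Path:
    """Clone the tag into its own directory and build the geth binary."""
    dest = pathlib.Path(f"geth-{tag}")
    if not dest.exists():
        subprocess.run(["git", "clone", "--depth", "1", "--branch", tag, REPO, str(dest)],
                       check=True)
    subprocess.run(["make", "geth"], cwd=dest, check=True)
    return dest / "build" / "bin" / "geth"

binaries = {tag: build(tag) for tag in TAGS}
print(binaries)  # point your benchmark harness at each binary in turn
```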
Your benchmarking scripts should automate the test lifecycle. A typical flow involves: 1) Starting the node with a clean data directory, 2) Deploying a set of benchmark contracts (e.g., an ERC-20, a Uniswap-like pool, and a complex NFT minting contract), 3) Executing a predefined load test (sending batches of transactions via eth_sendRawTransaction), and 4) Collecting metrics from the node's RPC endpoints (eth_blockNumber, net_peerCount, debug_metrics). Tools like Grafana and Prometheus are invaluable for visualization, while a Python library like web3.py handles RPC interaction and transaction signing.
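Step 4 of that flow can be as simple as a raw JSON-RPC poller; the endpoint and sampling cadence below are assumptions, and the samples can be written straight to Prometheus or a CSV for Grafana.

```python
import time

import requests

RPC = "http://127.0.0.1:8545"  # hypothetical node under test

def rpc(method: str, params=None):
    resp = requests.post(RPC, timeout=10, json={
        "jsonrpc": "2.0", "id": 1, "method": method, "params": params or []})
    return resp.json()["result"]

def collect(duration_s=60, interval_s=1.0):
    """Record head height, peer count, and RPC latency once per interval."""
    samples, end = [], time.time() + duration_s
    while time.time() < end:
        t0 = time.perf_counter()
        head = int(rpc("eth_blockNumber"), 16)
        latency_ms = (time.perf_counter() - t0) * 1000
        peers = int(rpc("net_peerCount"), 16)
        samples.append({"t": time.time(), "head": head,
                        "peers": peers, "rpc_latency_ms": latency_ms})
        time.sleep(interval_s)
    return samples
```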
Finally, establish your key performance indicators (KPIs). These should be specific and measurable. Common KPIs include: Transactions Per Second (TPS) measured during a sustained load, average block gas usage, time-to-finality for a batch of transactions, and node synchronization time from genesis. Document your entire environment—including OS version, CPU, RAM, and SSD specs—in a README.md. This reproducibility is essential for validating your results and for the community to verify your findings. With this suite ready, you can confidently quantify the real-world impact of protocol changes.
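For the TPS KPI in particular, one workable definition, sketched below with web3.py against a hypothetical node, is transactions divided by the timestamp span of a recent block window:

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # hypothetical node

def sustained_tps(window_blocks: int = 100) -> float:
    """Average TPS over the most recent `window_blocks` blocks."""
    head = w3.eth.get_block("latest")
    tail = w3.eth.get_block(head["number"] - window_blocks)
    tx_count = sum(len(w3.eth.get_block(n)["transactions"])
                   for n in range(tail["number"] + 1, head["number"] + 1))
    return tx_count / (head["timestamp"] - tail["timestamp"])

print(f"sustained TPS: {sustained_tps():.1f}")
```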
A post-upgrade benchmarking suite is a collection of automated tests and metrics designed to quantify the performance impact of a protocol change. Unlike simple unit tests, it measures system-level behavior under realistic loads, focusing on metrics like transaction throughput, block propagation time, gas usage efficiency, and node resource consumption (CPU, memory, disk I/O). The goal is to establish a performance baseline before an upgrade and compare it against results from the new client or protocol version, providing data-driven evidence of improvements or regressions.
The first step is defining your key performance indicators (KPIs). For a Layer 1 like Ethereum post-Merge, critical KPIs include finality time, sync speed, and validator performance. For an EVM-compatible chain, you might benchmark eth_call latency or state growth. For a rollup, prove time and L1 data submission cost are paramount. Tools like Hyperledger Caliper or custom scripts using the chain's JSON-RPC API are used to simulate load. It's essential to run benchmarks in an environment that mirrors mainnet conditions, using similar hardware specs and network topology.
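For the eth_call latency KPI mentioned above, a sketch like the following times repeated calls and reports tail percentiles; the node URL is an assumption, and the totalSupply() calldata in the usage comment is just an illustration.

```python
import statistics
import time

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # hypothetical node under test

def call_latency(call_obj: dict, runs: int = 200) -> dict:
    """Wall-clock eth_call latency in milliseconds at p50/p95/p99."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        w3.eth.call(call_obj)
        samples.append((time.perf_counter() - t0) * 1000)
    qs = statistics.quantiles(samples, n=100)
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

# Example: totalSupply() (selector 0x18160ddd) on a hypothetical token address
# print(call_latency({"to": "0x0000000000000000000000000000000000001234",
#                     "data": "0x18160ddd"}))
```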
Implementing the suite involves writing reproducible test scenarios. A common pattern is to use a framework like Python's asyncio or Go to create a load generator that sends a mix of transactions—simple transfers, contract deployments, and complex smart contract interactions—to a local testnet or a dedicated benchmarking cluster. You should instrument your node clients (e.g., Geth, Erigon, Besu) to export detailed metrics, often via Prometheus, and visualize results with Grafana. This setup allows you to capture not just averages but also percentiles (p95, p99) for latency, which are crucial for understanding tail performance.
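A skeleton of such an asyncio load generator, assuming aiohttp and a list of raw transactions pre-signed offline, could look like this; concurrency is capped with a semaphore so the node isn't overwhelmed by connection churn.

```python
import asyncio
import itertools

import aiohttp

RPC = "http://127.0.0.1:8545"   # hypothetical benchmarking node
RAW_TXS: list[str] = []         # fill with pre-signed raw tx hex strings
_ids = itertools.count(1)

async def send_one(session: aiohttp.ClientSession, raw: str):
    payload = {"jsonrpc": "2.0", "id": next(_ids),
               "method": "eth_sendRawTransaction", "params": [raw]}
    async with session.post(RPC, json=payload) as resp:
        return (await resp.json()).get("result")  # tx hash, or None on error

async def run_load(concurrency: int = 20):
    sem = asyncio.Semaphore(concurrency)
    async with aiohttp.ClientSession() as session:
        async def bounded(raw: str):
            async with sem:
                return await send_one(session, raw)
        return await asyncio.gather(*(bounded(r) for r in RAW_TXS))

# hashes = asyncio.run(run_load())
```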
After executing the benchmark, analysis is key. Compare the pre-upgrade and post-upgrade dashboards. Look for significant deviations in your KPIs. A 10% increase in blocks processed per second is a positive indicator; a 20% rise in memory usage might signal a regression. It's critical to run the suite multiple times to account for variance and ensure results are statistically significant. Findings should be documented in a report that clearly states the testing methodology, environment, raw data, and conclusions, providing transparency for core developers and the community.
Step 1: Establish a Pre-Upgrade Baseline
Before deploying any network upgrade, you must create a controlled environment to measure its impact. This step defines the process for setting up a reproducible performance benchmarking suite against the current, stable network state.
A pre-upgrade baseline is a comprehensive set of performance metrics captured from your node or application running on the current, stable version of the blockchain. This dataset serves as the objective control group. Without it, any performance changes post-upgrade are anecdotal and unquantifiable. Key metrics to capture include: block_propagation_time, state_sync_duration, transaction_throughput (TPS), average_gas_used_per_block, peer_count, and memory/cpu utilization. Tools like Prometheus for metrics collection and Grafana for visualization are industry standards for this task.
To ensure reproducibility, your benchmarking environment must be isolated and consistent. Use infrastructure-as-code tools like Docker Compose or Terraform to spin up a local testnet that mirrors your production configuration. This includes the same node client (e.g., Geth, Erigon), database settings, and network topology. Run a series of standardized load tests against this environment using a tool like k6 or a custom script that replays historical transaction patterns. Capture metrics over a significant period (e.g., 1-2 hours under load) to account for natural variance.
The output of this step is not just raw numbers, but a documented benchmark profile. Store the results—including the exact software versions, configuration files, and load test scripts—in a repository like GitHub. This creates a versioned benchmark that any team member can re-run. For example, your baseline for an Ethereum client upgrade might be tagged as baseline-geth-v1.13.0. This rigor transforms upgrade analysis from guesswork into a data-driven engineering process, providing clear before/after comparisons for performance, resource usage, and network stability.
Step 2: Execute the Upgrade and Re-run Tests
After deploying your upgrade, the next critical step is to validate its performance against the previous version. This guide details how to execute a controlled upgrade and run a comprehensive benchmarking suite to measure key metrics like gas costs, transaction throughput, and latency.
Begin by executing the upgrade transaction on your testnet or local fork. For a standard OpenZeppelin TransparentUpgradeableProxy pattern, you would call the upgradeTo or upgradeToAndCall function on the proxy, pointing it to the new implementation contract's address. It is crucial to verify the upgrade was successful by checking the proxy's implementation slot or calling a view function on the proxy to confirm the new logic is active. Always perform this step in a forked mainnet environment to simulate real-world state and interactions before proceeding to a live network.
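One way to perform that verification is to read the EIP-1967 implementation slot directly; the web3.py sketch below assumes a local fork RPC, and the proxy/implementation addresses are placeholders.

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # e.g., a local mainnet fork

# bytes32(uint256(keccak256("eip1967.proxy.implementation")) - 1), per EIP-1967
IMPL_SLOT = 0x360894A13BA1A3210667C828492DB98DCA3E2076CC3735A920A3CA505D382BBC

def implementation_of(proxy: str) -> str:
    """Return the implementation address stored in the proxy's EIP-1967 slot."""
    raw = w3.eth.get_storage_at(Web3.to_checksum_address(proxy), IMPL_SLOT)
    return Web3.to_checksum_address("0x" + raw.hex()[-40:])

# After upgradeTo, assert the slot points at the new logic contract:
# assert implementation_of(PROXY_ADDR) == NEW_IMPL_ADDR
```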
With the new version live in your test environment, re-run your entire test suite. This includes unit tests, integration tests, and, most importantly, your newly created performance benchmarks. Focus on identifying any functional regressions first. Use tools like Hardhat or Foundry to run tests; for example, forge test --match-contract MyContractUpgradeTest can isolate your upgrade-specific test cases. Pay close attention to state migration—ensure that all existing user data and contract storage are correctly preserved and accessible through the new logic, as this is a common source of post-upgrade bugs.
The core of this step is analyzing the output from your performance benchmarking suite. Compare the results from the pre-upgrade and post-upgrade runs. Key metrics to scrutinize include:
- Gas Costs: Has the average gas consumption for core functions increased significantly?
- Throughput: Can the contract handle the same transaction load under stress tests?
- Latency: Are there any new bottlenecks in transaction finality?

A 5-10% variance might be acceptable, but a 50% increase in gas costs for a frequently called function is a critical red flag that requires immediate investigation and potential optimization.
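As a rough sketch of that gas-cost comparison, you can estimate gas for identical calldata against two forked nodes, one running each version; the URLs, addresses, and the 10% threshold below are assumptions.

```python
from web3 import Web3

pre = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))   # fork pinned before the upgrade
post = Web3(Web3.HTTPProvider("http://127.0.0.1:8546"))  # fork with the upgrade applied

def gas_delta_pct(tx: dict) -> float:
    """Percent change in estimated gas for the same call on both forks."""
    g_pre, g_post = pre.eth.estimate_gas(tx), post.eth.estimate_gas(tx)
    return (g_post - g_pre) / g_pre * 100

# Hypothetical core-function check (addresses/calldata are placeholders):
# delta = gas_delta_pct({"from": USER, "to": PROXY_ADDR, "data": CALLDATA})
# assert delta <= 10, f"gas regression on core function: {delta:+.1f}%"
```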
If benchmarks reveal performance degradation, you must diagnose the root cause. Use Ethereum execution traces (debug_traceTransaction) or Foundry's -vvvv verbose logging to profile the new contract's execution. Common issues include: introducing new storage reads/writes in hot paths, increased complexity in algorithms, or unintended loops. Compare the opcode execution between the two versions. The goal is to understand if the regression is a necessary trade-off for new features or an optimization oversight that can be fixed.
Document all findings in a formal upgrade report. This should include a table comparing key performance indicators (KPIs), any discovered regressions with their severity, and the decision rationale for proceeding or rolling back. This report is essential for on-chain governance proposals if your protocol uses a DAO, or for internal security review. Only after performance is validated and documented should the upgrade be considered for mainnet deployment.
Key Performance Metrics to Compare
Essential on-chain and node-level metrics to measure before and after a protocol upgrade.
| Metric | Pre-Upgrade Baseline | Post-Upgrade Result | Target Threshold |
|---|---|---|---|
| Average Block Time | 2.1 sec | | < 2.5 sec |
| Peak TPS (Sustained 1 min) | 450 | | |
| Average Tx Finality Time | 4.8 sec | | < 5 sec |
| State Sync Duration (Full Node) | 6.5 hours | | < 8 hours |
| RPC Endpoint P99 Latency | 120 ms | | < 150 ms |
| Gas Cost for Standard Swap | $1.20 | | Within ±10% of baseline |
| Node Memory Footprint (Archive) | 1.8 TB | | < 2.0 TB |
| Consensus Participation Rate | 98.5% | | |
Step 3: Analyze Results and Identify Regressions
After executing your benchmark suite, the critical phase begins: interpreting the data to distinguish normal variance from significant performance regressions.
Start by comparing the new benchmark results against your established baseline. Focus on key metrics like average transaction latency, throughput (TPS), and gas consumption per operation. A simple percentage change calculation is your first filter. For example, if your baseline shows an average block processing time of 2.1 seconds and the post-upgrade result is 2.5 seconds, that's a ~19% increase warranting investigation. Use visualization aids like the reports generated by hardhat-gas-reporter or custom Grafana dashboards to spot trends and outliers at a glance.
Not all deviations are regressions. You must account for environmental variance and statistical significance. Run your benchmarks multiple times to establish a confidence interval. A tool like benchmark.js provides statistical analysis to determine if a change is meaningful. Look for patterns: is the slowdown consistent across all test cases, or isolated to specific contract functions like complex SSTORE operations? Correlate performance dips with changes in the code, such as new opcode usage or increased storage writes, which are common culprits after EVM upgrades.
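benchmark.js covers JavaScript suites; for Python-side analysis an equivalent check, assuming SciPy is installed, is Welch's t-test across repeated runs:

```python
import statistics

from scipy import stats  # assumed dependency

def significant_change(baseline: list[float], candidate: list[float],
                       alpha: float = 0.05) -> bool:
    """Welch's t-test: True if the difference between runs is unlikely to be noise."""
    _, p_value = stats.ttest_ind(baseline, candidate, equal_var=False)
    print(f"baseline mean={statistics.mean(baseline):.3f}, "
          f"candidate mean={statistics.mean(candidate):.3f}, p={p_value:.4f}")
    return p_value < alpha

# Feed in per-run results (e.g., mean block processing time from 10 runs each).
```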
For smart contract upgrades, gas profiling is essential. Compare the gas cost of each function before and after the upgrade. A regression might manifest in a frequently called function, drastically increasing user costs. Use eth-gas-reporter output or trace transactions with debug_traceTransaction to pinpoint expensive opcodes. Document any regression with a clear report: include the metric affected, the measured delta, the test case, and a link to the relevant code change. This creates an actionable ticket for developers.
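A minimal profiling sketch using debug_traceTransaction's default struct logger (available on Geth-style nodes with the debug namespace enabled) aggregates reported gas per opcode; the per-step gasCost accounting for call-type opcodes is approximate, so treat this as a relative comparison tool.

```python
from collections import Counter

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # node with --http.api including debug

def gas_by_opcode(tx_hash: str, top: int = 10):
    """Sum per-step gasCost by opcode from the struct-logger trace."""
    resp = w3.provider.make_request("debug_traceTransaction", [tx_hash, {}])
    costs = Counter()
    for step in resp["result"]["structLogs"]:
        costs[step["op"]] += step["gasCost"]
    return costs.most_common(top)

# Run against the same call pre- and post-upgrade to spot hotspots
# such as extra SSTOREs introduced by the new implementation.
```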
Finally, establish your regression threshold policy. Define acceptable margins—e.g., "a <5% latency increase is tolerated, but a >10% gas increase on core functions is a blocker." This policy turns subjective judgment into a pass/fail gate for your upgrade. Integrate this analysis into your CI/CD pipeline; tools like GitHub Actions can be configured to fail a build if benchmarks exceed these thresholds, ensuring performance is a continuous requirement, not an afterthought.
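The gate itself can be a short script whose exit code fails the pipeline; the report filenames, metric keys, and thresholds below are assumptions about how your suite serializes results.

```python
import json
import sys

BASELINE, CANDIDATE = "baseline.json", "candidate.json"    # hypothetical report files
THRESHOLDS = {"p99_latency_ms": 5.0, "core_fn_gas": 10.0}  # max allowed % increase

def main() -> None:
    base = json.load(open(BASELINE))
    cand = json.load(open(CANDIDATE))
    failed = False
    for metric, limit in THRESHOLDS.items():
        delta = (cand[metric] - base[metric]) / base[metric] * 100
        ok = delta <= limit
        failed = failed or not ok
        print(f"{metric}: {delta:+.1f}% (limit +{limit}%) {'ok' if ok else 'FAIL'}")
    sys.exit(1 if failed else 0)  # non-zero exit fails the CI job

if __name__ == "__main__":
    main()
```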
Troubleshooting Common Benchmarking Issues
Common errors and solutions when establishing a performance benchmarking framework after a network or client upgrade.
Inconsistent results often stem from uncontrolled variables. After an upgrade, ensure your test environment is isolated and identical for each run.
Key factors to control:
- Network State: Run benchmarks on a private testnet or a dedicated, synchronized archive node to avoid mainnet congestion variability.
- Node Configuration: Use the exact same CLI flags, JWT secret path, and data directory for all tests. A change in `--max-peers` or cache sizes can drastically alter performance.
- System Resources: Monitor CPU throttling, memory usage, and disk I/O on your benchmarking machine. Use tools like `docker stats` or `htop` to ensure consistent resource availability.
- Warm-up Period: Nodes need time to populate caches. Perform several "warm-up" transactions or block imports before starting the official measurement period.
Frequently Asked Questions
Common questions and troubleshooting steps for setting up a robust performance benchmarking suite after a network upgrade or hard fork.
A post-upgrade benchmarking suite is a collection of automated tests and metrics collectors designed to objectively measure a blockchain node's performance before and after a major network upgrade. Its primary purpose is to quantify the impact of the upgrade on critical system parameters, ensuring changes meet their intended goals and do not introduce regressions.
Key objectives include:
- Measuring throughput: Transactions per second (TPS), block propagation time.
- Assessing resource usage: CPU, memory, disk I/O, and network bandwidth under load.
- Validating consensus changes: Finality time, fork choice rule efficiency, sync speed.
- Providing a baseline for future optimizations and identifying performance bottlenecks introduced by new features like EIP-4844 blobs or new precompiles.
Conclusion and Next Steps
This guide has outlined the essential components for building a performance benchmarking suite after a major protocol upgrade. The next step is to operationalize these concepts.
You now have the blueprint for a robust post-upgrade benchmarking suite. The core workflow involves:
- Establishing a baseline using pre-upgrade data from tools like Tenderly or Blocknative.
- Defining key metrics such as transaction latency, gas costs, and block propagation times.
- Automating data collection with scripts that query RPC endpoints and archive nodes.
- Implementing alerting for metric deviations beyond predefined thresholds using PagerDuty or Opsgenie.
For a concrete next step, create a simple script to track a critical metric. For example, use the Ethers.js library to measure the average gas cost of a common function call, like a transfer() on an upgraded ERC-20 contract, over the last 100 blocks. Compare this to your historical baseline stored in a database like TimescaleDB. This initial script forms the foundation of your automated monitoring system and validates your data pipeline.
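The paragraph above suggests Ethers.js; to keep one language across this guide, here is an equivalent web3.py sketch that finds transfer() calls by selector in recent blocks and averages their receipt gas. The token address is a placeholder, and scanning 100 blocks with full transactions will be slow against remote RPC providers.

```python
from statistics import mean

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # hypothetical node
TOKEN = Web3.to_checksum_address("0x0000000000000000000000000000000000001234")  # placeholder
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")  # transfer(address,uint256)

def avg_transfer_gas(blocks: int = 100):
    """Mean gasUsed of transfer() calls to TOKEN over the last `blocks` blocks."""
    head = w3.eth.block_number
    gas_used = []
    for n in range(head - blocks + 1, head + 1):
        block = w3.eth.get_block(n, full_transactions=True)
        for tx in block["transactions"]:
            if tx["to"] == TOKEN and bytes(tx["input"])[:4] == TRANSFER_SELECTOR:
                gas_used.append(w3.eth.get_transaction_receipt(tx["hash"])["gasUsed"])
    return mean(gas_used) if gas_used else None

# Store the result in your baseline database (e.g., TimescaleDB) for comparison.
```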
To deepen your analysis, integrate with specialized tools. Use Blockprint to analyze validator performance post-merge, or leverage Erigon's detailed tracing features for deep state access analysis. Consider joining protocol-specific dev forums or Discord channels to share findings and benchmarks with other node operators, turning your local suite into a contributor to collective network health intelligence.
Finally, document your methodology and findings. A well-maintained README.md in your benchmarking repository should explain the setup, metric definitions, and how to interpret the results. This documentation is crucial for team onboarding and provides a clear audit trail. Regular reporting, even if just internal, ensures performance remains a continuous priority and not just a post-upgrade checklist item.