Setting Up ZK Framework Proof Generation Benchmarks

A practical guide for developers to measure and compare proof generation times, memory usage, and circuit constraints across popular ZK frameworks.
DEVELOPER GUIDE

Introduction to ZK Proof Benchmarking

A practical guide to setting up and running performance benchmarks for zero-knowledge proof systems, covering key metrics, tools, and methodologies.

Zero-knowledge (ZK) proof generation is computationally intensive. Benchmarking is the systematic process of measuring the performance of a ZK proving system—typically a prover—under controlled conditions. Key metrics include proving time, memory consumption, and proof size. For developers working with frameworks like Circom, Halo2, or Noir, establishing a reliable benchmarking setup is essential for optimizing circuit design, selecting hardware, and estimating real-world operational costs. This guide focuses on the practical steps for creating reproducible benchmarks.

The first step is defining your benchmarking environment. Consistency is critical; results are only meaningful when compared under identical conditions. You must document and control for variables like the CPU architecture (e.g., x86 vs. ARM), clock speed, available RAM, and software versions of the proving framework and its dependencies (e.g., specific commits of arkworks libraries). Using containerization with Docker or specifying exact version pins in a Cargo.toml or package.json file helps ensure reproducibility across different machines and over time.

Next, you need to instrument your code to capture metrics. Most ZK frameworks provide hooks or can be wrapped with timing functions. A basic approach in Rust for a Halo2 prover might use std::time::Instant. For more detailed resource profiling, tools like perf on Linux or heaptrack can measure CPU cycles and memory allocation. It's also important to benchmark across a range of circuit scales (constraint counts) to understand how performance scales, which is crucial for applications expecting variable input sizes.
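As a minimal illustration, the sketch below wraps a placeholder generate_proof function (a stand-in for whatever proving call your framework exposes) with std::time::Instant, repeats it several times, and reports the mean and standard deviation:

```rust
use std::time::Instant;

// Placeholder for whatever proving call your framework exposes
// (e.g. create_proof in halo2_proofs, or a prover invoked over FFI/CLI).
fn generate_proof() {
    // ... build the witness and run the prover here ...
}

fn main() {
    let iterations = 10;
    let mut timings_ms = Vec::with_capacity(iterations);

    for _ in 0..iterations {
        let start = Instant::now();
        generate_proof();
        timings_ms.push(start.elapsed().as_secs_f64() * 1000.0);
    }

    // Aggregate over runs to smooth out system noise.
    let mean = timings_ms.iter().sum::<f64>() / timings_ms.len() as f64;
    let var = timings_ms.iter().map(|t| (t - mean).powi(2)).sum::<f64>() / timings_ms.len() as f64;
    println!("proving time: {mean:.1} ms +/- {:.1} ms over {iterations} runs", var.sqrt());
}
```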

Finally, analyze and present your data. Raw timings should be aggregated over multiple runs (e.g., 10 iterations) to account for system noise. Calculate the mean and standard deviation. Visualizing the data with graphs—such as proving time vs. number of constraints—makes trends clear. Share your methodology and raw results publicly, for example in a GitHub repository's BENCHMARKS.md file, to contribute to community knowledge and allow for peer verification of performance claims.

ZK FRAMEWORK BENCHMARKING

Prerequisites and System Requirements

Before running performance benchmarks for zero-knowledge proof generation, you must configure your development environment with the correct tools and hardware. This guide covers the essential software, system specifications, and initial setup steps.

Benchmarking ZK proof systems requires a consistent and controlled environment to produce meaningful, reproducible results. The core prerequisites are a functional Rust toolchain and a working installation of the specific ZK framework you intend to test, such as Halo2, Plonky2, or gnark. You will also need Git for cloning repositories and a package manager like Homebrew (macOS) or apt (Linux) for installing system dependencies. Ensure your development environment is stable before proceeding.

System hardware directly impacts proof generation time and memory usage. For accurate benchmarks, a machine with a modern multi-core CPU (e.g., Intel i7/i9 or AMD Ryzen 7/9) and at least 16GB of RAM is recommended. Proof generation, especially for large circuits, is computationally intensive and can consume significant memory. Using an SSD over an HDD will also improve performance for I/O-heavy operations. Note that results will vary significantly between consumer laptops and dedicated server hardware.

The primary software requirement is the Rust programming language and its build tool, cargo. Install them via rustup, which manages toolchain versions. Some frameworks or their benchmark suites rely on unstable features and require a nightly toolchain; check the project's documentation, and if needed set it with rustup default nightly and update with rustup update. Verify your installation with cargo --version and rustc --version.

Next, install the benchmarking tool itself. Most Rust-based ZK frameworks use Criterion.rs, a statistics-driven benchmarking library for Rust. Add it to your project's Cargo.toml under [dev-dependencies] with criterion = "0.5". You will also need the framework's specific dependencies and, often, Python 3 with packages like pandas and matplotlib for post-processing and visualizing benchmark results from Criterion's output.
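A minimal Criterion harness, assuming a benches/proving.rs file and a matching [[bench]] entry with harness = false in Cargo.toml, might look like the sketch below; prove_fixture is a placeholder for the routine you want to measure:

```rust
// benches/proving.rs -- assumes `criterion = "0.5"` under [dev-dependencies]
// and `[[bench]] name = "proving", harness = false` in Cargo.toml.
use criterion::{criterion_group, criterion_main, Criterion};

// Placeholder for the proving routine you want to measure.
fn prove_fixture() {
    // call into your ZK framework here
}

fn bench_proving(c: &mut Criterion) {
    c.bench_function("prove_fixture", |b| b.iter(prove_fixture));
}

criterion_group!(benches, bench_proving);
criterion_main!(benches);
```

Criterion writes its reports under target/criterion/, which is what the Python post-processing step above can consume.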

Finally, clone the benchmark suite for your chosen framework. For example, to benchmark Halo2, you would clone the official halo2 repository. Navigate to the benches directory within the repo. Run an initial build with cargo build --release to compile all dependencies. Your first proof generation benchmark can then be executed with a command like cargo bench --bench <benchmark_name>. This establishes your baseline performance profile.

PERFORMANCE ANALYSIS

Key Benchmarking Metrics for ZK Proofs

A guide to the essential metrics for evaluating the performance of zero-knowledge proof systems, from proof generation time to memory consumption.

Benchmarking a zero-knowledge (ZK) proof system requires measuring several interdependent metrics to understand its practical viability. The most critical is proof generation time, which measures how long it takes to create a proof for a given computation. This is often the primary bottleneck for user-facing applications. Closely related is prover memory consumption, as complex circuits can require gigabytes of RAM, limiting the hardware on which proofs can be generated. These two metrics directly impact the cost and latency of any ZK application, from private transactions to verifiable machine learning.

On the verifier's side, verification time and the size of the proof itself are paramount. Fast, sub-second verification is essential for on-chain applications where gas costs are critical, while compact proofs reduce data transmission overhead. For example, a Groth16 proof for a simple transaction might be only 128 bytes and verify in milliseconds on Ethereum, whereas a STARK proof could be tens of kilobytes. You must also track the trusted setup contribution for SNARKs, which includes the size of the Common Reference String (CRS) and the time required for the setup ceremony, a one-time but crucial process.

To collect these metrics effectively, you need a reproducible benchmarking setup. Start by instrumenting your prover code. For a framework like Circom with snarkjs, you can wrap the groth16.fullProve call with timing functions and use Node.js's process.memoryUsage() to track RAM. In Rust-based frameworks like arkworks or Halo2, use the criterion crate for rigorous timing benchmarks, pprof for CPU flame graphs, and tools like heaptrack or dhat for memory profiling. Always run multiple iterations in a controlled environment to account for variance, and log key parameters like the number of constraints in your circuit.
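As a lightweight alternative to a full profiler, you can also record the process's peak resident memory directly. The sketch below is Linux-only and assumes the proving call happens where the comment indicates; it reads the VmHWM field from /proc/self/status:

```rust
use std::fs;
use std::time::Instant;

/// Peak resident set size (VmHWM) in kilobytes, read from /proc/self/status.
/// Linux-only; returns None if the field is missing.
fn peak_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmHWM:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}

fn main() {
    let start = Instant::now();
    // ... run witness generation and proving here ...
    let elapsed = start.elapsed();
    println!("proved in {:?}, peak RSS: {:?} kB", elapsed, peak_rss_kb());
}
```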

Beyond raw performance, benchmark the scalability of the system. This involves measuring how proof time and memory scale with the size of the computation, often represented by the number of constraints or the size of the witness. Plotting these relationships reveals whether the system scales linearly, polynomially, or exhibits step-function increases. Also, measure the preprocessing time for generating proving/verification keys, as this can be significant for large circuits. Use these scalability curves to estimate the feasibility of your target application.
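A simple way to produce such scalability data is to sweep the size parameter and emit CSV for plotting. The sketch below assumes a hypothetical prove_at_size(k) helper that builds and proves a circuit with roughly 2^k rows:

```rust
use std::time::Instant;

// Hypothetical: builds and proves a circuit sized by `k` (roughly 2^k rows/constraints).
fn prove_at_size(k: u32) {
    let _ = k; // call your framework's synthesis + proving here
}

fn main() {
    // CSV header; redirect stdout to a file and plot time vs. size.
    println!("k,proving_time_ms");
    for k in 10..=18 {
        let start = Instant::now();
        prove_at_size(k);
        println!("{},{:.1}", k, start.elapsed().as_secs_f64() * 1000.0);
    }
}
```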

Finally, contextualize your benchmarks with real-world comparisons. A proof generation time of 2 minutes might be acceptable for a rollup sequencer but unusable for a wallet transaction. Document the exact hardware specifications (CPU, RAM, OS), software versions (ZK library, compiler), and circuit details. Publish results in a standard format, allowing others to compare across frameworks like gnark, Noir, or Plonky2. Consistent, transparent benchmarking drives informed decision-making and accelerates the adoption of performant ZK technology.

ZK FRAMEWORK PERFORMANCE

Setting Up Benchmarks for Circom and snarkjs

A guide to establishing a reproducible benchmarking pipeline for measuring the performance of zero-knowledge proof generation across different hardware and software configurations.

Benchmarking zero-knowledge proof systems like Circom and snarkjs is essential for developers and researchers to understand performance bottlenecks, optimize circuit design, and make informed infrastructure decisions. A proper benchmark suite measures key metrics: circuit compilation time, witness generation time, trusted setup contribution time, and most critically, proof generation time. These metrics vary significantly based on the circuit's complexity (number of constraints), the proving scheme (Groth16, PLONK), and the hardware (CPU, memory, GPU acceleration). Establishing a consistent methodology allows for meaningful comparisons between different versions or configurations.

To begin, you need a standardized test environment. Create a dedicated project directory and install the required tools with specific, pinned versions to ensure reproducibility. For example, pin snarkjs with npm install snarkjs@0.7.0 and install a specific release of the circom compiler (circom 2.x, e.g. v2.1.5, is distributed as a standalone binary built from the Rust sources rather than through npm). Your benchmark suite should include a set of representative circuits of varying sizes. A common approach is to use simple circuits like a multiplier (multiplier2.circom), a SHA256 hash preimage verifier, and a more complex Merkle tree inclusion proof. These provide a gradient of constraint counts from thousands to millions.

The core of the benchmark is a script that automates the proof generation pipeline. A typical Bash or Node.js script will execute these steps for each circuit: 1) Compile the circuit to R1CS and WASM, 2) Perform a Phase 1 and Phase 2 trusted setup (or use a pre-existing .ptau file for consistency), 3) Generate a witness for a sample input, and 4) Generate the proof. Use the time command or language-specific timing functions (e.g., Node.js performance.now()) to record the duration of each step. It's crucial to run multiple iterations and calculate averages to account for system noise.
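A Rust harness that shells out to the toolchain is another option. The sketch below times each stage with std::process::Command and std::time::Instant; the circom and snarkjs invocations are illustrative, so adjust flags, file names, and the .ptau path to your setup and tool versions:

```rust
use std::process::Command;
use std::time::Instant;

// Runs one pipeline step and prints its wall-clock duration.
fn timed_step(label: &str, program: &str, args: &[&str]) {
    let start = Instant::now();
    let status = Command::new(program)
        .args(args)
        .status()
        .expect("failed to spawn command");
    assert!(status.success(), "{label} failed");
    println!("{label}: {:.2} s", start.elapsed().as_secs_f64());
}

fn main() {
    // 1) Compile the circuit to R1CS + WASM witness generator.
    timed_step("compile", "circom", &["multiplier2.circom", "--r1cs", "--wasm", "-o", "build"]);
    // 2) Circuit-specific setup against a pre-downloaded .ptau file.
    timed_step("setup", "snarkjs", &["groth16", "setup", "build/multiplier2.r1cs", "pot.ptau", "multiplier2.zkey"]);
    // 3) Witness generation for a sample input.
    timed_step("witness", "snarkjs", &["wtns", "calculate", "build/multiplier2_js/multiplier2.wasm", "input.json", "witness.wtns"]);
    // 4) Proof generation.
    timed_step("prove", "snarkjs", &["groth16", "prove", "multiplier2.zkey", "witness.wtns", "proof.json", "public.json"]);
}
```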

For accurate results, control your system state. Close unnecessary applications, disable turbo boost on CPUs for consistent clock speeds, and consider using performance governor settings on Linux. Log all relevant system specifications: CPU model, core count, RAM size, and operating system. When benchmarking snarkjs proof generation, note whether you are using the pure JavaScript backend or the native rapidsnark prover, as the latter can be orders of magnitude faster for large circuits. The output should be structured data, like JSON or CSV, for easy analysis and visualization.
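For the structured output, a small serde-based record type keeps every run self-describing. The sketch below assumes serde (with the derive feature) and serde_json as dependencies; the field values shown are placeholders:

```rust
use serde::Serialize;

// One row of benchmark output; extend with whatever metadata you log.
#[derive(Serialize)]
struct BenchmarkRecord<'a> {
    circuit: &'a str,
    constraints: u64,
    prover: &'a str, // e.g. pure-JS snarkjs vs. native rapidsnark
    proving_time_ms: f64,
    peak_memory_kb: u64,
    cpu_model: &'a str,
}

fn main() -> serde_json::Result<()> {
    // Placeholder values for illustration only.
    let record = BenchmarkRecord {
        circuit: "sha256_preimage",
        constraints: 30_000,
        prover: "rapidsnark",
        proving_time_ms: 400.0,
        peak_memory_kb: 1_800_000,
        cpu_model: "example-cpu",
    };
    // One JSON object per line is easy to append to and to load into pandas.
    println!("{}", serde_json::to_string(&record)?);
    Ok(())
}
```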

Beyond basic timing, advanced benchmarking explores scalability. Plot a graph of proof generation time versus the number of constraints in your circuit suite. This helps identify if performance scales linearly or exhibits polynomial growth. You can also benchmark memory usage using tools like /usr/bin/time -v on Linux. Share your methodology and results openly, referencing frameworks like the ZKProof Community Standards for guidance. Consistent, transparent benchmarks drive optimization efforts across the ecosystem, from improving circom compiler passes to enhancing the underlying cryptographic libraries in snarkjs.

ZK FRAMEWORK

Setting Up Benchmarks for Halo2

A practical guide to measuring and optimizing proof generation performance for your Halo2 circuits using the `halo2_proofs` benchmarking utilities.

Benchmarking is a critical step in developing production-ready zero-knowledge applications with Halo2. It allows you to measure the computational cost of proof generation, identify performance bottlenecks, and make informed decisions about circuit design and hardware requirements. The halo2_proofs crate provides a dedicated dev module (halo2_proofs::dev) containing utilities for this purpose, including mock proving (MockProver), circuit cost measurement (CircuitCost), and circuit visualization. This guide will walk you through setting up a basic benchmarking suite for a Halo2 circuit.

To begin, you need to structure your project to separate your circuit's logic from the benchmarking code. A common pattern is to have a lib.rs file defining your circuit struct and its Circuit trait implementation, and a separate benches/ directory for Criterion benchmarks. First, add the necessary dependencies to your Cargo.toml: criterion for the benchmarking harness and halo2_proofs for the ZK primitives. You'll also need to enable the dev-graph feature of halo2_proofs if you want to generate visual circuit graphs for analysis.

The core of the benchmark is a function that creates an instance of your circuit with specific parameters (like the size of the computation) and then times the key generation and proving processes. Use the MockProver::run function to verify the circuit's constraints are satisfied for a given witness. For timing, you can use Criterion's BenchmarkGroup or lower-level timing functions. A key utility is CircuitCost::measure, which returns a struct detailing the number of advice columns, lookup arguments, and other metrics that directly impact proving time and memory usage.

Here is a minimal example of a benchmark function using the Criterion library:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use halo2_proofs::dev::CircuitCost;
use halo2_proofs::pasta::Eq as Vesta; // curve group used for cost estimation
use my_circuit_lib::MyCircuit;

fn benchmark_proof_gen(c: &mut Criterion) {
    let k: u32 = 10; // Circuit size parameter (2^k rows)
    let circuit = MyCircuit::new(/* ... */);

    let mut group = c.benchmark_group("MyCircuit");
    group.sample_size(10);
    group.bench_with_input(BenchmarkId::new("circuit_cost", k), &k, |b, &k| {
        b.iter(|| {
            // Simplified: measures circuit cost rather than running a real prover.
            let _cost = CircuitCost::<Vesta, MyCircuit>::measure(k, &circuit);
        });
    });
    group.finish();
}

criterion_group!(benches, benchmark_proof_gen);
criterion_main!(benches);
```

This measures the static circuit cost, which correlates with proving overhead. For a full proving benchmark, you would generate commitment parameters, derive a verifying and proving key with keygen_vk and keygen_pk, and time create_proof inside the iteration closure.

When analyzing results, focus on metrics that scale with your circuit's complexity: the number of advice cells, the number of lookup argument inputs, and the number of custom gates. These are reported by CircuitCost. Proving time in a real setup is heavily influenced by multi-scalar multiplication (MSM) size and the number of rows. Use the benchmarks to experiment with different circuit configurations—such as changing the k parameter or optimizing constraint density—and observe their impact on performance. This iterative process is essential for creating efficient, deployable ZK applications.

For advanced profiling, consider integrating the pprof profiler with Criterion to get flame graphs of your proving process. Also, the dev-graph feature can generate Graphviz files (circuit.gv) to visualize the placement of cells and gates, helping you identify areas of high connectivity or congestion. Remember that benchmarks must be compiled with optimizations; cargo bench uses the optimized bench profile by default, so never draw conclusions from a debug build. Documenting performance characteristics for different input sizes will provide crucial data for estimating operational costs and user experience in a live system.

ZK PROOF PERFORMANCE

Setting Up Benchmarks for Noir and Barretenberg

A guide to measuring and comparing the performance of zero-knowledge proof generation using the Noir language and Barretenberg backend.

Benchmarking zero-knowledge proof systems is critical for developers to understand performance trade-offs, optimize circuit design, and select the right tool for production applications. This guide covers setting up a benchmarking environment for Noir, a domain-specific language for ZK circuits, and Barretenberg, a high-performance proof backend. You'll learn to measure key metrics like proof generation time, verification time, and proof size across different circuit complexities. Accurate benchmarks help identify bottlenecks, such as heavy cryptographic operations or inefficient constraint systems, which are essential for building scalable applications on Ethereum, Aztec, or other ZK rollups.

To begin, you need a working installation of Noir and Barretenberg. Install nargo, Noir's command-line tool and package manager, by following the official Noir installation guide (typically via the noirup installer). For Barretenberg, you can clone the repository and build it from source, as it provides the proving system backend. A typical setup involves creating a new Noir project with nargo new benchmark_circuit and then integrating the Barretenberg prover. Ensure your environment has Rust and the necessary C++ build tools for Barretenberg installed. This setup allows you to compile Noir circuits into an intermediate representation that Barretenberg can execute and prove.

The core of benchmarking involves writing test circuits of varying complexity. Start with a simple circuit, like verifying a hash preimage, and progressively increase constraints by adding more operations (e.g., more hash iterations, signature verifications, or Merkle proofs). In your Noir project, define these circuits in .nr files. Use Barretenberg's command-line interface or its Rust bindings to generate proofs. You can automate this process with a shell script or a Rust test harness that calls the prover with timing commands, capturing outputs like time for proof generation and proof size in bytes. Log these results for comparison.

For reliable results, run benchmarks multiple times in a controlled environment. Use tools like hyperfine for command-line timing or integrate with Rust's criterion crate for statistical analysis. Isolate variables by running on the same hardware, closing background processes, and using consistent input sizes. Key metrics to record include: circuit compilation time, witness generation time, proof generation time, verification time, and final proof size. Comparing these across circuit sizes reveals how performance scales—often non-linearly—with the number of constraints, which is vital for estimating gas costs on-chain.

Interpreting benchmark data helps optimize your application. If proof generation time grows exponentially, consider circuit optimizations like reducing non-deterministic computations or using lookup tables. Barretenberg-specific features, such as its UltraPlonk arithmetization, may offer performance benefits for certain operations. Share your findings by publishing reproducible scripts and results, contributing to community knowledge. For further reading, consult the Noir documentation and Barretenberg GitHub repository. Effective benchmarking ensures your ZK application is both performant and cost-effective in production.

PROOF SYSTEM COMPARISON

ZK Framework Benchmarking Capabilities

A comparison of core performance and feature metrics for popular ZK frameworks used in proof generation benchmarks.

| Benchmark Metric | Circom | Halo2 | Plonky2 | Groth16 (libsnark) |
| --- | --- | --- | --- | --- |
| Proving Time (10k gates) | < 2 sec | ~5 sec | < 1 sec | ~15 sec |
| Verification Time | < 100 ms | < 50 ms | < 20 ms | < 200 ms |
| Trusted Setup Required | Yes (Groth16/PLONK) | No | No | Yes (circuit-specific) |
| Recursive Proof Support | Limited | Yes | Yes (native) | Limited |
| Proof Size (KB) | ~2.5 | ~1.8 | ~45 | ~0.2 |
| Primary Language | Circom / Rust | Rust | Rust | C++ |
| Developer Tooling Maturity | | | | |
| On-Chain Gas Cost (ETH L1) | Medium | Low | High | Very Low |

ZK FRAMEWORK BENCHMARKS

Interpreting Benchmark Results

Learn how to analyze and understand the performance metrics from your zero-knowledge proof generation benchmarks to optimize your application.

After running a benchmark suite for your zero-knowledge proof system, you'll be presented with a table of raw metrics. The key columns to analyze are proof generation time, proof size, and verification time. These are the primary indicators of system performance and cost. For example, a benchmark might show that generating a proof for a specific zk-SNARK circuit takes 2.1 seconds and results in a 2 KB proof. Your goal is to interpret these numbers in the context of your application's requirements, such as user wait times or on-chain gas costs.

Context is critical for interpretation. A proof generation time of 5 seconds might be acceptable for a batch settlement process but untenable for a real-time gaming transaction. Compare your results against baseline expectations for your chosen proving system (e.g., Groth16, PLONK, Halo2). Look for anomalies: if proof generation time scales non-linearly with circuit size, it may indicate an inefficient constraint system or a need for different cryptographic backends. Tools like criterion.rs or custom scripts can help visualize these trends across multiple runs.

To derive actionable insights, establish performance budgets. For a decentralized application, you might set a target of sub-3-second proof generation for user experience and a proof size under 5 KB to minimize Ethereum calldata costs. If your benchmarks exceed these budgets, you need to optimize; a simple automated budget check is sketched after the list below. Common optimization paths include:

  • Reducing the number of constraints in your circuit.
  • Using more efficient primitives or hash functions.
  • Experimenting with different proving backends or parameters.
  • Implementing recursive proof composition for complex operations.
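One way to make such budgets enforceable is a test that fails when a measurement exceeds them. The sketch below uses a placeholder generate_proof and hypothetical budget constants taken from the targets above:

```rust
use std::time::{Duration, Instant};

// Hypothetical budgets; tune these to your application's requirements.
const PROOF_TIME_BUDGET: Duration = Duration::from_secs(3);
const PROOF_SIZE_BUDGET_BYTES: usize = 5 * 1024;

// Stand-in for your real prover; returns the serialized proof.
fn generate_proof() -> Vec<u8> {
    vec![0u8; 1024]
}

#[test]
fn proof_generation_stays_within_budget() {
    let start = Instant::now();
    let proof = generate_proof();
    let elapsed = start.elapsed();

    assert!(
        elapsed <= PROOF_TIME_BUDGET,
        "proving took {elapsed:?}, budget is {PROOF_TIME_BUDGET:?}"
    );
    assert!(
        proof.len() <= PROOF_SIZE_BUDGET_BYTES,
        "proof is {} bytes, budget is {PROOF_SIZE_BUDGET_BYTES} bytes",
        proof.len()
    );
}
```

Wiring this test into CI turns a performance budget from a documentation note into a regression gate.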

Beyond the primary metrics, pay attention to memory usage and prover key size. High memory consumption can limit the environments where your prover can run, while a large prover key affects setup costs and decentralization. Benchmarking should be an iterative process. After making optimizations, re-run the benchmarks to measure the delta. Documenting this process creates a performance profile for your project, which is valuable for audits, documentation, and informing architectural decisions for production deployment.

ZK PROOF GENERATION

Frequently Asked Questions

Common questions and troubleshooting for developers setting up and running benchmarks for zero-knowledge proof systems.

Why is my proof generation so slow?

Slow proof generation is usually caused by underpowered or poorly utilized hardware, or by suboptimal configuration. The primary bottlenecks are CPU performance, RAM bandwidth, and, for GPU-accelerated provers, GPU memory.

Key areas to check:

  • CPU: Ensure your benchmarks are not throttled by thermal limits. Use tools like lscpu and perf to monitor frequency and cache misses.
  • RAM: ZK proving is memory-intensive. Verify you are using high-bandwidth memory (e.g., DDR5) and that your system isn't swapping to disk.
  • Configuration: Many frameworks like Halo2, Plonky2, and gnark have configurable parameters (e.g., k for circuit size, number of threads). A circuit size (k) that is too large for your hardware will drastically increase proving time.
  • Example: For a medium-sized circuit, a k value of 18 might be optimal on a server with 128GB RAM, while k=20 could cause out-of-memory errors on a 32GB machine.
BENCHMARKING INSIGHTS

Conclusion and Next Steps

This guide has walked you through setting up a reproducible benchmarking environment for ZK proof systems. The next steps involve analyzing your results and integrating these benchmarks into a continuous development workflow.

With your benchmark suite operational, you can now systematically compare proof systems like Halo2, Plonky2, and Groth16. Focus on the key metrics: proof generation time, memory usage, and verification speed. Use the structured data from your results/ directory to create visualizations, such as plotting proof time against circuit size. This analysis will reveal performance bottlenecks and help you select the optimal proving system for your specific application, whether it's a high-throughput rollup or a privacy-preserving smart contract.

To make your benchmarks actionable, integrate them into your CI/CD pipeline. Tools like GitHub Actions or CircleCI can be configured to run your benchmarks on every pull request, automatically flagging performance regressions. For Rust projects, Criterion.rs can compare each run against a saved baseline and flag regressions; custom scripts serve the same purpose for other stacks. Documenting your setup and findings in a shared repository, as suggested by efforts like the ZK-Bench initiative, contributes valuable data to the broader zero-knowledge research community.

The field of zero-knowledge proving is rapidly evolving. Stay informed about new frameworks like Nova and SuperNova for incremental verification, or Boole for GPU acceleration. Regularly update your benchmark circuits to include new cryptographic primitives and optimization techniques. By maintaining a rigorous, data-driven approach to performance evaluation, you ensure your ZK applications remain efficient, cost-effective, and secure as the underlying technology advances.
