Setting Up ZK Framework Proof Generation Benchmarks

A practical guide for developers to measure and compare proof generation times, memory usage, and circuit constraints across popular ZK frameworks.
DEVELOPER GUIDE

Introduction to ZK Proof Benchmarking

A practical guide to setting up and running performance benchmarks for zero-knowledge proof systems, covering key metrics, tools, and methodologies.

Zero-knowledge (ZK) proof generation is computationally intensive. Benchmarking is the systematic process of measuring the performance of a ZK proving system—typically a prover—under controlled conditions. Key metrics include proving time, memory consumption, and proof size. For developers working with frameworks like Circom, Halo2, or Noir, establishing a reliable benchmarking setup is essential for optimizing circuit design, selecting hardware, and estimating real-world operational costs. This guide focuses on the practical steps for creating reproducible benchmarks.

The first step is defining your benchmarking environment. Consistency is critical; results are only meaningful when compared under identical conditions. You must document and control for variables like the CPU architecture (e.g., x86 vs. ARM), clock speed, available RAM, and software versions of the proving framework and its dependencies (e.g., specific commits of arkworks libraries). Using containerization with Docker or specifying exact version pins in a Cargo.toml or package.json file helps ensure reproducibility across different machines and over time.

Next, you need to instrument your code to capture metrics. Most ZK frameworks provide hooks or can be wrapped with timing functions. A basic approach in Rust for a Halo2 prover might use std::time::Instant. For more detailed resource profiling, tools like perf on Linux or heaptrack can measure CPU cycles and memory allocation. It's also important to benchmark across a range of circuit scales (constraint counts) to understand how performance scales, which is crucial for applications expecting variable input sizes.
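As a minimal illustration, the sketch below wraps a placeholder generate_proof function (a stand-in for whatever proving call your framework exposes) with std::time::Instant, repeats it several times, and reports the mean and standard deviation:

```rust
use std::time::Instant;

// Placeholder for whatever proving call your framework exposes
// (e.g. create_proof in halo2_proofs, or a prover invoked over FFI/CLI).
fn generate_proof() {
    // ... build the witness and run the prover here ...
}

fn main() {
    let iterations = 10;
    let mut timings_ms = Vec::with_capacity(iterations);

    for _ in 0..iterations {
        let start = Instant::now();
        generate_proof();
        timings_ms.push(start.elapsed().as_secs_f64() * 1000.0);
    }

    // Aggregate over runs to smooth out system noise.
    let mean = timings_ms.iter().sum::<f64>() / timings_ms.len() as f64;
    let var = timings_ms.iter().map(|t| (t - mean).powi(2)).sum::<f64>() / timings_ms.len() as f64;
    println!("proving time: {mean:.1} ms +/- {:.1} ms over {iterations} runs", var.sqrt());
}
```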

Finally, analyze and present your data. Raw timings should be aggregated over multiple runs (e.g., 10 iterations) to account for system noise. Calculate the mean and standard deviation. Visualizing the data with graphs—such as proving time vs. number of constraints—makes trends clear. Share your methodology and raw results publicly, for example in a GitHub repository's BENCHMARKS.md file, to contribute to community knowledge and allow for peer verification of performance claims.

ZK FRAMEWORK BENCHMARKING

Prerequisites and System Requirements

Before running performance benchmarks for zero-knowledge proof generation, you must configure your development environment with the correct tools and hardware. This guide covers the essential software, system specifications, and initial setup steps.

Benchmarking ZK proof systems requires a consistent and controlled environment to produce meaningful, reproducible results. The core prerequisites are a functional Rust toolchain and a working installation of the specific ZK framework you intend to test, such as Halo2, Plonky2, or gnark. You will also need Git for cloning repositories and a package manager like Homebrew (macOS) or apt (Linux) for installing system dependencies. Ensure your development environment is stable before proceeding.

System hardware directly impacts proof generation time and memory usage. For accurate benchmarks, a machine with a modern multi-core CPU (e.g., Intel i7/i9 or AMD Ryzen 7/9) and at least 16GB of RAM is recommended. Proof generation, especially for large circuits, is computationally intensive and can consume significant memory. Using an SSD over an HDD will also improve performance for I/O-heavy operations. Note that results will vary significantly between consumer laptops and dedicated server hardware.

The primary software requirement is the Rust programming language and its build tool, cargo. Install them via rustup, which manages toolchain versions. Some frameworks or their benchmark suites rely on unstable features and require a nightly toolchain; check the project's documentation, and if needed set it with rustup default nightly and update with rustup update. Verify your installation with cargo --version and rustc --version.

Next, install the benchmarking tool itself. Most Rust-based ZK frameworks use Criterion.rs, a statistics-driven benchmarking library for Rust. Add it to your project's Cargo.toml under [dev-dependencies] with criterion = "0.5". You will also need the framework's specific dependencies and, often, Python 3 with packages like pandas and matplotlib for post-processing and visualizing benchmark results from Criterion's output.
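A minimal Criterion harness, assuming a benches/proving.rs file and a matching [[bench]] entry with harness = false in Cargo.toml, might look like the sketch below; prove_fixture is a placeholder for the routine you want to measure:

```rust
// benches/proving.rs -- assumes `criterion = "0.5"` under [dev-dependencies]
// and `[[bench]] name = "proving", harness = false` in Cargo.toml.
use criterion::{criterion_group, criterion_main, Criterion};

// Placeholder for the proving routine you want to measure.
fn prove_fixture() {
    // call into your ZK framework here
}

fn bench_proving(c: &mut Criterion) {
    c.bench_function("prove_fixture", |b| b.iter(prove_fixture));
}

criterion_group!(benches, bench_proving);
criterion_main!(benches);
```

Criterion writes its reports under target/criterion/, which is what the Python post-processing step above can consume.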

Finally, clone the benchmark suite for your chosen framework. For example, to benchmark Halo2, you would clone the official halo2 repository. Navigate to the benches directory within the repo. Run an initial build with cargo build --release to compile all dependencies. Your first proof generation benchmark can then be executed with a command like cargo bench --bench <benchmark_name>. This establishes your baseline performance profile.

PERFORMANCE ANALYSIS

Key Benchmarking Metrics for ZK Proofs

A guide to the essential metrics for evaluating the performance of zero-knowledge proof systems, from proof generation time to memory consumption.

Benchmarking a zero-knowledge (ZK) proof system requires measuring several interdependent metrics to understand its practical viability. The most critical is proof generation time, which measures how long it takes to create a proof for a given computation. This is often the primary bottleneck for user-facing applications. Closely related is prover memory consumption, as complex circuits can require gigabytes of RAM, limiting the hardware on which proofs can be generated. These two metrics directly impact the cost and latency of any ZK application, from private transactions to verifiable machine learning.

On the verifier's side, verification time and the size of the proof itself are paramount. Fast, sub-second verification is essential for on-chain applications where gas costs are critical, while compact proofs reduce data transmission overhead. For example, a Groth16 proof for a simple transaction might be only 128 bytes and verify in milliseconds on Ethereum, whereas a STARK proof could be tens of kilobytes. You must also track the trusted setup contribution for SNARKs, which includes the size of the Common Reference String (CRS) and the time required for the setup ceremony, a one-time but crucial process.

To collect these metrics effectively, you need a reproducible benchmarking setup. Start by instrumenting your prover code. For a framework like Circom with snarkjs, you can wrap the groth16.fullProve call with timing functions and use Node.js's process.memoryUsage() to track RAM. In Rust-based frameworks like arkworks or Halo2, use the criterion crate for rigorous timing benchmarks, pprof for CPU flame graphs, and tools like heaptrack or dhat for memory profiling. Always run multiple iterations in a controlled environment to account for variance, and log key parameters like the number of constraints in your circuit.
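As a lightweight alternative to a full profiler, you can also record the process's peak resident memory directly. The sketch below is Linux-only and assumes the proving call happens where the comment indicates; it reads the VmHWM field from /proc/self/status:

```rust
use std::fs;
use std::time::Instant;

/// Peak resident set size (VmHWM) in kilobytes, read from /proc/self/status.
/// Linux-only; returns None if the field is missing.
fn peak_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmHWM:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}

fn main() {
    let start = Instant::now();
    // ... run witness generation and proving here ...
    let elapsed = start.elapsed();
    println!("proved in {:?}, peak RSS: {:?} kB", elapsed, peak_rss_kb());
}
```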

Beyond raw performance, benchmark the scalability of the system. This involves measuring how proof time and memory scale with the size of the computation, often represented by the number of constraints or the size of the witness. Plotting these relationships reveals whether the system scales linearly, polynomially, or exhibits step-function increases. Also, measure the preprocessing time for generating proving/verification keys, as this can be significant for large circuits. Use these scalability curves to estimate the feasibility of your target application.
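A simple way to produce such scalability data is to sweep the size parameter and emit CSV for plotting. The sketch below assumes a hypothetical prove_at_size(k) helper that builds and proves a circuit with roughly 2^k rows:

```rust
use std::time::Instant;

// Hypothetical: builds and proves a circuit sized by `k` (roughly 2^k rows/constraints).
fn prove_at_size(k: u32) {
    let _ = k; // call your framework's synthesis + proving here
}

fn main() {
    // CSV header; redirect stdout to a file and plot time vs. size.
    println!("k,proving_time_ms");
    for k in 10..=18 {
        let start = Instant::now();
        prove_at_size(k);
        println!("{},{:.1}", k, start.elapsed().as_secs_f64() * 1000.0);
    }
}
```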

Finally, contextualize your benchmarks with real-world comparisons. A proof generation time of 2 minutes might be acceptable for a rollup sequencer but unusable for a wallet transaction. Document the exact hardware specifications (CPU, RAM, OS), software versions (ZK library, compiler), and circuit details. Publish results in a standard format, allowing others to compare across frameworks like gnark, Noir, or Plonky2. Consistent, transparent benchmarking drives informed decision-making and accelerates the adoption of performant ZK technology.

ZK FRAMEWORK PERFORMANCE

Setting Up Benchmarks for Circom and snarkjs

A guide to establishing a reproducible benchmarking pipeline for measuring the performance of zero-knowledge proof generation across different hardware and software configurations.

Benchmarking zero-knowledge proof systems like Circom and snarkjs is essential for developers and researchers to understand performance bottlenecks, optimize circuit design, and make informed infrastructure decisions. A proper benchmark suite measures key metrics: circuit compilation time, witness generation time, trusted setup contribution time, and most critically, proof generation time. These metrics vary significantly based on the circuit's complexity (number of constraints), the proving scheme (Groth16, PLONK), and the hardware (CPU, memory, GPU acceleration). Establishing a consistent methodology allows for meaningful comparisons between different versions or configurations.

To begin, you need a standardized test environment. Create a dedicated project directory and install the required tools with specific, pinned versions to ensure reproducibility. For example, pin snarkjs with npm install snarkjs@0.7.0 and install a specific release of the circom compiler (circom 2.x, e.g. v2.1.5, is distributed as a standalone binary built from the Rust sources rather than through npm). Your benchmark suite should include a set of representative circuits of varying sizes. A common approach is to use simple circuits like a multiplier (multiplier2.circom), a SHA256 hash preimage verifier, and a more complex Merkle tree inclusion proof. These provide a gradient of constraint counts from thousands to millions.

The core of the benchmark is a script that automates the proof generation pipeline. A typical Bash or Node.js script will execute these steps for each circuit: 1) Compile the circuit to R1CS and WASM, 2) Perform a Phase 1 and Phase 2 trusted setup (or use a pre-existing .ptau file for consistency), 3) Generate a witness for a sample input, and 4) Generate the proof. Use the time command or language-specific timing functions (e.g., Node.js performance.now()) to record the duration of each step. It's crucial to run multiple iterations and calculate averages to account for system noise.
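A Rust harness that shells out to the toolchain is another option. The sketch below times each stage with std::process::Command and std::time::Instant; the circom and snarkjs invocations are illustrative, so adjust flags, file names, and the .ptau path to your setup and tool versions:

```rust
use std::process::Command;
use std::time::Instant;

// Runs one pipeline step and prints its wall-clock duration.
fn timed_step(label: &str, program: &str, args: &[&str]) {
    let start = Instant::now();
    let status = Command::new(program)
        .args(args)
        .status()
        .expect("failed to spawn command");
    assert!(status.success(), "{label} failed");
    println!("{label}: {:.2} s", start.elapsed().as_secs_f64());
}

fn main() {
    // 1) Compile the circuit to R1CS + WASM witness generator.
    timed_step("compile", "circom", &["multiplier2.circom", "--r1cs", "--wasm", "-o", "build"]);
    // 2) Circuit-specific setup against a pre-downloaded .ptau file.
    timed_step("setup", "snarkjs", &["groth16", "setup", "build/multiplier2.r1cs", "pot.ptau", "multiplier2.zkey"]);
    // 3) Witness generation for a sample input.
    timed_step("witness", "snarkjs", &["wtns", "calculate", "build/multiplier2_js/multiplier2.wasm", "input.json", "witness.wtns"]);
    // 4) Proof generation.
    timed_step("prove", "snarkjs", &["groth16", "prove", "multiplier2.zkey", "witness.wtns", "proof.json", "public.json"]);
}
```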

For accurate results, control your system state. Close unnecessary applications, disable turbo boost on CPUs for consistent clock speeds, and consider using performance governor settings on Linux. Log all relevant system specifications: CPU model, core count, RAM size, and operating system. When benchmarking snarkjs proof generation, note whether you are using the pure JavaScript backend or the native rapidsnark prover, as the latter can be orders of magnitude faster for large circuits. The output should be structured data, like JSON or CSV, for easy analysis and visualization.
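For the structured output, a small serde-based record type keeps every run self-describing. The sketch below assumes serde (with the derive feature) and serde_json as dependencies; the field values shown are placeholders:

```rust
use serde::Serialize;

// One row of benchmark output; extend with whatever metadata you log.
#[derive(Serialize)]
struct BenchmarkRecord<'a> {
    circuit: &'a str,
    constraints: u64,
    prover: &'a str, // e.g. pure-JS snarkjs vs. native rapidsnark
    proving_time_ms: f64,
    peak_memory_kb: u64,
    cpu_model: &'a str,
}

fn main() -> serde_json::Result<()> {
    // Placeholder values for illustration only.
    let record = BenchmarkRecord {
        circuit: "sha256_preimage",
        constraints: 30_000,
        prover: "rapidsnark",
        proving_time_ms: 400.0,
        peak_memory_kb: 1_800_000,
        cpu_model: "example-cpu",
    };
    // One JSON object per line is easy to append to and to load into pandas.
    println!("{}", serde_json::to_string(&record)?);
    Ok(())
}
```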

Beyond basic timing, advanced benchmarking explores scalability. Plot a graph of proof generation time versus the number of constraints in your circuit suite. This helps identify if performance scales linearly or exhibits polynomial growth. You can also benchmark memory usage using tools like /usr/bin/time -v on Linux. Share your methodology and results openly, referencing frameworks like the ZKProof Community Standards for guidance. Consistent, transparent benchmarks drive optimization efforts across the ecosystem, from improving circom compiler passes to enhancing the underlying cryptographic libraries in snarkjs.

ZK FRAMEWORK

Setting Up Benchmarks for Halo2

A practical guide to measuring and optimizing proof generation performance for your Halo2 circuits using the `halo2_proofs` benchmarking utilities.

Benchmarking is a critical step in developing production-ready zero-knowledge applications with Halo2. It allows you to measure the computational cost of proof generation, identify performance bottlenecks, and make informed decisions about circuit design and hardware requirements. The halo2_proofs crate provides a dedicated dev module (halo2_proofs::dev) containing utilities for this purpose, including mock proving (MockProver), circuit cost measurement (CircuitCost), and circuit visualization. This guide will walk you through setting up a basic benchmarking suite for a Halo2 circuit.

To begin, you need to structure your project to separate your circuit's logic from the benchmarking code. A common pattern is to have a lib.rs file defining your circuit struct and its Circuit trait implementation, and a separate benches/ directory for Criterion benchmarks. First, add the necessary dependencies to your Cargo.toml: criterion for the benchmarking harness and halo2_proofs for the ZK primitives. You'll also need to enable the dev-graph feature of halo2_proofs if you want to generate visual circuit graphs for analysis.

The core of the benchmark is a function that creates an instance of your circuit with specific parameters (like the size of the computation) and then times the key generation and proving processes. Use the MockProver::run function to verify the circuit's constraints are satisfied for a given witness. For timing, you can use Criterion's BenchmarkGroup or lower-level timing functions. A key utility is CircuitCost::measure, which returns a struct detailing the number of advice columns, lookup arguments, and other metrics that directly impact proving time and memory usage.

Here is a minimal example of a benchmark function using the Criterion library:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use halo2_proofs::dev::CircuitCost;
use halo2_proofs::pasta::Eq as Vesta; // curve group used for cost estimation
use my_circuit_lib::MyCircuit;

fn benchmark_proof_gen(c: &mut Criterion) {
    let k: u32 = 10; // Circuit size parameter (2^k rows)
    let circuit = MyCircuit::new(/* ... */);

    let mut group = c.benchmark_group("MyCircuit");
    group.sample_size(10);
    group.bench_with_input(BenchmarkId::new("circuit_cost", k), &k, |b, &k| {
        b.iter(|| {
            // Simplified: measures circuit cost rather than running a real prover.
            let _cost = CircuitCost::<Vesta, MyCircuit>::measure(k, &circuit);
        });
    });
    group.finish();
}

criterion_group!(benches, benchmark_proof_gen);
criterion_main!(benches);
```

This measures the static circuit cost, which correlates with proving overhead. For a full proving benchmark, you would generate commitment parameters, derive a verifying and proving key with keygen_vk and keygen_pk, and time create_proof inside the iteration closure.

When analyzing results, focus on metrics that scale with your circuit's complexity: the number of advice cells, the number of lookup argument inputs, and the number of custom gates. These are reported by CircuitCost. Proving time in a real setup is heavily influenced by multi-scalar multiplication (MSM) size and the number of rows. Use the benchmarks to experiment with different circuit configurations—such as changing the k parameter or optimizing constraint density—and observe their impact on performance. This iterative process is essential for creating efficient, deployable ZK applications.

For advanced profiling, consider integrating the pprof profiler with Criterion to get flame graphs of your proving process. Also, the dev-graph feature can generate Graphviz files (circuit.gv) to visualize the placement of cells and gates, helping you identify areas of high connectivity or congestion. Remember that benchmarks must be compiled with optimizations; cargo bench uses the optimized bench profile by default, so never draw conclusions from a debug build. Documenting performance characteristics for different input sizes will provide crucial data for estimating operational costs and user experience in a live system.

ZK PROOF PERFORMANCE

Setting Up Benchmarks for Noir and Barretenberg

A guide to measuring and comparing the performance of zero-knowledge proof generation using the Noir language and Barretenberg backend.

Benchmarking zero-knowledge proof systems is critical for developers to understand performance trade-offs, optimize circuit design, and select the right tool for production applications. This guide covers setting up a benchmarking environment for Noir, a domain-specific language for ZK circuits, and Barretenberg, a high-performance proof backend. You'll learn to measure key metrics like proof generation time, verification time, and proof size across different circuit complexities. Accurate benchmarks help identify bottlenecks, such as heavy cryptographic operations or inefficient constraint systems, which are essential for building scalable applications on Ethereum, Aztec, or other ZK rollups.

To begin, you need a working installation of Noir and Barretenberg. Install nargo, Noir's command-line tool and package manager, by following the official Noir installation guide (typically via the noirup installer). For Barretenberg, you can clone the repository and build it from source, as it provides the proving system backend. A typical setup involves creating a new Noir project with nargo new benchmark_circuit and then integrating the Barretenberg prover. Ensure your environment has Rust and the necessary C++ build tools for Barretenberg installed. This setup allows you to compile Noir circuits into an intermediate representation that Barretenberg can execute and prove.

The core of benchmarking involves writing test circuits of varying complexity. Start with a simple circuit, like verifying a hash preimage, and progressively increase constraints by adding more operations (e.g., more hash iterations, signature verifications, or Merkle proofs). In your Noir project, define these circuits in .nr files. Use Barretenberg's command-line interface or its Rust bindings to generate proofs. You can automate this process with a shell script or a Rust test harness that calls the prover with timing commands, capturing outputs like time for proof generation and proof size in bytes. Log these results for comparison.

For reliable results, run benchmarks multiple times in a controlled environment. Use tools like hyperfine for command-line timing or integrate with Rust's criterion crate for statistical analysis. Isolate variables by running on the same hardware, closing background processes, and using consistent input sizes. Key metrics to record include: circuit compilation time, witness generation time, proof generation time, verification time, and final proof size. Comparing these across circuit sizes reveals how performance scales—often non-linearly—with the number of constraints, which is vital for estimating gas costs on-chain.

Interpreting benchmark data helps optimize your application. If proof generation time grows exponentially, consider circuit optimizations like reducing non-deterministic computations or using lookup tables. Barretenberg-specific features, such as its UltraPlonk arithmetization, may offer performance benefits for certain operations. Share your findings by publishing reproducible scripts and results, contributing to community knowledge. For further reading, consult the Noir documentation and Barretenberg GitHub repository. Effective benchmarking ensures your ZK application is both performant and cost-effective in production.

PROOF SYSTEM COMPARISON

ZK Framework Benchmarking Capabilities

A comparison of core performance and feature metrics for popular ZK frameworks used in proof generation benchmarks.

| Benchmark Metric | Circom | Halo2 | Plonky2 | Groth16 (libsnark) |
| --- | --- | --- | --- | --- |
| Proving Time (10k gates) | < 2 sec | ~5 sec | < 1 sec | ~15 sec |
| Verification Time | < 100 ms | < 50 ms | < 20 ms | < 200 ms |
| Trusted Setup Required | Yes (Groth16/PLONK) | No | No | Yes (circuit-specific) |
| Recursive Proof Support | Limited | Yes | Yes (native) | Limited |
| Proof Size (KB) | ~2.5 | ~1.8 | ~45 | ~0.2 |
| Primary Language | Circom / Rust | Rust | Rust | C++ |
| Developer Tooling Maturity | | | | |
| On-Chain Gas Cost (ETH L1) | Medium | Low | High | Very Low |

ZK FRAMEWORK BENCHMARKS

Interpreting Benchmark Results

Learn how to analyze and understand the performance metrics from your zero-knowledge proof generation benchmarks to optimize your application.

After running a benchmark suite for your zero-knowledge proof system, you'll be presented with a table of raw metrics. The key columns to analyze are proof generation time, proof size, and verification time. These are the primary indicators of system performance and cost. For example, a benchmark might show that generating a proof for a specific zk-SNARK circuit takes 2.1 seconds and results in a 2 KB proof. Your goal is to interpret these numbers in the context of your application's requirements, such as user wait times or on-chain gas costs.

Context is critical for interpretation. A proof generation time of 5 seconds might be acceptable for a batch settlement process but untenable for a real-time gaming transaction. Compare your results against baseline expectations for your chosen proving system (e.g., Groth16, PLONK, Halo2). Look for anomalies: if proof generation time scales non-linearly with circuit size, it may indicate an inefficient constraint system or a need for different cryptographic backends. Tools like criterion.rs or custom scripts can help visualize these trends across multiple runs.

To derive actionable insights, establish performance budgets. For a decentralized application, you might set a target of sub-3-second proof generation for user experience and a proof size under 5 KB to minimize Ethereum calldata costs. If your benchmarks exceed these budgets, you need to optimize; a simple automated budget check is sketched after the list below. Common optimization paths include:

  • Reducing the number of constraints in your circuit.
  • Using more efficient primitives or hash functions.
  • Experimenting with different proving backends or parameters.
  • Implementing recursive proof composition for complex operations.
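One way to make such budgets enforceable is a test that fails when a measurement exceeds them. The sketch below uses a placeholder generate_proof and hypothetical budget constants taken from the targets above:

```rust
use std::time::{Duration, Instant};

// Hypothetical budgets; tune these to your application's requirements.
const PROOF_TIME_BUDGET: Duration = Duration::from_secs(3);
const PROOF_SIZE_BUDGET_BYTES: usize = 5 * 1024;

// Stand-in for your real prover; returns the serialized proof.
fn generate_proof() -> Vec<u8> {
    vec![0u8; 1024]
}

#[test]
fn proof_generation_stays_within_budget() {
    let start = Instant::now();
    let proof = generate_proof();
    let elapsed = start.elapsed();

    assert!(
        elapsed <= PROOF_TIME_BUDGET,
        "proving took {elapsed:?}, budget is {PROOF_TIME_BUDGET:?}"
    );
    assert!(
        proof.len() <= PROOF_SIZE_BUDGET_BYTES,
        "proof is {} bytes, budget is {PROOF_SIZE_BUDGET_BYTES} bytes",
        proof.len()
    );
}
```

Wiring this test into CI turns a performance budget from a documentation note into a regression gate.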

Beyond the primary metrics, pay attention to memory usage and prover key size. High memory consumption can limit the environments where your prover can run, while a large prover key affects setup costs and decentralization. Benchmarking should be an iterative process. After making optimizations, re-run the benchmarks to measure the delta. Documenting this process creates a performance profile for your project, which is valuable for audits, documentation, and informing architectural decisions for production deployment.

ZK PROOF GENERATION

Frequently Asked Questions

Common questions and troubleshooting for developers setting up and running benchmarks for zero-knowledge proof systems.

Why is my proof generation so slow?

Slow proof generation is usually caused by underpowered or poorly utilized hardware, or by suboptimal configuration. The primary bottlenecks are CPU performance, RAM bandwidth, and, for GPU-accelerated provers, GPU memory.

Key areas to check:

  • CPU: Ensure your benchmarks are not throttled by thermal limits. Use tools like lscpu and perf to monitor frequency and cache misses.
  • RAM: ZK proving is memory-intensive. Verify you are using high-bandwidth memory (e.g., DDR5) and that your system isn't swapping to disk.
  • Configuration: Many frameworks like Halo2, Plonky2, and gnark have configurable parameters (e.g., k for circuit size, number of threads). A circuit size (k) that is too large for your hardware will drastically increase proving time.
  • Example: For a medium-sized circuit, a k value of 18 might be optimal on a server with 128GB RAM, while k=20 could cause out-of-memory errors on a 32GB machine.
BENCHMARKING INSIGHTS

Conclusion and Next Steps

This guide has walked you through setting up a reproducible benchmarking environment for ZK proof systems. The next steps involve analyzing your results and integrating these benchmarks into a continuous development workflow.

With your benchmark suite operational, you can now systematically compare proof systems like Halo2, Plonky2, and Groth16. Focus on the key metrics: proof generation time, memory usage, and verification speed. Use the structured data from your results/ directory to create visualizations, such as plotting proof time against circuit size. This analysis will reveal performance bottlenecks and help you select the optimal proving system for your specific application, whether it's a high-throughput rollup or a privacy-preserving smart contract.

To make your benchmarks actionable, integrate them into your CI/CD pipeline. Tools like GitHub Actions or CircleCI can be configured to run your benchmarks on every pull request, automatically flagging performance regressions. For Rust projects, Criterion.rs can compare each run against a saved baseline and flag regressions; custom scripts serve the same purpose for other stacks. Documenting your setup and findings in a shared repository, as suggested by efforts like the ZK-Bench initiative, contributes valuable data to the broader zero-knowledge research community.

The field of zero-knowledge proving is rapidly evolving. Stay informed about new frameworks like Nova and SuperNova for incremental verification, or Boole for GPU acceleration. Regularly update your benchmark circuits to include new cryptographic primitives and optimization techniques. By maintaining a rigorous, data-driven approach to performance evaluation, you ensure your ZK applications remain efficient, cost-effective, and secure as the underlying technology advances.
