
Setting Up Zero-Knowledge Proofs for Private Model Updates

A developer tutorial for implementing zk-SNARKs to prove correct ML training execution without revealing private data or gradients. Covers circuit design, proof generation, and verification.
Chainscore © 2026
PRIVACY-PRESERVING ML

Introduction to ZK-Proofs for Private Model Training

Zero-knowledge proofs enable machine learning models to be trained on sensitive data without exposing the raw inputs or the model's internal parameters, a critical capability for healthcare, finance, and confidential enterprise AI.

Traditional federated learning decentralizes data but still requires sharing model updates (gradients), which can be reverse-engineered to reveal private information. Zero-knowledge proofs (ZKPs) solve this by allowing a client to generate a cryptographic proof that they have correctly computed a model update on their local data, without revealing the data or the update itself. The server only verifies the proof, ensuring the training step was valid while preserving privacy. This creates a verifiable, trust-minimized framework for collaborative AI.

Setting up ZK-proofs for private training involves several core components. You need a circuit compiler (like Circom or Halo2) to translate your model's forward pass and loss calculation into an arithmetic circuit. A proving system (e.g., Groth16, PLONK) generates and verifies the proofs. Finally, a client-server protocol defines how proofs are submitted and verified. The primary challenge is the computational overhead of generating proofs for complex neural network operations, which is an active area of research in ZKML (Zero-Knowledge Machine Learning).

For a practical example, consider a simple linear regression update. The client's private data is a set of points (x_i, y_i). The public model is weights w. The client computes the gradient ∇L and generates a ZK-proof asserting: ∇L was correctly derived from w and some (x_i, y_i) that satisfy the data schema, without revealing the points. In Circom, you would define a circuit that takes private inputs x_i, y_i and public inputs w, computes the mean squared error and its gradient, and outputs the gradient as a public signal alongside the proof.
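To make this concrete outside the circuit, here is a plain-Python reference for the gradient the client must prove correct; the function and variable names are illustrative, not part of any ZK library:

```python
def mse_gradient(w, b, points):
    """Gradient of mean squared error for y_hat = w*x + b
    over a private batch of (x, y) points."""
    n = len(points)
    dw = db = 0.0
    for x, y in points:
        err = (w * x + b) - y
        dw += 2 * err * x / n
        db += 2 * err / n
    return dw, db

# The ZK circuit must reproduce exactly this arithmetic over a
# finite field; these are the operations that get constrained.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
dw, db = mse_gradient(2.0, 0.0, points)  # perfect fit: gradient is zero
assert dw == 0.0 and db == 0.0
```

The circuit version of this computation differs only in that the floating-point arithmetic is replaced by fixed-point field arithmetic, and each operation becomes a constraint.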

Key libraries and frameworks are emerging to streamline this process. EZKL allows you to export a PyTorch model to a Halo2 circuit. zkML by 0xPARC provides tools for converting models to R1CS constraints. When implementing, you must decide what constitutes a public statement versus a private witness. Typically, the initial model weights, the final updated weights (or the gradient), and the proof are public. The training data, intermediate activations, and the exact loss value remain private.

The verification cost on-chain is a major consideration. While proof generation is client-side and can be expensive, verification must be cheap enough to post on a blockchain for maximum trustlessness. Succinct proofs like PLONK and STARKs offer fast verification. For production, you might use a recursive proof system to aggregate multiple training steps into a single, efficiently verifiable proof, or perform verification on an Ethereum L2 such as zkSync, where gas costs are substantially lower.

ZKML TUTORIAL

Prerequisites and Setup

This guide outlines the essential tools and foundational knowledge required to implement zero-knowledge proofs for private machine learning model updates.

Before generating proofs for private model updates, you need a solid development environment and an understanding of core concepts. The primary prerequisite is a working knowledge of zero-knowledge proof systems, particularly zk-SNARKs (Succinct Non-interactive Arguments of Knowledge). You should be familiar with the role of a prover (who generates the proof), a verifier (who checks it), and the concept of a circuit that defines the computational statement to be proven. For ML, this circuit will encode your model's forward pass and update logic. A basic understanding of elliptic curve cryptography and finite fields is also beneficial, as these are the mathematical foundations for most ZK systems.

Your technical setup requires installing specific ZK toolchains. We recommend starting with Circom, a popular domain-specific language for defining arithmetic circuits, together with snarkjs for the trusted setup, proof generation, and verification. Install snarkjs via npm (npm install -g snarkjs); the Circom 2 compiler is distributed as a Rust binary, so you will need Rust and Cargo to build it from source. An alternative is the gnark library in Go, which offers high-level APIs for circuit design. Ensure your development machine has sufficient RAM (16GB+ recommended) for circuit compilation, as complex ML models can generate circuits with millions of constraints.

For the machine learning component, you need to decide on a framework for defining and training your model. TensorFlow or PyTorch are standard choices. The critical step is translating your trained model into a format your ZK circuit can compute. This involves quantizing model weights and activations to fixed-point integers, as ZK circuits operate natively in a finite field and cannot handle floating-point numbers directly. Tools like EZKL or zkml can help automate parts of this conversion from PyTorch models to ZK circuits, handling the quantization and circuit generation pipeline.
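A minimal sketch of that quantization step, assuming a power-of-two scale factor and the BN254 scalar field used by Circom/snarkjs (the scale choice here is illustrative; tools like EZKL pick these parameters for you):

```python
# Map floats to fixed-point field elements with a power-of-two scale.
# SCALE is an illustrative choice; the prime is the BN254 scalar field.
SCALE = 2**16
FIELD_PRIME = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def quantize(x: float) -> int:
    """Float -> fixed-point integer, reduced into the field."""
    return round(x * SCALE) % FIELD_PRIME

def dequantize(v: int) -> float:
    """Field element -> float (large values represent negatives)."""
    if v > FIELD_PRIME // 2:
        v -= FIELD_PRIME
    return v / SCALE

w = 0.3141
assert abs(dequantize(quantize(w)) - w) < 1 / SCALE
neg = quantize(-1.5)  # negatives wrap around the field modulus
assert abs(dequantize(neg) - (-1.5)) < 1 / SCALE
```

Every quantization introduces rounding error of at most half a step (1/SCALE), which is why the choice of scale trades circuit size against model accuracy.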

Finally, you must set up a project structure. Create a dedicated directory for your ZKML project. Organize it into clear subdirectories: one for your original ML model code and training scripts, another for your Circom circuit definitions (e.g., circuits/model_update.circom), and a third for your proof generation and verification scripts. You will also need to manage the trusted setup phase (the Powers of Tau ceremony) to generate the proving and verification keys for your circuit. This is a one-time setup per circuit structure and is critical for security.

CORE CONCEPTS

Setting Up Zero-Knowledge Proofs for Private Model Updates

This guide explains how to use zk-SNARKs to verify machine learning model updates without revealing the underlying training data or model parameters.

Zero-knowledge proofs, specifically zk-SNARKs (Zero-Knowledge Succinct Non-Interactive Argument of Knowledge), enable one party (the prover) to convince another (the verifier) that a statement is true without revealing any information beyond the validity of the statement itself. In the context of machine learning, this allows a model trainer to prove they have correctly computed a model update—such as a gradient descent step—on a private dataset, without exposing the data or the updated model weights. This is foundational for privacy-preserving federated learning and verifiable AI, where trust and data confidentiality are paramount.

The technical workflow involves three main components: the arithmetic circuit, the trusted setup, and the proof system. First, the ML operation (e.g., a forward pass and loss calculation) must be expressed as an arithmetic circuit, a computational model consisting of addition and multiplication gates over a finite field. Libraries like circom or snarkjs are commonly used for this. This circuit defines the constraints that must be satisfied for a valid computation. A critical step is the trusted setup ceremony, which generates a proving key and a verification key. This setup must be performed once per circuit and is a potential security bottleneck if compromised.

To generate a proof for a private model update, the prover (the client with the data) executes the training step locally. They use the proving key, their private inputs (the training data and initial model weights), and the public inputs (the new model weights or a commitment to them) to generate a zk-SNARK proof. This proof is a small, fixed-size cryptographic object that attests to the correctness of the computation. The verifier, who only has access to the public inputs and the verification key, can check this proof in milliseconds, confirming the update is valid without learning anything about the private inputs. This enables scalable, trust-minimized collaboration.

Implementing this requires careful engineering. The ML model must be quantized or adapted to work within the finite field arithmetic of the proof system, which can affect precision. Furthermore, proving time and cost scale with circuit complexity. For a simple linear regression update, a proof might take seconds, but for a deep neural network layer, it could be prohibitively expensive. Teams often use techniques like model partitioning (proving updates layer-by-layer) or leveraging zk-friendly ML architectures to manage this. Practical frameworks are emerging, such as EZKL, which compiles PyTorch models into zk-SNARK circuits.

The primary use case is decentralized and federated learning. A central aggregator can verify that contributions from thousands of devices are valid before incorporating them into a global model, preventing malicious or faulty updates. This also enables verifiable inference, where a model's prediction can be proven correct. The on-chain implications are significant: smart contracts on networks like Ethereum can act as verifiers, enabling trustless AI oracles and DeFi applications that use ML without relying on a centralized data provider. The field is rapidly evolving, with new proof systems like zk-STARKs and PLONK offering different trade-offs in setup requirements, proof size, and verification speed.

PROTOCOL SELECTION

ZK Proof System Comparison for ML

Comparison of leading ZK proof systems for verifying private machine learning model updates, focusing on computational overhead and developer experience.

Feature / Metric                        | Groth16                     | PLONK                        | Halo2
----------------------------------------|-----------------------------|------------------------------|--------------------------------------
Trusted Setup Required                  | Yes (per-circuit)           | Yes (universal)              | No
Proof Generation Time (small ML layer)  | ~2-5 sec                    | ~5-10 sec                    | ~10-20 sec
Proof Verification Time                 | < 100 ms                    | < 200 ms                     | < 300 ms
Proof Size                              | ~200 bytes                  | ~400 bytes                   | ~1 KB
Recursive Proof Support                 | No (not natively)           | Limited                      | Yes (native)
Developer Libraries (Rust/JS)           | bellman, snarkjs            | arkworks, plonkjs            | halo2, halo2-lib
Circuit Flexibility                     | Low (fixed structure)       | High (custom gates)          | Very high (Plonkish arithmetization)
Ideal Use Case                          | Final on-chain verification | General-purpose applications | Complex, recursive proofs

FOUNDATION

Step 1: Designing the Arithmetic Circuit

The arithmetic circuit is the computational blueprint for a zero-knowledge proof. It defines the logical constraints that must be satisfied for a private model update to be valid, without revealing the underlying data or model parameters.

An arithmetic circuit is a directed acyclic graph where nodes represent arithmetic operations (addition and multiplication) over a finite field, and edges represent values (wires). In the context of private machine learning, the circuit encodes the forward pass of a model—such as a linear regression or a small neural network layer—alongside the update rule (e.g., gradient descent). The circuit's inputs are the private data, the current model weights, and the public hyperparameters. Its output is the proof that a correct update was computed.

To design this circuit, you must first formalize your model's operations as polynomial constraints. For a simple example, consider a single neuron with a ReLU activation: y = ReLU(w * x + b). This breaks down into constraints: 1) s = w * x + b, 2) y = s * (1 - q), and 3) q * (1 - q) = 0, where q is a binary witness indicating whether s is negative. A range check is also needed to bind q to the actual sign of s; without it, a dishonest prover could choose q freely. Libraries like Circom or ZoKrates provide domain-specific languages to write these constraints declaratively, which are then compiled into a Rank-1 Constraint System (R1CS) representation.
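You can sanity-check this decomposition in plain Python before committing it to a circuit (the in-circuit range check binding q to the sign of s is omitted here, since native Python comparison stands in for it):

```python
def relu_witness(w, x, b):
    """Produce the witness values (s, q, y) for y = ReLU(w*x + b)
    and check the three ReLU constraints from the text."""
    s = w * x + b
    q = 1 if s < 0 else 0       # binary sign witness (range-checked in a real circuit)
    y = s * (1 - q)
    # Constraint checks
    assert q * (1 - q) == 0     # q is binary
    assert y == s * (1 - q)     # output selects s or 0
    assert y == max(s, 0)       # matches real ReLU semantics
    return s, q, y

assert relu_witness(2, 3, -1) == (5, 0, 5)    # positive pre-activation passes through
assert relu_witness(2, -3, 1) == (-5, 1, 0)   # negative pre-activation clamps to 0
```

A circuit-level bug in any one of these constraints would silently admit invalid witnesses, which is why testing the decomposition against known inputs matters.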

The critical design challenge is circuit size and complexity, directly impacting proof generation time and cost. You must optimize by minimizing non-arithmetic operations. For instance, comparison operations (>, <) and non-linear functions (like sigmoid) are expensive. Common techniques include using lookup tables for fixed-point arithmetic or approximating functions with low-degree polynomials. The circuit must also include constraints to enforce that the initial weights and final updated weights are linked correctly through the gradient calculation, ensuring the proof validates the entire update logic.
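As a concrete illustration of the lookup-table technique, this sketch precomputes a fixed-point sigmoid over an 8-bit quantized input range; the scale and range are illustrative choices, and in a system like Halo2 the table would become lookup constraints:

```python
import math

SCALE = 2**4                    # fixed-point scale: 16 steps per unit (illustrative)
INPUT_RANGE = range(-128, 128)  # quantized 8-bit inputs

# Precompute sigmoid for every representable input; in a ZK system this
# table would be enforced via lookup arguments rather than recomputed.
SIGMOID_TABLE = {
    q: round(SCALE * (1 / (1 + math.exp(-q / SCALE)))) for q in INPUT_RANGE
}

def sigmoid_lookup(q: int) -> int:
    return SIGMOID_TABLE[q]

assert sigmoid_lookup(0) == SCALE // 2   # sigmoid(0) = 0.5
assert sigmoid_lookup(127) == SCALE      # saturates near 1.0
assert sigmoid_lookup(-128) == 0         # saturates near 0.0
```

The table has one entry per representable input, so the cost of this approach grows with the quantized input width, not with the complexity of the function being approximated.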

Here is a conceptual snippet in a Circom-like syntax for a constraint checking a gradient step update, where new_w should equal old_w - lr * grad:

circom
pragma circom 2.0.0;

template GradientStep() {
    signal input old_w;
    signal input grad;
    signal input lr;
    signal output new_w;

    // Assign and constrain: new_w = old_w - lr * grad
    // (<== both assigns the output signal and adds the constraint)
    new_w <== old_w - lr * grad;
}

This single component would be instantiated for each model parameter. The complete circuit is the aggregation of all such components for the model's forward pass, loss calculation, and gradient update.

Finally, the circuit design must be tested with known inputs and outputs to ensure it produces correct witnesses and constraints. This involves generating a witness—a valid assignment to all signals that satisfies every constraint—using the circuit's compiled witness generator. Any error in the circuit logic will make it impossible to generate a valid proof, even for correct computations. This step is foundational; a flawed circuit design compromises the entire system's integrity and privacy guarantees.

ZKML WORKFLOW

Step 2: Client-Side Proof Generation

Generate a zero-knowledge proof locally to verify the integrity of a private model update without revealing the underlying data or model parameters.

Client-side proof generation is the cryptographic core of private ML. Using a zk-SNARK or zk-STARK proving system, you create a proof that attests to the correct execution of a computation—like a model training step or inference—over private inputs. This proof, typically a few hundred bytes, can be publicly verified by anyone (e.g., a blockchain verifier contract) to confirm the computation's validity, while the sensitive data (the training data D and model weights W) remain entirely hidden. Popular frameworks for this include Circom with SnarkJS, Halo2, and StarkWare's Cairo.

The process requires you to define an arithmetic circuit that represents your ML operation. For a simple linear regression update step W_new = W - η * ∇L(W, D), the circuit encodes the gradient calculation ∇L, the learning rate η multiplication, and the weight update. This circuit is compiled into a set of constraints. Using your private inputs (W, D) and public parameters (e.g., the new model hash), the proving key generates a proof. The critical output is the public outputs of the circuit, such as the cryptographic commitment to the new model state commit(W_new), which becomes the verifiable claim.
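The client-side computation being proven looks roughly like this in plain Python. SHA-256 stands in for the ZK-friendly hash (e.g., Poseidon) a real circuit would use, and every name here is illustrative:

```python
import hashlib

def update_and_commit(weights, grads, lr):
    """Apply one gradient step and commit to the new weights.
    A real circuit would use a ZK-friendly hash like Poseidon;
    SHA-256 here is only for illustration."""
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    # Commit to a canonical fixed-point encoding of the weights
    encoded = b"".join(round(w * 2**16).to_bytes(8, "big", signed=True)
                       for w in new_weights)
    commitment = hashlib.sha256(encoded).hexdigest()
    return new_weights, commitment

W, C = update_and_commit([0.5, -0.25], [0.1, -0.2], lr=0.01)
assert abs(W[0] - 0.499) < 1e-9 and abs(W[1] + 0.248) < 1e-9
```

The circuit proves exactly this pipeline: the update rule applied to private weights and gradients, ending in a public commitment that anyone can check future updates against.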

Here is a conceptual outline using a Circom template for a gradient step. The circuit takes private signals for the weights and data batch, and outputs a commitment to the updated weights.

circom
template GradientStep() {
    // Private inputs
    signal input weights[10];
    signal input data_batch[100];
    signal input learning_rate;

    // Public output: commitment to new weights (simplified as a hash)
    signal output new_weights_hash;

    // ... Constraints to compute weights - lr * gradient(data_batch, weights)
    // ... Hash the resulting new_weights to produce new_weights_hash
}

After compiling this circuit, you use snarkjs with the proving key (a .zkey file) to generate a proof and its public signals (conventionally proof.json and public.json) from your actual private inputs.

Performance and cost are key considerations. Proof generation is computationally intensive. For a non-trivial model, proving time can range from seconds to minutes on a consumer machine and requires significant RAM (often 8-16GB+). The choice of proving system involves trade-offs: zk-SNARKs (e.g., Groth16) have small, constant-sized proofs and fast verification but require a trusted setup. zk-STARKs are transparent (no trusted setup) and faster to prove, but generate larger proofs. The proof generation time and memory overhead are the primary bottlenecks for client-side applications.

To integrate this into a federated learning round, after generating the proof, the client sends two items to the blockchain or aggregator: 1) the proof bytes, and 2) the public signals (like new_weights_hash and a nullifier to prevent replay). The on-chain verifier contract, which holds the verifying key, will validate the proof against these public signals. If valid, it accepts new_weights_hash as the correct commitment for the client's updated model. This allows the system to trust that the update was computed correctly according to the protocol, without learning anything about the client's private dataset.

PRIVACY-PRESERVING ML

Step 3: On-Chain Verification and Aggregation

This step details how to generate and verify zero-knowledge proofs for private model updates on-chain, ensuring data privacy while maintaining cryptographic integrity.

After a node computes a local model update on its private dataset, it must prove the correctness of this computation without revealing the underlying data. This is achieved by generating a zero-knowledge proof (ZKP). The node uses a proving key to create a cryptographic proof that attests: the update was computed correctly from valid input data, the computation followed the agreed-upon machine learning algorithm (e.g., a specific gradient descent step), and the resulting model parameters are valid. Popular frameworks for this include zk-SNARKs (via Circom or Halo2) and zk-STARKs, chosen based on the trade-offs between proof size, verification speed, and trust assumptions.

The generated proof and the new model parameters (the public output of the computation) are then submitted to a verifier smart contract on the blockchain. This contract contains a verification key corresponding to the proving key. Its sole function is to run a lightweight verification algorithm that checks the proof against the public inputs (model parameters). If the verification passes, the contract emits an event or stores a commitment, cryptographically confirming that the update is valid. This process ensures the network can trust the node's contribution without any participant ever seeing the private training data. The gas cost for on-chain verification is a critical design consideration.

Once multiple nodes have submitted verified updates, the system must aggregate them into a new global model. A separate smart contract, often an aggregator, collects the verified model parameter sets. It then executes a pre-defined aggregation function, such as Federated Averaging (FedAvg). The contract computes the weighted average of the received parameters, producing the new consensus model state. This aggregated model is then stored on-chain (or in a verifiable data structure like an IPFS hash) and becomes the baseline for the next round of training, completing one federated learning cycle in a fully verifiable, privacy-preserving manner.
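The aggregation rule itself is simple. Here is a plain-Python sketch of Federated Averaging over verified updates (shown off-chain with floats for clarity; an on-chain version would operate on fixed-point field elements):

```python
def fedavg(updates):
    """Weighted average of verified client updates.
    `updates` is a list of (weights, num_samples) pairs."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [
        sum(w[i] * n for w, n in updates) / total
        for i in range(dim)
    ]

clients = [
    ([1.0, 2.0], 10),   # client A: 10 samples
    ([3.0, 4.0], 30),   # client B: 30 samples
]
global_model = fedavg(clients)
assert global_model == [2.5, 3.5]
```

Because every update entering this average carries a verified proof, a malicious client cannot skew the global model with an update it did not actually compute from its committed data.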

ZK-SNARK VS. ZK-STARK VS. PLONK

Performance Benchmarks and Overhead

Comparison of proving system performance for private ML model updates, measured on a standard 1M-parameter neural network update task.

Metric                             | zk-SNARK (Groth16) | zk-STARK | PLONK
-----------------------------------|--------------------|----------|-----------------
Proving Time                       | 45 sec             | 12 sec   | 28 sec
Proof Size                         | 288 bytes          | 45 KB    | 400 bytes
Verification Time                  | < 10 ms            | 120 ms   | < 15 ms
Trusted Setup Required             | Yes (per-circuit)  | No       | Yes (universal)
Quantum Resistance                 | No                 | Yes      | No
Gas Cost for On-Chain Verify (USD) | $0.85              | $5.20    | $1.10
Memory Overhead (Peak RAM)         | 4 GB               | 16 GB    | 6 GB
Recursive Proof Support            | No (not natively)  | Yes      | Yes

ZKML DEVELOPMENT

Frequently Asked Questions

Common technical questions and troubleshooting for implementing zero-knowledge proofs in machine learning workflows, focusing on private model updates.

How do you handle non-linear activation functions (like ReLU or sigmoid) inside a ZK circuit?

The primary challenge is the computational overhead of representing non-linear activation functions (like ReLU, Sigmoid) within a ZK circuit. These functions are not natively arithmetic-friendly for proof systems like Groth16 or PLONK, which excel at additions and multiplications in finite fields.

Common solutions include:

  • Lookup tables: Pre-computing activation outputs for a constrained input range.
  • Polynomial approximations: Using low-degree polynomials (e.g., from a Taylor series) to approximate the function.
  • Specialized circuits: Designing custom gates in systems like Halo2 or using zkSNARK-friendly alternatives like the Quadratic Unit (QU) activation.

The choice impacts proof generation time, verification key size, and model accuracy, creating a trade-off triangle.
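To make the polynomial-approximation option concrete, here is a degree-3 Taylor expansion of sigmoid around zero together with its worst-case error on a bounded input range (the degree and range are illustrative choices):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_poly(x):
    """Degree-3 Taylor approximation of sigmoid around 0:
    only additions and multiplications, so it maps directly
    onto arithmetic-circuit constraints."""
    return 0.5 + x / 4 - x**3 / 48

# Worst-case error over [-2, 2] (inputs are typically normalized
# into a small range before a polynomial activation is applied)
xs = [i / 100 for i in range(-200, 201)]
max_err = max(abs(sigmoid(x) - sigmoid_poly(x)) for x in xs)
assert max_err < 0.05   # tolerable for many quantized models
```

The error grows quickly outside the fitted range, which is why polynomial activations are usually paired with input normalization or clipping constraints in the circuit.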

ZK FOR MACHINE LEARNING

Practical Use Cases and Applications

Zero-knowledge proofs enable private, verifiable updates to machine learning models. These tools and frameworks let you implement ZKML in production.


Private Model Aggregation with ZKPs

Use ZK proofs to securely aggregate model updates from multiple parties in federated learning. This preserves data privacy for each participant while ensuring the integrity of the aggregation process.

  • Mechanism: Each client proves their local update was computed correctly from their private dataset.
  • Verification: The aggregator verifies all proofs before combining updates into a new global model.
  • Framework: Often implemented using Plonk or Groth16 proving systems for efficiency.
Common proving system: Groth16

zk-SNARKs for Verifiable Training

Prove the correctness of an entire training run. This involves generating a ZK proof that a model was trained according to a specified algorithm on a committed dataset, without revealing the data.

  • Challenge: Training circuits are extremely large. Solutions like incremental proving or proof recursion are necessary.
  • Tooling: Projects like Marlin or Plonk can help manage the proving overhead.
  • Application: Auditable AI models for regulatory compliance or proving fair model training in competitions.
Key technique: recursion

Choosing a ZK Proving System

Select the right cryptographic backend based on your ML task requirements.

  • Groth16: Small proof sizes (~200 bytes) and fast verification. Best for single, complex proofs of inference. Requires a trusted setup.
  • Plonk: Universal trusted setup. More efficient for circuits with many similar operations.
  • STARKs (e.g., StarkEx): No trusted setup, faster proving for large computations, but larger proof sizes (~100KB).
  • Key Metric: Focus on constraint count of your compiled ML circuit to estimate cost and feasibility.
Primary cost driver: constraint count
IMPLEMENTATION SUMMARY

Conclusion and Next Steps

This guide has walked through the core components for implementing zero-knowledge proofs to enable private updates to machine learning models.

You have now implemented a foundational system for private model updates using ZK-SNARKs. The core workflow involves: generating a proof of a valid model update without revealing the underlying training data, verifying that proof on-chain, and updating the model's state commitment. This preserves user privacy while maintaining the integrity and auditability of the federated learning process. Tools like Circom for circuit design and the snarkjs library for proof generation and verification are essential for this pipeline.

For production deployment, several critical next steps must be addressed. Circuit optimization is paramount; reducing the number of constraints directly lowers proving costs and time. Explore techniques like custom gate design or zk-friendly hash functions (e.g., Poseidon). Next, run a robust trusted setup for your final circuit: a universal Phase 1 ceremony (e.g., the Perpetual Powers of Tau) followed by a circuit-specific Phase 2 contribution. Finally, design the on-chain verifier contract, carefully managing gas costs, which depend on your circuit's verification key size and proof points.

To extend this system, consider these advanced research and development directions. Implement recursive proofs to aggregate multiple update proofs into a single verification, drastically improving scalability. Explore proof bounties or slashing mechanisms within your smart contract to incentivize honest participation and penalize malfeasance. For broader applicability, research frameworks like EZKL or zkML that aim to streamline the compilation of high-level model descriptions (e.g., from PyTorch) into ZK circuits, abstracting away much of the low-level circuit development complexity.