Private Set Intersection (PSI) is a secure multi-party computation (MPC) protocol designed to solve a fundamental privacy challenge: determining common elements between datasets held by mutually distrustful parties. For example, two hospitals can discover shared patients or two financial institutions can identify common high-risk clients, all while keeping their complete, non-matching data entries completely confidential. The core cryptographic guarantee is that each party learns only the intersecting items and nothing else about the other party's set. This is a critical advancement over naive methods, which would require one party to share its entire dataset, creating unacceptable privacy and security risks.
Private Set Intersection (PSI)
What is Private Set Intersection (PSI)?
Private Set Intersection (PSI) is a cryptographic protocol that enables two or more parties to compute the intersection of their private datasets without revealing any information about items not in the intersection.
The protocol typically leverages advanced cryptographic primitives such as oblivious transfer, homomorphic encryption, or Diffie-Hellman key exchange to facilitate the computation. In a common two-party PSI scheme using public-key cryptography, each party hashes and encrypts their dataset elements. They then exchange these encrypted values in a structured way that allows for comparison, but where the encryption ensures that only matches can be identified. The result is that both parties can compute the intersection A ∩ B without party A learning B's unique elements and vice-versa. Modern PSI protocols are highly optimized for performance, enabling intersections of large datasets with minimal computational and communication overhead.
In blockchain and Web3 contexts, PSI has powerful applications for privacy-preserving analytics and coordination. It can enable decentralized identity verification where services check a user's credentials against a private deny-list without learning the entire list. In decentralized finance (DeFi), it allows for confidential risk assessment by comparing wallet addresses against private sanction lists. Furthermore, PSI is foundational for private smart contracts and cross-chain protocols, where parties need to prove membership in a set (like token holders) without revealing the full set's composition, thus preserving user anonymity and commercial secrecy while enabling trustless collaboration.
How Does Private Set Intersection Work?
Private Set Intersection (PSI) is a cryptographic protocol that enables two or more parties to compute the intersection of their private datasets without revealing any information about items not in the intersection.
Private Set Intersection (PSI) is a cryptographic protocol that allows two mutually distrusting parties, each holding a private set of data, to compute the intersection of their sets without revealing any elements outside the shared result. This ensures data privacy is maintained throughout the computation. The core security guarantee is that after the protocol concludes, each party learns only which items they have in common and gains no knowledge about the other party's unique, non-intersecting elements. This makes PSI a foundational tool for privacy-preserving analytics and secure multi-party computation (MPC).
The protocol typically works by having each party transform their data items into cryptographic commitments or oblivious pseudorandom function (OPRF) evaluations. In a common two-party scheme, Party A encrypts its set and sends the encrypted values to Party B. Party B then performs a cryptographic operation on these values using its own secret key and returns the processed set. Party A can then decrypt this result to identify matches, but cannot learn anything about Party B's other items. Modern efficient PSI protocols, such as those based on Diffie-Hellman key exchange or homomorphic encryption, achieve this with linear computational and communication complexity.
In blockchain and Web3 contexts, PSI enables critical privacy-preserving use cases without relying on trusted intermediaries. Key applications include private contact discovery in messaging apps, fraud detection where financial institutions can find common bad actors without sharing sensitive client lists, and genomic testing for finding genetic matches while keeping DNA data private. For decentralized networks, it allows nodes or validators to privately synchronize state or check for data availability without exposing their full local datasets, enhancing both privacy and scalability.
Key Features of Private Set Intersection (PSI)
Private Set Intersection (PSI) is a cryptographic protocol that allows two or more parties to compute the intersection of their private datasets without revealing any information about items not in the intersection. This section details its fundamental technical properties and applications.
Input Privacy Guarantee
The core security property of PSI ensures that participants learn only the intersection of their datasets. No information is revealed about the elements that are unique to any party. This is achieved through cryptographic techniques like oblivious transfer, homomorphic encryption, or Diffie-Hellman key exchange combined with hash functions.
Protocol Outputs
PSI protocols can be configured to produce different types of outputs:
- Plain PSI: Parties learn which specific items they have in common.
- PSI-Cardinality: Parties learn only the number of intersecting items, not the items themselves.
- PSI with Associated Data: When an intersection is found, one party can reveal associated data (e.g., a user's risk score) for the matched item.
Multi-Party vs. Two-Party
PSI scales beyond two participants:
- Two-Party PSI: The most common and efficient variant, involving a direct computation between two entities.
- Multi-Party PSI (MPSI): Allows three or more parties to compute the common intersection of all their private sets. This is significantly more complex and computationally intensive but enables broader consortium analysis.
Malicious vs. Semi-Honest Security
PSI protocols are designed with different adversarial models:
- Semi-Honest (Passive) Security: Assumes parties follow the protocol but may try to learn extra information from the messages they receive. Most practical PSI implementations use this model for efficiency.
- Malicious (Active) Security: Protects against parties who may arbitrarily deviate from the protocol to cheat or learn private data. These protocols are more robust but come with higher computational and communication overhead.
Performance & Scalability
Modern PSI protocols are optimized for large datasets. Key performance metrics include:
- Communication Complexity: The total data sent between parties. Modern protocols like ECNR or KKRT16 achieve linear communication overhead.
- Computational Complexity: The processing time required, often dominated by cryptographic operations like hashing or public-key encryption. Garbled Circuit-based PSI can be computationally heavy.
- Preprocessing: Some protocols allow for one-time, dataset-independent setup to drastically speed up future executions.
Common Cryptographic Building Blocks
PSI is not a single algorithm but a family of protocols built from cryptographic primitives:
- Oblivious Transfer (OT): A fundamental primitive where a receiver gets one of several messages from a sender without the sender learning which was chosen.
- Homomorphic Encryption (HE): Allows computation on encrypted data; used in some PSI variants for set operations under encryption.
- Bloom Filters & Cuckoo Filters: Probabilistic data structures used in efficient, heuristic-based PSI to reduce communication costs, sometimes at the expense of perfect accuracy.
Real-World Use Cases & Examples
Private Set Intersection (PSI) enables two or more parties to compute the common elements in their datasets without revealing any information about elements not in the intersection. Below are key applications where this cryptographic protocol provides critical privacy guarantees.
Contact Tracing & Health Data
During pandemics, health authorities can use PSI to identify individuals who have been in contact with infected persons without exposing the identities of healthy individuals. For example, a user's phone can compute an intersection with a health authority's list of infection location hashes, revealing only if a match occurred. This protects user privacy while enabling epidemiological analysis.
- Privacy-Preserving Alerting: Users are notified of exposure without revealing their location history to the authority.
- Data Minimization: Authorities learn only the count of contacts, not their identities, unless a user voluntarily reports.
Fraud Detection in Banking
Financial institutions can collaboratively detect synthetic identity fraud or money laundering rings without sharing sensitive customer data. Using PSI, multiple banks can discover if the same identity (e.g., SSN, phone number) appears in their separate fraud alert lists.
- Regulatory Compliance: Enables collaboration for Anti-Money Laundering (AML) checks while adhering to data privacy laws like GDPR.
- Cross-Institutional Analysis: Uncovers coordinated fraud attacks across banks without leaking each bank's full client list or internal risk scores.
Ad Conversion Measurement
Advertisers and publishers (e.g., social media platforms) can measure campaign effectiveness—such as which ad viewers made a purchase—without sharing user-level data. They use PSI to find matches between the advertiser's customer database and the publisher's ad impression logs.
- Attribution Analysis: Determines conversion rates without the publisher learning who bought what, or the advertiser learning users' browsing habits.
- Privacy-First Analytics: Replaces traditional tracking pixels, aligning with increasing restrictions on third-party cookies and user tracking.
Secure DNA Matching
Research institutions or genealogy services can find genetic relatives by intersecting encoded DNA markers while keeping the vast majority of an individual's genetic data private. This is crucial for medical research and family tree building.
- Confidential Queries: A patient can query a research database for specific genetic disease markers without revealing their full genome.
- Kinship Discovery: Services like ancestry.com could implement PSI to let users find relatives without either party exposing their raw DNA data to the other.
Credential & Watchlist Screening
Organizations can verify if a user's credential (e.g., email, ID number) is on a private watchlist or has been compromised in a breach. The service holding the breach database learns nothing about the querying user's credential if it is not on the list.
- Password Checkers: Services like 'Have I Been Pwned' can be enhanced with PSI, allowing users to check passwords against breach corpuses without sending the password in plaintext.
- Secure Onboarding: A company can screen a potential employee's ID against a confidential industry blacklist without obtaining the list itself.
Private Contact Discovery
Messaging apps (e.g., Signal, WhatsApp) need to discover which of a user's contacts are also on the platform. Naively uploading the entire address book to the server is a privacy risk. PSI allows the server to compute the intersection and inform the user of matches without learning the user's full contact list or the non-members' phone numbers.
- User Privacy: The service provider cannot build a social graph of non-users.
- Efficiency: Modern PSI protocols like ECDH-based PSI are efficient enough for mobile devices with large contact lists.
PSI in the Blockchain Ecosystem
Private Set Intersection (PSI) is a cryptographic protocol that allows two or more parties to compute the intersection of their private datasets without revealing any information about items not in the intersection.
Core Cryptographic Mechanism
PSI protocols typically use public-key cryptography and oblivious transfer. A common method involves each party hashing their data, encrypting it with a shared or commutative key, and exchanging the results. By comparing the encrypted sets, they can identify matching entries without decrypting non-matching ones, ensuring input privacy and correctness.
Key Use Case: Private Data Matching
PSI enables secure collaboration on sensitive datasets. Key applications include:
- Compliance & KYC: Financial institutions can check client lists against sanctions lists without exposing their full customer database.
- Fraud Detection: Exchanges can privately identify shared bad actors.
- Healthcare Research: Hospitals can find common patients for studies without sharing full medical records.
PSI in Decentralized Finance (DeFi)
In DeFi, PSI facilitates privacy-preserving financial operations. A primary use is for credit delegation or under-collateralized lending, where a borrower can prove membership in a reputable group (e.g., DAO members, high-reputation addresses) without revealing their entire transaction history or the full group list. This balances risk assessment with user privacy.
Blockchain-Specific Challenges
Implementing PSI on-chain presents unique hurdles:
- Cost & Latency: Cryptographic computations are gas-intensive on networks like Ethereum.
- Data Availability: Input data must be available for protocol execution, conflicting with full privacy goals.
- Trusted Setup: Some efficient PSI variants require a trusted setup phase, introducing a potential centralization point. Solutions often use off-chain computation with on-chain verification.
Related Concept: Multi-Party Computation (MPC)
PSI is a specific application of the broader field of Secure Multi-Party Computation (MPC). While MPC allows parties to jointly compute any function over their private inputs, PSI is specialized for computing set intersections. Understanding MPC's general framework—where no party learns anything beyond the output—provides context for PSI's capabilities and limitations.
PSI vs. Related Privacy Techniques
A technical comparison of Private Set Intersection (PSI) with other cryptographic protocols for privacy-preserving computation.
| Feature / Property | Private Set Intersection (PSI) | Homomorphic Encryption (FHE) | Secure Multi-Party Computation (MPC) | Zero-Knowledge Proofs (ZKPs) |
|---|---|---|---|---|
Core Function | Computes intersection of private sets | Computes on encrypted data | Jointly computes a function over private inputs | Proves statement validity without revealing data |
Primary Use Case | Private contact discovery, threat intel sharing | Encrypted data analytics, private ML inference | Private auctions, distributed key generation | Identity verification, transaction privacy |
Input Privacy | ||||
Output Privacy | Intersection only | Encrypted result only | Agreed-upon output only | Proof only (no raw output) |
Computational Overhead | Low to Moderate (O(n log n)) | Very High | High (communication rounds) | High (proof generation) |
Communication Rounds | 1-3 (semi-honest) | 1 | Multiple (depends on circuit depth) | 1 (non-interactive) |
Cryptographic Primitive | Oblivious Transfer, Diffie-Hellman | Lattice-based cryptography | Secret Sharing, Garbled Circuits | Elliptic Curves, Pairings |
Reveals Set Size |
Security Considerations & Limitations
While PSI enables privacy-preserving computation, its practical implementation involves trade-offs in security, performance, and trust assumptions that must be carefully evaluated.
Trust Assumptions & Threat Models
The security of a PSI protocol depends on its underlying trust model. Semi-honest (passive) adversaries follow the protocol but try to learn extra information, while malicious (active) adversaries may deviate arbitrarily. Most efficient PSI protocols assume a semi-honest model, requiring additional zero-knowledge proofs for malicious security, which significantly increases computational overhead. The choice of model directly impacts the required cryptographic primitives and the protocol's resilience to real-world attacks.
Computational & Communication Overhead
Achieving cryptographic privacy introduces inherent performance costs. Oblivious Transfer and Homomorphic Encryption-based PSI can require substantial computation per element. Communication complexity is also critical; naive protocols scale quadratically (O(n²)). Modern protocols like ECDH-PSI or KKRT16 achieve linear or near-linear scaling but still involve transmitting encrypted data structures like Bloom filters or Cuckoo filters, which can be large (e.g., several megabytes per million items). This limits real-time applicability for very large datasets.
Input-Dependent Leakage & Metadata
Even a "leakage-free" PSI protocol can reveal metadata. The cardinality (size) of each party's input set is often revealed. The timing of protocol execution or network traffic patterns (side-channel attacks) could leak information about set sizes or similarities. Furthermore, if the intersection result itself is sensitive (e.g., a very small or very large result), that outcome conveys information. Protocols must be analyzed for what they explicitly leak beyond the intended intersection.
Implementation Vulnerabilities
The theoretical security of a PSI protocol can be undermined by implementation flaws. Common issues include:
- Insecure randomness: Weak random number generation for cryptographic nonces or keys.
- Timing attacks: Execution time variations that correlate with secret data.
- Memory handling: Failure to securely wipe sensitive intermediate values from memory.
- Protocol misuse: Incorrect parameter selection (e.g., filter false-positive rates) that weakens security guarantees. These require rigorous code auditing and testing beyond the cryptographic design.
Limitations in Dynamic & Multi-Party Settings
Many high-performance PSI protocols are designed for two parties with static input sets. Extending to multi-party PSI (MPSI) where N > 2 parties compute the common intersection introduces exponential complexity in rounds or communication. Supporting dynamic sets where inputs can be added or removed without re-running the entire protocol is non-trivial and often requires sacrificing efficiency or privacy. These settings often demand custom protocol designs with different trade-offs.
Dependence on Cryptographic Hardness
PSI security rests on the assumed hardness of mathematical problems. Most protocols rely on the Decisional Diffie-Hellman (DDH) assumption, Learning With Errors (LWE), or the security of specific symmetric cryptographic primitives. A future breakthrough in cryptanalysis (e.g., quantum algorithms breaking DDH) could retroactively compromise the privacy of all past PSI executions. This necessitates careful selection of post-quantum secure PSI protocols for long-term confidentiality.
Common Misconceptions About Private Set Intersection (PSI)
Private Set Intersection (PSI) is a powerful cryptographic protocol often misunderstood. This section clarifies prevalent technical inaccuracies about its capabilities, security model, and practical applications.
No, Private Set Intersection (PSI) and Fully Homomorphic Encryption (FHE) are distinct cryptographic primitives with different capabilities. PSI is a specific protocol designed to compute the intersection of two private datasets without revealing non-intersecting elements. FHE is a general-purpose cryptographic scheme that allows arbitrary computations on encrypted data. While some modern PSI protocols may use FHE as a building block, traditional PSI often relies on other techniques like oblivious transfer or Diffie-Hellman key exchange. PSI is optimized for a single, specific operation (set intersection), making it vastly more efficient for that task than using generic FHE.
Technical Deep Dive
Private Set Intersection (PSI) is a cryptographic protocol that allows two or more parties to compute the intersection of their private datasets without revealing any information about items not in the intersection.
Private Set Intersection (PSI) is a cryptographic protocol that enables two mutually distrusting parties to discover the common elements in their private datasets without revealing any other information about their respective sets. It works by having each party encrypt or hash their data using a specific, agreed-upon cryptographic function, often involving public-key cryptography or oblivious transfer. The core mechanism ensures that only the fact of an element's presence in both sets is revealed, while all other data remains confidential. For example, in a blockchain context, a decentralized application (dApp) could use PSI to check if a user's credential is on a whitelist held by a server, without the server learning the user's credential if it's not on the list, and without the user learning the entire whitelist.
Frequently Asked Questions (FAQ)
Private Set Intersection (PSI) is a cryptographic protocol enabling two parties to compute the intersection of their private datasets without revealing any information beyond the intersection itself. This section addresses common technical and practical questions.
Private Set Intersection (PSI) is a cryptographic protocol that allows two mutually distrusting parties, each holding a private set of data, to compute the intersection of their sets without revealing any elements not in the intersection. At a high level, it works by having one party transform its set elements into cryptographic commitments or encrypted values (e.g., using Diffie-Hellman key exchange or homomorphic encryption) and sending them to the other party, who can then perform a matching operation on their own encrypted set. The core cryptographic guarantee is that no party learns anything about the other's set elements except for which ones are shared. For example, a hospital could discover which patients are also clients of a research lab without exposing the full patient lists.
Further Reading & Resources
Explore the cryptographic foundations, real-world applications, and advanced protocols that power Private Set Intersection.
PSI-Cardinality & PSI-Sum
Variants of PSI that reveal only aggregated information, offering stronger privacy guarantees.
- PSI-Cardinality: Reveals only the size of the intersection (e.g., "We have 50 customers in common"), not the members themselves.
- PSI-Sum: Reveals the sum of associated values for intersecting items (e.g., "Our shared customers have a total balance of 1M tokens"), without identifying the individuals. These are crucial for privacy-preserving business intelligence.
Privacy vs. Performance Trade-offs
Different PSI protocols optimize for specific constraints.
- Computation vs. Communication: Some protocols are computationally heavy but require few communication rounds, while others are lighter on CPU but require more data exchange.
- Semi-Honest vs. Malicious Security: Protocols are designed for different adversarial models, with malicious-secure versions being more robust but slower.
- Set Size Scalability: Techniques like hashing, bloom filters, and cuckoo hashing are used to handle sets with millions of items efficiently.
Related Privacy Technologies
PSI is part of a broader toolkit for private computation.
- Zero-Knowledge Proofs (ZKPs): Prove a statement is true (e.g., "my data is in the set") without revealing the data itself. Complementary to PSI.
- Federated Learning: Train ML models on decentralized data; PSI can privately align datasets before training.
- Secure Enclaves (TEEs): Trusted Execution Environments like Intel SGX can be used to implement PSI, though they rely on hardware trust assumptions different from pure cryptography.
Get In Touch
today.
Our experts will offer a free quote and a 30min call to discuss your project.