Private Set Intersection (PSI)

definition

CRYPTOGRAPHIC PROTOCOL

What is Private Set Intersection (PSI)?

Private Set Intersection (PSI) is a cryptographic protocol that enables two or more parties to compute the intersection of their private datasets without revealing any information about items not in the intersection.

Private Set Intersection (PSI) is a secure multi-party computation (MPC) protocol designed to solve a fundamental privacy challenge: determining common elements between datasets held by mutually distrustful parties. For example, two hospitals can discover shared patients or two financial institutions can identify common high-risk clients, all while keeping their complete, non-matching data entries completely confidential. The core cryptographic guarantee is that each party learns only the intersecting items and nothing else about the other party's set. This is a critical advancement over naive methods, which would require one party to share its entire dataset, creating unacceptable privacy and security risks.

The protocol typically leverages advanced cryptographic primitives such as oblivious transfer, homomorphic encryption, or Diffie-Hellman key exchange to facilitate the computation. In a common two-party PSI scheme using public-key cryptography, each party hashes and encrypts their dataset elements. They then exchange these encrypted values in a structured way that allows for comparison, but where the encryption ensures that only matches can be identified. The result is that both parties can compute the intersection A ∩ B without party A learning B's unique elements and vice-versa. Modern PSI protocols are highly optimized for performance, enabling intersections of large datasets with minimal computational and communication overhead.

In blockchain and Web3 contexts, PSI has powerful applications for privacy-preserving analytics and coordination. It can enable decentralized identity verification where services check a user's credentials against a private deny-list without learning the entire list. In decentralized finance (DeFi), it allows for confidential risk assessment by comparing wallet addresses against private sanction lists. Furthermore, PSI is foundational for private smart contracts and cross-chain protocols, where parties need to prove membership in a set (like token holders) without revealing the full set's composition, thus preserving user anonymity and commercial secrecy while enabling trustless collaboration.

how-it-works

CRYPTOGRAPHIC PROTOCOL

How Does Private Set Intersection Work?

Private Set Intersection (PSI) is a cryptographic protocol that enables two or more parties to compute the intersection of their private datasets without revealing any information about items not in the intersection.

Private Set Intersection (PSI) is a cryptographic protocol that allows two mutually distrusting parties, each holding a private set of data, to compute the intersection of their sets without revealing any elements outside the shared result. This ensures data privacy is maintained throughout the computation. The core security guarantee is that after the protocol concludes, each party learns only which items they have in common and gains no knowledge about the other party's unique, non-intersecting elements. This makes PSI a foundational tool for privacy-preserving analytics and secure multi-party computation (MPC).

The protocol typically works by having each party transform their data items into cryptographic commitments or oblivious pseudorandom function (OPRF) evaluations. In a common two-party scheme, Party A encrypts its set and sends the encrypted values to Party B. Party B then performs a cryptographic operation on these values using its own secret key and returns the processed set. Party A can then decrypt this result to identify matches, but cannot learn anything about Party B's other items. Modern efficient PSI protocols, such as those based on Diffie-Hellman key exchange or homomorphic encryption, achieve this with linear computational and communication complexity.

In blockchain and Web3 contexts, PSI enables critical privacy-preserving use cases without relying on trusted intermediaries. Key applications include private contact discovery in messaging apps, fraud detection where financial institutions can find common bad actors without sharing sensitive client lists, and genomic testing for finding genetic matches while keeping DNA data private. For decentralized networks, it allows nodes or validators to privately synchronize state or check for data availability without exposing their full local datasets, enhancing both privacy and scalability.

key-features

CORE MECHANICS

Key Features of Private Set Intersection (PSI)

Private Set Intersection (PSI) is a cryptographic protocol that allows two or more parties to compute the intersection of their private datasets without revealing any information about items not in the intersection. This section details its fundamental technical properties and applications.

01

Input Privacy Guarantee

The core security property of PSI ensures that participants learn only the intersection of their datasets. No information is revealed about the elements that are unique to any party. This is achieved through cryptographic techniques like oblivious transfer, homomorphic encryption, or Diffie-Hellman key exchange combined with hash functions.

02

Protocol Outputs

PSI protocols can be configured to produce different types of outputs:

Plain PSI: Parties learn which specific items they have in common.
PSI-Cardinality: Parties learn only the number of intersecting items, not the items themselves.
PSI with Associated Data: When an intersection is found, one party can reveal associated data (e.g., a user's risk score) for the matched item.

03

Multi-Party vs. Two-Party

PSI scales beyond two participants:

Two-Party PSI: The most common and efficient variant, involving a direct computation between two entities.
Multi-Party PSI (MPSI): Allows three or more parties to compute the common intersection of all their private sets. This is significantly more complex and computationally intensive but enables broader consortium analysis.

04

Malicious vs. Semi-Honest Security

PSI protocols are designed with different adversarial models:

Semi-Honest (Passive) Security: Assumes parties follow the protocol but may try to learn extra information from the messages they receive. Most practical PSI implementations use this model for efficiency.
Malicious (Active) Security: Protects against parties who may arbitrarily deviate from the protocol to cheat or learn private data. These protocols are more robust but come with higher computational and communication overhead.

05

Performance & Scalability

Modern PSI protocols are optimized for large datasets. Key performance metrics include:

Communication Complexity: The total data sent between parties. Modern protocols like ECNR or KKRT16 achieve linear communication overhead.
Computational Complexity: The processing time required, often dominated by cryptographic operations like hashing or public-key encryption. Garbled Circuit-based PSI can be computationally heavy.
Preprocessing: Some protocols allow for one-time, dataset-independent setup to drastically speed up future executions.

06

Common Cryptographic Building Blocks

PSI is not a single algorithm but a family of protocols built from cryptographic primitives:

Oblivious Transfer (OT): A fundamental primitive where a receiver gets one of several messages from a sender without the sender learning which was chosen.
Homomorphic Encryption (HE): Allows computation on encrypted data; used in some PSI variants for set operations under encryption.
Bloom Filters & Cuckoo Filters: Probabilistic data structures used in efficient, heuristic-based PSI to reduce communication costs, sometimes at the expense of perfect accuracy.

examples

PRIVATE SET INTERSECTION (PSI)

Real-World Use Cases & Examples

Private Set Intersection (PSI) enables two or more parties to compute the common elements in their datasets without revealing any information about elements not in the intersection. Below are key applications where this cryptographic protocol provides critical privacy guarantees.

01

Contact Tracing & Health Data

During pandemics, health authorities can use PSI to identify individuals who have been in contact with infected persons without exposing the identities of healthy individuals. For example, a user's phone can compute an intersection with a health authority's list of infection location hashes, revealing only if a match occurred. This protects user privacy while enabling epidemiological analysis.

Privacy-Preserving Alerting: Users are notified of exposure without revealing their location history to the authority.
Data Minimization: Authorities learn only the count of contacts, not their identities, unless a user voluntarily reports.

02

Fraud Detection in Banking

Financial institutions can collaboratively detect synthetic identity fraud or money laundering rings without sharing sensitive customer data. Using PSI, multiple banks can discover if the same identity (e.g., SSN, phone number) appears in their separate fraud alert lists.

Regulatory Compliance: Enables collaboration for Anti-Money Laundering (AML) checks while adhering to data privacy laws like GDPR.
Cross-Institutional Analysis: Uncovers coordinated fraud attacks across banks without leaking each bank's full client list or internal risk scores.

03

Ad Conversion Measurement

Advertisers and publishers (e.g., social media platforms) can measure campaign effectiveness—such as which ad viewers made a purchase—without sharing user-level data. They use PSI to find matches between the advertiser's customer database and the publisher's ad impression logs.

Attribution Analysis: Determines conversion rates without the publisher learning who bought what, or the advertiser learning users' browsing habits.
Privacy-First Analytics: Replaces traditional tracking pixels, aligning with increasing restrictions on third-party cookies and user tracking.

04

Secure DNA Matching

Research institutions or genealogy services can find genetic relatives by intersecting encoded DNA markers while keeping the vast majority of an individual's genetic data private. This is crucial for medical research and family tree building.

Confidential Queries: A patient can query a research database for specific genetic disease markers without revealing their full genome.
Kinship Discovery: Services like ancestry.com could implement PSI to let users find relatives without either party exposing their raw DNA data to the other.

05

Credential & Watchlist Screening

Organizations can verify if a user's credential (e.g., email, ID number) is on a private watchlist or has been compromised in a breach. The service holding the breach database learns nothing about the querying user's credential if it is not on the list.

Password Checkers: Services like 'Have I Been Pwned' can be enhanced with PSI, allowing users to check passwords against breach corpuses without sending the password in plaintext.
Secure Onboarding: A company can screen a potential employee's ID against a confidential industry blacklist without obtaining the list itself.

06

Private Contact Discovery

Messaging apps (e.g., Signal, WhatsApp) need to discover which of a user's contacts are also on the platform. Naively uploading the entire address book to the server is a privacy risk. PSI allows the server to compute the intersection and inform the user of matches without learning the user's full contact list or the non-members' phone numbers.

User Privacy: The service provider cannot build a social graph of non-users.
Efficiency: Modern PSI protocols like ECDH-based PSI are efficient enough for mobile devices with large contact lists.

ecosystem-usage

CRYPTOGRAPHIC PRIMITIVE

PSI in the Blockchain Ecosystem

Private Set Intersection (PSI) is a cryptographic protocol that allows two or more parties to compute the intersection of their private datasets without revealing any information about items not in the intersection.

01

Core Cryptographic Mechanism

PSI protocols typically use public-key cryptography and oblivious transfer. A common method involves each party hashing their data, encrypting it with a shared or commutative key, and exchanging the results. By comparing the encrypted sets, they can identify matching entries without decrypting non-matching ones, ensuring input privacy and correctness.

02

Key Use Case: Private Data Matching

PSI enables secure collaboration on sensitive datasets. Key applications include:

Compliance & KYC: Financial institutions can check client lists against sanctions lists without exposing their full customer database.
Fraud Detection: Exchanges can privately identify shared bad actors.
Healthcare Research: Hospitals can find common patients for studies without sharing full medical records.

03

PSI in Decentralized Finance (DeFi)

In DeFi, PSI facilitates privacy-preserving financial operations. A primary use is for credit delegation or under-collateralized lending, where a borrower can prove membership in a reputable group (e.g., DAO members, high-reputation addresses) without revealing their entire transaction history or the full group list. This balances risk assessment with user privacy.

04

Implementation: PSI Protocols & Libraries

Several libraries implement PSI for developers. Google's Private Join and Compute is an open-source framework for computing on the intersection of encrypted datasets. Mozilla's Prio and academic libraries like libPSI offer implementations of protocols such as Diffie-Hellman-based PSI and Ecdh-PSI. These are integrated into blockchain oracles and middleware layers.

EXPLORE

05

Blockchain-Specific Challenges

Implementing PSI on-chain presents unique hurdles:

Cost & Latency: Cryptographic computations are gas-intensive on networks like Ethereum.
Data Availability: Input data must be available for protocol execution, conflicting with full privacy goals.
Trusted Setup: Some efficient PSI variants require a trusted setup phase, introducing a potential centralization point. Solutions often use off-chain computation with on-chain verification.

06

Related Concept: Multi-Party Computation (MPC)

PSI is a specific application of the broader field of Secure Multi-Party Computation (MPC). While MPC allows parties to jointly compute any function over their private inputs, PSI is specialized for computing set intersections. Understanding MPC's general framework—where no party learns anything beyond the output—provides context for PSI's capabilities and limitations.

COMPARISON MATRIX

PSI vs. Related Privacy Techniques

A technical comparison of Private Set Intersection (PSI) with other cryptographic protocols for privacy-preserving computation.

Feature / Property	Private Set Intersection (PSI)	Homomorphic Encryption (FHE)	Secure Multi-Party Computation (MPC)	Zero-Knowledge Proofs (ZKPs)
Core Function	Computes intersection of private sets	Computes on encrypted data	Jointly computes a function over private inputs	Proves statement validity without revealing data
Primary Use Case	Private contact discovery, threat intel sharing	Encrypted data analytics, private ML inference	Private auctions, distributed key generation	Identity verification, transaction privacy
Input Privacy
Output Privacy	Intersection only	Encrypted result only	Agreed-upon output only	Proof only (no raw output)
Computational Overhead	Low to Moderate (O(n log n))	Very High	High (communication rounds)	High (proof generation)
Communication Rounds	1-3 (semi-honest)	1	Multiple (depends on circuit depth)	1 (non-interactive)
Cryptographic Primitive	Oblivious Transfer, Diffie-Hellman	Lattice-based cryptography	Secret Sharing, Garbled Circuits	Elliptic Curves, Pairings
Reveals Set Size

security-considerations

PRIVATE SET INTERSECTION

Security Considerations & Limitations

While PSI enables privacy-preserving computation, its practical implementation involves trade-offs in security, performance, and trust assumptions that must be carefully evaluated.

01

Trust Assumptions & Threat Models

The security of a PSI protocol depends on its underlying trust model. Semi-honest (passive) adversaries follow the protocol but try to learn extra information, while malicious (active) adversaries may deviate arbitrarily. Most efficient PSI protocols assume a semi-honest model, requiring additional zero-knowledge proofs for malicious security, which significantly increases computational overhead. The choice of model directly impacts the required cryptographic primitives and the protocol's resilience to real-world attacks.

02

Computational & Communication Overhead

Achieving cryptographic privacy introduces inherent performance costs. Oblivious Transfer and Homomorphic Encryption-based PSI can require substantial computation per element. Communication complexity is also critical; naive protocols scale quadratically (O(n²)). Modern protocols like ECDH-PSI or KKRT16 achieve linear or near-linear scaling but still involve transmitting encrypted data structures like Bloom filters or Cuckoo filters, which can be large (e.g., several megabytes per million items). This limits real-time applicability for very large datasets.

03

Input-Dependent Leakage & Metadata

Even a "leakage-free" PSI protocol can reveal metadata. The cardinality (size) of each party's input set is often revealed. The timing of protocol execution or network traffic patterns (side-channel attacks) could leak information about set sizes or similarities. Furthermore, if the intersection result itself is sensitive (e.g., a very small or very large result), that outcome conveys information. Protocols must be analyzed for what they explicitly leak beyond the intended intersection.

04

Implementation Vulnerabilities

The theoretical security of a PSI protocol can be undermined by implementation flaws. Common issues include:

Insecure randomness: Weak random number generation for cryptographic nonces or keys.
Timing attacks: Execution time variations that correlate with secret data.
Memory handling: Failure to securely wipe sensitive intermediate values from memory.
Protocol misuse: Incorrect parameter selection (e.g., filter false-positive rates) that weakens security guarantees. These require rigorous code auditing and testing beyond the cryptographic design.

05

Limitations in Dynamic & Multi-Party Settings

Many high-performance PSI protocols are designed for two parties with static input sets. Extending to multi-party PSI (MPSI) where N > 2 parties compute the common intersection introduces exponential complexity in rounds or communication. Supporting dynamic sets where inputs can be added or removed without re-running the entire protocol is non-trivial and often requires sacrificing efficiency or privacy. These settings often demand custom protocol designs with different trade-offs.

06

Dependence on Cryptographic Hardness

PSI security rests on the assumed hardness of mathematical problems. Most protocols rely on the Decisional Diffie-Hellman (DDH) assumption, Learning With Errors (LWE), or the security of specific symmetric cryptographic primitives. A future breakthrough in cryptanalysis (e.g., quantum algorithms breaking DDH) could retroactively compromise the privacy of all past PSI executions. This necessitates careful selection of post-quantum secure PSI protocols for long-term confidentiality.

DEBUNKING MYTHS

Common Misconceptions About Private Set Intersection (PSI)

Private Set Intersection (PSI) is a powerful cryptographic protocol often misunderstood. This section clarifies prevalent technical inaccuracies about its capabilities, security model, and practical applications.

No, Private Set Intersection (PSI) and Fully Homomorphic Encryption (FHE) are distinct cryptographic primitives with different capabilities. PSI is a specific protocol designed to compute the intersection of two private datasets without revealing non-intersecting elements. FHE is a general-purpose cryptographic scheme that allows arbitrary computations on encrypted data. While some modern PSI protocols may use FHE as a building block, traditional PSI often relies on other techniques like oblivious transfer or Diffie-Hellman key exchange. PSI is optimized for a single, specific operation (set intersection), making it vastly more efficient for that task than using generic FHE.

PRIVATE SET INTERSECTION

Technical Deep Dive

Private Set Intersection (PSI) is a cryptographic protocol that allows two or more parties to compute the intersection of their private datasets without revealing any information about items not in the intersection.

Private Set Intersection (PSI) is a cryptographic protocol that enables two mutually distrusting parties to discover the common elements in their private datasets without revealing any other information about their respective sets. It works by having each party encrypt or hash their data using a specific, agreed-upon cryptographic function, often involving public-key cryptography or oblivious transfer. The core mechanism ensures that only the fact of an element's presence in both sets is revealed, while all other data remains confidential. For example, in a blockchain context, a decentralized application (dApp) could use PSI to check if a user's credential is on a whitelist held by a server, without the server learning the user's credential if it's not on the list, and without the user learning the entire whitelist.

PRIVATE SET INTERSECTION

Frequently Asked Questions (FAQ)

Private Set Intersection (PSI) is a cryptographic protocol enabling two parties to compute the intersection of their private datasets without revealing any information beyond the intersection itself. This section addresses common technical and practical questions.

Private Set Intersection (PSI) is a cryptographic protocol that allows two mutually distrusting parties, each holding a private set of data, to compute the intersection of their sets without revealing any elements not in the intersection. At a high level, it works by having one party transform its set elements into cryptographic commitments or encrypted values (e.g., using Diffie-Hellman key exchange or homomorphic encryption) and sending them to the other party, who can then perform a matching operation on their own encrypted set. The core cryptographic guarantee is that no party learns anything about the other's set elements except for which ones are shared. For example, a hospital could discover which patients are also clients of a research lab without exposing the full patient lists.

further-reading

PRIVATE SET INTERSECTION (PSI)

What is Private Set Intersection (PSI)?

How Does Private Set Intersection Work?

Key Features of Private Set Intersection (PSI)

Input Privacy Guarantee

Protocol Outputs

Multi-Party vs. Two-Party

Malicious vs. Semi-Honest Security

Performance & Scalability

Common Cryptographic Building Blocks

Real-World Use Cases & Examples

Contact Tracing & Health Data

Fraud Detection in Banking

Ad Conversion Measurement

Secure DNA Matching

Credential & Watchlist Screening

Private Contact Discovery

PSI in the Blockchain Ecosystem

Core Cryptographic Mechanism

Key Use Case: Private Data Matching

PSI in Decentralized Finance (DeFi)

Implementation: PSI Protocols & Libraries

Blockchain-Specific Challenges

Related Concept: Multi-Party Computation (MPC)

PSI vs. Related Privacy Techniques

Security Considerations & Limitations

Trust Assumptions & Threat Models

Computational & Communication Overhead

Input-Dependent Leakage & Metadata

Implementation Vulnerabilities

Limitations in Dynamic & Multi-Party Settings

Dependence on Cryptographic Hardness

Common Misconceptions About Private Set Intersection (PSI)

Technical Deep Dive

Frequently Asked Questions (FAQ)

Related Terms & Concepts

Secure Multi-Party Computation (MPC)

Zero-Knowledge Proofs (ZKPs)

Homomorphic Encryption

Differential Privacy

Private Information Retrieval (PIR)

Oblivious Transfer (OT)

Further Reading & Resources

Core Cryptographic Protocols

Blockchain & Web3 Applications

PSI-Cardinality & PSI-Sum

Real-World Implementations & Libraries

Privacy vs. Performance Trade-offs

Related Privacy Technologies

Get In Touch today.

Get In Touch
today.