How to Architect an On-Chain Fraud Detection System

A technical guide for developers on building systems to detect anomalous token flows, suspicious contract interactions, and wallet clustering using blockchain data.
Chainscore © 2026
DEVELOPER GUIDE

A technical blueprint for building a system that monitors, analyzes, and flags suspicious activity on blockchain networks using smart contracts and off-chain analytics.

An effective on-chain fraud detection system is a multi-layered architecture that combines real-time blockchain data ingestion with intelligent analysis. The core components are a data ingestion layer that streams transactions from nodes or indexers, a processing engine that applies detection rules, and an alerting system that notifies stakeholders. This architecture must be designed for low latency to identify threats like flash loan attacks or wallet drainers before funds are irreversibly moved. Systems like Forta Network and OpenZeppelin Defender provide foundational frameworks for this purpose.

The first architectural decision is choosing your data source. You can pull data directly from a node's JSON-RPC API, subscribe to events via WebSocket for real-time alerts, or use a specialized indexing service like The Graph or Chainscore. For most production systems, a hybrid approach is best: use a reliable RPC provider (e.g., Alchemy, Infura) for broad data and an indexer for complex historical queries. Your ingestion service must normalize this data into a consistent schema, handling chain reorgs and ensuring no transactions are missed, which is critical for audit trails.
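
The reorg handling mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions rather than a production design: the `ReorgTracker` class, its `depth` parameter, and the block shape (`number`, `hash`, `parentHash`) are hypothetical names chosen for the example.

```javascript
// Hypothetical helper: remembers recent block hashes and reports reorgs.
class ReorgTracker {
  constructor(depth = 64) {
    this.depth = depth;      // how many recent blocks to remember
    this.hashes = new Map(); // block number -> block hash
  }

  // Returns { reorg, rollbackTo }. On a reorg the caller should discard
  // everything above rollbackTo and re-ingest from rollbackTo + 1.
  accept(block) {
    const known = this.hashes.get(block.number);
    const parent = this.hashes.get(block.number - 1);
    const replaced = known !== undefined && known !== block.hash;
    const orphanedParent = parent !== undefined && parent !== block.parentHash;

    let rollbackTo = null;
    if (replaced || orphanedParent) {
      // Forget every remembered block at or above the first conflicting height.
      const firstBad = orphanedParent ? block.number - 1 : block.number;
      for (const n of [...this.hashes.keys()]) {
        if (n >= firstBad) this.hashes.delete(n);
      }
      rollbackTo = firstBad - 1;
    }

    this.hashes.set(block.number, block.hash);
    // Prune blocks that fell out of the tracking window.
    for (const n of [...this.hashes.keys()]) {
      if (n <= block.number - this.depth) this.hashes.delete(n);
    }
    return { reorg: rollbackTo !== null, rollbackTo };
  }
}
```

The tracker only signals that a rollback is needed. Deleting and re-ingesting the affected rows is left to the ingestion service, which keeps the audit trail logic in one place.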

At the heart of the system is the detection engine, where you implement specific agents or bots. Each agent contains logic for a specific threat model. For example, one agent might monitor for approve() or transfer() calls that target newly created contracts (a pattern often seen with wallet drainers), while another tracks large, anomalous withdrawals from a protocol's treasury. Code these agents in a language like JavaScript or Python, using libraries such as ethers.js or web3.py. The logic should evaluate transaction parameters, sender/receiver history, contract interactions, and gas patterns.

Detection rules fall into several categories. Signature-based detection looks for known malicious patterns, like function selectors used in exploit kits. Anomaly detection uses heuristics or machine learning to flag deviations from a wallet's or contract's normal behavior, such as a sudden 1000% increase in transfer volume. Graph analysis maps relationships between addresses to identify money laundering rings or coordinated attack clusters. Implementing a combination of these methods, as seen in platforms like TRM Labs, significantly increases coverage.
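
As a rough illustration of the anomaly-detection category, the sketch below compares a wallet's current transfer volume with the mean of its history and flags a 1000% increase. The function name `volumeSpike`, its parameters, and the use of a simple mean as the baseline are assumptions made for this example; a real system would likely use a more robust baseline.

```javascript
// Hypothetical helper: percentage change of the current volume against
// the mean of a wallet's historical per-period volumes.
function volumeSpike(history, current, thresholdPercent = 1000) {
  if (history.length === 0) return null; // no baseline yet
  const baseline = history.reduce((sum, v) => sum + v, 0) / history.length;
  if (baseline === 0) return null; // avoid dividing by zero
  const increasePercent = ((current - baseline) / baseline) * 100;
  return {
    baseline,
    increasePercent,
    flagged: increasePercent >= thresholdPercent,
  };
}
```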

Upon detecting a potential threat, the system must execute a predefined response. This ranges from passive alerting via Slack, Discord, or PagerDuty, to active intervention. For protocols with pausable contracts or guardian roles, the system can automatically submit a transaction to pause operations via a secure relayer. Always implement a multi-signature or time-lock delay for any active intervention to prevent the detection system itself from becoming an attack vector. All alerts and actions must be logged immutably, both on-chain and in a secure off-chain database for forensic analysis.

Finally, architect for scalability and maintenance. Run detection agents in isolated, serverless environments (e.g., AWS Lambda) to ensure a failure in one module doesn't crash the entire system. Use a message queue (e.g., RabbitMQ) to decouple data ingestion from processing. Regularly update your threat intelligence feeds and agent rules to adapt to new exploits. Open-source communities and audits, like those from Code4rena, are invaluable for refining detection logic. The goal is a resilient, automated sentinel that operates 24/7 with minimal false positives.

PREREQUISITES AND CORE TECHNOLOGIES

Building a system to detect fraud on the blockchain requires a foundational understanding of the data sources, analytical tools, and architectural patterns unique to Web3.

An effective on-chain fraud detection system is built on three core data pillars: blockchain data, smart contract events, and wallet metadata. Blockchain data, accessible via nodes or services like Alchemy and Infura, provides the immutable ledger of transactions. Smart contract events, emitted by protocols like Uniswap or Aave, offer granular insight into specific user actions. Wallet metadata, which includes labels from Etherscan or Arkham, helps cluster addresses into real-world entities. Your architecture must be designed to ingest, parse, and correlate these disparate data streams in real-time.

The analytical engine is the system's brain, where raw data transforms into actionable intelligence. This involves implementing heuristic rules (e.g., flagging transactions with high gas fees to unknown contracts) and machine learning models for anomaly detection. Tools for this layer include Python data science stacks (Pandas, Scikit-learn) for batch analysis and streaming frameworks like Apache Flink for real-time processing. A critical prerequisite is understanding common fraud vectors: rug pulls (liquidity removal), flash loan attacks (exploiting atomic transactions), and wash trading (self-dealing to manipulate metrics).

Finally, the system requires a robust data pipeline and storage layer. You'll need a high-performance database like TimescaleDB for time-series transaction data or a graph database like Neo4j to map complex relationships between wallets. The architecture should separate the ingestion pipeline (using a message queue like Apache Kafka), the processing engine, and a serving layer for queries and alerts. Implementing this allows you to track metrics such as the velocity of funds, identify money laundering patterns, and generate real-time alerts for suspicious activity across chains like Ethereum, Arbitrum, and Polygon.

ARCHITECTURE

Core Concepts in Fraud Detection

Building a robust on-chain fraud detection system requires understanding key architectural components, from data ingestion to real-time alerting.

01

Data Ingestion & Indexing

The foundation of any detection system is reliable, low-latency data. This involves subscribing to blockchain events via RPC nodes or services like The Graph for historical queries. Key considerations include:

  • Event Parsing: Decoding transaction logs and internal calls from smart contracts.
  • Data Normalization: Standardizing data from multiple chains (Ethereum, Solana, Arbitrum) into a unified schema.
  • Scalability: Handling high-throughput chains where block times are under 2 seconds.
02

Heuristic & Rule-Based Detection

Define explicit logic to flag suspicious activity. These are the first line of defense and are fast to execute.

  • Transaction Anomalies: Gas price spikes, failed transactions with high value, or rapid succession of similar calls.
  • Address Reputation: Checking sender/receiver against known threat intelligence feeds (e.g., Etherscan labels, Chainalysis).
  • Pattern Matching: Identifying known exploit signatures, like sandwich attacks or honeypot token patterns.
03

Machine Learning Models

ML models identify complex, non-obvious fraud patterns by learning from historical data. Common approaches include:

  • Anomaly Detection: Unsupervised models like Isolation Forests to flag outliers in transaction graphs or wallet behavior.
  • Classification: Supervised models trained on labeled datasets of "fraudulent" vs "legitimate" transactions.
  • Feature Engineering: Creating inputs like transaction velocity, neighborhood analysis (connected addresses), and time-series behavior.
04

Graph Analysis

Analyzes the network of relationships between addresses and contracts to uncover sophisticated schemes.

  • Entity Clustering: Using algorithms like Louvain to group addresses likely controlled by a single entity (e.g., a scammer's wallet network).
  • Funds Flow Tracing: Following the path of stolen assets across bridges and mixers to identify cash-out points.
  • Centrality Metrics: Identifying key intermediary addresses that facilitate large volumes of illicit flow.
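
Funds flow tracing can be approximated with a breadth-first search over transfers. The sketch below is a simplified example: `hopsToFlagged`, the `{ from, to }` transfer shape, and the `flagged` set are hypothetical, and it ignores amounts and timing, which a real tracer would need to consider.

```javascript
// Hypothetical helper: returns the number of hops from `start` to the nearest
// flagged address, following outgoing transfers, or null if none is reachable
// within maxHops.
function hopsToFlagged(transfers, start, flagged, maxHops) {
  const adjacency = new Map();
  for (const { from, to } of transfers) {
    if (!adjacency.has(from)) adjacency.set(from, []);
    adjacency.get(from).push(to);
  }

  const visited = new Set([start]);
  let frontier = [start];
  for (let hops = 1; hops <= maxHops; hops++) {
    const next = [];
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) ?? []) {
        if (visited.has(neighbor)) continue;
        if (flagged.has(neighbor)) return hops;
        visited.add(neighbor);
        next.push(neighbor);
      }
    }
    frontier = next;
  }
  return null;
}
```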
05

Alert Triage & Prioritization

Not all alerts are equal. A triage system reduces noise and focuses human attention on critical risks.

  • Scoring & Severity: Assign a risk score (e.g., 1-100) based on the confidence and potential financial impact of the alert.
  • Alert Deduplication: Grouping related alerts (e.g., multiple transactions from the same attack) into a single incident.
  • Context Enrichment: Appending data like wallet history, token approvals, and associated off-chain intelligence to the alert.
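
One possible way to implement alert deduplication is to group alerts that share a type and address and arrive within a short window. The function `groupAlerts`, the alert fields (`type`, `address`, `timestamp`, `riskScore`), and the 300-second default window are assumptions for this sketch.

```javascript
// Hypothetical helper: merges related alerts into incidents.
function groupAlerts(alerts, windowSeconds = 300) {
  const sorted = [...alerts].sort((a, b) => a.timestamp - b.timestamp);
  const open = new Map(); // grouping key -> most recent incident for that key
  const incidents = [];

  for (const alert of sorted) {
    const key = `${alert.type}:${alert.address.toLowerCase()}`;
    const current = open.get(key);
    if (current && alert.timestamp - current.lastSeen <= windowSeconds) {
      // Same type and address within the window: extend the existing incident.
      current.alerts.push(alert);
      current.lastSeen = alert.timestamp;
      current.maxScore = Math.max(current.maxScore, alert.riskScore);
    } else {
      const incident = {
        key,
        firstSeen: alert.timestamp,
        lastSeen: alert.timestamp,
        maxScore: alert.riskScore,
        alerts: [alert],
      };
      open.set(key, incident);
      incidents.push(incident);
    }
  }
  return incidents;
}
```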
06

Real-Time Response & Automation

Closing the loop from detection to action. This layer enables proactive defense.

  • API Integrations: Automatically sending alerts to Discord, Slack, or security dashboards.
  • Smart Contract Pausing: For protocol teams, integrating with admin multisigs to pause vulnerable contracts.
  • Blocklist Updates: Programmatically submitting malicious addresses to DEX aggregator or RPC provider blocklists to prevent further exploitation.
DATA PIPELINE DESIGN

A robust data ingestion and processing pipeline is the backbone of any effective on-chain fraud detection system. This guide outlines the architectural components and design patterns needed to monitor, analyze, and flag malicious activity across blockchains.

The first architectural layer is real-time data ingestion. You need to capture raw blockchain data from multiple sources, primarily via node RPC calls or specialized data providers like The Graph. For Ethereum, this involves subscribing to a full node over WebSocket using eth_subscribe, with the newHeads subscription for new blocks and newPendingTransactions for pending transactions. The goal is low-latency access to mempool data, which is critical for detecting front-running or sandwich attacks before they are confirmed. This ingestion service must be resilient to node failures and handle the high throughput of mainnet activity.
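
The JSON-RPC messages involved in an eth_subscribe flow can be sketched without any networking code. The helpers `buildSubscribeRequest` and `parseNotification` are hypothetical names for this example; the message shapes follow the standard Ethereum pub/sub format, and sending them over a WebSocket connection is left out.

```javascript
// Hypothetical helper: builds the JSON-RPC request that opens a subscription.
function buildSubscribeRequest(id, channel) {
  const supported = ['newHeads', 'newPendingTransactions'];
  if (!supported.includes(channel)) {
    throw new Error(`Unsupported subscription in this sketch: ${channel}`);
  }
  return JSON.stringify({
    jsonrpc: '2.0',
    id,
    method: 'eth_subscribe',
    params: [channel],
  });
}

// Hypothetical helper: extracts the payload from a subscription notification.
// Returns null for any other message (for example, the subscribe response).
function parseNotification(raw) {
  const msg = JSON.parse(raw);
  if (msg.method !== 'eth_subscription' || !msg.params) return null;
  return {
    subscriptionId: msg.params.subscription,
    result: msg.params.result,
  };
}
```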

Once ingested, raw transaction data must be normalized and enriched in a stream processing layer. Tools like Apache Kafka or Apache Pulsar can manage the event stream. Here, you decode transaction calldata using ABI definitions, resolve token addresses to symbols, calculate USD values using price oracles, and map sender addresses to known entity labels (e.g., exchange addresses from Etherscan). This enrichment transforms raw hex data into structured events with contextual fields like potential_protocol, estimated_swap_value_usd, and sender_risk_score, which are essential for subsequent analysis.
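
To make the decoding and enrichment step concrete, here is a minimal sketch that decodes an ERC-20 `transfer(address,uint256)` call by hand and attaches an estimated USD value. In practice an ABI library would do the decoding; `decodeErc20Transfer` and `enrichTransfer` are hypothetical helpers, and the token decimals and price are assumed to come from elsewhere in the pipeline.

```javascript
const TRANSFER_SELECTOR = '0xa9059cbb'; // transfer(address,uint256)

// Hypothetical helper: decodes calldata of an ERC-20 transfer call.
function decodeErc20Transfer(calldata) {
  const data = calldata.toLowerCase();
  // "0x" + 4-byte selector + two 32-byte words = 2 + 8 + 128 hex characters
  if (!data.startsWith(TRANSFER_SELECTOR) || data.length !== 138) return null;
  const recipientWord = data.slice(10, 74);
  const amountWord = data.slice(74, 138);
  return {
    to: '0x' + recipientWord.slice(24), // address is the last 20 bytes of the word
    amount: BigInt('0x' + amountWord),  // raw token units
  };
}

// Hypothetical helper: adds contextual fields to a decoded transfer.
// The Number conversion is only suitable for rough USD estimates.
function enrichTransfer(decoded, tokenDecimals, priceUsd) {
  const units = Number(decoded.amount) / 10 ** tokenDecimals;
  return { ...decoded, estimated_value_usd: units * priceUsd };
}
```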

The core logic resides in the detection engine, which applies rules and machine learning models to the enriched stream. A modular design is key. Rule-based detectors can flag simple patterns: a single address interacting with a Tornado Cash contract, or a flash loan transaction exceeding a $50M threshold. Statistical models require a feature store (a database like PostgreSQL or TimescaleDB) to track historical metrics like an address's transaction frequency or profit/loss across swaps. Models can then identify anomalies, such as sudden behavioral shifts indicative of a compromised wallet.

For complex cross-contract fraud, you need control flow analysis. This involves reconstructing the full call trace of a transaction. By analyzing the output of a tracing RPC method such as trace_transaction (or debug_traceTransaction on Geth-based nodes), you can see the sequence of internal calls and identify whether a seemingly benign token approval leads to a drainer contract. Storing these call graphs allows you to detect new attack vectors by comparing them against known malicious patterns. This depth of analysis is computationally intensive and is often performed in a separate batch-processing pipeline that runs alongside real-time alerting.
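
A call trace check of this kind might look like the sketch below. It assumes the trace has already been fetched and has a nested shape in which each call has a `to` field and an optional `calls` array, similar to what a call tracer returns. The function `findFlaggedCalls` and the `flagged` set of lowercase addresses are hypothetical.

```javascript
// Hypothetical helper: walks a nested call trace depth-first and returns every
// call whose target is in the flagged set, with its depth and position.
function findFlaggedCalls(trace, flagged, path = []) {
  const hits = [];
  const to = (trace.to || '').toLowerCase();
  if (flagged.has(to)) {
    hits.push({ to, depth: path.length, path: [...path] });
  }
  (trace.calls || []).forEach((child, index) => {
    hits.push(...findFlaggedCalls(child, flagged, [...path, index]));
  });
  return hits;
}
```

The `path` array records the index of each nested call, so an investigator can locate the exact internal call that reached the flagged contract.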

Finally, processed alerts and risk scores must be delivered reliably. This output and action layer typically involves a message queue that feeds into notification channels (Slack, Telegram), a dashboard for investigators, and potentially an on-chain pausing mechanism for protected contracts via a multisig. The architecture must be observable, with metrics for alert volume, false positive rates, and pipeline latency logged to systems like Prometheus. The entire system should be deployable as a modular, containerized suite using Docker and orchestrated with Kubernetes for scalability and resilience.

GUIDE

A technical guide to building a modular, data-driven system for identifying and mitigating fraudulent activity across blockchain networks.

An on-chain fraud detection system analyzes transaction data, wallet behaviors, and smart contract interactions to identify malicious patterns. Unlike traditional security focused on a single exploit, this architecture must process high-volume, real-time data from multiple sources like mempools, block explorers, and indexers. The core components are a data ingestion layer, a heuristic engine for applying detection rules, and an alerting/action layer. This modular design allows for continuous updates as new attack vectors, such as address poisoning or sandwich attacks, are discovered.

The foundation is the data model. You need to structure raw blockchain data into analyzable entities. This involves tracking EOA (Externally Owned Account) and contract addresses, their transaction history, token holdings (via ERC-20/721 standards), and interaction graphs. For example, a heuristic for detecting a wallet drainer might analyze the sequence of approve() and transferFrom() calls from a new, untrusted contract. Data is typically sourced from RPC nodes, The Graph subgraphs, or services like Dune Analytics, then normalized and stored for low-latency querying.

Risk heuristics are the detection rules. Start with simple, high-signal rules before implementing complex machine learning models. Examples include:

  • Velocity Checks: Unusually high transaction frequency from an address.
  • New Contract Interaction: First-time interaction with a recently deployed, unaudited contract.
  • Anomalous Value Transfer: A transaction amount drastically outside a wallet's historical norm.
  • Poisoned Address Detection: A transfer to a look-alike of a known address, typically one that shares the same leading and trailing characters so that it passes a quick visual check.

Each heuristic should output a risk score and metadata.
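
One common way to approximate the poisoned address check is to compare the leading and trailing characters of a recipient against addresses the wallet already knows. The sketch below assumes that approach; `findImitatedAddress` and the `matchChars` default of 4 are hypothetical choices for illustration.

```javascript
// Hypothetical helper: returns the known address that `candidate` appears to
// imitate, or null if it is itself a known address or resembles none of them.
function findImitatedAddress(candidate, knownAddresses, matchChars = 4) {
  const strip = (a) => a.toLowerCase().replace(/^0x/, '');
  const c = strip(candidate);
  const known = knownAddresses.map(strip);

  if (known.includes(c)) return null; // exact match: a genuine known address

  const hit = known.find(
    (k) =>
      k.slice(0, matchChars) === c.slice(0, matchChars) &&
      k.slice(-matchChars) === c.slice(-matchChars)
  );
  return hit ? '0x' + hit : null;
}
```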

Here is a simplified example of a heuristic that checks for interaction with a newly deployed contract. It is written for Node.js, assumes that tx has the shape of an ethers.js transaction (with a to field), and uses a hypothetical database client with a node-postgres-style query method. The rule flags interactions with contracts less than 24 hours old.

```javascript
// Flags transactions that call a contract deployed less than 24 hours ago.
// Assumes contracts.block_timestamp holds the deployment time in Unix seconds.
async function checkNewContractHeuristic(tx, dbClient) {
  const contractAddress = tx.to;
  // tx.to is null for contract-creation transactions, so there is no target to check
  if (!contractAddress) return null;

  // Fetch contract creation timestamp from DB or block explorer API
  const deployInfo = await dbClient.query(
    'SELECT block_timestamp FROM contracts WHERE address = $1',
    [contractAddress]
  );

  if (deployInfo.rows.length === 0) {
    // Address not in our DB: it may be a brand-new contract, so treat it as high risk.
    // A production system would first confirm the address has code, since it could be an EOA.
    return { riskScore: 85, type: 'NEW_UNKNOWN_CONTRACT' };
  }

  const deployTime = Number(deployInfo.rows[0].block_timestamp);
  const ageInHours = (Date.now() / 1000 - deployTime) / 3600;

  if (ageInHours < 24) {
    return {
      riskScore: 70,
      type: 'NEW_CONTRACT_INTERACTION',
      metadata: { contractAgeHours: ageInHours }
    };
  }
  return null; // No risk detected
}
```

The system must aggregate scores from multiple heuristics to assess overall risk. A weighted scoring model is common, where critical heuristics (like a signature from a known malicious domain) carry more weight. The final step is the action layer. For a high-confidence fraud signal, actions can range from generating an alert in a dashboard, to programmatically submitting a transaction to a pause guardian contract, or even broadcasting a competing transaction to front-run an attack. Integrating with incident response platforms like PagerDuty or OpenZeppelin Defender is crucial for operational readiness.
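
A weighted scoring model can be as simple as a weighted average over the heuristics that fired. The sketch below assumes each heuristic returns either null or an object with `type` and `riskScore`; the function `aggregateRisk` and the weight values used with it are illustrative only.

```javascript
// Hypothetical helper: weighted average of the risk scores of triggered heuristics.
// `weights` maps a heuristic type to its weight; unknown types use defaultWeight.
function aggregateRisk(results, weights, defaultWeight = 1) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const result of results) {
    if (!result) continue; // heuristic did not trigger
    const weight = weights[result.type] ?? defaultWeight;
    weightedSum += result.riskScore * weight;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : weightedSum / totalWeight;
}
```

A weighted average keeps the final score on the same scale as the individual heuristics. Other designs, such as taking the maximum score or summing with a cap, trade interpretability for sensitivity.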

Maintaining the system requires constant iteration. You must log all heuristic triggers and outcomes to create a feedback loop, tuning rules to reduce false positives. Stay updated on emerging threats by monitoring communities like the Blockchain Security Alliance and reports from firms like CertiK and OpenZeppelin. The most effective architectures are those that are data-transparent, modular (allowing easy addition/removal of heuristics), and fast, as on-chain fraud often unfolds in a matter of blocks.

HEURISTIC CATEGORIES

Common Fraud Detection Heuristics

Core logic patterns used to identify suspicious on-chain activity across different threat vectors.

| Heuristic / Pattern | Financial Anomalies | Behavioral Anomalies | Contract & Protocol Exploits |
| --- | --- | --- | --- |
| Sudden Large Value Transfer | Tx value > 95th percentile of wallet history | First interaction with recipient address | Flash loan collateralization or exit liquidity drain |
| Velocity & Repetition | 50 transactions in 1 hour from single EOA | Identical function call to multiple contracts in short burst | Repeated failed approval/transfer calls indicating probe |
| Address Graph Analysis | Funding from known mixer or stolen fund wallet | Interaction with newly created, non-verified contracts | Proximity to known exploit contract addresses (<2 hops) |
| Transaction Parameter Manipulation | Gas price spike >300% above network average | Unusual calldata patterns or excessive padding | Malformed input data targeting specific function selector |
| Time-Based Anomalies | Transaction occurs outside user's typical activity window | Immediate liquidation after borrowing in same block | Exploit execution within minutes of contract deployment or upgrade |
| Composability Risk | Recursive calls between protocols creating unsustainable leverage | Oracle price manipulation via low-liquidity pool swaps | Re-entrancy into a state-changing function after external call |
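
As an example of turning one of these heuristics into code, the sketch below checks whether a transaction value exceeds the 95th percentile of a wallet's history. It uses the nearest-rank percentile definition, which is one of several possible choices; `percentile` and `exceedsPercentile` are hypothetical helper names.

```javascript
// Hypothetical helper: nearest-rank percentile of a list of numbers.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Hypothetical helper: true when `value` is above the p-th percentile of history.
function exceedsPercentile(history, value, p = 95) {
  const threshold = percentile(history, p);
  return threshold !== null && value > threshold;
}
```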

ADVANCED TECHNIQUES

This guide details the architectural components and analytical techniques required to build a robust system for detecting fraudulent activity on public blockchains, focusing on wallet clustering and transaction graph analysis.

An effective on-chain fraud detection system requires a multi-layered architecture that ingests, processes, and analyzes blockchain data at scale. The foundational layer is a data pipeline that streams raw transaction data from nodes or indexers like The Graph or Covalent. This data must be normalized and structured into a graph database, such as Neo4j or TigerGraph, where entities (wallets, contracts) are nodes and transactions are edges. A separate analytical layer then applies algorithms to this graph to identify patterns, while a rules engine flags suspicious activity based on predefined heuristics and machine learning models.

Wallet clustering is the process of grouping multiple addresses controlled by a single entity. This is critical because sophisticated actors fragment funds across dozens of addresses to evade detection. Common heuristics for clustering include:

  • Multi-send transactions: One address sends assets to many new addresses in a single transaction.
  • Common Input Ownership (CIO): Assumes all inputs to a transaction are controlled by the same entity. This heuristic applies to UTXO-based chains such as Bitcoin.
  • Change address identification: Recognizing outputs that return unspent value to the sender, which is likewise a UTXO-chain technique.

Tools like Arkham and Nansen use techniques like these to build entity profiles, which form the basis for more advanced behavioral analysis.
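
The multi-send heuristic can be sketched with a union-find structure that merges a sender with its recipients. This is a simplified illustration: `UnionFind`, `clusterByMultiSend`, the `{ sender, recipients }` input shape, and the `minRecipients` threshold are assumptions, and a real system would also filter out exchanges and airdrop contracts to avoid merging unrelated users.

```javascript
// Minimal union-find (disjoint set) with path compression.
class UnionFind {
  constructor() {
    this.parent = new Map();
  }
  find(x) {
    if (!this.parent.has(x)) this.parent.set(x, x);
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every node on the path directly at the root.
    let current = x;
    while (current !== root) {
      const next = this.parent.get(current);
      this.parent.set(current, root);
      current = next;
    }
    return root;
  }
  union(a, b) {
    const rootA = this.find(a);
    const rootB = this.find(b);
    if (rootA !== rootB) this.parent.set(rootA, rootB);
  }
}

// Hypothetical helper: clusters a sender with all recipients of its multi-send
// transactions, ignoring sends with fewer than minRecipients recipients.
function clusterByMultiSend(multiSends, minRecipients = 3) {
  const uf = new UnionFind();
  for (const { sender, recipients } of multiSends) {
    if (recipients.length < minRecipients) continue;
    for (const recipient of recipients) uf.union(sender, recipient);
  }
  const clusters = new Map();
  for (const address of [...uf.parent.keys()]) {
    const root = uf.find(address);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(address);
  }
  return [...clusters.values()];
}
```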

Once wallets are clustered into entities, transaction graph analysis reveals the flow of funds and network structures. Key metrics include centrality (identifying influential hubs), community detection (finding tightly-knit groups like money laundering rings), and path analysis (tracing fund origins). For example, a sudden burst of transactions forming a star topology, where many new wallets send funds to a central address, is a hallmark of a phishing scam or rug pull. Implementing these analyses requires graph query languages like Cypher or Gremlin to efficiently traverse millions of relationships.
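
A star topology of the kind described above shows up as an address with many distinct senders. The sketch below counts distinct senders per recipient over a batch of transfers; `findFanInHubs` and the `minSenders` threshold are hypothetical, and in a graph database the same check would normally be written as a query.

```javascript
// Hypothetical helper: finds addresses that receive funds from at least
// minSenders distinct senders within the given batch of transfers.
function findFanInHubs(transfers, minSenders) {
  const sendersByRecipient = new Map();
  for (const { from, to } of transfers) {
    if (!sendersByRecipient.has(to)) sendersByRecipient.set(to, new Set());
    sendersByRecipient.get(to).add(from);
  }
  return [...sendersByRecipient]
    .filter(([, senders]) => senders.size >= minSenders)
    .map(([address, senders]) => ({ address, distinctSenders: senders.size }));
}
```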

To operationalize detection, you need a rules engine that translates graph insights into alerts. Rules can be simple, like flagging any transaction from a wallet on a sanctions list, or complex, involving sequences of events. For instance: IF (entity receives funds from a mixer) AND (within 3 blocks, sends 80% to a DEX) THEN (risk_score = HIGH). Open-source frameworks like Apache Spark with GraphX can batch-process these rules across large datasets, while streaming platforms like Flink enable real-time alerting. The output is a risk score per transaction or entity for investigators to review.
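
The mixer-to-DEX rule above can be expressed as a small function. This sketch makes several assumptions: transfers are plain objects with `block`, `from`, `to`, and `amount` fields, amounts are ordinary numbers in a single asset, and the `'LOW'` fallback is added for illustration since the rule itself only defines the `HIGH` case.

```javascript
// Hypothetical rule: HIGH risk if an entity receives funds from a mixer and,
// within maxBlocks blocks, sends at least minShare of that amount to a DEX.
function mixerToDexRisk(entity, transfers, mixers, dexes, maxBlocks = 3, minShare = 0.8) {
  const deposits = transfers.filter((t) => t.to === entity && mixers.has(t.from));
  for (const deposit of deposits) {
    const sentToDex = transfers
      .filter(
        (t) =>
          t.from === entity &&
          dexes.has(t.to) &&
          t.block >= deposit.block &&
          t.block <= deposit.block + maxBlocks
      )
      .reduce((sum, t) => sum + t.amount, 0);
    if (sentToDex >= minShare * deposit.amount) return 'HIGH';
  }
  return 'LOW';
}
```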

No system is complete without continuous feedback. A false positive feedback loop is essential: analysts review alerts, confirm or dismiss them, and those labels are used to retrain any machine learning models. Over time, the system learns to distinguish between normal DeFi arbitrage and actual wash trading. This architecture—robust data ingestion, entity resolution via clustering, graph-based behavioral analysis, and a tunable rules engine—forms the core of a scalable fraud detection system capable of adapting to evolving on-chain threats.

GUIDE

A practical guide to designing and implementing a robust alerting and response system for detecting suspicious on-chain activity.

An effective on-chain fraud detection system moves beyond simple monitoring to a structured pipeline of data ingestion, analysis, and automated response. The core architecture typically involves three layers: a data ingestion layer that streams raw blockchain data from RPC nodes or indexers, a processing and detection layer where rules and machine learning models analyze transactions, and an alerting and response layer that triggers notifications or automated actions. This separation of concerns allows for scalability, as each component can be independently optimized and updated to handle new threats like flash loan attacks, phishing scams, or protocol exploits.

The detection logic is the system's intelligence. Start with rule-based heuristics for known patterns: detecting anomalous token approvals to new contracts, large unexpected withdrawals from a treasury, or interactions with known malicious addresses from threat intelligence feeds. For more sophisticated detection, incorporate machine learning models trained on historical attack data to identify subtle, novel patterns. Tools like EigenPhi for MEV analysis or Forta Network for community-shared detection bots can augment your custom rules. All detection logic should output structured alerts with severity scores, contextual transaction data, and relevant entity labels (e.g., attacker_eoa, victim_contract).

Alerting must be actionable to be useful. Configure alerts to route based on severity: critical alerts might trigger an automated response via a smart contract pauser module or a Telegram/Slack message to a security war room, while lower-severity alerts feed into a dashboard for review. Implement alert deduplication and grouping to avoid notification fatigue from related transactions. The response workflow should be clearly documented, specifying steps for investigation (e.g., tracing funds with Tenderly or Etherscan), containment, and, if necessary, communication. Finally, maintain a feedback loop where investigated alerts are used to refine detection rules, creating a continuously improving system.
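
Severity-based routing can be kept declarative with a small table of thresholds. In this sketch the channel names, the thresholds of 90 and 60, and the `routeAlert` function are all placeholders for whatever integrations a team actually runs.

```javascript
// Hypothetical routing table, ordered from most to least severe.
const ROUTES = [
  { minSeverity: 90, channels: ['pauser', 'war-room', 'dashboard'] },
  { minSeverity: 60, channels: ['war-room', 'dashboard'] },
  { minSeverity: 0, channels: ['dashboard'] },
];

// Returns the channels of the first route whose threshold the alert meets.
function routeAlert(alert, routes = ROUTES) {
  const match = routes.find((route) => alert.severity >= route.minSeverity);
  return match ? match.channels : [];
}
```

Keeping thresholds in data rather than in branching code makes them easy to tune as the feedback loop reveals which alerts were false positives.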

ARCHITECTURE COMPONENTS

Tools and Frameworks for Implementation

Building a robust on-chain fraud detection system requires specialized tools for data ingestion, analysis, and response. This guide covers the core frameworks and libraries used in production.

ON-CHAIN FRAUD DETECTION

Frequently Asked Questions

Common technical questions and solutions for developers building fraud detection systems on the blockchain.

On-chain detection analyzes data immutably recorded on the blockchain ledger, such as transaction patterns, smart contract interactions, and wallet behaviors. This is transparent and verifiable by anyone. Off-chain detection uses external data sources like IP addresses, device fingerprints, and centralized KYC databases, which are not part of the blockchain's consensus.

Key differences:

  • Data Source: On-chain uses public ledger data; off-chain uses private, external data.
  • Transparency: On-chain logic and inputs are auditable; off-chain processes are often opaque.
  • Finality: On-chain detection can be used for autonomous, trust-minimized actions (e.g., pausing a contract). Off-chain signals typically require a trusted oracle or manual review to influence on-chain state.
ARCHITECTURE REVIEW

Conclusion and Next Steps

This guide has outlined the core components for building a robust on-chain fraud detection system. The next steps involve operationalizing the architecture and expanding its capabilities.

Building an effective fraud detection system is an iterative process. Start by implementing the foundational layers discussed: the Data Ingestion Layer using services like The Graph or Subsquid for historical data and WebSocket providers like Alchemy or QuickNode for real-time streams. Then, focus on the Rule Engine, beginning with a few high-impact heuristics such as detecting flash loan attacks or identifying newly created scam tokens. Use a modular design for your rules to allow for easy updates and additions as new attack vectors emerge.

For production deployment, prioritize the Alerting and Response Layer. Integrate with communication platforms like Slack or Discord for immediate notifications and consider building automated response scripts, such as pausing a vulnerable pool or blacklisting a malicious address. Remember that false positives are inevitable; implement a feedback loop where analysts can label alerts, which in turn refines your machine learning models and rule thresholds. Tools like Tenderly for transaction simulation are invaluable for investigating suspicious activity before taking action.

To advance your system, explore more sophisticated detection techniques. Move beyond simple heuristics to implement anomaly detection models that track deviations from normal user or contract behavior. Incorporate reputation scoring by aggregating data from sources like Etherscan labels, past incident reports from OpenZeppelin Defender, or on-chain attestations. For cross-chain visibility, consider indexing data from Layer 2s and alternative Layer 1s using specialized providers, as fraud often migrates to chains with less mature monitoring ecosystems.

Finally, stay engaged with the security community. Follow reports from Immunefi, rekt.news, and auditing firms to learn about new exploits. Contribute to and utilize open-source detection rule sets, such as those from Forta Network. The architecture you build is not a static product but a continuously evolving defense system that must adapt as quickly as the attackers it aims to thwart.
