
How to Design a Governance Proposal Success Rate Predictor

A technical guide to building a machine learning model that predicts the likelihood of a DAO governance proposal passing. Includes Python code for data collection, feature engineering, and model training.
Chainscore © 2026
GOVERNANCE ANALYTICS

Introduction

This guide explains how to build a machine learning model to predict the success of on-chain governance proposals.

On-chain governance is a core mechanism for decentralized protocols like Compound, Uniswap, and Arbitrum, allowing token holders to vote on protocol upgrades, treasury allocations, and parameter changes. However, proposal success rates vary widely. A governance proposal success rate predictor is a machine learning model that analyzes historical proposal data to forecast the likelihood of a new proposal passing. This tool helps governance participants, DAO contributors, and researchers understand the key factors influencing voter behavior and proposal outcomes.

Building a successful predictor involves several key steps: data collection from blockchain explorers and subgraphs, feature engineering to extract meaningful signals from raw data, model selection and training using appropriate algorithms, and evaluation and deployment to validate performance. The goal is to move beyond simple heuristics and create a data-driven model that can identify patterns—such as voter turnout thresholds, proposer reputation, and proposal complexity—that correlate with success.

This tutorial will walk through a practical implementation using Python, pandas, and scikit-learn. We will use real historical data from the Compound Governance system as a case study. You will learn how to query proposal data from The Graph, clean and structure it into a feature matrix, train a classifier like XGBoost or Random Forest, and evaluate its accuracy using metrics like precision, recall, and F1-score. The complete code will be available in a linked GitHub repository.

Understanding the technical implementation also requires grasping the limitations and ethical considerations of such models. Predictors are probabilistic tools, not oracles; they cannot account for unforeseen community sentiment shifts or external market events. Furthermore, the model's predictions could potentially influence voter behavior, creating a feedback loop. We will discuss these nuances to ensure the tool is used responsibly as an analytical aid, not a definitive decision-maker.

PREREQUISITES

How to Design a Governance Proposal Success Rate Predictor

Before building a predictor, you need a solid understanding of on-chain governance mechanics, data sources, and machine learning fundamentals.

A governance proposal success predictor is a machine learning model that forecasts the likelihood of a proposal passing based on historical and real-time on-chain data. To design one effectively, you must first understand the governance lifecycle of your target protocol—whether it's a Compound-style on-chain vote, a Snapshot off-chain signal, or a hybrid model. Key phases include the temperature check, formal proposal submission, voting period, and execution. Each phase generates specific data points, such as voter turnout, delegate behavior, and forum discussion sentiment, which are crucial features for your model.
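One practical consequence of the lifecycle is that each proposal's final state has to be mapped to a training label. The sketch below assumes the state names used by the Compound Governor Bravo and OpenZeppelin Governor `ProposalState` enums. Treating `EXPIRED` as a passed vote and dropping `CANCELED` proposals are labeling assumptions for illustration, not rules taken from this guide.

```python
from enum import Enum
from typing import Optional


class ProposalState(Enum):
    """Proposal states as named in Governor Bravo / OpenZeppelin Governor."""
    PENDING = 0
    ACTIVE = 1
    CANCELED = 2
    DEFEATED = 3
    SUCCEEDED = 4
    QUEUED = 5
    EXPIRED = 6
    EXECUTED = 7


# Assumption: any state reached only after a winning vote counts as "passed".
PASSED_STATES = {
    ProposalState.SUCCEEDED,
    ProposalState.QUEUED,
    ProposalState.EXPIRED,
    ProposalState.EXECUTED,
}


def label_for_state(state: ProposalState) -> Optional[int]:
    """Return 1 (passed), 0 (failed), or None when no vote outcome exists yet."""
    if state in PASSED_STATES:
        return 1
    if state is ProposalState.DEFEATED:
        return 0
    # Pending, active, and canceled proposals have no usable outcome.
    return None
```

Rows that map to `None` would simply be excluded from the training set.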

Your predictor's accuracy depends entirely on the quality and relevance of its training data. You'll need to gather historical proposal data from sources like The Graph subgraphs, direct RPC calls to archive nodes, or governance-specific APIs (e.g., Tally, Boardroom). Essential data points include: proposal metadata (title, description), voting results (for/against/abstain), voter addresses and voting power, proposal timing, and delegate composition. For advanced models, you may also scrape forum discussions from platforms like Commonwealth or Discourse to perform sentiment analysis.

A strong foundation in data science is required to process this on-chain information. You should be proficient in Python with libraries like pandas for data manipulation, scikit-learn for building traditional models (e.g., Random Forest), the separate xgboost package for gradient boosting, and potentially TensorFlow or PyTorch for deep learning approaches. Understanding feature engineering is critical; you'll create features from raw data, such as the proposer's historical success rate, the percentage of voting power held by top delegates at proposal time, or the semantic similarity of a new proposal to past successful ones.
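Two of the features named above can be sketched in plain Python. The record layout (`proposer`, `passed`) and the function names are hypothetical. The key design point is that each proposal only sees outcomes of strictly earlier proposals, so the feature does not leak the label it is meant to predict.

```python
from collections import defaultdict


def proposer_success_rates(proposals):
    """Success rate of each proposal's author over strictly earlier proposals.

    `proposals` must be sorted oldest first; each item is a dict with
    'proposer' (address) and 'passed' (0 or 1). Returns None where the
    proposer has no history yet.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    rates = []
    for proposal in proposals:
        who = proposal["proposer"]
        rates.append(wins[who] / totals[who] if totals[who] else None)
        # Update the history only after the feature has been recorded.
        totals[who] += 1
        wins[who] += proposal["passed"]
    return rates


def top_delegate_share(voting_power, n=10):
    """Fraction of total voting power held by the n largest delegates."""
    total = sum(voting_power.values())
    if total == 0:
        return 0.0
    largest = sorted(voting_power.values(), reverse=True)[:n]
    return sum(largest) / total
```

With pandas, the same leak-free rate could be computed with a grouped, shifted expanding mean; the loop above just makes the ordering explicit.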

Finally, you must grasp the unique economic and social dynamics of decentralized governance. A proposal's fate isn't just a function of its content; it's influenced by voter apathy, whale concentration, delegate-alignment strategies, and broader market conditions. For instance, a proposal requiring a high quorum may fail during a bear market due to lower participation. Your model should account for these contextual features. Start by analyzing failed proposals from major DAOs like Uniswap or Aave to identify common failure patterns beyond simple majority vote counts.
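The quorum example can be made concrete with the vote-counting rules themselves. The sketch below encodes two common rules as I understand them: Compound Governor Bravo defeats a proposal when for-votes do not exceed against-votes or fall short of quorum, while OpenZeppelin's `GovernorCountingSimple` counts for plus abstain votes toward quorum. The function names and the numbers in the usage note are illustrative, so check the target DAO's contract before relying on either rule.

```python
def bravo_style_passes(for_votes, against_votes, quorum_votes):
    """Governor Bravo style: for-votes must beat against-votes AND reach quorum."""
    return for_votes > against_votes and for_votes >= quorum_votes


def oz_simple_passes(for_votes, against_votes, abstain_votes, quorum_votes):
    """GovernorCountingSimple style: quorum counts for + abstain votes."""
    quorum_reached = (for_votes + abstain_votes) >= quorum_votes
    return quorum_reached and for_votes > against_votes
```

For example, `bravo_style_passes(350_000, 20_000, 400_000)` is `False` even though about 95% of the votes cast were in favor, which is the low-participation failure mode described above.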

BUILDING THE DATASET

Data Sources and Collection

The foundation of any successful governance proposal predictor is a robust, multi-dimensional dataset. This section details the critical on-chain and off-chain data sources you need to collect and structure.

To predict governance proposal outcomes, you must first identify and aggregate relevant historical data. The primary source is the on-chain proposal history of the target DAO. This includes raw transaction data for each proposal: its unique ID, creator address, voting start/end blocks, voting options (For, Against, Abstain), and the final tally of votes and voting power. You can collect this data by querying the DAO's governance smart contract events (e.g., ProposalCreated, VoteCast) using a blockchain indexer like The Graph or directly via an RPC provider. For Ethereum-based DAOs, the OpenZeppelin Governor contract ABI provides a standard interface for this data.
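A minimal sketch of the indexer route, using only the standard library. The endpoint URL and the GraphQL field names (`proposals`, `proposer`, `startBlock`, `endBlock`, `status`) are assumptions for illustration; every subgraph publishes its own schema, so adapt the query to the one you target. The code builds the request and flattens a response, and leaves sending the request to the caller.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the governance subgraph you are querying.
SUBGRAPH_URL = "https://example.com/subgraphs/name/your-governance-subgraph"

# Hypothetical schema: field names vary between subgraphs.
PROPOSALS_QUERY = """
query Proposals($first: Int!, $skip: Int!) {
  proposals(first: $first, skip: $skip, orderBy: startBlock, orderDirection: asc) {
    id
    proposer { id }
    startBlock
    endBlock
    status
  }
}
"""


def build_request(first=100, skip=0, url=SUBGRAPH_URL):
    """Build (but do not send) a paginated GraphQL POST request."""
    payload = {"query": PROPOSALS_QUERY, "variables": {"first": first, "skip": skip}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def flatten_proposals(response):
    """Turn a decoded GraphQL response into flat rows with integer block numbers."""
    rows = []
    for item in response["data"]["proposals"]:
        rows.append({
            "proposal_id": item["id"],
            "proposer": item["proposer"]["id"],
            # Large integers usually arrive as strings, so convert explicitly.
            "start_block": int(item["startBlock"]),
            "end_block": int(item["endBlock"]),
            "status": item["status"],
        })
    return rows
```

To fetch a page, pass the request to `urllib.request.urlopen`, decode the body with `json.load`, and increase `skip` until a page comes back empty.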

Raw vote counts are insufficient; you must enrich them with voter context. This involves linking voter addresses to their delegate relationships and token holdings at the time of the vote. You need snapshot data of token balances (e.g., ERC-20, ERC-721) for each voter at the proposal's snapshot block. This reveals the actual voting power behind each cast vote. Furthermore, analyzing delegation patterns—identifying which addresses delegate to influential "whales" or delegates—is crucial for understanding voting blocs. Tools like Dune Analytics or Covalent can help reconstruct these historical states.
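The enrichment step can be illustrated with a small aggregation. This sketch assumes you have already reconstructed two mappings at the snapshot block: token balances per holder and each holder's chosen delegate. Both inputs and the function name are hypothetical. It also assumes a COMP-style token in which undelegated balances carry no voting power, so holders must self-delegate to vote.

```python
from collections import defaultdict


def voting_power_at_snapshot(balances, delegations):
    """Aggregate token balances into per-delegate voting power.

    balances:    holder address -> token balance at the snapshot block
    delegations: holder address -> delegate address (self-delegation allowed)
    Holders missing from `delegations` contribute no voting power.
    """
    power = defaultdict(int)
    for holder, balance in balances.items():
        delegate = delegations.get(holder)
        if delegate is None:
            continue  # undelegated tokens do not vote under this assumption
        power[delegate] += balance
    return dict(power)
```

Joining this mapping against the cast votes shows how much of a delegate's weight comes from delegators rather than from their own holdings.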

Beyond the blockchain, off-chain sentiment and discussion data are powerful predictive signals. This includes parsing forum discussions (e.g., Discourse, Commonwealth), governance temperature checks, and related social media threads (like Discord or Twitter). The goal is to quantify pre-vote sentiment: the number of supportive vs. critical comments, engagement metrics, and the reputation of participants in the discussion. Natural Language Processing (NLP) techniques can be applied to classify sentiment and extract key topics from these text sources, turning qualitative discussion into quantitative features for your model.

Finally, you must engineer proposal-specific features from the collected data. Key features include: the proposal type (e.g., treasury spend, parameter change), the requested amount in USD, the proposer's historical success rate and reputation, voter turnout as a percentage of circulating supply, and the level of voter concentration (e.g., Gini coefficient of voting power). Temporal features are also important, such as the day of the week the vote ends or proximity to major market events. Structuring this data as a chronologically ordered table, with each row representing a historical proposal and its outcome (Passed/Failed), creates the training dataset for your predictor.
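Two of the numeric features above, turnout and voting-power concentration, can be computed as follows. This is a sketch with hypothetical function names. The Gini coefficient uses the standard sorted-values formula, where 0 means power is spread evenly and values near 1 mean a few voters hold almost all of it.

```python
def turnout_fraction(votes_cast, circulating_supply):
    """Voting power cast as a fraction of circulating supply."""
    if circulating_supply <= 0:
        raise ValueError("circulating_supply must be positive")
    return votes_cast / circulating_supply


def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For instance, four voters with equal power give a Gini of 0, while one voter holding everything among four gives 0.75.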

PREDICTOR VARIABLES

Feature Engineering Matrix

Comparison of data sources and feature types for modeling governance proposal success.

| Feature Category | On-Chain Data | Off-Chain Metadata | Social Sentiment |
| --- | --- | --- | --- |
| Proposer Reputation | Delegated Voting Power | DAO Contributor History | Forum Post Engagement |
| Proposal Context | Treasury Balance | Proposal Category & Scope | Related Thread Sentiment Score |
| Voter Dynamics | Historical Voting Turnout | Voter Delegation Graph | Snapshot Discourse Activity |
| Economic Signals | Token Price (30d Volatility) | Grant Request Size vs. Treasury | Social Volume vs. Price Correlation |
| Technical Complexity | Smart Contract Interactions | Multisig Thresholds | |
| Time-Based Features | Voting Period Duration | Proposal Submission Day/Time | Market Cycle Phase |
| Data Freshness | Real-time | Daily Updates | API-Dependent (1-4h delay) |
| Implementation Overhead | Low (RPC Calls) | Medium (API Integration) | High (NLP Processing) |

MODEL ARCHITECTURE

How to Design a Governance Proposal Success Rate Predictor

This guide details the process of building and training a machine learning model to predict the outcome of on-chain governance proposals, focusing on feature engineering, model selection, and training methodology.

The foundation of a reliable predictor is feature engineering. You must extract quantifiable signals from proposal data. Key features include:

  • Proposal Metadata: Voting period length, proposal type (parameter change, treasury spend, upgrade).
  • Proposer History: Their past proposal success rate and average voter turnout.
  • Sentiment & Discussion: Metrics from forums like Discourse or Discord (comment count, sentiment score using NLP).
  • On-Chain Context: Network gas prices and active delegate counts at proposal time.
  • Voting Dynamics: Early voting patterns and the weight of the first "yes" vs "no" votes.

These features transform raw blockchain and forum data into a structured dataset for the model.

For the model architecture, a gradient boosting algorithm like XGBoost or LightGBM is often a strong default. These models handle tabular data well, capture non-linear relationships between features, and provide feature importance scores, which help with interpretability. Both ship Python packages with scikit-learn-compatible estimators. First, prepare your data and split it into training and test sets, keeping rows in chronological order so that the test set contains only proposals newer than those in the training set; this avoids look-ahead bias.

```python
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# X_features contains your engineered features, y_labels are outcomes (1=passed, 0=failed).
# Rows must already be sorted oldest-first: with shuffle=False the last 20% of rows
# (the most recent proposals) become the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X_features, y_labels, test_size=0.2, shuffle=False
)

# Initialize and train the model.
# use_label_encoder is omitted because recent XGBoost releases deprecated it.
model = xgb.XGBClassifier(n_estimators=100, max_depth=5, learning_rate=0.1)
model.fit(X_train, y_train)

# Make predictions and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

Training requires careful validation. Use time-series cross-validation instead of random splits to respect the chronological order of proposals. This prevents data leakage and gives a realistic performance estimate. Key evaluation metrics go beyond simple accuracy. Focus on precision (of predicting "pass" correctly) to avoid false positives, and recall to ensure you catch most successful proposals. The F1-score provides a balanced measure. Always analyze the confusion matrix to understand error types.
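The validation scheme and metrics described here can be sketched without any dependencies. The splitter below produces expanding-window folds in which training indices always precede test indices, which is the same idea as scikit-learn's `TimeSeriesSplit`. The function names are illustrative, and the sketch assumes samples are indexed oldest first.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices); training data always precedes test data."""
    for k in range(n_splits):
        test_end = n_samples - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        yield list(range(test_start)), list(range(test_start, test_end))


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive ("passed") class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1
```

With 10 proposals, 3 splits, and a test size of 2, the folds test on proposals 4-5, 6-7, and 8-9 while training on everything earlier.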

Finally, interpret the model using SHAP (SHapley Additive exPlanations) values. This reveals which features most influence predictions. For example, you might find that "proposer past success rate" and "sentiment score in the first 24 hours" are top contributors. Deploy the trained model as a service that ingests new proposal data, runs inference, and outputs a probability of success, providing delegates with a data-driven signal before they vote.

GOVERNANCE ANALYTICS

Tools and Libraries

Essential tools and data sources for building a model to predict on-chain governance proposal outcomes.

DEPLOYMENT AND INTEGRATION

How to Design a Governance Proposal Success Rate Predictor

A guide to building a data-driven model that forecasts the likelihood of a DAO proposal passing, using on-chain and social sentiment data.

Predicting governance proposal outcomes requires analyzing a multi-dimensional dataset. The core data sources are on-chain history and off-chain sentiment. On-chain data includes historical proposal votes, voter participation rates, delegate behavior, and token distribution. Off-chain data can be scraped from governance forums like Commonwealth or Discourse, capturing discussion sentiment, proposal revisions, and author reputation. A successful predictor aggregates this data into features such as proposal_duration, quorum_progress, sentiment_score, and voter_coalition_strength.

The model architecture typically involves a machine learning classifier. For structured on-chain data, a Gradient Boosting model like XGBoost often performs well due to its handling of non-linear relationships. For processing textual forum data, you first generate embeddings using a model like Sentence-BERT, then feed these into a separate neural network layer. The final step is a fusion layer that combines the outputs from both the structured data model and the text model to make a binary pass/fail prediction. Tools like scikit-learn, TensorFlow, and The Graph for on-chain queries are essential.
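As a simplified stand-in for a learned fusion layer, the sketch below combines the two models' output probabilities in logit space. The weights and bias are hypothetical placeholders. In practice they would be fitted, for example by training a logistic regression on held-out predictions from both models.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1 for stability."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def fuse(p_tabular, p_text, w_tabular=0.7, w_text=0.3, bias=0.0):
    """Late fusion: weighted sum of the two models' log-odds, mapped back to [0, 1].

    p_tabular: pass probability from the structured-data model
    p_text:    pass probability from the text model
    The default weights are illustrative, not tuned values.
    """
    return sigmoid(w_tabular * logit(p_tabular) + w_text * logit(p_text) + bias)
```

When the two models disagree, the fused probability lands between them, pulled toward whichever model has the larger weight.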

Deploying this predictor involves creating a pipeline. First, use a subgraph or direct RPC calls to index historical proposals and votes from the target DAO's smart contracts, such as OpenZeppelin's Governor. Second, run a sentiment analysis script on the corresponding forum threads. Third, train the model on this historical dataset. Finally, expose the model via an API (using FastAPI or a serverless function) that takes a new proposal's ID or forum link and returns a probability score. This API can be integrated into dashboards or bot notifications for delegates.

Key challenges include data freshness and DAO-specific dynamics. Models must be retrained regularly as voter behavior evolves. Furthermore, each DAO has unique social dynamics; a model trained on Uniswap data may not transfer directly to Aave. It's crucial to incorporate DAO-specific features, like the influence of large delegates (e.g., "whales") or the success rate of certain proposal types (e.g., treasury grants vs. parameter changes). Continuous validation against live proposals is necessary to maintain accuracy.
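Continuous validation can be as simple as tracking a rolling error on resolved proposals. The sketch below uses the Brier score, the mean squared difference between the predicted probability and the 0/1 outcome. The class name, window size, and threshold are all hypothetical. A threshold of 0.25 is chosen only because that is the score of always predicting 0.5.

```python
from collections import deque


class LivePerformanceMonitor:
    """Track the rolling Brier score of predictions on resolved proposals."""

    def __init__(self, window=20, threshold=0.25):
        self.errors = deque(maxlen=window)  # keeps only the most recent outcomes
        self.threshold = threshold

    def record(self, predicted_prob, outcome):
        """Store the squared error for one resolved proposal (outcome is 0 or 1)."""
        self.errors.append((predicted_prob - outcome) ** 2)

    def brier(self):
        """Mean squared error over the window, or None before any data arrives."""
        if not self.errors:
            return None
        return sum(self.errors) / len(self.errors)

    def needs_retraining(self):
        """Flag retraining once the window is full and the error exceeds the threshold."""
        full = len(self.errors) == self.errors.maxlen
        return full and self.brier() > self.threshold
```

Calling `record` each time a live proposal resolves gives an early signal that voter behavior has drifted away from the training data.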

For practical implementation, start with a single DAO like Compound or Uniswap due to their extensive proposal history. Use the publicly available Compound Governance Subgraph to fetch proposal and vote data. For sentiment, access the Compound Governance Forum. A baseline model might use just three features: the percentage of required quorum met 24 hours before voting ends, the number of unique voters, and the net sentiment (positive vs. negative comments). This simple approach can establish a proof-of-concept before adding complexity.
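The three baseline features can be computed from raw vote records and labeled forum comments as sketched below. The record fields (`voter`, `support`, `power`, `timestamp`) and the comment labels are hypothetical. Measuring quorum progress with for-votes only, and counting unique voters at the same cutoff so no later information leaks in, are assumptions of this sketch.

```python
def baseline_features(votes, voting_end, quorum_votes, comment_labels, cutoff_hours=24):
    """Compute the three baseline features as of `cutoff_hours` before voting ends.

    votes:          list of dicts with 'voter', 'support' (1=for, 0=against),
                    'power', and 'timestamp' (Unix seconds)
    comment_labels: list of 'positive' / 'negative' / 'neutral' strings
    """
    cutoff = voting_end - cutoff_hours * 3600
    early = [v for v in votes if v["timestamp"] <= cutoff]

    # Feature 1: share of required quorum already met (for-votes only).
    for_power = sum(v["power"] for v in early if v["support"] == 1)
    quorum_progress = for_power / quorum_votes

    # Feature 2: number of distinct addresses that have voted so far.
    unique_voters = len({v["voter"] for v in early})

    # Feature 3: net sentiment in [-1, 1], ignoring neutral comments.
    positive = comment_labels.count("positive")
    negative = comment_labels.count("negative")
    opinionated = positive + negative
    net_sentiment = (positive - negative) / opinionated if opinionated else 0.0

    return {
        "quorum_progress": quorum_progress,
        "unique_voters": unique_voters,
        "net_sentiment": net_sentiment,
    }
```

Each historical proposal yields one such dictionary, and the resulting rows can be fed directly to a simple classifier.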

Integrating the predictor into a user-facing application completes the loop. Build a simple frontend that displays upcoming proposals with their predicted success scores. More advanced integrations include a Discord or Telegram bot that alerts a channel when a new proposal is posted with its forecast. For delegates, this tool acts as a prioritization filter, highlighting proposals that may require more urgent attention or research based on their predicted contentiousness or likelihood of failure.

GOVERNANCE PREDICTORS

Limitations and Ethical Considerations

Building a proposal success predictor involves navigating technical constraints and significant ethical questions. This section outlines key limitations and responsible development practices.

Predictive models for on-chain governance face inherent data limitations. Historical proposal data is often sparse, especially for newer DAOs, leading to overfitting risks. Models trained on one DAO's data (e.g., Uniswap) may fail to generalize to another (e.g., Compound) due to differing voter bases, tokenomics, and cultural norms. Furthermore, data is inherently non-stationary; a model trained on proposals from a bull market may perform poorly in a bear market as voter priorities and engagement shift. Capturing the nuanced, qualitative arguments within forum discussions requires advanced NLP, which adds complexity and computational cost.

A core ethical risk is the potential for a predictor to become a self-fulfilling prophecy. If a widely trusted tool predicts a proposal will fail, it could discourage supporters from voting, artificially causing its defeat. This centralizes influence and undermines the decentralized decision-making process. Developers must consider whether to publicly release prediction scores and, if so, how to mitigate this feedback loop. Transparency about model confidence intervals and the factors driving a prediction is crucial to prevent blind reliance.
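One way to report uncertainty alongside a headline accuracy figure is a confidence interval on the test-set result. The sketch below uses the Wilson score interval, a standard choice for proportions measured on small samples. It is offered as one possible approach, not as the method this guide prescribes, and `z=1.96` corresponds to roughly 95% confidence.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion such as test-set accuracy."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width
```

For example, 40 correct predictions out of 50 test proposals is 80% accuracy, but the interval spans roughly 0.67 to 0.89, which is worth stating whenever a score is published.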

Building these systems also raises questions of access and fairness. A sophisticated predictor could become a premium tool, creating an information asymmetry between well-resourced delegates and average token holders. This risks entrenching existing power structures. Furthermore, models might inadvertently encode or amplify biases present in historical data, such as favoring proposals from known entities over community newcomers. Regular bias auditing of training data and model outputs is an essential, ongoing practice for ethical deployment.
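A basic bias audit compares error rates across groups of proposers. The sketch below computes the false negative rate per group: the share of proposals that actually passed but were predicted to fail. The group labels and record layout are hypothetical, and a real audit would look at more than one metric.

```python
from collections import defaultdict


def false_negative_rate_by_group(records):
    """False negative rate per group.

    records: iterable of (group, y_true, y_pred) with 1=passed, 0=failed.
    Groups with no passed proposals are omitted from the result.
    """
    missed = defaultdict(int)
    passed = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            passed[group] += 1
            if y_pred == 0:
                missed[group] += 1
    return {group: missed[group] / passed[group] for group in passed}
```

A markedly higher rate for newcomers than for established proposers would suggest that the model is penalizing lack of history rather than proposal quality.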

From a technical implementation standpoint, oracle reliability is a critical limitation. Predictors often rely on external data feeds for token prices, social sentiment, or protocol metrics. Delay or manipulation of these oracles can corrupt prediction accuracy. Smart contracts executing based on these predictions must have robust failure modes and circuit breakers. The computational cost of running complex models on-chain is typically prohibitive, leading to hybrid architectures where predictions are computed off-chain and submitted via oracles, introducing additional trust assumptions.

Responsible development involves clear communication of a model's purpose and constraints. It should be framed as a decision-support tool, not an oracle of truth. Documenting the model's features, accuracy on test sets, and known failure cases is mandatory. Consider delaying the publication of predictions, or releasing them to a broad audience only after the voting snapshot, to reduce the scope for manipulation. Engaging with the governance community to align the tool's development with the DAO's values is as important as the technical build.

GOVERNANCE PREDICTORS

Frequently Asked Questions

Common technical questions about building and deploying on-chain governance proposal success predictors.

A robust predictor requires structured on-chain and off-chain data. Key sources include:

On-Chain Data:

  • Voting history: Past proposal outcomes, voter turnout, and delegate behavior from the governance contract (e.g., Compound Governor Bravo, Aave Governance v2).
  • Tokenomics: Real-time token distribution, delegation patterns, and whale wallet activity.
  • Transaction history: Proposal submission and voting transaction timing/fees.

Off-Chain Data:

  • Forum/Snapshot activity: Sentiment analysis from governance forums (e.g., Commonwealth, Discourse) and Snapshot proposal descriptions.
  • Social sentiment: Aggregated data from Twitter, Discord, and developer chat activity.
  • Protocol metrics: TVL changes, revenue, and key performance indicators around proposal dates.

Tools like The Graph for querying historical data, Dune Analytics for custom dashboards, and Chainscore for real-time governance analytics are foundational.

BUILDING YOUR MODEL

Conclusion and Next Steps

This guide has outlined the core components for building a governance proposal success predictor. The next steps involve refining your model and integrating it into real-world applications.

You now have a foundational framework for a governance success predictor. The key is to move from a static analysis to a dynamic, real-time system. This involves implementing a data pipeline that continuously ingests new proposals and refreshes both on-chain voting data and forum sentiment. Tools like The Graph for querying historical governance events and Chainlink Functions for fetching off-chain social metrics can automate this process. Your model should be retrained periodically to adapt to evolving community behavior and protocol changes.

To improve accuracy, focus on feature engineering. Beyond basic metrics, consider creating composite features like voter_turnout_velocity (speed of votes accumulating) or sentiment_volatility in discussion threads. Experiment with different model architectures; a gradient boosting model like XGBoost often performs well on tabular data, but a two-stage model using a transformer for text analysis (e.g., bert-base-uncased) to generate embeddings for the proposal description, fed into a classifier, can capture nuanced semantic signals. Always validate using time-series cross-validation to avoid look-ahead bias.
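The two composite features named above could be defined as follows. These definitions are assumptions made for illustration: velocity is taken as voting power cast per hour during an initial window, and volatility as the population standard deviation of per-comment sentiment scores.

```python
import statistics


def voter_turnout_velocity(votes, voting_start, window_hours=24):
    """Voting power cast per hour during the first `window_hours` of voting.

    votes: list of dicts with 'power' and 'timestamp' (Unix seconds).
    """
    window_end = voting_start + window_hours * 3600
    power = sum(
        v["power"] for v in votes if voting_start <= v["timestamp"] < window_end
    )
    return power / window_hours


def sentiment_volatility(scores):
    """Population standard deviation of per-comment sentiment scores."""
    if len(scores) < 2:
        return 0.0  # a single comment carries no spread
    return statistics.pstdev(scores)
```

A thread that swings between strongly positive and strongly negative comments scores high on volatility even when its average sentiment is neutral, which is the signal a plain mean would miss.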

The final step is deployment and integration. Package your model into a microservice using a framework like FastAPI. It can then be consumed by a front-end dashboard for DAO members, integrated into voting delegation platforms like Tally or Boardroom, or used to power automated alerts. For transparency, publish your model's performance metrics and key feature importances. Remember, the goal is not a perfect crystal ball but a tool that surfaces signal from noise, helping stakeholders make more informed decisions in decentralized governance.

Further exploration could involve analyzing cross-DAO patterns. Does a successful proposal structure in Uniswap predict success in Aave? Building a comparative dataset across multiple protocols could yield powerful meta-insights. Additionally, consider the ethical implications and potential for manipulation—design your system to be transparent about its limitations. The code and research for this guide are available on GitHub. Start by forking the repository and adapting the data collectors for your target DAO.