Gas price optimization is a critical challenge for any application interacting with the Ethereum network. Manually setting gas fees often leads to overpaying for speed or having transactions stuck due to insufficient fees. A gas price predictor automates this process by analyzing real-time network conditions, historical data, and pending transaction pools to recommend the lowest fee for timely confirmation. This guide walks through building a production-ready predictor using Python, web3.py, and public Ethereum APIs.
How to Build a Gas Price Optimization Predictor
Learn to build a system that predicts optimal gas prices for Ethereum transactions, saving costs and improving transaction success rates.
The core of any predictor is its data source. You'll need to fetch live metrics like the current base fee, priority fee (tip) trends, and the mempool's composition. Services like the Ethereum Beacon Chain API, Etherscan, and public RPC endpoints (e.g., Alchemy, Infura) provide this data. A robust predictor doesn't just look at the current eth_gasPrice; it models fee volatility by tracking blocks over time, calculating percentiles of gas used, and monitoring the frequency of base fee spikes following full blocks.
Your predictor's logic will process this data to output a recommended maxFeePerGas and maxPriorityFeePerGas. A simple yet effective strategy is to calculate the base fee of the last block and add a priority fee based on the 50th percentile (median) of tips from recent blocks. For more advanced predictions, you can implement machine learning models that forecast network congestion using features like time of day, NFT mint events, or DEX swap volume. The code example later will show a practical implementation of the percentile-based method.
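The percentile strategy can be sketched as a small helper. This is a minimal illustration, not a complete predictor: the tip list is assumed to come from recent blocks (e.g., the `reward` field of `eth_feeHistory`), and the function name is ours.

```python
from statistics import median

def recommend_fees(latest_base_fee_wei: int, recent_tips_wei: list[int]) -> dict:
    """Percentile-based EIP-1559 fee recommendation.

    latest_base_fee_wei: baseFeePerGas of the most recent block.
    recent_tips_wei: effective priority fees observed in recent blocks
    (e.g., collected via eth_feeHistory's `reward` field).
    """
    # The median tip is a stable estimate of what validators currently accept.
    priority_fee = int(median(recent_tips_wei))
    # The base fee can rise at most 12.5% per block; doubling it buys headroom
    # for several consecutive full blocks. Any surplus above the actual base
    # fee is refunded, so this does not overpay.
    max_fee = 2 * latest_base_fee_wei + priority_fee
    return {
        "maxPriorityFeePerGas": priority_fee,
        "maxFeePerGas": max_fee,
    }
```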
Integrating the predictor into your application is the final step. You can wrap the logic in a FastAPI service or a simple script that your blockchain client calls before sending a transaction. The key is to cache results for a short period (e.g., 5-10 seconds) to avoid hitting rate limits on data providers. Always include fallback mechanisms, such as defaulting to the eth_gasPrice estimate if your predictor fails, to ensure your application remains reliable under all network conditions.
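The caching-plus-fallback pattern might look like the following sketch; the class and parameter names are assumptions rather than a specific library's API, and `predict_fn`/`fallback_fn` stand in for your model and an `eth_gasPrice` wrapper.

```python
import time

class CachedGasEstimator:
    """Caches predictor output for a short TTL and falls back to a
    node-provided estimate when the predictor fails."""

    def __init__(self, predict_fn, fallback_fn, ttl_seconds=8.0):
        self.predict_fn = predict_fn
        self.fallback_fn = fallback_fn
        self.ttl = ttl_seconds
        self._cached = None
        self._cached_at = -float("inf")

    def get(self):
        now = time.monotonic()
        if now - self._cached_at < self.ttl:
            return self._cached  # fresh enough: avoid hitting rate limits
        try:
            self._cached = self.predict_fn()
        except Exception:
            # Stay reliable under all network conditions.
            self._cached = self.fallback_fn()
        self._cached_at = now
        return self._cached
```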
Prerequisites
Before building a gas price predictor, you need a solid foundation in core Web3 technologies and data analysis. This section outlines the essential knowledge and tools required.
A strong grasp of Ethereum fundamentals is non-negotiable. You must understand how the EVM works, the role of gas as a unit of computational work, and the mechanics of the EIP-1559 fee market. This includes knowing the difference between base fee, priority fee (tip), and max fee, and how they interact in a block. Familiarity with transaction lifecycle and mempool dynamics is also crucial for modeling.
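The interaction between the three fee fields reduces to a few lines. This sketch mirrors the EIP-1559 rule that the sender pays the base fee (burned) plus a tip, where the tip is capped both by `maxPriorityFeePerGas` and by whatever remains under `maxFeePerGas`:

```python
def effective_gas_price(base_fee: int, max_fee: int, max_priority_fee: int):
    """Returns (price_per_gas, tip_to_validator) under EIP-1559 rules.

    A transaction is only includable if max_fee >= base_fee; the tip is
    whatever fits under max_fee after the base fee, up to max_priority_fee.
    """
    if max_fee < base_fee:
        raise ValueError("not includable: maxFeePerGas below baseFeePerGas")
    tip = min(max_priority_fee, max_fee - base_fee)
    return base_fee + tip, tip
```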
Proficiency in Python is the primary technical prerequisite. You'll use libraries like web3.py for interacting with the Ethereum network, pandas for data manipulation, and scikit-learn or TensorFlow for building predictive models. Knowledge of asyncio or concurrent programming is beneficial for efficiently fetching historical data from node providers like Infura, Alchemy, or a local archive node.
You need access to historical and real-time blockchain data. This includes past block data (gas used, base fee), pending transaction pools, and network metrics. Services like Etherscan's API, Blocknative's Gas Platform API, or direct RPC calls to a full/archive node are essential data sources. Understanding how to parse and structure this time-series data is a key step.
Finally, a basic understanding of machine learning concepts for time-series forecasting will be necessary. While you can start with simpler regression models, concepts like feature engineering (creating inputs from raw data), model training, validation, and metrics like Mean Absolute Error (MAE) are needed to evaluate your predictor's accuracy against actual on-chain outcomes.
Gas Price Optimization Predictor
This guide outlines the core components and data flow for building a system that predicts optimal gas prices for Ethereum transactions.
A gas price predictor is a data pipeline that ingests on-chain and off-chain data to forecast network congestion and suggest transaction fees. The primary goal is to minimize costs while ensuring timely transaction inclusion. The system architecture typically involves three layers: a data ingestion layer collecting real-time metrics, a processing and modeling layer that analyzes this data, and an API layer serving predictions to users or applications. This separation of concerns allows for scalable, maintainable code and independent updates to the prediction model.
The data ingestion layer is responsible for sourcing raw data. Key inputs include the current base fee from the latest block (via eth_getBlockByNumber), pending transactions from a node's transaction pool (via Geth's txpool_content or an eth_subscribe pending-transaction stream), historical gas price trends from services like Etherscan or The Graph, and mempool data from specialized providers like Blocknative or bloXroute. This layer must be resilient to API rate limits and node failures, often implemented with retry logic and multiple data source fallbacks to ensure a continuous feed.
In the processing and modeling layer, raw data is transformed into features for a prediction model. This involves calculating metrics like average gas used per block over the last 100 blocks, pending transaction count segmented by gas price, and the rate of base fee change. A simple model might use a weighted moving average, while more advanced systems employ machine learning libraries like scikit-learn or TensorFlow to train on historical patterns. The output is a suggested maxPriorityFeePerGas and maxFeePerGas for the next few blocks.
The final component is the serving layer, which exposes predictions via a REST API or WebSocket stream. A common endpoint is GET /api/v1/gas-prediction, returning a JSON object with safeLow, standard, and fast price estimates in Gwei. For integration with wallets or dApps, this layer must have low latency and high availability. It's often deployed behind a load balancer, with the prediction model cached and updated at regular intervals (e.g., every block) to serve requests efficiently without recalculating for each call.
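Assembling that JSON body from cached model output might look like the sketch below. The safeLow/standard/fast field names follow the convention just described; the specific percentile mapping is an illustrative assumption, not a standard.

```python
def build_gas_prediction_response(base_fee_gwei: float, tips_gwei: list) -> dict:
    """Builds the JSON body for GET /api/v1/gas-prediction from cached
    model output. safeLow/standard/fast map to increasingly aggressive
    percentiles of recently observed priority fees."""
    tips = sorted(tips_gwei)

    def pct(p):
        # Nearest-rank percentile; adequate for the small samples used here.
        idx = min(int(p * len(tips)), len(tips) - 1)
        return tips[idx]

    return {
        "baseFee": base_fee_gwei,
        "safeLow": round(base_fee_gwei + pct(0.25), 2),
        "standard": round(base_fee_gwei + pct(0.50), 2),
        "fast": round(base_fee_gwei + pct(0.90), 2),
        "unit": "gwei",
    }
```

A thin FastAPI or Flask route can then return this dict directly, recomputing it once per block rather than per request.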
Implementing such a system requires careful consideration of the EIP-1559 fee market. Your predictor must account for the variable base fee, which is burned, and the priority fee (tip) paid to validators. Testing against a live testnet like Sepolia or Holesky is crucial before mainnet deployment. Open-source projects like ethereum-lists/gas-prices provide a reference for API structure, while tools like Hardhat and Ganache can simulate network conditions for model validation.
Key Data Sources and Predictive Features
A robust gas price predictor relies on real-time on-chain data, historical patterns, and network-level metrics. This section details the essential data sources and feature engineering techniques required for an accurate model.
Feature Engineering: Beyond Simple Averages
Raw gas prices are noisy. Effective features include:
- Percentile Calculations: The 50th (median) and 90th percentiles of pending tx gas prices are more stable indicators than average.
- Block Utilization: The percentage of gas used in the last 10 blocks (gasUsed / gasLimit).
- Pending Transaction Spike Detection: Rate-of-change in mempool size over 5-minute windows.
- Cross-Chain Correlation: Activity spikes on Layer 2s (Arbitrum, Optimism) can precede Mainnet congestion.
Implementing with Python & Web3.py
A practical setup for data collection involves:
```python
from web3 import Web3
import pandas as pd

# Connect to provider (ALCHEMY_URL is your HTTPS endpoint)
w3 = Web3(Web3.HTTPProvider(ALCHEMY_URL))

# Fetch pending block and calculate percentiles
pending = w3.eth.get_block('pending', full_transactions=True)
# EIP-1559 (type-2) transactions may expose maxFeePerGas rather than gasPrice
gas_prices = [tx.get('gasPrice') or tx.get('maxFeePerGas') for tx in pending.transactions]
percentile_90 = pd.Series(gas_prices).quantile(0.9)
```
Schedule this script with Celery or AWS Lambda to build a time-series database.
Model Selection & Evaluation
Choosing the right algorithm is critical for time-series forecasting.
- Gradient Boosting (XGBoost, LightGBM): Effective for capturing non-linear relationships between features like block utilization and pending tx count.
- LSTM Networks: Can model long-term temporal dependencies in gas price sequences.
- Evaluation Metrics: Use Mean Absolute Percentage Error (MAPE) for interpretability and Pinball Loss to assess quantile predictions (e.g., for 90th percentile forecasts). Backtest models against historical crises like the Yuga Labs' Otherdeed mint to stress-test performance.
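Pinball loss is simple to compute by hand. A dependency-free sketch for a single quantile:

```python
def pinball_loss(y_true, y_pred, quantile):
    """Pinball (quantile) loss: penalizes under-prediction more heavily
    when quantile > 0.5, matching the cost asymmetry of gas estimation
    (an under-bid transaction gets stuck; an over-bid merely overpays)."""
    total = 0.0
    for actual, pred in zip(y_true, y_pred):
        diff = actual - pred
        total += quantile * diff if diff >= 0 else (quantile - 1) * diff
    return total / len(y_true)
```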
Building the Data Pipeline
A reliable gas price predictor requires a robust data pipeline to collect, process, and serve real-time and historical on-chain data. This guide outlines the key components and architecture for building one.
The foundation of any gas predictor is historical and real-time data. You need to ingest data from multiple sources: pending transaction mempools for immediate network state, historical block data for trend analysis, and aggregated fee data from services like Etherscan or Blocknative. A common approach is to run archive nodes for Ethereum (Geth, Erigon) or use node provider APIs (Alchemy, Infura) to stream this data. The pipeline must capture key metrics: base fee per block, priority fees (tips) for included transactions, block utilization, and pending transaction volume.
Once data is ingested, it must be processed into structured features for machine learning models. This involves feature engineering to transform raw blockchain data into predictive signals. Key features include: rolling averages of base fees over the last N blocks, the rate of change in pending transactions, time-of-day and day-of-week patterns, and network congestion indicators like gas used vs. gas limit. This processing stage often uses a stream-processing framework like Apache Kafka or a time-series database like TimescaleDB to handle the continuous, high-volume data flow efficiently.
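The feature-engineering step above might look like this with pandas; the column names are assumptions about your ingested schema, not a standard layout.

```python
import pandas as pd

def engineer_features(blocks: pd.DataFrame) -> pd.DataFrame:
    """Turns raw per-block data into model features. `blocks` is assumed to
    have columns: timestamp (unix seconds), base_fee, gas_used, gas_limit."""
    feats = pd.DataFrame(index=blocks.index)
    feats["base_fee"] = blocks["base_fee"]
    # Rolling statistics smooth out single-block noise.
    feats["base_fee_ma_10"] = blocks["base_fee"].rolling(10, min_periods=1).mean()
    feats["base_fee_pct_change"] = blocks["base_fee"].pct_change().fillna(0.0)
    # Congestion indicator: how full recent blocks were.
    feats["utilization"] = blocks["gas_used"] / blocks["gas_limit"]
    # Time-of-day and day-of-week patterns.
    ts = pd.to_datetime(blocks["timestamp"], unit="s")
    feats["hour"] = ts.dt.hour
    feats["day_of_week"] = ts.dt.dayofweek
    return feats
```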
For model training and serving, you need a separate pipeline branch. Historical processed data is used to train models predicting optimal maxPriorityFeePerGas and maxFeePerGas. Models range from simpler statistical models (quantile regression on historical fees) to LSTM neural networks that capture sequential patterns. The trained model is then deployed as a service, often using a framework like TensorFlow Serving or a serverless function, which the data pipeline feeds with real-time features to generate predictions on-demand.
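As a sketch of the simpler statistical option, scikit-learn's GradientBoostingRegressor supports a quantile objective directly. The data below is synthetic placeholder input standing in for engineered features and observed tips, not real chain data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Placeholder training set: X would be the engineered feature matrix,
# y the effective priority fees (in Gwei) paid in the next block.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 + X[:, 0] + rng.gamma(2.0, 1.0, size=500)

# The 90th-percentile objective targets fast inclusion: the model learns
# a tip that would have been sufficient ~90% of the time.
model = GradientBoostingRegressor(loss="quantile", alpha=0.9, n_estimators=100)
model.fit(X, y)
predicted_tip = model.predict(X[:1])[0]  # Gwei suggestion for one sample
```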
Finally, the pipeline must include monitoring and feedback loops. Continuously log the accuracy of your predictions by comparing suggested fees to what was actually required for successful inclusion. This data feeds back into the training cycle to improve the model. The entire architecture should be resilient, with fallback mechanisms to default to a reputable public estimator (like the Etherscan Gas Tracker API) if your pipeline fails, ensuring reliability for end-users.
Feature Engineering for Gas Prediction
This guide details the feature engineering process for building a machine learning model to predict and optimize Ethereum transaction gas prices.
Effective gas price prediction relies on transforming raw blockchain data into meaningful predictive features. The core data sources are the mempool (pending transactions) and recent on-chain history. From the mempool, you extract features like the count of pending transactions, their average gas price, and the distribution of gas prices across different percentiles (e.g., the 10th, 50th, and 90th). This reveals current network demand pressure. Historical on-chain data provides context, such as the average gas price of the last 10 blocks or the gas used ratio (gasUsed / gasLimit) in recent blocks, indicating how full blocks have been.
Temporal and network-specific features are crucial for capturing patterns. You should engineer time-based features like the hour of the day and day of the week to account for cyclical human activity. Network congestion metrics, such as the base fee from the previous block (post-EIP-1559) and the priority fee (tip) trends, are direct inputs. Incorporating features from related markets can also improve accuracy; for example, the price volatility of ETH/USD or activity on major DeFi protocols like Uniswap can signal impending network load. Each feature should be normalized or scaled appropriately for model consumption.
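One common way to encode those cyclical time features is a sin/cos transform, so the model treats 23:00 and 00:00 as neighbors rather than maximally distant values:

```python
import math

def cyclical_time_features(hour: int, day_of_week: int) -> list:
    """Encodes hour-of-day and day-of-week as sin/cos pairs, preserving
    the wrap-around adjacency of cyclical time."""
    return [
        math.sin(2 * math.pi * hour / 24),
        math.cos(2 * math.pi * hour / 24),
        math.sin(2 * math.pi * day_of_week / 7),
        math.cos(2 * math.pi * day_of_week / 7),
    ]
```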
Here is a conceptual Python snippet using web3.py and pandas to create a basic feature vector from recent blocks and the mempool:
```python
import pandas as pd
from web3 import Web3

w3 = Web3(Web3.HTTPProvider('YOUR_INFURA_URL'))

# Get recent blocks
latest = w3.eth.block_number
blocks = [w3.eth.get_block(i) for i in range(latest - 10, latest)]

# Feature: average base fee per gas from last 5 blocks
avg_base_fee = sum(b['baseFeePerGas'] for b in blocks[-5:]) / 5 / 1e9  # convert to Gwei

# Feature: gas used ratio in last block
gas_used_ratio = blocks[-1]['gasUsed'] / blocks[-1]['gasLimit']

# Get pending transactions (simplified; sampled to stay under rate limits)
pending_txs = w3.eth.get_block('pending')['transactions']
gas_prices = [w3.eth.get_transaction(tx)['gasPrice'] for tx in pending_txs[:100]]

# Feature: 90th percentile gas price in mempool sample
percentile_90 = pd.Series(gas_prices).quantile(0.9) / 1e9 if gas_prices else 0

feature_vector = [avg_base_fee, gas_used_ratio, percentile_90]
```
This vector would be part of a larger dataset used for training.
The target variable for your model must be carefully defined. For a predictor aimed at optimization, a common target is the gas price at which a transaction is included in the next N blocks (e.g., N=3). You would label historical data by looking at the gas price of transactions that were successfully mined within that window. An alternative is predicting the base fee for a future block. The model's output can then inform a gas estimation strategy, suggesting a maxFeePerGas and maxPriorityFeePerGas that balances cost with timely inclusion.
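Labeling one historical example under that "included within N blocks" definition can be as simple as the sketch below; the input layout (next block first, one minimum included tip per block) is an assumption about how you store your history.

```python
def inclusion_label(per_block_min_tips: list, n: int = 3) -> float:
    """Labels one training example: the cheapest priority fee (in Gwei)
    that would still have been mined within the next n blocks, taken as
    the minimum of each block's lowest included tip over that window.
    `per_block_min_tips` is ordered with the next block first."""
    window = per_block_min_tips[:n]
    if not window:
        raise ValueError("need at least one future block to label")
    return min(window)
```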
Finally, continuous validation and retraining are necessary. Gas market dynamics shift with protocol upgrades (like EIP-1559), changes in network usage, and Layer 2 adoption. You must monitor your model's performance against a baseline (like the eth_gasPrice API) and retrain it with fresh data regularly. The most robust predictors often ensemble multiple models or use techniques like LSTM networks to account for the sequential nature of block data. The goal is to move from simple heuristics to a data-driven system that saves users money on transaction fees.
Machine Learning Model Comparison for Gas Price Prediction
Comparison of common ML models for predicting Ethereum gas prices based on historical on-chain data and network metrics.
| Model / Metric | Linear Regression | Gradient Boosting (XGBoost) | LSTM Neural Network |
|---|---|---|---|
| Best for Trend Prediction | Baseline only | Yes | Yes |
| Best for Volatility/Spike Prediction | No | Partial | Yes |
| Training Time (on 1M samples) | < 5 sec | 30-60 sec | — |
| Prediction Latency | < 1 ms | 1-5 ms | 10-50 ms |
| Handles Sequential Data | No | Limited (via lag features) | Yes |
| Feature Importance Output | Yes (coefficients) | Yes (gain/split scores) | No |
| Typical Mean Absolute Error (Gwei) | 8-12 Gwei | 4-7 Gwei | 3-6 Gwei |
| Ease of On-Chain Integration | High | Medium | Low |
Model Training, Validation, and Deployment
This guide details the process of building a machine learning model to predict optimal gas prices for Ethereum transactions, covering data collection, model training, validation, and deployment strategies.
The first step is data collection and feature engineering. You need historical on-chain data, including base_fee_per_gas, max_priority_fee_per_gas, block fullness, network transaction volume, and mempool size. Data can be sourced via providers like Alchemy or Infura using their APIs, or directly from an archive node. Key engineered features might include rolling averages of base fee, time-of-day indicators, and gas price percentiles from recent blocks. This dataset forms the foundation for predicting the minimum gas price required for timely inclusion.
Next, you must select and train a predictive model. For this time-series regression task, models like XGBoost, LightGBM, or a simple LSTM neural network are common choices. The target variable is typically the effective_priority_fee (the actual tip paid) of transactions included in the next block. The model is trained to predict this value given the current network state. Training involves splitting your historical data into training and test sets, ensuring the temporal order is preserved to avoid data leakage.
Model validation and backtesting are critical. Don't rely solely on standard regression metrics like Mean Absolute Error (MAE) on a static test set. Implement a walk-forward validation strategy, where the model is repeatedly retrained on past data and tested on subsequent unseen periods. This simulates real-world performance. More importantly, create a simulation environment that replays historical transactions using your model's predictions, tracking key outcomes: transaction success rate, inclusion time (e.g., within 1-3 blocks), and total gas overspend compared to a baseline strategy.
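Walk-forward validation reduces to generating temporally ordered splits. A dependency-free sketch:

```python
def walk_forward_splits(n_samples: int, train_size: int, test_size: int):
    """Yields (train_indices, test_indices) pairs that preserve temporal
    order: each fold trains on a window of past samples and tests on the
    block of samples immediately after it, so no future data leaks into
    training."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # slide forward by one test window
```

Each fold's test window simulates "deploy the model, watch it for a while, retrain" over your historical data.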
Before deployment, you must package the model for production. This involves creating a lightweight inference service, often using a framework like FastAPI or Flask. The service should load the trained model artifact (e.g., a .pkl or .joblib file) and expose an endpoint that takes current network metrics as input and returns a recommended maxPriorityFeePerGas and maxFeePerGas. The service needs to be stateless and fast, with inference times under 100ms to keep up with block times.
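The stateless core of such a service might look like the following sketch. The model object is assumed to be anything with a `predict` method (e.g., loaded via `joblib.load`), the feature layout is illustrative, and the 2× base-fee headroom rule is one common choice rather than a requirement.

```python
import time

class InferenceService:
    """Stateless wrapper around a trained model artifact, adding the
    latency budget check described above."""

    def __init__(self, model, latency_budget_ms: float = 100.0):
        self.model = model
        self.budget = latency_budget_ms

    def recommend(self, features):
        start = time.perf_counter()
        # Model predicts the priority fee (tip) in Gwei.
        tip_gwei = float(self.model.predict([features])[0])
        base_fee_gwei = features[0]  # assumes base fee is the first feature
        elapsed_ms = (time.perf_counter() - start) * 1000
        return {
            "maxPriorityFeePerGas": tip_gwei,
            # Double the base fee for headroom against consecutive full blocks.
            "maxFeePerGas": 2 * base_fee_gwei + tip_gwei,
            "withinBudget": elapsed_ms <= self.budget,
        }
```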
Finally, deploy and monitor the predictor. Deploy the API to a cloud service like AWS Lambda, Google Cloud Run, or a dedicated server. Integrate it with your transaction-sending infrastructure, such as a wallet provider or a bot. Implement robust monitoring and alerting on the service's health, prediction latency, and, crucially, the model's performance in production. Track the actual inclusion success rate and gas costs of transactions using your predictions, and set up alerts for performance degradation, which signals the need for model retraining.
Continuous improvement is essential. As network dynamics change (e.g., post-EIP-1559, during NFT mints, or after protocol upgrades), the model may become stale. Establish a retraining pipeline that automatically gathers new on-chain data, retrains the model on a schedule (e.g., weekly), validates it via backtesting, and canaries the new version against the current one in production. This closed-loop system ensures your gas price predictor remains cost-effective and reliable over time.
Integration and Use Cases
Practical tools and strategies for predicting and managing transaction costs across different blockchain networks.
Use Case: Optimizing DeFi Yield Strategies
Automated yield farming strategies on Ethereum or Arbitrum are highly sensitive to gas costs. A predictor can:
- Schedule Transactions: Execute swaps, deposits, or harvests during predicted low-gas windows, potentially saving 30-60% on costs.
- Dynamic Batching: Aggregate multiple user actions into a single transaction when the model forecasts a gas price dip.
- Real Example: A vault using Yearn's strategy could check the predictor before triggering a rebalance, only proceeding if the estimated cost is below a threshold of 0.1% of the transaction value.
Monitoring, Maintenance, and Model Retraining
Deploying a gas price predictor is just the beginning. This section covers the essential practices for keeping your model accurate and reliable in a live environment.
A production gas price predictor requires continuous monitoring to ensure its predictions remain valuable. This involves tracking both model performance and data pipeline health. Key metrics to monitor include: the Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) of your predictions against actual on-chain gas prices, the data ingestion latency from your RPC provider, and the feature drift of key inputs like base fee and pending transaction volume. Tools like Prometheus for metrics collection and Grafana for visualization are commonly used to create dashboards that alert you to anomalies, such as a sudden spike in prediction error which could indicate a fundamental shift in network behavior or a data source failure.
Model maintenance is the routine process of updating the model's operational environment and dependencies. This includes updating the Python libraries in your requirements.txt (e.g., web3.py, scikit-learn, pandas), ensuring your infrastructure (like an AWS Lambda function or a Docker container) has sufficient resources, and verifying that your RPC endpoints are healthy and within rate limits. A critical maintenance task is concept drift detection. Gas market dynamics can change due to protocol upgrades (like EIP-1559), new L2 adoption, or macroeconomic events. Implementing statistical tests on incoming feature data can signal when the model's underlying assumptions are no longer valid, triggering a review.
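A crude, dependency-free drift check along those lines is sketched below; production systems often prefer a Kolmogorov-Smirnov test or Population Stability Index, but the shape of the check is the same.

```python
from statistics import mean, stdev

def feature_drift(reference: list, recent: list, z_threshold: float = 3.0) -> bool:
    """Flags drift when the mean of a recent feature window sits more than
    z_threshold standard errors away from the reference (training-time)
    mean. Simple mean-shift detection only; it will miss variance-only
    drift, which a KS test would catch."""
    ref_mean = mean(reference)
    ref_std = stdev(reference)
    if ref_std == 0:
        return mean(recent) != ref_mean
    standard_error = ref_std / (len(recent) ** 0.5)
    z = abs(mean(recent) - ref_mean) / standard_error
    return z > z_threshold
```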
When monitoring indicates degraded performance or concept drift is detected, model retraining is necessary. This isn't a full rebuild from scratch. The process typically involves: 1) Collecting new training data from your historical data store, 2) Re-running feature engineering on this updated dataset, 3) Retraining the model (e.g., your LSTM or XGBoost model) on the new data, and 4) A/B testing the new model against the current production version in a staging environment. For a gas predictor, you might retrain weekly with the last 30 days of data to capture recent trends. Automating this pipeline with Apache Airflow or Prefect ensures retraining happens consistently without manual intervention.
A robust deployment strategy is essential for integrating a new model version. The canary deployment pattern is highly recommended. Instead of replacing the live predictor immediately, you route a small percentage of prediction requests (e.g., 5%) to the new model while monitoring its performance metrics in real-time. If the new model's error rates are lower, you can gradually increase traffic. This minimizes risk. Your application's prediction service should be designed to load model artifacts (like a .pkl file from sklearn or a .h5 file from TensorFlow) dynamically, allowing for a hot swap without service restart. Versioning your models and their associated training code in Git is non-negotiable for reproducibility.
Finally, establish a feedback loop to continuously improve the system. Log every prediction your model makes along with the actual gas price that was eventually used in a block. This creates a labeled dataset for future retraining cycles. Analyze prediction failures: were they due to network congestion spikes, failed RPC calls, or outlier transactions? Incorporating this analysis into your feature engineering—perhaps by adding a rolling volatility metric or a mempool priority fee indicator—can make the next model iteration more robust. The goal is to evolve the predictor alongside the Ethereum network itself.
Frequently Asked Questions
Common developer questions and solutions for building a gas price predictor, covering data sources, model challenges, and implementation strategies.
A robust predictor requires multiple real-time and historical data feeds. Key sources include:
- On-chain mempool data: Transaction pools from nodes (e.g., via Erigon, Geth) or services like Blocknative or Bloxroute. This shows pending transactions and their gas bids.
- Block history: Past block data from Etherscan's API, Blocknative's Historian, or directly from an archive node. Analyze gas used, base fees, and priority fees.
- Network metrics: Metrics like eth_gasPrice, pending transaction count, and network hash rate from providers like Infura, Alchemy, or public RPC endpoints.
- Oracle services: Aggregated fee estimates from Chainlink's Fast Gas data feed, Etherscan's Gas Tracker API, or GasNow (deprecated, but its historical data is useful).
For production, combine a direct node connection for mempool data with a reliable API for historical analysis to capture both immediate demand and longer-term trends.
Resources and Further Reading
These resources cover protocol mechanics, data access, and real-world tooling required to build a gas price optimization predictor that works under EIP-1559 and modern MEV conditions.