Mean Reversion Backtest in Python for Crypto Traders
Build a complete mean reversion trading strategy in Python, backtest it on real crypto data from Binance, and calculate Sharpe ratio, drawdown, and win rate — all from scratch.
Most retail traders lose money chasing momentum. Experienced quants often go the other direction — building systems around the statistical tendency of prices to revert to their historical mean. Mean reversion works because markets overshoot. Panic selling pushes Bitcoin 15% below its 20-day average; algorithmic buyers step in and the price snaps back. Capture that snap consistently and you have an edge. This guide walks through building, backtesting, and evaluating a mean reversion strategy in Python using real crypto market data — no hand-waving, no black boxes.
Mean reversion is the statistical hypothesis that asset prices oscillate around a long-term average, and extreme deviations are temporary. In traditional markets this process plays out slowly — over weeks or months. In crypto, it can resolve in hours, which is both the opportunity and the danger. The key mathematical tool is the z-score: how many standard deviations the current price sits from its rolling mean. A z-score of -2 means price is two standard deviations below average — statistically unusual and often followed by a bounce. Bollinger Bands visualize this by drawing envelopes two standard deviations above and below a moving average. When price touches the lower band, mean reversion traders go long expecting a return to the middle band.
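As a quick worked example, here is the z-score for the latest candle of a toy price series. The prices are made up for illustration: 19 candles oscillating between 100 and 101, then a sudden drop to 96. The rolling standard deviation uses pandas' default sample estimator (ddof=1), the same one `rolling(window).std()` uses later in this guide.

```python
import pandas as pd

window = 20
# Hypothetical prices: alternating 100/101, then a sharp drop to 96
closes = pd.Series([100.0, 101.0] * 10 + [96.0])

sma = closes.rolling(window).mean()
std = closes.rolling(window).std()   # sample standard deviation (ddof=1)
z = (closes - sma) / std             # signed distance from the mean in std units

print(f"SMA={sma.iloc[-1]:.2f}  std={std.iloc[-1]:.3f}  z={z.iloc[-1]:.2f}")
```

The last candle lands at a z-score of about -3.81, well past a -2 entry threshold. Because the current price is part of its own window, the sample z-score can never exceed (n - 1) / sqrt(n) in magnitude (about 4.25 for n = 20), so thresholds beyond that bound will never fire.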
Not all crypto pairs mean-revert equally well. Stablecoin pairs like USDC/USDT are almost perfectly mean-reverting by design. Major pairs like BTC/USDT and ETH/USDT mean-revert on shorter timeframes (1h-4h) but trend on longer ones. Altcoins can mean-revert violently but also gap down on bad news and never recover. Your backtest will tell you which pairs and timeframes have historically favored this approach — that is the whole point of building this system before risking real capital.
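One practical way to compare timeframes without downloading the data again is to aggregate hourly candles into coarser ones. This is a sketch using pandas `resample`. `resample_ohlcv` is a hypothetical helper, and it assumes a DataFrame with a DatetimeIndex and the same column names the `fetch_ohlcv` function in this guide produces.

```python
import pandas as pd

def resample_ohlcv(df: pd.DataFrame, rule: str = '4h') -> pd.DataFrame:
    """Aggregate OHLCV candles (DatetimeIndex) into a coarser timeframe."""
    agg = {
        'open': 'first',   # first trade of the new bar
        'high': 'max',
        'low': 'min',
        'close': 'last',   # last trade of the new bar
        'volume': 'sum',
    }
    # Bins with no source candles (e.g. exchange downtime) get a NaN open; drop them
    return df.resample(rule).agg(agg).dropna(subset=['open'])
```

For example, `df_4h = resample_ohlcv(df, '4h')` gives a 4-hour series you can feed through the same signal and backtest functions, so the only thing that changes between runs is the timeframe.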
The CCXT library gives you a unified API to pull historical OHLCV data from over 100 exchanges. The same code that fetches data from Binance works with Bybit, OKX, and Bitget — you just change the exchange name. This matters for backtesting because different exchanges have different liquidity profiles, fee structures, and slippage. A strategy that looks excellent on Binance's deep BTC/USDT market might perform worse on the same pair on KuCoin, where the order book is thinner and spreads are wider.
```python
import ccxt
import pandas as pd
import numpy as np

# Swap 'binance' for 'bybit', 'okx', 'bitget', etc. (same API)
exchange = ccxt.binance({'enableRateLimit': True})

def fetch_ohlcv(symbol='BTC/USDT', timeframe='1h', limit=1000):
    """Fetch up to `limit` of the most recent candles in a single request."""
    ohlcv = exchange.fetch_ohlcv(symbol, timeframe=timeframe, limit=limit)
    df = pd.DataFrame(
        ohlcv,
        columns=['timestamp', 'open', 'high', 'low', 'close', 'volume']
    )
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
    df.set_index('timestamp', inplace=True)
    return df

df = fetch_ohlcv('BTC/USDT', '1h', 1000)
print(f'Loaded {len(df)} candles: {df.index[0]} to {df.index[-1]}')
print(df[['open', 'high', 'low', 'close', 'volume']].tail(3))
```
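`fetch_ohlcv` above makes a single request, and Binance's spot klines endpoint caps one request at 1000 candles (about 41 days of hourly data). For a longer history, a common CCXT pattern is to page forward with the `since` argument, starting each request just after the last timestamp received. This is a sketch under two assumptions: the exchange returns candles in ascending time order starting at `since`, and `page_limit` matches the exchange's real per-request maximum (if it is set higher, the short-page check stops the loop early). `fetch_ohlcv_paginated` and `max_candles` are illustrative names, not part of CCXT.

```python
import pandas as pd

def fetch_ohlcv_paginated(exchange, symbol='BTC/USDT', timeframe='1h',
                          since=None, max_candles=5000, page_limit=1000):
    """Page through exchange.fetch_ohlcv using `since` (ms) until max_candles
    candles are collected or the exchange has no newer data."""
    rows = []
    while len(rows) < max_candles:
        batch = exchange.fetch_ohlcv(symbol, timeframe=timeframe,
                                     since=since, limit=page_limit)
        if not batch:
            break
        rows.extend(batch)
        since = batch[-1][0] + 1       # next page starts just after the last candle
        if len(batch) < page_limit:    # short page: reached the most recent data
            break
    df = pd.DataFrame(rows[:max_candles],
                      columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
    df = df.drop_duplicates(subset='timestamp')   # guard against overlapping pages
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
    return df.set_index('timestamp').sort_index()
```

A typical call starts from a fixed date, for example `fetch_ohlcv_paginated(exchange, 'BTC/USDT', '1h', since=exchange.parse8601('2024-01-01T00:00:00Z'))`. Passing the exchange object in, rather than using a global, also makes the function easy to test offline with a stub.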
The core of the strategy is computing z-scores and generating buy signals when price falls too far below its rolling mean. The window size (typically 20 periods) and the entry threshold (typically 2 standard deviations below the mean, since this version only trades long) are your primary parameters. On 1-hour BTC/USDT data from Binance, a 20-period window covers 20 hours of price action: long enough to smooth out single-candle noise, short enough to react to intraday deviations. The exit signal fires when price returns to the mean (the z-score crosses above zero) rather than waiting for the upper band, which keeps holding time short and reduces exposure to sudden trend reversals.
```python
def add_bollinger_bands(df, window=20, num_std=2.0):
    df['sma'] = df['close'].rolling(window).mean()
    df['std'] = df['close'].rolling(window).std()
    df['upper'] = df['sma'] + num_std * df['std']
    df['lower'] = df['sma'] - num_std * df['std']
    # Z-score: signed distance from mean in std dev units
    df['z_score'] = (df['close'] - df['sma']) / df['std']
    return df

def generate_signals(df, entry_z=-2.0, exit_z=0.0):
    df['signal'] = 0
    # Long entry: price is unusually far below the mean
    df.loc[df['z_score'] < entry_z, 'signal'] = 1
    # Exit: price has reverted back to or above the mean
    df.loc[df['z_score'] > exit_z, 'signal'] = -1
    return df

df = add_bollinger_bands(df, window=20, num_std=2.0)
df = generate_signals(df, entry_z=-2.0, exit_z=0.0)

buy_signals = (df['signal'] == 1).sum()
print(f'Buy signals: {buy_signals} ({buy_signals / len(df) * 100:.1f}% of candles)')
print(df[['close', 'sma', 'z_score', 'signal']].tail(5))
```
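Tip: If buy signals fire on fewer than 1% of candles, loosen entry_z to -1.5 (closer to the mean, so it triggers more often). If they fire constantly, make it stricter at -2.5. You need at least 30-50 completed trades in your backtest for the statistics to mean anything, and because a single dip usually produces several consecutive signal candles, the trade count will be lower than the signal count. Aim for a 2-5% signal frequency.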
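To apply the tip above without trial and error, you can measure signal frequency for several thresholds in one pass. This sketch recomputes the z-score the same way `add_bollinger_bands` does, but only from the close column. `signal_frequency` is a hypothetical helper, and unlike the print statement above it divides by the candles that have a defined z-score (after the warm-up window) rather than by `len(df)`.

```python
import numpy as np
import pandas as pd

def signal_frequency(close: pd.Series, window: int = 20,
                     thresholds=(-1.5, -2.0, -2.5)) -> pd.Series:
    """Percent of candles whose z-score is below each candidate entry_z."""
    z = (close - close.rolling(window).mean()) / close.rolling(window).std()
    z = z.dropna()  # skip warm-up candles where the rolling stats are undefined
    return pd.Series({t: float((z < t).mean() * 100) for t in thresholds},
                     name='pct_of_candles')
```

Calling `signal_frequency(df['close'])` returns one percentage per threshold, which you can compare directly against the 2-5% target before running a full backtest.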
A basic event-driven backtest loops through every candle, checks the previous candle's signal, and enters or exits at the current candle's open price. Using the previous candle's signal to enter at the next open is critical — entering at the same close that generated the signal introduces look-ahead bias, one of the most common mistakes that makes paper results look better than live performance will ever be. The performance metrics that matter most for a mean reversion strategy are Sharpe ratio (risk-adjusted return), maximum drawdown (worst peak-to-trough loss you would have endured), and win rate (mean reversion strategies typically win often but their losses can be larger than individual wins).
```python
def backtest(df, initial_capital=10_000, fee=0.001):
    """Event-driven backtest. Uses the previous candle's signal to trade at the next open."""
    capital = initial_capital
    position = 0.0
    trades = []
    equity_curve = []

    for i in range(1, len(df)):
        prev_signal = df['signal'].iloc[i - 1]  # Signal from last candle
        entry_price = df['open'].iloc[i]        # Execute at current open

        if prev_signal == 1 and position == 0:
            shares = (capital * 0.95) / entry_price
            cost = shares * entry_price * (1 + fee)
            if cost <= capital:
                position = shares
                capital -= cost
                trades.append({'type': 'buy', 'price': entry_price, 'idx': i})
        elif prev_signal == -1 and position > 0:
            proceeds = position * entry_price * (1 - fee)
            capital += proceeds
            trades.append({'type': 'sell', 'price': entry_price, 'idx': i})
            position = 0.0

        # Mark-to-market equity at this candle's close
        equity_curve.append(capital + position * df['close'].iloc[i])

    if position > 0:  # Close any open position at the last price and record the trade
        last_price = df['close'].iloc[-1]
        capital += position * last_price * (1 - fee)
        trades.append({'type': 'sell', 'price': last_price, 'idx': len(df) - 1})

    return capital, trades, pd.Series(equity_curve, index=df.index[1:])


def calculate_metrics(equity, trades, initial_capital=10_000, fee=0.001):
    returns = equity.pct_change().dropna()

    # Annualized Sharpe (crypto trades 24/7, so hourly data = 8760 periods/year)
    std = returns.std()
    sharpe = (returns.mean() / std) * np.sqrt(8760) if std > 0 else 0.0

    # Max drawdown: worst peak-to-trough drop, reported as a negative percentage
    rolling_max = equity.cummax()
    max_dd = ((equity - rolling_max) / rolling_max).min() * 100

    # Win rate: a trade only counts as a win if it is profitable after fees
    buys = [t for t in trades if t['type'] == 'buy']
    sells = [t for t in trades if t['type'] == 'sell']
    pairs = list(zip(buys, sells))
    wins = sum(1 for b, s in pairs
               if s['price'] * (1 - fee) > b['price'] * (1 + fee))
    win_rate = wins / len(pairs) * 100 if pairs else 0.0

    total_return = (equity.iloc[-1] - initial_capital) / initial_capital * 100

    print(f'Total Return : {total_return:.2f}%')
    print(f'Sharpe Ratio : {sharpe:.2f}')
    print(f'Max Drawdown : {max_dd:.2f}%')
    print(f'Win Rate     : {win_rate:.1f}%')
    print(f'Total Trades : {len(pairs)}')
    return {'return': total_return, 'sharpe': sharpe, 'max_dd': max_dd, 'win_rate': win_rate}


final_cap, trades, equity = backtest(df)
metrics = calculate_metrics(equity, trades)
```
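A useful sanity check on any loop-based backtest is an independent vectorized version. The sketch below converts the signal column into a 0/1 long state with a forward fill, then shifts it one candle, so a signal formed at candle i-1's close is only acted on at candle i's open, mirroring the loop above. It is a simplification: it assumes full allocation instead of 95%, charges `fee` as a flat fraction of capital on each entry and exit, and does not force-close at the end, so its total return should track the loop closely but not exactly. Deleting the `.shift(1)` is precisely the look-ahead mistake described earlier. The function names are hypothetical.

```python
import pandas as pd

def signals_to_position(signal: pd.Series) -> pd.Series:
    """Map entry (1) / exit (-1) / hold (0) signals to a 0/1 long state."""
    # 1 -> long, -1 -> flat, 0 -> NaN so ffill carries the previous state forward
    return signal.map({1: 1.0, -1: 0.0}).ffill().fillna(0.0)

def vectorized_returns(df: pd.DataFrame, fee: float = 0.001) -> pd.Series:
    """Per-candle strategy returns, trading at the next candle's open."""
    state = signals_to_position(df['signal'])
    # held[i] is the state decided at the close of candle i-1,
    # and it is held from open[i] to open[i+1]
    held = state.shift(1).fillna(0.0)
    open_to_open = (df['open'].shift(-1) / df['open'] - 1).fillna(0.0)
    turnover = held.diff().abs().fillna(0.0)  # 1.0 on every entry or exit
    return held * open_to_open - turnover * fee
```

Comparing `((1 + vectorized_returns(df)).prod() - 1) * 100` with the loop's total return is a quick way to catch indexing bugs: a large gap usually means one of the two versions is trading on the wrong candle.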
| Metric | Poor | Acceptable | Strong |
|---|---|---|---|
| Sharpe Ratio | < 0.5 | 0.5 – 1.5 | > 1.5 |
| Max Drawdown (absolute value) | > 30% | 15% – 30% | < 15% |
| Win Rate | < 50% | 50% – 65% | > 65% |
| Total Trades (sample) | < 20 | 20 – 100 | > 100 |
Fixed position sizing (always deploying 95% of capital) is fine for running backtests but dangerous in live trading. The Kelly Criterion gives the fraction of capital to risk per trade that maximizes long-run compound growth, based on your historical win rate and average win-to-loss ratio. In practice, traders use half-Kelly to account for estimation error and model drift. Crypto markets shift regimes constantly: a strategy that posted a 62% win rate during a sideways 2023 market could drop to 51% during a 2024 trend-driven rally. Always apply a hard cap regardless of what the formula suggests. On Bybit and OKX, you apply the result by sizing each order as a fixed percentage of your account balance.
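Written out, the full-Kelly fraction that `kelly_criterion` computes is shown below, with p the win rate, q = 1 - p, and b the payoff ratio (average win divided by average loss). The worked numbers use the docstring's example win rate of 0.58 and an assumed payoff ratio of 1.2, chosen only for illustration.

```latex
f^{*} = \frac{b\,p - q}{b} = p - \frac{q}{b}, \qquad q = 1 - p

p = 0.58,\; b = 1.2:\quad f^{*} = 0.58 - \frac{0.42}{1.2} = 0.58 - 0.35 = 0.23

\tfrac{1}{2} f^{*} = 0.115 \;\Rightarrow\; \min(0.115,\ 0.02) = 0.02
```

In this example the 2% cap, not the formula, sets the final size.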
```python
def kelly_criterion(win_rate: float, avg_win: float, avg_loss: float,
                    max_risk: float = 0.02) -> float:
    """
    Returns recommended fraction of capital to risk per trade.
    win_rate : historical win probability, e.g. 0.58
    avg_win  : mean return per winning trade, e.g. 0.012 for +1.2%
    avg_loss : mean loss per losing trade (positive, same units as avg_win)
    max_risk : hard cap; never risk more than this regardless of Kelly
    """
    if avg_win <= 0:
        return 0.0       # no winning trades recorded: no edge to size
    if avg_loss == 0:
        return max_risk  # no losses recorded: fall back to the hard cap
    b = avg_win / avg_loss        # payoff ratio
    p, q = win_rate, 1 - win_rate
    kelly = (b * p - q) / b       # Full Kelly
    half_kelly = kelly * 0.5      # Use half for robustness
    return float(np.clip(half_kelly, 0, max_risk))


def extract_trade_stats(trades, fee=0.001):
    buys = [t for t in trades if t['type'] == 'buy']
    sells = [t for t in trades if t['type'] == 'sell']
    # Fee-adjusted per-trade returns, so trades of different sizes are comparable
    rets = [s['price'] * (1 - fee) / (b['price'] * (1 + fee)) - 1
            for b, s in zip(buys, sells)]
    wins = [r for r in rets if r > 0]
    losses = [abs(r) for r in rets if r <= 0]
    win_rate = len(wins) / len(rets) if rets else 0.0
    avg_win = float(np.mean(wins)) if wins else 0.0
    avg_loss = float(np.mean(losses)) if losses else 0.0
    return win_rate, avg_win, avg_loss


wr, aw, al = extract_trade_stats(trades)
size = kelly_criterion(wr, aw, al, max_risk=0.02)
print(f'Win Rate      : {wr*100:.1f}%')
print(f'Avg Win/Loss  : {aw*100:.2f}% / {al*100:.2f}%')
print(f'Risk per trade: {size*100:.2f}% of capital')
```
Warning: Never trade full Kelly. It maximizes long-run growth only when your inputs are exact, and it causes brutal drawdowns when your win rate estimate is even slightly off. Half-Kelly sacrifices roughly 25% of the long-run growth rate in exchange for half the volatility and a much smoother equity curve, a trade worth making every time.
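The 25% figure can be checked with a simple model that treats each trade as a binary bet: with probability p you gain b times the fraction risked, otherwise you lose that fraction. This is the same idealization the Kelly formula rests on, not a model of real fills. The win rate of 0.58 is the docstring's example and the payoff ratio of 1.2 is an arbitrary assumption; `log_growth` is a hypothetical helper.

```python
import math

def log_growth(f: float, p: float, b: float) -> float:
    """Expected log growth per trade when risking fraction f of capital:
    gain b*f with probability p, lose f with probability 1 - p."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

p, b = 0.58, 1.2              # example win rate and assumed payoff ratio
full = p - (1 - p) / b        # full Kelly fraction
half = full / 2
ratio = log_growth(half, p, b) / log_growth(full, p, b)
print(f'Full Kelly f* = {full:.3f}; half-Kelly keeps {ratio:.0%} of its growth rate')
```

With these inputs the ratio comes out at roughly 0.75. The exact value shifts slightly with p and b, because the 75% rule of thumb comes from a small-bet approximation, but the shape holds: betting above full Kelly lowers growth while adding risk, so undershooting is the cheaper mistake.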
The most dangerous mistake in backtesting is optimizing parameters until historical results look amazing, then discovering the strategy barely works in live markets. Overfitting is invisible in your backtest output — you only find out when real money is on the line. The standard defense is a strict train/test split: optimize your parameters on the first 70% of data, then evaluate the final version on the held-out 30% without touching anything. If performance drops significantly on the test set, your parameters are overfit to noise. Also test across different market regimes — the 2022 bear market and the 2024 bull run behave completely differently, and a robust strategy should survive both.
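A minimal way to enforce the split is to cut the data by position in time and never shuffle it. In this sketch (`chronological_split` is a hypothetical helper) the test slice also receives `warmup` candles from the end of the training slice, so the rolling mean and standard deviation are already defined on the first test candle. With `warmup = window - 1`, those extra candles have NaN z-scores, get signal 0 from `generate_signals`, and therefore never trade.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, train_frac: float = 0.7, warmup: int = 0):
    """Time-ordered train/test split with no shuffling.

    Returns (train, test, test_start). `test` begins `warmup` candles before
    `test_start` so rolling indicators are defined on the first test candle.
    """
    if not 0 < train_frac < 1:
        raise ValueError('train_frac must be strictly between 0 and 1')
    cut = int(len(df) * train_frac)
    train = df.iloc[:cut].copy()
    test = df.iloc[max(cut - warmup, 0):].copy()
    return train, test, df.index[cut]
```

In use, `train, test, test_start = chronological_split(df, 0.7, warmup=19)` leaves you free to tune `window` and `entry_z` on `train` as much as you like; the parameters are then frozen and run through `add_bollinger_bands`, `generate_signals` and `backtest` on `test` exactly once.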
Parameter stability is a second key validation test. Run your backtest across a grid of window sizes and entry thresholds. A strategy that only works precisely at window=20 and entry_z=-2.0 is suspicious. A strategy that performs reasonably across window values of 15 to 25, and entry thresholds of -1.8 to -2.2, is likely capturing a real market inefficiency rather than curve-fitted noise. Once you pass both tests, paper trade on Bybit or OKX (both have free paper trading with live market data) for at least two to four weeks before committing capital. Pair that with VoiceOfChain's real-time signal feed to cross-reference your algorithmic entries with on-chain whale activity and exchange flow data that pure price-based models will never capture on their own.
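The grid test can be sketched with a stripped-down, vectorized version of the strategy so all nine parameter combinations run in well under a second. `quick_sharpe` and `sharpe_grid` are hypothetical helpers, not the event-driven backtest above: they trade at the close instead of the next open, ignore fees, and always size at 100%. Treat their Sharpe values as relative scores for comparing parameter cells, not as tradable results.

```python
import itertools
import numpy as np
import pandas as pd

def quick_sharpe(close: pd.Series, window: int, entry_z: float,
                 exit_z: float = 0.0, periods_per_year: int = 8760) -> float:
    """Simplified long-only z-score strategy (no fees, full allocation).
    The position set at close t-1 earns the close t-1 -> close t return."""
    z = (close - close.rolling(window).mean()) / close.rolling(window).std()
    # 1.0 = go long, 0.0 = go flat, NaN = keep the previous state
    raw = np.where(z < entry_z, 1.0, np.where(z > exit_z, 0.0, np.nan))
    position = pd.Series(raw, index=close.index).ffill().fillna(0.0)
    returns = (position.shift(1) * close.pct_change()).dropna()
    std = returns.std()
    if len(returns) < 2 or not std > 0:
        return float('nan')  # never traded, or too little data to score
    return float(returns.mean() / std * np.sqrt(periods_per_year))

def sharpe_grid(close: pd.Series, windows=(15, 20, 25),
                entry_zs=(-1.8, -2.0, -2.2)) -> pd.DataFrame:
    """Sharpe for every (window, entry_z) pair, as a window x entry_z table."""
    rows = [{'window': w, 'entry_z': e, 'sharpe': quick_sharpe(close, w, e)}
            for w, e in itertools.product(windows, entry_zs)]
    return pd.DataFrame(rows).pivot(index='window', columns='entry_z', values='sharpe')
```

Running `print(sharpe_grid(df['close']).round(2))` shows the whole neighborhood at once. A robust result looks like a broad plateau of similar values; a single strong cell surrounded by weak neighbors is the curve-fitting warning sign described above.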
Mean reversion backtesting in Python is one of the most transferable skills in algorithmic crypto trading. The framework built here — CCXT data fetching, z-score signal generation, event-driven backtesting, Kelly position sizing — is not a toy. You can extend it to pairs trading across correlated assets, scale it to sweep dozens of symbols simultaneously on Binance and Bybit, or layer in on-chain filters from VoiceOfChain to improve signal quality. The discipline that separates profitable quants from expensive hobbyists is always the same: test out-of-sample, model realistic fees, and resist the temptation to optimize until your results look perfect. A great strategy looks mediocre in a backtest and consistent in live trading — that is the target.