What Is Backtesting?
Backtesting is like a rehearsal. You take a set of rules — for example, "buy when RSI drops below 30, sell when RSI goes back above 60" — and apply them to historical price data to see what would have happened.
Think of it like rewinding a sports game and asking "what if we had made different calls?" You can see the outcome, but you're looking at a version of the past — not a guarantee about the future.
Three Things Backtesting Is Good For
- Eliminating bad ideas quickly. If a strategy has a terrible win rate over 5 years, you probably don't want to use it. Backtesting lets you rule out duds before risking real money.
- Understanding risk. How bad was the worst losing streak? Could you have stomached that? Backtesting shows you the maximum drawdown — the worst peak-to-trough decline — so you know what you're getting into.
- Comparing two approaches. Is RSI(30/60) better than RSI(25/65) historically on SPY? Backtesting on the same historical data gives you a fair head-to-head comparison.
The Limits of Backtesting (Read This First)
Backtesting is incredibly useful — but it has real limitations. Understanding them now will save you from heartbreak later.
Four Honest Warnings
Markets change. A strategy that worked beautifully from 2018–2023 may not work in 2025. Interest rates shift. Market regimes change. Backtesting shows you history, not a crystal ball. Use it to inform your decisions, but never as proof that something will work in the future.
Survivorship bias. We're only testing tickers that still exist today. Companies that went bankrupt or were delisted are generally not available through yfinance, so they never show up in our backtest. This makes backtests look better than reality. A strategy that sounds great on AAPL, MSFT, and NVDA (all of which thrived) might look very different if we included companies that failed.
Look-ahead bias. OpenClaw's backtester avoids this by only using data that would have been available on the day of the trade: the signal comes from yesterday's RSI, and the trade is filled at today's open. Some sloppy backtesting tools accidentally use tomorrow's price to decide whether to trade today, which is cheating.
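To make the difference concrete, here is a minimal sketch on made-up numbers. The two helper functions are hypothetical and exist only for this illustration; the prices and RSI values are invented. The first function acts on the previous day's RSI and fills at the current day's open. The second is deliberately wrong: it reads the current day's RSI (which depends on that day's close) and still fills at that day's open.

```python
def entries_without_lookahead(opens, rsi, threshold=30):
    """Signal from day i-1's RSI, filled at day i's open."""
    fills = []
    for i in range(1, len(opens)):
        if rsi[i - 1] < threshold:
            fills.append((i, opens[i]))
    return fills


def entries_with_lookahead(opens, rsi, threshold=30):
    """WRONG on purpose: uses day i's RSI, known only after the close,
    together with day i's open, which happened earlier that day."""
    fills = []
    for i in range(len(opens)):
        if rsi[i] < threshold:
            fills.append((i, opens[i]))
    return fills


# Invented data: RSI dips below 30 on day 2 only
opens = [100.0, 98.0, 95.0, 97.0, 99.0]
rsi = [45.0, 33.0, 28.0, 41.0, 55.0]

print(entries_without_lookahead(opens, rsi))  # [(3, 97.0)]
print(entries_with_lookahead(opens, rsi))     # [(2, 95.0)]
```

The biased version "buys" at 95.0 on the very day the dip was only confirmed at the close, a fill nobody could have gotten. The honest version pays 97.0 the next morning.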
Trading costs. Our simple backtester doesn't account for commissions, bid-ask spreads, or the fact that your order might not fill at the exact price shown. Real returns would be lower, and the gap grows with every extra trade. If you're trading big size, slippage costs could be material.
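If you want a feel for how much costs matter, here is a rough sketch. The function net_return_pct and the figure of 5 basis points per side are assumptions for illustration only; your broker's real costs may be higher or lower, and ta_backtest.py does not include anything like this.

```python
def net_return_pct(entry_price, exit_price, cost_bps_per_side=5.0):
    """Percent return after paying an assumed cost on both entry and exit.

    cost_bps_per_side is a made-up, all-in estimate (commission + spread +
    slippage) in basis points; 1 basis point = 0.01%.
    """
    cost = cost_bps_per_side / 10_000
    paid = entry_price * (1 + cost)      # we effectively buy a bit higher
    received = exit_price * (1 - cost)   # and sell a bit lower
    return (received - paid) / paid * 100


gross = (103.0 - 100.0) / 100.0 * 100
net = net_return_pct(100.0, 103.0, cost_bps_per_side=5.0)
print(f"gross {gross:+.2f}%   net {net:+.2f}%")
```

A tenth of a percent per round trip looks small, but it is charged on every trade, so it eats a noticeable share of a strategy whose average trade makes only a percent or two.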
The Strategy We'll Test
RSI Mean Reversion
RSI Mean Reversion is one of the most commonly tested strategies in retail trading. Here's how it works:
- Buy rule: When RSI closes below 30 (oversold), buy at the next day's opening price.
- Sell rule: When RSI closes back above 60 (recovered), or once the position has been held for 30 trading days, whichever comes first, sell at the next day's opening price.
- One position at a time: We don't add to a position while already in one. When we exit, we wait for the next signal.
This strategy bets that stocks which have dropped sharply (RSI oversold) tend to bounce back. Sometimes they do. Sometimes they keep falling. The backtest tells us historically how often each happened.
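If you're curious what the RSI number actually measures, here is a minimal pure-Python sketch of the classic Wilder-style calculation. The function names are hypothetical, and the script in this chapter uses pandas_ta instead; values from pandas_ta may differ slightly, especially near the start of the series, because libraries seed the averages differently.

```python
def _rsi_from_averages(avg_gain, avg_loss):
    """Convert average gain/loss into an RSI value between 0 and 100."""
    if avg_loss == 0:
        return 100.0  # no losses in the window (convention used in this sketch)
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def wilder_rsi(closes, length=14):
    """Return one RSI value per close; None until there is enough history."""
    if len(closes) <= length:
        return [None] * len(closes)

    # Day-over-day changes, split into gains and losses (both non-negative)
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Seed with a simple average of the first `length` changes
    avg_gain = sum(gains[:length]) / length
    avg_loss = sum(losses[:length]) / length

    rsi = [None] * length
    rsi.append(_rsi_from_averages(avg_gain, avg_loss))

    # Wilder smoothing: each new day nudges the running averages
    for g, l in zip(gains[length:], losses[length:]):
        avg_gain = (avg_gain * (length - 1) + g) / length
        avg_loss = (avg_loss * (length - 1) + l) / length
        rsi.append(_rsi_from_averages(avg_gain, avg_loss))
    return rsi


falling = [100 - i for i in range(20)]   # price drops every single day
rising = [100 + i for i in range(20)]    # price rises every single day
print(wilder_rsi(falling)[-1])  # 0.0
print(wilder_rsi(rising)[-1])   # 100.0
```

A stock that only falls scores 0 and one that only rises scores 100. Real stocks land in between, and a reading below 30 means recent losses have heavily outweighed recent gains.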
Setting Up the Backtest in ta-config.yaml
OpenClaw reads backtesting settings from your config file. Here's what a typical setup looks like:
# ta-config.yaml: backtesting section
# OpenClaw reads these settings when running ta_backtest.py
watchlist:
  tickers:
    - AAPL
    - SPY
    - NVDA
backtest:
  strategy: "rsi_mean_reversion"   # Which strategy to test
  period: "5y"                     # How much history (5y = 5 years)
  # RSI Mean Reversion parameters
  rsi_buy: 30                      # Buy when RSI drops below this
  rsi_sell: 60                     # Sell when RSI rises above this
  max_hold: 30                     # Sell after this many days if RSI hasn't recovered
What Each Setting Does
- tickers: List the symbols you want to backtest. Test multiple tickers to avoid overfitting to one stock.
- strategy: The name of the strategy module OpenClaw will load (in this case, "rsi_mean_reversion").
- period: How much historical data to pull. Common values: "2y" (2 years), "5y" (5 years), "10y" (10 years).
- rsi_buy: RSI threshold for entry. Below this, we buy. Common range: 20–35.
- rsi_sell: RSI threshold for exit. Above this, we sell. Common range: 60–80.
- max_hold: Maximum number of trading days (rows of daily data, not calendar days) to hold a position. Forces an exit even if RSI hasn't recovered yet.
Adjust rsi_buy and rsi_sell to test different thresholds. Change period to 2y for a quick test or 10y for a longer look. Each run takes seconds.
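Because a typo in the config can quietly produce a meaningless backtest, it can help to sanity-check the settings before running anything. The sketch below is one possible way to do that; check_backtest_settings is a hypothetical helper, not part of OpenClaw, and the plain dictionary stands in for what the backtest section looks like once the YAML file has been loaded.

```python
def check_backtest_settings(bt: dict) -> list:
    """Return a list of problems found; an empty list means the settings look sane."""
    problems = []
    for key in ("strategy", "period", "rsi_buy", "rsi_sell", "max_hold"):
        if key not in bt:
            problems.append(f"missing key: {key}")
    if problems:
        return problems  # no point checking values if keys are missing

    # RSI lives between 0 and 100, and the buy level must sit below the sell level
    if not 0 < bt["rsi_buy"] < bt["rsi_sell"] < 100:
        problems.append("expected 0 < rsi_buy < rsi_sell < 100")
    if bt["max_hold"] < 1:
        problems.append("max_hold must be at least 1")
    return problems


settings = {
    "strategy": "rsi_mean_reversion",
    "period": "5y",
    "rsi_buy": 30,
    "rsi_sell": 60,
    "max_hold": 30,
}
print(check_backtest_settings(settings))                    # []
print(check_backtest_settings(dict(settings, rsi_buy=70)))  # buy level above sell level
```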
The Backtest Script
Now let's look at the actual Python code that runs the backtest. This script loads your config, downloads historical data, and simulates your strategy.
# ta_backtest.py: OpenClaw Backtesting Module
# Tests your signal rules against historical price data
import yaml
import yfinance as yf
import pandas_ta as ta
import pandas as pd

# ── Load config ───────────────────────────────────────────────────
def load_config(path: str = "ta-config.yaml") -> dict:
    """Load settings from ta-config.yaml."""
    with open(path) as f:
        return yaml.safe_load(f)

cfg = load_config()
BT = cfg["backtest"]
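# ── Download data ─────────────────────────────────────────────────
def get_data(ticker: str) -> pd.DataFrame:
    """Download and prepare daily OHLCV data."""
    df = yf.download(ticker, period=BT["period"], interval="1d",
                     progress=False)
    # Some yfinance versions return (field, ticker) MultiIndex columns even
    # for a single ticker; keep only the field level in that case.
    if isinstance(df.columns, pd.MultiIndex):
        df.columns = df.columns.get_level_values(0)
    df.columns = [str(c).lower() for c in df.columns]
    df = df[["open", "high", "low", "close", "volume"]].dropna()
    df["rsi"] = ta.rsi(df["close"], length=14)
    return df.dropna()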
# ── Run the backtest ──────────────────────────────────────────────
def run_backtest(ticker: str) -> list:
    """
    Simulates the RSI mean reversion strategy on historical data.
    Returns a list of completed trades.

    IMPORTANT: We use next-day's open price for all entries/exits.
    This ensures we're not using data that wasn't available yet.
    """
    df = get_data(ticker)
    trades = []
    in_trade = False
    entry_price = None
    entry_date = None
    entry_idx = None

    for i in range(1, len(df) - 1):
        rsi_yesterday = df.iloc[i - 1]["rsi"]

        if not in_trade:
            # Entry: RSI was oversold yesterday, buy at today's open
            if rsi_yesterday < BT["rsi_buy"]:
                entry_price = df.iloc[i]["open"]
                entry_date = df.index[i]
                entry_idx = i
                in_trade = True
        else:
            days_held = i - entry_idx
            # Exit: RSI recovered OR we've held too long
            if (rsi_yesterday > BT["rsi_sell"] or
                    days_held >= BT["max_hold"]):
                exit_price = df.iloc[i]["open"]
                exit_date = df.index[i]
                return_pct = (exit_price - entry_price) / entry_price * 100
                trades.append({
                    "entry_date": entry_date.date(),
                    "exit_date": exit_date.date(),
                    "entry_price": round(entry_price, 2),
                    "exit_price": round(exit_price, 2),
                    "days_held": days_held,
                    "return_pct": round(return_pct, 2),
                    "win": return_pct > 0,
                    "exit_reason": ("RSI_SELL" if rsi_yesterday > BT["rsi_sell"]
                                    else "MAX_HOLD"),
                })
                in_trade = False
    return trades
# ── Analyse and print results ─────────────────────────────────────
def print_results(ticker: str, trades: list) -> None:
    """Print backtest results in a human-readable format."""
    if not trades:
        print(f"\n{ticker}: No trades. Try widening RSI thresholds.")
        return

    returns = [t["return_pct"] for t in trades]
    wins = [r for r in returns if r > 0]
    losses = [r for r in returns if r <= 0]
    win_rate = len(wins) / len(returns) * 100
    avg_return = sum(returns) / len(returns)
    avg_days = sum(t["days_held"] for t in trades) / len(trades)

    # Max drawdown: worst peak-to-trough drop in the running SUM of
    # per-trade returns (percentage points; a simple, non-compounded estimate)
    cumulative, peak, max_dd = 0, 0, 0
    for r in returns:
        cumulative += r
        peak = max(peak, cumulative)
        max_dd = max(max_dd, peak - cumulative)

    print(f"\n{'🦞 OpenClaw Backtest Results':=^55}")
    print(f" Ticker: {ticker}")
    print(f" Strategy: RSI Mean Reversion")
    print(f" Parameters: Buy < {BT['rsi_buy']} | Sell > {BT['rsi_sell']} | "
          f"Max hold {BT['max_hold']}d")
    print(f" Period: {BT['period']}")
    print(f"{'─'*55}")
    print(f" Total trades: {len(trades)}")
    print(f" Win rate: {win_rate:.1f}%")
    print(f" Avg return: {avg_return:+.2f}% per trade")
    print(f" Best trade: {max(returns):+.2f}%")
    print(f" Worst trade: {min(returns):+.2f}%")
    print(f" Max drawdown: -{max_dd:.2f}%")
    print(f" Avg hold time: {avg_days:.1f} days")
    print(f"{'─'*55}")
    print(f"\n Last 5 trades:")
    print(f" {'Entry':<12} {'Exit':<12} {'Return':>8} {'Days':>5} "
          f"{'Exit Reason':<12}")
    for t in trades[-5:]:
        icon = "✓" if t["win"] else "✗"
        print(f" {str(t['entry_date']):<12} {str(t['exit_date']):<12} "
              f"{t['return_pct']:>+7.2f}% {t['days_held']:>5} "
              f"{icon} {t['exit_reason']}")
# ── Run for all tickers in config ────────────────────────────────
for ticker in cfg["watchlist"]["tickers"]:
    trades = run_backtest(ticker)
    print_results(ticker, trades)
How This Code Works (Plain English)
- Load config: Read your ta-config.yaml file to get strategy parameters and tickers.
- Download data: For each ticker, fetch daily OHLCV (open, high, low, close, volume) data from yfinance.
- Calculate RSI: Use pandas_ta to compute the 14-period RSI for each day.
- Run backtest: Loop through each day. If yesterday's RSI was below your buy threshold, enter at today's open. If RSI recovers above your sell threshold, or if you've held for max_hold days, exit at today's open.
- Record trades: For each completed trade, store entry price, exit price, days held, and whether it was a win or loss.
- Calculate metrics: Compute win rate, average return, maximum drawdown, and other statistics.
- Print results: Display a summary table for each ticker.
How to Read the Results
After running the backtest, you'll see a summary block for each ticker. The explanations below walk through one set of example figures.
What Each Metric Means
- Total trades: How many buy-sell cycles were completed. More trades = more statistical confidence. Fewer than ~20 trades, and you might just be seeing luck.
- Win rate 61.1%: 61% of your trades were profitable. Above 50% is generally good, but win rate alone doesn't tell the full story. A 55% win rate with huge average winners beats a 75% win rate with tiny average winners.
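- Avg return +1.8%: On average each trade made 1.8%. To turn that into a rough yearly figure, multiply by the number of trades per year, not by how many 12-day holds would fit into a year, because the strategy spends much of its time out of the market waiting for a signal. At about 8 trades a year, for example, that works out to roughly 14% a year before compounding and costs. Whether that's good depends on your goals and risk tolerance.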
- Best trade +12.4% / Worst trade -8.2%: Your range. Good news: no single loss wiped you out. Bad news: the biggest win is only 50% bigger than the biggest loss, so drawdowns could hurt.
- Max drawdown -14.6%: At the worst point in the backtest period, your cumulative losses from the previous peak hit 14.6%. Ask yourself: "Could I stomach that losing streak without panic-selling?" If the answer is no, this strategy might be too risky for you psychologically.
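- Avg hold time 12.3 days: You were in each trade for about 12 trading days on average, or roughly two and a half calendar weeks. Short holds tie up your capital for less time, but they also leave less time for the trade to work, and a strategy that trades more often pays commissions and spreads more often.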
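Three of the metrics above are easier to trust once you see the arithmetic behind them. Here is a minimal sketch with made-up numbers. The helper names and the figure of 8 trades per year are assumptions for illustration, not part of ta_backtest.py. Note that max_drawdown_compounded compounds the returns, while the script adds them up, so the two can differ a little.

```python
def expectancy_pct(returns):
    """Average return per trade, built from win rate and average win/loss."""
    wins = [r for r in returns if r > 0]
    losses = [r for r in returns if r <= 0]
    win_rate = len(wins) / len(returns)
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return win_rate * avg_win + (1 - win_rate) * avg_loss


def simple_annual_pct(avg_return_pct, trades_per_year):
    """Rough yearly figure: no compounding, no costs."""
    return avg_return_pct * trades_per_year


def max_drawdown_compounded(returns):
    """Worst peak-to-trough drop (percent) on a compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r / 100
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak * 100)
    return worst


# Invented per-trade returns, in percent
high_win_rate = [1.0, 1.0, 1.0, -4.0]   # 75% winners, but small wins
low_win_rate = [6.0, -2.0, 6.0, -2.0]   # 50% winners, but large wins

print(expectancy_pct(high_win_rate))    # -0.25
print(expectancy_pct(low_win_rate))     # 2.0
print(simple_annual_pct(1.8, 8))        # assumes 8 trades per year
print(round(max_drawdown_compounded([10, -10, -10]), 2))
```

The first list wins three times out of four and still loses money on average, which is why win rate should never be read on its own.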
Common Backtest Traps to Avoid
Backtesting is powerful, but it's easy to fool yourself. Here are the most common pitfalls:
| Trap | What Happens | How to Avoid It |
|---|---|---|
| Picking parameters after seeing results | You tweak rsi_buy to 28, then 26, until the results look perfect. At that point you've memorized history, not found an edge. | Decide your parameters before you look at results. Run once. Don't fiddle. |
| Too few trades | 5 trades has huge variance — could be luck. One good trade can skew the whole result. | Aim for at least 20–30 trades before drawing conclusions. Longer test periods help. |
| Testing only one stock | AAPL might work; SPY might not. You're overfitting to a single ticker's behavior. | Test across several tickers (at least 3–5). If it only works on one, it's probably luck. |
| Ignoring max drawdown | A 60% win rate means nothing if one loss wipes you out. Psychological collapse follows. | Always check the worst-case drawdown. Ask: "Can I afford this without panic-selling?" |
| Not testing in different market conditions | 2019–2021 bull market results don't apply to bear markets. Strategy breaks when regime changes. | Test across periods that include bull and bear markets. Test recent data separately. |
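One simple way to act on the last row is to split your trade list at a cutoff date and compare the two halves. The sketch below assumes trade dictionaries shaped like the ones run_backtest returns (an entry_date and a return_pct). The helper names, the cutoff date, and the trades themselves are made up for illustration.

```python
from datetime import date


def summarize(trades):
    """Trade count, win rate (%), and average return (%) for a list of trades."""
    if not trades:
        return {"trades": 0, "win_rate": None, "avg_return": None}
    returns = [t["return_pct"] for t in trades]
    wins = sum(1 for r in returns if r > 0)
    return {
        "trades": len(returns),
        "win_rate": round(wins / len(returns) * 100, 1),
        "avg_return": round(sum(returns) / len(returns), 2),
    }


def split_by_entry_date(trades, cutoff):
    """Separate trades entered before the cutoff from those on or after it."""
    earlier = [t for t in trades if t["entry_date"] < cutoff]
    recent = [t for t in trades if t["entry_date"] >= cutoff]
    return earlier, recent


# Invented trades, for illustration only
trades = [
    {"entry_date": date(2021, 3, 1), "return_pct": 2.0},
    {"entry_date": date(2021, 9, 15), "return_pct": -1.0},
    {"entry_date": date(2022, 6, 10), "return_pct": 3.0},
    {"entry_date": date(2024, 2, 5), "return_pct": -2.0},
    {"entry_date": date(2024, 8, 20), "return_pct": 1.0},
]

earlier, recent = split_by_entry_date(trades, date(2024, 1, 1))
print("earlier:", summarize(earlier))
print("recent: ", summarize(recent))
```

If the recent half looks much worse than the earlier half, treat that as a warning that the strategy may depend on market conditions that no longer hold. With this few trades per half, though, remember the "Too few trades" row before reading much into either number.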