Reading Backtest Results — Interpretation Guide

Backtest output is overwhelming if you don’t know what to look for. Total return, Sharpe, max drawdown, win rate, profit factor — each has a meaning, each has caveats, each can mislead in isolation. This guide walks through what matters and how to interpret it.

The hierarchy of metrics

Not all metrics are equally important. Operator priority:

Max drawdown — the single most important metric

Caps your worst-case felt experience. Determines whether you can hold through bad periods.

Number of trades — your sample size

Without enough trades, no other metric is statistically meaningful.

Total return — the headline

What the strategy made. Important, but never sufficient alone.

Win rate × win/loss ratio — the trade distribution

Tells you the shape of how you make money.

Sharpe ratio — risk-adjusted return

Useful as a sanity check. Don’t optimize for it directly.

Profit factor — total wins / total losses

Complement to win rate. Magnitude story.

Average trade duration — rhythm check

Should match the mode’s intended timescale.

Max drawdown — the most important metric

What it means

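The largest peak-to-trough decline in equity during the backtest window. If equity rose to +30%, fell back to -5% relative to the starting balance, and then recovered to +25%, the peak-to-trough drop is 35 percentage points of starting capital (30 - (-5) = 35). Measured against the peak itself, which is the more common convention, that is 35 / 130 ≈ -26.9%. Check which convention your backtest report uses.

Critical: max drawdown is what you’d have felt if you’d held through. It’s the worst point of the journey, not the endpoint.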
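As a rough illustration of the definition above, the sketch below computes max drawdown from a list of equity values, assuming the drawdown is expressed as a fraction of the running peak. The function name and the sample curve are hypothetical, not the backtester's own output format. In the sample curve, equity rises from 100 to 130, falls to 95, and recovers to 125: a drop of 35 points of starting capital, or about 26.9% of the peak.

```python
def max_drawdown(equity):
    """Return the largest peak-to-trough decline as a negative fraction of the peak."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest equity seen so far
        worst = min(worst, (value - peak) / peak)  # decline relative to that peak
    return worst


# Hypothetical equity curve: start 100, peak 130, trough 95, end 125.
curve = [100.0, 130.0, 95.0, 125.0]
print(f"max drawdown: {max_drawdown(curve):.1%}")  # max drawdown: -26.9%
```

Note that the result depends only on the worst peak-to-trough stretch, not on where the curve ends.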

What's acceptable

Depends entirely on your stomach. For most operators:

Shallower than -15%: comfortable. Most operators hold through this without action.
-15% to -25%: normal. Expected range for most modes during regime mismatches. Hold if you’ve validated the regime fit; consider kill switch otherwise.
-25% to -40%: stress. Many operators panic-close at these levels, locking in losses just before recovery.
Deeper than -40%: critical. Strategies that produce these on a regular basis are not appropriate for most operators.

Important rule of thumb: live drawdowns are typically 1.5x to 2x the backtest max drawdown. If backtest shows -20%, plan for live to potentially see -30% to -40%.
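The rule of thumb above is simple arithmetic, sketched here under the assumption that drawdowns are stored as negative fractions. The function names and the tolerance value are hypothetical and only serve the illustration.

```python
def live_drawdown_plan(backtest_dd, low_mult=1.5, high_mult=2.0):
    """Scale a backtest max drawdown (negative fraction) into a live planning range."""
    return backtest_dd * low_mult, backtest_dd * high_mult


def within_tolerance(backtest_dd, tolerance):
    """True if even the pessimistic live estimate stays within the operator's limit."""
    _, pessimistic = live_drawdown_plan(backtest_dd)
    return pessimistic >= tolerance


mild, severe = live_drawdown_plan(-0.20)
print(f"plan for {mild:.0%} to {severe:.0%}")  # plan for -30% to -40%

# Hypothetical operator who cannot stomach more than -35%.
print(within_tolerance(-0.20, tolerance=-0.35))  # False
```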

What backtest drawdowns DON'T capture

Operator panic-closes that lock in pre-recovery losses.
Slippage during fast moves.
Latency-induced misses on trailing-stop-driven exits.
Real-world black swans not in the historical sample.

All these tend to widen live drawdowns vs backtest drawdowns. Plan accordingly.

Number of trades — your sample size

What's enough

Statistical meaningfulness needs sample size. Operator rule of thumb:

< 20 trades: not enough. The result could be coincidence. Lengthen the window or pick a higher-frequency mode.
20–50 trades: weak evidence. Useful direction-of-travel signal but don’t bet substantial capital on this alone.
50–200 trades: moderate evidence. Most operator decisions can be made on this sample size.
200+ trades: strong evidence.
1000+ trades: be suspicious. The strategy may be overtrading; check trade frequency and per-trade P&L.
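The bands above can be turned into a small helper. This is only a sketch: the source ranges share their endpoints (20, 50, 200), so the code assumes each upper bound is exclusive, and the function name and labels are hypothetical.

```python
def sample_strength(n_trades):
    """Map a trade count to the evidence band from the rule of thumb."""
    if n_trades < 0:
        raise ValueError("trade count cannot be negative")
    if n_trades < 20:
        return "not enough"
    if n_trades < 50:      # assumption: 50 itself counts as moderate
        return "weak"
    if n_trades < 200:     # assumption: 200 itself counts as strong
        return "moderate"
    if n_trades < 1000:
        return "strong"
    return "strong, but check for overtrading"


for n in (12, 35, 120, 450, 2500):
    print(n, "->", sample_strength(n))
```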

Trade frequency — divide by time

100 trades / 12 months ≈ 8 trades/month. Does that match expectation?

BasicMode operators on BTCUSDT typically expect 15–40 trades/month per pair. If backtest shows 8, the mode is undertrading the symbol. If it shows 200, the mode is overtrading.

Compare against the mode’s typical behavior. Anomalies are flags.
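A minimal sketch of the frequency check, using the 15–40 trades/month range quoted above for BasicMode on BTCUSDT as the expected band. The function names are hypothetical, and the expected band for other modes would need to come from your own observations.

```python
def trades_per_month(n_trades, months):
    """Average number of trades per month over the backtest window."""
    if months <= 0:
        raise ValueError("window length must be positive")
    return n_trades / months


def frequency_flag(rate, expected_low, expected_high):
    """Compare an observed monthly trade rate against the mode's expected band."""
    if rate < expected_low:
        return "undertrading"
    if rate > expected_high:
        return "overtrading"
    return "in range"


rate = trades_per_month(100, 12)
print(f"{rate:.1f} trades/month: {frequency_flag(rate, 15, 40)}")  # 8.3 trades/month: undertrading
```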

Total return — the headline

What's good

Highly regime-dependent. For 12-month backtests on majors with BasicMode-style modes:

Bull regime: +50% to +150% is achievable. Strategies that capture uptrends shine.
Sideways/chop: +15% to +40% is decent. The “boring” regime that most modes are designed for.
Bear regime: breakeven or modestly negative. Even good strategies struggle in bears.

A strategy that produces +200% in a bull regime and -50% in a bear regime has high regime sensitivity. A strategy that produces +50% in bull and -5% in bear is more robust.

Total return is misleading without max drawdown

+30% annual return sounds great. With -40% max drawdown, it’s miserable: most operators capitulate during the drawdown and crystallize the loss.

Always read total return alongside max drawdown. The ratio matters more than either number alone.
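The ratio mentioned above is total return divided by the absolute max drawdown. The sketch below assumes both are given as fractions for the same window; the function name is hypothetical.

```python
def return_to_drawdown(total_return, max_dd):
    """Total return per unit of max drawdown (both as fractions, drawdown negative)."""
    if max_dd == 0:
        raise ValueError("max drawdown is zero; ratio is undefined")
    return total_return / abs(max_dd)


for ret, dd in [(0.30, -0.15), (0.30, -0.40), (0.30, -0.50)]:
    print(f"return {ret:+.0%}, drawdown {dd:.0%}: ratio {return_to_drawdown(ret, dd):.2f}")
```

The same +30% return yields a ratio of 2.00 against a -15% drawdown, 0.75 against -40%, and 0.60 against -50%.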

Compounding vs simple

+10% per month compounded for 12 months = +213%. Same +10% simple over 12 months = +120%.

Backtests typically show compounded returns. Live operation also compounds when you reinvest gains. Just be aware of which number you’re reading.
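The two figures above follow from the standard formulas for compounded and simple growth, shown here as a short sketch with hypothetical function names.

```python
def compounded_return(rate_per_period, periods):
    """Total return when each period's gain is reinvested."""
    return (1.0 + rate_per_period) ** periods - 1.0


def simple_return(rate_per_period, periods):
    """Total return when gains are not reinvested."""
    return rate_per_period * periods


print(f"compounded: {compounded_return(0.10, 12):+.1%}")  # compounded: +213.8%
print(f"simple:     {simple_return(0.10, 12):+.1%}")      # simple:     +120.0%
```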

Win rate and win/loss ratio

Win rate alone is meaningless

A 90% win rate with 0.1x win/loss ratio loses money: 9 small wins offset by 1 large loss leaves you down. A 30% win rate with 5x win/loss ratio makes money: 3 wins of size 5 = 15, 7 losses of size 1 = 7, net +8.

Always look at win rate × win/loss ratio together.
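The two examples above are instances of per-trade expectancy. The sketch below expresses it in units of the average loss, so a result of +0.8 means +8 loss-units per ten trades. The function names are hypothetical, and fees are ignored.

```python
def expectancy(win_rate, win_loss_ratio):
    """Expected P&L per trade, in units of the average loss (fees ignored)."""
    return win_rate * win_loss_ratio - (1.0 - win_rate)


def breakeven_win_rate(win_loss_ratio):
    """Win rate at which expectancy is exactly zero for a given win/loss ratio."""
    return 1.0 / (1.0 + win_loss_ratio)


for wr, ratio in [(0.90, 0.1), (0.30, 5.0)]:
    print(
        f"win rate {wr:.0%}, ratio {ratio}x: "
        f"expectancy {expectancy(wr, ratio):+.2f}, "
        f"breakeven at {breakeven_win_rate(ratio):.1%}"
    )
```

With a 0.1x ratio the breakeven win rate is about 90.9%, which is why a 90% win rate still loses slightly. With a 5x ratio, anything above roughly 16.7% is profitable.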

Pre-built modes have asymmetric distributions

BasicMode’s design produces high win rates (70-90%) with smaller per-win sizes and occasionally larger per-loss sizes. The 7-rung sell ladder closes most positions profitably (small wins); the rare drawdown-and-stop-loss produces a larger loss.

This is by design. The asymmetric distribution is the trade-off for the high win rate.

Trend-following inverts the distribution

EMA-cross trend-followers typically have lower win rates (40–55%) with larger per-win sizes (when trends ride) and smaller per-loss sizes (whipsaws cut quickly).

Different shape, different psychological feel. Both can be net-positive expectancy strategies.

Sharpe ratio — risk-adjusted return

What it measures

(Average return - risk-free rate) / standard deviation of returns, with all three measured over the same period and the result usually annualized. Roughly: how much return per unit of return-volatility.

Sharpe doesn’t directly measure drawdown. It measures return volatility, which correlates only loosely with drawdown.
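For reference, here is a minimal sketch of one common way to compute an annualized Sharpe ratio from per-period returns. It assumes daily returns, 365 periods per year (crypto trades every day), and the sample standard deviation; the backtester's exact formula may differ, and the function name and sample data are hypothetical.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from per-period returns (needs at least two values)."""
    excess = [r - risk_free_per_period for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation
    if spread == 0:
        raise ValueError("returns have zero volatility; Sharpe is undefined")
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only.
daily = [0.01, -0.01, 0.02, 0.0]
print(f"per-period Sharpe: {sharpe_ratio(daily, periods_per_year=1):.3f}")
print(f"annualized Sharpe: {sharpe_ratio(daily):.2f}")
```

A handful of data points like this says nothing reliable; the example only shows the mechanics.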

What's good

For crypto:

Sharpe < 0: losing money on a risk-adjusted basis.
Sharpe 0–1: marginal. The volatility eats most of the return.
Sharpe 1–2: decent. Most professional trading systems target this range.
Sharpe > 2: excellent. But often suspicious — could indicate overfitting.

Don’t optimize for Sharpe directly. It’s a sanity check, not a target.

Sharpe is window-sensitive

Sharpe is calculated over the backtest window. A strategy with Sharpe 3 over 6 months may have Sharpe 1 over 24 months. The shorter window can have a misleadingly high ratio.

For meaningful Sharpe interpretation, use windows of ≥ 12 months.

Profit factor — magnitude check

What it means

Total winning P&L / Total losing P&L (in absolute terms).

Below 1.0: losing money.
1.0: breakeven.
1.5: decent.
2.0: strong.
Above 3.0: excellent (and possibly overfit; check carefully).

Useful complement to win rate

Win rate is count-based. Profit factor is magnitude-based. They tell you different things.

BasicMode might have win rate 80% and profit factor 1.6: many small wins, occasional larger losses. Trend-follower might have win rate 45% and profit factor 1.8: fewer wins but larger.

Both can be acceptable strategies. Match to your psychology.
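To make the count-versus-magnitude distinction concrete, the sketch below computes both metrics from a list of per-trade P&L values. The trade list is invented so that it reproduces the 80% win rate and 1.6 profit factor mentioned above; the function names are hypothetical.

```python
import math


def win_rate(trade_pnls):
    """Fraction of trades that closed with a profit."""
    if not trade_pnls:
        raise ValueError("no trades")
    return sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)


def profit_factor(trade_pnls):
    """Gross winning P&L divided by gross losing P&L (absolute value)."""
    gross_win = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        # No losing trades: treat as infinite if anything was won, else zero.
        return math.inf if gross_win > 0 else 0.0
    return gross_win / gross_loss


# Hypothetical trades: four small wins and one larger loss.
pnls = [10.0, 10.0, 10.0, 10.0, -25.0]
print(f"win rate {win_rate(pnls):.0%}, profit factor {profit_factor(pnls):.1f}")
# win rate 80%, profit factor 1.6
```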

Average trade duration — rhythm check

What it means

Mean time from entry to exit across all trades. Should match the mode’s intended timescale:

BasicMode: hours to days (typical few hours to 2 days).
LongTimeLong: days to weeks.
Tsl2Sell: variable, depends on trends.

If actual differs substantially from expected, the mode-symbol pairing is mismatched.
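A minimal sketch of the duration calculation, assuming each trade is available as an (entry time, exit time) pair. That record layout and the sample timestamps are hypothetical, not the backtester's export format.

```python
from datetime import datetime, timedelta


def average_duration(trades):
    """Mean holding time across trades given as (entry, exit) datetime pairs."""
    if not trades:
        raise ValueError("no trades")
    total = sum((exit_time - entry_time for entry_time, exit_time in trades), timedelta())
    return total / len(trades)


# Hypothetical trades: one held for 2 hours, one for 4 hours.
trades = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 13, 0)),
]
print(average_duration(trades))  # 3:00:00
```

Because the mean is sensitive to a few stuck positions, it can help to look at the longest individual durations as well.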

Anomaly: very long average duration

Suggests positions are stuck: the sell ladder doesn’t get hit because the symbol moved too far against entry.

The bot keeps holding, waiting for recovery. Some positions recover; some go to stop-loss.

Flag for review: if the strategy regularly produces stuck positions, it’s mismatched to the symbol’s behavior.

Anomaly: very short average duration

Suggests trades are closing on the first sell rung consistently. May indicate:

The mode is too tight for the symbol’s volatility.
The strategy is overtrading.

Per-trade P&L will be small, fees will dominate. Consider widening the sell ladder or switching modes.

Equity curve shape

The equity curve (P&L over time) tells a story words don’t. Look for:

✅ Smooth upward trend

Steady growth with manageable drawdowns. The healthiest shape. Indicates regime-robust strategy.

⚠️ Big-then-flat

Most of the gain came in one specific period; rest of the window was flat or losing. Indicates regime-dependence.

May be acceptable if you understand and can identify the regime that produced the gain. Risky if you can’t.

❌ Saw-toothed (gain-then-pullback-then-gain)

Equity peaks and troughs of substantial size. Indicates strategy struggles in some regimes.

The realized end-of-window P&L masks the journey. Operator psychology has to survive the troughs.

❌ Step-function (one big jump, then nothing)

Single anomalous trade or period dominates the whole result. Reduce conviction proportionally.

Backtest may have caught a one-time event that’s unlikely to recur.
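One rough way to spot a step-function result numerically is to check how much of the net P&L comes from the single best trade. This is an illustrative heuristic rather than a metric the backtester is known to report; the function name and sample trades are hypothetical.

```python
def top_trade_share(trade_pnls):
    """Share of net P&L contributed by the single best trade (net must be positive)."""
    net = sum(trade_pnls)
    if net <= 0:
        raise ValueError("net P&L is not positive; share is not meaningful")
    return max(trade_pnls) / net


# Hypothetical result: three small wins and one outsized win.
pnls = [1.0, 1.0, 1.0, 17.0]
print(f"best trade share of net P&L: {top_trade_share(pnls):.0%}")  # 85%
```

A share this high suggests the headline return rests mostly on one event.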

Putting it together — the operator decision

For each backtest, ask:

Is the sample size sufficient?

< 50 trades? Lengthen the window or pick a higher-frequency mode/symbol pairing.

Is max drawdown within my stomach?

Multiply by 1.5–2x for live planning. If that exceeds your tolerance, the strategy is too aggressive.

Does total return justify the drawdown?

+30% for -15% drawdown is a solid 2:1 ratio. +30% for -50% drawdown is a poor 0.6:1 ratio.

Does the equity curve shape make sense?

Smooth upward = good. Step-function or saw-toothed = caution.

Does it survive multiple windows?

Test it on bear, chop, bull, and recent windows. If it holds up across all of them, it is robust. If it works in only one, it is regime-dependent.

Is the trade frequency normal for the mode?

Compare to mode’s expected behavior. Anomalies are flags.

Decision: scale up, forward-test, or reject

All checks pass → forward-test live on small capital.
Mostly pass → forward-test with caution and monitoring.
Several fail → reject; iterate the strategy or pick a different mode.
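The checklist above can be written down as a small decision helper. The guide does not define "mostly" or "several", so this sketch assumes that exactly one failed check counts as "mostly pass" and two or more count as "several fail". The function name and check labels are hypothetical.

```python
def decide(checks):
    """Turn a dict of check name -> passed (bool) into a decision and the failed checks."""
    failed = [name for name, passed in checks.items() if not passed]
    if not failed:
        return "forward-test on small capital", failed
    if len(failed) == 1:  # assumption: one failure still counts as "mostly pass"
        return "forward-test with caution and monitoring", failed
    return "reject", failed


# Hypothetical review of one backtest.
checks = {
    "sample size sufficient": True,
    "drawdown within tolerance": True,
    "return justifies drawdown": True,
    "equity curve shape sensible": False,
    "survives multiple windows": True,
    "trade frequency normal": True,
}
decision, failed = decide(checks)
print(decision, failed)
```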

What’s next

Why backtest

The fundamentals of backtesting motivation.

Walk-forward

The technique that catches curve-fitting.

Shadow mode

Forward-testing methodology.

Common mistakes

Backtest pitfalls to avoid.

Backtester module

The module that produces these results.
