Backtest output is overwhelming if you don’t know what to look for. Total return, Sharpe, max drawdown, win rate, profit factor — each has a meaning, each has caveats, each can mislead in isolation. This guide walks through what matters and how to interpret it.

The hierarchy of metrics

Not all metrics are equally important. Operator priority:
1. Max drawdown: the single most important metric. Caps your worst-case felt experience and determines whether you can hold through bad periods.
2. Number of trades: your sample size. Without enough trades, no other metric is statistically meaningful.
3. Total return: the headline. What did the strategy make? Important but never sufficient alone.
4. Win rate × win/loss ratio: the trade distribution. Tells you the shape of how you make money.
5. Sharpe ratio: risk-adjusted return. Useful as a sanity check. Don’t optimize for it directly.
6. Profit factor: total wins / total losses. Complements win rate by telling the magnitude story.
7. Average trade duration: rhythm check. Should match the mode’s intended timescale.

Max drawdown — the most important metric

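The largest peak-to-trough decline in equity during the backtest window. Example: equity rises to +30%, falls to -5%, then recovers to +25%, all relative to starting capital. The peak-to-trough drop is 35 percentage points of starting capital (30 - (-5) = 35), or about -27% when measured against the peak (35 / 130), which is the more common convention. Check which convention your backtest report uses.
Critical: max drawdown is what you would have felt if you had held through. It’s the worst point of the journey, not the endpoint.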
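A minimal sketch of the peak-to-trough calculation, assuming the equity curve is available as a plain list of positive account values. The function name and the `relative_to_start` flag are illustrative and not part of any Backtester API.

```python
def max_drawdown(equity, relative_to_start=False):
    """Largest peak-to-trough decline, returned as a negative fraction.

    equity: positive account values in time order (assumed input shape).
    relative_to_start: if True, express the drop as a fraction of starting
    capital instead of the running peak.
    """
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # highest equity seen so far
        base = equity[0] if relative_to_start else peak
        worst = min(worst, (value - peak) / base)
    return worst


# Hypothetical curve: start 100, peak 130 (+30%), trough 95 (-5%), end 125 (+25%).
curve = [100.0, 130.0, 95.0, 125.0]
print(f"vs peak:  {max_drawdown(curve):.1%}")
print(f"vs start: {max_drawdown(curve, relative_to_start=True):.1%}")
```

For this curve the drop is about -26.9% measured against the peak and -35% measured in points of starting capital, so it matters which convention a report uses.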
How much drawdown is acceptable depends entirely on your stomach. For most operators:
  • Shallower than -15%: comfortable. Most operators hold through this without action.
  • -15% to -25%: normal. Expected range for most modes during regime mismatches. Hold if you’ve validated the regime fit; consider the kill switch otherwise.
  • -25% to -40%: stress. Many operators panic-close at these levels, locking in losses just before recovery.
  • Deeper than -40%: critical. Strategies that produce these on a regular basis are not appropriate for most operators.
Important rule of thumb: live drawdowns are typically 1.5x to 2x the backtest max drawdown. If the backtest shows -20%, plan for live to potentially see -30% to -40%. Contributing factors:
  • Operator panic-closes that lock in pre-recovery losses.
  • Slippage during fast moves.
  • Latency-induced misses on trailing-stop-driven exits.
  • Real-world black swans not in the historical sample.
All these tend to widen live drawdowns vs backtest drawdowns. Plan accordingly.
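The tolerance bands and the 1.5x to 2x planning rule can be sketched as follows. This is an illustration only: the function names are hypothetical, and assigning the boundary values (exactly -25%, exactly -40%) to the milder band is an assumption, since the text does not say which side they fall on.

```python
def drawdown_band(mdd):
    """Classify a max drawdown (negative fraction, e.g. -0.20) into the bands above."""
    depth = abs(mdd)
    if depth < 0.15:
        return "comfortable"
    if depth <= 0.25:
        return "normal"
    if depth <= 0.40:
        return "stress"
    return "critical"


def live_drawdown_range(backtest_mdd, low=1.5, high=2.0):
    """Scale a backtest drawdown by the 1.5x-2x rule of thumb for live planning."""
    return backtest_mdd * low, backtest_mdd * high


backtest_mdd = -0.20
planned_low, planned_high = live_drawdown_range(backtest_mdd)
print(f"backtest {backtest_mdd:.0%} is '{drawdown_band(backtest_mdd)}'")
print(f"plan for {planned_low:.0%} to {planned_high:.0%} live, "
      f"which is '{drawdown_band(planned_high)}'")
```

A backtest that looks normal on paper can land in the stress band once the live multiplier is applied, which is why the planning figure, not the backtest figure, should be compared against your tolerance.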

Number of trades — your sample size

Statistical meaningfulness needs sample size. Operator rule of thumb:
  • < 20 trades: not enough. The result could be coincidence. Lengthen the window or pick a higher-frequency mode.
  • 20–50 trades: weak evidence. Useful direction-of-travel signal but don’t bet substantial capital on this alone.
  • 50–200 trades: moderate evidence. Most operator decisions can be made on this sample size.
  • 200+ trades: strong evidence.
  • 1000+ trades: be suspicious. The strategy may be overtrading; check trade frequency and per-trade P&L.
Also check the trade frequency. For example, 100 trades / 12 months ≈ 8 trades/month. Does that match expectation?
BasicMode operators on BTCUSDT typically expect 15–40 trades/month per pair. If the backtest shows 8, the mode is undertrading the symbol. If it shows 200, the mode is overtrading.
Compare against the mode’s typical behavior. Anomalies are flags.
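A small sketch of the sample-size thresholds and the frequency check, under stated assumptions: the function names are hypothetical, boundary counts (exactly 50, exactly 200) are assigned to the stronger band, and the 15–40 trades/month range is the BasicMode-on-BTCUSDT expectation quoted in the text.

```python
def sample_size_evidence(n_trades):
    """Map a trade count to the evidence levels listed above."""
    if n_trades < 20:
        return "not enough"
    if n_trades < 50:
        return "weak"
    if n_trades < 200:
        return "moderate"
    if n_trades < 1000:
        return "strong"
    return "strong, but check for overtrading"


def trades_per_month(n_trades, months):
    """Average number of trades per month over the backtest window."""
    return n_trades / months


def frequency_flag(rate, expected_low, expected_high):
    """Compare an observed trades/month rate with the mode's expected range."""
    if rate < expected_low:
        return "undertrading"
    if rate > expected_high:
        return "overtrading"
    return "in range"


rate = trades_per_month(100, 12)
print(sample_size_evidence(100), round(rate, 1), frequency_flag(rate, 15, 40))
```

Note that the two checks are independent: 100 trades is a moderate sample, yet the same backtest is flagged as undertrading relative to the expected range.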

Total return — the headline

Total return is highly regime-dependent. For 12-month backtests on majors with BasicMode-style modes:
  • Bull regime: +50% to +150% is achievable. Strategies that capture uptrends shine.
  • Sideways/chop: +15% to +40% is decent. The “boring” regime that most modes are designed for.
  • Bear regime: breakeven or modestly negative. Even good strategies struggle in bears.
A strategy that produces +200% in a bull regime and -50% in a bear regime has high regime sensitivity. A strategy that produces +50% in bull and -5% in bear is more robust.
+30% annual return sounds great. With -40% max drawdown, it’s miserable: most operators capitulate during the drawdown and crystallize the loss.
Always read total return alongside max drawdown. The ratio matters more than either number alone.
+10% per month compounded for 12 months ≈ +214%. The same +10% per month without compounding (simple) over 12 months = +120%.
Backtests typically show compounded returns. Live operation also compounds when you reinvest gains. Just be aware of which number you’re reading.
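The compounded-versus-simple arithmetic can be checked with a few lines. This is a sketch with illustrative function names; it assumes a constant monthly rate, which real backtests will not have.

```python
def compounded_return(monthly_rate, months):
    """Total return when each month's gain is reinvested."""
    return (1 + monthly_rate) ** months - 1


def simple_return(monthly_rate, months):
    """Total return when gains are not reinvested."""
    return monthly_rate * months


compounded = compounded_return(0.10, 12)
simple = simple_return(0.10, 12)
print(f"compounded: {compounded:+.1%}, simple: {simple:+.1%}")
```

The gap between the two figures grows with both the rate and the window length, so a long, strong backtest is where confusing them costs the most.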

Win rate and win/loss ratio

A 90% win rate with a 0.1x win/loss ratio loses money: 9 wins of size 0.1 = 0.9, 1 loss of size 1 = 1, net -0.1. A 30% win rate with a 5x win/loss ratio makes money: 3 wins of size 5 = 15, 7 losses of size 1 = 7, net +8.
Always look at win rate and win/loss ratio together.
BasicMode’s design produces high win rates (70–90%) with smaller per-win sizes and occasionally larger per-loss sizes. The 7-rung sell ladder closes most positions profitably (small wins); the rare drawdown that ends in a stop-loss produces a larger loss.
This is by design. The asymmetric distribution is the trade-off for the high win rate.
EMA-cross trend-followers typically have lower win rates (40–55%) with larger per-win sizes (when trends ride) and smaller per-loss sizes (whipsaws cut quickly).
Different shape, different psychological feel. Both can be net-positive expectancy strategies.
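The win-rate arithmetic in this section is an expectancy calculation. A minimal sketch, with hypothetical function names and the average loss normalized to 1 unit:

```python
def expectancy(win_rate, win_loss_ratio, loss_size=1.0):
    """Expected P&L per trade, in the same units as loss_size."""
    avg_win = win_loss_ratio * loss_size
    return win_rate * avg_win - (1 - win_rate) * loss_size


def breakeven_win_rate(win_loss_ratio):
    """Win rate at which expectancy is exactly zero."""
    return 1 / (1 + win_loss_ratio)


for rate, ratio in [(0.90, 0.1), (0.30, 5.0)]:
    per_trade = expectancy(rate, ratio)
    print(f"win rate {rate:.0%}, ratio {ratio}x: "
          f"{per_trade * 10:+.2f} per 10 trades, "
          f"breakeven at {breakeven_win_rate(ratio):.1%}")
```

The breakeven figure makes the trade-off explicit: a 0.1x ratio needs roughly a 91% win rate just to break even, while a 5x ratio needs only about 17%.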

Sharpe ratio — risk-adjusted return

(Return - risk-free rate) / standard deviation of returns, with all three measured over the same period and usually annualized. Roughly: how much return per unit of return volatility.
Sharpe doesn’t directly measure drawdown. It measures return volatility, which correlates only loosely with drawdown.
For crypto:
  • Sharpe < 0: losing money on a risk-adjusted basis.
  • Sharpe 0–1: marginal. The volatility eats most of the return.
  • Sharpe 1–2: decent. Most professional trading systems target this range.
  • Sharpe > 2: excellent. But often suspicious — could indicate overfitting.
Don’t optimize for Sharpe directly. It’s a sanity check, not a target.
Sharpe is calculated over the backtest window. A strategy with Sharpe 3 over 6 months may have Sharpe 1 over 24 months. The shorter window can have a misleadingly high ratio.
For meaningful Sharpe interpretation, use windows of ≥ 12 months.
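A hedged sketch of one common way to compute an annualized Sharpe ratio from per-period returns. The assumptions here are not taken from the Backtester module: returns are a list of per-period fractional returns, the sample standard deviation is used, the ratio is annualized with the square root of the number of periods per year, and 365 periods assumes daily data on a market that trades every day.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from per-period returns.

    risk_free_rate is annual and is spread evenly across periods.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    volatility = statistics.stdev(excess)
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only.
daily_returns = [0.01, -0.005, 0.02, 0.0, 0.015]
print(round(sharpe_ratio(daily_returns), 2))
```

With only five data points the result is far too high to be meaningful, which illustrates the short-window caveat: the estimate is only as good as the number of periods behind it.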

Profit factor — magnitude check

Total winning P&L / total losing P&L (in absolute terms).
  • < 1.0: losing money.
  • 1.0: breakeven.
  • 1.5: decent.
  • 2.0: strong.
  • > 3.0: excellent, and possibly overfit; check carefully.
Win rate is count-based. Profit factor is magnitude-based. They tell you different things.
BasicMode might have a win rate of 80% and a profit factor of 1.6: many small wins, occasional larger losses. A trend-follower might have a win rate of 45% and a profit factor of 1.8: fewer wins but larger.
Both can be acceptable strategies. Match to your psychology.
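To make the count-versus-magnitude distinction concrete, here is a minimal sketch that computes both numbers from a list of per-trade P&L values. The function names and the sample trades are hypothetical; the trades are chosen to reproduce the 80% win rate and 1.6 profit factor mentioned above.

```python
def win_rate(trade_pnls):
    """Fraction of trades that closed with a profit (count-based)."""
    return sum(1 for pnl in trade_pnls if pnl > 0) / len(trade_pnls)


def profit_factor(trade_pnls):
    """Gross profit divided by gross loss in absolute terms (magnitude-based)."""
    gross_profit = sum(pnl for pnl in trade_pnls if pnl > 0)
    gross_loss = -sum(pnl for pnl in trade_pnls if pnl < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


# Eight small wins and two larger losses.
pnls = [1.0] * 8 + [-2.5] * 2
print(f"win rate {win_rate(pnls):.0%}, profit factor {profit_factor(pnls):.2f}")
```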

Average trade duration — rhythm check

Mean time from entry to exit across all trades. It should match the mode’s intended timescale:
  • BasicMode: hours to days (typical few hours to 2 days).
  • LongTimeLong: days to weeks.
  • Tsl2Sell: variable, depends on trends.
If actual differs substantially from expected, the mode-symbol pairing is mismatched.
An average duration much longer than expected suggests positions are stuck: the sell ladder doesn’t get hit because the symbol moved too far against entry. The bot keeps holding, waiting for recovery. Some positions recover; some go to stop-loss. Flag for review: if the strategy regularly produces stuck positions, it’s mismatched to the symbol’s behavior.
An average duration much shorter than expected suggests trades are consistently closing on the first sell rung. This may indicate:
  • The mode is too tight for the symbol’s volatility.
  • The strategy is overtrading.
Per-trade P&L will be small and fees will dominate. Consider widening the sell ladder or switching modes.
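The rhythm check can be sketched as below, assuming trades are available as (entry, exit) datetime pairs. The function names are hypothetical, and the 2-hour to 2-day window is only an illustrative reading of “few hours to 2 days” for BasicMode.

```python
from datetime import datetime, timedelta


def average_duration(trades):
    """Mean holding time; trades is a list of (entry, exit) datetime pairs."""
    if not trades:
        raise ValueError("no trades")
    total = sum((exit_time - entry_time for entry_time, exit_time in trades),
                timedelta())
    return total / len(trades)


def duration_flag(avg, expected_min, expected_max):
    """Compare the average holding time with the mode's intended timescale."""
    if avg < expected_min:
        return "shorter than expected"
    if avg > expected_max:
        return "longer than expected"
    return "in range"


start = datetime(2025, 1, 1)
trades = [
    (start, start + timedelta(hours=6)),
    (start + timedelta(days=1), start + timedelta(days=1, hours=18)),
]
avg = average_duration(trades)
print(avg, duration_flag(avg, timedelta(hours=2), timedelta(days=2)))
```

A mean can hide a few very long holds among many short ones, so it is worth looking at the longest individual trades as well when the flag is borderline.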

Equity curve shape

The equity curve (P&L over time) tells a story words don’t. Look for:
  • Smooth upward: steady growth with manageable drawdowns. The healthiest shape. Indicates a regime-robust strategy.
  • Step-function: most of the gain came in one specific period; the rest of the window was flat or losing. Indicates regime dependence. May be acceptable if you understand and can identify the regime that produced the gain. Risky if you can’t.
  • Saw-toothed: equity peaks and troughs of substantial size. Indicates the strategy struggles in some regimes. The realized end-of-window P&L masks the journey, and operator psychology has to survive the troughs.
  • Outlier-driven: a single anomalous trade or period dominates the whole result. The backtest may have caught a one-time event that’s unlikely to recur, so reduce conviction proportionally.

Putting it together — the operator decision

For each backtest, ask:
1. Is the sample size sufficient? With fewer than 50 trades, lengthen the window or pick a higher-frequency mode/symbol pairing.
2. Is max drawdown within my stomach? Multiply by 1.5–2x for live planning. If that exceeds your tolerance, the strategy is too aggressive.
3. Does total return justify the drawdown? +30% for -15% drawdown is a solid 2:1 ratio. +30% for -50% drawdown is a poor 0.6:1 ratio.
4. Does the equity curve shape make sense? Smooth upward = good. Step-function or saw-toothed = caution.
5. Does it survive multiple windows? Test on bear, chop, bull, and recent periods. If it holds up on all of them, it is robust. If it holds up on only one, it is regime-dependent.
6. Is the trade frequency normal for the mode? Compare to the mode’s expected behavior. Anomalies are flags.
7. Decision: scale up, forward-test, or reject. All checks pass → forward-test live on small capital. Mostly pass → forward-test with caution and monitoring. Several fail → reject; iterate the strategy or pick a different mode.
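The checklist can be turned into a small helper. The sketch below is illustrative rather than a prescribed procedure: the function names are hypothetical, and since the text does not quantify “mostly pass” or “several fail”, treating exactly one failed check as “mostly pass” is an assumption.

```python
def return_to_drawdown(total_return, max_drawdown):
    """Ratio of total return to the absolute max drawdown (both as fractions)."""
    return total_return / abs(max_drawdown)


def decide(checks):
    """Map named pass/fail checks to one of the three outcomes above."""
    failed = [name for name, passed in checks.items() if not passed]
    if not failed:
        return "forward-test live on small capital"
    if len(failed) == 1:  # assumption: a single failure counts as "mostly pass"
        return "forward-test with caution and monitoring"
    return "reject"


# Hypothetical backtest summary.
checks = {
    "sample size >= 50 trades": True,
    "live-planned drawdown within tolerance": True,
    "return justifies drawdown": return_to_drawdown(0.30, -0.15) >= 1.0,
    "equity curve shape": True,
    "survives multiple windows": False,
    "trade frequency normal": True,
}
print(decide(checks))
```

Keeping the checks named makes it easy to see which one failed and whether it is the kind of failure, such as regime dependence, that forward-testing can actually resolve.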

What’s next

  • Why backtest: the fundamentals of backtesting motivation.
  • Walk-forward: the technique that catches curve-fitting.
  • Shadow mode: forward-testing methodology.
  • Common mistakes: backtest pitfalls to avoid.
  • Backtester module: the module that produces these results.
Last modified on May 3, 2026