Monitoring — Spotting Issues Before They're Crises

Monitoring is the layer that turns ‘something seems off’ from intuition into actionable signal. The Dashboard surfaces every metric you need, but you have to know what to watch and what ‘normal’ looks like to spot deviation. This page tells you what.

What to monitor — the priority list

1. Connection state

Is the TradingBot connected to every active venue? Is the database reachable? Are all containers healthy? First priority, because nothing else matters if connections are down.

2. Kill switch state

Is the kill switch in the expected state? Accidentally-flipped switches are a real source of “why isn’t the bot trading” confusion.

3. Trade frequency vs expectation

Is the bot producing trades at the rate your mode/symbol pairing predicted? Sudden silence or sudden flood are flags.

4. Win rate trend

Is the win rate trending stable, up, or down? Sustained downward trend over 2+ weeks suggests regime mismatch.

5. Drawdown character

Is the current drawdown within backtest expectations? Beyond expectations is a flag.

6. Per-pair P&L

Are some pairs dragging? Some pairs carrying the rest? Asymmetric performance is information.

7. Log error rate

Recurring errors in the Logs panel — rare = OK, frequent = investigate.

8. API key audit log

The venue’s audit log. Monthly review for unfamiliar activity.
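Item 4 describes a sustained downward win-rate trend over 2+ weeks. A minimal sketch of that check follows, assuming you already have one win-rate value per week (for example, copied from the Dashboard analytics). The function name and the `weeks` parameter are illustrative and not part of the product.

```python
def sustained_decline(weekly_win_rates, weeks=2):
    """True if the win rate fell in each of the last `weeks` week-over-week comparisons.

    weekly_win_rates: oldest-to-newest list of fractions, e.g. [0.82, 0.80, 0.78].
    """
    if len(weekly_win_rates) < weeks + 1:
        return False  # not enough history to call it a trend
    recent = weekly_win_rates[-(weeks + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))
```

A single down week returns False by design; only consecutive declines count as a trend.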

Where to watch — the surfaces

Dashboard — the primary surface

Live trades, positions, modes, pairs, kill switch, log stream, analytics. The hub. Check daily for 2–5 minutes; run a structured 30-minute review weekly.

Telegram chat — the firehose

Every closed trade. Running totals. Quick scan during the day for “is everything closing as expected?” Mute notifications during sleep hours; review the chat in the morning.

Logs panel (in Dashboard)

Live stream of TradingBot output. Use during:

First hour after any change (new pair, mode, key rotation).
Investigations (why did this trade not fire?).
Confidence checks (is the bot actually doing anything?).

Venue's API audit log

Each exchange’s API management page has audit logs. Monthly review for anomalies.

VPS-level monitoring

Disk space, memory, CPU. The bot is robust to occasional resource pressure, but monitoring catches issues early. Tools: standard VPS monitoring (Netdata, Prometheus + Grafana, or your provider’s built-in). Most operators use the simple solution: occasional top, df, and free commands.
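If you prefer a script to running the commands by hand, the following is a rough sketch of the same quick glance using only the Python standard library. It covers disk usage and load average (memory is left out because the standard library has no portable call for it). The function name and the 90% warning threshold are assumptions for illustration.

```python
import os
import shutil

def resource_report(path="/", disk_warn_pct=90.0):
    """Return basic host metrics, roughly what a quick df/top glance shows."""
    usage = shutil.disk_usage(path)
    # used/total; df may report a slightly different percentage.
    disk_used_pct = 100.0 * usage.used / usage.total
    try:
        load_1m = os.getloadavg()[0]  # Unix-like systems only
    except (AttributeError, OSError):
        load_1m = None
    cpus = os.cpu_count() or 1
    return {
        "disk_used_pct": round(disk_used_pct, 1),
        "disk_warn": disk_used_pct >= disk_warn_pct,
        "load_1m": load_1m,
        "load_per_cpu": None if load_1m is None else load_1m / cpus,
    }

if __name__ == "__main__":
    print(resource_report())
```

A load per CPU that stays well above 1.0 usually means the host is saturated; occasional spikes are not a concern.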

External monitoring (optional)

UptimeRobot, Better Uptime, or similar can ping your reverse-proxy endpoints periodically and alert if they are down. Worth setting up for dashboard.yourdomain.com and, if applicable, signals.yourdomain.com.

What “normal” looks like — building intuition

BasicMode on BTCUSDT — typical day

With $20,000 capital:

2-6 round-trips per day on average.
Per-trade P&L ranging $5-$200 (most clustered around $10-$50).
Average hold time: hours.
Win rate: ~75-85%.
Drawdown character: small dips into -2% to -5%, recovering within hours.

When that pattern becomes abnormal

Flags worth investigating:

Zero trades for 24+ hours when typically 5+ per day. (Buy conditions not triggering, or kill switch on.)
50+ trades in a single day when typically 5. (Overtrading; check trigger configuration.)
Drawdown reaching -15% when typical max is -5%. (Regime shift; consider kill switch.)
Sudden win-rate drop to <50%. (Regime mismatch.)
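The flags above can be written down as explicit thresholds. This is a sketch only: the `DayStats` type, the function name, and the choice to express the thresholds as multiples of your typical values (10x trades, 3x drawdown, matching the 50-vs-5 and -15%-vs--5% examples) are assumptions, and the inputs are numbers you would read off the Dashboard yourself.

```python
from dataclasses import dataclass

@dataclass
class DayStats:
    trades: int          # round-trips closed in the last 24 hours
    drawdown_pct: float  # current drawdown as a negative percentage, e.g. -4.0
    win_rate: float      # fraction of winning trades, 0.0 to 1.0

def flag_anomalies(s, typical_trades=5, typical_max_dd_pct=-5.0):
    """Return the list of flags raised by one day's numbers."""
    flags = []
    if s.trades == 0 and typical_trades >= 5:
        flags.append("silent")           # zero trades when 5+ per day is typical
    if s.trades >= 10 * typical_trades:
        flags.append("overtrading")      # e.g. 50+ trades when 5 is typical
    if s.drawdown_pct <= 3 * typical_max_dd_pct:
        flags.append("deep drawdown")    # e.g. -15% when typical max is -5%
    if s.trades > 0 and s.win_rate < 0.50:
        flags.append("low win rate")     # skipped when there were no trades
    return flags
```

For example, `flag_anomalies(DayStats(trades=50, drawdown_pct=-15.0, win_rate=0.4))` raises three flags, while a typical BasicMode day raises none.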

Multi-pair operations — typical

Running BTCUSDT, ETHUSDT, SOLUSDT on BasicMode:

5-15 trades per day combined.
One pair often leads daily; rotation across the week.
Aggregate drawdown character similar to single-pair.

Per-mode behaviors differ

BasicMode: high-frequency, small per-trade.
FullBullMarket: medium-frequency, medium per-trade, longer hold times.
LongTimeLong: low-frequency, large per-trade, multi-day holds.
Tsl2Sell: variable based on trends.

Calibrate your “normal” expectation to the mode you’re running.

Anomaly response — the decision tree

Anomaly observed

Something looks off. Could be unusual trade rate, drawdown, error pattern, connection issue.

Step 1: is it within historical range?

Compare to backtest expectations. If within range → likely just normal regime variation. Continue monitoring; don’t act.

Step 2: is the connection healthy?

If connection issues: investigate (network, exchange, IP allowlist). Repair before evaluating performance.

Step 3: is the configuration as expected?

Did you (or someone) accidentally change something? Review recent operator log; check Modes panel.

Step 4: is a regime shift the cause?

If yes: decide whether to ride through (BasicMode handles regime shifts OK), turn the kill switch ON (Tsl2Sell may be hurting), or change modes (a deliberate move).

Step 5: kill switch ON if uncertain

When in doubt, kill switch ON. Investigate without active risk. Re-enable when you’re clear.

Step 6: document the incident

Operator log entry: what happened, what you observed, what you did, what the resolution was. Future-you benefits.
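The steps can be read as an ordered series of checks where the first one that applies decides the action. A minimal sketch of that ordering, with hypothetical field and function names (the answers to each question still come from you, not from code):

```python
from dataclasses import dataclass

@dataclass
class Observation:
    within_backtest_range: bool   # Step 1
    connection_healthy: bool      # Step 2
    config_as_expected: bool      # Step 3
    regime_shift_suspected: bool  # Step 4

def next_action(obs):
    """Walk the steps in order and return the first action that applies."""
    if obs.within_backtest_range:
        return "continue monitoring"
    if not obs.connection_healthy:
        return "repair connection"
    if not obs.config_as_expected:
        return "review configuration"
    if obs.regime_shift_suspected:
        return "decide: ride through, kill switch ON, or change modes"
    # Step 5: nothing explains it, so remove active risk first.
    return "kill switch ON and investigate"
```

Whatever the outcome, Step 6 still applies: record what you observed and what you did in the operator log.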

Common anomalies and what they mean

Bot stopped trading suddenly

Most common causes:

Kill switch ON (most common).
Exchange-side outage.
Rate limit being hit.
Min-notional rejections (capital too small for splits).
Insufficient balance.
API key revoked at venue.

Dashboard logs surface every refusal-to-trade with a clear reason.

Bot trading much more than usual

Most common causes:

SignalEditor strategy with wrong trigger mode (once_per_minute instead of once_per_bar_close).
New volatility regime triggering more buy signals.
Symbol added without recognizing its characteristics.

Investigate the trigger frequency in Logs.

Win rate dropped substantially

Most common causes:

Regime shift — mode is now mismatched.
New symbol added that doesn’t fit the mode.
Stop-loss triggering more frequently due to volatility.

Per-pair analytics often pinpoint which pair is the drag.
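If you export closed trades and want to reproduce a per-pair breakdown yourself, a sketch like the following groups P&L by pair and lists the worst performer first. The `(pair, pnl)` tuple format and the function name are assumptions; the Dashboard's own export format may differ.

```python
from collections import defaultdict

def per_pair_summary(trades):
    """Summarize closed trades per pair, worst total P&L first.

    trades: iterable of (pair, pnl) tuples.
    """
    totals = defaultdict(lambda: {"pnl": 0.0, "wins": 0, "count": 0})
    for pair, pnl in trades:
        t = totals[pair]
        t["pnl"] += pnl
        t["count"] += 1
        t["wins"] += 1 if pnl > 0 else 0
    ordered = sorted(totals.items(), key=lambda kv: kv[1]["pnl"])
    return {
        pair: {
            "pnl": round(t["pnl"], 2),
            "win_rate": t["wins"] / t["count"],
            "count": t["count"],
        }
        for pair, t in ordered
    }
```

The first key of the returned dict is the pair dragging the most, which is usually where to look first.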

Drawdown beyond historical range

Most common causes:

Black-swan-style market event.
Regime mismatch with mode.
Concentration on a struggling pair.

Decide:

Kill switch ON to halt new buying.
Hold existing positions through the drawdown (if recovery looks likely).
Manually close (if you’ve lost confidence).

Don’t impulse-close at the bottom.

Connection state intermittent

Most common causes:

VPS network issues.
Exchange-side instability.
DNS issues.

Brief intermittents are normal. Persistent intermittents (every few minutes) are worth investigating.
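One way to make "persistent" concrete is to count disconnects in a trailing window. The sketch below assumes you have collected disconnect timestamps yourself (for example, from the Logs panel); the 30-minute window and the threshold of 5 are illustrative values, not product defaults.

```python
from datetime import datetime, timedelta

def is_persistent(disconnects, now, window=timedelta(minutes=30), threshold=5):
    """True if at least `threshold` disconnects fall inside the trailing window."""
    recent = [t for t in disconnects if now - window <= t <= now]
    return len(recent) >= threshold
```

Two disconnects an hour apart would not trip this check; five within half an hour would.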

Telegram running total disagrees with Dashboard

Most common causes:

Bot was offline during a trade (Telegram missed).
Database restored from backup, resetting the running total.
Manual close not reflected in one of the surfaces.

Dashboard’s analytics are the source of truth. Telegram running total is a convenience.

Setting up external alerting

Why external alerting

External monitoring catches things you might not see in your daily routine:

Dashboard down (TLS expired? Container crashed?).
SignalsBot down (webhooks failing).
VPS down (provider issue).

Push notifications when something is wrong = faster operator response.

UptimeRobot — free tier, simple setup

Add an HTTP monitor for each endpoint:

https://dashboard.yourdomain.com/api/health (or appropriate health endpoint).
https://signals.yourdomain.com/health.

Configure email/SMS notifications. How quickly an alert arrives depends on the monitor's check interval, so confirm the interval your plan provides.

Better Uptime — paid, more features

More sophisticated alerting (escalation policies, on-call rotation, status pages). For operators running larger setups or wanting professional-grade monitoring.

Custom Telegram alerts

For operators comfortable with scripting: a small custom monitor that pings your endpoints and sends Telegram messages on failure. Cheap, no third-party dependency, integrates with your existing Telegram channel.
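A minimal sketch of such a monitor, using only the Python standard library and the Telegram Bot API `sendMessage` method. The endpoint URLs are the placeholder domains used on this page, and the bot token and chat ID are values you would supply yourself; the function names are illustrative.

```python
import urllib.parse
import urllib.request

# Placeholder endpoints; replace with your own health URLs.
ENDPOINTS = [
    "https://dashboard.yourdomain.com/api/health",
    "https://signals.yourdomain.com/health",
]

def is_up(url, timeout=10):
    """True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False

def send_telegram(token, chat_id, text, timeout=10):
    """Send a message through the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    with urllib.request.urlopen(url, data=data, timeout=timeout) as resp:
        return resp.status == 200

def run_once(token, chat_id, endpoints=ENDPOINTS, probe=is_up, notify=send_telegram):
    """Probe every endpoint once and send one message listing any that are down."""
    down = [url for url in endpoints if not probe(url)]
    if down:
        notify(token, chat_id, "DOWN: " + ", ".join(down))
    return down
```

Scheduling `run_once` from cron or a systemd timer every minute or so is one simple way to run it. Note that a monitor hosted on the same VPS as the bot cannot report that the VPS itself is down, so run it from a separate machine if that case matters to you.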

Best practices

✅ Daily 5-minute Dashboard scan to build intuition for normal.
✅ Weekly 30-minute structured review to catch trends.
✅ External monitoring (UptimeRobot or similar) for uptime alerts.
✅ Match expectations to your mode — BasicMode normal differs from LongTimeLong normal.
✅ Investigate anomalies before reacting — most are noise, some are signal.
✅ Use kill switch when uncertain — false alarms are free.
✅ Document incidents in operator log — pattern recognition improves over time.
✅ Don’t over-react to single-day variance — hold-period thinking matters.
✅ Don’t under-react to multi-day patterns — sustained anomalies are signal.
✅ Set up alerting before you need it — operators reach for monitoring during their first crisis, then realize they didn’t have it.

What’s next

Daily operations

The routines that keep monitoring sustainable.

Common issues

Specific problems and their fixes.

Connection problems

What to do when bot loses connectivity.

Risk overview

Monitoring fits into the broader risk framework.

Dashboard module

The primary monitoring surface.

Support

When monitoring catches something you can’t fix.
