Exchange API Uptime Monitoring: Keep Your Trades Running
How to detect crypto exchange API outages before they blow up your positions — practical Python monitoring scripts for Binance, Bybit, OKX, Bitget, and more.
Your trading bot was running clean — filling orders on Binance, hedging on OKX, everything ticking along. Then at 3:47 AM the Binance REST API went sideways for eleven minutes. Your bot kept retrying into a loop, missed the reversal, and you woke up to a blown position. Exchange API downtime is one of the most underappreciated risks in algorithmic and semi-automated trading. It doesn't happen often — but when it does, it tends to happen during high-volatility windows when you're most exposed. Building a lightweight uptime monitor is not optional if you trade with any automation whatsoever.
Complete outages are actually the easy case. Your bot gets connection refused, your error handler fires, everything stops cleanly. The genuinely dangerous scenario is partial failure: the API responds with 200 OK but returns stale order book data, or authentication works but order placement silently queues without confirming, or cancels time out while new orders go through. Your bot has no idea it's operating on garbage data and keeps trading.
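One way to catch the "200 OK but stale" case is to compare the timestamp inside a response with your own clock. The sketch below is a minimal illustration, not a definitive check: the function name `is_stale` and the 2-second threshold are hypothetical, and it assumes the payload carries a millisecond timestamp (which field holds it differs between exchanges).

```python
import time
from typing import Optional


def is_stale(data_ts_ms: int, max_age_ms: int = 2000, now_ms: Optional[int] = None) -> bool:
    """Return True if a payload timestamp is older than max_age_ms.

    data_ts_ms: millisecond timestamp taken from the exchange response
                (assumed; the field name varies by exchange).
    now_ms:     injectable clock for testing; defaults to local wall time.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return (now_ms - data_ts_ms) > max_age_ms


# Fixed numbers so the result does not depend on when you run it
print(is_stale(1_700_000_000_000, now_ms=1_700_000_001_000))  # 1s old -> False
print(is_stale(1_700_000_000_000, now_ms=1_700_000_005_000))  # 5s old -> True
```

This only works if your server clock is synced (NTP or similar); otherwise clock skew is indistinguishable from stale data.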
On Binance and Bybit, partial degradation has historically preceded full outages by 5-15 minutes. Latency climbing from 80ms to 600ms is a warning sign, not noise. A monitor that only checks for HTTP 200 will miss this entirely. The categories of failure you actually need to detect are meaningfully different from each other, and your response to each should be different too.
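To make "a different response for each failure" concrete, here is a small sketch of a policy table that maps a health status to an action. Everything in it is an assumption for illustration: the `Action` names, the status labels, and the choice of which status triggers which action are one possible policy, not a recommendation.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"   # trade normally
    REDUCE = "reduce"       # e.g. stop opening new positions
    BACK_OFF = "back_off"   # slow down the request rate
    PAUSE = "pause"         # halt the bot until recovery


# Hypothetical policy: status labels and actions are illustrative only
POLICY = {
    "up": Action.CONTINUE,
    "slow": Action.REDUCE,
    "rate_limited": Action.BACK_OFF,
    "server_error": Action.PAUSE,
    "timeout": Action.PAUSE,
    "unreachable": Action.PAUSE,
}


def decide(status: str) -> Action:
    """Map a health status to an action; unknown statuses fail safe to PAUSE."""
    return POLICY.get(status, Action.PAUSE)


print(decide("slow"))          # Action.REDUCE
print(decide("weird_status"))  # Action.PAUSE
```

Defaulting unknown statuses to `PAUSE` is a deliberate choice here: when the monitor cannot classify what it sees, stopping is usually cheaper than trading blind.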
Many traders also monitor signals platforms like VoiceOfChain alongside exchange health — if real-time order flow signals stop updating, that's often a leading indicator that underlying exchange data feeds are degraded even before the REST API shows problems.
Every major exchange exposes lightweight ping or server-time endpoints that require no authentication and, when healthy, typically respond in tens to a few hundred milliseconds depending on where your server sits. These are your canary in the coal mine. Polling them every 30-60 seconds costs essentially nothing and gives you a reliable heartbeat. They're intentionally designed to be cheap: exchanges want you to use them instead of hammering heavier endpoints.
| Exchange | Endpoint | Auth Required | Normal Latency |
|---|---|---|---|
| Binance | GET /api/v3/ping | No | 50-150ms |
| Bybit | GET /v5/market/time | No | 80-200ms |
| OKX | GET /api/v5/public/time | No | 100-250ms |
| Coinbase Adv. | GET /api/v3/brokerage/time | No | 100-300ms |
| Bitget | GET /api/v2/public/time | No | 100-200ms |
| Gate.io | GET /api/v4/spot/time | No | 80-180ms |
Bookmark the official status pages: status.binance.com, status.bybit.com, and status.okx.com. These update during incidents faster than Twitter and show which specific services are affected — REST API, WebSocket feeds, withdrawals — so you know exactly what's broken.
The foundation is a function that hits a health endpoint, measures round-trip latency, handles all failure modes explicitly, and returns structured data. Explicit is better than silent — you want to distinguish a timeout from a connection error from an HTTP 500, because each means something different about what's happening on the exchange side.
```python
import time

import requests


def check_exchange(name: str, url: str, timeout: float = 5.0) -> dict:
    """Hit a health endpoint and return status, latency, and HTTP code."""
    try:
        t0 = time.monotonic()
        resp = requests.get(url, timeout=timeout)
        latency_ms = round((time.monotonic() - t0) * 1000, 1)

        if resp.status_code == 200:
            # Flag as slow even though technically "up"
            status = "slow" if latency_ms > 500 else "up"
        elif resp.status_code == 429:
            status = "rate_limited"
        elif resp.status_code >= 500:
            status = "server_error"
        else:
            status = "degraded"

        return {
            "exchange": name,
            "status": status,
            "latency_ms": latency_ms,
            "http_code": resp.status_code,
        }
    except requests.exceptions.Timeout:
        return {"exchange": name, "status": "timeout", "latency_ms": None, "http_code": None}
    except requests.exceptions.ConnectionError:
        return {"exchange": name, "status": "unreachable", "latency_ms": None, "http_code": None}
    except Exception as e:
        # Same keys as every other branch, plus the error text
        return {"exchange": name, "status": "error", "latency_ms": None, "http_code": None, "error": str(e)}


# Quick test
result = check_exchange("Binance", "https://api.binance.com/api/v3/ping")
print(result)
# Example output:
# {'exchange': 'Binance', 'status': 'up', 'latency_ms': 87.3, 'http_code': 200}
```
That 87ms reading is your baseline. Run this every 30 seconds for a few days and you'll know exactly what 'normal' looks like for each exchange from your server's location. When Binance starts responding at 400ms and climbing, you have maybe 5-10 minutes before something breaks. That's actionable lead time.
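Once you have a few days of readings, you can let the monitor judge "normal" for itself instead of hardcoding a threshold. The sketch below keeps a rolling median and flags any reading that exceeds a multiple of it. The class name `LatencyBaseline`, the window of 120 samples (one hour at 30-second polling), and the 3x factor are all assumptions to tune against your own data.

```python
from collections import deque
from statistics import median


class LatencyBaseline:
    """Rolling median of recent latencies, used to flag unusual readings."""

    def __init__(self, window: int = 120, factor: float = 3.0, min_samples: int = 10):
        self.samples = deque(maxlen=window)  # oldest readings drop off automatically
        self.factor = factor
        self.min_samples = min_samples

    def observe(self, latency_ms: float) -> bool:
        """Record a reading; return True if it exceeds factor x the rolling median.

        The reading is compared against the baseline before being added,
        so a spike does not inflate its own reference value.
        """
        anomalous = (
            len(self.samples) >= self.min_samples
            and latency_ms > self.factor * median(self.samples)
        )
        self.samples.append(latency_ms)
        return anomalous


baseline = LatencyBaseline(window=120, factor=3.0, min_samples=5)
for reading in [85, 90, 88, 92, 87]:
    baseline.observe(reading)   # warm-up: not enough history to judge yet
print(baseline.observe(95))     # False: within 3x the median of 88
print(baseline.observe(400))    # True: well above 3x the median
```

A median is used rather than a mean so that a single slow response does not drag the baseline upward.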
If you're running strategies across Binance, OKX, and Bybit simultaneously, checking them sequentially is wrong. If Binance takes 5 seconds to timeout, you're 5 seconds late detecting an OKX problem. Use a thread pool to fire all checks in parallel — total wall time equals your slowest response, not the sum of all of them.
```python
import concurrent.futures
import time
from datetime import datetime, timezone

import requests

EXCHANGES = {
    "Binance": "https://api.binance.com/api/v3/ping",
    "Bybit": "https://api.bybit.com/v5/market/time",
    "OKX": "https://www.okx.com/api/v5/public/time",
    "Bitget": "https://api.bitget.com/api/v2/public/time",
    # Coinbase Exchange API endpoint; the table above lists the Advanced Trade one
    "Coinbase": "https://api.exchange.coinbase.com/time",
}
LATENCY_WARN_MS = 500


def check_one(args: tuple) -> tuple:
    """Check a single (name, url) pair and return (name, result dict)."""
    name, url = args
    try:
        t0 = time.monotonic()
        r = requests.get(url, timeout=5)
        latency = round((time.monotonic() - t0) * 1000, 1)
        if r.status_code != 200:
            return name, {"status": "degraded", "code": r.status_code, "latency_ms": latency}
        status = "slow" if latency > LATENCY_WARN_MS else "up"
        return name, {"status": status, "latency_ms": latency}
    except requests.exceptions.Timeout:
        return name, {"status": "timeout", "latency_ms": None}
    except Exception as e:
        return name, {"status": "down", "error": type(e).__name__}


def monitor_all() -> dict:
    """Run all checks in parallel; wall time is roughly the slowest check."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        return dict(pool.map(check_one, EXCHANGES.items()))


if __name__ == "__main__":
    while True:
        ts = datetime.now(timezone.utc).strftime("%H:%M:%S UTC")
        results = monitor_all()
        for exch, info in results.items():
            icon = "OK" if info["status"] == "up" else "!!"
            print(f"[{ts}] [{icon}] {exch}: {info}")
        time.sleep(30)
```
On a good connection the entire sweep of five exchanges completes in under 300ms. You can run this on a cheap $5/month VPS in the same region as your trading server to get meaningful latency readings rather than your laptop's WiFi introducing noise. Platforms like Bybit and OKX both recommend co-locating monitoring processes close to their API endpoints for the most reliable signal.
A monitor that prints to stdout is useless when you're asleep or away from the terminal. You need it to reach you. The simplest production-grade alert channel for solo traders is Telegram — a bot takes under five minutes to set up and messages arrive instantly on your phone. The critical design principle: alert on state transitions, not on current state. One message when the exchange goes down, one when it recovers. Not a message every 30 seconds while it's still down — that trains you to ignore the alerts.
```python
import os
import time

import requests

# Load from environment; never hardcode these
TELEGRAM_TOKEN = os.environ["TELEGRAM_TOKEN"]
TELEGRAM_CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]

EXCHANGES = {
    "Binance": "https://api.binance.com/api/v3/ping",
    "OKX": "https://www.okx.com/api/v5/public/time",
    "Bybit": "https://api.bybit.com/v5/market/time",
}


def send_telegram(msg: str) -> None:
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    try:
        requests.post(url, json={"chat_id": TELEGRAM_CHAT_ID, "text": msg}, timeout=5)
    except Exception:
        pass  # don't crash monitor if Telegram itself has issues


def get_status(url: str) -> str:
    try:
        r = requests.get(url, timeout=5)
        if r.status_code == 200:
            return "up"
        return f"degraded_{r.status_code}"
    except requests.exceptions.Timeout:
        return "timeout"
    except Exception:
        return "down"


# Initialize all as "up" to avoid false alerts on first run
prev = {name: "up" for name in EXCHANGES}

while True:
    for name, url in EXCHANGES.items():
        current = get_status(url)
        was_ok = prev[name] == "up"
        is_ok = current == "up"

        if was_ok and not is_ok:
            send_telegram(f"ALERT: {name} API is {current.upper()} - consider pausing bots")
        elif not was_ok and is_ok:
            send_telegram(f"RESOLVED: {name} API is back UP")

        prev[name] = current

    time.sleep(60)
```
Store TELEGRAM_TOKEN and TELEGRAM_CHAT_ID as environment variables. Never commit credentials to a repo. Use python-dotenv or export them in your shell profile. Binance and OKX have both had credential-scraping incidents from public repos — treat API keys with the same care as passwords.
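It also helps to fail fast with a readable message when a variable is missing, rather than hitting a bare `KeyError` at import time. A minimal sketch using only the standard library follows; the helper name `require_env` is hypothetical, and the `env` parameter exists only so the example can run against a stand-in mapping.

```python
import os
from typing import Mapping


def require_env(names: list, env: Mapping[str, str] = os.environ) -> dict:
    """Return the requested settings, or raise one error listing everything missing."""
    missing = [n for n in names if not env.get(n)]  # unset or empty both count
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {n: env[n] for n in names}


# Stand-in mapping so the example does not depend on your shell
fake_env = {"TELEGRAM_TOKEN": "123:abc", "TELEGRAM_CHAT_ID": "42"}
settings = require_env(["TELEGRAM_TOKEN", "TELEGRAM_CHAT_ID"], env=fake_env)
print(sorted(settings))  # ['TELEGRAM_CHAT_ID', 'TELEGRAM_TOKEN']
```

In the real monitor you would call `require_env([...])` with no `env` argument so it reads the process environment.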
For team setups or multi-strategy operations, route alerts to a dedicated Discord channel or PagerDuty. Both Bybit and Gate.io provide official status webhooks you can subscribe to for exchange-sourced incident notifications — check their developer portals. That gives you ground truth from the exchange itself in addition to your own external polling.
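For the Discord route, a channel webhook accepts a JSON POST with a `content` field, so the alert function can be very small. The sketch below uses only the standard library. The function names, the placeholder webhook URL, and the User-Agent string are assumptions for illustration; PagerDuty would need its own integration and is not covered here.

```python
import json
import urllib.request

DISCORD_MAX_CHARS = 2000  # Discord's limit on the "content" field


def build_discord_request(webhook_url: str, msg: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Discord webhook."""
    body = json.dumps({"content": msg[:DISCORD_MAX_CHARS]}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Explicit User-Agent: urllib's default one may be rejected
            "User-Agent": "exchange-uptime-monitor/1.0",
        },
        method="POST",
    )


def send_discord(webhook_url: str, msg: str, timeout: float = 5.0) -> bool:
    """Send the alert; return False instead of raising so the monitor keeps running."""
    try:
        req = build_discord_request(webhook_url, msg)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False


# Placeholder URL: replace with the webhook URL from your channel settings
req = build_discord_request(
    "https://discord.com/api/webhooks/ID/TOKEN",
    "ALERT: Binance API is TIMEOUT",
)
print(req.get_method(), json.loads(req.data))
```

Splitting request building from sending keeps the payload logic testable without touching the network, and `send_discord` can be swapped in wherever the Telegram sender is called.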
A solid API uptime monitor is one of those things that feels unnecessary right up until the moment it saves your account. The scripts above are production-usable with minimal modification — add your environment variables, pick your alert channel, and run them as a background process or small VPS service. Binance, OKX, Bybit, Bitget, Coinbase — all of them expose the health endpoints you need, you just have to poll them. For an additional layer of confidence that market data is actually flowing correctly and not just technically reachable, platforms like VoiceOfChain track real-time order flow signals across exchanges and can serve as a secondary health check. Set the monitor up once, run it always, and stop learning about exchange outages from your P&L.