Binance API Rate Limits: A Trader's Guide to Staying Connected
Learn how Binance API rate limits work across spot and futures, how to handle them in your trading bots, and practical code examples to avoid getting banned.
Every trading bot developer hits the same wall eventually: your perfectly coded strategy suddenly stops getting data, your orders fail silently, and your bot sits there doing nothing while the market moves. The culprit? Binance API rate limits. Understanding how these limits work is the difference between a bot that prints money and one that sits in timeout while opportunities pass you by.
Binance, like every major exchange including Bybit, OKX, and Coinbase, enforces strict API request limits to protect their infrastructure. If 500,000 traders all hammered the orderbook endpoint every millisecond, the matching engine would collapse. Rate limits keep things fair and functional. The trick is working within them intelligently rather than fighting them.
Binance uses a weight-based system rather than a simple request counter. Each endpoint costs a certain number of weight units, and you get a budget that resets every minute. Exceed that budget and you get an HTTP 429 response. Keep sending requests after a 429 and Binance escalates to HTTP 418, an automatic IP ban that scales from 2 minutes to 3 days for repeat offenders. The Binance REST API rate limits are generous enough for most strategies, but you need to understand exactly how they work.
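To make the per-minute budget concrete, here is a minimal sketch of the accounting. It assumes a simple fixed one-minute window that starts fresh once it expires, which simplifies Binance's server-side bookkeeping. The WeightBudget class and its injectable clock are illustrative and not part of any Binance SDK.

```python
import time
from typing import Callable


class WeightBudget:
    """Fixed-window weight budget: usage resets when a new window starts (simplified model)."""

    def __init__(self, limit: int = 6000, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the demo can use a fake clock
        self.window_start = clock()
        self.used = 0

    def try_spend(self, weight: int) -> bool:
        """Spend `weight` if it fits in the current window; otherwise refuse the request."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # Window expired: start a fresh one with the full budget
            self.window_start = now
            self.used = 0
        if self.used + weight > self.limit:
            return False  # at this point the server would answer with HTTP 429
        self.used += weight
        return True


# Demo with a fake clock and a tiny budget of 10 weight per minute
fake_now = [0.0]
budget = WeightBudget(limit=10, clock=lambda: fake_now[0])
print(budget.try_spend(6))  # True
print(budget.try_spend(6))  # False: 6 + 6 exceeds 10
fake_now[0] = 60.0
print(budget.try_spend(6))  # True: a new window has started
```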
Rate limits also differ between spot and futures markets, and confusing the two is a common mistake. Spot (api.binance.com) and futures (fapi.binance.com for USD-M, dapi.binance.com for COIN-M) run on separate API gateways with independent weight pools, so burning through your spot limit doesn't affect your futures allowance.
| Limit Type | Spot | Futures (USD-M) | Futures (COIN-M) |
|---|---|---|---|
| Request Weight | 6,000 per minute | 2,400 per minute | 2,400 per minute |
| Order Rate | 10 per second / 200,000 per day | 10 per second / 200,000 per day | 10 per second / 200,000 per day |
| Raw Requests | 61,000 per 5 min | 61,000 per 5 min | 61,000 per 5 min |
| WebSocket Streams | 1,024 per connection | 200 per connection | 200 per connection |
Spot endpoints get a larger request-weight budget than futures endpoints. A plausible explanation is that spot serves a broader range of informational queries, while each futures request involves heavier margin calculations. Binance public API rate limits also cover endpoints that don't require authentication, such as ticker prices. Request weight is counted per IP, so even unauthenticated calls eat into the same budget.
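Binance changes these numbers from time to time. Rather than hard-coding them, you can read them at startup from the rateLimits array returned by GET /api/v3/exchangeInfo (GET /fapi/v1/exchangeInfo on USD-M futures). The sketch below only parses that array. The helper name parse_rate_limits is hypothetical. sample_rate_limits is a hand-written stand-in that mirrors the spot column of the table; it is not a captured live response.

```python
from typing import Dict, List, Tuple

# Seconds per interval unit used in the rateLimits entries
INTERVAL_SECONDS = {"SECOND": 1, "MINUTE": 60, "DAY": 86400}


def parse_rate_limits(rate_limits: List[dict]) -> Dict[Tuple[str, int], int]:
    """Map (rateLimitType, window length in seconds) -> limit for each entry."""
    parsed: Dict[Tuple[str, int], int] = {}
    for entry in rate_limits:
        window = entry["intervalNum"] * INTERVAL_SECONDS[entry["interval"]]
        parsed[(entry["rateLimitType"], window)] = entry["limit"]
    return parsed


# Hand-written sample in the exchangeInfo shape, using the spot values from the table
sample_rate_limits = [
    {"rateLimitType": "REQUEST_WEIGHT", "interval": "MINUTE", "intervalNum": 1, "limit": 6000},
    {"rateLimitType": "ORDERS", "interval": "SECOND", "intervalNum": 1, "limit": 10},
    {"rateLimitType": "ORDERS", "interval": "DAY", "intervalNum": 1, "limit": 200000},
    {"rateLimitType": "RAW_REQUESTS", "interval": "MINUTE", "intervalNum": 5, "limit": 61000},
]

limits = parse_rate_limits(sample_rate_limits)
weight_limit = limits[("REQUEST_WEIGHT", 60)]  # 6000, feed this into your limiter
```

In a live bot you would pass requests.get("https://api.binance.com/api/v3/exchangeInfo").json()["rateLimits"] in place of the sample. Do this once at startup, because the call itself costs weight.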
Pro tip: Every Binance API response includes X-MBX-USED-WEIGHT-1M in the headers. Monitor this value religiously. When it approaches 80% of your limit, throttle your requests. Don't wait for a 429 — by then you've already lost precious seconds.
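A minimal sketch of that 80% check. It works on any mapping of response headers. The function name near_weight_limit is illustrative, and the 6,000 default assumes the spot limit from the table (pass limit=2400 for futures).

```python
from typing import Mapping


def near_weight_limit(headers: Mapping[str, str], limit: int = 6000,
                      threshold: float = 0.8) -> bool:
    """True when X-MBX-USED-WEIGHT-1M has reached `threshold` of `limit`."""
    used = headers.get("X-MBX-USED-WEIGHT-1M")
    if used is None:
        # Header missing: nothing to judge by, so don't throttle on this response
        return False
    return int(used) >= threshold * limit
```

With requests you can pass resp.headers directly. It is a case-insensitive dictionary, so the lookup works however the server capitalizes the header name.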
For comparison, Bybit and OKX enforce their own limits, with different thresholds and accounting rules. OKX, for example, caps each endpoint at a set number of requests per short time window instead of drawing on a shared weight pool. If your bot trades across multiple exchanges, you need separate rate limiting logic for each one. Many traders who monitor signals from platforms like VoiceOfChain run multi-exchange bots and quickly learn that one-size-fits-all rate limiting doesn't work.
Before writing any trading logic, set up proper rate limit monitoring. Here's a Python class that tracks your Binance API request limits automatically by reading response headers:
```python
import time
import hmac
import hashlib
import requests
from urllib.parse import urlencode


class BinanceClient:
    BASE_URL = "https://api.binance.com"

    def __init__(self, api_key: str, api_secret: str, max_retries: int = 3):
        self.api_key = api_key
        self.api_secret = api_secret
        self.max_retries = max_retries
        self.session = requests.Session()
        self.session.headers.update({"X-MBX-APIKEY": api_key})
        self.used_weight = 0
        self.weight_limit = 6000
        self.order_count = 0

    def _sign(self, params: dict) -> dict:
        # Sign a fresh copy so a retry never feeds an old signature into the HMAC
        signed = dict(params)
        signed["timestamp"] = int(time.time() * 1000)
        query = urlencode(signed)
        signed["signature"] = hmac.new(
            self.api_secret.encode(), query.encode(), hashlib.sha256
        ).hexdigest()
        return signed

    def _request(self, method: str, path: str, params: dict = None, signed: bool = False):
        params = params or {}
        for attempt in range(self.max_retries + 1):
            # Re-sign on every attempt so the timestamp stays fresh
            query = self._sign(params) if signed else params
            resp = self.session.request(method, f"{self.BASE_URL}{path}", params=query)

            # Track rate limit usage from response headers
            used_weight = resp.headers.get("X-MBX-USED-WEIGHT-1M")
            if used_weight is not None:
                self.used_weight = int(used_weight)
            # Spot reports order counts per 10 seconds (and per day), not per minute
            order_count = resp.headers.get("X-MBX-ORDER-COUNT-10S")
            if order_count is not None:
                self.order_count = int(order_count)

            if resp.status_code == 418:
                # IP banned: stop all requests immediately
                raise RuntimeError("IP banned by Binance. Stop all requests.")
            if resp.status_code == 429 and attempt < self.max_retries:
                retry_after = int(resp.headers.get("Retry-After", 60))
                print(f"Rate limited! Waiting {retry_after}s...")
                time.sleep(retry_after)
                continue
            # Raises on any other error, including a 429 once retries are used up
            resp.raise_for_status()
            return resp.json()

    @property
    def weight_remaining(self) -> int:
        return self.weight_limit - self.used_weight

    def get_klines(self, symbol: str, interval: str, limit: int = 500):
        """Spot klines cost 2 weight regardless of limit (futures klines scale with limit)."""
        return self._request("GET", "/api/v3/klines", {
            "symbol": symbol, "interval": interval, "limit": limit
        })

    def get_account(self):
        """Account info costs weight of 20"""
        return self._request("GET", "/api/v3/account", signed=True)

    def place_order(self, symbol: str, side: str, quantity: float, price: float = None):
        params = {"symbol": symbol, "side": side, "quantity": quantity}
        if price is not None:
            params["type"] = "LIMIT"
            params["timeInForce"] = "GTC"
            params["price"] = price
        else:
            params["type"] = "MARKET"
        return self._request("POST", "/api/v3/order", params, signed=True)
```
This client reads X-MBX-USED-WEIGHT-1M from every response, so you always know exactly where you stand. Kline weight also depends on the market. Spot klines currently cost a flat 2 weight. On USD-M futures the cost grows with the limit parameter, so each request for more candles costs more. These small differences add up fast in a loop.
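The cost of a polling loop is simple arithmetic: N symbols polled every T seconds at W weight per call consume N * W * (60 / T) weight per minute. The hypothetical helpers below do that sum and compare it with a budget. They leave 20% headroom for orders and account calls, which is an arbitrary choice. Plug in the documented weight of the endpoint you actually call.

```python
def weight_per_minute(num_symbols: int, weight_per_call: int, poll_seconds: float) -> float:
    """Request weight consumed per minute by polling every symbol once per interval."""
    calls_per_minute = 60.0 / poll_seconds
    return num_symbols * weight_per_call * calls_per_minute


def fits_budget(num_symbols: int, weight_per_call: int, poll_seconds: float,
                budget: int = 6000, headroom: float = 0.8) -> bool:
    """Check the polling plan against a budget, reserving some weight for other calls."""
    return weight_per_minute(num_symbols, weight_per_call, poll_seconds) <= budget * headroom


# 50 symbols every 5 seconds at weight 2: 50 * 2 * 12 = 1,200 weight per minute
print(weight_per_minute(50, 2, 5))  # 1200.0
# The same symbols at weight 5 every second: 50 * 5 * 60 = 15,000, far over 6,000
print(fits_budget(50, 5, 1))  # False
```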
Reactive rate limiting — waiting until you hit a 429 — is amateur hour. Professional algo traders use proactive throttling. The token bucket pattern is the gold standard. You get a bucket of tokens that refills at a constant rate, and each request consumes tokens based on its weight:
```python
import time
import threading


class RateLimiter:
    """Token bucket rate limiter for Binance API request rate limit management."""

    def __init__(self, max_weight: int = 6000, refill_seconds: int = 60):
        self.max_weight = max_weight
        self.tokens = max_weight
        self.refill_rate = max_weight / refill_seconds  # tokens per second
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.max_weight, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def acquire(self, weight: int = 1) -> float:
        """Wait until enough weight is available. Returns wait time in seconds."""
        with self.lock:
            self._refill()
            if self.tokens >= weight:
                self.tokens -= weight
                return 0.0
            wait_time = (weight - self.tokens) / self.refill_rate
            # Sleeping while holding the lock makes other threads queue behind this one
            time.sleep(wait_time)
            self._refill()
            self.tokens -= weight
            return wait_time

    def sync_from_header(self, used_weight: int):
        """Sync internal state with actual usage from X-MBX-USED-WEIGHT-1M header."""
        with self.lock:
            self.tokens = max(0, self.max_weight - used_weight)
            # Restart the refill clock so time before the sync isn't credited twice
            self.last_refill = time.monotonic()


# Usage in a trading loop (BinanceClient is the class defined above)
limiter = RateLimiter(max_weight=6000)
client = BinanceClient("your_key", "your_secret")
symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT", "BNBUSDT"]

for symbol in symbols:
    wait = limiter.acquire(weight=2)  # spot klines cost 2 weight
    if wait > 0:
        print(f"Throttled {wait:.1f}s for {symbol}")
    klines = client.get_klines(symbol, "5m", limit=100)
    limiter.sync_from_header(client.used_weight)
    print(f"{symbol}: {len(klines)} candles, weight used: {client.used_weight}/{limiter.max_weight}")
```
The key insight is the sync_from_header method. Your local token count drifts from reality for two reasons: timing differences, and every process sharing your IP spending from the same budget. Re-syncing with the actual header value after every response keeps your limiter accurate. Spot and futures have independent weight pools, so run one limiter per market (for example, max_weight=2400 for USD-M futures).
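Spot, USD-M futures and COIN-M futures draw on separate weight pools, so one way to organize the bookkeeping is a registry with an independent budget per market. This sketch stores the last X-MBX-USED-WEIGHT-1M value seen for each market and checks it against the per-minute limits in the table. The Market and MarketBudgets names and the market keys are illustrative. The base URLs are Binance's REST hosts for each market.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Market:
    base_url: str
    weight_limit: int  # per-minute request weight from the table
    used_weight: int = 0


@dataclass
class MarketBudgets:
    """Tracks the last-seen X-MBX-USED-WEIGHT-1M separately for each market's weight pool."""
    markets: Dict[str, Market] = field(default_factory=lambda: {
        "spot": Market("https://api.binance.com", 6000),
        "usdm": Market("https://fapi.binance.com", 2400),
        "coinm": Market("https://dapi.binance.com", 2400),
    })

    def record(self, market: str, used_weight: int) -> None:
        # Only the named market is updated; the other pools are untouched
        self.markets[market].used_weight = used_weight

    def remaining(self, market: str) -> int:
        m = self.markets[market]
        return m.weight_limit - m.used_weight


budgets = MarketBudgets()
budgets.record("spot", 5900)
print(budgets.remaining("spot"))  # 100
print(budgets.remaining("usdm"))  # 2400: the futures pool is unaffected
```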
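Here's what separates beginner bot builders from experienced ones: you shouldn't be polling REST endpoints for real-time data at all. Every price check via REST costs weight. WebSocket market streams push the same data to you without touching your request-weight budget. Streams have limits of their own, such as the per-connection stream cap in the table above, but they don't compete with your REST calls.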
```python
import json
import time
import websocket  # provided by the websocket-client package


def on_message(ws, message):
    # Combined streams wrap each event as {"stream": ..., "data": {...}}
    data = json.loads(message)["data"]
    # Mini ticker fields: s = symbol, c = close price, v = base-asset volume
    print(f"{data['s']}: ${float(data['c']):,.2f} | Vol: {float(data['v']):,.0f}")


def on_error(ws, error):
    print(f"WebSocket error: {error}")


def on_close(ws, close_code, close_msg):
    print(f"Connection closed ({close_code}): {close_msg}")


def start_stream():
    # Subscribe to multiple symbols via combined stream
    streams = "btcusdt@miniTicker/ethusdt@miniTicker/solusdt@miniTicker"
    url = f"wss://stream.binance.com:9443/stream?streams={streams}"
    while True:
        ws = websocket.WebSocketApp(
            url,
            on_message=on_message,
            on_error=on_error,
            on_close=on_close,
        )
        ws.run_forever()  # returns once the connection drops
        # Reconnect after 5 seconds in a loop instead of recursing from on_close
        time.sleep(5)


start_stream()
```
Use REST calls only for actions that require them: placing orders, checking account balances, and pulling historical klines. For live prices, orderbook snapshots, and trade streams, use WebSocket. For a bot that polls prices across many pairs, this change alone can remove most of its weight consumption.
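Each connection accepts only a limited number of streams (1,024 on spot per the table), so a watchlist that outgrows one connection has to be split across several combined streams. A sketch, assuming the same wss://stream.binance.com:9443 combined-stream endpoint used in the streaming example. combined_stream_urls is a hypothetical helper.

```python
from typing import List

STREAM_BASE = "wss://stream.binance.com:9443/stream?streams="


def combined_stream_urls(symbols: List[str], stream: str = "miniTicker",
                         max_streams: int = 1024) -> List[str]:
    """Build one combined-stream URL per chunk of at most `max_streams` symbols."""
    # Stream names use lowercase symbols, e.g. btcusdt@miniTicker
    names = [f"{s.lower()}@{stream}" for s in symbols]
    return [
        STREAM_BASE + "/".join(names[i:i + max_streams])
        for i in range(0, len(names), max_streams)
    ]


urls = combined_stream_urls(["BTCUSDT", "ETHUSDT", "SOLUSDT"], max_streams=2)
for url in urls:
    print(url)
# wss://stream.binance.com:9443/stream?streams=btcusdt@miniTicker/ethusdt@miniTicker
# wss://stream.binance.com:9443/stream?streams=solusdt@miniTicker
```

Each URL then needs its own connection, for example one WebSocketApp per URL, each running in its own thread.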
Binance.US runs its own API with stricter limits than global Binance. If you're trading on Binance.US, budget 1,200 weight per minute for spot, confirm the current figures in Binance.US's own API documentation, and test conservatively. Unofficial market-data sources such as Yahoo Finance are even more restrictive, which is another reason crypto-native APIs are preferred for trading bots.
After years of building trading bots on Binance, Bybit, and KuCoin, certain patterns keep causing the same problems:
Traders using VoiceOfChain signals for real-time entry points often run into rate limit issues because they try to validate signals by pulling fresh data across dozens of pairs simultaneously. A smarter approach is maintaining WebSocket streams for your watchlist and only making REST calls when you actually need to execute a trade.
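One way to implement that split is a thread-safe price cache. The WebSocket handler writes into it and the signal-validation code reads from it, so checking a signal costs no REST weight. This is a sketch under assumptions: the PriceCache class and its staleness cutoff (max_age_seconds) are illustrative design choices, not a VoiceOfChain or Binance API.

```python
import threading
import time
from typing import Dict, Optional, Tuple


class PriceCache:
    """Latest price per symbol, written by a WebSocket handler and read by trading logic."""

    def __init__(self, max_age_seconds: float = 5.0):
        self.max_age_seconds = max_age_seconds
        self._prices: Dict[str, Tuple[float, float]] = {}  # symbol -> (price, received_at)
        self._lock = threading.Lock()

    def update(self, symbol: str, price: float) -> None:
        # Call this from the WebSocket on_message handler
        with self._lock:
            self._prices[symbol] = (price, time.monotonic())

    def get(self, symbol: str) -> Optional[float]:
        """Return the cached price, or None if it is missing or older than max_age_seconds."""
        with self._lock:
            entry = self._prices.get(symbol)
        if entry is None:
            return None
        price, received_at = entry
        if time.monotonic() - received_at > self.max_age_seconds:
            return None  # stale: the stream may have dropped, so don't trust it
        return price


cache = PriceCache(max_age_seconds=5.0)
cache.update("BTCUSDT", 65000.0)
print(cache.get("BTCUSDT"))  # 65000.0
print(cache.get("ETHUSDT"))  # None: no update received for this symbol
```

In a mini ticker handler you would call cache.update(data["s"], float(data["c"])). A None from get is the cue to check the stream before trusting the signal, or to fall back to a single REST call.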
On platforms like Gate.io and Bitget, rate limits are structured differently — typically per-endpoint rather than weight-based. This means your rate limiting middleware needs to be exchange-aware, not just a generic throttle. Always read the specific API documentation for each exchange you connect to.
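For exchanges that cap each endpoint separately, the throttle has to keep a separate budget per endpoint instead of one shared weight pool. A minimal sliding-window sketch follows. The endpoint paths and limits in the demo are placeholders, not values from Gate.io's or Bitget's documentation, so substitute the figures each exchange publishes.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class EndpointLimiter:
    """Sliding-window limiter: at most `max_calls` per `window_seconds` for each endpoint."""

    def __init__(self, limits: Dict[str, Tuple[int, float]],
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits  # endpoint -> (max_calls, window_seconds)
        self.clock = clock
        self.calls: Dict[str, Deque[float]] = defaultdict(deque)

    def try_acquire(self, endpoint: str) -> bool:
        max_calls, window = self.limits[endpoint]
        now = self.clock()
        history = self.calls[endpoint]
        # Drop timestamps that have slid out of the window
        while history and now - history[0] >= window:
            history.popleft()
        if len(history) >= max_calls:
            return False
        history.append(now)
        return True


# Placeholder limits for illustration only
t = [0.0]
limiter = EndpointLimiter({"/orders": (2, 1.0), "/ticker": (5, 1.0)}, clock=lambda: t[0])
print(limiter.try_acquire("/orders"), limiter.try_acquire("/orders"))  # True True
print(limiter.try_acquire("/orders"))  # False: third call within 1 second
print(limiter.try_acquire("/ticker"))  # True: separate budget per endpoint
t[0] = 1.0
print(limiter.try_acquire("/orders"))  # True: earlier calls left the window
```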
Binance API rate limits are not obstacles — they're constraints that force you to write better code. The traders who build robust, rate-limit-aware bots are the ones whose strategies actually run in production for months without intervention. Monitor your weight usage through response headers, use WebSocket streams for real-time data, implement proactive throttling with a token bucket, and always handle 429 and 418 responses gracefully. Your bot's uptime depends on it.