
Binance API Rate Limits: A Trader's Guide to Staying Connected

Learn how Binance API rate limits work across spot and futures, how to handle them in your trading bots, and practical code examples to avoid getting banned.

Uncle Solieditor · voc · 20.02.2026
◈   Contents
  1. → Why Rate Limits Exist and Why You Should Care
  2. → Binance API Rate Limit Structure: Spot vs Futures
  3. → Checking Your Rate Limit Usage in Real Time
  4. → Smart Rate Limit Management with Token Bucket
  5. → WebSocket Streams: Bypassing REST Rate Limits
  6. → Common Mistakes and How to Avoid Them
  7. → Frequently Asked Questions
  8. → Wrapping Up

Why Rate Limits Exist and Why You Should Care

Every trading bot developer hits the same wall eventually: your perfectly coded strategy suddenly stops getting data, your orders fail silently, and your bot sits there doing nothing while the market moves. The culprit? Binance API rate limits. Understanding how these limits work is the difference between a bot that prints money and one that sits in timeout while opportunities pass you by.

Binance, like every major exchange including Bybit, OKX, and Coinbase, enforces strict API request limits to protect its infrastructure. If 500,000 traders all hammered the order book endpoint every millisecond, the matching engine would collapse. Rate limits keep things fair and functional. The trick is working within them intelligently rather than fighting them.

Binance uses a weight-based system rather than a simple request counter. Each endpoint costs a certain number of weight units, and you get a budget that resets every minute. Exceed that budget and you get a 429 response — or worse, a temporary IP ban. The Binance REST API rate limits are generous enough for most strategies, but you need to understand exactly how they work.
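To make the weight budget concrete, here is a minimal sketch that adds up what a polling plan costs per minute. The weights are the ones quoted later in this guide (2 or 5 for klines, 20 for account info); the names ENDPOINT_WEIGHTS, weight_per_minute, and fits_budget are illustrative, not part of any Binance library, and the weights themselves should be checked against the current docs.

```python
# Weights as quoted in this guide; treat them as assumptions and verify them.
ENDPOINT_WEIGHTS = {
    "klines_500": 2,   # /api/v3/klines with limit <= 500
    "klines_1000": 5,  # /api/v3/klines with limit <= 1000
    "account": 20,     # /api/v3/account
}

def weight_per_minute(calls_per_minute: dict) -> int:
    """Total request weight a polling plan consumes in one minute."""
    return sum(ENDPOINT_WEIGHTS[name] * count
               for name, count in calls_per_minute.items())

def fits_budget(calls_per_minute: dict, budget: int = 6000) -> bool:
    """True if the plan stays within the per-minute weight budget."""
    return weight_per_minute(calls_per_minute) <= budget

# 600 kline calls plus one account check per second
plan = {"klines_500": 600, "account": 60}
print(weight_per_minute(plan), fits_budget(plan))  # 2400 True
```

Running the numbers before you deploy shows which endpoint dominates your budget. Here the 60 account checks cost as much as the 600 kline calls.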

Binance API Rate Limit Structure: Spot vs Futures

The Binance API request rate limit system differs between spot and futures markets, and confusing the two is a common mistake. Spot and futures run on separate API gateways with independent weight pools — burning through your spot limit doesn't affect your futures allowance.

Binance API Rate Limits by Market Type

| Limit Type | Spot | Futures (USD-M) | Futures (COIN-M) |
| --- | --- | --- | --- |
| Request Weight | 6,000 per minute | 2,400 per minute | 2,400 per minute |
| Order Rate | 10 per second / 200,000 per day | 10 per second / 200,000 per day | 10 per second / 200,000 per day |
| Raw Requests | 61,000 per 5 min | 61,000 per 5 min | 61,000 per 5 min |
| WebSocket Streams | 1,024 per connection | 200 per connection | 200 per connection |

Spot endpoints get the more generous request weight budget, presumably because spot traffic includes a broader range of informational queries. Futures endpoints are tighter, likely because each request involves more complex margin calculations. Note that public endpoints that don't require authentication, such as ticker prices, draw on the same IP-based weight pool, so even unauthenticated calls eat into your budget.
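Because the pools are independent, it can help to track usage per API host rather than in a single counter. The sketch below uses the per-minute limits from the table above. The futures hostnames (fapi.binance.com and dapi.binance.com) are not mentioned in this article and the WeightPools class is hypothetical, so verify both before relying on them.

```python
from urllib.parse import urlparse

# Per-minute request weight limits from the table, keyed by API host.
# Assumption: the futures hostnames below match the current Binance docs.
WEIGHT_LIMITS = {
    "api.binance.com": 6000,   # spot
    "fapi.binance.com": 2400,  # USD-M futures
    "dapi.binance.com": 2400,  # COIN-M futures
}

class WeightPools:
    """Tracks the last reported used weight separately for each market."""

    def __init__(self):
        self.used = {host: 0 for host in WEIGHT_LIMITS}

    def record(self, url: str, used_weight: int) -> None:
        """Store the used-weight header value for the host behind this URL."""
        self.used[urlparse(url).netloc] = used_weight

    def remaining(self, url: str) -> int:
        """Weight left in the pool that this URL draws from."""
        host = urlparse(url).netloc
        return WEIGHT_LIMITS[host] - self.used[host]

pools = WeightPools()
pools.record("https://api.binance.com/api/v3/klines", 5900)
print(pools.remaining("https://api.binance.com/api/v3/account"))  # 100
print(pools.remaining("https://fapi.binance.com/fapi/v1/klines"))  # 2400
```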

Pro tip: Every Binance API response includes X-MBX-USED-WEIGHT-1M in the headers. Monitor this value religiously. When it approaches 80% of your limit, throttle your requests. Don't wait for a 429 — by then you've already lost precious seconds.
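The 80% rule can be sketched as a small helper that decides how long to pause after each response. It assumes the one-minute window resets on the minute boundary, which you should confirm against the docs. The function name seconds_to_pause and its parameters are illustrative.

```python
import time

def seconds_to_pause(headers: dict, limit: int = 6000,
                     threshold: float = 0.8, now: float = None) -> float:
    """Return 0.0 below the threshold, else the seconds left in the minute.

    Assumption: the weight window resets at the start of each minute.
    """
    used = int(headers.get("X-MBX-USED-WEIGHT-1M", 0))
    if used < threshold * limit:
        return 0.0
    now = time.time() if now is None else now
    return 60.0 - (now % 60.0)

# 5,000 of 6,000 used, five seconds into the current minute
print(seconds_to_pause({"X-MBX-USED-WEIGHT-1M": "5000"}, now=125.0))  # 55.0
print(seconds_to_pause({"X-MBX-USED-WEIGHT-1M": "1200"}, now=125.0))  # 0.0
```

Passing now explicitly keeps the helper easy to test. In a live bot you would leave it unset and sleep for the returned value.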

For comparison, Bybit uses a similar weight-based system but with different thresholds, and OKX uses a simpler per-second rate limit on each endpoint. If your bot trades across multiple exchanges, you need separate rate limiting logic for each one. Many traders who monitor signals from platforms like VoiceOfChain run multi-exchange bots and quickly learn that one-size-fits-all rate limiting doesn't work.

Checking Your Rate Limit Usage in Real Time

Before writing any trading logic, set up proper rate limit monitoring. Here's a Python class that tracks your Binance API request limits automatically by reading response headers:

import time
import hmac
import hashlib
import requests
from urllib.parse import urlencode

class BinanceClient:
    BASE_URL = "https://api.binance.com"
    
    def __init__(self, api_key: str, api_secret: str):
        self.api_key = api_key
        self.api_secret = api_secret
        self.session = requests.Session()
        self.session.headers.update({"X-MBX-APIKEY": api_key})
        self.used_weight = 0
        self.weight_limit = 6000
        self.order_count = 0
    
    def _sign(self, params: dict) -> dict:
        params["timestamp"] = int(time.time() * 1000)
        query = urlencode(params)
        params["signature"] = hmac.new(
            self.api_secret.encode(), query.encode(), hashlib.sha256
        ).hexdigest()
        return params
    
    def _request(self, method: str, path: str, params: dict = None,
                 signed: bool = False, retries: int = 3):
        params = params or {}
        # Sign a copy so a retry re-signs the original params with a fresh timestamp
        request_params = self._sign(dict(params)) if signed else params
        
        resp = self.session.request(method, f"{self.BASE_URL}{path}", params=request_params)
        
        # Track rate limit usage from response headers (keep the last value if absent)
        used = resp.headers.get("X-MBX-USED-WEIGHT-1M")
        if used is not None:
            self.used_weight = int(used)
        # Spot order responses report counts per interval, e.g. 10S and 1D
        order_count = resp.headers.get("X-MBX-ORDER-COUNT-10S")
        if order_count:
            self.order_count = int(order_count)
        
        if resp.status_code == 418:
            # IP banned: stop all requests immediately
            raise Exception("IP banned by Binance. Stop all requests.")
        
        if resp.status_code == 429:
            if retries <= 0:
                raise Exception("Still rate limited after retries. Giving up.")
            retry_after = int(resp.headers.get("Retry-After", 60))
            print(f"Rate limited! Waiting {retry_after}s...")
            time.sleep(retry_after)
            return self._request(method, path, params, signed, retries - 1)
        
        resp.raise_for_status()
        return resp.json()
    
    @property
    def weight_remaining(self) -> int:
        return self.weight_limit - self.used_weight
    
    def get_klines(self, symbol: str, interval: str, limit: int = 500):
        """Kline endpoint costs weight of 2 (limit <= 500) or 5 (limit <= 1000)"""
        return self._request("GET", "/api/v3/klines", {
            "symbol": symbol, "interval": interval, "limit": limit
        })
    
    def get_account(self):
        """Account info costs weight of 20"""
        return self._request("GET", "/api/v3/account", signed=True)
    
    def place_order(self, symbol: str, side: str, quantity: float, price: float = None):
        params = {"symbol": symbol, "side": side, "quantity": quantity}
        if price is not None:
            params["type"] = "LIMIT"
            params["timeInForce"] = "GTC"
            params["price"] = price
        else:
            params["type"] = "MARKET"
        return self._request("POST", "/api/v3/order", params, signed=True)

This client reads X-MBX-USED-WEIGHT-1M from every response so you always know exactly where you stand. The Binance kline API rate limit varies by the number of candles you request — pulling 500 candles costs 2 weight, while 1,000 candles costs 5. These small differences add up fast in a loop.
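To see how those differences add up, the sketch below estimates the cost of paging through a historical backfill. It uses the kline weights quoted in this guide as given. The helper names kline_request_weight and backfill_cost are hypothetical.

```python
import math

def kline_request_weight(limit: int) -> int:
    """Weight per klines call, using the figures quoted in this guide."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    return 2 if limit <= 500 else 5

def backfill_cost(total_candles: int, limit: int) -> tuple:
    """Return (number of requests, total weight) to fetch total_candles."""
    requests_needed = math.ceil(total_candles / limit)
    return requests_needed, requests_needed * kline_request_weight(limit)

print(backfill_cost(10_000, 500))   # (20, 40)
print(backfill_cost(10_000, 1000))  # (10, 50)
```

Under these weights, larger pages mean fewer round trips but more total weight, so the better page size depends on whether latency or budget is your bottleneck.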

Smart Rate Limit Management with Token Bucket

Reactive rate limiting — waiting until you hit a 429 — is amateur hour. Professional algo traders use proactive throttling. The token bucket pattern is the gold standard. You get a bucket of tokens that refills at a constant rate, and each request consumes tokens based on its weight:

import time
import threading

class RateLimiter:
    """Token bucket rate limiter for Binance API request rate limit management."""
    
    def __init__(self, max_weight: int = 6000, refill_seconds: int = 60):
        self.max_weight = max_weight
        self.tokens = max_weight
        self.refill_rate = max_weight / refill_seconds  # tokens per second
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()
    
    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.max_weight, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
    
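    def acquire(self, weight: int = 1) -> float:
        """Wait until enough weight is available. Returns wait time in seconds."""
        if weight > self.max_weight:
            raise ValueError("weight exceeds the bucket capacity")
        with self.lock:
            self._refill()
            if self.tokens >= weight:
                self.tokens -= weight
                return 0.0
            
            # Sleep while holding the lock so other threads queue behind this request
            wait_time = (weight - self.tokens) / self.refill_rate
            time.sleep(wait_time)
            self._refill()
            self.tokens -= weight
            return wait_time
    
    def sync_from_header(self, used_weight: int):
        """Sync internal state with actual usage from X-MBX-USED-WEIGHT-1M header."""
        with self.lock:
            self.tokens = max(0, self.max_weight - used_weight)
            # Restart the refill clock so time before this response isn't counted twice
            self.last_refill = time.monotonic()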


# Usage in a trading loop
limiter = RateLimiter(max_weight=6000)
client = BinanceClient("your_key", "your_secret")

symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT", "BNBUSDT"]
for symbol in symbols:
    wait = limiter.acquire(weight=2)  # klines cost 2 weight
    if wait > 0:
        print(f"Throttled {wait:.1f}s for {symbol}")
    klines = client.get_klines(symbol, "5m", limit=100)
    limiter.sync_from_header(client.used_weight)
    print(f"{symbol}: {len(klines)} candles, weight used: {client.used_weight}/{limiter.max_weight}")

The key insight is the sync_from_header method. Your local token count can drift from reality due to timing differences and requests made by other processes on the same IP, so re-syncing with the actual header value on every response keeps your limiter accurate. The same pattern works for both spot and futures, as long as each market gets its own limiter with the matching max_weight (6,000 for spot, 2,400 for futures).
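One way to check this timing behavior without real sleeps is to inject a fake clock. The sketch below is a simplified, single-threaded variant of the token bucket with no locking. FakeClock and TestableBucket are hypothetical names used only for this illustration.

```python
class FakeClock:
    """Manual clock: sleep() just advances the current time."""

    def __init__(self):
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds

class TestableBucket:
    """Token bucket that takes its clock as a dependency (not thread-safe)."""

    def __init__(self, max_weight: int, refill_seconds: int, clock):
        self.max_weight = max_weight
        self.tokens = float(max_weight)
        self.refill_rate = max_weight / refill_seconds  # tokens per second
        self.clock = clock
        self.last_refill = clock.time()

    def _refill(self) -> None:
        now = self.clock.time()
        gained = (now - self.last_refill) * self.refill_rate
        self.tokens = min(self.max_weight, self.tokens + gained)
        self.last_refill = now

    def acquire(self, weight: int) -> float:
        self._refill()
        if self.tokens >= weight:
            self.tokens -= weight
            return 0.0
        wait = (weight - self.tokens) / self.refill_rate
        self.clock.sleep(wait)
        self._refill()
        self.tokens -= weight
        return wait

    def sync_from_header(self, used_weight: int) -> None:
        self.tokens = max(0, self.max_weight - used_weight)
        self.last_refill = self.clock.time()

clock = FakeClock()
bucket = TestableBucket(max_weight=60, refill_seconds=60, clock=clock)
print(bucket.acquire(60))    # 0.0, the bucket starts full
print(bucket.acquire(30))    # 30.0, waits for 30 tokens at 1 token/second
bucket.sync_from_header(50)  # server reports 50 used, so 10 tokens remain
print(bucket.acquire(20))    # 10.0, waits for the missing 10 tokens
```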

WebSocket Streams: Bypassing REST Rate Limits

Here's what separates beginner bot builders from experienced ones: you shouldn't be polling REST endpoints for real-time data at all. Every price check via REST costs weight. WebSocket streams deliver the same data pushed to you with zero weight cost.

import json
import time
import websocket  # pip install websocket-client

def on_message(ws, message):
    # Combined streams wrap each event as {"stream": ..., "data": {...}}
    payload = json.loads(message)
    data = payload.get("data", payload)
    # Mini ticker stream delivers price updates in real time
    print(f"{data['s']}: ${float(data['c']):,.2f} | Vol: {float(data['v']):,.0f}")

def on_error(ws, error):
    print(f"WebSocket error: {error}")

def on_close(ws, close_code, close_msg):
    print(f"Connection closed ({close_code}): {close_msg}")

def start_stream():
    # Subscribe to multiple symbols via combined stream
    streams = "btcusdt@miniTicker/ethusdt@miniTicker/solusdt@miniTicker"
    url = f"wss://stream.binance.com:9443/stream?streams={streams}"
    
    # Reconnect in a loop instead of recursing from on_close
    while True:
        ws = websocket.WebSocketApp(
            url,
            on_message=on_message,
            on_error=on_error,
            on_close=on_close
        )
        ws.run_forever()
        time.sleep(5)  # pause 5 seconds before reconnecting

start_stream()

Use REST calls only for actions that require them: placing orders, checking account balances, and pulling historical klines. For live prices, order book updates, and trade streams, use WebSocket. For a bot that currently polls prices over REST, this change alone can cut request weight consumption dramatically, often by 80% or more.
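A common way to apply this split is to keep a local price cache that the WebSocket handler updates and the strategy reads, so a price check never touches REST. The sketch below is a minimal illustration. The sample message is made up for the demo and carries only the fields used here, and the names latest_prices, handle_ticker_message, and get_price are hypothetical.

```python
import json

# Latest close price per symbol, updated from mini ticker events
latest_prices = {}

def handle_ticker_message(message: str) -> None:
    """Update the cache from one mini ticker message."""
    payload = json.loads(message)
    # Combined streams wrap the event in "data"; raw streams do not
    data = payload.get("data", payload)
    latest_prices[data["s"]] = float(data["c"])

def get_price(symbol: str):
    """Read the cached price, or None if no update has arrived yet."""
    return latest_prices.get(symbol)

# Illustrative message, not captured from a live stream
sample = ('{"stream": "btcusdt@miniTicker", '
          '"data": {"s": "BTCUSDT", "c": "64250.10", "v": "1234.5"}}')
handle_ticker_message(sample)
print(get_price("BTCUSDT"))  # 64250.1
print(get_price("ETHUSDT"))  # None
```

In a live bot, handle_ticker_message would be wired in as the on_message body, and the strategy would call get_price instead of a REST ticker endpoint.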

Binance.US API rate limits are stricter than those of the global Binance API. If you're trading on Binance.US, reduce your weight budget to 1,200 per minute for spot and test conservatively. Yahoo Finance API rate limits are even more restrictive for market data, which is another reason crypto-native APIs are preferred for trading bots.

Common Mistakes and How to Avoid Them

Across years of bot building on Binance, Bybit, and KuCoin, the same patterns keep causing the same problems.

Traders using VoiceOfChain signals for real-time entry points often run into rate limit issues because they try to validate signals by pulling fresh data across dozens of pairs simultaneously. A smarter approach is maintaining WebSocket streams for your watchlist and only making REST calls when you actually need to execute a trade.
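Turning a watchlist into stream subscriptions can be sketched as below. It uses the combined stream URL format from the WebSocket example and the 1,024 streams per connection spot figure from the table. The function name build_stream_urls is illustrative, and very long URLs may hit practical limits this sketch does not handle.

```python
BASE_STREAM_URL = "wss://stream.binance.com:9443/stream?streams="
MAX_STREAMS_PER_CONNECTION = 1024  # spot figure from the table in this guide

def build_stream_urls(symbols, channel="miniTicker",
                      max_streams=MAX_STREAMS_PER_CONNECTION):
    """Return one combined stream URL per chunk of the watchlist."""
    names = [f"{symbol.lower()}@{channel}" for symbol in symbols]
    return [
        BASE_STREAM_URL + "/".join(names[i:i + max_streams])
        for i in range(0, len(names), max_streams)
    ]

# A small max_streams is used here only to show the chunking
urls = build_stream_urls(["BTCUSDT", "ETHUSDT", "SOLUSDT"], max_streams=2)
for url in urls:
    print(url)
```

Each returned URL needs its own WebSocket connection, so a large watchlist becomes a handful of connections rather than hundreds of REST calls.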

On platforms like Gate.io and Bitget, rate limits are structured differently — typically per-endpoint rather than weight-based. This means your rate limiting middleware needs to be exchange-aware, not just a generic throttle. Always read the specific API documentation for each exchange you connect to.
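An exchange-aware throttle can start as a policy table that says how each venue counts requests. The sketch below computes the minimum spacing between calls under either model. The "example_exchange" entry and its numbers are placeholders invented for illustration, not real limits for any exchange, and min_spacing_seconds is a hypothetical helper.

```python
# Rate limit policies per exchange. Only the Binance spot budget comes from
# this guide; "example_exchange" is a made-up per-endpoint policy.
POLICIES = {
    "binance_spot": {"kind": "weight", "budget_per_minute": 6000},
    "example_exchange": {
        "kind": "per_endpoint",
        "calls_per_second": {"/orders": 10},
        "default_calls_per_second": 5,
    },
}

def min_spacing_seconds(exchange: str, endpoint: str, weight: int = 1) -> float:
    """Smallest gap between calls to one endpoint that respects the policy."""
    policy = POLICIES[exchange]
    if policy["kind"] == "weight":
        # Spending `weight` units per call out of a shared per-minute budget
        return 60 * weight / policy["budget_per_minute"]
    rate = policy["calls_per_second"].get(
        endpoint, policy["default_calls_per_second"])
    return 1 / rate

print(min_spacing_seconds("binance_spot", "/api/v3/account", weight=20))  # 0.2
print(min_spacing_seconds("example_exchange", "/orders"))                 # 0.1
```

The weight-based figure assumes the endpoint is the only thing spending the budget. In practice the shared pool is what makes a central limiter necessary.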

Frequently Asked Questions

What happens when you exceed Binance API rate limits?
You first receive HTTP 429 responses with a Retry-After header telling you how long to wait. If you keep sending requests after a 429, Binance escalates to a 418 IP ban, with durations that scale for repeat offenders from 2 minutes up to 3 days. Your bot should pause all requests as soon as it receives a 429 and resume only after the Retry-After period has passed.
Are Binance API rate limits per API key or per IP address?
Request weight limits are per IP address, while order rate limits are counted per account. This means multiple API keys on the same IP share the weight budget, and multiple keys on the same account share the order budget. If you run multiple bots, use separate IPs or implement a centralized rate limiter.
How do Binance futures API rate limits differ from spot?
Binance futures (USD-M and COIN-M) have a lower request weight limit of 2,400 per minute compared to spot's 6,000. However, futures and spot limits are tracked independently — using futures endpoints doesn't reduce your spot budget and vice versa.
Can I increase my Binance API rate limits?
Yes. Binance offers higher rate limits for institutional and VIP traders. You need to contact Binance support or your VIP account manager. Typically, VIP 1+ accounts can request elevated limits. Market makers also receive special rate limit tiers.
What is the Binance kline API rate limit?
The /api/v3/klines endpoint costs 2 weight for up to 500 candles and 5 weight for up to 1,000 candles. With a 6,000 weight budget per minute, you can pull about 3,000 kline requests (500 candles each) per minute before hitting the limit.
Do WebSocket connections count against Binance API rate limits?
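No. WebSocket data streams do not consume request weight. Only REST API calls count. Binance does separately cap connection attempts per IP and inbound messages per connection, so avoid tight reconnect loops. Switching from REST polling to WebSocket streams is still the single most effective way to reduce your rate limit usage.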
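The escalation from 429 to 418 described in the first answer can be handled with a small decision helper. The sketch below prefers the Retry-After header and falls back to exponential backoff when it is missing. The function name next_action, the action labels, and the default delays are assumptions made for this illustration.

```python
def next_action(status_code: int, headers: dict, attempt: int,
                base_delay: float = 1.0, max_delay: float = 60.0):
    """Return (action, wait_seconds).

    action is "proceed" for normal responses, "retry" after a 429,
    and "halt" after a 418 ban.
    """
    if status_code not in (429, 418):
        return "proceed", 0.0
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        wait = float(retry_after)
    else:
        # Fallback: double the delay on each attempt, capped at max_delay
        wait = min(max_delay, base_delay * 2 ** attempt)
    action = "halt" if status_code == 418 else "retry"
    return action, wait

print(next_action(200, {}, attempt=0))                      # ('proceed', 0.0)
print(next_action(429, {"Retry-After": "7"}, attempt=0))    # ('retry', 7.0)
print(next_action(429, {}, attempt=3))                      # ('retry', 8.0)
print(next_action(418, {"Retry-After": "120"}, attempt=0))  # ('halt', 120.0)
```

Separating the decision from the sleeping makes the policy easy to test and lets the caller decide what "halt" means, for example alerting an operator instead of retrying.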

Wrapping Up

Binance API rate limits are not obstacles — they're constraints that force you to write better code. The traders who build robust, rate-limit-aware bots are the ones whose strategies actually run in production for months without intervention. Monitor your weight usage through response headers, use WebSocket streams for real-time data, implement proactive throttling with a token bucket, and always handle 429 and 418 responses gracefully. Your bot's uptime depends on it.
