Binance API Too Many Requests: Fix Rate Limits Fast
Hit the Binance API too many requests error? Learn exact rate limits, Python code fixes, and pro strategies to keep your trading bot running without interruptions.
Your trading bot was humming along — pulling price data, checking order books, executing on signals — and then it stops. HTTP 429. Binance API too many requests. If you've built anything that talks to the Binance API, you've hit this wall at least once. The frustrating part is that 429s don't come with a manual. You either burn hours debugging weight calculations or watch your bot get IP-banned while you figure it out. This guide cuts through that. You'll learn exactly how the Binance API request limit works, why bots blow past it, and the practical Python patterns that keep you under the threshold — permanently.
Binance doesn't limit you by raw request count alone; it uses a weight system. Every endpoint has a weight cost, and this guide works with a budget of 1,200 weight units per minute on the Spot API. Request weight is tracked per IP address, not per API key, and Binance publishes the limits currently in force in the rateLimits array of /api/v3/exchangeInfo, so confirm the number there before hard-coding it. Blow past the budget and you get a 429 response. Keep hammering it and Binance escalates to a 418, which means your IP is banned for a defined period. The ban duration scales with repeat violations, from 2 minutes up to 3 days. Understanding this distinction matters. A 429 is recoverable if you back off immediately. A 418 means stop all requests and wait.
Beyond the per-minute weight limit, there's also a raw request cap of 6,100 requests per 5 minutes regardless of weight. For order operations specifically, Binance enforces a daily cap of 200,000 orders per 24 hours, with an additional sub-limit of 100 orders per 10 seconds. Most data-pulling bots don't hit the order limits, but high-frequency algo traders running on Binance absolutely do. Platforms like OKX and Bybit have their own rate structures, but the Binance API rate limit system with its weight model is among the most granular in the industry.
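Because these numbers can change, it may help to read them from the exchange instead of hard-coding them. The sketch below parses a payload shaped like the `rateLimits` array of `GET /api/v3/exchangeInfo`. The function name `summarize_rate_limits` and the sample payload are illustrative assumptions; the sample values are the figures quoted in this guide, not a live response.

```python
def summarize_rate_limits(exchange_info: dict) -> dict:
    """Map 'TYPE/window' (e.g. 'ORDERS/10S') to its limit."""
    summary = {}
    for rule in exchange_info.get("rateLimits", []):
        # Build a short window label such as "1M", "10S", or "1D"
        window = f"{rule['intervalNum']}{rule['interval'][0]}"
        summary[f"{rule['rateLimitType']}/{window}"] = rule["limit"]
    return summary


# Illustrative payload only: field names follow the exchangeInfo response,
# values are the limits quoted in this guide.
SAMPLE_EXCHANGE_INFO = {
    "rateLimits": [
        {"rateLimitType": "REQUEST_WEIGHT", "interval": "MINUTE", "intervalNum": 1, "limit": 1200},
        {"rateLimitType": "ORDERS", "interval": "SECOND", "intervalNum": 10, "limit": 100},
        {"rateLimitType": "ORDERS", "interval": "DAY", "intervalNum": 1, "limit": 200000},
        {"rateLimitType": "RAW_REQUESTS", "interval": "MINUTE", "intervalNum": 5, "limit": 6100},
    ]
}

limits = summarize_rate_limits(SAMPLE_EXCHANGE_INFO)
print(limits)
```

In a running bot, the same function could be fed the JSON body of a real exchangeInfo call made once at startup.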
| Endpoint | Weight Cost | Notes |
|---|---|---|
| /api/v3/ticker/price (one symbol) | 2 | Cheapest price fetch |
| /api/v3/ticker/price (all symbols) | 4 | Pulls 500+ pairs at once |
| /api/v3/depth (limit ≤ 100) | 5 | Order book snapshot |
| /api/v3/klines (limit ≤ 499) | 2 | Candlestick data |
| /api/v3/klines (limit 500–999) | 5 | 2.5x cost jump; avoid |
| /api/v3/klines (limit = 1000) | 10 | Max candles per call |
| /api/v3/order (GET single) | 4 | Order status lookup |
| /api/v3/openOrders | 40 | Very expensive — don't poll this |
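Always check the X-MBX-USED-WEIGHT-1M header in every Binance API response. It tells you exactly how much weight you've consumed in the current minute window, so you can throttle proactively instead of reacting to 429s after the fact. Endpoint weights are revised from time to time, so treat the table as a snapshot and confirm each value against the endpoint's entry in the Binance API documentation.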
The most common mistake is polling. New bot developers fall into the pattern of calling GET /api/v3/ticker/price in a tight loop — every second, sometimes faster. At weight 2 per call, that's 120 weight per minute if you're pulling a single symbol once per second. Bump it to 10 symbols polled individually and you're already at 1,200 weight per minute — right at the wall — before your bot has executed a single trade. The fix isn't to slow down your data fetching; it's to switch to WebSocket streams, which don't consume any REST weight at all.
Another common trap is fetching all symbols on the ticker endpoint. Calling /api/v3/ticker/price with no symbol parameter returns prices for every trading pair on Binance and costs 4 weight, not 2. That sounds cheap until you're doing it 300 times per minute, which is the entire 1,200-weight budget. Similarly, /api/v3/openOrders costs 40 weight per call, so developers who poll it to detect fills exhaust the per-minute budget in just 30 calls. This pattern also shows up elsewhere: the Yahoo Finance API "too many requests" error follows the same root cause, namely overpolling a REST endpoint instead of using streaming or response caching.
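The arithmetic behind these polling examples is easy to check before deploying a bot. This is a small sketch, assuming the 1,200-weight budget and the per-call weights quoted above; the helper names are hypothetical.

```python
WEIGHT_LIMIT_PER_MINUTE = 1200  # Spot budget figure used in this guide


def weight_per_minute(weight_per_call: int, calls_per_minute: int) -> int:
    """Total weight a fixed polling plan consumes in one minute."""
    return weight_per_call * calls_per_minute


def max_calls_per_minute(weight_per_call: int,
                         limit: int = WEIGHT_LIMIT_PER_MINUTE) -> int:
    """How many calls of a given weight fit into the per-minute budget."""
    return limit // weight_per_call


one_symbol = weight_per_minute(2, 60)        # one symbol, once per second
ten_symbols = weight_per_minute(2, 60 * 10)  # ten symbols, each once per second
all_tickers_cap = max_calls_per_minute(4)    # all-symbols ticker at weight 4
open_orders_cap = max_calls_per_minute(40)   # /api/v3/openOrders at weight 40

print(one_symbol, ten_symbols, all_tickers_cap, open_orders_cap)
```

With these inputs the plan costs come out to 120 and 1,200 weight per minute, and the budget allows 300 all-symbol ticker calls or 30 openOrders calls per minute.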
The first layer of defense is correct error handling. Your bot needs to detect 429 responses, read the Retry-After header, wait the full duration, and then retry. Without this, your bot crashes or triggers the escalating IP ban. Here's a minimal but complete implementation that handles both the 429 and the more serious 418:
```python
import time

import requests

BASE_URL = "https://api.binance.com"


def get_price(symbol: str, max_retries: int = 3) -> dict:
    """Fetch the latest price for one symbol, backing off on HTTP 429."""
    url = f"{BASE_URL}/api/v3/ticker/price"
    params = {"symbol": symbol}

    # A bounded loop replaces unbounded recursion on repeated 429s
    for _ in range(max_retries):
        response = requests.get(url, params=params, timeout=10)

        # 429 = rate limited: wait for the Retry-After period, then retry
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 60))
            print(f"Rate limited. Sleeping {retry_after}s...")
            time.sleep(retry_after)
            continue

        # 418 = IP banned: do not retry, stop immediately
        if response.status_code == 418:
            raise RuntimeError(
                "IP banned by Binance. Halt all requests and wait before resuming."
            )

        response.raise_for_status()
        return response.json()

    raise RuntimeError("Still rate limited after retries; giving up.")


data = get_price("BTCUSDT")
print(f"BTC price: ${float(data['price']):,.2f}")
```
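When a response carries no Retry-After header, a fixed 60-second sleep is only one option. A common alternative, shown here as a sketch and not as anything Binance prescribes, is capped exponential backoff with full jitter. The function name `backoff_delay` and its default `base` and `cap` values are assumptions for illustration.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Prefers the server's Retry-After value when it is a whole number of
    seconds; otherwise falls back to capped exponential backoff with
    full jitter (a random wait between 0 and the current ceiling).
    """
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


print(backoff_delay(0, retry_after="30"))  # uses the server-specified wait
print(backoff_delay(3))                    # random value between 0 and 8
```

The jitter spreads retries out when several bot instances share one IP, so they do not all wake up and hit the API in the same instant.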
That handles the reactive case: what to do after you've already hit the wall. The smarter approach is proactive throttling: read the X-MBX-USED-WEIGHT-1M header on every response and pause before you breach the limit. Here's a client class that tracks weight usage and backs off automatically. It attaches an API key header when one is supplied; endpoints that require a signature would also need HMAC-SHA256 request signing, which this class does not implement.
```python
import time
from typing import Any

import requests


class BinanceClient:
    BASE_URL = "https://api.binance.com"
    WEIGHT_LIMIT = 1200   # per 1-minute window
    SAFETY_RATIO = 0.80   # pause when 80% consumed

    def __init__(self, api_key: str = ""):
        self.session = requests.Session()
        if api_key:
            self.session.headers["X-MBX-APIKEY"] = api_key
        self.weight_used = 0

    def _request(self, method: str, endpoint: str, **kwargs) -> Any:
        url = f"{self.BASE_URL}{endpoint}"
        kwargs.setdefault("timeout", 10)  # avoid hanging on a stalled connection
        for attempt in range(3):
            resp = self.session.request(method, url, **kwargs)
            # Weight consumed so far in the current 1-minute window
            self.weight_used = int(
                resp.headers.get("X-MBX-USED-WEIGHT-1M", 0)
            )

            if resp.status_code == 429:
                wait = int(resp.headers.get("Retry-After", 60))
                print(f"[429] Rate limited. Waiting {wait}s (attempt {attempt + 1}/3)")
                time.sleep(wait)
                continue

            if resp.status_code == 418:
                raise RuntimeError("[418] IP banned. Stop all requests immediately.")

            # Proactive pause when approaching the weight ceiling:
            # sleep until just after the next wall-clock minute begins
            if self.weight_used > self.WEIGHT_LIMIT * self.SAFETY_RATIO:
                pause = 61 - (int(time.time()) % 60)
                print(f"[Throttle] {self.weight_used}/{self.WEIGHT_LIMIT} used. Pausing {pause}s")
                time.sleep(pause)

            resp.raise_for_status()
            return resp.json()

        raise RuntimeError("Max retries exceeded")

    def get_klines(self, symbol: str, interval: str, limit: int = 200) -> list:
        # Weight: 2 for limit<=499, 5 for 500-999, 10 for 1000
        return self._request("GET", "/api/v3/klines",
                             params={"symbol": symbol, "interval": interval, "limit": limit})

    def get_order_book(self, symbol: str, depth: int = 20) -> dict:
        return self._request("GET", "/api/v3/depth",
                             params={"symbol": symbol, "limit": depth})


# Usage
client = BinanceClient(api_key="YOUR_KEY_HERE")
candles = client.get_klines("ETHUSDT", "15m", limit=200)
print(f"Fetched {len(candles)} candles | Weight used: {client.weight_used}/1200")
```
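The throttle decision inside the client can be pulled out into two pure functions, which makes it testable without touching the network. This sketch assumes, as the client above does, that the weight window resets at the start of each wall-clock minute; the function names are hypothetical.

```python
def should_pause(weight_used: int, limit: int = 1200,
                 safety_ratio: float = 0.80) -> bool:
    """True once consumption exceeds the safety fraction of the budget."""
    return weight_used > limit * safety_ratio


def seconds_until_next_minute(now_epoch: int, margin: int = 1) -> int:
    """Seconds until the next wall-clock minute starts, plus a safety margin.

    Assumes the weight window resets on minute boundaries.
    """
    return (60 - now_epoch % 60) + margin


print(should_pause(950), should_pause(1000))
print(seconds_until_next_minute(125))  # 5 seconds into a minute
```

With the defaults, the pause threshold sits at 960 weight, and a request made 5 seconds into a minute would wait 56 seconds.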
WebSocket streams are the permanent fix for rate limit problems on real-time data. Unlike REST calls, WebSocket connections push updates to your bot — no polling, zero weight consumed. Binance offers streams for individual trades, aggregated trades, candlestick updates, order book diffs, and user account events. The rule is simple: anything you need continuously should come from a WebSocket. REST calls should be reserved for one-time operations — loading historical candles at startup, placing orders, fetching account balance.
Services like VoiceOfChain use persistent WebSocket connections at the infrastructure level to ingest real-time market data from Binance without consuming any REST API weight. That's how a signal platform can monitor hundreds of pairs simultaneously and deliver alerts the moment conditions are met — no polling delay, no rate limit risk. If your bot spends more than 20% of its weight budget just keeping prices current, WebSocket streams are the fix.
```python
import asyncio
import json

import websockets


async def stream_trades(symbols: list[str]):
    """Subscribe to the real-time aggregated trade stream (no REST weight, no API key needed)."""
    streams = "/".join(f"{s.lower()}@aggTrade" for s in symbols)
    url = f"wss://stream.binance.com:9443/stream?streams={streams}"

    async with websockets.connect(url, ping_interval=20) as ws:
        print(f"Connected | Streaming: {', '.join(symbols)}")
        async for raw_msg in ws:
            msg = json.loads(raw_msg)
            trade = msg["data"]
            symbol = trade["s"]
            price = float(trade["p"])
            quantity = float(trade["q"])
            is_sell = trade["m"]  # True = buyer is market maker = sell side
            side = "SELL" if is_sell else "BUY "
            print(f"{symbol} | {side} | ${price:>12,.4f} | qty: {quantity}")


# Streams BTC, ETH, and SOL trades simultaneously with zero rate limit impact
asyncio.run(stream_trades(["BTCUSDT", "ETHUSDT", "SOLUSDT"]))
```
A single Binance WebSocket connection supports up to 1,024 combined stream subscriptions. You can monitor hundreds of trading pairs for price, depth, and candlestick updates through one connection — consuming no REST API weight at all.
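If a watchlist ever grows past that 1,024-stream ceiling, the subscriptions need to be split across connections. The sketch below builds one combined-stream URL per connection; `build_stream_urls` is a hypothetical helper, and the stream names follow the `symbol@channel` pattern used in the example above.

```python
STREAM_BASE = "wss://stream.binance.com:9443/stream?streams="
MAX_STREAMS_PER_CONNECTION = 1024


def build_stream_urls(symbols: list[str], channels: list[str],
                      max_streams: int = MAX_STREAMS_PER_CONNECTION) -> list[str]:
    """Return one combined-stream URL for each connection that is needed."""
    # One stream name per (symbol, channel) pair, e.g. "btcusdt@aggTrade"
    names = [f"{s.lower()}@{c}" for s in symbols for c in channels]
    return [
        STREAM_BASE + "/".join(names[i:i + max_streams])
        for i in range(0, len(names), max_streams)
    ]


urls = build_stream_urls(["BTCUSDT", "ETHUSDT"], ["aggTrade", "kline_1m"])
print(len(urls))
print(urls[0])
```

Very long URLs can become impractical well before the stream limit is reached, so for large watchlists it may be preferable to connect first and then send SUBSCRIBE messages over the open socket.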
If you're building a multi-exchange system — common for arbitrage strategies or aggregating signals across Binance, Bybit, and OKX — each exchange has its own rate limit model. Knowing the differences lets you budget requests correctly across venues. Bybit uses a per-second model rather than weight-based: most public endpoints allow 10–20 requests per second, with institutional accounts getting higher thresholds. OKX applies tiered limits based on account level, generally 20 REST requests per 2 seconds per endpoint category. Gate.io and KuCoin both use points-per-second systems that function similarly to Binance's weight approach. Bitget recently standardized its limits around 10 requests per second per endpoint for spot data. Coinbase Advanced Trade API is generally more generous on market data but stricter on order operations — 10 requests per second for order placement.
| Exchange | Limit Model | Market Data Limit | WebSocket |
|---|---|---|---|
| Binance | Weight/minute | 1,200 weight/min | Yes — all streams |
| Bybit | Requests/second | 10–20 req/sec | Yes — all streams |
| OKX | Requests/interval | 20 req / 2 sec | Yes — all streams |
| KuCoin | Points/second | 30 points / 3 sec | Yes — all streams |
| Gate.io | Requests/second | 10–100 req/sec | Yes — limited |
| Bitget | Requests/second | 10 req/sec | Yes — all streams |
| Coinbase | Requests/second | 10 req/sec (orders) | Yes — price only |
The pattern is consistent across all of them: exchanges are relatively generous on public market data and strict on authenticated order endpoints. Whether you're on Binance, OKX, or Gate.io, a bot that polls REST endpoints for price data will eventually hit limits. The architecture fix (WebSockets for live data, REST for actions) applies everywhere.
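For the exchanges that count plain requests per interval, a budget translates directly into a minimum spacing between calls. The sketch below uses figures copied from the comparison table as placeholders; the `RequestBudget` class is hypothetical, and each number should be verified against the exchange's own documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestBudget:
    requests: int     # allowed requests per window
    window_s: float   # window length in seconds

    def min_interval(self) -> float:
        """Smallest even spacing (seconds) that stays within the budget."""
        return self.window_s / self.requests


# Placeholder figures taken from the table above; verify before relying on them.
BUDGETS = {
    "Bybit": RequestBudget(10, 1.0),   # lower bound of the 10-20 req/sec range
    "OKX": RequestBudget(20, 2.0),     # 20 requests per 2 seconds
    "Bitget": RequestBudget(10, 1.0),  # 10 requests per second
}

for venue, budget in BUDGETS.items():
    print(f"{venue}: one request every {budget.min_interval():.2f}s")
```

Spacing requests evenly is more conservative than bursting up to the cap, which leaves headroom when several components of a bot share one budget.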
The Binance API too many requests error is almost always a design problem, not a capacity problem. The default weight allowance of 1,200 per minute is generous for a well-architected bot. The ones that hit it are polling REST endpoints that should be streaming, or fetching data at higher limits than necessary. Fix the architecture — WebSockets for live data, weight-aware REST clients for one-time fetches and order operations — and 429s become a thing of the past. The same logic applies whether you're building on Binance, Bybit, OKX, or Bitget. Rate limits exist to separate bots that are designed well from the ones that aren't. Now you know which side to be on.