Crypto Exchange API Rate Limits: Full Comparison Guide
Compare API rate limits on Binance, Bybit, OKX, Coinbase, and KuCoin. Learn to handle throttling, build retry logic, and keep your trading bot running.
You've built a trading bot that works perfectly in testing. It fires orders, pulls data, manages positions — everything runs clean. Then you deploy it against a live exchange and suddenly requests start failing with 429 errors. Welcome to rate limits: the invisible ceiling every API-driven trader eventually collides with. Understanding how each exchange enforces them — and how to code around them — is the difference between a bot that runs 24/7 and one that crashes during the most volatile moments of the market.
Rate limits exist because exchanges operate shared infrastructure. Every API call you make competes with thousands of other bots and developers hitting the same endpoints. Without limits, a poorly written bot — or a malicious one — could saturate the exchange's servers, causing latency spikes for everyone. From the exchange's perspective, rate limits are fair-use enforcement. From yours, they're a hard constraint you need to architect around before you write a single line of trading logic.
Most exchanges implement rate limits in one of two models. The first is a simple request counter (per second or per minute), where each API call subtracts from a fixed quota that refills as the window resets or slides. The second, used by Binance, is a weight-based system: each endpoint carries a cost in 'weight' units, and your budget is measured in total weight consumed per minute rather than raw request count. A single-symbol ticker price fetch costs 2 weight; a full-depth order book snapshot (limit=5000) costs 250. One heavy endpoint can quietly drain your entire budget if you're not paying attention.
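To make the weight model concrete, here is a minimal client-side sketch that tracks weight spent over a sliding 60-second window. The `WeightBudget` class and its methods are hypothetical (not part of any exchange SDK), and the exchange's own counter remains authoritative. A sliding window is deliberately conservative: if you never exceed the limit in any 60-second span, you also stay under a fixed per-minute window.

```python
import time
from collections import deque


class WeightBudget:
    """Hypothetical client-side tracker for a weight-based rate limit."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit      # total weight allowed per window
        self.window = window    # window length in seconds
        self.clock = clock      # injectable so the logic can be tested
        self.events = deque()   # (timestamp, weight) pairs, oldest first

    def _used(self, now):
        # Drop spends that have aged out of the window, then total the rest
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()
        return sum(weight for _, weight in self.events)

    def used(self):
        return self._used(self.clock())

    def try_spend(self, weight):
        """Record `weight` and return True if it fits; otherwise return False."""
        if weight > self.limit:
            raise ValueError('request weight exceeds the whole budget')
        now = self.clock()
        if self._used(now) + weight > self.limit:
            return False
        self.events.append((now, weight))
        return True

    def wait_time(self, weight):
        """Seconds until `weight` would fit (0.0 if it fits now)."""
        if weight > self.limit:
            raise ValueError('request weight exceeds the whole budget')
        now = self.clock()
        used = self._used(now)
        if used + weight <= self.limit:
            return 0.0
        # Free weight oldest-first until the request would fit
        for stamp, spent in self.events:
            used -= spent
            if used + weight <= self.limit:
                return stamp + self.window - now
        return 0.0  # not reached: weight <= limit always fits once all spends expire


# Usage: 6,000/min is Binance's spot figure at the time of writing
budget = WeightBudget(limit=6000)
if not budget.try_spend(250):  # a limit=5000 depth snapshot
    time.sleep(budget.wait_time(250))
```

Checking the budget before each call is the pre-emptive half of the job; reading the exchange's usage header after each call keeps the local estimate honest.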
Hitting a rate limit doesn't just slow you down. On Binance, continuing to send requests after receiving 429 responses escalates to an HTTP 418 IP ban, and bans for repeat offenders scale from 2 minutes up to 3 days. Structure your request layer defensively from day one, not as an afterthought.
Every major exchange has its own approach to rate limiting, and the differences are significant enough to affect how you architect your bot. Binance's weight system is the most complex but also the most flexible once you understand it. Bybit uses a simpler sliding-window model that's forgiving for bursty order flow. OKX distinguishes between public and private endpoint categories with separate limits for each. Coinbase Advanced Trade applies flat per-second limits, separate for public and private endpoints. KuCoin measures its public limits over a longer 10-second window. Exchanges revise these numbers periodically, so treat the table as a snapshot and confirm current values in each exchange's API documentation. Here's the side-by-side breakdown.
| Exchange | General REST limit | Window | Order/trade endpoint limit | WebSocket limit |
|---|---|---|---|---|
| Binance | 6,000 weight/min (per IP) | 1 min (rolling) | Weight 1 per order, plus separate order-count limits | 1,024 streams/conn |
| Bybit | 120 req / 5 sec | 5 sec (sliding) | 10 req/s (linear) | 10 topics/conn |
| OKX | 20 req / 2 sec (market data) | 2 seconds | 60 req / 2 sec (trade) | Varies by channel |
| Coinbase | 10 req/s (public) | 1 second | 30 req/s (private) | Available |
| KuCoin | 100 req / 10 sec (public) | 10 seconds | 45 req / 3 sec (private) | 100 topics/conn |
Binance is the most widely used exchange for algorithmic trading, and its weight system is developer-friendly once you internalize it. A single-symbol GET /api/v3/ticker/price call costs 2 weight, so a 6,000-weight budget covers up to 3,000 of them per minute. Pulling a deep order book via GET /api/v3/depth with limit=5000 costs 250 weight, so only 24 such calls fit in a minute before you hit the ceiling. On Bybit, the model is simpler: most private endpoints allow 120 requests per 5-second sliding window, roughly 24 requests per second sustained. OKX splits its limits by category: market data is more restrictive (20 req/2s) while trade execution is more permissive (60 req/2s), letting you allocate your budget by function.
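For count-based limits split by category, such as the OKX figures above, a blocking limiter keyed by category keeps each budget separate. This is a sketch: the `CategoryLimiter` class and the category names are hypothetical, the (20, 2.0) and (60, 2.0) pairs are the figures quoted above, and OKX trade limits may also be scoped per instrument, which this does not model.

```python
import threading
import time
from collections import deque


class CategoryLimiter:
    """Hypothetical sliding-window request counters, one per endpoint category."""

    def __init__(self, limits, clock=time.monotonic, sleep=time.sleep):
        self.limits = limits  # {category: (max_requests, window_seconds)}
        self.clock = clock
        self.sleep = sleep
        self.history = {name: deque() for name in limits}
        self.lock = threading.Lock()

    def acquire(self, category):
        """Block until a request in `category` fits, then record it."""
        max_requests, window = self.limits[category]
        stamps = self.history[category]
        while True:
            with self.lock:
                now = self.clock()
                # Forget requests that have left the window
                while stamps and now - stamps[0] >= window:
                    stamps.popleft()
                if len(stamps) < max_requests:
                    stamps.append(now)
                    return
                # Wait until the oldest request in the window ages out
                delay = stamps[0] + window - now
            self.sleep(delay)  # sleep outside the lock so other categories proceed


# Usage: market data and trading draw from separate budgets
limiter = CategoryLimiter({'market': (20, 2.0), 'trade': (60, 2.0)})
limiter.acquire('market')  # call before each market-data request
```

Keeping the buckets separate means a burst of ticker polling can never starve order placement, which is the point of OKX's category split.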
Several exchanges report rate limit status directly in the HTTP response headers. Reading these headers lets your bot self-regulate in real time instead of waiting for a 429 error to learn it has gone too far. Binance sends X-MBX-USED-WEIGHT-1M, the weight you've consumed in the current one-minute window. Bybit includes X-Bapi-Limit-Status (requests remaining) and X-Bapi-Limit-Reset-Timestamp (the millisecond timestamp at which the limit resets). The OKX example below reads an OK-ACCESS-RATE-LIMIT-REMAINING header; confirm in OKX's documentation that your endpoints return it, and fall back to counting requests on the client side where they don't. Build header-reading into every request wrapper you write from the start: it's the cleanest way to stay under limits without hardcoding fragile sleep timers.
```python
import time

import requests

# Binance spot REQUEST_WEIGHT limit (6,000/min at the time of writing).
# The current value is published in GET /api/v3/exchangeInfo under rateLimits.
WEIGHT_LIMIT = 6000
BACKOFF_THRESHOLD = int(WEIGHT_LIMIT * 0.8)  # start backing off at 80%


def get_binance_price(symbol):
    url = 'https://api.binance.com/api/v3/ticker/price'
    response = requests.get(url, params={'symbol': symbol}, timeout=10)
    response.raise_for_status()

    # Weight consumed in the current one-minute window
    used_weight = int(response.headers.get('X-MBX-USED-WEIGHT-1M', 0))
    print(f'Binance weight used: {used_weight}/{WEIGHT_LIMIT}')

    # Back off before the limit; don't wait until you hit the wall
    if used_weight > BACKOFF_THRESHOLD:
        print('Approaching rate limit, sleeping 2s')
        time.sleep(2)

    return response.json()


result = get_binance_price('BTCUSDT')
price = result['price']
print(f'BTC/USDT: {float(price):,.2f} USDT')
```
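Bybit's headers lend themselves to the same treatment. The helpers below are hypothetical sketches: `bybit_wait_seconds` turns X-Bapi-Limit-Status and X-Bapi-Limit-Reset-Timestamp into a pause length, and `bybit_is_throttled` checks for retCode 10006, which Bybit's V5 error list describes as exceeding the API rate limit (verify against the current docs). Bybit can report that code inside an HTTP 200 body, so a status-code check alone misses it. Both helpers take plain values, so you can pass `response.headers` and `response.json()` from a requests call.

```python
import time


def bybit_wait_seconds(headers, now_ms, min_remaining=2):
    """Seconds to pause before the next call, based on Bybit's limit headers."""
    remaining = headers.get('X-Bapi-Limit-Status')
    reset_ms = headers.get('X-Bapi-Limit-Reset-Timestamp')
    if remaining is None or reset_ms is None:
        return 0.0  # headers absent (e.g. a public endpoint): nothing to act on
    if int(remaining) > min_remaining:
        return 0.0  # plenty of budget left
    # Sleep until the reset timestamp; never return a negative delay
    return max(0.0, (int(reset_ms) - now_ms) / 1000.0)


def bybit_is_throttled(payload):
    """True if a decoded Bybit V5 response body reports rate limiting."""
    return payload.get('retCode') == 10006


# Usage after a requests call:
# time.sleep(bybit_wait_seconds(response.headers, int(time.time() * 1000)))
```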
For private endpoints on OKX, each request must be authenticated with a per-request signature: an HMAC-SHA256 over timestamp + method + request path (including any query string) + body, keyed with your API secret, Base64-encoded, and sent alongside your API key and passphrase. Because the timestamp is part of the signed message, and OKX rejects requests whose timestamp drifts too far from server time, the signature must be regenerated for every call; you cannot cache it. Once authentication is wired up correctly, handle limits the same way as on Binance: read whatever rate-limit information each response carries and use it to govern your request cadence proactively.
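```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import urlencode

import requests

API_KEY = 'your_okx_api_key'
SECRET_KEY = 'your_okx_secret'
PASSPHRASE = 'your_passphrase'
BASE_URL = 'https://www.okx.com'


def okx_signature(timestamp, method, request_path, body=''):
    # Prehash string: timestamp + METHOD + request path (with query string) + body
    message = timestamp + method + request_path + body
    mac = hmac.new(SECRET_KEY.encode(), message.encode(), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode()


def okx_get(path, params=None):
    # For GET requests the query string is part of the signed request path,
    # so build it once and use the same string for signing and sending
    request_path = path + ('?' + urlencode(params) if params else '')
    # ISO 8601 UTC timestamp with millisecond precision, e.g. 2024-01-01T00:00:00.000Z
    ts = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'
    headers = {
        'OK-ACCESS-KEY': API_KEY,
        'OK-ACCESS-SIGN': okx_signature(ts, 'GET', request_path),
        'OK-ACCESS-TIMESTAMP': ts,
        'OK-ACCESS-PASSPHRASE': PASSPHRASE,
        'Content-Type': 'application/json',
    }
    response = requests.get(BASE_URL + request_path, headers=headers, timeout=10)
    response.raise_for_status()

    # Remaining-calls header, if the response carries one ('N/A' otherwise)
    remaining = response.headers.get('OK-ACCESS-RATE-LIMIT-REMAINING', 'N/A')
    print(f'OKX calls remaining in window: {remaining}')

    payload = response.json()
    # OKX reports API-level errors in the body with a code other than '0'
    if payload.get('code') != '0':
        raise RuntimeError(f"OKX error {payload.get('code')}: {payload.get('msg')}")
    return payload


# Account balance: this endpoint allows 10 req/2s
balance = okx_get('/api/v5/account/balance')
total_eq = balance['data'][0]['totalEq']
print(f'Total account equity: {float(total_eq):,.2f} USD')
```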
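The most common mistake developers make is treating rate limits as an afterthought: dropping a sleep(0.1) between requests and hoping it holds. It doesn't. A production-grade implementation needs three layers: pre-emptive throttling (slow down before hitting the limit, not after), reactive backoff (handle 429 responses gracefully when they happen anyway), and circuit breaking (pause all activity if you've triggered a ban). The following decorator covers the reactive side: it retries 429 responses with exponential backoff and jitter, honors a numeric Retry-After header when present, and pauses on Binance's 418 ban. Pre-emptive throttling and a process-wide circuit breaker still need to be layered around it. It works against Binance, Bybit, KuCoin, or any other exchange that signals throttling with standard HTTP status codes; exchanges that report throttling inside an HTTP 200 body (Bybit's retCode 10006, for example) need an extra check on the decoded response.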
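```python
import logging
import random
import time
from functools import wraps

import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def _retry_after_seconds(response, default):
    # Retry-After is normally a number of seconds; fall back to `default`
    # if it is missing or not numeric (it may also be an HTTP date)
    try:
        return float(response.headers['Retry-After'])
    except (KeyError, ValueError):
        return default


def with_rate_limit_retry(max_retries=5, base_delay=1.0, ban_delay=300.0):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                response = func(*args, **kwargs)

                if response.status_code == 429:
                    # Exponential backoff with jitter, unless the server says how long
                    backoff = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
                    delay = _retry_after_seconds(response, backoff)
                    logger.warning(
                        'Rate limited. Waiting %.1fs (attempt %d/%d)',
                        delay, attempt + 1, max_retries,
                    )
                    time.sleep(delay)
                    continue

                if response.status_code == 418:  # Binance IP ban
                    # Binance sends Retry-After with 418s; default to 5 minutes
                    delay = _retry_after_seconds(response, ban_delay)
                    logger.error('Binance IP ban triggered. Pausing %.0fs.', delay)
                    time.sleep(delay)
                    continue

                response.raise_for_status()
                return response
            raise RuntimeError(
                f'Max retries ({max_retries}) exceeded; check rate limit config'
            )
        return wrapper
    return decorator


@with_rate_limit_retry(max_retries=5)
def fetch_bybit_klines(symbol, interval, limit=200):
    url = 'https://api.bybit.com/v5/market/kline'
    params = {
        'category': 'linear',
        'symbol': symbol,
        'interval': interval,
        'limit': limit,
    }
    return requests.get(url, params=params, timeout=10)


# Retries HTTP 429s with exponential backoff; the 418 branch matters only for Binance
response = fetch_bybit_klines('BTCUSDT', '60')
data = response.json()
# Bybit reports API-level errors in the body, so check retCode before using result
if data.get('retCode') != 0:
    raise RuntimeError(f"Bybit error {data.get('retCode')}: {data.get('retMsg')}")
candles = data['result']['list']
print(f'Fetched {len(candles)} candles for BTCUSDT 1H')
```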
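The decorator pauses only the call that got throttled. For the circuit-breaking layer, every thread that talks to the same exchange should stop once a ban is detected. A minimal sketch, assuming a single process: the `CircuitBreaker` class is hypothetical; call `wait_if_open()` before each request and `trip()` with the Retry-After value when a 418 arrives.

```python
import threading
import time


class CircuitBreaker:
    """Hypothetical process-wide pause switch for rate-limit bans."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self._open_until = 0.0
        self._lock = threading.Lock()

    def trip(self, seconds):
        # Extend (never shorten) the pause, so overlapping bans keep the longest
        with self._lock:
            self._open_until = max(self._open_until, self.clock() + seconds)

    def remaining(self):
        """Seconds left before requests may resume (0.0 when closed)."""
        with self._lock:
            return max(0.0, self._open_until - self.clock())

    def wait_if_open(self, sleep=time.sleep):
        # Every request path calls this first, so one ban pauses all of them
        delay = self.remaining()
        if delay > 0:
            sleep(delay)


# Usage: one shared instance per exchange
binance_breaker = CircuitBreaker()
binance_breaker.wait_if_open()  # before each Binance request
```

Sharing one breaker per exchange, rather than one per function, is what turns a single 418 into a pause for the whole bot instead of just the endpoint that happened to trigger it.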
API rate limits aren't a problem you solve once and forget — they're an ongoing architectural constraint that grows more important as your bot becomes more sophisticated. Start by internalizing each exchange's specific model: Binance's weight system rewards endpoint efficiency, Bybit's sliding window tolerates bursty trading activity, and OKX's category-based approach lets you allocate budget by function. Build header-reading and exponential backoff into your core request layer from the first commit. Shift market data consumption from REST polling to WebSocket streams wherever possible — it's the single highest-leverage change you can make to your rate limit budget. And for signal delivery and market alerts, tools like VoiceOfChain eliminate the need to poll price data yourself entirely, freeing your entire API quota for execution where precision actually matters.
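As a first step toward that WebSocket shift, here is a small helper that packs symbols into Binance combined-stream URLs (wss://stream.binance.com:9443/stream?streams=a/b/c), splitting them so no single connection exceeds a stream cap. The helper name is hypothetical, the 1,024-streams-per-connection default reflects Binance's documentation at the time of writing, and consuming the streams requires a WebSocket client library, which is not shown.

```python
BINANCE_WS_BASE = 'wss://stream.binance.com:9443/stream?streams='


def combined_stream_urls(symbols, stream='trade', max_streams=1024):
    """Build one combined-stream URL per batch of at most `max_streams` streams."""
    # Binance stream names use lowercase symbols, e.g. btcusdt@trade
    names = [f'{symbol.lower()}@{stream}' for symbol in symbols]
    return [
        BINANCE_WS_BASE + '/'.join(names[i:i + max_streams])
        for i in range(0, len(names), max_streams)
    ]


# Usage: one connection streams trades for both pairs, with no REST weight spent
urls = combined_stream_urls(['BTCUSDT', 'ETHUSDT'])
```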