Binance API Historical Data: Complete Trader's Guide
Learn how to fetch, parse, and analyze Binance API historical data with Python. Covers endpoints, limits, futures data, and real trading use cases.
If you're building a trading bot, running a backtest, or just trying to understand how a coin behaved during a specific market event, historical OHLCV data is the starting point. Binance offers one of the most accessible APIs for pulling this data — free, fast, and well-documented. But there are gotchas: rate limits, pagination quirks, endpoint differences between spot and futures, and timestamp handling that will silently break your data if you're not careful. This guide walks through everything you need to pull clean historical data from Binance, with real Python code you can run today.
Binance API historical price data lives primarily in the `/api/v3/klines` endpoint for spot markets. Each kline (candlestick) returns open time, open, high, low, close, volume, close time, quote asset volume, number of trades, taker buy base/quote volume, and an ignored field. That's 12 values per candle — more than most traders use, but useful for volume analysis.
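As a rough sketch of the 12-field layout described above, the raw array can be mapped onto named fields. The `Kline` class and `parse_kline` helper are hypothetical names introduced for illustration, and the sample row uses made-up values rather than real market data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Kline:
    """One candle with the 11 meaningful fields (the 12th is ignored)."""
    open_time: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float
    close_time: datetime
    quote_volume: float
    trades: int
    taker_buy_base: float
    taker_buy_quote: float


def _to_utc(ms: int) -> datetime:
    # Binance timestamps are Unix milliseconds
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def parse_kline(row: list) -> Kline:
    """Convert one raw 12-element kline array into a typed record."""
    if len(row) != 12:
        raise ValueError(f"expected 12 fields, got {len(row)}")
    return Kline(
        open_time=_to_utc(row[0]),
        open=float(row[1]),      # prices and volumes arrive as strings
        high=float(row[2]),
        low=float(row[3]),
        close=float(row[4]),
        volume=float(row[5]),
        close_time=_to_utc(row[6]),
        quote_volume=float(row[7]),
        trades=int(row[8]),
        taker_buy_base=float(row[9]),
        taker_buy_quote=float(row[10]),
    )


# Illustrative values only, not real market data.
sample = [1700002800000, "100.0", "110.0", "95.0", "105.0", "12.5",
          1700006399999, "1300.0", 42, "7.5", "780.0", "0"]
candle = parse_kline(sample)
print(candle.open_time.isoformat(), candle.close, candle.trades)
```

Keeping the taker buy fields around costs little and makes later volume analysis possible without refetching.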
The endpoint accepts a `symbol` (like BTCUSDT), an `interval` (1m, 5m, 1h, 1d, etc.), and optional `startTime` and `endTime` in milliseconds. The `limit` parameter controls how many candles you get back: on spot it defaults to 500 and tops out at 1000 candles per request. That's the hard ceiling. If you want more, you need to paginate by shifting your `startTime` forward using the last candle's close time.
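To get a feel for how much pagination a date range implies, the arithmetic can be sketched as below. `INTERVAL_MS` and `requests_needed` are hypothetical names, and the interval table covers only a handful of the intervals Binance supports.

```python
# Milliseconds per candle for a few common intervals (not the full list).
INTERVAL_MS = {
    "1m": 60_000,
    "5m": 300_000,
    "1h": 3_600_000,
    "1d": 86_400_000,
}


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def requests_needed(start_ms: int, end_ms: int, interval: str, limit: int = 1000) -> int:
    """Estimate how many paginated calls cover [start_ms, end_ms)."""
    span = max(0, end_ms - start_ms)
    candles = ceil_div(span, INTERVAL_MS[interval])
    return ceil_div(candles, limit)


DAY_MS = 86_400_000
print(requests_needed(0, 30 * DAY_MS, "1h"))   # 720 candles fit in one call
print(requests_needed(0, 365 * DAY_MS, "1m"))  # 525,600 candles need many calls
```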
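Timestamps on Binance are Unix milliseconds, not seconds. Passing seconds does not raise an error: the value is read as a moment in January 1970, so a seconds-based `startTime` returns the symbol's earliest candles and a seconds-based `endTime` returns an empty list. Always multiply your Unix timestamp by 1000, or use `int(dt.timestamp() * 1000)` on a timezone-aware `datetime` (a naive one is interpreted in your machine's local timezone).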
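A small conversion helper makes the milliseconds rule harder to get wrong. This is a sketch: `to_millis` and `looks_like_seconds` are hypothetical names, and treating naive datetimes as UTC is an assumption made here for convenience, not something Binance or Python mandates.

```python
from datetime import datetime, timezone


def to_millis(dt: datetime) -> int:
    """Convert a datetime to Unix milliseconds.

    Assumption: a naive datetime is meant as UTC. Without this step,
    .timestamp() would interpret it in the machine's local timezone.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def looks_like_seconds(ts: int) -> bool:
    """Heuristic guard: values below 1e11 are almost certainly seconds.

    1e11 milliseconds falls in 1973, long before any crypto candle exists.
    """
    return ts < 100_000_000_000


start_ms = to_millis(datetime(2024, 1, 1))
print(start_ms, looks_like_seconds(start_ms), looks_like_seconds(start_ms // 1000))
```

Calling `looks_like_seconds` on `startTime` and `endTime` before each request is a cheap way to catch the mistake early.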
You don't need an API key to pull Binance historical data with Python: the klines endpoint is public. Authentication is only required for account-level endpoints like orders and balances. Here's a minimal working example to pull 30 days of hourly BTC/USDT candles:
```python
import requests
import pandas as pd
from datetime import datetime, timedelta, timezone

BASE_URL = "https://api.binance.com"


def get_klines(symbol: str, interval: str, start_dt: datetime, end_dt: datetime) -> pd.DataFrame:
    """Fetch spot klines between two timezone-aware datetimes, paginating as needed."""
    url = f"{BASE_URL}/api/v3/klines"
    start_ms = int(start_dt.timestamp() * 1000)
    end_ms = int(end_dt.timestamp() * 1000)
    all_candles = []
    while start_ms < end_ms:
        params = {
            "symbol": symbol,
            "interval": interval,
            "startTime": start_ms,
            "endTime": end_ms,
            "limit": 1000,
        }
        resp = requests.get(url, params=params, timeout=10)
        resp.raise_for_status()
        data = resp.json()
        if not data:
            break
        all_candles.extend(data)
        # advance past the last candle's close time
        start_ms = data[-1][6] + 1
    columns = [
        "open_time", "open", "high", "low", "close", "volume",
        "close_time", "quote_volume", "trades",
        "taker_buy_base", "taker_buy_quote", "ignore"
    ]
    df = pd.DataFrame(all_candles, columns=columns)
    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
    # prices and volumes arrive as strings
    df[["open", "high", "low", "close", "volume"]] = df[
        ["open", "high", "low", "close", "volume"]
    ].astype(float)
    return df[["open_time", "open", "high", "low", "close", "volume"]]


if __name__ == "__main__":
    # Use an aware "now": .timestamp() would read a naive utcnow() as local time
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=30)
    df = get_klines("BTCUSDT", "1h", start, end)
    print(df.head())
    print(f"Total candles: {len(df)}")
```
This handles pagination automatically. Notice we advance `start_ms` using `data[-1][6] + 1` — that's the close time of the last candle plus one millisecond, which prevents duplicate candles at the boundary. Without this, you'll either loop forever or get overlapping data.
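It can be worth verifying the result instead of trusting the pagination logic. The sketch below scans a list of open times for duplicates and gaps; `check_continuity` is a hypothetical helper, and it assumes the open times are already sorted in ascending order, as the API returns them.

```python
from typing import List, Tuple


def check_continuity(open_times: List[int], interval_ms: int) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (duplicates, gaps) found in a sorted list of open times in ms.

    A duplicate is an open time that does not advance past its predecessor.
    A gap is a pair of neighbours more than one interval apart.
    """
    duplicates = []
    gaps = []
    for prev, cur in zip(open_times, open_times[1:]):
        step = cur - prev
        if step <= 0:
            duplicates.append(cur)
        elif step > interval_ms:
            gaps.append((prev, cur))
    return duplicates, gaps


HOUR_MS = 3_600_000
# Illustrative series: one repeated candle and one missing candle.
times = [0, HOUR_MS, HOUR_MS, 3 * HOUR_MS]
dupes, gaps = check_continuity(times, HOUR_MS)
print(dupes, gaps)
```

Gaps are not always a bug in your code: exchange maintenance windows can leave real holes in the series, so log them and decide case by case.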
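Binance enforces rate limits based on request weight rather than raw request count. At the time of writing, a spot klines call costs 2 weight. The per-IP budget has changed over the years (1200 weight per minute for a long time, later raised to 6000), so read the current figure from the `rateLimits` array in `/api/v3/exchangeInfo` instead of hard-coding it. At 2 weight per call, even a 1200-weight budget allows 600 paginated kline requests per minute. That is more than enough for most use cases, but easy to blow through if you're pulling data for 100 symbols at once.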
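The weight arithmetic can be turned into a pacing delay. This is a sketch under stated assumptions: `weight_budget_per_minute` and `pacing_delay` are hypothetical helpers, the sample payload only mimics the shape of the `rateLimits` array returned by `/api/v3/exchangeInfo`, and the 80% safety margin is an arbitrary choice.

```python
from typing import Optional


def weight_budget_per_minute(exchange_info: dict) -> Optional[int]:
    """Pull the per-minute REQUEST_WEIGHT limit out of an exchangeInfo payload."""
    for rule in exchange_info.get("rateLimits", []):
        if (
            rule.get("rateLimitType") == "REQUEST_WEIGHT"
            and rule.get("interval") == "MINUTE"
            and rule.get("intervalNum") == 1
        ):
            return int(rule["limit"])
    return None


def pacing_delay(weight_per_call: int, budget_per_minute: int, safety: float = 0.8) -> float:
    """Seconds to sleep between calls so usage stays under safety * budget."""
    usable = budget_per_minute * safety
    return 60.0 * weight_per_call / usable


# Illustrative payload; fetch the real one from the API to get current numbers.
sample_info = {
    "rateLimits": [
        {"rateLimitType": "REQUEST_WEIGHT", "interval": "MINUTE", "intervalNum": 1, "limit": 1200},
        {"rateLimitType": "ORDERS", "interval": "SECOND", "intervalNum": 10, "limit": 50},
    ]
}
budget = weight_budget_per_minute(sample_info)
print(budget, pacing_delay(2, budget))
```

Sleeping for `pacing_delay(...)` seconds between requests spreads the load evenly, which matters most when one process loops over many symbols.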
When you hit the limit, Binance returns HTTP 429. If you keep hammering after that, you'll get a 418 IP ban. Here's a safer fetching pattern with exponential backoff:
```python
import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session() -> requests.Session:
    """Session that retries transient 5xx errors at the transport level."""
    session = requests.Session()
    retry = Retry(
        total=5,
        backoff_factor=2,
        # 429 is left out on purpose so safe_get() can honor Retry-After itself
        status_forcelist=[500, 502, 503, 504],
        respect_retry_after_header=True,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    return session


def safe_get(session: requests.Session, url: str, params: dict) -> list:
    for attempt in range(5):
        try:
            resp = session.get(url, params=params, timeout=10)
            if resp.status_code == 418:
                # IP ban: retrying only prolongs it, so stop immediately
                raise RuntimeError("IP banned by Binance (HTTP 418)")
            if resp.status_code == 429:
                retry_after = int(resp.headers.get("Retry-After", 60))
                print(f"Rate limited. Sleeping {retry_after}s")
                time.sleep(retry_after)
                continue
            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.RequestException as e:
            # exponential backoff: 1s, 2s, 4s, 8s, 16s
            wait = 2 ** attempt
            print(f"Error: {e}. Retrying in {wait}s")
            time.sleep(wait)
    raise RuntimeError("Max retries exceeded")
```
Always check the `X-MBX-USED-WEIGHT-1M` header in the response. It tells you how much of your per-minute weight budget you've consumed. Log it during development, and you'll catch runaway loops before they get you banned.
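One way to act on that header is a small guard that runs after every response. The helper names `used_weight` and `should_pause`, the budget argument, and the 80% threshold are all illustrative assumptions.

```python
from typing import Mapping, Optional


def used_weight(headers: Mapping[str, str], name: str = "X-MBX-USED-WEIGHT-1M") -> Optional[int]:
    """Read the consumed weight from response headers, case-insensitively."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return int(value)
    return None


def should_pause(headers: Mapping[str, str], budget: int, threshold: float = 0.8) -> bool:
    """True once consumed weight reaches threshold * budget."""
    used = used_weight(headers)
    return used is not None and used >= budget * threshold


# Illustrative headers; in real code pass resp.headers from requests.
print(should_pause({"x-mbx-used-weight-1m": "1000"}, budget=1200))
print(should_pause({"x-mbx-used-weight-1m": "100"}, budget=1200))
```

When `should_pause` returns `True`, sleeping until the next minute boundary is usually enough, since the counter in this header covers a one-minute window.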
Spot and futures use different base URLs and different endpoint paths. For Binance futures historical data, the base is `https://fapi.binance.com` for USDⓈ-M (USDT-margined) contracts and `https://dapi.binance.com` for COIN-M (coin-margined) contracts. The klines path follows the base: `/fapi/v1/klines` on the former, `/dapi/v1/klines` on the latter. Symbols differ too: use `BTCUSDT` for USDⓈ-M futures and `BTCUSD_PERP` for the coin-margined perpetual.
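Keeping the base URL and path together in one lookup avoids pairing a futures host with a spot path. A minimal sketch, where the market keys (`spot`, `usdm`, `coinm`) and the `klines_url` helper are names invented for this example:

```python
# (base URL, klines path) per market; the keys are arbitrary labels.
KLINE_ENDPOINTS = {
    "spot": ("https://api.binance.com", "/api/v3/klines"),
    "usdm": ("https://fapi.binance.com", "/fapi/v1/klines"),
    "coinm": ("https://dapi.binance.com", "/dapi/v1/klines"),
}


def klines_url(market: str) -> str:
    """Return the full klines URL for a market label."""
    try:
        base, path = KLINE_ENDPOINTS[market]
    except KeyError:
        raise ValueError(f"unknown market {market!r}; expected one of {sorted(KLINE_ENDPOINTS)}")
    return base + path


for market in KLINE_ENDPOINTS:
    print(market, klines_url(market))
```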
Futures candles use the same 12-field layout as spot, including `taker_buy_base_asset_volume` and `taker_buy_quote_asset_volume`. These are useful for measuring buying pressure: when taker buy volume is a consistently high share of total volume, it suggests aggressive buying, which shows up as directional signals on platforms like VoiceOfChain before it becomes obvious on the chart.
```python
import requests
import pandas as pd
from datetime import datetime, timedelta, timezone

FUTURES_URL = "https://fapi.binance.com"


def get_futures_klines(symbol: str, interval: str, days: int = 7) -> pd.DataFrame:
    """Fetch USDⓈ-M futures klines for the last `days` days."""
    url = f"{FUTURES_URL}/fapi/v1/klines"
    # Take one aware "now" so both bounds share the same reference point
    now = datetime.now(timezone.utc)
    end_ms = int(now.timestamp() * 1000)
    start_ms = int((now - timedelta(days=days)).timestamp() * 1000)
    all_candles = []
    while start_ms < end_ms:
        params = {
            "symbol": symbol,
            "interval": interval,
            "startTime": start_ms,
            "endTime": end_ms,
            "limit": 1000,
        }
        resp = requests.get(url, params=params, timeout=10)
        resp.raise_for_status()
        data = resp.json()
        if not data:
            break
        all_candles.extend(data)
        # advance past the last candle's close time
        start_ms = data[-1][6] + 1
    columns = [
        "open_time", "open", "high", "low", "close", "volume",
        "close_time", "quote_volume", "trades",
        "taker_buy_base", "taker_buy_quote", "ignore"
    ]
    df = pd.DataFrame(all_candles, columns=columns)
    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
    numeric_cols = ["open", "high", "low", "close", "volume", "taker_buy_base"]
    df[numeric_cols] = df[numeric_cols].astype(float)
    # compute taker buy ratio as a pressure indicator (NaN when volume is 0)
    df["buy_pressure"] = df["taker_buy_base"] / df["volume"]
    return df


if __name__ == "__main__":
    df = get_futures_klines("BTCUSDT", "1h", days=14)
    print(df[["open_time", "close", "volume", "buy_pressure"]].tail(10))
```
Comparing platforms: Bybit and OKX also expose similar perpetual klines APIs, but Binance's futures API has deeper history and higher rate limits. For multi-exchange backtests, it's common to use Binance as the primary data source and validate signal quality against Bybit's equivalent markets.
For backtesting strategies over years of data, calling the API repeatedly is slow and wasteful. Binance offers bulk historical data downloads via their data portal at `data.binance.vision`, with monthly and daily zip files of klines, trades, and aggTrades for every symbol. These are the same candles you'd get from the API, just pre-packaged. A year of 1-minute BTC/USDT klines is 12 monthly files, versus roughly 526 paginated API requests (525,600 candles at 1000 per call).
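Building the download URLs for a year of monthly files might look like the sketch below. The path layout is an assumption based on how the portal lists spot kline archives; confirm it against the directory listing on `data.binance.vision` before relying on it. `monthly_klines_url` and `year_of_urls` are hypothetical helpers.

```python
from typing import List

# Assumed layout for spot monthly kline archives; verify on the portal.
BULK_BASE = "https://data.binance.vision/data/spot/monthly/klines"


def monthly_klines_url(symbol: str, interval: str, year: int, month: int) -> str:
    """Build the URL of one monthly kline archive."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    filename = f"{symbol}-{interval}-{year:04d}-{month:02d}.zip"
    return f"{BULK_BASE}/{symbol}/{interval}/{filename}"


def year_of_urls(symbol: str, interval: str, year: int) -> List[str]:
    """All 12 monthly archive URLs for one calendar year."""
    return [monthly_klines_url(symbol, interval, year, m) for m in range(1, 13)]


urls = year_of_urls("BTCUSDT", "1m", 2023)
print(len(urls))
print(urls[0])
```

Each archive holds a CSV with the same 12 columns as the API response. Check the timestamp unit when loading: newer spot archives may use microseconds rather than milliseconds.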
For ongoing data collection, a simple SQLite or Parquet approach works well at small scale. At larger scale, ClickHouse handles time-series OHLCV data extremely well — sub-second queries over billions of rows are normal. VoiceOfChain uses ClickHouse internally to process order flow from multiple exchanges including Binance, OKX, and Gate.io, enabling the kind of real-time signal generation that would choke a Postgres instance.
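For the small-scale case, a SQLite table keyed on symbol, interval, and open time keeps reruns idempotent. This is a sketch using only the standard library; the table name, column set, and helper names are illustrative choices, and the sample rows are made-up values.

```python
import sqlite3
from typing import Iterable, Sequence


def init_db(conn: sqlite3.Connection) -> None:
    """Create the klines table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS klines (
            symbol    TEXT    NOT NULL,
            interval  TEXT    NOT NULL,
            open_time INTEGER NOT NULL,  -- Unix milliseconds
            open REAL, high REAL, low REAL, close REAL, volume REAL,
            PRIMARY KEY (symbol, interval, open_time)
        )
        """
    )


def upsert_klines(conn: sqlite3.Connection, symbol: str, interval: str,
                  rows: Iterable[Sequence]) -> None:
    """Insert raw kline arrays; a repeated open_time overwrites the old row."""
    conn.executemany(
        "INSERT OR REPLACE INTO klines VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (symbol, interval, r[0],
             float(r[1]), float(r[2]), float(r[3]), float(r[4]), float(r[5]))
            for r in rows
        ],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
init_db(conn)
# Illustrative rows; the second call repeats one candle with a new close.
upsert_klines(conn, "BTCUSDT", "1h", [[0, "1", "2", "0.5", "1.5", "10"]])
upsert_klines(conn, "BTCUSDT", "1h", [[0, "1", "2", "0.5", "1.8", "12"],
                                      [3_600_000, "1.8", "2", "1.7", "1.9", "8"]])
row_count = conn.execute("SELECT COUNT(*) FROM klines").fetchone()[0]
print(row_count)
```

The overwrite behavior matters for the most recent candle, which is still forming when you fetch it and will have different values on the next run.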
| Method | Max History | Rate Limited | Best For |
|---|---|---|---|
| REST API /klines | Full (varies by symbol) | Yes, by request weight | Recent data, live feeds |
| data.binance.vision bulk | From each symbol's listing (2017 at the earliest) | No | Backtests, initial loads |
| WebSocket streams | Real-time only | Connection limit | Live candle updates |
If you're coming from traditional finance and are used to pulling historical data from Google Finance or Yahoo Finance CSV exports, the structure is similar, but crypto markets trade around the clock, so datasets grow faster per asset. The Binance REST approach maps directly to what you'd do with yfinance or pandas-datareader, except there are no market holidays and the granularity goes down to 1 second on spot klines.
The Binance API is one of the best free data sources available to retail algo traders. The klines endpoint is reliable, the docs are solid, and the rate limits are generous enough for most use cases. The main traps are timestamp handling (always milliseconds), the 1000-candle-per-request ceiling, and the difference between spot and futures base URLs. Get those three things right and you'll have clean historical data flowing into your backtests within an hour.
For live trading signal generation — beyond historical analysis — platforms like VoiceOfChain aggregate real-time order flow from Binance, OKX, Bybit, and other major venues, translating raw market microstructure into actionable signals without requiring you to build and maintain the data pipeline yourself. Historical data tells you what happened; real-time signals tell you what's happening right now.