There are hundreds of tutorials on how to create a trading bot. Most of them skip the hardest part: getting reliable, real-time data. They walk you through connecting to Binance with CCXT, writing a moving average crossover, and placing an order. What they don't cover is how to get the data that decides what order to place, and whether that data is fresh enough to be useful.
Your bot is only as good as its data. Ten-second-old price data in a market that moves in milliseconds is not just useless. It's dangerous. You're not arbitraging. You're donating money to someone whose data pipeline is faster than yours.
This guide focuses entirely on the data layer: what data a crypto trading bot needs, how to get it in real time, how data delivery methods compare, and how to architect a data pipeline that separates your data infrastructure from your trading strategy. If you're building a DeFi or DEX trading bot, this is the guide every other tutorial skips.
What Data Does a Crypto Trading Bot Need?
Before writing a single line of strategy logic, you need to map out the data your bot requires. Most developers underestimate this. Here's a breakdown by data type, why it matters, and how fresh it needs to be:
Real-time prices
Current token price in USD. This drives every buy/sell decision your bot makes. If your price data is stale by even a few seconds, your bot is trading on a reality that no longer exists. For DEX trading, prices must be calculated from on-chain swap data across all liquidity sources, not pulled from a centralized aggregator with a 20-second delay.
OHLCV candle data
Open, high, low, close, and volume aggregated into time-based candles. This is the foundation of technical analysis and signal generation. You need reliable historical data for backtesting and real-time candle updates for live trading. One-minute and five-minute candles are the most common resolutions for bot strategies.
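To make the candle structure concrete, here is a minimal sketch of how raw trades roll up into fixed-width candles. The `Trade` and `Candle` shapes and the `aggregateCandles` name are illustrative assumptions, not any provider's schema, and timestamps are assumed to be Unix seconds.

```typescript
// Hypothetical shapes for illustration; a real feed will have its own fields.
interface Trade { timestamp: number; price: number; amount: number; } // timestamp in Unix seconds
interface Candle { t: number; o: number; h: number; l: number; c: number; volume: number; }

// Group trades into candles that are `intervalSec` seconds wide.
// Trades are sorted by time first so that open and close are well defined.
function aggregateCandles(trades: Trade[], intervalSec: number): Candle[] {
  const sorted = [...trades].sort((a, b) => a.timestamp - b.timestamp);
  const candles: Candle[] = [];
  for (const trade of sorted) {
    // Start of the interval this trade falls into.
    const bucket = Math.floor(trade.timestamp / intervalSec) * intervalSec;
    const last = candles[candles.length - 1];
    if (last !== undefined && last.t === bucket) {
      last.h = Math.max(last.h, trade.price);
      last.l = Math.min(last.l, trade.price);
      last.c = trade.price;
      last.volume += trade.amount;
    } else {
      candles.push({
        t: bucket,
        o: trade.price,
        h: trade.price,
        l: trade.price,
        c: trade.price,
        volume: trade.amount,
      });
    }
  }
  return candles;
}
```

Note that this sketch emits no candle for an interval with no trades. Whether to fill such gaps with a flat candle is a design choice your backtester and your live feed should agree on.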
Liquidity data
Pool depth and available liquidity across DEX pools. This determines two critical things: how much slippage your trade will incur, and how large a position you can take without moving the market against yourself. A bot that ignores liquidity will size positions badly and bleed money on slippage.
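The link between pool depth and slippage can be sketched for the simplest case. The example below assumes a constant-product pool (x * y = k, the Uniswap-v2-style model) with a 0.3% fee by default. Concentrated-liquidity pools and aggregator routes behave differently, and the `estimateSwap` name is hypothetical.

```typescript
// Estimate the output and slippage of a swap against a constant-product pool.
// Assumption: x * y = k pricing with the fee taken from the input amount.
function estimateSwap(
  reserveIn: number,
  reserveOut: number,
  amountIn: number,
  feeRate = 0.003,
): { amountOut: number; executionPrice: number; slippage: number } {
  const amountInAfterFee = amountIn * (1 - feeRate);
  const amountOut = (reserveOut * amountInAfterFee) / (reserveIn + amountInAfterFee);
  const spotPrice = reserveOut / reserveIn;        // output tokens per input token before the trade
  const executionPrice = amountOut / amountIn;     // what this trade actually receives per input token
  const slippage = 1 - executionPrice / spotPrice; // fraction lost to fee plus price impact
  return { amountOut, executionPrice, slippage };
}
```

For example, with the fee set to zero, swapping 100 tokens into a pool holding 1,000 of each token returns about 90.9 tokens, roughly 9.1% worse than the spot price. The same trade at one tenth the size loses only about 1%, which is why position size has to be checked against pool depth.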
Transaction stream
A live feed of individual trades happening on-chain. This is how you detect whale activity. A single wallet buying $500K of a token is a signal. It's also essential for MEV strategies and for understanding the real-time order flow on a token.
Token metadata
Contract address, name, symbol, DEX listings, and launchpad status. This is slower-moving data, but it's critical for token discovery and filtering. If your bot trades new tokens, you need to know what just launched, on which DEX, and with what initial liquidity.
Wallet and holder data
Who holds a token, how much they hold, and when they bought. This powers copy-trading bots that follow smart money wallets, whale tracking systems, and holder concentration analysis. A token where one wallet holds 80% of supply has a very different risk profile from one with thousands of holders.
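Holder concentration is simple to compute once you have a holder list. The sketch below assumes a hypothetical `Holder` shape and treats the sum of the listed balances as the total supply, which only holds if the list is complete.

```typescript
// Hypothetical holder record; real holder data will carry more fields.
interface Holder { wallet: string; balance: number; }

// Fraction (0 to 1) of the listed supply held by the `topN` largest holders.
function topHolderShare(holders: Holder[], topN: number): number {
  const total = holders.reduce((sum, h) => sum + h.balance, 0);
  if (total === 0) return 0;
  const largest = [...holders].sort((a, b) => b.balance - a.balance).slice(0, topN);
  return largest.reduce((sum, h) => sum + h.balance, 0) / total;
}
```

A bot could use this as a pre-trade filter, for example skipping any token where `topHolderShare(holders, 1)` exceeds a threshold you choose.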
Data Delivery Methods: Polling vs WebSocket vs Webhook
How you receive data matters as much as what data you receive. There are three approaches, and choosing the wrong one is one of the most common mistakes when building a trading bot.
REST API polling
You send a request, you get a response. Simple and familiar. The problem: you're asking for data on a timer. If you poll every second, the price you hold can be up to one second stale, plus the round-trip time of the HTTP request. And you're paying for every request whether the data changed or not.
The math gets ugly fast. Polling one token's price every second is 86,400 requests per day. Ten tokens? 864,000. A hundred tokens? 8.6 million requests per day, just for price data. That burns through rate limits and API budgets before you've even looked at transaction data or OHLCV.
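The arithmetic above is easy to reproduce for your own token count and polling interval. This sketch assumes one request per token per poll (no batching), and the function names are illustrative.

```typescript
const SECONDS_PER_DAY = 86400;

// Requests per day when each token is polled separately every `pollIntervalSec` seconds.
function pollingRequestsPerDay(tokenCount: number, pollIntervalSec: number): number {
  return Math.floor(SECONDS_PER_DAY / pollIntervalSec) * tokenCount;
}

// Hours until a fixed request quota is used up at that polling rate.
function hoursUntilQuotaExhausted(quota: number, tokenCount: number, pollIntervalSec: number): number {
  const requestsPerSecond = tokenCount / pollIntervalSec;
  return quota / requestsPerSecond / 3600;
}
```

`pollingRequestsPerDay(100, 1)` gives the 8,640,000 figure above. As a second data point, `hoursUntilQuotaExhausted(10000, 1, 1)` shows that polling a single token every second would use up a 10,000-request quota in under three hours.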
WebSocket subscriptions
You open a persistent connection. The server pushes data to you the instant something changes. No wasted requests, no polling delay. Sub-second latency is standard. This is what production trading bots use for any data that needs to be real-time.
WebSockets do require more engineering than a simple REST call. You need to handle reconnection, buffering, and connection lifecycle. In return, updates arrive as they happen instead of on your next poll, and for a trading bot that difference is decisive.
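Reconnection is usually the first part of that lifecycle handling to build. A common approach is exponential backoff with jitter, sketched below as a pure function so it can be tested without a socket. The name `reconnectDelayMs` and the default values are assumptions, not settings from any particular client library.

```typescript
// Delay before reconnect attempt number `attempt` (0-based).
// The delay doubles per attempt up to `maxMs`, then half of it is randomized
// ("equal jitter") so that many clients do not reconnect in lockstep.
function reconnectDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 30000,
  random: () => number = Math.random,
): number {
  const capped = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(capped / 2 + random() * (capped / 2));
}

// Typical use inside a close handler (connect() is your own function):
//   setTimeout(connect, reconnectDelayMs(attempt));
//   attempt += 1;  // reset to 0 once a connection succeeds
```

After reconnecting, remember to resubscribe and to refresh any state that may have changed while the connection was down, for example with a one-off REST lookup.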
Webhooks
The server sends an HTTP POST to your endpoint when a specific event occurs. Good for discrete events: a new token launched, a whale made a large trade, liquidity dropped below a threshold. Less suitable for continuous data streams like price feeds, because each event triggers a separate HTTP request to your server.
The right combination
Use WebSockets for price data and transaction streams, anything that needs to be continuous and real-time. Use REST for historical data and one-time lookups: backtesting OHLCV, token metadata, holder snapshots. Use webhooks for event-driven triggers like new token launches, liquidity alerts, and threshold breaches.
Most production bots use all three.
Choosing Your Data Provider
The data provider you choose determines your bot's ceiling. Here's what to look for:
- Sub-second data freshness. Not 10 seconds, not 20 seconds. If your provider is delivering data with a 20-30 second delay, you are trading blind in any fast-moving market.
- WebSocket support with high uptime. Your bot runs 24/7. Your data feed needs to run 24/7. Look for 99.9%+ uptime guarantees on WebSocket connections.
- Multi-chain support. Opportunities exist across Ethereum, Solana, Base, Arbitrum, and dozens of other chains. Locking yourself to one chain limits your bot's potential.
- Enriched data. You want USD pricing out of the box, not raw swap amounts that you have to cross-reference against stablecoin pools yourself. Enriched data (USD prices, aggregated volume, liquidity metrics) saves months of engineering.
- Reasonable rate limits for bot workloads. Bots are high-frequency consumers. Make sure your provider's rate limits can handle continuous data ingestion, not just occasional dashboard queries.
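To illustrate what "raw swap amounts" involve in the enriched-data point above, here is a minimal sketch of deriving a USD price from a single raw swap. It assumes the quote token is a stablecoin worth exactly $1 unless told otherwise, and the `RawSwap` shape is hypothetical. A real pipeline also has to handle non-stablecoin quote tokens, multiple pools, and integer precision (plain numbers lose precision on large raw amounts, so production code would typically use `bigint` or a decimal library).

```typescript
// Hypothetical raw swap record: integer token amounts as they appear on-chain.
interface RawSwap {
  baseAmountRaw: number;  // amount of the token being priced, in its smallest unit
  quoteAmountRaw: number; // amount of the quote token, in its smallest unit
  baseDecimals: number;
  quoteDecimals: number;
}

// USD price of one base token implied by this swap, or undefined if no base tokens moved.
function impliedUsdPrice(swap: RawSwap, quoteTokenUsd = 1): number | undefined {
  const baseAmount = swap.baseAmountRaw / 10 ** swap.baseDecimals;
  const quoteAmount = swap.quoteAmountRaw / 10 ** swap.quoteDecimals;
  if (baseAmount === 0) return undefined;
  return (quoteAmount / baseAmount) * quoteTokenUsd;
}
```

For example, a swap of 2,000,000,000 raw units of a 9-decimal token for 300,000,000 raw units of a 6-decimal stablecoin implies a price of $150.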
Common mistakes
Using CoinGecko or CoinMarketCap for bot data. Both have 20-30 second data delays on paid tiers, longer on free. That's fine for a portfolio tracker or a price widget. For a trading bot, it means your bot is making decisions based on a market that moved on 20 seconds ago. For a deeper comparison, see our guide to the best crypto APIs.
Building your own blockchain indexer. It works until it doesn't. You'll spend months building it, and then you'll spend every week maintaining it: handling chain forks, DEX contract upgrades, RPC node failures, and backfill jobs. Unless blockchain indexing is your actual product, it's a distraction. See our explainer on blockchain indexers for the full build-vs-buy analysis.
Using exchange APIs only. If you're trading on CEXes with CCXT and the Binance API, exchange APIs are sufficient. But if you're trading on DEXes, you miss on-chain activity, DEX-only tokens, and the thousands of new tokens that launch daily on platforms like Pump.fun. The DeFi data layer is fundamentally different from CEX market data.
Architecture: The Data Pipeline for a Trading Bot
Here's the architecture that separates hobby bots from production bots:
Data Sources (API)
  |-- WebSocket: real-time prices, transactions
  |-- REST: historical OHLCV, token discovery
  |-- Webhook: new token launch alerts
  |
Local Cache / State
  (in-memory price book, candle buffer)
  |
Strategy Engine
  (signal generation, risk checks)
  |
Execution Layer
  (DEX transactions via RPC or aggregator)
The critical design principle: your data layer and strategy layer should be completely separate. Your data pipeline should work identically regardless of what strategy you're running. It ingests prices, transactions, and candles. It maintains a local state. And it exposes that state to whatever strategy engine sits on top.
This separation means you can swap strategies without touching your data infrastructure. You can run multiple strategies on the same data feed. And you can test new strategies against historical data from the same pipeline.
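One way to express this separation in code is to give strategies a narrow, read-only view of market state and nothing else. The sketch below is an assumption about how you might structure it, not a prescribed design: `MarketView`, `Strategy`, and `smaCrossover` are hypothetical names, and the moving average crossover stands in for whatever signal you actually run.

```typescript
type Signal = "buy" | "sell" | "hold";

// The only thing a strategy sees: read-only state maintained by the data layer.
interface MarketView {
  closes(symbol: string): readonly number[]; // candle closes, oldest first
}

interface Strategy {
  evaluate(view: MarketView, symbol: string): Signal;
}

// Simple moving average of the last `period` values, or undefined if there are too few.
function sma(values: readonly number[], period: number): number | undefined {
  if (period <= 0 || values.length < period) return undefined;
  return values.slice(-period).reduce((sum, v) => sum + v, 0) / period;
}

// Signals "buy" when the fast average crosses above the slow one on the latest
// candle, "sell" when it crosses below, and "hold" otherwise.
function smaCrossover(fast: number, slow: number): Strategy {
  return {
    evaluate(view, symbol) {
      const closes = view.closes(symbol);
      const previous = closes.slice(0, -1);
      const fastNow = sma(closes, fast);
      const slowNow = sma(closes, slow);
      const fastPrev = sma(previous, fast);
      const slowPrev = sma(previous, slow);
      if (fastNow === undefined || slowNow === undefined ||
          fastPrev === undefined || slowPrev === undefined) return "hold";
      if (fastPrev <= slowPrev && fastNow > slowNow) return "buy";
      if (fastPrev >= slowPrev && fastNow < slowNow) return "sell";
      return "hold";
    },
  };
}
```

Because the strategy only depends on `MarketView`, the same object can be driven by a live feed or by a backtest that replays historical candles.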
The local cache layer is often overlooked. Your bot should not query the API every time it needs a price. Instead, WebSocket updates write to an in-memory price book. Your strategy reads from that local state. This eliminates redundant requests and lets your strategy engine read the latest data from memory, with no network round trip.
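A price book can be as small as a map plus a freshness check. The following is a minimal sketch under assumed names (`createPriceBook`, `getFresh`). The injectable clock exists only to make it testable, and the key format is up to you (for example `address:networkId`).

```typescript
interface PriceEntry { priceUsd: number; updatedAtMs: number; }

// In-memory price book. `now` is injectable so tests can control time.
function createPriceBook(now: () => number = Date.now) {
  const entries = new Map<string, PriceEntry>();
  return {
    // Called by the data layer on every pushed update.
    update(key: string, priceUsd: number): void {
      entries.set(key, { priceUsd, updatedAtMs: now() });
    },
    // Called by the strategy. Returns undefined when the price is missing or
    // older than `maxAgeMs`, so stale data is never traded on silently.
    getFresh(key: string, maxAgeMs: number): number | undefined {
      const entry = entries.get(key);
      if (entry === undefined || now() - entry.updatedAtMs > maxAgeMs) return undefined;
      return entry.priceUsd;
    },
  };
}
```

Returning `undefined` for stale entries pushes the decision to the strategy: it can skip the trade, or fall back to a one-off lookup, rather than acting on an old price.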
How to Build a Crypto Trading Bot Data Layer with Codex
Let's build out the data layer. The examples below use the Codex API, which provides enriched blockchain data across 80+ networks with sub-second freshness, WebSocket subscriptions, and a TypeScript SDK. Note: queries work on every plan, while WebSocket subscriptions require a Growth plan.
Setup
npm install @codex-data/sdk graphql-tag
import { Codex } from "@codex-data/sdk";
import { gql } from "graphql-tag";
const sdk = new Codex(process.env.CODEX_API_KEY!);
Sign up for a free API key at dashboard.codex.io/signup. The free tier includes 10,000 requests per month, enough to build and test your data pipeline before committing to a paid plan.
Get current price (REST)
The simplest data point: what's the current USD price of a token?
const { getTokenPrices } = await sdk.query(gql`
  query {
    getTokenPrices(inputs: [{
      address: "So11111111111111111111111111111111111111112",
      networkId: 1399811149
    }]) {
      priceUsd
      address
    }
  }
`);
You can pass up to 25 token addresses in a single request. Useful for monitoring a basket of tokens without burning through rate limits.
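If your watchlist is longer than 25 tokens, you will need to split it into batches before querying. A generic helper like the one below is enough. `chunk` is an illustrative name, and how you then build each request is left to your own code.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

A watchlist of 60 addresses becomes three batches (25, 25, and 10), so a full refresh costs three requests instead of sixty.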
Subscribe to real-time price updates (WebSocket)
For a trading bot, you don't want to poll for prices. You want prices pushed to you the instant they change.
sdk.subscribe(gql`
  subscription {
    onPriceUpdated(
      address: "So11111111111111111111111111111111111111112"
      networkId: 1399811149
    ) {
      priceUsd
      timestamp
    }
  }
`, {
  next: (data) => {
    const price = data.data.onPriceUpdated;
    console.log(`SOL: $${price.priceUsd} at ${new Date(price.timestamp * 1000).toISOString()}`);
    // Feed into your strategy engine
  }
});
This opens a persistent WebSocket connection and pushes every price update for SOL directly to your callback. No polling, no wasted requests, sub-second latency.
Fetch OHLCV for backtesting (REST)
Before running a strategy live, you need historical candle data for backtesting. The getBars query returns OHLCV data at resolutions ranging from 1-second to 7-day candles:
// Unix timestamps in seconds, covering the last seven days
const to = Math.floor(Date.now() / 1000);
const from = to - 86400 * 7;
const { getBars } = await sdk.query(gql`
  query {
    getBars(
      symbol: "So11111111111111111111111111111111111111112:1399811149"
      from: ${from}
      to: ${to}
      resolution: "60"
    ) {
      o h l c volume t
    }
  }
`);
This returns one week of 1-hour candles for SOL (resolution "60" means 60 minutes). You get up to 1,500 bars per request. For longer backtesting windows, paginate with the from and to parameters.
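For the pagination mentioned above, the time range can be split into windows that each fit under the 1,500-bar limit. This sketch only computes the windows, so issuing the requests is up to your own code. `barWindows` is a hypothetical helper, and it assumes the resolution is given in seconds.

```typescript
interface TimeWindow { from: number; to: number; } // Unix seconds

// Split [from, to] into consecutive windows of at most `maxBars` candles each.
function barWindows(from: number, to: number, resolutionSec: number, maxBars = 1500): TimeWindow[] {
  const spanSec = resolutionSec * maxBars;
  const windows: TimeWindow[] = [];
  for (let start = from; start < to; start += spanSec) {
    windows.push({ from: start, to: Math.min(start + spanSec, to) });
  }
  return windows;
}
```

Ninety days of 1-hour candles is 2,160 bars, so `barWindows(0, 90 * 86400, 3600)` yields two windows. Adjacent windows share a boundary timestamp, so it is worth de-duplicating the merged results by each bar's `t` value in case the boundary bar is returned twice.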
Stream live transactions
Whale detection, copy trading, and MEV strategies all require a live transaction feed. The onTokenEventsCreated subscription streams trades for a token across all of its pools as they happen on-chain:
sdk.subscribe(gql`
  subscription {
    onTokenEventsCreated(input: {
      address: "So11111111111111111111111111111111111111112"
      networkId: 1399811149
    }) {
      events {
        eventDisplayType
        maker
        timestamp
        data {
          ... on SwapEventData {
            priceUsd
            priceUsdTotal
          }
        }
      }
    }
  }
`, {
  next: (data) => {
    for (const event of data.data.onTokenEventsCreated.events) {
      if (Number(event.data?.priceUsdTotal) > 10000) {
        console.log(`Whale alert: ${event.eventDisplayType} $${event.data.priceUsdTotal} by ${event.maker}`);
      }
    }
  }
});
Each event includes the trade direction (buy or sell), the maker wallet address, the USD value, and the execution price. Filter for large trades to detect whale activity, or track specific wallet addresses for copy trading.
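Both filters can live in one small pure function that your subscription callback calls. The sketch below assumes a hypothetical `SwapEvent` type mirroring the fields requested in the subscription above, with `priceUsdTotal` arriving as a string. The watchlist comparison is an exact string match, which suits case-sensitive Solana addresses but would need normalizing for chains where address casing varies.

```typescript
// Hypothetical event shape mirroring the fields selected in the subscription.
interface SwapEvent {
  eventDisplayType: string;
  maker: string;
  timestamp: number;
  data?: { priceUsd?: string; priceUsdTotal?: string } | null;
}

// Split a batch of events into large trades and trades by watched wallets.
// An event can appear in both lists.
function selectEvents(
  events: SwapEvent[],
  options: { minUsd: number; watchlist: ReadonlySet<string> },
): { whales: SwapEvent[]; tracked: SwapEvent[] } {
  const whales: SwapEvent[] = [];
  const tracked: SwapEvent[] = [];
  for (const event of events) {
    // Missing or unparseable totals become NaN and are never treated as whales.
    const usdTotal = Number(event.data?.priceUsdTotal ?? NaN);
    if (Number.isFinite(usdTotal) && usdTotal > options.minUsd) whales.push(event);
    if (options.watchlist.has(event.maker)) tracked.push(event);
  }
  return { whales, tracked };
}
```

Keeping this logic out of the callback makes it easy to replay recorded events through the same filter when tuning the threshold.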
Discover new tokens
For bots that trade newly launched tokens, use the filterTokens query to scan for recent launches. You can filter by network, launchpad, minimum liquidity, and more. For a detailed walkthrough of new token scanning, including WebSocket subscriptions for launch events, see our Pump.fun API guide.
Common Pitfalls
After the data layer is built, these are the mistakes that still catch developers:
Stale data kills bots. This bears repeating. If your price data is 10 seconds old in a market with 400ms block times, you are not running a trading strategy. You are running a donation strategy. Every second of staleness erodes your edge. If you wouldn't trade with 10-second-old data manually, don't let your bot do it either.
Rate limits are real. Polling 100 tokens every second is 8.6 million requests per day. Even generous rate limits will buckle under that load, and your API bill will be enormous. WebSockets sidestep the problem: one persistent connection, with data pushed to you only when it changes.
Single-chain thinking is limiting. The same token can trade on Ethereum, Base, and Arbitrum with different prices and liquidity. Arbitrage opportunities exist across chains constantly. Building your bot against a single-chain data source means rebuilding when you want to expand. A multi-chain data provider like Codex lets you add new chains with a parameter change, not a rewrite. For Solana-specific data patterns, see our Solana API guide.
Don't build your own indexer. Unless blockchain indexing is your core product, maintaining your own indexing infrastructure is a full-time engineering job that distracts from building your actual trading system. Raw RPC data requires you to decode transactions, calculate USD prices from swap amounts, aggregate candles, and handle chain-specific edge cases. An enriched API does all of this for you.
Next Steps
If you're ready to build:
- Sign up for Codex at dashboard.codex.io/signup. The free tier gives you 10,000 requests per month, enough to prototype your data pipeline and validate your approach.
- Start with one data feed. Subscribe to real-time price updates for a single token. Get comfortable with the WebSocket connection lifecycle.
- Build your strategy engine on top. Keep it separate from the data layer. Start with a simple signal (moving average crossover, price threshold) and iterate.
- Expand to more tokens and data types as you validate your strategy. Add transaction streams for whale detection, OHLCV for technical indicators, liquidity data for position sizing.
The data infrastructure is the foundation. Get it right, and every strategy you build on top of it benefits. Get it wrong, and no strategy can save you.
FAQ
What data does a crypto trading bot need?
At minimum: real-time prices (sub-second freshness), OHLCV candle data for technical analysis, and liquidity data for position sizing and slippage estimation. More advanced bots also use transaction streams (whale detection, order flow analysis), holder data (copy trading, smart money tracking), and token metadata (new token discovery and filtering).
Should I use WebSocket or REST API for a trading bot?
Use WebSockets for any data that needs to be real-time: prices, transactions, and live trade events. Use REST for one-time lookups like historical candles for backtesting, token metadata, and holder snapshots. Most production bots use both: WebSockets for the live data feed, REST for historical and reference data.
How fast does my price data need to be for a trading bot?
For DeFi and DEX trading, sub-second is the standard for production bots. CoinGecko's 20-30 second delay is acceptable for a portfolio tracker but dangerous for automated trading. Markets that move in milliseconds require data infrastructure that keeps up. If your data is slower than your competitors' data, you're systematically trading at a disadvantage.
Is it legal to use a crypto trading bot?
In most jurisdictions, yes. Automated trading is legal for cryptocurrency markets and is widely used by individuals and institutions. However, regulations vary by country, and certain strategies (like front-running on regulated exchanges) may have legal implications. This is not legal advice. Consult a lawyer for your specific situation and jurisdiction.
How much does it cost to build a crypto trading bot?
The data infrastructure can range from free (Codex free tier: 10,000 requests/month) to a few hundred dollars per month for production workloads. Codex's Growth plan starts at $350/month for 1 million requests and includes the WebSocket subscriptions a bot needs. The main costs are data API access, server hosting (a small VPS is sufficient for most bots), and your own development time. Building your own blockchain indexer instead of using an API can cost $100K+ per year in infrastructure alone.
What is the best programming language for a crypto trading bot?
TypeScript and Python are the two most popular choices. TypeScript has the advantage of first-class SDK support from most data providers (including Codex), strong async handling built into the language, and WebSocket clients readily available in its runtimes. Python is popular for its data analysis libraries (pandas, numpy) and backtesting frameworks. For the data layer, either works well. Choose whichever you're more productive in.
Ready to build? Get a free API key and follow along with the code examples in this guide.

