Architecture of a Trading‑Grade On‑Chain Data Layer: How Codex Indexes 700M+ Wallets in Real Time

Product teams building trading apps don’t want to run indexers.

They want fast, consistent, trading‑grade data.

Codex runs the indexers so you can ship products on top of that data.

This post breaks down how Codex’s on‑chain data layer is architected to index hundreds of millions of wallets and tens of millions of tokens in real time, and what that means for engineers building mission‑critical crypto products.

Codex’s public claims (as of early 2026) include:

70M–75M+ tokens indexed
700M–750M+ wallets
80–100+ networks supported
Thousands of on‑chain events processed per second

All of that is exposed through a single GraphQL endpoint: https://graph.codex.io/graphql.

Why Trading‑Grade On‑Chain Data Needs a Different Architecture

Most blockchain data APIs start from node RPCs and expose lightly processed logs.

That works for explorers and hobby dashboards. It breaks down for:

High‑traffic trading interfaces
Portfolio and PnL dashboards
Market‑making and quant infra
Prediction market frontends

These use cases care about:

Latency: sub‑second or better for prices, charts, and balances
Correctness: no missing trades, double‑counted volume, or broken OHLC
Consistency: the same token looks the same across chains, DEXes, and time
Scalability: millions of users, billions of API calls per month

Codex’s architecture is built around those constraints rather than treating them as an afterthought.

At a high level, Codex does four things for you:

Ingest raw chain events from 80+ networks
Normalize and enrich them into tokens, wallets, trades, pools, and markets
Aggregate and pre‑compute trading‑grade views (OHLCV, liquidity, holders)
Serve via a unified GraphQL API with aggressive caching and streaming updates

Let’s walk through each layer.

1. Multi‑Network Ingestion: Streaming the Chain, Not Polling It

To index 700M+ wallets in real time, the first challenge is simply getting the data.

Codex runs a dedicated ingestion pipeline per network:

Full node + archive node access where needed for deep history
Event‑driven ingestion (logs, blocks, and mempool where applicable)
Chain‑specific adapters that translate raw protocol events to a common schema

Key design points:

Parallel pipelines per chain – Each network (EVM and non‑EVM) is ingested independently, then converges into a shared enrichment layer.
Backfill vs. tail handling – Historical blocks are backfilled in bulk; live blocks are consumed in streaming mode with strict ordering guarantees.
Resilience to reorgs – Ingestion tracks chain reorgs and updates downstream aggregates (bars, balances, holders) accordingly, rather than assuming immutability at N confirmations.

This makes Codex behave more like a trading data feed than a simple block explorer indexer.
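One way to picture the reorg handling described above: keep the recent chain tip in memory and, when a new block's parent hash does not match the tip, unwind blocks until it does. The sketch below is only an illustration under that assumption, not Codex's implementation; `Block`, `ChainTail`, and `apply` are hypothetical names, and it assumes blocks on the new branch arrive in order starting just above the fork point.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Block:
    number: int
    hash: str
    parent_hash: str


@dataclass
class ChainTail:
    """Tracks the recent canonical tip and reports blocks that were orphaned."""

    blocks: List[Block] = field(default_factory=list)

    def apply(self, new_block: Block) -> List[Block]:
        """Append new_block and return the blocks it displaced (newest first)."""
        orphaned: List[Block] = []
        # Unwind the local tip until it is the parent of the incoming block.
        while self.blocks and self.blocks[-1].hash != new_block.parent_hash:
            orphaned.append(self.blocks.pop())
        self.blocks.append(new_block)
        return orphaned
```

In a pipeline shaped like this, the returned orphaned blocks would drive the downstream corrections: their trades are subtracted from bars and their transfers reversed in balances before the replacement blocks are applied.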

2. Normalized Entity Model: From Logs to Tokens, Pairs, Wallets, and Markets

Raw blockchain data is not the product; normalized entities are.

Codex’s public GraphQL schema makes its internal model fairly clear. Everything centers around a small number of entities:

Token – token, tokens, tokenMetadata
Pairs & pools – pairMetadata, liquidity/volume endpoints
Bars & events – getTokenBars, getTokenEvents
Wallets – holders, balances, filterWallets, walletChart
Prediction markets – predictionMarkets, predictionEvents, predictionTrades, trader analytics

How raw events become entities

Decode protocol events
- Parse logs (e.g., ERC‑20 Transfer, AMM swap events, prediction market trades)
- Normalize across DEXes, bridges, launchpads, and prediction markets
Resolve references
- Map contract addresses to tokens (including launchpad‑minted assets)
- Associate swaps with pairs/pools and underlying base/quote tokens
- Attach wallet operations to wallet entities (including cross‑chain views)
Apply opinionated rules
- Classify known scam tokens and outliers
- Normalize decimals, symbols, and chain IDs
- Track token lifecycles: creation, renames, deprecations

The outcome is a set of structured, queryable objects that look the same across 80+ networks.

For you, that means:

The same getTokenBars query works on Ethereum, Solana, and any new chain Codex adds.
Wallet analytics (holders, balances, walletChart) follow a consistent shape, even when underlying chains behave differently.
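To make the decode, resolve, and normalize steps concrete, here is a minimal sketch for a single event type, the ERC-20 Transfer log. The topic hash is the standard one for `Transfer(address,address,uint256)`; everything else (`NormalizedTransfer`, `normalize_transfer`, the input dict shape mirroring a JSON-RPC log) is a hypothetical schema chosen for illustration, not Codex's internal model.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# keccak256("Transfer(address,address,uint256)"), the standard ERC-20 Transfer topic.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


@dataclass(frozen=True)
class NormalizedTransfer:  # hypothetical common schema
    chain_id: int
    token: str
    sender: str
    recipient: str
    amount: Decimal


def topic_to_address(topic: str) -> str:
    """Indexed address topics are 32 bytes; the address is the last 20 bytes."""
    return "0x" + topic[-40:].lower()


def normalize_transfer(log: dict, chain_id: int, decimals: int) -> Optional[NormalizedTransfer]:
    """Return a NormalizedTransfer, or None if the log is not an ERC-20 Transfer."""
    topics = log["topics"]
    # ERC-20 Transfer has exactly 3 topics (signature, from, to).
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    raw = int(log["data"], 16)  # uint256 amount in base units
    return NormalizedTransfer(
        chain_id=chain_id,
        token=log["address"].lower(),
        sender=topic_to_address(topics[1]),
        recipient=topic_to_address(topics[2]),
        # Building the Decimal from a string is exact, so large uint256 values
        # are not rounded to the default context precision.
        amount=Decimal(f"{raw}E-{decimals}"),
    )
```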

3. Enrichment: Turning On‑Chain Noise Into Trading‑Grade Signals

Indexing is table stakes. Trading‑grade data requires enrichment.

Codex’s enrichment layer applies domain‑specific logic on top of decoded events.

Price and chart construction

The getTokenBars endpoint illustrates this well:

Multi‑pool aggregation – Prices are computed using weighted averages based on liquidity across all tracked pools/pairs for a token.
Liquidity‑weighted pricing – Thin pools are de‑emphasized so that one low‑liquidity trade doesn’t nuke your chart.
Filtered vs unfiltered modes – For live bars, Codex offers Filtered mode (excluding suspected bot/sandwich activity) vs Unfiltered (all trades), so you can tune for UX vs rawness.

This is the difference between:

“The chain says a swap happened at X.”
“A user‑facing chart should display a sensible OHLCV for this interval given all relevant markets.”
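As a rough illustration of liquidity-weighted pricing, the sketch below averages per-pool prices using each pool's liquidity as its weight and optionally ignores pools below a floor. The post does not specify the exact weighting Codex uses, so the function name, the `(price, liquidity)` input shape, and the `min_liquidity_usd` cutoff are assumptions.

```python
from typing import Iterable, Optional, Tuple


def liquidity_weighted_price(
    pools: Iterable[Tuple[float, float]],
    min_liquidity_usd: float = 0.0,
) -> Optional[float]:
    """Average pool prices weighted by liquidity.

    pools: (price, liquidity_usd) pairs, one per pool.
    Returns None when no pool passes the liquidity floor.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for price, liquidity in pools:
        if liquidity <= 0 or liquidity < min_liquidity_usd:
            continue  # thin or empty pools do not move the price
        weighted_sum += price * liquidity
        total_weight += liquidity
    if total_weight == 0:
        return None
    return weighted_sum / total_weight
```

With a deep pool at 1.00 and a pool one-ninth its size at 2.00, the blended price stays near 1.10 instead of jumping toward the outlier, which is the behavior the chart layer is after.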

Aggregated liquidity and volume

Beyond OHLC, Codex pre‑computes:

Per‑pair and per‑token volume by timeframe (e.g., 5m, 1h, 1d)
Liquidity and TVL‑like metrics at pool and token level
Unique wallets interacting with a token or protocol over time

These become inputs to queries like:

“Top tokens by 24h on‑chain volume on any EVM chain”
“Newly launched tokens with fast‑rising liquidity on supported launchpads”
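The per-timeframe aggregates described in this section boil down to bucketing trades by interval. A minimal sketch, assuming trades arrive as `(timestamp_seconds, price, size)` tuples; `Bar` and `build_bars` are illustrative names rather than anything from the Codex API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Bar:
    start: int  # bucket start, in seconds
    open: float
    high: float
    low: float
    close: float
    volume: float


def build_bars(trades: Iterable[Tuple[int, float, float]], interval_s: int) -> List[Bar]:
    """Group (timestamp, price, size) trades into OHLCV bars of interval_s seconds."""
    bars: Dict[int, Bar] = {}
    # Sort by timestamp so open/close reflect the first/last trade in each bucket.
    for ts, price, size in sorted(trades, key=lambda t: t[0]):
        start = ts - ts % interval_s
        bar = bars.get(start)
        if bar is None:
            bars[start] = Bar(start, price, price, price, price, size)
        else:
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price
            bar.volume += size
    return [bars[start] for start in sorted(bars)]
```

Pre-computing these buckets for a few fixed intervals (for example 300, 3600, and 86400 seconds) is what lets a "top tokens by 24h volume" query read from a small table instead of scanning raw trades.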

Metadata and scam filtering

Codex also enriches with metadata and quality filters:

Token metadata – Name, symbol, decimals, logos, links, and (via The Grid) verified org context
Scam filtering – Internal heuristics and community signals used to flag rugs, honeypots, and spoof contracts
Launchpad context – Associations to 16+ launchpads so new tokens are discoverable quickly

The net effect: the data coming out of the API is far closer to what a trading app actually needs than raw logs.

4. Wallet‑Scale Indexing: 700M+ Wallets and Cross‑Chain Views

Wallet indexing is where many DIY pipelines fall over.

To serve holders, balances, and walletChart across 700M+ wallets, Codex maintains:

Incremental state per wallet – Updated as new transfer, swap, and interaction events flow through
Token‑holder views – Who holds how much of a given token across chains
Cross‑chain wallet analytics – Aggregated balances and activity across 80+ networks

Under the hood, this requires:

Append‑only event logs for each wallet and token
Materialized views and indices for common access paths (by wallet, by token, by protocol, by time)
Sharding and partitioning by chain ID, token ID, wallet ID, and time buckets

For engineers, this matters because:

You can call holders or balances without pre‑aggregating per chain.
You can build cross‑chain portfolios and leaderboards from a single API.
You don’t have to touch RPCs or run ETL jobs when a new network is added.
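The combination of an append-only log, incremental per-wallet state, and token-holder views can be sketched in a few lines. This is a toy in-memory model under stated assumptions: `BalanceIndex`, its method names, and the SHA-256 based `partition_for` helper are hypothetical and are not taken from Codex's internals.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

ZERO_ADDRESS = "0x" + "0" * 40  # mints come from, and burns go to, this address


def partition_for(chain_id: int, wallet: str, partitions: int) -> int:
    """Stable shard assignment from (chain_id, wallet)."""
    digest = hashlib.sha256(f"{chain_id}:{wallet.lower()}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % partitions


class BalanceIndex:
    def __init__(self) -> None:
        self.log: List[Tuple[int, str, str, str, int]] = []  # append-only
        self.balances: DefaultDict[Tuple[int, str, str], int] = defaultdict(int)
        self.holders: DefaultDict[Tuple[int, str], Set[str]] = defaultdict(set)

    def apply_transfer(self, chain_id: int, token: str, sender: str, recipient: str, amount: int) -> None:
        self.log.append((chain_id, token, sender, recipient, amount))
        self._adjust(chain_id, token, sender, -amount)
        self._adjust(chain_id, token, recipient, amount)

    def _adjust(self, chain_id: int, token: str, wallet: str, delta: int) -> None:
        if wallet == ZERO_ADDRESS:
            return  # do not track a balance for the mint/burn address
        key = (chain_id, token, wallet)
        self.balances[key] += delta
        if self.balances[key] > 0:
            self.holders[(chain_id, token)].add(wallet)
        else:
            self.holders[(chain_id, token)].discard(wallet)

    def cross_chain_balances(self, wallet: str) -> Dict[Tuple[int, str], int]:
        """All non-zero (chain_id, token) balances for one wallet."""
        return {(c, t): b for (c, t, w), b in self.balances.items() if w == wallet and b > 0}
```

The scan in `cross_chain_balances` is what a materialized by-wallet index would replace at real scale; the sketch keeps it linear for brevity.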

5. Prediction Markets: One Schema Across Polymarket, Kalshi, and Beyond

Prediction markets are now a first‑class category inside Codex.

Instead of treating them as bespoke APIs, Codex exposes a unified schema:

predictionEvents – the underlying real‑world event (e.g., election outcome)
predictionMarkets – the individual markets/contracts for that event
predictionTrades – trade history and order‑flow details
Trader‑level analytics – positions, realized/unrealized PnL, volumes

Key architectural choices:

One query, one ranking system – Markets from Polymarket, Kalshi, and future venues share a consistent model and filtering interface.
Trading‑grade performance – Same low‑latency guarantees as token data, suitable for real‑time prediction market frontends.

For builders of prediction market apps, this effectively acts as the prediction market data API, saving you from integrating multiple venues independently.
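A unified schema of this kind usually comes down to one adapter per venue feeding a shared record type. The sketch below shows the pattern only: the raw payload shapes, the venue keys, and the field names are invented for illustration and do not reflect Polymarket's, Kalshi's, or Codex's actual formats.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class UnifiedTrade:  # hypothetical shared record
    venue: str
    market_id: str
    outcome: str
    probability: float  # implied probability in [0, 1]
    size: float


def from_probability_venue(raw: dict) -> UnifiedTrade:
    """Adapter for a hypothetical venue that quotes prices as 0..1 probabilities."""
    return UnifiedTrade("venue_a", raw["market"], raw["outcome"], float(raw["price"]), float(raw["size"]))


def from_cents_venue(raw: dict) -> UnifiedTrade:
    """Adapter for a hypothetical venue that quotes prices in cents (0..100)."""
    return UnifiedTrade("venue_b", raw["ticker"], raw["side"], raw["price_cents"] / 100, float(raw["count"]))


ADAPTERS: Dict[str, Callable[[dict], UnifiedTrade]] = {
    "venue_a": from_probability_venue,
    "venue_b": from_cents_venue,
}


def normalize_trade(venue: str, raw: dict) -> UnifiedTrade:
    """Route a raw venue payload through its adapter."""
    try:
        adapter = ADAPTERS[venue]
    except KeyError:
        raise ValueError(f"unsupported venue: {venue}") from None
    return adapter(raw)
```

Once every venue lands in the same record, ranking and filtering can be written once against `probability` and `size` rather than per venue.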

6. Serving Layer: GraphQL, Caching, and Real‑Time Streams

Once data is enriched, Codex’s serving layer makes it consumable for products.

All of this is fronted by a single endpoint:

HTTP: https://graph.codex.io/graphql
WebSocket: wss://graph.codex.io/graphql

GraphQL as the boundary

Codex’s GraphQL schema provides:

Strongly typed entities – tokens, wallets, bars, markets, trades, events
Filter‑rich queries – 100+ filters across endpoints (time, chain, volume, liquidity, holders, etc.)
One schema across all networks – you pass network: { chainId } rather than using a different API per chain

This matters for both backend and frontend teams:

You can evolve queries without changing endpoints.
You can co‑locate multiple data products (prices, charts, prediction markets) in a single gateway.
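In practice, "one schema across all networks" means the request body stays the same and only the variables change. The sketch below builds (but does not send) an HTTP POST for the endpoint named in this post using only the standard library. The query text is a placeholder: its argument and field names are illustrative guesses, not copied from the Codex schema, and the use of an `Authorization` header carrying the API key is likewise an assumption to verify against the docs.

```python
import json
import urllib.request

CODEX_HTTP_URL = "https://graph.codex.io/graphql"  # endpoint as stated in the post

# Placeholder query: argument and field names are hypothetical.
EXAMPLE_BARS_QUERY = """
query Bars($address: String!, $chainId: Int!) {
  getTokenBars(address: $address, network: { chainId: $chainId }) {
    close
  }
}
"""


def build_graphql_request(query: str, variables: dict, api_key: str) -> urllib.request.Request:
    """Build a standard GraphQL-over-HTTP POST request without sending it."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        CODEX_HTTP_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": api_key,  # assumed auth scheme
        },
        method="POST",
    )


# Same query document, different network: only the variables differ.
eth_request = build_graphql_request(EXAMPLE_BARS_QUERY, {"address": "0xabc", "chainId": 1}, "YOUR_API_KEY")
other_request = build_graphql_request(EXAMPLE_BARS_QUERY, {"address": "0xabc", "chainId": 137}, "YOUR_API_KEY")
```

Sending either request is a matter of passing it to `urllib.request.urlopen` and decoding the JSON response.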

Caching and pre‑computation

Trading apps hammer a small number of hot paths: latest prices, recent bars, top movers, wallet balances.

Codex optimizes for those patterns via:

Time‑bucketed aggregation stores for bars and volume
Materialized leaderboards for top tokens and markets
Multi‑layer caching (in‑memory + distributed) keyed by query signature and parameters

In practice, that’s how Codex hits:

Sub‑second response times for common queries
1,000+ RPS capacity on higher‑tier plans, with higher internal ceilings

For example, TradingView’s public case study cites:

~15 seconds faster responses vs their prior stitched‑together stack
2M+ additional tokens indexed
200+ engineering hours saved by consolidation

Real‑time subscriptions and fan‑out

Codex supports streaming updates using GraphQL subscriptions for:

Live token prices and bars (onBarsUpdated)
Real‑time trades and token events
Prediction market trades and order‑flow updates

Operational details that matter:

Subscriptions are billed per message, not just per connection.
A typical Growth plan offers:
- 1M requests/month
- ~300 RPS
- WebSockets + webhooks
- 300 concurrent connections
Internal guidance: ~100 tokens per connection is a practical limit depending on activity.

Design implication:

Use backend fan‑out and caching (e.g., your own gateway) for very high‑traffic frontends, rather than giving every client its own direct Codex subscription.
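The fan-out pattern can be sketched as a small hub that opens one upstream subscription per token and shares it across any number of local listeners. `FanOutHub` and its callbacks are hypothetical; the upstream open/close functions stand in for whatever WebSocket client you use. The `connections_needed` helper applies the ~100 tokens per connection guidance quoted above.

```python
import math
from typing import Any, Callable, Dict, List

Listener = Callable[[Any], None]


def connections_needed(token_count: int, tokens_per_connection: int = 100) -> int:
    """Upstream connections required at a given tokens-per-connection budget."""
    return math.ceil(token_count / tokens_per_connection)


class FanOutHub:
    """Shares one upstream subscription per token among many local listeners."""

    def __init__(self, open_upstream: Callable[[str], None], close_upstream: Callable[[str], None]) -> None:
        self._open = open_upstream
        self._close = close_upstream
        self._listeners: Dict[str, List[Listener]] = {}

    def subscribe(self, token: str, listener: Listener) -> None:
        if token not in self._listeners:
            self._listeners[token] = []
            self._open(token)  # first local listener: open the upstream stream
        self._listeners[token].append(listener)

    def unsubscribe(self, token: str, listener: Listener) -> None:
        listeners = self._listeners.get(token)
        if not listeners or listener not in listeners:
            return
        listeners.remove(listener)
        if not listeners:
            del self._listeners[token]
            self._close(token)  # last local listener left: release upstream

    def publish(self, token: str, message: Any) -> None:
        """Called by the upstream handler; relays one message to every listener."""
        for listener in list(self._listeners.get(token, [])):
            listener(message)
```

Because upstream messages are counted once per shared subscription rather than once per end user, this shape also keeps per-message billing proportional to the number of tracked tokens instead of the number of connected clients.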

7. Consistency, Correctness, and Failure Modes

For mission‑critical trading and wallet apps, you need to know how the data behaves under stress.

Codex’s architecture is opinionated about consistency and correctness:

Event ordering and reorg handling

Per‑chain ordering guarantees – Within a given chain and block height, events are processed deterministically.
Reorg‑aware aggregates – If a block is reorged, dependent aggregates (bars, balances, holders) are updated, not left in an inconsistent state.

You get:

Stable OHLCV series for historical intervals
Balances and holders that reflect the canonical chain state

Idempotent ingestion and enrichment

Idempotent processors ensure that replayed or duplicated events don’t double‑count volume or balances.
Versioned enrichment logic ensures that improvements to heuristics (e.g., better sandwich detection) can be rolled out without corrupting existing aggregates.
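A common way to get idempotent processing is to key every event by a natural identifier and skip anything already applied. The sketch below assumes `(chain_id, tx_hash, log_index)` uniquely identifies an event; the class and field names are illustrative, and a production system would persist the seen-set rather than hold it in memory.

```python
from typing import Dict, Set, Tuple

EventId = Tuple[int, str, int]  # (chain_id, tx_hash, log_index)


class VolumeAggregator:
    """Accumulates per-token volume while ignoring replayed events."""

    def __init__(self) -> None:
        self._seen: Set[EventId] = set()
        self.volume_by_token: Dict[str, float] = {}

    def process(self, event: dict) -> bool:
        """Apply the event once; return False if it was a duplicate."""
        event_id: EventId = (event["chain_id"], event["tx_hash"], event["log_index"])
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        token = event["token"]
        self.volume_by_token[token] = self.volume_by_token.get(token, 0.0) + event["volume"]
        return True
```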

Service‑level behavior

Graceful degradation – If a single network is degraded, Codex continues to serve data for others.
Fail‑fast errors – Misconfigured queries or unsupported tokens fail clearly, not silently.

For builders, this translates into fewer edge‑case incidents where a user’s chart or PnL “looks wrong” because of subtle data bugs.

8. What This Lets You Delete From Your Roadmap

Codex’s architecture effectively removes a category of work from your backlog.

With Codex, you typically do not need to:

Run your own RPC nodes and archival infrastructure
Build custom indexers for each new chain or DEX
Maintain ETL pipelines and backfills for bars, holders, and TVL
Stitch together multiple vendors for
- Price feeds
- Token metadata
- Wallet state
- Prediction markets

Instead, your engineering focus can move to:

Product UX and differentiating features
Strategy and analytics on top of Codex’s normalized entities
Performance tuning around a single, well‑understood external dependency

This is why large teams like Coinbase, TradingView, Uniswap, Magic Eden, Rainbow, MoonPay, and others treat Codex as critical infrastructure.

9. Practical Design Tips When Building on Codex

If you’re evaluating Codex (or architecting around any trading‑grade on‑chain data API), a few practical patterns help.

1) Separate hot and cold paths

Use subscriptions or short‑interval queries for:
- Live prices and recent trades
- Active prediction markets
Use cached or pre‑fetched queries for:
- Historical charts beyond the last few days
- Token metadata and The Grid’s verified info

2) Cache by query signature

Treat Codex GraphQL queries as pure functions of their variables.
Cache results in your edge or backend by:
- queryName
- variables (chainId, token address, interval)
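A minimal sketch of caching by query signature: serialize the query name and variables canonically, hash them, and use the hash as the cache key with a time-to-live. `query_signature` and `QueryCache` are hypothetical helpers, and the in-memory dict stands in for whatever edge or backend cache you run.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


def query_signature(query_name: str, variables: dict) -> str:
    """Stable key for a query: key order in variables does not matter."""
    canonical = json.dumps({"q": query_name, "v": variables}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class QueryCache:
    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, query_name: str, variables: dict, fetch: Callable[[], Any]) -> Any:
        """Return a fresh cached value, or call fetch() and store its result."""
        key = query_signature(query_name, variables)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

A short TTL suits hot paths such as latest prices, while historical bars for closed intervals can be cached far longer since they rarely change.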

3) Design for rate and connection limits

Centralize Codex access behind an internal gateway.
Aggregate UI needs into a smaller number of shared subscriptions.
Use webhooks for server‑side reactions (e.g., threshold alerts, portfolio rebalances).
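An internal gateway typically enforces the plan's request rate itself so that bursts are smoothed before they reach the provider. One standard technique is a token bucket, sketched below; the class name and parameters are illustrative, and the 300 requests per second figure simply mirrors the Growth plan number quoted earlier.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows up to `burst` immediate requests, refilled at `rate_per_s`."""

    def __init__(self, rate_per_s: float, burst: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_s
        self._capacity = burst
        self._tokens = float(burst)
        self._clock = clock
        self._updated = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; never blocks."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._updated) * self._rate)
        self._updated = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Example: cap outbound calls at roughly 300 per second with a small burst allowance.
gateway_limiter = TokenBucket(rate_per_s=300.0, burst=50.0)
```

Requests that fail `try_acquire` can be queued, served from cache, or rejected, depending on how stale a response the calling feature can tolerate.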

4) Be explicit about consistency requirements

For trading actions, combine Codex data with direct on‑chain confirmations.
For UX‑level charts and stats, rely fully on Codex’s enriched aggregates for performance.

FAQ: Building on a Trading‑Grade On‑Chain Data Layer

1. How is Codex different from a generic blockchain node or RPC provider?

A node or RPC gives you raw blocks, transactions, and logs for a single chain.

Codex provides normalized, enriched entities across 80+ networks: tokens, prices, OHLCV, holders, wallets, liquidity, and prediction markets, all via one GraphQL schema.

You don’t manage nodes, indexers, or ETL; you just consume structured data.

2. How fast is Codex for real‑time trading and portfolio apps?

Codex is built for trading‑grade latency.

Public claims and customer case studies indicate:

Sub‑second response times for hot queries
1,000+ RPS supported on higher‑tier plans
TradingView saw ~15 seconds faster responses compared to its prior multi‑vendor setup.

For UI responsiveness, you typically use:

HTTP GraphQL queries for initial loads
WebSocket subscriptions for live updates (onBarsUpdated, trade streams)

3. How does Codex handle new chains, tokens, and launchpads?

Codex’s ingestion layer is designed to add new networks and launchpads with minimal surface change.

When a new chain or launchpad is added:

It’s plugged into the existing normalized entity model.
You continue using the same GraphQL queries with a different network: { chainId } value.

This is why Codex can cover 70M–75M+ tokens and 16+ launchpads without forcing you to change APIs.

4. Can Codex support a high‑traffic CEX or major wallet product?

Yes. Codex already powers apps like Coinbase, TradingView, Uniswap, Magic Eden, Rainbow, MoonPay, Farcaster, and pump.fun.

Case studies highlight:

Billions of API requests per month across customers
Individual products doing hundreds of millions of requests/month

If you’re sizing a deployment, Codex’s Growth and higher‑tier plans expose concrete limits (RPS, connections, keys), and the Codex team typically works with you on custom SLAs.

5. Why not just build my own on‑chain data pipeline in‑house?

You can, but you’re signing up for:

Running and maintaining nodes across 80+ networks
Writing and operating custom indexers for each protocol and DEX
Handling upgrades, reorgs, chain outages, and new ecosystems
Designing your own enrichment, OHLCV, and prediction market schemas

Codex’s founding story is exactly this: the team set out to build a trading platform, was blocked by raw or incorrect data, and ended up spending years on infrastructure instead.

Most teams prefer to buy a mature, infrastructure‑grade data layer that’s already powering industry‑leading products, and focus their engineers on product.

If you’re evaluating the most reliable on‑chain data APIs for trading apps or prediction market frontends, Codex’s architecture is designed to be that trading‑grade source of truth. You can explore the full schema and examples in the docs at docs.codex.io.
