Compare LLM providers. Size GPUs for self-hosting.
Understand the economics behind every token.
Pick your use case — customer support, RAG, code gen — and compare per-call and monthly costs across all providers.
System prompt + user context split · prompt caching estimates
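A minimal sketch of the arithmetic behind a per-call estimate, splitting the system prompt from user context and applying a prompt-caching discount. All prices, token counts, and the discount factor below are illustrative assumptions, not live provider rates:

```python
def per_call_cost_usd(
    system_tokens: int,
    user_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,   # $ per 1M input tokens (illustrative)
    output_price_per_mtok: float,  # $ per 1M output tokens (illustrative)
    cache_hit_rate: float = 0.0,   # fraction of calls where the system prompt is cached
    cached_input_discount: float = 0.5,  # assumed discount on cached input tokens
) -> float:
    """Blended cost of one call under a simple prompt-caching model."""
    # Only the system prompt is assumed cacheable; on a hit it is billed at a discount.
    system_cost = (system_tokens / 1e6) * input_price_per_mtok * (
        1 - cache_hit_rate * cached_input_discount
    )
    user_cost = (user_tokens / 1e6) * input_price_per_mtok
    output_cost = (output_tokens / 1e6) * output_price_per_mtok
    return system_cost + user_cost + output_cost

# Monthly spend at a given call volume (all numbers are placeholders):
monthly = 500_000 * per_call_cost_usd(
    system_tokens=2_000, user_tokens=800, output_tokens=400,
    input_price_per_mtok=3.00, output_price_per_mtok=15.00,
    cache_hit_rate=0.9,
)
print(f"~${monthly:,.0f}/month")
```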
Estimate VRAM for any model — weights, KV-cache, precision, safety margins. Find the right accelerator.
A100, H100, L40S, 4090 · side-by-side comparison
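A minimal sketch of the VRAM estimate, assuming memory splits into weights, KV-cache, and a flat safety margin. The model shape in the example is a hypothetical Llama-3-8B-like configuration; the 20% overhead factor is an assumption standing in for activations and fragmentation:

```python
def vram_estimate_gib(
    params_b: float,       # model parameters, in billions
    weight_bytes: float,   # bytes per weight: 2 (FP16/BF16), 1 (INT8), 0.5 (INT4)
    n_layers: int,
    n_kv_heads: int,       # KV heads (fewer than attention heads under GQA)
    head_dim: int,
    seq_len: int,
    batch_size: int,
    kv_bytes: float = 2.0, # KV-cache precision, FP16 by default
    overhead: float = 1.2, # assumed ~20% safety margin
) -> float:
    weights = params_b * 1e9 * weight_bytes
    # The KV-cache holds one key and one value vector per layer, per KV head, per token.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * kv_bytes
    return (weights + kv_cache) * overhead / 1024**3

# Illustrative 8B model at FP16, 8K context, batch 4:
print(f"{vram_estimate_gib(8, 2, 32, 8, 128, 8192, 4):.1f} GiB")
```

At these assumed settings the estimate lands near 23 GiB, which is why the same model can fit a 24 GB card at short context yet demand an A100/H100 class part as batch size and context grow.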
How caching, batching, quantization, and routing affect your real spend. Data center trends and supply analytics.
Cost optimization · industry benchmarks · hardware supply
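A toy model of how two of these levers compound for API spend: caching discounts the input-token share, and routing sends easy traffic to a cheaper model. Every coefficient is an illustrative assumption; batching and quantization act analogously on self-hosted throughput rather than on per-token price:

```python
def monthly_spend_usd(
    base_monthly_usd: float,  # spend before optimizations
    cache_hit_rate: float,    # share of input tokens served from the prompt cache
    cache_discount: float,    # price cut on cached tokens (0.5 = half price, assumed)
    input_share: float,       # fraction of spend that is input tokens
    easy_share: float,        # fraction of traffic routable to a cheaper model
    cheap_ratio: float,       # cheap model's price relative to the default
) -> float:
    # Caching only discounts the input-token portion of spend.
    after_cache = base_monthly_usd * (1 - input_share * cache_hit_rate * cache_discount)
    # Routing re-prices the easy share of traffic at cheap_ratio of the default.
    return after_cache * ((1 - easy_share) + easy_share * cheap_ratio)

# Hypothetical baseline of $10k/month with aggressive caching and 50% routing:
print(f"${monthly_spend_usd(10_000, 0.8, 0.5, 0.6, 0.5, 0.2):,.0f}")
```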
A technical planning tool for people who design, deploy, and scale LLM systems. Focused on how models actually consume memory and bandwidth.
No marketing benchmarks. No abstract performance scores.
All assumptions are explicit and inspectable.