
How much will your AI cost?

Real 2026 per-million-token prices for Claude, GPT-5, and Gemini. Real production assumptions. A monthly estimate in 30 seconds — plus opinionated recommendations on where you’re probably overpaying.

Your workload

  • Requests: 5,000 per day (inference calls to the LLM, averaged over a typical day)
  • Input: 1,500 tok per call (system prompt + user input + retrieved context; 1 token ≈ 0.75 words)
  • Output: 350 tok per call (what the model actually generates back)

Toggles: model tier · prompt caching · retrieval

Estimated cost

$973 / mo

$11,675 / year · $0.006 per request

  • Input / mo: 228.0M tokens
  • Output / mo: 53.2M tokens
  • Embedded / mo: 30.4M tokens

Breakdown

  • Input tokens (uncached): $285
  • Input tokens (cached): $45.60
  • Output tokens: $638
  • Embedding for retrieval: $3.95

Optimized estimate

$295 / mo

Saves $678 / mo with the recommendations below

Send this to hello@mini-trends.com →

Pre-fills the email with your inputs and result

Where you’re probably overpaying

Heuristics from running this exact optimization on dozens of production systems. Most clients realize 40–80% of these savings within a single sprint.

Route easy calls to a small model

save ~$365/mo

Classification, routing, intent detection, and lightweight extraction can almost always run on Haiku / Nano / Flash without quality loss. Build a router and split traffic.
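A minimal sketch of that split. The task taxonomy and model labels here are illustrative placeholders, not a provider API; a production router would classify tasks from the request itself.

```python
# Route cheap, well-bounded task types to a small model tier and keep
# everything else on the frontier model. Labels are placeholders.
SMALL_MODEL_TASKS = {"classification", "routing", "intent", "extraction"}

def pick_model(task_type: str) -> str:
    """Return the model tier a request should run on."""
    if task_type in SMALL_MODEL_TASKS:
        return "small"     # Haiku / Nano / Flash class
    return "frontier"      # full-size model for everything else
```

Even a hard-coded allowlist like this captures most of the savings; the fancier learned routers mostly matter at the margin.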

Cache more aggressively

save ~$168/mo

You are caching the system prompt. The next-largest stable region — retrieved-doc context — can usually be cached too. Look at your prompt structure for cache-friendly ordering.
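One way to think about cache-friendly ordering: providers cache the longest stable prefix of the prompt, so assemble segments from most stable to most volatile. A sketch, with an illustrative join format (not a provider API):

```python
# Cache-friendly prompt assembly: stable content first, so the system
# prompt + retrieved docs form one long cacheable prefix.
def build_prompt(system: str, retrieved_docs: list[str], user_msg: str) -> str:
    stable_prefix = system + "\n\n" + "\n\n".join(retrieved_docs)
    return stable_prefix + "\n\n" + user_msg  # volatile part goes last
```

Two requests that hit the same docs now share everything up to the user message, which is exactly the region the cached-read rate applies to.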

Use batch APIs for non-realtime work

save ~$146/mo

For evals, summarization jobs, and async pipelines, the batch APIs from Anthropic / OpenAI / Google are priced at roughly 50% off. If even 30% of your traffic is non-interactive, this stacks with the other savings.
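The back-of-envelope math behind that figure, using this page's $973/mo baseline:

```python
# Batch savings = baseline spend × batchable share × batch discount.
baseline = 973.0        # estimated monthly spend from the calculator above
batchable_share = 0.30  # portion of traffic that is non-interactive
batch_discount = 0.50   # batch APIs price at half the interactive rate

savings = baseline * batchable_share * batch_discount
print(f"~${savings:.0f}/mo")  # prints ~$146/mo
```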

How this calculator works (and what it doesn’t cover)

Token costs are computed against per-million-token prices roughly representative of frontier, mid-tier, and small models from Anthropic, OpenAI, and Google as of early 2026. Cached input is priced at the provider’s reduced cached-read rate. Embeddings use a typical $0.13/1M-token rate.
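The arithmetic reduces to a few lines. The per-token rates below are back-solved placeholders that happen to reproduce the figures on this page; they are not current provider list prices.

```python
# First-order monthly cost estimate mirroring the calculator's logic.
def monthly_cost(
    calls_per_day: float,
    input_tok: int,              # per call: system prompt + user input + context
    output_tok: int,             # per call: generated tokens
    embed_tok: int = 0,          # per call: tokens embedded for retrieval
    cached_frac: float = 0.0,    # share of input served at the cached-read rate
    price_in: float = 2.50,      # $/1M uncached input tokens (placeholder)
    price_cached: float = 0.40,  # $/1M cached-read tokens (placeholder)
    price_out: float = 12.00,    # $/1M output tokens (placeholder)
    price_embed: float = 0.13,   # $/1M embedding tokens (rate from the text)
    days_per_month: float = 30.4,
) -> float:
    calls = calls_per_day * days_per_month
    m_in = calls * input_tok / 1e6       # millions of input tokens / mo
    m_out = calls * output_tok / 1e6     # millions of output tokens / mo
    m_embed = calls * embed_tok / 1e6    # millions of embedded tokens / mo
    return (
        m_in * (1 - cached_frac) * price_in
        + m_in * cached_frac * price_cached
        + m_out * price_out
        + m_embed * price_embed
    )
```

With the workload above (5,000 calls/day, 1,500 in / 350 out, an assumed 200 embedded tokens per call, half the input cached), this lands within a dollar of the $973/mo estimate.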

Not included: vector-database hosting, GPU/inference if self-hosting, observability infra, evaluation runs, agentic-loop multipliers (an agent that iterates 3× pays 3× the per-request cost shown here), or human review costs. In real engagements we model these explicitly — this calculator is a first-order estimate to anchor the conversation, not the proposal itself.

Want this done on your real system?

We do AI cost optimization as a Discovery Sprint.

Two weeks, fixed price ($35k). We instrument your production system, identify the real savings, ship the changes, and hand you a written report. Most engagements pay back the fee in under three months.