Published 10 June 2026 · reference prices, verify before budgeting
"Which AI API is cheapest?" has no single answer — it depends entirely on your token mix. But once you look at the actual per-token prices, clear patterns emerge. Here's the honest, numbers-first comparison.
| Model | Input | Output | Tier |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | Frontier |
| Claude Sonnet 4 | $3.00 | $15.00 | Frontier |
| Claude Opus 4 | $15.00 | $75.00 | Top-end |
| Gemini 2.5 Pro | $1.25 | $10.00 | Frontier |
| GPT-4o mini cheap | $0.15 | $0.60 | Small |
| Claude Haiku 3.5 | $0.80 | $4.00 | Small |
| Gemini 2.0 Flash cheapest | $0.10 | $0.40 | Small |
Notice the gap. Within the same provider, the small model is 15–40× cheaper than the frontier one. GPT-4o mini vs GPT-4o: about 16× cheaper on input. Gemini 2.0 Flash vs Gemini 2.5 Pro: more than 10× cheaper. That difference dwarfs any gap between providers. So the first question is never "GPT or Claude?" — it's "do I actually need a frontier model for this task?"
For classification, routing, extraction, tagging, short summaries and simple chat, a small model almost always passes the eval. Reserve the expensive models for genuinely hard reasoning, long-context analysis, or quality-critical writing.
Say you process 1,000 requests/day, each with 1,000 input tokens and 500 output tokens. Monthly cost:
| Model | Cost / month |
|---|---|
| Gemini 2.0 Flash | $9 |
| GPT-4o mini | $13.50 |
| Claude Haiku 3.5 | $84 |
| Gemini 2.5 Pro | $187 |
| GPT-4o | $225 |
| Claude Sonnet 4 | $315 |
| Claude Opus 4 | $1,575 |
Same workload, a 175× spread from cheapest to most expensive. If a small model does the job, you're choosing between ~$9 and ~$1,575/month. That's the whole game.
Cheapest overall: Gemini 2.0 Flash — and Gemini also has the most usable free tier. Cheapest from OpenAI: GPT-4o mini, the safe default for most production tasks. Best free tier: Gemini (AI Studio), no credit card needed. Strongest reasoning per dollar: debatable, but Claude Sonnet and GPT-4o trade blows; output-heavy workloads favour whichever has the lower output price for your case.
1. Default to the small model. Only escalate when an eval proves you need more.
2. Watch output, not input. Output costs 3–5× more — cap max tokens.
3. Measure before you scale. Run your real token counts through the calculator before switching a feature on for every user.
Want the exact number for your usage? Open the AI API Cost Calculator →
Reference estimates, June 2026. Not affiliated with OpenAI, Anthropic or Google.