Replicate API — pricing, free tier & how to get a key

Q: How do I get a Replicate API token?

Sign up at replicate.com, open Account → API tokens, and copy your token.

Replicate lets you run thousands of open-source AI models (image, video, audio and language) with a single API call, billed by the second of compute time. You don't manage servers, and you pay only while a model runs. Here's what it costs and how to get your token.

Replicate API pricing (reference, July 2026)

Hardware	≈ $/second	≈ $/1,000 runs*	Best for
CPU	$0.0001	~$0.50	Light pre/post-processing
Nvidia T4 GPU	$0.000225	~$1.13	Small image / audio models
Nvidia A100 (40GB)	$0.00115	~$5.75	LLMs, SDXL, video
Nvidia H100	$0.00250	~$12.50	Large / fast inference

⚠️ Reference prices, July 2026. Replicate updates pricing regularly, so confirm on replicate.com/pricing before budgeting. *Rough estimate: assumes ~5 s of compute per run; real cost = seconds used × hardware rate, and cold starts add time.
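The footnote's formula (seconds used × hardware rate) can be sketched as a small shell helper for rough budgeting. This is a minimal sketch, not Replicate's billing logic: the function names `rate_for` and `estimate_cost` are hypothetical, the rates are the July 2026 reference figures from the table, and cold-start time is not included.

```shell
# Hypothetical helpers for rough budgeting: cost = seconds x per-second rate.
# Rates are the July 2026 reference prices from the table above; check
# replicate.com/pricing before relying on them.

# rate_for HARDWARE: print the reference $/second, or fail for unknown names
rate_for() {
  case "$1" in
    cpu)  echo 0.0001 ;;
    t4)   echo 0.000225 ;;
    a100) echo 0.00115 ;;
    h100) echo 0.00250 ;;
    *)    echo "unknown hardware: $1" >&2; return 1 ;;
  esac
}

# estimate_cost HARDWARE SECONDS_PER_RUN RUNS: print the total in dollars
# (cold starts are not modelled, so real bills can be higher)
estimate_cost() {
  local rate
  rate=$(rate_for "$1") || return 1
  awk -v r="$rate" -v s="$2" -v n="$3" 'BEGIN { printf "%.4f\n", r * s * n }'
}
```

For example, `estimate_cost a100 12 50` (50 runs of 12 seconds each on an A100) prints `0.6900`.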

✓ Last verified: 2026-07-15 · Source: official provider pricing page

→ Running language models? Compare token-priced APIs on the AI API cost calculator; for text workloads a hosted LLM is sometimes cheaper than per-second GPU billing.

Is there a free tier?

Replicate gives a small amount of free credit to try models, then it's pay-as-you-go by the second with no monthly minimum. There's no perpetual free tier, but you only pay while a model actually runs, so idle costs are zero.

How to get a Replicate API key (step by step)

1. Sign up at replicate.com (GitHub login works).
2. Open Account → API tokens and copy your token.
3. Add a payment method for usage beyond the free credit.
4. Call any model by its version ID; billing starts when the run starts and stops when it ends.
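Steps 2 and 4 assume your shell can see the token. One common pattern, shown here as a sketch rather than anything Replicate requires, is to export it as `REPLICATE_API_TOKEN` (the variable the curl example uses) and fail fast when it is missing. `require_token` and `check_token` are hypothetical helper names; `check_token` assumes Replicate's `GET /v1/account` endpoint, which should answer HTTP 200 for a valid token.

```shell
# Hypothetical helpers: make sure the token from Account -> API tokens is set.

# require_token: fail early (exit status 1) if REPLICATE_API_TOKEN is empty
require_token() {
  if [ -z "${REPLICATE_API_TOKEN:-}" ]; then
    echo "REPLICATE_API_TOKEN is not set; copy it from Account -> API tokens" >&2
    return 1
  fi
}

# check_token: ask the API who the token belongs to (assumes GET /v1/account)
# and print the HTTP status code; 200 means the token was accepted
check_token() {
  require_token || return 1
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
    https://api.replicate.com/v1/account
}

# Usage (keep the token out of source control):
#   export REPLICATE_API_TOKEN='<your-token>'
#   check_token   # expect 200
```

Any code other than 200 points to a wrong or revoked token, or a network problem.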

Test it with a simple request:

# Run a model. First export your token (export REPLICATE_API_TOKEN=<your-token>)
# and replace MODEL_VERSION with the model's version ID.
curl -s -X POST https://api.replicate.com/v1/predictions \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"version":"MODEL_VERSION","input":{"prompt":"a cat"}}'
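The POST above returns a prediction object (JSON with an `id` and a `status`), usually before the model has finished. A minimal polling sketch follows, assuming the `GET /v1/predictions/{id}` endpoint and the final statuses `succeeded`, `failed` and `canceled`. `json_field` and `wait_for_prediction` are hypothetical names, and `json_field` is a crude sed extractor for flat string fields; a real JSON tool such as jq is safer.

```shell
# json_field NAME: read JSON on stdin and print the string value stored under
# "NAME" (crude sketch: assumes the key appears once and holds a plain string)
json_field() {
  tr -d '\n' | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# wait_for_prediction ID: poll the prediction until it reaches a final status,
# then print its full JSON (output, error, metrics) to stdout
wait_for_prediction() {
  local body state
  while :; do
    body=$(curl -s -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      "https://api.replicate.com/v1/predictions/$1") || return 1
    state=$(printf '%s' "$body" | json_field status)
    case "$state" in
      succeeded|failed|canceled) printf '%s\n' "$body"; return 0 ;;
      "") echo "unexpected response: $body" >&2; return 1 ;;
    esac
    sleep 2   # still starting or processing; wait before asking again
  done
}
```

Usage: pipe the POST response through `json_field id`, then pass that id to `wait_for_prediction`.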

Cheaper alternatives

For hosted LLMs by token (often cheaper than per-second GPU), see OpenAI, DeepSeek or Together AI. For image generation specifically, Stability AI and Fal.ai compete on price; for raw model hosting, Hugging Face Inference Endpoints is the closest rival.

FAQ

How does Replicate billing work?

You pay per second of compute while a model runs, at a rate set by the hardware (CPU, T4, A100, H100). No monthly fee; idle time costs nothing.
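To check what a finished run cost, multiply its compute seconds by the hardware rate. This sketch assumes the completed prediction's `metrics.predict_time` field holds the billed seconds; `prediction_cost` is a hypothetical name, and the rate argument is whatever you pay for the hardware (for example the A100 reference price of $0.00115/s).

```shell
# prediction_cost RATE: read a finished prediction's JSON on stdin and print
# predict_time x RATE in dollars. Assumes metrics.predict_time is the billed
# compute time in seconds.
prediction_cost() {
  local secs
  secs=$(tr -d '\n' |
    sed -n 's/.*"predict_time"[[:space:]]*:[[:space:]]*\([0-9.eE+-]*\).*/\1/p')
  if [ -z "$secs" ]; then
    echo "no predict_time in response (is the prediction finished?)" >&2
    return 1
  fi
  awk -v s="$secs" -v r="$1" 'BEGIN { printf "%.6f\n", s * r }'
}
```

For example, piping `{"metrics":{"predict_time":4.2}}` into `prediction_cost 0.00115` prints `0.004830`.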

Is Replicate cheaper than an LLM API?

For language tasks, a token-priced API like DeepSeek or GPT-4o mini is usually cheaper and simpler. Replicate wins when you need a specific open model (image, video, audio) that hosted APIs don't offer.
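Which billing model wins depends on the workload, so it helps to price one request both ways. This sketch builds in no provider prices: you supply the GPU seconds, the hardware rate, the token count, and the per-million-token price from the provider's own pricing page. `compare_request` is a hypothetical name, and the numbers in the example are made up for illustration.

```shell
# compare_request GPU_SECONDS RATE_PER_SECOND TOKENS PRICE_PER_MILLION_TOKENS
# Prints the cost of one request under each billing model and which is cheaper.
compare_request() {
  awk -v s="$1" -v r="$2" -v t="$3" -v p="$4" 'BEGIN {
    gpu = s * r              # per-second billing: seconds x hardware rate
    tok = t * p / 1000000    # token billing: tokens x ($ per 1M tokens) / 1M
    if (gpu < tok) winner = "per-second"
    else if (gpu > tok) winner = "per-token"
    else winner = "equal"
    printf "per-second: $%.6f  per-token: $%.6f  cheaper: %s\n", gpu, tok, winner
  }'
}
```

With made-up inputs, `compare_request 6 0.00115 1500 0.60` prints `per-second: $0.006900  per-token: $0.000900  cheaper: per-token`. Count cold-start seconds on the GPU side, since they add billed time.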

Does Replicate have a free tier?

A small free trial credit, then pay-as-you-go. No perpetual free tier.

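How do I get a Replicate API token?

Sign up at replicate.com, open Account → API tokens, and copy your token. Add a payment method for usage beyond the free credit.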

Not affiliated with Replicate. Prices are reference estimates — always verify on the official pricing page.