Replicate API — pricing, free tier & how to get a key

Replicate lets you run thousands of open-source AI models (image, video, audio and language) with one API call, billed by the second of compute time. You don't manage servers; you pay only while a model runs. Here's what it costs and how to get your API token.

Replicate API pricing (reference, June 2026)

Hardware | ≈ $/second | ≈ $/1,000 runs* | Best for
CPU | $0.0001 | ~$0.50 | Light pre/post-processing
Nvidia T4 GPU | $0.000225 | ~$1.13 | Small image / audio models
Nvidia A100 (40GB) | $0.00115 | ~$5.75 | LLMs, SDXL, video
Nvidia H100 | $0.00250 | ~$12.50 | Large / fast inference

⚠️ Reference prices, June 2026. Replicate updates pricing regularly; confirm on replicate.com/pricing before budgeting.
*Rough estimate: assumes ~5 s per run, so cost per 1,000 runs = 5 s × hardware rate × 1,000. Real cost = seconds used × hardware rate. Cold starts add time.
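The per-second formula (seconds used × hardware rate) is easy to script for your own run times. Below is a minimal sketch, assuming `awk` is available. The function name `estimate_cost` is hypothetical, and the rate passed in is the T4 reference figure quoted in this article, not a live price.

```shell
#!/bin/sh
# Estimate per-second billing: cost = rate x seconds per run x number of runs.
# The rate used below is this article's T4 reference figure, not a live price.

# estimate_cost RATE_PER_SECOND SECONDS_PER_RUN [RUNS]
# Prints the estimated cost in dollars, rounded to two decimals.
# RUNS defaults to 1000.
estimate_cost() {
  awk -v rate="$1" -v secs="$2" -v runs="${3:-1000}" \
    'BEGIN { printf "%.2f\n", rate * secs * runs }'
}

# 1,000 runs of 12 s each on a T4 at $0.000225/s
estimate_cost 0.000225 12 1000   # prints 2.70
```

To account for a cold start, add the extra seconds to the per-run duration before calling the function.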

→ Running language models? Compare token-priced APIs on the AI API cost calculator — sometimes a hosted LLM is cheaper than per-second GPU.

Is there a free tier?

Replicate gives a small amount of free credit to try models, then it's pay-as-you-go by the second with no monthly minimum. There's no perpetual free tier, but you only pay while a model actually runs, so idle costs are zero.

How to get a Replicate API key (step by step)

1. Sign up at replicate.com (GitHub login works).
2. Open Account → API tokens and copy your token.
3. Add a payment method for usage beyond the free credit.
4. Call any model by its version id — billing starts when the run starts, stops when it ends.

Test it with a simple request:

# Run a model. Export REPLICATE_API_TOKEN in your shell first, and replace
# MODEL_VERSION with the version id of the model you want to call.
curl -s -X POST https://api.replicate.com/v1/predictions \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"version":"MODEL_VERSION","input":{"prompt":"a cat"}}'
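The request above creates a prediction and usually returns before the model has finished, so a follow-up step is to poll until the run ends. The sketch below is one way to do that, not Replicate's official client. It assumes `curl` and `jq` are installed, that the prediction can be fetched at `/v1/predictions/<id>`, and that `succeeded`, `failed` and `canceled` are the end states. Check those status names against the current API reference. The function names `is_terminal` and `wait_for_prediction` are hypothetical.

```shell
#!/bin/sh
# Sketch only: poll a prediction until it stops running.
# Assumes curl and jq are installed and REPLICATE_API_TOKEN is exported.

API_URL="https://api.replicate.com/v1/predictions"

# is_terminal STATUS
# Succeeds when STATUS is one of the end states this sketch assumes.
is_terminal() {
  case "$1" in
    succeeded|failed|canceled) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_prediction ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Prints the final prediction JSON, or fails after MAX_ATTEMPTS polls.
wait_for_prediction() {
  id="$1"
  interval="${2:-2}"
  attempts="${3:-60}"
  while [ "$attempts" -gt 0 ]; do
    # -f makes curl exit non-zero on HTTP errors such as 401 or 404
    body=$(curl -sf -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      "$API_URL/$id") || return 1
    status=$(printf '%s' "$body" | jq -r '.status')
    if is_terminal "$status"; then
      printf '%s\n' "$body"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  echo "gave up waiting for prediction $id" >&2
  return 1
}
```

Usage might look like `wait_for_prediction "$PREDICTION_ID" | jq '.output'`, where `PREDICTION_ID` is the `id` field from the create response. The attempt cap keeps an unexpected status from looping forever.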

Cheaper alternatives

For hosted LLMs by token (often cheaper than per-second GPU), see OpenAI, DeepSeek or Together AI. For image generation specifically, Stability AI and Fal.ai compete on price; for raw model hosting, Hugging Face Inference Endpoints is the closest rival.
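To compare per-second GPU billing with a token-priced API, it helps to convert the GPU rate into dollars per million tokens. The sketch below does that arithmetic. The throughput of 50 tokens per second is a purely hypothetical placeholder, since real throughput depends on the model and hardware. The function name `gpu_cost_per_million_tokens` is also hypothetical. The A100 rate is this article's reference figure.

```shell
#!/bin/sh
# Convert a per-second GPU rate into an equivalent price per million tokens:
#   dollars per 1M tokens = rate per second / tokens per second x 1,000,000

# gpu_cost_per_million_tokens RATE_PER_SECOND TOKENS_PER_SECOND
gpu_cost_per_million_tokens() {
  awk -v rate="$1" -v tps="$2" 'BEGIN {
    if (tps <= 0) {
      print "tokens per second must be positive" > "/dev/stderr"
      exit 1
    }
    printf "%.2f\n", rate / tps * 1000000
  }'
}

# A100 reference rate ($0.00115/s) with a hypothetical 50 tokens/s throughput
gpu_cost_per_million_tokens 0.00115 50   # prints 23.00
```

If the result is well above a hosted API's listed price per million output tokens, the token-priced option is likely cheaper for that workload. Measure your own throughput before relying on the comparison.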

FAQ

How does Replicate billing work?

You pay per second of compute while a model runs, at a rate set by the hardware (CPU, T4, A100, H100). No monthly fee; idle time costs nothing.

Is Replicate cheaper than an LLM API?

For language tasks, a token-priced API like DeepSeek or GPT-4o mini is usually cheaper and simpler. Replicate wins when you need a specific open model (image, video, audio) that hosted APIs don't offer.

Does Replicate have a free tier?

A small free trial credit, then pay-as-you-go. No perpetual free tier.

How do I get a Replicate API token?

Sign up at replicate.com, open Account → API tokens, and copy your token.

Not affiliated with Replicate. Prices are reference estimates — always verify on the official pricing page.