
Hugging Face API — pricing, free tier & how to get a token

Hugging Face hosts hundreds of thousands of open models and lets you run them through the Serverless Inference API, dedicated Inference Endpoints, or Spaces. This page covers what each option costs, what the free tier includes, and how to get an access token.

Hugging Face pricing (reference, June 2026)

Plan / product | Price | What you get
Free account (free tier) | $0 | Limited monthly serverless credits, rate-limited inference, free CPU Spaces
PRO | ~$9/mo | More serverless credits, higher rate limits, ZeroGPU Spaces
Inference Endpoints (CPU) | from ~$0.03/hr | Dedicated, autoscaling, billed per hour
Inference Endpoints (GPU) | from ~$0.50/hr | Small GPU; larger GPUs scale to several $/hr
Team / Enterprise | from ~$20/user/mo | SSO, private hub, support, controls
⚠️ Reference prices, June 2026 — Hugging Face changes credits, hardware rates and plan names often. Confirm on huggingface.co/pricing. Serverless calls routed to third-party Inference Providers are billed at that provider's per-token rate.

The free tier

A free Hugging Face account includes a small monthly credit for the Serverless Inference API plus rate-limited access to many hosted models, and free CPU Spaces for demos. It's enough to prototype and test models. For steady traffic you either upgrade to PRO (~$9/mo) for bigger credits and limits, or spin up a dedicated Inference Endpoint billed per hour by hardware.
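
To make the PRO versus per-hour trade-off concrete, here is a rough estimate sketch. It uses the reference rates from the pricing table and assumes a 30-day month and a single replica that never scales to zero. The function name and the usage patterns are illustrative, not part of any Hugging Face tooling.

```python
# Reference figures copied from the pricing table on this page.
# They are estimates, not quotes; confirm current rates before budgeting.
PRO_MONTHLY_USD = 9.00
CPU_ENDPOINT_USD_PER_HOUR = 0.03
GPU_ENDPOINT_USD_PER_HOUR = 0.50


def endpoint_monthly_cost(usd_per_hour: float, hours_per_day: float, days: int = 30) -> float:
    """Estimate the monthly cost of one endpoint replica.

    Assumes billing is simply rate x running hours (an assumption; real
    invoices may meter at a finer granularity or include other charges).
    """
    return round(usd_per_hour * hours_per_day * days, 2)


# Three hypothetical usage patterns.
always_on_cpu = endpoint_monthly_cost(CPU_ENDPOINT_USD_PER_HOUR, 24)
always_on_gpu = endpoint_monthly_cost(GPU_ENDPOINT_USD_PER_HOUR, 24)
office_hours_gpu = endpoint_monthly_cost(GPU_ENDPOINT_USD_PER_HOUR, 8, days=22)

print(f"PRO subscription:            ${PRO_MONTHLY_USD:.2f}/mo")
print(f"CPU endpoint, always on:     ${always_on_cpu:.2f}/mo")
print(f"GPU endpoint, always on:     ${always_on_gpu:.2f}/mo")
print(f"GPU endpoint, 8 h x 22 days: ${office_hours_gpu:.2f}/mo")
```

At these reference rates an always-on small CPU endpoint comes to roughly $21.60 for a 30-day month and a small GPU endpoint to roughly $360, so pausing an endpoint outside working hours makes a large difference.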

How to get a Hugging Face token (step by step)

1. Create an account at huggingface.co.
2. Go to Settings → Access Tokens.
3. Click New token, give it a name, and choose a fine-grained scope (or a simple read or write role).
4. Copy the hf_… token once — treat it like a password.
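
Once you have a token, one common way to keep it out of source code is to read it from an environment variable. The sketch below assumes the variable is named HF_TOKEN (the same name used in the curl example on this page) and that tokens start with the hf_ prefix mentioned in step 4. The helper name auth_headers is hypothetical.

```python
import os


def auth_headers(env_var: str = "HF_TOKEN") -> dict[str, str]:
    """Build HTTP headers from a token stored in an environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your Hugging Face access token first.")
    # Assumption: user access tokens start with "hf_", as described in step 4.
    if not token.startswith("hf_"):
        raise ValueError(f"{env_var} does not look like an hf_ token.")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    try:
        auth_headers()
        print("Token loaded; Authorization header is ready.")
    except (RuntimeError, ValueError) as err:
        # Never print the token itself, only the problem.
        print(f"Token problem: {err}")
```

Failing early with a clear message is usually easier to debug than an authentication error returned by the API later on.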

Call a hosted model:

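# Set HF_TOKEN in your shell first, and swap in the model id you want.
# Endpoint URLs have changed over time; confirm the current one in the Inference docs.
# -d sends the JSON body as a POST request.
curl https://api-inference.huggingface.co/models/google/flan-t5-base \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs":"Translate to French: Hello"}'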
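
The same call can be sketched in Python using only the standard library. This mirrors the curl command above: the URL, model id, and payload come from that example, while the function names build_request and query are hypothetical. The response shape depends on the model's task, so the sketch returns the parsed JSON without assuming its structure.

```python
import json
import os
import urllib.request

# Same URL as the curl example; check the current docs if it has moved.
API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-base"


def build_request(prompt: str, token: str, url: str = API_URL) -> urllib.request.Request:
    """Create the POST request without sending it."""
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(prompt: str, token: str, timeout: float = 30.0):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(prompt, token), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN")
    if token:
        print(query("Translate to French: Hello", token))
    else:
        print("Set HF_TOKEN to run the live request.")
```

Separating build_request from query keeps the network call in one place, which makes the request easy to inspect or test offline.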

Cheaper / alternative options

If you want a hosted, pay-per-token API instead of managing models, compare Together AI, Replicate, Groq and OpenRouter — all run open models with simple per-token billing. For closed frontier models see OpenAI and Anthropic. To estimate any of these for your usage, use the AI cost calculator.

FAQ

Does Hugging Face have a free tier?

Yes — a free account with limited monthly serverless credits, rate-limited inference and free CPU Spaces. Heavier use moves to PRO (~$9/mo) or per-hour Inference Endpoints.

How do I get a Hugging Face access token?

Account → Settings → Access Tokens → New token. Pick a fine-grained or read/write scope and copy the hf_ token once.

What's the difference between Serverless Inference and Inference Endpoints?

Serverless Inference runs on shared, rate-limited infrastructure and is well suited to testing. Inference Endpoints are dedicated, autoscaling deployments billed per hour by the hardware you pick, which makes costs predictable for production traffic.
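
Because serverless calls are rate-limited, client code often retries with exponential backoff. The sketch below is a generic pattern, not an official Hugging Face client: it assumes a rate-limited or temporarily unavailable call surfaces as HTTP 429 or 503, and the send callable is a hypothetical stand-in for whatever function performs your request.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: these statuses mean "try again later".
RETRYABLE_STATUSES = {429, 503}


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            # Double the wait each time and add jitter so clients do not retry in lockstep.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return status, body


if __name__ == "__main__":
    # Simulated responses: two rate-limit replies, then success.
    replies = iter([(429, ""), (429, ""), (200, "ok")])
    print(call_with_backoff(lambda: next(replies), sleep=lambda seconds: None))
```

If retries are routinely exhausted, that is a sign the workload has outgrown the shared tier and a dedicated endpoint may fit better.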

Not affiliated with Hugging Face. Prices are reference estimates — always verify on the official pricing page.