Groq API — pricing, free tier & how to get a key

Groq runs open models such as Llama and Mixtral on its custom LPU hardware, and its claim to fame is speed: hundreds of tokens per second, far faster than typical GPU inference. Pricing is per token, as with most providers, but low latency is the selling point, which makes Groq a good fit for chatbots, voice agents and anything else where users wait on the response. Here's what it costs and how to get your key.

Groq API pricing (reference, June 2026)

Model | Input $/1M tokens | Output $/1M tokens | Best for
Llama 3.3 70B | $0.59 | $0.79 | Quality + speed balance
Llama 3.1 8B (cheapest) | $0.05 | $0.08 | High volume, simple tasks
Mixtral 8x7B | $0.24 | $0.24 | Cheap mixture-of-experts
⚠️ Reference prices, June 2026 — Groq adds and retires models often and adjusts pricing. Confirm on groq.com/pricing before budgeting. Prices are per 1M tokens; output is billed separately.

→ Estimate your bill on the AI API cost calculator or model a whole app with the AI app cost estimator.
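
As a rough illustration of how per-token billing adds up, the sketch below computes the cost of a single request from the reference prices in the table above. The function name and the example token counts are assumptions made up for illustration, and the prices are the June 2026 reference figures, which may be out of date.

```python
# Reference prices from the table above, in USD per 1M tokens: (input, output).
# These are June 2026 reference figures; confirm current prices before budgeting.
PRICES_PER_MILLION = {
    "Llama 3.3 70B": (0.59, 0.79),
    "Llama 3.1 8B": (0.05, 0.08),
    "Mixtral 8x7B": (0.24, 0.24),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request; input and output are billed separately."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${request_cost(name, 2_000, 500):.6f}")
```

With these assumed token counts, one request costs roughly $0.0016 on Llama 3.3 70B and about $0.00014 on Llama 3.1 8B.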

Is there a free tier?

Yes — GroqCloud includes a free tier with generous per-minute and per-day rate limits, ideal for development and low-volume apps with no card required to start. For production throughput you move to paid on-demand. If a free quota matters, compare with Google Gemini and Mistral, which also offer real free tiers.
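
Because the free tier is rate limited, a client should expect to be throttled now and then. Here is a minimal sketch of retrying with exponential backoff, assuming throttling surfaces as an HTTP 429 response (the usual convention; check Groq's rate-limit documentation for the exact behavior). `RateLimited`, `backoff_delays`, `with_backoff` and the delay values are hypothetical names and numbers chosen for illustration.

```python
import time


class RateLimited(Exception):
    """Raised by the caller's request function on an HTTP 429 response (assumed)."""


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delays in seconds: base, 2*base, 4*base, ... with each value capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def with_backoff(call, retries: int = 5, base: float = 1.0, sleep=time.sleep):
    """Run `call()`; on RateLimited, wait and retry up to `retries` more times."""
    for delay in backoff_delays(retries, base):
        try:
            return call()
        except RateLimited:
            sleep(delay)
    return call()  # final attempt; RateLimited propagates if it still fails
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting. In a real client, `call` would wrap the HTTP request and raise `RateLimited` when the status code is 429.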

How to get a Groq API key (step by step)

1. Go to console.groq.com and create an account.
2. Open the API Keys page and click Create API Key; copy the key right away, since it is shown only once.
3. Use the free tier immediately, or add billing under Settings → Billing for higher limits.
4. The API is OpenAI-compatible; most OpenAI SDKs work by pointing the base URL at Groq.

Test it with a simple request:

# Quick test: export GROQ_API_KEY in your shell first
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.1-8b-instant","messages":[{"role":"user","content":"hi"}]}'
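
The same request can be sketched in Python using only the standard library. The endpoint, headers and model ID are taken from the curl command above; the function name, the `User-Agent` value and the assumption that the response follows the OpenAI chat completions shape are illustrative and not confirmed by this page.

```python
import json
import os
import urllib.request

GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str,
                       model: str = "llama-3.1-8b-instant") -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl command above."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        GROQ_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Precaution (assumption): some gateways reject urllib's default
            # User-Agent, so a custom, made-up one is sent instead.
            "User-Agent": "groq-quickstart-example/0.1",
        },
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("GROQ_API_KEY")
    if key:  # only touch the network when a key is actually configured
        with urllib.request.urlopen(build_chat_request(key, "hi")) as response:
            body = json.load(response)
        # Response shape assumed to follow the OpenAI chat completions format.
        print(body["choices"][0]["message"]["content"])
    else:
        print("Set GROQ_API_KEY to send the request.")
```

If you already use the `openai` Python package, the equivalent is typically to construct the client with `base_url="https://api.groq.com/openai/v1"` and your Groq key as `api_key`, leaving the rest of the code unchanged.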

Estimate your cost

Use the AI API Cost Calculator to plug in your token counts and request volume — it ranks Groq's open models against GPT-4o, Claude, Gemini and DeepSeek from cheapest to most expensive for your workload.
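
For a back-of-the-envelope version of that comparison, the sketch below ranks the three Groq models from this article's reference table by estimated monthly cost. It is not the calculator's actual logic: the function names, the 30-day month and the example workload are assumptions for illustration, and other providers are left out because this page lists no prices for them.

```python
# USD per 1M tokens as (input, output), copied from this article's reference table.
PRICES = {
    "Llama 3.3 70B": (0.59, 0.79),
    "Llama 3.1 8B": (0.05, 0.08),
    "Mixtral 8x7B": (0.24, 0.24),
}


def monthly_cost(prices, requests_per_day, input_tokens, output_tokens, days=30):
    """Estimated USD per month for a fixed per-request token profile."""
    input_price, output_price = prices
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_day * days


def rank_models(requests_per_day, input_tokens, output_tokens):
    """Return (model, monthly USD) pairs sorted from cheapest to most expensive."""
    costs = {
        model: monthly_cost(prices, requests_per_day, input_tokens, output_tokens)
        for model, prices in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests/day, 1,000 input and 300 output tokens each.
    for model, usd in rank_models(10_000, 1_000, 300):
        print(f"{model}: ${usd:,.2f}/month")
```

Under this assumed workload the estimate comes to about $22.20 per month for Llama 3.1 8B, $93.60 for Mixtral 8x7B and $248.10 for Llama 3.3 70B.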

Cheaper alternatives

Groq's Llama 3.1 8B is already among the cheapest hosted models anywhere. For comparably cheap frontier-ish quality, DeepSeek-V3 and Mistral Small are the obvious comparisons. If you want one key across many providers and automatic routing, see OpenRouter.

FAQ

Why use Groq instead of OpenAI?

Speed. Groq serves open models at very high tokens-per-second, so responses feel instant — valuable for chat, voice and agent loops. You trade proprietary frontier models for raw latency and low cost.

Does Groq have a free tier?

Yes — the GroqCloud free tier has rate limits and no upfront cost, good for testing and low-volume use.

How do I get a Groq API key?

Sign up at console.groq.com, open API Keys, create a key and copy it while it is displayed, then use the free tier or add billing.

Is the Groq API OpenAI-compatible?

Yes — it exposes an OpenAI-style chat completions endpoint, so most OpenAI client libraries work by changing the base URL and key.

Not affiliated with Groq. Prices are reference estimates — always verify on the official pricing page.