Groq runs open models like Llama and Mixtral on its custom LPU hardware, and its claim to fame is speed: hundreds of tokens per second, far faster than typical GPU inference. Pricing is per token like everyone else, but the latency is the selling point — ideal for chatbots, voice agents and anything where users wait on the response. Here's what it costs and how to get your key.
| Model | Input $/1M | Output $/1M | Best for |
|---|---|---|---|
| Llama 3.3 70B | $0.59 | $0.79 | Quality + speed balance |
| Llama 3.1 8B cheapest | $0.05 | $0.08 | High volume, simple tasks |
| Mixtral 8x7B | $0.24 | $0.24 | Cheap mixture-of-experts |
→ Estimate your bill on the AI API cost calculator or model a whole app with the AI app cost estimator.
Yes — GroqCloud includes a free tier with generous per-minute and per-day rate limits, ideal for development and low-volume apps with no card required to start. For production throughput you move to paid on-demand. If a free quota matters, compare with Google Gemini and Mistral, which also offer real free tiers.
1. Go to console.groq.com and create an account.
2. Open the API Keys page and click Create API Key; copy it once.
3. Use the free tier immediately, or add billing under Settings → Billing for higher limits.
4. The API is OpenAI-compatible; most OpenAI SDKs work by pointing the base URL at Groq.
Test it with a simple request:
Use the AI API Cost Calculator to plug in your token counts and request volume — it ranks Groq's open models against GPT-4o, Claude, Gemini and DeepSeek from cheapest to most expensive for your workload.
Groq's Llama 3.1 8B is already among the cheapest hosted models anywhere. For comparably cheap frontier-ish quality, DeepSeek-V3 and Mistral Small are the obvious comparisons. If you want one key across many providers and automatic routing, see OpenRouter.
Speed. Groq serves open models at very high tokens-per-second, so responses feel instant — valuable for chat, voice and agent loops. You trade proprietary frontier models for raw latency and low cost.
Yes — the GroqCloud free tier has rate limits and no upfront cost, good for testing and low-volume use.
Sign up at console.groq.com, open API Keys, create a key, copy it once, and use the free tier or add billing.
Yes — it exposes an OpenAI-style chat completions endpoint, so most OpenAI client libraries work by changing the base URL and key.
Not affiliated with Groq. Prices are reference estimates — always verify on the official pricing page.