Together AI hosts open-source LLMs — Llama, Mixtral, Qwen, DeepSeek and more — behind a fast, OpenAI-compatible API, usually far cheaper than closed frontier models. If you want open weights without running your own GPUs, it's a top pick. Here's the pricing and setup.
| Model | Input $/1M | Output $/1M | Best for |
|---|---|---|---|
| Llama 3.1 8B cheapest | $0.18 | $0.18 | High volume, cheap |
| Mixtral 8x7B | $0.60 | $0.60 | Balanced quality |
| Llama 3.1 70B | $0.88 | $0.88 | Strong reasoning |
| Llama 3.1 405B | $3.50 | $3.50 | Frontier-class open model |
→ Compare Together against GPT-4o, Claude and Gemini on the AI API cost calculator.
Together AI gives new accounts free credit to start, then pay-as-you-go per token. There's no perpetual free tier, but open-model pricing is low enough that light use costs cents. For a genuine free quota, Gemini is the alternative.
1. Sign up at together.ai.
2. Open Settings → API Keys and create a key.
3. Add billing for usage beyond the starting credit.
4. The API is OpenAI-compatible — point the OpenAI SDK at Together's base URL.
Test it with a simple request:
For the cheapest closed models, DeepSeek-V3 and Gemini Flash compete hard. Other open-model hosts include Fireworks AI, Groq (fastest) and OpenRouter (one key, many providers).
Yes — open models like Llama 3.1 8B run around $0.18 per million tokens, far below frontier closed models. Larger models cost more but still undercut most closed APIs.
Free starting credit, then pay-as-you-go. No perpetual free tier.
Yes — change the base URL and key, and most OpenAI SDK code works unchanged.
Sign up at together.ai, open Settings → API Keys, create a key, and add billing.
Not affiliated with Together AI. Prices are reference estimates — always verify on the official pricing page.