Cheapest answer model for this RAG
Same retrieval and traffic, every model ranked by monthly cost.
| Model | Cost / month | Per query |
|---|---|---|
Where RAG money actually goes
People assume embeddings are the expensive part of RAG. They're usually the cheapest. Embedding 10,000 short documents is often a few cents, paid once. The real, recurring cost is the LLM on every query, and specifically the retrieved context you feed it. Pull 5 chunks of 400 tokens and you've added 2,000 input tokens to every single question before the user even finishes typing. Multiply by your query volume and that's your bill.
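The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names, the 100,000 queries per month, and the $1.00 per million input tokens price are assumptions chosen to make the numbers easy to follow, not figures from any real price list.

```python
def retrieved_context_tokens(top_k: int, chunk_tokens: int) -> int:
    """Input tokens added to every query by the retrieved chunks."""
    return top_k * chunk_tokens


def monthly_context_cost(
    top_k: int,
    chunk_tokens: int,
    queries_per_month: int,
    price_per_million_input_tokens: float,  # hypothetical price, in dollars
) -> float:
    """Monthly spend on retrieved context alone (excludes the question and the answer)."""
    tokens = retrieved_context_tokens(top_k, chunk_tokens) * queries_per_month
    return tokens * price_per_million_input_tokens / 1_000_000


# 5 chunks of 400 tokens, as in the text; volume and price are assumed.
baseline = monthly_context_cost(5, 400, 100_000, 1.00)

# Same traffic with 3 chunks of 300 tokens.
trimmed = monthly_context_cost(3, 300, 100_000, 1.00)

print(f"baseline: ${baseline:.2f}/month, trimmed: ${trimmed:.2f}/month")
```

Under these assumed numbers, the retrieved context alone accounts for 200 million input tokens a month, and cutting top-k and chunk size reduces that to 90 million with no change in traffic.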
The big levers, in order: retrieve fewer or smaller chunks (lower top-k, smaller chunk size), use a cheaper answer model for routine queries, and cap answer length. Re-embedding only changed documents (not the whole corpus) keeps the one-time cost from recurring. Building the rest of the app? Price a support bot with the chatbot cost calculator, a whole feature with the AI app cost estimator, or the full backend with the API stack cost calculator.
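One way to re-embed only changed documents is to store a content hash next to each embedding and compare it on the next indexing run. The sketch below assumes documents are held as a mapping from ID to text; `content_hash`, `docs_to_reembed`, and the `stored_hashes` mapping are hypothetical names for illustration, and the actual embedding call is left out.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def docs_to_reembed(docs: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """Return IDs of documents that are new or whose text changed since the last run."""
    return [
        doc_id
        for doc_id, text in docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]


# Hashes saved during the previous indexing run (assumed to live next to the vectors).
stored_hashes = {
    "faq": content_hash("How do I reset my password?"),
    "pricing": content_hash("Plans start at $10."),
}

# Current corpus: "pricing" was edited and "refunds" is new.
docs = {
    "faq": "How do I reset my password?",
    "pricing": "Plans start at $12.",
    "refunds": "Refunds are issued within 14 days.",
}

changed = docs_to_reembed(docs, stored_hashes)
print(changed)  # only these IDs need a new embedding call
```

Unchanged documents are skipped, so the embedding bill scales with how much of the corpus was edited, not with its total size. Documents deleted from the corpus would need separate handling, which this sketch does not cover.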