← All terms
Retrieval
RAG
Retrieval-Augmented Generation. The dominant architecture for grounded LLM applications.
RAG fetches relevant documents at query time and inserts them into the model's context, so the model answers from current, authoritative sources rather than its training data. Almost every production LLM application is some flavor of RAG. The hard parts are not the LLM call — they are chunking, embedding choice, hybrid search, reranking, and citation tracking. Most failed AI applications are failed retrieval applications wearing an LLM costume.
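The loop can be sketched in a few lines: retrieve relevant chunks, then insert them into the prompt with citation markers. This is a minimal illustration, not a production recipe; the keyword-overlap scorer stands in for a real embedding-and-vector-search stack, and the function names (`retrieve`, `build_prompt`) and sample documents are hypothetical.

```python
# Toy RAG loop: retrieval + grounded prompt construction.
# The overlap scorer is a stand-in for embedding search and reranking.

DOCS = [
    "Invoices are due within 30 days of the billing date.",
    "Refunds are processed to the original payment method in 5-7 days.",
    "Support is available Monday through Friday, 9am to 5pm UTC.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Insert retrieved chunks into the context, numbered for citation tracking."""
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the sources below; cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )

chunks = retrieve("How long do refunds take?", DOCS)
prompt = build_prompt("How long do refunds take?", chunks)
print(prompt)
```

Everything after `build_prompt` returns is an ordinary LLM call; the hard parts named above (chunking, hybrid search, reranking) all live inside what `retrieve` glosses over.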
Related terms
Building with RAG?
We ship production AI systems built around concepts like this every quarter. Send a brief and get a written proposal in 48 hours.
Send a brief →