The RAG Architecture

Learning Objectives

By the end of this module you will understand:

  1. What RAG is and why it is needed
  2. The four steps of the RAG pipeline
  3. The main RAG variants (RAG-Sequence and RAG-Token)
  4. The key components of a RAG system

In previous modules we learned how to convert text into embeddings and how to store and search those embeddings in a vector database.

Now we connect these components into a complete RAG system.


1. What is RAG?

Retrieval-Augmented Generation (RAG) is a framework that combines:


1. A retrieval system → finds relevant knowledge
2. A language model → generates answers based on retrieved knowledge

Instead of asking the LLM to remember everything, we retrieve relevant context on demand.

This makes LLMs:

  1. More accurate, because answers are grounded in retrieved sources
  2. Up to date, without retraining the model
  3. Able to use private or domain-specific data

2. Why RAG?

Traditional LLM-only approaches have several limitations:

  1. Knowledge is frozen at training time (the knowledge cutoff)
  2. The model may hallucinate facts it does not actually know
  3. The model cannot access private or frequently changing data

RAG overcomes these limitations by feeding the model retrieved information, so it has the context it needs at answer time.

Example:


Question: "What is our company refund policy for electronics purchased last month?"

LLM alone: may not know last month’s updated policy
RAG: retrieves relevant policy document chunk → LLM generates an accurate answer


3. The RAG Pipeline Overview

The RAG process can be broken down into four main steps:

  1. Query Encoding
    Convert user query into an embedding vector

  2. Document Retrieval
    Search the vector database for top K most similar embeddings

  3. Context Construction
    Combine retrieved document chunks into a context for the LLM

  4. Answer Generation
    Feed query + retrieved context to the LLM → generate answer


User Query → Embedding → Vector DB → Retrieve Chunks → LLM → Answer
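The four steps above can be sketched end to end. The following is a minimal illustration, not a production implementation: `encode`, `search`, and `llm` are stand-ins you would replace with a real embedding model, vector database client, and language model.

```python
from typing import Callable, List

def rag_answer(
    query: str,
    encode: Callable[[str], List[float]],       # stand-in embedding model
    search: Callable[[List[float], int], List[str]],  # stand-in vector DB search
    llm: Callable[[str], str],                  # stand-in language model
    top_k: int = 3,
) -> str:
    """Run one query through the four RAG steps."""
    query_vec = encode(query)           # 1. query encoding
    chunks = search(query_vec, top_k)   # 2. document retrieval
    context = "\n\n".join(chunks)       # 3. context construction
    prompt = (
        "Use the following retrieved information to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return llm(prompt)                  # 4. answer generation
```

Because each component is injected as a plain callable, the same skeleton works with any embedding model, vector store, or LLM.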

4. Step 1: Query Encoding

Convert the user's query into a vector using the same embedding model that was used for the documents. This ensures the query and document embeddings live in the same semantic space, so similarity scores between them are meaningful.

Example:

```python
# embedding_model is the same model used to embed the document chunks
query_embedding = embedding_model.encode("Return policy for electronics")
```

This embedding is used to find similar document chunks in the vector database.


5. Step 2: Document Retrieval

The query embedding is compared against the stored chunk embeddings (for example, by cosine similarity), and the top K most similar chunks are returned.

For the refund question above, the top 3 retrieved chunks might be:
1. Electronics return policy, section 2
2. Refund process details
3. Exclusions and conditions

These chunks become the context for the LLM.
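A toy sketch of the retrieval step, assuming embeddings are plain Python lists and similarity is cosine similarity; a real system would delegate this search to the vector database's index.

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: List[float],
          index: List[Tuple[str, List[float]]],
          k: int = 3) -> List[str]:
    """Return the k chunk texts whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

This brute-force scan is O(n) in the number of chunks; vector databases use approximate nearest-neighbor indexes to make the same lookup fast at scale.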


6. Step 3: Context Construction

The retrieved chunks are combined with the user's question into a single prompt. A typical template:

You are a helpful assistant. Use the following retrieved information to answer the question.

Context:
[Chunk 1]
[Chunk 2]
[Chunk 3]

Question: What is the return policy for electronics purchased last month?
Answer:

Important: Be mindful of LLM context window — too many chunks may exceed it.
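One way to respect that limit is to build the prompt with a simple context budget. This sketch counts characters rather than tokens (a real system would use the model's tokenizer), and mirrors the template above.

```python
from typing import List

def build_prompt(question: str,
                 chunks: List[str],
                 max_context_chars: int = 2000) -> str:
    """Assemble retrieved chunks into a prompt, dropping chunks that
    would exceed a rough context budget."""
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_context_chars:
            break  # stop before overflowing the budget
        kept.append(chunk)
        used += len(chunk)
    context = "\n\n".join(kept)
    return (
        "You are a helpful assistant. Use the following retrieved "
        "information to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because chunks are retrieved in order of similarity, truncating from the end drops the least relevant ones first.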


7. Step 4: Answer Generation

The assembled prompt (query plus retrieved context) is sent to the LLM, which generates an answer grounded in the retrieved chunks.

Example output:

Customers can return electronics within 15 days of purchase, provided the items are unused and in original packaging.
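A minimal sketch of the generation step. The `llm` callable is a hypothetical stand-in for a real model client (an API call or a local model); injecting it keeps the pipeline testable without network access.

```python
from typing import Callable

def generate_answer(prompt: str, llm: Callable[[str], str]) -> str:
    """Send the assembled prompt to the language model and return its answer.

    `llm` is any callable that maps a prompt string to a completion
    string; swap in a real client without changing the pipeline."""
    return llm(prompt).strip()
```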

8. RAG Variants

There are two main types of RAG:

8.1 RAG-Sequence

  Retrieves once and uses the same set of retrieved documents to generate the entire answer. Simpler and faster.

8.2 RAG-Token

  Can draw on different retrieved documents for different tokens of the output. More flexible, but more expensive.

Both approaches are used, depending on accuracy vs. speed requirements.


9. Key Components of RAG

  1. Embedding model → converts queries and document chunks into vectors
  2. Vector database → stores chunk embeddings and supports similarity search
  3. Retriever → returns the top K most relevant chunks for a query
  4. Language model → generates the final answer from the query and retrieved context

Together, these components create a scalable and accurate RAG system.


Key Takeaways

  1. RAG combines a retrieval system with a language model, grounding answers in retrieved context rather than the model's memory alone
  2. The pipeline has four steps: query encoding, document retrieval, context construction, and answer generation
  3. The query must be encoded with the same embedding model used for the documents
  4. Retrieved chunks must fit within the LLM's context window


Next Module

In the next module we will build a basic RAG system from scratch.
