Large Language Models: From Fundamentals to Production

RAG Pipeline Design

The Basic RAG Pipeline

Ingestion: Load documents → Chunk → Embed → Store in vector DB
Retrieval: Embed user query → Search vector DB → Get top-K relevant chunks
Generation: Construct prompt with retrieved context + user question → Send to LLM → Return answer
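The three stages above can be sketched end to end in a few dozen lines. This is a minimal illustration under stated assumptions, not a production design: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `VectorStore` class stands in for a vector database, and `llm` is a hypothetical callable that takes a prompt string and returns text. All names here are invented for the example.

```python
import math
import re
from collections import Counter
from typing import Callable


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop early enough that the last window is not a pure subset of the previous one.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database (assumption for this sketch)."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def ingest(store: VectorStore, documents: list[str], size: int = 40, overlap: int = 10) -> None:
    """Ingestion: chunk each document, embed each chunk, store it."""
    for doc in documents:
        for piece in chunk(doc, size, overlap):
            store.add(piece)


def answer(store: VectorStore, question: str, llm: Callable[[str], str], k: int = 3) -> str:
    """Retrieval plus generation: fetch top-k chunks, build a prompt, call the LLM."""
    context = "\n".join(store.search(question, k))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
    return llm(prompt)
```

Swapping `embed` for a real embedding model and `VectorStore` for a real vector database changes the quality of retrieval but not the shape of the pipeline.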

Retrieval Prompt Template

Answer the user's question based on the following context.
If the context doesn't contain enough information, say so.

Context:
{retrieved_chunks}

Question: {user_question}

Answer:
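As an illustration, the template above can be filled in with ordinary string formatting. The sketch below is one possible way to do it; the `build_prompt` helper and the choice to number each chunk are assumptions made for this example, not part of the template itself.

```python
PROMPT_TEMPLATE = """Answer the user's question based on the following context.
If the context doesn't contain enough information, say so.

Context:
{retrieved_chunks}

Question: {user_question}

Answer:"""


def build_prompt(chunks: list[str], question: str) -> str:
    """Fill the template with retrieved chunks and the user's question."""
    # Numbering the chunks is an illustrative choice: it keeps them visually
    # separate and makes it easier to trace an answer back to its source.
    context = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(chunks, start=1)
    )
    return PROMPT_TEMPLATE.format(
        retrieved_chunks=context,
        user_question=question.strip(),
    )


if __name__ == "__main__":
    print(build_prompt(
        ["Returns are accepted within 30 days.", "A receipt is required."],
        "What is the return policy?",
    ))
```

Because `str.format` only interprets braces in the template, not in the substituted values, chunks that happen to contain `{` or `}` pass through unchanged.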

Advanced RAG Techniques

Query Transformation

Query rewriting: Use an LLM to reformulate the query for better retrieval
HyDE: Generate a hypothetical answer, embed that instead of the question
Multi-query: Generate multiple query variants and combine results
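A minimal sketch of the multi-query idea follows. The lesson does not say how results should be combined; reciprocal rank fusion (RRF) is used here as one common option. `generate_variants` and `search` are hypothetical callables: the first would typically wrap an LLM call that returns rephrasings of the question, the second would wrap the vector search and return chunk IDs in ranked order.

```python
from collections import defaultdict
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of IDs into one.

    Each ID scores 1 / (k + rank) for every list it appears in, so items that
    rank well across many query variants rise to the top. k = 60 is a commonly
    used default for RRF.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def multi_query_retrieve(
    question: str,
    generate_variants: Callable[[str], list[str]],  # hypothetical: LLM-backed rephraser
    search: Callable[[str], list[str]],             # hypothetical: ranked chunk IDs for a query
    top_k: int = 5,
) -> list[str]:
    """Search with the original question plus its variants, then fuse the rankings."""
    queries = [question] + generate_variants(question)
    rankings = [search(q) for q in queries]
    return reciprocal_rank_fusion(rankings)[:top_k]
```

Query rewriting and HyDE fit the same shape: only the text passed to `search` changes (a reformulated query, or a generated hypothetical answer), while retrieval itself stays the same.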

Re-ranking

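After initial retrieval, use a cross-encoder model to re-rank the results by relevance. A cross-encoder scores the query and each candidate chunk together rather than comparing two independently computed embeddings, so it is usually more accurate than embedding similarity alone. It is also slower per candidate, which is why it is applied only to the short list returned by the first retrieval step.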
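The two-stage pattern can be sketched as below. `score_pair` is a hypothetical callable that takes a query and a chunk and returns a relevance score; in a real system it would wrap a cross-encoder model. `toy_overlap_score` is only a placeholder so the example runs without a model, and is not a substitute for one.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],  # hypothetical: wraps a cross-encoder in practice
    top_n: int = 3,
) -> list[str]:
    """Re-score every first-stage candidate against the query and keep the best top_n."""
    scored = [(score_pair(query, chunk), chunk) for chunk in candidates]
    # Sort on the score only, so ties keep their first-stage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]


def toy_overlap_score(query: str, chunk: str) -> float:
    """Placeholder scorer: fraction of query words that appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0


if __name__ == "__main__":
    first_stage = [
        "shipping takes five days",
        "the return policy allows refunds within 30 days",
        "our office is closed on sunday",
    ]
    print(rerank("return policy refunds", first_stage, toy_overlap_score, top_n=1))
```

A typical setup retrieves a generous candidate set with the fast embedding search and then passes only the re-ranked top few chunks to the LLM.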

Agentic RAG

Use an AI agent that can iteratively search, evaluate results, and refine queries until it finds sufficient information.
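The search, evaluate, refine loop described above might look like the following sketch. All three callables are hypothetical: `search` returns chunks for a query, `is_sufficient` would usually be an LLM judgment on whether the collected chunks can answer the question, and `refine_query` would usually ask an LLM for a better query. The `max_steps` cap is an assumption added here so the loop always terminates.

```python
from typing import Callable


def agentic_retrieve(
    question: str,
    search: Callable[[str], list[str]],                    # hypothetical retriever
    is_sufficient: Callable[[str, list[str]], bool],       # hypothetical LLM-backed check
    refine_query: Callable[[str, str, list[str]], str],    # hypothetical LLM-backed rewriter
    max_steps: int = 4,
) -> list[str]:
    """Search iteratively, refining the query until the evidence is judged sufficient."""
    query = question
    collected: list[str] = []
    for _ in range(max_steps):
        # Search with the current query and keep only chunks not seen before.
        for chunk in search(query):
            if chunk not in collected:
                collected.append(chunk)
        # Evaluate: stop as soon as the evidence looks good enough.
        if is_sufficient(question, collected):
            break
        # Refine: produce a new query from the question, last query, and evidence so far.
        query = refine_query(question, query, collected)
    return collected
```

If the step limit is reached without sufficient evidence, the function still returns what it found, so the caller can decide whether to answer with caveats or to say the information is missing.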

Evaluation Metrics

Retrieval: Precision@K, Recall@K, MRR (Mean Reciprocal Rank)
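Generation: Faithfulness (is every claim in the answer supported by the retrieved context?), Relevance (does the answer address the question?), Completeness (does the answer cover everything the question asks?)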
Tools: RAGAS, DeepEval, LangSmith for automated RAG evaluation
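The retrieval metrics listed above are simple enough to compute directly. The sketch below uses the standard definitions; the function names and the input shape (a ranked list of chunk IDs plus a set of IDs labeled relevant) are assumptions for this example. The generation metrics generally need an LLM or human judge and are not shown.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    retrieved = ["d3", "d1", "d7", "d2"]
    relevant = {"d1", "d2"}
    print(precision_at_k(retrieved, relevant, k=3))
    print(recall_at_k(retrieved, relevant, k=3))
    print(mean_reciprocal_rank([(retrieved, relevant), (["d9", "d8"], {"d9"})]))
```

Running these over a fixed set of labeled queries before and after each pipeline change gives a baseline to compare against.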

Common Pitfalls

Chunks too large → noisy, irrelevant context fills the prompt
Chunks too small → missing context, fragmented information
No metadata filtering → retrieving outdated or irrelevant documents
Ignoring evaluation → no way to know if changes improve quality
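To illustrate the metadata pitfall, the sketch below filters candidate chunks on metadata before they are ranked or sent to the LLM. The `Chunk` fields (`doc_type`, `updated`) and the `filter_chunks` helper are hypothetical names chosen for this example; real vector databases usually expose their own filter syntax for the same purpose.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    doc_type: str   # hypothetical metadata field, e.g. "faq" or "policy"
    updated: date   # hypothetical metadata field: last revision date of the source


def filter_chunks(
    chunks: list[Chunk],
    allowed_types: set[str],
    updated_since: date,
) -> list[Chunk]:
    """Keep only chunks of an allowed type whose source is recent enough."""
    return [
        c for c in chunks
        if c.doc_type in allowed_types and c.updated >= updated_since
    ]


if __name__ == "__main__":
    chunks = [
        Chunk("Returns within 14 days.", "policy", date(2021, 3, 1)),
        Chunk("Returns within 30 days.", "policy", date(2024, 6, 1)),
        Chunk("Team offsite photos.", "blog", date(2024, 7, 1)),
    ]
    for c in filter_chunks(chunks, {"policy", "faq"}, date(2023, 1, 1)):
        print(c.text)
```

Without the filter, the outdated 14-day policy would compete with the current one on similarity alone, and the two are nearly identical in wording.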

🌼 Daisy+ in Action: Grounded AI Responses

Daisy+ implements a practical RAG pipeline. An incoming customer question on livechat triggers a search against product docs and FAQ content, and the relevant chunks are injected into the LLM's context. The response is then grounded in actual company data rather than in whatever the model would otherwise invent. This is why DaisyBot can answer "what's your return policy?" accurately: it retrieves the real policy instead of guessing.
