
Monday, 17 February 2025

RAG for Relevance of AI-Generated Responses

 

Why RAG?

Imagine you have a huge library full of books, but instead of reading every book to find the answer to your question, you have a super-smart librarian who quickly finds the most relevant pages and summarizes them for you.

That’s how RAG (Retrieval-Augmented Generation) works in AI. Instead of just relying on what it already knows (which might be outdated or incomplete), a RAG system first retrieves fresh and relevant information from a database or the internet and then generates a response using that information.

This makes RAG much smarter and more accurate because it pulls in the latest, most relevant facts before answering—just like a good researcher!

RAG Techniques

Retrieval-Augmented Generation (RAG) relies on a combination of retrieval and generation techniques to improve the accuracy and relevance of AI-generated responses. 



Here are some common RAG techniques:

1. Retrieval Techniques

These methods help fetch relevant documents or information before generating a response.

  • Dense Retrieval (e.g., FAISS, DPR) – Uses deep learning-based embeddings to find the most relevant documents efficiently.
  • Sparse Retrieval (e.g., BM25, TF-IDF) – Uses traditional keyword-based search to rank documents by relevance.
  • Hybrid Retrieval – Combines dense and sparse retrieval methods to improve accuracy (see the sketch after this list).
  • Hierarchical Retrieval – First retrieves broad categories of information, then drills down into specific details.
  • Re-ranking Models (e.g., Cross-Encoders, Rank-BERT) – After retrieving initial results, these models refine the ranking for better precision.
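
To make the dense/sparse distinction concrete, here is a minimal hybrid-retrieval sketch in Python. It assumes the rank_bm25, sentence-transformers, and numpy packages are installed; the embedding model name, the min-max normalization, and the alpha weighting are illustrative choices, not a fixed recipe.

```python
# Hybrid retrieval sketch: blend BM25 (sparse) and embedding (dense) scores.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

documents = [
    "RAG retrieves relevant documents before generating an answer.",
    "BM25 ranks documents using keyword statistics.",
    "Dense retrievers embed queries and documents into the same vector space.",
]

# Sparse index: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([doc.lower().split() for doc in documents])

# Dense index: sentence embeddings, normalized so dot product = cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice
doc_vectors = encoder.encode(documents, normalize_embeddings=True)

def hybrid_search(query: str, alpha: float = 0.5, top_k: int = 2):
    """Blend normalized sparse and dense scores; alpha weights the dense side."""
    sparse = bm25.get_scores(query.lower().split())
    dense = doc_vectors @ encoder.encode(query, normalize_embeddings=True)

    def norm(x):
        # Min-max normalize so the two score lists are on a comparable scale.
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    combined = alpha * norm(dense) + (1 - alpha) * norm(sparse)
    return [(documents[i], float(combined[i])) for i in np.argsort(-combined)[:top_k]]

print(hybrid_search("how does keyword ranking work?"))
```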

2. Augmentation Techniques

These techniques improve the way retrieved information is used in response generation.

  • Chunking – Breaks documents into smaller passages to improve retrieval relevance (a minimal sketch follows this list).
  • Context Expansion – Adds additional metadata or summaries to make retrieval more precise.
  • Memory-Augmented Retrieval – Keeps track of past queries and retrieved data for better long-term context.
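
Chunking is easy to illustrate without any libraries. The sketch below splits a document into overlapping word windows; real pipelines often chunk by sentences, tokens, or document sections instead, and the window sizes here are only examples.

```python
# Chunking sketch: split a long document into overlapping word-window passages.
def chunk_document(text: str, chunk_size: int = 100, overlap: int = 20):
    words = text.split()
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

long_text = "RAG systems retrieve relevant passages before answering questions. " * 40
for i, passage in enumerate(chunk_document(long_text, chunk_size=40, overlap=10)):
    print(i, len(passage.split()), "words")
```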

3. Generation Techniques

Once the right documents are retrieved, these methods help generate better responses.

  • Fusion-in-Decoder (FiD) – Encodes each retrieved passage separately and fuses the evidence in the decoder, so the model can draw on many passages at once when generating the answer.
  • Retrieval-Augmented Fine-Tuning – The model is fine-tuned on retrieved data to improve accuracy.
  • Contrastive Learning – Helps the model distinguish between useful and irrelevant retrieved data.
  • Chain-of-Thought (CoT) Prompting – Encourages step-by-step reasoning to improve complex responses (see the prompt-assembly sketch below).
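
The sketch below shows one way retrieved passages and a chain-of-thought instruction might be assembled into a single prompt. Only the prompt-building logic is shown concretely; the llm_complete call at the end is a hypothetical placeholder for whichever LLM client is actually used.

```python
# Prompt-assembly sketch: combine retrieved passages with a chain-of-thought
# instruction before calling a language model.
def build_rag_prompt(question: str, passages: list[str]) -> str:
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the numbered passages below.\n"
        "Think step by step, cite passage numbers, and say 'I don't know' "
        "if the passages do not contain the answer.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nReasoning:"
    )

passages = [
    "FAISS builds vector indexes for fast similarity search.",
    "BM25 scores documents by term frequency and inverse document frequency.",
]
prompt = build_rag_prompt("How does BM25 rank documents?", passages)
print(prompt)
# response = llm_complete(prompt)  # hypothetical call to your LLM provider
```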

4. Post-Processing Techniques

These techniques refine the final output.

  • Answer Verification – Cross-checks generated answers against retrieved documents (a simple sketch follows this list).
  • Fact-checking & Consistency Checking – Uses additional models or logic rules to ensure accuracy.
  • Human-in-the-Loop Feedback – Uses human reviewers to refine the retrieval and generation process over time.
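
As a rough illustration of answer verification, the sketch below flags generated sentences that share little vocabulary with the retrieved passages. Production systems usually rely on entailment or embedding models rather than lexical overlap, and the 0.5 threshold here is purely illustrative.

```python
# Answer-verification sketch: flag generated sentences with low lexical
# overlap against the retrieved passages.
import re

def unsupported_sentences(answer: str, passages: list[str], threshold: float = 0.5):
    source_words = set(re.findall(r"\w+", " ".join(passages).lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = set(re.findall(r"\w+", sentence.lower()))
        if not words:
            continue
        overlap = len(words & source_words) / len(words)
        if overlap < threshold:
            flagged.append((sentence, round(overlap, 2)))
    return flagged

passages = ["BM25 ranks documents using term frequency and document length."]
answer = "BM25 ranks documents by term frequency. It was invented in 2015 by Google."
print(unsupported_sentences(answer, passages))  # flags the second, unsupported sentence
```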