Understanding RAG (Retrieval-Augmented Generation) and its essential role in Legal AI
Imagine you’re a lawyer tasked with advising a client on a case. Your client’s query involves a specific nuance that traditional tools struggle to address effectively because they don’t actually understand your search query. This is where Retrieval-Augmented Generation (RAG) steps in as a game-changer. RAG connects Generative AI models with domain-specific information, enabling legal professionals to retrieve and apply the most accurate and up-to-date case law, legislation, and regulations to their queries. In situations like this, RAG doesn’t just save time; it ensures precision and trust, making it an indispensable tool for modern legal practice.
RAG is changing the way we use Large Language Models (LLMs), and it has become a field of research in its own right. But what exactly is RAG, and why is it becoming an essential tool in legal practice?
What is RAG and how does it work?
Retrieval-Augmented Generation (RAG) is an AI architecture that combines the power of Large Language Models (LLMs) with a knowledge retrieval system. A RAG system can access, retrieve, and incorporate external information in real-time to generate more accurate and contextually relevant responses from LLMs.
Here’s how RAG works:
- Retrieval: Based on the user’s query, the system identifies and retrieves relevant information—commonly referred to as “chunks”—from a connected database or knowledge source.
- Augmentation: The retrieved data is then added to the prompt, providing the LLM with additional context to improve its response.
- Generation: Finally, the augmented prompt, enriched with the retrieved information, is sent to the LLM, enabling it to produce a more accurate response.
Image 1: RAG explained
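To make these three steps concrete, here’s a minimal sketch in Python. It assumes a hypothetical `search_chunks` retriever (for example, backed by a vector database) and uses the OpenAI Python SDK for the generation step; the function names and model identifier are illustrative, not a reference to any specific product.

```python
# Minimal RAG sketch: retrieve, augment, generate.
# `search_chunks` is a hypothetical retriever; plug in your own
# vector or hybrid search. Requires the `openai` package (v1+).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def search_chunks(query: str, top_k: int = 3) -> list[str]:
    """Hypothetical retriever: return the top_k most relevant text chunks
    from a connected knowledge source (e.g. a vector database)."""
    raise NotImplementedError("Connect your own knowledge source here.")


def answer_with_rag(query: str) -> str:
    # 1. Retrieval: find relevant chunks for the user's query.
    chunks = search_chunks(query)

    # 2. Augmentation: add the retrieved chunks to the prompt as context.
    context = "\n\n".join(chunks)
    prompt = (
        "Use the following sources as context to answer the question.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

    # 3. Generation: send the augmented prompt to the LLM.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```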
By grounding its outputs in specific, retrievable sources, RAG not only improves response quality but also enhances user trust by providing transparent citations.
In the Retrieval and Generation steps especially, you can apply advanced techniques, such as vector/hybrid search, reranking, and agents, to improve overall performance.
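As a small illustration of one such technique, the sketch below blends a keyword score with a vector-similarity score into a single hybrid ranking. Both scoring functions are deliberately simplified stand-ins; a real system would typically use BM25 for the keyword side and a trained embedding model for the vectors.

```python
# Toy hybrid search: mix a keyword-overlap score with a vector-similarity
# score. Real systems would use BM25 and learned embeddings instead.
import math


def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that also appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    """Rank documents by a weighted mix of keyword and vector scores."""
    scored = [
        (alpha * keyword_score(query, doc)
         + (1 - alpha) * cosine_similarity(query_vec, vec), doc)
        for doc, vec in zip(docs, doc_vecs)
    ]
    return sorted(scored, reverse=True)
```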
Why do we need RAG?
LLMs are “pre-trained” systems (hence the “P” in GPT), meaning they are trained on large datasets up to a specific cut-off date. For example, GPT-4o’s training data ends in October 2023. This creates two critical limitations:
- Outdated Information: LLMs lack access to new or emerging data post-training. Without RAG, they cannot adapt to real-time developments or include recent data.
- Restricted Access to Proprietary Data: Much of the data crucial for professional fields like law isn’t available to LLM providers (such as OpenAI, Anthropic, Google, and Meta). For instance, legal databases, document management systems (DMS), and subscription-only resources are not part of the models’ training data.
Since AI systems cannot "guess" what they don’t know (although they often try, hence the hallucinations), RAG fills this gap by connecting LLMs to external, up-to-date, and proprietary data sources. AI is not magic; it does not know what it is not trained on.
Image 2
RAG in legal practice
The legal field is a great example of RAG's potential. Legal professionals often need precise and accurate answers based on up-to-date case law, legislation, and regulations: data that may not be included in an LLM’s training set.
For instance, asking questions about Dutch court cases isn’t reliable without RAG. We can’t be certain whether specific cases (especially recent ones) are part of the training data. Additionally, standalone tools like ChatGPT can sometimes “hallucinate” cases—fabricating court rulings or citations that don’t exist. This issue makes standalone LLMs unreliable in some legal contexts.
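One practical way a RAG system counters this is to instruct the model to answer only from the retrieved sources, cite them, and say explicitly when the sources don’t cover the question. Below is a minimal, hypothetical prompt template; the wording is purely illustrative and not taken from any specific product.

```python
# Illustrative grounding prompt for citable answers; a sketch, not the
# prompt of any specific legal AI product.
GROUNDED_PROMPT = """You are a legal research assistant.
Answer the question using ONLY the numbered sources below.
Cite every claim with its source number, e.g. [2].
If the sources do not answer the question, say so explicitly instead of guessing.

Sources:
{sources}

Question: {question}
"""
```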
By integrating RAG, legal practitioners can decrease hallucinations, ensuring that responses are grounded in factual data. And I hope that after reading this short article, you agree that RAG is a crucial component of working with legal AI.
What did we not discuss in this blog?
As I said, RAG has grown into a subfield of AI in its own right. When speaking about RAG, there are still many topics to consider:
- How to evaluate a RAG system
- Vector databases
- Context window limits
- Chunking strategies
- Agentic RAG systems
- Prompt transformations
- System prompts
- Temperature settings (often called creativity)
- Reranking
The landscape of RAG is evolving, and staying informed is key. With each advancement, positions shift within the AI Viability Space (see our previous article). So stay tuned for more information on GenAI and RAG in legal from one of our Legal AI experts.