Generating a Synthetic Test Set for RAG-Based Question Answering with Ragas
Overview
In this tutorial, we'll explore the test set generation module in Ragas to create a synthetic test set for a Retrieval-Augmented Generation (RAG)-based question-answering bot. Our goal is to design a Ragas Airline Assistant capable of answering customer queries on various topics, including:
- Flight booking
- Flight changes and cancellations
- Baggage policies
- Viewing reservations
- Flight delays
- In-flight services
- Special assistance
To make sure our synthetic dataset is as realistic and diverse as possible, we will create different customer personas. Each persona will represent distinct traveler types and behaviors, helping us build a comprehensive and representative test set. This approach ensures that we can thoroughly evaluate the effectiveness and robustness of our RAG model.
Letβs get started!
Download and Load documents
Run the command below to download the dummy Ragas Airline dataset and load the documents using LangChain.
from langchain_community.document_loaders import DirectoryLoader
path = "ragas-airline-dataset"
loader = DirectoryLoader(path, glob="**/*.md")
docs = loader.load()
Set up the LLM and Embedding Model
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings(model="text-embedding-3-small"))
Create Knowledge Graph
Create a base knowledge graph with the documents
from ragas.testset.graph import KnowledgeGraph
from ragas.testset.graph import Node, NodeType
kg = KnowledgeGraph()
for doc in docs:
kg.nodes.append(
Node(
type=NodeType.DOCUMENT,
properties={"page_content": doc.page_content, "document_metadata": doc.metadata}
)
)
kg
Setup the transforms
In this tutorial, we create a Single Hop Query dataset using a knowledge graph built solely from nodes. To enhance our graph and improve query generation, we apply three key transformations:
- Headline Extraction: Uses a language model to extract clear section titles from each document (e.g., βAirline Initiated Cancellationsβ from flight cancellations.md). These titles isolate specific topics and provide direct context for generating focused questions.
- Headline Splitting: Divides documents into manageable subsections based on the extracted headlines. This increases the number of nodes and ensures more granular, context-specific query generation.
- Keyphrase Extraction: Identifies core thematic keyphrases (such as key seating information) that serve as semantic seed points, enriching the diversity and relevance of the generated queries.
from ragas.testset.transforms import apply_transforms
from ragas.testset.transforms import HeadlinesExtractor, HeadlineSplitter, KeyphrasesExtractor
headline_extractor = HeadlinesExtractor(llm=generator_llm, max_num=20)
headline_splitter = HeadlineSplitter(max_tokens=1500)
keyphrase_extractor = KeyphrasesExtractor(llm=generator_llm)
transforms = [
headline_extractor,
headline_splitter,
keyphrase_extractor
]
apply_transforms(kg, transforms=transforms)
Applying HeadlinesExtractor: 100%|ββββββββββ| 8/8 [00:00<?, ?it/s]
Applying HeadlineSplitter: 100%|ββββββββββ| 8/8 [00:00<?, ?it/s]
Applying KeyphrasesExtractor: 100%|ββββββββββ| 25/25 [00:00<?, ?it/s]
Configuring Personas for Query Generation
Personas provide context and perspective, ensuring that generated queries are natural, user-specific, and diverse. By tailoring queries to different user viewpoints, our test set covers a wide range of scenarios:
- First Time Flier: Generates queries with detailed, step-by-step guidance, catering to newcomers who need clear instructions.
- Frequent Flier: Produces concise, efficiency-focused queries for experienced travelers.
- Angry Business Class Flier: Yields queries with a critical, urgent tone to reflect high expectations and immediate resolution demands.
from ragas.testset.persona import Persona
persona_first_time_flier = Persona(
name="First Time Flier",
role_description="Is flying for the first time and may feel anxious. Needs clear guidance on flight procedures, safety protocols, and what to expect throughout the journey.",
)
persona_frequent_flier = Persona(
name="Frequent Flier",
role_description="Travels regularly and values efficiency and comfort. Interested in loyalty programs, express services, and a seamless travel experience.",
)
persona_angry_business_flier = Persona(
name="Angry Business Class Flier",
role_description="Demands top-tier service and is easily irritated by any delays or issues. Expects immediate resolutions and is quick to express frustration if standards are not met.",
)
personas = [persona_first_time_flier, persona_frequent_flier, persona_angry_business_flier]
Query Generation Using Synthesizers
Synthesizers are responsible for converting enriched nodes and personas into queries. They achieve this by selecting a node property (e.g., "entities" or "keyphrases"), pairing it with a persona, style, and query length, and then using a LLM to generate a query-answer pair based on the content of the node.
Two instances of the SingleHopSpecificQuerySynthesizer
are used to define the query distribution:
- Headlines-Based Synthesizer β Generates queries using extracted document headlines, leading to structured questions that reference specific sections.
- Keyphrases-Based Synthesizer β Forms queries around key concepts, generating broader, thematic questions.
Both synthesizers are weighted equally (0.5 each), ensuring a balanced mix of specific and conceptual queries, which ultimately enhances the diversity of the test set.
from ragas.testset.synthesizers.single_hop.specific import (
SingleHopSpecificQuerySynthesizer,
)
query_distibution = [
(
SingleHopSpecificQuerySynthesizer(llm=generator_llm, property_name="headlines"),
0.5,
),
(
SingleHopSpecificQuerySynthesizer(
llm=generator_llm, property_name="keyphrases"
),
0.5,
),
]
Testset Generation
from ragas.testset import TestsetGenerator
generator = TestsetGenerator(
llm=generator_llm,
embedding_model=generator_embeddings,
knowledge_graph=kg,
persona_list=personas,
)
Now we can generate the testset.
testset = generator.generate(testset_size=10, query_distribution=query_distibution)
testset.to_pandas()
Generating Scenarios: 100%|ββββββββββ| 2/2 [00:00<?, ?it/s]
Generating Samples: 100%|ββββββββββ| 10/10 [00:00<?, ?it/s]
user_input | reference_contexts | reference | synthesizer_name | |
---|---|---|---|---|
0 | Wut do I do if my baggage is Delayed, Lost, or... | [Baggage Policies\n\nThis section provides a d... | If your baggage is delayed, lost, or damaged, ... | single_hop_specifc_query_synthesizer |
1 | Wht asistance is provided by the airline durin... | [Flight Delays\n\nFlight delays can be caused ... | Depending on the length of the delay, Ragas Ai... | single_hop_specifc_query_synthesizer |
2 | What is Step 1: Check Fare Rules in the contex... | [Flight Cancellations\n\nFlight cancellations ... | Step 1: Check Fare Rules involves logging into... | single_hop_specifc_query_synthesizer |
3 | How can I access my booking online with Ragas ... | [Managing Reservations\n\nManaging your reserv... | To access your booking online with Ragas Airli... | single_hop_specifc_query_synthesizer |
4 | What assistance does Ragas Airlines provide fo... | [Special Assistance\n\nRagas Airlines provides... | Ragas Airlines provides special assistance ser... | single_hop_specifc_query_synthesizer |
5 | What steps should I take if my baggage is dela... | [Baggage Policies This section provides a deta... | If your baggage is delayed, lost, or damaged w... | single_hop_specifc_query_synthesizer |
6 | How can I resubmit the claim for my baggage is... | [Potential Issues and Resolutions for Baggage ... | To resubmit the claim for your baggage issue, ... | single_hop_specifc_query_synthesizer |
7 | Wut are the main causes of flight delays and h... | [Flight Delays Flight delays can be caused by ... | Flight delays can be caused by weather conditi... | single_hop_specifc_query_synthesizer |
8 | How can I request reimbursement for additional... | [2. Additional Expenses Incurred Due to Delay ... | To request reimbursement for additional expens... | single_hop_specifc_query_synthesizer |
9 | What are passenger-initiated cancelations? | [Flight Cancellations Flight cancellations can... | Passenger-initiated cancellations occur when a... | single_hop_specifc_query_synthesizer |
Final Thoughts
In this tutorial, we explored test set generation using the Ragas library, focusing primarily on single-hop queries. In our upcoming tutorial, weβll dive into multi-hop queries, expanding on these concepts for even richer test set scenarios.