Building an HF Dataset with Your Own Data
This tutorial notebook is a step-by-step guide to preparing your own data for experimentation and evaluation with Ragas.
Note
If you’re using a popular framework such as llama-index or langchain to build your RAG application, Ragas provides integrations with these frameworks. Check out the integrations.
This tutorial assumes that you have the four required data points from your RAG pipeline:
Question: A set of questions.
Contexts: Retrieved contexts corresponding to each question. This is a list[list], since each question can retrieve multiple text chunks.
Answer: Generated answer corresponding to each question.
Ground truths: Ground truths corresponding to each question. This is a str, which corresponds to the expected answer for each question.
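Before building a dataset, it can help to check that your data actually has this shape. The following is a minimal sketch, not part of Ragas or the datasets library: `validate_samples` is a hypothetical helper, and it assumes the column names `question`, `contexts`, `answer`, and `ground_truth` used in the example dataset in this tutorial.

```python
REQUIRED_COLUMNS = ("question", "contexts", "answer", "ground_truth")


def validate_samples(samples: dict) -> None:
    """Raise ValueError if `samples` does not match the four-column layout.

    Hypothetical helper for illustration; it only checks shapes and types.
    """
    missing = [key for key in REQUIRED_COLUMNS if key not in samples]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # Every column must hold one entry per question.
    n_rows = len(samples["question"])
    for key in REQUIRED_COLUMNS:
        if len(samples[key]) != n_rows:
            raise ValueError(f"column {key!r} has {len(samples[key])} rows, expected {n_rows}")

    # question, answer and ground_truth are plain strings.
    for key in ("question", "answer", "ground_truth"):
        if not all(isinstance(value, str) for value in samples[key]):
            raise ValueError(f"column {key!r} must contain only str values")

    # contexts is a list of lists: several text chunks per question.
    for chunks in samples["contexts"]:
        if not isinstance(chunks, list) or not all(isinstance(c, str) for c in chunks):
            raise ValueError("each entry of 'contexts' must be a list of str")


# Usage: a well-formed single-row sample passes silently.
samples = {
    "question": ["When was the first super bowl?"],
    "contexts": [["The Super Bowl....season since 1966,", "replacing the NFL...in February."]],
    "answer": ["The first superbowl was held on January 15, 1967"],
    "ground_truth": ["The first superbowl was held on January 15, 1967"],
}
validate_samples(samples)
```

A common mistake this catches is passing `contexts` as a flat list of strings instead of one list of chunks per question.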
Example dataset
from datasets import Dataset

# Column-oriented samples: index i of every list describes the same question.
data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on January 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    # One list of retrieved text chunks per question.
    'contexts': [
        ['The Super Bowl....season since 1966,', 'replacing the NFL...in February.'],
        ['The Green Bay Packers...Green Bay, Wisconsin.', 'The Packers compete...Football Conference'],
    ],
    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times'],
}
# Build a Hugging Face Dataset with one row per question.
dataset = Dataset.from_dict(data_samples)
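RAG pipelines often produce results one question at a time, as a list of per-question records, rather than the column-oriented dictionary used above. The sketch below shows one way to transpose such records into that layout. `records_to_columns` and the `records` list are hypothetical names introduced for illustration; the column names are the ones used in the example dataset.

```python
COLUMNS = ("question", "answer", "contexts", "ground_truth")


def records_to_columns(records: list) -> dict:
    """Transpose row-wise records into a dict of equally long column lists."""
    columns = {key: [] for key in COLUMNS}
    for record in records:
        for key in COLUMNS:
            # A KeyError here means the record lacks a required field.
            columns[key].append(record[key])
    return columns


# Hypothetical pipeline output: one dict per question.
records = [
    {
        "question": "When was the first super bowl?",
        "answer": "The first superbowl was held on January 15, 1967",
        "contexts": ["The Super Bowl....season since 1966,", "replacing the NFL...in February."],
        "ground_truth": "The first superbowl was held on January 15, 1967",
    },
]

columns = records_to_columns(records)
print(list(columns))
print(len(columns["contexts"][0]))
```

The resulting dictionary has the same layout as `data_samples`, so it can be passed to `Dataset.from_dict` in the same way. Recent versions of the datasets library also offer `Dataset.from_list`, which accepts row-wise records directly.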