Testset Generation for RAG
This simple guide will help you generate a testset for evaluating your RAG pipeline using your own documents.
Quickstart
Let's walk through a quick example of generating a testset for a RAG pipeline. Following that, we will explore the main components of the testset generation pipeline.
Load Sample Documents
For the sake of this tutorial, we will use sample documents from this repository. You can replace these with your own documents.
Load documents
Now we will load the documents from the sample dataset using DirectoryLoader, which is one of the document loaders from langchain_community. You may also use any of the loaders from llama_index.
from langchain_community.document_loaders import DirectoryLoader
path = "Sample_Docs_Markdown/"
loader = DirectoryLoader(path, glob="**/*.md")
docs = loader.load()
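As a quick sanity check, you can confirm that the documents loaded as expected (the exact count depends on the contents of your directory):

print(f"Loaded {len(docs)} documents")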
Choose your LLM
You may use any LLM of your choice.
Install the langchain-openai package, then ensure your OpenAI API key is ready and available in your environment.
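For example, you can set the key directly in your environment (the value below is a placeholder):

import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"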
Wrap the LLMs in LangchainLLMWrapper and LangchainEmbeddingsWrapper so that they can be used with ragas.
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())
Install the langchain-aws package, then set your AWS credentials and configuration.
config = {
"credentials_profile_name": "your-profile-name", # E.g "default"
"region_name": "your-region-name", # E.g. "us-east-1"
"llm": "your-llm-model-id", # E.g "anthropic.claude-3-5-sonnet-20241022-v2:0"
"embeddings": "your-embedding-model-id", # E.g "amazon.titan-embed-text-v2:0"
"temperature": 0.4,
}
Define your LLMs and wrap them in LangchainLLMWrapper and LangchainEmbeddingsWrapper so that they can be used with ragas.
from langchain_aws import ChatBedrockConverse
from langchain_aws import BedrockEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
generator_llm = LangchainLLMWrapper(ChatBedrockConverse(
credentials_profile_name=config["credentials_profile_name"],
region_name=config["region_name"],
base_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
model=config["llm"],
temperature=config["temperature"],
))
generator_embeddings = LangchainEmbeddingsWrapper(BedrockEmbeddings(
credentials_profile_name=config["credentials_profile_name"],
region_name=config["region_name"],
model_id=config["embeddings"],
))
If you want more information on how to use other AWS services, please refer to the langchain-aws documentation.
Install the langchain-openai package, then ensure your Azure OpenAI key is ready and available in your environment.
import os
os.environ["AZURE_OPENAI_API_KEY"] = "your-azure-openai-key"
# other configuration
azure_configs = {
"base_url": "", # your endpoint
"model_deployment": "", # your model deployment name
"model_name": "", # your model name
"embedding_deployment": "", # your embedding deployment name
"embedding_name": "", # your embedding name
}
Define your LLMs and wrap them in LangchainLLMWrapper and LangchainEmbeddingsWrapper so that they can be used with ragas.
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
generator_llm = LangchainLLMWrapper(AzureChatOpenAI(
openai_api_version="2023-05-15",
azure_endpoint=azure_configs["base_url"],
azure_deployment=azure_configs["model_deployment"],
model=azure_configs["model_name"],
validate_base_url=False,
))
# init the embedding model used for testset generation
generator_embeddings = LangchainEmbeddingsWrapper(AzureOpenAIEmbeddings(
openai_api_version="2023-05-15",
azure_endpoint=azure_configs["base_url"],
azure_deployment=azure_configs["embedding_deployment"],
model=azure_configs["embedding_name"],
))
If you want more information on how to use other Azure services, please refer to the langchain-azure documentation.
If you are using a different LLM provider and using LangChain to interact with it, you can wrap your LLM in LangchainLLMWrapper so that it can be used with ragas.
For a more detailed guide, checkout the guide on customizing models.
If you are using LlamaIndex, you can use the LlamaIndexLLMWrapper to wrap your LLM so that it can be used with ragas.
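As a minimal sketch, assuming you are using LlamaIndex's OpenAI integration:

from ragas.llms import LlamaIndexLLMWrapper
from llama_index.llms.openai import OpenAI

generator_llm = LlamaIndexLLMWrapper(OpenAI(model="gpt-4o"))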
For more information on how to use LlamaIndex, please refer to the LlamaIndex Integration guide.
If you are still not able to use Ragas with your favorite LLM provider, please let us know by commenting on this issue and we'll add support for it 🙂.
Generate Testset
Now we will run the testset generation using the loaded documents and the LLM setup. If you used llama_index to load the documents, please use the generate_with_llama_index_docs method instead.
from ragas.testset import TestsetGenerator
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs, testset_size=10)
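For reference, the llama_index variant mentioned above looks like this (a sketch, assuming docs were loaded with a llama_index reader):

# use this instead when documents were loaded with llama_index
dataset = generator.generate_with_llama_index_docs(docs, testset_size=10)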
Analyzing the testset
Once you have generated a testset, you will want to view it and select the queries you see fit to include in your final testset. You can export the testset to a pandas DataFrame and perform various analyses on it.
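For example:

df = dataset.to_pandas()
df.head()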
You can also use other tools like app.ragas.io or any other similar tools available for you in the Integrations section.
In order to use the app.ragas.io dashboard, you need an account on app.ragas.io. If you don't have one, you can sign up here. You will also need a Ragas app token.
Once you have the token, you can use the upload() method to export the results to the dashboard.
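A minimal sketch, assuming your app token is exposed via the RAGAS_APP_TOKEN environment variable:

import os
os.environ["RAGAS_APP_TOKEN"] = "your-app-token"

dataset.upload()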
You can then view the results in the dashboard by following the link in the output of the upload() method.
A Deeper Look
Now that we have seen how to generate a testset, let's take a closer look at the main components of the testset generation pipeline and how you can quickly customize it.
At its core, there are two main operations performed to generate a testset.
- KnowledgeGraph Creation: We first create a KnowledgeGraph using the documents you provide and use various Transformations to enrich the knowledge graph with additional information that we can use to generate the testset. You can learn more about this from the core concepts section.
- Testset Generation: We use the KnowledgeGraph to generate a set of scenarios. These scenarios are used to generate the testset. You can learn more about this from the core concepts section.
Now let's see an example of how these components work together to generate a testset.
KnowledgeGraph Creation
Let's first create an empty KnowledgeGraph and then add the documents we loaded earlier to it.
from ragas.testset.graph import KnowledgeGraph, Node, NodeType

# start with an empty knowledge graph
kg = KnowledgeGraph()

# add each loaded document as a document node
for doc in docs:
kg.nodes.append(
Node(
type=NodeType.DOCUMENT,
properties={"page_content": doc.page_content, "document_metadata": doc.metadata}
)
)
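At this point the graph should contain one document node per loaded document; printing it gives a quick check (the node count depends on your corpus):

print(kg)  # e.g. KnowledgeGraph(nodes: 10, relationships: 0)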
Now we will enrich the knowledge graph with additional information using Transformations. Here we will use default_transforms to create a set of default transformations to apply with an LLM and embedding model of your choice, but you can mix and match transforms or build your own as needed (see the sketch after the following code).
from ragas.testset.transforms import default_transforms, apply_transforms
# define your LLM and Embedding Model
# here we are using the same LLM and Embedding Model that we used to generate the testset
transformer_llm = generator_llm
embedding_model = generator_embeddings
trans = default_transforms(documents=docs, llm=transformer_llm, embedding_model=embedding_model)
apply_transforms(kg, trans)
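If you would rather customize this step, you can pass your own list of transforms instead of the defaults. A minimal sketch, assuming the HeadlinesExtractor, HeadlineSplitter, and KeyphrasesExtractor transforms are available in your ragas version:

from ragas.testset.transforms import (
    HeadlinesExtractor,
    HeadlineSplitter,
    KeyphrasesExtractor,
    apply_transforms,
)

# extract headlines, split documents on them, and tag key phrases
custom_transforms = [
    HeadlinesExtractor(llm=transformer_llm),
    HeadlineSplitter(),
    KeyphrasesExtractor(llm=transformer_llm),
]
apply_transforms(kg, custom_transforms)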
Now we have a knowledge graph with additional information. You can also save the knowledge graph and load it back later.

kg.save("knowledge_graph.json")
loaded_kg = KnowledgeGraph.load("knowledge_graph.json")
Testset Generation
Now we will use the loaded_kg to create the TestsetGenerator.
from ragas.testset import TestsetGenerator
generator = TestsetGenerator(llm=generator_llm, embedding_model=embedding_model, knowledge_graph=loaded_kg)
We can also define the distribution of queries we would like to generate. Here let's use the default distribution.
from ragas.testset.synthesizers import default_query_distribution
query_distribution = default_query_distribution(generator_llm)
Output
[
(SingleHopSpecificQuerySynthesizer(llm=llm), 0.5),
(MultiHopAbstractQuerySynthesizer(llm=llm), 0.25),
(MultiHopSpecificQuerySynthesizer(llm=llm), 0.25),
]
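If the defaults don't match your needs, you can supply your own weighting instead. A sketch, assuming these synthesizer classes are importable from ragas.testset.synthesizers; the weights must sum to 1:

from ragas.testset.synthesizers import (
    SingleHopSpecificQuerySynthesizer,
    MultiHopSpecificQuerySynthesizer,
)

# favor simple single-hop queries over multi-hop ones
query_distribution = [
    (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 0.75),
    (MultiHopSpecificQuerySynthesizer(llm=generator_llm), 0.25),
]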
Now we can generate the testset.
testset = generator.generate(testset_size=10, query_distribution=query_distribution)
testset.to_pandas()