Testset Generation for RAG

This guide shows how to generate a testset for evaluating your RAG pipeline from your own documents.

Quickstart

Let's walk through a quick example of generating a testset for a RAG pipeline. After that, we will explore the main components of the testset generation pipeline.

Load Sample Documents

This tutorial uses sample documents from the repository cloned below. You can replace them with your own documents.

git clone https://huggingface.co/datasets/explodinggradients/Sample_Docs_Markdown

Load documents

Now we will load the documents from the sample dataset using DirectoryLoader, which is one of the document loaders from langchain_community. You may also use any of the loaders from llama_index.

from langchain_community.document_loaders import DirectoryLoader

path = "Sample_Docs_Markdown/"
loader = DirectoryLoader(path, glob="**/*.md")
docs = loader.load()
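Before running the loader, it can help to confirm that the glob pattern matches the files you expect. The following is a minimal standard-library sketch, not part of langchain: `list_markdown_files` is a hypothetical helper, the temporary directory is a stand-in for Sample_Docs_Markdown/, and the matching only approximates what glob="**/*.md" selects.

```python
from pathlib import Path
import tempfile


def list_markdown_files(root: str) -> list[str]:
    """Return sorted relative paths of all .md files under root, recursively."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*.md") if p.is_file()
    )


# Demonstration on a throwaway directory (stand-in for Sample_Docs_Markdown/).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "a.md").write_text("# A", encoding="utf-8")
    (Path(tmp) / "nested" / "b.md").write_text("# B", encoding="utf-8")
    (Path(tmp) / "notes.txt").write_text("ignored", encoding="utf-8")
    found = list_markdown_files(tmp)
    print(found)
```

If the list is empty or shorter than expected, check the path and the pattern before spending LLM calls on generation.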

Choose your LLM

You may use any LLM of your choice.

Examples are given below for two providers: OpenAI and Amazon Bedrock.

OpenAI

Install the langchain-openai package

pip install langchain-openai

Then make sure your OpenAI key is available in your environment.

import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
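Hardcoding a key in source files makes it easy to leak, so you may prefer to export the variable in your shell and only check for it in code. The sketch below assumes that setup; `require_env` is a hypothetical helper, and EXAMPLE_API_KEY is a placeholder variable used so the snippet runs anywhere.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder variable so the snippet is runnable as-is;
# in practice you would call require_env("OPENAI_API_KEY").
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
key = require_env("EXAMPLE_API_KEY")
print(f"key loaded ({len(key)} characters)")
```

Failing early with a named variable is usually easier to debug than an authentication error raised deep inside a client library.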

Wrap the LLM and the embedding model in LangchainLLMWrapper and LangchainEmbeddingsWrapper so that they can be used with ragas.

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

# Wrap the LangChain chat model and embeddings so ragas can use them
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

Amazon Bedrock

Install the langchain-aws package

pip install langchain-aws

Then set your AWS credentials and configuration.

config = {
    "credentials_profile_name": "your-profile-name",  # E.g "default"
    "region_name": "your-region-name",  # E.g. "us-east-1"
    "llm": "your-llm-model-id",  # E.g "anthropic.claude-3-5-sonnet-20240620-v1:0"
    "embeddings": "your-embedding-model-id",  # E.g "amazon.titan-embed-text-v2:0"
    "temperature": 0.4,
}
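Since the values above are placeholders, a quick check before building the clients can catch a forgotten entry. This is only a sketch under assumptions: `validate_config` and `REQUIRED_KEYS` are hypothetical names, and the "your-" prefix test simply mirrors the placeholder strings used in this guide.

```python
REQUIRED_KEYS = (
    "credentials_profile_name",
    "region_name",
    "llm",
    "embeddings",
    "temperature",
)


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
        elif isinstance(config[key], str) and config[key].startswith("your-"):
            problems.append(f"placeholder not replaced: {key}")
    return problems


example = {
    "credentials_profile_name": "default",
    "region_name": "us-east-1",
    "llm": "your-llm-model-id",  # deliberately left as a placeholder
    "embeddings": "amazon.titan-embed-text-v2:0",
    "temperature": 0.4,
}
issues = validate_config(example)
print(issues)
```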

Define your LLM and embedding model, and wrap them in LangchainLLMWrapper and LangchainEmbeddingsWrapper so that they can be used with ragas.

from langchain_aws import ChatBedrockConverse
from langchain_aws import BedrockEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

generator_llm = LangchainLLMWrapper(ChatBedrockConverse(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
    base_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model=config["llm"],
    temperature=config["temperature"],
))
generator_embeddings = LangchainEmbeddingsWrapper(BedrockEmbeddings(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
    model_id=config["embeddings"],
))

If you want more information on how to use other AWS services, please refer to the langchain-aws documentation.

Generate Testset

Now we will run testset generation using the loaded documents and the LLM setup. If you loaded your documents with llama_index, use the generate_with_llama_index_docs method instead.

from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm)
dataset = generator.generate_with_langchain_docs(docs, testset_size=10)

Export

You may now export and inspect the generated testset.

dataset.to_pandas()
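If you want to keep the testset on disk for later runs, one option is to write it out as JSON Lines. The sketch below uses only the standard library and assumes you have already converted the rows to plain dictionaries; `write_jsonl` is a hypothetical helper, and the two rows and their column names are made-up placeholders, not real ragas output.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(rows: list[dict], path: Path) -> int:
    """Write one JSON object per line and return the number of rows written."""
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)


# Placeholder rows; real rows would come from the generated testset.
rows = [
    {"question": "placeholder question 1", "reference": "placeholder answer 1"},
    {"question": "placeholder question 2", "reference": "placeholder answer 2"},
]
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "testset.jsonl"
    written = write_jsonl(rows, out_path)
    # Read the file back to confirm the round trip preserves the rows.
    reloaded = [
        json.loads(line)
        for line in out_path.read_text(encoding="utf-8").splitlines()
    ]
print(written, reloaded == rows)
```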

A Deeper Look

Now that we have seen how to generate a testset, let's take a closer look at the main components of the testset generation pipeline and how you can quickly customize it.

At its core, testset generation consists of two main operations.

1. KnowledgeGraph Creation: We first create a KnowledgeGraph from the documents you provide, then use various Transformations to enrich it with additional information that we can use to generate the testset. You can learn more about this in the core concepts section.
2. Testset Generation: We use the KnowledgeGraph to generate a set of scenarios. These scenarios are used to generate the testset. You can learn more about this in the core concepts section.

Now let's see an example of how these components work together to generate a testset.

KnowledgeGraph Creation

Let's first create a KnowledgeGraph using the documents we loaded earlier.

from ragas.testset.graph import KnowledgeGraph

kg = KnowledgeGraph()

KnowledgeGraph(nodes: 0, relationships: 0)

Then add the documents to the knowledge graph.

from ragas.testset.graph import Node, NodeType

for doc in docs:
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={"page_content": doc.page_content, "document_metadata": doc.metadata}
        )
    )

KnowledgeGraph(nodes: 10, relationships: 0)

Now we will enrich the knowledge graph with additional information using Transformations. Here we use default_transforms to create a set of default transformations to apply with an LLM and embedding model of your choice, but you can mix and match transforms or build your own as needed.

from ragas.testset.transforms import default_transforms, apply_transforms


# define your LLM and Embedding Model
# here we are using the same LLM and Embedding Model that we used to generate the testset
transformer_llm = generator_llm
embedding_model = generator_embeddings

trans = default_transforms(llm=transformer_llm, embedding_model=embedding_model)
apply_transforms(kg, trans)
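To build intuition for what "enriching" a graph means, here is a purely conceptual toy in plain Python. It is not how the default transforms in ragas work internally: `ToyNode`, `add_keywords`, and `link_by_shared_keywords` are hypothetical names, and the three sentences are made-up stand-ins for documents. The first step adds a property to each node, and the second step derives relationships from those properties.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class ToyNode:
    properties: dict = field(default_factory=dict)


def add_keywords(node: ToyNode, min_length: int = 6) -> None:
    """Toy enrichment step: store the set of long words as a 'keywords' property."""
    words = (w.strip(".,") for w in node.properties["page_content"].lower().split())
    node.properties["keywords"] = {w for w in words if len(w) >= min_length}


def link_by_shared_keywords(nodes: list[ToyNode]) -> list[tuple[int, int, set]]:
    """Toy relationship step: connect node pairs that share at least one keyword."""
    links = []
    for (i, a), (j, b) in combinations(enumerate(nodes), 2):
        shared = a.properties["keywords"] & b.properties["keywords"]
        if shared:
            links.append((i, j, shared))
    return links


nodes = [
    ToyNode({"page_content": "Retrieval quality depends on chunking."}),
    ToyNode({"page_content": "Chunking strategy affects retrieval latency."}),
    ToyNode({"page_content": "Billing is monthly."}),
]
for node in nodes:
    add_keywords(node)
links = link_by_shared_keywords(nodes)
print(links)
```

In this toy, only the first two nodes end up connected, which is the kind of structure a generator could later use to pick related pieces of content for a multi-document question.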

Now we have a knowledge graph with additional information. You can also save the knowledge graph and load it back later.

kg.save("knowledge_graph.json")
loaded_kg = KnowledgeGraph.load("knowledge_graph.json")
loaded_kg

KnowledgeGraph(nodes: 48, relationships: 605)

Testset Generation

Now we will use the loaded_kg to create the TestsetGenerator.

from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, knowledge_graph=loaded_kg)

We can also define the distribution of queries we would like to generate. Here, let's use the default distribution.

from ragas.testset.synthesizers import default_query_distribution

query_distribution = default_query_distribution(generator_llm)

[
    (AbstractQuerySynthesizer(llm=generator_llm), 0.25),
    (ComparativeAbstractQuerySynthesizer(llm=generator_llm), 0.25),
    (SpecificQuerySynthesizer(llm=generator_llm), 0.5),
]
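One way to read the weights is as the share of the testset that each synthesizer should produce. The sketch below shows a proportional split using the largest-remainder method. This is an assumption made for illustration: ragas may allocate or sample differently, and `allocate_counts` is a hypothetical helper.

```python
def allocate_counts(weights: list[float], total: int) -> list[int]:
    """Split `total` samples across weights using the largest-remainder method."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    quotas = [w * total for w in weights]
    counts = [int(q) for q in quotas]  # floor, since quotas are non-negative
    leftover = total - sum(counts)
    # Give the remaining samples to the largest fractional parts
    # (on ties, the earlier entry wins because the sort is stable).
    order = sorted(
        range(len(weights)), key=lambda i: quotas[i] - counts[i], reverse=True
    )
    for i in order[:leftover]:
        counts[i] += 1
    return counts


counts = allocate_counts([0.25, 0.25, 0.5], total=10)
print(counts)
```

With a testset size of 10, the two 0.25 weights cannot both receive exactly 2.5 samples, so any allocation scheme has to break the tie one way or the other.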

Now we can generate the testset.

testset = generator.generate(testset_size=10, query_distribution=query_distribution)
testset.to_pandas()