LlamaIndex
LlamaIndex is a data framework for LLM applications to ingest, structure, and access private or domain-specific data. It makes it easy to connect LLMs with your own data. But to find the best configuration of LlamaIndex for your data, you need an objective measure of performance. This is where Ragas comes in. Ragas helps you evaluate your QueryEngine and gives you the confidence to tweak the configuration to get the highest score.
This guide assumes you have familiarity with the LlamaIndex framework.
Building the Testset
You will need a testset to evaluate your QueryEngine against. You can either build one yourself or use the Testset Generator Module in Ragas to get started with a small synthetic one.
Let's see how that works with LlamaIndex.
First, load the documents.
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./nyc_wikipedia").load_data()
Now let's initialize the TestsetGenerator object with the generator LLM and the embedding model.
from ragas.testset import TestsetGenerator
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# generator with openai models
generator_llm = OpenAI(model="gpt-4o")
embeddings = OpenAIEmbedding(model="text-embedding-3-large")
generator = TestsetGenerator.from_llama_index(
    llm=generator_llm,
    embedding_model=embeddings,
)
Now you are all set to generate the dataset
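A minimal sketch of the generation step, assuming the installed Ragas version exposes generate_with_llamaindex_docs on the generator and to_pandas on the resulting testset (check the API reference for your version). The helper name build_testset and the default size of 5 are illustrative and not part of Ragas.

```python
def build_testset(generator, documents, testset_size=5):
    """Generate a synthetic testset and return it with a DataFrame view.

    Assumes `generator` is the TestsetGenerator created above and that it
    provides `generate_with_llamaindex_docs` (an assumption about the
    installed Ragas version). `build_testset` is a hypothetical helper.
    """
    testset = generator.generate_with_llamaindex_docs(
        documents,
        testset_size=testset_size,
    )
    # A tabular view makes it easy to inspect the generated questions.
    return testset, testset.to_pandas()
```

With the objects defined earlier, this would be called as testset, df = build_testset(generator, documents), and df.head() would show rows resembling the table below.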
| | user_input | reference_contexts | reference | synthesizer_name |
---|---|---|---|---|
0 | Why was New York named after the Duke of York? | [Etymology ==\n\nIn 1664, New York was named i... | New York was named after the Duke of York in 1... | AbstractQuerySynthesizer |
1 | How did the early Europan exploraton and setle... | [History ==\n\n\n=== Early history ===\nIn the... | The early European exploration and settlement ... | AbstractQuerySynthesizer |
2 | New York City population culture finance diver... | [New York City, the most populous city in the ... | New York City is a global cultural, financial,... | ComparativeAbstractQuerySynthesizer |
3 | How do the economic aspects of New York City, ... | [New York City, the most populous city in the ... | New York City's economic aspects, such as its ... | ComparativeAbstractQuerySynthesizer |
4 | What role do biomedical research institutions ... | [Education ==\n\n \n\nNew York City has the la... | Biomedical research institutions in New York C... | SpecificQuerySynthesizer |
With a test dataset in hand, let's now build a QueryEngine and evaluate it.
Building the QueryEngine
To start, let's build a VectorStoreIndex over New York City's Wikipedia page as an example and use Ragas to evaluate it.
Since we already loaded the data into documents, let's use that.
# build query engine
from llama_index.core import VectorStoreIndex
vector_index = VectorStoreIndex.from_documents(documents)
query_engine = vector_index.as_query_engine()
Let's try a sample question from the generated testset to see if it is working.
'Why was New York named after the Duke of York?'
New York was named after the Duke of York because in 1664, the city was named in honor of the Duke of York, who later became King James II of England.
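The query that produced the answer above is not shown. As a sketch, a LlamaIndex query engine is called through its query method and the response is converted to text with str. The ask helper and the EchoEngine stand-in below are hypothetical and exist only so the snippet runs without an API key; in practice, pass the query_engine built above.

```python
def ask(query_engine, question):
    """Send one question to a query engine and return the answer as text."""
    response = query_engine.query(question)
    return str(response)


class EchoEngine:
    """Hypothetical stand-in that mimics the .query() interface offline."""

    def query(self, question):
        return f"(stub answer for: {question})"


question = "Why was New York named after the Duke of York?"
print(ask(EchoEngine(), question))
# With the real engine: print(ask(query_engine, question))
```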
Evaluating the QueryEngine
Now that we have a QueryEngine for the VectorStoreIndex, we can use the LlamaIndex integration in Ragas to evaluate it.
To run an evaluation with Ragas and LlamaIndex you need three things:
- LlamaIndex QueryEngine: what we will be evaluating.
- Metrics: Ragas defines a set of metrics that measure different aspects of the QueryEngine. The available metrics and their meanings are described in the Ragas metrics documentation.
- Questions: a list of questions that Ragas will test the QueryEngine against.
First, let's generate the questions. Ideally, use questions you see in production, so that the distribution of questions used for evaluation matches the distribution seen in production. This ensures that the scores reflect production performance, but to start off we'll use a few example questions.
Now let's import the metrics we will use for the evaluation.
# import metrics
from ragas.metrics import (
    Faithfulness,
    AnswerRelevancy,
    ContextPrecision,
    ContextRecall,
)
# init metrics with evaluator LLM
from ragas.llms import LlamaIndexLLMWrapper
evaluator_llm = LlamaIndexLLMWrapper(OpenAI(model="gpt-4o"))
metrics = [
    Faithfulness(llm=evaluator_llm),
    AnswerRelevancy(llm=evaluator_llm),
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm),
]
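The evaluate() function expects an EvaluationDataset whose samples carry the question (user_input) and the ground truth (reference) that the metrics need. You can convert the testset to that format, for example with testset.to_evaluation_dataset(), and store the result as ragas_dataset: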
EvaluationDataset(features=['user_input', 'reference_contexts', 'reference'], len=7)
Finally, let's run the evaluation.
from ragas.integrations.llama_index import evaluate
result = evaluate(
    query_engine=query_engine,
    metrics=metrics,
    dataset=ragas_dataset,
)
{'faithfulness': 0.9746, 'answer_relevancy': 0.9421, 'context_precision': 0.9286, 'context_recall': 0.6857}
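To decide what to tune first, it can help to rank the aggregate scores. A small sketch in plain Python, with the scores printed above copied into a dict (the variable names are illustrative):

```python
# Aggregate scores copied from the evaluation output above.
scores = {
    "faithfulness": 0.9746,
    "answer_relevancy": 0.9421,
    "context_precision": 0.9286,
    "context_recall": 0.6857,
}

# Sort metrics from weakest to strongest.
ranked = sorted(scores.items(), key=lambda item: item[1])
weakest_metric, weakest_score = ranked[0]

for name, value in ranked:
    print(f"{name:<18} {value:.4f}")
print(f"Weakest metric: {weakest_metric} ({weakest_score:.4f})")
```

Here context_recall is the lowest score, so retrieval is a reasonable first place to look.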
You can convert the result into a pandas DataFrame (result.to_pandas()) to run more analysis on it.
| | user_input | retrieved_contexts | reference_contexts | response | reference | faithfulness | answer_relevancy | context_precision | context_recall |
---|---|---|---|---|---|---|---|---|---|
0 | What events led to New York being named after ... | [New York City is the headquarters of the glob... | [Etymology ==\n\nIn 1664, New York was named i... | New York was named in honor of the Duke of Yor... | New York was named after the Duke of York in 1... | 1.000000 | 0.950377 | 1.0 | 1.0 |
1 | How early European explorers and Native Americ... | [=== Dutch rule ===\n\nA permanent European pr... | [History ==\n\n\n=== Early history ===\nIn the... | Early European explorers established a permane... | Early European explorers and Native Americans ... | 1.000000 | 0.896300 | 1.0 | 0.8 |
2 | New York City population economy challenges | [=== Wealth and income disparity ===\nNew York... | [New York City, the most populous city in the ... | New York City has faced challenges related to ... | New York City, as the most populous city in th... | 1.000000 | 0.915717 | 1.0 | 0.0 |
3 | How do the economic aspects of New York City, ... | [=== Wealth and income disparity ===\nNew York... | [New York City, the most populous city in the ... | The economic aspects of New York City, as a gl... | New York City's economic aspects as a global c... | 0.913043 | 0.929317 | 1.0 | 0.0 |
4 | What are some of the cultural and architectura... | [==== Staten Island ====\nStaten Island (Richm... | [Geography ==\n\nDuring the Wisconsin glaciati... | Brooklyn is known for its cultural diversity, ... | Brooklyn is distinct within New York City due ... | 1.000000 | 0.902664 | 0.5 | 1.0 |
5 | What measures has New York City implemented to... | [==== International events ====\nIn terms of h... | [Environment ==\n\n \nEnvironmental issues in ... | New York City has implemented various measures... | New York City has implemented several measures... | 0.909091 | 1.000000 | 1.0 | 1.0 |
6 | What role did New York City play during the Am... | [=== Province of New York and slavery ===\n\nI... | [History ==\n\n\n=== Early history ===\nIn the... | New York City served as a significant military... | During the American Revolution, New York City ... | 1.000000 | 1.000000 | 1.0 | 1.0 |
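The aggregate scores reported earlier are the per-row means of this table, so a single low aggregate can hide a few rows that failed outright. A sketch in plain Python, with the context_recall column copied from the table above; the 0.5 threshold is an arbitrary illustration, and with a DataFrame the same filter would be a boolean mask on the column.

```python
from statistics import mean

# context_recall values for rows 0-6, copied from the table above.
context_recall = [1.0, 0.8, 0.0, 0.0, 1.0, 1.0, 1.0]

# The mean of the column reproduces the aggregate score (0.6857).
aggregate = round(mean(context_recall), 4)
print(aggregate)

# Rows below an (arbitrary) threshold are the ones worth inspecting by hand.
THRESHOLD = 0.5
low_rows = [i for i, score in enumerate(context_recall) if score < THRESHOLD]
print(low_rows)
```

Rows 2 and 3 score 0.0. In both, the retrieved contexts begin with the wealth and income disparity section, while the reference contexts begin with the city overview.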