Migration from v0.1 to v0.2
v0.2 is the start of the transition for Ragas from an evaluation library for RAG pipelines to a more general library that you can use to evaluate any LLM application you build. This meant we had to make some fundamental changes to the library that will break your workflow. Hopefully this guide will make that transition as easy as possible.
Outline
- Evaluation Dataset
- Metrics
- Testset Generation
- Prompt Object
Evaluation Dataset
We have moved from using HuggingFace `Datasets` to our own `EvaluationDataset`. You can read more about it in the core concepts section for `EvaluationDataset` and `EvaluationSample`.

You can easily convert an existing HuggingFace dataset as shown below:
```python
from ragas import EvaluationDataset, SingleTurnSample

hf_dataset = ...  # your huggingface evaluation dataset
eval_dataset = EvaluationDataset.from_hf_dataset(hf_dataset)

# save eval dataset
eval_dataset.to_csv("path/to/save/dataset.csv")

# load eval dataset
eval_dataset = EvaluationDataset.from_csv("path/to/save/dataset.csv")
```
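If you are not starting from a HuggingFace dataset, you can also build the dataset directly from samples. Here is a minimal sketch; the field values below are placeholders:

```python
from ragas import EvaluationDataset, SingleTurnSample

# build one SingleTurnSample per row of your evaluation data
samples = [
    SingleTurnSample(
        user_input="What is the capital of France?",
        retrieved_contexts=["Paris is the capital of France."],
        response="Paris",
        reference="Paris",
    ),
]

eval_dataset = EvaluationDataset(samples=samples)
```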
Metrics
All the default metrics are still supported and many new metrics have been added. Take a look at the documentation page for the entire list.
However, there are a couple of changes in how you use metrics.

Firstly, it is now preferred to initialize metrics with the evaluator LLM of your choice, as opposed to passing the pre-initialized metric instances to `evaluate()` and supplying the LLM there. This avoids a lot of confusion regarding which LLMs are used where.
```python
# old way, not recommended but still supported till v0.3
from ragas.metrics import faithfulness

# preferred way
from ragas.metrics import Faithfulness

faithfulness_metric = Faithfulness(llm=your_evaluator_llm)
```
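The initialized metric instances are then passed straight to `evaluate()`, so the evaluator LLM is configured in exactly one place. A minimal sketch, assuming `your_evaluator_llm` and the `eval_dataset` from the previous section are already defined:

```python
from ragas import evaluate
from ragas.metrics import Faithfulness

# the metric carries its own evaluator LLM
faithfulness_metric = Faithfulness(llm=your_evaluator_llm)

# pass the initialized metric objects to evaluate()
results = evaluate(dataset=eval_dataset, metrics=[faithfulness_metric])
print(results)
```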
`metrics.ascore` is now being deprecated in favor of `metrics.single_turn_ascore`. You can make the transition as follows:
```python
from ragas import SingleTurnSample
from ragas.metrics import Faithfulness

# create a Single Turn Sample
sample = SingleTurnSample(
    user_input="user query",
    response="response from your pipeline",
)

# init the metric
faithfulness_metric = Faithfulness(llm=your_evaluator_llm)

score = await faithfulness_metric.single_turn_ascore(sample)
print(score)
# 0.9
```
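Note that `single_turn_ascore` is an async method, so the snippet above assumes you are in a notebook or another context where `await` is available. In a plain script you can run it with `asyncio`, for example:

```python
import asyncio

# run the coroutine on a fresh event loop
score = asyncio.run(faithfulness_metric.single_turn_ascore(sample))
print(score)
```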
Testset Generation
Testset generation has been redesigned to be much more cost-efficient. If you were using the end-to-end workflow, check out the getting started guide.
Notable Changes
- Removed `Docstore` in favor of a new Knowledge Graph
- Added `Transforms`, which convert the documents you pass in into a rich knowledge graph
- More customizable with `Synthesizer` objects; also refer to the documentation
- The new workflow makes generation much cheaper, and intermediate states can be saved easily (see the sketch after this list)
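For orientation, a minimal sketch of the new flow might look like this, assuming `generator_llm` and `generator_embeddings` are Ragas LLM/embedding wrappers and `docs` is a list of LangChain documents (see the getting started guide for the exact setup):

```python
from ragas.testset import TestsetGenerator

# the generator applies Transforms to build a knowledge graph from the documents,
# then uses Synthesizer objects to create test samples from that graph
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
testset = generator.generate_with_langchain_docs(docs, testset_size=10)

testset.to_pandas()
```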
This might be a bit rough, but if you do need help here, feel free to chat or mention it here and we would love to help you out 🙂
Prompt Object
All the prompts have been rewritten to use `PydanticPrompt` objects, which are based on the `BasePrompt` object. If you are using the old `Prompt` object, you will have to upgrade it to the new one; check the docs to learn more about how to do it.
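As a rough illustration of what the new style looks like, here is a minimal sketch of a custom prompt; the input/output models, instruction, and example below are made up for demonstration:

```python
from pydantic import BaseModel
from ragas.prompt import PydanticPrompt


class SummaryInput(BaseModel):
    text: str


class SummaryOutput(BaseModel):
    summary: str


class SummarizePrompt(PydanticPrompt[SummaryInput, SummaryOutput]):
    # an instruction plus typed input/output models replace the old Prompt fields
    instruction = "Summarize the given text in one sentence."
    input_model = SummaryInput
    output_model = SummaryOutput
    examples = [
        (
            SummaryInput(text="Ragas v0.2 generalizes evaluation beyond RAG pipelines."),
            SummaryOutput(summary="Ragas v0.2 can evaluate any LLM application."),
        )
    ]
```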
Need Further Assistance?
If you have any further questions, feel free to post them in this GitHub issue or reach out to us on cal.com.