Migration from v0.1 to v0.2
v0.2 is the start of the transition for Ragas from an evaluation library for RAG pipelines to a more general library that you can use to evaluate any LLM application you build. This meant we had to make some fundamental changes to the library that will break your workflow. Hopefully this guide will make that transition as easy as possible.
Outline
- Evaluation Dataset
- Metrics
- Testset Generation
- Prompt Object
Evaluation Dataset
We have moved from using HuggingFace `Datasets` to our own `EvaluationDataset`. You can read more about it in the core concepts section for `EvaluationDataset` and `EvaluationSample`.

You can easily convert an existing HuggingFace dataset as shown below:
```python
from ragas import EvaluationDataset, SingleTurnSample

hf_dataset = ...  # your huggingface evaluation dataset
eval_dataset = EvaluationDataset.from_hf_dataset(hf_dataset)

# save eval dataset
eval_dataset.to_csv("path/to/save/dataset.csv")

# load eval dataset
eval_dataset = EvaluationDataset.from_csv("path/to/save/dataset.csv")
```
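If you are not starting from a HuggingFace dataset, you can also build the dataset directly from samples. Here is a minimal sketch; the field values below are placeholders:

```python
from ragas import EvaluationDataset, SingleTurnSample

# build one SingleTurnSample per row of your evaluation data
samples = [
    SingleTurnSample(
        user_input="What is the capital of France?",
        retrieved_contexts=["Paris is the capital of France."],
        response="Paris",
        reference="Paris",
    ),
]

eval_dataset = EvaluationDataset(samples=samples)
```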
Metrics
All the default metrics are still supported and many new metrics have been added. Take a look at the documentation page for the entire list.
However, there are a couple of changes in how you use metrics.

Firstly, it is now preferred to initialize metrics with the evaluator LLM of your choice, as opposed to passing the pre-initialized metric instances to `evaluate()` and supplying the LLM there. This avoids a lot of confusion regarding which LLMs are used where.
```python
# old way, not recommended but still supported till v0.3
from ragas.metrics import faithfulness

# preferred way
from ragas.metrics import Faithfulness

faithfulness_metric = Faithfulness(llm=your_evaluator_llm)
```
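The initialized metric instances are then passed straight to `evaluate()`, so the evaluator LLM is configured in exactly one place. A minimal sketch, assuming `your_evaluator_llm` and the `eval_dataset` from the previous section are already defined:

```python
from ragas import evaluate
from ragas.metrics import Faithfulness

# the metric carries its own evaluator LLM
faithfulness_metric = Faithfulness(llm=your_evaluator_llm)

# pass the initialized metric objects to evaluate()
results = evaluate(dataset=eval_dataset, metrics=[faithfulness_metric])
print(results)
```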
`metrics.ascore` is now being deprecated in favor of `metrics.single_turn_ascore`. You can make the transition as follows:
```python
from ragas import SingleTurnSample
from ragas.metrics import Faithfulness

# create a Single Turn Sample
sample = SingleTurnSample(
    user_input="user query",
    response="response from your pipeline",
)

# init the metric
faithfulness_metric = Faithfulness(llm=your_evaluator_llm)

score = await faithfulness_metric.single_turn_ascore(sample)
print(score)
# 0.9
```
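Note that `single_turn_ascore` is an async method, so the snippet above assumes you are in a notebook or another context where `await` is available. In a plain script you can run it with `asyncio`, for example:

```python
import asyncio

# run the coroutine on a fresh event loop
score = asyncio.run(faithfulness_metric.single_turn_ascore(sample))
print(score)
```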
Testset Generation
Testset generation has been redesigned to be much more cost-efficient. If you were using the end-to-end workflow, check out the getting started guide.
Notable Changes
- Removed `Docstore` in favor of a new Knowledge Graph
- Added `Transforms`, which convert the documents you pass in into a rich knowledge graph
- More customizable with `Synthesizer` objects; also refer to the documentation
- The new workflow makes generation much cheaper, and intermediate states can be saved easily (see the sketch after this list)
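For orientation, a minimal sketch of the new flow might look like this, assuming `generator_llm` and `generator_embeddings` are Ragas LLM/embedding wrappers and `docs` is a list of LangChain documents (see the getting started guide for the exact setup):

```python
from ragas.testset import TestsetGenerator

# the generator applies Transforms to build a knowledge graph from the documents,
# then uses Synthesizer objects to create test samples from that graph
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
testset = generator.generate_with_langchain_docs(docs, testset_size=10)

testset.to_pandas()
```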
This might be a bit rough, but if you do need help here, feel free to chat or mention it here and we would love to help you out 🙂
Prompt Object
All the prompts have been rewritten to use `PydanticPrompt` objects, which are based on the `BasePrompt` object. If you are using the old `Prompt` object, you will have to upgrade it to the new one; check the docs to learn more about how to do it.
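As a rough illustration of what the new style looks like, here is a minimal sketch of a custom prompt; the input/output models, instruction, and example below are made up for demonstration:

```python
from pydantic import BaseModel
from ragas.prompt import PydanticPrompt


class SummaryInput(BaseModel):
    text: str


class SummaryOutput(BaseModel):
    summary: str


class SummarizePrompt(PydanticPrompt[SummaryInput, SummaryOutput]):
    # an instruction plus typed input/output models replace the old Prompt fields
    instruction = "Summarize the given text in one sentence."
    input_model = SummaryInput
    output_model = SummaryOutput
    examples = [
        (
            SummaryInput(text="Ragas v0.2 generalizes evaluation beyond RAG pipelines."),
            SummaryOutput(summary="Ragas v0.2 can evaluate any LLM application."),
        )
    ]
```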
Need Further Assistance?
If you have any further questions, feel free to post them in this GitHub issue or reach out to us on cal.com.