Adding to your CI pipeline with Pytest¶
You can add Ragas evaluations to your Continuous Integration (CI) pipeline to keep track of the qualitative performance of your RAG pipeline. Consider these part of your end-to-end test suite, run before major changes and releases.
The usage is straightforward; the main thing is to set the in_ci argument of the evaluate() function to True. This runs the Ragas metrics in a special mode that produces more reproducible results, though at a higher cost. You can write a pytest test as follows:
Note
This dataset is already populated with outputs from a reference RAG pipeline. When testing your own system, make sure you use outputs from the RAG pipeline you want to test. For more information on how to build your own datasets, check the Building HF Dataset with your own Data docs; a short sketch also follows the test below.
import pytest
from datasets import load_dataset

from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)


def assert_in_range(score: float, value: float, plus_or_minus: float):
    """
    Check if the computed score is within the range value +/- plus_or_minus.
    """
    assert value - plus_or_minus <= score <= value + plus_or_minus


def test_amnesty_e2e():
    # loading the V2 dataset
    amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")["eval"]

    result = evaluate(
        amnesty_qa,
        metrics=[answer_relevancy, faithfulness, context_recall, context_precision],
        in_ci=True,
    )
    assert result["answer_relevancy"] >= 0.9
    assert result["context_recall"] >= 0.95
    assert result["context_precision"] >= 0.95
    assert_in_range(result["faithfulness"], value=0.4, plus_or_minus=0.1)
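Note that faithfulness is asserted to lie within a band around an expected value rather than above a minimum, so assert_in_range catches regressions in either direction.

If you are testing your own system, a minimal sketch of assembling the evaluation dataset from your pipeline's outputs might look like the following. Here my_rag_pipeline is a hypothetical stand-in for your own RAG system, and the column names mirror those of the amnesty_qa dataset used above.

from datasets import Dataset

questions = [
    "What are the global implications of the USA Supreme Court ruling on abortion?",
]

answers = []
contexts = []
for q in questions:
    # hypothetical: your pipeline returns the generated answer plus
    # the retrieved passages it was grounded on
    out = my_rag_pipeline(q)
    answers.append(out["answer"])
    contexts.append(out["contexts"])  # a list of retrieved passage strings

eval_dataset = Dataset.from_dict(
    {
        "question": questions,
        "answer": answers,
        "contexts": contexts,
        # reference answers (placeholder here), used by metrics like context_recall
        "ground_truth": ["..."],
    }
)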
Using Pytest Markers for Ragas E2E tests¶
Because these are long-running end-to-end tests, one thing you can leverage is Pytest Markers, which let you tag tests with special labels. It is recommended to mark Ragas tests with a dedicated tag so you can run them only when needed.
To add a new ragas_ci tag to pytest, add the following to your conftest.py:
def pytest_configure(config):
"""
configure pytest
"""
# add `ragas_ci`
config.addinivalue_line(
"markers", "ragas_ci: Set of tests that will be run as part of Ragas CI"
)
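Registering the marker this way also keeps pytest from emitting a PytestUnknownMarkWarning when the tag is used.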
Now you can use ragas_ci to mark all the tests that are part of Ragas CI:
import pytest
from datasets import load_dataset

from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)


def assert_in_range(score: float, value: float, plus_or_minus: float):
    """
    Check if the computed score is within the range value +/- plus_or_minus.
    """
    assert value - plus_or_minus <= score <= value + plus_or_minus


@pytest.mark.ragas_ci
def test_amnesty_e2e():
    # loading the V2 dataset
    amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")["eval"]

    result = evaluate(
        amnesty_qa,
        metrics=[answer_relevancy, faithfulness, context_recall, context_precision],
        in_ci=True,
    )
    assert result["answer_relevancy"] >= 0.9
    assert result["context_recall"] >= 0.95
    assert result["context_precision"] >= 0.95
    assert_in_range(result["faithfulness"], value=0.4, plus_or_minus=0.1)
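With the marker registered, you can use pytest's -m flag to select or deselect these tests from the command line:

# run only the Ragas CI tests
pytest -m ragas_ci

# run everything except them, e.g. for fast local iterations
pytest -m "not ragas_ci"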