Adding to your CI pipeline with Pytest

You can run Ragas evaluations as part of your Continuous Integration (CI) pipeline to keep track of the qualitative performance of your RAG pipeline. Treat these as part of your end-to-end test suite, to be run before major changes and releases.

The usage is straightforward; the main thing is to set the in_ci argument of the evaluate() function to True. This runs the Ragas metrics in a special mode that produces more reproducible scores, but is also more costly.

You can write a pytest test as follows:

Note

This dataset is already populated with outputs from a reference RAG pipeline. When testing your own system, make sure you use outputs from the RAG pipeline you want to test. For more information on how to build your own dataset, check the Building HF Dataset with your own Data docs.
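For reference, a dataset built from your own pipeline's outputs might look like the following minimal sketch (the question/answer/contexts/ground_truth column names follow the schema these metrics expect; the row contents here are invented purely for illustration):

from datasets import Dataset

# A hypothetical, minimal eval set; replace the values with real outputs
# captured from the RAG pipeline you want to test.
my_eval_set = Dataset.from_dict(
    {
        "question": ["When was the Universal Declaration of Human Rights adopted?"],
        "answer": ["It was adopted by the UN General Assembly in 1948."],
        "contexts": [
            [
                "The Universal Declaration of Human Rights was adopted by the "
                "United Nations General Assembly on 10 December 1948."
            ]
        ],
        "ground_truth": ["It was adopted in 1948."],
    }
)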

tests/e2e/test_amnesty_e2e.py
from datasets import load_dataset

from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)


def assert_in_range(score: float, value: float, plus_or_minus: float):
    """
    Check if the computed score is within the range of value +/- plus_or_minus
    """
    assert value - plus_or_minus <= score <= value + plus_or_minus


def test_amnesty_e2e():
    # load the "english_v2" eval split of the amnesty_qa dataset
    amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")["eval"]

    result = evaluate(
        amnesty_qa,
        metrics=[answer_relevancy, faithfulness, context_recall, context_precision],
        in_ci=True,
    )
    # these metrics are expected to stay above fixed thresholds
    assert result["answer_relevancy"] >= 0.9
    assert result["context_recall"] >= 0.95
    assert result["context_precision"] >= 0.95
    # faithfulness is checked against an expected band rather than a hard floor
    assert_in_range(result["faithfulness"], value=0.4, plus_or_minus=0.1)
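You can then run this file like any other pytest test, for example:

pytest tests/e2e/test_amnesty_e2e.py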

Using Pytest Markers for Ragas E2E tests

Because these are long-running end-to-end tests, one thing you can leverage is pytest markers, which let you tag your tests. It is recommended to mark Ragas tests with a dedicated tag so you can run them only when needed.

To register a new ragas_ci marker with pytest, add the following to your conftest.py:

def pytest_configure(config):
    """
    configure pytest
    """
    # add `ragas_ci`
    config.addinivalue_line(
        "markers", "ragas_ci: Set of tests that will be run as part of Ragas CI"
    )
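Registering the marker this way keeps pytest from emitting unknown-marker warnings, and it keeps the suite working if you run pytest with the --strict-markers option, which turns unregistered markers into errors.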

Now you can use the @pytest.mark.ragas_ci decorator to mark all the tests that are part of Ragas CI:

tests/e2e/test_amnesty_e2e.py
import pytest
from datasets import load_dataset

from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)


def assert_in_range(score: float, value: float, plus_or_minus: float):
    """
    Check if the computed score is within the range of value +/- plus_or_minus
    """
    assert value - plus_or_minus <= score <= value + plus_or_minus


@pytest.mark.ragas_ci
def test_amnesty_e2e():
    # load the "english_v2" eval split of the amnesty_qa dataset
    amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")["eval"]

    result = evaluate(
        amnesty_qa,
        metrics=[answer_relevancy, faithfulness, context_recall, context_precision],
        in_ci=True,
    )
    assert result["answer_relevancy"] >= 0.9
    assert result["context_recall"] >= 0.95
    assert result["context_precision"] >= 0.95
    assert_in_range(result["faithfulness"], value=0.4, plus_or_minus=0.1)
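With the marker registered, you can select just these tests in your CI job (and skip them during regular development runs) using pytest's -m flag:

pytest -m ragas_ci          # run only the Ragas CI tests
pytest -m "not ragas_ci"    # run everything else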