Langsmith Integrations

Langsmith in a platform for building production-grade LLM applications from the langchain team. It helps you with tracing, debugging and evaluting LLM applications.

The langsmith + ragas integrations offer 2 features

  1. View the traces of ragas evaluator

  2. Use ragas metrics in langchain evaluation - (soon)

Tracing ragas metrics

since ragas uses langchain under the hood all you have to do is setup langsmith and your traces will be logged.

to setup langsmith make sure the following env-vars are set (you can read more in the langsmith docs

export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"

Once langsmith is setup, just run the evaluations as your normally would

from datasets import load_dataset
from ragas.metrics import context_precision, answer_relevancy, faithfulness
from ragas import evaluate


fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")

result = evaluate(
    fiqa_eval["baseline"].select(range(3)),
    metrics=[context_precision, faithfulness, answer_relevancy],
)

result
Found cached dataset fiqa (/home/jjmachan/.cache/huggingface/datasets/explodinggradients___fiqa/ragas_eval/1.0.0/3dc7b639f5b4b16509a3299a2ceb78bf5fe98ee6b5fee25e7d5e4d290c88efb8)
evaluating with [context_precision]
100%|█████████████████████████████████████████████████████████████| 1/1 [00:23<00:00, 23.21s/it]
evaluating with [faithfulness]
100%|█████████████████████████████████████████████████████████████| 1/1 [00:36<00:00, 36.94s/it]
evaluating with [answer_relevancy]
100%|█████████████████████████████████████████████████████████████| 1/1 [00:10<00:00, 10.58s/it]
{'context_precision': 0.5976, 'faithfulness': 0.8889, 'answer_relevancy': 0.9300}

Voila! Now you can head over to your project and see the traces