LangSmith
Dataset and Tracing Visualisation
LangSmith is a platform for building production-grade LLM applications from the LangChain team. It helps you with tracing, debugging and evaluating LLM applications.
The LangSmith + ragas integration offers 2 features
View the traces of the ragas evaluator
Use ragas metrics in LangChain evaluation - (coming soon)
Tracing ragas metrics
Since ragas uses LangChain under the hood, all you have to do is set up LangSmith and your traces will be logged.
To set up LangSmith, make sure the following environment variables are set (you can read more in the LangSmith docs):
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project> # if not specified, defaults to "default"
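If you are working in a notebook or script, you can also set the same variables from Python before running the evaluation. A minimal sketch using os.environ (the variable names are the ones above; the values are placeholders):
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "<your-api-key>"  # your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "<your-project>"  # defaults to "default" if unset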
Once LangSmith is set up, just run the evaluations as you normally would.
from datasets import load_dataset

from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# load the fiqa evaluation dataset from the Hugging Face hub
fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")

# evaluate the first 3 rows of the baseline split
result = evaluate(
    fiqa_eval["baseline"].select(range(3)),
    metrics=[context_precision, faithfulness, answer_relevancy],
)
result
Found cached dataset fiqa (/home/jjmachan/.cache/huggingface/datasets/explodinggradients___fiqa/ragas_eval/1.0.0/3dc7b639f5b4b16509a3299a2ceb78bf5fe98ee6b5fee25e7d5e4d290c88efb8)
evaluating with [context_precision]
100%|█████████████████████████████████████████████████████████████| 1/1 [00:23<00:00, 23.21s/it]
evaluating with [faithfulness]
100%|█████████████████████████████████████████████████████████████| 1/1 [00:36<00:00, 36.94s/it]
evaluating with [answer_relevancy]
100%|█████████████████████████████████████████████████████████████| 1/1 [00:10<00:00, 10.58s/it]
{'context_precision': 0.5976, 'faithfulness': 0.8889, 'answer_relevancy': 0.9300}
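If you also want the per-question scores alongside the aggregate numbers, the result object can be converted to a pandas DataFrame. A quick sketch, assuming the to_pandas() helper available on ragas Result objects in recent versions:
# one row per evaluated sample, with a column for each metric
df = result.to_pandas()
df.head()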
Voila! Now you can head over to your project in LangSmith and see the traces.