Ragas is a library that provides tools to supercharge the evaluation of Large Language Model (LLM) applications, helping you evaluate them with ease and confidence.
🚀 Get Started
Install Ragas with pip and get started with these tutorials.
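As a minimal sketch of what a first evaluation run looks like, assuming the classic `evaluate` API and an OpenAI key in the environment (Ragas defaults to OpenAI models as the judge); the metric choice and sample data below are illustrative:

```python
# pip install ragas
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# A toy dataset in the column format Ragas expects:
# question, answer, contexts (list of strings), ground_truth.
data = {
    "question": ["When was the first Moon landing?"],
    "answer": ["The first Moon landing was in 1969."],
    "contexts": [["Apollo 11 landed on the Moon on July 20, 1969."]],
    "ground_truth": ["Apollo 11 landed on the Moon in 1969."],
}
dataset = Dataset.from_dict(data)

# Runs each metric over every sample and reports a score per metric.
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)
```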
▶ What is the best open-source model to use?
There isn't a single correct answer to this question. With the rapid pace of AI model development, new open-source models are released every week, often claiming to outperform previous versions. The best model for your needs depends largely on your GPU capacity and the type of data you're working with.
It's a good idea to explore newer, widely accepted models with strong general capabilities. You can refer to this list for available open-source models, their release dates, and fine-tuned variants.
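If you want to try one of those models as the judge, Ragas accepts a custom LLM through a LangChain wrapper. A short sketch, assuming a locally served model via Ollama; the model name is purely illustrative:

```python
from langchain_community.chat_models import ChatOllama
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import faithfulness

# Wrap any LangChain chat model so Ragas can use it as the judge LLM.
# "llama3" is a placeholder; substitute the model you want to try.
judge = LangchainLLMWrapper(ChatOllama(model="llama3"))

# `dataset` is the same evaluation dataset as in the quickstart above.
result = evaluate(dataset, metrics=[faithfulness], llm=judge)
```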
▶ Why do NaN values appear in evaluation results?
NaN stands for "Not a Number." In Ragas evaluation results, NaN can appear for two main reasons:
1. JSON parsing issue: The model's output is not JSON-parsable. Ragas requires models to output JSON-compatible responses because all prompts are structured using Pydantic, which ensures efficient parsing of LLM outputs.
2. Non-ideal cases for scoring: Certain samples may not be suitable for scoring. For example, scoring the faithfulness of a response like "I don't know" might not be appropriate.
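To find which samples were affected, the result object can be converted to a pandas DataFrame and filtered for missing scores. A short sketch, reusing the `result` from the quickstart above:

```python
# Per-sample scores, one row per evaluated sample.
df = result.to_pandas()

# Rows where faithfulness could not be computed come back as NaN.
nan_rows = df[df["faithfulness"].isna()]
print(nan_rows[["question", "answer", "faithfulness"]])
```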
▶ How can I make evaluation results more explainable?
The best way is to trace and log your evaluation, then inspect the results using LLM traces. You can follow a detailed example of this process here.
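As a sketch of what that wiring can look like, Ragas forwards LangChain callbacks to the LLM calls it makes during evaluation, so a tracer such as LangSmith's can capture each metric's prompts and verdicts. The project name below is illustrative, and LangSmith credentials are assumed to be set in the environment:

```python
from langchain.callbacks.tracers import LangChainTracer
from ragas import evaluate
from ragas.metrics import faithfulness

# Every LLM call made during evaluation is logged to this LangSmith
# project, where per-metric prompts and outputs can be inspected.
tracer = LangChainTracer(project_name="ragas-eval")  # illustrative name

result = evaluate(dataset, metrics=[faithfulness], callbacks=[tracer])
```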