Evaluation¶
- ragas.evaluation.evaluate(dataset: Dataset, metrics: list[ragas.metrics.base.Metric] | None = None, column_map: dict[str, str] = {'answer': 'answer', 'contexts': 'contexts', 'ground_truths': 'ground_truths', 'question': 'question'}) Result ¶
Run the evaluation on the dataset with the given metrics.
- Parameters:
dataset (Dataset[question: list[str], contexts: list[list[str]], answer: list[str]]) – The dataset, in ragas format, that the metrics will use to score the RAG pipeline.
metrics (list[Metric] , optional) – List of metrics to use for evaluation. If not provided, ragas runs the evaluation with a default set of metrics chosen to give a complete view.
column_map (dict[str, str], optional) – The column names of the dataset to use for evaluation. If the dataset's column names differ from the defaults, provide the mapping as a dictionary here.
- Returns:
Result object containing the scores of each metric. You can use this to do further analysis later. If the top 3 metrics are provided, it also returns the ragas_score for the entire pipeline.
- Return type:
Result
- Raises:
ValueError – if validation fails because the columns required for the metrics are missing, or if the columns are in the wrong format.
Examples
The basic usage is as follows:

```python
>>> from ragas import evaluate
>>> dataset
Dataset({
    features: ['question', 'ground_truths', 'answer', 'contexts'],
    num_rows: 30
})
>>> result = evaluate(dataset)
>>> print(result["ragas_score"])
{'ragas_score': 0.860, 'context_precision': 0.817, 'faithfulness': 0.892, 'answer_relevancy': 0.874}
```
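The effect of `column_map` can be pictured as a simple key rename applied to each row before scoring. The sketch below is a hypothetical illustration (the helper `apply_column_map` and the column names `query`, `ctx`, and `response` are assumptions, not part of the ragas API); it shows how a dataset with non-default column names would be mapped onto the names ragas expects.

```python
def apply_column_map(row: dict, column_map: dict) -> dict:
    """Rename a row's keys: column_map maps each ragas-standard
    column name to the actual column name in the dataset."""
    return {standard: row[actual] for standard, actual in column_map.items()}

# A row whose columns use non-default names (hypothetical dataset).
row = {
    "query": "What is ragas?",
    "ctx": [["ragas is an evaluation framework for RAG pipelines."]],
    "response": "An evaluation framework for RAG pipelines.",
}

# Map ragas-standard names (keys) to this dataset's names (values).
column_map = {"question": "query", "contexts": "ctx", "answer": "response"}

mapped = apply_column_map(row, column_map)
# mapped now uses the standard keys: 'question', 'contexts', 'answer'
```

With a real dataset you would instead pass the same dictionary directly, e.g. `evaluate(dataset, column_map={"question": "query", ...})`, and ragas would perform the remapping internally.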
- class ragas.evaluation.Result(scores: 'Dataset', dataset: 'Dataset | None' = None, binary_columns: 'list[str]' = <factory>)¶
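To make the shape of `Result` concrete, here is a minimal dataclass analogue built only from the fields in the signature above. It is a simplified sketch, not the real class: the actual `Result` wraps Hugging Face `Dataset` objects and adds scoring behavior, while this stand-in uses plain dicts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResultSketch:
    """Simplified stand-in mirroring Result's fields:
    scores (required), dataset (optional), binary_columns (defaults
    to an empty list via a factory, matching `= <factory>` above)."""
    scores: dict
    dataset: Optional[dict] = None
    binary_columns: list = field(default_factory=list)


# Construct one the same way Result is constructed: scores first,
# the source dataset and binary columns optional.
r = ResultSketch(scores={"faithfulness": 0.892})
```

The `field(default_factory=list)` is what the `= <factory>` in the signature denotes: each instance gets its own fresh empty list rather than sharing one mutable default.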