# Schemas
## BaseSample

Bases: `BaseModel`

Base class for evaluation samples.

### to_dict

Returns a dictionary representation of the sample.
## SingleTurnSample

Bases: `BaseSample`

Represents evaluation samples for single-turn interactions.

Attributes:

| Name | Type | Description |
|---|---|---|
| `user_input` | `Optional[str]` | The input query from the user. |
| `retrieved_contexts` | `Optional[List[str]]` | List of contexts retrieved for the query. |
| `reference_contexts` | `Optional[List[str]]` | List of reference contexts for the query. |
| `response` | `Optional[str]` | The generated response for the query. |
| `multi_responses` | `Optional[List[str]]` | List of multiple responses generated for the query. |
| `reference` | `Optional[str]` | The reference answer for the query. |
| `rubric` | `Optional[Dict[str, str]]` | Evaluation rubric for the sample. |
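A minimal sketch of constructing a sample (the field values are illustrative; since every field is optional, populate only what the metrics you plan to run require):

```python
from ragas.dataset_schema import SingleTurnSample

sample = SingleTurnSample(
    user_input="When was the first Super Bowl played?",
    retrieved_contexts=[
        "The first AFL-NFL World Championship Game was played on January 15, 1967."
    ],
    response="The first Super Bowl was played on January 15, 1967.",
    reference="January 15, 1967",
)

# Dictionary representation, via BaseSample.to_dict
print(sample.to_dict())
```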
## MultiTurnSample

Bases: `BaseSample`

Represents evaluation samples for multi-turn interactions.

Attributes:

| Name | Type | Description |
|---|---|---|
| `user_input` | `List[Union[HumanMessage, AIMessage, ToolMessage]]` | A list of messages representing the conversation turns. |
| `reference` | `Optional[str]` | The reference answer or expected outcome for the conversation. |
| `reference_tool_calls` | `Optional[List[ToolCall]]` | A list of expected tool calls for the conversation. |
| `rubrics` | `Optional[Dict[str, str]]` | Evaluation rubrics for the conversation. |
| `reference_topics` | `Optional[List[str]]` | A list of reference topics for the conversation. |
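A sketch of a multi-turn sample wrapping a short conversation with one tool invocation (the `get_weather` tool and all values are hypothetical):

```python
from ragas.dataset_schema import MultiTurnSample
from ragas.messages import AIMessage, HumanMessage, ToolCall, ToolMessage

sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="What is the weather in Paris right now?"),
        AIMessage(
            content="Let me look that up.",
            tool_calls=[ToolCall(name="get_weather", args={"city": "Paris"})],
        ),
        ToolMessage(content="18 degrees Celsius, partly cloudy"),
        AIMessage(content="It is currently 18°C and partly cloudy in Paris."),
    ],
    reference="The assistant reports the current weather in Paris.",
    reference_tool_calls=[ToolCall(name="get_weather", args={"city": "Paris"})],
)

# Human-readable transcript of the conversation turns
print(sample.pretty_repr())
```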
### validate_user_input `classmethod`

`validate_user_input(messages: List[Union[HumanMessage, AIMessage, ToolMessage]]) -> List[Union[HumanMessage, AIMessage, ToolMessage]]`

Validates the user input messages.

### to_messages

### pretty_repr

Returns a pretty string representation of the conversation.
## RagasDataset `dataclass`

Bases: `ABC`, `Generic[Sample]`

### to_list `abstractmethod`

### from_list `abstractmethod` `classmethod`

Creates a dataset from a list of dictionaries.

### validate_samples

Validates that all samples are of the same type.

### get_sample_type

Returns the type of the samples in the dataset.

### to_hf_dataset

Converts the dataset to a Hugging Face Dataset.

### from_hf_dataset `classmethod`

Creates a dataset from a Hugging Face Dataset.

### to_pandas

Converts the dataset to a pandas DataFrame.

### features

Returns the features of the samples.

### from_dict `classmethod`

Creates an EvaluationDataset from a dictionary.

### to_csv

Converts the dataset to a CSV file.

### to_jsonl

Converts the dataset to a JSONL file.

### from_jsonl `classmethod`

Creates an EvaluationDataset from a JSONL file.
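These concrete methods are inherited by `EvaluationDataset`, so a serialization round-trip might look like the following sketch (row contents and file name are illustrative; `to_hf_dataset` assumes the `datasets` package is installed):

```python
from ragas.dataset_schema import EvaluationDataset

# One flat dictionary per single-turn sample
rows = [
    {"user_input": "Q1", "response": "A1", "reference": "R1"},
    {"user_input": "Q2", "response": "A2", "reference": "R2"},
]

dataset = EvaluationDataset.from_list(rows)
dataset.to_jsonl("eval_samples.jsonl")   # one JSON object per line
restored = EvaluationDataset.from_jsonl("eval_samples.jsonl")

df = dataset.to_pandas()                 # one row per sample
hf_ds = dataset.to_hf_dataset()          # requires the `datasets` package
```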
## EvaluationDataset `dataclass`

Bases: `RagasDataset[SingleTurnSampleOrMultiTurnSample]`

Represents a dataset of evaluation samples.

Attributes:

| Name | Type | Description |
|---|---|---|
| `samples` | `List[BaseSample]` | A list of evaluation samples. |

Methods:

| Name | Description |
|---|---|
| `validate_samples` | Validates that all samples are of the same type. |
| `get_sample_type` | Returns the type of the samples in the dataset. |
| `to_hf_dataset` | Converts the dataset to a Hugging Face Dataset. |
| `to_pandas` | Converts the dataset to a pandas DataFrame. |
| `features` | Returns the features of the samples. |
| `from_list` | Creates an EvaluationDataset from a list of dictionaries. |
| `from_dict` | Creates an EvaluationDataset from a dictionary. |
| `to_csv` | Converts the dataset to a CSV file. |
| `to_jsonl` | Converts the dataset to a JSONL file. |
| `from_jsonl` | Creates an EvaluationDataset from a JSONL file. |
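A sketch of building a dataset directly from sample objects; because `validate_samples` requires all samples to share one type, mixing single-turn and multi-turn samples in one dataset will fail:

```python
from ragas.dataset_schema import EvaluationDataset, SingleTurnSample

dataset = EvaluationDataset(
    samples=[
        SingleTurnSample(user_input="What is 2 + 2?", response="4", reference="4"),
        SingleTurnSample(user_input="Capital of France?", response="Paris", reference="Paris"),
    ]
)

print(dataset.get_sample_type())  # SingleTurnSample
print(dataset.features())         # field names present on the samples
```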
## EvaluationResult `dataclass`

`EvaluationResult(scores: List[Dict[str, Any]], dataset: EvaluationDataset, binary_columns: List[str] = list(), cost_cb: Optional[CostCallbackHandler] = None, traces: List[Dict[str, Any]] = list(), ragas_traces: Dict[UUID, ChainRun] = dict())`

A class to store and process the results of the evaluation.

Attributes:

| Name | Type | Description |
|---|---|---|
| `scores` | `Dataset` | The dataset containing the scores of the evaluation. |
| `dataset` | `Dataset`, optional | The original dataset used for the evaluation. Default is None. |
| `binary_columns` | list of str, optional | List of columns that are binary metrics. Default is an empty list. |
| `cost_cb` | `CostCallbackHandler`, optional | The callback handler for cost computation. Default is None. |

### to_pandas

Convert the result to a pandas DataFrame.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `batch_size` | `int` | The batch size for conversion. Default is None. | `None` |
| `batched` | `bool` | Whether to convert in batches. Default is False. | `False` |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | The result as a pandas DataFrame. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the dataset is not provided. |
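In practice an `EvaluationResult` comes back from `ragas.evaluate` rather than being constructed directly. A sketch (sample values are illustrative, and an LLM/embeddings backend must be configured, e.g. via `OPENAI_API_KEY`, for the metrics to run):

```python
from ragas import evaluate
from ragas.dataset_schema import EvaluationDataset, SingleTurnSample
from ragas.metrics import answer_relevancy, faithfulness

dataset = EvaluationDataset(
    samples=[
        SingleTurnSample(
            user_input="When was the first Super Bowl played?",
            retrieved_contexts=[
                "The first AFL-NFL World Championship Game was played on January 15, 1967."
            ],
            response="It was played on January 15, 1967.",
        )
    ]
)

result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])

df = result.to_pandas()  # one row per sample, one column per metric
print(df.head())
```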
### total_tokens

Compute the total tokens used in the evaluation.

Returns:

| Type | Description |
|---|---|
| list of `TokenUsage` or `TokenUsage` | The total tokens used. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the cost callback handler is not provided. |

### total_cost

`total_cost(cost_per_input_token: Optional[float] = None, cost_per_output_token: Optional[float] = None, per_model_costs: Dict[str, Tuple[float, float]] = {}) -> float`

Compute the total cost of the evaluation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `cost_per_input_token` | `float` | The cost per input token. Default is None. | `None` |
| `cost_per_output_token` | `float` | The cost per output token. Default is None. | `None` |
| `per_model_costs` | dict of str to tuple of float | The per-model costs. Default is an empty dictionary. | `{}` |

Returns:

| Type | Description |
|---|---|
| `float` | The total cost of the evaluation. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the cost callback handler is not provided. |
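Both methods need `cost_cb` to be populated, which happens when `evaluate` is run with a token-usage parser; otherwise they raise `ValueError`. A sketch reusing the `dataset` from the previous example, with illustrative (not current) per-token prices:

```python
from ragas import evaluate
from ragas.cost import get_token_usage_for_openai
from ragas.metrics import faithfulness

# token_usage_parser populates the cost callback handler (cost_cb)
result = evaluate(
    dataset,
    metrics=[faithfulness],
    token_usage_parser=get_token_usage_for_openai,
)

print(result.total_tokens())      # TokenUsage, or a list with one entry per model
print(result.total_cost(
    cost_per_input_token=5e-6,    # illustrative prices, not real rates
    cost_per_output_token=15e-6,
))
```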
## Message

Bases: `BaseModel`

Represents a generic message.

Attributes:

| Name | Type | Description |
|---|---|---|
| `content` | `str` | The content of the message. |
| `metadata` | `Optional[Dict[str, Any]]` | Additional metadata associated with the message. |
## ToolCall

Bases: `BaseModel`

Represents a tool call with a name and arguments.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the tool being called. | required |
| `args` | `Dict[str, Union[str, int, float]]` | A dictionary of arguments for the tool call, where keys are argument names and values can be strings, integers, or floats. | required |
## HumanMessage

Bases: `Message`

Represents a message from a human user.

Attributes:

| Name | Type | Description |
|---|---|---|
| `type` | `Literal["human"]` | The type of the message, always set to "human". |

Methods:

| Name | Description |
|---|---|
| `pretty_repr` | Returns a formatted string representation of the human message. |
## ToolMessage

Bases: `Message`

Represents a message from a tool.

Attributes:

| Name | Type | Description |
|---|---|---|
| `type` | `Literal["tool"]` | The type of the message, always set to "tool". |

Methods:

| Name | Description |
|---|---|
| `pretty_repr` | Returns a formatted string representation of the tool message. |
## AIMessage

Bases: `Message`

Represents a message from an AI.

Attributes:

| Name | Type | Description |
|---|---|---|
| `type` | `Literal["ai"]` | The type of the message, always set to "ai". |
| `tool_calls` | `Optional[List[ToolCall]]` | A list of tool calls made by the AI, if any. |
| `metadata` | `Optional[Dict[str, Any]]` | Additional metadata associated with the AI message. |

Methods:

| Name | Description |
|---|---|
| `to_dict` | Returns a dictionary representation of the AI message. |
| `pretty_repr` | Returns a formatted string representation of the AI message. |

### to_dict

Returns a dictionary representation of the AI message.

### pretty_repr

Returns a formatted string representation of the AI message.
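A small sketch of the two representations (the tool name and arguments are hypothetical):

```python
from ragas.messages import AIMessage, ToolCall

msg = AIMessage(
    content="Checking the weather now.",
    tool_calls=[ToolCall(name="get_weather", args={"city": "Paris"})],
)

print(msg.to_dict())      # dictionary representation of the message
print(msg.pretty_repr())  # formatted, human-readable rendering
```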
## ragas.evaluation.EvaluationResult

Re-export of the `EvaluationResult` dataclass documented above.