## TestsetSample

Bases: `BaseSample`

Represents a sample in a test set.

Attributes:

| Name               | Type                                       | Description                                                                    |
| ------------------ | ------------------------------------------ | ------------------------------------------------------------------------------ |
| `eval_sample`      | `Union[SingleTurnSample, MultiTurnSample]` | The evaluation sample, which can be either a single-turn or multi-turn sample. |
| `synthesizer_name` | `str`                                      | The name of the synthesizer used to generate this sample.                      |

## TestsetPacket

Bases: `BaseModel`

A packet of testset samples to be uploaded to the server.

## Testset

```python
Testset(samples: List[TestsetSample], run_id: str = (lambda: str(uuid4()))(), cost_cb: Optional[CostCallbackHandler] = None)
```

Bases: `RagasDataset[TestsetSample]`

Represents a test set containing multiple test samples.

Attributes:

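| Name      | Type                            | Description                                                                                                   |
| --------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| `samples` | `List[TestsetSample]`           | A list of TestsetSample objects representing the samples in the test set.                                     |
| `run_id`  | `str`                           | Identifier for the test set; defaults to a randomly generated UUID4 string.                                   |
| `cost_cb` | `Optional[CostCallbackHandler]` | Callback handler that records token usage; required by `total_tokens` and `total_cost`. Defaults to `None`. |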

### to_evaluation_dataset

```python
to_evaluation_dataset() -> EvaluationDataset
```

Converts the Testset to an EvaluationDataset.

Source code in `src/ragas/testset/synthesizers/testset_schema.py`

```python
def to_evaluation_dataset(self) -> EvaluationDataset:
    """
    Converts the Testset to an EvaluationDataset.
    """
    return EvaluationDataset(
        samples=[sample.eval_sample for sample in self.samples]
    )
```

### to_list

```python
to_list() -> List[Dict]
```

Converts the Testset to a list of dictionaries.

Source code in `src/ragas/testset/synthesizers/testset_schema.py`

```python
def to_list(self) -> t.List[t.Dict]:
    """
    Converts the Testset to a list of dictionaries.
    """
    list_dict = []
    for sample in self.samples:
        sample_dict = sample.eval_sample.model_dump(exclude_none=True)
        sample_dict["synthesizer_name"] = sample.synthesizer_name
        list_dict.append(sample_dict)
    return list_dict
```
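The flattening that `to_list` performs can be sketched without ragas. This is a minimal sketch, assuming each `eval_sample` has already been turned into a plain dict; the dict comprehension stands in for Pydantic's `model_dump(exclude_none=True)` (top level only), and `flatten_samples` is a hypothetical helper, not part of the ragas API.

```python
from typing import Any, Dict, List, Tuple


def flatten_samples(samples: List[Tuple[Dict[str, Any], str]]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for Testset.to_list.

    Each input item is (eval_sample_fields, synthesizer_name).
    """
    rows = []
    for fields, synthesizer_name in samples:
        # Drop None-valued fields, mirroring model_dump(exclude_none=True) at the top level.
        row = {k: v for k, v in fields.items() if v is not None}
        # Attach the synthesizer name as an extra top-level key, as to_list does.
        row["synthesizer_name"] = synthesizer_name
        rows.append(row)
    return rows


rows = flatten_samples([
    (
        {"user_input": "What is RAG?", "reference": "Retrieval-augmented generation.", "response": None},
        "single_hop_specific_query_synthesizer",
    ),
])
print(rows)
```

Because `None` fields are dropped, rows produced from the same testset can carry different keys, which is worth keeping in mind when building a table from the output.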

### from_list

```python
from_list(data: List[Dict]) -> Testset
```

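Converts a list of dictionaries to a Testset. Each dictionary must contain a `synthesizer_name` key; the remaining keys become a `SingleTurnSample` when `user_input` is present and is not a list, and a `MultiTurnSample` otherwise. Note that `synthesizer_name` is popped from each input dictionary, so the caller's data is modified in place.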

Source code in `src/ragas/testset/synthesizers/testset_schema.py`

```python
@classmethod
def from_list(cls, data: t.List[t.Dict]) -> Testset:
    """
    Converts a list of dictionaries to a Testset.
    """
    # first create the samples
    samples = []
    for sample in data:
        synthesizer_name = sample["synthesizer_name"]
        # remove the synthesizer name from the sample
        sample.pop("synthesizer_name")
        # the remaining sample is the eval_sample
        eval_sample = sample

        # if user_input is a list it is MultiTurnSample
        if "user_input" in eval_sample and not isinstance(
            eval_sample.get("user_input"), list
        ):
            eval_sample = SingleTurnSample(**eval_sample)
        else:
            eval_sample = MultiTurnSample(**eval_sample)

        samples.append(
            TestsetSample(
                eval_sample=eval_sample, synthesizer_name=synthesizer_name
            )
        )
    # then create the testset
    return Testset(samples=samples)
```
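The single-turn versus multi-turn dispatch in `from_list` can be isolated as a small pure function. This is a sketch under stated assumptions: `split_row` is a hypothetical helper that returns a label instead of constructing ragas sample objects, it copies the row so the caller's dict is left untouched (unlike `from_list`), and the multi-turn message shape below is only illustrative.

```python
import copy
from typing import Any, Dict, List, Tuple


def split_row(row: Dict[str, Any]) -> Tuple[str, str, Dict[str, Any]]:
    """Hypothetical helper mirroring the dispatch rule in Testset.from_list.

    Returns (sample_kind, synthesizer_name, remaining_fields).
    """
    fields = copy.deepcopy(row)  # work on a copy; from_list pops from the original dict
    synthesizer_name = fields.pop("synthesizer_name")  # KeyError if missing, as in from_list
    # Same condition as from_list: a non-list user_input means single-turn.
    if "user_input" in fields and not isinstance(fields["user_input"], list):
        kind = "single_turn"
    else:
        kind = "multi_turn"
    return kind, synthesizer_name, fields


data: List[Dict[str, Any]] = [
    {"user_input": "What is RAG?", "synthesizer_name": "single_hop"},
    # Illustrative multi-turn shape: user_input is a list of messages.
    {"user_input": [{"content": "hi", "type": "human"}], "synthesizer_name": "multi_turn"},
]
for row in data:
    print(split_row(row)[0])  # single_turn, then multi_turn
```

Note that a row with no `user_input` at all falls into the multi-turn branch, exactly as in `from_list`.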

### total_tokens

```python
total_tokens() -> Union[List[TokenUsage], TokenUsage]
```

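Compute the total number of tokens used while generating the testset. Raises `ValueError` if the testset was not configured for cost tracking (`cost_cb` is `None`), which happens when no `token_usage_parser` was provided to `TestsetGenerator`.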

Source code in `src/ragas/testset/synthesizers/testset_schema.py`

```python
def total_tokens(self) -> t.Union[t.List[TokenUsage], TokenUsage]:
    """
    Compute the total tokens used in the evaluation.
    """
    if self.cost_cb is None:
        raise ValueError(
            "The Testset was not configured for computing cost. Please provide a token_usage_parser function to TestsetGenerator to compute cost."
        )
    return self.cost_cb.total_tokens()
```
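The return type `Union[List[TokenUsage], TokenUsage]` suggests usage may be reported per model. The sketch below shows that kind of aggregation under explicit assumptions: each LLM call yields one record of input tokens, output tokens, and model name; records are summed per model; and a single record is returned when only one model was used. `Usage` and `aggregate` are hypothetical and are not the ragas `TokenUsage` or `CostCallbackHandler` API.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Usage:
    """Hypothetical stand-in for a token-usage record."""
    input_tokens: int = 0
    output_tokens: int = 0
    model: str = ""


def aggregate(records: List[Usage]) -> Union[Usage, List[Usage]]:
    """Sum usage per model; return one record if only one model was seen (assumed behavior)."""
    totals: Dict[str, Usage] = {}
    for r in records:
        # Start a zeroed total the first time a model appears, then accumulate.
        t = totals.setdefault(r.model, Usage(model=r.model))
        t.input_tokens += r.input_tokens
        t.output_tokens += r.output_tokens
    merged = list(totals.values())
    return merged[0] if len(merged) == 1 else merged


calls = [Usage(120, 40, "model-a"), Usage(80, 60, "model-a")]
print(aggregate(calls))  # Usage(input_tokens=200, output_tokens=100, model='model-a')
```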

### total_cost

```python
total_cost(cost_per_input_token: Optional[float] = None, cost_per_output_token: Optional[float] = None) -> float
```

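Compute the total cost of generating the testset from per-token input and output prices. Raises `ValueError` if the testset was not configured for cost tracking (`cost_cb` is `None`).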

Source code in `src/ragas/testset/synthesizers/testset_schema.py`

```python
def total_cost(
    self,
    cost_per_input_token: t.Optional[float] = None,
    cost_per_output_token: t.Optional[float] = None,
) -> float:
    """
    Compute the total cost of the evaluation.
    """
    if self.cost_cb is None:
        raise ValueError(
            "The Testset was not configured for computing cost. Please provide a token_usage_parser function to TestsetGenerator to compute cost."
        )
    return self.cost_cb.total_cost(
        cost_per_input_token=cost_per_input_token,
        cost_per_output_token=cost_per_output_token,
    )
```
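`total_cost` delegates to the callback handler with per-token prices. Assuming cost is linear in token counts (input tokens times the input price plus output tokens times the output price), the arithmetic looks like this; `linear_cost` is a hypothetical helper and the prices are made-up placeholders, not real model pricing.

```python
def linear_cost(
    input_tokens: int,
    output_tokens: int,
    cost_per_input_token: float,
    cost_per_output_token: float,
) -> float:
    """Hypothetical helper: price input and output tokens separately and add them."""
    return input_tokens * cost_per_input_token + output_tokens * cost_per_output_token


# Placeholder prices expressed per token (e.g. $2 and $6 per million tokens).
cost = linear_cost(1_000, 500, 2 / 1_000_000, 6 / 1_000_000)
print(f"${cost:.4f}")  # $0.0050
```

Since the parameters are per-token prices, a per-million-token list price has to be divided by 1,000,000 before it is passed in.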

### from_annotated

```python
from_annotated(path: str) -> Testset
```

Loads a testset from an annotated JSON file, keeping only the samples whose `approval_status` is `"approved"`.

Source code in `src/ragas/testset/synthesizers/testset_schema.py`

```python
@classmethod
def from_annotated(cls, path: str) -> Testset:
    """
    Loads a testset from an annotated JSON file.
    """
    import json

    with open(path, "r") as f:
        annotated_testset = json.load(f)

    samples = []
    for sample in annotated_testset:
        if sample["approval_status"] == "approved":
            samples.append(TestsetSample(**sample))
    return cls(samples=samples)
```
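The approval filter in `from_annotated` can be exercised on its own with the standard library. This is a minimal sketch: `load_approved` is a hypothetical helper that returns plain dicts instead of `TestsetSample` objects, and the two-record file is illustrative; real annotated files carry the full sample fields.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def load_approved(path: str) -> List[Dict[str, Any]]:
    """Keep only records whose approval_status is "approved" (the filter from_annotated applies)."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    # Like from_annotated, a record without approval_status raises KeyError.
    return [r for r in records if r["approval_status"] == "approved"]


# Illustrative annotated file with one approved and one rejected record.
annotated = [
    {"approval_status": "approved", "synthesizer_name": "single_hop"},
    {"approval_status": "rejected", "synthesizer_name": "single_hop"},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(annotated, tmp)
try:
    approved = load_approved(tmp.name)
finally:
    os.remove(tmp.name)  # clean up the temporary file
print(len(approved))  # 1
```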

## QueryLength

Bases: `str`, `Enum`

Enumeration of query lengths. Available options are `LONG`, `MEDIUM`, and `SHORT`.

## QueryStyle

Bases: `str`, `Enum`

Enumeration of query styles. Available options are `MISSPELLED`, `PERFECT_GRAMMAR`, `POOR_GRAMMAR`, and `WEB_SEARCH_LIKE`.
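Both enums derive from `str` and `Enum`, so each member is also a string. The sketch below illustrates what that mixin buys, using `QueryLength` as the example (`QueryStyle` behaves the same way); the member names match the ones listed above, but the string values are assumptions, not the ones ragas defines.

```python
from enum import Enum


class QueryLength(str, Enum):
    # Member names match the docs; the string values are assumed for illustration.
    LONG = "long"
    MEDIUM = "medium"
    SHORT = "short"


# A str-mixin member is a real str, so it compares equal to its value
# and can be used directly wherever a string is expected.
print(QueryLength.SHORT == "short")    # True
print(QueryLength("medium").name)      # MEDIUM
print([m.name for m in QueryLength])   # ['LONG', 'MEDIUM', 'SHORT']
```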

## BaseScenario

Bases: `BaseModel`

Base class for representing a scenario for generating test samples.

Attributes:

| Name      | Type          | Description                             |
| --------- | ------------- | --------------------------------------- |
| `nodes`   | `List[Node]`  | List of nodes involved in the scenario. |
| `style`   | `QueryStyle`  | The style of the query.                 |
| `length`  | `QueryLength` | The length of the query.                |
| `persona` | `Persona`     | A persona associated with the scenario. |

## SingleHopSpecificQuerySynthesizer

```python
SingleHopSpecificQuerySynthesizer(name: str = 'single_hop_specific_query_synthesizer', llm: Union[BaseRagasLLM, 'InstructorBaseRagasLLM'] = _default_llm_factory(), llm_context: Optional[str] = None, generate_query_reference_prompt: PydanticPrompt = QueryAnswerGenerationPrompt(), theme_persona_matching_prompt: PydanticPrompt = ThemesPersonasMatchingPrompt(), property_name: str = 'entities')
```

Bases: `SingleHopQuerySynthesizer`

## MultiHopSpecificQuerySynthesizer

```python
MultiHopSpecificQuerySynthesizer(name: str = 'multi_hop_specific_query_synthesizer', llm: Union[BaseRagasLLM, 'InstructorBaseRagasLLM'] = _default_llm_factory(), llm_context: Optional[str] = None, generate_query_reference_prompt: PydanticPrompt = QueryAnswerGenerationPrompt(), property_name: str = 'entities', relation_type: str = 'entities_overlap', relation_overlap_property: str = 'overlapped_items', theme_persona_matching_prompt: PydanticPrompt = ThemesPersonasMatchingPrompt())
```

Bases: `MultiHopQuerySynthesizer`

Synthesize multi-hop queries based on a chunk cluster defined by entity overlap.

### get_node_clusters

```python
get_node_clusters(knowledge_graph: KnowledgeGraph) -> List[Tuple]
```

Identify clusters of nodes linked by a relationship whose `type` equals `relation_type` (by default `entities_overlap`).

Source code in `src/ragas/testset/synthesizers/multi_hop/specific.py`

```python
def get_node_clusters(self, knowledge_graph: KnowledgeGraph) -> t.List[t.Tuple]:
    """Identify clusters of nodes based on the specified relationship condition."""
    node_clusters = knowledge_graph.find_two_nodes_single_rel(
        relationship_condition=lambda rel: rel.type == self.relation_type
    )
    logger.info("found %d clusters", len(node_clusters))
    return node_clusters
```
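The clustering step keeps only edges whose relationship type matches `relation_type`. Below is a minimal sketch on a toy graph, assuming edges are plain (source, type, target) tuples; `Rel`, `find_pairs`, and the node ids are hypothetical, and the real `find_two_nodes_single_rel` may return richer tuples than bare id pairs.

```python
from typing import Callable, List, NamedTuple, Tuple


class Rel(NamedTuple):
    """Hypothetical edge: source node id, relationship type, target node id."""
    source: str
    type: str
    target: str


def find_pairs(rels: List[Rel], condition: Callable[[Rel], bool]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for every edge that satisfies `condition`."""
    return [(r.source, r.target) for r in rels if condition(r)]


edges = [
    Rel("chunk_1", "entities_overlap", "chunk_2"),
    Rel("chunk_2", "some_other_relation", "chunk_3"),
    Rel("chunk_3", "entities_overlap", "chunk_4"),
]
relation_type = "entities_overlap"  # the synthesizer's default relation_type
# Same predicate shape as get_node_clusters: match on the relationship's type.
clusters = find_pairs(edges, lambda rel: rel.type == relation_type)
print(clusters)  # [('chunk_1', 'chunk_2'), ('chunk_3', 'chunk_4')]
```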
