
Transforms

BaseGraphTransformation dataclass

BaseGraphTransformation(name: str = '')

Bases: ABC

Abstract base class for graph transformations on a KnowledgeGraph.

transform abstractmethod async

transform(kg: KnowledgeGraph) -> Any

Abstract method to transform the KnowledgeGraph. Transformations should be idempotent, meaning that applying the transformation multiple times should yield the same result as applying it once.

Parameters:

Name  Type            Description                             Default
kg    KnowledgeGraph  The knowledge graph to be transformed.  required

Returns:

Type  Description
Any   The transformed knowledge graph.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def transform(self, kg: KnowledgeGraph) -> t.Any:
    """
    Abstract method to transform the KnowledgeGraph. Transformations should be
    idempotent, meaning that applying the transformation multiple times should
    yield the same result as applying it once.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.Any
        The transformed knowledge graph.
    """
    pass
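The idempotency requirement can be checked mechanically: apply the transformation once, apply it twice, and compare. The sketch below is synchronous for brevity and uses a hypothetical `ToyGraph` (a list of property dicts) in place of `KnowledgeGraph`. `add_length` writes a property only when it is missing, so it is idempotent. `append_tag` is not.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Hypothetical stand-in for KnowledgeGraph: nodes are plain property dicts."""
    nodes: list = field(default_factory=list)


def add_length(kg: ToyGraph) -> ToyGraph:
    """Idempotent: writes 'length' only when it is not already present."""
    for props in kg.nodes:
        props.setdefault("length", len(props["text"]))
    return kg


def append_tag(kg: ToyGraph) -> ToyGraph:
    """NOT idempotent: every call appends one more tag."""
    for props in kg.nodes:
        props.setdefault("tags", []).append("seen")
    return kg


def is_idempotent(transform, kg: ToyGraph) -> bool:
    """Compare one application against two, working on copies of `kg`."""
    once = transform(copy.deepcopy(kg))
    twice = transform(transform(copy.deepcopy(kg)))
    return once == twice
```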

filter

Filters the KnowledgeGraph and returns the filtered graph.

Parameters:

Name  Type            Description                          Default
kg    KnowledgeGraph  The knowledge graph to be filtered.  required

Returns:

Type            Description
KnowledgeGraph  The filtered knowledge graph.

Source code in src/ragas/testset/transforms/base.py
def filter(self, kg: KnowledgeGraph) -> KnowledgeGraph:
    """
    Filters the KnowledgeGraph and returns the filtered graph.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be filtered.

    Returns
    -------
    KnowledgeGraph
        The filtered knowledge graph.
    """
    return kg
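The default `filter` returns the graph unchanged; subclasses narrow it. The sketch below shows one way to do that with a node predicate, similar in spirit to the `filter_nodes` field on `Extractor`. `Node`, `KnowledgeGraph`, and `TypeFilteringTransform` are hypothetical simplified stand-ins, not the Ragas classes. The filter builds a new graph and leaves the input untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for ragas' Node."""
    id: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    """Hypothetical stand-in holding only nodes."""
    nodes: List[Node] = field(default_factory=list)


@dataclass
class TypeFilteringTransform:
    """Keeps only the nodes accepted by `filter_nodes` (default: keep all)."""
    filter_nodes: Callable[[Node], bool] = field(default=lambda node: True)

    def filter(self, kg: KnowledgeGraph) -> KnowledgeGraph:
        # Return a new graph so the caller's graph is not modified.
        return KnowledgeGraph(nodes=[n for n in kg.nodes if self.filter_nodes(n)])


kg = KnowledgeGraph(nodes=[Node("a", "document"), Node("b", "chunk")])
docs_only = TypeFilteringTransform(filter_nodes=lambda n: n.type == "document")
filtered = docs_only.filter(kg)
```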

generate_execution_plan abstractmethod

generate_execution_plan(kg: KnowledgeGraph) -> List[Coroutine]

Generates a list of coroutines to be executed in parallel by the Executor. When executed, these coroutines write the transformation into the KnowledgeGraph.

Parameters:

Name  Type            Description                             Default
kg    KnowledgeGraph  The knowledge graph to be transformed.  required

Returns:

Type             Description
List[Coroutine]  A list of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.List[t.Coroutine]:
    """
    Generates a list of coroutines to be executed in parallel by the Executor. When
    executed, these coroutines write the transformation into the KnowledgeGraph.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Coroutine]
        A list of coroutines to be executed in parallel.
    """
    pass
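An execution plan is just a list of coroutines, so any runner that awaits them concurrently can apply it. `apply_transforms` hands each plan to a `run_coroutines` helper along with `run_config.max_workers`; that helper's internals are not shown on this page. The sketch below is a hypothetical stand-in (`run_plan`) that caps concurrency with `asyncio.Semaphore` and awaits everything with `asyncio.gather`.

```python
import asyncio
from typing import Any, Coroutine, List


async def run_plan(plan: List[Coroutine], max_workers: int = 4) -> List[Any]:
    """Await every coroutine in `plan`, with at most `max_workers` running at once."""
    semaphore = asyncio.Semaphore(max_workers)

    async def guarded(coro: Coroutine) -> Any:
        async with semaphore:
            return await coro

    return await asyncio.gather(*(guarded(c) for c in plan))


def make_plan(graph: dict, n: int, stats: dict) -> List[Coroutine]:
    """Hypothetical plan: each coroutine writes one entry and records concurrency."""
    async def write(i: int) -> None:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulate an LLM or embedding call
        graph[i] = i * i
        stats["active"] -= 1

    return [write(i) for i in range(n)]


graph, stats = {}, {"active": 0, "peak": 0}
results = asyncio.run(run_plan(make_plan(graph, 10, stats), max_workers=3))
```

Because the coroutines write into shared state, a plan's coroutines should not depend on each other's output.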

Extractor dataclass

Extractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter())

Bases: BaseGraphTransformation

Abstract base class for extractors that transform a KnowledgeGraph by extracting specific properties from its nodes.

Methods:

Name       Description
transform  Transforms the KnowledgeGraph by extracting properties from its nodes.
extract    Abstract method to extract a specific property from a node.

transform async

transform(kg: KnowledgeGraph) -> List[Tuple[Node, Tuple[str, Any]]]

Transforms the KnowledgeGraph by extracting properties from its nodes. Uses the filter method to filter the graph and the extract method to extract properties from each node.

Parameters:

Name  Type            Description                             Default
kg    KnowledgeGraph  The knowledge graph to be transformed.  required

Returns:

Type                                Description
List[Tuple[Node, Tuple[str, Any]]]  A list of tuples where each tuple contains a node and the extracted property.

Examples:

>>> kg = KnowledgeGraph(nodes=[Node(id=1, properties={"name": "Node1"}), Node(id=2, properties={"name": "Node2"})])
>>> extractor = SomeConcreteExtractor()
>>> await extractor.transform(kg)
[(Node(id=1, properties={"name": "Node1"}), ("property_name", "extracted_value")),
 (Node(id=2, properties={"name": "Node2"}), ("property_name", "extracted_value"))]
Source code in src/ragas/testset/transforms/base.py
async def transform(
    self, kg: KnowledgeGraph
) -> t.List[t.Tuple[Node, t.Tuple[str, t.Any]]]:
    """
    Transforms the KnowledgeGraph by extracting properties from its nodes. Uses
    the `filter` method to filter the graph and the `extract` method to extract
    properties from each node.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Tuple[Node, t.Tuple[str, t.Any]]]
        A list of tuples where each tuple contains a node and the extracted
        property.

    Examples
    --------
    >>> kg = KnowledgeGraph(nodes=[Node(id=1, properties={"name": "Node1"}), Node(id=2, properties={"name": "Node2"})])
    >>> extractor = SomeConcreteExtractor()
    >>> await extractor.transform(kg)
    [(Node(id=1, properties={"name": "Node1"}), ("property_name", "extracted_value")),
     (Node(id=2, properties={"name": "Node2"}), ("property_name", "extracted_value"))]
    """
    filtered = self.filter(kg)
    return [(node, await self.extract(node)) for node in filtered.nodes]

extract abstractmethod async

extract(node: Node) -> Tuple[str, Any]

Abstract method to extract a specific property from a node.

Parameters:

Name  Type  Description                                   Default
node  Node  The node from which to extract the property.  required

Returns:

Type             Description
Tuple[str, Any]  A tuple containing the property name and the extracted value.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def extract(self, node: Node) -> t.Tuple[str, t.Any]:
    """
    Abstract method to extract a specific property from a node.

    Parameters
    ----------
    node : Node
        The node from which to extract the property.

    Returns
    -------
    t.Tuple[str, t.Any]
        A tuple containing the property name and the extracted value.
    """
    pass
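A concrete extractor only has to implement `extract`. The sketch below mirrors the `Extractor.transform` logic shown above (with filtering omitted) using hypothetical stand-ins for `Node` and `KnowledgeGraph`. `WordCountExtractor` and its `word_count` property are illustrative names, not part of Ragas. Note that `transform` only returns `(node, (name, value))` pairs; it does not write them into the nodes.

```python
import asyncio
import typing as t
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for ragas' Node."""
    id: str
    properties: dict = field(default_factory=dict)

    def get_property(self, name: str) -> t.Any:
        return self.properties.get(name)


@dataclass
class KnowledgeGraph:
    """Hypothetical stand-in holding only nodes."""
    nodes: t.List[Node] = field(default_factory=list)


class ExtractorSketch(ABC):
    """Mirrors Extractor.transform: await extract() for each node, in order."""

    async def transform(self, kg: KnowledgeGraph):
        return [(node, await self.extract(node)) for node in kg.nodes]

    @abstractmethod
    async def extract(self, node: Node) -> t.Tuple[str, t.Any]: ...


class WordCountExtractor(ExtractorSketch):
    """Hypothetical extractor: counts whitespace-separated words in page_content."""

    async def extract(self, node: Node) -> t.Tuple[str, t.Any]:
        text = node.get_property("page_content") or ""
        return "word_count", len(text.split())


kg = KnowledgeGraph(nodes=[Node("1", {"page_content": "a b c"}),
                           Node("2", {"page_content": "hello"})])
pairs = asyncio.run(WordCountExtractor().transform(kg))
```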

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> List[Coroutine]

Generates a list of coroutines to be executed in parallel by the Executor.

Parameters:

Name  Type            Description                             Default
kg    KnowledgeGraph  The knowledge graph to be transformed.  required

Returns:

Type             Description
List[Coroutine]  A list of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.List[t.Coroutine]:
    """
    Generates a list of coroutines to be executed in parallel by the Executor.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Coroutine]
        A list of coroutines to be executed in parallel.
    """

    async def apply_extract(node: Node):
        property_name, property_value = await self.extract(node)
        if node.get_property(property_name) is None:
            node.add_property(property_name, property_value)
        else:
            logger.warning(
                "Property '%s' already exists in node '%.6s'. Skipping!",
                property_name,
                node.id,
            )

    filtered = self.filter(kg)
    return [apply_extract(node) for node in filtered.nodes]
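The `apply_extract` coroutine above only writes a property that is not yet set, which is what makes re-running an extractor's plan idempotent. The sketch below reproduces that rule with hypothetical stand-ins; `CallCounter` returns a different value on every call so the skip is observable. It also shows a cost implied by the code: `extract` is awaited before the existence check, so the (possibly expensive) extraction still runs for nodes that are then skipped.

```python
import asyncio
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("plan_sketch")


@dataclass
class Node:
    """Hypothetical stand-in for ragas' Node."""
    id: str
    properties: dict = field(default_factory=dict)

    def get_property(self, name):
        return self.properties.get(name)

    def add_property(self, name, value):
        self.properties[name] = value


class CallCounter:
    """Hypothetical extractor whose value changes on every call."""

    def __init__(self):
        self.calls = 0

    async def extract(self, node):
        self.calls += 1
        return "run_id", self.calls

    def generate_execution_plan(self, nodes):
        async def apply_extract(node):
            name, value = await self.extract(node)  # runs even if skipped below
            if node.get_property(name) is None:
                node.add_property(name, value)
            else:
                logger.warning("Property '%s' already exists in node '%.6s'. Skipping!",
                               name, node.id)
        return [apply_extract(n) for n in nodes]


async def run(plan):
    await asyncio.gather(*plan)


nodes = [Node("abcdefgh"), Node("ijklmnop")]
extractor = CallCounter()
asyncio.run(run(extractor.generate_execution_plan(nodes)))
first = [n.get_property("run_id") for n in nodes]
asyncio.run(run(extractor.generate_execution_plan(nodes)))  # logs two warnings
second = [n.get_property("run_id") for n in nodes]
```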

RelationshipBuilder dataclass

RelationshipBuilder(name: str = '')

Bases: BaseGraphTransformation

Abstract base class for building relationships in a KnowledgeGraph.

Methods:

Name       Description
transform  Transforms the KnowledgeGraph by building relationships.

transform abstractmethod async

transform(kg: KnowledgeGraph) -> List[Relationship]

Transforms the KnowledgeGraph by building relationships.

Parameters:

Name  Type            Description                             Default
kg    KnowledgeGraph  The knowledge graph to be transformed.  required

Returns:

Type                Description
List[Relationship]  A list of new relationships.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def transform(self, kg: KnowledgeGraph) -> t.List[Relationship]:
    """
    Transforms the KnowledgeGraph by building relationships.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[Relationship]
        A list of new relationships.
    """
    pass
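A relationship builder's `transform` compares nodes against each other and returns new relationships without attaching them to the graph. The sketch below is a hypothetical builder that links node pairs whose keyphrase sets have a Jaccard similarity at or above a threshold. `Node`, `Relationship`, `KnowledgeGraph`, and the `keyphrase_overlap` type are simplified stand-ins chosen for illustration.

```python
import asyncio
import itertools
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for ragas' Node."""
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """Hypothetical stand-in for ragas' Relationship."""
    source: Node
    target: Node
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    """Hypothetical stand-in with nodes and relationships."""
    nodes: List[Node] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


@dataclass
class KeyphraseOverlapBuilder:
    """Hypothetical builder: links pairs whose keyphrase sets overlap enough."""
    threshold: float = 0.5

    async def transform(self, kg: KnowledgeGraph) -> List[Relationship]:
        new = []
        for a, b in itertools.combinations(kg.nodes, 2):
            ka = set(a.properties.get("keyphrases", []))
            kb = set(b.properties.get("keyphrases", []))
            if not ka or not kb:
                continue  # nothing to compare
            score = len(ka & kb) / len(ka | kb)  # Jaccard similarity
            if score >= self.threshold:
                new.append(Relationship(a, b, "keyphrase_overlap", {"score": score}))
        return new


kg = KnowledgeGraph(nodes=[Node("n1", {"keyphrases": ["x", "y"]}),
                           Node("n2", {"keyphrases": ["x", "y", "z"]}),
                           Node("n3", {"keyphrases": ["q"]})])
rels = asyncio.run(KeyphraseOverlapBuilder(threshold=0.5).transform(kg))
```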

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> List[Coroutine]

Generates a list of coroutines to be executed in parallel by the Executor.

Parameters:

Name  Type            Description                             Default
kg    KnowledgeGraph  The knowledge graph to be transformed.  required

Returns:

Type             Description
List[Coroutine]  A list of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.List[t.Coroutine]:
    """
    Generates a list of coroutines to be executed in parallel by the Executor.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Coroutine]
        A list of coroutines to be executed in parallel.
    """

    async def apply_build_relationships(
        filtered_kg: KnowledgeGraph, original_kg: KnowledgeGraph
    ):
        relationships = await self.transform(filtered_kg)
        original_kg.relationships.extend(relationships)

    filtered_kg = self.filter(kg)
    return [apply_build_relationships(filtered_kg=filtered_kg, original_kg=kg)]
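Unlike the extractor and splitter plans, this plan holds a single coroutine, presumably because a builder needs the whole filtered graph at once rather than one node at a time. The relationships are computed on the filtered copy but written into the original graph. The sketch below reproduces that flow with a hypothetical `NextDocumentBuilder` that links consecutive document nodes; the tuple-based `KnowledgeGraph` is a simplified stand-in.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class KnowledgeGraph:
    """Hypothetical stand-in: nodes are (id, type); relationships are (source, target)."""
    nodes: List[Tuple[str, str]] = field(default_factory=list)
    relationships: List[Tuple[str, str]] = field(default_factory=list)


class NextDocumentBuilder:
    """Hypothetical builder linking each document node to the next one."""

    def filter(self, kg: KnowledgeGraph) -> KnowledgeGraph:
        return KnowledgeGraph(nodes=[n for n in kg.nodes if n[1] == "document"])

    async def transform(self, kg: KnowledgeGraph) -> List[Tuple[str, str]]:
        ids = [node_id for node_id, _ in kg.nodes]
        return list(zip(ids, ids[1:]))

    def generate_execution_plan(self, kg: KnowledgeGraph):
        async def apply_build_relationships(filtered_kg, original_kg):
            relationships = await self.transform(filtered_kg)
            original_kg.relationships.extend(relationships)  # write to the original

        filtered_kg = self.filter(kg)
        return [apply_build_relationships(filtered_kg=filtered_kg, original_kg=kg)]


kg = KnowledgeGraph(nodes=[("d1", "document"), ("c1", "chunk"),
                           ("d2", "document"), ("d3", "document")])
plan = NextDocumentBuilder().generate_execution_plan(kg)
for coro in plan:
    asyncio.run(coro)
```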

Splitter dataclass

Splitter(name: str = '')

Bases: BaseGraphTransformation

Abstract base class for splitters that transform a KnowledgeGraph by splitting its nodes into smaller chunks.

Methods:

Name       Description
transform  Transforms the KnowledgeGraph by splitting its nodes into smaller chunks.
split      Abstract method to split a node into smaller chunks.

transform async

transform(kg: KnowledgeGraph) -> Tuple[List[Node], List[Relationship]]

Transforms the KnowledgeGraph by splitting its nodes into smaller chunks.

Parameters:

Name  Type            Description                             Default
kg    KnowledgeGraph  The knowledge graph to be transformed.  required

Returns:

Type                                   Description
Tuple[List[Node], List[Relationship]]  A tuple containing a list of new nodes and a list of new relationships.

Source code in src/ragas/testset/transforms/base.py
async def transform(
    self, kg: KnowledgeGraph
) -> t.Tuple[t.List[Node], t.List[Relationship]]:
    """
    Transforms the KnowledgeGraph by splitting its nodes into smaller chunks.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.Tuple[t.List[Node], t.List[Relationship]]
        A tuple containing a list of new nodes and a list of new relationships.
    """
    filtered = self.filter(kg)

    all_nodes = []
    all_relationships = []
    for node in filtered.nodes:
        nodes, relationships = await self.split(node)
        all_nodes.extend(nodes)
        all_relationships.extend(relationships)

    return all_nodes, all_relationships

split abstractmethod async

split(node: Node) -> Tuple[List[Node], List[Relationship]]

Abstract method to split a node into smaller chunks.

Parameters:

Name  Type  Description            Default
node  Node  The node to be split.  required

Returns:

Type                                   Description
Tuple[List[Node], List[Relationship]]  A tuple containing a list of new nodes and a list of new relationships.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def split(self, node: Node) -> t.Tuple[t.List[Node], t.List[Relationship]]:
    """
    Abstract method to split a node into smaller chunks.

    Parameters
    ----------
    node : Node
        The node to be split.

    Returns
    -------
    t.Tuple[t.List[Node], t.List[Relationship]]
        A tuple containing a list of new nodes and a list of new relationships.
    """
    pass
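A concrete splitter implements `split` and returns both the new chunk nodes and the relationships tying them to their parent. The sketch below is a hypothetical `ParagraphSplitter` that emits one chunk per blank-line-separated paragraph and a `child` relationship from the parent to each chunk. The stand-in classes, the id scheme, and the `child` type are illustrative assumptions, not the Ragas `HeadlineSplitter` behavior.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for ragas' Node."""
    id: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """Hypothetical stand-in for ragas' Relationship."""
    source: Node
    target: Node
    type: str


class ParagraphSplitter:
    """Hypothetical splitter: one chunk node per non-empty paragraph."""

    async def split(self, node: Node) -> Tuple[List[Node], List[Relationship]]:
        text = node.properties.get("page_content", "")
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        chunks = [Node(f"{node.id}-{i}", "chunk", {"page_content": p})
                  for i, p in enumerate(paragraphs)]
        relationships = [Relationship(node, chunk, "child") for chunk in chunks]
        return chunks, relationships


doc = Node("doc", "document",
           {"page_content": "Intro text.\n\nDetails here.\n\n\nSummary."})
chunks, rels = asyncio.run(ParagraphSplitter().split(doc))
```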

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> List[Coroutine]

Generates a list of coroutines to be executed in parallel by the Executor.

Parameters:

Name  Type            Description                             Default
kg    KnowledgeGraph  The knowledge graph to be transformed.  required

Returns:

Type             Description
List[Coroutine]  A list of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.List[t.Coroutine]:
    """
    Generates a list of coroutines to be executed in parallel by the Executor.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Coroutine]
        A list of coroutines to be executed in parallel.
    """

    async def apply_split(node: Node):
        nodes, relationships = await self.split(node)
        kg.nodes.extend(nodes)
        kg.relationships.extend(relationships)

    filtered = self.filter(kg)
    return [apply_split(node) for node in filtered.nodes]

Parallel

Parallel(*transformations: BaseGraphTransformation)

Collection of transformations to be applied in parallel.

Examples:

>>> Parallel(HeadlinesExtractor(), SummaryExtractor())
Source code in src/ragas/testset/transforms/engine.py
def __init__(self, *transformations: BaseGraphTransformation):
    self.transformations = list(transformations)
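`apply_transforms` calls `generate_execution_plan` on a `Parallel` instance directly, but that method is not shown on this page. A plausible reading, stated here as an assumption, is that it concatenates the plans of its child transformations so they run as one batch. The sketch below (`ParallelSketch`, `SetProperty`, nodes as plain dicts) illustrates that idea only.

```python
import asyncio
from typing import Coroutine, List


class ParallelSketch:
    """Hypothetical: merges the execution plans of several transformations."""

    def __init__(self, *transformations):
        self.transformations = list(transformations)

    def generate_execution_plan(self, kg) -> List[Coroutine]:
        plan: List[Coroutine] = []
        for transformation in self.transformations:
            plan.extend(transformation.generate_execution_plan(kg))
        return plan


class SetProperty:
    """Hypothetical per-node transformation that sets one property on each node."""

    def __init__(self, name, value):
        self.name, self.value = name, value

    def generate_execution_plan(self, kg):
        async def apply(node):
            await asyncio.sleep(0)  # yield, as a real LLM call would
            node[self.name] = self.value
        return [apply(node) for node in kg]


async def run_all(plan):
    await asyncio.gather(*plan)


kg = [{}, {}]  # two hypothetical nodes as plain dicts
plan = ParallelSketch(SetProperty("summary", "s"),
                      SetProperty("headlines", ["h"])).generate_execution_plan(kg)
asyncio.run(run_all(plan))
```

Because the children run together, they should not read each other's output; `default_transforms` groups only independent extractors inside each `Parallel`.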

EmbeddingExtractor dataclass

EmbeddingExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), property_name: str = 'embedding', embed_property_name: str = 'page_content', embedding_model: BaseRagasEmbeddings = embedding_factory())

Bases: Extractor

A class for extracting embeddings from nodes in a knowledge graph.

Attributes:

Name                 Type                 Description
property_name        str                  The name of the property in which to store the embedding.
embed_property_name  str                  The name of the property containing the text to embed.
embedding_model      BaseRagasEmbeddings  The embedding model used to generate embeddings.

extract async

extract(node: Node) -> Tuple[str, Any]

Extracts the embedding for a given node.

Raises:

Type        Description
ValueError  If the property to be embedded is not a string.

Source code in src/ragas/testset/transforms/extractors/embeddings.py
async def extract(self, node: Node) -> t.Tuple[str, t.Any]:
    """
    Extracts the embedding for a given node.

    Raises
    ------
    ValueError
        If the property to be embedded is not a string.
    """
    text = node.get_property(self.embed_property_name)
    if not isinstance(text, str):
        raise ValueError(
            f"node.property('{self.embed_property_name}') must be a string, found '{type(text)}'"
        )
    embedding = self.embedding_model.embed_query(text)
    return self.property_name, embedding
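The two name fields decide what is read and what is written: `embed_property_name` selects the input text and `property_name` names the output. The sketch below mirrors `extract` with a hypothetical `ToyEmbeddings` model (vowel counts instead of a real embedding; only the `embed_query` method name is taken from the source). Configured like `summary_embedder` in `default_transforms`, it raises `ValueError` on a node that has no `summary`, which is why that extractor is paired with a `filter_nodes` predicate.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, List, Tuple


class ToyEmbeddings:
    """Hypothetical stand-in for an embedding model: counts each vowel."""

    def embed_query(self, text: str) -> List[float]:
        return [float(text.count(v)) for v in "aeiou"]


@dataclass
class Node:
    """Hypothetical stand-in for ragas' Node."""
    properties: dict = field(default_factory=dict)

    def get_property(self, name):
        return self.properties.get(name)


@dataclass
class EmbeddingExtractorSketch:
    """Mirrors EmbeddingExtractor.extract with swappable field names."""
    property_name: str = "embedding"
    embed_property_name: str = "page_content"
    embedding_model: Any = field(default_factory=ToyEmbeddings)

    async def extract(self, node: Node) -> Tuple[str, Any]:
        text = node.get_property(self.embed_property_name)
        if not isinstance(text, str):
            raise ValueError(f"node.property('{self.embed_property_name}') "
                             f"must be a string, found '{type(text)}'")
        return self.property_name, self.embedding_model.embed_query(text)


summary_embedder = EmbeddingExtractorSketch(property_name="summary_embedding",
                                            embed_property_name="summary")
name, vector = asyncio.run(summary_embedder.extract(Node({"summary": "banana"})))
```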

HeadlinesExtractor dataclass

HeadlinesExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), llm: BaseRagasLLM = llm_factory(), merge_if_possible: bool = True, property_name: str = 'headlines', prompt: HeadlinesExtractorPrompt = HeadlinesExtractorPrompt())

Bases: LLMBasedExtractor

Extracts the headlines from the given text.

Attributes:

Name           Type                      Description
property_name  str                       The name of the property to extract.
prompt         HeadlinesExtractorPrompt  The prompt used for extraction.

KeyphrasesExtractor dataclass

KeyphrasesExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), llm: BaseRagasLLM = llm_factory(), merge_if_possible: bool = True, property_name: str = 'keyphrases', prompt: KeyphrasesExtractorPrompt = KeyphrasesExtractorPrompt())

Bases: LLMBasedExtractor

Extracts the top 5 keyphrases from the given text.

Attributes:

Name           Type                       Description
property_name  str                        The name of the property to extract.
prompt         KeyphrasesExtractorPrompt  The prompt used for extraction.

SummaryExtractor dataclass

SummaryExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), llm: BaseRagasLLM = llm_factory(), merge_if_possible: bool = True, property_name: str = 'summary', prompt: SummaryExtractorPrompt = SummaryExtractorPrompt())

Bases: LLMBasedExtractor

Extracts a summary from the given text.

Attributes:

Name           Type                    Description
property_name  str                     The name of the property to extract.
prompt         SummaryExtractorPrompt  The prompt used for extraction.

TitleExtractor dataclass

TitleExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), llm: BaseRagasLLM = llm_factory(), merge_if_possible: bool = True, property_name: str = 'title', prompt: TitleExtractorPrompt = TitleExtractorPrompt())

Bases: LLMBasedExtractor

Extracts the title from the given text.

Attributes:

Name           Type                  Description
property_name  str                   The name of the property to extract.
prompt         TitleExtractorPrompt  The prompt used for extraction.

SummaryCosineSimilarityBuilder dataclass

SummaryCosineSimilarityBuilder(name: str = '', property_name: str = 'summary_embedding', new_property_name: str = 'summary_cosine_similarity', threshold: float = 0.1)

Bases: CosineSimilarityBuilder

filter

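Filters the knowledge graph to include only document nodes (NodeType.DOCUMENT). Raises a ValueError if any document node lacks the property named by property_name (summary_embedding by default).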

Source code in src/ragas/testset/transforms/relationship_builders/cosine.py
def filter(self, kg: KnowledgeGraph) -> KnowledgeGraph:
    """
    Filters the knowledge graph to only include nodes with a summary embedding.
    """
    nodes = []
    for node in kg.nodes:
        if node.type == NodeType.DOCUMENT:
            emb = node.get_property(self.property_name)
            if emb is None:
                raise ValueError(f"Node {node.id} has no {self.property_name}")
            nodes.append(node)
    return KnowledgeGraph(nodes=nodes)
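The parent `CosineSimilarityBuilder` is not documented on this page. Assuming it scores every pair of filtered nodes by the cosine similarity of their embeddings and keeps pairs that reach `threshold`, the core computation looks like the sketch below. Whether the comparison is `>=` or `>`, and how pairs become `Relationship` objects, are assumptions. `default_transforms` uses thresholds of 0.8 for chunk embeddings and 0.6 for summary embeddings.

```python
import itertools
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_pairs(embeddings: Dict[str, List[float]],
                  threshold: float) -> List[Tuple[str, str, float]]:
    """Every unordered pair of ids whose similarity is at least `threshold`."""
    pairs = []
    for (id_a, emb_a), (id_b, emb_b) in itertools.combinations(embeddings.items(), 2):
        score = cosine_similarity(emb_a, emb_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


docs = {"d1": [1.0, 0.0], "d2": [1.0, 1.0], "d3": [0.0, 1.0]}
loose = similar_pairs(docs, threshold=0.6)   # d1-d2 and d2-d3 score about 0.707
strict = similar_pairs(docs, threshold=0.8)  # no pair reaches 0.8
```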

apply_transforms

apply_transforms(kg: KnowledgeGraph, transforms: Transforms, run_config: RunConfig = RunConfig())

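Apply a list of transformations to a knowledge graph in place. transforms may be a single BaseGraphTransformation (treated as a one-element list), a list of transformations (each one finishes before the next starts), or a Parallel instance (all of its transformations run together). Any other type raises a ValueError. run_config.max_workers is passed to the coroutine runner as its worker limit.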

Source code in src/ragas/testset/transforms/engine.py
def apply_transforms(
    kg: KnowledgeGraph,
    transforms: Transforms,
    run_config: RunConfig = RunConfig(),
):
    """
    Apply a list of transformations to a knowledge graph in place.
    """
    # apply nest_asyncio to fix the event loop issue in jupyter
    apply_nest_asyncio()

    # if single transformation, wrap it in a list
    if isinstance(transforms, BaseGraphTransformation):
        transforms = [transforms]

    # apply the transformations
    # if Sequences, apply each transformation sequentially
    if isinstance(transforms, t.List):
        for transform in transforms:
            asyncio.run(
                run_coroutines(
                    transform.generate_execution_plan(kg),
                    get_desc(transform),
                    run_config.max_workers,
                )
            )
    # if Parallel, collect inside it and run it all
    elif isinstance(transforms, Parallel):
        asyncio.run(
            run_coroutines(
                transforms.generate_execution_plan(kg),
                get_desc(transforms),
                run_config.max_workers,
            )
        )
    else:
        raise ValueError(
            f"Invalid transforms type: {type(transforms)}. Expects a list of BaseGraphTransformations or a Parallel instance."
        )
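The dispatch above can be reduced to a small sketch. Each list element's plan is generated only after the previous element has finished, so later steps see earlier results; a `Parallel` instance is run as a single batch. `Step`, `ParallelSteps`, and `apply_transforms_sketch` are hypothetical stand-ins. Unlike the real function, this sketch does not apply `nest_asyncio`, so it fails if called from an already running event loop (for example, inside Jupyter).

```python
import asyncio
from typing import Callable


class Step:
    """Hypothetical transformation whose plan is one coroutine calling `fn(kg)`."""

    def __init__(self, fn: Callable[[dict], None]):
        self.fn = fn

    def generate_execution_plan(self, kg: dict):
        async def apply():
            self.fn(kg)
        return [apply()]


class ParallelSteps:
    """Hypothetical Parallel: concatenates the plans of its steps."""

    def __init__(self, *steps: Step):
        self.steps = list(steps)

    def generate_execution_plan(self, kg: dict):
        return [c for step in self.steps for c in step.generate_execution_plan(kg)]


async def _run(plan):
    await asyncio.gather(*plan)


def apply_transforms_sketch(kg: dict, transforms) -> None:
    if isinstance(transforms, Step):
        transforms = [transforms]  # wrap a single transformation
    if isinstance(transforms, list):
        for transform in transforms:  # sequential: one step at a time
            asyncio.run(_run(transform.generate_execution_plan(kg)))
    elif isinstance(transforms, ParallelSteps):
        asyncio.run(_run(transforms.generate_execution_plan(kg)))
    else:
        raise ValueError(f"Invalid transforms type: {type(transforms)}")


def split_words(g: dict) -> None:
    g["chunks"] = g["text"].split()


def count_chunks(g: dict) -> None:
    g["n_chunks"] = len(g["chunks"])  # depends on split_words having run


kg = {"text": "a b c"}
apply_transforms_sketch(kg, [Step(split_words), Step(count_chunks)])
```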

rollback_transforms

rollback_transforms(kg: KnowledgeGraph, transforms: Transforms)

Rollback a list of transformations from a knowledge graph.

Note

This is not yet implemented. Please open an issue if you need this feature.

Source code in src/ragas/testset/transforms/engine.py
def rollback_transforms(kg: KnowledgeGraph, transforms: Transforms):
    """
    Rollback a list of transformations from a knowledge graph.

    Note
    ----
    This is not yet implemented. Please open an issue if you need this feature.
    """
    # this will allow you to roll back the transformations
    raise NotImplementedError

default_transforms

default_transforms() -> Transforms

Creates and returns a default set of transforms for processing a knowledge graph.

This function defines a series of transformation steps to be applied to a knowledge graph, including extracting summaries, keyphrases, titles, headlines, and embeddings, as well as building similarity relationships between nodes.

The transforms are applied in the following order:

1. Parallel extraction of summaries and headlines
2. Embedding of summaries for document nodes
3. Splitting of headlines
4. Parallel extraction of embeddings, keyphrases, and titles
5. Building cosine similarity relationships between nodes
6. Building cosine similarity relationships between summaries

Returns:

Type        Description
Transforms  A list of transformation steps to be applied to the knowledge graph.

Source code in src/ragas/testset/transforms/__init__.py
def default_transforms() -> Transforms:
    """
    Creates and returns a default set of transforms for processing a knowledge graph.

    This function defines a series of transformation steps to be applied to a
    knowledge graph, including extracting summaries, keyphrases, titles,
    headlines, and embeddings, as well as building similarity relationships
    between nodes.

    The transforms are applied in the following order:
    1. Parallel extraction of summaries and headlines
    2. Embedding of summaries for document nodes
    3. Splitting of headlines
    4. Parallel extraction of embeddings, keyphrases, and titles
    5. Building cosine similarity relationships between nodes
    6. Building cosine similarity relationships between summaries

    Returns
    -------
    Transforms
        A list of transformation steps to be applied to the knowledge graph.

    """
    from ragas.testset.graph import NodeType

    # define the transforms
    summary_extractor = SummaryExtractor()
    keyphrase_extractor = KeyphrasesExtractor()
    title_extractor = TitleExtractor()
    headline_extractor = HeadlinesExtractor()
    embedding_extractor = EmbeddingExtractor()
    headline_splitter = HeadlineSplitter()
    cosine_sim_builder = CosineSimilarityBuilder(threshold=0.8)
    summary_embedder = EmbeddingExtractor(
        name="summary_embedder",
        property_name="summary_embedding",
        embed_property_name="summary",
        filter_nodes=lambda node: node.type == NodeType.DOCUMENT,
    )
    summary_cosine_sim_builder = SummaryCosineSimilarityBuilder(threshold=0.6)

    # specify the transforms and their order to be applied
    transforms = [
        Parallel(summary_extractor, headline_extractor),
        summary_embedder,
        headline_splitter,
        Parallel(embedding_extractor, keyphrase_extractor, title_extractor),
        cosine_sim_builder,
        summary_cosine_sim_builder,
    ]
    return transforms
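The order matters because each step reads properties written by earlier steps, and steps inside the same `Parallel` cannot see each other's output. The sketch below checks that constraint for a read/write table modeled on `default_transforms`. The table itself is an assumption: the `summary` and `summary_embedding` entries follow the parameters shown above, while the reads of the LLM extractors (`page_content`), the splitter (`headlines`), and `CosineSimilarityBuilder` (`embedding`) are inferred from names and defaults, not confirmed by this page.

```python
from typing import List, Set, Tuple

# (step name, properties it reads, properties it writes). Assumed values.
Step = Tuple[str, Set[str], Set[str]]

DEFAULT_ORDER: List[List[Step]] = [
    [("summary_extractor", {"page_content"}, {"summary"}),
     ("headline_extractor", {"page_content"}, {"headlines"})],
    [("summary_embedder", {"summary"}, {"summary_embedding"})],
    [("headline_splitter", {"headlines"}, set())],
    [("embedding_extractor", {"page_content"}, {"embedding"}),
     ("keyphrase_extractor", {"page_content"}, {"keyphrases"}),
     ("title_extractor", {"page_content"}, {"title"})],
    [("cosine_sim_builder", {"embedding"}, set())],
    [("summary_cosine_sim_builder", {"summary_embedding"}, set())],
]


def check_order(stages: List[List[Step]], initial: Set[str]) -> List[str]:
    """Return the names of steps that read a property no earlier stage wrote."""
    available = set(initial)
    problems = []
    for stage in stages:
        # Steps in the same stage run together, so check reads before adding writes.
        for name, reads, _ in stage:
            if not reads <= available:
                problems.append(name)
        for _, _, writes in stage:
            available |= writes
    return problems
```

Swapping the first two stages, or merging them into one `Parallel`, makes `summary_embedder` run before any summary exists, and `check_order` reports it.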