Transforms

BaseGraphTransformation dataclass

BaseGraphTransformation(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter())

Bases: ABC

Abstract base class for graph transformations on a KnowledgeGraph.

transform abstractmethod async

transform(kg: KnowledgeGraph) -> Any

Abstract method to transform the KnowledgeGraph. Transformations should be idempotent, meaning that applying the transformation multiple times should yield the same result as applying it once.

Parameters:

Name Type Description Default
kg KnowledgeGraph

The knowledge graph to be transformed.

required

Returns:

Type Description
Any

The transformed knowledge graph.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def transform(self, kg: KnowledgeGraph) -> t.Any:
    """
    Abstract method to transform the KnowledgeGraph. Transformations should be
    idempotent, meaning that applying the transformation multiple times should
    yield the same result as applying it once.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.Any
        The transformed knowledge graph.
    """
    pass

filter

Filters the KnowledgeGraph and returns the filtered graph.

Parameters:

Name Type Description Default
kg KnowledgeGraph

The knowledge graph to be filtered.

required

Returns:

Type Description
KnowledgeGraph

The filtered knowledge graph.

Source code in src/ragas/testset/transforms/base.py
def filter(self, kg: KnowledgeGraph) -> KnowledgeGraph:
    """
    Filters the KnowledgeGraph and returns the filtered graph.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be filtered.

    Returns
    -------
    KnowledgeGraph
        The filtered knowledge graph.
    """

    return KnowledgeGraph(
        nodes=[node for node in kg.nodes if self.filter_nodes(node)],
        relationships=[
            rel
            for rel in kg.relationships
            if rel.source in kg.nodes and rel.target in kg.nodes
        ],
    )
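
The filtering logic in the listing above can be tried out with a minimal sketch. It uses simplified stand-in classes rather than the real ragas `Node`, `Relationship`, and `KnowledgeGraph`; the stand-ins and the `filter_graph` helper are assumptions made for illustration. The sketch mirrors the listed logic, including the detail that relationships are checked against the original graph's nodes, so a relationship can survive even when one of its endpoints has been filtered out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:  # stand-in, not the ragas Node
    id: int
    type: str


@dataclass
class Relationship:  # stand-in, not the ragas Relationship
    source: Node
    target: Node


@dataclass
class KnowledgeGraph:  # stand-in, not the ragas KnowledgeGraph
    nodes: List[Node] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


def filter_graph(kg: KnowledgeGraph, filter_nodes: Callable[[Node], bool]) -> KnowledgeGraph:
    """Hypothetical free-function version of the `filter` method listed above."""
    return KnowledgeGraph(
        # Keep only the nodes accepted by the predicate.
        nodes=[node for node in kg.nodes if filter_nodes(node)],
        # As in the listing, membership is tested against the ORIGINAL node list.
        relationships=[
            rel
            for rel in kg.relationships
            if rel.source in kg.nodes and rel.target in kg.nodes
        ],
    )


doc, chunk = Node(1, "document"), Node(2, "chunk")
kg = KnowledgeGraph(nodes=[doc, chunk], relationships=[Relationship(doc, chunk)])
filtered = filter_graph(kg, lambda node: node.type == "chunk")
```

Here `filtered.nodes` contains only the chunk node, while the document-to-chunk relationship is still present in `filtered.relationships`.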

generate_execution_plan abstractmethod

generate_execution_plan(kg: KnowledgeGraph) -> List[Coroutine]

Generates a list of coroutines to be executed in parallel by the Executor. Each coroutine will, upon execution, write its part of the transformation into the KnowledgeGraph.

Parameters:

Name Type Description Default
kg KnowledgeGraph

The knowledge graph to be transformed.

required

Returns:

Type Description
List[Coroutine]

A list of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.List[t.Coroutine]:
    """
    Generates a list of coroutines to be executed in parallel by the Executor. Each
    coroutine will, upon execution, write its part of the transformation into the
    KnowledgeGraph.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Coroutine]
        A list of coroutines to be executed in parallel.
    """
    pass

Extractor dataclass

Extractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter())

Bases: BaseGraphTransformation

Abstract base class for extractors that transform a KnowledgeGraph by extracting specific properties from its nodes.

Methods:

Name Description
transform

Transforms the KnowledgeGraph by extracting properties from its nodes.

extract

Abstract method to extract a specific property from a node.

transform async

transform(kg: KnowledgeGraph) -> List[Tuple[Node, Tuple[str, Any]]]

Transforms the KnowledgeGraph by extracting properties from its nodes. Uses the filter method to filter the graph and the extract method to extract properties from each node.

Parameters:

Name Type Description Default
kg KnowledgeGraph

The knowledge graph to be transformed.

required

Returns:

Type Description
List[Tuple[Node, Tuple[str, Any]]]

A list of tuples where each tuple contains a node and the extracted property.

Examples:

>>> import asyncio
>>> kg = KnowledgeGraph(nodes=[Node(id=1, properties={"name": "Node1"}), Node(id=2, properties={"name": "Node2"})])
>>> extractor = SomeConcreteExtractor()
>>> asyncio.run(extractor.transform(kg))
[(Node(id=1, properties={"name": "Node1"}), ("property_name", "extracted_value")),
 (Node(id=2, properties={"name": "Node2"}), ("property_name", "extracted_value"))]

Source code in src/ragas/testset/transforms/base.py
async def transform(
    self, kg: KnowledgeGraph
) -> t.List[t.Tuple[Node, t.Tuple[str, t.Any]]]:
    """
    Transforms the KnowledgeGraph by extracting properties from its nodes. Uses
    the `filter` method to filter the graph and the `extract` method to extract
    properties from each node.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Tuple[Node, t.Tuple[str, t.Any]]]
        A list of tuples where each tuple contains a node and the extracted
        property.

    Examples
    --------
    >>> import asyncio
    >>> kg = KnowledgeGraph(nodes=[Node(id=1, properties={"name": "Node1"}), Node(id=2, properties={"name": "Node2"})])
    >>> extractor = SomeConcreteExtractor()
    >>> asyncio.run(extractor.transform(kg))
    [(Node(id=1, properties={"name": "Node1"}), ("property_name", "extracted_value")),
     (Node(id=2, properties={"name": "Node2"}), ("property_name", "extracted_value"))]
    """
    filtered = self.filter(kg)
    return [(node, await self.extract(node)) for node in filtered.nodes]

extract abstractmethod async

extract(node: Node) -> Tuple[str, Any]

Abstract method to extract a specific property from a node.

Parameters:

Name Type Description Default
node Node

The node from which to extract the property.

required

Returns:

Type Description
Tuple[str, Any]

A tuple containing the property name and the extracted value.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def extract(self, node: Node) -> t.Tuple[str, t.Any]:
    """
    Abstract method to extract a specific property from a node.

    Parameters
    ----------
    node : Node
        The node from which to extract the property.

    Returns
    -------
    t.Tuple[str, t.Any]
        A tuple containing the property name and the extracted value.
    """
    pass

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> List[Coroutine]

Generates a list of coroutines to be executed in parallel by the Executor.

Parameters:

Name Type Description Default
kg KnowledgeGraph

The knowledge graph to be transformed.

required

Returns:

Type Description
List[Coroutine]

A list of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.List[t.Coroutine]:
    """
    Generates a list of coroutines to be executed in parallel by the Executor.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Coroutine]
        A list of coroutines to be executed in parallel.
    """

    async def apply_extract(node: Node):
        property_name, property_value = await self.extract(node)
        if node.get_property(property_name) is None:
            node.add_property(property_name, property_value)
        else:
            logger.warning(
                "Property '%s' already exists in node '%.6s'. Skipping!",
                property_name,
                node.id,
            )

    filtered = self.filter(kg)
    return [apply_extract(node) for node in filtered.nodes]
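
As a rough illustration of the execution plan above, the following sketch builds one coroutine per node and shows that an existing property is left untouched. The `Node` class and `WordCountExtractor` are simplified, hypothetical stand-ins (not ragas classes), the node filtering step is omitted, and `asyncio.gather` is used in place of the ragas Executor.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Coroutine, Dict, List, Tuple


@dataclass
class Node:  # stand-in, not the ragas Node
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)

    def get_property(self, key: str) -> Any:
        return self.properties.get(key)

    def add_property(self, key: str, value: Any) -> None:
        self.properties[key] = value


class WordCountExtractor:
    """Hypothetical extractor that counts the words in 'page_content'."""

    async def extract(self, node: Node) -> Tuple[str, Any]:
        return "word_count", len(node.properties["page_content"].split())

    def generate_execution_plan(self, nodes: List[Node]) -> List[Coroutine]:
        async def apply_extract(node: Node) -> None:
            name, value = await self.extract(node)
            # Only write the property if the node does not have it yet.
            if node.get_property(name) is None:
                node.add_property(name, value)

        # One coroutine per node; nothing runs until they are awaited.
        return [apply_extract(node) for node in nodes]


async def run_plan(plan: List[Coroutine]) -> None:
    await asyncio.gather(*plan)


nodes = [
    Node("a", {"page_content": "one two three"}),
    Node("b", {"page_content": "hello", "word_count": 99}),  # already set
]
asyncio.run(run_plan(WordCountExtractor().generate_execution_plan(nodes)))
```

After the run, node `a` gains a `word_count` of 3, while node `b` keeps its pre-existing value, matching the "Skipping!" branch in the listing.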

RelationshipBuilder dataclass

RelationshipBuilder(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter())

Bases: BaseGraphTransformation

Abstract base class for building relationships in a KnowledgeGraph.

Methods:

Name Description
transform

Transforms the KnowledgeGraph by building relationships.

transform abstractmethod async

transform(kg: KnowledgeGraph) -> List[Relationship]

Transforms the KnowledgeGraph by building relationships.

Parameters:

Name Type Description Default
kg KnowledgeGraph

The knowledge graph to be transformed.

required

Returns:

Type Description
List[Relationship]

A list of new relationships.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def transform(self, kg: KnowledgeGraph) -> t.List[Relationship]:
    """
    Transforms the KnowledgeGraph by building relationships.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[Relationship]
        A list of new relationships.
    """
    pass

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> List[Coroutine]

Generates a list of coroutines to be executed in parallel by the Executor.

Parameters:

Name Type Description Default
kg KnowledgeGraph

The knowledge graph to be transformed.

required

Returns:

Type Description
List[Coroutine]

A list of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.List[t.Coroutine]:
    """
    Generates a list of coroutines to be executed in parallel by the Executor.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Coroutine]
        A list of coroutines to be executed in parallel.
    """

    async def apply_build_relationships(
        filtered_kg: KnowledgeGraph, original_kg: KnowledgeGraph
    ):
        relationships = await self.transform(filtered_kg)
        original_kg.relationships.extend(relationships)

    filtered_kg = self.filter(kg)
    return [apply_build_relationships(filtered_kg=filtered_kg, original_kg=kg)]
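
Unlike an extractor, a relationship builder yields a plan with a single coroutine that computes all relationships and appends them to the original graph. A minimal sketch of that shape, using hypothetical stand-in classes (`Node`, `Relationship`, `KnowledgeGraph`, and `SharedTopicBuilder` are not ragas classes) and omitting the filtering step:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Coroutine, Dict, List


@dataclass
class Node:  # stand-in, not the ragas Node
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:  # stand-in, not the ragas Relationship
    source: Node
    target: Node
    type: str


@dataclass
class KnowledgeGraph:  # stand-in, not the ragas KnowledgeGraph
    nodes: List[Node] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


class SharedTopicBuilder:
    """Hypothetical builder: links every pair of nodes with the same 'topic'."""

    async def transform(self, kg: KnowledgeGraph) -> List[Relationship]:
        return [
            Relationship(a, b, "shared_topic")
            for i, a in enumerate(kg.nodes)
            for b in kg.nodes[i + 1:]
            if a.properties.get("topic") == b.properties.get("topic")
        ]

    def generate_execution_plan(self, kg: KnowledgeGraph) -> List[Coroutine]:
        async def apply_build_relationships(original_kg: KnowledgeGraph) -> None:
            # New relationships are appended to the graph in place.
            original_kg.relationships.extend(await self.transform(original_kg))

        return [apply_build_relationships(kg)]


async def run_plan(plan: List[Coroutine]) -> None:
    await asyncio.gather(*plan)


kg = KnowledgeGraph(
    nodes=[Node("a", {"topic": "x"}), Node("b", {"topic": "x"}), Node("c", {"topic": "y"})]
)
plan = SharedTopicBuilder().generate_execution_plan(kg)
plan_length = len(plan)
asyncio.run(run_plan(plan))
```

The plan holds one coroutine, and after it runs the graph contains a single `shared_topic` relationship between nodes `a` and `b`.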

Splitter dataclass

Splitter(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter())

Bases: BaseGraphTransformation

Abstract base class for splitters that transform a KnowledgeGraph by splitting its nodes into smaller chunks.

Methods:

Name Description
transform

Transforms the KnowledgeGraph by splitting its nodes into smaller chunks.

split

Abstract method to split a node into smaller chunks.

transform async

transform(kg: KnowledgeGraph) -> Tuple[List[Node], List[Relationship]]

Transforms the KnowledgeGraph by splitting its nodes into smaller chunks.

Parameters:

Name Type Description Default
kg KnowledgeGraph

The knowledge graph to be transformed.

required

Returns:

Type Description
Tuple[List[Node], List[Relationship]]

A tuple containing a list of new nodes and a list of new relationships.

Source code in src/ragas/testset/transforms/base.py
async def transform(
    self, kg: KnowledgeGraph
) -> t.Tuple[t.List[Node], t.List[Relationship]]:
    """
    Transforms the KnowledgeGraph by splitting its nodes into smaller chunks.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.Tuple[t.List[Node], t.List[Relationship]]
        A tuple containing a list of new nodes and a list of new relationships.
    """
    filtered = self.filter(kg)

    all_nodes = []
    all_relationships = []
    for node in filtered.nodes:
        nodes, relationships = await self.split(node)
        all_nodes.extend(nodes)
        all_relationships.extend(relationships)

    return all_nodes, all_relationships

split abstractmethod async

split(node: Node) -> Tuple[List[Node], List[Relationship]]

Abstract method to split a node into smaller chunks.

Parameters:

Name Type Description Default
node Node

The node to be split.

required

Returns:

Type Description
Tuple[List[Node], List[Relationship]]

A tuple containing a list of new nodes and a list of new relationships.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def split(self, node: Node) -> t.Tuple[t.List[Node], t.List[Relationship]]:
    """
    Abstract method to split a node into smaller chunks.

    Parameters
    ----------
    node : Node
        The node to be split.

    Returns
    -------
    t.Tuple[t.List[Node], t.List[Relationship]]
        A tuple containing a list of new nodes and a list of new relationships.
    """
    pass

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> List[Coroutine]

Generates a list of coroutines to be executed in parallel by the Executor.

Parameters:

Name Type Description Default
kg KnowledgeGraph

The knowledge graph to be transformed.

required

Returns:

Type Description
List[Coroutine]

A list of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.List[t.Coroutine]:
    """
    Generates a list of coroutines to be executed in parallel by the Executor.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Coroutine]
        A list of coroutines to be executed in parallel.
    """

    async def apply_split(node: Node):
        nodes, relationships = await self.split(node)
        kg.nodes.extend(nodes)
        kg.relationships.extend(relationships)

    filtered = self.filter(kg)
    return [apply_split(node) for node in filtered.nodes]
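
The splitter pattern above, in which each coroutine appends new nodes and relationships to the graph, can be sketched as follows. All classes here are hypothetical stand-ins rather than ragas classes, the sentence-based `split` is an arbitrary choice for illustration, and `asyncio.gather` stands in for the ragas Executor.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Coroutine, Dict, List, Tuple


@dataclass
class Node:  # stand-in, not the ragas Node
    id: str
    type: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:  # stand-in, not the ragas Relationship
    source: Node
    target: Node
    type: str


@dataclass
class KnowledgeGraph:  # stand-in, not the ragas KnowledgeGraph
    nodes: List[Node] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


class SentenceSplitter:
    """Hypothetical splitter: one chunk node per sentence."""

    async def split(self, node: Node) -> Tuple[List[Node], List[Relationship]]:
        sentences = [s.strip() for s in node.properties["page_content"].split(".") if s.strip()]
        chunks = [
            Node(f"{node.id}-{i}", "chunk", {"page_content": s})
            for i, s in enumerate(sentences)
        ]
        return chunks, [Relationship(node, chunk, "child") for chunk in chunks]

    def generate_execution_plan(self, kg: KnowledgeGraph) -> List[Coroutine]:
        async def apply_split(node: Node) -> None:
            nodes, relationships = await self.split(node)
            kg.nodes.extend(nodes)
            kg.relationships.extend(relationships)

        # The list is built before any coroutine runs, so newly added
        # chunk nodes are not themselves scheduled for splitting.
        return [apply_split(node) for node in kg.nodes if node.type == "document"]


async def run_plan(plan: List[Coroutine]) -> None:
    await asyncio.gather(*plan)


kg = KnowledgeGraph(nodes=[Node("doc", "document", {"page_content": "First. Second."})])
asyncio.run(run_plan(SentenceSplitter().generate_execution_plan(kg)))
```

After the run, the graph holds the original document node plus two chunk nodes, each connected to the document by a `child` relationship.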

Parallel

Parallel(*transformations: BaseGraphTransformation)

Collection of transformations to be applied in parallel.

Examples:

>>> Parallel(HeadlinesExtractor(), SummaryExtractor())

Source code in src/ragas/testset/transforms/engine.py
def __init__(self, *transformations: BaseGraphTransformation):
    self.transformations = list(transformations)

EmbeddingExtractor dataclass

EmbeddingExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), property_name: str = 'embedding', embed_property_name: str = 'page_content', embedding_model: BaseRagasEmbeddings = embedding_factory())

Bases: Extractor

A class for extracting embeddings from nodes in a knowledge graph.

Attributes:

Name Type Description
property_name str

The name of the property in which the embedding is stored.

embed_property_name str

The name of the property containing the text to embed.

embedding_model BaseRagasEmbeddings

The embedding model used to generate embeddings.

extract async

extract(node: Node) -> Tuple[str, Any]

Extracts the embedding for a given node.

Raises:

Type Description
ValueError

If the property to be embedded is not a string.

Source code in src/ragas/testset/transforms/extractors/embeddings.py
async def extract(self, node: Node) -> t.Tuple[str, t.Any]:
    """
    Extracts the embedding for a given node.

    Raises
    ------
    ValueError
        If the property to be embedded is not a string.
    """
    text = node.get_property(self.embed_property_name)
    if not isinstance(text, str):
        raise ValueError(
            f"node.property('{self.embed_property_name}') must be a string, found '{type(text)}'"
        )
    embedding = self.embedding_model.embed_query(text)
    return self.property_name, embedding
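
To show the type check in `extract` at work, here is a small sketch with a fake embedding model. `FakeEmbeddings`, the stand-in `Node`, and the free function `extract_embedding` are assumptions for illustration; only the `embed_query` call and the string check are taken from the listing above.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


class FakeEmbeddings:
    """Hypothetical model: 'embeds' text as [length, number of spaces]."""

    def embed_query(self, text: str) -> List[float]:
        return [float(len(text)), float(text.count(" "))]


@dataclass
class Node:  # stand-in, not the ragas Node
    properties: Dict[str, Any] = field(default_factory=dict)

    def get_property(self, key: str) -> Any:
        return self.properties.get(key)


async def extract_embedding(
    node: Node,
    embedding_model: FakeEmbeddings,
    property_name: str = "embedding",
    embed_property_name: str = "page_content",
) -> Tuple[str, Any]:
    text = node.get_property(embed_property_name)
    # A missing property yields None, which also fails this check.
    if not isinstance(text, str):
        raise ValueError(
            f"node.property('{embed_property_name}') must be a string, found '{type(text)}'"
        )
    return property_name, embedding_model.embed_query(text)


model = FakeEmbeddings()
result = asyncio.run(extract_embedding(Node({"page_content": "hello world"}), model))

try:
    asyncio.run(extract_embedding(Node({}), model))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

The first call returns the property name together with the fake vector; the second raises `ValueError` because the node has no `page_content` string to embed.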

HeadlinesExtractor dataclass

HeadlinesExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), llm: BaseRagasLLM = llm_factory(), merge_if_possible: bool = True, property_name: str = 'headlines', prompt: HeadlinesExtractorPrompt = HeadlinesExtractorPrompt())

Bases: LLMBasedExtractor

Extracts the headlines from the given text.

Attributes:

Name Type Description
property_name str

The name of the property to extract.

prompt HeadlinesExtractorPrompt

The prompt used for extraction.

KeyphrasesExtractor dataclass

KeyphrasesExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), llm: BaseRagasLLM = llm_factory(), merge_if_possible: bool = True, property_name: str = 'keyphrases', prompt: KeyphrasesExtractorPrompt = KeyphrasesExtractorPrompt())

Bases: LLMBasedExtractor

Extracts the top 5 keyphrases from the given text.

Attributes:

Name Type Description
property_name str

The name of the property to extract.

prompt KeyphrasesExtractorPrompt

The prompt used for extraction.

SummaryExtractor dataclass

SummaryExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), llm: BaseRagasLLM = llm_factory(), merge_if_possible: bool = True, property_name: str = 'summary', prompt: SummaryExtractorPrompt = SummaryExtractorPrompt())

Bases: LLMBasedExtractor

Extracts a summary from the given text.

Attributes:

Name Type Description
property_name str

The name of the property to extract.

prompt SummaryExtractorPrompt

The prompt used for extraction.

TitleExtractor dataclass

TitleExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), llm: BaseRagasLLM = llm_factory(), merge_if_possible: bool = True, property_name: str = 'title', prompt: TitleExtractorPrompt = TitleExtractorPrompt())

Bases: LLMBasedExtractor

Extracts the title from the given text.

Attributes:

Name Type Description
property_name str

The name of the property to extract.

prompt TitleExtractorPrompt

The prompt used for extraction.

SummaryCosineSimilarityBuilder dataclass

SummaryCosineSimilarityBuilder(name: str = '', filter_nodes: Callable[[Node], bool] = lambda: default_filter(), property_name: str = 'summary_embedding', new_property_name: str = 'summary_cosine_similarity', threshold: float = 0.1)

Bases: CosineSimilarityBuilder

filter

Filters the knowledge graph to include only document nodes. Raises ValueError if a document node has no summary embedding.

Source code in src/ragas/testset/transforms/relationship_builders/cosine.py
def filter(self, kg: KnowledgeGraph) -> KnowledgeGraph:
    """
    Filters the knowledge graph to include only document nodes. Raises ValueError
    if a document node has no summary embedding.
    """
    nodes = []
    for node in kg.nodes:
        if node.type == NodeType.DOCUMENT:
            emb = node.get_property(self.property_name)
            if emb is None:
                raise ValueError(f"Node {node.id} has no {self.property_name}")
            nodes.append(node)
    return KnowledgeGraph(nodes=nodes)
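
The cosine-similarity computation itself is not part of this listing. As a rough sketch of the general idea only, assuming that two nodes are linked when the cosine similarity of their embeddings meets a threshold: the `cosine_similarity` and `similar_pairs` helpers, the `>=` comparison, and the sample vectors below are all hypothetical and do not reproduce the ragas implementation.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_pairs(
    embeddings: Dict[str, List[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair scoring at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Illustrative 2-dimensional "summary embeddings" keyed by node id.
embeddings = {
    "doc1": [1.0, 0.0],
    "doc2": [0.9, 0.1],
    "doc3": [0.0, 1.0],
}
pairs = similar_pairs(embeddings, threshold=0.7)
```

With a threshold of 0.7, only `doc1` and `doc2` are paired; lowering the threshold admits weaker matches, which is the trade-off the `threshold` parameter controls.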

default_transforms

default_transforms(llm: BaseRagasLLM, embedding_model: BaseRagasEmbeddings) -> Transforms

Creates and returns a default set of transforms for processing a knowledge graph.

This function defines a series of transformation steps to be applied to a knowledge graph: extracting headlines, splitting documents by headline, extracting summaries, themes, and named entities, embedding the summaries, and building similarity relationships between nodes.

Returns:

Type Description
Transforms

A list of transformation steps to be applied to the knowledge graph.

Source code in src/ragas/testset/transforms/default.py
def default_transforms(
    llm: BaseRagasLLM,
    embedding_model: BaseRagasEmbeddings,
) -> Transforms:
    """
    Creates and returns a default set of transforms for processing a knowledge graph.

    This function defines a series of transformation steps to be applied to a
    knowledge graph: extracting headlines, splitting documents by headline,
    extracting summaries, themes, and named entities, embedding the summaries,
    and building similarity relationships between nodes.

    Returns
    -------
    Transforms
        A list of transformation steps to be applied to the knowledge graph.

    """

    headline_extractor = HeadlinesExtractor(llm=llm)
    splitter = HeadlineSplitter(min_tokens=500)

    def summary_filter(node):
        return (
            node.type == NodeType.DOCUMENT
            and num_tokens_from_string(node.properties["page_content"]) > 500
        )

    summary_extractor = SummaryExtractor(
        llm=llm, filter_nodes=lambda node: summary_filter(node)
    )

    theme_extractor = ThemesExtractor(llm=llm)
    ner_extractor = NERExtractor(
        llm=llm, filter_nodes=lambda node: node.type == NodeType.CHUNK
    )

    summary_emb_extractor = EmbeddingExtractor(
        embedding_model=embedding_model,
        property_name="summary_embedding",
        embed_property_name="summary",
        filter_nodes=lambda node: summary_filter(node),
    )

    cosine_sim_builder = CosineSimilarityBuilder(
        property_name="summary_embedding",
        new_property_name="summary_similarity",
        threshold=0.7,
        filter_nodes=lambda node: summary_filter(node),
    )

    ner_overlap_sim = OverlapScoreBuilder(
        threshold=0.01, filter_nodes=lambda node: node.type == NodeType.CHUNK
    )

    transforms = [
        headline_extractor,
        splitter,
        Parallel(summary_extractor, theme_extractor, ner_extractor),
        summary_emb_extractor,
        Parallel(cosine_sim_builder, ner_overlap_sim),
    ]

    return transforms

apply_transforms

apply_transforms(kg: KnowledgeGraph, transforms: Transforms, run_config: RunConfig = RunConfig(), callbacks: Optional[Callbacks] = None)

Apply a list of transformations to a knowledge graph in place.

Source code in src/ragas/testset/transforms/engine.py
def apply_transforms(
    kg: KnowledgeGraph,
    transforms: Transforms,
    run_config: RunConfig = RunConfig(),
    callbacks: t.Optional[Callbacks] = None,
):
    """
    Apply a list of transformations to a knowledge graph in place.
    """
    # apply nest_asyncio to fix the event loop issue in jupyter
    apply_nest_asyncio()

    # if single transformation, wrap it in a list
    if isinstance(transforms, BaseGraphTransformation):
        transforms = [transforms]

    # apply the transformations
    # if Sequences, apply each transformation sequentially
    if isinstance(transforms, t.List):
        for transform in transforms:
            asyncio.run(
                run_coroutines(
                    transform.generate_execution_plan(kg),
                    get_desc(transform),
                    run_config.max_workers,
                )
            )
    # if Parallel, collect inside it and run it all
    elif isinstance(transforms, Parallel):
        asyncio.run(
            run_coroutines(
                transforms.generate_execution_plan(kg),
                get_desc(transforms),
                run_config.max_workers,
            )
        )
    else:
        raise ValueError(
            f"Invalid transforms type: {type(transforms)}. Expects a list of BaseGraphTransformations or a Parallel instance."
        )
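
The scheduling behaviour above (list entries run one after another, while the coroutines within a single entry run together) can be sketched with plain `asyncio`. Everything in this example is a simplified stand-in: the `Parallel` class here assumes that a parallel group simply concatenates its members' plans, `SetKey` is a hypothetical transformation, the "graph" is a plain dict, and `asyncio.gather` replaces `run_coroutines` and its `max_workers` limit.

```python
import asyncio
from typing import Any, Callable, Coroutine, Dict, List


class SetKey:
    """Hypothetical transformation: writes kg[key] = fn(kg) when executed."""

    def __init__(self, key: str, fn: Callable[[Dict[str, Any]], Any]):
        self.key, self.fn = key, fn

    def generate_execution_plan(self, kg: Dict[str, Any]) -> List[Coroutine]:
        async def apply() -> None:
            kg[self.key] = self.fn(kg)

        return [apply()]


class Parallel:
    """Stand-in: assumes a parallel group concatenates its members' plans."""

    def __init__(self, *transformations: Any):
        self.transformations = list(transformations)

    def generate_execution_plan(self, kg: Dict[str, Any]) -> List[Coroutine]:
        plan: List[Coroutine] = []
        for transformation in self.transformations:
            plan.extend(transformation.generate_execution_plan(kg))
        return plan


def apply_all(kg: Dict[str, Any], transforms: List[Any]) -> None:
    async def run_plan(plan: List[Coroutine]) -> None:
        await asyncio.gather(*plan)

    for step in transforms:
        # Each plan is generated only after the previous step has finished,
        # so later steps can rely on what earlier steps wrote.
        asyncio.run(run_plan(step.generate_execution_plan(kg)))


kg: Dict[str, Any] = {"text": "a b c"}
apply_all(
    kg,
    [
        SetKey("words", lambda g: g["text"].split()),
        Parallel(
            SetKey("count", lambda g: len(g["words"])),
            SetKey("first", lambda g: g["words"][0]),
        ),
    ],
)
```

The second step depends on `words` written by the first, which is why ordering within the list matters, whereas the two members of the `Parallel` group are independent of each other.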

rollback_transforms

rollback_transforms(kg: KnowledgeGraph, transforms: Transforms)

Rollback a list of transformations from a knowledge graph.

Note

This is not yet implemented. Please open an issue if you need this feature.

Source code in src/ragas/testset/transforms/engine.py
def rollback_transforms(kg: KnowledgeGraph, transforms: Transforms):
    """
    Rollback a list of transformations from a knowledge graph.

    Note
    ----
    This is not yet implemented. Please open an issue if you need this feature.
    """
    # this will allow you to roll back the transformations
    raise NotImplementedError