Graph

NodeType

Bases: str, Enum

Enumeration of node types in the knowledge graph.

Currently supported node types are: UNKNOWN, DOCUMENT, CHUNK

Node

Bases: BaseModel

Represents a node in the knowledge graph.

Attributes:

Name	Type	Description
`id`	`UUID`	Unique identifier for the node.
`properties`	`dict`	Dictionary of properties associated with the node.
`type`	`NodeType`	Type of the node.

add_property

add_property(key: str, value: Any)

Adds a property to the node.

Raises:

Type	Description
`ValueError`	If the property already exists.

Source code in src/ragas/testset/graph.py

def add_property(self, key: str, value: t.Any):
    """
    Adds a property to the node.

    Raises
    ------
    ValueError
        If the property already exists.
    """
    if key.lower() in self.properties:
        raise ValueError(f"Property {key} already exists")
    self.properties[key.lower()] = value

get_property

get_property(key: str) -> Optional[Any]

Retrieves a property value by key.

Notes

The key is case-insensitive.

Source code in src/ragas/testset/graph.py

def get_property(self, key: str) -> t.Optional[t.Any]:
    """
    Retrieves a property value by key.

    Notes
    -----
    The key is case-insensitive.
    """
    return self.properties.get(key.lower(), None)

Relationship

Bases: BaseModel

Represents a relationship between two nodes in a knowledge graph.

Attributes:

Name	Type	Description
`id`	`(UUID, optional)`	Unique identifier for the relationship. Defaults to a new UUID.
`type`	`str`	The type of the relationship.
`source`	`Node`	The source node of the relationship.
`target`	`Node`	The target node of the relationship.
`bidirectional`	`(bool, optional)`	Whether the relationship is bidirectional. Defaults to False.
`properties`	`(dict, optional)`	Dictionary of properties associated with the relationship. Defaults to an empty dict.

get_property

get_property(key: str) -> Optional[Any]

Retrieves a property value by key. The key is case-insensitive.

Source code in src/ragas/testset/graph.py

def get_property(self, key: str) -> t.Optional[t.Any]:
    """
    Retrieves a property value by key. The key is case-insensitive.
    """
    return self.properties.get(key.lower(), None)

KnowledgeGraph `dataclass`

KnowledgeGraph(nodes: List[Node] = list(), relationships: List[Relationship] = list())

Represents a knowledge graph containing nodes and relationships.

Attributes:

Name	Type	Description
`nodes`	`List[Node]`	List of nodes in the knowledge graph.
`relationships`	`List[Relationship]`	List of relationships in the knowledge graph.

add

add(item: Union[Node, Relationship])

Adds a node or relationship to the knowledge graph.

Raises:

Type	Description
`ValueError`	If the item type is not Node or Relationship.

Source code in src/ragas/testset/graph.py

def add(self, item: t.Union[Node, Relationship]):
    """
    Adds a node or relationship to the knowledge graph.

    Raises
    ------
    ValueError
        If the item type is not Node or Relationship.
    """
    if isinstance(item, Node):
        self._add_node(item)
    elif isinstance(item, Relationship):
        self._add_relationship(item)
    else:
        raise ValueError(f"Invalid item type: {type(item)}")

save

save(path: Union[str, Path])

Saves the knowledge graph to a JSON file.

Parameters:

Name	Type	Description	Default
`path`	`Union[str, Path]`	Path where the JSON file should be saved.	required

Notes

The file is saved using UTF-8 encoding to ensure proper handling of Unicode characters across different platforms.

Source code in src/ragas/testset/graph.py

def save(self, path: t.Union[str, Path]):
    """Saves the knowledge graph to a JSON file.

    Parameters
    ----------
    path : Union[str, Path]
        Path where the JSON file should be saved.

    Notes
    -----
    The file is saved using UTF-8 encoding to ensure proper handling of Unicode characters
    across different platforms.
    """
    if isinstance(path, str):
        path = Path(path)

    data = {
        "nodes": [node.model_dump() for node in self.nodes],
        "relationships": [rel.model_dump() for rel in self.relationships],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, cls=UUIDEncoder, indent=2, ensure_ascii=False)

load `classmethod`

load(path: Union[str, Path]) -> KnowledgeGraph

Loads a knowledge graph from a path.

Parameters:

Name	Type	Description	Default
`path`	`Union[str, Path]`	Path to the JSON file containing the knowledge graph.	required

Returns:

Type	Description
`KnowledgeGraph`	The loaded knowledge graph.

Notes

The file is read using UTF-8 encoding to ensure proper handling of Unicode characters across different platforms.

Source code in src/ragas/testset/graph.py

@classmethod
def load(cls, path: t.Union[str, Path]) -> "KnowledgeGraph":
    """Loads a knowledge graph from a path.

    Parameters
    ----------
    path : Union[str, Path]
        Path to the JSON file containing the knowledge graph.

    Returns
    -------
    KnowledgeGraph
        The loaded knowledge graph.

    Notes
    -----
    The file is read using UTF-8 encoding to ensure proper handling of Unicode characters
    across different platforms.
    """
    if isinstance(path, str):
        path = Path(path)

    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

    nodes = [Node(**node_data) for node_data in data["nodes"]]

    nodes_map = {str(node.id): node for node in nodes}
    relationships = [
        Relationship(
            id=rel_data["id"],
            type=rel_data["type"],
            source=nodes_map[rel_data["source"]],
            target=nodes_map[rel_data["target"]],
            bidirectional=rel_data["bidirectional"],
            properties=rel_data["properties"],
        )
        for rel_data in data["relationships"]
    ]

    kg = cls()
    kg.nodes.extend(nodes)
    kg.relationships.extend(relationships)
    return kg

get_node_by_id

get_node_by_id(node_id: Union[UUID, str]) -> Optional[Node]

Retrieves a node by its ID.

Parameters:

Name	Type	Description	Default
`node_id`	`UUID`	The ID of the node to retrieve.	required

Returns:

Type	Description
`Node or None`	The node with the specified ID, or None if not found.

Source code in src/ragas/testset/graph.py

def get_node_by_id(self, node_id: t.Union[uuid.UUID, str]) -> t.Optional[Node]:
    """
    Retrieves a node by its ID.

    Parameters
    ----------
    node_id : uuid.UUID
        The ID of the node to retrieve.

    Returns
    -------
    Node or None
        The node with the specified ID, or None if not found.
    """
    if isinstance(node_id, str):
        node_id = uuid.UUID(node_id)

    return next(filter(lambda n: n.id == node_id, self.nodes), None)

find_indirect_clusters

find_indirect_clusters(relationship_condition: Callable[[Relationship], bool] = lambda _: True, depth_limit: int = 3) -> List[Set[Node]]

Finds "indirect clusters" of nodes in the knowledge graph based on a relationship condition. Uses Leiden algorithm for community detection and identifies unique paths within each cluster.

NOTE: "indirect clusters" as used in the method name are "groups of nodes that are not directly connected but share a common relationship through other nodes", while the Leiden algorithm is a "clustering" algorithm that defines neighborhoods of nodes based on their connections -- these definitions of "cluster" are NOT equivalent.

Parameters:

Name	Type	Description	Default
`relationship_condition`	`Callable[[Relationship], bool]`	A function that takes a Relationship and returns a boolean, by default lambda _: True	`lambda _: True`
`depth_limit`	`int`	The maximum depth of relationships (number of edges) to consider for clustering, by default 3.	`3`

Returns:

Type	Description
`List[Set[Node]]`	A list of sets, where each set contains nodes that form a cluster.

Source code in src/ragas/testset/graph.py

def find_indirect_clusters(
    self,
    relationship_condition: t.Callable[[Relationship], bool] = lambda _: True,
    depth_limit: int = 3,
) -> t.List[t.Set[Node]]:
    """
    Finds "indirect clusters" of nodes in the knowledge graph based on a relationship condition.
    Uses Leiden algorithm for community detection and identifies unique paths within each cluster.

    NOTE: "indirect clusters" as used in the method name are
    "groups of nodes that are not directly connected
    but share a common relationship through other nodes",
    while the Leiden algorithm is a "clustering" algorithm that defines
    neighborhoods of nodes based on their connections --
    these definitions of "cluster" are NOT equivalent.

    Parameters
    ----------
    relationship_condition : Callable[[Relationship], bool], optional
        A function that takes a Relationship and returns a boolean, by default lambda _: True
    depth_limit : int, optional
        The maximum depth of relationships (number of edges) to consider for clustering, by default 3.

    Returns
    -------
    List[Set[Node]]
        A list of sets, where each set contains nodes that form a cluster.
    """

    import networkx as nx

    def get_node_clusters(
        relationships: list[Relationship],
    ) -> dict[int, set[uuid.UUID]]:
        """Identify clusters of nodes using Leiden algorithm."""
        import numpy as np
        from sknetwork.clustering import Leiden
        from sknetwork.data import Dataset as SKDataset, from_edge_list

        # NOTE: the upstream sknetwork Dataset has some issues with type hints,
        # so we use type: ignore to bypass them.
        graph: SKDataset = from_edge_list(  # type: ignore
            [(str(rel.source.id), str(rel.target.id)) for rel in relationships],
            directed=True,
        )

        # Apply Leiden clustering
        leiden = Leiden(random_state=42)
        cluster_labels: np.ndarray = leiden.fit_predict(graph.adjacency)

        # Group nodes by cluster
        clusters: defaultdict[int, set[uuid.UUID]] = defaultdict(set)
        for label, node_id in zip(cluster_labels, graph.names):
            clusters[int(label)].add(uuid.UUID(node_id))

        return dict(clusters)

    def to_nx_digraph(
        nodes: set[uuid.UUID], relationships: list[Relationship]
    ) -> nx.DiGraph:
        """Convert a set of nodes and relationships to a directed graph."""
        # Create directed subgraph for this cluster
        graph = nx.DiGraph()
        for node_id in nodes:
            graph.add_node(
                node_id,
                node_obj=self.get_node_by_id(node_id),
            )
        for rel in relationships:
            if rel.source.id in nodes and rel.target.id in nodes:
                graph.add_edge(rel.source.id, rel.target.id, relationship_obj=rel)
        return graph

    def max_simple_paths(n: int, k: int = depth_limit) -> int:
        """Estimate the number of paths up to depth_limit that would exist in a fully-connected graph of size cluster_nodes."""
        from math import prod

        if n - k - 1 <= 0:
            return 0

        return prod(n - i for i in range(k + 1))

    def exhaustive_paths(
        graph: nx.DiGraph, depth_limit: int
    ) -> list[list[uuid.UUID]]:
        """Find all simple paths in the subgraph up to depth_limit."""
        import itertools

        # Check if graph has enough nodes for meaningful paths
        if len(graph) < 2:
            return []

        all_paths: list[list[uuid.UUID]] = []
        for source, target in itertools.permutations(graph.nodes(), 2):
            if not nx.has_path(graph, source, target):
                continue
            try:
                paths = nx.all_simple_paths(
                    graph,
                    source,
                    target,
                    cutoff=depth_limit,
                )
                all_paths.extend(paths)
            except nx.NetworkXNoPath:
                continue

        return all_paths

    def sample_paths_from_graph(
        graph: nx.DiGraph, depth_limit: int, sample_size: int = 1000
    ) -> list[list[uuid.UUID]]:
        """Sample random paths in the graph up to depth_limit."""
        # we're using a DiGraph, so we need to account for directionality
        # if a node has no out-paths, then it will cause an error in `generate_random_paths`

        # Iteratively remove nodes with no out-paths to handle cascading effects
        while True:
            nodes_with_no_outpaths = [
                n for n in graph.nodes() if graph.out_degree(n) == 0
            ]
            if not nodes_with_no_outpaths:
                break
            graph.remove_nodes_from(nodes_with_no_outpaths)

        # Check if graph is empty after node removal
        if len(graph) == 0:
            return []

        sampled_paths: list[list[uuid.UUID]] = []
        for depth in range(2, depth_limit + 1):
            # Additional safety check before generating paths
            if (
                len(graph) < depth + 1
            ):  # Need at least depth+1 nodes for a path of length depth
                continue

            paths = nx.generate_random_paths(
                graph,
                sample_size=sample_size,
                path_length=depth,
            )
            sampled_paths.extend(paths)
        return sampled_paths

    # depth 2: 3 nodes, 2 edges (A -> B -> C)
    if depth_limit < 2:
        raise ValueError("Depth limit must be at least 2")

    # Filter relationships based on the condition
    filtered_relationships: list[Relationship] = []
    relationship_map: defaultdict[uuid.UUID, set[uuid.UUID]] = defaultdict(set)
    for rel in self.relationships:
        if relationship_condition(rel):
            filtered_relationships.append(rel)
            relationship_map[rel.source.id].add(rel.target.id)
            if rel.bidirectional:
                relationship_map[rel.target.id].add(rel.source.id)

    if not filtered_relationships:
        return []

    clusters = get_node_clusters(filtered_relationships)

    # For each cluster, find valid paths up to depth_limit
    cluster_sets: set[frozenset] = set()
    for _cluster_label, cluster_nodes in tqdm(
        clusters.items(), desc="Processing clusters"
    ):
        if len(cluster_nodes) < depth_limit:
            continue

        subgraph = to_nx_digraph(
            nodes=cluster_nodes, relationships=filtered_relationships
        )

        sampled_paths: list[list[uuid.UUID]] = []
        # if the expected number of paths is small, use exhaustive search
        # otherwise sample with random walks
        if max_simple_paths(n=len(cluster_nodes), k=depth_limit) < 1000:
            sampled_paths.extend(exhaustive_paths(subgraph, depth_limit))
        else:
            sampled_paths.extend(sample_paths_from_graph(subgraph, depth_limit))

        # convert paths (node IDs) to sets of Node objects
        # and deduplicate
        for path in sampled_paths:
            path_nodes = {subgraph.nodes[node_id]["node_obj"] for node_id in path}
            cluster_sets.add(frozenset(path_nodes))

    return [set(path_nodes) for path_nodes in cluster_sets]

remove_node

remove_node(node: Node, inplace: bool = True) -> Optional[KnowledgeGraph]

Removes a node and its associated relationships from the knowledge graph.

Parameters:

Name	Type	Description	Default
`node`	`Node`	The node to be removed from the knowledge graph.	required
`inplace`	`bool`	If True, modifies the knowledge graph in place. If False, returns a modified copy with the node removed.	`True`

Returns:

Type	Description
`KnowledgeGraph or None`	Returns a modified copy of the knowledge graph if `inplace` is False. Returns None if `inplace` is True.

Raises:

Type	Description
`ValueError`	If the node is not present in the knowledge graph.

Source code in src/ragas/testset/graph.py

def remove_node(
    self, node: Node, inplace: bool = True
) -> t.Optional["KnowledgeGraph"]:
    """
    Removes a node and its associated relationships from the knowledge graph.

    Parameters
    ----------
    node : Node
        The node to be removed from the knowledge graph.
    inplace : bool, optional
        If True, modifies the knowledge graph in place.
        If False, returns a modified copy with the node removed.

    Returns
    -------
    KnowledgeGraph or None
        Returns a modified copy of the knowledge graph if `inplace` is False.
        Returns None if `inplace` is True.

    Raises
    ------
    ValueError
        If the node is not present in the knowledge graph.
    """
    if node not in self.nodes:
        raise ValueError("Node is not present in the knowledge graph.")

    if inplace:
        # Modify the current instance
        self.nodes.remove(node)
        self.relationships = [
            rel
            for rel in self.relationships
            if rel.source != node and rel.target != node
        ]
    else:
        # Create a deep copy and modify it
        new_graph = deepcopy(self)
        new_graph.nodes.remove(node)
        new_graph.relationships = [
            rel
            for rel in new_graph.relationships
            if rel.source != node and rel.target != node
        ]
        return new_graph

find_two_nodes_single_rel

find_two_nodes_single_rel(relationship_condition: Callable[[Relationship], bool] = lambda _: True) -> List[Tuple[Node, Relationship, Node]]

Finds nodes in the knowledge graph based on a relationship condition. (NodeA, NodeB, Rel) triples are considered as multi-hop nodes.

Parameters:

Name	Type	Description	Default
`relationship_condition`	`Callable[[Relationship], bool]`	A function that takes a Relationship and returns a boolean, by default lambda _: True	`lambda _: True`

Returns:

Type	Description
`List[Set[Node, Relationship, Node]]`	A list of sets, where each set contains two nodes and a relationship forming a multi-hop node.

Source code in src/ragas/testset/graph.py

def find_two_nodes_single_rel(
    self, relationship_condition: t.Callable[[Relationship], bool] = lambda _: True
) -> t.List[t.Tuple[Node, Relationship, Node]]:
    """
    Finds nodes in the knowledge graph based on a relationship condition.
    (NodeA, NodeB, Rel) triples are considered as multi-hop nodes.

    Parameters
    ----------
    relationship_condition : Callable[[Relationship], bool], optional
        A function that takes a Relationship and returns a boolean, by default lambda _: True

    Returns
    -------
    List[Set[Node, Relationship, Node]]
        A list of sets, where each set contains two nodes and a relationship forming a multi-hop node.
    """

    relationships = [
        relationship
        for relationship in self.relationships
        if relationship_condition(relationship)
    ]

    triplets = set()

    for relationship in relationships:
        if relationship.source != relationship.target:
            node_a = relationship.source
            node_b = relationship.target
            # Ensure the smaller ID node is always first
            if node_a.id < node_b.id:
                normalized_tuple = (node_a, relationship, node_b)
            else:
                normalized_relationship = Relationship(
                    source=node_b,
                    target=node_a,
                    type=relationship.type,
                    properties=relationship.properties,
                )
                normalized_tuple = (node_b, normalized_relationship, node_a)

            triplets.add(normalized_tuple)

    return list(triplets)

Graph

NodeType

Node

add_property

get_property

Relationship

get_property

KnowledgeGraph dataclass

add

save

load classmethod

get_node_by_id

find_indirect_clusters

remove_node

find_two_nodes_single_rel

KnowledgeGraph `dataclass`

load `classmethod`