Graph

NodeType

Bases: str, Enum

Enumeration of node types in the knowledge graph.

Currently supported node types are: UNKNOWN, DOCUMENT, CHUNK
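As a rough illustration of what the `str, Enum` bases give you, the sketch below defines a stand-in enum with the same member names. The class name `NodeTypeSketch` and the member values are assumptions made for this example; the actual values used by the library are not shown on this page.

```python
from enum import Enum


class NodeTypeSketch(str, Enum):
    # Member values are assumed for illustration only.
    UNKNOWN = ""
    DOCUMENT = "document"
    CHUNK = "chunk"


# Because of the str mixin, members compare equal to plain strings...
is_equal = NodeTypeSketch.DOCUMENT == "document"
# ...and a member can be looked up from its string value.
looked_up = NodeTypeSketch("chunk")
```

Mixing in `str` is a common way to make enum members serialize cleanly to JSON, which is relevant to the `save` and `load` methods further down.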

Node

Bases: BaseModel

Represents a node in the knowledge graph.

Attributes:

Name        Type      Description
id          UUID      Unique identifier for the node.
properties  dict      Dictionary of properties associated with the node.
type        NodeType  Type of the node.

add_property

add_property(key: str, value: Any)

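Adds a property to the node. The key is lowercased before it is stored, so keys that differ only in case refer to the same property.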

Raises:

Type        Description
ValueError  If the property already exists.

Source code in src/ragas/testset/graph.py
def add_property(self, key: str, value: t.Any):
    """
    Adds a property to the node.

    Raises
    ------
    ValueError
        If the property already exists.
    """
    if key.lower() in self.properties:
        raise ValueError(f"Property {key} already exists")
    self.properties[key.lower()] = value
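The duplicate check can be tried out with a small stand-in class. `NodeSketch` below is hypothetical and copies only the `add_property` logic shown above; it is not the library's `Node` model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class NodeSketch:
    """Hypothetical stand-in that copies only the add_property logic."""

    properties: Dict[str, Any] = field(default_factory=dict)

    def add_property(self, key: str, value: Any) -> None:
        # Keys are compared and stored in lowercase.
        if key.lower() in self.properties:
            raise ValueError(f"Property {key} already exists")
        self.properties[key.lower()] = value


node = NodeSketch()
node.add_property("Title", "Intro")

try:
    # Same key in a different case: treated as a duplicate.
    node.add_property("TITLE", "Other")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Note that the method has no overwrite mode: under this logic, changing an existing value means writing to `properties` directly.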

get_property

get_property(key: str) -> Optional[Any]

Retrieves a property value by key. Returns None if the key is not present.

Notes

The key is case-insensitive.

Source code in src/ragas/testset/graph.py
def get_property(self, key: str) -> t.Optional[t.Any]:
    """
    Retrieves a property value by key.

    Notes
    -----
    The key is case-insensitive.
    """
    return self.properties.get(key.lower(), None)
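The lookup can be sketched as a free function over a plain dict. The function below mirrors the one-line body above; the sample dictionaries are made up for illustration.

```python
from typing import Any, Dict, Optional


def get_property(properties: Dict[str, Any], key: str) -> Optional[Any]:
    # Same lookup as the method above, written as a free function.
    return properties.get(key.lower(), None)


stored_lowercase = {"title": "Intro"}
stored_mixed_case = {"Title": "Intro"}  # key was not lowercased when stored

found = get_property(stored_lowercase, "TITLE")
missing = get_property(stored_lowercase, "summary")
not_found = get_property(stored_mixed_case, "Title")
```

Going only by the code shown here, a key that reached `properties` with uppercase letters by some other route would not be found, because only the lookup key is lowercased. Whether the library normalizes such keys elsewhere is not shown on this page.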

Relationship

Bases: BaseModel

Represents a relationship between two nodes in a knowledge graph.

Attributes:

Name           Type              Description
id             (UUID, optional)  Unique identifier for the relationship. Defaults to a new UUID.
type           str               The type of the relationship.
source         Node              The source node of the relationship.
target         Node              The target node of the relationship.
bidirectional  (bool, optional)  Whether the relationship is bidirectional. Defaults to False.
properties     (dict, optional)  Dictionary of properties associated with the relationship. Defaults to an empty dict.

get_property

get_property(key: str) -> Optional[Any]

Retrieves a property value by key. The key is case-insensitive. Returns None if the key is not present.

Source code in src/ragas/testset/graph.py
def get_property(self, key: str) -> t.Optional[t.Any]:
    """
    Retrieves a property value by key. The key is case-insensitive.
    """
    return self.properties.get(key.lower(), None)

KnowledgeGraph dataclass

KnowledgeGraph(nodes: List[Node] = list(), relationships: List[Relationship] = list())

Represents a knowledge graph containing nodes and relationships.

Attributes:

Name           Type                Description
nodes          List[Node]          List of nodes in the knowledge graph.
relationships  List[Relationship]  List of relationships in the knowledge graph.

add

add(item: Union[Node, Relationship])

Adds a node or relationship to the knowledge graph.

Raises:

Type        Description
ValueError  If the item type is not Node or Relationship.

Source code in src/ragas/testset/graph.py
def add(self, item: t.Union[Node, Relationship]):
    """
    Adds a node or relationship to the knowledge graph.

    Raises
    ------
    ValueError
        If the item type is not Node or Relationship.
    """
    if isinstance(item, Node):
        self._add_node(item)
    elif isinstance(item, Relationship):
        self._add_relationship(item)
    else:
        raise ValueError(f"Invalid item type: {type(item)}")
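A minimal sketch of the same type dispatch, using hypothetical stand-in classes (`NodeSketch`, `RelationshipSketch`, `GraphSketch`). The private helpers `_add_node` and `_add_relationship` are not shown on this page, so the sketch assumes they simply append to the matching list.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class NodeSketch:
    name: str


@dataclass
class RelationshipSketch:
    source: NodeSketch
    target: NodeSketch
    type: str


@dataclass
class GraphSketch:
    nodes: List[NodeSketch] = field(default_factory=list)
    relationships: List[RelationshipSketch] = field(default_factory=list)

    def add(self, item: Union[NodeSketch, RelationshipSketch]) -> None:
        # Assumption: the private helpers just append to the matching list.
        if isinstance(item, NodeSketch):
            self.nodes.append(item)
        elif isinstance(item, RelationshipSketch):
            self.relationships.append(item)
        else:
            raise ValueError(f"Invalid item type: {type(item)}")


graph = GraphSketch()
a, b = NodeSketch("a"), NodeSketch("b")
graph.add(a)
graph.add(b)
graph.add(RelationshipSketch(source=a, target=b, type="next"))

try:
    graph.add("not a node")  # neither a node nor a relationship
    rejected = False
except ValueError:
    rejected = True
```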

save

save(path: Union[str, Path])

Saves the knowledge graph to a JSON file.

Source code in src/ragas/testset/graph.py
def save(self, path: t.Union[str, Path]):
    """Saves the knowledge graph to a JSON file."""
    if isinstance(path, str):
        path = Path(path)

    data = {
        "nodes": [node.model_dump() for node in self.nodes],
        "relationships": [rel.model_dump() for rel in self.relationships],
    }
    with open(path, "w") as f:
        json.dump(data, f, cls=UUIDEncoder, indent=2, ensure_ascii=False)
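`UUIDEncoder` is referenced above but its definition is not shown. The sketch below is one plausible shape for such an encoder, built on the standard `json.JSONEncoder.default` hook; the class name `UUIDEncoderSketch` and the sample data are assumptions for illustration, not the library's own code.

```python
import json
import uuid


class UUIDEncoderSketch(json.JSONEncoder):
    """Assumed shape of a UUID-aware encoder; not the library's own class."""

    def default(self, o):
        # json calls default() only for objects it cannot serialize itself.
        if isinstance(o, uuid.UUID):
            return str(o)
        return super().default(o)


node_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
data = {
    "nodes": [{"id": node_id, "properties": {"title": "Émile"}}],
    "relationships": [],
}

# Same keyword arguments as the json.dump call in save().
text = json.dumps(data, cls=UUIDEncoderSketch, indent=2, ensure_ascii=False)
```

With `ensure_ascii=False`, non-ASCII characters are written as-is rather than as `\u` escapes. Since `save` opens the file without an explicit `encoding`, the bytes on disk then depend on the platform's default encoding.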

load classmethod

load(path: Union[str, Path]) -> KnowledgeGraph

Loads a knowledge graph from a JSON file, such as one written by save.

Source code in src/ragas/testset/graph.py
@classmethod
def load(cls, path: t.Union[str, Path]) -> "KnowledgeGraph":
    """Loads a knowledge graph from a path."""
    if isinstance(path, str):
        path = Path(path)

    with open(path, "r") as f:
        data = json.load(f)

    nodes = [Node(**node_data) for node_data in data["nodes"]]
    relationships = [Relationship(**rel_data) for rel_data in data["relationships"]]

    kg = cls()
    kg.nodes.extend(nodes)
    kg.relationships.extend(relationships)
    return kg
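JSON has no UUID type, so identifiers come back from the file as strings. In `load`, turning them back into UUID objects is presumably left to model validation in `Node(**node_data)`. The sketch below does that step by hand with the standard library only; the file name `kg.json` and the sample data are made up for illustration.

```python
import json
import tempfile
import uuid
from pathlib import Path

node_id = uuid.uuid4()
data = {"nodes": [{"id": str(node_id), "properties": {}}], "relationships": []}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "kg.json"  # hypothetical file name
    path.write_text(json.dumps(data), encoding="utf-8")

    # Read the file back the same way load() does.
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

raw_id = loaded["nodes"][0]["id"]  # a plain string after the round trip
restored_id = uuid.UUID(raw_id)    # converted back by hand in this sketch
```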

find_clusters

find_clusters(relationship_condition: Callable[[Relationship], bool] = lambda _: True) -> List[Set[Node]]

Finds clusters of connected nodes in the knowledge graph, following only the relationships that satisfy a relationship condition.

Parameters:

Name                    Type                            Default         Description
relationship_condition  Callable[[Relationship], bool]  lambda _: True  A function that takes a Relationship and returns a boolean. Only relationships for which it returns True are followed.

Returns:

Type             Description
List[Set[Node]]  A list of sets, where each set contains nodes that form a cluster. Clusters containing a single node are omitted.

Source code in src/ragas/testset/graph.py
def find_clusters(
    self, relationship_condition: t.Callable[[Relationship], bool] = lambda _: True
) -> t.List[t.Set[Node]]:
    """
    Finds clusters of nodes in the knowledge graph based on a relationship condition.

    Parameters
    ----------
    relationship_condition : Callable[[Relationship], bool], optional
        A function that takes a Relationship and returns a boolean, by default lambda _: True

    Returns
    -------
    List[Set[Node]]
        A list of sets, where each set contains nodes that form a cluster.
    """
    clusters = []
    visited = set()

    relationships = [
        rel for rel in self.relationships if relationship_condition(rel)
    ]

    def dfs(node: Node, cluster: t.Set[Node]):
        visited.add(node)
        cluster.add(node)
        for rel in relationships:
            if rel.source == node and rel.target not in visited:
                dfs(rel.target, cluster)
            # if the relationship is bidirectional, we need to check the reverse
            elif (
                rel.bidirectional
                and rel.target == node
                and rel.source not in visited
            ):
                dfs(rel.source, cluster)

    for node in self.nodes:
        if node not in visited:
            cluster = set()
            dfs(node, cluster)
            if len(cluster) > 1:
                clusters.append(cluster)

    return clusters
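The traversal above can be exercised on a toy graph. The sketch below restates the same depth-first search as a free function over hypothetical stand-in classes (`NodeSketch`, `RelSketch`); frozen dataclasses are used only so the stand-ins are hashable and can be placed in sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSketch:
    name: str


@dataclass(frozen=True)
class RelSketch:
    source: NodeSketch
    target: NodeSketch
    type: str
    bidirectional: bool = False


def find_clusters(nodes, relationships, condition=lambda _: True):
    clusters = []
    visited = set()
    # Only relationships accepted by the condition take part in the search.
    kept = [rel for rel in relationships if condition(rel)]

    def dfs(node, cluster):
        visited.add(node)
        cluster.add(node)
        for rel in kept:
            if rel.source == node and rel.target not in visited:
                dfs(rel.target, cluster)
            # Bidirectional relationships are also followed target -> source.
            elif rel.bidirectional and rel.target == node and rel.source not in visited:
                dfs(rel.source, cluster)

    for node in nodes:
        if node not in visited:
            cluster = set()
            dfs(node, cluster)
            if len(cluster) > 1:  # single-node clusters are dropped
                clusters.append(cluster)
    return clusters


a, b, c, d = (NodeSketch(n) for n in "abcd")
nodes = [a, b, c, d]
rels = [RelSketch(a, b, "next"), RelSketch(c, b, "similar", bidirectional=True)]

all_clusters = find_clusters(nodes, rels)
next_only = find_clusters(nodes, rels, lambda rel: rel.type == "next")
# A one-way relationship is only followed from source to target, so the
# starting order matters: here b is visited before a can reach it.
reversed_order = find_clusters([b, a], [RelSketch(a, b, "next")])
```

In this sketch `all_clusters` contains one cluster with `a`, `b`, and `c`, while `d` is left out because it has no relationships. The last call returns an empty list, which suggests that for one-way relationships the result can depend on the order of `nodes`.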