Faithfulness
The faithfulness metric measures the factual consistency of the generated answer against the given context. It is calculated from the answer and the retrieved context, and is scaled to the (0, 1) range; higher is better.
The generated answer is regarded as faithful if all the claims made in the answer can be inferred from the given context. To calculate this, a set of claims is first identified in the generated answer. Each of these claims is then cross-checked against the given context to determine whether it can be inferred from the context. The faithfulness score is given by:

\[ \text{Faithfulness score} = \frac{\text{Number of claims in the answer that can be inferred from the given context}}{\text{Total number of claims in the answer}} \]
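As a minimal sketch of the scoring step (this is not the ragas implementation — in practice an LLM extracts the claims and produces the per-claim verdicts), the formula reduces to the fraction of claims supported by the context:

```python
def faithfulness_score(claim_verdicts: list[bool]) -> float:
    """Fraction of claims that are supported by the retrieved context.

    claim_verdicts[i] is True if claim i can be inferred from the context.
    """
    if not claim_verdicts:
        return 0.0
    return sum(claim_verdicts) / len(claim_verdicts)

# Two claims, one supported by the context -> 0.5
print(faithfulness_score([True, False]))
```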
Example
```python
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import Faithfulness

sample = SingleTurnSample(
    user_input="When was the first super bowl?",
    response="The first superbowl was held on Jan 15, 1967",
    retrieved_contexts=[
        "The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles."
    ]
)
scorer = Faithfulness()
await scorer.single_turn_ascore(sample)
```
Faithfulness with HHEM-2.1-Open
Vectara's HHEM-2.1-Open is a classifier model (T5-based) trained to detect hallucinations in LLM-generated text. It can be used in the second step of calculating faithfulness, i.e., when claims are cross-checked against the given context to determine whether they can be inferred from it. The model is free, small, and open-source, making it very efficient for production use cases. To use the model to calculate faithfulness, you can use the following code snippet:
```python
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import FaithfulnesswithHHEM

sample = SingleTurnSample(
    user_input="When was the first super bowl?",
    response="The first superbowl was held on Jan 15, 1967",
    retrieved_contexts=[
        "The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles."
    ]
)
scorer = FaithfulnesswithHHEM()
await scorer.single_turn_ascore(sample)
```
You can load the model onto a specified device by setting the `device` argument and adjust the batch size for inference using the `batch_size` parameter. By default, the model is loaded on the CPU with a batch size of 10:
```python
my_device = "cuda:0"
my_batch_size = 10

scorer = FaithfulnesswithHHEM(device=my_device, batch_size=my_batch_size)
await scorer.single_turn_ascore(sample)
```
How It's Calculated
Example
Question: Where and when was Einstein born?
Context: Albert Einstein (born 14 March 1879) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time.
High faithfulness answer: Einstein was born in Germany on 14th March 1879.
Low faithfulness answer: Einstein was born in Germany on 20th March 1879.
Let's examine how faithfulness was calculated using the low faithfulness answer:

Step 1: Break the generated answer into individual statements.

- Statement 1: "Einstein was born in Germany."
- Statement 2: "Einstein was born on 20th March 1879."

Step 2: For each of the generated statements, verify whether it can be inferred from the given context.

- Statement 1: Yes
- Statement 2: No

Step 3: Use the formula depicted above to calculate faithfulness.

\[ \text{Faithfulness} = \frac{1}{2} = 0.5 \]
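To make the arithmetic concrete, the worked example above can be reproduced with a short sketch. The statements and verdicts are hard-coded here for illustration; in the actual metric, an LLM (or HHEM) produces both:

```python
# Worked example: the low-faithfulness Einstein answer.
statements = [
    "Einstein was born in Germany.",         # inferable from the context
    "Einstein was born on 20th March 1879.", # contradicts the context (14 March)
]
verdicts = [True, False]  # Step 2 results, hard-coded for illustration

# Step 3: supported claims divided by total claims.
faithfulness = sum(verdicts) / len(statements)
print(faithfulness)  # 0.5
```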