Adapting Metrics to Target Language
When you evaluate LLM applications in a language other than English, adapt each metric's prompts to that language. Ragas does this by using an LLM to translate the few-shot examples in the prompts.
Setup
from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.metrics.collections import Faithfulness
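# Async OpenAI client; reads OPENAI_API_KEY from the environment
client = AsyncOpenAI()
# This LLM is used for scoring and, below, for translating the prompts
llm = llm_factory("gpt-4o-mini", client=client)
metric = Faithfulness(llm=llm)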
Adapt Prompts to Target Language
Metrics in ragas.metrics.collections expose their prompts as direct attributes. Call adapt() on each prompt to translate its few-shot examples, then assign the returned prompt back to the metric:
# Check original language
print(metric.statement_generator_prompt.language)
# english
# Adapt prompts to Hindi
metric.statement_generator_prompt = await metric.statement_generator_prompt.adapt(
target_language="hindi", llm=llm
)
metric.nli_statement_prompt = await metric.nli_statement_prompt.adapt(
target_language="hindi", llm=llm
)
# Verify adaptation
print(metric.statement_generator_prompt.language)
# hindi
# See translated example
print(metric.statement_generator_prompt.examples[0][0].question)
# अल्बर्ट आइंस्टीन कौन थे और वे किस चीज़ के लिए सबसे अधिक जाने जाते हैं?
Note
By default, only the few-shot examples are translated; the instructions remain in English. To translate the instructions as well, pass adapt_instruction=True to adapt().
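The repeated adapt() calls and the adapt_instruction flag can be folded into one small helper. This is a minimal sketch, not part of Ragas: adapt_prompts is a hypothetical name, and it assumes every listed attribute is a prompt whose adapt() accepts target_language, llm, and adapt_instruction as keyword arguments, as the snippets and the note above suggest.

```python
async def adapt_prompts(metric, attr_names, target_language, llm, adapt_instruction=False):
    """Adapt each named prompt attribute on `metric` and assign the result back.

    Hypothetical helper, not a Ragas API. Assumes each prompt exposes
    `await prompt.adapt(target_language=..., llm=..., adapt_instruction=...)`.
    """
    for name in attr_names:
        prompt = getattr(metric, name)
        adapted = await prompt.adapt(
            target_language=target_language,
            llm=llm,
            adapt_instruction=adapt_instruction,
        )
        # adapt() returns a new prompt, so store it back on the metric.
        setattr(metric, name, adapted)
    return metric
```

For the Faithfulness metric above, the call would look like await adapt_prompts(metric, ["statement_generator_prompt", "nli_statement_prompt"], "hindi", llm, adapt_instruction=True).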
Evaluate with Adapted Metric
result = await metric.ascore(
user_input="भारत की राजधानी क्या है?",
response="भारत की राजधानी नई दिल्ली है।",
retrieved_contexts=["भारत की राजधानी नई दिल्ली है, जो देश का सबसे बड़ा शहर भी है।"],
)
print(f"Faithfulness: {result.value}")
# Faithfulness: 1.0
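The snippets on this page use top-level await, which works in a notebook but not in a plain Python script. One way to run the same scoring from a script is sketched below. The names score_samples and run are illustrative rather than Ragas APIs, and the only assumption about the metric is that ascore() takes keyword arguments and returns an object with a .value attribute, as in the example above.

```python
import asyncio


async def score_samples(metric, samples):
    """Score each sample dict in order and return the list of score values."""
    values = []
    for sample in samples:
        # Each sample holds the keyword arguments that ascore() expects.
        result = await metric.ascore(**sample)
        values.append(result.value)
    return values


def run(metric, samples):
    """Synchronous entry point: start an event loop and run the scoring coroutine."""
    return asyncio.run(score_samples(metric, samples))
```

Each entry in samples would be a dict such as {"user_input": ..., "response": ..., "retrieved_contexts": [...]}, matching the keyword arguments used above. Prompt adaptation is also awaited, so in a script it belongs inside the same coroutine before scoring starts.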
Adapting Other Metrics
The same pattern works for any collections metric with prompts:
from ragas.metrics.collections import AnswerRelevancy
from ragas.embeddings.base import embedding_factory
embeddings = embedding_factory("openai", client=client)
relevancy = AnswerRelevancy(llm=llm, embeddings=embeddings)
# Adapt the prompt
relevancy.prompt = await relevancy.prompt.adapt(
target_language="spanish", llm=llm
)
# See translated example
print(relevancy.prompt.examples[0][0].response)
# Albert Einstein nació en Alemania.
Adapting FactualCorrectness
FactualCorrectness has two prompts that both need to be adapted:
from ragas.metrics.collections import FactualCorrectness
metric = FactualCorrectness(llm=llm)
# Adapt both prompts to German
metric.prompt = await metric.prompt.adapt(
target_language="german", llm=llm
)
metric.nli_prompt = await metric.nli_prompt.adapt(
target_language="german", llm=llm
)
# Verify adaptation
print(metric.prompt.language) # german
print(metric.nli_prompt.language) # german
# Now use the adapted metric
result = await metric.ascore(
response="Einstein wurde 1879 in Deutschland geboren.",
reference="Albert Einstein wurde am 14. März 1879 in Ulm, Deutschland geboren."
)
print(f"Factual Correctness: {result.value}")
Tip
Like Faithfulness, FactualCorrectness uses two prompts internally:
- prompt - ClaimDecompositionPrompt for breaking text into claims
- nli_prompt - NLIStatementPrompt for verifying claims
Adapt both prompts when evaluating in a language other than English.
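Because a prompt left in English is easy to overlook, a quick check before scoring can help. The sketch below relies only on the .language attribute printed in the examples above. unadapted_prompts is a hypothetical helper, and the case-insensitive comparison is an assumption about how language names are stored.

```python
def unadapted_prompts(metric, attr_names, target_language):
    """Return the names of prompt attributes whose language differs from the target."""
    wanted = target_language.lower()
    return [
        name
        for name in attr_names
        # A missing .language attribute is treated as "not adapted".
        if str(getattr(getattr(metric, name), "language", "")).lower() != wanted
    ]
```

For FactualCorrectness, unadapted_prompts(metric, ["prompt", "nli_prompt"], "german") should return an empty list once both prompts have been adapted.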