
Adapting metrics to target language

When using Ragas to evaluate LLM application workflows, the applications under evaluation may be in a language other than English. In that case, it is best to adapt your LLM-powered evaluation metrics to the target language. One obvious way to do this is to manually rewrite the instruction and demonstrations, but this can be time-consuming. Ragas instead offers automatic language adaptation, which uses an LLM to adapt any metric to the target language. This notebook demonstrates the feature with a simple example.

For the sake of this example, let's choose a metric and inspect its default prompts:

from ragas.metrics import SimpleCriteriaScoreWithReference

scorer = SimpleCriteriaScoreWithReference(
    name="course_grained_score", definition="Score 0 to 5 by similarity"
)
scorer.get_prompts()
{'multi_turn_prompt': <ragas.metrics._simple_criteria.MultiTurnSimpleCriteriaWithReferencePrompt at 0x7fcf409c3880>,
 'single_turn_prompt': <ragas.metrics._simple_criteria.SingleTurnSimpleCriteriaWithReferencePrompt at 0x7fcf409c3a00>}
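
The output above only shows the prompt objects. To see the actual text, you can print the prompt attributes; a minimal sketch, assuming the prompt objects expose instruction and examples attributes (as Ragas PydanticPrompt objects do):

prompts = scorer.get_prompts()

# Print the instruction of the single-turn prompt
print(prompts["single_turn_prompt"].instruction)

# Print the few-shot demonstrations attached to the prompt
for example in prompts["single_turn_prompt"].examples:
    print(example)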

As you can see, the instruction and demonstrations are both in English. Next, set up the LLM that will perform the adaptation:

from ragas.llms import llm_factory

llm = llm_factory()
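
llm_factory returns an OpenAI-backed wrapper by default, so it expects an OPENAI_API_KEY in your environment. To use a different model, one option is to wrap a LangChain chat model yourself; a minimal sketch, assuming langchain-openai is installed and the model name shown is available to you:

from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

# Wrap any LangChain chat model so Ragas can use it for adaptation
llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))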

To view the supported language codes:

from ragas.utils import RAGAS_SUPPORTED_LANGUAGE_CODES

print(list(RAGAS_SUPPORTED_LANGUAGE_CODES.keys()))
['english', 'hindi', 'marathi', 'chinese', 'spanish', 'amharic', 'arabic', 'armenian', 'bulgarian', 'urdu', 'russian', 'polish', 'persian', 'dutch', 'danish', 'french', 'burmese', 'greek', 'italian', 'japanese', 'deutsch', 'kazakh', 'slovak']
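
Since adaptation is driven by the language name, it is worth validating the target against this list before adapting; a small sketch:

target_language = "hindi"
if target_language not in RAGAS_SUPPORTED_LANGUAGE_CODES:
    raise ValueError(f"'{target_language}' is not a supported target language")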

Now let's adapt the metric to Hindi as the target language using the adapt_prompts method. Language adaptation in Ragas works by translating the few-shot examples attached to the prompts into the target language; the instructions themselves remain in English.

adapted_prompts = await scorer.adapt_prompts(language="hindi", llm=llm)
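
Note that adapt_prompts is a coroutine, which is why it is awaited above. Notebooks support top-level await, but in a plain Python script you would drive it through an event loop:

import asyncio

# Outside a notebook, run the coroutine with asyncio.run
adapted_prompts = asyncio.run(scorer.adapt_prompts(language="hindi", llm=llm))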

Inspect the adapted prompts and make corrections if needed:

adapted_prompts
{'multi_turn_prompt': <ragas.metrics._simple_criteria.MultiTurnSimpleCriteriaWithReferencePrompt at 0x7fcf42bc40a0>,
 'single_turn_prompt': <ragas.metrics._simple_criteria.SingleTurnSimpleCriteriaWithReferencePrompt at 0x7fcf722de890>}
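
Again, the repr only shows object handles. To verify the translation actually happened, print the few-shot examples of an adapted prompt (assuming the same examples attribute as before):

# The instruction stays in English; the demonstrations should now be in Hindi
for example in adapted_prompts["single_turn_prompt"].examples:
    print(example)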

Set the metric's prompts to the newly adapted prompts using the set_prompts method:

scorer.set_prompts(**adapted_prompts)

Evaluate using the adapted metric:

from ragas.dataset_schema import SingleTurnSample

sample = SingleTurnSample(
    user_input="एफिल टॉवर कहाँ स्थित है?",  # "Where is the Eiffel Tower located?"
    response="एफिल टॉवर पेरिस में स्थित है।",  # "The Eiffel Tower is located in Paris."
    reference="एफिल टॉवर मिस्र में स्थित है",  # "The Eiffel Tower is located in Egypt."
)

# Point the metric at the LLM it should use for scoring
scorer.llm = llm
await scorer.single_turn_ascore(sample)
0

Trace of the reasoning and score:

{ "reason": "рдкреНрд░рддрд┐рдХреНрд░рд┐рдпрд╛ рдФрд░ рд╕рдВрджрд░реНрдн рдХреЗ рдЙрддреНрддрд░ рдореЗрдВ рд╕реНрдерд╛рди рдХреЗ рд╕рдВрджрд░реНрдн рдореЗрдВ рдорд╣рддреНрд╡рдкреВрд░реНрдг рднрд┐рдиреНрдирддрд╛ рд╣реИред", "score": 0 }