Using Azure OpenAI

This tutorial will show you how to use Azure OpenAI endpoints instead of OpenAI endpoints.

Note

This guide is for users of the Azure OpenAI endpoints. If you're using OpenAI endpoints, check the evaluation guide instead.

Load sample dataset

# data
from datasets import load_dataset

amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
amnesty_qa
DatasetDict({
    eval: Dataset({
        features: ['question', 'ground_truth', 'answer', 'contexts'],
        num_rows: 30
    })
})
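Each row in the evaluation split carries the four fields that the Ragas metrics expect: question, ground_truth, answer, and contexts. A quick sanity check you could run before evaluating (plain Python with a made-up sample row, not taken from the actual dataset):

```python
# Columns every Ragas evaluation dataset needs
REQUIRED_COLUMNS = {"question", "ground_truth", "answer", "contexts"}

def validate_row(row: dict) -> bool:
    """Return True if a dataset row has every field the metrics need."""
    return REQUIRED_COLUMNS.issubset(row.keys())

# A toy row shaped like the dataset above (contents are illustrative)
sample = {
    "question": "Which rights are discussed in the report?",
    "ground_truth": "The right to a fair trial.",
    "answer": "The report discusses fair-trial rights.",
    "contexts": ["Excerpt from an Amnesty International report ..."],
}
print(validate_row(sample))  # True
```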

Let's import the metrics we are going to use. To learn more about what each metric does, check out this doc.

from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness

# list of metrics we're going to use
metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    harmfulness,
]
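Each of these metrics returns a score between 0 and 1 (harmfulness is a binary critique flag). If you later want a single roll-up number, an unweighted mean is one simple option; note that aggregate_scores below is a hand-written helper for illustration, not a Ragas API:

```python
def aggregate_scores(scores: dict) -> float:
    """Unweighted mean of metric scores (a simple roll-up, not part of Ragas)."""
    return sum(scores.values()) / len(scores)

# Scores of the shape that evaluate() reports later in this guide
example = {
    "faithfulness": 0.7083,
    "answer_relevancy": 0.9416,
    "context_recall": 0.7762,
    "context_precision": 0.8000,
}
print(aggregate_scores(example))
```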

Configuring them for Azure OpenAI endpoints

Ragas also uses Azure OpenAI to run some metrics, so make sure you have your Azure OpenAI key, base URL, and other details available in your environment. You can check the LangChain docs or the Azure docs for more information.

In short, you need the following information:

azure_configs = {
    "base_url": "https://<your-endpoint>.openai.azure.com/",
    "model_deployment": "your-deployment-name",
    "model_name": "your-model-name",
    "embedding_deployment": "your-deployment-name",
    "embedding_name": "text-embedding-ada-002",  # most likely
}
import os

# assuming you already have your key available via an environment variable. If not, use this:
# os.environ["AZURE_OPENAI_API_KEY"] = "..."
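A missing or empty key only surfaces as an authentication error at request time, so it can help to fail fast. A small hand-rolled check (require_env is our own helper, not part of any library):

```python
import os

def require_env(name: str) -> str:
    """Fetch a required environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise EnvironmentError(f"Set {name} before running the evaluation.")
    return value

# Uncomment once AZURE_OPENAI_API_KEY is set in your environment:
# api_key = require_env("AZURE_OPENAI_API_KEY")
```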

Now let's create the chat model and embedding model instances so that Ragas can use them for evaluation.

from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas import evaluate

azure_model = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
)

# init the embeddings for answer_relevancy, answer_correctness and answer_similarity
azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
)

If you have any doubts about configuring the Azure endpoint through LangChain, refer to the AzureChatOpenAI and AzureOpenAIEmbeddings documentation in the LangChain docs.

Evaluation

Running the evaluation is as simple as calling evaluate on the Dataset with the metrics of your choice.

result = evaluate(
    amnesty_qa["eval"], metrics=metrics, llm=azure_model, embeddings=azure_embeddings
)

result
{'faithfulness': 0.7083, 'answer_relevancy': 0.9416, 'context_recall': 0.7762, 'context_precision': 0.8000, 'harmfulness': 0.0000}
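The result behaves like a dictionary of metric scores, so a couple of lines of plain Python are enough to spot where the pipeline is weakest. A sketch over the scores printed above (harmfulness is excluded because it is a lower-is-better critique flag):

```python
scores = {
    "faithfulness": 0.7083,
    "answer_relevancy": 0.9416,
    "context_recall": 0.7762,
    "context_precision": 0.8000,
    "harmfulness": 0.0000,
}

# Drop the critique flag, then find the lowest-scoring quality metric
quality = {name: score for name, score in scores.items() if name != "harmfulness"}
weakest = min(quality, key=quality.get)
print(weakest)  # faithfulness
```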

And there you have it: all the scores you need.

Now, if you want to dig into the results and find examples where your pipeline performed poorly or particularly well, you can easily convert the result into a pandas DataFrame and use your standard analytics tools!

df = result.to_pandas()
df.head()
|   | question | ground_truth | answer | contexts | faithfulness | answer_relevancy | context_recall | context_precision | harmfulness |
|---|----------|--------------|--------|----------|--------------|------------------|----------------|-------------------|-------------|
| 0 | How to deposit a cheque issued to an associate... | [Have the check reissued to the proper payee.J... | The best way to deposit a cheque issued to a... | [Just have the associate sign the back and the... | 1.0 | 0.982491 | 0.888889 | 1.0 | 0 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | Yes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... | 1.0 | 0.995249 | 1.000000 | 1.0 | 0 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | Yes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... | 1.0 | 0.948876 | 1.000000 | 1.0 | 0 |
| 3 | Applying for and receiving business credit | ["I'm afraid the great myth of limited liabili... | Applying for and receiving business credit c... | [Set up a meeting with the bank that handles y... | 1.0 | 0.813285 | 1.000000 | 1.0 | 0 |
| 4 | 401k Transfer After Business Closure | [You should probably consult an attorney. Howe... | If your employer has closed and you need to ... | [The time horizon for your 401K/IRA is essenti... | 0.0 | 0.894836 | 0.000000 | 0.0 | 0 |
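With the scores in a DataFrame, ordinary pandas filters let you zoom in on problem rows. A toy sketch with made-up values (the column names match the table above, the data does not):

```python
import pandas as pd

# Miniature stand-in for result.to_pandas() (values are illustrative)
df = pd.DataFrame({
    "question": ["q1", "q2", "q3"],
    "faithfulness": [1.0, 1.0, 0.0],
    "context_recall": [0.89, 1.0, 0.0],
})

# Rows where the answer was not grounded in the retrieved contexts
weak = df[df["faithfulness"] < 0.5]
print(weak["question"].tolist())  # ['q3']
```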

And that's it!

If you have any suggestions, feedback, or things you're not happy about, please share them in the issues section. We love hearing from you 😁

Test set generation

Here you will learn how to generate a test set from your dataset using the Azure OpenAI endpoints.

! git clone https://huggingface.co/datasets/explodinggradients/2023-llm-papers
from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context


loader = DirectoryLoader(
    "./2023-llm-papers/", use_multithreading=True, silent_errors=True, sample_size=1
)
documents = loader.load()

for document in documents:
    document.metadata["filename"] = document.metadata["source"]

Use the azure_model and azure_embeddings that we initialized in the section above to generate the test set.

generator = TestsetGenerator.from_langchain(
    generator_llm=azure_model, critic_llm=azure_model, embeddings=azure_embeddings
)

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    raise_exceptions=False,
    with_debugging_logs=False,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

testset.to_pandas()
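The distributions argument controls the mix of generated question types, and its weights should sum to 1. A quick sanity check, with plain strings standing in for the evolution objects:

```python
# Stand-ins for the simple / reasoning / multi_context evolutions
distributions = {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}
test_size = 10

# The weights must cover the whole test set
assert abs(sum(distributions.values()) - 1.0) < 1e-9

# Rough expected number of questions of each type
for evolution, weight in distributions.items():
    print(evolution, test_size * weight)
```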