Using Amazon Bedrock

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.

This tutorial will show you how to use Amazon Bedrock with Ragas.

Metrics
Testset generation

Note

This guide is for folks who are using the Amazon Bedrock endpoints. Check the evaluation guide if you're using OpenAI endpoints.

Metrics

Load sample dataset

# load the amnesty_qa sample dataset from the Hugging Face Hub
from datasets import load_dataset

amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
amnesty_qa

Let's import the metrics that we are going to use.

from ragas.metrics import (
    context_precision,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness

# list of metrics we're going to use
metrics = [
    faithfulness,
    context_recall,
    context_precision,
    harmfulness,
]
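Each of these metrics asks the judge LLM for item-level verdicts (for example, whether each claim in an answer is supported by the retrieved contexts) and then reduces those verdicts to a score between 0 and 1. As a rough sketch based on the metric definitions in the Ragas documentation, the final arithmetic looks like the code below. The function names are hypothetical, and the LLM verdicts are replaced by plain booleans. harmfulness is left out because it is essentially a single yes/no critique, scored 1 (harmful) or 0.

```python
import math


def faithfulness_score(claim_supported: list[bool]) -> float:
    """Fraction of the answer's claims that the judge found supported by the contexts."""
    if not claim_supported:
        return math.nan  # no claims extracted, so the ratio is undefined
    return sum(claim_supported) / len(claim_supported)


def context_recall_score(gt_sentence_attributed: list[bool]) -> float:
    """Fraction of ground-truth sentences that can be attributed to the retrieved contexts."""
    if not gt_sentence_attributed:
        return math.nan
    return sum(gt_sentence_attributed) / len(gt_sentence_attributed)


def context_precision_score(context_relevant: list[bool]) -> float:
    """Mean of precision@k over the ranks k at which a relevant context appears."""
    hits = 0
    total = 0.0
    for k, relevant in enumerate(context_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / k  # precision@k, counted only at relevant ranks
    return total / hits if hits else 0.0


# One supported claim out of four gives a faithfulness of 0.25.
print(faithfulness_score([True, False, False, False]))
# Relevant contexts at ranks 1 and 3: (1/1 + 2/3) / 2
print(context_precision_score([True, False, True]))
```

Rewarding relevant contexts more when they appear near the top is why context_precision depends on the order of the retrieved passages, while the other two ratios do not.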

Now let's use an LLM from Bedrock through the BedrockChat class from LangChain. Create a new BedrockChat instance with the model_id of the model you want to use. You will also need to pass a BedrockEmbeddings instance to the evaluate function, alongside the metrics we use, in place of the default embeddings.

from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings

config = {
    "credentials_profile_name": "your-profile-name",  # e.g. "default"
    "region_name": "your-region-name",  # e.g. "us-east-1"
    "model_id": "your-model-id",  # e.g. "anthropic.claude-v2"
    "model_kwargs": {"temperature": 0.4},
}

bedrock_model = BedrockChat(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
    endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model_id=config["model_id"],
    model_kwargs=config["model_kwargs"],
)

# init the embeddings
bedrock_embeddings = BedrockEmbeddings(
    credentials_profile_name=config["credentials_profile_name"],
    region_name=config["region_name"],
)

Now we can use the LLM and embeddings from Bedrock by passing them to the evaluate function.
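The endpoint_url passed to BedrockChat is simply the Bedrock runtime host for your region, and the values in config are placeholders that must be replaced before the clients can authenticate. As a small sanity check before creating the clients, the hypothetical helpers below (they are not part of LangChain or Ragas) rebuild that URL with the same pattern and list any keys that still hold the tutorial's placeholder strings.

```python
# Hypothetical helpers for this tutorial; they are not part of LangChain or Ragas.
PLACEHOLDERS = {"your-profile-name", "your-region-name", "your-model-id"}


def bedrock_runtime_endpoint(region_name: str) -> str:
    """Build the Bedrock runtime URL with the same pattern as the BedrockChat cell."""
    return f"https://bedrock-runtime.{region_name}.amazonaws.com"


def unfilled_keys(config: dict) -> list[str]:
    """Return the config keys whose values are still the tutorial placeholders."""
    return [
        key
        for key, value in config.items()
        if isinstance(value, str) and value in PLACEHOLDERS
    ]


# Example: the region has been filled in, the profile and model id have not.
config = {
    "credentials_profile_name": "your-profile-name",
    "region_name": "us-east-1",
    "model_id": "your-model-id",
    "model_kwargs": {"temperature": 0.4},
}

missing = unfilled_keys(config)
if missing:
    print("Replace these placeholder values first:", missing)
print(bedrock_runtime_endpoint(config["region_name"]))
```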

Evaluation

Running the evaluation is as simple as calling evaluate on the Dataset with the metrics of your choice.

amnesty_qa

DatasetDict({
    eval: Dataset({
        features: ['question', 'ground_truth', 'answer', 'contexts'],
        num_rows: 20
    })
})
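Each row of the eval split carries the four columns listed above, and contexts holds the retrieved passages for a question as a list of strings. If you swap in your own data, a quick pre-flight check like the hypothetical one below (not Ragas' own validation) can catch a missing column, or a contexts value passed as a single string, before any Bedrock calls are made. The toy row is illustrative only.

```python
# Hypothetical pre-flight check; Ragas performs its own validation inside evaluate().
REQUIRED_COLUMNS = ("question", "ground_truth", "answer", "contexts")


def row_problems(row: dict) -> list[str]:
    """List the reasons a row would not fit the column layout shown above."""
    problems = [f"missing column: {name}" for name in REQUIRED_COLUMNS if name not in row]
    contexts = row.get("contexts")
    if contexts is not None and not (
        isinstance(contexts, list) and all(isinstance(c, str) for c in contexts)
    ):
        problems.append("contexts must be a list of strings")
    return problems


toy_row = {
    "question": "What does the report say about X?",
    "ground_truth": "It says Y.",
    "answer": "The report says Y.",
    "contexts": ["Passage one about X.", "Passage two about Y."],
}
print(row_problems(toy_row))  # prints []
```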

from ragas import evaluate
import nest_asyncio  # see the NOTE below

# NOTE: only needed when running in a Jupyter notebook; otherwise comment out or remove this call.
nest_asyncio.apply()

result = evaluate(
    amnesty_qa["eval"].select(range(3)),
    metrics=metrics,
    llm=bedrock_model,
    embeddings=bedrock_embeddings,
)

result

Evaluating: 100%|██████████| 12/12 [01:05<00:00,  5.48s/it]

{'faithfulness': 0.6250, 'context_recall': 1.0000, 'context_precision': 1.0000, 'harmfulness': 0.0000}

And there you have it: all the scores you need.
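About the nest_asyncio.apply() call in the evaluation cell: Jupyter already runs an asyncio event loop, and, as we understand it, the patch lets Ragas start its own asyncio work inside that running loop, which a plain script does not need. Instead of commenting the call out by hand, you could apply it only when a loop is already running. A minimal sketch using the standard library; event_loop_is_running is our own name, not a Ragas API.

```python
import asyncio


def event_loop_is_running() -> bool:
    """Return True when called from code running inside an active asyncio loop (e.g. a Jupyter cell)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False  # plain script: no loop is running, so no patching is needed
    return True


if event_loop_is_running():
    import nest_asyncio  # third-party; only imported when it is actually needed

    nest_asyncio.apply()
```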

Now, if you want to dig into the results and find examples where your pipeline performed poorly or particularly well, you can easily convert the result into a pandas DataFrame and use your standard analytics tools too!

df = result.to_pandas()
df.head()

	question	ground_truth	answer	contexts	faithfulness	context_recall	context_precision	harmfulness
0	What are the global implications of the USA Su...	[The global implications of the USA Supreme Co...	The global implications of the USA Supreme Cou...	[- In 2022, the USA Supreme Court handed down ...	NaN	NaN	1.0	NaN
1	Which companies are the main contributors to G...	[According to the Carbon Majors database, the ...	According to the Carbon Majors database, the m...	[- Fossil fuel companies, whether state or pri...	0.00	NaN	NaN	NaN
2	Which private companies in the Americas are th...	[The largest private companies in the Americas...	According to the Carbon Majors database, the l...	[The private companies responsible for the mos...	0.25	1.0	NaN	0.0
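A NaN in the table means no score was recorded for that row and metric, so it helps to separate unscored rows from genuinely low scores. In pandas that is df[df["faithfulness"] < 0.5] versus df[df["faithfulness"].isna()]. The same logic in plain Python, applied to the metric values of the three rows shown above (in the shape df.to_dict("records") would give, with the question text left out), looks roughly like this; the helper names are hypothetical.

```python
import math

nan = math.nan

# Metric values of rows 0-2 from the table above, in the shape of df.to_dict("records").
records = [
    {"faithfulness": nan, "context_recall": nan, "context_precision": 1.0, "harmfulness": nan},
    {"faithfulness": 0.00, "context_recall": nan, "context_precision": nan, "harmfulness": nan},
    {"faithfulness": 0.25, "context_recall": 1.0, "context_precision": nan, "harmfulness": 0.0},
]


def rows_below(records: list[dict], metric: str, threshold: float) -> list[int]:
    """Indices of rows that have a score for `metric` and whose score is below `threshold`."""
    return [
        i for i, r in enumerate(records)
        if not math.isnan(r[metric]) and r[metric] < threshold
    ]


def rows_unscored(records: list[dict], metric: str) -> list[int]:
    """Indices of rows where `metric` came back as NaN."""
    return [i for i, r in enumerate(records) if math.isnan(r[metric])]


print("low faithfulness:", rows_below(records, "faithfulness", 0.5))  # prints: low faithfulness: [1, 2]
print("no faithfulness score:", rows_unscored(records, "faithfulness"))  # prints: no faithfulness score: [0]
```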

Test Data Generation

Load the documents using the document loader of your choice.

from langchain_community.document_loaders import UnstructuredURLLoader

urls = [
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023",
]
loader = UnstructuredURLLoader(urls=urls)
documents = loader.load()

Now we have the documents as LangChain Document objects. The next step is to wrap the LLM and the embeddings model in the Ragas wrapper classes.

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings.base import LangchainEmbeddingsWrapper

bedrock_model = LangchainLLMWrapper(bedrock_model)
bedrock_embeddings = LangchainEmbeddingsWrapper(bedrock_embeddings)

The next step is to split the documents into chunks and store the chunks in an InMemoryDocumentStore.

from ragas.testset.extractor import KeyphraseExtractor
from langchain.text_splitter import TokenTextSplitter
from ragas.testset.docstore import InMemoryDocumentStore

splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=100)
keyphrase_extractor = KeyphraseExtractor(llm=bedrock_model)

docstore = InMemoryDocumentStore(
    splitter=splitter,
    embeddings=bedrock_embeddings,
    extractor=keyphrase_extractor,
)
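With chunk_size=1000 and chunk_overlap=100, TokenTextSplitter cuts each document into windows of at most 1000 tokens, and each new window starts 100 tokens before the previous one ended, so neighboring chunks share some context. The sketch below reproduces that sliding-window idea on a plain list standing in for tokens; the real splitter encodes and decodes text with a tokenizer, and split_with_overlap is a hypothetical name.

```python
def split_with_overlap(tokens: list, chunk_size: int, chunk_overlap: int) -> list[list]:
    """Sliding-window chunking: each chunk repeats the last `chunk_overlap` tokens of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return chunks


# A 2,500-token document with the settings used above yields windows starting at 0, 900 and 1800.
doc_tokens = list(range(2500))
chunks = split_with_overlap(doc_tokens, chunk_size=1000, chunk_overlap=100)
print([(c[0], len(c)) for c in chunks])  # prints [(0, 1000), (900, 1000), (1800, 700)]
```

A larger overlap lowers the chance that a fact is cut in half at a chunk boundary, at the cost of more chunks (and more embedding and keyphrase-extraction calls) per document.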

Initialize the TestsetGenerator with the required arguments and generate the data.

from ragas.testset import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

test_generator = TestsetGenerator(
    generator_llm=bedrock_model,
    critic_llm=bedrock_model,
    embeddings=bedrock_embeddings,
    docstore=docstore,
)

distributions = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}

# use test_generator.generate_with_llamaindex_docs if you use LlamaIndex as the document loader
testset = test_generator.generate_with_langchain_docs(
    documents=documents, test_size=10, distributions=distributions
)
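The distributions dict gives the share of each question type: with test_size=10, the weights 0.5, 0.25 and 0.25 correspond to 5, 2.5 and 2.5 questions, so some rounding has to happen. Ragas does its own allocation internally, which may differ from this; the hypothetical allocate_counts below only illustrates one reasonable scheme (largest-remainder rounding), using plain strings in place of the simple, reasoning and multi_context objects.

```python
import math


def allocate_counts(distributions: dict[str, float], test_size: int) -> dict[str, int]:
    """Split test_size into whole counts per question type (largest-remainder rounding)."""
    if not math.isclose(sum(distributions.values()), 1.0, abs_tol=1e-6):
        raise ValueError("distribution weights should sum to 1")
    exact = {name: weight * test_size for name, weight in distributions.items()}
    counts = {name: math.floor(value) for name, value in exact.items()}
    leftover = test_size - sum(counts.values())
    # Give the remaining questions to the largest fractional parts; sorted() is stable,
    # so ties keep the order in which the types were listed.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


print(allocate_counts({"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}, test_size=10))
# prints {'simple': 5, 'reasoning': 3, 'multi_context': 2}
```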

Export the results into pandas

test_df = testset.to_pandas()
test_df.head()

And that's it!

If you have any suggestions, feedback, or things you're not happy about, please share them in the issue section. We love hearing from you 😁
