Using Amazon Bedrock¶
Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model best suited for your use case.
This tutorial will show you how to use Amazon Bedrock with Ragas.
Note
This guide is for folks who are using the Amazon Bedrock endpoints. Check the evaluation guide if you're using OpenAI endpoints.
Metrics¶
Load sample dataset¶
# data
from datasets import load_dataset
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
amnesty_qa
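To get a feel for the data, you can peek at a single row of the eval split (a quick, optional check; the field names match the dataset's features: question, ground_truth, answer, contexts):
# optional: inspect one row of the eval split
sample = amnesty_qa["eval"][0]
print(sample["question"])
print(sample["contexts"][:1])  # just the first retrieved context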
Let's import the metrics that we are going to use.
from ragas.metrics import (
context_precision,
faithfulness,
context_recall,
)
from ragas.metrics.critique import harmfulness
# list of metrics we're going to use
metrics = [
faithfulness,
context_recall,
context_precision,
harmfulness,
]
Now let's use the LLM from Bedrock via the BedrockChat
class from LangChain. Initialize a new instance of BedrockChat
with the model_id
of the model you want to use. You will also need a BedrockEmbeddings
instance to pass to the evaluate function alongside the metrics we use.
from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings
config = {
"credentials_profile_name": "your-profile-name", # E.g "default"
"region_name": "your-region-name", # E.g. "us-east-1"
"model_id": "your-model-id", # E.g "anthropic.claude-v2"
"model_kwargs": {"temperature": 0.4},
}
bedrock_model = BedrockChat(
credentials_profile_name=config["credentials_profile_name"],
region_name=config["region_name"],
endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
model_id=config["model_id"],
model_kwargs=config["model_kwargs"],
)
# init the embeddings
bedrock_embeddings = BedrockEmbeddings(
credentials_profile_name=config["credentials_profile_name"],
region_name=config["region_name"],
)
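Before running a full evaluation, it can be worth a quick sanity check that your credentials and model access are configured correctly (a minimal, optional sketch; the prompt and query text here are arbitrary):
# optional sanity check: confirm the model and embeddings respond
print(bedrock_model.invoke("Hello, Bedrock!").content)  # arbitrary test prompt
print(len(bedrock_embeddings.embed_query("test sentence")))  # embedding dimension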
Now we can use the LLM and embeddings with Bedrock
by passing them to the evaluate function.
Evaluation¶
Running the evaluation is as simple as calling evaluate on the Dataset
with the metrics of your choice.
amnesty_qa
DatasetDict({
eval: Dataset({
features: ['question', 'ground_truth', 'answer', 'contexts'],
num_rows: 20
})
})
from ragas import evaluate
import nest_asyncio
# nest_asyncio is only needed when running inside a Jupyter notebook;
# otherwise the import and the apply() call below can be removed.
nest_asyncio.apply()
result = evaluate(
amnesty_qa["eval"].select(range(3)),
metrics=metrics,
llm=bedrock_model,
embeddings=bedrock_embeddings,
)
result
Evaluating: 100%|██████████| 12/12 [01:05<00:00, 5.48s/it]
{'faithfulness': 0.6250, 'context_recall': 1.0000, 'context_precision': 1.0000, 'harmfulness': 0.0000}
And there you have it: all the scores you need.
Now, if you want to dig into the results and find examples where your pipeline performed poorly or particularly well, you can easily convert the result into a pandas DataFrame and use your standard analytics tools too!
df = result.to_pandas()
df.head()
| | question | ground_truth | answer | contexts | faithfulness | context_recall | context_precision | harmfulness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | What are the global implications of the USA Su... | [The global implications of the USA Supreme Co... | The global implications of the USA Supreme Cou... | [- In 2022, the USA Supreme Court handed down ... | NaN | NaN | 1.0 | NaN |
| 1 | Which companies are the main contributors to G... | [According to the Carbon Majors database, the ... | According to the Carbon Majors database, the m... | [- Fossil fuel companies, whether state or pri... | 0.00 | NaN | NaN | NaN |
| 2 | Which private companies in the Americas are th... | [The largest private companies in the Americas... | According to the Carbon Majors database, the l... | [The private companies responsible for the mos... | 0.25 | 1.0 | NaN | 0.0 |
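For example, to pull out the rows where faithfulness was low (a minimal sketch; the 0.5 threshold is an arbitrary choice):
# inspect rows with low faithfulness scores (NaN rows are excluded automatically)
low_faithfulness = df[df["faithfulness"] < 0.5]
low_faithfulness[["question", "answer", "faithfulness"]]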
Test Data Generation¶
Load the documents using your desired document loader.
from langchain_community.document_loaders import UnstructuredURLLoader
urls = [
"https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
"https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023",
]
loader = UnstructuredURLLoader(urls=urls)
documents = loader.load()
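It can be worth a quick check that the loader returned what you expect (optional; nothing Bedrock-specific here):
# optional: confirm the documents loaded correctly
print(len(documents))
print(documents[0].page_content[:200])  # first 200 characters of the first document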
Now we have the documents loaded as LangChain Document
objects. The next step is to wrap the embedding and LLM models in the Ragas schema.
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings.base import LangchainEmbeddingsWrapper
bedrock_model = LangchainLLMWrapper(bedrock_model)
bedrock_embeddings = LangchainEmbeddingsWrapper(bedrock_embeddings)
The next step is to create chunks from the documents and store them in an InMemoryDocumentStore
from ragas.testset.extractor import KeyphraseExtractor
from langchain.text_splitter import TokenTextSplitter
from ragas.testset.docstore import InMemoryDocumentStore
splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=100)
keyphrase_extractor = KeyphraseExtractor(llm=bedrock_model)
docstore = InMemoryDocumentStore(
splitter=splitter,
embeddings=bedrock_embeddings,
extractor=keyphrase_extractor,
)
Initialize the TestsetGenerator
with the required arguments and generate the data.
from ragas.testset import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
test_generator = TestsetGenerator(
generator_llm=bedrock_model,
critic_llm=bedrock_model,
embeddings=bedrock_embeddings,
docstore=docstore,
)
distributions = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}
# use test_generator.generate_with_llamaindex_docs if you use llama-index as your document loader
testset = test_generator.generate_with_langchain_docs(
documents=documents, test_size=10, distributions=distributions
)
Export the results into pandas¶
test_df = testset.to_pandas()
test_df.head()
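If you want to keep the generated test set around for later evaluation runs, you can persist the DataFrame (a minimal sketch; the filename is an arbitrary choice):
# save the generated test set to disk; the filename is arbitrary
test_df.to_csv("generated_testset.csv", index=False)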
And that's it!
If you have any suggestions, feedback, or things you're not happy about, please share them in the issues section. We love hearing from you 😁