Using Ragas Critic Model instead of GPT-4¶
Synthetic test data generation uses LLMs for two purposes:

- Generating QA pairs, evolutions, etc.
- Acting as a critic model that gives feedback on the generated QA pairs to ensure and improve their quality
We have built and open-sourced a custom critic model to use instead of GPT-4 (the default). The model is available here for free and can deliver up to 200 tokens per second on an A10 instance.
Follow the rest of the notebook to use this model as the critic model instead of GPT-4.
Initialize the required Models¶
from langchain_openai import ChatOpenAI
import os
from ragas.testset.prompts import (
context_scoring_prompt,
evolution_elimination_prompt,
filter_question_prompt,
)
from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
Setting up the generator model (gpt-3.5)¶
Any model can be used as the generator; here we use gpt-3.5. Check the docs to plug in other models.
os.environ["OPENAI_API_KEY"] = "key"
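If you prefer not to hardcode the key in the notebook, a minimal alternative (an assumption about your workflow, not part of the original notebook) is to prompt for it with getpass:

from getpass import getpass

# Ask for the key interactively if it is not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")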
Setting up the critic model¶

- The Ragas critic can generate up to 200 tokens/sec on a single A10 instance
- Host the model using vLLM
- Run this in your terminal with a GPU enabled:
python -m vllm.entrypoints.openai.api_server --model explodinggradients/Ragas-critic-llm-Qwen1.5-GPTQ
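Once the server is running, you can verify that it is reachable and has loaded the critic model by querying the OpenAI-compatible /v1/models endpoint. This is a quick optional check, assuming the default localhost:8000 address used below:

import requests

# List the models served by the local vLLM instance
resp = requests.get("http://localhost:8000/v1/models")
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])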
inference_server_url = "http://localhost:8000/v1"
MODEL = "explodinggradients/Ragas-critic-llm-Qwen1.5-GPTQ"
chat = ChatOpenAI(
    model=MODEL,
    openai_api_key="token-abc123",  # dummy key; requests go to the local vLLM server, not OpenAI
    openai_api_base=inference_server_url,
    max_tokens=2048,
    temperature=0,
)
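Before wiring the model into Ragas, a quick smoke test confirms the local endpoint responds (the prompt below is arbitrary):

# Send a trivial prompt to the locally hosted critic model
print(chat.invoke("Reply with the single word OK.").content)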
Set up custom Critic Model instead of GPT-4¶
# remove demonstrations from examples
for prompt in [
    context_scoring_prompt,
    evolution_elimination_prompt,
    filter_question_prompt,
]:
    prompt.examples = []
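As an optional sanity check (not part of the original notebook), you can confirm the few-shot demonstrations were cleared from each prompt:

# Expect [0, 0, 0] after the loop above
print([len(p.examples) for p in [context_scoring_prompt, evolution_elimination_prompt, filter_question_prompt]])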
from ragas.testset.filters import QuestionFilter, EvolutionFilter, NodeFilter
from ragas.llms import LangchainLLMWrapper
langchain_llm = LangchainLLMWrapper(chat)
qa_filter = QuestionFilter(langchain_llm, filter_question_prompt)
node_filter = NodeFilter(langchain_llm, context_scoring_prompt=context_scoring_prompt)
evolution_filter = EvolutionFilter(langchain_llm, evolution_elimination_prompt)
distributions = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}
# customise the filters
from ragas.testset.evolutions import ComplexEvolution
for evolution in distributions:
    if evolution.question_filter is None:
        evolution.question_filter = qa_filter
    if evolution.node_filter is None:
        evolution.node_filter = node_filter
    if isinstance(evolution, ComplexEvolution):
        if evolution.evolution_filter is None:
            evolution.evolution_filter = evolution_filter
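To confirm the custom critic is now wired into every evolution, a quick optional check (not part of the original notebook) is to print the filter classes attached to each evolution:

# Each evolution should report the custom QuestionFilter and NodeFilter set above
for evolution in distributions:
    print(
        type(evolution).__name__,
        type(evolution.question_filter).__name__,
        type(evolution.node_filter).__name__,
    )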
Loading data¶
! git clone https://huggingface.co/datasets/explodinggradients/prompt-engineering-guide-papers
loader = DirectoryLoader("./prompt-engineering-guide-papers/", glob="*.pdf")
documents = loader.load()
for document in documents:
    document.metadata["filename"] = document.metadata["source"]
documents = [doc for doc in documents if len(doc.page_content.split()) > 5000]
len(documents)
4
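To see which papers survived the word-count filter and how large they are, a small inspection step (not part of the original notebook) is:

# Print the filename and approximate word count of each remaining document
for doc in documents:
    print(doc.metadata["filename"], "-", len(doc.page_content.split()), "words")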
Generating¶
generator = TestsetGenerator.with_openai(chunk_size=512)
testset = generator.generate_with_langchain_docs(
documents[:10],
test_size=10,
raise_exceptions=False,
with_debugging_logs=False,
distributions=distributions,
)
/tmp/ipykernel_10833/120543537.py:1: DeprecationWarning: The function with_openai was deprecated in 0.1.4, and will be removed in the 0.2.0 release. Use from_langchain instead.
generator = TestsetGenerator.with_openai(chunk_size=512)
Generating: 90%|█████████ | 9/10 [00:12<00:01, 1.44s/it] Failed to parse output. Returning None.
Failed to parse output. Returning None.
Generating: 100%|██████████| 10/10 [00:18<00:00, 1.88s/it]
testset.to_pandas()
| | question | contexts | ground_truth | evolution_type | metadata | episode_done |
|---|---|---|---|---|---|---|
| 0 | What is GPT-Neo and its significance in the fi... | [ in robotic affordances, 2022. URL https://ar... | GPT-Neo is a large-scale autoregressive langua... | simple | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 1 | What action did the assistant take after findi... | [ can you bring me some chips.\n\nExplanation:... | nan | simple | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 2 | What is the bootstrapping version of Auto-CoT ... | [\n8\n\n9 10\n\nFigure 6: Effect of wrong demo... | The bootstrapping version of Auto-CoT is calle... | simple | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 3 | What is the purpose or function of Few-Shot-CoT? | [ candy last her? A: Megan received 11 pieces ... | nan | simple | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 4 | What is the focus of the paper "Zero-shot text... | [, China. Association for Computational Lingui... | The focus of the paper "Zero-shot text classif... | simple | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 5 | How can diversity-based sampling in Auto-CoT m... | [ multiple similar questions inside a frequent... | The clustering-based sampling method in Auto-C... | reasoning | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 6 | What error category did the model miss when de... | [ was missed by the model. An example of this ... | one step missing error | reasoning | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 7 | Q: If Luke made 9 dollars mowing lawns and 18 ... | [ pick up 9 trays from one table and 7 trays f... | Let’s think step by step. Luke made 9 dollars ... | multi_context | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 8 | How can the number of trees planted by the gro... | [ION: Can you bring me something salty?\n\nMOD... | There are 21 trees after the grove workers pla... | multi_context | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 9 | Q: If Megan received 11 pieces of candy from n... | [ the number of trees they planted. So, they m... | Megan received a total of 16 pieces of candy a... | multi_context | [{'source': 'prompt-engineering-guide-papers/2... | True |
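If you want to reuse the generated test set in later evaluation runs, one option is to export the DataFrame to disk; the filename below is arbitrary:

# Persist the generated test set for later use; the path is just an example
testset.to_pandas().to_csv("testset.csv", index=False)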