Using Ragas Critic Model instead of GPT-4

Synthetic test data generation uses LLMs for two purposes:

  1. Generating QA pairs, evolving them, etc.

  2. Acting as a critic model that gives feedback on the generated QA pairs to ensure and improve their quality

We have built and open-sourced a custom critic model that can be used instead of GPT-4 (the default). This model is available here for free and can deliver up to 200 tokens per second on an A10 instance.

Follow the rest of the notebook to use this model as the critic model instead of GPT-4.

Importing required modules

from langchain_openai import ChatOpenAI
import os

from ragas.testset.prompts import (
    context_scoring_prompt,
    evolution_elimination_prompt,
    filter_question_prompt,
)
from langchain.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

Setting up generator Model (gpt-3.5)

Any model can be used as the generator. Here we use gpt-3.5; check the docs to use other models.

os.environ["OPENAI_API_KEY"] = "key"

Setting up Model

The Ragas critic can generate up to 200 tokens/sec on a single A10 instance.

Host the model using vLLM

Run this in a terminal on a GPU-enabled machine

python -m vllm.entrypoints.openai.api_server --model explodinggradients/Ragas-critic-llm-Qwen1.5-GPTQ
inference_server_url = "http://localhost:8000/v1"
MODEL = "explodinggradients/Ragas-critic-llm-Qwen1.5-GPTQ"
chat = ChatOpenAI(
    model=MODEL,
    openai_api_base=inference_server_url,  # send requests to the local vLLM server
)
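
Before wiring the critic into Ragas, it can help to confirm that the vLLM server is actually serving the model. The following is a minimal sketch, assuming the server started with the command above exposes the OpenAI-compatible `/v1/models` endpoint. The helper names `models_url`, `model_is_served`, and `fetch_models` are hypothetical, and only the standard library is used.

```python
import json
import urllib.request

INFERENCE_SERVER_URL = "http://localhost:8000/v1"
MODEL = "explodinggradients/Ragas-critic-llm-Qwen1.5-GPTQ"


def models_url(base_url: str) -> str:
    """Build the model-listing URL for an OpenAI-compatible server."""
    return base_url.rstrip("/") + "/models"


def model_is_served(payload: dict, model_id: str) -> bool:
    """Return True if model_id appears in a /v1/models response body."""
    return any(entry.get("id") == model_id for entry in payload.get("data", []))


def fetch_models(base_url: str, timeout: float = 5.0) -> dict:
    """Query the running server; raises URLError if it is not reachable."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `model_is_served(fetch_models(INFERENCE_SERVER_URL), MODEL)` should return `True`; a connection error usually means the server has not finished loading the model yet.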

Set up custom Critic Model instead of GPT-4

# remove demonstrations from examples
for prompt in [
    context_scoring_prompt,
    evolution_elimination_prompt,
    filter_question_prompt,
]:
    prompt.examples = []
from ragas.testset.filters import QuestionFilter, EvolutionFilter, NodeFilter

from ragas.llms import LangchainLLMWrapper

langchain_llm = LangchainLLMWrapper(chat)

qa_filter = QuestionFilter(langchain_llm, filter_question_prompt)
node_filter = NodeFilter(langchain_llm, context_scoring_prompt=context_scoring_prompt)
evolution_filter = EvolutionFilter(langchain_llm, evolution_elimination_prompt)
distributions = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}
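
To see what these weights imply, the sketch below converts a distribution into expected question counts for a given test size. It uses plain strings as stand-ins for the Ragas evolution objects, and `expected_counts` is a hypothetical helper, not part of Ragas. The actual split in a run may differ because counts must be whole numbers and some questions can be filtered out.

```python
def expected_counts(weights: dict, test_size: int) -> dict:
    """Scale each evolution weight by the requested number of questions."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1.0, got {total}")
    return {name: weight * test_size for name, weight in weights.items()}


# String keys stand in for the simple, reasoning and multi_context evolutions.
weights = {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}
counts = expected_counts(weights, test_size=10)
```
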
# customise the filters
from ragas.testset.evolutions import ComplexEvolution

for evolution in distributions:
    if evolution.question_filter is None:
        evolution.question_filter = qa_filter
    if evolution.node_filter is None:
        evolution.node_filter = node_filter

    if isinstance(evolution, ComplexEvolution):
        if evolution.evolution_filter is None:
            evolution.evolution_filter = evolution_filter
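
The loop above only fills in filters that are still unset, so an evolution that already carries its own filter keeps it. A minimal illustration of that pattern follows, using hypothetical stand-in classes (`StubEvolution`, `StubComplexEvolution`) and string placeholders rather than the real Ragas types:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubEvolution:
    """Stand-in for a simple evolution with two optional filters."""
    question_filter: Optional[str] = None
    node_filter: Optional[str] = None


@dataclass
class StubComplexEvolution(StubEvolution):
    """Stand-in for an evolution that also has an evolution filter."""
    evolution_filter: Optional[str] = None


def attach_filters(evolutions, qa_filter, node_filter, evolution_filter):
    """Assign critic-backed filters only where none has been set yet."""
    for evolution in evolutions:
        if evolution.question_filter is None:
            evolution.question_filter = qa_filter
        if evolution.node_filter is None:
            evolution.node_filter = node_filter
        if isinstance(evolution, StubComplexEvolution):
            if evolution.evolution_filter is None:
                evolution.evolution_filter = evolution_filter
    return evolutions
```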

Loading data

! git clone
loader = DirectoryLoader("./prompt-engineering-guide-papers/", glob="*.pdf")
documents = loader.load()

for document in documents:
    document.metadata["filename"] = document.metadata["source"]

documents = [doc for doc in documents if len(doc.page_content.split()) > 5000]
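
The filter above keeps only documents with more than 5000 whitespace-separated words. The small sketch below checks how many documents would survive a given threshold before committing to it. `StubDoc` is a hypothetical stand-in for a LangChain document with a `page_content` attribute, and `keep_long_documents` is an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class StubDoc:
    """Stand-in for a loaded document; only page_content is used here."""
    page_content: str


def word_count(doc) -> int:
    """Count whitespace-separated tokens, as the filter above does."""
    return len(doc.page_content.split())


def keep_long_documents(docs, min_words: int = 5000):
    """Return the documents with strictly more than min_words words."""
    return [doc for doc in docs if word_count(doc) > min_words]
```

Note that the comparison is strict, so a document with exactly 5000 words is dropped.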


generator = TestsetGenerator.with_openai(chunk_size=512)
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions=distributions,
)
/tmp/ipykernel_10833/ DeprecationWarning: The function with_openai was deprecated in 0.1.4, and will be removed in the 0.2.0 release. Use from_langchain instead.
  generator = TestsetGenerator.with_openai(chunk_size=512)
Generating:  90%|█████████ | 9/10 [00:12<00:01,  1.44s/it]        Failed to parse output. Returning None.
Failed to parse output. Returning None.
Generating: 100%|██████████| 10/10 [00:18<00:00,  1.88s/it]
|   | question | contexts | ground_truth | evolution_type | metadata | episode_done |
|---|---|---|---|---|---|---|
| 0 | What is GPT-Neo and its significance in the fi... | [ in robotic affordances, 2022. URL https://ar... | GPT-Neo is a large-scale autoregressive langua... | simple | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 1 | What action did the assistant take after findi... | [ can you bring me some chips.\n\nExplanation:... | nan | simple | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 2 | What is the bootstrapping version of Auto-CoT ... | [\n8\n\n9 10\n\nFigure 6: Effect of wrong demo... | The bootstrapping version of Auto-CoT is calle... | simple | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 3 | What is the purpose or function of Few-Shot-CoT? | [ candy last her? A: Megan received 11 pieces ... | nan | simple | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 4 | What is the focus of the paper "Zero-shot text... | [, China. Association for Computational Lingui... | The focus of the paper "Zero-shot text classif... | simple | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 5 | How can diversity-based sampling in Auto-CoT m... | [ multiple similar questions inside a frequent... | The clustering-based sampling method in Auto-C... | reasoning | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 6 | What error category did the model miss when de... | [ was missed by the model. An example of this ... | one step missing error | reasoning | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 7 | Q: If Luke made 9 dollars mowing lawns and 18 ... | [ pick up 9 trays from one table and 7 trays f... | Let’s think step by step. Luke made 9 dollars ... | multi_context | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 8 | How can the number of trees planted by the gro... | [ION: Can you bring me something salty?\n\nMOD... | There are 21 trees after the grove workers pla... | multi_context | [{'source': 'prompt-engineering-guide-papers/2... | True |
| 9 | Q: If Megan received 11 pieces of candy from n... | [ the number of trees they planted. So, they m... | Megan received a total of 16 pieces of candy a... | multi_context | [{'source': 'prompt-engineering-guide-papers/2... | True |
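
A quick way to sanity-check a generated test set is to tally the evolution types and count the missing ground truths. The sketch below does this with the standard library on values copied from the table above. `rows` is a hand-built stand-in for the real test set, and any link between the `nan` answers and the "Failed to parse output" messages is an assumption, not something the output confirms.

```python
from collections import Counter

# (evolution_type, has_ground_truth) for rows 0-9 of the table above.
rows = [
    ("simple", True),
    ("simple", False),   # row 1: ground_truth is nan
    ("simple", True),
    ("simple", False),   # row 3: ground_truth is nan
    ("simple", True),
    ("reasoning", True),
    ("reasoning", True),
    ("multi_context", True),
    ("multi_context", True),
    ("multi_context", True),
]

# Tally how many questions each evolution type produced.
evolution_counts = Counter(evolution for evolution, _ in rows)

# Count the rows whose ground truth is missing.
missing_ground_truth = sum(1 for _, has_answer in rows if not has_answer)
```

For this run the tally comes out to 5 simple, 2 reasoning, and 3 multi_context questions, with 2 missing ground truths; rows like these may be worth dropping or regenerating before using the set for evaluation.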