Google Gemini Integration Guide

This guide covers setting up and using Google's Gemini models with Ragas for evaluation.

Overview

Ragas supports Google Gemini models with automatic adapter selection. The framework works with both the new google-genai SDK (recommended) and the legacy google-generativeai SDK.

Setup

Prerequisites

Google API Key with Gemini API access
Python 3.8+
Ragas installed

Installation

Install required dependencies:

# Recommended: New Google GenAI SDK
pip install ragas google-genai

# Legacy (deprecated, support ends Aug 2025)
pip install ragas google-generativeai
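
If you are unsure which of the two SDKs is present in an environment, a quick check along these lines may help. This is a minimal sketch using only the standard library; `detect_gemini_sdk` is a hypothetical helper name, not part of Ragas, and the module names are the import paths used in the configuration examples in this guide.

```python
import importlib.util


def detect_gemini_sdk():
    """Return the pip name of the first importable Gemini SDK, or None."""
    # (import path, pip package name); the new SDK is checked first.
    candidates = [
        ("google.genai", "google-genai"),
        ("google.generativeai", "google-generativeai"),
    ]
    for module_name, pip_name in candidates:
        try:
            spec = importlib.util.find_spec(module_name)
        except (ImportError, ValueError):
            # Raised when the parent "google" package is missing or malformed.
            spec = None
        if spec is not None:
            return pip_name
    return None


print(detect_gemini_sdk())
```

A result of `None` means neither SDK is importable and one of the install commands above still needs to be run.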

Configuration

Option 1: Using New Google GenAI SDK (Recommended)

The new google-genai SDK is the recommended approach:

import os
from google import genai
from ragas.llms import llm_factory

# Create client with API key
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))

# Create LLM - adapter is auto-detected for google provider
llm = llm_factory(
    "gemini-2.0-flash",
    provider="google",
    client=client
)

Option 2: Using Legacy SDK (Deprecated)

The old google-generativeai SDK still works but is deprecated (support ends Aug 2025):

import os
import google.generativeai as genai
from ragas.llms import llm_factory

# Configure with your API key
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))

# Create client
client = genai.GenerativeModel("gemini-2.0-flash")

# Create LLM
llm = llm_factory(
    "gemini-2.0-flash",
    provider="google",
    client=client
)

Option 3: Using LiteLLM Proxy (Advanced)

For advanced use cases where you need LiteLLM's proxy capabilities, set up the LiteLLM proxy server first, then use:

import os
from openai import OpenAI
from ragas.llms import llm_factory

# Requires running: litellm --model gemini-2.0-flash
client = OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"  # LiteLLM proxy endpoint
)

# Create LLM with explicit adapter selection
llm = llm_factory("gemini-2.0-flash", client=client, adapter="litellm")

Supported Models

Ragas works with all Gemini models:

Latest: gemini-2.0-flash (recommended)
1.5 Series: gemini-1.5-pro, gemini-1.5-flash
1.0 Series: gemini-1.0-pro

For the latest models and pricing, see Google AI Studio.

Embeddings Configuration

Ragas metrics fall into two categories:

LLM-only metrics (don't require embeddings):
ContextPrecision
ContextRecall
Faithfulness
AspectCritic

Embedding-dependent metrics (require embeddings):
AnswerCorrectness
AnswerRelevancy
AnswerSimilarity
SemanticSimilarity
ContextEntityRecall
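
The two groups above can be turned into a small lookup that tells you whether a planned metric set needs embeddings at all. This is an illustrative sketch only: `needs_embeddings` is a hypothetical helper, and the sets contain just the metric names listed in this guide, so anything else is rejected rather than guessed.

```python
# Metric names copied from the two lists in this guide.
LLM_ONLY = {"ContextPrecision", "ContextRecall", "Faithfulness", "AspectCritic"}
EMBEDDING_DEPENDENT = {
    "AnswerCorrectness",
    "AnswerRelevancy",
    "AnswerSimilarity",
    "SemanticSimilarity",
    "ContextEntityRecall",
}


def needs_embeddings(metric_names):
    """Return True if any named metric belongs to the embedding-dependent group."""
    unknown = set(metric_names) - LLM_ONLY - EMBEDDING_DEPENDENT
    if unknown:
        raise ValueError(f"Metrics not covered by this guide: {sorted(unknown)}")
    return any(name in EMBEDDING_DEPENDENT for name in metric_names)


print(needs_embeddings(["Faithfulness", "ContextRecall"]))      # False
print(needs_embeddings(["Faithfulness", "AnswerCorrectness"]))  # True
```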

Automatic Provider Matching

When using Ragas with Gemini, the embedding provider is automatically matched to your LLM provider. If you provide a Gemini LLM, Ragas will default to using Google embeddings. No OpenAI API key is needed.

Option 1: Default Embeddings (Recommended)

Let Ragas automatically select the right embeddings based on your LLM:

import os
from datasets import Dataset
from google import genai
from ragas import evaluate
from ragas.llms import llm_factory
from ragas.metrics import (
    AnswerCorrectness,
    ContextPrecision,
    ContextRecall,
    Faithfulness
)

# Initialize Gemini client (new SDK)
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# Create sample evaluation data
data = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["France is a country in Western Europe. Paris is its capital."]],
    "ground_truth": ["Paris"]
}

dataset = Dataset.from_dict(data)

# Define metrics - embeddings are auto-configured for Google
metrics = [
    ContextPrecision(llm=llm),
    ContextRecall(llm=llm),
    Faithfulness(llm=llm),
    AnswerCorrectness(llm=llm)  # Uses Google embeddings automatically
]

# Run evaluation
results = evaluate(dataset, metrics=metrics)
print(results)
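
Before handing a dictionary like the one above to `Dataset.from_dict`, it can be worth checking its shape, since mismatched column lengths tend to fail late. The sketch below assumes the four column names used in this example; `check_eval_data` is a hypothetical helper and does not describe Ragas's own schema validation.

```python
REQUIRED_COLUMNS = ("question", "answer", "contexts", "ground_truth")


def check_eval_data(data):
    """Validate the example's column layout and return the number of rows."""
    missing = [col for col in REQUIRED_COLUMNS if col not in data]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    lengths = {col: len(data[col]) for col in REQUIRED_COLUMNS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Columns have different lengths: {lengths}")
    # Each row's contexts must be a list of strings, not a bare string.
    for i, ctx in enumerate(data["contexts"]):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            raise ValueError(f"contexts[{i}] must be a list of strings")
    return lengths["question"]


sample = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["France is a country in Western Europe. Paris is its capital."]],
    "ground_truth": ["Paris"],
}
print(check_eval_data(sample))  # 1
```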

Option 2: Explicit Embeddings

For explicit control over embeddings, you can create them separately. Google embeddings work with multiple configuration options:

import os
from google import genai
from ragas.llms import llm_factory
from ragas.embeddings import GoogleEmbeddings
from ragas.embeddings.base import embedding_factory
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import AnswerCorrectness, ContextPrecision, ContextRecall, Faithfulness

# Initialize Gemini client (new SDK)
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# Initialize Google embeddings. Options A-C are alternatives: keep only one,
# because each assignment below overwrites the previous one.

# Option A: Using the same client (recommended for new SDK)
embeddings = GoogleEmbeddings(client=client, model="gemini-embedding-001")

# Option B: Using embedding factory
embeddings = embedding_factory("google", model="gemini-embedding-001")

# Option C: Auto-import (creates client automatically)
embeddings = GoogleEmbeddings(model="gemini-embedding-001")

# Create sample evaluation data
data = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["France is a country in Western Europe. Paris is its capital."]],
    "ground_truth": ["Paris"]
}

dataset = Dataset.from_dict(data)

# Define metrics with explicit embeddings
metrics = [
    ContextPrecision(llm=llm),
    ContextRecall(llm=llm),
    Faithfulness(llm=llm),
    AnswerCorrectness(llm=llm, embeddings=embeddings)
]

# Run evaluation
results = evaluate(dataset, metrics=metrics)
print(results)

Example: Complete Evaluation

Here's a complete example evaluating a RAG application with Gemini (using automatic embedding provider matching):

import os
from datasets import Dataset
from google import genai
from ragas import evaluate
from ragas.llms import llm_factory
from ragas.metrics import (
    AnswerCorrectness,
    ContextPrecision,
    ContextRecall,
    Faithfulness
)

# Initialize Gemini client (new SDK)
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# Create sample evaluation data
data = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["France is a country in Western Europe. Paris is its capital."]],
    "ground_truth": ["Paris"]
}

dataset = Dataset.from_dict(data)

# Define metrics - embeddings automatically use Google provider
metrics = [
    ContextPrecision(llm=llm),
    ContextRecall(llm=llm),
    Faithfulness(llm=llm),
    AnswerCorrectness(llm=llm)
]

# Run evaluation
results = evaluate(dataset, metrics=metrics)
print(results)

Performance Considerations

Model Selection

gemini-2.0-flash: Best for speed and efficiency
gemini-1.5-pro: Better reasoning for complex evaluations
gemini-1.5-flash: Good balance of speed and cost

Cost Optimization

Gemini models are cost-effective. For large-scale evaluations:

Use gemini-2.0-flash for most metrics
Consider batch processing for multiple evaluations
Cache prompts when possible (Gemini supports prompt caching)
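
For the batch-processing suggestion, one simple approach is to split the evaluation rows into fixed-size chunks and evaluate each chunk in turn. The helper below is a generic standard-library sketch; `iter_batches` and the batch size are hypothetical choices, not a Ragas feature.

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield successive lists containing at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


questions = [f"question {i}" for i in range(5)]
for batch in iter_batches(questions, batch_size=2):
    print(len(batch), batch)
```

Each batch could then be wrapped in its own dataset and passed to the evaluation call, which keeps a failure in one batch from discarding the results of the others.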

Async Support

For high-throughput evaluations, use async operations:

import os
from google import genai
from ragas.llms import llm_factory

# Create client (new SDK)
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# Use in async evaluation
# response = await llm.agenerate(prompt, ResponseModel)
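
When issuing many async calls at once, it usually helps to cap how many are in flight so you stay under the rate limits. The sketch below shows that pattern with `asyncio.Semaphore`; `score_one` is a stand-in coroutine (an assumption for illustration) that you would replace with your own async LLM or metric call.

```python
import asyncio


async def score_one(sample):
    """Stand-in for an async LLM or metric call; replace with your own coroutine."""
    await asyncio.sleep(0.01)  # simulates network latency
    return len(sample)


async def score_all(samples, max_concurrency=4):
    """Run score_one over all samples with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(sample):
        async with semaphore:
            return await score_one(sample)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(s) for s in samples))


results = asyncio.run(score_all(["a", "bb", "ccc"]))
print(results)  # [1, 2, 3]
```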

Adapter Selection

Ragas automatically selects the appropriate adapter based on your setup:

# Adapter auto-detection needs no configuration
# For Gemini: uses LiteLLM adapter
# For other providers: uses Instructor adapter

# Explicit selection (if needed)
llm = llm_factory(
    "gemini-2.0-flash",
    client=client,
    adapter="litellm"  # Explicit adapter selection
)

# Check auto-detected adapter
from ragas.llms.adapters import auto_detect_adapter
adapter_name = auto_detect_adapter(client, "google")
print(f"Using adapter: {adapter_name}")  # Output: Using adapter: litellm

Troubleshooting

API Key Issues

# Make sure your API key is set
import os
if not os.environ.get("GOOGLE_API_KEY"):
    raise ValueError("GOOGLE_API_KEY environment variable not set")

Known Issue: Instructor Safety Settings (New SDK)

There is a known upstream issue with the instructor library where it sends invalid safety settings to the Gemini API when using the new google-genai SDK. This may cause errors like:

Invalid value at 'safety_settings[5].category'... "HARM_CATEGORY_JAILBREAK"

Workarounds:

Use the OpenAI-compatible endpoint (recommended for now):

import os
from openai import OpenAI
from ragas.llms import llm_factory

client = OpenAI(
    api_key=os.environ.get("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
llm = llm_factory("gemini-2.0-flash", provider="openai", client=client)

Track the upstream issue: instructor#1658

Note: Embeddings work correctly with the new SDK - this issue only affects LLM generation.

Rate Limits

Gemini enforces rate limits. In production, the LLM adapter handles retries and timeouts automatically. If you need fine-grained control, configure appropriate timeouts on the underlying HTTP client.
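
If you make calls outside the adapter, or want explicit control over retry timing, a generic backoff wrapper is one option. The sketch below is plain Python rather than a Ragas or Google API: `call_with_backoff` and its parameters are hypothetical names, and it retries on any exception, which you would normally narrow to your client's rate-limit error type.

```python
import random
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on exception with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:  # assumption: narrow this to your rate-limit error class
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The base delay doubles on each attempt: 1x, 2x, 4x, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use the default `time.sleep` applies.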

Model Availability

If a model isn't available:

Check your region/quota in Google Cloud Console
Try a different model from the supported list
Verify your API key has access to the Generative AI API
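
The "try a different model" step can be automated with a small fallback loop. This is a sketch under assumptions: `first_available` is a hypothetical helper, and `build` is any callable you supply that raises when a model cannot be used (for example, one that creates the LLM and sends a short test request, since constructing an LLM object may not contact the API by itself).

```python
def first_available(model_names, build):
    """Return (name, result) for the first model name that build() accepts."""
    errors = {}
    for name in model_names:
        try:
            return name, build(name)
        except Exception as exc:  # assumption: narrow to your client's error types
            errors[name] = exc
    raise RuntimeError(f"No model available: {errors}")


# Demonstration with a fake builder that only accepts one model name.
def fake_build(name):
    if name != "gemini-1.5-flash":
        raise ValueError(f"{name} unavailable")
    return f"llm<{name}>"


print(first_available(["gemini-2.0-flash", "gemini-1.5-flash"], fake_build))
```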

Migration from Other Providers

From OpenAI

import os
from ragas.llms import llm_factory

# Before: OpenAI-only
from openai import OpenAI
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
llm = llm_factory("gpt-4o", client=client)

# After: Gemini with new SDK
from google import genai
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

From Anthropic

import os
from ragas.llms import llm_factory

# Before: Anthropic
from anthropic import Anthropic
client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
llm = llm_factory("claude-3-sonnet", provider="anthropic", client=client)

# After: Gemini with new SDK
from google import genai
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

From Legacy google-generativeai SDK

import os
from ragas.llms import llm_factory

# Before: Legacy SDK (deprecated)
import google.generativeai as genai
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
client = genai.GenerativeModel("gemini-2.0-flash")
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# After: New SDK (recommended)
from google import genai
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

Using with Metrics Collections (Modern Approach)

For the modern metrics collections API, you need to explicitly create both LLM and embeddings:

import asyncio
import os
from google import genai
from ragas.llms import llm_factory
from ragas.embeddings import GoogleEmbeddings
from ragas.metrics.collections import AnswerCorrectness, ContextPrecision

# Create client (new SDK)
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))

# Create LLM
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# Create embeddings using the same client
embeddings = GoogleEmbeddings(client=client, model="gemini-embedding-001")

# Create metrics with explicit LLM and embeddings
metrics = [
    ContextPrecision(llm=llm),  # LLM-only metric
    AnswerCorrectness(llm=llm, embeddings=embeddings),  # Needs both
]

# ascore() is a coroutine, so it must run inside an event loop.
# In a notebook you can await it directly instead of using asyncio.run().
async def main():
    return await metrics[1].ascore(
        user_input="What is the capital of France?",
        response="Paris",
        reference="Paris is the capital of France."
    )

result = asyncio.run(main())

Key difference from the legacy approach:

Legacy evaluate(): Auto-creates embeddings from the LLM provider
Modern collections: You explicitly pass embeddings to each metric

Passing embeddings explicitly gives you more control over which embedding model each metric uses, and it works with Gemini as shown above.

Supported Metrics

All Ragas metrics work with Gemini:

Answer Correctness
Answer Relevancy
Answer Similarity
Aspect Critique
Context Precision
Context Recall
Context Entities Recall
Faithfulness
NLI Eval
Response Relevancy

See Metrics Reference for details.

Advanced: Custom Model Parameters

Pass custom parameters to Gemini:

# Assumes `client` was created as shown in the Configuration section
llm = llm_factory(
    "gemini-2.0-flash",
    client=client,
    temperature=0.5,
    max_tokens=2048,
    top_p=0.9,
    top_k=40,
)