DSPy Optimizer for Advanced Prompt Optimization
The DSPyOptimizer provides state-of-the-art prompt optimization for Ragas metrics using DSPy's MIPROv2 algorithm. It combines instruction and demonstration optimization to find better prompts than simple evolutionary approaches.
Overview
DSPyOptimizer uses MIPROv2 (Multiprompt Instruction PRoposal Optimizer, version 2) to optimize metric prompts through:
- Instruction optimization: Generates and tests multiple prompt variations
- Demonstration optimization: Automatically selects effective few-shot examples
- Combined search: Explores both instruction and demonstration spaces simultaneously
This typically produces better results than the simpler GeneticOptimizer, especially when you have high-quality annotated data.
Installation
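DSPy is an optional dependency. Install the dspy-ai package (the name given in the import error described under Troubleshooting) before using DSPyOptimizer.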
Basic Usage
Prerequisites
You need:
- Annotated dataset: Ground truth scores for your metric
- Metric with prompts: A metric that uses PydanticPrompt (most Ragas metrics)
- LLM: An LLM for optimization (gpt-4o-mini recommended for cost)
Quick Start
from openai import OpenAI
from ragas.llms import llm_factory
from ragas.metrics.collections import Faithfulness
from ragas.optimizers import DSPyOptimizer
from ragas.config import InstructionConfig
# Setup LLM for optimization
client = OpenAI()
llm = llm_factory("gpt-4o-mini", client=client)
# Initialize metric
metric = Faithfulness(llm=llm)
# Create annotated dataset (see below for format)
dataset = create_annotated_dataset()
# Configure DSPy optimizer
config = InstructionConfig(
    llm=llm,
    optimizer=DSPyOptimizer(
        num_candidates=10,  # Try 10 prompt variations
        max_bootstrapped_demos=5,  # Generate up to 5 examples
        max_labeled_demos=5,  # Use up to 5 human annotations
    ),
)
# Optimize the metric's prompts
metric.optimize_prompts(dataset, config)
# Save optimized prompts for reuse
metric.save_prompts("optimized_faithfulness.json")
Annotated Dataset Format
DSPy optimizer requires ground truth annotations:
from ragas.dataset_schema import (
    PromptAnnotation,
    SampleAnnotation,
    SingleMetricAnnotation,
)
# Create prompt annotations
prompt_annotation = PromptAnnotation(
    prompt_input={"user_input": "...", "response": "..."},
    prompt_output={"score": 0.9},  # Actual metric output
    edited_output=None,  # Or corrected output if needed
)
# Create sample with annotations
sample = SampleAnnotation(
    metric_input={"user_input": "...", "response": "..."},
    metric_output=0.9,  # Ground truth score
    prompts={"faithfulness_prompt": prompt_annotation},
    is_accepted=True,  # Whether to use in optimization
)
# Create dataset
dataset = SingleMetricAnnotation(
    name="faithfulness",
    samples=[sample, ...]  # Need 20-50+ samples for best results
)
Advanced Configuration
Optimization Parameters
Control MIPROv2 behavior:
optimizer = DSPyOptimizer(
    num_candidates=20,  # More candidates = better prompts, higher cost
    max_bootstrapped_demos=10,  # Auto-generated few-shot examples
    max_labeled_demos=10,  # Human-annotated examples to use
    init_temperature=1.0,  # Exploration temperature (0.0-2.0)
)
Parameter Guide:
| Parameter | Default | Description | Cost Impact |
|---|---|---|---|
| num_candidates | 10 | Prompt variations to try | High - linear scaling |
| max_bootstrapped_demos | 5 | Auto-generated examples | Medium - adds LLM calls |
| max_labeled_demos | 5 | Human annotations to use | Low - uses existing data |
| init_temperature | 1.0 | Exploration randomness | None - algorithmic only |
Cost Optimization
MIPROv2 optimization can be expensive. Reduce costs by:
# Budget-conscious configuration
budget_optimizer = DSPyOptimizer(
    num_candidates=5,  # Fewer candidates
    max_bootstrapped_demos=2,  # Fewer generated examples
    max_labeled_demos=3,  # More reliance on annotations
    init_temperature=0.5,  # Less exploration
)
# Use cheaper LLM for optimization
cheap_llm = llm_factory("gpt-4o-mini", client=client)
config = InstructionConfig(llm=cheap_llm, optimizer=budget_optimizer)
Cost Estimation:
- ~10-50 LLM calls per candidate
- ~5-10 calls per bootstrapped demo
- Total: num_candidates * 30 + max_bootstrapped_demos * 7 calls (approximate)
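The rule of thumb above can be turned into a quick budget check before launching a run. The sketch below is plain Python and does not call Ragas or DSPy. The function name estimate_llm_calls is hypothetical, and the per-candidate and per-demo figures are only the approximate ranges listed above, so treat the output as a rough guide rather than a measurement.

```python
def estimate_llm_calls(num_candidates, max_bootstrapped_demos):
    """Return (low, typical, high) LLM-call estimates for one optimization run.

    Uses the approximate figures from this guide: 10-50 calls per candidate
    and 5-10 calls per bootstrapped demo, with 30 and 7 as typical values.
    """
    low = num_candidates * 10 + max_bootstrapped_demos * 5
    typical = num_candidates * 30 + max_bootstrapped_demos * 7
    high = num_candidates * 50 + max_bootstrapped_demos * 10
    return low, typical, high


# Default settings: 10 candidates, 5 bootstrapped demos
print(estimate_llm_calls(10, 5))  # (125, 335, 550)
# Budget-conscious settings: 5 candidates, 2 bootstrapped demos
print(estimate_llm_calls(5, 2))  # (60, 164, 270)
```

Under these assumptions, the budget-conscious configuration roughly halves the typical call count compared with the defaults.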
Comparing with GeneticOptimizer
When to Use DSPyOptimizer
✅ Use DSPyOptimizer when:
- You have 50+ high-quality annotated examples
- You need the best possible metric accuracy
- You can afford 100-500 LLM calls for optimization
- You're optimizing critical production metrics
When to Use GeneticOptimizer
✅ Use GeneticOptimizer when:
- You have limited annotated data (<20 examples)
- You need faster, cheaper optimization
- You're doing initial prototyping
- Simple instruction-only optimization is sufficient
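The two checklists can be condensed into a small decision helper. This is only a sketch of the rules of thumb in this section: suggest_optimizer is a hypothetical name, it returns a plain string rather than an optimizer object, and the fallback for 20-49 examples is an assumption because the checklists do not cover that range.

```python
def suggest_optimizer(num_annotated, call_budget):
    """Suggest an optimizer name from the rules of thumb in this guide.

    num_annotated: number of high-quality annotated examples available.
    call_budget: rough number of LLM calls you can afford for optimization.
    """
    # DSPyOptimizer: 50+ annotated examples and at least ~100 LLM calls
    if num_annotated >= 50 and call_budget >= 100:
        return "DSPyOptimizer"
    # Assumption: with fewer examples or a tighter budget, fall back to the
    # cheaper optimizer. The guide gives no explicit rule for 20-49 examples.
    return "GeneticOptimizer"


print(suggest_optimizer(60, 400))  # DSPyOptimizer
print(suggest_optimizer(10, 400))  # GeneticOptimizer
```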
Side-by-Side Comparison
from ragas.optimizers import GeneticOptimizer, DSPyOptimizer
# Genetic optimizer - simpler, faster, cheaper
genetic_config = InstructionConfig(
    llm=llm,
    optimizer=GeneticOptimizer(
        max_steps=50,  # Evolution steps
        population_size=10,  # Population per generation
    ),
)
# DSPy optimizer - advanced, better results, more expensive
dspy_config = InstructionConfig(
    llm=llm,
    optimizer=DSPyOptimizer(
        num_candidates=10,
        max_bootstrapped_demos=5,
        max_labeled_demos=5,
    ),
)
# Compare results
metric_genetic = Faithfulness(llm=llm)
metric_genetic.optimize_prompts(dataset, genetic_config)
metric_dspy = Faithfulness(llm=llm)
metric_dspy.optimize_prompts(dataset, dspy_config)
# Evaluate on holdout set
test_scores_genetic = metric_genetic.batch_score(test_set)
test_scores_dspy = metric_dspy.batch_score(test_set)
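One way to compare the two score lists is to measure how far each one is from the human-annotated scores for the same holdout samples. The sketch below uses mean absolute error, which is an assumed comparison method and not necessarily how the figures in this guide were produced. It also assumes the scores have already been reduced to plain floats. The name mean_absolute_error and all numbers are illustrative only.

```python
def mean_absolute_error(predicted, expected):
    """Average absolute gap between metric scores and annotated scores."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not predicted:
        raise ValueError("need at least one score")
    return sum(abs(p - e) for p, e in zip(predicted, expected)) / len(predicted)


# Made-up numbers for illustration only, not real benchmark results
ground_truth = [1.0, 0.5, 0.0, 1.0]
scores_a = [0.9, 0.5, 0.2, 0.7]
scores_b = [1.0, 0.4, 0.1, 0.9]
print(mean_absolute_error(scores_a, ground_truth))
print(mean_absolute_error(scores_b, ground_truth))
```

A lower value means the optimized metric agrees more closely with the annotations.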
Typical Results:
| Metric | GeneticOptimizer | DSPyOptimizer | Improvement |
|---|---|---|---|
| Faithfulness | 0.82 | 0.89 | +8.5% |
| Answer Relevancy | 0.75 | 0.84 | +12% |
| Context Precision | 0.78 | 0.86 | +10% |
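The Improvement column is consistent with a relative gain over the GeneticOptimizer score rather than an absolute difference. As a worked check, the sketch below recomputes the Faithfulness row; relative_improvement is a hypothetical helper name.

```python
def relative_improvement(baseline, improved):
    """Relative change from baseline to improved, as a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100


# Faithfulness row: 0.82 -> 0.89
print(round(relative_improvement(0.82, 0.89), 1))  # 8.5
```

The absolute difference for the same row is 0.07, so it matters which of the two you report when sharing results.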
Working with Multiple Metrics
Optimize several metrics with the same approach:
from ragas.metrics.collections import (
    Faithfulness,
    AnswerRelevancy,
    ContextPrecision,
)
metrics = {
    "faithfulness": Faithfulness(llm=llm),
    "answer_relevancy": AnswerRelevancy(llm=llm),
    "context_precision": ContextPrecision(llm=llm),
}
# Optimize each metric
for name, metric in metrics.items():
    print(f"Optimizing {name}...")
    # Load metric-specific dataset
    dataset = load_annotated_dataset(name)
    # Optimize
    metric.optimize_prompts(dataset, dspy_config)
    # Save
    metric.save_prompts(f"optimized_{name}.json")
Troubleshooting
Import Error
If you get ImportError: DSPy optimizer requires dspy-ai, the optional dspy-ai package is missing from your environment. Install it and retry.
Optimization Takes Too Long
Reduce the number of LLM calls:
fast_optimizer = DSPyOptimizer(
    num_candidates=3,  # Minimum viable
    max_bootstrapped_demos=1,
    max_labeled_demos=3,
)
Poor Results
Common causes:
- Insufficient data: Need 20+ high-quality annotations
- Low-quality annotations: Ensure ground truth scores are accurate
- Underpowered LLM: gpt-4o-mini keeps costs down, but if the optimized prompts are weak, try gpt-4o or better for optimization
- Bad configuration: Try default parameters first
Memory Issues
MIPROv2 can use significant memory for large datasets:
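# Optimize on a subset of the data to limit memory use
from ragas.dataset_schema import SingleMetricAnnotation
def optimize_in_batches(metric, dataset: SingleMetricAnnotation, config, batch_size=20):
    # Split the dataset into consecutive batches of at most batch_size samples
    num_samples = len(dataset.samples)
    batches = [
        dataset.select(list(range(i, min(i + batch_size, num_samples))))
        for i in range(0, num_samples, batch_size)
    ]
    # Optimize on the first batch only; the remaining batches are left unused
    metric.optimize_prompts(batches[0], config)
# Example call with the metric and config defined earlier
optimize_in_batches(metric, dataset, dspy_config)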
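To see what the slicing in optimize_in_batches produces, the index arithmetic can be run on its own without Ragas. This is a standalone sketch, and batch_ranges is a hypothetical helper name.

```python
def batch_ranges(num_samples, batch_size):
    """Return the index ranges that cover num_samples in batch_size chunks."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        range(start, min(start + batch_size, num_samples))
        for start in range(0, num_samples, batch_size)
    ]


# 45 samples in batches of 20: two full batches and one partial batch
print([len(r) for r in batch_ranges(45, 20)])  # [20, 20, 5]
```

Only the first of these ranges is used for optimization, so with 45 samples and the default batch size, 25 annotations are left out of the run.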
Best Practices
Data Quality
- Diverse examples: Cover edge cases and common scenarios
- Accurate labels: Double-check ground truth scores
- Sufficient quantity: 50+ examples for production metrics
Optimization Strategy
- Start small: Test with 3-5 candidates first
- Iterate: Gradually increase parameters as needed
- Validate: Always test on a holdout set
- Cache: Save optimized prompts to avoid re-running
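For the validation step, a holdout set can be carved out of the annotated samples before optimization. The sketch below works on any Python sequence and uses only the standard library. split_holdout is a hypothetical helper, and the 20% holdout fraction is an assumed default rather than a recommendation from this guide.

```python
import random


def split_holdout(samples, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of samples and split it into (train, holdout) lists."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible across runs
    random.Random(seed).shuffle(shuffled)
    holdout_size = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[holdout_size:], shuffled[:holdout_size]


train, holdout = split_holdout(range(50))
print(len(train), len(holdout))  # 40 10
```

Optimize on the train portion only and score the holdout portion afterwards, so the comparison is not biased by examples the optimizer has already seen.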
Production Deployment
# 1. Optimize offline
metric = Faithfulness(llm=optimization_llm)
metric.optimize_prompts(training_dataset, dspy_config)
metric.save_prompts("production_faithfulness.json")
# 2. Load in production
production_metric = Faithfulness(llm=production_llm)
production_metric.load_prompts("production_faithfulness.json")
# 3. Use for evaluation
results = production_metric.batch_score(production_samples)
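To avoid re-running an expensive optimization when a saved prompt file already exists, step 1 can be guarded by a file check. This is a minimal sketch using only the standard library. optimize_once is a hypothetical helper, and run_optimization stands in for whatever function wraps the optimize and save calls in your setup.

```python
from pathlib import Path


def optimize_once(prompt_path, run_optimization):
    """Call run_optimization only if prompt_path does not exist yet.

    run_optimization is a caller-supplied function that performs the
    optimization and writes the prompt file to the given path.
    Returns True if optimization ran, False if the saved file was reused.
    """
    path = Path(prompt_path)
    if path.exists():
        return False
    run_optimization(path)
    return True
```

In the workflow above, run_optimization would wrap the optimize_prompts and save_prompts calls from step 1, and step 2 would load the file whether it was just created or already present.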
See Also
- Optimizers API Reference - Full API documentation
- Metric Customization - Creating custom metrics
- DSPy Documentation - Learn more about DSPy