Ragas CLI
The Ragas Command Line Interface (CLI) provides tools for quickly setting up evaluation projects and running experiments from the terminal.
Installation
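The CLI is included with the ragas package, so installing the package (pip install ragas) also installs the ragas command.
Alternatively, use uvx to run the CLI without installing it, for example: uvx ragas quickstart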
Available Commands
ragas quickstart
Create a complete evaluation project from a template. This is the fastest way to get started with Ragas.
Arguments:
TEMPLATE: Template name (optional). Leave empty to see available templates.
Options:
-o, --output-dir: Directory to create the project in (default: current directory)
Examples:
# List available templates
ragas quickstart
# Create a RAG evaluation project
ragas quickstart rag_eval
# Create project in a specific directory
ragas quickstart rag_eval --output-dir ./my-project
ragas evals
Run evaluations on a dataset using an evaluation file.
Arguments:
EVAL_FILE: Path to the evaluation file (required)
Options:
--dataset: Name of the dataset in the project (required)
--metrics: Comma-separated list of metric field names to evaluate (required)
--baseline: Baseline experiment name to compare against (optional)
--name: Name of the experiment run (optional)
Example:
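As a minimal sketch of how the argument and options above fit together, the Python helper below assembles a ragas evals command line. The helper name build_evals_command and the values evals.py, test_data, accuracy, relevance, and run_1 are hypothetical and chosen only for illustration; the helper is not part of Ragas, and it uses only the flags documented in this section.

```python
import shlex
from typing import List, Optional, Sequence


def build_evals_command(
    eval_file: str,
    dataset: str,
    metrics: Sequence[str],
    baseline: Optional[str] = None,
    name: Optional[str] = None,
) -> List[str]:
    """Assemble argv for `ragas evals` from the argument and options listed above.

    Hypothetical helper for illustration; it is not part of the Ragas package.
    """
    if not metrics:
        raise ValueError("--metrics is required; pass at least one metric field name")

    # EVAL_FILE is positional; --dataset and --metrics are required options.
    argv = ["ragas", "evals", eval_file, "--dataset", dataset]
    # --metrics takes a single comma-separated value.
    argv += ["--metrics", ",".join(metrics)]

    # --baseline and --name are optional, so add them only when given.
    if baseline is not None:
        argv += ["--baseline", baseline]
    if name is not None:
        argv += ["--name", name]
    return argv


# Hypothetical file, dataset, metric, and run names.
command = build_evals_command(
    "evals.py", "test_data", ["accuracy", "relevance"], name="run_1"
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to execute the evaluation. Building the command as a list rather than a string avoids shell quoting problems when dataset or run names contain spaces.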
ragas hello_world
Create a simple hello world example to verify your installation.
Arguments:
DIRECTORY: Directory to create the example in (default: current directory)
Quickstart Templates
RAG & Retrieval
- RAG Evaluation (rag_eval) - Evaluate RAG systems with custom metrics
- Improve RAG (improve_rag) - Compare naive vs agentic RAG approaches
Agent Evaluation
- Agent Evaluation (agent_evals) - Evaluate AI agents solving math problems
- LlamaIndex Agent Evaluation (llamaIndex_agent_evals) - Evaluate LlamaIndex agents with tool call metrics
Specialized Use Cases
- Text-to-SQL Evaluation (text2sql) - Evaluate text-to-SQL systems with execution accuracy
- Workflow Evaluation (workflow_eval) - Evaluate complex LLM workflows
- Prompt Evaluation (prompt_evals) - Compare different prompt variations
LLM Testing
- Judge Alignment (judge_alignment) - Measure LLM-as-judge alignment with human standards
- LLM Benchmarking (benchmark_llm) - Benchmark and compare different LLM models
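When scripting project creation, it can help to check a template name before calling the CLI. The sketch below is one possible way to do that in Python: the table KNOWN_TEMPLATES is copied from the list above (the installed Ragas version may offer a different set), and the helper quickstart_command is hypothetical, not part of Ragas. It uses only the TEMPLATE argument and the --output-dir option documented for ragas quickstart.

```python
from typing import List, Optional

# Template names and descriptions copied from the list above.
# Assumption: the installed Ragas version may ship a different set.
KNOWN_TEMPLATES = {
    "rag_eval": "Evaluate RAG systems with custom metrics",
    "improve_rag": "Compare naive vs agentic RAG approaches",
    "agent_evals": "Evaluate AI agents solving math problems",
    "llamaIndex_agent_evals": "Evaluate LlamaIndex agents with tool call metrics",
    "text2sql": "Evaluate text-to-SQL systems with execution accuracy",
    "workflow_eval": "Evaluate complex LLM workflows",
    "prompt_evals": "Compare different prompt variations",
    "judge_alignment": "Measure LLM-as-judge alignment with human standards",
    "benchmark_llm": "Benchmark and compare different LLM models",
}


def quickstart_command(
    template: Optional[str] = None, output_dir: Optional[str] = None
) -> List[str]:
    """Assemble argv for `ragas quickstart` (hypothetical helper, not part of Ragas)."""
    argv = ["ragas", "quickstart"]
    if template is None:
        # With no template, the CLI lists the available templates.
        return argv
    if template not in KNOWN_TEMPLATES:
        known = ", ".join(sorted(KNOWN_TEMPLATES))
        raise ValueError(f"unknown template {template!r}; expected one of: {known}")
    argv.append(template)
    if output_dir is not None:
        argv += ["--output-dir", output_dir]
    return argv


print(quickstart_command("rag_eval", output_dir="./my-project"))
```

Running ragas quickstart with no template remains the authoritative way to see which templates the installed version actually provides.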
Quick Start
Get running in 60 seconds:
# Create project
uvx ragas quickstart rag_eval
cd rag_eval
# Install dependencies
uv sync
# Set API key
export OPENAI_API_KEY="your-key"
# Run evaluation
uv run python evals.py
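The steps above depend on two things being in place: the uv tool (which provides uvx, uv sync, and uv run) and the OPENAI_API_KEY environment variable. A minimal pre-flight check, sketched in Python under the assumption that these are the only prerequisites, might look like the following. The function name missing_prerequisites is hypothetical and not part of Ragas.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems that would block the quick start steps.

    `env` and `which` are injectable so the check can be exercised without
    touching the real environment.
    """
    problems = []
    # uvx, `uv sync`, and `uv run` all require uv to be installed.
    if which("uv") is None:
        problems.append("uv is not on PATH")
    # The quick start exports OPENAI_API_KEY before running evals.py.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

An empty result means both prerequisites were found; it does not verify that the API key is valid.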
Next Steps
- RAG Evaluation Guide - Detailed walkthrough of the rag_eval template
- Improve RAG Guide - Compare naive vs agentic RAG approaches
- Custom Metrics - Create your own evaluation metrics