Monitor Your RAG in Production

Maintaining the quality and performance of a RAG application in production is challenging. RAG currently provides the essential building blocks for production-quality monitoring, which offer insight into your application’s performance. We are also working towards a more advanced production monitoring solution that addresses three key areas:

How to ensure the distribution of your production dataset remains consistent with your test set.
How to effectively extract insights from the explicit and implicit signals your users provide to infer the quality of your RAG application and identify areas that require attention.
How to construct custom, smaller, more cost-effective, and faster models for evaluation and advanced test set generation.
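As a rough illustration of the first area, the sketch below compares a production sample against a test set using a population stability index (PSI). It rests on assumptions that are not part of the original text: question word count is used as a crude stand-in for the "distribution" of the dataset, and the function names, bucket edges, and sample questions are all hypothetical. A real check would more likely compare topics or embeddings.

```python
import math
from collections import Counter


def bucket(length, edges):
    """Return the index of the first bucket edge that `length` does not exceed."""
    for i, edge in enumerate(edges):
        if length <= edge:
            return i
    return len(edges)  # overflow bucket for anything longer than the last edge


def psi(test_questions, prod_questions, edges=(5, 10, 20, 40), eps=1e-6):
    """Population stability index over question word counts.

    Assumption: word count is only a proxy for distribution shift.
    Returns 0.0 for identical distributions and grows as they diverge.
    """
    if not test_questions or not prod_questions:
        raise ValueError("both samples must be non-empty")

    def proportions(questions):
        counts = Counter(bucket(len(q.split()), edges) for q in questions)
        return [counts.get(i, 0) / len(questions) for i in range(len(edges) + 1)]

    expected = proportions(test_questions)
    actual = proportions(prod_questions)
    # eps keeps the logarithm defined when a bucket is empty in one sample.
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )


# Hypothetical samples: short test questions versus longer production questions.
test_set = ["what is rag", "how do I monitor a pipeline"]
production = ["can you explain in detail how retrieval quality affects the final answer"]
drift = psi(test_set, production)
```

A larger value suggests the production questions have moved away from what the test set covers. Any alerting threshold is a judgment call to be tuned per application.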

Note

This solution is still in development, and we are gathering feedback for upcoming releases. You can request early access to try it out or share the challenges you face in this area. We would love to hear your thoughts.

In addition, you can use the RAG metrics with other LLM observability tools.

These tools can provide model-based feedback about various aspects of your application, such as the ones mentioned below:

Aspects to Monitor

Faithfulness: This feature assists in identifying and quantifying instances of hallucination.
Bad Retrieval: This feature helps identify and quantify poor context retrievals.
Bad Response: This feature assists in recognizing and quantifying evasive, harmful, or toxic responses.
Bad Format: This feature enables the detection and quantification of responses with incorrect formatting.
Custom Use-Case: For monitoring other critical aspects that are specific to your use-case, talk to the founders.
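One way to turn these aspects into a monitoring signal is to log a score per aspect for each interaction and track how often scores fall below a threshold. The sketch below assumes the scores have already been computed by some evaluator. The record layout, aspect keys, and threshold values are hypothetical and do not come from any library API.

```python
# Hypothetical thresholds: a score below the value counts as a failure.
THRESHOLDS = {
    "faithfulness": 0.7,  # low score suggests hallucination
    "retrieval": 0.5,     # low score suggests poor context retrieval
    "response": 0.5,      # low score suggests an evasive, harmful, or toxic answer
    "format": 1.0,        # anything below 1.0 means the format check failed
}


def failure_rates(records, thresholds):
    """Return, per aspect, the fraction of scored records below its threshold.

    Records that lack a score for an aspect are ignored for that aspect,
    and aspects with no scores at all are left out of the result.
    """
    rates = {}
    for aspect, threshold in thresholds.items():
        scores = [r[aspect] for r in records if aspect in r]
        if not scores:
            continue
        rates[aspect] = sum(s < threshold for s in scores) / len(scores)
    return rates


# Hypothetical logged interactions with precomputed scores.
logged = [
    {"faithfulness": 0.9, "retrieval": 0.2, "format": 1.0},
    {"faithfulness": 0.4, "format": 1.0},
]
rates = failure_rates(logged, THRESHOLDS)
```

Tracking these rates over time, rather than individual scores, makes it easier to see which aspect is degrading and needs attention.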
