DeepEval
Python framework for unit testing and evaluating LLM applications with metrics like G-Eval.
About
DeepEval is an open-source framework for evaluating LLM applications, with tests structured like pytest unit tests. Its metrics include G-Eval, answer relevancy, hallucination detection, and task completion, and the evaluation models run locally. It integrates with OpenAI, LangChain, CrewAI, Anthropic, and other popular tools, and is used to test RAG pipelines, chatbots, and agents.
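To make the "unit test for LLM output" idea concrete, here is a minimal offline sketch of the pattern: a test case holds an input and the model's output, a metric scores it, and a pytest-style test asserts that the score clears a threshold. Every name below (`ToyTestCase`, `ToyRelevancyMetric`, `test_refund_answer_is_relevant`) is hypothetical and is not DeepEval's API. The word-overlap score is a deliberately crude stand-in for the LLM-judged metrics listed above, chosen only so the example runs without a model or API key.

```python
from dataclasses import dataclass


def _words(text: str) -> list[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    cleaned = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in cleaned if w]


@dataclass
class ToyTestCase:
    """Hypothetical stand-in for an LLM test case (not a DeepEval class)."""
    input: str
    actual_output: str


class ToyRelevancyMetric:
    """Toy metric: fraction of the question's words that reappear in the answer.

    Assumption: a real evaluation framework would use an LLM judge or an NLP
    model here; word overlap is swapped in so the sketch is self-contained.
    """

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold

    def measure(self, case: ToyTestCase) -> float:
        question = set(_words(case.input))
        answer = set(_words(case.actual_output))
        if not question:
            return 0.0
        return len(question & answer) / len(question)

    def is_successful(self, case: ToyTestCase) -> bool:
        return self.measure(case) >= self.threshold


def test_refund_answer_is_relevant():
    # A pytest-style test: fails if the answer scores below the threshold.
    case = ToyTestCase(
        input="What is your refund policy?",
        actual_output="Our refund policy allows returns within 30 days.",
    )
    assert ToyRelevancyMetric(threshold=0.3).is_successful(case)
```

Saved in a file such as `test_example.py`, the function would be collected and run by `pytest` like any other unit test. The threshold plays the role of a pass/fail cutoff, which is what lets a graded quality score slot into a binary test suite.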
Details
- Category: AI Observability & Evaluation
- Price: Free
- Platform: Local/Desktop
- Difficulty: Easy (2/5)
- License: Apache-2.0
- Added: May 7, 2026
Related Tools
Open source LLM engineering platform for tracing and analytics
Open source platform for the ML lifecycle
AI observability and evaluation from Arize
ML experiment tracking, visualization, and collaboration
Open-source AI metadata tracker for logging and comparing ML experiments
CLI and library for testing, evaluating, and red-teaming LLM outputs