DeepEval

Python framework for unit testing and evaluating LLM applications with metrics like G-Eval.

Open Source, Self Hosted, Offline Capable

About

DeepEval is an open-source framework for evaluating LLM applications, structured like Pytest. It provides metrics including G-Eval, answer relevancy, hallucination detection, and task completion, with evaluation models running locally. The framework integrates with OpenAI, LangChain, CrewAI, Anthropic, and other popular tools, and is used to test RAG pipelines, chatbots, and agents.
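The Pytest-like structure described above can be illustrated with a small standard-library sketch. It does not use DeepEval: `ToyTestCase`, `KeywordOverlapMetric`, and `assert_case` are hypothetical names invented for this illustration, and the keyword-overlap score is a deliberately crude stand-in for a real relevancy metric. It only shows the general shape of the workflow: build a test case, score it with a metric, and fail the test when the score falls below a threshold.

```python
import re
from dataclasses import dataclass


@dataclass
class ToyTestCase:
    """Hypothetical stand-in for an LLM test case: a prompt and the model's reply."""
    input: str
    actual_output: str


def _words(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordOverlapMetric:
    """Toy relevancy metric (not a DeepEval metric).

    The score is the fraction of distinct input words that also
    appear in the output, so it always lies in [0, 1].
    """

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def measure(self, case: ToyTestCase) -> float:
        question_words = set(_words(case.input))
        answer_words = set(_words(case.actual_output))
        if not question_words:
            return 0.0
        return len(question_words & answer_words) / len(question_words)


def assert_case(case: ToyTestCase, metrics: list[KeywordOverlapMetric]) -> None:
    """Raise AssertionError if any metric scores below its threshold."""
    for metric in metrics:
        score = metric.measure(case)
        assert score >= metric.threshold, (
            f"{type(metric).__name__} scored {score:.2f}, "
            f"below threshold {metric.threshold:.2f}"
        )


def test_refund_answer_is_relevant():
    """A pytest-style test: pytest collects any function named test_*."""
    case = ToyTestCase(
        input="refund policy shoes",
        actual_output="Our refund policy covers shoes for 30 days.",
    )
    assert_case(case, [KeywordOverlapMetric(threshold=0.7)])
```

Saved in a file such as `test_example.py` (an arbitrary name), this would be collected and run by plain `pytest`. In DeepEval itself the corresponding pieces are `LLMTestCase`, metric classes such as `AnswerRelevancyMetric`, and `assert_test`; metrics like G-Eval score outputs with an LLM judge rather than word overlap.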


Details

Price: Free
Platform: Local/Desktop
Difficulty: Easy (2/5)
License: Apache-2.0
Added: May 7, 2026

Related Tools

Featured
Open source LLM engineering platform for tracing and analytics
Open Source, Self Hosted
Easy

Open source platform for the ML lifecycle
Open Source, Self Hosted, Offline
Intermediate

AI observability and evaluation from Arize
Open Source, Self Hosted
Easy

Featured
ML experiment tracking, visualization, and collaboration
Open Source
Easy

Open-source AI metadata tracker for logging and comparing ML experiments.
Open Source, Self Hosted, Offline
Easy

Featured
CLI and library for testing, evaluating, and red-teaming LLM outputs.
Open Source, Self Hosted, Offline
Easy