Aphrodite Engine
High-performance LLM inference engine forked from vLLM with extra features.
About
Aphrodite Engine is an LLM inference server forked from vLLM and built on its PagedAttention memory management, tuned for serving many concurrent users. It adds support for additional quantization formats, speculative decoding, and LoRA handling, and powers PygmalionAI's chat infrastructure. It exposes an OpenAI-compatible API that plugs into front ends like SillyTavern. Released under the AGPL-3.0 license.
Reviews (0)
Leave a Review
No reviews yet. Be the first to review!
Details
- Category
- LLM Inference & Serving
- Price
- Free
- Platform
- Local/Desktop
- Difficulty
- Intermediate (3/5)
- License
- AGPL-3.0
- Minimum VRAM
- 8 GB
- Added
- Apr 3, 2026
Related Tools
Port of Meta's LLaMA model in C/C++ for efficient CPU inference
High-throughput LLM serving engine with PagedAttention
Minimalist ML framework in Rust by Hugging Face for fast inference.
Optimized inference library for running quantized LLMs on consumer GPUs.
Open-source ChatGPT alternative that runs 100% offline on your computer.
Fast LLM inference on consumer GPUs using neuron-aware sparse computation.
Mentioned in
Why Aphrodite Engine Is the Dark Horse of LLM Serving
Aphrodite Engine forks vLLM and adds the long tail of quantization formats and samplers that the...
Max P
Running Qwen3 Locally with vLLM on a Single 4090, Setup and Notes
A practical setup walkthrough for serving a Qwen3 variant locally with vLLM on a single 24GB consumer GPU,...
Billy C
The State of Open-Source LLM Inference Engines in 2026
A survey of where the major open-source LLM inference engines stand: vLLM, llama.cpp, Aphrodite, SGLang,...
Max P