Tools/LLM Inference & Serving/Aphrodite Engine

Aphrodite Engine

High-performance LLM inference engine forked from vLLM with extra features.

Open SourceSelf HostedOffline CapableGPU Required (8GB+ VRAM)

0.0 (0)

Visit Website View on GitHub

About

Aphrodite Engine is an LLM inference server forked from vLLM and built on its PagedAttention memory management, tuned for serving many concurrent users. It adds support for additional quantization formats, speculative decoding, and LoRA handling, and powers PygmalionAI's chat infrastructure. It exposes an OpenAI-compatible API that plugs into front ends like SillyTavern. Released under the AGPL-3.0 license.

Reviews (0)

Leave a Review

No reviews yet. Be the first to review!

Details

Category: LLM Inference & Serving
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: AGPL-3.0
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Tags

inference vllm-fork exl2 speculative-decoding lora

Related Tools

Candle

LLM Inference & Serving

Minimalist ML framework in Rust by Hugging Face for fast inference.

Open SourceSelf HostedOffline

Advanced

0.0 (0)

Jan

LLM Inference & Serving

Open-source ChatGPT alternative that runs 100% offline on your computer.

Open SourceSelf HostedOffline

Beginner

0.0 (0)

Featured

llama.cpp

LLM Inference & Serving

Port of Meta's LLaMA model in C/C++ for efficient CPU inference

Open SourceSelf HostedOffline

Intermediate

0.0 (0)

PowerInfer

LLM Inference & Serving

Fast LLM inference on consumer GPUs using neuron-aware sparse computation.

Open SourceSelf HostedOfflineGPU 4GB+

Advanced

0.0 (0)

Featured

vLLM

LLM Inference & Serving

High-throughput LLM serving engine with PagedAttention

Open SourceSelf HostedOfflineGPU 16GB+

Intermediate

0.0 (0)

Candle

LLM Inference & Serving

Minimalist machine learning framework for Rust focused on performance and serverless inference.

Open SourceSelf HostedOffline

Intermediate

0.0 (0)

Browse all LLM Inference & Serving tools

Mentioned in

Why Aphrodite Engine Is the Dark Horse of LLM Serving

Aphrodite Engine forks vLLM and adds the long tail of quantization formats and samplers that the...

Max P

Running Qwen3 Locally with vLLM on a Single 4090, Setup and Notes

A practical setup walkthrough for serving a Qwen3 variant locally with vLLM on a single 24GB consumer GPU,...

Billy C

The State of Open-Source LLM Inference Engines in 2026

A survey of where the major open-source LLM inference engines stand: vLLM, llama.cpp, Aphrodite, SGLang,...

Max P