Aphrodite Engine

High-performance LLM inference engine forked from vLLM with extra features.

Open SourceSelf HostedOffline CapableGPU Required (8GB+ VRAM)
0.0 (0)

About

Aphrodite Engine is an LLM inference server forked from vLLM and built on its PagedAttention memory management, tuned for serving many concurrent users. It adds support for additional quantization formats, speculative decoding, and LoRA handling, and powers PygmalionAI's chat infrastructure. It exposes an OpenAI-compatible API that plugs into front ends like SillyTavern. Released under the AGPL-3.0 license.

Reviews (0)

Leave a Review

No reviews yet. Be the first to review!

Details

Price
Free
Platform
Local/Desktop
Difficulty
Intermediate (3/5)
License
AGPL-3.0
Minimum VRAM
8 GB
Added
Apr 3, 2026

Related Tools

Featured

Port of Meta's LLaMA model in C/C++ for efficient CPU inference

Open SourceSelf HostedOffline
Intermediate
0.0 (0)
Featured

High-throughput LLM serving engine with PagedAttention

Open SourceSelf HostedOfflineGPU 16GB+
Intermediate
0.0 (0)

Minimalist ML framework in Rust by Hugging Face for fast inference.

Open SourceSelf HostedOffline
Advanced
0.0 (0)

Optimized inference library for running quantized LLMs on consumer GPUs.

Open SourceSelf HostedOfflineGPU 6GB+
Intermediate
0.0 (0)

Open-source ChatGPT alternative that runs 100% offline on your computer.

Open SourceSelf HostedOffline
Beginner
0.0 (0)

Fast LLM inference on consumer GPUs using neuron-aware sparse computation.

Open SourceSelf HostedOfflineGPU 4GB+
Advanced
0.0 (0)
Browse all LLM Inference & Serving tools