TabbyAPI
Fast ExLlamaV2-based OpenAI-compatible API server for quantized models.
Open Source · Self Hosted · Offline Capable · GPU Required (6GB+ VRAM)
About
TabbyAPI is an OpenAI-compatible API server built on ExLlamaV2 for serving EXL2 and GPTQ quantized models. It delivers fast inference on consumer GPUs and supports streaming responses, function calling, and multi-user serving. Licensed under AGPL-3.0.
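Because the server speaks the OpenAI chat-completions wire format, any OpenAI-style client can talk to it. A minimal sketch of building such a request is below; the base URL, port, and model name are assumptions (adjust to your TabbyAPI configuration), and the helper function is hypothetical, not part of TabbyAPI itself.

```python
import json

# Hypothetical helper: builds an OpenAI-style chat-completion request
# body. The endpoint path /v1/chat/completions follows the OpenAI API
# convention that TabbyAPI implements.
def build_chat_request(model, messages, stream=False, max_tokens=256):
    """Return the JSON body for POST {base_url}/v1/chat/completions."""
    return {
        "model": model,          # name of the loaded EXL2/GPTQ model
        "messages": messages,    # OpenAI-style chat messages
        "stream": stream,        # True to receive server-sent events
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "my-exl2-model",  # placeholder; use the model your server has loaded
    [{"role": "user", "content": "Hello!"}],
)
body = json.dumps(payload)

# To actually send it (requires a running TabbyAPI instance and, if
# configured, an Authorization header with your API key), POST `body`
# to e.g. http://localhost:5000/v1/chat/completions -- the host and
# port here are assumptions, not guaranteed defaults.
```

Because the format matches OpenAI's, existing SDKs and front-ends can usually be pointed at the local server just by overriding the base URL.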
Details
- Category: LLM Inference & Serving
- Price: Free
- Platform: Local/Desktop
- Difficulty: Easy (2/5)
- License: AGPL-3.0
- Minimum VRAM: 6 GB
- Added: Apr 3, 2026