ExLlamaV2
Optimized inference library for running quantized LLMs on consumer GPUs.
Open Source · Self Hosted · Offline Capable · GPU Required (6GB+ VRAM)
About
ExLlamaV2 is an optimized inference library for running GPTQ and EXL2 quantized language models on consumer NVIDIA GPUs. It achieves fast inference through custom CUDA kernels, and supports dynamic batching and speculative decoding. Released under the MIT license.
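The basic workflow is to load a quantized model directory, attach a KV cache and tokenizer, and then generate from a prompt. Below is a minimal sketch using the library's Python API, following the pattern from its bundled example scripts; the model directory path, prompt, and sampler settings are placeholders, and exact class names may differ between library versions.

```python
# Minimal sketch: load an EXL2-quantized model and generate text.
# Assumes exllamav2 is installed and /models/my-model-exl2 is a
# placeholder directory containing the quantized weights.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/my-model-exl2"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate cache as layers load
model.load_autosplit(cache)               # split weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

output = generator.generate_simple(
    "Explain quantization in one sentence:",  # placeholder prompt
    settings,
    100,  # max new tokens
)
print(output)
```

`load_autosplit` spreads the model across however many GPUs are visible, which is why a single 6 GB card can run smaller quantized models while larger models scale to multi-GPU setups.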
Details
- Category: LLM Inference & Serving
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- License: MIT
- Minimum VRAM: 6 GB
- Added: Apr 3, 2026
Similar Tools
Desktop application for discovering, downloading, and running local LLMs.
Self Hosted · Offline
Beginner
Open-source ChatGPT alternative that runs 100% offline on your computer.
Open Source · Self Hosted · Offline
Beginner
Open-source ecosystem for running LLMs locally on consumer hardware.
Open Source · Self Hosted · Offline
Beginner