ExLlamaV2

Optimized inference library for running quantized LLMs on consumer GPUs.

Open Source · Self Hosted · Offline Capable · GPU Required (6GB+ VRAM)

About

ExLlamaV2 is an optimized inference library for running GPTQ- and EXL2-quantized language models on consumer NVIDIA GPUs. It achieves fast inference through custom CUDA kernels and supports dynamic batching and speculative decoding. Released under the MIT license.
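As a sketch of typical usage, the snippet below loads a locally downloaded EXL2-quantized model and generates text with the library's dynamic generator. It follows the patterns in the project's published examples; exact class names and signatures may shift between releases, and the model path is a placeholder.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Placeholder path: a directory containing an EXL2-quantized model
model_dir = "/path/to/exl2-model"

# Build the config from the model directory and load the model,
# splitting layers across available GPUs as needed
config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache, progress=True)

tokenizer = ExLlamaV2Tokenizer(config)

# The dynamic generator handles batching and scheduling internally
generator = ExLlamaV2DynamicGenerator(
    model=model,
    cache=cache,
    tokenizer=tokenizer,
)

output = generator.generate(
    prompt="Five good reasons to run LLMs locally:",
    max_new_tokens=200,
)
print(output)
```

The dynamic generator is also the entry point for the batching and speculative-decoding features mentioned above; consult the project's documentation for the version-specific options.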


Details

Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: MIT
Minimum VRAM: 6 GB
Added: Apr 3, 2026

Similar Tools

Featured

Desktop application for discovering, downloading, and running local LLMs.

Self Hosted · Offline
Beginner

Open-source ChatGPT alternative that runs 100% offline on your computer.

Open Source · Self Hosted · Offline
Beginner

Open-source ecosystem for running LLMs locally on consumer hardware.

Open Source · Self Hosted · Offline
Beginner