Tools/LLM Inference & Serving/llama.cpp

Featured Tool

llama.cpp

Port of Meta's LLaMA model in C/C++ for efficient CPU inference

Open SourceSelf HostedOffline Capable

0.0 (0)

View on GitHub Documentation

About

llama.cpp by Georgi Gerganov is a C and C++ inference engine for LLaMA-family and many other transformer language models, designed to run with minimal setup on a wide range of hardware including CPU-only laptops. It supports the GGUF quantized model format, multiple backends (CUDA, Metal, Vulkan, ROCm, BLAS), a server with an OpenAI-compatible API, and bindings for many languages. MIT licensed; the substrate for much of the local LLM ecosystem.

Reviews (0)

Leave a Review

No reviews yet. Be the first to review!

Details

Category: LLM Inference & Serving
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: MIT
Added: Jan 29, 2026

Tags

llm inference c++cpu quantization

Related Tools

Candle

LLM Inference & Serving

Minimalist ML framework in Rust by Hugging Face for fast inference.

Open SourceSelf HostedOffline

Advanced

0.0 (0)

Jan

LLM Inference & Serving

Open-source ChatGPT alternative that runs 100% offline on your computer.

Open SourceSelf HostedOffline

Beginner

0.0 (0)

PowerInfer

LLM Inference & Serving

Fast LLM inference on consumer GPUs using neuron-aware sparse computation.

Open SourceSelf HostedOfflineGPU 4GB+

Advanced

0.0 (0)

Featured

vLLM

LLM Inference & Serving

High-throughput LLM serving engine with PagedAttention

Open SourceSelf HostedOfflineGPU 16GB+

Intermediate

0.0 (0)

Kobold.cpp

LLM Inference & Serving

Easy-to-use local AI inference with built-in web UI and API.

Open SourceSelf HostedOffline

Beginner

0.0 (0)

Candle

LLM Inference & Serving

Minimalist machine learning framework for Rust focused on performance and serverless inference.

Open SourceSelf HostedOffline

Intermediate

0.0 (0)

Browse all LLM Inference & Serving tools

Mentioned in

Fine-Tuning Llama 3.3 with Unsloth on a 16GB GPU, Step-by-Step

A practical, end-to-end fine-tuning walkthrough with Unsloth: dataset prep, LoRA config, 4-bit quantization,...

Billy C

The State of Open-Source LLM Inference Engines in 2026

A survey of where the major open-source LLM inference engines stand: vLLM, llama.cpp, Aphrodite, SGLang,...

Max P