Tools/LLM Inference & Serving/llama-cpp-python

llama-cpp-python

Python bindings for llama.cpp with OpenAI-compatible API server.

Open SourceSelf HostedOffline Capable

0.0 (0)

Visit Website View on GitHub

About

llama-cpp-python provides Python bindings for Georgi Gerganov's llama.cpp inference library, making local LLM inference available from Python with a simple pip install that builds llama.cpp from source. It includes an OpenAI-compatible API server, function-calling support, and GPU acceleration options, and exposes both low-level and high-level interfaces. Released under the MIT license.

Reviews (0)

Leave a Review

No reviews yet. Be the first to review!

Details

Category: LLM Inference & Serving
Price: Free
Platform: Local/Desktop
Difficulty: Easy (2/5)
License: MIT
Added: Apr 3, 2026

Tags

inference python llama-cpp bindings api openai-compatible

Related Tools

Featured

llama.cpp

LLM Inference & Serving

Port of Meta's LLaMA model in C/C++ for efficient CPU inference

Open SourceSelf HostedOffline

Intermediate

0.0 (0)

Featured

vLLM

LLM Inference & Serving

High-throughput LLM serving engine with PagedAttention

Open SourceSelf HostedOfflineGPU 16GB+

Intermediate

0.0 (0)

Candle

LLM Inference & Serving

Minimalist ML framework in Rust by Hugging Face for fast inference.

Open SourceSelf HostedOffline

Advanced

0.0 (0)

ExLlamaV2

LLM Inference & Serving

Optimized inference library for running quantized LLMs on consumer GPUs.

Open SourceSelf HostedOfflineGPU 6GB+

Intermediate

0.0 (0)

Jan

LLM Inference & Serving

Open-source ChatGPT alternative that runs 100% offline on your computer.

Open SourceSelf HostedOffline

Beginner

0.0 (0)

PowerInfer

LLM Inference & Serving

Fast LLM inference on consumer GPUs using neuron-aware sparse computation.

Open SourceSelf HostedOfflineGPU 4GB+

Advanced

0.0 (0)

Browse all LLM Inference & Serving tools