Ollama
Run large language models locally with a simple CLI interface
About
Ollama makes running large language models locally simple. It bundles model weights, configuration, and a prompt template into a single package defined by a Modelfile, and exposes both a command-line tool and a local REST API. It supports Llama, Mistral, Gemma, and many other model families, ships an official Docker image, and integrates with coding tools such as Claude Code, Codex, and OpenCode. It is distributed as open-source software.
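The local API can be called with nothing but the Python standard library. The sketch below makes a few assumptions. The server is listening on Ollama's default address, `http://localhost:11434`. Requests go to the `/api/generate` endpoint with `model`, `prompt`, and `stream` fields. The helper names (`build_generate_request`, `join_stream`, `generate`) are hypothetical illustrations, not part of any Ollama client library.

```python
import json
import urllib.request

# Assumption: Ollama's REST API is reachable at its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str, stream: bool = False) -> urllib.request.Request:
    """Build a POST request for the /api/generate endpoint (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": stream}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def join_stream(lines) -> str:
    """Concatenate the 'response' fragments of a streamed reply.

    When stream=True the server sends one JSON object per line; the last
    object has "done": true. Accepts str or bytes lines and skips blanks.
    """
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(model: str, prompt: str) -> str:
    """Send a non-streaming request and return the generated text."""
    req = build_generate_request(model, prompt, stream=False)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["response"]
```

With a model already pulled (for example via `ollama pull llama3.2`), calling `generate("llama3.2", "Why is the sky blue?")` should return the completion as a single string. Passing `stream=True` to `build_generate_request` instead makes the server reply with one JSON object per line, and `join_stream` stitches those fragments back together.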
Details
- Category: LLM Inference & Serving
- Price: Free
- Platform: Local/Desktop
- Difficulty: Beginner (1/5)
- License: MIT
- Added: Jan 29, 2026
Related Tools
- Port of Meta's LLaMA model in C/C++ for efficient CPU inference
- High-throughput LLM serving engine with PagedAttention
- Minimalist ML framework in Rust by Hugging Face for fast inference
- Optimized inference library for running quantized LLMs on consumer GPUs
- Open-source ChatGPT alternative that runs 100% offline on your computer
- Fast LLM inference on consumer GPUs using neuron-aware sparse computation
Mentioned in
From OpenAI to LiteLLM: Cutting the AI Bill with Smart Routing
A first-person take on putting LiteLLM in front of OpenAI, Anthropic, and a local Ollama instance, with...
Billy C
Self-Hosting an Open WebUI ChatGPT Clone with Model Rotation
A practical walkthrough for standing up Open WebUI on your own box, plugging Ollama in for local models, and...
Billy C
Building a Private RAG Stack with Ollama, Qdrant, and AnythingLLM
An end-to-end blueprint for a fully self-hosted RAG system using Ollama for inference, Qdrant for the vector...
Billy C