Nitro
Lightweight inference engine for local AI with OpenAI-compatible API.
About
Nitro by Jan AI was a lightweight C++ inference engine that exposed an OpenAI-compatible API for running local models through llama.cpp and TensorRT-LLM backends. It was designed to embed easily into desktop and server applications. The repository is now archived and development has moved to the Menlo Research fork of llama.cpp, which is the recommended path forward. Released under the Apache 2.0 license.
Reviews (0)
Leave a Review
No reviews yet. Be the first to review!
Details
- Category
- LLM Inference & Serving
- Price
- Free
- Platform
- Local/Desktop
- Difficulty
- Easy (2/5)
- License
- Apache-2.0
- Added
- Apr 3, 2026
Related Tools
Port of Meta's LLaMA model in C/C++ for efficient CPU inference
High-throughput LLM serving engine with PagedAttention
Minimalist ML framework in Rust by Hugging Face for fast inference.
Optimized inference library for running quantized LLMs on consumer GPUs.
Open-source ChatGPT alternative that runs 100% offline on your computer.
Fast LLM inference on consumer GPUs using neuron-aware sparse computation.