Tools/LLM Inference & Serving/Nitro

Nitro

Lightweight inference engine for local AI with OpenAI-compatible API.

Open SourceSelf HostedOffline Capable

0.0 (0)

Visit Website View on GitHub

About

Nitro by Jan AI was a lightweight C++ inference engine that exposed an OpenAI-compatible API for running local models through llama.cpp and TensorRT-LLM backends. It was designed to embed easily into desktop and server applications. The repository is now archived and development has moved to the Menlo Research fork of llama.cpp, which is the recommended path forward. Released under the Apache 2.0 license.

Reviews (0)

Leave a Review

No reviews yet. Be the first to review!

Details

Category: LLM Inference & Serving
Price: Free
Platform: Local/Desktop
Difficulty: Easy (2/5)
License: Apache-2.0
Added: Apr 3, 2026

Tags

inference lightweight cpp openai-compatible jan

Related Tools

Candle

LLM Inference & Serving

Minimalist ML framework in Rust by Hugging Face for fast inference.

Open SourceSelf HostedOffline

Advanced

0.0 (0)

Jan

LLM Inference & Serving

Open-source ChatGPT alternative that runs 100% offline on your computer.

Open SourceSelf HostedOffline

Beginner

0.0 (0)

Featured

llama.cpp

LLM Inference & Serving

Port of Meta's LLaMA model in C/C++ for efficient CPU inference

Open SourceSelf HostedOffline

Intermediate

0.0 (0)

PowerInfer

LLM Inference & Serving

Fast LLM inference on consumer GPUs using neuron-aware sparse computation.

Open SourceSelf HostedOfflineGPU 4GB+

Advanced

0.0 (0)

Featured

vLLM

LLM Inference & Serving

High-throughput LLM serving engine with PagedAttention

Open SourceSelf HostedOfflineGPU 16GB+

Intermediate

0.0 (0)

Candle

LLM Inference & Serving

Minimalist machine learning framework for Rust focused on performance and serverless inference.

Open SourceSelf HostedOffline

Intermediate

0.0 (0)

Browse all LLM Inference & Serving tools