Petals

Run large language models collaboratively by distributing layers across users.

Open Source · Self Hosted · GPU Required (4GB+ VRAM)

About

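Petals runs large language models across a distributed, BitTorrent-style swarm: each participant hosts a subset of the model's layers, and a client's request passes through a chain of those hosts. By pooling GPUs over the internet, the swarm can serve models such as Llama 3.1 405B, Mixtral, Falcon, and BLOOM for both inference and fine-tuning. For models too large to fit on a single GPU, the project reports this as faster than offloading weights to local RAM or disk, and the client runs from a desktop or a Colab notebook. Because inputs are processed on other participants' machines in the public swarm, sensitive data should go through a private swarm of trusted peers. Released under the MIT license.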
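
The layer-sharing idea described above can be illustrated with a toy simulation. This is a minimal sketch, not Petals' API or its actual routing algorithm: the names `Peer`, `assign_layers`, and `forward`, the per-peer `capacity` value, and the arithmetic "layers" are all hypothetical stand-ins chosen for illustration. It shows only the core property: if every layer is hosted by some peer and the peers are visited in order, the distributed result matches a local run.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "layer" here is just a function on a number; real layers are transformer blocks.
Layer = Callable[[float], float]


@dataclass
class Peer:
    name: str
    capacity: int  # max layers this peer can host (hypothetical knob)
    start: int = 0  # index of the first hosted layer
    end: int = 0    # one past the last hosted layer


def assign_layers(num_layers: int, peers: List[Peer]) -> List[Peer]:
    """Give each peer a contiguous slice of layers, in order, up to its capacity."""
    next_layer = 0
    for peer in peers:
        peer.start = next_layer
        peer.end = min(num_layers, next_layer + peer.capacity)
        next_layer = peer.end
    if next_layer < num_layers:
        raise ValueError("swarm does not cover every layer")
    # Peers that received no layers are left out of the chain.
    return [p for p in peers if p.end > p.start]


def forward(x: float, layers: List[Layer], chain: List[Peer]) -> float:
    """Pass an activation through the chain; each peer applies only its own slice."""
    for peer in chain:
        for layer in layers[peer.start:peer.end]:
            x = layer(x)
    return x


def make_layer(i: int) -> Layer:
    # Order-sensitive toy layer, so a wrong routing order would change the result.
    return lambda x: 2 * x + i


layers = [make_layer(i) for i in range(8)]
swarm = [Peer("alice", 3), Peer("bob", 3), Peer("carol", 4)]
chain = assign_layers(len(layers), swarm)

for peer in chain:
    print(f"{peer.name} hosts layers {peer.start}..{peer.end - 1}")

distributed = forward(1.0, layers, chain)

# Reference: the same layers applied on a single machine.
local = 1.0
for layer in layers:
    local = layer(local)

print("distributed matches local:", distributed == local)
```

In a real swarm the hand-off between peers is a network hop, so latency grows with chain length; this sketch deliberately ignores that, along with peer failures and rebalancing.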

Details

Category: LLM Inference & Serving
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: MIT
Minimum VRAM: 4 GB
Added: Apr 3, 2026

Tags

inference, distributed, collaborative, p2p, large-models

Related Tools

llama.cpp

LLM Inference & Serving

Port of Meta's LLaMA model in C/C++ for efficient CPU inference

Open Source · Self Hosted · Offline

Intermediate

vLLM

LLM Inference & Serving

High-throughput LLM serving engine with PagedAttention

Open Source · Self Hosted · Offline · GPU 16GB+

Intermediate

Candle

LLM Inference & Serving

Minimalist ML framework in Rust by Hugging Face for fast inference.

Open Source · Self Hosted · Offline

Advanced

ExLlamaV2

LLM Inference & Serving

Optimized inference library for running quantized LLMs on consumer GPUs.

Open Source · Self Hosted · Offline · GPU 6GB+

Intermediate

Jan

LLM Inference & Serving

Open-source ChatGPT alternative that runs 100% offline on your computer.

Open Source · Self Hosted · Offline

Beginner

PowerInfer

LLM Inference & Serving

Fast LLM inference on consumer GPUs using neuron-aware sparse computation.

Open Source · Self Hosted · Offline · GPU 4GB+

Advanced
