Petals
Run large language models collaboratively by distributing layers across users.
About
Petals runs large language models in a distributed, BitTorrent-style swarm. Each participant hosts a few of the model's layers, and by pooling GPUs over the internet the swarm can run models such as Llama 3.1 405B, Mixtral, Falcon, or BLOOM for inference or fine-tuning. For very large models this is faster than running the model locally with offloading, and it can be used from a desktop or from Google Colab. For sensitive data, you can set up a private swarm. Released under the MIT license.
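The swarm idea can be illustrated with a toy, single-process Python sketch. It is not the Petals API. `Peer`, `build_route`, `swarm_forward`, and the toy layers are hypothetical stand-ins, and the greedy routing rule is an assumption made for illustration. Each peer hosts a contiguous range of layers. A client picks a chain of peers that together cover every layer, and the activations are passed from one peer to the next. The final result matches what you get by running all the layers in one place.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for a transformer block: a function on a vector of floats.
Layer = Callable[[List[float]], List[float]]


def make_layer(i: int) -> Layer:
    # Toy block i: scale every element by (i + 1), then add i.
    return lambda x: [(i + 1) * v + i for v in x]


@dataclass
class Peer:
    name: str
    start: int  # first hosted layer index (inclusive)
    end: int    # last hosted layer index (exclusive)
    layers: List[Layer]  # the blocks for indices start..end-1

    def forward(self, x: List[float], first: int, last: int) -> List[float]:
        # Run only blocks [first, last), a sub-range of what this peer hosts.
        for i in range(first, last):
            x = self.layers[i - self.start](x)
        return x


def build_route(peers: List[Peer], num_layers: int) -> List[Tuple[Peer, int, int]]:
    # Greedy chain (illustrative assumption): for the next uncovered layer,
    # pick the hosting peer whose range extends furthest and use it to its end.
    route: List[Tuple[Peer, int, int]] = []
    pos = 0
    while pos < num_layers:
        candidates = [p for p in peers if p.start <= pos < p.end]
        if not candidates:
            raise RuntimeError(f"no peer hosts layer {pos}")
        best = max(candidates, key=lambda p: p.end)
        stop = min(best.end, num_layers)
        route.append((best, pos, stop))
        pos = stop
    return route


def swarm_forward(peers: List[Peer], num_layers: int, x: List[float]) -> List[float]:
    # Activations hop from peer to peer; each peer runs only its slice.
    for peer, first, last in build_route(peers, num_layers):
        x = peer.forward(x, first, last)
    return x


NUM_LAYERS = 6
all_layers = [make_layer(i) for i in range(NUM_LAYERS)]
peers = [
    Peer("a", 0, 3, all_layers[0:3]),
    Peer("b", 2, 5, all_layers[2:5]),  # overlaps with "a" on layer 2
    Peer("c", 4, 6, all_layers[4:6]),
]

print([(p.name, lo, hi) for p, lo, hi in build_route(peers, NUM_LAYERS)])
# [('a', 0, 3), ('b', 3, 5), ('c', 5, 6)]
print(swarm_forward(peers, NUM_LAYERS, [1.0, 2.0]))
# [1439.0, 2159.0]
```

In this sketch, the overlapping ranges give the client a choice of peers for some layers. A real system would presumably also weigh network latency and server throughput when choosing a route, rather than looking only at range length.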
Details
- Category: LLM Inference & Serving
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- License: MIT
- Minimum VRAM: 4 GB
- Added: Apr 3, 2026
Related Tools
- Port of Meta's LLaMA model in C/C++ for efficient CPU inference
- High-throughput LLM serving engine with PagedAttention
- Minimalist ML framework in Rust by Hugging Face for fast inference
- Optimized inference library for running quantized LLMs on consumer GPUs
- Open-source ChatGPT alternative that runs 100% offline on your computer
- Fast LLM inference on consumer GPUs using neuron-aware sparse computation