KTransformers
Heterogeneous CPU and GPU inference framework for very large language models on limited hardware.
About
KTransformers is a research framework for efficient heterogeneous CPU-GPU inference and fine-tuning of large language models. It includes a kt-kernel serving stack with AMX and AVX acceleration, mixture-of-experts (MoE) optimizations, and quantization support. The project also integrates with LLaMA-Factory for fine-tuning very large MoE models on constrained hardware.
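To make the idea of heterogeneous placement concrete, here is a minimal sketch of how a planner might split a MoE model between GPU and CPU under a fixed VRAM budget. It is not KTransformers' actual API or algorithm: the `Module` class, the `plan_placement` function, and all module names and sizes are hypothetical, chosen only for illustration. The assumption is that dense modules, which run on every token, get GPU priority, while routed expert weights spill to CPU memory when VRAM runs out.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str        # hypothetical module name
    size_gb: float   # hypothetical weight size in GB
    is_expert: bool  # True for routed MoE expert weights


def plan_placement(modules, vram_budget_gb):
    """Greedy sketch: place non-expert modules on the GPU first,
    then experts if VRAM remains; everything else goes to the CPU."""
    placement = {}
    free_gb = vram_budget_gb
    # sorted() is stable and False < True, so dense modules come first.
    for m in sorted(modules, key=lambda m: m.is_expert):
        if m.size_gb <= free_gb:
            placement[m.name] = "gpu"
            free_gb -= m.size_gb
        else:
            placement[m.name] = "cpu"
    return placement


# Illustrative sizes only; they do not describe any real model.
modules = [
    Module("attention", 10.0, False),
    Module("experts.0", 20.0, True),
    Module("experts.1", 20.0, True),
    Module("shared_mlp", 6.0, False),
]
plan = plan_placement(modules, vram_budget_gb=24.0)
for name, device in plan.items():
    print(f"{name}: {device}")
```

With a 24 GB budget, the two dense modules fit on the GPU and both expert blocks fall back to the CPU, which is the kind of split that CPU kernels with AMX or AVX acceleration are meant to make practical.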
Details
- Category: LLM Inference & Serving
- Price: Free
- Platform: Local/Desktop
- Difficulty: Advanced (4/5)
- License: Apache-2.0
- Minimum VRAM: 24 GB
- Added: May 7, 2026
Related Tools
- Port of Meta's LLaMA model in C/C++ for efficient CPU inference.
- High-throughput LLM serving engine with PagedAttention.
- Minimalist ML framework in Rust by Hugging Face for fast inference.
- Optimized inference library for running quantized LLMs on consumer GPUs.
- Open-source ChatGPT alternative that runs 100% offline on your computer.
- Fast LLM inference on consumer GPUs using neuron-aware sparse computation.