GPTQ (Quantization)
Post-training quantization method for compressing large language models.
About
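GPTQ is a one-shot post-training quantization method for large language models, introduced in the ICLR 2023 paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers". It quantizes weights layer by layer using a small calibration set, with no retraining, and compresses them to 4-bit or 3-bit precision with little quality loss. An optional activation-order heuristic, which quantizes weight columns in order of decreasing Hessian diagonal (roughly, largest input activations first), further improves accuracy on outlier-heavy models. Because 4-bit weights take roughly a quarter of the memory of 16-bit weights, models that would otherwise exceed a consumer GPU's VRAM can fit on one. The reference implementation is openly available.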
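To make the idea concrete, the following is a minimal NumPy sketch of GPTQ-style quantization for a single linear layer. It is an illustration under simplifying assumptions, not the reference implementation: the function names (`make_grid`, `quantize_column`, `rtn_quantize`, `gptq_quantize`) and the `damp` default are hypothetical, the grid is a plain per-row min/max grid, and grouping, blocked updates, and activation ordering are all left out. The core step it shows is quantizing one weight column at a time and spreading the resulting error over the not-yet-quantized columns using the inverse Hessian of the layer inputs.

```python
import numpy as np


def make_grid(W, bits=4):
    """Per-row asymmetric min/max grid. Returns (scale, zero, maxq)."""
    maxq = 2 ** bits - 1
    wmin = np.minimum(W.min(axis=1), 0.0)
    wmax = np.maximum(W.max(axis=1), 0.0)
    scale = (wmax - wmin) / maxq
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    zero = np.round(-wmin / scale)
    return scale, zero, maxq


def quantize_column(w, scale, zero, maxq):
    """Round one weight column to the nearest grid point (dequantized value)."""
    q = np.clip(np.round(w / scale) + zero, 0, maxq)
    return scale * (q - zero)


def rtn_quantize(W, bits=4):
    """Baseline: plain round-to-nearest, with no error compensation."""
    scale, zero, maxq = make_grid(W, bits)
    cols = [quantize_column(W[:, j], scale, zero, maxq) for j in range(W.shape[1])]
    return np.stack(cols, axis=1)


def gptq_quantize(W, X, bits=4, damp=0.01):
    """Simplified GPTQ-style pass over one layer.

    W: (out_features, in_features) weight matrix.
    X: (in_features, n_samples) calibration inputs to the layer.
    """
    W = W.astype(np.float64).copy()
    n_in = W.shape[1]
    scale, zero, maxq = make_grid(W, bits)

    # Hessian of the layer-wise squared error, with diagonal damping.
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(n_in)

    # Upper Cholesky factor U of H^-1, so that H^-1 = U.T @ U.
    U = np.linalg.cholesky(np.linalg.inv(H)).T

    Q = np.zeros_like(W)
    for j in range(n_in):
        q = quantize_column(W[:, j], scale, zero, maxq)
        Q[:, j] = q
        # Push the scaled rounding error onto columns j..end.
        err = (W[:, j] - q) / U[j, j]
        W[:, j:] -= np.outer(err, U[j, j:])
    return Q
```

One way to try it is to draw a random `W` and correlated calibration inputs `X`, then compare the layer output error `np.linalg.norm(W @ X - Q @ X)` for `gptq_quantize` against `rtn_quantize`. When the calibration inputs are uncorrelated (a diagonal Hessian), the compensation step has nothing to redistribute and the two functions return the same result.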
Details
- Category: Model Training & Fine-Tuning
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- License: Apache-2.0
- Minimum VRAM: 8 GB
- Added: Apr 3, 2026
Related Tools
No-code tool by Hugging Face for training ML models automatically.
Efficient LLM quantization preserving important weight channels.
Video model fine-tuning toolkit by Hugging Face Diffusers team.
Low-code framework for building custom AI models by Predibase.
Library for training LLMs with reinforcement learning (RLHF, DPO, PPO).
Efficient fine-tuning method using 4-bit quantized base model with LoRA adapters.