FlashAttention
IO-aware exact attention algorithm that runs 2-4x faster than standard attention and uses less memory.
About
FlashAttention, by Tri Dao, is an IO-aware exact attention algorithm that reduces attention memory usage from O(N^2) to O(N) in the sequence length N while running 2-4x faster than standard attention. It is a critical optimization for training and inference of large transformer models. Released under the BSD-3-Clause license.
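To illustrate why the result can stay exact while memory grows only linearly with N, here is a minimal pure-Python sketch of the underlying idea: keys and values are visited in blocks, and a running maximum and running sum rescale the partial softmax as each block arrives. The function names (`standard_attention`, `tiled_attention`) and the `block_size` parameter are illustrative assumptions, not part of the FlashAttention library, and the real implementation is a fused GPU kernel rather than Python loops.

```python
import math
import random


def standard_attention(Q, K, V):
    """Reference: softmax over all N scores of each query row at once."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Same output, but K/V are consumed block by block (online softmax).

    Illustrative sketch only: per query row it keeps a running max,
    a running normalizer, and an unnormalized output accumulator.
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    dv = len(V[0])
    out = []
    for q in Q:
        running_max = float("-inf")
        running_sum = 0.0
        acc = [0.0] * dv
        for start in range(0, len(K), block_size):
            k_blk = K[start:start + block_size]
            v_blk = V[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale everything accumulated so far to the new maximum.
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * correction + sum(weights)
            acc = [a * correction + sum(w * v[j] for w, v in zip(weights, v_blk))
                   for j, a in enumerate(acc)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


def demo():
    """Compare both versions on small random inputs (hypothetical sizes)."""
    random.seed(0)
    n, d = 6, 4
    Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    ref = standard_attention(Q, K, V)
    tiled = tiled_attention(Q, K, V, block_size=2)
    diff = max(abs(a - b) for r, t in zip(ref, tiled) for a, b in zip(r, t))
    print("max abs difference:", diff)


if __name__ == "__main__":
    demo()
```

Because each block only updates a few running values per query row, the full N x N score matrix never needs to be stored. The two functions should agree up to floating-point rounding for any `block_size`.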
Details
- Category: AI Frameworks & Libraries
- Price: Free
- Platform: Local/Desktop
- Difficulty: Expert (5/5)
- License: BSD-3-Clause
- Minimum VRAM: 8 GB
- Added: Apr 3, 2026
Related Tools
Tensor library for machine learning on commodity hardware
Structured output extraction from LLMs with Pydantic
Deploy LangChain runnables as REST APIs
Unified system for large-scale distributed training and inference.
High-level deep learning library making neural nets accessible with best practices.
Open-source machine learning framework by Meta with dynamic computation graphs.