FlashAttention

IO-aware exact attention algorithm that runs 2-4x faster than standard attention and uses less memory.

Open Source, Self Hosted, Offline Capable, GPU Required (8GB+ VRAM)

About

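FlashAttention by Tri Dao is an IO-aware exact attention algorithm that reduces attention memory usage from O(N^2) to O(N) in the sequence length N while running 2-4x faster than standard attention. Because it is exact, it produces the same output as standard attention rather than an approximation. It is a critical optimization for training and inference of large transformer models. Released under the BSD-3-Clause license.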
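To make the memory claim concrete, here is a minimal NumPy sketch of the blockwise "online softmax" idea that lets attention be computed without ever storing the full N x N score matrix. It is an illustration under simplifying assumptions, not the library's CUDA kernel: the function names `standard_attention` and `tiled_attention` and the `block_size` parameter are hypothetical, there is a single head with no batching or masking, and only keys/values are tiled (not queries).

```python
import numpy as np

def standard_attention(q, k, v):
    """Reference attention: materializes the full (N_q, N_k) score matrix."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    s = (q @ k.T) * scale
    s = s - s.max(axis=-1, keepdims=True)      # subtract row max for numerical stability
    p = np.exp(s)
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def tiled_attention(q, k, v, block_size=16):
    """Same output, computed one key/value block at a time.

    Only an (N_q, block_size) slice of scores exists at any moment; running
    row-wise max and sum statistics are rescaled as each new block arrives.
    """
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, v.shape[1]))            # unnormalized output accumulator
    row_max = np.full((n, 1), -np.inf)         # running max of scores per query
    row_sum = np.zeros((n, 1))                 # running softmax denominator
    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        s = (q @ k_blk.T) * scale              # scores for this block only
        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        correction = np.exp(row_max - new_max) # rescales earlier partial results
        p = np.exp(s - new_max)
        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ v_blk
        row_max = new_max
    return out / row_sum

rng = np.random.default_rng(0)
q = rng.standard_normal((50, 8))
k = rng.standard_normal((70, 8))
v = rng.standard_normal((70, 4))
matches = np.allclose(standard_attention(q, k, v), tiled_attention(q, k, v))
```

The two functions agree to floating-point tolerance, which is what "exact" means here. The speedup reported for the real implementation comes from doing this tiling inside fast on-chip GPU memory, which a NumPy sketch cannot reproduce.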

Details

Category: AI Frameworks & Libraries
Price: Free
Platform: Local/Desktop
Difficulty: Expert (5/5)
License: BSD-3-Clause
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Tags

attention, optimization, memory-efficient, fast, transformer, cuda

Related Tools

GGML

AI Frameworks & Libraries

Tensor library for machine learning on commodity hardware

Open Source, Self Hosted, Offline

Expert

Instructor

AI Frameworks & Libraries

Structured output extraction from LLMs with Pydantic

Open Source, Self Hosted

Easy

LangServe

AI Frameworks & Libraries

Deploy LangChain runnables as REST APIs

Open Source, Self Hosted

Easy

ColossalAI

AI Frameworks & Libraries

Unified system for large-scale distributed training and inference.

Open Source, Self Hosted, Offline, GPU 8GB+

Advanced

FastAI

AI Frameworks & Libraries

High-level deep learning library making neural nets accessible with best practices.

Open Source, Self Hosted, Offline, GPU 4GB+

Easy

PyTorch

AI Frameworks & Libraries

Open-source machine learning framework by Meta with dynamic computation graphs.

Open Source, Self Hosted, Offline

Intermediate
