OWL-ViT
Open-vocabulary object detection model by Google using vision transformers.
Open Source, Self Hosted, Offline Capable, GPU Required (6 GB+ VRAM)
About
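OWL-ViT (Vision Transformer for Open-World Localization) is an open-vocabulary object detector from Google. You describe the objects to find as free-text queries, and the model localizes matching objects in the image. It adapts a CLIP-style image-text model to detection, so new categories can be detected from a text description alone, without training data for each new category. Released under the Apache 2.0 license.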
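To make the idea of detection by text query concrete, here is a minimal sketch of the matching step, under loud assumptions: the three-dimensional vectors are hypothetical toy values standing in for real text and image embeddings, `match_queries` is an illustrative helper rather than part of any OWL-ViT API, and plain cosine similarity with a fixed threshold is a stand-in for the model's learned scoring. It only shows the general principle that each candidate box is labeled with the text query whose embedding it most resembles.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def match_queries(box_embeddings, query_embeddings, queries, threshold=0.5):
    """Label each candidate box with its best-matching text query.

    Hypothetical helper: returns (box_index, query, score) tuples for the
    boxes whose best similarity reaches the threshold.
    """
    detections = []
    for i, box_vec in enumerate(box_embeddings):
        scores = [cosine(box_vec, q) for q in query_embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            detections.append((i, queries[best], scores[best]))
    return detections


# Toy data (assumed values, not real model outputs).
queries = ["a photo of a cat", "a photo of a dog"]
query_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
box_embeddings = [
    [0.9, 0.1, 0.0],  # points mostly along the "cat" direction
    [0.1, 0.8, 0.1],  # points mostly along the "dog" direction
    [0.0, 0.0, 1.0],  # orthogonal to both queries, so it is dropped
]

detections = match_queries(box_embeddings, query_embeddings, queries)
for box_index, label, score in detections:
    print(f"box {box_index}: {label} ({score:.2f})")
```

Because the labels are ordinary strings compared in embedding space, changing what gets detected means editing the `queries` list rather than retraining a classifier head, which is the practical appeal of an open-vocabulary detector.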
Details
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- License: Apache-2.0
- Minimum VRAM: 6 GB
- Added: Apr 3, 2026
Similar Tools
Featured
State-of-the-art real-time object detection supporting YOLOv5 through v11.
Open Source, Self Hosted, Offline
Easy
Open-vocabulary real-time object detection using YOLO with text prompts.
Open Source, Self Hosted, Offline, GPU 4 GB+
Intermediate
Featured
Open-set object detection combining DINO with grounded pre-training.
Open Source, Self Hosted, Offline, GPU 4 GB+
Intermediate