OWL-ViT

Open-vocabulary object detection model by Google using vision transformers.

Open Source, Self Hosted, Offline Capable, GPU Required (6 GB+ VRAM)

About

OWL-ViT (Vision Transformer for Open-World Localization) is a Google model for open-vocabulary object detection: it accepts free-text queries instead of a fixed label set. It transfers CLIP-style image-text pretraining to detection through fine-tuning, so it can localize objects described by arbitrary text, including categories it was not explicitly trained to detect. The code is distributed within Google's Scenic research codebase for attention-based vision models and is released under the Apache 2.0 license.
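The idea of matching free-text queries against image regions can be illustrated with a toy sketch. This is not OWL-ViT's actual code: the function names (`cosine`, `match_queries`), the three-dimensional embeddings, and the 0.5 threshold are all hypothetical, chosen only to show how each candidate box might be labeled with whichever text query its embedding is most similar to.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_queries(box_embeddings, query_embeddings, threshold=0.5):
    """Label each candidate box with its best-matching text query.

    Returns (box_index, query_text, score) tuples for boxes whose best
    similarity reaches the threshold. Hypothetical helper for illustration.
    """
    detections = []
    for i, box_vec in enumerate(box_embeddings):
        best_query, best_score = None, -1.0
        for text, query_vec in query_embeddings.items():
            score = cosine(box_vec, query_vec)
            if score > best_score:
                best_query, best_score = text, score
        if best_query is not None and best_score >= threshold:
            detections.append((i, best_query, best_score))
    return detections

# Made-up embeddings; a real model would produce these from its text
# and image encoders, with far more dimensions.
queries = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
}
boxes = [
    [0.9, 0.1, 0.0],  # close to the "cat" query
    [0.1, 0.8, 0.1],  # close to the "dog" query
    [0.0, 0.0, 1.0],  # matches neither query
]

detections = match_queries(boxes, queries)
for index, text, score in detections:
    print(f"box {index}: {text} ({score:.2f})")
```

Because the label set is just the list of query strings, changing what the detector looks for means changing the text, not retraining a classifier head. The real model's scoring and box prediction are more involved than this sketch.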

Details

Category: Computer Vision & Object Detection
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: Apache-2.0
Minimum VRAM: 6 GB
Added: Apr 3, 2026

Tags

object-detection open-vocabulary vision-transformer google clip

Related Tools

CLIP

Computer Vision & Object Detection

Contrastive language-image pre-training model by OpenAI for zero-shot visual classification.

Open Source, Self Hosted, Offline, GPU 4 GB+

Intermediate

DeepFace

Computer Vision & Object Detection

Lightweight face recognition and analysis framework wrapping multiple models.

Open Source, Self Hosted, Offline

Easy

Depth Anything V1

Computer Vision & Object Detection

Foundation model for monocular depth estimation by TikTok.

Open Source, Self Hosted, Offline, GPU 4 GB+

Easy

Depth Anything V2

Computer Vision & Object Detection

Monocular depth estimation model producing detailed depth maps from single images.

Open Source, Self Hosted, Offline, GPU 4 GB+

Easy

Detectron2

Computer Vision & Object Detection

Meta AI research platform for object detection, segmentation, and pose estimation.

Open Source, Self Hosted, Offline, GPU 8 GB+

Advanced

ByteTrack

Computer Vision & Object Detection

Simple and effective multi-object tracking using every detection box.

Open Source, Self Hosted, Offline, GPU 4 GB+

Intermediate
