Computer Vision & Object Detection AI Tools
Open-source models and libraries for object detection, image segmentation, depth estimation, pose estimation, and visual understanding.
State-of-the-art real-time object detection supporting YOLOv5 through v11.
Open-set object detection combining the DINO detection transformer with grounded pre-training.
Foundation model by Meta for promptable image and video segmentation.
The most widely used open-source computer vision library, with 2500+ algorithms.
Contrastive language-image pre-training model by OpenAI for zero-shot visual classification.
Cross-platform ML solutions by Google for face, hand, pose, and object detection.
End-to-end object detection with transformers by Meta, eliminating hand-designed components such as anchor generation and non-maximum suppression.
OpenMMLab detection toolbox with 300+ pre-trained models and 80+ algorithms.
Foundation model for monocular depth estimation by TikTok.
Reusable computer vision tools for detection, tracking, and annotation by Roboflow.
Original Segment Anything Model by Meta for zero-shot image segmentation.
Self-supervised vision transformer by Meta producing universal visual features.
Monocular depth estimation model producing detailed depth maps from single images.
Monocular depth estimation model by Intel ISL supporting multiple backbones.
Combines Grounding DINO with SAM 2 for text-prompted segmentation and tracking.
Lightweight face recognition and analysis framework wrapping multiple models.
Open-source face analysis toolbox for recognition, detection, and alignment.
Real-time multi-person pose estimation by CMU for body, hand, and face keypoints.
Real-time multi-person pose estimation by OpenMMLab with high accuracy.
Simple and effective multi-object tracking that associates every detection box, including low-confidence ones.
Vision-language model by Google that replaces the softmax-based contrastive loss with a pairwise sigmoid loss.
Open-vocabulary object detection model by Google using vision transformers.
Interactive tool for tracking and segmenting any object in video.
Open-vocabulary real-time object detection using YOLO with text prompts.
Robust multi-object tracking combining motion and appearance cues.
Additive angular margin loss for deep face recognition.
Unified vision foundation model by Microsoft for captioning, detection, and segmentation.
Meta AI research platform for object detection, segmentation, and pose estimation.
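The tracker entry that associates every detection box works in two stages: confident boxes are matched to existing tracks first, then the leftover tracks are matched against low-score boxes that a plain score threshold would have discarded. Below is a minimal sketch under several assumptions. Each track is represented only by its last box (the real method predicts boxes with a Kalman filter). Matching is greedy by IoU rather than Hungarian. The function names `byte_associate` and `greedy_match`, and all thresholds, are hypothetical illustrative choices.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_match(track_ids: Sequence[int], tracks: Dict[int, Box],
                 dets: List[Detection], det_ids: Sequence[int],
                 min_iou: float) -> List[Tuple[int, int]]:
    """Hypothetical helper: pair tracks and detections greedily, highest IoU
    first (a simplification of optimal Hungarian assignment)."""
    candidates = sorted(
        ((iou(tracks[t], dets[d][0]), t, d) for t in track_ids for d in det_ids),
        reverse=True,
    )
    pairs: List[Tuple[int, int]] = []
    used_t, used_d = set(), set()
    for score, t, d in candidates:
        if score < min_iou:
            break  # all remaining candidates overlap even less
        if t in used_t or d in used_d:
            continue
        pairs.append((t, d))
        used_t.add(t)
        used_d.add(d)
    return pairs


def byte_associate(tracks: Dict[int, Box], dets: List[Detection],
                   high_thresh: float = 0.6, low_thresh: float = 0.1,
                   min_iou: float = 0.3):
    """Hypothetical two-stage association over every detection box.

    Returns (matches, unmatched_track_ids, new_track_det_indices), where
    matches are (track_id, detection_index) pairs. Thresholds are illustrative.
    """
    high = [i for i, (_, s) in enumerate(dets) if s >= high_thresh]
    low = [i for i, (_, s) in enumerate(dets) if low_thresh <= s < high_thresh]

    # Stage 1: every track against the confident detections.
    first = greedy_match(list(tracks), tracks, dets, high, min_iou)
    matched_t = {t for t, _ in first}
    remaining = [t for t in tracks if t not in matched_t]

    # Stage 2: leftover tracks against low-score boxes (often occluded objects).
    second = greedy_match(remaining, tracks, dets, low, min_iou)
    matched_t |= {t for t, _ in second}

    matched_high = {d for _, d in first}
    unmatched_tracks = [t for t in tracks if t not in matched_t]
    # Only unmatched high-score boxes seed new tracks; low-score leftovers are dropped.
    new_tracks = [d for d in high if d not in matched_high]
    return first + second, unmatched_tracks, new_tracks
```

For example, with tracks `{1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}` and a 0.3-score box at `(21, 21, 31, 31)`, track 2 is kept alive in the second stage instead of being marked lost; raising `low_thresh` above 0.3 disables that recovery.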
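The sigmoid loss mentioned in the Google vision-language entry treats each image-text pair in a batch as an independent binary classification (matching pair or not) instead of normalizing over the whole batch with a softmax. A minimal pure-Python sketch follows, assuming image and text embeddings are produced elsewhere. The function names and the default temperature `t` and bias `b` are illustrative assumptions; in training, both are learned parameters.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sigmoid_pairwise_loss(img: List[Vector], txt: List[Vector],
                          t: float = 10.0, b: float = -10.0) -> float:
    """Hypothetical helper: sigmoid loss over all image-text pairs in a batch.

    Pair (i, j) gets label +1 when i == j and -1 otherwise. Each pair is scored
    on its own, so no batch-wide softmax normalization is required.
    t (temperature) and b (bias) are fixed here but learned in practice.
    """
    img = [l2_normalize(v) for v in img]
    txt = [l2_normalize(v) for v in txt]
    n = len(img)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            logit = t * sum(a * c for a, c in zip(img[i], txt[j])) + b
            total += log_sigmoid(label * logit)
    return -total / n
```

Because every pair contributes an independent term, the loss can be summed over chunks of the batch without first gathering all similarities for a softmax denominator.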
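The additive angular margin loss listed above can be written out as follows. The notation here is ours: N samples, feature vector x_i with identity label y_i, class weight vector W_j, scale s, and angular margin m.

```latex
\cos\theta_j = \frac{W_j^{\top} x_i}{\lVert W_j \rVert \, \lVert x_i \rVert},
\qquad
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{e^{s \cos(\theta_{y_i} + m)}}
            {e^{s \cos(\theta_{y_i} + m)} + \sum_{j \neq y_i} e^{s \cos\theta_j}}
```

The margin m is added to the angle of the target class only, so a sample is scored correctly only if it lies closer in angle to its own class weight by at least m. This pulls embeddings of the same identity into a tighter angular region than a plain softmax would.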