Computer Vision & Object Detection AI Tools
Open-source models and libraries for object detection, image segmentation, depth estimation, pose estimation, and visual understanding.
Foundation model by Meta for promptable image and video segmentation.
State-of-the-art real-time object detection supporting YOLOv5 through v11.
The most widely used open-source computer vision library, with 2500+ algorithms.
Open-set object detection combining DINO with grounded pre-training.
Contrastive language-image pre-training model by OpenAI for zero-shot visual classification.
Cross-platform ML solutions by Google for face, hand, pose, and object detection.
Monocular depth estimation model by Intel ISL supporting multiple backbones.
OpenMMLab detection toolbox with 300+ pre-trained models and 80+ algorithms.
Open-vocabulary real-time object detection using YOLO with text prompts.
Meta AI research platform for object detection, segmentation, and pose estimation.
Reusable computer vision tools for detection, tracking, and annotation by Roboflow.
Lightweight face recognition and analysis framework wrapping multiple models.
Open-source face analysis toolbox for recognition, detection, and alignment.
Real-time multi-person pose estimation by CMU for body, hand, and face keypoints.
Real-time multi-person pose estimation by OpenMMLab with high accuracy.
Vision-language model by Google that replaces the softmax-based contrastive loss with a pairwise sigmoid loss.
Open-vocabulary object detection model by Google using vision transformers.
Interactive tool for tracking and segmenting any object in video.
Additive angular margin loss for deep face recognition.
Foundation model for monocular depth estimation by TikTok.
Original Segment Anything Model by Meta for zero-shot image segmentation.
Combines Grounding DINO with SAM 2 for text-prompted segmentation and tracking.
Robust multi-object tracking combining motion and appearance cues.
Foundation model for promptable visual segmentation in images and videos with streaming memory.
Simple and effective multi-object tracking using every detection box.
Monocular depth estimation model producing detailed depth maps from single images.
End-to-end object detection with transformers by Meta, eliminating hand-designed components such as anchor generation and non-maximum suppression.
Self-supervised vision transformer by Meta producing universal visual features.
Unified vision foundation model by Microsoft for captioning, detection, and segmentation.
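Several entries above describe contrastive image-text models used for zero-shot classification. The scoring step can be sketched as follows. Each candidate label is written as a text prompt, and the image goes to the prompt with the highest scaled cosine similarity.

This is a minimal sketch under several assumptions:
- The image and text embeddings were already produced by a jointly trained image encoder and text encoder.
- The toy 3-d vectors are made up.
- The scale of 100 is an assumed value.
- The helper names cosine and zero_shot_classify are hypothetical, not any library's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, text_embs, labels, scale=100.0):
    """Pick the label whose text embedding best matches the image embedding.

    Returns (best_label, {label: probability}). `scale` multiplies the
    cosine similarities before the softmax (an assumed value).
    """
    logits = [scale * cosine(image_emb, t) for t in text_embs]
    # Softmax with max-subtraction for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Toy 3-d embeddings standing in for real encoder outputs (hypothetical).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.0]
best, probs = zero_shot_classify(image_emb, text_embs, labels)
print(best)  # a photo of a cat
```

With real models, the embeddings would come from the respective encoders. Only the similarity-plus-softmax step is shown here, and no training is involved, which is what makes the classification "zero-shot".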
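One entry describes a vision-language model trained with a sigmoid loss. Instead of a softmax over the whole batch, every image-text pair is scored independently as a binary match/non-match problem:

loss = -(1/N) Σ_i Σ_j log σ(z_ij · (t · x_i·y_j + b))

Here z_ij = 1 for matching pairs and -1 otherwise.

The sketch below makes these assumptions:
- Embeddings are L2-normalized.
- The values of the temperature t and bias b are illustrative only; both are learned in practice.
- The function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_pairwise_loss(img_embs, txt_embs, t=10.0, b=-10.0):
    """Pairwise sigmoid loss over all image-text combinations in a batch.

    img_embs[i] and txt_embs[i] form a matching pair (label z = +1);
    every other combination is a negative (z = -1). Embeddings are
    assumed L2-normalized; t and b are illustrative, learned in practice.
    """
    n = len(img_embs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            z = 1.0 if i == j else -1.0
            sim = sum(x * y for x, y in zip(img_embs[i], txt_embs[j]))
            total -= log_sigmoid(z * (t * sim + b))
    return total / n  # normalized by batch size


images = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(sigmoid_pairwise_loss(images, aligned) < sigmoid_pairwise_loss(images, swapped))  # True
```

Each pair contributes its own term, so the loss needs no batch-wide normalization. That is the usual argument for this design over a softmax contrastive loss.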
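The entry on an additive angular margin loss for face recognition can be illustrated with a small sketch. Assume embeddings and class-center weights are L2-normalized, so each logit is a cosine cos θ_j. The margin m is then added to the angle of the ground-truth class only, giving a target logit of s·cos(θ_y + m) before the usual softmax cross-entropy.

Assumptions and limits of this sketch:
- The scale s = 64 and margin m = 0.5 are illustrative values.
- The helper names are hypothetical.
- Production implementations also handle the case θ_y + m > π; this sketch omits it.

```python
import math


def margin_logits(cosines, target, s=64.0, m=0.5):
    """Logits with an additive angular margin on the target class.

    cosines[j] is cos(theta_j) between a normalized embedding and the
    normalized weight vector of class j. The target becomes
    s * cos(theta_target + m); other classes stay s * cos(theta_j).
    The theta + m > pi case is not handled in this sketch.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # keep acos inside its domain
        if j == target:
            logits.append(s * math.cos(math.acos(c) + m))
        else:
            logits.append(s * c)
    return logits


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[target]


cosines = [0.8, 0.3, 0.1]  # toy similarities; class 0 is the true identity
plain = cross_entropy(margin_logits(cosines, 0, m=0.0), 0)
with_margin = cross_entropy(margin_logits(cosines, 0), 0)
print(with_margin > plain)  # True
```

The margin raises the loss even for correctly classified but loosely clustered embeddings. This pushes embeddings of one identity more tightly toward their class center.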
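The entry describing multi-object tracking that uses every detection box refers to a two-stage association:
1. High-confidence detections are matched to existing tracks first.
2. Low-confidence detections are then matched against the tracks that remain unmatched, rather than being discarded. These often correspond to occluded objects.

This is a simplified sketch under the following assumptions:
- Boxes are (x1, y1, x2, y2).
- The score threshold is 0.6 and the IoU threshold is 0.3 (hypothetical values).
- Greedy IoU matching is swapped in for the Hungarian algorithm for brevity.
- Track prediction (for example a Kalman filter), track creation, and track deletion are left out.
- All function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_ids, tracks, boxes, iou_thresh):
    """Pair tracks with boxes by descending IoU (greedy, not Hungarian)."""
    candidates = sorted(
        ((iou(tracks[t], box), t, k) for t in track_ids for k, box in enumerate(boxes)),
        key=lambda c: c[0],
        reverse=True,
    )
    matches, used = {}, set()
    for score, t, k in candidates:
        if score < iou_thresh:
            break  # remaining candidates overlap even less
        if t in matches or k in used:
            continue
        matches[t] = k
        used.add(k)
    return matches


def associate(tracks, detections, high=0.6, iou_thresh=0.3):
    """Two-stage association that keeps low-score detections.

    tracks: {track_id: predicted box}; detections: [(box, score), ...].
    Returns ({track_id: matched box}, [unmatched high-score boxes]).
    """
    high_boxes = [box for box, score in detections if score >= high]
    low_boxes = [box for box, score in detections if score < high]

    # Stage 1: confident detections against every track.
    first = greedy_match(list(tracks), tracks, high_boxes, iou_thresh)
    # Stage 2: low-score detections against tracks still unmatched.
    leftover = [t for t in tracks if t not in first]
    second = greedy_match(leftover, tracks, low_boxes, iou_thresh)

    matched = {t: high_boxes[k] for t, k in first.items()}
    matched.update({t: low_boxes[k] for t, k in second.items()})
    used = set(first.values())
    unmatched_high = [box for k, box in enumerate(high_boxes) if k not in used]
    return matched, unmatched_high


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [((1, 1, 11, 11), 0.9), ((21, 21, 31, 31), 0.3), ((50, 50, 60, 60), 0.8)]
matched, new = associate(tracks, detections)
print(matched)  # {1: (1, 1, 11, 11), 2: (21, 21, 31, 31)}
print(new)      # [(50, 50, 60, 60)]
```

Track 2 is kept alive only because of the second stage: its sole overlapping detection has a score of 0.3 and would be thrown away by a single high-threshold pass.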
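The entry on end-to-end detection with transformers relies on a one-to-one (bipartite) matching between predictions and ground-truth objects during training. Each object is supervised by exactly one prediction, which is why no non-maximum suppression step is needed afterward.

The sketch below makes these assumptions:
- Each prediction carries class probabilities and a box in normalized (cx, cy, w, h) form.
- The cost is simplified to negative class probability plus L1 box distance; the generalized-IoU term is omitted.
- Matching brute-forces all permutations, which is only viable for toy sizes. In practice a Hungarian solver such as scipy.optimize.linear_sum_assignment would be used.
- Function names are hypothetical.

```python
from itertools import permutations


def match_cost(pred, gt, box_weight=1.0):
    """Simplified matching cost: -p(gt class) + weighted L1 box distance."""
    class_probs, pred_box = pred
    gt_label, gt_box = gt
    l1 = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    return -class_probs.get(gt_label, 0.0) + box_weight * l1


def bipartite_match(preds, gts):
    """Minimum-cost one-to-one assignment {gt_index: pred_index}.

    Brute force over permutations; assumes len(preds) >= len(gts) and is
    only practical for toy sizes.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return dict(enumerate(best))


# Boxes as normalized (cx, cy, w, h); values are made up for illustration.
preds = [
    ({"cat": 0.9, "dog": 0.1}, (0.5, 0.5, 0.2, 0.2)),
    ({"cat": 0.2, "dog": 0.7}, (0.2, 0.2, 0.1, 0.1)),
    ({"cat": 0.1, "dog": 0.1}, (0.9, 0.9, 0.1, 0.1)),
]
gts = [("dog", (0.2, 0.2, 0.1, 0.1)), ("cat", (0.5, 0.5, 0.2, 0.2))]
print(bipartite_match(preds, gts))  # {0: 1, 1: 0}
```

Prediction 2 is left unmatched. In this style of training, it would be supervised toward a "no object" class rather than suppressed after the fact.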