Computer Vision & Object Detection AI Tools

Open-source models and libraries for object detection, image segmentation, depth estimation, pose estimation, and visual understanding.

The job is getting structured data out of raw pixels: bounding boxes, instance masks, keypoints, depth maps, and track IDs that downstream code can act on. The teams reaching for these tools build factory inspection lines, retail and sports analytics, robotics perception, and the labeling pipelines that feed all of them.

In 2026 the field splits into two camps. Closed-vocabulary detectors trained on a fixed label set still own production: Ultralytics YOLO, Detectron2, MMDetection, and RF-DETR deliver predictable latency on modest hardware, but somebody has to annotate the data first. Opposite them sit promptable, open-vocabulary models such as Grounding DINO, YOLO-World, SAM 3, and Florence-2, which take text or click prompts and need no task-specific training, at the cost of heavier weights, slower inference, and results that shift with prompt wording. The tradeoff is annotation cost against inference cost, and most shipped systems use an open-vocabulary model to bootstrap labels for a small, fast one.

A sane entry point is OpenCV for the classical primitives no network has replaced, Ultralytics YOLO for a working detection baseline in an afternoon, and Supervision for the tracking, zone, and annotation plumbing every project otherwise rewrites badly. Add SAM 2 when masks matter more than boxes. Check licensing before committing, though: Ultralytics YOLO is AGPL-3.0, which reaches into any network service built on it absent a paid commercial license, and several InsightFace models ship weights restricted to non-commercial use. Detectron2, RF-DETR, and timm are permissively licensed and worth the extra setup for anything shipping to customers.
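The label-bootstrapping pattern described above, where an open-vocabulary model produces annotations that a small closed-vocabulary detector is then trained on, usually comes down to a format conversion plus a confidence filter. The sketch below is a minimal illustration using only the standard library. The `Detection` record, the `to_yolo_labels` helper, and the `min_score=0.35` threshold are hypothetical names and values chosen for this example, not the API of any tool listed here. The output follows the plain-text YOLO label convention of one `class_id x_center y_center width height` line per box, with coordinates normalized by image size.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One box from a hypothetical open-vocabulary detector (pixel xyxy)."""
    phrase: str   # text prompt that matched, e.g. "forklift"
    score: float  # detector confidence in [0, 1]
    x1: float
    y1: float
    x2: float
    y2: float


def to_yolo_labels(detections, class_names, img_w, img_h, min_score=0.35):
    """Convert prompted detections into YOLO-format label lines.

    min_score is an assumed threshold for illustration; tune it per dataset.
    """
    lines = []
    for det in detections:
        # Drop low-confidence boxes and phrases outside the target label set.
        if det.score < min_score or det.phrase not in class_names:
            continue
        # Clamp to the image so normalized values stay within [0, 1].
        x1, x2 = max(0.0, det.x1), min(float(img_w), det.x2)
        y1, y2 = max(0.0, det.y1), min(float(img_h), det.y2)
        if x2 <= x1 or y2 <= y1:
            continue  # box is empty after clamping
        cx = (x1 + x2) / 2 / img_w
        cy = (y1 + y2) / 2 / img_h
        w = (x2 - x1) / img_w
        h = (y2 - y1) / img_h
        class_id = class_names.index(det.phrase)
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


if __name__ == "__main__":
    dets = [
        Detection("forklift", 0.90, 100, 50, 300, 250),
        Detection("pallet", 0.20, 0, 0, 10, 10),  # below threshold, dropped
    ]
    for line in to_yolo_labels(dets, ["pallet", "forklift"], 640, 480):
        print(line)
```

In practice the filtered labels would typically get a human review pass before training, since prompt wording shifts what the open-vocabulary model returns.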

Segment Anything 2 (SAM 2)

OpenCV

Ultralytics YOLO

MediaPipe

Grounding DINO

CLIP

ArcFace

SigLIP

OWL-ViT

BoT-SORT

MiDaS

Florence-2

YOLO-World

DETR

Track Anything

CoTracker

Depth Pro

Kornia

RF-DETR

Roboflow Inference

InsightFace

SAHI

Grounded SAM 2

SAM 3

SmolVLM

timm

MMDetection

RTMPose

Segment Anything 2

Supervision

OpenPose

ByteTrack

SAM (Segment Anything v1)

DeepFace

Depth Anything V1

Depth Anything V2

Detectron2

DINOv2
