Qwen-VL

Multimodal vision-language model by Alibaba for image understanding.

Open Source, Self Hosted, Offline Capable, GPU Required (8GB+ VRAM)

About

Qwen-VL is a multimodal vision-language model from Alibaba Cloud. It takes images alongside text prompts and handles visual question answering, captioning, OCR, and visual grounding; the Qwen-VL-Chat variant is tuned for dialogue. The family is released in multiple sizes with open weights on Hugging Face, and the hosted Qwen-VL-Plus and Qwen-VL-Max versions extend it. This listing records the license as Apache 2.0; terms can differ between releases and model sizes, so check the model card of the checkpoint you download.
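
As a rough illustration of what "images alongside text prompts" looks like in practice, here is a minimal sketch of one visual question answering turn with the Qwen-VL-Chat checkpoint. It assumes the `transformers` library (plus `accelerate` for `device_map="auto"`) and the `Qwen/Qwen-VL-Chat` model ID on Hugging Face. The `from_list_format` and `chat` methods are supplied by the checkpoint's own remote code rather than by `transformers`, so verify them against the current model card. `build_query` and `ask` are hypothetical helper names introduced only for this example.

```python
from typing import Dict, List


def build_query(image: str, question: str) -> List[Dict[str, str]]:
    """Pair one image (local path or URL) with a text prompt."""
    if not image or not question:
        raise ValueError("both an image and a question are required")
    return [{"image": image}, {"text": question}]


def ask(image: str, question: str, model_id: str = "Qwen/Qwen-VL-Chat") -> str:
    """Run one VQA turn. Downloads the weights on first use and needs a GPU."""
    # Imported lazily so build_query works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code is needed because the checkpoint ships custom code.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", trust_remote_code=True
    ).eval()

    # Assumption: these two methods come from the checkpoint's remote code.
    query = tokenizer.from_list_format(build_query(image, question))
    response, _history = model.chat(tokenizer, query=query, history=None)
    return response
```

Calling `ask("receipt.jpg", "What is the total amount?")` would cover the OCR-style use case; the same call with a captioning or grounding prompt exercises the other tasks listed above. Newer releases in the family may use a different loading path, so treat this as a starting point only.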

Details

Price
Free
Platform
Local/Desktop
Difficulty
Intermediate (3/5)
License
Apache-2.0
Minimum VRAM
8 GB
Added
Apr 3, 2026

Related Tools

Open-weight models by Google in 2B, 9B, and 27B sizes with strong performance.

Open Source, Self Hosted, Offline, GPU 4GB+
Easy

Open-weight LLM by Meta in 8B and 70B sizes with strong general capabilities.

Open Source, Self Hosted, Offline, GPU 8GB+
Intermediate

Open-weight LLM family by Alibaba with strong multilingual and coding abilities.

Open Source, Self Hosted, Offline, GPU 8GB+
Intermediate

Small language model by Microsoft in 3.8B size with strong benchmark performance.

Open Source, Self Hosted, Offline, GPU 4GB+
Easy

Open-access 176B parameter multilingual LLM by BigScience supporting 46 languages.

Open Source, Self Hosted, Offline, GPU 80GB+
Expert

Retrieval-augmented generation optimized LLM by Cohere with 128K context.

Open Source, Self Hosted, Offline, GPU 24GB+
Advanced