DeepSeek VL2 Small

DeepSeek · Open Weight

Description

DeepSeek-VL2 is a series of large Mixture-of-Experts (MoE) vision-language models that improves significantly on its predecessor, DeepSeek-VL. It demonstrates superior capabilities across a range of tasks, including visual question answering, optical character recognition, document/table/chart understanding, and visual grounding.

Release Date

2024-12-13

Parameters

16.0B

Context Length

—

Modalities

—

Capability Radar

Axes: general, coding, reasoning, science (est.), agents, multimodal

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.
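
The proxy rule noted above can be sketched as a simple fallback. This is a minimal illustration, not the site's actual implementation: the function name `science_axis_score` and the choice to return an "estimated" flag (mirroring the "est." label on the radar axis) are assumptions.

```python
from typing import Optional, Tuple


def science_axis_score(
    science: Optional[float], reasoning: Optional[float]
) -> Tuple[Optional[float], bool]:
    """Return (score, is_estimated) for the science axis.

    Hypothetical helper: uses the dedicated science score when one exists,
    otherwise falls back to the reasoning score and marks it as an estimate.
    """
    if science is not None:
        return science, False
    # No dedicated science benchmark: use reasoning as a proxy, if available.
    return reasoning, reasoning is not None


# Example with made-up inputs: no science score, so reasoning stands in.
score, estimated = science_axis_score(None, 60.0)
```

With no science score and no reasoning score, the helper returns `(None, False)`, leaving the axis empty rather than inventing a value.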

Rankings

Domain	#Rank	Score	Source
Multimodal Ranking	53	75.0	LS

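Benchmark Scores (LLM Stats)

Category	Benchmark	Score
General	MMT-Bench	62.9% (SR)
General	MMStar	57.0% (SR)
General	MMMU	48.0% (SR)
Image To Text	DocVQA	92.3% (SR)
Image To Text	TextVQA	83.4% (SR)
Image To Text	OCRBench	83.4% (SR)
Math	MathVista	60.7% (SR)
Multimodal	ChartQA	84.5% (SR)
Multimodal	MMBench	80.3% (SR)
Multimodal	AI2D	80.0% (SR)
Multimodal	MMBench-V1.1	79.3% (SR)
Multimodal	InfoVQA	75.8% (SR)
Multimodal	MME	21.2% (SR)
Spatial Reasoning	RealWorldQA	65.4% (SR)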
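
For readers who want a quick per-category summary of the scores listed above, the sketch below computes a plain unweighted mean for each category. This is an assumption made for illustration only: the page does not say how LLM Stats aggregates category scores, and the function name `category_means` is hypothetical.

```python
from statistics import mean

# Scores copied from the benchmark list above (percentages).
SCORES = {
    "General": {"MMT-Bench": 62.9, "MMStar": 57.0, "MMMU": 48.0},
    "Image To Text": {"DocVQA": 92.3, "TextVQA": 83.4, "OCRBench": 83.4},
    "Math": {"MathVista": 60.7},
    "Multimodal": {
        "ChartQA": 84.5,
        "MMBench": 80.3,
        "AI2D": 80.0,
        "MMBench-V1.1": 79.3,
        "InfoVQA": 75.8,
        "MME": 21.2,
    },
    "Spatial Reasoning": {"RealWorldQA": 65.4},
}


def category_means(scores):
    """Unweighted mean per category, rounded to one decimal place.

    Assumption: equal weight per benchmark; not the site's official method.
    """
    return {cat: round(mean(vals.values()), 1) for cat, vals in scores.items()}


means = category_means(SCORES)
for category, value in means.items():
    print(f"{category}: {value}")
```

The MME entry is far below the other Multimodal scores, so a plain mean for that category is sensitive to it. A median or a trimmed mean would be a reasonable alternative if that matters for the comparison.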

AA Evaluation Indices

No AA evaluation data available

LLM Stats Category Scores

Categories: Image To Text, Multimodal, Spatial Reasoning, Vision, Math, Reasoning, General, Healthcare

Pricing

No pricing data available

Speed

No speed data available

Provider Price Ranking

No provider data available

External Sources

LLM Stats, Artificial Analysis