DeepSeek-VL2

DeepSeek (Open Weight)

Description

DeepSeek-VL2 is a series of large Mixture-of-Experts (MoE) vision-language models that significantly improves upon its predecessor, DeepSeek-VL. It demonstrates strong capabilities across a range of tasks, including visual question answering, optical character recognition, document/table/chart understanding, and visual grounding.

Release Date

2024-12-13

Parameters

27.0B

Context Length

—

Modalities

image, text

Capability Radar

Radar chart axes: general, coding, reasoning, science (est.), agents, multimodal

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain	#Rank	Score	Source
Multimodal Ranking	47	76.0	LS

Benchmark Scores (LLM Stats)

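Category	Benchmark	Score	Source
General	MMT-Bench	63.6%	SR
General	MMStar	61.3%	SR
General	MMMU	51.1%	SR
Image To Text	DocVQA	93.3%	SR
Image To Text	TextVQA	84.2%	SR
Image To Text	OCRBench	81.1%	SR
Math	MathVista	62.8%	SR
Multimodal	ChartQA	86.0%	SR
Multimodal	AI2D	81.4%	SR
Multimodal	MMBench	79.6%	SR
Multimodal	MMBench-V1.1	79.2%	SR
Multimodal	InfoVQA	78.1%	SR
Multimodal	MME	22.5%	SR
Spatial Reasoning	RealWorldQA	68.4%	SR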

AA Evaluation Indices

No AA evaluation data available

LLM Stats Category Scores

Chart categories: Image To Text, Multimodal, Reasoning, Spatial Reasoning, Vision, Math, General, Healthcare

Pricing

No pricing data available

Speed

No speed data available

Provider Price Ranking

2 providers

Cheapest: SiliconFlow (China)
Most Expensive: SiliconFlow

Provider	Input	Output
1. SiliconFlow (China) (Cheapest)	$0.15
2. SiliconFlow	$0.15


External Sources

LLM Stats, Artificial Analysis