
Grok-1.5V

xAI, Grok, Proprietary

Description

A multimodal model that processes text and visual information, including documents, diagrams, charts, screenshots, and photographs. It is notable for strong real-world spatial understanding.

Release Date: 2024-04-12
Parameters
Context Length
Modalities

Capability Radar

general: 50
coding: 0
reasoning: 50
science (est.): 43
agents: 0
multimodal: 80

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain              # Rank  Score  Source
Multimodal Ranking  26      82.0   LS

Benchmark Scores (LLM Stats)

General

MMMU: 53.6% (SR)

Image To Text

DocVQA: 85.6% (SR)
TextVQA: 78.1% (SR)

Math

MathVista: 52.8% (SR)

Multimodal

AI2D: 88.3% (SR)
ChartQA: 76.1% (SR)

Spatial Reasoning

RealWorldQA: 68.7% (SR)

AA Evaluation Indices

No AA evaluation data available

LLM Stats Category Scores

Image To Text: 80
Spatial Reasoning: 70
Vision: 70
Multimodal: 70
Reasoning: 70
General: 50
Healthcare: 50
Math: 50

Pricing

No pricing data available

Speed

No speed data available

Available Providers

(LS internal units)

No provider data available

External Sources