Llama 3.2 Instruct 90B (Vision)

Meta · Llama · Open Weight · Llama 3.2 · Commercial OK

Description

Llama 3.2 90B is a large multimodal language model optimized for visual recognition, image reasoning, and captioning tasks. It accepts image and text input, supports a context length of 128,000 tokens, and targets image understanding and generative tasks.

Release Date

2024-09-25

Parameters

90.0B

Context Length

128,000 tokens

Modalities

image, text

Capability Radar

Radar chart axes: general, coding, reasoning, science (est.), agents, multimodal

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain	Rank	Score	Source
Code Ranking	338	23.0	AA
General Ranking	360	30.0	AA
Math Reasoning	254	33.0	AA
Multimodal Ranking	31	81.0	LS
Science	379	29.0	AA

Sources: AA = Artificial Analysis, LS = LLM Stats.

Benchmark Scores (LLM Stats)

Category	Benchmark	Score	Tag
Biology	GPQA	46.7%	SR
Finance	MMLU	86.0%	SR
General	MMMU	60.3%	SR
General	MMMU-Pro	45.2%	SR
Image To Text	DocVQA	90.1%	SR
Image To Text	VQAv2	78.1%	SR
Image To Text	TextVQA	73.5%	SR
Math	MGSM	86.9%	SR
Math	MATH	68.0%	SR
Math	MathVista	57.3%	SR
Multimodal	AI2D	92.3%	SR
Multimodal	ChartQA	85.5%	SR
Multimodal	InfographicsQA	56.8%	SR

AA Evaluation Indices

Index	Score
Intelligence Index	6.2
MMLU-Pro	0.7
MATH-500	0.6
GPQA	0.4
SciCode	0.2
LiveCodeBench	0.2
AIME	0.1
HLE	0.0

LLM Stats Category Scores

Chart categories (values not captured): Language, Legal, Finance, Image To Text, Math, Multimodal, Reasoning, Healthcare, Vision, General, Physics, Biology, Chemistry

Pricing

Input Price: $1.38 / 1M tokens

Output Price: $1.38 / 1M tokens

Blended Price (3:1): $1.38 / 1M tokens
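
The blended figure can be reproduced as a weighted average of the input and output prices. The sketch below assumes the "3:1" label means three input tokens for every output token; the function name `blended_price` and its parameters are illustrative, not part of any provider API.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average price per 1M tokens (assumed 3:1 input:output mix)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Prices in USD per 1M tokens, taken from this page.
meta_blended = blended_price(1.38, 1.38)     # listed input/output price
io_net_blended = blended_price(0.35, 0.40)   # IO.NET row in the provider ranking

print(f"Listed price blended: ${meta_blended:.2f}")
print(f"IO.NET blended:       ${io_net_blended:.4f}")
```

When input and output prices are equal, as in the listed pricing, the blended price equals that shared value regardless of the ratio.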

Speed

Tokens/sec: 55.0

Time to First Token: 0.60s

Time to Answer: 0.60s
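
As a rough illustration of how these speed figures combine, the sketch below estimates wall-clock time for a response of a given length. It assumes a constant decode rate after the first token and is not the definition of "Time to Answer" used by the source; `estimated_response_seconds` is a hypothetical helper.

```python
def estimated_response_seconds(output_tokens: int,
                               ttft_s: float = 0.60,
                               tokens_per_s: float = 55.0) -> float:
    """Time to first token plus streaming time at a constant decode rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / tokens_per_s


# Example: a 500-token answer at the speeds listed on this page.
estimate_500 = estimated_response_seconds(500)
print(f"Estimated time for 500 output tokens: {estimate_500:.2f}s")
```

Real latency varies by provider, load, and prompt length, so treat the result as an order-of-magnitude guide.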

Provider Price Ranking

5 providers. Cheapest: IO.NET. Most expensive: Azure.

Rank	Provider	Input	Output
1	IO.NET (cheapest)	$0.35	$0.40
2	Vercel AI Gateway	$0.72	
3	Meta (primary)	$1.38	
4	Azure Cognitive Services	$2.04	
5	Azure	$2.04	


External Sources

LLM Stats, Artificial Analysis