Llama 3.2 Instruct 11B (Vision)
Meta · Llama · Open Weight · Llama 3.2 Community License
Description
Llama 3.2 11B Vision Instruct is an instruction-tuned multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. It accepts text and images as input and generates text as output.
Release date
2024-09-25
Parameters
10.6B
Context length
131K
Supported modalities
image, text
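Capability radar chart
general: 19
coding: 7
reasoning: 13
science (estimated): 15
agents: 0
multimodal: 90
The science score is estimated from reasoning ability as a proxy when no dedicated science benchmark is available.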
Leaderboard ranking
Benchmark scores (LLM Stats)
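Biology
GPQA: 32.8% (self-reported)
Finance
MMLU: 73.0% (self-reported)
General
MMMU: 50.7% (self-reported)
MMMU-Pro: 33.0% (self-reported)
Image To Text
DocVQA: 88.4% (self-reported)
VQAv2 (test): 75.2% (self-reported)
Math
MGSM: 68.9% (self-reported)
MATH: 51.9% (self-reported)
MathVista: 51.5% (self-reported)
Multimodal
AI2D: 91.1% (self-reported)
ChartQA: 83.4% (self-reported)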
AA evaluation indices
Intelligence Index: 8.7
Coding Index: 4.3
Math Index: 1.7
Math 500: 0.5
MMLU Pro: 0.5
IFBench: 0.3
GPQA: 0.2
Tau2: 0.1
Lcr: 0.1
SciCode: 0.1
LiveCodeBench: 0.1
AIME: 0.1
HLE: 0.1
AIME 25: 0.0
Terminalbench Hard: 0.0
LLM Stats category scores
Image To Text: 90
Vision: 70
Finance: 70
Language: 70
Legal: 70
Multimodal: 70
Healthcare: 60
Math: 60
Reasoning: 60
General: 50
Biology: 30
Chemistry: 30
Physics: 30
Pricing
Input price: $0.245 / 1M tokens
Output price: $0.245 / 1M tokens
Blended price (3:1): $0.245 / 1M tokens
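The page does not say how the 3:1 blended price is derived. A minimal sketch, assuming it is a weighted average of three parts input price to one part output price (the function name and default weights are illustrative, not taken from the source):

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    Assumption: "3:1" means three input tokens for every output token.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Prices listed on this page, in USD per 1M tokens.
price = blended_price(0.245, 0.245)
print(f"Blended price: ${price:.3f} / 1M tokens")
```

Because the input and output prices listed here are identical, any weighting gives the same blended figure; the ratio only matters for models whose output tokens cost more than input tokens.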
Speed
Throughput: 86.7 tokens/s
Time to first token: 0.52 s
Time to first answer: 0.52 s
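The throughput and first-token latency figures can be combined into a rough estimate of end-to-end response time. The sketch below assumes a simple model in which generation starts after the first-token latency and then proceeds at a constant rate; the function name is hypothetical, and real latency varies with provider, load, and prompt size.

```python
def estimated_response_seconds(output_tokens: int,
                               tokens_per_second: float = 86.7,
                               first_token_latency: float = 0.52) -> float:
    """Rough wall-clock estimate: startup latency plus steady-state generation time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_latency + output_tokens / tokens_per_second


# Example: a 500-token answer at the speeds listed on this page.
seconds = estimated_response_seconds(500)
print(f"Estimated time for 500 tokens: {seconds:.1f} s")
```

Under these assumptions a 500-token answer takes a little over six seconds, almost all of it generation time rather than startup latency.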
Available providers
(LS internal pricing unit) No provider data available.