Llama 3.2 Instruct 11B (Vision)
Meta · Llama · Open Weight · Llama 3.2 Community License
Description
Llama 3.2 11B Vision Instruct is an instruction-tuned multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. It accepts text and images as input and generates text as output.
Release date: 2024-09-25
Parameters: 10.6B
Context length: 131K tokens
Modalities: image, text
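Capability radar chart
General: 19
Coding: 7
Reasoning: 13
Science (estimated): 15
Agents: 0
Multimodal: 90
When no dedicated science benchmark is available, the Science score is estimated using reasoning ability as a proxy.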
Benchmark scores (LLM Stats; all self-reported)
Biology
  GPQA: 32.8%
Finance
  MMLU: 73.0%
General
  MMMU: 50.7%
  MMMU-Pro: 33.0%
Image to Text
  DocVQA: 88.4%
  VQAv2 (test): 75.2%
Math
  MGSM: 68.9%
  MATH: 51.9%
  MathVista: 51.5%
Multimodal
  AI2D: 91.1%
  ChartQA: 83.4%
AA (Artificial Analysis) evaluation indices
Intelligence Index: 8.7
Coding Index: 4.3
Math Index: 1.7
MATH-500: 0.5
MMLU-Pro: 0.5
IFBench: 0.3
GPQA: 0.2
Tau2: 0.1
LCR: 0.1
SciCode: 0.1
LiveCodeBench: 0.1
AIME: 0.1
HLE: 0.1
AIME 25: 0.0
Terminal-Bench Hard: 0.0
LLM Stats category scores
Image to Text: 90
Vision: 70
Finance: 70
Language: 70
Legal: 70
Multimodal: 70
Healthcare: 60
Math: 60
Reasoning: 60
General: 50
Biology: 30
Chemistry: 30
Physics: 30
Pricing
Input price: $0.245 / 1M tokens
Output price: $0.245 / 1M tokens
Blended price (3:1 input:output): $0.245 / 1M tokens
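The 3:1 blended price is, as we read it, a weighted average that assumes three input tokens for every output token. A minimal Python sketch of that arithmetic follows; the function name `blended_price` and its weight parameters are hypothetical illustrations, not part of any LLM Stats API. Because input and output prices are identical for this model, the blend equals either price.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average price per 1M tokens (assumes a 3:1 input:output token mix by default)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Prices from the Pricing list above, in USD per 1M tokens.
price = blended_price(0.245, 0.245)
print(f"${price:.3f} / 1M tokens")
```

Changing the weights (e.g. `blended_price(0.245, 0.245, 1, 1)`) shows how the blend would shift for a different traffic mix if the two prices ever differed.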
Speed
Output speed: 86.7 tokens/s
Time to first token: 0.52 s
Time to first answer: 0.52 s
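Assuming generation speed stays constant after the first token (a simplification; real throughput varies with load, prompt length and provider), the two speed figures above combine into a rough end-to-end latency estimate: time to first token plus output tokens divided by tokens per second. The helper `estimate_response_time` below is a hypothetical illustration of that arithmetic, not a measured result.

```python
def estimate_response_time(output_tokens: int,
                           ttft_s: float = 0.52,
                           tokens_per_s: float = 86.7) -> float:
    """Rough latency in seconds: time to first token plus steady-state generation time.

    Defaults are the speed figures listed above; assumes constant throughput.
    """
    return ttft_s + output_tokens / tokens_per_s


t = estimate_response_time(500)
print(f"~{t:.2f} s for 500 output tokens")
```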
Available providers
(LS internal pricing unit) No provider data available yet.