Gemma 4 12B

Google · Gemma · Open Weight · Apache 2.0 · Commercial OK

Description

Gemma 4 12B is Google DeepMind's encoder-free multimodal instruction-tuned model with 11.95 billion parameters and a 256K context window. It supports text, image, audio, and video inputs with text output, projecting image patches and audio waveforms directly into a single decoder-only transformer for streamlined local deployment.

Release Date: 2026-05-23
Parameters: 12.0B
Context Length: 131K
Modalities: image, text

Capability Radar

general: 70
coding: 0
reasoning: 60
science (est.): 68
agents: 42
multimodal: 50

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain              Rank  Score  Source
Multimodal Ranking  80    16.0   LS

Benchmark Scores (LLM Stats)

Audio

CoVoST2: 38.5% (SR)

Biology

GPQA: 78.8% (SR)

Finance

MMLU-Pro: 77.2% (SR)

General

MMMLU: 83.4% (SR)
LiveCodeBench v6: 72.0% (SR)
MMMU-Pro: 69.1% (SR)
BIG-Bench Extra Hard: 53.0% (SR)
MRCR v2: 43.4% (SR)

Healthcare

MedXpertQA: 48.7% (SR)

Language

FLEURS: 93.1% (SR)

Math

MathVision: 79.7% (SR)
AIME 2026: 77.5% (SR)
CodeForces: 0.55 / 3000 (SR)
Humanity's Last Exam: 5.2% (SR)

Multimodal

OmniDocBench 1.5: 16.4% (SR)

AA Evaluation Indices

No AA evaluation data available

LLM Stats Category Scores

Finance: 80
Legal: 80
Physics: 80
Biology: 80
Chemistry: 80
Speech To Text: 70
General: 70
Language: 70
Reasoning: 60
Healthcare: 60
Math: 60
Multimodal: 50
Long Context: 40
Vision: 40
Audio: 40
Structured Output: 20

Pricing

Input Price: $0.05 / 1M tokens
Output Price: $0.15 / 1M tokens
Blended Price (3:1): $0.075 / 1M tokens
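
The blended figure is consistent with a weighted average of the input and output prices, assuming "3:1" means three input tokens for every output token. That reading is an assumption, since the page does not define the ratio. A minimal sketch of the arithmetic follows; the function name `blended_price` is illustrative and not part of any provider API.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    Assumes the ratio is input:output token volume (3:1 by default).
    """
    total_weight = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total_weight


# Listed prices from this page: $0.05 input, $0.15 output per 1M tokens.
price = blended_price(0.05, 0.15)
print(f"${price:.3f} / 1M tokens")  # (3 * 0.05 + 1 * 0.15) / 4 = 0.075
```

Under that assumption the result matches the listed $0.075. A workload with a different input-to-output mix would shift the effective price toward whichever side dominates.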

Speed

No speed data available

Provider Price Ranking

4 providers

Cheapest: Kilo Gateway · Most Expensive: NovitaAI

#  Provider                 Input  Output
1  Kilo Gateway (Cheapest)  $0.04  $0.13
2  Google (PRIMARY)         $0.05  $0.15
3  OpenRouter               $0.05  $0.15
4  NovitaAI                 $0.05  $0.1


External Sources