Gemma 4 12B (Reasoning)

Google / Gemma

Description

Gemma 4 12B is Google DeepMind's encoder-free multimodal instruction-tuned model with 11.95 billion parameters and a 256K context window. It supports text, image, audio, and video inputs with text output, projecting image patches and audio waveforms directly into a single decoder-only transformer for streamlined local deployment.

Release Date

2026-06-03

Parameters

11.95B

Context Length

131K

Modalities

image, text

Capability Radar

Axes: general, coding, reasoning, science (est.), agents, multimodal

Science uses a reasoning proxy when dedicated science benchmarks are unavailable.

Rankings

Domain	#Rank	Score	Source
Code Ranking	206	51.0	AA
General Ranking	226	49.0	AA
Science	163	56.0	AA

Benchmark Scores (LLM Stats)

Category	Benchmark	Score
Audio	CoVoST2	38.5% (SR)
Biology	GPQA	78.8% (SR)
Finance	MMLU-Pro	77.2% (SR)
General	MMMLU	83.4% (SR)
General	LiveCodeBench v6	72.0% (SR)
General	MMMU-Pro	69.1% (SR)
General	BIG-Bench Extra Hard	53.0% (SR)
General	MRCR v2 (8-needle)	43.4% (SR)
Healthcare	MedXpertQA	48.7% (SR)
Language	FLEURS	93.1% (SR)
Math	MathVision	79.7% (SR)
Math	AIME 2026	77.5% (SR)
Math	CodeForces	0.55 / 3000 (SR)
Math	Humanity's Last Exam	5.2% (SR)
Multimodal	OmniDocBench 1.5	16.4% (SR)

AA Evaluation Indices

Index	Score
Intelligence Index	22.0
GPQA	0.8
IFBench	0.7
LCR	0.6
SciCode	0.4
Tau2	0.4
TerminalBench Hard	0.2
HLE	0.1

LLM Stats Category Scores

Categories charted: Legal, Physics, Finance, Biology, Chemistry, Language, Speech To Text, Math, Reasoning, General, Healthcare, Multimodal, Long Context, Audio, Vision, Structured Output

Pricing

Input Price: $0.1 / 1M tokens

Output Price: $0.3 / 1M tokens

Blended Price (3:1): $0.15 / 1M tokens
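
The blended figure is consistent with a weighted average of the listed input and output prices at a 3:1 input-to-output token ratio. A minimal sketch of that arithmetic follows; the function name `blended_price` and its parameters are illustrative and not taken from the page, and the weighting scheme is an assumption inferred from the "(3:1)" label.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens for a given input:output ratio."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Listed prices in USD per 1M tokens.
blended = blended_price(0.1, 0.3)
print(f"${blended:.2f} / 1M tokens")  # prints "$0.15 / 1M tokens"
```

Changing the weights shows how the effective price shifts for output-heavy workloads, where the higher output price dominates.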

Speed

Tokens/sec: 126.0

Time to First Token: 1.45s

Time to Answer: 17.33s
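
The three speed figures can be related with a rough back-of-the-envelope model. The sketch below assumes that Time to Answer is measured from the start of the request and that tokens are generated at the listed steady rate once the first token arrives; the page does not define these metrics, so both assumptions and the function name are hypothetical.

```python
def tokens_generated_before_answer(tokens_per_sec: float,
                                   time_to_first_token_s: float,
                                   time_to_answer_s: float) -> float:
    """Estimate tokens emitted between the first token and the answer."""
    generation_time_s = time_to_answer_s - time_to_first_token_s
    return generation_time_s * tokens_per_sec


# Figures listed in the Speed section.
estimate = tokens_generated_before_answer(126.0, 1.45, 17.33)
print(f"about {estimate:.0f} tokens")
```

Under these assumptions the gap between the two timings corresponds to roughly 2,000 generated tokens, which would be plausible for a reasoning variant that produces intermediate output before its final answer.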

Provider Price Ranking

1 provider

Rank	Provider	Input	Output
1	Google (primary)	$0.1	$0.3


External Sources

Artificial Analysis