Gemma 3 4B vs Gemma 3 2B

Gemma 3 4B: Google · 4.3B params · Quality: 54

Gemma 3 2B: Google · 2B params · Quality: 42

Architecture Comparison

Spec | Gemma 3 4B | Gemma 3 2B
Type | Dense | Dense
Total Parameters | 4.3B | 2B
Active Parameters | 4.3B | 2B
Layers | 34 | 26
Hidden Dimension | 2,560 | 2,304
Attention Heads | 32 | 8
KV Heads | 8 | 4
Context Length (tokens) | 131,072 | 8,192
Precision (default) | BF16 | BF16

Memory Requirements

Precision | Gemma 3 4B | Gemma 3 2B
BF16 Weights | 8.6 GB | 4.0 GB
FP8 Weights | 4.3 GB | 2.0 GB
INT4 Weights | 2.1 GB | 1.0 GB
KV-Cache / Token | 139,264 B | 26,624 B
Activation Estimate | 0.50 GB | 0.30 GB
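
The weight figures above are consistent with parameter count times bytes per parameter, and the KV-cache figures are consistent with the usual per-token formula (keys plus values, across every layer and KV head). The sketch below reproduces that arithmetic. The function names are hypothetical, and the head dimension is not listed on this page: the values 128 and 64 are assumptions chosen only because they reproduce the published byte counts with 2-byte (BF16) cache entries.

```python
# Bytes per parameter for each precision listed in the table.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Weight memory in GB (10^9 bytes): parameters x bytes per parameter."""
    return params_billions * BYTES_PER_PARAM[precision]


def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of KV cache added per token: one K and one V vector
    per KV head in every layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


# Layers and KV heads come from the architecture table.
# head_dim is ASSUMED (128 and 64); it is not stated in the source.
gemma_4b_kv = kv_cache_bytes_per_token(layers=34, kv_heads=8, head_dim=128)
gemma_2b_kv = kv_cache_bytes_per_token(layers=26, kv_heads=4, head_dim=64)

print(gemma_4b_kv)  # 139264
print(gemma_2b_kv)  # 26624
print(weight_memory_gb(4.3, "BF16"), weight_memory_gb(2.0, "BF16"))
```

Under these assumptions the INT4 figure for the 4B model works out to 2.15 GB, which the table shows as 2.1 GB.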

Minimum GPUs Needed (BF16)

GPU | Gemma 3 4B | Gemma 3 2B
H100 SXM | 1 GPU | 1 GPU
L40S | 1 GPU | 1 GPU
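
One way to sanity-check these GPU counts is to add weights, the activation estimate, and an optional KV-cache budget, then divide by the memory of a single card. This is a rough sketch, not necessarily the method used to produce the table. The function names are hypothetical, and the GPU capacities (80 GB for H100 SXM, 48 GB for L40S) are vendor figures that do not appear on this page.

```python
import math

# ASSUMED per-GPU memory in GB (vendor figures, not stated in the source).
GPU_MEMORY_GB = {"H100 SXM": 80, "L40S": 48}


def kv_cache_gb(bytes_per_token: int, tokens: int) -> float:
    """Total KV cache in GB (10^9 bytes) for a given number of cached tokens."""
    return bytes_per_token * tokens / 1e9


def min_gpus(weights_gb: float, activation_gb: float, gpu_memory_gb: float,
             kv_gb: float = 0.0) -> int:
    """Smallest number of GPUs whose combined memory holds the model state.
    Ignores framework overhead and fragmentation."""
    return math.ceil((weights_gb + activation_gb + kv_gb) / gpu_memory_gb)


# Gemma 3 4B in BF16, weights plus activations only.
for name, capacity in GPU_MEMORY_GB.items():
    print(name, min_gpus(8.6, 0.50, capacity))

# Same model with one sequence cached at the full 131,072-token context.
full_context_kv = kv_cache_gb(139264, 131072)
print(round(full_context_kv, 2), min_gpus(8.6, 0.50, 48, full_context_kv))
```

Even with a full-context KV cache for a single sequence, the 4B model stays within one L40S under these assumptions; serving many long sequences concurrently would change the picture.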

Quality Benchmarks

Benchmark | Gemma 3 4B | Gemma 3 2B
Overall | 54 | 42
MMLU | 60.0 | 50.0
HumanEval | 32.0 | 22.0
GSM8K | 58.0 | 42.0
MT-Bench | 72.0 | 65.0

Capabilities

Feature | Gemma 3 4B | Gemma 3 2B
Tool Use | ✓ Yes | ✗ No
Vision | ✓ Yes | ✗ No
Code | ✓ Yes | ✓ Yes
Math | ✓ Yes | ✓ Yes
Reasoning | ✗ No | ✗ No
Multilingual | ✓ Yes | ✓ Yes
Structured Output | ✓ Yes | ✓ Yes

API Pricing Comparison

Cheapest Output (Gemma 3 4B): $0.10/M (Input: $0.05/M)

Cheapest Output (Gemma 3 2B): N/A

Provider | Gemma 3 4B In $/M | Gemma 3 4B Out $/M | Gemma 3 2B In $/M | Gemma 3 2B Out $/M
google | $0.05 | $0.10 | N/A | N/A
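
Per-million-token prices translate into a per-request cost by scaling each token count by its rate. The sketch below uses the listed Gemma 3 4B prices; the function name and the example token counts are hypothetical.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request in USD, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Listed Gemma 3 4B prices: $0.05/M input, $0.10/M output.
# Example request sizes (2,000 in, 500 out) are illustrative only.
cost = request_cost_usd(2_000, 500, input_price_per_m=0.05, output_price_per_m=0.10)
print(f"${cost:.5f} per request")
print(f"${cost * 1_000_000:.2f} per million such requests")
```

At these rates a 2,000-token prompt with a 500-token reply costs about $0.00015, or roughly $150 per million requests.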

Recommendation Summary

  • Gemma 3 4B scores higher on overall quality (54 vs 42).
  • Gemma 3 2B has a smaller memory footprint (4.0 GB vs 8.6 GB of BF16 weights), which leaves more headroom on memory-constrained hardware; both models fit on a single H100 SXM or L40S in BF16.
  • Gemma 3 4B supports a longer context window (131,072 vs 8,192 tokens).
  • Gemma 3 4B is stronger at code generation (HumanEval: 32.0 vs 22.0).
  • Gemma 3 4B is better at math reasoning (GSM8K: 58.0 vs 42.0).
