Gemma 3 4B vs Gemma 3 2B
Architecture Comparison
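| Spec | Gemma 3 4B | Gemma 3 2B |
|---|---|---|
| Type | Dense | Dense |
| Total Parameters | 4.3B | 2B |
| Active Parameters | 4.3B | 2B |
| Layers | 34 | 26 |
| Hidden Dimension | 2,560 | 2,304 |
| Attention Heads | 32 | 8 |
| KV Heads | 8 | 4 |
| Context Length (tokens) | 131,072 | 8,192 |
| Precision (default) | BF16 | BF16 |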
Memory Requirements
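| Precision | Gemma 3 4B | Gemma 3 2B |
|---|---|---|
| BF16 Weights | 8.6 GB | 4.0 GB |
| FP8 Weights | 4.3 GB | 2.0 GB |
| INT4 Weights | 2.1 GB | 1.0 GB |
| KV-Cache / Token | 139,264 B | 26,624 B |
| Activation Estimate | 0.50 GB | 0.30 GB |

Minimum GPUs Needed (BF16)

| GPU | Gemma 3 4B | Gemma 3 2B |
|---|---|---|
| H100 SXM | 1 GPU | 1 GPU |
| L40S | 1 GPU | 1 GPU |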
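The memory figures above can be combined into a rough per-deployment estimate. The sketch below is illustrative only: the helper names (`weight_gb`, `kv_cache_gb`, `total_gb`) are hypothetical, 1 GB is assumed to mean 10^9 bytes (which matches 4.3B parameters x 2 bytes = 8.6 GB), and summing weights, KV cache, and the activation estimate for a single sequence is a simplifying assumption rather than a method stated on this page.

```python
# Rough memory estimate built from the figures listed above.
# Assumptions (not from the source): 1 GB = 1e9 bytes, one sequence at a time,
# and total = weights + KV cache + activation estimate.

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

MODELS = {
    "Gemma 3 4B": {
        "params": 4.3e9,
        "kv_bytes_per_token": 139_264,
        "context": 131_072,
        "activations_gb": 0.50,
    },
    "Gemma 3 2B": {
        "params": 2.0e9,
        "kv_bytes_per_token": 26_624,
        "context": 8_192,
        "activations_gb": 0.30,
    },
}


def weight_gb(params: float, precision: str) -> float:
    """Weight memory in GB: parameter count times bytes per parameter."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def kv_cache_gb(kv_bytes_per_token: int, tokens: int) -> float:
    """KV-cache memory in GB for one sequence holding `tokens` tokens."""
    return kv_bytes_per_token * tokens / 1e9


def total_gb(model: str, precision: str, tokens: int) -> float:
    """Weights + KV cache + activation estimate for a single sequence."""
    m = MODELS[model]
    return (
        weight_gb(m["params"], precision)
        + kv_cache_gb(m["kv_bytes_per_token"], tokens)
        + m["activations_gb"]
    )


if __name__ == "__main__":
    for name, spec in MODELS.items():
        full_context = total_gb(name, "BF16", spec["context"])
        print(f"{name}: {full_context:.2f} GB at full context (BF16 weights)")
```

Under these assumptions, a full 131,072-token context on Gemma 3 4B adds roughly 18.3 GB of KV cache on top of the weights, while the 8,192-token limit of Gemma 3 2B keeps its KV cache to about 0.22 GB. Long-context deployments of the 4B model are therefore dominated by cache size, not weight size.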
Quality Benchmarks
| Benchmark | Gemma 3 4B | Gemma 3 2B |
|---|---|---|
| Overall | 54 | 42 |
| MMLU | 60.0 | 50.0 |
| HumanEval | 32.0 | 22.0 |
| GSM8K | 58.0 | 42.0 |
| MT-Bench | 72.0 | 65.0 |
Capabilities
| Feature | Gemma 3 4B | Gemma 3 2B |
|---|---|---|
| Tool Use | ✓ Yes | ✗ No |
| Vision | ✓ Yes | ✗ No |
| Code | ✓ Yes | ✓ Yes |
| Math | ✓ Yes | ✓ Yes |
| Reasoning | ✗ No | ✗ No |
| Multilingual | ✓ Yes | ✓ Yes |
| Structured Output | ✓ Yes | ✓ Yes |
API Pricing Comparison
- Cheapest output (Gemma 3 4B): $0.10/M tokens (input: $0.05/M tokens)
- Cheapest output (Gemma 3 2B): N/A
| Provider | Gemma 3 4B In $/M | Gemma 3 4B Out $/M | Gemma 3 2B In $/M | Gemma 3 2B Out $/M |
|---|---|---|---|---|
| | $0.05 | $0.10 | N/A | N/A |
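To turn the per-million-token prices into a budget figure, the arithmetic can be sketched as follows. The function name `api_cost_usd` and the example token counts are hypothetical; only the $0.05/M input and $0.10/M output prices for Gemma 3 4B come from the table above.

```python
# Cost estimate from per-million-token prices.
# Prices are the Gemma 3 4B figures listed above; the workload is made up.

INPUT_USD_PER_M = 0.05   # USD per 1M input tokens
OUTPUT_USD_PER_M = 0.10  # USD per 1M output tokens


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for the given token counts."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_M
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    )


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens, 500k output tokens.
    cost = api_cost_usd(2_000_000, 500_000)
    print(f"Estimated cost: ${cost:.2f}")
```

No equivalent estimate is possible for Gemma 3 2B, since no provider pricing is listed for it.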
Recommendation Summary
- Gemma 3 4B scores higher on overall quality (54 vs 42).
- Gemma 3 2B has a smaller memory footprint (4.0 GB vs 8.6 GB of BF16 weights), which makes it easier to deploy on memory-constrained hardware. Both models fit on a single H100 SXM or L40S in BF16.
- Gemma 3 4B supports a longer context window (131,072 vs 8,192 tokens).
- Gemma 3 4B is stronger at code generation (HumanEval: 32.0 vs 22.0).
- Gemma 3 4B is better at math reasoning (GSM8K: 58.0 vs 42.0).