Falcon 3 10B
TII UAE · dense · 10.3B parameters · 32,768 context
Parameters: 10.3B
Context Window: 32K tokens
Architecture: Dense
Best GPU: A100 40GB SXM
Intelligence Brief
Falcon 3 10B is a 10.3B-parameter dense model from TII UAE that uses Grouped Query Attention (GQA), with 40 layers and a hidden size of 4,096. It has a 32,768-token context window and supports code, math, and multilingual tasks. For self-hosted inference, the top-ranked option is a single A100 40GB SXM in BF16, at about 407.6 tok/s and an estimated $807/month.
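Under GQA, the KV cache holds one key vector and one value vector per KV head per layer. Its per-token size therefore depends on the number of KV heads, not on the number of query heads. This page gives 40 layers and, in the FAQ, 81,920 bytes of KV cache per token. It does not give the KV head count or the head dimension. If we assume a BF16 (2-byte) cache, those two figures fix only the product of the missing values:

```latex
\text{KV bytes per token} = 2 \cdot L \cdot n_{\text{kv}} \cdot d_{\text{head}} \cdot b
\;\Rightarrow\;
81{,}920 = 2 \cdot 40 \cdot n_{\text{kv}} \cdot d_{\text{head}} \cdot 2
\;\Rightarrow\;
n_{\text{kv}} \cdot d_{\text{head}} = 512
```

In this formula, L is the layer count, n_kv is the number of KV heads, d_head is the head dimension, and b is bytes per element. The leading factor of 2 counts keys and values separately. For example, 4 KV heads of dimension 128 would satisfy the equation. That split is only an illustration, not a published figure.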
Related models
Falcon 180B · Falcon · 180B · $2.40/M output
Falcon 40B · Falcon · 40B · $0.800/M output
Falcon 7B · Falcon · 7B · $0.150/M output
Falcon 11B · Falcon · 11B · no price listed
Falcon 3 1B · Falcon · 1B · no price listed
Picks: same family first, then same vendor within ±2× parameters, then the models with the most overlapping tags. The price shown is the cheapest output price per million tokens across providers; each model's own page shows its canonical (anchor) price.
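The selection rule above can be sketched as a tiered sort. This is a hypothetical reconstruction, not the site's implementation. Several choices are assumptions: the `Model` fields, reading "±2× params" as "between half and double the target's size", and ordering by tag overlap inside every tier. The example pool uses made-up model names, except Falcon 180B.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    family: str
    vendor: str
    params_b: float  # parameter count, in billions
    tags: set = field(default_factory=set)


def tier(target: Model, m: Model) -> int:
    """0 = same family, 1 = same vendor within 2x params, 2 = everything else."""
    if m.family == target.family:
        return 0
    if m.vendor == target.vendor and target.params_b / 2 <= m.params_b <= target.params_b * 2:
        return 1
    return 2


def related(target: Model, pool: list, k: int = 5) -> list:
    """Return up to k related models, lowest tier first."""
    candidates = [m for m in pool if m.name != target.name]
    # Within a tier, more shared tags rank higher (assumption for tiers 0 and 1).
    candidates.sort(key=lambda m: (tier(target, m), -len(m.tags & target.tags)))
    return candidates[:k]


# Hypothetical pool: only "Falcon 180B" is a real name from this page.
target = Model("Falcon 3 10B", "Falcon", "TII UAE", 10.3, {"code", "math", "multilingual"})
pool = [
    Model("hypothetical-other-b", "Other", "Acme", 8, {"code"}),
    Model("hypothetical-other-a", "Other", "Acme", 70, {"code", "math", "multilingual"}),
    Model("hypothetical-tii-12b", "NotFalcon", "TII UAE", 12, set()),
    Model("hypothetical-tii-40b", "NotFalcon", "TII UAE", 40, set()),
    Model("Falcon 180B", "Falcon", "TII UAE", 180, set()),
]
for m in related(target, pool):
    print(m.name)
```

The 40B vendor model falls to the last tier because 40 is more than twice 10.3. Among the last tier, tag overlap alone decides the order.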
Architecture Details
Memory Requirements
BF16 weights: 20.6 GB
FP8 weights: 10.3 GB
INT4 weights: 5.2 GB
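The weight figures follow from parameter count times bytes per parameter: 2 bytes for BF16, 1 for FP8, and 0.5 for INT4. The sketch below assumes decimal gigabytes. It ignores quantization scales and other per-format overhead, so real INT4 checkpoints typically come out somewhat larger.

```python
# Weight memory = parameter count x bytes per parameter.
# Assumptions: 1 GB = 1e9 bytes; no quantization scales, zero points,
# or higher-precision layers are counted.
PARAMS = 10.3e9  # Falcon 3 10B parameter count

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params: float, precision: str) -> float:
    """Return the weight footprint in decimal gigabytes."""
    return params * BYTES_PER_PARAM[precision] / 1e9


for p in ("bf16", "fp8", "int4"):
    print(f"{p}: {weight_gb(PARAMS, p):.2f} GB")
```

The INT4 result is 5.15 GB before rounding, which this page rounds to 5.2 GB.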
GPU Compatibility Matrix
Falcon 3 10B is compatible with 89% of GPU configurations across 41 GPUs at 3 precision levels.
GPU Recommendations
A100 40GB SXM · BF16 · 1 GPU · vLLM · Score 95/100
Throughput 407.6 tok/s · Latency (ITL) 2.5 ms · Est. TTFT 0 ms · $807/month · $0.75/M tokens
BF16 · 1 GPU · vLLM · Score 95/100
Throughput 469.7 tok/s · Latency (ITL) 2.1 ms · Est. TTFT 0 ms · $845/month · $0.68/M tokens
BF16 · 1 GPU · vLLM · Score 95/100
Throughput 407.6 tok/s · Latency (ITL) 2.5 ms · Est. TTFT 0 ms · $655/month · $0.61/M tokens
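The derived columns in these cards match simple arithmetic on throughput. Cost per million tokens equals monthly cost divided by the tokens served in a 730-hour month. ITL equals 1000 / throughput, in milliseconds. The 730-hour month and the full, uninterrupted utilization are our assumptions. We chose them because they reproduce the listed $0.75, $0.68, and $0.61; the site does not state its method.

```python
# Reproduce the derived columns of the recommendation cards.
# Assumption: "per month" means 730 hours of continuous serving at the
# quoted throughput (full utilization, no idle time).
HOURS_PER_MONTH = 730
SECONDS_PER_HOUR = 3600


def tokens_per_month(tok_per_s: float) -> float:
    return tok_per_s * HOURS_PER_MONTH * SECONDS_PER_HOUR


def cost_per_million_tokens(monthly_usd: float, tok_per_s: float) -> float:
    return monthly_usd / (tokens_per_month(tok_per_s) / 1e6)


def itl_ms(tok_per_s: float) -> float:
    # Inter-token latency if tokens are emitted one after another at this rate.
    return 1000.0 / tok_per_s


# (throughput in tok/s, cost in USD/month) for the three cards above
cards = [(407.6, 807), (469.7, 845), (407.6, 655)]
for tps, usd in cards:
    print(f"{tps} tok/s at ${usd}/mo: "
          f"${cost_per_million_tokens(usd, tps):.2f}/M tokens, ITL {itl_ms(tps):.1f} ms")
```

Under these assumptions, any idle time raises the real cost per token above these figures.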
Deployment Options
API deployment: no API pricing available
Single GPU: A100 40GB SXM · $807/mo · Min VRAM: 10 GB
Multi-GPU: 2× A4000 · tensor parallel (TP) · 188.9 tok/s · $323/mo
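The two-A4000 option works because tensor parallelism shards each layer's weight matrices across the GPUs. This sketch makes several assumptions. It assumes BF16 weights, since the page does not state a precision for this option. It assumes an even split and 16 GB per A4000. It ignores the KV cache, activations, and any tensors replicated on both GPUs.

```python
# Per-GPU weight share under tensor parallelism (TP).
# Assumptions: BF16 weights, perfectly even sharding, 16 GB per A4000;
# KV cache, activations and replicated tensors are not counted.
BF16_WEIGHTS_GB = 20.6
A4000_VRAM_GB = 16.0


def per_gpu_weight_gb(total_weight_gb: float, tp_size: int) -> float:
    """Weights each GPU holds when sharded across tp_size GPUs."""
    if tp_size < 1:
        raise ValueError("tp_size must be >= 1")
    return total_weight_gb / tp_size


single = per_gpu_weight_gb(BF16_WEIGHTS_GB, 1)
share = per_gpu_weight_gb(BF16_WEIGHTS_GB, 2)
headroom = A4000_VRAM_GB - share
print(f"1 GPU: {single:.2f} GB needed, fits: {single <= A4000_VRAM_GB}")
print(f"2 GPUs: {share:.2f} GB each, {headroom:.2f} GB left per GPU")
```

A single A4000 cannot hold the BF16 weights. Split two ways, each card keeps roughly 5.7 GB for KV cache and runtime overhead.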
API Pricing Comparison
No API pricing data available for this model.
Performance Estimates
Throughput by GPU
VRAM Breakdown (A100 40GB SXM, BF16)
Precision Impact
BF16: 20.6 GB weights/GPU · ~407.6 tok/s
FP8: 10.3 GB weights/GPU
INT4: 5.2 GB weights/GPU
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy Falcon 3 10B
Similar Models
Falcon 11B · 11B params · dense · Quality: 50
Falcon 3 7B · 7.5B params · dense · Quality: 50
Falcon 7B · 7B params · dense · Quality: 37 · from $0.15/M
mmE5-mllama-11b-instruct · 10.6B params · dense · Quality: 50
Alpamayo 1.5-10B · 10B params · dense · Quality: 70
Frequently Asked Questions
How much VRAM does Falcon 3 10B need for inference?
Falcon 3 10B needs approximately 20.6 GB of VRAM for its weights at BF16 precision, 10.3 GB at FP8, or 5.2 GB with INT4 quantization. It also needs VRAM for the KV cache and for activations (~0.70 GB). The KV cache takes 81,920 bytes per token, or about 2.7 GB for one sequence at the full 32,768-token context.
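A rough single-GPU VRAM budget can be built from the figures in this answer. The sketch assumes decimal gigabytes and treats the A100's nominal 40 GB as fully usable. It treats activations as a fixed ~0.70 GB and ignores the CUDA context and framework memory reservations, so real headroom will be smaller. `max_full_context_seqs` is a hypothetical helper, not part of any serving framework.

```python
# Rough VRAM budget built from the figures quoted above.
# Assumptions: decimal GB (1e9 bytes), nominal GPU capacity fully usable,
# activations fixed at ~0.70 GB, no CUDA context / framework overhead.
WEIGHTS_GB = {"bf16": 20.6, "fp8": 10.3, "int4": 5.2}
KV_BYTES_PER_TOKEN = 81_920
ACTIVATIONS_GB = 0.70


def kv_cache_gb(tokens: int, sequences: int = 1) -> float:
    """KV-cache size for `sequences` concurrent sequences of `tokens` each."""
    return KV_BYTES_PER_TOKEN * tokens * sequences / 1e9


def total_vram_gb(precision: str, tokens: int, sequences: int = 1) -> float:
    """Weights + KV cache + activations."""
    return WEIGHTS_GB[precision] + kv_cache_gb(tokens, sequences) + ACTIVATIONS_GB


def max_full_context_seqs(gpu_gb: float, precision: str, tokens: int = 32_768) -> int:
    """Hypothetical helper: how many full-length sequences' KV cache fit."""
    free_gb = gpu_gb - WEIGHTS_GB[precision] - ACTIVATIONS_GB
    return max(0, int(free_gb // kv_cache_gb(tokens)))


print(f"KV cache, one 32K sequence: {kv_cache_gb(32_768):.2f} GB")
print(f"BF16 total, one 32K sequence: {total_vram_gb('bf16', 32_768):.2f} GB")
print(f"Full-context sequences on a 40 GB card (BF16): {max_full_context_seqs(40, 'bf16')}")
```

Under these assumptions, a 40 GB card in BF16 has room for the KV cache of six concurrent full-length sequences. Shorter sequences leave proportionally more room.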
What is the best GPU for Falcon 3 10B?
The top recommended GPU for Falcon 3 10B is the A100 40GB SXM using BF16 precision. It achieves approximately 407.6 tokens/sec at an estimated cost of $807/month ($0.75/M tokens). Score: 95/100.
How much does Falcon 3 10B inference cost?
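Falcon 3 10B inference costs vary by provider and GPU setup, and no API pricing is currently listed for this model. For self-hosting in BF16 on a single GPU, the recommendations above estimate $655 to $845 per month, or $0.61 to $0.75 per million tokens. Two A4000s with tensor parallelism come to about $323/month, at a lower throughput of 188.9 tok/s.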