Falcon 3 10B
TII UAE · dense · 10.3B parameters · 32,768 context
Parameters
10.3B
Context Window
32K tokens
Architecture
Dense
Best GPU
A100 40GB SXM
Intelligence Brief
Falcon 3 10B is a 10.3B-parameter dense model from TII UAE that uses Grouped Query Attention (GQA), with 40 layers and a hidden dimension of 4,096. It has a 32,768-token context window and supports code, math, and multilingual tasks. For self-hosted inference, the top-ranked GPU is the A100 40GB SXM, with an estimated throughput of 407.6 tok/s at $807/month.
Architecture Details
Memory Requirements
BF16 Weights
20.6 GB
FP8 Weights
10.3 GB
INT4 Weights
5.2 GB
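The weight sizes above are consistent with a simple rule of thumb: parameter count times bytes per parameter. The sketch below reproduces them, assuming standard storage widths (2 bytes for BF16, 1 for FP8, 0.5 for INT4) and decimal gigabytes. The function and constant names are illustrative, and real quantized checkpoints usually carry some extra overhead for scales and zero points.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: standard storage widths per precision; quantization
# metadata (scales, zero points) is ignored.
PARAMS = 10.3e9  # parameter count listed on this page

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_gb(params: float, precision: str) -> float:
    """Return the weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_gb(PARAMS, precision):.2f} GB")
```

The INT4 result is 5.15 GB, which the page rounds to 5.2 GB.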
GPU Compatibility Matrix
Falcon 3 10B is compatible with 89% of the GPU configurations evaluated (41 GPUs, each at 3 precision levels).
GPU Recommendations
A100 40GB SXM · BF16 · 1 GPU · vLLM
95/100
score
Throughput
407.6 tok/s
Latency (ITL)
2.5ms
Est. TTFT
0ms
Cost/Month
$807
Cost/M Tokens
$0.75
BF16 · 1 GPU · vLLM
95/100
score
Throughput
469.7 tok/s
Latency (ITL)
2.1ms
Est. TTFT
0ms
Cost/Month
$845
Cost/M Tokens
$0.68
BF16 · 1 GPU · vLLM
95/100
score
Throughput
407.6 tok/s
Latency (ITL)
2.5ms
Est. TTFT
0ms
Cost/Month
$655
Cost/M Tokens
$0.61
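The cost and latency figures in these recommendations can be related to each other with simple arithmetic. The sketch below is one plausible reconstruction, not the page's stated method: it assumes the GPU runs at the listed throughput for 730 hours per month (8,760 hours per year divided by 12) and that inter-token latency is the reciprocal of throughput. Under those assumptions the results round to the listed values.

```python
# Assumption: full utilization for 730 hours per month.
HOURS_PER_MONTH = 730


def cost_per_million_tokens(monthly_usd: float, tok_per_s: float,
                            hours: float = HOURS_PER_MONTH) -> float:
    """Monthly GPU cost divided by millions of tokens generated per month."""
    tokens_per_month = tok_per_s * 3600 * hours
    return monthly_usd / (tokens_per_month / 1e6)


def inter_token_latency_ms(tok_per_s: float) -> float:
    """Assumption: ITL is the reciprocal of throughput, in milliseconds."""
    return 1000.0 / tok_per_s


# (monthly cost in USD, throughput in tok/s) for the three options listed above
options = [(807, 407.6), (845, 469.7), (655, 407.6)]

for monthly_usd, tok_per_s in options:
    cost = cost_per_million_tokens(monthly_usd, tok_per_s)
    itl = inter_token_latency_ms(tok_per_s)
    print(f"${monthly_usd}/mo at {tok_per_s} tok/s: "
          f"${cost:.2f}/M tokens, ITL {itl:.1f} ms")
```

Lower utilization raises the effective cost per token proportionally, so these values are best read as a lower bound.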
Deployment Options
API Deployment
No API pricing available
Single GPU
A100 40GB SXM
$807/mo
Min VRAM: 10 GB
Multi-GPU
A4000 x2
188.9 tok/s
TP · $323/mo
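As a rough illustration of the single-GPU and multi-GPU options, the sketch below builds a `vllm serve` command line for each case. The Hugging Face repository id `tiiuae/Falcon3-10B-Instruct` and the helper name `vllm_serve_command` are assumptions for illustration, not details taken from this page; the flags shown (`--dtype`, `--max-model-len`, `--tensor-parallel-size`) are standard vLLM options.

```python
import shlex

# Assumption: Hugging Face repo id for the instruct variant (not from this page).
MODEL_ID = "tiiuae/Falcon3-10B-Instruct"


def vllm_serve_command(model: str = MODEL_ID,
                       tensor_parallel_size: int = 1,
                       max_model_len: int = 32768,
                       dtype: str = "bfloat16") -> list[str]:
    """Build (but do not run) a `vllm serve` command as an argument list."""
    return [
        "vllm", "serve", model,
        "--dtype", dtype,
        "--max-model-len", str(max_model_len),
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]


# Single GPU (for example, one A100 40GB SXM)
print(shlex.join(vllm_serve_command()))

# Two GPUs with tensor parallelism (for example, A4000 x2)
print(shlex.join(vllm_serve_command(tensor_parallel_size=2)))
```

On smaller cards, the full 32,768-token context may not fit next to the weights, in which case a lower `--max-model-len` is the usual adjustment.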
API Pricing Comparison
No API pricing data available for this model.
Performance Estimates
Throughput by GPU
VRAM Breakdown (A100 40GB SXM, BF16)
Precision Impact
BF16
20.6 GB
weights/GPU
~407.6 tok/s
FP8
10.3 GB
weights/GPU
INT4
5.2 GB
weights/GPU
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy Falcon 3 10B
Similar Models
Falcon 11B
11B params · dense
Quality: 50
Falcon 3 7B
7.5B params · dense
Quality: 50
Falcon 7B
7B params · dense
Quality: 37
from $0.15/M
Alpamayo 1.5-10B
10B params · dense
Quality: 70
SOLAR 10.7B
10.7B params · dense
Quality: 50
from $0.30/M
Frequently Asked Questions
How much VRAM does Falcon 3 10B need for inference?
Falcon 3 10B requires approximately 20.6 GB of VRAM for weights at BF16 precision, 10.3 GB at FP8, or 5.2 GB at INT4 quantization. Additional VRAM is needed for the KV cache (81,920 bytes per token) and activations (~0.70 GB).
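To see how these components add up, the sketch below totals weights, KV cache, and activations for a single sequence at the full 32,768-token context. It uses only the figures quoted in this answer; the function names are illustrative, and framework overhead (CUDA context, memory fragmentation, preallocated cache pools) is ignored, so real usage will be somewhat higher.

```python
# Figures quoted on this page
KV_BYTES_PER_TOKEN = 81_920
CONTEXT_TOKENS = 32_768
ACTIVATIONS_GB = 0.70
WEIGHTS_GB = {"BF16": 20.6, "FP8": 10.3, "INT4": 5.2}


def kv_cache_gb(tokens: int, sequences: int = 1) -> float:
    """KV-cache size in decimal GB for `sequences` concurrent sequences."""
    return KV_BYTES_PER_TOKEN * tokens * sequences / 1e9


def total_vram_gb(precision: str, tokens: int = CONTEXT_TOKENS,
                  sequences: int = 1) -> float:
    """Weights + KV cache + activations; framework overhead is not included."""
    return WEIGHTS_GB[precision] + kv_cache_gb(tokens, sequences) + ACTIVATIONS_GB


for precision in WEIGHTS_GB:
    print(f"{precision}: {total_vram_gb(precision):.2f} GB "
          f"(KV cache {kv_cache_gb(CONTEXT_TOKENS):.2f} GB)")
```

At BF16, one full-context sequence comes to roughly 24 GB, which leaves headroom on a 40 GB card for additional concurrent sequences.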
What is the best GPU for Falcon 3 10B?
The top recommended GPU for Falcon 3 10B is the A100 40GB SXM using BF16 precision. It achieves approximately 407.6 tokens/sec at an estimated cost of $807/month ($0.75/M tokens). Score: 95/100.
How much does Falcon 3 10B inference cost?
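Falcon 3 10B inference costs vary by provider and GPU setup. No API pricing data is available for this model; the self-hosted single-GPU BF16 estimates listed above range from $0.61 to $0.75 per million tokens ($655 to $845 per month).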