Canary 1B
NVIDIA · dense · 1B parameters · 4,096 context
Parameters
1B
Context Window
4K tokens
Architecture
Dense
Best GPU
RTX 3080
Cheapest API
$0.04/M
Intelligence Brief
Canary 1B is a 1B-parameter dense model from NVIDIA that uses Multi-Head Attention (MHA) with 24 layers and a hidden dimension of 1,024. It has a 4,096-token context window and supports multilingual use. The most cost-effective API deployment is via nvidia-nim at $0.04/M output tokens. For self-hosted inference, the RTX 3080 is the top-recommended GPU, at roughly 2.1K tok/s and $133/month.
Architecture Details
Memory Requirements
BF16 Weights
2.0 GB
FP8 Weights
1.0 GB
INT4 Weights
0.5 GB
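The three weight sizes above are consistent with simply multiplying the parameter count by the bytes each precision uses per parameter. A minimal sketch of that arithmetic, assuming exactly 1e9 parameters and 1 GB = 1e9 bytes (the page does not state either convention, and the function name is illustrative):

```python
# Bytes stored per parameter at each precision listed on this page.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight footprint in GB (assumes 1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        # Assumption: "1B" is treated as exactly 1e9 parameters.
        print(f"{precision}: {weight_memory_gb(1e9, precision):.1f} GB")
```

This covers weights only; KV cache and activations come on top.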
GPU Compatibility Matrix
Canary 1B is compatible with 100% of GPU configurations across 41 GPUs at 3 precision levels.
GPU Recommendations
BF16 · 1 GPU · vllm
90/100
score
Throughput
2.1K tok/s
Latency (ITL)
0.5ms
Est. TTFT
0ms
Cost/Month
$133
Cost/M Tokens
$0.02
BF16 · 1 GPU · vllm
90/100
score
Throughput
734.4 tok/s
Latency (ITL)
1.4ms
Est. TTFT
0ms
Cost/Month
$209
Cost/M Tokens
$0.11
BF16 · 1 GPU · vllm
90/100
score
Throughput
1.2K tok/s
Latency (ITL)
0.8ms
Est. TTFT
0ms
Cost/Month
$85
Cost/M Tokens
$0.03
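The Cost/M Tokens and Latency (ITL) figures in the three recommendations above appear to follow from the throughput and monthly cost. The sketch below reproduces them under two assumptions that the page does not state: a 30-day month at 100% utilization, and ITL taken as the reciprocal of throughput. The function names are illustrative, and 2.1K and 1.2K are read as 2,100 and 1,200 tok/s.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month, fully utilized


def cost_per_million_tokens(monthly_cost_usd: float, tokens_per_second: float) -> float:
    """Self-hosted cost in USD per million generated tokens."""
    tokens_per_month = tokens_per_second * SECONDS_PER_MONTH
    return monthly_cost_usd / (tokens_per_month / 1e6)


def inter_token_latency_ms(tokens_per_second: float) -> float:
    """Assumption: ITL is the reciprocal of throughput, in milliseconds."""
    return 1000.0 / tokens_per_second


if __name__ == "__main__":
    # (monthly cost in USD, throughput in tok/s) for the three configurations above.
    configs = [(133, 2100.0), (209, 734.4), (85, 1200.0)]
    for monthly_cost, tps in configs:
        cost = cost_per_million_tokens(monthly_cost, tps)
        itl = inter_token_latency_ms(tps)
        print(f"{tps:>7.1f} tok/s: ${cost:.2f}/M tokens, ITL {itl:.1f} ms")
```

At lower utilization the effective cost per million tokens rises proportionally, since the monthly cost is fixed.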
Deployment Options
API Deployment
nvidia-nim
$0.04/M
output tokens
Single GPU
RTX 3080
$133/mo
Min VRAM: 1 GB
Multi-GPU
RTX 3080
2.1K tok/s
Best available config
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| nvidia-nim | $0.04 | $0.04 | Cheapest |
Cost Analysis
| Provider | Input $/M | Output $/M | ~Monthly Cost |
|---|---|---|---|
| nvidia-nim (Best Value) | $0.04 | $0.04 | $0 |
Cost per 1,000 Requests
Short (500 tok)
$0.03
via nvidia-nim
Medium (2K tok)
$0.11
via nvidia-nim
Long (8K tok)
$0.40
via nvidia-nim
Performance Estimates
Throughput by GPU
VRAM Breakdown (RTX 3080, BF16)
Precision Impact
bf16
2.0 GB
weights/GPU
~2.1K tok/s
fp8
1.0 GB
weights/GPU
int4
0.5 GB
weights/GPU
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy Canary 1B
Self-Hosted Infrastructure
Similar Models
Gemma 3 1B
1B params · dense
Quality: 35
Llama Guard 3 1B
1B params · dense
Quality: 50
Falcon 3 1B
1B params · dense
Quality: 50
CSM-1B
1B params · dense
Quality: 50
Parakeet TDT 1.1B
1.1B params · dense
Quality: 50
from $0.04/M
Frequently Asked Questions
How much VRAM does Canary 1B need for inference?
Canary 1B requires approximately 2.0 GB of VRAM for weights at BF16 precision, 1.0 GB at FP8, or 0.5 GB at INT4 quantization. Additional VRAM is needed for the KV cache (12,288 bytes per token) and activations (~0.20 GB).
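As a rough illustration, the figures in this answer can be summed into a total VRAM estimate. This is a sketch only: it takes the per-token KV-cache size and activation figure quoted above at face value, assumes 1 GB = 1e9 bytes, and ignores framework overhead. The function name is hypothetical.

```python
def estimated_vram_gb(
    weights_gb: float,
    activations_gb: float,
    kv_bytes_per_token: int,
    cached_tokens: int,
) -> float:
    """Weights + activations + KV cache, in GB (assumes 1 GB = 1e9 bytes)."""
    kv_cache_gb = kv_bytes_per_token * cached_tokens / 1e9
    return weights_gb + activations_gb + kv_cache_gb


if __name__ == "__main__":
    # BF16 weights with one sequence filling the full 4,096-token context.
    total = estimated_vram_gb(2.0, 0.20, 12288, 4096)
    print(f"Estimated VRAM: {total:.2f} GB")
```

For a single full-context sequence at BF16 this comes to about 2.25 GB; serving many sequences concurrently scales the KV-cache term with the total number of cached tokens.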
What is the best GPU for Canary 1B?
The top recommended GPU for Canary 1B is the RTX 3080 using BF16 precision. It achieves approximately 2.1K tokens/sec at an estimated cost of $133/month ($0.02/M tokens). Score: 90/100.
How much does Canary 1B inference cost?
Canary 1B API inference starts from $0.04/M input tokens and $0.04/M output tokens. Self-hosted inference costs depend on your GPU configuration; use our ROI calculator for a detailed breakdown.