CSM-1B
Sesame · dense · 1B parameters · 4,096-token context
Parameters
1B
Context Window
4K tokens
Architecture
Dense
Best GPU
RTX 3080
Intelligence Brief
CSM-1B is a 1B-parameter dense model from Sesame, featuring Multi-Head Attention (MHA) with 12 layers and a 1,024-dimension hidden state. With a 4,096-token context window, it supports general text generation. For self-hosted inference, the RTX 3080 delivers the best throughput at an estimated $133/month.
Architecture Details
Memory Requirements
BF16 Weights
2.0 GB
FP8 Weights
1.0 GB
INT4 Weights
0.5 GB
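The weight figures above follow directly from parameter count times bytes per parameter. A minimal sketch of that arithmetic (function and constant names are illustrative, not from any library):

```python
# Weight memory = parameter count x bytes per parameter (1 GB = 1e9 bytes).
# Bytes per parameter: BF16 = 2, FP8 = 1, INT4 = 0.5.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(n_params: float, precision: str) -> float:
    """Approximate VRAM needed for the model weights alone, in GB."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

# CSM-1B has ~1e9 parameters:
for p in ("bf16", "fp8", "int4"):
    print(f"{p}: {weight_gb(1e9, p):.1f} GB")
# bf16: 2.0 GB, fp8: 1.0 GB, int4: 0.5 GB — matching the table above.
```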
GPU Compatibility Matrix
CSM-1B is compatible with 100% of the GPU configurations evaluated: all 41 GPUs at all 3 precision levels (BF16, FP8, INT4).
GPU Recommendations
BF16 · 1 GPU · vllm — score 90/100 · throughput 1.9K tok/s · ITL 0.5 ms · est. TTFT 0 ms · $133/mo · $0.03/M tokens
BF16 · 1 GPU · vllm — score 90/100 · throughput 697.6 tok/s · ITL 1.4 ms · est. TTFT 0 ms · $209/mo · $0.11/M tokens
BF16 · 1 GPU · vllm — score 90/100 · throughput 1.1K tok/s · ITL 0.9 ms · est. TTFT 0 ms · $85/mo · $0.03/M tokens
Deployment Options
API Deployment
No API pricing available
Single GPU
RTX 3080
$133/mo
Min VRAM: 1 GB
Multi-GPU
RTX 3080
1.9K tok/s
Best available config
API Pricing Comparison
No API pricing data available for this model.
Performance Estimates
Throughput by GPU
VRAM Breakdown (RTX 3080, BF16)
Precision Impact
BF16 — 2.0 GB weights/GPU — ~1.9K tok/s
FP8 — 1.0 GB weights/GPU
INT4 — 0.5 GB weights/GPU
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy CSM-1B
Self-Hosted Infrastructure
Similar Models
Gemma 3 1B
1B params · dense
Quality: 35
Canary 1B
1B params · dense
Quality: 50
from $0.04/M
Llama Guard 3 1B
1B params · dense
Quality: 50
Falcon 3 1B
1B params · dense
Quality: 50
Parakeet TDT 1.1B
1.1B params · dense
Quality: 50
from $0.04/M
Frequently Asked Questions
How much VRAM does CSM-1B need for inference?
CSM-1B requires approximately 2.0 GB of VRAM at BF16 precision, 1.0 GB at FP8, or 0.5 GB at INT4 quantization. Additional VRAM is needed for the KV cache (24,576 bytes per token) and activations (~0.10 GB).
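Putting the answer's three components together gives a total VRAM estimate. A minimal sketch using the figures from the answer above (the function name is illustrative):

```python
# Total VRAM ≈ weights + KV cache + activations.
# 24,576 bytes/token (KV cache) and ~0.10 GB (activations) are the figures
# quoted in the answer above.
KV_BYTES_PER_TOKEN = 24_576
ACTIVATIONS_GB = 0.10

def total_vram_gb(weights_gb: float, context_tokens: int) -> float:
    kv_gb = KV_BYTES_PER_TOKEN * context_tokens / 1e9
    return weights_gb + kv_gb + ACTIVATIONS_GB

# BF16 weights with the full 4,096-token context:
print(round(total_vram_gb(2.0, 4096), 2))  # -> 2.2 GB
```

Even at BF16 with a full context, CSM-1B fits comfortably within the RTX 3080's 10 GB of VRAM.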
What is the best GPU for CSM-1B?
The top recommended GPU for CSM-1B is the RTX 3080 at BF16 precision: approximately 1.9K tokens/sec at an estimated $133/month ($0.03 per million tokens), scoring 90/100.
How much does CSM-1B inference cost?
CSM-1B inference costs vary by provider and GPU setup. Use our calculator for detailed cost estimates across all providers.