JAIS 30B
G42/Inception · dense · 30B parameters · 8,192-token context
Parameters: 30B
Context Window: 8K tokens
Architecture: Dense
Best GPU: H20
Intelligence Brief
JAIS 30B is a 30B-parameter dense model from G42/Inception, using Multi-Head Attention (MHA) across 56 layers with a hidden dimension of 7,168. With an 8,192-token context window, it supports multilingual workloads. For self-hosted inference, the H20 delivers the best estimated throughput at $940/month.
Architecture Details

Memory Requirements
BF16 Weights: 60.0 GB
FP8 Weights: 30.0 GB
INT4 Weights: 15.0 GB
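The weight figures above follow the usual rule of thumb: bytes per parameter times parameter count. A minimal sketch in Python, using decimal GB as this page does:

```python
def weight_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in decimal GB: params * bytes/param / 1e9."""
    return params * bytes_per_param / 1e9

PARAMS = 30e9  # JAIS 30B

# BF16 = 2 bytes/param, FP8 = 1 byte/param, INT4 = 0.5 bytes/param
for name, bpp in {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}.items():
    print(f"{name}: {weight_gb(PARAMS, bpp):.1f} GB")  # 60.0 / 30.0 / 15.0 GB
```

This reproduces the 60.0 / 30.0 / 15.0 GB figures exactly; real deployments also need headroom for KV cache and activations on top of the weights.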
GPU Compatibility Matrix
JAIS 30B is compatible with 62% of the GPU configurations evaluated: 41 GPUs tested at 3 precision levels each.
GPU Recommendations

H20 · FP8 · 1 GPU · tensorrt-llm (score: 100/100)
Throughput: 1.1K tok/s · Latency (ITL): 1.0 ms · Est. TTFT: 0 ms
Cost: $940/month · $0.34/M tokens

FP8 · 1 GPU · tensorrt-llm (score: 95/100)
Throughput: 1.1K tok/s · Latency (ITL): 1.0 ms · Est. TTFT: 0 ms
Cost: $2,553/month · $0.93/M tokens

FP8 · 1 GPU · tensorrt-llm (score: 95/100)
Throughput: 934.2 tok/s · Latency (ITL): 1.1 ms · Est. TTFT: 0 ms
Cost: $1,794/month · $0.73/M tokens
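The $/M-token figures above can be derived from monthly cost and throughput. A sketch of that conversion, assuming a 30-day month at full utilization (note the page's throughputs are rounded, e.g. "1.1K tok/s", so results land near, not exactly on, the listed values):

```python
def cost_per_million_tokens(monthly_usd: float, tokens_per_sec: float) -> float:
    """$/1M generated tokens, assuming 100% utilization over a 30-day month."""
    tokens_per_month = tokens_per_sec * 30 * 24 * 3600  # seconds in 30 days
    return monthly_usd / (tokens_per_month / 1e6)

# H20 card: $940/mo at ~1.1K tok/s
print(round(cost_per_million_tokens(940, 1100), 2))    # ~0.33, vs. listed $0.34
# Third card: $1,794/mo at 934.2 tok/s
print(round(cost_per_million_tokens(1794, 934.2), 2))  # ~0.74, vs. listed $0.73
```

The small deviations come from throughput rounding and whatever utilization assumption the page's calculator uses.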
Deployment Options

API Deployment: no API pricing available
Single GPU: H20 · $940/mo · min VRAM: 30 GB
Multi-GPU: A10G ×4 (tensor parallel) · 177.9 tok/s · $1,139/mo
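The 4× A10G option works because tensor parallelism shards the weights roughly evenly across GPUs. A minimal sketch of the fit check, assuming an even split and the A10G's 24 GB of VRAM:

```python
def weights_per_gpu_gb(total_weights_gb: float, tp_degree: int) -> float:
    """With tensor parallelism, weight shards split ~evenly across GPUs."""
    return total_weights_gb / tp_degree

# JAIS 30B at FP8: 30 GB of weights sharded over 4 GPUs
shard = weights_per_gpu_gb(30.0, 4)
print(shard)                  # 7.5 GB of weights per GPU
print(24.0 - shard)           # ~16.5 GB per A10G left for KV cache + activations
```

A single 24 GB A10G cannot hold the 30 GB of FP8 weights, which is why this configuration requires four of them.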
API Pricing Comparison
No API pricing data available for this model.
Performance Estimates
Throughput by GPU
VRAM Breakdown (H20, FP8)
Precision Impact

BF16: 60.0 GB weights/GPU
FP8: 30.0 GB weights/GPU · ~1.1K tok/s
INT4: 15.0 GB weights/GPU
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy JAIS 30B
Self-Hosted Infrastructure
Similar Models

MPT 30B · 30B params · dense · Quality: 48
Qwen 3 30B-A3B · 30.5B params · MoE · Quality: 70
Gemma 4 31B-IT · 31B params · dense · Quality: 77 · from $0.30/M
Qwen 2.5 32B · 32.5B params · dense · Quality: 73 · from $0.80/M
Qwen 2.5 Coder 32B · 32.5B params · dense · Quality: 80 · from $0.80/M
Frequently Asked Questions
How much VRAM does JAIS 30B need for inference?
JAIS 30B requires approximately 60.0 GB of VRAM for weights at BF16 precision, 30.0 GB at FP8, or 15.0 GB at INT4 quantization. Additional VRAM is needed for the KV cache (802,816 bytes, roughly 0.77 MB, per token) and activations (~1.50 GB).
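The per-token KV-cache figure is consistent with the standard MHA formula of 2 (keys + values) × layers × hidden dimension × bytes per element, assuming the cache is stored at 1 byte per element (e.g. FP8); that assumption is mine, inferred from the numbers, not stated on the page. A sketch:

```python
def kv_bytes_per_token(n_layers: int, hidden_dim: int, bytes_per_elem: int) -> int:
    """KV-cache bytes per token for full MHA: keys + values across all layers."""
    return 2 * n_layers * hidden_dim * bytes_per_elem

# JAIS 30B: 56 layers, 7,168 hidden dim, 1-byte cache elements (assumed)
per_token = kv_bytes_per_token(56, 7168, 1)
print(per_token)               # 802816 bytes, matching the figure above

# KV cache for the full 8,192-token context window:
print(per_token * 8192 / 1e9)  # ~6.58 GB
```

So a full-context request adds roughly 6.6 GB of KV cache on top of the 30 GB of FP8 weights, which still fits comfortably in the H20's memory.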
What is the best GPU for JAIS 30B?
The top recommended GPU for JAIS 30B is the H20 at FP8 precision. It achieves approximately 1.1K tokens/sec at an estimated $940/month ($0.34/M tokens), for a score of 100/100.
How much does JAIS 30B inference cost?
JAIS 30B inference costs vary by provider and GPU setup. Use our calculator for detailed cost estimates across all providers.