Falcon 180B
TII · dense · 180B parameters · 2,048 context
| Spec | Value |
|---|---|
| Parameters | 180B |
| Context Window | 2K tokens |
| Architecture | Dense |
| Best GPU | B200 SXM |
| Cheapest API | $2.40/M |
| Quality Score | 60/100 |
Intelligence Brief
Falcon 180B is a 180B-parameter dense model from TII, featuring Grouped Query Attention (GQA) with 80 layers and a hidden dimension of 14,848. With a 2,048-token context window, it supports code generation and multilingual text. On standardized benchmarks it achieves MMLU 68.6, HumanEval 33, and GSM8K 55. The most cost-effective API deployment is via tii at $2.40/M output tokens. For self-hosted inference, B200 SXM delivers optimal throughput at $8522/month.
Architecture Details
Memory Requirements
| Precision | Weights |
|---|---|
| BF16 | 360.0 GB |
| FP8 | 180.0 GB |
| INT4 | 90.0 GB |
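The weight figures above follow directly from parameter count times bytes per parameter. A minimal sketch, using the model size and byte widths from this page (decimal GB, matching the table):

```python
def weight_memory_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight memory in decimal GB: parameters x bytes per parameter."""
    return params_b * 1e9 * bytes_per_param / 1e9

PARAMS_B = 180  # Falcon 180B

print(weight_memory_gb(PARAMS_B, 2.0))  # BF16: 2 bytes/param -> 360.0
print(weight_memory_gb(PARAMS_B, 1.0))  # FP8:  1 byte/param  -> 180.0
print(weight_memory_gb(PARAMS_B, 0.5))  # INT4: 0.5 byte/param -> 90.0
```

Note this covers weights only; KV-cache and activations (see the FAQ below) add to the total.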
GPU Compatibility Matrix
Falcon 180B is compatible with 14% of GPU configurations across 41 GPUs at 3 precision levels.
GPU Recommendations
| Config | Score | Throughput | Latency (ITL) | Est. TTFT | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| FP8 · 2 GPUs · tensorrt-llm | 98/100 | 280.0 tok/s | 3.6ms | 1ms | $8522 | $11.58 |
| FP8 · 2 GPUs · tensorrt-llm | 98/100 | 280.0 tok/s | 3.6ms | 1ms | $8541 | $11.61 |
| FP8 · 2 GPUs · tensorrt-llm | 95/100 | 280.0 tok/s | 3.6ms | 1ms | $5106 | $6.94 |
Deployment Options
- API Deployment: tii at $2.40/M output tokens
- Single GPU: B200 NVL (pair) at $9965/mo (min VRAM: 180 GB)
- Multi-GPU: B200 SXM x2, 280.0 tok/s, TP, $8522/mo
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| tii | $2.40 | $2.40 | Cheapest |
Cost Analysis
| Provider | Input $/M | Output $/M | ~Monthly Cost |
|---|---|---|---|
| tii (Best Value) | $2.40 | $2.40 | $24 |
Cost per 1,000 Requests
| Request Size | Cost | Provider |
|---|---|---|
| Short (500 tok) | $1.68 | tii |
| Medium (2K tok) | $6.72 | tii |
| Long (8K tok) | $24.00 | tii |
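Per-request costs at a flat token rate can be estimated with a one-liner. This is a sketch assuming input and output are both billed at $2.40/M, counting only the listed output size; the figures above run somewhat higher, presumably because they also fold in prompt tokens:

```python
def cost_per_1k_requests(tokens_per_request: int, price_per_m: float) -> float:
    """Cost in dollars of 1,000 requests at a flat $/M-token rate."""
    return 1_000 * tokens_per_request * price_per_m / 1_000_000

# tii bills $2.40/M for both input and output (per the pricing table above)
print(cost_per_1k_requests(2_000, 2.40))  # output tokens only -> ~4.80
```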
Performance Estimates
(Charts: Throughput by GPU · VRAM Breakdown for B200 SXM at FP8)
Precision Impact
| Precision | Weights/GPU | Est. Throughput |
|---|---|---|
| bf16 | 180.0 GB | |
| fp8 | 90.0 GB | ~280.0 tok/s |
| int4 | 45.0 GB | |
Similar Models
| Model | Params | Architecture | Quality | From |
|---|---|---|---|---|
| Gemini 1.5 Pro | 175B | moe | 80 | $5.00/M |
| Claude 3 Opus | 175B | dense | 80 | $75.00/M |
| Claude Opus 4 | 200B | dense | 90 | $75.00/M |
| GPT-4o | 200B | moe | 85 | $10.00/M |
| GPT-4 Turbo | 200B | moe | 80 | $30.00/M |
Frequently Asked Questions
How much VRAM does Falcon 180B need for inference?
Falcon 180B requires approximately 360.0 GB of VRAM at BF16 precision, 180.0 GB at FP8, or 90.0 GB at INT4 quantization. Additional VRAM is needed for the KV-cache (163,840 bytes per token, about 0.34 GB for one full 2,048-token sequence) and activations (~4.00 GB).
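The KV-cache figure can be turned into a per-deployment budget. A minimal sketch using the 163,840 bytes/token stated above (decimal GB; batch size is the number of concurrent sequences):

```python
KV_BYTES_PER_TOKEN = 163_840  # per-token K+V across all layers, from this page
CONTEXT = 2_048               # Falcon 180B context window

def kv_cache_gb(tokens: int, concurrent_seqs: int = 1) -> float:
    """KV-cache size in decimal GB for the given sequence length and batch."""
    return tokens * concurrent_seqs * KV_BYTES_PER_TOKEN / 1e9

print(kv_cache_gb(CONTEXT))      # one full-context sequence: ~0.336 GB
print(kv_cache_gb(CONTEXT, 32))  # 32 concurrent full-context sequences: ~10.7 GB
```

With a short 2K context, the KV-cache stays small relative to the 180 GB of FP8 weights even at high concurrency.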
What is the best GPU for Falcon 180B?
The top recommended GPU for Falcon 180B is the B200 SXM (x2) using FP8 precision. It achieves approximately 280.0 tokens/sec at an estimated cost of $8522/month ($11.58/M tokens). Score: 98/100.
How much does Falcon 180B inference cost?
Falcon 180B API inference starts from $2.40/M input tokens and $2.40/M output tokens. Self-hosted inference costs depend on your GPU configuration — use our ROI calculator for a detailed breakdown.
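A rough API-vs-self-hosted break-even can be computed from the figures on this page. This is a sketch, not the ROI calculator's full model: it ignores utilization, ops overhead, and input/output pricing splits.

```python
def breakeven_tokens_m(monthly_gpu_cost: float, api_price_per_m: float) -> float:
    """Monthly token volume (in millions) above which self-hosting beats the API."""
    return monthly_gpu_cost / api_price_per_m

# Numbers from this page: B200 SXM x2 at $8522/mo vs tii at $2.40/M tokens
print(breakeven_tokens_m(8522, 2.40))  # ~3551 M tokens/month
```

Below roughly 3.5B tokens per month, the tii API is the cheaper option at these rates.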