Browse All Models
Explore all 302 text-generation models in our catalog plus 8 specialized non-LLM models below. Filter by family, architecture, size, or capability. Click any model to see detailed specs, GPU requirements, and pricing. Prices, providers, and release dates live on the leaderboard.
Specialized models
Non-text modalities — TTS, image-gen, vision-embedding, video-fx, protein. These don't fit the LLM economics tables below and have their own pricing shapes (per-image, per-audio-second, etc.).
NVIDIA · BioNeMo
ESM-2 650M is a 33-layer transformer trained as a protein language model on ~65M UniRef50 sequences. NVIDIA BioNeMo prov…
mit
NVIDIA · Picasso
Edify Image is NVIDIA's enterprise text-to-image model — trained exclusively on commercially-licensed data through the P…
nvidia-picasso-commercial-license
NVIDIA · Maxine
Maxine Eye Contact is one effect in NVIDIA's Maxine SDK for real-time video communications. Maintains eye-contact appear…
nvidia-maxine-sdk-license
NVIDIA · NV-CLIP
NV-CLIP is NVIDIA's tuned variant of OpenAI CLIP, packaged as an NVIDIA NIM container for production embedding workloads…
nvidia-open-model-license
NVIDIA · DINOv2
DINOv2 is a self-supervised vision foundation model — produces strong frozen image features that transfer to many downst…
apache-2.0
NVIDIA · Riva TTS
FastPitch is a parallel transformer-based mel-spectrogram generator that explicitly controls pitch and duration of speec…
cc-by-4.0
NVIDIA · Riva TTS
HiFi-GAN is the vocoder half of NVIDIA's en-US Riva TTS stack — converts mel-spectrograms produced by FastPitch into 22.…
cc-by-4.0
NVIDIA · StyleGAN
StyleGAN3 is NVIDIA Labs' third-generation generative adversarial network for photorealistic image synthesis. Introduced…
nvidia-source-code-license-nc
| Provider | Family | Params | Arch | Context | Precision | Capabilities | VRAM | Frameworks | Quality |
|---|---|---|---|---|---|---|---|---|---|
| Sentence Transformers | MiniLM | 23M | dense | 256 | bf16 | — | 0.0 GB | tgi · ollama | — |
| NVIDIA | Alpamayo | 10B | dense | 8K | bf16 |  | 20.0 GB | vllm | 70.0 |
| Amazon | Nova | 12B | dense | 300K | bf16 |  | 24.0 GB | vllm | 35.0 |
| Amazon | Nova | 50B | dense | 300K | bf16 |  | 100.0 GB | vllm | 36.0 |
| Cohere | Aya | 35B | dense | 131K | bf16 |  | 70.0 GB | vllm · sglang · tgi+1 | — |
| Cohere | Aya | 8B | dense | 8K | bf16 |  | 16.0 GB | vllm · sglang · tgi+2 | — |
| Baichuan | Baichuan 2 | 13B | dense | 4K | bf16 |  | 26.0 GB | vllm · sglang · tgi | — |
| Baichuan | Baichuan 2 | 7B | dense | 4K | bf16 |  | 14.0 GB | vllm · tgi | — |
| BAAI | BGE | 110M | dense | 512 | bf16 | — | 0.2 GB | vllm · tgi | — |
| BAAI | BGE | 335M | dense | 512 | bf16 | — | 0.7 GB | vllm · tgi · tensorrt-llm | — |
| BAAI | BGE | 568M | dense | 8K | bf16 |  | 1.1 GB | vllm · tgi · tensorrt-llm | — |
| BAAI | BGE | 33M | dense | 512 | bf16 | — | 0.1 GB | vllm · tgi | — |
| BioMistral | BioMistral | 7.2B | dense | 33K | bf16 | — | 14.4 GB | vllm · sglang · tgi+1 | — |
| Cerebras | BTLM | 3B | dense | 8K | bf16 |  | 6.0 GB | vllm · tgi | — |
| NVIDIA | Canary | 1B | dense | 4K | bf16 |  | 2.0 GB | tensorrt-llm · vllm | — |
| Cerebras | Cerebras GPT | 13B | dense | 2K | bf16 | — | 26.0 GB | vllm · tgi | — |
| Tsinghua University | ChatGLM3 | 6B | dense | 131K | bf16 |  | 12.0 GB | vllm · sglang · tgi+1 | — |
| Zhipu AI | ChatGLM | 9.4B | dense | 131K | bf16 |  | 18.8 GB | vllm · sglang · tgi | — |
| Anthropic | Claude | 175B | dense | 200K | bf16 |  | 350.0 GB | — | 80.0 |
| Anthropic | Claude | 70B | dense | 200K | bf16 |  | 140.0 GB | — | 78.0 |
| Anthropic | Claude | 20B | dense | 200K | bf16 |  | 40.0 GB | — | 67.0 |
| Anthropic | Claude | 175B (50B active) | moe | 200K | bf16 |  | 350.0 GB | vllm | — |
| Anthropic | Claude | 30B | moe | 200K | bf16 |  | 60.0 GB | vllm | — |
| Anthropic | Claude | 200B | dense | 200K | bf16 |  | 400.0 GB | — | 90.0 |
| Anthropic | Claude | 300B (75B active) | moe | 200K | bf16 |  | 600.0 GB | vllm | 90.0 |
| Anthropic | Claude | 400B (80B active) | moe | 200K | bf16 |  | 800.0 GB | vllm | 90.0 |
| Anthropic | Claude | 450B (85B active) | moe | 1000K | bf16 |  | 900.0 GB | vllm | 90.0 |
| Anthropic | Claude | 500B (90B active) | moe | 1000K | bf16 |  | 1000.0 GB | vllm | 90.0 |
| Anthropic | Claude | 70B | dense | 200K | bf16 |  | 140.0 GB | — | 86.0 |
| Anthropic | Claude | 150B (60B active) | moe | 200K | bf16 |  | 300.0 GB | vllm | 86.0 |
| Anthropic | Claude | 180B (70B active) | moe | 1000K | bf16 |  | 360.0 GB | vllm | 86.0 |
| Meta | Code Llama | 13B | dense | 16K | bf16 |  | 26.0 GB | vllm · sglang · tgi+2 | 44.0 |
| Meta | Code Llama | 34B | dense | 100K | bf16 |  | 68.0 GB | vllm · sglang · tgi+2 | 55.0 |
| Meta | Code Llama | 70B | dense | 16K | bf16 |  | 140.0 GB | vllm · sglang · tgi+1 | 60.0 |
| Meta | Code Llama | 7B | dense | 16K | bf16 |  | 14.0 GB | vllm · sglang · tgi+2 | 39.0 |
|  | Gemma | 8.5B | dense | 8K | bf16 |  | 17.0 GB | vllm · sglang · tgi+1 | 52.0 |
| Salesforce | CodeGen2 | 16B | dense | 2K | bf16 |  | 32.0 GB | vllm · tgi | — |
| Mistral AI | Codestral | 22B | dense | 33K | bf16 |  | 44.0 GB | vllm · sglang · tgi+1 | 63.0 |
| Mistral AI | Codestral | 7.3B | hybrid | 262K | bf16 |  | 14.6 GB | vllm · sglang | — |
| THUDM | CogVLM2 | 19B | dense | 8K | bf16 |  | 38.0 GB | vllm · sglang · tgi | — |
| Cohere | Embed | 500M | dense | 512 | bf16 | — | 1.0 GB | — | — |
| Cohere | Command | 111B | dense | 256K | bf16 |  | 222.0 GB | — | 81.0 |
| Cohere | Command R | 35B | dense | 131K | bf16 |  | 70.0 GB | vllm · sglang · tgi+1 | 68.0 |
| Cohere | Command R | 35B | dense | 128K | bf16 |  | 70.0 GB | vllm · sglang · tgi | 68.0 |
| Cohere | Command R | 7B | dense | 131K | bf16 |  | 14.0 GB | vllm · sglang · tgi+2 | 68.0 |
| Cohere | Command R | 104B | dense | 131K | bf16 |  | 208.0 GB | vllm · sglang · tgi+1 | 68.0 |
| NVIDIA | Cosmos | 7B | dense | 4K | bf16 |  | 14.0 GB | tensorrt-llm | 60.0 |
| Sesame | CSM | 1B | dense | 4K | bf16 | — | 2.0 GB | ollama | — |
| OpenAI | DALL-E | 3.5B | dense | 4K | bf16 |  | 7.0 GB | — | — |
| Databricks | DBRX | 132B (36B active) | moe | 33K | bf16 |  | 264.0 GB | vllm · sglang · tgi+1 | — |
| Databricks | DBRX | 132B (36B active) | moe | 33K | bf16 |  | 264.0 GB | vllm · sglang · tgi+1 | — |
| DeepSeek | DeepSeek Coder | 33B | dense | 16K | bf16 |  | 66.0 GB | vllm · sglang · tgi+1 | — |
| DeepSeek | DeepSeek Coder | 6.7B | dense | 16K | bf16 |  | 13.4 GB | vllm · sglang · tgi+2 | — |
| DeepSeek | DeepSeek Coder V2 | 236B (21B active) | moe | 131K | bf16 |  | 472.0 GB | vllm · sglang · tensorrt-llm | — |
| DeepSeek | DeepSeek LLM | 67B | dense | 4K | bf16 |  | 134.0 GB | vllm · sglang · tgi+1 | 66.0 |
| DeepSeek | DeepSeek Math | 7.24B | dense | 4K | bf16 |  | 14.5 GB | vllm · sglang · tgi+2 | — |
| DeepSeek | DeepSeek MoE | 16.4B (2.8B active) | moe | 4K | bf16 |  | 32.8 GB | vllm · sglang · tgi | — |
| DeepSeek | DeepSeek R1 | 671B (37B active) | moe | 131K | bf16 |  | 1342.0 GB | vllm · sglang · tensorrt-llm | 88.0 |
| DeepSeek | DeepSeek R1 | 1.5B | dense | 131K | bf16 |  | 3.0 GB | vllm · sglang · tgi+1 | 42.0 |
| DeepSeek | DeepSeek R1 | 14.8B | dense | 131K | bf16 |  | 29.6 GB | vllm · sglang · tgi+2 | 88.0 |
| DeepSeek | DeepSeek R1 | 32.8B | dense | 131K | bf16 |  | 65.6 GB | vllm · sglang · tgi+2 | 88.0 |
| DeepSeek | DeepSeek R1 | 70.6B | dense | 131K | bf16 |  | 141.2 GB | vllm · sglang · tgi+1 | 88.0 |
| DeepSeek | DeepSeek R1 | 8B | dense | 131K | bf16 |  | 16.0 GB | vllm · sglang · tgi+2 | 88.0 |
| DeepSeek | DeepSeek V2 | 15.7B (2.4B active) | moe | 33K | bf16 |  | 31.4 GB | vllm · sglang · tgi | — |
| DeepSeek | DeepSeek V2 | 236B (21B active) | moe | 131K | bf16 |  | 472.0 GB | vllm · sglang · tensorrt-llm | 78.0 |
| DeepSeek | DeepSeek V3 | 671B (37B active) | moe | 131K | bf16 |  | 1342.0 GB | vllm · sglang · tensorrt-llm | 81.0 |
| DeepSeek | DeepSeek V3 | 685B (37B active) | moe | 131K | fp8 |  | 685.0 GB | vllm · sglang · tensorrt-llm | 81.0 |
| Cognitive Computations | Dolphin | 72B | dense | 33K | bf16 |  | 144.0 GB | vllm · sglang · tgi+1 | — |
| NVIDIA | Eagle | 1.3B | dense | 4K | bf16 |  | 2.6 GB | vllm · ollama | 65.0 |
| NVIDIA | Eagle | 9B | dense | 8K | bf16 |  | 18.0 GB | vllm · tensorrt-llm | 65.0 |
| NVIDIA | Eagle | 8B | dense | 16K | bf16 |  | 16.0 GB | vllm · tensorrt-llm | 65.0 |
| ELYZA | ELYZA | 13B | dense | 4K | bf16 |  | 26.0 GB | vllm · tgi · ollama | — |
| TII UAE | Falcon | 11B | dense | 8K | bf16 |  | 22.0 GB | vllm · sglang · tgi | — |
| TII | Falcon | 180B | dense | 2K | bf16 |  | 360.0 GB | vllm · sglang · tgi+1 | 60.0 |
| TII UAE | Falcon | 10.3B | dense | 33K | bf16 |  | 20.6 GB | vllm · sglang · tgi+2 | — |
| TII UAE | Falcon | 1B | dense | 8K | bf16 |  | 2.0 GB | vllm · sglang · tgi+2 | — |
| TII UAE | Falcon | 3B | dense | 8K | bf16 |  | 6.0 GB | vllm · sglang · tgi+2 | — |
| TII UAE | Falcon | 7.5B | dense | 33K | bf16 |  | 15.0 GB | vllm · sglang · tgi+2 | — |
| TII | Falcon | 40B | dense | 2K | bf16 |  | 80.0 GB | vllm · sglang · tgi+1 | 48.0 |
| TII | Falcon | 7B | dense | 2K | bf16 |  | 14.0 GB | vllm · sglang · tgi+2 | 37.0 |
| TII | Falcon Mamba | 7.27B | hybrid | 8K | bf16 |  | 14.5 GB | vllm · sglang | — |
| AI4Finance | FinGPT | 7.2B | dense | 4K | bf16 |  | 14.4 GB | vllm · tgi · ollama | — |
| Microsoft | Florence | 770M | dense | 2K | bf16 |  | 1.5 GB | vllm | — |
| Black Forest Labs | FLUX | 12B | dense | 512 | bf16 |  | 24.0 GB | — | — |
| Black Forest Labs | FLUX | 12B | dense | 4K | bf16 |  | 24.0 GB | vllm · tensorrt-llm | — |
|  | Gemini | 50B (12B active) | moe | 1049K | bf16 |  | 100.0 GB | — | 75.0 |
|  | Gemini | 175B (40B active) | moe | 2097K | bf16 |  | 350.0 GB | — | 80.0 |
|  | Gemini | 50B (15B active) | moe | 1049K | bf16 |  | 100.0 GB | — | 80.0 |
|  | Gemini | 600B (150B active) | moe | 2000K | bf16 |  | 1200.0 GB | — | 88.0 |
| Google DeepMind | Gemini | 600B (100B active) | moe | 1000K | bf16 |  | 1200.0 GB | vllm | — |
|  | Gemma | 2.5B | dense | 8K | bf16 |  | 5.0 GB | vllm · sglang · tgi+2 | — |
|  | Gemma 2 | 27B | dense | 8K | bf16 |  | 54.0 GB | vllm · sglang · tgi+2 | 65.0 |
|  | Gemma 2 | 2.6B | dense | 8K | bf16 |  | 5.2 GB | vllm · sglang · tgi+2 | 44.0 |
|  | Gemma 2 | 9.2B | dense | 8K | bf16 |  | 18.4 GB | vllm · sglang · tgi+2 | 68.0 |
|  | Gemma 3 | 12B | dense | 131K | bf16 |  | 24.0 GB | vllm · sglang · tgi+2 | 71.0 |
|  | Gemma 3 | 1B | dense | 33K | bf16 |  | 2.0 GB | vllm · sglang · tgi+2 | 35.0 |
|  | Gemma 3 | 27B | dense | 131K | bf16 |  | 54.0 GB | vllm · sglang · tgi+2 | 69.0 |
|  | Gemma 3 | 2B | dense | 8K | bf16 |  | 4.0 GB | vllm · sglang · tgi+1 | 42.0 |
|  | Gemma 3 | 4.3B | dense | 131K | bf16 |  | 8.6 GB | vllm · sglang · tgi+2 | 54.0 |
|  | Gemma 4 | 31B | dense | 33K | bf16 |  | 62.0 GB | vllm · sglang · tgi+2 | 77.0 |
| Sberbank | GigaChat | 20B | dense | 8K | bf16 |  | 40.0 GB | vllm · tgi | — |
| Zhipu AI | GLM-4 | 9.4B | dense | 131K | bf16 |  | 18.8 GB | vllm · sglang · tgi+1 | — |
| Zhipu AI | GLM-5 | 200B | dense | 128K | bf16 |  | 400.0 GB | vllm · sglang · tgi | 51.0 |
| OpenAI | GPT-3.5 | 20B | dense | 16K | bf16 |  | 40.0 GB | — | 67.0 |
| OpenAI | GPT-4 | 200B (50B active) | moe | 128K | bf16 |  | 400.0 GB | — | 80.0 |
| OpenAI | GPT | 1500B (300B active) | moe | 128K | bf16 |  | 3000.0 GB | — | 93.0 |
| OpenAI | GPT-4 | 200B (50B active) | moe | 128K | bf16 |  | 400.0 GB | — | 85.0 |
| OpenAI | GPT-4 | 8B | dense | 128K | bf16 |  | 16.0 GB | — | 72.0 |
| OpenAI | GPT | 500B (90B active) | moe | 400K | bf16 |  | 1000.0 GB | vllm | — |
| OpenAI | GPT | 80B (25B active) | moe | 400K | bf16 |  | 160.0 GB | vllm | — |
| OpenAI | GPT | 8B (4B active) | moe | 400K | bf16 |  | 16.0 GB | vllm | — |
| OpenAI | GPT | 700B (110B active) | moe | 1000K | bf16 |  | 1400.0 GB | vllm | — |
| xAI | Grok | 600B (120B active) | moe | 131K | bf16 |  | 1200.0 GB | — | 90.0 |
| xAI | Grok | 400B (80B active) | moe | 256K | bf16 |  | 800.0 GB | vllm | — |
| xAI | Grok | 500B (90B active) | moe | 1000K | bf16 |  | 1000.0 GB | vllm | — |
| xAI | Grok | 314B (50B active) | moe | 131K | bf16 |  | 628.0 GB | vllm | 78.0 |
| xAI | Grok | 314B | dense | 131K | bf16 |  | 628.0 GB | vllm | 91.0 |
| xAI | Grok | 33B | dense | 131K | bf16 |  | 66.0 GB | vllm | 78.0 |
| Alibaba | GTE | 7.6B | dense | 33K | bf16 |  | 15.2 GB | vllm · sglang · tgi+1 | — |
| H2O.ai | H2O Danube | 500M | dense | 8K | bf16 | — | 1.0 GB | vllm · sglang · tgi+1 | — |
| NVIDIA | Llama 3.1 | 70.6B | dense | 131K | bf16 |  | 141.2 GB | tensorrt-llm · vllm · sglang | 82.0 |
| Nous Research | Hermes 3 | 70.6B | dense | 131K | bf16 |  | 141.2 GB | vllm · sglang · tgi+1 | — |
| Nous Research | Hermes 3 | 8.03B | dense | 131K | bf16 |  | 16.1 GB | vllm · sglang · tgi+2 | — |
| Inflection AI | Inflection | 100B | dense | 8K | bf16 |  | 200.0 GB | — | 74.0 |
| Microsoft SAIL | InfoXLM | 550M | dense | 512 | bf16 |  | 1.1 GB | tgi | — |
| Shanghai AI Lab | InternLM 2.5 | 19.9B | dense | 262K | bf16 |  | 39.8 GB | vllm · sglang · tgi | — |
| Shanghai AI Lab | InternLM 2.5 | 7.74B | dense | 1049K | bf16 |  | 15.5 GB | vllm · sglang · tgi+1 | — |
| SenseTime | InternLM | 20B | dense | 16K | bf16 |  | 40.0 GB | vllm · tgi | — |
| Shanghai AI Lab | InternLM | 8B | dense | 33K | bf16 |  | 16.0 GB | vllm · sglang · tgi+1 | — |
| InternLM | InternVL2 | 26B | dense | 33K | bf16 |  | 52.0 GB | vllm · sglang · tgi | — |
| G42/Inception | JAIS | 30B | dense | 8K | bf16 |  | 60.0 GB | vllm · tgi | — |
| AI21 | Jamba | 398B | hybrid | 256K | bf16 |  | 796.0 GB | vllm · sglang | — |
| AI21 | Jamba | 52B | hybrid | 256K | bf16 |  | 104.0 GB | vllm · sglang | — |
| AI21 | Jamba | 52B (12B active) | moe | 256K | bf16 |  | 104.0 GB | — | 66.0 |
| DeepSeek | Janus | 7B | dense | 8K | bf16 |  | 14.0 GB | vllm · ollama | 62.0 |
| Stability AI | StableLM | 70B | dense | 8K | bf16 |  | 140.0 GB | vllm · sglang · tgi | — |
| Jina AI | Jina Embeddings | 570M | dense | 8K | bf16 |  | 1.1 GB | tgi · tensorrt-llm | — |
| Moonshot AI | Kimi | 1000B (32B active) | moe | 131K | fp8 |  | 1000.0 GB | vllm · sglang | 54.0 |
| Hexagrad | Kokoro | 82M | dense | 2K | bf16 | — | 0.2 GB | ollama | — |
| Korea University | KULLM | 12.8B | dense | 4K | bf16 |  | 25.6 GB | vllm · tgi | — |
| Meta | Llama 2 | 13B | dense | 4K | bf16 |  | 26.0 GB | vllm · sglang · tgi+2 | 47.0 |
| Meta | Llama 2 | 70B | dense | 4K | bf16 |  | 140.0 GB | vllm · sglang · tgi+1 | 62.0 |
| Meta | Llama 2 | 7B | dense | 4K | bf16 |  | 14.0 GB | vllm · sglang · tgi+2 | 40.0 |
| Meta | Llama 3 | 70.6B | dense | 8K | bf16 |  | 141.2 GB | vllm · sglang · tgi+1 | 80.0 |
| Gradient | Llama 3 | 70.6B | dense | 1049K | bf16 |  | 141.2 GB | vllm · sglang | — |
| Meta | Llama 3 | 8B | dense | 8K | bf16 |  | 16.0 GB | vllm · sglang · tgi+2 | 63.0 |
| Meta | Llama 3.1 | 405B | dense | 131K | bf16 |  | 810.0 GB | vllm · sglang · tgi+1 | 81.0 |
| Meta | Llama 3.1 | 70.6B | dense | 131K | bf16 |  | 141.2 GB | vllm · sglang · tgi+1 | 75.0 |
| Together AI | Llama 3.1 | 70.6B | dense | 131K | fp8 |  | 70.6 GB | vllm · sglang · tensorrt-llm | — |
| Meta | Llama 3.1 | 8.03B | dense | 131K | bf16 |  | 16.1 GB | vllm · sglang · tgi+2 | 58.0 |
| NVIDIA | Llama 3.1 | 51B | dense | 131K | bf16 |  | 102.0 GB | tensorrt-llm · vllm · sglang | 78.0 |
| NVIDIA | Llama 3.1 | 70.6B | dense | 131K | bf16 |  | 141.2 GB | vllm · sglang · tgi+1 | 83.0 |
| NVIDIA | Llama 3.1 | 70.6B | dense | 131K | bf16 |  | 141.2 GB | tensorrt-llm · vllm · sglang | 80.0 |
| Meta | Llama 3.2 | 11B | dense | 131K | bf16 |  | 22.0 GB | vllm · sglang · tgi+2 | 9.0 |
| Meta | Llama 3.2 | 1.24B | dense | 131K | bf16 |  | 2.5 GB | vllm · sglang · tgi+2 | 38.0 |
| Meta | Llama 3.2 | 3.21B | dense | 131K | bf16 |  | 6.4 GB | vllm · sglang · tgi+2 | 55.0 |
| Meta | Llama 3.2 | 90B | dense | 131K | bf16 |  | 180.0 GB | vllm · sglang · tgi+1 | 84.0 |
| Meta | Llama 3.2 | 88.8B | dense | 131K | bf16 |  | 177.6 GB | vllm · sglang · tgi+1 | 84.0 |
| Meta | Llama 3.3 | 70.6B | dense | 131K | bf16 |  | 141.2 GB | vllm · sglang · tgi+2 | 77.0 |
| Meta | Llama 3.3 | 8B | dense | 131K | bf16 |  | 16.0 GB | vllm · sglang · tgi+2 | — |
| Meta | Llama 4 | 2000B (400B active) | moe | 1049K | bf16 |  | 4000.0 GB | — | 93.0 |
| Meta | Llama 4 | 400B (17B active) | moe | 1049K | bf16 |  | 800.0 GB | vllm · sglang · tensorrt-llm | 84.0 |
| Meta | Llama 4 | 109B (17B active) | moe | 10486K | bf16 |  | 218.0 GB | vllm · sglang · tensorrt-llm | 73.0 |
| Meta | Llama Guard | 1B | dense | 131K | bf16 |  | 2.0 GB | vllm · sglang · tgi+1 | — |
| Meta | Llama Guard | 8B | dense | 131K | bf16 |  | 16.0 GB | vllm · sglang · tgi+2 | — |
| Alibaba | Marco | 7.6B | dense | 66K | bf16 |  | 15.2 GB | vllm · sglang · tgi+1 | — |
| EPFL | Meditron | 70B | dense | 4K | bf16 | — | 140.0 GB | vllm · sglang · tgi+1 | — |
| NVIDIA | Megatron-Turing | 530B | dense | 2K | bf16 |  | 1060.0 GB | tensorrt-llm · vllm | 58.0 |
| MiniMax | MiniMax M | 456B (45.9B active) | moe | 1049K | bf16 |  | 912.0 GB | vllm · sglang | 82.0 |
| MiniMax | MiniMax-M2 | 229B (7B active) | moe | 197K | fp8 |  | 229.0 GB | vllm · sglang | — |
| MiniMax | MiniMax-M2.1 | 229B (7B active) | moe | 197K | fp8 |  | 229.0 GB | vllm · sglang | — |
| MiniMax | MiniMax-M2 | 229B (7B active) | moe | 197K | fp8 |  | 229.0 GB | vllm · sglang | — |
| MiniMax | MiniMax | 456B (45.9B active) | moe | 1049K | fp8 |  | 456.0 GB | vllm · sglang | — |
| Mistral AI | Ministral | 8B | dense | 131K | bf16 |  | 16.0 GB | vllm · sglang · tgi+2 | 15.0 |
| NVIDIA | Nemotron | 4B | dense | 8K | bf16 |  | 8.0 GB | tensorrt-llm · vllm · sglang | 50.0 |
| NVIDIA | Nemotron | 8B | dense | 8K | bf16 |  | 16.0 GB | tensorrt-llm · vllm · sglang | 62.0 |
| Mistral AI | Mistral | 7.3B | dense | 33K | bf16 |  | 14.6 GB | vllm · sglang · tgi+2 | 56.0 |
| Mistral AI | Mistral Large | 123B | dense | 131K | bf16 |  | 246.0 GB | vllm · sglang · tgi+1 | 75.0 |
| Mistral AI | Mistral Large | 123B | dense | 131K | bf16 |  | 246.0 GB | vllm · sglang · tgi+1 | 75.0 |
| Mistral AI | Mistral | 70B | dense | 131K | bf16 |  | 140.0 GB | vllm · sglang · tgi+1 | 80.0 |
| Mistral AI | Mistral Nemo | 12B | dense | 131K | bf16 |  | 24.0 GB | vllm · sglang · tgi+2 | 62.0 |
| Mistral AI | Mistral Small | 24B | dense | 33K | bf16 |  | 48.0 GB | vllm · sglang · tgi+2 | 68.0 |
| Mistral AI | Mistral Small | 24B | dense | 131K | bf16 |  | 48.0 GB | vllm · sglang · tgi+2 | — |
| Mistral AI | Mixtral | 141B (39B active) | moe | 66K | bf16 |  | 282.0 GB | vllm · sglang · tgi+1 | 65.0 |
| Mistral AI | Mixtral | 46.7B (12.9B active) | moe | 33K | bf16 |  | 93.4 GB | vllm · sglang · tgi+2 | 67.0 |
| Mistral AI | Mixtral | 46.7B (12.9B active) | moe | 33K | bf16 |  | 93.4 GB | vllm · sglang · tgi+2 | 69.0 |
| intfloat | intfloat | 10.6B | dense | 131K | bf16 |  | 21.2 GB | vllm | — |
| Allen AI | Molmo | 72B | dense | 8K | bf16 |  | 144.0 GB | vllm · sglang | 78.0 |
| Vikhyat | Moondream | 1.86B | dense | 2K | bf16 |  | 3.7 GB | ollama · vllm | — |
| MosaicML | MPT | 30B | dense | 8K | bf16 |  | 60.0 GB | vllm · tgi | 48.0 |
| MosaicML | MPT | 6.7B | dense | 66K | bf16 |  | 13.4 GB | vllm · tgi · ollama | 36.0 |
| Microsoft | E5 | 560M | dense | 512 | bf16 |  | 1.1 GB | vllm · tgi | — |
| intfloat | intfloat | 600M | dense | 514 | bf16 | — | 1.2 GB | vllm | — |
| Rinna | Nekomata | 14B | dense | 4K | bf16 |  | 28.0 GB | vllm · tgi | — |
| NVIDIA | Nemotron | 15B | dense | 4K | bf16 |  | 30.0 GB | vllm · sglang · tensorrt-llm | 72.0 |
| NVIDIA | Nemotron | 340B | dense | 131K | bf16 |  | 680.0 GB | tensorrt-llm · vllm · sglang | 85.0 |
| NVIDIA | Nemotron | 70.6B | dense | 131K | bf16 |  | 141.2 GB | vllm · sglang · tensorrt-llm | 83.0 |
| NVIDIA | Nemotron | 4B | dense | 8K | bf16 |  | 8.0 GB | tensorrt-llm · vllm · sglang | 48.0 |
| NVIDIA | Nemotron | 253B | dense | 131K | bf16 |  | 506.0 GB | vllm · tensorrt-llm | 86.0 |
| NVIDIA | Nemotron | 120B | dense | 131K | bf16 |  | 240.0 GB | vllm · sglang · tensorrt-llm | 84.0 |
| Nomic AI | Nomic Embed | 137M | dense | 8K | bf16 | — | 0.3 GB | vllm · tgi · ollama | — |
| NVIDIA | NV Embed | 7.85B | dense | 33K | bf16 |  | 15.7 GB | vllm · sglang · tgi+1 | — |
| NVIDIA | NV EmbedQA | 330M | dense | 512 | bf16 |  | 0.7 GB | tensorrt-llm · vllm | — |
| NVIDIA | NV EmbedQA | 7.24B | dense | 33K | bf16 |  | 14.5 GB | tensorrt-llm · vllm · sglang | — |
| NVIDIA | NV Retriever | 330M | dense | 512 | bf16 |  | 0.7 GB | tensorrt-llm · vllm | — |
| NVIDIA | NVLM | 72B | dense | 33K | bf16 |  | 144.0 GB | vllm · tensorrt-llm | 79.0 |
| OpenAI | o1 | 200B (50B active) | moe | 200K | bf16 |  | 400.0 GB | — | 93.0 |
| OpenAI | o1 | 70B | dense | 128K | bf16 |  | 140.0 GB | — | 83.0 |
| OpenAI | o3 | 70B | dense | 200K | bf16 |  | 140.0 GB | — | 86.0 |
| BigCode | OctoCoder | 15.5B | dense | 8K | bf16 |  | 31.0 GB | vllm · sglang · tgi | — |
| Allen AI | OLMo 2 | 13B | dense | 4K | bf16 |  | 26.0 GB | vllm · sglang · tgi+1 | — |
| Allen AI | OLMo 2 | 7B | dense | 4K | bf16 |  | 14.0 GB | vllm · sglang · tgi+1 | — |
| Apple | OpenELM | 3B | dense | 2K | bf16 | — | 6.0 GB | vllm · sglang · ollama | — |
| Teknium | OpenHermes | 7B | dense | 33K | bf16 |  | 14.0 GB | vllm · sglang · tgi+1 | — |
| Microsoft | Orca | 13B | dense | 4K | bf16 |  | 26.0 GB | vllm · sglang · tgi+2 | — |
|  | PaLI-Gemma | 2.9B | dense | 8K | bf16 |  | 5.8 GB | vllm · tgi | — |
| NVIDIA | Parakeet | 600M | dense | 4K | bf16 | — | 1.2 GB | tensorrt-llm · vllm | — |
| NVIDIA | Parakeet | 1.1B | dense | 4K | bf16 | — | 2.2 GB | tensorrt-llm · vllm | — |
| Microsoft | Phi | 1.3B | dense | 2K | bf16 |  | 2.6 GB | vllm · tgi · ollama | 38.0 |
| Microsoft | Phi | 1.3B | dense | 2K | bf16 |  | 2.6 GB | vllm · sglang · tgi+1 | 38.0 |
| Microsoft | Phi | 2.7B | dense | 2K | bf16 |  | 5.4 GB | vllm · sglang · tgi+2 | — |
| Microsoft | Phi 3 | 14B | dense | 131K | bf16 |  | 28.0 GB | vllm · sglang · tgi+2 | 76.0 |
| Microsoft | Phi 3 | 3.8B | dense | 131K | bf16 |  | 7.6 GB | vllm · sglang · tgi+2 | 64.0 |
| Microsoft | Phi 3 | 7B | dense | 131K | bf16 |  | 14.0 GB | vllm · sglang · tgi+2 | 72.0 |
| Microsoft | Phi | 41.9B (6.6B active) | moe | 131K | bf16 |  | 83.8 GB | vllm · sglang · tgi+1 | 74.0 |
| Microsoft | Phi 3.5 | 4.2B | dense | 131K | bf16 |  | 8.4 GB | vllm · sglang · tgi+2 | — |
| Microsoft | Phi | 3.8B | dense | 131K | bf16 |  | 7.6 GB | vllm · sglang · tgi+1 | 70.0 |
| Microsoft | Phi | 14.7B | dense | 16K | bf16 |  | 29.4 GB | vllm · sglang · tgi+2 | 73.0 |
| Mistral AI | Pixtral | 12B | dense | 131K | bf16 |  | 24.0 GB | vllm · sglang · tgi+1 | — |
| KAIST | Prometheus | 7.24B | dense | 8K | bf16 |  | 14.5 GB | vllm · sglang · tgi | — |
| Alibaba | Qwen 1.5 | 14.3B (2.7B active) | moe | 33K | bf16 |  | 28.6 GB | vllm · sglang · tgi | — |
| Alibaba | Qwen 2 | 7.6B | dense | 33K | bf16 |  | 15.2 GB | vllm · sglang · tgi | — |
| Alibaba | Qwen 2 VL | 2.2B | dense | 33K | bf16 |  | 4.4 GB | vllm · sglang · tgi | — |
| Alibaba | Qwen 2.5 | 500M | dense | 33K | bf16 |  | 1.0 GB | vllm · sglang · tgi+1 | — |
| Alibaba | Qwen 2.5 | 1.5B | dense | 33K | bf16 |  | 3.0 GB | vllm · sglang · tgi+1 | — |
| Alibaba | Qwen 2.5 | 14.8B | dense | 131K | bf16 |  | 29.6 GB | vllm · sglang · tgi+1 | 76.0 |
| Alibaba | Qwen 2.5 | 32.5B | dense | 131K | bf16 |  | 65.0 GB | vllm · sglang · tgi+1 | 73.0 |
| Alibaba | Qwen 2.5 | 3.09B | dense | 33K | bf16 |  | 6.2 GB | vllm · sglang · tgi+1 | 58.0 |
| Alibaba | Qwen 2.5 | 72.7B | dense | 131K | bf16 |  | 145.4 GB | vllm · sglang · tgi+1 | 77.0 |
| Alibaba | Qwen 2.5 | 7.6B | dense | 131K | bf16 |  | 15.2 GB | vllm · sglang · tgi+2 | 70.0 |
| Alibaba | Qwen 2.5 Coder | 1.5B | dense | 33K | bf16 |  | 3.0 GB | vllm · sglang · tgi+1 | 40.0 |
| Alibaba | Qwen 2.5 Coder | 14.7B | dense | 131K | bf16 |  | 29.4 GB | vllm · sglang · tgi+2 | — |
| Alibaba | Qwen 2.5 | 32.5B | dense | 131K | bf16 |  | 65.0 GB | vllm · sglang · tgi+1 | 80.0 |
| Alibaba | Qwen 2.5 Coder | 32.5B | dense | 131K | bf16 |  | 65.0 GB | vllm · sglang · tgi+2 | — |
| Alibaba | Qwen 2.5 Coder | 3.1B | dense | 33K | bf16 |  | 6.2 GB | vllm · sglang · tgi+1 | 50.0 |
| Alibaba | Qwen 2.5 Coder | 7.6B | dense | 131K | bf16 |  | 15.2 GB | vllm · sglang · tgi+2 | — |
| Alibaba | Qwen 2.5 Math | 72.7B | dense | 4K | bf16 |  | 145.4 GB | vllm · sglang · tgi+1 | — |
| Alibaba | Qwen 2.5 Math | 7.6B | dense | 4K | bf16 |  | 15.2 GB | vllm · sglang · tgi+2 | — |
| Alibaba | Qwen 2.5 VL | 72.7B | dense | 131K | bf16 |  | 145.4 GB | vllm · sglang · tgi+1 | — |
| Alibaba | Qwen 2.5 VL | 7.6B | dense | 131K | bf16 |  | 15.2 GB | vllm · sglang · tgi+2 | — |
| Alibaba | Qwen 3 | 600M | dense | 131K | bf16 |  | 1.2 GB | vllm · sglang · tgi+2 | — |
| Alibaba | Qwen 3 | 1.7B | dense | 131K | bf16 |  | 3.4 GB | vllm · sglang · tgi+2 | — |
| Alibaba | Qwen 3 | 235B (22B active) | moe | 131K | bf16 |  | 470.0 GB | vllm · sglang · tensorrt-llm | 83.0 |
| Alibaba | Qwen 3 | 30.5B (3.3B active) | moe | 131K | bf16 |  | 61.0 GB | vllm · sglang · tgi+1 | 70.0 |
| Alibaba | Qwen 3 | 32.8B | dense | 131K | bf16 |  | 65.6 GB | vllm · sglang · tgi+2 | 74.0 |
| Alibaba | Qwen 3 | 4B | dense | 131K | bf16 |  | 8.0 GB | vllm · sglang · tgi+2 | 57.0 |
| Alibaba | Qwen 3 | 8.2B | dense | 131K | bf16 |  | 16.4 GB | vllm · sglang · tgi+2 | 70.0 |
| Alibaba | Qwen 3 Coder | 8.2B | dense | 131K | bf16 |  | 16.4 GB | vllm · sglang · ollama | 74.0 |
| Qwen | Qwen3 | 235B (22B active) | moe | 262K | bf16 |  | 470.0 GB | vllm · sglang | — |
|  | RecurrentGemma | 2.7B | dense | 8K | bf16 |  | 5.4 GB | vllm · sglang | — |
| Reka AI | Reka | 70B | dense | 128K | bf16 |  | 140.0 GB | — | 76.0 |
| Replit | Replit Code | 3.3B | dense | 4K | bf16 |  | 6.6 GB | vllm · tgi | — |
| RWKV Foundation | RWKV | 14.1B | hybrid | 33K | bf16 |  | 28.2 GB | vllm | — |
| BigCode | SantaCoder | 1.1B | dense | 2K | bf16 |  | 2.2 GB | vllm · tgi | — |
| Equall.ai | SaulLM | 7.2B | dense | 8K | bf16 | — | 14.4 GB | vllm · sglang · tgi+1 | — |
| Tsinghua | SciGLM | 6.2B | dense | 8K | bf16 |  | 12.4 GB | vllm · tgi | — |
| Meta | SeamlessM4T | 2.3B | dense | 4K | bf16 |  | 4.6 GB | vllm | — |
| Hugging Face | SmolLM | 135M | dense | 2K | bf16 | — | 0.3 GB | vllm · tgi · ollama | — |
| Hugging Face | SmolLM | 360M | dense | 2K | bf16 | — | 0.7 GB | vllm · tgi · ollama | — |
| Hugging Face | SmolLM2 | 1.7B | dense | 8K | bf16 |  | 3.4 GB | vllm · sglang · tgi+1 | — |
| Snowflake | Arctic | 395B (17B active) | moe | 4K | bf16 |  | 790.0 GB | vllm · sglang | — |
| Snowflake | Arctic | 480B (17B active) | moe | 4K | bf16 |  | 960.0 GB | vllm · sglang | — |
| Upstage | SOLAR | 10.7B | dense | 4K | bf16 |  | 21.4 GB | vllm · sglang · tgi+1 | — |
| Upstage | Solar | 22B | dense | 4K | bf16 |  | 44.0 GB | vllm · sglang · tgi+2 | 15.0 |
| Stability AI | Stable Diffusion | 3.5B | dense | 77 | bf16 |  | 7.0 GB | — | — |
| Stability AI | StableLM 2 | 12.1B | dense | 4K | bf16 |  | 24.2 GB | vllm · sglang · tgi+1 | — |
| Stability AI | StableLM | 3B | dense | 4K | bf16 |  | 6.0 GB | vllm · sglang · tgi+1 | — |
| BigCode | StarCoder2 | 15.5B | dense | 16K | bf16 |  | 31.0 GB | vllm · sglang · tgi+1 | 42.0 |
| BigCode | StarCoder2 | 3.03B | dense | 16K | bf16 |  | 6.1 GB | vllm · sglang · tgi+1 | 29.0 |
| BigCode | StarCoder2 | 6.73B | dense | 16K | bf16 |  | 13.5 GB | vllm · sglang · tgi+2 | 35.0 |
| TinyLlama | TinyLlama | 1.1B | dense | 2K | bf16 | — | 2.2 GB | vllm · sglang · tgi+1 | — |
| TinyLlama | TinyLlama | 1.1B | dense | 2K | bf16 | — | 2.2 GB | vllm · sglang · tgi+2 | — |
| LMSYS | Vicuna | 13B | dense | 4K | bf16 |  | 26.0 GB | vllm · sglang · tgi+1 | — |
| LMSYS | Vicuna | 33B | dense | 2K | bf16 |  | 66.0 GB | vllm · sglang · tgi+1 | — |
| LMSYS | Vicuna | 7B | dense | 4K | bf16 |  | 14.0 GB | vllm · sglang · tgi+1 | — |
| NVIDIA | VILA | 13B | dense | 4K | bf16 |  | 26.0 GB | tensorrt-llm · vllm · sglang | 62.0 |
| NVIDIA | VILA | 3B | dense | 4K | bf16 |  | 6.0 GB | tensorrt-llm · vllm · sglang | 44.0 |
| NVIDIA | VILA | 40B | dense | 8K | bf16 |  | 80.0 GB | tensorrt-llm · vllm · sglang | 73.0 |
| OpenAI | Whisper | 74M | dense | 448 | bf16 |  | 0.1 GB | vllm · tensorrt-llm | — |
| OpenAI | Whisper | 1.55B | dense | 448 | bf16 |  | 3.1 GB | vllm · tensorrt-llm | — |
| OpenAI | Whisper | 769M | dense | 448 | bf16 |  | 1.5 GB | vllm · tensorrt-llm | — |
| OpenAI | Whisper | 244M | dense | 448 | bf16 |  | 0.5 GB | vllm · tensorrt-llm | — |
| WizardLM | WizardCoder | 33B | dense | 16K | bf16 |  | 66.0 GB | vllm · sglang · tgi+1 | — |
| Microsoft | WizardMath | 70B | dense | 4K | bf16 |  | 140.0 GB | vllm · sglang · tgi+1 | — |
| Yandex | YaLM | 100B | dense | 2K | bf16 |  | 200.0 GB | vllm · tgi | — |
| 01.AI | Yi 1.5 | 34.4B | dense | 200K | bf16 |  | 68.8 GB | vllm · sglang · tgi+1 | 72.0 |
| 01.AI | Yi 1.5 | 8.83B | dense | 4K | bf16 |  | 17.7 GB | vllm · sglang · tgi+2 | 62.0 |
| 01.AI | Yi | 6B | dense | 200K | bf16 |  | 12.0 GB | vllm · sglang · tgi+1 | — |
| 01.AI | Yi Coder | 8.8B | dense | 131K | bf16 |  | 17.6 GB | vllm · sglang · tgi+2 | — |
| 01.AI | Yi | 102.6B (24B active) | moe | 33K | bf16 |  | 205.2 GB | vllm · sglang | 74.0 |
| 01.AI | Yi | 200B (22B active) | moe | 16K | bf16 |  | 400.0 GB | vllm · sglang | — |
| Hugging Face | Zephyr | 7B | dense | 33K | bf16 |  | 14.0 GB | vllm · sglang · tgi+2 | — |
Showing 302 of 302 models
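
The VRAM figures in the table appear to follow a weights-only rule: total parameter count times bytes per parameter (2 bytes for bf16, 1 byte for fp8), with MoE models counted by total rather than active parameters. The Python sketch below reproduces that arithmetic under this assumption. The function names `parse_params_billions` and `estimate_vram_gb` are hypothetical and not part of the catalog, and the estimate leaves out KV cache, activations, and framework overhead, so real deployments will likely need more memory.

```python
import re

# Bytes per parameter for the two precisions that appear in the table.
# Assumption: the VRAM column counts model weights only.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}

# Convert the table's "M"/"B" suffixes into billions of parameters.
_SUFFIX_TO_BILLIONS = {"M": 1e-3, "B": 1.0}


def parse_params_billions(params: str) -> float:
    """Parse a Params cell such as '70.6B', '335M' or '671B (37B active)'.

    Returns the total parameter count in billions. The 'active' figure in
    parentheses is ignored, since every expert's weights must be loaded.
    """
    match = re.match(r"\s*([\d.]+)\s*([MB])", params)
    if match is None:
        raise ValueError(f"unrecognized Params value: {params!r}")
    value, suffix = match.groups()
    return float(value) * _SUFFIX_TO_BILLIONS[suffix]


def estimate_vram_gb(params: str, precision: str) -> float:
    """Weights-only VRAM estimate in GB, rounded to one decimal place."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unsupported precision: {precision!r}")
    billions = parse_params_billions(params)
    # 1 billion parameters at 1 byte each is treated as 1 GB.
    return round(billions * BYTES_PER_PARAM[precision], 1)


if __name__ == "__main__":
    print(estimate_vram_gb("671B (37B active)", "bf16"))
    print(estimate_vram_gb("685B (37B active)", "fp8"))
```

For instance, `estimate_vram_gb("671B (37B active)", "bf16")` gives 1342.0, which matches the DeepSeek R1 row, and the fp8 DeepSeek V3 row comes out at 685.0.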