Browse All Models

Explore all 302 text-generation models in our catalog, plus 8 specialized non-LLM models below. Filter by family, architecture, size, or capability. Click any model to see detailed specs, GPU requirements, and pricing. Prices, providers, and release dates live on the leaderboard.

Specialized models

Non-text modalities — TTS, image-gen, vision-embedding, video-fx, protein. These don't fit the LLM economics tables below and have their own pricing shapes (per-image, per-audio-second, etc.).

| Provider | Family | Params | Arch | Context | Precision | VRAM | Frameworks | Quality |
|---|---|---|---|---|---|---|---|---|
| Sentence Transformers | MiniLM | 23M | dense | 256 | bf16 | 0.0 GB | tgi · ollama | |
| NVIDIA | Alpamayo | 10B | dense | 8K | bf16 | 20.0 GB | vllm | 70.0 |
| Amazon | Nova | 12B | dense | 300K | bf16 | 24.0 GB | vllm | 35.0 |
| Amazon | Nova | 50B | dense | 300K | bf16 | 100.0 GB | vllm | 36.0 |
| Cohere | Aya | 35B | dense | 131K | bf16 | 70.0 GB | vllm · sglang · tgi +1 | |
| Cohere | Aya | 8B | dense | 8K | bf16 | 16.0 GB | vllm · sglang · tgi +2 | |
| Baichuan | Baichuan 2 | 13B | dense | 4K | bf16 | 26.0 GB | vllm · sglang · tgi | |
| Baichuan | Baichuan 2 | 7B | dense | 4K | bf16 | 14.0 GB | vllm · tgi | |
| BAAI | BGE | 110M | dense | 512 | bf16 | 0.2 GB | vllm · tgi | |
| BAAI | BGE | 335M | dense | 512 | bf16 | 0.7 GB | vllm · tgi · tensorrt-llm | |
| BAAI | BGE | 568M | dense | 8K | bf16 | 1.1 GB | vllm · tgi · tensorrt-llm | |
| BAAI | BGE | 33M | dense | 512 | bf16 | 0.1 GB | vllm · tgi | |
| BioMistral | BioMistral | 7.2B | dense | 33K | bf16 | 14.4 GB | vllm · sglang · tgi +1 | |
| Cerebras | BTLM | 3B | dense | 8K | bf16 | 6.0 GB | vllm · tgi | |
| NVIDIA | Canary | 1B | dense | 4K | bf16 | 2.0 GB | tensorrt-llm · vllm | |
| Cerebras | Cerebras GPT | 13B | dense | 2K | bf16 | 26.0 GB | vllm · tgi | |
| Tsinghua University | ChatGLM3 | 6B | dense | 131K | bf16 | 12.0 GB | vllm · sglang · tgi +1 | |
| Zhipu AI | ChatGLM | 9.4B | dense | 131K | bf16 | 18.8 GB | vllm · sglang · tgi | |
| Anthropic | Claude | 175B | dense | 200K | bf16 | 350.0 GB | | 80.0 |
| Anthropic | Claude | 70B | dense | 200K | bf16 | 140.0 GB | | 78.0 |
| Anthropic | Claude | 20B | dense | 200K | bf16 | 40.0 GB | | 67.0 |
| Anthropic | Claude | 175B (50B active) | moe | 200K | bf16 | 350.0 GB | vllm | |
| Anthropic | Claude | 30B | moe | 200K | bf16 | 60.0 GB | vllm | |
| Anthropic | Claude | 200B | dense | 200K | bf16 | 400.0 GB | | 90.0 |
| Anthropic | Claude | 300B (75B active) | moe | 200K | bf16 | 600.0 GB | vllm | 90.0 |
| Anthropic | Claude | 400B (80B active) | moe | 200K | bf16 | 800.0 GB | vllm | 90.0 |
| Anthropic | Claude | 450B (85B active) | moe | 1000K | bf16 | 900.0 GB | vllm | 90.0 |
| Anthropic | Claude | 500B (90B active) | moe | 1000K | bf16 | 1000.0 GB | vllm | 90.0 |
| Anthropic | Claude | 70B | dense | 200K | bf16 | 140.0 GB | | 86.0 |
| Anthropic | Claude | 150B (60B active) | moe | 200K | bf16 | 300.0 GB | vllm | 86.0 |
| Anthropic | Claude | 180B (70B active) | moe | 1000K | bf16 | 360.0 GB | vllm | 86.0 |
| Meta | Code Llama | 13B | dense | 16K | bf16 | 26.0 GB | vllm · sglang · tgi +2 | 44.0 |
| Meta | Code Llama | 34B | dense | 100K | bf16 | 68.0 GB | vllm · sglang · tgi +2 | 55.0 |
| Meta | Code Llama | 70B | dense | 16K | bf16 | 140.0 GB | vllm · sglang · tgi +1 | 60.0 |
| Meta | Code Llama | 7B | dense | 16K | bf16 | 14.0 GB | vllm · sglang · tgi +2 | 39.0 |
| Google | Gemma | 8.5B | dense | 8K | bf16 | 17.0 GB | vllm · sglang · tgi +1 | 52.0 |
| Salesforce | CodeGen2 | 16B | dense | 2K | bf16 | 32.0 GB | vllm · tgi | |
| Mistral AI | Codestral | 22B | dense | 33K | bf16 | 44.0 GB | vllm · sglang · tgi +1 | 63.0 |
| Mistral AI | Codestral | 7.3B | hybrid | 262K | bf16 | 14.6 GB | vllm · sglang | |
| THUDM | CogVLM2 | 19B | dense | 8K | bf16 | 38.0 GB | vllm · sglang · tgi | |
| Cohere | Embed | 500M | dense | 512 | bf16 | 1.0 GB | | |
| Cohere | Command | 111B | dense | 256K | bf16 | 222.0 GB | | 81.0 |
| Cohere | Command R | 35B | dense | 131K | bf16 | 70.0 GB | vllm · sglang · tgi +1 | 68.0 |
| Cohere | Command R | 35B | dense | 128K | bf16 | 70.0 GB | vllm · sglang · tgi | 68.0 |
| Cohere | Command R | 7B | dense | 131K | bf16 | 14.0 GB | vllm · sglang · tgi +2 | 68.0 |
| Cohere | Command R | 104B | dense | 131K | bf16 | 208.0 GB | vllm · sglang · tgi +1 | 68.0 |
| NVIDIA | Cosmos | 7B | dense | 4K | bf16 | 14.0 GB | tensorrt-llm | 60.0 |
| Sesame | CSM | 1B | dense | 4K | bf16 | 2.0 GB | ollama | |
| OpenAI | DALL-E | 3.5B | dense | 4K | bf16 | 7.0 GB | | |
| Databricks | DBRX | 132B (36B active) | moe | 33K | bf16 | 264.0 GB | vllm · sglang · tgi +1 | |
| Databricks | DBRX | 132B (36B active) | moe | 33K | bf16 | 264.0 GB | vllm · sglang · tgi +1 | |
| DeepSeek | DeepSeek Coder | 33B | dense | 16K | bf16 | 66.0 GB | vllm · sglang · tgi +1 | |
| DeepSeek | DeepSeek Coder | 6.7B | dense | 16K | bf16 | 13.4 GB | vllm · sglang · tgi +2 | |
| DeepSeek | DeepSeek Coder V2 | 236B (21B active) | moe | 131K | bf16 | 472.0 GB | vllm · sglang · tensorrt-llm | |
| DeepSeek | DeepSeek LLM | 67B | dense | 4K | bf16 | 134.0 GB | vllm · sglang · tgi +1 | 66.0 |
| DeepSeek | DeepSeek Math | 7.24B | dense | 4K | bf16 | 14.5 GB | vllm · sglang · tgi +2 | |
| DeepSeek | DeepSeek MoE | 16.4B (2.8B active) | moe | 4K | bf16 | 32.8 GB | vllm · sglang · tgi | |
| DeepSeek | DeepSeek R1 | 671B (37B active) | moe | 131K | bf16 | 1342.0 GB | vllm · sglang · tensorrt-llm | 88.0 |
| DeepSeek | DeepSeek R1 | 1.5B | dense | 131K | bf16 | 3.0 GB | vllm · sglang · tgi +1 | 42.0 |
| DeepSeek | DeepSeek R1 | 14.8B | dense | 131K | bf16 | 29.6 GB | vllm · sglang · tgi +2 | 88.0 |
| DeepSeek | DeepSeek R1 | 32.8B | dense | 131K | bf16 | 65.6 GB | vllm · sglang · tgi +2 | 88.0 |
| DeepSeek | DeepSeek R1 | 70.6B | dense | 131K | bf16 | 141.2 GB | vllm · sglang · tgi +1 | 88.0 |
| DeepSeek | DeepSeek R1 | 8B | dense | 131K | bf16 | 16.0 GB | vllm · sglang · tgi +2 | 88.0 |
| DeepSeek | DeepSeek V2 | 15.7B (2.4B active) | moe | 33K | bf16 | 31.4 GB | vllm · sglang · tgi | |
| DeepSeek | DeepSeek V2 | 236B (21B active) | moe | 131K | bf16 | 472.0 GB | vllm · sglang · tensorrt-llm | 78.0 |
| DeepSeek | DeepSeek V3 | 671B (37B active) | moe | 131K | bf16 | 1342.0 GB | vllm · sglang · tensorrt-llm | 81.0 |
| DeepSeek | DeepSeek V3 | 685B (37B active) | moe | 131K | fp8 | 685.0 GB | vllm · sglang · tensorrt-llm | 81.0 |
| Cognitive Computations | Dolphin | 72B | dense | 33K | bf16 | 144.0 GB | vllm · sglang · tgi +1 | |
| NVIDIA | Eagle | 1.3B | dense | 4K | bf16 | 2.6 GB | vllm · ollama | 65.0 |
| NVIDIA | Eagle | 9B | dense | 8K | bf16 | 18.0 GB | vllm · tensorrt-llm | 65.0 |
| NVIDIA | Eagle | 8B | dense | 16K | bf16 | 16.0 GB | vllm · tensorrt-llm | 65.0 |
| ELYZA | ELYZA | 13B | dense | 4K | bf16 | 26.0 GB | vllm · tgi · ollama | |
| TII UAE | Falcon | 11B | dense | 8K | bf16 | 22.0 GB | vllm · sglang · tgi | |
| TII | Falcon | 180B | dense | 2K | bf16 | 360.0 GB | vllm · sglang · tgi +1 | 60.0 |
| TII UAE | Falcon | 10.3B | dense | 33K | bf16 | 20.6 GB | vllm · sglang · tgi +2 | |
| TII UAE | Falcon | 1B | dense | 8K | bf16 | 2.0 GB | vllm · sglang · tgi +2 | |
| TII UAE | Falcon | 3B | dense | 8K | bf16 | 6.0 GB | vllm · sglang · tgi +2 | |
| TII UAE | Falcon | 7.5B | dense | 33K | bf16 | 15.0 GB | vllm · sglang · tgi +2 | |
| TII | Falcon | 40B | dense | 2K | bf16 | 80.0 GB | vllm · sglang · tgi +1 | 48.0 |
| TII | Falcon | 7B | dense | 2K | bf16 | 14.0 GB | vllm · sglang · tgi +2 | 37.0 |
| TII | Falcon Mamba | 7.27B | hybrid | 8K | bf16 | 14.5 GB | vllm · sglang | |
| AI4Finance | FinGPT | 7.2B | dense | 4K | bf16 | 14.4 GB | vllm · tgi · ollama | |
| Microsoft | Florence | 770M | dense | 2K | bf16 | 1.5 GB | vllm | |
| Black Forest Labs | FLUX | 12B | dense | 512 | bf16 | 24.0 GB | | |
| Black Forest Labs | FLUX | 12B | dense | 4K | bf16 | 24.0 GB | vllm · tensorrt-llm | |
| Google | Gemini | 50B (12B active) | moe | 1049K | bf16 | 100.0 GB | | 75.0 |
| Google | Gemini | 175B (40B active) | moe | 2097K | bf16 | 350.0 GB | | 80.0 |
| Google | Gemini | 50B (15B active) | moe | 1049K | bf16 | 100.0 GB | | 80.0 |
| Google | Gemini | 600B (150B active) | moe | 2000K | bf16 | 1200.0 GB | | 88.0 |
| Google DeepMind | Gemini | 600B (100B active) | moe | 1000K | bf16 | 1200.0 GB | vllm | |
| Google | Gemma | 2.5B | dense | 8K | bf16 | 5.0 GB | vllm · sglang · tgi +2 | |
| Google | Gemma 2 | 27B | dense | 8K | bf16 | 54.0 GB | vllm · sglang · tgi +2 | 65.0 |
| Google | Gemma 2 | 2.6B | dense | 8K | bf16 | 5.2 GB | vllm · sglang · tgi +2 | 44.0 |
| Google | Gemma 2 | 9.2B | dense | 8K | bf16 | 18.4 GB | vllm · sglang · tgi +2 | 68.0 |
| Google | Gemma 3 | 12B | dense | 131K | bf16 | 24.0 GB | vllm · sglang · tgi +2 | 71.0 |
| Google | Gemma 3 | 1B | dense | 33K | bf16 | 2.0 GB | vllm · sglang · tgi +2 | 35.0 |
| Google | Gemma 3 | 27B | dense | 131K | bf16 | 54.0 GB | vllm · sglang · tgi +2 | 69.0 |
| Google | Gemma 3 | 2B | dense | 8K | bf16 | 4.0 GB | vllm · sglang · tgi +1 | 42.0 |
| Google | Gemma 3 | 4.3B | dense | 131K | bf16 | 8.6 GB | vllm · sglang · tgi +2 | 54.0 |
| Google | Gemma 4 | 31B | dense | 33K | bf16 | 62.0 GB | vllm · sglang · tgi +2 | 77.0 |
| Sberbank | GigaChat | 20B | dense | 8K | bf16 | 40.0 GB | vllm · tgi | |
| Zhipu AI | GLM-4 | 9.4B | dense | 131K | bf16 | 18.8 GB | vllm · sglang · tgi +1 | |
| Zhipu AI | GLM-5 | 200B | dense | 128K | bf16 | 400.0 GB | vllm · sglang · tgi | 51.0 |
| OpenAI | GPT-3.5 | 20B | dense | 16K | bf16 | 40.0 GB | | 67.0 |
| OpenAI | GPT-4 | 200B (50B active) | moe | 128K | bf16 | 400.0 GB | | 80.0 |
| OpenAI | GPT | 1500B (300B active) | moe | 128K | bf16 | 3000.0 GB | | 93.0 |
| OpenAI | GPT-4 | 200B (50B active) | moe | 128K | bf16 | 400.0 GB | | 85.0 |
| OpenAI | GPT-4 | 8B | dense | 128K | bf16 | 16.0 GB | | 72.0 |
| OpenAI | GPT | 500B (90B active) | moe | 400K | bf16 | 1000.0 GB | vllm | |
| OpenAI | GPT | 80B (25B active) | moe | 400K | bf16 | 160.0 GB | vllm | |
| OpenAI | GPT | 8B (4B active) | moe | 400K | bf16 | 16.0 GB | vllm | |
| OpenAI | GPT | 700B (110B active) | moe | 1000K | bf16 | 1400.0 GB | vllm | |
| xAI | Grok | 600B (120B active) | moe | 131K | bf16 | 1200.0 GB | | 90.0 |
| xAI | Grok | 400B (80B active) | moe | 256K | bf16 | 800.0 GB | vllm | |
| xAI | Grok | 500B (90B active) | moe | 1000K | bf16 | 1000.0 GB | vllm | |
| xAI | Grok | 314B (50B active) | moe | 131K | bf16 | 628.0 GB | vllm | 78.0 |
| xAI | Grok | 314B | dense | 131K | bf16 | 628.0 GB | vllm | 91.0 |
| xAI | Grok | 33B | dense | 131K | bf16 | 66.0 GB | vllm | 78.0 |
| Alibaba | GTE | 7.6B | dense | 33K | bf16 | 15.2 GB | vllm · sglang · tgi +1 | |
| H2O.ai | H2O Danube | 500M | dense | 8K | bf16 | 1.0 GB | vllm · sglang · tgi +1 | |
| NVIDIA | Llama 3.1 | 70.6B | dense | 131K | bf16 | 141.2 GB | tensorrt-llm · vllm · sglang | 82.0 |
| Nous Research | Hermes 3 | 70.6B | dense | 131K | bf16 | 141.2 GB | vllm · sglang · tgi +1 | |
| Nous Research | Hermes 3 | 8.03B | dense | 131K | bf16 | 16.1 GB | vllm · sglang · tgi +2 | |
| Inflection AI | Inflection | 100B | dense | 8K | bf16 | 200.0 GB | | 74.0 |
| Microsoft SAIL | InfoXLM | 550M | dense | 512 | bf16 | 1.1 GB | tgi | |
| Shanghai AI Lab | InternLM 2.5 | 19.9B | dense | 262K | bf16 | 39.8 GB | vllm · sglang · tgi | |
| Shanghai AI Lab | InternLM 2.5 | 7.74B | dense | 1049K | bf16 | 15.5 GB | vllm · sglang · tgi +1 | |
| SenseTime | InternLM | 20B | dense | 16K | bf16 | 40.0 GB | vllm · tgi | |
| Shanghai AI Lab | InternLM | 8B | dense | 33K | bf16 | 16.0 GB | vllm · sglang · tgi +1 | |
| InternLM | InternVL2 | 26B | dense | 33K | bf16 | 52.0 GB | vllm · sglang · tgi | |
| G42/Inception | JAIS | 30B | dense | 8K | bf16 | 60.0 GB | vllm · tgi | |
| AI21 | Jamba | 398B | hybrid | 256K | bf16 | 796.0 GB | vllm · sglang | |
| AI21 | Jamba | 52B | hybrid | 256K | bf16 | 104.0 GB | vllm · sglang | |
| AI21 | Jamba | 52B (12B active) | moe | 256K | bf16 | 104.0 GB | | 66.0 |
| DeepSeek | Janus | 7B | dense | 8K | bf16 | 14.0 GB | vllm · ollama | 62.0 |
| Stability AI | StableLM | 70B | dense | 8K | bf16 | 140.0 GB | vllm · sglang · tgi | |
| Jina AI | Jina Embeddings | 570M | dense | 8K | bf16 | 1.1 GB | tgi · tensorrt-llm | |
| Moonshot AI | Kimi | 1000B (32B active) | moe | 131K | fp8 | 1000.0 GB | vllm · sglang | 54.0 |
| Hexagrad | Kokoro | 82M | dense | 2K | bf16 | 0.2 GB | ollama | |
| Korea University | KULLM | 12.8B | dense | 4K | bf16 | 25.6 GB | vllm · tgi | |
| Meta | Llama 2 | 13B | dense | 4K | bf16 | 26.0 GB | vllm · sglang · tgi +2 | 47.0 |
| Meta | Llama 2 | 70B | dense | 4K | bf16 | 140.0 GB | vllm · sglang · tgi +1 | 62.0 |
| Meta | Llama 2 | 7B | dense | 4K | bf16 | 14.0 GB | vllm · sglang · tgi +2 | 40.0 |
| Meta | Llama 3 | 70.6B | dense | 8K | bf16 | 141.2 GB | vllm · sglang · tgi +1 | 80.0 |
| Gradient | Llama 3 | 70.6B | dense | 1049K | bf16 | 141.2 GB | vllm · sglang | |
| Meta | Llama 3 | 8B | dense | 8K | bf16 | 16.0 GB | vllm · sglang · tgi +2 | 63.0 |
| Meta | Llama 3.1 | 405B | dense | 131K | bf16 | 810.0 GB | vllm · sglang · tgi +1 | 81.0 |
| Meta | Llama 3.1 | 70.6B | dense | 131K | bf16 | 141.2 GB | vllm · sglang · tgi +1 | 75.0 |
| Together AI | Llama 3.1 | 70.6B | dense | 131K | fp8 | 70.6 GB | vllm · sglang · tensorrt-llm | |
| Meta | Llama 3.1 | 8.03B | dense | 131K | bf16 | 16.1 GB | vllm · sglang · tgi +2 | 58.0 |
| NVIDIA | Llama 3.1 | 51B | dense | 131K | bf16 | 102.0 GB | tensorrt-llm · vllm · sglang | 78.0 |
| NVIDIA | Llama 3.1 | 70.6B | dense | 131K | bf16 | 141.2 GB | vllm · sglang · tgi +1 | 83.0 |
| NVIDIA | Llama 3.1 | 70.6B | dense | 131K | bf16 | 141.2 GB | tensorrt-llm · vllm · sglang | 80.0 |
| Meta | Llama 3.2 | 11B | dense | 131K | bf16 | 22.0 GB | vllm · sglang · tgi +2 | 9.0 |
| Meta | Llama 3.2 | 1.24B | dense | 131K | bf16 | 2.5 GB | vllm · sglang · tgi +2 | 38.0 |
| Meta | Llama 3.2 | 3.21B | dense | 131K | bf16 | 6.4 GB | vllm · sglang · tgi +2 | 55.0 |
| Meta | Llama 3.2 | 90B | dense | 131K | bf16 | 180.0 GB | vllm · sglang · tgi +1 | 84.0 |
| Meta | Llama 3.2 | 88.8B | dense | 131K | bf16 | 177.6 GB | vllm · sglang · tgi +1 | 84.0 |
| Meta | Llama 3.3 | 70.6B | dense | 131K | bf16 | 141.2 GB | vllm · sglang · tgi +2 | 77.0 |
| Meta | Llama 3.3 | 8B | dense | 131K | bf16 | 16.0 GB | vllm · sglang · tgi +2 | |
| Meta | Llama 4 | 2000B (400B active) | moe | 1049K | bf16 | 4000.0 GB | | 93.0 |
| Meta | Llama 4 | 400B (17B active) | moe | 1049K | bf16 | 800.0 GB | vllm · sglang · tensorrt-llm | 84.0 |
| Meta | Llama 4 | 109B (17B active) | moe | 10486K | bf16 | 218.0 GB | vllm · sglang · tensorrt-llm | 73.0 |
| Meta | Llama Guard | 1B | dense | 131K | bf16 | 2.0 GB | vllm · sglang · tgi +1 | |
| Meta | Llama Guard | 8B | dense | 131K | bf16 | 16.0 GB | vllm · sglang · tgi +2 | |
| Alibaba | Marco | 7.6B | dense | 66K | bf16 | 15.2 GB | vllm · sglang · tgi +1 | |
| EPFL | Meditron | 70B | dense | 4K | bf16 | 140.0 GB | vllm · sglang · tgi +1 | |
| NVIDIA | Megatron-Turing | 530B | dense | 2K | bf16 | 1060.0 GB | tensorrt-llm · vllm | 58.0 |
| MiniMax | MiniMax M | 456B (45.9B active) | moe | 1049K | bf16 | 912.0 GB | vllm · sglang | 82.0 |
| MiniMax | MiniMax-M2 | 229B (7B active) | moe | 197K | fp8 | 229.0 GB | vllm · sglang | |
| MiniMax | MiniMax-M2.1 | 229B (7B active) | moe | 197K | fp8 | 229.0 GB | vllm · sglang | |
| MiniMax | MiniMax-M2 | 229B (7B active) | moe | 197K | fp8 | 229.0 GB | vllm · sglang | |
| MiniMax | MiniMax | 456B (45.9B active) | moe | 1049K | fp8 | 456.0 GB | vllm · sglang | |
| Mistral AI | Ministral | 8B | dense | 131K | bf16 | 16.0 GB | vllm · sglang · tgi +2 | 15.0 |
| NVIDIA | Nemotron | 4B | dense | 8K | bf16 | 8.0 GB | tensorrt-llm · vllm · sglang | 50.0 |
| NVIDIA | Nemotron | 8B | dense | 8K | bf16 | 16.0 GB | tensorrt-llm · vllm · sglang | 62.0 |
| Mistral AI | Mistral | 7.3B | dense | 33K | bf16 | 14.6 GB | vllm · sglang · tgi +2 | 56.0 |
| Mistral AI | Mistral Large | 123B | dense | 131K | bf16 | 246.0 GB | vllm · sglang · tgi +1 | 75.0 |
| Mistral AI | Mistral Large | 123B | dense | 131K | bf16 | 246.0 GB | vllm · sglang · tgi +1 | 75.0 |
| Mistral AI | Mistral | 70B | dense | 131K | bf16 | 140.0 GB | vllm · sglang · tgi +1 | 80.0 |
| Mistral AI | Mistral Nemo | 12B | dense | 131K | bf16 | 24.0 GB | vllm · sglang · tgi +2 | 62.0 |
| Mistral AI | Mistral Small | 24B | dense | 33K | bf16 | 48.0 GB | vllm · sglang · tgi +2 | 68.0 |
| Mistral AI | Mistral Small | 24B | dense | 131K | bf16 | 48.0 GB | vllm · sglang · tgi +2 | |
| Mistral AI | Mixtral | 141B (39B active) | moe | 66K | bf16 | 282.0 GB | vllm · sglang · tgi +1 | 65.0 |
| Mistral AI | Mixtral | 46.7B (12.9B active) | moe | 33K | bf16 | 93.4 GB | vllm · sglang · tgi +2 | 67.0 |
| Mistral AI | Mixtral | 46.7B (12.9B active) | moe | 33K | bf16 | 93.4 GB | vllm · sglang · tgi +2 | 69.0 |
| intfloat | intfloat | 10.6B | dense | 131K | bf16 | 21.2 GB | vllm | |
| Allen AI | Molmo | 72B | dense | 8K | bf16 | 144.0 GB | vllm · sglang | 78.0 |
| Vikhyat | Moondream | 1.86B | dense | 2K | bf16 | 3.7 GB | ollama · vllm | |
| MosaicML | MPT | 30B | dense | 8K | bf16 | 60.0 GB | vllm · tgi | 48.0 |
| MosaicML | MPT | 6.7B | dense | 66K | bf16 | 13.4 GB | vllm · tgi · ollama | 36.0 |
| Microsoft | E5 | 560M | dense | 512 | bf16 | 1.1 GB | vllm · tgi | |
| intfloat | intfloat | 600M | dense | 514 | bf16 | 1.2 GB | vllm | |
| Rinna | Nekomata | 14B | dense | 4K | bf16 | 28.0 GB | vllm · tgi | |
| NVIDIA | Nemotron | 15B | dense | 4K | bf16 | 30.0 GB | vllm · sglang · tensorrt-llm | 72.0 |
| NVIDIA | Nemotron | 340B | dense | 131K | bf16 | 680.0 GB | tensorrt-llm · vllm · sglang | 85.0 |
| NVIDIA | Nemotron | 70.6B | dense | 131K | bf16 | 141.2 GB | vllm · sglang · tensorrt-llm | 83.0 |
| NVIDIA | Nemotron | 4B | dense | 8K | bf16 | 8.0 GB | tensorrt-llm · vllm · sglang | 48.0 |
| NVIDIA | Nemotron | 253B | dense | 131K | bf16 | 506.0 GB | vllm · tensorrt-llm | 86.0 |
| NVIDIA | Nemotron | 120B | dense | 131K | bf16 | 240.0 GB | vllm · sglang · tensorrt-llm | 84.0 |
| Nomic AI | Nomic Embed | 137M | dense | 8K | bf16 | 0.3 GB | vllm · tgi · ollama | |
| NVIDIA | NV Embed | 7.85B | dense | 33K | bf16 | 15.7 GB | vllm · sglang · tgi +1 | |
| NVIDIA | NV EmbedQA | 330M | dense | 512 | bf16 | 0.7 GB | tensorrt-llm · vllm | |
| NVIDIA | NV EmbedQA | 7.24B | dense | 33K | bf16 | 14.5 GB | tensorrt-llm · vllm · sglang | |
| NVIDIA | NV Retriever | 330M | dense | 512 | bf16 | 0.7 GB | tensorrt-llm · vllm | |
| NVIDIA | NVLM | 72B | dense | 33K | bf16 | 144.0 GB | vllm · tensorrt-llm | 79.0 |
| OpenAI | o1 | 200B (50B active) | moe | 200K | bf16 | 400.0 GB | | 93.0 |
| OpenAI | o1 | 70B | dense | 128K | bf16 | 140.0 GB | | 83.0 |
| OpenAI | o3 | 70B | dense | 200K | bf16 | 140.0 GB | | 86.0 |
| BigCode | OctoCoder | 15.5B | dense | 8K | bf16 | 31.0 GB | vllm · sglang · tgi | |
| Allen AI | OLMo 2 | 13B | dense | 4K | bf16 | 26.0 GB | vllm · sglang · tgi +1 | |
| Allen AI | OLMo 2 | 7B | dense | 4K | bf16 | 14.0 GB | vllm · sglang · tgi +1 | |
| Apple | OpenELM | 3B | dense | 2K | bf16 | 6.0 GB | vllm · sglang · ollama | |
| Teknium | OpenHermes | 7B | dense | 33K | bf16 | 14.0 GB | vllm · sglang · tgi +1 | |
| Microsoft | Orca | 13B | dense | 4K | bf16 | 26.0 GB | vllm · sglang · tgi +2 | |
| Google | PaLI-Gemma | 2.9B | dense | 8K | bf16 | 5.8 GB | vllm · tgi | |
| NVIDIA | Parakeet | 600M | dense | 4K | bf16 | 1.2 GB | tensorrt-llm · vllm | |
| NVIDIA | Parakeet | 1.1B | dense | 4K | bf16 | 2.2 GB | tensorrt-llm · vllm | |
| Microsoft | Phi | 1.3B | dense | 2K | bf16 | 2.6 GB | vllm · tgi · ollama | 38.0 |
| Microsoft | Phi | 1.3B | dense | 2K | bf16 | 2.6 GB | vllm · sglang · tgi +1 | 38.0 |
| Microsoft | Phi | 2.7B | dense | 2K | bf16 | 5.4 GB | vllm · sglang · tgi +2 | |
| Microsoft | Phi 3 | 14B | dense | 131K | bf16 | 28.0 GB | vllm · sglang · tgi +2 | 76.0 |
| Microsoft | Phi 3 | 3.8B | dense | 131K | bf16 | 7.6 GB | vllm · sglang · tgi +2 | 64.0 |
| Microsoft | Phi 3 | 7B | dense | 131K | bf16 | 14.0 GB | vllm · sglang · tgi +2 | 72.0 |
| Microsoft | Phi | 41.9B (6.6B active) | moe | 131K | bf16 | 83.8 GB | vllm · sglang · tgi +1 | 74.0 |
| Microsoft | Phi 3.5 | 4.2B | dense | 131K | bf16 | 8.4 GB | vllm · sglang · tgi +2 | |
| Microsoft | Phi | 3.8B | dense | 131K | bf16 | 7.6 GB | vllm · sglang · tgi +1 | 70.0 |
| Microsoft | Phi | 14.7B | dense | 16K | bf16 | 29.4 GB | vllm · sglang · tgi +2 | 73.0 |
| Mistral AI | Pixtral | 12B | dense | 131K | bf16 | 24.0 GB | vllm · sglang · tgi +1 | |
| KAIST | Prometheus | 7.24B | dense | 8K | bf16 | 14.5 GB | vllm · sglang · tgi | |
| Alibaba | Qwen 1.5 | 14.3B (2.7B active) | moe | 33K | bf16 | 28.6 GB | vllm · sglang · tgi | |
| Alibaba | Qwen 2 | 7.6B | dense | 33K | bf16 | 15.2 GB | vllm · sglang · tgi | |
| Alibaba | Qwen 2 VL | 2.2B | dense | 33K | bf16 | 4.4 GB | vllm · sglang · tgi | |
| Alibaba | Qwen 2.5 | 500M | dense | 33K | bf16 | 1.0 GB | vllm · sglang · tgi +1 | |
| Alibaba | Qwen 2.5 | 1.5B | dense | 33K | bf16 | 3.0 GB | vllm · sglang · tgi +1 | |
| Alibaba | Qwen 2.5 | 14.8B | dense | 131K | bf16 | 29.6 GB | vllm · sglang · tgi +1 | 76.0 |
| Alibaba | Qwen 2.5 | 32.5B | dense | 131K | bf16 | 65.0 GB | vllm · sglang · tgi +1 | 73.0 |
| Alibaba | Qwen 2.5 | 3.09B | dense | 33K | bf16 | 6.2 GB | vllm · sglang · tgi +1 | 58.0 |
| Alibaba | Qwen 2.5 | 72.7B | dense | 131K | bf16 | 145.4 GB | vllm · sglang · tgi +1 | 77.0 |
| Alibaba | Qwen 2.5 | 7.6B | dense | 131K | bf16 | 15.2 GB | vllm · sglang · tgi +2 | 70.0 |
| Alibaba | Qwen 2.5 Coder | 1.5B | dense | 33K | bf16 | 3.0 GB | vllm · sglang · tgi +1 | 40.0 |
| Alibaba | Qwen 2.5 Coder | 14.7B | dense | 131K | bf16 | 29.4 GB | vllm · sglang · tgi +2 | |
| Alibaba | Qwen 2.5 | 32.5B | dense | 131K | bf16 | 65.0 GB | vllm · sglang · tgi +1 | 80.0 |
| Alibaba | Qwen 2.5 Coder | 32.5B | dense | 131K | bf16 | 65.0 GB | vllm · sglang · tgi +2 | |
| Alibaba | Qwen 2.5 Coder | 3.1B | dense | 33K | bf16 | 6.2 GB | vllm · sglang · tgi +1 | 50.0 |
| Alibaba | Qwen 2.5 Coder | 7.6B | dense | 131K | bf16 | 15.2 GB | vllm · sglang · tgi +2 | |
| Alibaba | Qwen 2.5 Math | 72.7B | dense | 4K | bf16 | 145.4 GB | vllm · sglang · tgi +1 | |
| Alibaba | Qwen 2.5 Math | 7.6B | dense | 4K | bf16 | 15.2 GB | vllm · sglang · tgi +2 | |
| Alibaba | Qwen 2.5 VL | 72.7B | dense | 131K | bf16 | 145.4 GB | vllm · sglang · tgi +1 | |
| Alibaba | Qwen 2.5 VL | 7.6B | dense | 131K | bf16 | 15.2 GB | vllm · sglang · tgi +2 | |
| Alibaba | Qwen 3 | 600M | dense | 131K | bf16 | 1.2 GB | vllm · sglang · tgi +2 | |
| Alibaba | Qwen 3 | 1.7B | dense | 131K | bf16 | 3.4 GB | vllm · sglang · tgi +2 | |
| Alibaba | Qwen 3 | 235B (22B active) | moe | 131K | bf16 | 470.0 GB | vllm · sglang · tensorrt-llm | 83.0 |
| Alibaba | Qwen 3 | 30.5B (3.3B active) | moe | 131K | bf16 | 61.0 GB | vllm · sglang · tgi +1 | 70.0 |
| Alibaba | Qwen 3 | 32.8B | dense | 131K | bf16 | 65.6 GB | vllm · sglang · tgi +2 | 74.0 |
| Alibaba | Qwen 3 | 4B | dense | 131K | bf16 | 8.0 GB | vllm · sglang · tgi +2 | 57.0 |
| Alibaba | Qwen 3 | 8.2B | dense | 131K | bf16 | 16.4 GB | vllm · sglang · tgi +2 | 70.0 |
| Alibaba | Qwen 3 Coder | 8.2B | dense | 131K | bf16 | 16.4 GB | vllm · sglang · ollama | 74.0 |
| Qwen | Qwen3 | 235B (22B active) | moe | 262K | bf16 | 470.0 GB | vllm · sglang | |
| Google | RecurrentGemma | 2.7B | dense | 8K | bf16 | 5.4 GB | vllm · sglang | |
| Reka AI | Reka | 70B | dense | 128K | bf16 | 140.0 GB | | 76.0 |
| Replit | Replit Code | 3.3B | dense | 4K | bf16 | 6.6 GB | vllm · tgi | |
| RWKV Foundation | RWKV | 14.1B | hybrid | 33K | bf16 | 28.2 GB | vllm | |
| BigCode | SantaCoder | 1.1B | dense | 2K | bf16 | 2.2 GB | vllm · tgi | |
| Equall.ai | SaulLM | 7.2B | dense | 8K | bf16 | 14.4 GB | vllm · sglang · tgi +1 | |
| Tsinghua | SciGLM | 6.2B | dense | 8K | bf16 | 12.4 GB | vllm · tgi | |
| Meta | SeamlessM4T | 2.3B | dense | 4K | bf16 | 4.6 GB | vllm | |
| Hugging Face | SmolLM | 135M | dense | 2K | bf16 | 0.3 GB | vllm · tgi · ollama | |
| Hugging Face | SmolLM | 360M | dense | 2K | bf16 | 0.7 GB | vllm · tgi · ollama | |
| Hugging Face | SmolLM2 | 1.7B | dense | 8K | bf16 | 3.4 GB | vllm · sglang · tgi +1 | |
| Snowflake | Arctic | 395B (17B active) | moe | 4K | bf16 | 790.0 GB | vllm · sglang | |
| Snowflake | Arctic | 480B (17B active) | moe | 4K | bf16 | 960.0 GB | vllm · sglang | |
| Upstage | SOLAR | 10.7B | dense | 4K | bf16 | 21.4 GB | vllm · sglang · tgi +1 | |
| Upstage | Solar | 22B | dense | 4K | bf16 | 44.0 GB | vllm · sglang · tgi +2 | 15.0 |
| Stability AI | Stable Diffusion | 3.5B | dense | 77 | bf16 | 7.0 GB | | |
| Stability AI | StableLM 2 | 12.1B | dense | 4K | bf16 | 24.2 GB | vllm · sglang · tgi +1 | |
| Stability AI | StableLM | 3B | dense | 4K | bf16 | 6.0 GB | vllm · sglang · tgi +1 | |
| BigCode | StarCoder2 | 15.5B | dense | 16K | bf16 | 31.0 GB | vllm · sglang · tgi +1 | 42.0 |
| BigCode | StarCoder2 | 3.03B | dense | 16K | bf16 | 6.1 GB | vllm · sglang · tgi +1 | 29.0 |
| BigCode | StarCoder2 | 6.73B | dense | 16K | bf16 | 13.5 GB | vllm · sglang · tgi +2 | 35.0 |
| TinyLlama | TinyLlama | 1.1B | dense | 2K | bf16 | 2.2 GB | vllm · sglang · tgi +1 | |
| TinyLlama | TinyLlama | 1.1B | dense | 2K | bf16 | 2.2 GB | vllm · sglang · tgi +2 | |
| LMSYS | Vicuna | 13B | dense | 4K | bf16 | 26.0 GB | vllm · sglang · tgi +1 | |
| LMSYS | Vicuna | 33B | dense | 2K | bf16 | 66.0 GB | vllm · sglang · tgi +1 | |
| LMSYS | Vicuna | 7B | dense | 4K | bf16 | 14.0 GB | vllm · sglang · tgi +1 | |
| NVIDIA | VILA | 13B | dense | 4K | bf16 | 26.0 GB | tensorrt-llm · vllm · sglang | 62.0 |
| NVIDIA | VILA | 3B | dense | 4K | bf16 | 6.0 GB | tensorrt-llm · vllm · sglang | 44.0 |
| NVIDIA | VILA | 40B | dense | 8K | bf16 | 80.0 GB | tensorrt-llm · vllm · sglang | 73.0 |
| OpenAI | Whisper | 74M | dense | 448 | bf16 | 0.1 GB | vllm · tensorrt-llm | |
| OpenAI | Whisper | 1.55B | dense | 448 | bf16 | 3.1 GB | vllm · tensorrt-llm | |
| OpenAI | Whisper | 769M | dense | 448 | bf16 | 1.5 GB | vllm · tensorrt-llm | |
| OpenAI | Whisper | 244M | dense | 448 | bf16 | 0.5 GB | vllm · tensorrt-llm | |
| WizardLM | WizardCoder | 33B | dense | 16K | bf16 | 66.0 GB | vllm · sglang · tgi +1 | |
| Microsoft | WizardMath | 70B | dense | 4K | bf16 | 140.0 GB | vllm · sglang · tgi +1 | |
| Yandex | YaLM | 100B | dense | 2K | bf16 | 200.0 GB | vllm · tgi | |
| 01.AI | Yi 1.5 | 34.4B | dense | 200K | bf16 | 68.8 GB | vllm · sglang · tgi +1 | 72.0 |
| 01.AI | Yi 1.5 | 8.83B | dense | 4K | bf16 | 17.7 GB | vllm · sglang · tgi +2 | 62.0 |
| 01.AI | Yi | 6B | dense | 200K | bf16 | 12.0 GB | vllm · sglang · tgi +1 | |
| 01.AI | Yi Coder | 8.8B | dense | 131K | bf16 | 17.6 GB | vllm · sglang · tgi +2 | |
| 01.AI | Yi | 102.6B (24B active) | moe | 33K | bf16 | 205.2 GB | vllm · sglang | 74.0 |
| 01.AI | Yi | 200B (22B active) | moe | 16K | bf16 | 400.0 GB | vllm · sglang | |
| Hugging Face | Zephyr | 7B | dense | 33K | bf16 | 14.0 GB | vllm · sglang · tgi +2 | |

Showing 302 of 302 models
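
A note on reading the VRAM column: the listed figures appear to equal the total parameter count times the bytes per parameter (2 for bf16, 1 for fp8). If so, they are weights-only estimates and exclude KV cache and activation memory. The sketch below reproduces that arithmetic under this assumption. The function names are hypothetical and are not part of the catalog.

```python
# Assumption: VRAM = total params x bytes per param (weights only).
# The two precisions below are the only ones that appear in the table.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def parse_params_billions(text: str) -> float:
    """Convert a Params cell such as '70.6B', '335M' or
    '671B (37B active)' into billions of total parameters."""
    total = text.split()[0]  # for MoE rows, keep the total and drop '(… active)'
    value, unit = float(total[:-1]), total[-1].upper()
    if unit == "B":
        return value
    if unit == "M":
        return value / 1000
    raise ValueError(f"unrecognised parameter unit in {text!r}")


def estimate_weight_vram_gb(params: str, precision: str) -> float:
    """Weights-only memory estimate in GB, rounded to one decimal
    like the VRAM column. Excludes KV cache and activations."""
    return round(parse_params_billions(params) * BYTES_PER_PARAM[precision], 1)


print(estimate_weight_vram_gb("10B", "bf16"))                # 20.0
print(estimate_weight_vram_gb("685B (37B active)", "fp8"))   # 685.0
print(estimate_weight_vram_gb("335M", "bf16"))               # 0.7
```

The three calls correspond to the NVIDIA Alpamayo 10B, DeepSeek V3 fp8, and BAAI BGE 335M rows. For MoE rows the estimate uses total rather than active parameters, since all experts must be resident in memory. Real deployments typically need additional headroom beyond this figure.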