ALL EVALUATED HOSTS · [46] MODELS
Anthropic · Claude Opus 4.6 · 13.2/20
Anthropic · Claude Haiku 4.5 · 10.5/20
Anthropic · Claude Sonnet 4 · 11.1/20
Anthropic · Claude 3.5 Sonnet · 9.2/20
Anthropic · Claude Sonnet 4.6 · 14/20
Anthropic · Claude Opus 4.5 · 13.2/20
Anthropic · Claude 3.7 Sonnet · 10.8/20
Anthropic · Claude Opus 4.7 · 13.9/20
Google · Gemini 1.5 Pro · 8.4/20
Google · Gemini 2.5 Pro · 11.9/20
Google · Gemini 2.5 Flash · 10.6/20
Google · Gemma 4 · 14.4/20 · OPEN WEIGHTS
Google · Gemini 3.1 Pro · 16.1/20
Google · Gemma 3 · 11.1/20 · OPEN WEIGHTS
Google · Gemini 3 Pro · 14.7/20
Google · Gemini 2.5 Deep Think · 14.4/20
Google · Gemini 3 Flash · 12.7/20
OpenAI · o3 · 10.6/20
OpenAI · GPT-4o · 10.5/20
OpenAI · GPT-5 · 11.1/20
OpenAI · o1 · 8.8/20
OpenAI · GPT-5.4 · 15.2/20
OpenAI · GPT-5.2 · 14.9/20
OpenAI · GPT-5.3 Codex · 13.4/20
OpenAI · GPT-5.1 · 12.7/20
Cohere · Command R+ · 6.5/20 · COMMERCIAL USE RESTRICTED
xAI · Grok 3 · 10.2/20
xAI · Grok 4 · 10.8/20
xAI · Grok 4.20 · 13.9/20
Meta · Llama 3.1 405B · 9/20 · OPEN WEIGHTS
Meta · Llama 4 Maverick · 10.4/20 · OPEN WEIGHTS
Meta · Llama 4 Scout · 12.1/20 · OPEN WEIGHTS
Meta · Muse Spark · 14.4/20
Mistral · Mistral Large 2 · 9.5/20 · COMMERCIAL USE RESTRICTED
Mistral · Mistral Small 4 · 13.6/20 · OPEN WEIGHTS
Mistral · Mistral Large 3 · 12.1/20 · OPEN WEIGHTS
DeepSeek · DeepSeek R1 · 9.4/20 · OPEN WEIGHTS
DeepSeek · DeepSeek V3.2 · 14.2/20 · OPEN WEIGHTS
MiniMax · MiniMax M2.5 · 14.2/20
Zhipu AI · GLM-5 · 13.9/20 · OPEN WEIGHTS
Zhipu AI · GLM-5.1 · 14.1/20 · OPEN WEIGHTS
Alibaba · Qwen 3.5 · 14.1/20 · OPEN WEIGHTS
Alibaba · Qwen3 Max Instruct · 13.9/20 · OPEN WEIGHTS
Other · Kimi K2.5 · 14/20 · OPEN WEIGHTS
Other · Kimi K2 · 12.5/20 · OPEN WEIGHTS