▌Bad Cases of Top Models

chatgpt-4o-latest

o1-mini

o3-mini

gemini-2.0-pro

gemini-2.0-flash-thinking

Claude-3.5-Sonnet

qwen2.5-max

ERNIE-4.0

ERNIE-4.0-Turbo

xunfei-4.0Ultra

MiniMax-Text-01

Baichuan4

GLM-4-Plus

GLM-Zero-Preview

kimi-latest

Doubao-1.5-pro

deepseek-chat-v3

DeepSeek-R1

Llama-3.3-70B-Instruct

qwen2.5-72b-instruct

▌Bad Cases by Benchmark

Reasoning & Math

Language & Instruction Following

Education

Law & Civil Service

Medical

BBH subtasks