Badcases of Top Models

gpt-4o
gemini-1.5-pro
qwen-max
ERNIE-4.0
ERNIE-4.0-Turbo
xunfei-4.0Ultra
Baichuan4
GLM4
GLM-4-Plus
yi-large
moonshot-v1-8k
Doubao-pro-32k
deepseek-chat-v2
Llama-3.1-70B-Instruct
qwen2.5-72b-instruct

  classification badcase


  information extraction badcase


  reading comprehension badcase


  tableQA badcase


  text2SQL badcase


  arithmetic badcase


  GSM8K badcase


  BBH badcase


  IFEval-zh (instruction following) badcase
