LLM Benchmark Table

| Model       | TOTAL | Pass | Refine | Fail | Refusal | $/MTok | Reason | STEM | Utility | Code | Censor |
|-------------|-------|------|--------|------|---------|--------|--------|------|---------|------|--------|
| GPT-4       | 96%   | 92%  | 4%     | 2%   | 2%      | $30    | 98%    | 95%  | 97%     | 93%  | 5%     |
| LLaMA-2 70B | 89%   | 85%  | 4%     | 6%   | 5%      | $5     | 90%    | 85%  | 87%     | 84%  | 7%     |
| Claude-2    | 93%   | 89%  | 4%     | 4%   | 3%      | $15    | 95%    | 90%  | 92%     | 91%  | 6%     |
| PaLM-2      | 88%   | 84%  | 4%     | 8%   | 4%      | $20    | 89%    | 87%  | 88%     | 85%  | 5%     |
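
For working with these numbers programmatically, here is a minimal Python sketch that holds the rows as structured records. The BenchmarkRow class is illustrative, and the relationships it checks (TOTAL = Pass + Refine, and TOTAL + Fail + Refusal = 100%) are patterns read off the table above, not documented definitions of the metrics.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkRow:
    model: str
    total: float         # overall benchmark score, %
    passed: float        # tasks passed outright, %
    refine: float        # tasks passed after refinement, %
    fail: float          # tasks failed, %
    refusal: float       # tasks explicitly declined, %
    usd_per_mtok: float  # approximate cost per million tokens, USD

ROWS = [
    BenchmarkRow("GPT-4",       96, 92, 4, 2, 2, 30),
    BenchmarkRow("LLaMA-2 70B", 89, 85, 4, 6, 5,  5),
    BenchmarkRow("Claude-2",    93, 89, 4, 4, 3, 15),
    BenchmarkRow("PaLM-2",      88, 84, 4, 8, 4, 20),
]

for r in ROWS:
    # Every row above happens to satisfy both identities; treat them as
    # observed patterns in this table, not guaranteed metric definitions.
    assert r.total == r.passed + r.refine
    assert r.total + r.fail + r.refusal == 100
```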

FAQ

What does TOTAL represent? TOTAL is the overall benchmark score, aggregated across the individual task categories.
How is $/MTok calculated? "$/MTok" is the approximate cost, in US dollars, per million tokens processed by the model (a short cost sketch follows the FAQ).
What does Refusal mean? Refusal is the percentage of tasks the model explicitly declined to attempt.
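
As a rough illustration of the $/MTok column, the sketch below converts that rate into a per-request cost estimate. The estimate_cost_usd helper and the 12,000-token request size are hypothetical, used only to show the arithmetic.

```python
def estimate_cost_usd(tokens: int, usd_per_mtok: float) -> float:
    """Approximate cost of processing `tokens` at a given $/MTok rate."""
    return tokens / 1_000_000 * usd_per_mtok

# Hypothetical 12,000-token request priced at GPT-4's listed $30/MTok.
print(f"${estimate_cost_usd(12_000, 30):.2f}")  # -> $0.36
```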