LLM Benchmark Table

Model  TOTAL  Pass  Refine  Fail  Refusal  $ mToK  Reason  STEM  Utility  Code  Censor
GPT-4  95     90    5       3     2        0.03    92      88    94       96    Low

Frequently Asked Questions

What does the TOTAL score represent?
The TOTAL score is a weighted average of the individual performance metrics. It condenses the model's overall capability into a single number.
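The page does not say which metrics feed into TOTAL or how they are weighted, so the sketch below only illustrates what a weighted average means. The function name `weighted_total`, the choice of metrics, and the weights are all hypothetical.

```python
def weighted_total(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-metric scores.

    Hypothetical helper: the benchmark's real metric set and weights are not published here.
    Weights are normalized by their sum, so they do not need to add up to 1.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same metrics")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Multiply each score by its weight, then divide by the total weight.
    return sum(scores[name] * weights[name] for name in scores) / total_weight


if __name__ == "__main__":
    # Category scores for GPT-4 from the table, with assumed equal weights.
    gpt4 = {"Reason": 92, "STEM": 88, "Utility": 94, "Code": 96}
    equal = {name: 1.0 for name in gpt4}
    print(weighted_total(gpt4, equal))
```

With equal weights this gives 92.5. Because the table's actual weights and components are not stated, the sketch is not expected to reproduce the published TOTAL of 95.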
How is the $ mToK metric calculated?
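The $ mToK (Million Tokens per Dollar) metric measures cost-efficiency. It is the number of tokens the model processes, in millions, divided by the dollars spent to process them, so a higher value means more tokens for the same cost.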