LLM Benchmark Table
| Model   | TOTAL | Pass | Refine | Fail | Refusal | Cost ($/MTok) | Reason | STEM | Utility | Code | Censor |
|---------|-------|------|--------|------|---------|---------------|--------|------|---------|------|--------|
| Model A | 100   | 80   | 10     | 5    | 5       | 50            | 90%    | 85%  | 92%     | 88%  | 5%     |
| Model B | 120   | 90   | 15     | 10   | 5       | 60            | 88%    | 80%  | 90%     | 85%  | 3%     |
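
From the two rows shown, TOTAL appears to be the sum of the Pass, Refine, Fail, and Refusal columns (80 + 10 + 5 + 5 = 100 for Model A). Below is a minimal Python sketch of that relationship; the column meanings in the comments are inferred from the headers, not taken from the benchmark's own code.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRow:
    """One row of the table above; field meanings are inferred from the column names."""
    model: str
    passed: int   # items answered correctly on the first attempt
    refine: int   # assumed: items solved only after a refinement/retry step
    fail: int     # items answered incorrectly
    refusal: int  # items the model declined to answer

    @property
    def total(self) -> int:
        # In both rows shown, TOTAL equals the sum of the four outcome counts
        # (Model A: 80 + 10 + 5 + 5 = 100), so it is computed the same way here.
        return self.passed + self.refine + self.fail + self.refusal

    @property
    def pass_rate(self) -> float:
        # Strict pass rate: refined answers do not count as passes.
        return self.passed / self.total


model_a = BenchmarkRow("Model A", passed=80, refine=10, fail=5, refusal=5)
print(model_a.total, f"{model_a.pass_rate:.0%}")  # 100 80%
```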

FAQ

What does this table represent?
This table compares AI models on a shared benchmark. TOTAL is the number of scored items (the sum of the Pass, Refine, Fail, and Refusal columns), Cost ($/MTok) is the price per million tokens, and the remaining columns give per-category scores for reasoning, STEM, utility, and code, plus a censorship rate.
How is the data collected?
Data is collected by running each model through a standardized set of benchmark tasks covering reasoning, STEM, utility, and coding, with refusal and censorship behavior recorded alongside task accuracy.
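
As a rough illustration of that collection process (not the actual harness; the record schema and category names below are hypothetical), per-category outcome counts like those in the table could be tallied from already-graded results as follows:

```python
from collections import Counter
from typing import Iterable, Mapping

# Outcome labels mirroring the table columns.
OUTCOMES = ("pass", "refine", "fail", "refusal")


def tally_results(graded: Iterable[Mapping[str, str]]) -> dict[str, Counter]:
    """Aggregate graded results into per-category outcome counts.

    Each item is expected to look like {"category": "STEM", "outcome": "pass"};
    the field names are placeholders, not the real harness's schema.
    """
    per_category: dict[str, Counter] = {}
    for item in graded:
        assert item["outcome"] in OUTCOMES
        per_category.setdefault(item["category"], Counter())[item["outcome"]] += 1
    return per_category


# Example with made-up results:
results = tally_results([
    {"category": "Reason", "outcome": "pass"},
    {"category": "Reason", "outcome": "fail"},
    {"category": "Code", "outcome": "refusal"},
])
print(results["Reason"]["pass"])  # 1
```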