LLM Benchmark Table

Model | TOTAL | Pass | Refine | Fail | Refusal | $ mToK | Reason | STEM | Utility | Code | Censor

FAQ

Q: What is the purpose of this benchmark?
A: It compares the performance of different AI models across the categories listed in the table (Reason, STEM, Utility, Code, Censor).

Q: How is the data collected?
A: Each model is run through a series of tests, and its responses are evaluated.