LLM Benchmark Table
Model | TOTAL | Pass | Refine | Fail | Refusal | $ mToK | Reason | STEM | Utility | Code | Censor
FAQ
Q: What is the purpose of this benchmark?
A: This benchmark is designed to compare the performance of different AI models.
Q: How is the data collected?
A: The data is collected through a series of tests and evaluations.