LLM Benchmark Table

Model    TOTAL  Pass  Refine  Fail  Refusal  $/mTok  Reason  STEM    Utility  Code    Censor
Model A  100    80    10      5     5        2.5     1       High    Medium   Low     Yes
Model B  100    70    15      10    5        3.0     2       Medium  High     Medium  No

Pass, Refine, Fail, and Refusal are per-task outcome counts and sum to TOTAL; $/mTok is the cost in US dollars per million tokens.

What is the purpose of this benchmark?

This benchmark compares language models on a shared set of tasks, reporting per-task outcome counts (Pass, Refine, Fail, Refusal), cost per million tokens, per-category ratings (Reason, STEM, Utility, Code), and censorship behavior.

How are the scores determined?

Each model is run against the same set of tasks. Every task outcome is recorded as a Pass, Refine, Fail, or Refusal, and the counts are tallied into the table above. TOTAL is the number of tasks, so the four outcome counts sum to it (e.g., Model A: 80 + 10 + 5 + 5 = 100).
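
As a minimal sketch of how such a tally might be computed (the outcome labels mirror the table's columns, but the function name and input format here are illustrative assumptions, not the benchmark's actual harness):

```python
from collections import Counter

# Outcome labels taken from the table's columns.
OUTCOMES = ("Pass", "Refine", "Fail", "Refusal")

def tally_results(task_outcomes):
    """Aggregate per-task outcome labels into one table row.

    task_outcomes: an iterable with one label per task,
    e.g. ["Pass", "Pass", "Refine", "Fail", ...].
    """
    counts = Counter(task_outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected outcome labels: {unknown}")
    row = {label: counts.get(label, 0) for label in OUTCOMES}
    # TOTAL is simply the number of tasks, so the four counts sum to it.
    row["TOTAL"] = sum(row.values())
    return row

# Example: 100 tasks distributed like Model A's row.
outcomes = ["Pass"] * 80 + ["Refine"] * 10 + ["Fail"] * 5 + ["Refusal"] * 5
print(tally_results(outcomes))
# {'Pass': 80, 'Refine': 10, 'Fail': 5, 'Refusal': 5, 'TOTAL': 100}
```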

Can I contribute to this benchmark?

Yes! You can submit your own model results for consideration.