LLM Benchmark Table

Model	TOTAL	Pass	Refine	Fail	Refusal	$ mToK	Reason	STEM	Utility	Code	Censor
Model A	100	80	10	5	5	50	90%	85%	92%	88%	5%
Model B	120	90	15	10	5	60	88%	80%	90%	85%	3%

FAQ

What does this table represent?

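This table compares AI models on a shared benchmark. For each model it lists the total number of test items (TOTAL) and how those items split into Pass, Refine, Fail, and Refusal counts; in both rows above, the four counts add up to TOTAL. The remaining columns give a cost figure ($ mToK, presumably dollars per million tokens) and percentage scores for reasoning, STEM, utility, code, and censorship.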
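Because the outcome columns are counts rather than percentages, models with different TOTAL values are easier to compare once the counts are converted to rates. The following is a minimal sketch of that conversion, assuming the Pass, Refine, Fail, and Refusal columns are counts that sum to TOTAL (which holds for the two rows shown). The function name `outcome_rates` and the embedded `TABLE` string are illustrative and not part of the benchmark itself.

```python
import csv
import io

# Outcome columns copied from the table above (tab-separated).
TABLE = """Model\tTOTAL\tPass\tRefine\tFail\tRefusal
Model A\t100\t80\t10\t5\t5
Model B\t120\t90\t15\t10\t5
"""

OUTCOMES = ("Pass", "Refine", "Fail", "Refusal")


def outcome_rates(tsv_text):
    """Return {model: {outcome: fraction of TOTAL}} for each table row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    results = {}
    for row in reader:
        total = int(row["TOTAL"])
        counts = {name: int(row[name]) for name in OUTCOMES}
        # Assumption: the four outcomes partition the test items.
        if sum(counts.values()) != total:
            raise ValueError(f"{row['Model']}: outcome counts do not sum to TOTAL")
        results[row["Model"]] = {name: n / total for name, n in counts.items()}
    return results


rates = outcome_rates(TABLE)
for model, r in rates.items():
    print(f"{model}: pass rate {r['Pass']:.1%}")
# Model A: pass rate 80.0%
# Model B: pass rate 75.0%
```

On these numbers, Model B passes more items in absolute terms (90 versus 80) but has the lower pass rate, since it was scored on 120 items rather than 100.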

How is the data collected?

Each model is run against the same set of standardized benchmark tests, which cover reasoning, STEM, utility, coding, and censorship.