| Model | TOTAL | Pass | Refine | Fail | Refusal | $ mToK | Reason | STEM | Utility | Code | Censor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model A | 95 | 80 | 10 | 3 | 2 | $0.02 | Yes | 85 | 90 | 80 | Low |
| Model B | 90 | 70 | 15 | 4 | 1 | $0.03 | No | 80 | 85 | 75 | Medium |
| Model C | 88 | 75 | 8 | 4 | 1 | $0.025 | Yes | 82 | 88 | 77 | High |
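For readers who want to work with the table programmatically, here is a minimal Python sketch that parses the rows above into records and ranks them by TOTAL. The parsing helper and column handling are illustrative assumptions, not part of the benchmark itself.

```python
# Minimal sketch: parse the benchmark rows above into dicts and rank by TOTAL.
# The table text and column names are copied from the table; the parsing
# logic is illustrative only.
TABLE = """\
| Model A | 95 | 80 | 10 | 3 | 2 | $0.02 | Yes | 85 | 90 | 80 | Low |
| Model B | 90 | 70 | 15 | 4 | 1 | $0.03 | No | 80 | 85 | 75 | Medium |
| Model C | 88 | 75 | 8 | 4 | 1 | $0.025 | Yes | 82 | 88 | 77 | High |"""

COLUMNS = ["Model", "TOTAL", "Pass", "Refine", "Fail", "Refusal",
           "$ mToK", "Reason", "STEM", "Utility", "Code", "Censor"]

def parse_rows(text: str) -> list[dict]:
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip(" |").split("|")]
        rows.append(dict(zip(COLUMNS, cells)))
    return rows

# Rank models by TOTAL, highest first, and show score and cost side by side.
for row in sorted(parse_rows(TABLE), key=lambda r: int(r["TOTAL"]), reverse=True):
    print(row["Model"], row["TOTAL"], row["$ mToK"])
```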
**What is the LLM Benchmark Table?**

The LLM Benchmark Table is a side-by-side comparison of AI language models. For each model it reports outcome counts (Pass, Refine, Fail, Refusal), category scores (Reason, STEM, Utility, Code), cost per thousand tokens, and a censorship level.
**How are the models evaluated?**

Each model is run across a set of tasks. The table records a TOTAL score alongside counts of Pass, Refine, Fail, and Refusal outcomes, per-category scores for Reason, STEM, Utility, and Code, the cost per thousand tokens, and a censorship rating. A sketch of how a score could be derived from the outcome counts is shown below.
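Note that in the rows above, TOTAL equals Pass + Refine + Fail + Refusal for every model. The sketch below shows how a weighted score could be computed from those outcome counts; the specific weights are assumptions for illustration, not the benchmark's actual scoring rules.

```python
# Illustrative weights only: the benchmark's real scoring scheme is not
# documented here. Pass counts fully, Refine counts half, Fail and Refusal
# count nothing.
WEIGHTS = {"Pass": 1.0, "Refine": 0.5, "Fail": 0.0, "Refusal": 0.0}

def weighted_score(pass_n: int, refine_n: int, fail_n: int, refusal_n: int) -> float:
    """Return the percentage of available points under the assumed weights."""
    total = pass_n + refine_n + fail_n + refusal_n
    points = (pass_n * WEIGHTS["Pass"] + refine_n * WEIGHTS["Refine"]
              + fail_n * WEIGHTS["Fail"] + refusal_n * WEIGHTS["Refusal"])
    return 100.0 * points / total

# Model A from the table: 80 Pass, 10 Refine, 3 Fail, 2 Refusal (95 outcomes).
print(f"{weighted_score(80, 10, 3, 2):.1f}")  # ~89.5 under these assumed weights
```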
**What does "$ mToK" stand for?**

"$ mToK" is the monetary cost, in dollars, per thousand tokens processed by the model. A short worked example follows.
**How can I interpret the 'Censor' metric?**

The 'Censor' metric indicates the degree to which the model censors content, on a scale from 'Low' to 'High' (the table above shows Low, Medium, and High ratings).