LLM Benchmark Table

Model    TOTAL  Pass  Refine  Fail  Refusal  $ mToK  Reason  STEM  Utility  Code  Censor
Model A  95     80    10      3     2        $0.02   Yes     85    90       80    Low
Model B  90     70    15      4     1        $0.03   No      80    85       75    Medium
Model C  88     75    8       4     1        $0.025  Yes     82    88       77    High

Frequently Asked Questions

What is the LLM Benchmark Table?

The LLM Benchmark Table compares AI language models across the metrics shown above: answer-quality outcomes (Pass, Refine, Fail, Refusal), cost ($ mToK), reasoning support (Reason), domain scores (STEM, Utility, Code), and content moderation (Censor).

How are the models evaluated?

Each model is evaluated on a fixed set of tasks and receives a TOTAL score alongside counts of Pass, Refine, Fail, and Refusal outcomes, plus domain-specific scores for Reason, STEM, Utility, and Code; a sketch of how these figures relate follows below.
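
In the table above, TOTAL equals the sum of Pass, Refine, Fail, and Refusal for every model (e.g., 80 + 10 + 3 + 2 = 95 for Model A), which suggests TOTAL counts the evaluated tasks. The following is a minimal sketch under that assumption; the class and field names are illustrative, not part of any benchmark tooling.

    from dataclasses import dataclass

    @dataclass
    class BenchmarkRow:
        """One row of the benchmark table (names are illustrative)."""
        model: str
        passed: int   # tasks answered correctly on the first attempt
        refined: int  # tasks corrected after a follow-up refinement
        failed: int   # tasks answered incorrectly
        refused: int  # tasks the model declined to answer

        @property
        def total(self) -> int:
            # Matches the TOTAL column: sum of the four outcome counts
            return self.passed + self.refined + self.failed + self.refused

        @property
        def pass_rate(self) -> float:
            # Share of evaluated tasks that passed outright
            return self.passed / self.total

    row = BenchmarkRow("Model A", passed=80, refined=10, failed=3, refused=2)
    print(row.total)               # 95, matching the TOTAL column
    print(f"{row.pass_rate:.1%}")  # 84.2%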

What does "$ mToK" stand for?

"$ mToK" represents the monetary cost per thousand tokens processed by the model.

How can I interpret the 'Censor' metric?

The 'Censor' metric indicates how heavily the model censors content, on an ordinal scale from 'Low' through 'Medium' to 'High'.
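
Because the scale is ordinal rather than numeric, comparing models on it requires an explicit ordering. The mapping below is an assumption (Low < Medium < High) made only for this sketch.

    # Hypothetical ordering so Censor levels can be compared or sorted
    CENSOR_ORDER = {"Low": 0, "Medium": 1, "High": 2}

    rows = [("Model A", "Low"), ("Model B", "Medium"), ("Model C", "High")]
    least_censored = min(rows, key=lambda r: CENSOR_ORDER[r[1]])
    print(least_censored)  # ('Model A', 'Low')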