| Model | TOTAL | Pass | Refine | Fail | Refusal | $ mToK | Reason | STEM | Utility | Code | Censor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model A | 95 | 80 | 10 | 3 | 2 | $0.02 | Yes | 85 | 90 | 80 | Low |
| Model B | 90 | 70 | 15 | 4 | 1 | $0.03 | No | 80 | 85 | 75 | Medium |
| Model C | 88 | 75 | 8 | 4 | 1 | $0.025 | Yes | 82 | 88 | 77 | High |
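For readers who want to work with the table programmatically, here is a minimal Python sketch that parses the rows above into records and ranks them by TOTAL. The parsing helper and column handling are illustrative assumptions, not part of the benchmark itself.

```python
# Minimal sketch: parse the benchmark rows above into dicts and rank by TOTAL.
# The table text and column names are copied from the table; the parsing
# logic is illustrative only.
TABLE = """\
| Model A | 95 | 80 | 10 | 3 | 2 | $0.02 | Yes | 85 | 90 | 80 | Low |
| Model B | 90 | 70 | 15 | 4 | 1 | $0.03 | No | 80 | 85 | 75 | Medium |
| Model C | 88 | 75 | 8 | 4 | 1 | $0.025 | Yes | 82 | 88 | 77 | High |"""

COLUMNS = ["Model", "TOTAL", "Pass", "Refine", "Fail", "Refusal",
           "$ mToK", "Reason", "STEM", "Utility", "Code", "Censor"]

def parse_rows(text: str) -> list[dict]:
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip(" |").split("|")]
        rows.append(dict(zip(COLUMNS, cells)))
    return rows

# Rank models by TOTAL, highest first, and show score and cost side by side.
for row in sorted(parse_rows(TABLE), key=lambda r: int(r["TOTAL"]), reverse=True):
    print(row["Model"], row["TOTAL"], row["$ mToK"])
```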
**What is the LLM Benchmark Table?**

The LLM Benchmark Table is a side-by-side comparison of AI language models. For each model it reports outcome counts (Pass, Refine, Fail, Refusal), category scores (Reason, STEM, Utility, Code), cost per thousand tokens, and a censorship level.
**How are the models evaluated?**

Each model is run across a set of tasks. The table records a TOTAL score alongside counts of Pass, Refine, Fail, and Refusal outcomes, per-category scores for Reason, STEM, Utility, and Code, the cost per thousand tokens, and a censorship rating. A sketch of how a score could be derived from the outcome counts is shown below.
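Note that in the rows above, TOTAL equals Pass + Refine + Fail + Refusal for every model. The sketch below shows how a weighted score could be computed from those outcome counts; the specific weights are assumptions for illustration, not the benchmark's actual scoring rules.

```python
# Illustrative weights only: the benchmark's real scoring scheme is not
# documented here. Pass counts fully, Refine counts half, Fail and Refusal
# count nothing.
WEIGHTS = {"Pass": 1.0, "Refine": 0.5, "Fail": 0.0, "Refusal": 0.0}

def weighted_score(pass_n: int, refine_n: int, fail_n: int, refusal_n: int) -> float:
    """Return the percentage of available points under the assumed weights."""
    total = pass_n + refine_n + fail_n + refusal_n
    points = (pass_n * WEIGHTS["Pass"] + refine_n * WEIGHTS["Refine"]
              + fail_n * WEIGHTS["Fail"] + refusal_n * WEIGHTS["Refusal"])
    return 100.0 * points / total

# Model A from the table: 80 Pass, 10 Refine, 3 Fail, 2 Refusal (95 outcomes).
print(f"{weighted_score(80, 10, 3, 2):.1f}")  # ~89.5 under these assumed weights
```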
**What does "$ mToK" stand for?**

"$ mToK" is the monetary cost, in dollars, per thousand tokens processed by the model. A short worked example follows.
**How can I interpret the 'Censor' metric?**

The 'Censor' metric indicates the degree to which the model censors content, on a scale from 'Low' to 'High' (the table above shows Low, Medium, and High ratings).