Model | TOTAL | Pass | Refine | Fail | Refusal | $/mTok | Reason | STEM | Utility | Code | Censor |
---|---|---|---|---|---|---|---|---|---|---|---|
Model A | 100 | 50 | 20 | 15 | 15 | 0.05 | Good accuracy | 85% | 90% | 95% | Low |
Model B | 100 | 40 | 30 | 20 | 10 | 0.06 | High utility | 80% | 92% | 88% | Medium |
The LLM Benchmark Table compares the performance of AI language models across multiple metrics.
TOTAL is the number of benchmark tasks; Pass, Refine, Fail, and Refusal count how each task's outcome was classified; $/mTok is the cost in US dollars per million tokens; Reason is a brief qualitative note on the result; STEM, Utility, and Code are per-category scores; and Censor indicates the model's censorship level.
You can sort and filter the table to compare models on the metrics that matter for your use case, as shown in the sketch below.
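To illustrate one way to work with the table programmatically, here is a minimal sketch that parses the markdown table and ranks models by a chosen percentage column. The filename `benchmark.md`, the helper names `parse_table` and `rank_by`, and the parsing approach are assumptions for illustration, not part of the tool itself.

```python
# Minimal sketch: parse the markdown benchmark table and rank models
# by a chosen metric. The filename "benchmark.md" is a placeholder.

def parse_table(path: str) -> list[dict[str, str]]:
    """Read a pipe-delimited markdown table into a list of row dicts."""
    with open(path, encoding="utf-8") as f:
        lines = [ln.strip() for ln in f if "|" in ln]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the --- separator row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        if len(cells) == len(header):  # ignore stray lines containing "|"
            rows.append(dict(zip(header, cells)))
    return rows

def rank_by(rows: list[dict[str, str]], column: str) -> list[dict[str, str]]:
    """Sort rows descending by a percentage column such as 'STEM'."""
    return sorted(rows, key=lambda r: float(r[column].rstrip("%")), reverse=True)

if __name__ == "__main__":
    models = parse_table("benchmark.md")
    for row in rank_by(models, "STEM"):
        print(f"{row['Model']}: STEM {row['STEM']}, Code {row['Code']}")
```

Run against the table above, this prints Model A first (STEM 85%) and Model B second (STEM 80%); swapping `"STEM"` for any other percentage column such as `"Utility"` or `"Code"` re-ranks the models accordingly.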