| Model | TOTAL | Pass | Refine | Fail | Refusal | $ mToK | Reason | STEM | Utility | Code | Censor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model A | 100 | 85 | 10 | 5 | 0 | 2.5 | Performance | 90 | 80 | 70 | 0 |

The LLM Benchmark Table compares the performance of various AI models.
You can sort and filter the rows to compare models on any of the metrics shown in the columns.
The "$ mToK" column gives the cost in dollars per million tokens.
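
The sorting, filtering, and cost-per-million-tokens ideas above can be sketched in a few lines of Python. This is only an illustration, not how the table itself is implemented: the `Row` class, the helper names (`sort_by`, `filter_max_price`, `estimate_cost`), and the two rows marked "hypothetical" are assumptions made for the example. Only the "Model A" values come from the table. The sketch also assumes a single flat price per token, since the table does not say whether "$ mToK" covers input tokens, output tokens, or a blend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """A subset of the table's columns (names chosen for this sketch)."""
    model: str
    total: int
    passed: int              # the "Pass" column
    refine: int
    fail: int
    refusal: int
    dollars_per_mtok: float  # the "$ mToK" column


# "Model A" is the sample row from the table. The other two rows are
# made-up placeholders so that sorting and filtering have something to do.
ROWS = [
    Row("Model A", 100, 85, 10, 5, 0, 2.5),
    Row("Model B (hypothetical)", 100, 70, 20, 10, 0, 0.5),
    Row("Model C (hypothetical)", 100, 92, 5, 2, 1, 10.0),
]


def sort_by(rows, column, descending=True):
    """Return the rows ordered by one column, highest value first by default."""
    return sorted(rows, key=lambda r: getattr(r, column), reverse=descending)


def filter_max_price(rows, max_dollars_per_mtok):
    """Keep only the rows priced at or below the given cost per million tokens."""
    return [r for r in rows if r.dollars_per_mtok <= max_dollars_per_mtok]


def estimate_cost(tokens, dollars_per_mtok):
    """Cost in dollars of processing `tokens` tokens at a flat per-million rate."""
    return tokens / 1_000_000 * dollars_per_mtok


if __name__ == "__main__":
    # Models costing at most 5 dollars per million tokens, best pass count first.
    for row in sort_by(filter_max_price(ROWS, 5.0), "passed"):
        print(f"{row.model}: pass={row.passed}, $ mToK={row.dollars_per_mtok}")

    # Two million tokens at Model A's rate of 2.5 dollars per million tokens.
    print(estimate_cost(2_000_000, 2.5))
```

At Model A's listed rate of 2.5, two million tokens would cost 2 × 2.5 = 5 dollars under the flat-price assumption.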