LLM Benchmark Table

Model | TOTAL | Pass | Refine | Fail | Refusal | $ mToK | Reason | STEM | Utility | Code | Censor

Frequently Asked Questions

What is the purpose of this benchmark?

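This benchmark compares AI models across the columns of the table above: a TOTAL, the outcomes Pass, Refine, Fail, and Refusal, the cost column $ mToK, and the categories Reason, STEM, Utility, Code, and Censor.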

How is the data updated?

The data is updated regularly through automated scripts.
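
The scripts themselves are not described here, so the following is only a rough sketch of one step such a script might perform: tallying per-model outcomes into the Pass, Refine, Fail, and Refusal columns. The function name tally_outcomes, the (model, outcome) input shape, the model names, and the idea that these columns hold simple counts are all assumptions made for illustration, not the benchmark's actual method.

```python
from collections import Counter

# Outcome labels borrowed from the table's column names (assumed to be counts).
OUTCOMES = ("Pass", "Refine", "Fail", "Refusal")


def tally_outcomes(results):
    """Count how many results fall under each outcome label, per model.

    `results` is an iterable of (model, outcome) pairs; this input shape
    is hypothetical and chosen only for illustration.
    """
    counts = {}
    for model, outcome in results:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        counts.setdefault(model, Counter())[outcome] += 1
    # One row per model, with every outcome column present (0 if unseen).
    return {
        model: {name: c[name] for name in OUTCOMES}
        for model, c in counts.items()
    }


if __name__ == "__main__":
    # Made-up sample data; the model names are placeholders.
    sample = [
        ("model-a", "Pass"),
        ("model-a", "Fail"),
        ("model-b", "Refusal"),
        ("model-a", "Pass"),
    ]
    for model, row in tally_outcomes(sample).items():
        print(model, row)
```

Rejecting unknown labels up front keeps a typo in the input from silently dropping a result from the table. How TOTAL, $ mToK, and the category columns are derived is not stated, so the sketch leaves them out.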