LLM Benchmark Table
Comprehensive AI Model Performance Comparison
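| Model | TOTAL | Pass | Refine | Fail | Refusal | $ mToK | Reason | STEM | Utility | Code | Censor |
|-------|-------|------|--------|------|---------|--------|--------|------|---------|------|--------|
| GPT-4 | 92.5  | 85.3 | 7.2    | 4.5  | 3       | 30     | High   | 95   | 90      | 88   | Medium |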
Frequently Asked Questions
What is this benchmark table?
This table compares Large Language Models (LLMs) across the metrics listed in its columns: TOTAL, Pass, Refine, Fail, Refusal, $ mToK, Reason, STEM, Utility, Code, and Censor.
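
As a rough illustration of how the outcome columns might relate, the sketch below loads the GPT-4 row into a small record and checks two relations that happen to hold for that row: Pass + Refine equals TOTAL, and Pass + Refine + Fail + Refusal equals 100. The table itself does not state that these columns are defined this way, so both relations are assumptions, and the names `BenchmarkRow` and `outcomes_consistent` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BenchmarkRow:
    """One row of the table, limited to the outcome columns (hypothetical type)."""
    model: str
    total: float
    passed: float   # "Pass" column
    refine: float
    fail: float
    refusal: float


def outcomes_consistent(row: BenchmarkRow, tol: float = 1e-6) -> bool:
    """Check the two assumed relations between the outcome columns."""
    # Assumption: TOTAL is the sum of Pass and Refine.
    total_ok = math.isclose(row.passed + row.refine, row.total, abs_tol=tol)
    # Assumption: the four outcome columns are percentages that sum to 100.
    outcome_sum = row.passed + row.refine + row.fail + row.refusal
    sum_ok = math.isclose(outcome_sum, 100.0, abs_tol=tol)
    return total_ok and sum_ok


# Values copied from the GPT-4 row of the table.
gpt4 = BenchmarkRow("GPT-4", total=92.5, passed=85.3, refine=7.2, fail=4.5, refusal=3.0)
print(outcomes_consistent(gpt4))
```

The comparison uses a tolerance rather than `==` because the values are decimal fractions that binary floating point cannot represent exactly.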