LLM Benchmark Table

Comprehensive AI Model Performance Comparison

Model  TOTAL  Pass  Refine  Fail  Refusal  $/M tokens  Reason  STEM  Utility  Code  Censor
GPT-4  92.5   85.3  7.2     4.5   3        30          High    95    90       88    Medium
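
The published row fits two relationships that the page does not state. Pass, Refine, Fail and Refusal add up to 100 (85.3 + 7.2 + 4.5 + 3 = 100), and TOTAL equals Pass plus Refine (85.3 + 7.2 = 92.5). The following Python sketch parses a whitespace-separated row and checks both relationships. The field names and the `parse_row` and `check_consistency` helpers are hypothetical. Whether these relationships hold for other models is an assumption inferred from this single row, not a documented definition.

```python
import math
from dataclasses import dataclass


@dataclass
class BenchmarkRow:
    # Hypothetical field names mirroring the table's column order.
    model: str
    total: float
    passed: float            # "Pass" column
    refine: float
    fail: float
    refusal: float
    usd_per_m_tokens: float  # "$/M tokens" column
    reason: str
    stem: float
    utility: float
    code: float
    censor: str


def parse_row(line: str) -> BenchmarkRow:
    # Assumes 12 whitespace-separated fields in header order, none containing spaces.
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}")
    return BenchmarkRow(f[0], *map(float, f[1:7]), f[7], *map(float, f[8:11]), f[11])


def check_consistency(row: BenchmarkRow) -> dict:
    # Both checks are assumptions inferred from the published GPT-4 row.
    outcomes = row.passed + row.refine + row.fail + row.refusal
    return {
        "outcomes_sum_to_100": math.isclose(outcomes, 100.0, abs_tol=0.05),
        "total_is_pass_plus_refine": math.isclose(row.passed + row.refine, row.total, abs_tol=0.05),
    }


row = parse_row("GPT-4 92.5 85.3 7.2 4.5 3 30 High 95 90 88 Medium")
print(check_consistency(row))
# {'outcomes_sum_to_100': True, 'total_is_pass_plus_refine': True}
```

A small tolerance is used because sums such as 85.3 + 7.2 are not exact in binary floating point.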

Frequently Asked Questions

What is this benchmark table?
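This table compares large language models (LLMs) on an overall score (TOTAL) and its breakdown into Pass, Refine, Fail and Refusal rates. It also lists cost in dollars per million tokens, a Reason rating, STEM, Utility and Code scores, and a Censor rating.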