LLM Benchmark Table

Comprehensive AI Model Performance Comparison

Model	TOTAL	Pass	Refine	Fail	Refusal	$/M tokens	Reason	STEM	Utility	Code	Censor
GPT-4	92.5	85.3	7.2	4.5	3	30	High	95	90	88	Medium

Frequently Asked Questions

What is this benchmark table?

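This table compares Large Language Models (LLMs) on an overall score (TOTAL), four outcome columns (Pass, Refine, Fail, Refusal), cost, and per-area ratings (Reason, STEM, Utility, Code, Censor).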
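In the one row shown, the four outcome columns add up to 100 and TOTAL equals Pass plus Refine. The page does not say whether either relationship holds by definition, so treat both as assumptions. The sketch below parses the first six columns of that row and checks them; the helper names `load_rows` and `check_row` are hypothetical.

```python
import csv
import io
import math

# First six columns of the header and the GPT-4 row, copied from the table.
TABLE = (
    "Model\tTOTAL\tPass\tRefine\tFail\tRefusal\n"
    "GPT-4\t92.5\t85.3\t7.2\t4.5\t3\n"
)


def load_rows(text):
    """Parse tab-separated text into dicts, converting numeric cells to float."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for raw in reader:
        row = {"Model": raw["Model"]}
        for key, value in raw.items():
            if key != "Model":
                row[key] = float(value)
        rows.append(row)
    return rows


def check_row(row, tol=1e-6):
    """Return (outcomes_sum_to_100, total_equals_pass_plus_refine).

    Both relationships are assumptions inferred from a single row,
    not definitions stated by the table's authors.
    """
    outcomes = row["Pass"] + row["Refine"] + row["Fail"] + row["Refusal"]
    sums_to_100 = math.isclose(outcomes, 100.0, abs_tol=tol)
    total_matches = math.isclose(
        row["TOTAL"], row["Pass"] + row["Refine"], abs_tol=tol
    )
    return sums_to_100, total_matches


for row in load_rows(TABLE):
    print(row["Model"], check_row(row))
```

The comparison uses a small tolerance instead of exact equality because the scores are decimal fractions that binary floating point cannot always represent exactly.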