LLM Benchmark Table

Benchmark Results


#: Rank by total score
Model: Model name and provider
Total: Overall benchmark score (0–100)
Pass: Tasks completed successfully (%)
Refine: Tasks completed after refinement (%)
Fail: Tasks failed (%)
Refusal: Rate of task refusals (%)
$/MToK: Cost per million output tokens (USD)
Reason: Logical reasoning score (0–100)
STEM: Science, Technology, Engineering & Math (0–100)
Utility: Everyday usefulness score (0–100)
Code: Coding ability score (0–100)
Censor: Censorship restrictiveness (Low/Med/High)
★

Visual Comparison

Top 8 — Total Score

Top 8 — Code Score

Top 8 — Reasoning

Best Value (Score/Price)
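
The "Score/Price" label suggests that this chart ranks models by their Total score divided by their $/MToK cost. The page does not show the exact calculation, so the following is only a minimal sketch under that assumption. The names ModelRow, value_ratio and best_value are hypothetical, the rows are placeholders rather than real benchmark results, and treating a zero-cost model as having infinite value is an assumption made here to avoid dividing by zero.

```python
from dataclasses import dataclass


@dataclass
class ModelRow:
    name: str
    total: float         # Total: overall benchmark score (0-100)
    usd_per_mtok: float  # $/MToK: cost per million output tokens (USD)


def value_ratio(row: ModelRow) -> float:
    """Score points per dollar (assumed meaning of Score/Price)."""
    if row.usd_per_mtok <= 0:
        # Assumption: a free model sorts ahead of every paid model.
        return float("inf")
    return row.total / row.usd_per_mtok


def best_value(rows: list[ModelRow], k: int = 8) -> list[ModelRow]:
    """Return the k rows with the highest score-per-dollar ratio."""
    return sorted(rows, key=value_ratio, reverse=True)[:k]


# Placeholder rows for illustration only, not actual results.
rows = [
    ModelRow("model-a", 80.0, 10.0),
    ModelRow("model-b", 60.0, 2.0),
    ModelRow("model-c", 90.0, 30.0),
]

ranked = [row.name for row in best_value(rows)]
print(ranked)
```

Under this definition a cheap model with a moderate score can outrank a stronger but expensive one, so the ordering here can differ sharply from the Total Score chart.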

Frequently Asked Questions

AI Model Performance Comparison