Live Benchmark Data

AI Model Performance Comparison

Comprehensive benchmarks across reasoning, STEM, coding, utility and more. Compare the world's leading language models side by side.

Benchmark Results
Columns:
#: Rank by total score
Model: Model name and provider
Total: Overall benchmark score (0–100)
Pass: Tasks completed successfully (%)
Refine: Tasks completed after refinement (%)
Fail: Tasks failed (%)
Refusal: Rate of task refusals (%)
$/MToK: Cost per million output tokens (USD)
Reason: Logical reasoning score (0–100)
STEM: Science, Technology, Engineering & Math (0–100)
Utility: Everyday usefulness score (0–100)
Code: Coding ability score (0–100)
Censor: Censorship restrictiveness (Low/Med/High)
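
The column definitions above map naturally onto a typed record. The following is a minimal sketch, not the site's actual schema: the class name `BenchmarkRow`, the field names, and the `problems` helper are all assumptions made for illustration, and the sample values are invented. It only checks the ranges the column descriptions state (0–100 for scores and percentages, Low/Med/High for censorship).

```python
from dataclasses import dataclass

# Allowed values for the Censor column, as listed in the column description.
CENSOR_LEVELS = ("Low", "Med", "High")

# Fields that the column descriptions give as 0-100 scores or percentages.
BOUNDED_FIELDS = (
    "total", "pass_pct", "refine_pct", "fail_pct", "refusal_pct",
    "reason", "stem", "utility", "code",
)


@dataclass
class BenchmarkRow:
    """One leaderboard row (hypothetical field names)."""
    model: str
    total: float
    pass_pct: float
    refine_pct: float
    fail_pct: float
    refusal_pct: float
    price_per_mtok: float  # USD per million output tokens
    reason: float
    stem: float
    utility: float
    code: float
    censor: str

    def problems(self) -> list:
        """Return a list of human-readable range violations (empty if none)."""
        issues = []
        for name in BOUNDED_FIELDS:
            value = getattr(self, name)
            if not 0 <= value <= 100:
                issues.append(f"{name} out of range: {value}")
        if self.price_per_mtok < 0:
            issues.append(f"price_per_mtok is negative: {self.price_per_mtok}")
        if self.censor not in CENSOR_LEVELS:
            issues.append(f"censor not one of {CENSOR_LEVELS}: {self.censor}")
        return issues


# Invented sample values, for illustration only.
example_row = BenchmarkRow(
    model="example-model", total=72.5, pass_pct=60.0, refine_pct=15.0,
    fail_pct=20.0, refusal_pct=5.0, price_per_mtok=3.0,
    reason=70.0, stem=75.0, utility=68.0, code=77.0, censor="Low",
)
```

Calling `example_row.problems()` returns an empty list when every field is within the stated range. The sketch deliberately does not assume that Pass, Refine, Fail, and Refusal sum to 100, since the page does not say so.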
Visual Comparison
Charts: Top 8 by Total Score; Top 8 by Code Score; Top 8 by Reasoning; Best Value (Score/Price)
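
The chart titles above imply two simple computations: a top-8 ranking by a chosen score column, and a value ratio of score divided by price. The sketch below shows one way to compute both. It rests on assumptions the page does not confirm: that "Score" means the total score, that "Price" means the USD cost per million output tokens, and that rows with a zero or missing price are excluded from the value ranking. The function names, dictionary keys, and sample rows are hypothetical.

```python
def top_n(rows, key, n=8):
    """Return the n rows with the highest value in column `key`."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:n]


def value_ratio(row):
    """Score per dollar: total score divided by price per million output tokens.

    Returns None when the price is missing or not positive, because the
    ratio is undefined there (how the site treats such rows is unknown).
    """
    price = row.get("price_per_mtok")
    if price is None or price <= 0:
        return None
    return row["total"] / price


def best_value(rows, n=8):
    """Return the n rows with the highest score-per-dollar ratio."""
    priced = [(row, value_ratio(row)) for row in rows]
    priced = [(row, ratio) for row, ratio in priced if ratio is not None]
    priced.sort(key=lambda pair: pair[1], reverse=True)
    return [row for row, _ in priced[:n]]


# Invented sample rows, for illustration only.
SAMPLE_ROWS = [
    {"model": "model-a", "total": 80.0, "price_per_mtok": 10.0},
    {"model": "model-b", "total": 60.0, "price_per_mtok": 2.0},
    {"model": "model-c", "total": 90.0, "price_per_mtok": 0.0},
]

by_total = [row["model"] for row in top_n(SAMPLE_ROWS, "total")]
by_value = [row["model"] for row in best_value(SAMPLE_ROWS)]
```

With the invented rows, ranking by total favours the highest scorer, while ranking by value favours the cheaper model even though it scores lower. This is the trade-off a score/price chart is meant to surface.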