Live Data • Updated Daily

The Definitive AI
Performance Index

Compare Large Language Models across reasoning, coding, and utility. Filter by cost, capability, and safety metrics to find the perfect model for your stack.

Models Tracked
24
Top Performer
GPT-4o
Best Value
Claude 3
Avg Cost
$4.20
Model Total Score Pass Rate Refine Fail Refusal $ / mTok Reason STEM Utility Code Censor
Showing all models

Frequently Asked Questions