Live Data • Updated Daily

The Definitive AI Performance Index

Compare large language models across reasoning, coding, and utility. Filter by cost, capability, and safety metrics to find the right model for your stack.

Models Tracked

Top Performer

GPT-4o

Best Value

Claude 3

Avg Cost

$4.20

Model	Total Score	Pass Rate	Refine	Fail	Refusal	$ / 1M Tokens	Reason	STEM	Utility	Code	Censor


Frequently Asked Questions