Benchmark Results
| # (rank by total score) | Model (name and provider) | Total (overall benchmark score, 0–100) | Pass (tasks completed successfully, %) | Refine (tasks completed after refinement, %) | Fail (tasks failed, %) | Refusal (rate of task refusals, %) | $/MTok (cost per million output tokens, USD) | Reason (logical reasoning score, 0–100) | STEM (science, technology, engineering and math, 0–100) | Utility (everyday usefulness score, 0–100) | Code (coding ability score, 0–100) | Censor (censorship restrictiveness, Low/Med/High) | ★ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Visual Comparison
[Charts: Top 8 by Total Score; Top 8 by Code Score; Top 8 by Reasoning; Best Value (Score/Price)]
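
The "Best Value (Score/Price)" label suggests a simple ratio. As a minimal sketch, assuming it means the Total score divided by the $/MTok price (the page does not state the exact formula), the ranking could be computed as below. The `Entry` class, the `value` and `top_by_value` helpers, and the `model-a`/`model-b`/`model-c` rows are hypothetical placeholders for illustration, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str           # hypothetical placeholder name, not a real result
    total: float         # Total column, 0-100
    usd_per_mtok: float  # $/MTok column, USD per million output tokens


def value(entry: Entry) -> float:
    """Score per dollar. Assumes 'Best Value' means Total / price."""
    if entry.usd_per_mtok <= 0:
        # A free or unpriced model has no defined ratio in this sketch.
        raise ValueError("price must be positive")
    return entry.total / entry.usd_per_mtok


def top_by_value(entries, n=8):
    """Return up to n entries, highest score-per-dollar first."""
    return sorted(entries, key=value, reverse=True)[:n]


# Made-up numbers chosen only to exercise the function.
rows = [
    Entry("model-a", 80.0, 10.0),
    Entry("model-b", 60.0, 2.0),
    Entry("model-c", 90.0, 30.0),
]
ranked = top_by_value(rows)
for e in ranked:
    print(e.model, value(e))
```

A plain ratio rewards cheap models heavily, so a site may well normalize or cap it differently; treat the function as one plausible reading of the label.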
Frequently Asked Questions