The TOTAL score is a weighted average of Reasoning (30%), Coding (30%), STEM Knowledge (20%), and Utility (20%), penalized by Refusal rates.
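As a rough illustration of the weighting, here is a minimal sketch. The weights come from the description above; the function name, the 0-100 score scale, and especially the multiplicative form of the refusal penalty are assumptions, since the exact penalty formula is not stated.

```python
# Category weights as stated in the description (they sum to 1.0).
WEIGHTS = {
    "reasoning": 0.30,
    "coding": 0.30,
    "stem_knowledge": 0.20,
    "utility": 0.20,
}


def total_score(scores: dict[str, float], refusal_rate: float) -> float:
    """Weighted average of the four category scores, reduced by a refusal penalty.

    `scores` maps each category name in WEIGHTS to its score.
    `refusal_rate` is a fraction between 0.0 and 1.0.
    """
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    # ASSUMPTION: the penalty scales the score down in proportion to the
    # refusal rate. The actual penalty used by the table may differ.
    return weighted * (1.0 - refusal_rate)


example = total_score(
    {"reasoning": 80, "coding": 70, "stem_knowledge": 90, "utility": 60},
    refusal_rate=0.10,
)
```

With these hypothetical inputs the weighted average is 75 before the penalty, and a 10% refusal rate brings it down to about 67.5 under the assumed penalty.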
This stands for Price per Million Output Tokens. It is the API cost of generating one million tokens of output through that model's provider.
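To show how a per-million price translates into the cost of a single response, here is a small sketch. The function name and the example price are hypothetical and do not come from the table; the currency is whatever the listed price uses.

```python
def output_cost(output_tokens: int, price_per_million: float) -> float:
    """Cost of generating `output_tokens` tokens at a given price per million output tokens."""
    return output_tokens / 1_000_000 * price_per_million


# Hypothetical example: 250,000 output tokens at a price of 12.0 per million.
cost = output_cost(250_000, 12.0)
```

Input tokens are usually billed separately by providers, so this covers only the output side of a request.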
Refine measures the model's ability to correct its own output when prompted with error messages or user feedback.
This is a demo table. In production, the data would be pulled from an API or database and refreshed weekly.