Model Performance Comparison

Model | TOTAL | Pass | Refine | Fail | Refusal | $ mToK | Reason | STEM | Utility | Code | Censor
High Performance (≥80%)
Medium Performance (50-79%)
Low Performance (<50%)
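
The three bands above amount to a simple threshold rule. The following is a minimal sketch in Python, not the page's own implementation. It assumes the bands apply to a percentage score between 0 and 100 (the source does not say which column they describe) and that fractional values between 79 and 80 fall into the medium band. The function name `performance_tier` is hypothetical.

```python
def performance_tier(score_percent: float) -> str:
    """Map a percentage score to one of the three legend bands."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score_percent must be between 0 and 100")
    if score_percent >= 80:
        return "High"
    if score_percent >= 50:
        # Assumption: values strictly between 79 and 80 count as Medium.
        return "Medium"
    return "Low"
```

For example, `performance_tier(80)` falls in the high band, while `performance_tier(49.9)` falls in the low band.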

Frequently Asked Questions