Model | TOTAL | Pass | Refine | Fail | Refusal | $ mToK | Reason | STEM | Utility | Code | Censor |
---|---|---|---|---|---|---|---|---|---|---|---|
GPT-4 | 100 | 90 | 5 | 3 | 2 | 1.2 | 0.80 | 0.75 | 0.85 | 0.78 | 0.55 |
GPT-4 Turbo | 100 | 88 | 6 | 4 | 2 | 1.1 | 0.78 | 0.72 | 0.82 | 0.75 | 0.50 |
GPT-3.5 Turbo | 100 | 75 | 10 | 10 | 5 | 0.9 | 0.65 | 0.60 | 0.70 | 0.68 | 0.45 |
FAQ

**What does the TOTAL column represent?**
The TOTAL column is the total number of benchmark questions evaluated for each model; the Pass, Refine, Fail, and Refusal counts sum to it (for GPT-4: 90 + 5 + 3 + 2 = 100).

**How do I sort the table?**
Click any column header to sort the rows ascending or descending by that column.
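A minimal sketch of the kind of click-to-sort handler behind this, assuming each table row is a plain object keyed by column name; the `Row` shape and `sortRows` helper are illustrative, not this page's actual code:

```typescript
// Illustrative row shape mirroring the table's columns (field names are assumptions).
interface Row {
  model: string;
  total: number;
  pass: number;
  refine: number;
  fail: number;
  refusal: number;
  [column: string]: string | number;
}

// Return a copy of the rows sorted by one column:
// numbers compare numerically, everything else lexicographically.
function sortRows(rows: Row[], column: string, ascending: boolean): Row[] {
  return [...rows].sort((a, b) => {
    const av = a[column];
    const bv = b[column];
    const cmp =
      typeof av === "number" && typeof bv === "number"
        ? av - bv
        : String(av).localeCompare(String(bv));
    return ascending ? cmp : -cmp;
  });
}

// Example: clicking the "Pass" header a second time might call
// sortRows(rows, "pass", false) to flip to descending order.
```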
**What is $ mToK?**
“$ mToK” is the cost column for this example data, commonly read as US dollars per million tokens; adjust the column name and definition to match your own data.

**How do I switch to dark mode?**
Use the moon/sun icon in the top-right corner to toggle between light and dark mode. Your preference is saved in localStorage.
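A hedged sketch of how such a toggle is commonly wired up; the `theme` localStorage key and the `data-theme` attribute are assumptions, not necessarily what this page uses:

```typescript
// Flip between light and dark mode and persist the choice.
// The "theme" key and data-theme attribute are illustrative assumptions.
function toggleTheme(): void {
  const next =
    document.documentElement.getAttribute("data-theme") === "dark"
      ? "light"
      : "dark";
  document.documentElement.setAttribute("data-theme", next);
  localStorage.setItem("theme", next);
}

// On page load, restore the saved preference (defaulting to light).
function applySavedTheme(): void {
  const saved = localStorage.getItem("theme") ?? "light";
  document.documentElement.setAttribute("data-theme", saved);
}
```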
**Can I search/filter?**
Yes. Type in the search box above the table to filter the rows to those containing the matching text in any column.
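A small sketch of case-insensitive substring filtering across every cell, assuming the same plain-object rows as above; `filterRows` is an illustrative name:

```typescript
// Keep only rows where at least one cell contains the query (case-insensitive).
function filterRows<T extends Record<string, string | number>>(
  rows: T[],
  query: string
): T[] {
  const q = query.trim().toLowerCase();
  if (q === "") return rows; // empty query shows everything
  return rows.filter((row) =>
    Object.values(row).some((cell) => String(cell).toLowerCase().includes(q))
  );
}

// Example: filterRows(rows, "turbo") would keep the GPT-4 Turbo and GPT-3.5 Turbo rows.
```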