| Model | Total | Pass | Refine | Fail | Refusal | $/mTok | Reason | STEM | Utility | Code | Censor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepSeek-V2-Chat | 100 | 87 | 13 | 0 | 0 | 0.00000001 | 85 | 84 | 90 | 88 | 95 |
| GPT-4o | 100 | 86 | 14 | 0 | 0 | 0.0000001 | 85 | 83 | 89 | 87 | 94 |
| GPT-4-Turbo | 100 | 85 | 15 | 0 | 0 | 0.0000001 | 84 | 82 | 88 | 86 | 93 |
| Claude-3-Opus | 100 | 84 | 16 | 0 | 0 | 0.0000001 | 83 | 81 | 87 | 85 | 92 |
| Claude-3-Sonnet | 100 | 83 | 17 | 0 | 0 | 0.00000003 | 82 | 80 | 86 | 84 | 91 |
| Gemini-1.5-Pro | 100 | 82 | 18 | 0 | 0 | 0.00000007 | 81 | 79 | 85 | 83 | 90 |
| Gemini-1.0-Ultra | 100 | 81 | 19 | 0 | 0 | 0.0000001 | 80 | 78 | 84 | 82 | 89 |
| Mistral-Large | 100 | 80 | 20 | 0 | 0 | 0.00000001 | 79 | 77 | 83 | 81 | 88 |
| Command-R+ | 100 | 79 | 21 | 0 | 0 | 0.00000001 | 78 | 76 | 82 | 80 | 87 |
| Command-R | 100 | 78 | 22 | 0 | 0 | 0.00000001 | 77 | 75 | 81 | 79 | 86 |
| GPT-3.5-Turbo | 100 | 77 | 23 | 0 | 0 | 0.000000005 | 76 | 74 | 80 | 78 | 85 |
| Claude-3-Haiku | 100 | 76 | 24 | 0 | 0 | 0.0000000025 | 75 | 73 | 79 | 77 | 84 |
| Gemini-1.5-Flash | 100 | 75 | 25 | 0 | 0 | 0.0000000035 | 74 | 72 | 78 | 76 | 83 |
| Gemini-1.0-Pro | 100 | 74 | 26 | 0 | 0 | 0.000000001 | 73 | 71 | 77 | 75 | 82 |
| Llama-3-70B-Instruct | 100 | 73 | 27 | 0 | 0 | 0.0000000007 | 72 | 70 | 76 | 74 | 81 |
| Llama-3-8B-Instruct | 100 | 72 | 28 | 0 | 0 | 0.0000000002 | 71 | 69 | 75 | 73 | 80 |
| Mixtral-8x7B-Instruct | 100 | 71 | 29 | 0 | 0 | 0.0000000002 | 70 | 68 | 74 | 72 | 79 |
| Mistral-Medium | 100 | 70 | 30 | 0 | 0 | 0.0000000001 | 69 | 67 | 73 | 71 | 78 |
| Mistral-Small | 100 | 69 | 31 | 0 | 0 | 0.00000000002 | 68 | 66 | 72 | 70 | 77 |
| Mistral-Tiny | 100 | 68 | 32 | 0 | 0 | 0.00000000002 | 67 | 65 | 71 | 69 | 76 |
| Qwen2-72B-Instruct | 100 | 67 | 33 | 0 | 0 | 0.00000000002 | 66 | 64 | 70 | 68 | 75 |
| Qwen1.5-72B-Chat | 100 | 66 | 34 | 0 | 0 | 0.00000000002 | 65 | 63 | 69 | 67 | 74 |
| Qwen1.5-14B-Chat | 100 | 65 | 35 | 0 | 0 | 0.00000000002 | 64 | 62 | 68 | 66 | 73 |
| Qwen1.5-7B-Chat | 100 | 64 | 36 | 0 | 0 | 0.00000000002 | 63 | 61 | 67 | 65 | 72 |
| Qwen1.5-1.8B-Chat | 100 | 63 | 37 | 0 | 0 | 0.00000000002 | 62 | 60 | 66 | 64 | 71 |
| Qwen1.5-0.5B-Chat | 100 | 62 | 38 | 0 | 0 | 0.00000000002 | 61 | 59 | 65 | 63 | 70 |
| Phi-3-mini-4k-instruct | 100 | 61 | 39 | 0 | 0 | 0.00000000002 | 60 | 58 | 64 | 62 | 69 |
| Phi-3-mini-128k-instruct | 100 | 60 | 40 | 0 | 0 | 0.00000000002 | 59 | 57 | 63 | 61 | 68 |
| Phi-2 | 100 | 59 | 41 | 0 | 0 | 0.00000000002 | 58 | 56 | 62 | 60 | 67 |
| DeepSeek-7B-Chat | 100 | 58 | 42 | 0 | 0 | 0.00000000002 | 57 | 55 | 61 | 59 | 66 |
| DeepSeek-67B-Chat | 100 | 57 | 43 | 0 | 0 | 0.00000000002 | 56 | 54 | 60 | 58 | 65 |