Model | TOTAL | Pass | Refine | Fail | Refusal | $mToK | Reason | STEM | Utility | Code | Censor |
---|---|---|---|---|---|---|---|---|---|---|---|
The TOTAL score is a composite indicator derived from several primary metrics (Pass, Refine, Fail, Refusal, and $mToK) that summarizes the reliability, usefulness, and safety of an LLM. Higher is generally better; the score is shown out of 100 for quick comparison.
Each numeric column shows a bar representing the value as a percent of the observed maximum for that column in the current view. This provides a quick visual sense of relative performance among models in the filtered/sorted state.
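As a rough illustration of the bar scaling described above, the sketch below computes each bar's length as a percentage of the largest value among the rows currently shown. The function name `barPercents` is hypothetical, and the zero and empty-view handling is an assumption; the page's actual implementation is not given here.

```typescript
// Hypothetical helper: the page's real implementation is not shown in the source.
// Returns each value as a percentage of the largest value in the visible rows.
function barPercents(visibleValues: number[]): number[] {
  const max = Math.max(0, ...visibleValues);
  // Assumption: an empty view or an all-zero column yields zero-length bars.
  if (max <= 0) return visibleValues.map(() => 0);
  // Negative values are clamped to 0 so a bar never has negative length.
  return visibleValues.map((v) => Math.max(0, (v / max) * 100));
}

const widths = barPercents([40, 80, 20]);
// widths is [50, 100, 25]
```

Because the maximum is taken over the visible rows only, filtering out the top row would rescale every remaining bar in that column.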
You can export the table: click "Export CSV" to download the current view as a comma-separated file. The export reflects any filters you have applied.
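One way an export of the current view might work is sketched below: only the rows that survive the active filter are serialized, and fields are quoted in the usual CSV manner. The `Row` type, the helper names, and the sample rows are all made-up placeholders for illustration, not data or code from the page.

```typescript
// Hypothetical row shape; only two of the table's columns are used here.
type Row = Record<string, string | number>;

// Quote a field when it contains a comma, a double quote, or a line break.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize only the rows that are currently visible, in their current order.
function toCsv(columns: string[], visibleRows: Row[]): string {
  const header = columns.map(csvField).join(",");
  const lines = visibleRows.map((row) =>
    columns.map((c) => csvField(row[c] ?? "")).join(",")
  );
  return [header, ...lines].join("\r\n");
}

// Placeholder rows, not real leaderboard results.
const allRows: Row[] = [
  { Model: "model-a", TOTAL: 71.5 },
  { Model: "model-b, preview", TOTAL: 64 },
];
const visible = allRows.filter((r) => Number(r.TOTAL) > 70); // stand-in for a user filter
const csv = toCsv(["Model", "TOTAL"], visible);
```

In a browser, the resulting string could then be wrapped in a `Blob` and offered as a download; that step is omitted here.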
Dark mode preference is saved in your browser's localStorage and persists across visits. Use the toggle to switch themes.
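A minimal sketch of how such a preference could be persisted follows. The key name `"theme"`, the stored values, and the light default are assumptions; the page's real key and defaults are not stated. The store is passed in as a parameter so that `window.localStorage` or a test stub can be used interchangeably.

```typescript
// Assumed key name and values; the page's actual key is not given in the source.
const THEME_KEY = "theme";
type Theme = "light" | "dark";

// The two storage methods used here; window.localStorage satisfies this shape.
interface ThemeStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Read the saved preference, falling back to light when nothing valid is stored.
function loadTheme(store: ThemeStore): Theme {
  return store.getItem(THEME_KEY) === "dark" ? "dark" : "light";
}

// Flip the theme, persist it, and return the new value.
function toggleTheme(store: ThemeStore): Theme {
  const next: Theme = loadTheme(store) === "dark" ? "light" : "dark";
  store.setItem(THEME_KEY, next);
  return next;
}
```

In a browser, calling `toggleTheme(window.localStorage)` from the toggle's click handler and `loadTheme(window.localStorage)` on page load would give the persistence across visits described above.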