This interactive table blends outcome rates (Pass / Refine / Fail / Refusal), price ($ mToK), and category scores (Reason, STEM, Utility, Code, Censor). Use the controls to build a short-list and export it in one click.
How to read the table, what the numbers mean, and how the view works.
Think of Pass, Refine, Fail, and Refusal as outcome rates across a benchmark suite.
Higher Pass is good; higher Refusal may be desirable in safety contexts, but can reduce utility in general use.
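As a rough illustration of what an outcome rate means, the sketch below turns a list of per-task outcome labels into the four rates. It assumes each benchmark task ends in exactly one of the four outcomes, so the rates sum to 1; the function name `outcome_rates` and the sample data are hypothetical and not taken from the table's dataset.

```python
from collections import Counter

OUTCOMES = ("Pass", "Refine", "Fail", "Refusal")

def outcome_rates(results):
    """Return the share of tasks that ended in each outcome (assumed: one label per task)."""
    if not results:
        raise ValueError("results must not be empty")
    counts = Counter(results)
    total = len(results)
    # Outcomes that never occurred get a rate of 0.0.
    return {name: counts.get(name, 0) / total for name in OUTCOMES}

# Hypothetical run log: 10 tasks, invented purely for illustration.
sample = ["Pass"] * 6 + ["Refine"] * 2 + ["Fail"] + ["Refusal"]
rates = outcome_rates(sample)
```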
$ mToK is a simple cost proxy: dollars per million tokens (mToK = million tokens). Lower is cheaper. Use it alongside TOTAL to find models that offer good value.
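One possible way to combine price and TOTAL is a score-per-dollar ratio. The sketch below is only one reading of "value"; the table does not prescribe a formula, and the model names, scores, and prices are invented for illustration.

```python
def value_per_dollar(total, price_per_mtok):
    """TOTAL points per dollar of million-token price (an assumed value metric)."""
    if price_per_mtok <= 0:
        raise ValueError("price must be positive")
    return total / price_per_mtok

# Hypothetical rows; none of these numbers come from the real table.
models = [
    {"name": "model-a", "total": 80.0, "price": 10.0},
    {"name": "model-b", "total": 60.0, "price": 2.0},
    {"name": "model-c", "total": 70.0, "price": 5.0},
]

# Best value first.
ranked = sorted(
    models,
    key=lambda m: value_per_dollar(m["total"], m["price"]),
    reverse=True,
)
```

With this metric a cheap mid-scoring model can outrank an expensive top scorer, so it is worth checking TOTAL on its own as well.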
In this demo dataset, TOTAL is a composite score designed for comparison: it combines the category scores (Reason/STEM/Utility/Code/Censor) with the outcome rates in a balanced way. It’s not a universal standard.
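The exact formula behind TOTAL is not given here. Purely as a sketch of what a balanced composite could look like, the example below averages the five category scores and blends that average with the Pass rate. The 0-100 score scale, the use of Pass alone, and the 50/50 weighting are all assumptions, not the dataset's actual method.

```python
from statistics import mean

CATEGORIES = ("Reason", "STEM", "Utility", "Code", "Censor")

def composite_total(row, category_weight=0.5):
    """Blend the mean category score with the Pass rate (hypothetical weighting)."""
    category_part = mean(row[c] for c in CATEGORIES)  # assumed 0-100 scale
    outcome_part = 100.0 * row["Pass"]                # assumed fraction, rescaled to 0-100
    return category_weight * category_part + (1.0 - category_weight) * outcome_part

# Hypothetical row, invented for illustration.
row = {"Reason": 80, "STEM": 70, "Utility": 90, "Code": 60, "Censor": 50, "Pass": 0.6}
total = composite_total(row)
```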
Tip: Click any column header to sort; click again to reverse. Use Columns to tailor your view.
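The click-to-sort behavior in the tip can be modeled as a small piece of state: the active column plus a direction flag. The sketch below assumes the first click sorts ascending and a repeat click on the same column reverses it; the actual initial direction in the table may differ, and `click_header` and `apply_sort` are hypothetical names.

```python
def click_header(state, column):
    """Return the new (column, descending) sort state after a header click."""
    if state is not None and state[0] == column:
        # Same column clicked again: reverse the direction.
        return (column, not state[1])
    # New column: start ascending (assumed default).
    return (column, False)

def apply_sort(rows, state):
    """Return rows ordered according to the current sort state."""
    column, descending = state
    return sorted(rows, key=lambda r: r[column], reverse=descending)

# Hypothetical rows, invented for illustration.
rows = [{"name": "a", "TOTAL": 70}, {"name": "b", "TOTAL": 90}, {"name": "c", "TOTAL": 80}]

state = click_header(None, "TOTAL")   # first click: ascending
first = [r["name"] for r in apply_sort(rows, state)]
state = click_header(state, "TOTAL")  # second click: reversed
second = [r["name"] for r in apply_sort(rows, state)]
```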
Keyboard shortcuts:
/ focus search
Esc clear search (or close menus)
T toggle theme
? show shortcuts (this hint)