LLM Benchmark table

A sleek, interactive comparison of AI model performance

Summary cards

- Models compared: distinct model entries in the table.
- Average pass rate (%): unweighted average of Pass/TOTAL across models.
- Median $ per 1M tokens: lower may indicate better cost-efficiency.
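For reference, a minimal sketch of how these two summary figures can be computed from the table rows. The field names (pass, total, costPerMTok) are assumptions about the row shape, not the app's actual schema:

  // Illustrative only: field names are placeholders for the real row keys.
  const avgPassRate = rows =>
    (rows.reduce((sum, r) => sum + r.pass / r.total, 0) / rows.length) * 100;

  const medianCost = rows => {
    const costs = rows.map(r => r.costPerMTok).sort((a, b) => a - b);
    const mid = Math.floor(costs.length / 2);
    // Even count: average the two middle values; odd count: take the middle.
    return costs.length % 2 ? costs[mid] : (costs[mid - 1] + costs[mid]) / 2;
  };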
Interactive Data Table
Sort: click headers • Column menu to show/hide • Export filtered rows
Columns: Model, TOTAL, Pass, Refine, Fail, Refusal, $/MTok, Reason, STEM, Utility, Code, Censor, Pass %

FAQ

What is this site?
LLM Benchmark table is an interactive, single‑file web app for comparing AI model performance across categories. Use search, filters, sorting, and column visibility to explore the data. Export your current view to CSV.
Where do the numbers come from?
The dataset here is sample/demo data to showcase the interface. Replace it with your own measurements to use this as a real dashboard. You can modify the data array in the script section of this HTML file.
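As a rough illustration, a row in that array might mirror the table columns as shown below. Every key name here is an assumption; match them to whatever the script actually declares (Pass % need not be stored, since it can be derived as pass/total):

  // Hypothetical row shape mirroring the table columns; keys are placeholders.
  const data = [
    {
      model: "example-model-v1",   // display name
      total: 100,                  // TOTAL prompts attempted
      pass: 80,                    // outcome counts
      refine: 10,
      fail: 8,
      refusal: 2,
      costPerMTok: 1.25,           // $ per 1M tokens
      reason: 82, stem: 78, utility: 85, code: 74,  // category scores
      censor: 3                    // Censor %
    },
    // ...more rows
  ];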
How do I filter and sort?
- Search bar: filters model names in real time.
- Filters: open the Filters panel to constrain by pass rate, cost, code score, or censor percentage.
- Sorting: click a column header to sort; click again to reverse. Hold Shift to multi-sort by several columns (see the sketch after this list).
- Columns: use the Columns menu to show/hide any column; your choices persist locally.
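A minimal sketch of how Shift multi-sort can work, assuming sortKeys is an ordered list built from header clicks; the names are illustrative, not the app's actual code:

  // Compare by each selected column in order, falling through to the
  // next key on ties, e.g. sortKeys = [{ key: "pass", dir: -1 }].
  function multiSort(rows, sortKeys) {
    return [...rows].sort((a, b) => {
      for (const { key, dir } of sortKeys) {
        if (a[key] < b[key]) return -dir;
        if (a[key] > b[key]) return dir;
      }
      return 0;
    });
  }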
Does it support dark mode?
Yes. Toggle the switch in the header. Your preference is remembered in your browser.
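One common pattern for this, not necessarily this app's exact code: keep the choice in localStorage and re-apply it on the next page load.

  // Restore the saved theme, if any, when the page loads.
  const saved = localStorage.getItem("theme");
  if (saved) document.documentElement.dataset.theme = saved;

  // Flip between dark and light and persist the new choice.
  function toggleTheme() {
    const current = document.documentElement.dataset.theme;
    const next = current === "dark" ? "light" : "dark";
    document.documentElement.dataset.theme = next;
    localStorage.setItem("theme", next);
  }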
Can I export the data?
Click Export to download a CSV of the currently filtered and sorted rows.
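A sketch of how a client-side CSV download like this can be built; the function and parameter names are placeholders, not the app's real identifiers:

  // Quote every cell and double any embedded quotes, per CSV convention.
  function exportCsv(rows, columns, filename = "export.csv") {
    const esc = v => `"${String(v).replace(/"/g, '""')}"`;
    const lines = [
      columns.map(esc).join(","),
      ...rows.map(r => columns.map(c => esc(r[c])).join(",")),
    ];
    // Build a Blob and trigger a download via a temporary link.
    const blob = new Blob([lines.join("\n")], { type: "text/csv" });
    const a = document.createElement("a");
    a.href = URL.createObjectURL(blob);
    a.download = filename;
    a.click();
    URL.revokeObjectURL(a.href);
  }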