LLM Benchmark table

A sleek, interactive comparison of AI model performance

Summary cards

- Models compared: distinct model entries in the table.
- Average pass rate (%): unweighted average of Pass/TOTAL across models.
- Median $ per 1M tokens: lower may indicate better cost-efficiency.
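For reference, a minimal sketch of how these two summary figures can be computed from the table rows. The field names (pass, total, costPerMTok) are assumptions about the row shape, not the app's actual schema:

  // Illustrative only: field names are placeholders for the real row keys.
  const avgPassRate = rows =>
    (rows.reduce((sum, r) => sum + r.pass / r.total, 0) / rows.length) * 100;

  const medianCost = rows => {
    const costs = rows.map(r => r.costPerMTok).sort((a, b) => a - b);
    const mid = Math.floor(costs.length / 2);
    // Even count: average the two middle values; odd count: take the middle.
    return costs.length % 2 ? costs[mid] : (costs[mid - 1] + costs[mid]) / 2;
  };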
Interactive Data Table
Sort: click headers • Column menu to show/hide • Export filtered rows
Columns: Model, TOTAL, Pass, Refine, Fail, Refusal, $/MTok, Reason, STEM, Utility, Code, Censor, Pass %

FAQ

What is this site?
LLM Benchmark table is an interactive, single‑file web app for comparing AI model performance across categories. Use search, filters, sorting, and column visibility to explore the data. Export your current view to CSV.
Where do the numbers come from?
The dataset here is sample/demo data to showcase the interface. Replace it with your own measurements to use this as a real dashboard. You can modify the data array in the script section of this HTML file.
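As a rough illustration, a row in that array might mirror the table columns as shown below. Every key name here is an assumption; match them to whatever the script actually declares (Pass % need not be stored, since it can be derived as pass/total):

  // Hypothetical row shape mirroring the table columns; keys are placeholders.
  const data = [
    {
      model: "example-model-v1",   // display name
      total: 100,                  // TOTAL prompts attempted
      pass: 80,                    // outcome counts
      refine: 10,
      fail: 8,
      refusal: 2,
      costPerMTok: 1.25,           // $ per 1M tokens
      reason: 82, stem: 78, utility: 85, code: 74,  // category scores
      censor: 3                    // Censor %
    },
    // ...more rows
  ];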
How do I filter and sort?
- Search bar: filters model names in real time.
- Filters: open the Filters panel to constrain by pass rate, cost, code score, or censor percentage.
- Sorting: click a column header to sort; click again to reverse. Hold Shift to multi-sort by several columns (see the sketch after this list).
- Columns: use the Columns menu to show/hide any column; your choices persist locally.
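A minimal sketch of how Shift multi-sort can work, assuming sortKeys is an ordered list built from header clicks; the names are illustrative, not the app's actual code:

  // Compare by each selected column in order, falling through to the
  // next key on ties, e.g. sortKeys = [{ key: "pass", dir: -1 }].
  function multiSort(rows, sortKeys) {
    return [...rows].sort((a, b) => {
      for (const { key, dir } of sortKeys) {
        if (a[key] < b[key]) return -dir;
        if (a[key] > b[key]) return dir;
      }
      return 0;
    });
  }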
Does it support dark mode?
Yes. Toggle the switch in the header. Your preference is remembered in your browser.
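One common pattern for this, not necessarily this app's exact code: keep the choice in localStorage and re-apply it on the next page load.

  // Restore the saved theme, if any, when the page loads.
  const saved = localStorage.getItem("theme");
  if (saved) document.documentElement.dataset.theme = saved;

  // Flip between dark and light and persist the new choice.
  function toggleTheme() {
    const current = document.documentElement.dataset.theme;
    const next = current === "dark" ? "light" : "dark";
    document.documentElement.dataset.theme = next;
    localStorage.setItem("theme", next);
  }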
Can I export the data?
Click Export to download a CSV of the currently filtered and sorted rows.
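A sketch of how a client-side CSV download like this can be built; the function and parameter names are placeholders, not the app's real identifiers:

  // Quote every cell and double any embedded quotes, per CSV convention.
  function exportCsv(rows, columns, filename = "export.csv") {
    const esc = v => `"${String(v).replace(/"/g, '""')}"`;
    const lines = [
      columns.map(esc).join(","),
      ...rows.map(r => columns.map(c => esc(r[c])).join(",")),
    ];
    // Build a Blob and trigger a download via a temporary link.
    const blob = new Blob([lines.join("\n")], { type: "text/csv" });
    const a = document.createElement("a");
    a.href = URL.createObjectURL(blob);
    a.download = filename;
    a.click();
    URL.revokeObjectURL(a.href);
  }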