LLM Benchmark Table

Sleek interactive comparison of AI models — sortable, filterable, exportable.

Columns: Model, TOTAL, Pass, Refine, Fail, Refusal, $/MTok, Reason, STEM, Utility, Code, Censor. Click headers to sort.
Tip: Click a model row to expand its details. Use the filters and column toggles to customize the view.
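
The sorting itself can be a small comparator. Below is a minimal sketch, assuming each row is a plain object keyed by column id; `sortRows`, `Row`, and the wiring to the header click handlers are hypothetical, not taken from the demo script.

```ts
// Minimal click-to-sort sketch: numbers compare numerically, everything
// else falls back to locale-aware string comparison.
type Row = Record<string, string | number>;

function sortRows(rows: Row[], key: string, ascending: boolean): Row[] {
  return [...rows].sort((a, b) => {
    const x = a[key];
    const y = b[key];
    const cmp =
      typeof x === "number" && typeof y === "number"
        ? x - y
        : String(x).localeCompare(String(y));
    return ascending ? cmp : -cmp;
  });
}
```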

FAQ

What do the columns mean?

Model: the model's name. TOTAL: the number of prompts evaluated. Pass/Refine/Fail/Refusal: outcome counts or percentages. $/MTok: an approximate cost per million tokens (relative). Reason/STEM/Utility/Code: per-category scores (0–100). Censor: the percentage of prompts that triggered censorship or guardrails.
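
For reference, here is a hypothetical row shape matching these column descriptions; the field names are illustrative and not taken from the demo script.

```ts
// One benchmark row, one field per column (names are assumptions).
interface BenchmarkRow {
  model: string;       // Model
  total: number;       // TOTAL: prompts evaluated
  pass: number;        // Pass count or %
  refine: number;      // Refine count or %
  fail: number;        // Fail count or %
  refusal: number;     // Refusal count or %
  costPerMTok: number; // $/MTok: approximate relative cost
  reason: number;      // Reason score, 0–100
  stem: number;        // STEM score, 0–100
  utility: number;     // Utility score, 0–100
  code: number;        // Code score, 0–100
  censor: number;      // Censor: % of guardrail triggers
}
```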

How is the Pass rate calculated?

Pass is computed as (successful responses / TOTAL) * 100. Refine indicates partial success requiring user iteration.
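
In code, the same formula is a one-liner; this sketch assumes `pass` and `total` are raw counts and guards against a zero TOTAL.

```ts
// Pass rate = (successful responses / TOTAL) * 100, per the FAQ above.
function passRate(pass: number, total: number): number {
  return total > 0 ? (pass / total) * 100 : 0;
}

// Example: 150 passing responses out of 200 prompts -> 75.
console.log(passRate(150, 200));
```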

Can I export the data?

Yes — use the Export button to download visible rows as CSV. Hidden columns are omitted from the export.
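
A minimal sketch of that export behavior, serializing only the visible rows and columns; `toCsv` and its parameters are hypothetical names, not the demo's actual API.

```ts
// Build a CSV string from visible rows/columns only; hidden columns are
// simply not included in visibleColumns, so they never reach the output.
function toCsv(rows: Record<string, unknown>[], visibleColumns: string[]): string {
  const escape = (v: unknown): string => {
    const s = String(v ?? "");
    // Quote fields containing commas, quotes, or newlines (RFC 4180 style).
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const header = visibleColumns.map(escape).join(",");
  const body = rows.map(r => visibleColumns.map(c => escape(r[c])).join(","));
  return [header, ...body].join("\n");
}
```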

Is this data live?

No — the demo uses sample data. Replace the array in the script with real benchmark results to display your dataset.
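
A hypothetical stand-in for that array is shown below; every value is a placeholder, not a real benchmark result, and the field names only assume the row shape sketched in the first FAQ answer.

```ts
// Replace these placeholder rows with your real benchmark results.
const rows = [
  {
    model: "example-model-v1",                    // placeholder name
    total: 200,                                   // prompts evaluated
    pass: 150, refine: 25, fail: 20, refusal: 5,  // outcome counts
    costPerMTok: 1.5,                             // $/MTok (relative)
    reason: 72, stem: 68, utility: 81, code: 64,  // category scores, 0–100
    censor: 2.5,                                  // % guardrail triggers
  },
];
```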