LLM Benchmark Table
A sleek, interactive comparison of AI models: sortable, filterable, and exportable.
| Model | TOTAL | Pass | Refine | Fail | Refusal | $/MTok | Reason | STEM | Utility | Code | Censor |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
Tip: Click a model row to expand details. Use the filters and column toggles to customize the view.
FAQ
What do the columns mean?
Model: model name. TOTAL: number of evaluated prompts. Pass/Refine/Fail/Refusal: outcome counts or percentages. $/MTok: approximate relative cost per million tokens. Reason/STEM/Utility/Code: category scores (0–100). Censor: percentage of prompts that triggered censorship/guardrails.
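The columns map naturally onto one object per table row. A minimal sketch of such a row type, with hypothetical field names (the demo's script may use different ones):

```ts
// Hypothetical shape of one benchmark row, mirroring the columns above.
// Field names are illustrative; match them to the demo's actual script.
interface BenchmarkRow {
  model: string;       // model name
  total: number;       // number of evaluated prompts
  pass: number;        // successful responses (count or %)
  refine: number;      // partial successes needing user iteration
  fail: number;        // failed responses
  refusal: number;     // refused responses
  costPerMTok: number; // approximate relative cost per million tokens
  reason: number;      // category score, 0-100
  stem: number;        // category score, 0-100
  utility: number;     // category score, 0-100
  code: number;        // category score, 0-100
  censor: number;      // percent of prompts that triggered guardrails
}
```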
How is the Pass rate calculated?
Pass is computed as (successful responses / TOTAL) * 100. Refine indicates partial success requiring user iteration.
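As a quick sanity check, the same formula in code, with made-up numbers:

```ts
// Pass rate as defined above: (successful responses / TOTAL) * 100.
// Guards against TOTAL = 0 to avoid division by zero.
function passRate(pass: number, total: number): number {
  return total > 0 ? (pass / total) * 100 : 0;
}

// Example: 87 passing responses out of 120 prompts.
console.log(passRate(87, 120)); // 72.5
```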
Can I export the data?
Yes — use the Export button to download visible rows as CSV. Hidden columns are omitted from the export.
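A minimal sketch of how such an export could be assembled, assuming a `rows` array of the filtered data and a `visibleCols` list of the non-hidden column keys (both names are assumptions, not the demo's actual API):

```ts
// Build a CSV string from the visible rows and columns only.
// Values are quoted and embedded quotes doubled, per RFC 4180.
function toCsv(rows: Record<string, unknown>[], visibleCols: string[]): string {
  const escape = (v: unknown) => `"${String(v).replace(/"/g, '""')}"`;
  const header = visibleCols.map(escape).join(",");
  const body = rows.map(r => visibleCols.map(c => escape(r[c])).join(","));
  return [header, ...body].join("\n");
}

// Usage: const csv = toCsv(filteredRows, ["model", "total", "pass"]);
```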
Is this data live?
No — the demo uses sample data. Replace the array in the script with real benchmark results to display your dataset.
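For example, a replacement array might look like this, reusing the hypothetical `BenchmarkRow` shape sketched above (all values are placeholders, not real benchmark results):

```ts
// Placeholder row only; swap in your own results.
const data: BenchmarkRow[] = [
  {
    model: "example-model-a",
    total: 120,                                  // evaluated prompts
    pass: 87, refine: 18, fail: 12, refusal: 3,  // outcome counts (sum to total)
    costPerMTok: 1.2,                            // assumed relative cost unit
    reason: 74, stem: 68, utility: 81, code: 70, // category scores, 0-100
    censor: 2.5,                                 // percent of guardrail triggers
  },
];
```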