Model | TOTAL | Pass | Refine | Fail | Refusal | $/mTok | Reason | STEM | Utility | Code | Censor |
---|---|---|---|---|---|---|---|---|---|---|---|
Model A | 100 | 50 | 20 | 15 | 15 | 0.05 | Good accuracy | 85% | 90% | 95% | Low |
Model B | 100 | 40 | 30 | 20 | 10 | 0.06 | High utility | 80% | 92% | 88% | Medium |
The LLM Benchmark Table compares the performance of AI language models across multiple metrics.
TOTAL is the number of benchmark tasks; Pass, Refine, Fail, and Refusal count how each task's outcome was classified; $/mTok is the cost in US dollars per million tokens; Reason is a brief qualitative note on the result; STEM, Utility, and Code are per-category scores; and Censor indicates the model's censorship level.
You can sort and filter the table to compare models on the metrics that matter for your use case, as shown in the sketch below.
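To illustrate one way to work with the table programmatically, here is a minimal sketch that parses the markdown table and ranks models by a chosen percentage column. The filename `benchmark.md`, the helper names `parse_table` and `rank_by`, and the parsing approach are assumptions for illustration, not part of the tool itself.

```python
# Minimal sketch: parse the markdown benchmark table and rank models
# by a chosen metric. The filename "benchmark.md" is a placeholder.

def parse_table(path: str) -> list[dict[str, str]]:
    """Read a pipe-delimited markdown table into a list of row dicts."""
    with open(path, encoding="utf-8") as f:
        lines = [ln.strip() for ln in f if "|" in ln]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the --- separator row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        if len(cells) == len(header):  # ignore stray lines containing "|"
            rows.append(dict(zip(header, cells)))
    return rows

def rank_by(rows: list[dict[str, str]], column: str) -> list[dict[str, str]]:
    """Sort rows descending by a percentage column such as 'STEM'."""
    return sorted(rows, key=lambda r: float(r[column].rstrip("%")), reverse=True)

if __name__ == "__main__":
    models = parse_table("benchmark.md")
    for row in rank_by(models, "STEM"):
        print(f"{row['Model']}: STEM {row['STEM']}, Code {row['Code']}")
```

Run against the table above, this prints Model A first (STEM 85%) and Model B second (STEM 80%); swapping `"STEM"` for any other percentage column such as `"Utility"` or `"Code"` re-ranks the models accordingly.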