🦾 LLM Benchmark Table

Model	TOTAL	Pass	Refine	Fail	Refusal	$ mToK	Reason	STEM	Utility	Code	Censor
GPT-4	95.2		12	3	5	$0.03	92	96	94	89	Medium
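
The table is tab-separated, and the GPT-4 row has an empty Pass cell, so splitting on whitespace would shift every later value one column to the left. A minimal sketch of reading the row while keeping the columns aligned is shown below. The helper name `parse_row` is hypothetical, and treating an empty cell as `None` is an assumption, not something the table specifies.

```python
# Header and row copied from the table above; fields are separated by tabs.
HEADER = "Model\tTOTAL\tPass\tRefine\tFail\tRefusal\t$ mToK\tReason\tSTEM\tUtility\tCode\tCensor"
ROW = "GPT-4\t95.2\t\t12\t3\t5\t$0.03\t92\t96\t94\t89\tMedium"


def parse_row(header: str, row: str) -> dict:
    """Pair each column name with its cell (hypothetical helper).

    Splitting on a literal tab keeps empty cells in place, so a missing
    value does not shift the columns that follow it.
    """
    names = header.split("\t")
    cells = row.split("\t")
    if len(names) != len(cells):
        raise ValueError(f"expected {len(names)} cells, got {len(cells)}")
    # Assumption: an empty cell means "no value reported", stored as None.
    return {name: (cell if cell else None) for name, cell in zip(names, cells)}


record = parse_row(HEADER, ROW)
print(record["Pass"])    # None (the cell is empty in the source)
print(record["Refine"])  # 12
print(record["Censor"])  # Medium
```

Values are kept as strings here because the columns mix numbers, a currency amount, and a text label.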

FAQ

What do these metrics mean?

TOTAL: Overall performance score combining all metrics...