LLM Benchmark Table

Model    TOTAL  Pass  Refine  Fail  Refusal  $/MTok  Reason  STEM  Utility  Code  Censor
Model A  100    80    10      5     5        $0.10   High    90    85       88    Low
Model B  95     75    12      6     2        $0.12   Medium  85    80       82    Medium
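In both rows, TOTAL equals the sum of the four outcome counts (Pass + Refine + Fail + Refusal: 80+10+5+5 = 100 and 75+12+6+2 = 95). A minimal Python sketch of that consistency check, using hypothetical field names (`BenchmarkRow`, `counts_consistent` are illustrative, not part of any published schema):

```python
from dataclasses import dataclass

@dataclass
class BenchmarkRow:
    model: str
    total: int
    passed: int           # "Pass": answered correctly on the first attempt
    refine: int           # "Refine": correct after refinement
    fail: int             # "Fail": incorrect
    refusal: int          # "Refusal": declined to answer
    cost_per_mtok: float  # "$/MTok": dollars per million tokens

    def counts_consistent(self) -> bool:
        # TOTAL should equal the sum of the four outcome counts.
        return self.total == self.passed + self.refine + self.fail + self.refusal

rows = [
    BenchmarkRow("Model A", 100, 80, 10, 5, 5, 0.10),
    BenchmarkRow("Model B", 95, 75, 12, 6, 2, 0.12),
]

for row in rows:
    assert row.counts_consistent(), f"{row.model}: outcome counts do not sum to TOTAL"
```

A check like this is useful when ingesting new benchmark rows, since a mismatched TOTAL usually signals a transcription error.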

FAQ

What is the purpose of this table?

This table compares LLMs across outcome counts (Pass, Refine, Fail, Refusal), cost per million tokens, and per-domain scores (Reason, STEM, Utility, Code), plus a censorship rating.

How is the data collected?

Data is collected through standardized benchmarks and real-world testing scenarios.

Can I contribute to the data?

Yes, contributions are welcome. Please contact us for more details.