LLM Benchmark Table

Model    TOTAL  Pass  Refine  Fail  Refusal  $/MTok  Reason  STEM  Utility  Code  Censor
Model A  100    80    10      5     5        $0.10   High    90    85       88    Low
Model B  95     75    12      6     2        $0.12   Medium  85    80       82    Medium
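In both rows, TOTAL equals the sum of the four outcome counts (Pass + Refine + Fail + Refusal: 80+10+5+5 = 100 and 75+12+6+2 = 95). A minimal Python sketch of that consistency check, using hypothetical field names (`BenchmarkRow`, `counts_consistent` are illustrative, not part of any published schema):

```python
from dataclasses import dataclass

@dataclass
class BenchmarkRow:
    model: str
    total: int
    passed: int           # "Pass": answered correctly on the first attempt
    refine: int           # "Refine": correct after refinement
    fail: int             # "Fail": incorrect
    refusal: int          # "Refusal": declined to answer
    cost_per_mtok: float  # "$/MTok": dollars per million tokens

    def counts_consistent(self) -> bool:
        # TOTAL should equal the sum of the four outcome counts.
        return self.total == self.passed + self.refine + self.fail + self.refusal

rows = [
    BenchmarkRow("Model A", 100, 80, 10, 5, 5, 0.10),
    BenchmarkRow("Model B", 95, 75, 12, 6, 2, 0.12),
]

for row in rows:
    assert row.counts_consistent(), f"{row.model}: outcome counts do not sum to TOTAL"
```

A check like this is useful when ingesting new benchmark rows, since a mismatched TOTAL usually signals a transcription error.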

FAQ

What is the purpose of this table?

This table compares LLMs across outcome counts (Pass, Refine, Fail, Refusal), cost per million tokens, and per-domain scores (Reason, STEM, Utility, Code), plus a censorship rating.

How is the data collected?

Data is collected through standardized benchmarks and real-world testing scenarios.

Can I contribute to the data?

Yes, contributions are welcome. Please contact us for more details.