Model | Total | Pass | Refine | Fail | Refusal | $ / M tokens | Reason | STEM | Utility | Code | Censor |
---|---|---|---|---|---|---|---|---|---|---|---|
Model A | 100 | 80 | 10 | 5 | 5 | $0.10 | High | 90 | 85 | 88 | Low |
Model B | 95 | 75 | 12 | 6 | 2 | $0.12 | Medium | 85 | 80 | 82 | Medium |

This table compares AI models on test outcomes (Pass, Refine, Fail, Refusal), cost, and per-category ratings (Reason, STEM, Utility, Code, Censor).
Data is collected through standardized benchmarks and real-world testing scenarios.
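
As a rough illustration of how the outcome columns can be read, the sketch below parses the two rows shown above and derives a pass rate for each model. It assumes that the total column is the sum of Pass, Refine, Fail, and Refusal, which matches both rows but is not stated anywhere in this document. The `parse_rows` and `summarize` helpers and the pass-rate metric itself are hypothetical and not part of the benchmark.

```python
# Outcome columns copied from the table above. The helpers are illustrative only.
TABLE = """\
Model | TOTAL | Pass | Refine | Fail | Refusal |
Model A | 100 | 80 | 10 | 5 | 5 |
Model B | 95 | 75 | 12 | 6 | 2 |
"""


def parse_rows(text):
    """Split a pipe-delimited table into dicts keyed by lowercase header name."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = [c.strip().lower() for c in lines[0].strip().strip("|").split("|")]
    rows = []
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.strip().strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def summarize(row):
    """Check the total against the outcome counts and compute Pass / total."""
    # Assumption: total = Pass + Refine + Fail + Refusal.
    outcomes = sum(int(row[k]) for k in ("pass", "refine", "fail", "refusal"))
    total = int(row["total"])
    return {
        "model": row["model"],
        "consistent": outcomes == total,
        "pass_rate": int(row["pass"]) / total,
    }


summaries = [summarize(r) for r in parse_rows(TABLE)]
for s in summaries:
    print(f"{s['model']}: consistent={s['consistent']}, pass rate={s['pass_rate']:.1%}")
```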

Contributions are welcome. Please contact us for details.