Performance analysis of 62 AI models in 384 chess matches
Initially, the best-performing model observed was GPT-3.5 Turbo Instruct playing White in Continuation mode. Here is a spontaneous live video demonstration:
YouTube link
In full-information chess (Reasoning mode), o1-mini showed its strength by winning the first tournament (it has since been dethroned).
1.) Every game is analyzed by Stockfish 17.1, and the resulting accuracy sets each model's initial Elo placement.¹
The initial Elo rating is determined by analyzing the first 5 games (excluding self-play) of each AI model with Stockfish 17.1 at depth 18. Accuracy is calculated using Lichess's methodology: move-by-move engine evaluations are converted to win percentages, each move is weighted by the complexity of its position, and the final accuracy score averages a weighted mean and a harmonic mean. Each mode receives its own initial placement.
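The accuracy step can be sketched in Python as below. This is a simplified sketch modeled on Lichess's open-source accuracy code, not this leaderboard's implementation, and the constants should be checked against Lichess's source before relying on them. The function names are hypothetical, and several details are assumptions: the input is a list of centipawn evaluations from White's point of view after each ply (mate scores already converted), the starting position is scored at +15 cp, evaluations are clamped to ±1000 cp, volatility uses the population standard deviation, and the harmonic mean treats any move accuracy below 1 as 1 to avoid division by zero.

```python
import math
import statistics

INITIAL_CP = 15.0  # assumption: eval of the starting position, centipawns for White


def win_percent(cp: float) -> float:
    """Convert a centipawn eval (White's view) into White's win chance, 0-100."""
    cp = max(-1000.0, min(1000.0, cp))  # assumption: clamp extreme evals
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * cp)) - 1)


def move_accuracy(win_before: float, win_after: float) -> float:
    """Accuracy (0-100) of one move from the mover's win% before and after it."""
    if win_after >= win_before:
        return 100.0
    drop = win_before - win_after
    raw = 103.1668100711649 * math.exp(-0.04354415386753951 * drop) - 3.166924740191411
    return max(0.0, min(100.0, raw + 1))  # +1: small uncertainty bonus


def game_accuracy(cps: list[float]) -> tuple[float, float]:
    """Return (white_accuracy, black_accuracy); White moves first, needs >= 2 plies."""
    wins = [win_percent(cp) for cp in [INITIAL_CP, *cps]]
    window = max(2, min(8, len(cps) // 10))
    # One window per ply: pad the opening with copies of the first window.
    windows = [wins[:window]] * (window - 2)
    windows += [wins[i:i + window] for i in range(len(wins) - window + 1)]
    # Volatile stretches (large win% swings) weigh more; weights clamped to [0.5, 12].
    weights = [max(0.5, min(12.0, statistics.pstdev(w))) for w in windows]

    per_side: dict[str, list[tuple[float, float]]] = {"white": [], "black": []}
    for ply, weight in enumerate(weights):
        before, after = wins[ply], wins[ply + 1]
        if ply % 2 == 0:  # White's move
            per_side["white"].append((move_accuracy(before, after), weight))
        else:  # Black's move: view the win% from Black's side
            per_side["black"].append((move_accuracy(100 - before, 100 - after), weight))

    def combine(moves: list[tuple[float, float]]) -> float:
        # Average of the volatility-weighted mean and the harmonic mean.
        weighted = sum(a * w for a, w in moves) / sum(w for _, w in moves)
        harmonic = len(moves) / sum(1 / max(1.0, a) for a, _ in moves)
        return (weighted + harmonic) / 2

    return combine(per_side["white"]), combine(per_side["black"])
```

For example, `game_accuracy([15, 15, -600, -600])` models White throwing away a roughly equal position on its second move, so White's accuracy comes out far below Black's. Combining the weighted mean with the harmonic mean means a single large blunder drags the score down more than a plain average would.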
Formula:
Initial_Elo = 400 + 200 × (2^((Accuracy - 30) / 20) - 1)
Where:
- Accuracy = average accuracy across the first 5 non-self-play games (%)
- Accuracy is constrained between 10% and 90%
- Human players start at 1500 Elo regardless of accuracy
- Default fallback: 1000 Elo if no accuracy data is available
Examples:
• 30% avg accuracy → Initial_Elo = 400 + 200 × (2^0 - 1) = 400
• 50% avg accuracy → Initial_Elo = 400 + 200 × (2^1 - 1) = 600
• 70% avg accuracy → Initial_Elo = 400 + 200 × (2^2 - 1) = 1000
• 90% avg accuracy → Initial_Elo = 400 + 200 × (2^3 - 1) = 1800
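The placement rule can be written as a small Python function. This is a sketch, not the project's code: the name `initial_elo` is hypothetical, and two details are assumptions, namely that the 10% to 90% bound is applied to the averaged accuracy (rather than to each game) and that the caller has already removed self-play games and passes the rest in chronological order.

```python
from typing import Optional, Sequence

HUMAN_START_ELO = 1500.0  # humans start here regardless of accuracy
FALLBACK_ELO = 1000.0     # used when no accuracy data is available


def initial_elo(accuracies: Optional[Sequence[float]], is_human: bool = False) -> float:
    """Initial Elo from per-game accuracies (%) of non-self-play games, oldest first."""
    if is_human:
        return HUMAN_START_ELO
    if not accuracies:
        return FALLBACK_ELO
    first_five = list(accuracies)[:5]
    acc = sum(first_five) / len(first_five)
    acc = max(10.0, min(90.0, acc))  # assumption: clamp the average, not each game
    # Every +20 points of accuracy doubles the (Elo - 200) term.
    return 400 + 200 * (2 ** ((acc - 30) / 20) - 1)


for acc in (30, 50, 70, 90):
    print(acc, initial_elo([acc]))  # 400.0, 600.0, 1000.0, 1800.0
```

Running it on the four example accuracies reproduces the values listed above. Because of the clamp, the lowest possible AI placement is 300 (at 10% accuracy) and the highest is 1800.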
2.) Then, Elo ratings are updated automatically after each game, separately for each mode and for the mixed rating.²
After initial placement, ratings are updated after each AI vs AI game using the standard Elo rating system. Each mode keeps its own separate Elo rating.
Update Formula:
New_Elo = Old_Elo + K × (Actual_Score - Expected_Score)
Where:
- K = K-factor based on experience:
  • Provisional players (<30 games): K = 40
  • Established players (≥30 games): K = 20
- Actual_Score = 1 (win), 0.5 (draw), 0 (loss)
- Expected_Score = 1 / (1 + 10^((Opponent_Elo - Player_Elo) / 400))
Example:
If a 600 Elo AI (15 games played, K=40) plays a 1000 Elo AI and wins:
• Expected Score = 1 / (1 + 10^((1000-600)/400)) = 0.091
• Elo Change = 40 × (1 - 0.091) = +36.4
• New Elo = 600 + 36.4 = 636.4
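The update rule and the worked example can be checked with the sketch below. The function names and the `(mode, model)`-keyed dictionaries in `record_game` are hypothetical, meant only to illustrate keeping a separate rating per mode; choosing K from the number of games played before the current game is likewise an assumption. Both players are updated from their pre-game ratings.

```python
PROVISIONAL_GAMES = 30  # below this many games a player is provisional


def k_factor(games_played: int) -> int:
    """K = 40 for provisional players (<30 games), 20 once established."""
    return 40 if games_played < PROVISIONAL_GAMES else 20


def expected_score(player_elo: float, opponent_elo: float) -> float:
    return 1 / (1 + 10 ** ((opponent_elo - player_elo) / 400))


def update_elo(player_elo: float, opponent_elo: float,
               actual_score: float, games_played: int) -> float:
    """actual_score: 1 = win, 0.5 = draw, 0 = loss."""
    k = k_factor(games_played)
    return player_elo + k * (actual_score - expected_score(player_elo, opponent_elo))


def record_game(ratings: dict, games: dict, mode: str,
                white: str, black: str, white_score: float) -> None:
    """Hypothetical bookkeeping: ratings and game counts keyed by (mode, model)."""
    w_key, b_key = (mode, white), (mode, black)
    w_old, b_old = ratings[w_key], ratings[b_key]  # use pre-game ratings for both
    ratings[w_key] = update_elo(w_old, b_old, white_score, games[w_key])
    ratings[b_key] = update_elo(b_old, w_old, 1 - white_score, games[b_key])
    games[w_key] += 1
    games[b_key] += 1


# Worked example from the text: a 600 Elo AI with 15 games beats a 1000 Elo AI.
print(round(expected_score(600, 1000), 3))     # 0.091
print(round(update_elo(600, 1000, 1, 15), 1))  # 636.4
```

The unrounded gain is 40 × 10/11 ≈ 36.36, which the text rounds to +36.4. When both players share the same K, whatever one gains the other loses, so the total rating in a mode stays constant.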
Special Rules:
Model | Games | W-D-L | Elo | Accuracy (%) | Score | Avg Mat. | Avg Turns | W/B Games |
---|---|---|---|---|---|---|---|---|