AI Chess Leaderboard

Performance analysis of 62 AI models in 384 chess matches

Initially, the best-performing model observed was GPT-3.5 Turbo Instruct playing White in Continuation mode. Here is a spontaneous live video demonstration: YouTube Link
In full-information chess (Reasoning mode), o1-mini showed strength by winning the first tournament (it has since been dethroned).

This was my most sophisticated side project to date!

1.) Every game is analyzed by Stockfish 17.1, and the resulting accuracy determines each model's initial Elo placement.¹

Initial Elo Calculation

The initial Elo rating is determined by analyzing each AI model's first 5 games (excluding self-play) with Stockfish 17.1 at depth 18. Accuracy is calculated using Lichess's methodology. Move-by-move engine evaluations are converted to win percentages, and each move's accuracy is weighted by position-specific complexity. A weighted mean and a harmonic mean are then combined into the final accuracy score. Each mode receives its own initial placement.
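The accuracy step can be sketched in Python. This sketch uses the win-percentage and per-move accuracy formulas that Lichess publishes on its accuracy page. It is not this project's code, and Lichess's production implementation may differ in small details. The function names are hypothetical, and the sketch rests on two assumptions. First, the complexity weights are passed in as a list rather than computed. Second, evaluations are capped at ±1000 centipawns so that `math.exp` stays in range.

```python
import math

def win_percent(centipawns):
    """Win chance (0-100) for the side to move, from a centipawn evaluation."""
    cp = max(-1000, min(1000, centipawns))  # cap is an assumption of this sketch
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * cp)) - 1)

def move_accuracy(win_before, win_after):
    """Accuracy (0-100) of one move, from the mover's win% before and after it."""
    raw = 103.1668 * math.exp(-0.04354 * (win_before - win_after)) - 3.1669
    return min(max(raw, 0.0), 100.0)

def game_accuracy(move_accuracies, weights):
    """Average of a weighted mean and a harmonic mean of per-move accuracies.

    `weights` stands in for the position-specific complexity weighting; how they
    are derived is not modeled here.
    """
    weighted = sum(a * w for a, w in zip(move_accuracies, weights)) / sum(weights)
    # Floor each value at 1 so a 0% move cannot divide by zero (assumption).
    harmonic = len(move_accuracies) / sum(1 / max(a, 1.0) for a in move_accuracies)
    return (weighted + harmonic) / 2
```

A move that keeps the win chance unchanged scores about 99.9999, and a move that gives up a large share of it scores 0. The harmonic mean pulls the game score down sharply when a single move is a blunder.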

Formula:

Initial_Elo = 400 + 200 × (2^((Accuracy-30)/20) - 1)

Where:
- Accuracy = Average accuracy across first 5 non-self-play games (%)
- Accuracy is constrained between 10% and 90%
- Human players start at 1500 Elo regardless of accuracy
- Default fallback: 1000 Elo if no accuracy data available

Examples:
• 30% avg accuracy → Initial_Elo = 400 + 200 × (2^0 - 1) = 400
• 50% avg accuracy → Initial_Elo = 400 + 200 × (2^1 - 1) = 600
• 70% avg accuracy → Initial_Elo = 400 + 200 × (2^2 - 1) = 1000
• 90% avg accuracy → Initial_Elo = 400 + 200 × (2^3 - 1) = 1800
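The placement rule above can be sketched as a small Python function. The name `initial_elo` and its parallel-list inputs are hypothetical. The sketch assumes that the 10% to 90% constraint applies to the averaged accuracy, so placements range from 300 to 1800.

```python
from statistics import mean

HUMAN_START_ELO = 1500.0
FALLBACK_ELO = 1000.0

def initial_elo(game_accuracies, self_play_flags, is_human=False):
    """Initial Elo for one player in one mode (hypothetical helper).

    game_accuracies: per-game accuracy percentages, oldest first.
    self_play_flags: parallel list, True where the game was self-play.
    """
    if is_human:
        return HUMAN_START_ELO  # humans start at 1500 regardless of accuracy
    # Keep only the first 5 games that were not self-play.
    eligible = [acc for acc, selfplay in zip(game_accuracies, self_play_flags)
                if not selfplay][:5]
    if not eligible:
        return FALLBACK_ELO  # no accuracy data available
    accuracy = min(max(mean(eligible), 10.0), 90.0)  # constrain to 10%..90%
    return 400 + 200 * (2 ** ((accuracy - 30) / 20) - 1)
```

For example, `initial_elo([50], [False])` returns 600.0, which matches the second example above.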

2.) Then, Elo is updated automatically after each game, both separately per mode and in a combined (mixed) rating.²

Elo Update System

After initial placement, Elo ratings are updated after each AI vs AI game using the standard Elo rating system. Each mode maintains its own separate Elo rating.

Update Formula:

New_Elo = Old_Elo + K × (Actual_Score - Expected_Score)

Where:
- K = K-factor based on experience:
  • Provisional players (<30 games): K = 40
  • Established players (≥30 games): K = 20
- Actual_Score = 1 (win), 0.5 (draw), 0 (loss)
- Expected_Score = 1 / (1 + 10^((Opponent_Elo - Player_Elo) / 400))

Example:
If a 600 Elo AI (15 games played, K=40) plays a 1000 Elo AI and wins:
• Expected Score = 1 / (1 + 10^((1000-600)/400)) = 0.091
• Elo Change = 40 × (1 - 0.091) = +36.4
• New Elo = 600 + 36.4 = 636.4
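The update formula can be written as a minimal Python sketch. The function names are hypothetical, and `games_played` is assumed to count the games played before the current one.

```python
def expected_score(player_elo, opponent_elo):
    """Standard Elo expected score for the player (between 0 and 1)."""
    return 1 / (1 + 10 ** ((opponent_elo - player_elo) / 400))

def k_factor(games_played):
    """K = 40 while provisional (<30 games), K = 20 once established."""
    return 40 if games_played < 30 else 20

def update_elo(old_elo, opponent_elo, actual_score, games_played):
    """actual_score: 1 for a win, 0.5 for a draw, 0 for a loss."""
    k = k_factor(games_played)
    return old_elo + k * (actual_score - expected_score(old_elo, opponent_elo))
```

In the worked example, `update_elo(600, 1000, 1, 15)` gives approximately 636.4.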

Special Rules:

  • Human vs AI games: Only human Elo is updated
  • Self-play games are excluded from Elo calculations
Updates occur automatically every 24h.
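The special rules and per-mode ratings can be sketched together in Python. `ModeRatings` and `apply_game` are hypothetical names, and the sketch makes three assumptions. Unrated players default to 1000. Human vs human games update both players. The 24-hour batching is not modeled.

```python
from collections import defaultdict

def expected_score(player_elo, opponent_elo):
    """Standard Elo expected score for the player."""
    return 1 / (1 + 10 ** ((opponent_elo - player_elo) / 400))

def k_factor(games_played):
    """K = 40 while provisional (<30 games), 20 once established."""
    return 40 if games_played < 30 else 20

class ModeRatings:
    """Hypothetical store holding one Elo per (player, mode) pair."""

    def __init__(self, default_elo=1000.0):
        self.elo = defaultdict(lambda: default_elo)
        self.games = defaultdict(int)

    def apply_game(self, mode, white, black, white_score, humans=frozenset()):
        """white_score: 1 (White wins), 0.5 (draw), 0 (Black wins)."""
        if white == black:
            return  # self-play games are excluded from Elo
        w_key, b_key = (white, mode), (black, mode)
        # Snapshot both pre-game ratings so the updates do not affect each other.
        w_elo, b_elo = self.elo[w_key], self.elo[b_key]
        scores = {w_key: (white_score, b_elo), b_key: (1 - white_score, w_elo)}
        if white in humans or black in humans:
            # Any game with a human: only human ratings move.
            to_update = [key for key in scores if key[0] in humans]
        else:
            to_update = list(scores)  # AI vs AI: both ratings move
        for key in to_update:
            score, opp_elo = scores[key]
            k = k_factor(self.games[key])
            self.elo[key] += k * (score - expected_score(self.elo[key], opp_elo))
            self.games[key] += 1
```

Keying ratings by `(player, mode)` keeps each mode's Elo independent. Computing both changes from the pre-game ratings keeps the update symmetric.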
Chess Performance by Model (chart of wins, draws, losses, accuracy, score, and Elo)
Player Statistics
Model Games W-D-L Elo Acc Score Avg Mat. Avg Turns W/B Games
Mode Statistics (chart of decisive (win/loss) games versus draws)