AI Chess Leaderboard

Performance analysis of 88 AI models in 618 chess matches

Initially, the strongest model observed was GPT-3.5 Turbo Instruct playing White in Continuation mode. Here is a spontaneous live video demonstration: Youtube Link
In full-information chess (Reasoning mode), o1-mini showed its strength by winning the first tournament (it has since been dethroned).

This was my most sophisticated side-project to date!

1.) Every game is analyzed by Stockfish 17.1, and the resulting accuracy sets each model's initial Elo placement.¹

Initial Elo Calculation

The initial Elo rating is determined by analyzing the first 10 games (excluding self-play) of each AI model using Stockfish 17.1 at depth 18. Accuracy is calculated using Lichess's methodology: move-by-move engine evaluations are converted to win percentages, position-specific complexity weighting is applied, and weighted and harmonic means are combined into the final accuracy score. Each mode receives its own placement.
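As a rough illustration of the first step of that pipeline, the sketch below converts a centipawn evaluation to a win percentage and scores a single move by how much win percentage it gave up. The numeric constants are assumed to match Lichess's published accuracy description and should be verified against it. The function names are hypothetical, and the complexity weighting and the weighted/harmonic mean combination are deliberately left out.

```python
import math


def win_percent(centipawns: float) -> float:
    """Map an engine evaluation (centipawns, mover's point of view) to a win %.

    The coefficient is an assumption taken from Lichess's accuracy write-up.
    """
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * centipawns)) - 1)


def move_accuracy(win_before: float, win_after: float) -> float:
    """Score one move from the win % before and after it (both 0..100).

    Only a drop in win % is penalised; the result is clamped to 0..100.
    The constants are assumptions, see the note above.
    """
    drop = max(0.0, win_before - win_after)
    raw = 103.1668 * math.exp(-0.04354 * drop) - 3.1669
    return max(0.0, min(100.0, raw))
```

A move that keeps the win percentage unchanged scores close to 100, and the score falls off exponentially as the move loses more win percentage.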

Formula:

Initial_Elo = 400 + 200 × (2^((Accuracy-30)/20) - 1)

Where:
- Accuracy = Average accuracy across first 10 non-self-play games (%)
- Accuracy is constrained between 10% and 90%
- Human players start at 1500 Elo regardless of accuracy
- Default fallback: 1000 Elo if no accuracy data available

Examples:
• 30% avg accuracy → Initial_Elo = 400 + 200 × (2^0 - 1) = 400
• 50% avg accuracy → Initial_Elo = 400 + 200 × (2^1 - 1) = 600
• 70% avg accuracy → Initial_Elo = 400 + 200 × (2^2 - 1) = 1000
• 90% avg accuracy → Initial_Elo = 400 + 200 × (2^3 - 1) = 1800
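
The placement formula and its rules can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the function name and the `is_human` flag are hypothetical, and the sketch assumes missing accuracy data is represented as `None`.

```python
from typing import Optional


def initial_elo(avg_accuracy: Optional[float], is_human: bool = False) -> float:
    """Initial Elo from average accuracy (%) over the first 10 non-self-play games."""
    if is_human:
        return 1500.0  # humans start at 1500 regardless of accuracy
    if avg_accuracy is None:
        return 1000.0  # default fallback when no accuracy data is available
    # Accuracy is constrained to the 10%..90% range before applying the formula.
    acc = min(90.0, max(10.0, avg_accuracy))
    return 400 + 200 * (2 ** ((acc - 30) / 20) - 1)


for acc in (30, 50, 70, 90):
    print(acc, initial_elo(acc))
```

Because of the clamp, any accuracy above 90% maps to 1800 and any accuracy below 10% maps to 300.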

2.) Then, automatic Elo updates are applied for each game, per mode and mixed.²

Elo Update System

After initial placement, Elo ratings are updated after each AI vs AI game using the standard Elo rating system. Each mode has its own Elo rating.

Update Formula:

New_Elo = Old_Elo + K × (Actual_Score - Expected_Score)

Where:
- K = K-factor based on experience:
  • Provisional players (<30 games): K = 40
  • Established players (≥30 games): K = 20
- Actual_Score = 1 (win), 0.5 (draw), 0 (loss)
- Expected_Score = 1 / (1 + 10^((Opponent_Elo - Player_Elo) / 400))

Example:
If a 600 Elo AI (15 games played, K=40) plays a 1000 Elo AI and wins:
• Expected Score = 1 / (1 + 10^((1000-600)/400)) = 0.091
• Elo Change = 40 × (1 - 0.091) = +36.4
• New Elo = 600 + 36.4 = 636.4
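
The same calculation as a small Python sketch, reproducing the worked example above. The function names are hypothetical, and the number of games played is assumed to be the count before the current game.

```python
def k_factor(games_played: int) -> int:
    """K = 40 for provisional players (<30 games), K = 20 otherwise."""
    return 40 if games_played < 30 else 20


def expected_score(player_elo: float, opponent_elo: float) -> float:
    """Standard Elo expected score for the player against the opponent."""
    return 1 / (1 + 10 ** ((opponent_elo - player_elo) / 400))


def update_elo(player_elo: float, opponent_elo: float,
               actual_score: float, games_played: int) -> float:
    """Return the new rating; actual_score is 1 (win), 0.5 (draw) or 0 (loss)."""
    k = k_factor(games_played)
    return player_elo + k * (actual_score - expected_score(player_elo, opponent_elo))


# 600 Elo AI with 15 games beats a 1000 Elo AI.
print(round(update_elo(600, 1000, 1, 15), 1))
```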

Special Rules:

  • Human vs AI games: Only human Elo is updated
  • Self-play games are excluded from Elo calculations
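
One way the special rules could be combined with the update formula is sketched below. The `Player` class and `apply_game` function are hypothetical. The sketch also assumes that self-play is detected by matching names, that humans use the same K-factor thresholds, and that only players whose rating changes have their game count incremented. None of these details are stated above.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    elo: float
    games: int = 0
    is_human: bool = False


def expected_score(player_elo: float, opponent_elo: float) -> float:
    return 1 / (1 + 10 ** ((opponent_elo - player_elo) / 400))


def apply_game(white: Player, black: Player, white_score: float) -> None:
    """Update ratings in place; white_score is 1, 0.5 or 0 from White's side."""
    if white.name == black.name:
        return  # self-play games are excluded from Elo calculations
    # Both expectations are computed from the pre-game ratings.
    exp_white = expected_score(white.elo, black.elo)
    human_involved = white.is_human or black.is_human
    results = ((white, white_score, exp_white),
               (black, 1 - white_score, 1 - exp_white))
    for player, actual, expected in results:
        if human_involved and not player.is_human:
            continue  # human vs AI: only the human's Elo is updated
        k = 40 if player.games < 30 else 20
        player.elo += k * (actual - expected)
        player.games += 1
```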

Replays available for every match! Updates occur automatically every 24h.

[Chart: Chess Performance by Model. Series: Wins, Draws, Losses, Elo, Acc, Score]

Player Statistics

Model | Games | W-D-L | Elo | Acc | Score | Avg Mat. | Avg Turns | W/B Games

[Chart: Mode Statistics. Series: Decisive (Win/Loss), Draws]