AI Chess Leaderboard

Performance analysis of 132 AI models in 1316 chess matches

Initially, the most performant model observed was GPT-3.5 Turbo Instruct, playing white in Continuation mode. Here is a spontaneous live video demonstration: YouTube
In full-information chess (Reasoning mode), o1-mini showed strength, winning the first tournament (it has since been dethroned).
The number of games is mostly limited by time, API constraints, and/or budget. This chess inference ran me ~$1200.

This is my most sophisticated side-project to date!

1.) Every game is analyzed by Stockfish 17.1, which also sets the initial Elo placement based on calculated accuracy.¹

Initial Elo Calculation

The initial Elo rating is determined by analyzing the first 10 games (excluding self-play) of each AI model using Stockfish 17.1 at depth 18. Accuracy is calculated using Lichess's methodology: move-by-move engine evaluations are converted to win percentages, position-specific complexity weighting is applied, and weighted and harmonic means are combined into the final accuracy score. Each mode receives its own placement.
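
The first step of that pipeline, turning engine evaluations into win percentages and then into a per-move accuracy, can be sketched as follows. This is a minimal sketch, not the leaderboard's actual code: the function names are hypothetical, the constants are the ones Lichess publishes for its accuracy metric as best I recall them (verify against Lichess before relying on them), and the complexity weighting and harmonic-mean aggregation are left out.

```python
import math


def win_percent(centipawns: float) -> float:
    """Map an engine evaluation (centipawns, mover's perspective) to a win percentage.

    Assumption: the coefficient follows Lichess's published win% curve.
    """
    return 50.0 + 50.0 * (2.0 / (1.0 + math.exp(-0.00368208 * centipawns)) - 1.0)


def move_accuracy(win_before: float, win_after: float) -> float:
    """Per-move accuracy (%) from the win percentage before and after the move.

    Assumption: constants follow Lichess's published accuracy curve.
    A move that improves the evaluation is treated as a zero drop.
    """
    drop = max(0.0, win_before - win_after)
    raw = 103.1668 * math.exp(-0.04354 * drop) - 3.1669
    # Keep the result inside the valid percentage range.
    return max(0.0, min(100.0, raw))


if __name__ == "__main__":
    before = win_percent(50)    # slightly better position before the move
    after = win_percent(-150)   # clearly worse position after the move
    print(f"win% before={before:.1f}, after={after:.1f}, "
          f"accuracy={move_accuracy(before, after):.1f}")
```

A move that keeps the win percentage unchanged scores close to 100, and larger drops decay exponentially toward 0.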

Formula:

Initial_Elo = 400 + 200 × (2^((Accuracy-30)/20) - 1)

Where:
- Accuracy = Average accuracy across first 10 non-self-play games (%)
- Accuracy is constrained between 10% and 90%
- Human players start at 1500 Elo regardless of accuracy
- Default fallback: 1000 Elo if no accuracy data available

Examples:
• 30% avg accuracy → Initial_Elo = 400 + 200 × (2^0 - 1) = 400
• 50% avg accuracy → Initial_Elo = 400 + 200 × (2^1 - 1) = 600
• 70% avg accuracy → Initial_Elo = 400 + 200 × (2^2 - 1) = 1000
• 90% avg accuracy → Initial_Elo = 400 + 200 × (2^3 - 1) = 1800

Baseline: completely random play averaged 35% accuracy, which gives an initial Elo of about 438.
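
The placement formula and its rules can be checked with a few lines of Python. This is a minimal sketch under the rules stated above; the function name `initial_elo` and its parameters are hypothetical, not taken from the leaderboard's code.

```python
from typing import Optional


def initial_elo(accuracy: Optional[float], is_human: bool = False) -> float:
    """Initial Elo from average accuracy (%) over the first 10 non-self-play games."""
    if is_human:
        return 1500.0  # humans start at 1500 regardless of accuracy
    if accuracy is None:
        return 1000.0  # default fallback when no accuracy data is available
    # Accuracy is constrained between 10% and 90%.
    clamped = max(10.0, min(90.0, accuracy))
    return 400.0 + 200.0 * (2.0 ** ((clamped - 30.0) / 20.0) - 1.0)


if __name__ == "__main__":
    for acc in (30, 35, 50, 70, 90):
        print(f"{acc}% -> {initial_elo(acc):.0f}")
```

Because accuracy is clamped, placements produced by this formula fall between 300 (10% accuracy) and 1800 (90% accuracy).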

2.) Then, automatic Elo updates are applied for each game, per mode and mixed.²

Elo Update System

After initial placement, Elo ratings are updated after each AI vs AI game using the standard Elo rating system. An Elo below 500 roughly corresponds to random play. Each mode has a separate Elo.

Update Formula:

New_Elo = Old_Elo + K × (Actual_Score - Expected_Score)

Where:
- K = K-factor based on experience:
  • Provisional players (<30 games): K = 40
  • Established players (≥30 games): K = 20
- Actual_Score = 1 (win), 0.5 (draw), 0 (loss)
- Expected_Score = 1 / (1 + 10^((Opponent_Elo - Player_Elo) / 400))

Example:
If a 600 Elo AI (15 games played, K=40) plays a 1000 Elo AI and wins:
• Expected Score = 1 / (1 + 10^((1000-600)/400)) = 0.091
• Elo Change = 40 × (1 - 0.091) = +36.4
• New Elo = 600 + 36.4 = 636.4
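
The same update can be reproduced in code. This is a minimal sketch of the standard Elo update with the K-factors listed above; the function names are hypothetical and it covers a single player's rating change only.

```python
def expected_score(player_elo: float, opponent_elo: float) -> float:
    """Expected score of the player against the opponent (standard Elo)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_elo - player_elo) / 400.0))


def k_factor(games_played: int) -> int:
    """K = 40 for provisional players (<30 games), K = 20 once established."""
    return 40 if games_played < 30 else 20


def update_elo(player_elo: float, opponent_elo: float,
               actual_score: float, games_played: int) -> float:
    """Return the player's new Elo; actual_score is 1 (win), 0.5 (draw), or 0 (loss)."""
    k = k_factor(games_played)
    return player_elo + k * (actual_score - expected_score(player_elo, opponent_elo))


if __name__ == "__main__":
    # Worked example from the text: 600 Elo AI (15 games) beats a 1000 Elo AI.
    print(f"{update_elo(600, 1000, 1.0, 15):.1f}")
```

The opponent's rating is updated by calling the same function from their side with their own game count, so the two changes need not mirror each other when the K-factors differ.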

Special Rules:

  • Human vs AI games: Only human Elo is updated
  • Self-play games are excluded from Elo calculations
Replays available for every match! Updates occur automatically every 24h.
Chess Performance by Model (chart: Wins, Draws, Losses, Elo, Acc, Score)
Player Statistics
Model Games W-D-L Elo Acc Score Avg Mat. Avg Turns W/B Games