AI Chess Leaderboard

Performance analysis of 109 AI models in 972 chess matches

Initially, the best-performing model observed was GPT-3.5 Turbo Instruct, playing White in Continuation mode. Here is a spontaneous live video demonstration: YouTube
In full-information chess (Reasoning mode), o1-mini showed strength, winning the first tournament (it has since been dethroned).
The number of games is mostly limited by API constraints and/or budget. This chess inference cost me about $850.

This is my most sophisticated side-project to date!

1.) Every game is analyzed by Stockfish 17.1, which also sets the initial Elo placement based on calculated accuracy.¹

Initial Elo Calculation

The initial Elo rating is determined by analyzing the first 10 games (excluding self-play) of each AI model using Stockfish 17.1 at depth 18. Accuracy is calculated using Lichess's methodology: move-by-move engine evaluations are converted to win percentages, each move is weighted by the complexity of its position, and the weighted and harmonic means are combined into the final accuracy score. Each mode receives its own initial placement.
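The page names Lichess's methodology but not its exact implementation. The following is a minimal sketch, assuming the two curves published on Lichess's accuracy page (centipawns to win percentage, and win-percentage drop to move accuracy) plus a simplified sliding-window volatility weight; the window size, the function names, and the (before, after) input format are illustrative assumptions, not the leaderboard's actual code.

```python
import math
from statistics import harmonic_mean, pstdev


def win_percent(centipawns: float) -> float:
    """Convert an engine evaluation (centipawns, mover's point of view) to a 0-100 win chance."""
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * centipawns)) - 1)


def move_accuracy(win_before: float, win_after: float) -> float:
    """Accuracy (0-100) of one move, from how much the mover's win% dropped."""
    drop = max(0.0, win_before - win_after)
    raw = 103.1668 * math.exp(-0.04354 * drop) - 3.1669
    return min(100.0, max(0.0, raw))


def game_accuracy(moves: list[tuple[float, float]], window: int = 4) -> float:
    """Average of a volatility-weighted mean and a harmonic mean of move accuracies.

    moves: (win% before, win% after) for each of one player's moves,
    both from that player's point of view.
    The sliding-window weighting below is a simplified assumption.
    """
    if not moves:
        raise ValueError("need at least one move")
    accuracies = [move_accuracy(before, after) for before, after in moves]
    series = [before for before, _ in moves]
    weights = []
    for i in range(len(moves)):
        chunk = series[max(0, i - window + 1): i + 1]
        spread = pstdev(chunk) if len(chunk) > 1 else 0.0
        # Clamp the weight to [0.5, 12]; calmer positions weigh less.
        weights.append(min(12.0, max(0.5, spread)))
    weighted = sum(w * a for w, a in zip(weights, accuracies)) / sum(weights)
    return (weighted + harmonic_mean(accuracies)) / 2
```

The harmonic mean punishes a single blunder much harder than an arithmetic mean would, which is why a game with one large win-percentage drop scores far below a clean game.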

Formula:

Initial_Elo = 400 + 200 × (2^((Accuracy-30)/20) - 1)

Where:
- Accuracy = Average accuracy across first 10 non-self-play games (%)
- Accuracy is constrained between 10% and 90%
- Human players start at 1500 Elo regardless of accuracy
- Default fallback: 1000 Elo if no accuracy data available

Examples:
• 30% avg accuracy → Initial_Elo = 400 + 200 × (2^0 - 1) = 400
• 50% avg accuracy → Initial_Elo = 400 + 200 × (2^1 - 1) = 600
• 70% avg accuracy → Initial_Elo = 400 + 200 × (2^2 - 1) = 1000
• 90% avg accuracy → Initial_Elo = 400 + 200 × (2^3 - 1) = 1800

Fully random play was measured at 35% average accuracy → Initial_Elo ≈ 438
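The placement rule can be sketched in a few lines. This assumes per-game accuracy percentages have already been computed and are passed in chronological order with self-play games removed; `initial_elo` and the constant names are hypothetical. It reproduces the worked examples above, including the random-play figure.

```python
from typing import Optional, Sequence

HUMAN_START_ELO = 1500.0  # humans start here regardless of accuracy
FALLBACK_ELO = 1000.0     # used when no accuracy data is available


def initial_elo(game_accuracies: Optional[Sequence[float]], is_human: bool = False) -> float:
    """Initial placement from per-game accuracy percentages.

    Only the first 10 entries are used; the caller is assumed to have
    already excluded self-play games.
    """
    if is_human:
        return HUMAN_START_ELO
    first = list(game_accuracies or [])[:10]
    if not first:
        return FALLBACK_ELO
    avg = sum(first) / len(first)
    avg = min(90.0, max(10.0, avg))  # clamp to the stated 10-90% range
    return 400 + 200 * (2 ** ((avg - 30) / 20) - 1)
```

For example, `round(initial_elo([35.0]))` gives 438, and the clamp means the placement can never fall below 300 or rise above 1800.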

2.) Then, automatic Elo updates are applied for each game, per mode and mixed.²

Elo Update System

After initial placement, Elo ratings are updated after each AI vs AI game using the standard Elo rating system. Elo < 500 ≈ random play. Each mode keeps its own separate Elo rating.

Update Formula:

New_Elo = Old_Elo + K × (Actual_Score - Expected_Score)

Where:
- K = K-factor based on experience:
  • Provisional players (<30 games): K = 40
  • Established players (≥30 games): K = 20
- Actual_Score = 1 (win), 0.5 (draw), 0 (loss)
- Expected_Score = 1 / (1 + 10^((Opponent_Elo - Player_Elo) / 400))

Example:
If a 600 Elo AI (15 games played, K=40) plays a 1000 Elo AI and wins:
• Expected Score = 1 / (1 + 10^((1000-600)/400)) = 0.091
• Elo Change = 40 × (1 - 0.091) = +36.4
• New Elo = 600 + 36.4 = 636.4
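The update rule translates directly into code. A minimal sketch; the function names are hypothetical, and `games_played` is assumed to count the games a player completed before the current one.

```python
def k_factor(games_played: int) -> int:
    """K = 40 while provisional (<30 games), 20 once established."""
    return 40 if games_played < 30 else 20


def expected_score(player_elo: float, opponent_elo: float) -> float:
    """Standard Elo expected score for the player against the opponent."""
    return 1 / (1 + 10 ** ((opponent_elo - player_elo) / 400))


def update_elo(player_elo: float, opponent_elo: float,
               actual_score: float, games_played: int) -> float:
    """Return the new rating; actual_score is 1 (win), 0.5 (draw) or 0 (loss)."""
    k = k_factor(games_played)
    return player_elo + k * (actual_score - expected_score(player_elo, opponent_elo))
```

For the example above, `round(update_elo(600, 1000, 1.0, 15), 1)` gives 636.4. Because the two expected scores always sum to 1, two players with the same K-factor gain and lose exactly the same number of points.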

Special Rules:

Human vs AI games: Only human Elo is updated
Self-play games are excluded from Elo calculations
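The special rules and the per-mode ratings could be tracked with a small ledger like the one below. This is a hypothetical sketch, not the site's code: `Game`, `RatingLedger`, and the pool names are illustrative. It assumes "mixed" is one extra pool combining all modes, that AIs without an initial placement start at the 1000 fallback, and that human vs human games (not covered by the rules above) update both players.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    white: str
    black: str
    mode: str           # e.g. "Continuation" or "Reasoning"
    white_score: float  # 1 = White wins, 0.5 = draw, 0 = Black wins


class RatingLedger:
    """Hypothetical rating store keyed by (pool, player); a pool is a mode or "mixed"."""

    def __init__(self, humans: set[str], initial: dict[tuple[str, str], float] | None = None):
        self.humans = humans
        self.ratings = dict(initial or {})
        self.games = defaultdict(int)  # games played per (pool, player)

    def rating(self, pool: str, player: str) -> float:
        # Humans start at 1500; AIs without an initial placement fall back to 1000.
        default = 1500.0 if player in self.humans else 1000.0
        return self.ratings.setdefault((pool, player), default)

    def record(self, game: Game) -> None:
        if game.white == game.black:
            return  # self-play games are excluded from Elo
        for pool in (game.mode, "mixed"):  # assumed: per-mode pool plus a mixed pool
            self._update(pool, game)

    def _update(self, pool: str, game: Game) -> None:
        # Read both pre-game ratings first so the two updates are independent.
        w, b = self.rating(pool, game.white), self.rating(pool, game.black)
        human_game = game.white in self.humans or game.black in self.humans
        sides = [
            (game.white, w, b, game.white_score),
            (game.black, b, w, 1 - game.white_score),
        ]
        for player, own, opp, score in sides:
            if human_game and player not in self.humans:
                continue  # human vs AI: only the human's rating moves
            k = 40 if self.games[(pool, player)] < 30 else 20
            expected = 1 / (1 + 10 ** ((opp - own) / 400))
            self.ratings[(pool, player)] = own + k * (score - expected)
        for player in (game.white, game.black):
            self.games[(pool, player)] += 1
```

Keeping one dictionary keyed by (pool, player) makes "each mode has separate Elo" a matter of choosing the key, so no mode's results can leak into another's rating.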

Replays are available for every match! Updates occur automatically every 24 hours.

Mode Statistics
[Chart: decisive (win/loss) vs. drawn games per mode]

Chess Performance by Model
[Chart: wins, draws, losses, Elo, accuracy (Acc), and score for each model]

Player Statistics

Model	Games	W-D-L	Elo	Acc (%)	Score	Avg Mat.	Avg Turns	W/B Games

Game History

White	Black	Mode	Outcome	Replay	Turns	Mat	Acc (%)	Elo
