Elena Popova
Ratings Science is Hard
How do you eat an elephant? One bite at a time.
As always, opinions are my own, not those of Lichess.org.
You awaken and glance out the window at the sunrise? sunset?... you can't remember, over the whirr of the computer fan and the scent of your fourth? fifth? Irish coffee. You click the 1+0 button and after the tone, you're staring at another Black king and another "1500?" You slam the cup on your box of smokes and abort the game. The cup shatters; its spell broken, you clean up and return to bed.
Halfway around the Earth, moderators receive emails from players who can't get a game because their opponents keep aborting. There must be a better way...
The next morning, you delete your X account again. Doomscrolling on Mastodon, you see:
Were all the aborted games and rage for nought? Mathematically yes, but they meant something to you and the way you treated your opponents and yourself, so in that sense no.
4,000 miles away and years ago, a chess expert tires of a decade of complaints. Maybe we can force players to play Black until trolls stop aborting games (and ban if trolling continues)?
Online trolling is as old as the internet, so we can't solve it in a day. Complaints continue, and https://lichess.org/terms-of-service revisions indicate our ever-improving understanding of how to handle problems both solvable and unsolvable without leaving users in the dark about what we're doing. Sure, Ono & Yara remind us that players love to complain, but is there more to the story?
Thinking back to the 2011 Deloitte/FIDE Chess Rating Challenge, the expert recalls:
- Glicko-Boost allows for an advantage to white. -- Prof. Glickman (Harvard)
Who is that chess expert, anyway? Well, that would be me. I enjoy chess and I enjoy science: starting in 2014, I co-authored Multi-Variant Stockfish (with Fabian Fichter) to try to solve chess variants and help players learn, and it's been an enjoyable collaboration with Lichess ever since.
Back to the task at hand... if I'm rated 2000 and, as Black, I earn a hard-fought draw against a 1999 playing White, and my rating decreases, how is that fair? Looking at both the Opening Explorer and https://database.lichess.org/, we see a statistically significant first-move advantage.
Inspired by Glickman's work, let's seize the day! How hard can it be?
First, we need to estimate the first player advantage. Unfortunately, we lack the luxury of guessing η=30 points: if we get this estimate too wrong, an armada of a million keyboard warriors will rise up and make news of us all across the internet, but if we get this right then years later other sites will copy us and take full credit. I don't care who gets the credit, let's just solve the tragedy of the commons... someone's got to be the first to do it (14 years after Glickman):
Experimenting with FOSS and online Elo calculators and Wolfram Alpha, I realized:
- In theory (Elo/Glicko), the first move advantage should be the same at every rating
- There are multiple Elo estimation models; Prof. Elo acknowledges in his book (I forget the citation) that there are Gaussian and logistic distribution-based models (to try to account for uncertainty at low/high ratings)
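To make the two bullets concrete, here is a minimal sketch of both textbook expected-score curves. The function names are hypothetical, and the Gaussian version assumes the classic convention of a 200-point standard deviation per player. Both functions take only the rating difference, which is why, in theory, a fixed first-move bonus behaves the same at 1000 as at 2500.

```python
import math

def logistic_expected(diff: float) -> float:
    """Expected score for a player rated `diff` points above the opponent (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def gaussian_expected(diff: float, sigma: float = 200.0) -> float:
    """Expected score under a normal model (assumption: each performance has sd `sigma`)."""
    # The difference of two independent performances has sd sigma * sqrt(2).
    z = diff / (sigma * math.sqrt(2.0))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Small gaps look nearly identical; the models drift apart at large gaps.
for gap in (0, 12, 200, 400):
    print(gap, round(logistic_expected(gap), 4), round(gaussian_expected(gap), 4))
```

Near a 12-point gap the two models are practically indistinguishable, so the choice of distribution matters much less for estimating a small first-move advantage than it does at the extremes.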
Due to research budget constraints, rather than solving "should we assume a logistic distribution?" let's apply the Ideal Case Theorem (at the median):
I sampled recent games between similarly rated players around 1500, measured White's win% advantage, then computed the Elo advantage that would produce that win% advantage. I tried learning Rust to reuse existing PGN parsers, since the Lichess database is enormous. But because this advantage changes from decade to decade and barely changes from month to month, the current estimate of 11-12 points (and, for crazyhouse, 20 points) suffices, and it is consistent with my original guess: ~10% / ~10 points.
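The conversion step can be sketched in a few lines. This is not the script I actually ran: the function names are hypothetical, the counts are made up purely for illustration, and it assumes the logistic Elo curve with draws counted as half a point.

```python
import math

def white_score(wins: int, draws: int, losses: int) -> float:
    """White's average score per game, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)

def elo_gap_from_score(score: float) -> float:
    """Rating gap that would produce `score` under the logistic Elo model."""
    return 400.0 * math.log10(score / (1.0 - score))

# Made-up counts for equally rated players, NOT real Lichess data.
sample = white_score(wins=5000, draws=340, losses=4660)
print(round(sample, 4), round(elo_gap_from_score(sample), 1))
```

The inverse is steep near 50%: a shift of about one percentage point in White's score moves the estimate by roughly 7 rating points, which is why the sample needs to be large and drawn from closely matched players.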
What does this first move advantage mean in practice?
- If Player A takes White every game and Player B takes Black every game and they always draw, their rating difference will be about 12 points.
- If you are rated 1500, move second, and draw against a 1490, your rating increases (and if you move first and draw against a 1510, your rating decreases), because the first move is worth about 12 points.
- Using Lichess' artificial RD=45 floor (the most stable RD allowed on Lichess), I ran some 1500 versus 1500 tests.
New ratings (White, Black) after one game between two 1500-rated players at RD=45:

| Outcome | 1-0 | ½-½ | 0-1 |
|---|---|---|---|
| Glicko-2 | 1505.68, 1494.32 | 1500, 1500 | 1494.32, 1505.68 |
| Patched | 1505.49, 1494.51 | 1499.81, 1500.19 | 1494.13, 1505.87 |
On Lichess, if you have a stable rating (at any level) and your similarly rated opponent moves first, you'll gain an extra 0.19 points per game compared with unpatched Glicko-2, whatever the result.
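For readers who want to poke at these numbers, here is a minimal sketch of a one-game, Glicko-2-style update. It is not Lichess's actual code: it skips the volatility step and any RD inflation between games, the function names are hypothetical, and it assumes the first-move advantage is applied as a flat 12-point shift to the rating difference before the expected score is computed.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion factor between rating points and its internal scale

def g(phi: float) -> float:
    """Glicko-2 weighting that discounts an opponent with uncertain rating."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)

def update(rating, rd, opp_rating, opp_rd, score, shift=0.0):
    """New rating after one game; `shift` is added to the rating difference (assumption)."""
    mu_diff = (rating + shift - opp_rating) / SCALE
    phi, phi_opp = rd / SCALE, opp_rd / SCALE
    g_opp = g(phi_opp)
    expected = 1.0 / (1.0 + math.exp(-g_opp * mu_diff))
    v = 1.0 / (g_opp * g_opp * expected * (1.0 - expected))
    # Volatility step skipped (assumption): the pre-game RD is used as-is.
    new_phi_sq = 1.0 / (1.0 / (phi * phi) + 1.0 / v)
    return rating + SCALE * new_phi_sq * g_opp * (score - expected)

def play(white, black, white_score, advantage=0.0, rd=45.0):
    """Return (new White rating, new Black rating) after one game."""
    new_white = update(white, rd, black, rd, white_score, shift=advantage)
    new_black = update(black, rd, white, rd, 1.0 - white_score, shift=-advantage)
    return new_white, new_black

for label, adv in (("plain", 0.0), ("shifted", 12.0)):
    for s in (1.0, 0.5, 0.0):
        w, b = play(1500.0, 1500.0, s, adv)
        print(label, s, round(w, 2), round(b, 2))
```

Under these assumptions the unshifted case lands on the Glicko-2 row of the table, and a 12-point shift moves a drawn game by roughly 0.19 points in Black's favor. The shifted numbers may not match the patched row digit for digit, since the exact advantage value and the way Lichess applies it are guesses here.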
Happy holidays!
Image credit: Elena Popova
Toadofsky