Friday, July 22, 2016

Fun Tidbits - Poking Around With 538's 2016 Elo Ratings

Rather than attempting another large-scale, code-heavy analysis this week (see last week's post), I decided to do some simpler little projects with just the Elo ratings I have scraped from FiveThirtyEight. Are they rigorous? No. Are they a little fun to think about? I hope so!

In an effort to keep with the spirit of this blog, I included the methods used so others can learn. Feel free to skim to the results if that's all you want.

I did an unpublished version of some of these with NFL ratings last winter, and felt that some might translate over into baseball.

The spreadsheet I used is located here on Google Drive for reference.

Biggest MLB Upsets in 2016 (so far) --- According to Elo

This is a simple analysis. The absolute value of the difference in Elo ratings was taken:
=ABS(Home Elo - Away Elo)
Then a simple IF filter was applied to the runs scored to flag games where the lower-rated team won:

=IF(OR(AND(Runs Scored<Runs Allowed,Home Elo>Away Elo),AND(Runs Allowed<Runs Scored,Away Elo>Home Elo)),1,0)
Or, in English:

  • If one of these conditions is true:
    • A. The home team scored fewer runs than the away team AND the home team was rated higher
    • B. The away team scored fewer runs than the home team AND the home team was rated lower
  • THEN return a 1
  • Else, return a 0
Then these results were sorted by the above condition, and then by the absolute difference of Elo ratings.
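
For anyone who would rather do this in Python than Excel, here is a rough pandas sketch of the same flag-and-sort. The file name and column names are just placeholders for however your scraped spreadsheet happens to be laid out:

import pandas as pd

# Assumed file and column names from the scraped spreadsheet
games = pd.read_csv("elo_2016_games.csv")

# Absolute Elo gap between the two teams
games["elo_diff"] = (games["home_elo"] - games["away_elo"]).abs()

# Upset flag: 1 if the lower-rated team won, 0 otherwise
home_won = games["home_runs"] > games["away_runs"]
home_favored = games["home_elo"] > games["away_elo"]
games["upset"] = (home_won != home_favored).astype(int)

# Biggest upsets: upsets first, then the largest Elo gap
biggest_upsets = games.sort_values(["upset", "elo_diff"], ascending=False)
print(biggest_upsets.head(10))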


I highlighted the Cubs just to make a point. As mentioned above, I did the same thing last winter for NFL Elo ratings, and those results made much more intuitive sense. I think this highlights a major difference between the two sports: the sheer number of games played in baseball makes it less of a surprise that the Cubs were part of 9 of the top 10 upsets this season, whereas the Patriots losing to the Dolphins last winter was a much bigger deal.

2016's "Boringist" Games

A similar analysis was done, but instead sorted by whether the higher-rated team won and by the run differential, using the formula:

=ABS(Runs Scored - Runs Allowed)

This gives a list of "boring" games, where the higher-rated team won by a lot of runs. One might think the stands were rather empty for these.
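
Continuing the pandas sketch from above (same assumed columns), the "boring" list is just a different filter and sort:

# Run differential for each game
games["run_diff"] = (games["home_runs"] - games["away_runs"]).abs()

# "Boring" games: the favorite won, sorted by the biggest blowouts
boring = games[games["upset"] == 0].sort_values("run_diff", ascending=False)
print(boring.head(10))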


2016's "Closest" Games

Here's the flip side, where I sorted for games with the lowest Elo difference that were decided by only one run.
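
And the same sketch, flipped around:

# "Closest" games: decided by a single run, sorted by the smallest Elo gap
closest = games[games["run_diff"] == 1].sort_values("elo_diff")
print(closest.head(10))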



Best,
Bryce

 

Sunday, July 17, 2016

Who Watches the Watchmen - Attempting to Analyze 538's MLB Elo Predictions

The Elo rating system has been on the rise in sports analytics, driven by its use on the popular data journalism website FiveThirtyEight.com. I have noticed a lack of self-analysis of their Elo system, and I also feel that Elo ratings are a new, ripe area for discovery in sports analytics. This project was intended as a learning experience for myself, as I am new to sabermetrics and programming. I do not claim that this is a rigorous analysis, and I am fully open to suggestions on how to improve anything that needs improving. Without further ado:

538's MLB Elo system attempts to do two distinct things:

1) Long-term predictions for post-season probabilities (make the playoffs, win the division, win the World Series)

and

2) Short-term predictions for individual game outcomes, given some Elo adjustments like starting pitcher, travel time, and rest.

This post will deal with the latter.

In an attempt to improve my programming chops, I wrote a Python script to scrape and reorganize the data from FiveThirtyEight's website for the 2016 season thus far. And, because I am most familiar with Excel, I used it to build the graphs and perform a few extra calculations here and there.
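
The reorganizing half of that script boils down to something like the sketch below. To be clear, the file names, column names, and renaming here are hypothetical placeholders, not FiveThirtyEight's actual layout:

import pandas as pd

# Hypothetical reshaping step: file and column names are placeholders,
# not FiveThirtyEight's actual page or column layout
raw = pd.read_csv("raw_538_scrape.csv")

games = raw.rename(columns={
    "team1_elo": "home_elo",
    "team2_elo": "away_elo",
    "score1": "home_runs",
    "score2": "away_runs",
})
games.to_csv("elo_2016_games.csv", index=False)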

Methods

Only games played thus far in the 2016 season have been analyzed (n = 1364).

This analysis was limited to looking at how well the unadjusted (raw) Elo rating predicted the outcome of a game compared to the adjusted rating. There are a couple of ways to tackle this problem:

1) Convert the difference in Elo ratings of the two opposing teams into a point spread, and then compare that point spread to the outcome.

or

2) Compare the predicted % chance of a win against the actual outcomes.

Due to a personal uncertainty I had with option 1 (mainly the constant used in the formula to convert a difference in Elo ratings to a point spread), I limited this analysis to option 2.
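
For context, option 2 leans on the win probability implied by the Elo gap. The textbook conversion is below; this is the generic Elo formula, and 538 layers its own adjustments (pitcher, travel, rest) on top, so their published numbers won't match it exactly:

def elo_win_probability(elo_a, elo_b):
    # Standard logistic Elo win expectancy for team A against team B
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

# Example: a 50-point Elo edge works out to roughly a 57% chance to win
print(round(elo_win_probability(1550, 1500), 3))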

To do this, I took the win expectancy, which is continuous, and converted it to discrete values using bounds 5% wide.

Ex:

0-4.9% chance of win = 2.5%
5-9.9% chance of win = 7.5%
10-14.9% chance of win = 12.5%
and so on

All games played to date in the 2016 season were then converted to these bounds. A binary win/loss value was assigned to each game (W = 1, L = 0), and a pivot table in Excel was used to take the percentage of games actually won in each bound.
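
For anyone following along in Python instead of Excel, the binning and pivot-table step is a short pandas groupby. This is a rough sketch: the file name and the win_prob / won columns are assumptions about how the scraped sheet is organized, with win expectancy stored as a fraction between 0 and 1.

import pandas as pd

# Assumed file and column names; win_prob is the pre-game win expectancy (0-1),
# won is the 1/0 outcome for that same team
games = pd.read_csv("elo_2016_games.csv")

# 5%-wide bounds labeled by their midpoints (2.5%, 7.5%, ...)
edges = [x / 100 for x in range(0, 105, 5)]
midpoints = [x / 100 + 0.025 for x in range(0, 100, 5)]
games["bound"] = pd.cut(games["win_prob"], bins=edges, labels=midpoints, right=False)

# Actual win rate and sample size within each bound (the Excel pivot table)
calibration = games.groupby("bound")["won"].agg(["mean", "count"])
print(calibration)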

If the Elo system were perfect, each bound would have an actual win percentage equal to its predicted value. Granted, the lower the sample size, the less likely this is.

Results



After subtracting the predicted percentage from the actual percentage, this chart is the result for bounds of 5% (note that excess bounds with no data were deleted). If a bar falls below 0%, Elo overestimated the games won; if a bar is above 0%, Elo underestimated the games won. An ideal distribution would have very small bar heights.

Example:

1. Of the 132 games in the 40-45 raw Elo bound, on average 42.5% of them should have been won. According to the chart only 36.1% of those games were actually won (42.5 + (-6.4)) = 36.1

2. Of the 78 games in the 65-70 adjusted Elo bound, on average 67.5% of them should have been won. According to the chart 67.5325% of those games were actually won.

etc

-------

Labels for each bar cluster indicate the total number of games (n) analyzed in each bound, in the format [adjusted n] / [raw n]. Generally speaking, the adjusted Elo seems to be slightly better at prediction, but not by a wide margin in any of the bounds. Furthermore, Elo in general seems to be poor at predicting teams with a very low win probability (25-30%), but given the low sample size (n = 5), this conclusion may be too hasty.

Overall, Elo win expectancy seems quite accurate as sample size increases, at least for the data analyzed, staying within about +/- 10% of actual, excluding the 25-30% bound.

Limitations

One of the glaring limitations of this method is that the adjustment applied to the raw Elo rating does not change the win probability very much. As a result, the number of times an adjusted rating actually changes the win-percentage category (i.e., a 30-35% chance of a win moving to 35-40%) is low. In fact, of the 1364 games scored at the time of writing, only 10 games changed categories.
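
If you want to check that count yourself, binning both probabilities and comparing the labels does it in a couple of lines (same assumed data and bin edges as the sketch above, with raw_win_prob and adj_win_prob as the two win expectancies):

# Bin the raw and adjusted win expectancies with the same 5% edges as above
raw_bound = pd.cut(games["raw_win_prob"], bins=edges, labels=midpoints, right=False)
adj_bound = pd.cut(games["adj_win_prob"], bins=edges, labels=midpoints, right=False)

# Number of games where the adjustment moved the prediction into a different bound
print((raw_bound != adj_bound).sum())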

"Well why don't you just decrease the bound size then, guy?" I hear you asking,

the astute observer would realize that this would then lead to a lower sample size per category, and thus less confident answers. The question becomes, can this be optimized?

Further Exploration - Thoughts on K

The author realizes that the season is just a little over half complete, and intends to follow up this post with the same methodology at the end of the regular season, perhaps with a bound width of ~2.5% as the sample size increases.

This starts to build toward another ripe topic: optimizing the constant k in the Elo equation as the season progresses. k provides the "oomph" behind how much ratings change after games are played; the higher the k value, the more a single win or loss affects a team's rating. It is common practice to change k for the playoffs, but what about mid-season?
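
For anyone who wants to tinker with that, the generic Elo update below shows where k lives. This is the textbook version, not 538's exact implementation, and the default k here is just a placeholder:

def update_elo(winner_elo, loser_elo, k=4.0):
    # Generic Elo update: ratings move by k * (actual result - expected result)
    expected_win = 1.0 / (1.0 + 10 ** ((loser_elo - winner_elo) / 400.0))
    shift = k * (1.0 - expected_win)
    return winner_elo + shift, loser_elo - shift

# The same upset moves the ratings further when k is larger
print(update_elo(1500, 1520, k=4.0))
print(update_elo(1500, 1520, k=24.0))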

Is it correct to assume that k should be constant for a whole season? What happens if k is changed, and are there ways to achieve more accurate predictions by doing so?

Questions for another day.