Introduction

And so finally we come to the end of this comparison of four different modelling strategies for predicting football matches: hierarchical Bayesian regression models, a traditional Elo rating system, an optimised Elo system using Evolutionary Algorithms (EAs), and online Bayesian models using Particle Filters. I’ll skip most of the code so we can jump straight to the results, but it’s all available by clicking on the folds in the following section.
Introduction

This is the fourth in a series of posts looking back at the various statistical and machine learning models that have been used to predict football match outcomes as part of Predictaball. Here’s a quick summary of the first 3 parts:
Part 1 used a Bayesian hierarchical regression to model a team’s latent skill, where skill was constant over time
Part 2 used an Elo rating system to handle time, but the functions and parameters were hardcoded and a match prediction model was bolted on top to replace Elo’s basic prediction
Part 3 used Evolutionary Algorithms (EA) to simultaneously optimize the rating system and match prediction model without requiring any hardcoded parameters

The EA model has been working reliably for the last 5 and a half seasons and hasn’t been tweaked since.
Introduction

The Elo models introduced last time, which were the models used on Predictaball from 2017-2019, worked very well but with some limitations. Firstly, the parameters used in the rating update equation (home advantage and margin of victory multiplier) were chosen manually by inspection for each league. If I wanted to apply Predictaball to a whole new set of leagues (let alone sports!) I’d need to go through each one to identify new parameters - hardly ideal!
Introduction

Following on from Part 1 in this series of posts looking back at the history of Predictaball, I’ll be reexamining the Elo models that were used from 2017-2019. One obvious flaw with the hierarchical Bayesian regression model was that there was no acknowledgement of time - a team’s skill was modelled at the point of training and kept fixed from then on. The model could be retrained after each match, but MCMC is time-consuming, and the resultant skill would still be an average over the full training period rather than the value at the current time.
Introduction

This is the first in a retrospective series of posts looking at the evolution of Predictaball - my football match prediction engine - and reflecting on how it mirrors my own development as a data scientist. I’ve been fortunate to work in a wide variety of domains, exposing me to a range of statistical paradigms and perspectives, which have been reflected in the models used in Predictaball. At the end of the series I’ll do a full comparison of all the algorithms too, as this is something I’ve never done before but have been interested in for a long time.
While I’ve been quite happy with the performance of my Predictaball football rating system, one thing that’s bothered me since its inception last summer is the reliance on hard-coded parameters.
Similar to many other football rating methods, it’s an adaptation of the Elo system that was designed for chess by Arpad Elo in the 1950s. His aim was to devise an easily implementable system to rate competitors in a two-player zero-sum game.
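For readers who haven’t met it before, here is a minimal sketch of the textbook Elo update for a two-player game. It is the standard chess formulation, not the Predictaball adaptation: the function names, the K-factor of 20 and the 400-point scale are illustrative assumptions rather than values taken from these posts.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of player A against player B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1 for an A win, 0.5 for a draw and 0 for an A loss.
    k (assumed to be 20 here) controls how quickly ratings react to results.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two evenly rated players; A wins.
    print(update_ratings(1500.0, 1500.0, 1.0))
```

Because the update is zero-sum, the total number of rating points held by the two players stays the same after every game.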
Back in March I rewrote thepredictaball.com from its original R Shiny implementation into a static website using the Vue JavaScript framework. I intended to write about it at the time but I’ve been busy and hadn’t made time for it until now, which is handy given that the football season has just finished!
Excuse the clickbait title, but I genuinely couldn’t think of a better way of organising this post.
eXpected Goals (xG) is a popular method of answering that age-old question of which team ‘deserved’ to win a match. It does this by assigning a probability of a goal being scored from every opportunity based upon various metrics, such as the distance from goal, number of defenders nearby, and so on. By comparing a team’s actual standings with those from the output of an xG model we get a retrospective measure of how well a team is doing given their chances.
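As a rough illustration of the idea, a team’s xG for a match is simply the sum of the scoring probabilities of its chances. The sketch below assumes those per-shot probabilities have already been produced by some xG model; the numbers and the function name are made up for the example.

```python
from typing import Iterable


def expected_goals(shot_probabilities: Iterable[float]) -> float:
    """Sum per-shot scoring probabilities to get a team's xG for a match."""
    probs = list(shot_probabilities)
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("each shot probability must lie between 0 and 1")
    return sum(probs)


if __name__ == "__main__":
    # Hypothetical per-shot probabilities, not real match data.
    home_shots = [0.05, 0.30, 0.76, 0.10]
    away_shots = [0.02, 0.08, 0.12]
    print(f"home xG: {expected_goals(home_shots):.2f}")
    print(f"away xG: {expected_goals(away_shots):.2f}")
```

A team that wins 1-0 while posting a much lower xG than its opponent could be said to have been fortunate, which is the kind of retrospective judgement xG is used for.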
I’ve created a website for Predictaball with team ratings and match predictions for all 4 main European leagues, at thepredictaball.com. It has each team’s current rating and plots showing the change over the course of the season, along with match outcome forecasts. Various statistics are also included, such as the biggest upset, worst teams in history, as well as this season’s predictive accuracy. Previously only Premiership match predictions were made available (via Twitter) and so I’m happy that I’ve finally got this website released.
This post continues on from the mid-season review of the Elo system and looks at my Bayesian football prediction model, Predictaball, up to and including matchday 20 of the Premier League (29th December). I’ll go over the overall predictive accuracy and compare my model to others, including bookies, expected goals (xG), and a compilation of football models.
Overall accuracy

So far, across the top 4 European leagues, there have been 696 matches with 379 (54%) of these outcomes being correctly predicted.