Predicting the Euro 2016 Final using Markov Chains

We’re back on July 9, 2016: within the next 24 hours, the soccer world will be head over heels for the final of UEFA Euro 2016 in France. As usual for such big events, I am curious about the teams’ overall performances, including how to compare two teams that never played against each other. Additionally, I’d like to know the probable winner in advance. According to my predictions, France will win this final with odds of roughly 2:1 (about 63.5%).
Data
To be fair, the data is fairly sparse. I only took matches from the final tournament of Euro 2016, played from June 10, 2016, up to today, July 9, 2016. There will likely be an update after the final match, but this post and its prediction are one day ahead of time. No other data (rankings, world lists, statistics, player performances) was used, only the match results before the final. So this study is not skewed by history or details.
Method
The method is fairly simple. Being me, I chose Markov Chains to figure out the teams’ performances. From the sparse tournament data, I only took the final score of each match. So if team A plays team B and the final score is 2:1, then B “owes” A 2 recommendations and gets 1 recommendation in return. In other words, every goal a team concedes makes it “honour” the team that scored it.
Goals during the regular 90 minutes or the extended 120 minutes each count as exactly one. Goals scored in a penalty shoot-out are discounted by a factor of ten: they only count for one tenth of a regular goal.
As there are 24 teams, I built a 24 x 24 matrix holding all goals from any team to any team. Had one pairing of teams been played twice, the new recommendation counts would simply have been added on top of the former ones.
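The bookkeeping described so far can be sketched in a few lines of R. This is only an illustration under stated assumptions: the three matches are invented toy scores (not Euro 2016 results), and the column names, the function name build_recommendations and the pen_weight parameter are hypothetical, not taken from the repository.

```r
# Toy data: invented scores, NOT actual Euro 2016 results.
matches <- data.frame(
  team_a  = c("A", "B", "A"),
  team_b  = c("B", "C", "C"),
  goals_a = c(2, 1, 0),   # goals after 90 or 120 minutes
  goals_b = c(1, 1, 0),
  pens_a  = c(0, 0, 5),   # goals in a penalty shoot-out, if any
  pens_b  = c(0, 0, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows recommend, columns receive recommendations.
build_recommendations <- function(matches, pen_weight = 0.1) {
  teams <- sort(unique(c(matches$team_a, matches$team_b)))
  R <- matrix(0, nrow = length(teams), ncol = length(teams),
              dimnames = list(from = teams, to = teams))
  for (i in seq_len(nrow(matches))) {
    a <- matches$team_a[i]
    b <- matches$team_b[i]
    # The conceding team "owes" the scoring team one recommendation per goal;
    # shoot-out goals count for one tenth. Repeated pairings accumulate.
    R[b, a] <- R[b, a] + matches$goals_a[i] + pen_weight * matches$pens_a[i]
    R[a, b] <- R[a, b] + matches$goals_b[i] + pen_weight * matches$pens_b[i]
  }
  R
}

R <- build_recommendations(matches)
print(R)
```

In the toy data, A beats B 2:1, so the entry in row B, column A is 2 and the entry in row A, column B is 1. The third match ends 0:0 and is decided on penalties, so it only contributes tenths.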
This matrix was then normalized so that each row sums to one. From it, I built a Markov Chain, without a Shadow Matrix ε: the data is sparse enough already, and the Shadow Matrix would only add doubt. Having a Shadow Matrix in place tends to even everything out, which does not reflect reality. So in this case, I decided against it.
The Markov Chain simulates a Random Walk between the teams. The next team is chosen according to the odds of the recommendations. Having the most recommendations, and having recommendations from other top teams, therefore earns a top spot in the list. The probability of each team being the best overall is the Steady State solution of this Markov Chain. Computing it is like walking over the graph forever, following the recommendations by chance. The Steady State is the probability of being in a given state as time goes to infinity.
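As a minimal sketch of these two steps (row normalization, then the Steady State), the following R snippet uses a made-up 3 x 3 recommendation matrix and plain power iteration. The matrix values, the function name steady_state and the tolerance are assumptions made for illustration; the code in the repository may well solve this differently, for example via an eigen decomposition.

```r
# Invented recommendation counts: row = recommending team, column = recommended team.
R <- matrix(c(0,   1, 0.3,
              2,   0, 1,
              0.5, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Without a Shadow Matrix, a team that never conceded would have an all-zero row
# and could not be normalized, so this sketch simply refuses such input.
stopifnot(all(rowSums(R) > 0))

# Row-normalize: each row becomes a probability distribution over the next team.
P <- R / rowSums(R)

# Hypothetical helper: power iteration, i.e. repeating the random-walk step
# until the distribution over teams stops changing.
steady_state <- function(P, tol = 1e-12, max_iter = 10000) {
  state <- rep(1 / nrow(P), nrow(P))   # start from a uniform distribution
  for (k in seq_len(max_iter)) {
    nxt <- as.vector(state %*% P)      # one step of the walk
    if (max(abs(nxt - state)) < tol) break
    state <- nxt
  }
  setNames(nxt, rownames(P))
}

s <- steady_state(P)
print(sort(s, decreasing = TRUE))      # ranking: highest probability first
```

Power iteration is chosen here because it mirrors the "running forever over the graph" picture directly; it converges for this toy chain because every team can be reached from every other team and the walk is not periodic.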
Teams’ Performances
One friend from England was sad about his team’s performance: they lost in the round of 16 against Iceland. Germany, on the other hand, reached the semi-final. Even so, England was at most as good as Germany, which is the current World Champion and was a prospective finalist. England, however, lost to Iceland, which is so far the second-best team in this tournament.
Iceland is most obviously the relative winner. The absolute winner is to be decided tomorrow. Both Iceland and Wales were extraordinary and played excellent matches; the model does not know former performances and is therefore blind to the relative excellence of teams.
The top four teams as of now are:

This bar chart shows teams sorted by their probability of being the single best team in this Euro 2016 as of now. The colors correspond to four clusters of teams; two clusters span these top four teams. The vertical axis is the median probability of the sample data shown, not the entire data set.
The above-average teams are:

This bar chart shows teams sorted by their probability of being the single best team in this Euro 2016 as of now. The colors correspond to four clusters of teams; three clusters span these above-average teams. The vertical axis is the median probability of the sample data shown, not the entire data set.
The complete list of all teams as of now:

This bar chart shows teams sorted by their probability of being the single best team in this Euro 2016 as of now. The colors correspond to four clusters of teams. The vertical axis is the median probability of the entire data set.
Match Predictions
With Markov Chains, the current Steady State solution can be used to compute the odds of two teams winning a game against each other. For two teams A and B with a priori values p_A and p_B, the posterior odds s are

$$s_A = \frac{p_A}{p_A + p_B}, \qquad s_B = \frac{p_B}{p_A + p_B} = 1 - s_A$$
According to this math, Germany was supposed to narrowly lose to France, with the odds before the match being about 43% for Germany vs 57% for France. Just comment out the semi-final Germany/France and issue
df[df$team=="France",]$prob /(df[df$team=="Germany",]$prob + df[df$team=="France",]$prob)
to see it.
Doing the same for the final match, France is supposed to win by a comfortable margin, with 36.5% for Portugal vs 63.5% for France: about 2:1, which could be a reasonable score after 90 or 120 minutes.
The interested reader might also like the Goldman Sachs prediction for Euro 2016, which uses historical data and Elo ratings; the link is listed below. With more data, but all of it from before the first match, they got astonishingly accurate results. According to my odds formula above, Goldman Sachs would suggest France winning by odds of even 3:1 (about 74% for France vs 26% for Portugal, using priors of 23.1% for France and 8% for Portugal).
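To make the arithmetic behind these percentages easy to check, here is a small sketch of the same ratio as in the R one-liner above, applied to the two Goldman Sachs priors quoted in this paragraph. The function name pairwise_odds is hypothetical and not part of the repository.

```r
# Hypothetical helper: chance of team A beating team B, given each team's
# prior probability of being the single best team.
pairwise_odds <- function(p_a, p_b) {
  stopifnot(p_a >= 0, p_b >= 0, p_a + p_b > 0)
  p_a / (p_a + p_b)
}

# Priors quoted above: 23.1% for France, 8% for Portugal.
france   <- pairwise_odds(0.231, 0.08)
portugal <- 1 - france

round(100 * france, 1)    # 74.3
round(100 * portugal, 1)  # 25.7
```

The two values always sum to one, because only the two finalists' priors enter the ratio; all other teams drop out of the comparison.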
Taking out some matches, or tweaking match outcomes, shows how strongly the ranking still reacts to even minor changes. Before the Germany/France match, Italy, which had been sent home by Germany, had by far the top score. When France then eliminated Germany, France moved to the top and Germany dropped rather far down. Germany now does just a little better than average.
I encourage you to test how my score prediction of 1:2 (Portugal vs France) plays out. If Portugal really scores one goal and concedes two from France, there will be a top-three cluster of France/Iceland/Portugal, in this order. France will lead by a wide margin; Portugal really profits from getting at least one recommendation from France. If Portugal wins, say 2:1, then the top cluster will be France/Portugal, in this order, with a very small gap between the two. So Iceland’s Markov-Chain-based performance evaluation depends on the goals of the final as well…
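Such a what-if experiment can be sketched as follows. Everything here is a toy setup under stated assumptions: three invented teams with invented scores (not the Euro 2016 data), a hypothetical helper rank_teams, and a fixed number of random-walk steps instead of a convergence check. It only illustrates how appending one hypothetical final result changes the gaps in the ranking.

```r
# Invented toy tournament; scores are NOT Euro 2016 results.
base <- data.frame(
  team_a  = c("A", "B", "C"),
  team_b  = c("B", "C", "A"),
  goals_a = c(2, 1, 1),
  goals_b = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: matches in, Steady State ranking out.
rank_teams <- function(matches, n_iter = 1000) {
  teams <- sort(unique(c(matches$team_a, matches$team_b)))
  R <- matrix(0, length(teams), length(teams), dimnames = list(teams, teams))
  for (i in seq_len(nrow(matches))) {
    a <- matches$team_a[i]
    b <- matches$team_b[i]
    R[b, a] <- R[b, a] + matches$goals_a[i]   # b conceded, so b recommends a
    R[a, b] <- R[a, b] + matches$goals_b[i]
  }
  stopifnot(all(rowSums(R) > 0))              # every team must have conceded
  P <- R / rowSums(R)
  state <- rep(1 / length(teams), length(teams))
  for (k in seq_len(n_iter)) state <- as.vector(state %*% P)
  sort(setNames(state, teams), decreasing = TRUE)
}

# Two hypothetical outcomes of a final between B and A.
final_1_2 <- data.frame(team_a = "B", team_b = "A", goals_a = 1, goals_b = 2,
                        stringsAsFactors = FALSE)
final_2_1 <- data.frame(team_a = "B", team_b = "A", goals_a = 2, goals_b = 1,
                        stringsAsFactors = FALSE)

rank_if_a_wins <- rank_teams(rbind(base, final_1_2))
rank_if_b_wins <- rank_teams(rbind(base, final_2_1))

print(round(rank_if_a_wins, 3))
print(round(rank_if_b_wins, 3))
```

In this toy setup A stays on top in both scenarios, but the gap between A and B shrinks when B wins the final, and the third team's share shifts as well even though it does not play.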
Résumé
You can predict sports events and other rankings using Markov Chains. We already did it on other occasions, such as
- describing and predicting traffic flow through cities by time and day of week,
- ranking teams by sport scores,
- ranking ideas,
- ranking individual people’s performances and
- ranking vacation venues.
The data is almost too sparse to be quite exact. But it gives you a feeling for a possible ranking, ever sharpening with more and more data. With early data, like here, the teams may swap places after a single game, which is not a steady state. Incorporating more data makes this effect vanish swiftly.
Thanks for reading my blog post. For any questions and helpful feedback, please do not hesitate to let me know in the comments section below. Thank you.
Source:
github.com/danielschulz/Euro2016-Predictions
Further reading:
[A] Goldman Sachs’ sophisticated pre-Euro-2016 forecast with historical data: goldmansachs.com/our-thinking/macroeconomic-insights/euro-cup-2016
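[B] Eine Einführung in zeit-diskrete homogene Markov-Ketten (An Introduction to Discrete-Time Homogeneous Markov Chains): amazon.de/Eine-Einführung-zeit-diskrete-homogene-Markov-Ketten/dp/3640797086
[C] The same title as an e-book: grin.com/de/e-book/164218/eine-einfuehrung-in-zeit-diskrete-homogene-markov-ketten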