Expected goals are having a moment right now. The stat most associated with advanced stats in football has recently gotten think-pieces from outlets like The Guardian, The Telegraph, Goal.com, and FanSided. Football stats writer Mike Goodman said today on Twitter, “I’m so happy that xG is getting increased exposure that I’m gonna grit my teeth and grin through all the re-litigation of its underpinnings.” More people than ever are now at least aware of xG and have an idea of what the stat measures. With each new “mainstream media” (shudders) piece on expected goals, we start to have the same arguments and discussions about the metric that were had years ago.

While the fight over xG’s importance may be starting all over again in places, if you dive deeper into so-called “analytics Twitter” you will find great pieces discussing efforts to improve the metric. The articles by Marek and Nils that I linked there are the type of work that made me interested in stats in football in the first place. Furthermore, those pieces are what drove me to want to track these stats for Scottish football when I could not find any publicly available.

With that being said, Christian Wulff and I have been working on improving expected goals in Scotland. The wonderful people at Stratagem have given us more data than we could have imagined to help accomplish this. Before Stratagem, I was reliant on pulling shot “location” information from the BBC live-tracker of SPFL matches, simply because there was no other public data available. The “Beeb” xG model served its purpose well, giving us a surprisingly decent look at expected goals in Scotland where none existed before. However, we can now do better.

Moussa Dembele; xG Monster

Seeing honest-to-goodness x and y coordinates for shots (AND passing locations?!) in the data Stratagem sent Christian and me was a come-to-God moment. No longer would we be reliant on terms like “center of the box” from the BBC. In addition to actual x and y coordinates and shot types, we also now had information such as how many defenders were between the shooter and the goal and how much defensive pressure the shooter was under when shooting.

With this additional info, Christian and I set out to improve our xG model for the SPFL. A common criticism of xG is that it does not take defenders and defensive pressure into account, so this new Stratagem data would allow us to address that. Good in theory, right? Well, believe it or not, you run into a sample size issue when you become more granular and only have a season’s worth of data. Trying to get enough shots to come up with a decent xG model for Scotland where 2 defenders were between the goal and a shooter at 37 x, 44 y on the pitch proved to be a challenge. Luckily, the SPFL is not the only league Stratagem has data for.
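To put a rough number on that sample size problem, here is a minimal Python sketch. It assumes every shot in a bin is an independent attempt with the same chance of scoring, which is a simplification, and it uses a normal approximation that is itself shaky for small bins. The shot counts and the 10% conversion rate are made-up illustrative figures, not StrataData values.

```python
import math

def conversion_rate_interval(goals, shots, z=1.96):
    """Approximate 95% interval for a bin's scoring rate (normal approximation)."""
    if shots == 0:
        raise ValueError("no shots in this bin")
    rate = goals / shots
    # Binomial standard error of the observed scoring rate.
    std_err = math.sqrt(rate * (1 - rate) / shots)
    low = max(0.0, rate - z * std_err)
    high = min(1.0, rate + z * std_err)
    return rate, low, high

# Hypothetical bins: same 10% conversion rate, very different sample sizes.
small_bin = conversion_rate_interval(goals=2, shots=20)
large_bin = conversion_rate_interval(goals=200, shots=2000)
```

With 20 shots the interval runs from 0 to about 0.23, so the estimate tells you very little; with 2,000 shots it narrows to roughly 0.09 to 0.11.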

While it seems to be a recent trend to try and assess how your Gran would do in the SPFL, most would agree that the level of play in Scotland is below leagues such as the EPL, Bundesliga, and La Liga. No shame in that; it is just reality. However, there are plenty of leagues in Europe and around the world that most would agree are at a similar level to Scotland, such as the Eliteserien in Norway, the Swiss Super League, and others. In total, we had 11 leagues’ worth of data (Turkish, Swiss, Swedish, Greek, Bundesliga 2, Dutch, Austrian, Australian, English Championship, Norwegian, and Scottish league data, to be specific) that gave us over 400,000 shots. With data from what I have dubbed the “League of Average Leagues”, we can now use these defensive metrics from Strata and create small zones on the pitch where shots take place to calculate xG values (though thanks to Nils and Marek’s articles I mentioned above, we are now thinking about how to implement their ideas for our model!).
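The zone idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the actual model: the field names (`x`, `y`, `defenders`, `goal`), the 5-unit zone size, and the 50-shot minimum are all hypothetical, and the real StrataData schema may well differ. Each zone's xG is taken here as simply goals divided by shots.

```python
from collections import defaultdict

ZONE_SIZE = 5    # hypothetical zone width, in pitch-coordinate units
MIN_SHOTS = 50   # hypothetical cutoff below which a bin is considered too small

def zone_key(shot):
    """Bucket a shot by pitch zone and number of defenders in the way."""
    return (int(shot["x"] // ZONE_SIZE),
            int(shot["y"] // ZONE_SIZE),
            shot["defenders"])

def zone_xg(shots, min_shots=MIN_SHOTS):
    """Return {zone_key: goals / shots} for bins with at least min_shots shots."""
    counts = defaultdict(lambda: [0, 0])  # key -> [goals, shots]
    for shot in shots:
        tally = counts[zone_key(shot)]
        tally[0] += 1 if shot["goal"] else 0
        tally[1] += 1
    return {key: goals / n
            for key, (goals, n) in counts.items()
            if n >= min_shots}

# Made-up shots standing in for a pooled multi-league data set.
pooled = [
    {"x": 37, "y": 44, "defenders": 2, "goal": True},
    {"x": 38, "y": 42, "defenders": 2, "goal": False},
    {"x": 36, "y": 43, "defenders": 2, "goal": False},
    {"x": 36, "y": 41, "defenders": 2, "goal": False},
    {"x": 12, "y": 70, "defenders": 1, "goal": False},
]
table = zone_xg(pooled, min_shots=2)  # tiny cutoff only so the toy data survives
```

The first four shots land in the same zone with 2 defenders, so that bin gets a value of 0.25, while the lone shot from wide is dropped for having too few shots behind it. Pooling more leagues is what pushes more bins over the cutoff.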

Heat Map

With these figures calculated based on location and defensive pressure, we can develop a heat map, similar to the one above, of the probability of scoring a goal. Going from light green, the least likely to score, to dark red, the most likely, we can see where a team should be trying to take shots from in order to score. The “danger zone” concept is shown well here, with red filling the 8 yard box and the darkest red between the posts of the goal. Compare that to the green out wide or outside the box. Clearly you are more likely to score in the red areas than the green (I’m looking at you, Faissal El Bakhtaoui).


With this new model, Christian (on xCynic, his new stats and tactics vertical from the 90 Minute Cynic) and I are also planning on doing some new things and changing some graphs and maps we have done previously. We have made some improvements to our xG game maps and our xG and xA player maps, and added some new team graphs that we hope will help further understanding of both advanced stats in football and Scottish football as a whole.

This article was written with the aid of StrataData, which is property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform, in addition to StrataBet Premium Recommendations.
