Statistics | Scatter Graphs

Before plotting anything, I need to load the packages I want to use (ggplot2 and dplyr, both part of the tidyverse) and download my data from vaastav’s Fantasy-Premier-League GitHub repository. This repository stores FPL data from the last few seasons, which is very helpful!
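As a rough sketch, the loading step might look something like the block below. The file path (the merged gameweek file for the 2023-24 season) and the helper name load_fpl() are my assumptions for illustration, so check the repository for the exact file you need.

```r
library(ggplot2)
library(dplyr)

# Assumed location of the merged gameweek file for one season;
# browse the repository to confirm the path for the season you want.
fpl_url <- "https://raw.githubusercontent.com/vaastav/Fantasy-Premier-League/master/data/2023-24/gws/merged_gw.csv"

# Hypothetical helper: read a gameweek CSV from a URL or a local file path.
load_fpl <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Uncomment to download the data (needs internet access):
# data <- load_fpl(fpl_url)
```

Because read.csv() accepts both URLs and local paths, the same helper also works on a copy of the file saved to disk.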

Next I want to inspect the data to see what kind of information we have:
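A minimal sketch of the inspection step, using a tiny made-up data frame in place of the real download. The column names (goals_scored, expected_goals, GW and so on) are assumed to match the gameweek file, and the rows are placeholders rather than real players.

```r
library(dplyr)

# Toy stand-in for the gameweek data: one row per player per gameweek.
# Column names are assumptions; the values are placeholders.
data <- data.frame(
  name           = c("Player A", "Player A", "Player B"),
  position       = c("FWD", "FWD", "MID"),
  team           = c("Team X", "Team X", "Team Y"),
  GW             = c(1, 2, 1),
  goals_scored   = c(1, 0, 2),
  expected_goals = c(0.6, 0.3, 1.1)
)

head(data)     # first few rows
glimpse(data)  # every column with its type and first values
dim(data)      # number of rows and columns
```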

I can see that there is a lot of information for each player in each gameweek, including events like goals and clean sheets, statistics like xG, match info like kick-off time and opponent, and FPL-related data like price and position. So there is plenty of information to make graphs from!

Before making any plots, I usually like to define a theme, which sets the overall style and font of all my plots so they match. Here is my theme:

# Make theme
theme <- theme_classic() + theme(text = element_text(size = 12, family='Radio Canada Big'))

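Here, theme_classic() gives the plot a minimalistic style, while element_text() sets the size and font family of all the text. The font (here 'Radio Canada Big') needs to be installed and available to R for this to take effect.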

Scatter Plot: xG Overperformance

To assess the relationship between xG and goals scored, I want to plot the total xG against the total goals scored for each player across the whole season. To do this, I first need to manipulate the data to get the values I want to plot. Using the dplyr %>% (pipe) operator, I can group each player's rows across all gameweeks and sum them to get the total xG and total goals scored for each player.

name                position  team       Goals  xG
Aaron Cresswell     DEF       West Ham   0      0.00
Aaron Hickey        DEF       Brentford  0      0.20
Aaron Ramsdale      GK        Arsenal    0      0.00
Aaron Ramsey        MID       Burnley    0      0.35
Aaron Wan-Bissaka   DEF       Man Utd    0      0.11
Abdoulaye Doucouré  MID       Everton    7      8.70
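
A summary table like the one above could be built with group_by() and summarise(). This is only a sketch on made-up rows: the column names goals_scored and expected_goals are assumptions about the gameweek file, and player_totals is a name I have picked for illustration.

```r
library(dplyr)

# Toy gameweek data (placeholder values, assumed column names)
data <- data.frame(
  name           = c("Player A", "Player A", "Player B", "Player B"),
  position       = c("FWD", "FWD", "MID", "MID"),
  team           = c("Team X", "Team X", "Team Y", "Team Y"),
  goals_scored   = c(1, 0, 2, 1),
  expected_goals = c(0.6, 0.3, 1.1, 0.4)
)

# One row per player: sum goals and xG across all gameweeks
player_totals <- data %>%
  group_by(name, position, team) %>%
  summarise(
    Goals = sum(goals_scored),
    xG    = sum(expected_goals),
    .groups = "drop"
  )

head(player_totals)
```

Grouping by team as well as name keeps the team column in the output, but it would split a player who changed clubs mid-season into two rows.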

To make a basic plot, I first pass the data frame to ggplot() and map the x and y axes inside aes(). Then I can add geom_point() to make a scatter graph (and add my theme).
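
Here is a minimal sketch of that basic plot. It uses a handful of the season totals shown in the tables in this post rather than the full dataset, the name player_totals is my own choice, and the theme leaves out the font family so the code runs without that font installed.

```r
library(ggplot2)

# A few season totals taken from the tables in this post
player_totals <- data.frame(
  name     = c("Dominic Calvert-Lewin", "Darwin Núñez Ribeiro", "Brennan Johnson",
               "Nicolas Jackson", "Luis Díaz", "Abdoulaye Doucouré"),
  position = c("FWD", "FWD", "MID", "FWD", "MID", "MID"),
  Goals    = c(7, 11, 5, 14, 8, 7),
  xG       = c(12.86, 16.23, 10.03, 18.14, 12.08, 8.70)
)

# Same idea as my theme, minus the custom font
theme <- theme_classic() + theme(text = element_text(size = 12))

# Data frame first, then the x and y mappings, then the point layer
p <- ggplot(player_totals, aes(x = xG, y = Goals)) +
  geom_point() +
  theme

p
```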

While this graph shows a positive correlation between xG and goals scored, it’s a bit boring and also not very useful. For example, we don’t know who the players are, what position they are, and we can’t tell who had the largest xG overperformance or underperformance. We can change this by adding colours and labels. For example, I could colour the players by their position and use scale_color_manual() to change the colour palette. I can also change the alpha level (transparency) and size of the points.
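
A sketch of how the colouring might be done, again on a few season totals from the tables in this post. The hex colours, alpha value and point size are arbitrary choices of mine, not the ones used for the original figure.

```r
library(ggplot2)

player_totals <- data.frame(
  name     = c("Dominic Calvert-Lewin", "Darwin Núñez Ribeiro", "Brennan Johnson",
               "Nicolas Jackson", "Luis Díaz", "Abdoulaye Doucouré"),
  position = c("FWD", "FWD", "MID", "FWD", "MID", "MID"),
  Goals    = c(7, 11, 5, 14, 8, 7),
  xG       = c(12.86, 16.23, 10.03, 18.14, 12.08, 8.70)
)

theme <- theme_classic() + theme(text = element_text(size = 12))

# Illustrative palette: one named colour per position
position_colours <- c(GK = "#E69F00", DEF = "#56B4E9", MID = "#009E73", FWD = "#D55E00")

p <- ggplot(player_totals, aes(x = xG, y = Goals, colour = position)) +
  geom_point(alpha = 0.6, size = 3) +                # semi-transparent, larger points
  scale_color_manual(values = position_colours) +    # replace the default palette
  theme

p
```

Naming the colour vector by position means each position keeps the same colour even if a subset of the data contains only some of them.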

Now I can see that the majority of the biggest goal-scorers are Midfielders and Forwards. But this still isn’t very useful! Perhaps I can also select a subset of players to plot to make it less busy, and add some labels so I can see who the biggest xG underperformers were last year. To do this I could subtract xG from total goals scored to get a delta score, where a negative value means the player scored fewer goals than their xG suggested:

name                              position  team       Goals  xG     Delta
Dominic Calvert-Lewin             FWD       Everton    7      12.86  -5.86
Darwin Núñez Ribeiro              FWD       Liverpool  11     16.23  -5.23
Brennan Johnson                   MID       Spurs      5      10.03  -5.03
Nicolas Jackson                   FWD       Chelsea    14     18.14  -4.14
Luis Díaz                         MID       Liverpool  8      12.08  -4.08
Norberto Bercique Gomes Betuncal  FWD       Everton    3      6.46   -3.46
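
The delta column can be computed with mutate() and the rows sorted with arrange(). The sketch below starts from a few of the season totals shown in this post (in shuffled order) and uses player_totals as an illustrative name.

```r
library(dplyr)

player_totals <- data.frame(
  name     = c("Nicolas Jackson", "Abdoulaye Doucouré", "Dominic Calvert-Lewin",
               "Luis Díaz", "Darwin Núñez Ribeiro", "Brennan Johnson"),
  position = c("FWD", "MID", "FWD", "MID", "FWD", "MID"),
  Goals    = c(14, 7, 7, 8, 11, 5),
  xG       = c(18.14, 8.70, 12.86, 12.08, 16.23, 10.03)
)

# Delta = goals minus xG; the most negative values are the biggest underperformers
player_totals <- player_totals %>%
  mutate(Delta = Goals - xG) %>%
  arrange(Delta)

head(player_totals)
```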

Then I could label the 3 biggest underperformers on my graph, and change the point size & colour to reflect the delta value and total goals scored for each player.
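
One way this could look is sketched below, on a few of the season totals from this post. Mapping size to goals and colour to delta is my reading of the description, and the colour gradient and the top3 name are arbitrary choices.

```r
library(ggplot2)
library(dplyr)

player_totals <- data.frame(
  name  = c("Dominic Calvert-Lewin", "Darwin Núñez Ribeiro", "Brennan Johnson",
            "Nicolas Jackson", "Luis Díaz", "Abdoulaye Doucouré"),
  Goals = c(7, 11, 5, 14, 8, 7),
  xG    = c(12.86, 16.23, 10.03, 18.14, 12.08, 8.70)
) %>%
  mutate(Delta = Goals - xG)

theme <- theme_classic() + theme(text = element_text(size = 12))

# The three players with the most negative delta
top3 <- player_totals %>% slice_min(Delta, n = 3)

p <- ggplot(player_totals, aes(x = xG, y = Goals)) +
  geom_point(aes(size = Goals, colour = Delta), alpha = 0.8) +
  geom_text(data = top3, aes(label = name)) +          # label only the subset
  scale_colour_gradient(low = "firebrick", high = "grey70") +
  theme

p
```

Passing a separate data frame to geom_text() is what restricts the labels to the three underperformers while the points still show every player.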

The labels look a bit messy, as they cover some of the other points. I can use the ggrepel package and the argument nudge_x to try to space the labels out so they are more readable:
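
A sketch of the repelled labels, assuming geom_text_repel() is the ggrepel layer used (geom_label_repel() would work similarly). The nudge_x value and the seed are arbitrary, and the data are a few of the season totals from this post.

```r
library(ggplot2)
library(dplyr)
library(ggrepel)

player_totals <- data.frame(
  name  = c("Dominic Calvert-Lewin", "Darwin Núñez Ribeiro", "Brennan Johnson",
            "Nicolas Jackson", "Luis Díaz", "Abdoulaye Doucouré"),
  Goals = c(7, 11, 5, 14, 8, 7),
  xG    = c(12.86, 16.23, 10.03, 18.14, 12.08, 8.70)
) %>%
  mutate(Delta = Goals - xG)

theme <- theme_classic() + theme(text = element_text(size = 12))

top3 <- player_totals %>% slice_min(Delta, n = 3)

p <- ggplot(player_totals, aes(x = xG, y = Goals)) +
  geom_point(aes(size = Goals, colour = Delta), alpha = 0.8) +
  geom_text_repel(
    data    = top3,
    aes(label = name),
    nudge_x = 2,   # push labels to the right of their points
    seed    = 1    # make the label placement reproducible
  ) +
  theme

p
```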

Now we can see which 3 players had the biggest xG underperformance last year!