Before trying to plot a graph, I need to load the packages I want to
use (ggplot2 & dplyr, which are part of the tidyverse package), and
download my data from vaastav’s Fantasy-Premier-League GitHub page. This
repository stores lots of FPL data from the last few seasons, which is
very helpful!
# Load the tidyverse
library(tidyverse)
# Get data for all players
github <- 'https://raw.githubusercontent.com/vaastav/Fantasy-Premier-League/master/data/2023-24/'
# One file per gameweek: gw1 ... gw38
filenames <- sprintf("gw%s", 1:38)
PlayersList <- lapply(filenames, function(x) {
  # Read one gameweek file and keep only players who got minutes
  data <- read_csv(paste0(github, "gws/", x, ".csv"))
  subset(data, minutes > 0)
})
# Stack all gameweeks into a single data frame
PlayerData <- data.frame(data.table::rbindlist(PlayersList))
Next I want to inspect the data to see what kind of information we
have:
# View player data
knitr::kable(head(PlayerData)[1:10])
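| name          | position | team          |  xP | assists | bonus | bps | clean_sheets | creativity | element |
|:--------------|:---------|:--------------|----:|--------:|------:|----:|-------------:|-----------:|--------:|
| Jadon Sancho  | MID      | Man Utd       | 3.0 |       0 |     0 |   4 |            0 |       11.3 |     397 |
| Vitaly Janelt | MID      | Brentford     | 2.1 |       0 |     0 |   6 |            0 |       11.5 |     105 |
| Andre Brooks  | MID      | Sheffield Utd | 0.5 |       0 |     0 |   3 |            0 |        0.0 |     655 |
| Curtis Jones  | MID      | Liverpool     | 2.1 |       0 |     0 |   1 |            0 |        1.8 |     300 |
| Reece James   | DEF      | Chelsea       | 2.1 |       0 |     0 |  15 |            0 |       35.9 |     206 |
| Ben Osborn    | MID      | Sheffield Utd | 1.1 |       0 |     0 |   6 |            0 |        0.0 |     489 |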
# Get column names of player data
colnames(PlayerData)
I can see that there is a lot of information for each player in each gameweek, including
events like goals and clean sheets, statistics like xG, match info like
kick-off time and opponent, and FPL-related data like price and
position. So there is lots of information to make graphs from! Before
making any plots, I usually like to define a theme, which basically just
changes the font style of all my plots so they all match. Here is my
theme:
Here, theme_classic() gives the plot a minimalistic style,
while element_text() modifies the font.
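As a minimal sketch, a theme along these lines could be defined as follows. The object name `theme` and the font family 'Radio Canada Big' are taken from the plotting code later in this post; the font size is an assumption for illustration, and the font has to be installed on your system for plots to render with it.

```r
library(ggplot2)

# Assumed theme: classic base style plus a custom font for all text.
# The size value is illustrative, not the original setting.
theme <- theme_classic() +
  theme(text = element_text(family = "Radio Canada Big", size = 12))
```

Storing the result in an object means it can be added to any plot with `+ theme`.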
Scatter Plot: xG Overperformance
To assess the relationship between xG and goals scored, I want to
plot the total xG against the total goals scored for each player across the
whole season. To do this, I first need to manipulate the data to get the
bits of information to plot. Using the dplyr %>% (pipe)
operator, I can group all the player information across all gameweeks
together to get the total xG and total goals scored for each player.
# Get total xG and total goals scored for all players
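GoalsxG <- PlayerData %>%
  group_by(name, position, team) %>%
  # .groups = "drop" returns an ungrouped result and avoids the grouping message
  summarise(Goals = sum(goals_scored), xG = sum(expected_goals), .groups = "drop")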
knitr::kable(head(GoalsxG))
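A basic scatter plot of these totals could be sketched as below. To keep the example self-contained, it uses a small stand-in data frame (`demo`, a hypothetical name) with a few rows copied from the season totals listed further down in this post; with the real data you would pass `GoalsxG` instead. `theme_classic()` stands in for the custom theme.

```r
library(ggplot2)

# Stand-in for GoalsxG: a few season totals copied from this post
demo <- data.frame(
  name = c("Dominic Calvert-Lewin", "Brennan Johnson", "Nicolas Jackson"),
  position = c("FWD", "MID", "FWD"),
  Goals = c(7, 5, 14),
  xG = c(12.86, 10.03, 18.14)
)

# Total xG on the x-axis, total goals scored on the y-axis
p_basic <- ggplot(data = demo, aes(x = xG, y = Goals)) +
  geom_point() +
  theme_classic()

# Printing the object draws the plot
p_basic
```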
While this graph shows a positive correlation between xG and goals
scored, it’s a bit boring and also not very useful. For example, we
don’t know who the players are, what position they are, and we can’t
tell who had the largest xG overperformance or underperformance. We can
change this by adding colours and labels. For example, I could colour
the players by their position and use scale_color_manual()
to change the colour palette. I can also change the alpha level
(transparency) and size of the points.
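Here is a hedged sketch of that idea, again on a small stand-in data frame (`demo`) rather than the full `GoalsxG`. The colour values are borrowed from the palette used later in this post, and the alpha and size values are illustrative. Only the DEF, MID and FWD codes appear in the data shown here, so goalkeepers would need their own entry in the named vector.

```r
library(ggplot2)

# Stand-in for GoalsxG: a few season totals copied from this post
demo <- data.frame(
  name = c("Dominic Calvert-Lewin", "Brennan Johnson", "Nicolas Jackson"),
  position = c("FWD", "MID", "FWD"),
  Goals = c(7, 5, 14),
  xG = c(12.86, 10.03, 18.14)
)

# Assumed palette: one colour per position code
position_colours <- c(DEF = "#5f8fb0", MID = "#93b572", FWD = "#d10000")

p_position <- ggplot(demo, aes(x = xG, y = Goals, color = position)) +
  # alpha sets transparency, size sets the point size
  geom_point(alpha = 0.75, size = 4) +
  # Map each position code to its colour
  scale_color_manual(values = position_colours) +
  theme_classic()

p_position
```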
Now I can see that the majority of the biggest goal-scorers are
Midfielders and Forwards. But this still isn’t very useful! Perhaps I
can also select a subset of players to plot to make it less busy, and
add some labels so I can see who the biggest xG underperformers were
last year. To do this I could subtract xG from total goals scored to get a
delta score:
# Get difference between goals scored and xG
GoalsxG$Delta <- GoalsxG$Goals - GoalsxG$xG
knitr::kable(head(GoalsxG %>% arrange(Delta)))
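| name                             | position | team      | Goals |    xG | Delta |
|:---------------------------------|:---------|:----------|------:|------:|------:|
| Dominic Calvert-Lewin            | FWD      | Everton   |     7 | 12.86 | -5.86 |
| Darwin Núñez Ribeiro             | FWD      | Liverpool |    11 | 16.23 | -5.23 |
| Brennan Johnson                  | MID      | Spurs     |     5 | 10.03 | -5.03 |
| Nicolas Jackson                  | FWD      | Chelsea   |    14 | 18.14 | -4.14 |
| Luis Díaz                        | MID      | Liverpool |     8 | 12.08 | -4.08 |
| Norberto Bercique Gomes Betuncal | FWD      | Everton   |     3 |  6.46 | -3.46 |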
Then I could label the 3 biggest underperformers on my graph, and
change the point colour & size to reflect the delta value and total
goals scored for each player, respectively.
# Get 50 players with most goals scored to plot
GoalsxGSubset <- GoalsxG %>% arrange(-Goals) %>% head(n=50)
# Get top 3 underperformers (lowest Delta) to label
Top3 <- GoalsxGSubset %>% arrange(Delta) %>% head(n=3)
# Add labels to dataframe
GoalsxGSubset$label <- ifelse(GoalsxGSubset$name %in% Top3$name, GoalsxGSubset$name, NA)
# Plot
ggplot(data=GoalsxGSubset, aes(x=xG, y=Goals, color=Delta, size=Goals)) +
  # Add points and labels
  geom_point(alpha=0.75) +
  geom_text(aes(label=label), color='black', family='Radio Canada Big', size=4) +
  theme +
  # Modify colour and size of points
  scale_color_gradientn(colours=colorRampPalette(c('#5f8fb0', '#93b572', '#e7d044', '#d10000'))(100),
                        guide=guide_colorbar(frame.colour="black", ticks.colour="black", alpha=0.75)) +
  scale_size_continuous(range=c(3,15)) +
  # Change axis limits
  scale_x_continuous(limits=c(0,32)) +
  scale_y_continuous(limits=c(7,30))
The labels look a bit messy - they are covering some of the other
points. I can use the package ggrepel and the argument
nudge_x to try and space the labels out so they are more
readable:
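A minimal sketch of this, assuming the ggrepel package is installed: `geom_text_repel()` replaces `geom_text()` and pushes labels away from points and from each other, while `nudge_x` shifts their starting position horizontally. The example uses a small stand-in data frame (`demo`) with rows copied from the delta table above, and the `nudge_x` value of 2 is an arbitrary choice to tune by eye.

```r
library(ggplot2)
library(ggrepel)

# Stand-in for GoalsxGSubset: a few season totals copied from this post
demo <- data.frame(
  name = c("Dominic Calvert-Lewin", "Brennan Johnson",
           "Nicolas Jackson", "Norberto Bercique Gomes Betuncal"),
  Goals = c(7, 5, 14, 3),
  xG = c(12.86, 10.03, 18.14, 6.46)
)
demo$Delta <- demo$Goals - demo$xG

# Label only the 3 biggest underperformers (lowest Delta)
worst3 <- head(demo[order(demo$Delta), ], 3)
demo$label <- ifelse(demo$name %in% worst3$name, demo$name, NA)

p_repel <- ggplot(demo, aes(x = xG, y = Goals, color = Delta, size = Goals)) +
  geom_point(alpha = 0.75) +
  # Repelled labels; na.rm = TRUE silently skips the unlabelled players
  geom_text_repel(aes(label = label), color = "black", size = 4,
                  nudge_x = 2, na.rm = TRUE) +
  theme_classic()

p_repel
```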