Fantasy Premier League Analysis in R
This is my first blog post on data science. In this post, I will show you how to import Fantasy Premier League data
into R and perform exploratory data analysis.
The questions I had in mind before starting this analysis were:
- How has Wayne Rooney been performing over the last several years?
- How do Romelu Lukaku, Sergio Aguero and Harry Kane line up side by side?
The data sources used in this tutorial are available at the following links:
- https://fantasy.premierleague.com/drf/elements/
- https://fantasy.premierleague.com/drf/element-summary/{player_id}
The second link requires a player_id in order to return the details for a particular player.
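As a minimal sketch, the per-player address can be assembled from an id with paste0(). The helper name player_url is hypothetical and only for illustration; the base URL is the one listed above.

```r
# Hypothetical helper: build the element-summary URL for one player id.
player_url <- function(player_id) {
  base_url <- "https://fantasy.premierleague.com/drf/element-summary/"
  paste0(base_url, player_id)
}

# Example call with an illustrative id
player_url(161)
```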
The packages we will need for this analysis are:
library(ggplot2)
library(dplyr)
library(tidyr)
library(rjson)
library(knitr)
library(plotly)
The data behind the links above is in JSON format, so we will use the rjson package to read it into R and summarize it.
Reading data using rjson package
json_file <- "https://fantasy.premierleague.com/drf/elements/"
json_data <- fromJSON(paste(readLines(json_file), collapse = ""))
# View length of json_data
length(json_data)
## [1] 580
Now we have read the data into R, and the json_data variable contains a list of 580 players.
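The same parsing step can be tried offline on a small hand-written JSON string. This is only a sketch: the two players and their field values are made up, and the real feed has many more fields per player.

```r
library(rjson)

# Made-up JSON in the same spirit as the elements feed (two fake players)
json_text <- '[{"id": 1, "web_name": "Player A", "total_points": 10},
               {"id": 2, "web_name": "Player B", "total_points": 25}]'

# fromJSON() turns a JSON array of objects into a list of named lists
toy_data <- fromJSON(json_text)

length(toy_data)          # number of players in the toy input
toy_data[[2]]$web_name    # fields of one player are accessed by name
```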
Converting the data to a data frame
players <- data.frame(do.call(rbind, lapply(json_data, rbind)))
# View head
head(players[,c(1:10)])
## id photo web_name team_code status code first_name
## 1 1 48844.jpg Ospina 3 a 48844 David
## 2 2 11334.jpg Cech 3 a 11334 Petr
## 3 3 98980.jpg Martinez 3 u 98980 Damian Emiliano
## 4 4 51507.jpg Koscielny 3 a 51507 Laurent
## 5 5 17127.jpg Mertesacker 3 a 17127 Per
## 6 6 158074.jpg Gabriel 3 u 158074 Gabriel Armando
## second_name squad_number news
## 1 Ospina 13
## 2 Cech 33
## 3 Martinez 26 Season-long loan to Getafe
## 4 Koscielny 6
## 5 Mertesacker 4
## 6 de Abreu 5 Joined Valencia on 18/8
# Check the class of each column
sapply(players[,c(1:10)], class)
## id photo web_name team_code status
## "list" "list" "list" "list" "list"
## code first_name second_name squad_number news
## "list" "list" "list" "list" "list"
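Every column comes back as a list, which is awkward for later arithmetic. One possible way to flatten such columns is sketched below on a toy data frame with made-up values; it assumes every cell holds exactly one non-NULL value, which may not hold for every field in the real feed.

```r
# Toy data frame whose columns are lists, mimicking the structure above
toy <- data.frame(do.call(rbind, list(
  list(id = 1, web_name = "Player A"),
  list(id = 2, web_name = "Player B")
)))
sapply(toy, class)   # both columns are lists at this point

# unlist() each column; assumes one non-NULL value per cell
toy[] <- lapply(toy, unlist)
sapply(toy, class)   # now plain atomic vectors
```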
# Keep only the selected columns
players <- players %>% dplyr::select(id, first_name, second_name, total_points, yellow_cards,
red_cards, goals_scored, assists)
head(players) %>% knitr::kable()
id | first_name | second_name | total_points | yellow_cards | red_cards | goals_scored | assists |
---|---|---|---|---|---|---|---|
1 | David | Ospina | 0 | 0 | 0 | 0 | 0 |
2 | Petr | Cech | 75 | 0 | 0 | 0 | 0 |
3 | Damian Emiliano | Martinez | 0 | 0 | 0 | 0 | 0 |
4 | Laurent | Koscielny | 65 | 3 | 0 | 0 | 0 |
5 | Per | Mertesacker | 14 | 0 | 0 | 1 | 0 |
6 | Gabriel Armando | de Abreu | 0 | 0 | 0 | 0 | 0 |
Finding the players whose stats we are interested in
We need to be careful with accented characters when searching for players. For example, if you search for "Alexis Sanchez" without the accent, the program won't find any player with that name, because the data stores it as "Alexis Sánchez".
interested_players <- c("Alexis Sánchez", "Romelu Lukaku", "Harry Kane", "Wayne Rooney")
players %>%
dplyr::mutate(full_name = paste(first_name, second_name)) %>%
dplyr::filter(full_name %in% interested_players) %>% knitr::kable()
id | first_name | second_name | total_points | yellow_cards | red_cards | goals_scored | assists | full_name |
---|---|---|---|---|---|---|---|---|
14 | Alexis | Sánchez | 68 | 4 | 0 | 4 | 4 | Alexis Sánchez |
161 | Wayne | Rooney | 91 | 3 | 0 | 10 | 3 | Wayne Rooney |
285 | Romelu | Lukaku | 94 | 2 | 0 | 10 | 4 | Romelu Lukaku |
394 | Harry | Kane | 97 | 4 | 0 | 12 | 1 | Harry Kane |
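If typing the accents is inconvenient, one option is to strip diacritics from both sides before comparing. The sketch below assumes the stringi package is installed (it is not used elsewhere in this post), and strip_accents is a hypothetical helper name. The accented letter is written as a Unicode escape so the example does not depend on the file encoding.

```r
library(stringi)

# Hypothetical helper: transliterate Latin characters to plain ASCII
strip_accents <- function(x) stri_trans_general(x, "Latin-ASCII")

full_names <- c("Alexis S\u00e1nchez", "Harry Kane")   # names as stored in the data
wanted     <- c("Alexis Sanchez", "Harry Kane")        # names typed without accents

# Compare the accent-free versions of both vectors
strip_accents(full_names) %in% strip_accents(wanted)
```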
Accessing Stats of individual players
Now that we have found the id of each player we are interested in, let's go ahead and pull the data for these players.
I have written a function that pulls the information for a single player:
# A function to fetch the past-season statistics of one player
player_stats <- function(player_id, player_name){
  url <- paste0("https://fantasy.premierleague.com/drf/element-summary/", player_id)
  json_data <- fromJSON(paste(readLines(url), collapse = ""))
  # json_data is a list of length 6
  # We only use the first element: history_past
  # Convert it to a data frame (one row per season; every column is a list)
  player <- data.frame(do.call(rbind, lapply(json_data$history_past, rbind)))
  # Change each column from list to vector
  player <- tidyr::unnest(player, cols = dplyr::everything())
  # Label every row with the player's name
  player$name <- player_name
  return(player)
}
In the above function, we only look at the first list, which contains the past history of each player. The full response contains the following lists, should you be interested in the others:
- List 1 : history_past
- List 2 : fixtures_summary
- List 3 : explain
- List 4 : history_summary
- List 5 : fixtures
- List 6 : history
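Each of these lists can be reached by name rather than by position. A minimal sketch on a toy list follows; the element names are a subset of those listed above, while the contents are made up.

```r
# Toy list using element names from the list above (contents are made up)
toy_summary <- list(
  history_past = list(list(season_name = "2000/01", total_points = 100)),
  fixtures     = list(),
  history      = list()
)

names(toy_summary)                           # which parts are available
toy_summary$history_past[[1]]$total_points   # access by name with $
toy_summary[["history_past"]]                # same element, bracket form
```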
Viewing Wayne Rooney’s Stats
rooney <- player_stats(player_id = 161, player_name = "Wayne Rooney")
# Look at dimension of rooney
dim(rooney)
## [1] 11 26
# View the first 6 columns of the data frame
rooney[,1:6]
## id season_name element_code start_cost end_cost total_points
## 1 262 2006/07 13017 115 122 184
## 2 1031 2007/08 13017 120 117 148
## 3 1660 2008/09 13017 110 107 135
## 4 2392 2009/10 13017 110 120 224
## 5 3029 2010/11 13017 120 118 142
## 6 3893 2011/12 13017 120 129 230
## 7 4602 2012/13 13017 120 116 143
## 8 5297 2013/14 13017 105 113 190
## 9 6016 2014/15 13017 105 106 132
## 10 6729 2015/16 13017 105 99 118
## 11 7434 2016/17 13017 90 86 76
# Names of columns
names(rooney)
## [1] "id" "season_name" "element_code"
## [4] "start_cost" "end_cost" "total_points"
## [7] "minutes" "goals_scored" "assists"
## [10] "clean_sheets" "goals_conceded" "own_goals"
## [13] "penalties_saved" "penalties_missed" "yellow_cards"
## [16] "red_cards" "saves" "bonus"
## [19] "bps" "influence" "creativity"
## [22] "threat" "ict_index" "ea_index"
## [25] "season" "name"
Reshape function
For this tutorial, we will look only at goals_scored, total_points and assists. So, I am going to write another function that keeps only these three variables and converts the result to tidy (long) form.
player_reshape <- function(playerdf){
playerdf %>% tidyr::gather(key = "variable", value = "value", -id, -season_name, -name) %>%
dplyr::filter(variable %in% c("goals_scored", "total_points", "assists"))
}
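In recent versions of tidyr, gather() is superseded by pivot_longer(). An alternative sketch using it is shown below on a toy data frame with made-up values; player_reshape_longer is a hypothetical name, not the function used in the rest of this post. Selecting the three measures before pivoting keeps the value column numeric as long as those three columns are numeric.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the output of player_stats() (values are made up)
toy_player <- data.frame(
  id = c(1, 2),
  season_name = c("2000/01", "2001/02"),
  name = "Player A",
  total_points = c(100, 120),
  goals_scored = c(10, 12),
  assists = c(5, 7),
  minutes = c(2000, 2500)
)

# Hypothetical variant: select the three measures, then pivot to long form
player_reshape_longer <- function(playerdf) {
  playerdf %>%
    dplyr::select(id, season_name, name, goals_scored, total_points, assists) %>%
    tidyr::pivot_longer(
      cols = c(goals_scored, total_points, assists),
      names_to = "variable",
      values_to = "value"
    )
}

toy_long <- player_reshape_longer(toy_player)
toy_long
```

Note that pivot_longer() orders the rows season by season, whereas gather() groups them by variable, so the row order differs from the output shown later in this post.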
Creating data frames for players
rooney_reshape <- player_stats(player_id = 161, player_name = "Wayne Rooney") %>% player_reshape()
sanchez_reshape <- player_stats(player_id = 14, player_name = "Alexis Sánchez") %>% player_reshape()
kane_reshape <- player_stats(player_id = 394, player_name = "Harry Kane") %>% player_reshape()
lukaku_reshape <- player_stats(player_id = 285, player_name = "Romelu Lukaku") %>% player_reshape()
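With more players, repeating the call by hand gets tedious. A possible pattern is to keep ids and names in a small table and loop over it. In the sketch below, fetch_all and fake_fetch are hypothetical names; fake_fetch is an offline stand-in so the example runs without network access.

```r
library(dplyr)

# Ids and names taken from the table of interesting players above
wanted <- data.frame(
  player_id   = c(161, 14, 394, 285),
  player_name = c("Wayne Rooney", "Alexis S\u00e1nchez", "Harry Kane", "Romelu Lukaku")
)

# Hypothetical helper: call fetch_fun(id, name) for each row and stack the results
fetch_all <- function(wanted, fetch_fun) {
  pieces <- Map(fetch_fun, wanted$player_id, wanted$player_name)
  dplyr::bind_rows(pieces)
}

# Offline stand-in for the real download; returns one dummy row per player
fake_fetch <- function(player_id, player_name) {
  data.frame(id = player_id, name = player_name, value = 0)
}

combined <- fetch_all(wanted, fake_fetch)
combined
```

For the real data, fetch_fun could be something like function(id, nm) player_reshape(player_stats(id, nm)).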
Combining all the dataframes into one and visualizing
all_players <- rbind(rooney_reshape, sanchez_reshape, kane_reshape, lukaku_reshape)
head(all_players)
## id season_name name variable value
## 1 262 2006/07 Wayne Rooney total_points 184
## 2 1031 2007/08 Wayne Rooney total_points 148
## 3 1660 2008/09 Wayne Rooney total_points 135
## 4 2392 2009/10 Wayne Rooney total_points 224
## 5 3029 2010/11 Wayne Rooney total_points 142
## 6 3893 2011/12 Wayne Rooney total_points 230
The data is tidy because each row of the table holds exactly one observation: one variable for one player in one season.
Now, let us use ggplot to visualize the data.
ggplot(all_players) + geom_bar(aes(season_name, as.numeric(value), fill = name),
position = "dodge", width = 0.5, stat = "identity") +
facet_wrap(~ variable, ncol = 1, scales = "free_y") +
xlab("Premier League Season") +
ylab("Total Fantasy Points / Goals Scored / Assists") +
labs(fill = "Players") +
theme(legend.position = "top")
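For reference, geom_col() is a shorthand for geom_bar(stat = "identity"). The sketch below draws the same kind of dodged, faceted bar chart on a toy data frame with made-up values, so it can be run without downloading anything.

```r
library(ggplot2)

# Toy long-format data (values are made up)
toy_long <- data.frame(
  season_name = rep(c("2000/01", "2001/02"), times = 2),
  name        = rep(c("Player A", "Player B"), each = 2),
  variable    = "goals_scored",
  value       = c(10, 12, 8, 15)
)

# Dodged bars per season, one facet per variable
p <- ggplot(toy_long, aes(season_name, value, fill = name)) +
  geom_col(position = "dodge", width = 0.5) +
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +
  labs(x = "Premier League Season", y = "Value", fill = "Players") +
  theme(legend.position = "top")

p
```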
If you want an interactive version of the above graph, wrap the plot in the ggplotly function from the plotly package.
library(plotly)
ggplotly(ggplot(all_players) + geom_bar(aes(season_name, as.numeric(value), fill = name),
position = "dodge", width = 0.5, stat = "identity") +
facet_wrap(~ variable, ncol = 1, scales = "free_y") +
xlab("Premier League Season") +
ylab("Total Fantasy Points / Goals Scored / Assists") +
labs(fill = "Players") +
theme(legend.position = "top"))
I hope you guys learned how easy it is to bring Fantasy Premier League data into R and then analyse it using awesome packages such as dplyr, tidyr and ggplot2.
Please leave your comments below.