|Project Name||Stars||Downloads||Repos Using This||Packages Using This||Most Recent Commit||Total Releases||Latest Release||Open Issues||License||Language|
|Football_analytics||1,091||20 days ago||2||Jupyter Notebook|
|📊⚽ A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster), including a curated list of publicly available resources published by the football analytics community.|
|Engsoccerdata||707||7 months ago||20||R|
|English and European soccer results 1871-2022|
|Data Science Projects||623||5 months ago||7||Jupyter Notebook|
|DataScience projects for learning : Kaggle challenges, Object Recognition, Parsing, etc.|
|Soccertrack||89||a month ago||22||gpl-3.0||Jupyter Notebook|
|A Dataset and Tracking Algorithm for Soccer with Fish-eye and Drone Videos.|
|Xg Model||41||a year ago||gpl-3.0||R|
|An example of how to create a xG model using R and Wyscout event data|
|Opendata||30||3 years ago||mit|
|SkillCorner Open Data with 9 matches of broadcast tracking data.|
|Football Data||28||2 months ago||R|
|football (soccer) datasets|
|Footballdata||25||5 years ago||6||July 17, 2017||5||mit||Jupyter Notebook|
|A collection of wrappers over football data from various websites / APIs.|
|Py4 Ds||15||2 years ago||apache-2.0||Jupyter Notebook|
|:snake: Data Science Boot-Camp : UC San DiegoX|
|Statball||13||2 months ago||6||mit||C#|
|Statball - Football soccer stats analyser from top 5 european leagues with data obtained by web scraping from Fbref and Statsbomb|
Latest GitHub version: 11/4/2022, v0.1.7
Nov 2022 update: All European league datasets [England, Scotland, Italy, Germany, France, Holland, Spain, Belgium, Greece, Turkey, Portugal] (and their current() functions) are up to date for the 22/23 season.
MLS and South Africa data are still only up to 2017. Cup data is also only available up to 2017. The English non-league data have also not been updated.
Any help in curating data is appreciated - please get in touch.
This R package is mainly a repository for complete soccer datasets, along with some built-in functions for analyzing parts of the data. Currently I include three English ones (League data, FA Cup data, Playoff data - described below), several European leagues (Spain, Germany, Italy, Holland, France, Belgium, Portugal, Turkey, Scotland, Greece) as well as South Africa and MLS.
Free for non-commercial use. Compiled by James Curley.
Please cite as:
James P. Curley (2016). engsoccerdata: English Soccer Data 1871-2016. R package version 0.1.5
If you do use it in any publications, blogs, websites, etc., please note the source (i.e. me!). Also, if you do use it, I would love to see any analysis produced from it. Of course, I accept no responsibility for any errors that may be contained herein.
Contact details: curley AT utexas DOT edu
To install this directly into R:
library(devtools)
install_github("jalapic/engsoccerdata")
data(package = "engsoccerdata")  # lists datasets currently available
If you get an error message like this one
Error in curl::curl_fetch_memory(url, handle = handle) : Problem with the SSL CA cert (path? access rights?)
which has happened to me on occasion, try this:
library(RCurl)
library(httr)
set_config(config(ssl_verifypeer = 0L))  # turns off SSL peer verification for this session
library(devtools)
install_github("jalapic/engsoccerdata")
library(engsoccerdata)
Last update: 24 Oct 2020, v0.1.7
data-raw- summary of conference location of each team by year.
I would love help in collating more results. If anyone wants to work on a particular league or competition please let me know. These are the things I'd like to work on:
maketable family of functions
Some built-in functions:
england_current().r - get results from the top 4 tiers of English football for the current season. Current-season results for other datasets can be obtained with e.g. holland_current(), germany_current() etc.
maketable.r - make a league table - probably the quickest way to make a league table
maketable_eng.r - make a league table that follows the tie-breaking and points procedures for each season.
maketable_all.r - make a league table between dates or only for home/away results.
games_between.r - returns all games ever played between two teams
games_between_sum.r - returns the summary of results between any two teams
alltimerecord.r - returns the all time record of any team in the league
score_most.r - returns the team who has been involved in the most games of each scoreline
score_teamX.r - Lists all matches that a team has played in that ended in a specific scoreline
score_team.r - List all occurrences of a specific scoreline for a specific team
scoreline_by_team.r - How often each team has won, lost or drawn by a given scoreline
totalgoals_by_team.r - Return all instances of a team being involved in a game with n goals
ngoals.r - Return number of times a team has scored n goals
n_offs.r - Will return the scorelines that have occurred n number of times
opponentfreq.r - Return how often a team has played each opponent
opponents.r - number of unique opponents for all teams in total or by tier
bestwins.r - best wins for each team
worstlosses.r - worst losses for each team
homeaway.r - very useful function to get home & away results with each team listed in 'team' column
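To give a feel for what the table-building functions above do, here is a minimal base-R sketch. It does not call the package: the toy results, the column names (home, visitor, hgoal, vgoal) and the helper simple_table() are assumptions made for illustration only. It first reshapes the results so each game appears once per team (the idea behind homeaway), then aggregates into a league table.

```r
# Toy results. The column names are assumed for this sketch, not taken
# from the package documentation.
results <- data.frame(
  home    = c("Arsenal", "Chelsea", "Everton", "Chelsea", "Everton", "Arsenal"),
  visitor = c("Chelsea", "Everton", "Arsenal", "Arsenal", "Chelsea", "Everton"),
  hgoal   = c(2, 1, 0, 1, 3, 1),
  vgoal   = c(0, 1, 2, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Step 1: one row per team per game, with goals for (gf) and against (ga).
long <- rbind(
  data.frame(team = results$home, opp = results$visitor,
             gf = results$hgoal, ga = results$vgoal, venue = "home",
             stringsAsFactors = FALSE),
  data.frame(team = results$visitor, opp = results$home,
             gf = results$vgoal, ga = results$hgoal, venue = "away",
             stringsAsFactors = FALSE)
)

# Step 2: aggregate to a table (hypothetical helper, not the package's code).
# pts_win is a parameter because older seasons awarded 2 points for a win.
simple_table <- function(long, pts_win = 3) {
  long$GP <- 1L
  long$W  <- as.integer(long$gf > long$ga)
  long$D  <- as.integer(long$gf == long$ga)
  long$L  <- as.integer(long$gf < long$ga)
  tab <- aggregate(cbind(GP, W, D, L, gf, ga) ~ team, data = long, FUN = sum)
  tab$gd  <- tab$gf - tab$ga
  tab$Pts <- pts_win * tab$W + tab$D
  # Rank by points, then goal difference, then goals scored.
  tab <- tab[order(-tab$Pts, -tab$gd, -tab$gf), ]
  rownames(tab) <- NULL
  tab
}

league <- simple_table(long)
print(league)
```

Passing long[long$venue == "home", ] to the same helper gives a home-only table, which is roughly the kind of filtering maketable_all is described as offering. The tie-breaking order here is a simplification and does not follow the season-specific rules that maketable_eng is described as handling.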
all top 4 tier games ever played 1888-2020
FL = Football League
PL = Premier League
1888/9-1891/2 FL Division 1
1892/3-1914/5 FL Divisions 1 & 2
1919/20 FL Divisions 1 & 2
1920/21 FL Divisions 1, 2 & 3
1921/22-1938/9 FL Divisions 1, 2, 3a North & 3b South
1939/40 FL Divisions 1, 2, 3a North & 3b South (truncated season)
1946/7-1957/8 FL Divisions 1, 2, 3a North & 3b South
1958/9-1991/2 FL Divisions 1, 2, 3 & 4
1992/3-2003/4 PL, FL Divisions 1, 2 & 3
2004/5-present PL, FL Championship, FL Leagues 1 & 2
In the csv file, I've used divisions 1, 2, 3, 3a, 3b and 4 as the notation. I've also used tiers 1, 2, 3 and 4, so that divisions 3, 3a and 3b all belong to tier 3.
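The division-to-tier relationship can be sketched in a couple of lines of base R. The helper name division_to_tier() is hypothetical and is not a function in the package; it simply strips the regional suffix so that 3a and 3b collapse to tier 3.

```r
# Hypothetical helper: map division labels to tier numbers.
division_to_tier <- function(division) {
  division <- as.character(division)
  # Drop a trailing "a" or "b" so "3a" and "3b" become "3".
  as.integer(sub("[ab]$", "", division))
}

divisions <- c("1", "2", "3", "3a", "3b", "4")
tiers <- division_to_tier(divisions)
print(data.frame(division = divisions, tier = tiers))
```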
teams that dropped out half way through a season:
1919 Leeds City
1931 Wigan Borough
1961 Accrington Stanley
includes 1919 Port Vale who replaced Leeds City mid-season
The truncated 1939/40 season is in a separate file england1939.csv
Team Names used in the file are those that are currently used: e.g. Small Heath are Birmingham City, Ardwick are Manchester City, etc.
The modern Accrington Stanley are 'Accrington' to distinguish from original Accrington Stanley and earlier Accrington FC
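If you are merging these data with another source that uses historical club names, a named-vector lookup is one simple way to apply the same convention. This is only a sketch: the lookup contains just the two examples mentioned above, and normalise_team() is a hypothetical helper, not part of the package.

```r
# Historical name -> current name (only the examples given in the text).
current_name <- c(
  "Small Heath" = "Birmingham City",
  "Ardwick"     = "Manchester City"
)

# Hypothetical helper: replace known historical names, leave the rest alone.
normalise_team <- function(x) {
  hit <- x %in% names(current_name)
  x[hit] <- unname(current_name[x[hit]])
  x
}

cleaned <- normalise_team(c("Small Heath", "Ardwick", "Everton"))
print(cleaned)
```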
This was a pain to put together. It contains every single FA Cup tie (whether played or not) from the inception of the competition in 1871 to the 2015/16 season. It does not contain pre-qualifying rounds (yet). It is best to describe each variable name in turn to give more information:
Important notes to above:
I have tried to make the dataset as complete as possible. The FA Cup data is difficult as some of it is just unobtainable. For instance, I have added venues and attendances for all semis and finals, and have included this information sporadically wherever else I was able to get it. I have not applied this systematically to early rounds. Several games in the FA Cup are played at neutral grounds, and sometimes the team drawn away is allowed to play at home (e.g. if a minnow plays a big team). I have not managed to systematically check this. Also, there was a trend to play 2nd, 3rd and 4th replays at neutral venues. This could be systematically checked but I have not done so yet. Further, I think I have correctly added all games that ever ended in penalties.
Finally, team names. There are great disputes about which teams branch off from which teams in history and who should have shared history. I have tried to be consistent in naming teams with their most current name throughout (e.g. Millwall Rovers, Millwall Athletic and Millwall are all listed as the current name - Millwall), or the name that they used when they stopped playing (e.g. Mitchell St. George's are always listed as Birmingham St. George's). I have also tried to follow the same team name format as in england.csv - I think the three Accrington teams may be the only ones I need to re-edit for this purpose.
Please refer to the spainliga rpubs below for further information.
I've added complete top-tier results for Holland (1956-present), Germany (1963-present), Italy (1934-present) and France (1933-present), plus all tier 2 results for Germany. Finally, we have results from all tiers of Scotland, and from the top tiers of Belgium, Turkey, Greece and Portugal since 1994/1995.
These dataframes contain all league results played in regular season. They don't yet include relegation/promotion playoff fixtures. Further, I have not yet completed all final checks of the data. I believe they are error free - but if others want to test and check, I'd welcome this.
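For anyone who wants to help test the data, here is a rough base-R sketch of the kind of sanity check that can be run on a single season. It assumes a double round-robin (every team hosts every other team once) and columns named Season, home, visitor, hgoal and vgoal. Both are assumptions for illustration, and the round-robin assumption will not hold for every league or season (for example where a team dropped out mid-season). check_season() is a hypothetical helper, not part of the package.

```r
# Toy three-team season: every team hosts every other team once.
season <- data.frame(
  Season  = 2000,
  home    = c("A", "A", "B", "B", "C", "C"),
  visitor = c("B", "C", "A", "C", "A", "B"),
  hgoal   = c(1, 0, 2, 1, 3, 0),
  vgoal   = c(0, 0, 2, 4, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count common data-entry problems in one season.
check_season <- function(df) {
  teams   <- union(df$home, df$visitor)
  n       <- length(teams)
  fixture <- paste(df$home, df$visitor, sep = " v ")
  list(
    self_play  = sum(df$home == df$visitor),            # team listed against itself
    duplicates = sum(duplicated(fixture)),               # same fixture entered twice
    missing    = n * (n - 1) - length(unique(fixture)),  # fixtures absent from a full schedule
    negative   = sum(df$hgoal < 0 | df$vgoal < 0)        # impossible scores
  )
}

ok <- check_season(season)

# A deliberately damaged copy: one fixture dropped, another entered twice.
broken <- rbind(season[-6, ], season[1, ])
bad <- check_season(broken)
str(bad)
```

A non-zero count in any slot is a prompt to look at the rows involved, not proof of an error.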
Any help in improving the quality of these datasets is appreciated.
(note as of May 2015, the code in these may need to change to reflect the change in names of datasets and some functions)
Oliver Roeder and I have written several articles for fivethirtyeight using these data:
Also this piece on league inequality:
(listing them here so I don't forget them)
Dec 2014 - Profile of this dataset and me in "FourFourTwo" Magazine - https://www.scribd.com/doc/246229712/FourFourTwo-UK-2014-12
Mar 12th 2015 - Some research on strange results occurring in a row discussed on Guardian's Football Weekly https://soundcloud.com/guardianfootballweekly/football-weekly-extra-chelseas-champion-league-campaign-goes-down-the-tube
Jul 30th 2015 - Piece by Sky Sports on homefield advantage and these data - http://www.skysports.com/football/news/11661/9829828/home-advantage-is-not-as-important-as-it-once-was-finds-sky-sports-study
Nov 28th 2015 - I discuss home-field advantage on NPR's "Only a Game" - http://onlyagame.wbur.org/2015/11/28/home-field-advantage-epl-curley
May 4th 2016 - I discussed this dataset and Leicester City on BBC Radio 5's "Up All Night" - unfortunately no audio - I got cut short because Ted Cruz decided to quit the Republican nomination.
May 17th 2016 - Piece by John Burn-Murdoch at the Financial Times on Leicester City's unique season - https://ig.ft.com/sites/leicester-premier-league-champions/
March 10 2017 - Discussion of PSG-Barcelona 2017 Champions League tie in historical perspective (in Portuguese) - https://www.nexojornal.com.br/grafico/2017/03/10/O-que-as-bolsas-de-apostas-diziam-sobre-Barcelona-e-PSG
2022 - Deutsche Welle piece on most common scorelines in soccer: https://www.youtube.com/watch?v=UyaFCkgZYCI&ab_channel=DWKickoff%21
If you use these data, please cite and let me know. I'll add a link to the list at the bottom.
Data in this package have been used to devise fivethirtyeight's ratings and prediction models for soccer.
A number of analyses and visualizations using these data by Prof Simon Garnier - http://graphzoo.tumblr.com/
More in depth analysis by Simon on David Sumpter's Collective Behavior blog:
Prof Michael Lopez analyses home-field advantage - https://statsbylopez.com/2016/05/13/on-soccers-declining-home-field-advantage/
Prof Antony Unwin uses these data in his book "Graphical Data Analysis with R" - http://www.gradaanwr.net/wp-content/uploads/2016/05/dataApr16.pdf
Excellent Masters Thesis in Statistics on improving FIFA rankings by Tom Van de Wiele (https://ttvand.github.io/) here - https://ttvand.github.io/MastatTomVandeWiele.pdf
3D visualization of team performance over years - https://vr-data-vis.herokuapp.com/engsoccerdata/index.html
Joe Gallagher's blog post on home advantage - https://jogall.github.io/2017-05-12-home-away-pref/
Joe Gallagher's blog post on Robin Hood teams - https://jogall.github.io/2017-08-04-robin-hood-teams/
Andrew Clark's interactive viz of best and worst consecutive league finishes - https://www.mytinyshinys.com/2017/08/04/socceriimprovers/
Ryan Estrellado's analysis of Liverpool FC Managers - https://restrellado.github.io/liverpoolfc/lfc_managers.html & https://ryanestrellado.netlify.com/post/lfc-home-and-away-odds/
Austin Wehrwein's modeling of soccer results - http://austinwehrwein.com/soccer/ and here https://austinwehrwein.com/tutorials/xgforeveryone/
Robert Hickman has several nice posts looking at football trivia using the data in this package - https://www.robert-hickman.eu/post/five_min_trivia_invincibles/
Stefan Gouyet looked at one-sided matches in the EPL - https://worldsocceranalytics.com/2018/10/10/one-sided-matches-in-the-english-premier-league/
Xiang Ao's analysis of head to head matches between title contenders: https://xiangao.netlify.app/2019/01/17/soccer-epl/
McJames et al, 2022, A Supervised Learning Approach to Rankability. Preprint https://arxiv.org/pdf/2203.07364.pdf
Leonardo Egidi - footBayes: an R package for football (soccer) modeling using Stan https://statmodeling.stat.columbia.edu/2022/02/22/footbayes-an-r-package-for-football-soccer-modeling-using-stan/
Many thanks to the following for their assistance - apologies to anyone I have omitted (please let me know!):
Hakon Malmedal, Joe Gallagher, Ben Dilday, Aaron Smith, Michael Thompson, Andrew Clark, S'busiso Mkhondwane, Robert Hickman