bpl is a Python 3 library for fitting Bayesian versions of the Dixon & Coles (1997) model to data.
It uses the
stan library to fit the models.
pip install bpl
bpl provides a class
BPLModel that can be used to forecast the outcome of football matches.
Data should be provided to the model as a
pandas dataframe, with columns
You can also optionally provide a set of numerical covariates for each team (e.g. their ratings on FIFA) and these will be used in the fit.
    import bpl
    import numpy as np
    import pandas as pd

    df_train = pd.read_csv("<path-to-training-data>")
    df_X = pd.read_csv("<path-to-team-level-covariates>")

    forecaster = bpl.BPLModel(data=df_train, X=df_X)
    forecaster.fit(seed=42)

    # calculate the probability that team 1 beats team 2 3-0 at home:
    forecaster.score_probability("Team 1", "Team 2", 3, 0)

    # calculate the probabilities of a home win, an away win and a draw:
    forecaster.overall_probabilities("Team 1", "Team 2")

    # compute home win, away win and draw probabilities for a collection of matches:
    df_test = pd.read_csv("<path-to-test-data>")  # must have columns "home_team" and "away_team"
    forecaster.predict_future_matches(df_test)

    # add a new, previously unseen team to the model by sampling from the prior
    X_3 = np.array([0.1, -0.5, 3.0])  # the covariates for the new team
    forecaster.add_new_team("Team 3", X=X_3, seed=43)
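To see how a single scoreline probability relates to the home win, draw and away win probabilities, here is a minimal, self-contained sketch. It does not call bpl. The function names, the fixed expected-goal rates and the independent-Poisson simplification (no tau correction and no averaging over posterior samples) are all illustrative assumptions.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing exactly k goals for a given expected-goal rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def toy_score_probability(home_rate, away_rate, home_goals, away_goals):
    """Probability of one exact scoreline, treating the two goal counts as independent."""
    return poisson_pmf(home_goals, home_rate) * poisson_pmf(away_goals, away_rate)


def toy_overall_probabilities(home_rate, away_rate, max_goals=15):
    """Sum scoreline probabilities over a truncated grid to get the three match outcomes."""
    outcome = {"home_win": 0.0, "draw": 0.0, "away_win": 0.0}
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = toy_score_probability(home_rate, away_rate, h, a)
            if h > a:
                outcome["home_win"] += p
            elif h == a:
                outcome["draw"] += p
            else:
                outcome["away_win"] += p
    return outcome


# hypothetical expected goals for the home and away side
probs = toy_overall_probabilities(1.6, 1.1)
print(probs)
```

The grid is cut off at max_goals, so the three probabilities sum to slightly less than one. For typical football scoring rates the missing mass is negligible.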
The statistical model behind
bpl is a slight variation on the Dixon & Coles approach.
The likelihood is:
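Written out as a sketch, assuming bpl keeps the multiplicative rate structure of the original Dixon & Coles model, with i the home team and j the away team (the exact way b enters the rate is an assumption, not taken from the bpl source):

```latex
p(y_h, y_a) = \tau(y_h, y_a)\,
  \mathrm{Poisson}(y_h \mid a_i\, b_j\, \gamma_i)\,
  \mathrm{Poisson}(y_a \mid a_j\, b_i)
```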
where y_h and y_a are the number of goals scored by the home team and the away team, respectively. a_i is the attacking aptitude of team i and b_i is the defending aptitude of team i. gamma_i represents the home advantage for team i, and tau is a correlation term that was introduced by Dixon and Coles to produce more realistic scorelines in low-scoring matches.

The model uses the following bivariate, hierarchical prior for a and b:
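A sketch of such a prior, consistent with the symbols defined in the text. The log scale, the split of beta into beta_a and beta_b, and the scale parameters sigma_a and sigma_b are assumptions made for illustration:

```latex
\begin{pmatrix} \log a_i \\ \log b_i \end{pmatrix}
\sim \mathcal{N}\left(
  \begin{pmatrix} X_i \cdot \beta_a \\ \mu_b + X_i \cdot \beta_b \end{pmatrix},
  \begin{pmatrix}
    \sigma_a^2 & \rho\,\sigma_a \sigma_b \\
    \rho\,\sigma_a \sigma_b & \sigma_b^2
  \end{pmatrix}
\right)
```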
X_i are a set of (optional) team-level covariates (these could be, for example, the attack and defence ratings of team i on FIFA). beta are coefficient vectors, and mu_b is an offset for the defence parameter. rho encodes the correlation between a and b, since teams that are strong at attacking also tend to be strong at defending.

The home advantage has a log-normal prior:
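In symbols, a log-normal prior means the logarithm of gamma_i is normally distributed. The names mu_gamma and sigma_gamma for the location and scale are assumed here, since the text does not name them:

```latex
\log \gamma_i \sim \mathcal{N}\left(\mu_\gamma,\ \sigma_\gamma^2\right)
```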
Finally, the hyper-priors are