DaisyRec
DaisyRec is a Python toolkit that deals with rating prediction and item ranking tasks.

The name DAISY (roughly :) ) stands for Multi-Dimension fAIrly compArIson for recommender SYstem. The overall framework of Daisy is shown below:

(framework overview figure)

Make sure you have a CUDA environment for acceleration, since the deep-learning models can run on GPU.
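A quick way to check that PyTorch can see your GPU (a minimal sketch; it assumes PyTorch is already installed):

```python
import torch

# pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'running on: {device}')
```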

We will continually update this repo.


You can download the experiment data and put it into the data folder. All data are available via the links below:
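For instance, once the raw ml-100k data is placed under data/, it can be inspected with pandas (an illustrative sketch; the data/ml-100k/u.data path assumes you keep MovieLens' original layout):

```python
import pandas as pd

# ml-100k's u.data is tab-separated: user id, item id, rating (1-5), timestamp
df = pd.read_csv('data/ml-100k/u.data', sep='\t',
                 names=['user', 'item', 'rating', 'timestamp'])
print(df.shape)  # (100000, 4): the full ml-100k dataset
```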

How to run

  1. Make sure to run the command python setup.py build_ext --inplace to compile the dependent extensions before running any other code. After that, you will find the generated *.so or *.pyd files under daisy/model/.

  2. In order to reproduce the results, you need to run python data_generator.py to create the experiment_data folder containing the public datasets listed in our paper. If you only want to study one particular dataset, modify the code in data_generator.py to match your needs and let it yield just the train and test datasets you want. By default, data_generator.py generates every kind of dataset (raw data, 5-core data and 10-core data) with the different data splitting methods, including tloo, loo, tfo and fo. The meaning of these split methods is explained in the Important Commands section of this README, and a minimal sketch of the tfo split is given after this list.

  3. There are separate scripts for validation and test, stored in the nested_tune_kit and test_kit folders, respectively. Each script in these folders should be moved into the root path (the same directory as data_generator.py) so that it runs successfully. Alternatively, if you work in an IDE, you can simply set the working path and run the scripts from any folder.

  4. The validation dataset is used for parameter tuning, so we provide a split_validation interface inside the scripts in the nested_tune_kit folder. Further details about the validation split methods are given in daisy/utils/. After validation finishes, the results are stored in the automatically generated tune_log/ folder.

  5. Based on the best parameters determined by validation, run the test script that you moved into the root path earlier; the results will be stored in the automatically generated res/ folder.
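As referenced in step 2, the sketch below illustrates the idea behind the tfo split (time-aware split-by-ratio). It is not DaisyRec's actual implementation, only a minimal illustration for a pandas DataFrame with a timestamp column:

```python
import pandas as pd

def tfo_split(df: pd.DataFrame, test_ratio: float = 0.2):
    """Time-aware split-by-ratio ('tfo'): order interactions by timestamp
    and hold out the most recent fraction as the test set."""
    df = df.sort_values('timestamp')
    cut = int(len(df) * (1 - test_ratio))
    return df.iloc[:cut], df.iloc[cut:]

# train, test = tfo_split(df, test_ratio=0.2)
```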

Examples to run:

Take the following case as an example: reproducing the top-20 results of BPR-MF on the ML-1M-10core dataset.

  1. Assume we have already run data_generator.py and obtained the training and test datasets split by tfo (i.e., the time-aware split-by-ratio method). We should have the files train_ml-1m_10core_tfo.dat and test_ml-1m_10core_tfo.dat in ./experiment_data/.

  2. The whole procedure consists of validation and test. Therefore, we first need to run hp_tune_pair_mf.py to get the best parameter settings. Besides, we may change the parameter search space inside that script. The command to run:

python hp_tune_pair_mf.py --dataset=ml-1m --prepro=10core --val_method=tfo --test_method=tfo --topk=20 --loss_type=BPR --sample_method=uniform --gpu=0
  3. After finishing step 2, we will get the best parameter settings from tune_log/. Then we can run the test script with the command below:
python run_pair_mf.py --dataset=ml-1m --prepro=10core --test_method=tfo --topk=20 --loss_type=BPR --num_ng=2 --factors=34 --epochs=50 --lr=0.0005 --lamda=0.0016 --sample_method=uniform --gpu=0

More details about the arguments are available in the help message; try:

python run_pair_mf.py --help
  4. Once step 3 finishes, we can obtain the top-20 results from the dynamically generated result file ./res/ml-1m/10core_tfo_pairmf_BPR_uniform.csv.
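The result file can then be inspected with pandas (a minimal sketch; the exact columns depend on the metrics the test script writes out):

```python
import pandas as pd

# load the dynamically generated top-20 results
res = pd.read_csv('./res/ml-1m/10core_tfo_pairmf_BPR_uniform.csv')
print(res.head())
```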

More Ranking Results

More ranking results for different methods on different datasets across various settings of top-N (N = 1, 5, 10, 20, 30) are available in a separate results file in the repository.

Important Commands

The common parameter settings used by the example scripts are described below:

| Command | Description | Choices | Description of Choices |
|---------|-------------|---------|------------------------|
| dataset | the selected dataset | ml-100k; ml-1m; … | all choices are the names of datasets |
| prepro | the data pre-processing method | origin; Ncore | 'origin' means using the raw data; 'Ncore' means only preserving users and items that have more than N interactions (N can be any integer value, e.g. 5core, 10core) |
| val_method | the train-validation splitting method | fo; tfo; loo; tloo; cv | 'fo' means split-by-ratio; 'tfo' means time-aware split-by-ratio; 'loo' means leave-one-out; 'tloo' means time-aware leave-one-out; 'cv' means cross validation (only applies to val_method) |
| test_method | the train-test splitting method | fo; tfo; loo; tloo | same choices as val_method, except 'cv' |
| topk | the length of the recommendation list | | |
| test_size | the ratio of the test set | | |
| fold_num | the number of folds used for validation (only applies to 'cv' and 'fo') | | |
| cand_num | the number of candidate items used for ranking | | |
| sample_method | the negative sampling method | uniform; … | 'uniform' means uniform sampling; the other choices sample popular items with low or high rank |
| num_ng | the number of negative samples | | |
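To make sample_method and num_ng concrete, the sketch below shows what 'uniform' negative sampling means: for each positive interaction, num_ng item ids are drawn uniformly at random, skipping items the user has already interacted with. This is an illustrative sketch, not the toolkit's own sampler:

```python
import random

def sample_negatives(pos_items, n_items, num_ng, rng=random):
    """Draw num_ng negative item ids uniformly at random, rejecting
    items the user has already interacted with."""
    pos = set(pos_items)
    negatives = []
    while len(negatives) < num_ng:
        j = rng.randrange(n_items)
        if j not in pos:
            negatives.append(j)
    return negatives

print(sample_negatives(pos_items=[0, 3], n_items=10, num_ng=2))
```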