A highly-modularized and recommendation-efficient recommendation library based on PyTorch.

RecStudio is a unified, highly modularized, and efficient recommendation library based on PyTorch. Its algorithms are categorized by recommendation task as follows:

  • General Recommendation
  • Sequential Recommendation
  • Knowledge-based Recommendation
  • Feature-based Recommendation
  • Social Recommendation


Model Structure

At the core of the library, all recommendation models are grouped into three base classes:

  • TowerFreeRecommender: The most flexible base class, which enables any complex feature-interaction modeling.
  • ItemTowerRecommender: Item encoders are separated from recommender, enabling fast ANN and model-based negative sampling.
  • TwoTowerRecommender: A subclass of ItemTowerRecommender in which the recommender consists only of a user encoder and an item encoder.
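
To make the two-tower idea concrete, here is a minimal sketch in plain Python. It is not RecStudio's implementation: the names EmbeddingEncoder and TwoTowerSketch are hypothetical, and a lookup table stands in for a learned encoder. The point it illustrates is that item vectors do not depend on the user, so they can be computed once and handed to an index.

```python
import random


class EmbeddingEncoder:
    """Hypothetical encoder: maps an integer id to a fixed-size vector."""

    def __init__(self, num_ids, dim, seed):
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                      for _ in range(num_ids)]

    def __call__(self, idx):
        return self.table[idx]


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


class TwoTowerSketch:
    """User tower + item tower + score function (illustrative only)."""

    def __init__(self, num_users, num_items, dim=8):
        self.user_encoder = EmbeddingEncoder(num_users, dim, seed=0)
        self.item_encoder = EmbeddingEncoder(num_items, dim, seed=1)
        self.num_items = num_items

    def item_vectors(self):
        # Independent of the user, so this can be precomputed and indexed.
        return [self.item_encoder(i) for i in range(self.num_items)]

    def scores(self, user_id):
        query = self.user_encoder(user_id)
        return [inner_product(query, v) for v in self.item_vectors()]

    def topk(self, user_id, k):
        s = self.scores(user_id)
        # Exhaustive search; an ANN index would replace this step.
        return sorted(range(self.num_items), key=lambda i: s[i], reverse=True)[:k]


model = TwoTowerSketch(num_users=5, num_items=20)
top3 = model.topk(user_id=2, k=3)
print(top3)
```

A TowerFreeRecommender, by contrast, would score a (user, item) pair jointly, so no such precomputed item vectors exist.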

Dataset Structure

For the dataset structure, the datasets are divided into five categories:

Dataset | Application | Examples
TripletDataset | Dataset providing user-item-rating triplets | BPR, NCF, CML, etc.
UserDataset | Dataset for AutoEncoder-based ItemTowerRecommender | MultiVAE, RecVAE, etc.
SeqDataset | Dataset for sequential recommenders with causal prediction | GRU4Rec, SASRec, etc.
Seq2SeqDataset | Dataset for sequential recommenders with masked prediction | Bert4Rec, etc.
ALSDataset | Dataset for recommenders optimized by alternating least squares | WRMF, etc.
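
The difference between causal and masked prediction can be sketched as follows. This is an illustration of the two sample layouts in general, not the code of SeqDataset or Seq2SeqDataset; the function names, max_len, mask_prob, and mask_token are assumptions made for the example.

```python
import random


def causal_samples(history, max_len):
    """Causal prediction: each sample is (preceding items, next item)."""
    return [(history[max(0, t - max_len):t], history[t])
            for t in range(1, len(history))]


def masked_sample(history, mask_prob, mask_token, rng):
    """Masked prediction: hide some positions and predict the hidden items."""
    inputs, targets = [], []
    for item in history:
        if rng.random() < mask_prob:
            inputs.append(mask_token)
            targets.append(item)      # the model must recover this item
        else:
            inputs.append(item)
            targets.append(None)      # position not used in the loss
    return inputs, targets


history = [1, 2, 3, 4]
pairs = causal_samples(history, max_len=2)
masked_in, masked_out = masked_sample(history, mask_prob=0.5,
                                      mask_token=0, rng=random.Random(0))
print(pairs)
print(masked_in, masked_out)
```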

To accelerate dataset processing, processed datasets are automatically cached so that repeated training runs can start quickly.

Model Evaluation

Almost all common metrics used in recommender systems are implemented in RecStudio on top of PyTorch, such as NDCG, Recall, and Precision. All metric functions share the same interface and are implemented entirely with tensor operators. The evaluation procedure can therefore be moved to the GPU, which speeds up evaluation considerably.
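
As a worked example of two of these metrics, the sketch below computes Recall@k and NDCG@k with binary relevance for a single user. It uses plain Python for readability rather than tensor operators, the function names are hypothetical, and RecStudio's own definitions may differ in details such as the recall denominator.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """DCG of the top-k list divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(pos + 2)          # pos is 0-based
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg


ranked = [3, 1, 7, 5, 9]      # items ordered by predicted score
relevant = {1, 5, 8}          # held-out items for this user

r3 = recall_at_k(ranked, relevant, k=3)   # one of three relevant items found
n3 = ndcg_at_k(ranked, relevant, k=3)
print(r3, n3)
```

Here only item 1 is found in the top 3, at rank 2, so Recall@3 is 1/3 and the DCG is 1/log2(3).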

ANNs & Sampler

To accelerate training and evaluation, RecStudio integrates various Approximate Nearest Neighbor search methods (ANNs) and negative samplers. By building indexes with ANNs, the top-k operator based on Euclidean distance, inner product, or cosine similarity can be significantly accelerated. Negative samplers comprise static samplers and model-based samplers developed by the RecStudio team. The static samplers are the Uniform Sampler and the Popularity Sampler. The model-based samplers are based on either quantization of item vectors or importance resampling. In addition, static sampling is implemented in the dataset, which makes it possible to generate negatives while loading data.
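
The two static samplers can be sketched in a few lines. This is a simplified illustration, not RecStudio's UniformSampler or its popularity sampler: the function names are hypothetical, and the sketch does not filter out items the user has already interacted with.

```python
import random
from collections import Counter


def uniform_sampler(num_items, n, rng):
    """Every item is equally likely to be drawn as a negative."""
    return [rng.randrange(num_items) for _ in range(n)]


def popularity_sampler(item_counts, n, rng):
    """Items are drawn with probability proportional to their interaction count."""
    items = list(item_counts)
    weights = [item_counts[i] for i in items]
    return rng.choices(items, weights=weights, k=n)


rng = random.Random(0)
item_counts = {0: 98, 1: 1, 2: 1}     # item 0 is by far the most popular

uni = uniform_sampler(num_items=3, n=1000, rng=rng)
pop = popularity_sampler(item_counts, n=1000, rng=rng)

print(Counter(uni))
print(Counter(pop))
```

With these counts, the popularity sampler returns item 0 in the large majority of draws, while the uniform sampler spreads draws roughly evenly.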

Loss & Score

In RecStudio, loss functions are categorized into three types:

  • FullScoreLoss: Calculates scores over all items, such as SoftmaxLoss.
  • PairwiseLoss: Calculates scores on positive and negative items, such as BPRLoss and BinaryCrossEntropyLoss.
  • PointwiseLoss: Calculates scores for a single (user, item) interaction, such as HingeLoss.

Score functions model users' preferences for items. Various common score functions are implemented in RecStudio, such as InnerProduct, EuclideanDistance, CosineDistance, and MLPScorer.
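
To show how a score function and a loss fit together, here is a minimal sketch using the textbook forms of the BPR loss and the full softmax loss. The function names are hypothetical and the code is plain Python, so it should be read as an illustration of the formulas rather than as RecStudio's BPRLoss or SoftmaxLoss.

```python
import math


def inner_product(u, v):
    """Score function: a higher value means stronger preference."""
    return sum(a * b for a, b in zip(u, v))


def bpr_loss(pos_score, neg_score):
    """Pairwise loss: -log(sigmoid(pos - neg)), written in a stable form."""
    x = pos_score - neg_score
    # log(1 + exp(-x)) computed without overflow for large |x|
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax_loss(all_scores, pos_index):
    """Full-score loss: negative log-probability of the positive item."""
    m = max(all_scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in all_scores))
    return log_sum_exp - all_scores[pos_index]


user = [0.5, -0.2]
pos_item = [0.4, 0.1]
neg_item = [-0.3, 0.6]

loss = bpr_loss(inner_product(user, pos_item), inner_product(user, neg_item))
untrained = bpr_loss(0.0, 0.0)        # equals log(2)
print(loss, untrained)
```

When positive and negative scores are equal, the BPR loss is log 2, about 0.693. The pairwise loss needs only one sampled negative per positive, whereas the softmax loss needs a score for every item.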

Loss | Math Type | Sampling Distribution | Calculation Complexity | Sampling Complexity | Convergence Speed | Related Metrics
Softmax |  | No sampling |  | - | very fast | NDCG
Sampled Softmax |  | No sampling |  | - | fast | NDCG
BPR |  | Uniform sampling |  |  | slow | AUC
WARP |  | Reject Sampling |  | slower and slower | slow | Precision
InfoNCE |  | Popularity sampling |  |  | fast | DCG
WRMF |  | No sampling |  | - | very fast | -
PRIS |  | Cluster sampling |  |  | very fast | DCG

RecStudio v0.2 Framework
Figure: RecStudio Framework


  • General Dataset Structure: RecStudio supports a unified dataset config based on atomic data files, together with automatic data caching.
  • Modular Model Structure: The recommender is organized into separate modules (loss functions, scoring functions, samplers, and ANNs), so you can assemble a custom model like building blocks.
  • GPU Acceleration: The whole pipeline, from model training to model evaluation, can easily be run on a GPU or on distributed GPUs.
  • Simple Model Categorization: RecStudio categorizes all models by their number of encoders, which is easy to understand and use. The taxonomy can cover all models.
  • Simple and Complex Negative Samplers: RecStudio integrates static and model-based samplers implemented with tensor operators only.

Quick Start

After downloading the source code, you can run the provided script to try RecStudio.


The default configuration trains and evaluates the BPR model on the MovieLens-100k (ml-100k) dataset.

This simple example generally takes less than one minute with GPUs. The output looks like this:

[2022-04-11 14:30:29] INFO (faiss.loader/MainThread) Loading faiss with AVX2 support.
[2022-04-11 14:30:29] INFO (faiss.loader/MainThread) Loading faiss.
[2022-04-11 14:30:29] INFO (faiss.loader/MainThread) Successfully loaded faiss.
[2022-04-11 14:30:30] INFO (pytorch_lightning.utilities.seed/MainThread) Global seed set to 42
[2022-04-11 14:30:30] INFO (pytorch_lightning/MainThread) learning_rate=0.001
split_ratio=[0.8, 0.1, 0.1]
test_metrics=['recall', 'precision', 'map', 'ndcg', 'mrr', 'hit']
val_metrics=['recall', 'ndcg']
use_fields=['user_id', 'item_id', 'rating']
[2022-04-11 14:30:30] INFO (pytorch_lightning/MainThread) save_dir:/home/RecStudio/
[2022-04-11 14:30:30] INFO (pytorch_lightning.utilities.distributed/MainThread) GPU available: True, used: False
[2022-04-11 14:30:30] INFO (pytorch_lightning.utilities.distributed/MainThread) TPU available: False, using: 0 TPU cores
[2022-04-11 14:30:30] INFO (pytorch_lightning.utilities.distributed/MainThread) IPU available: False, using: 0 IPUs
[2022-04-11 14:30:30] INFO (pytorch_lightning.utilities.distributed/MainThread) The following callbacks returned in `LightningModule.configure_callbacks` will override existing callbacks passed to Trainer: ModelCheckpoint
[2022-04-11 14:30:30] INFO (pytorch_lightning.core.lightning/MainThread)
  | Name         | Type               | Params
0 | loss_fn      | BPRLoss            | 0
1 | score_func   | InnerProductScorer | 0
2 | item_encoder | Embedding          | 107 K
3 | sampler      | UniformSampler     | 0
4 | user_encoder | Embedding          | 60.4 K
168 K     Trainable params
0         Non-trainable params
168 K     Total params
0.673     Total estimated model params size (MB)
[2022-04-11 14:30:30] INFO (pytorch_lightning.callbacks.early_stopping/MainThread) Metric recall@… improved. New best score: 0.007
[2022-04-11 14:30:30] INFO (pytorch_lightning/MainThread) Training: Epoch=  0 [recall@…=0.0074 ndcg@…=0.0129 train_loss=0.6932]
[2022-04-11 14:30:31] INFO (pytorch_lightning.callbacks.early_stopping/MainThread) Metric recall@… improved by 0.006 >= min_delta = 0.0. New best score: 0.014
[2022-04-11 14:30:31] INFO (pytorch_lightning/MainThread) Training: Epoch=  1 [recall@…=0.0135 ndcg@…=0.0251 train_loss=0.6915]
[2022-04-11 14:30:32] INFO (pytorch_lightning.callbacks.early_stopping/MainThread) Metric recall@… improved by 0.038 >= min_delta = 0.0. New best score: 0.051
[2022-04-11 14:31:26] INFO (pytorch_lightning/MainThread) Training: Epoch= 75 [recall@…=0.2074 ndcg@…=0.2942 train_loss=0.1909]
[2022-04-11 14:31:26] INFO (pytorch_lightning.callbacks.early_stopping/MainThread) Monitored metric recall@… did not improve in the last 10 records. Best score: 0.211. Signaling Trainer to stop.
[2022-04-11 14:31:26] INFO (pytorch_lightning/MainThread) Training: Epoch= 76 [recall@…=0.2073 ndcg@…=0.2949 train_loss=0.1899]
[2022-04-11 14:31:26] INFO (pytorch_lightning.utilities.distributed/MainThread) The following callbacks returned in `LightningModule.configure_callbacks` will override existing callbacks passed to Trainer: EarlyStopping, ModelCheckpoint
[2022-04-11 14:31:27] INFO (pytorch_lightning/MainThread) Testing:  [recall@…=0.2439 precision@…=0.1893 map@…=0.5762 ndcg@…=0.3718 mrr@…=0.4487 hit@…=0.7815]

To change the model or dataset, use the command-line arguments.

python -m=NCF -d=ml-1m
  • Supported command line arguments:

    args | type | description | default | options
    -m, --model | str | model name | BPR | all the models in RecStudio
    -d, --dataset | str | dataset name | ml-100k | all the datasets supported by RecStudio
    --data_dir | str | dataset folder | datasets | folders that could be read by RecStudio
    mode | str | training mode | light | ['light','detail','tune']
    --learning_rate | float | learning rate | 0.001 |
    --learner | str | optimizer name | adam | ['adam','sgd','adasgd','rmsprop','sparse_adam']
    --weight_decay | float | weight decay for optimizer | 0 |
    --epochs | int | training epoch | 20,50 |
    --batch_size | int | the size of mini batch in training | 2048 |
    --eval_batch_size | int | the size of mini batch in evaluation | 128 |
    --embed_dim | int | the output size of embedding layers | 64 |
  • For ItemTowerRecommender, some extra args are supported:

    args | type | description | default | options
    --sampler | str | sampler name | uniform | ['uniform','popularity','midx_uni','midx_pop','cluster_uni','cluster_pop']
    --negative_count | int | number of negative samples | 1 | positive integer
  • For TwoTowerRecommender, some extra args are supported based on ItemTowerRecommender:

    args | type | description | default | options
    --split_mode | str | split methods for the dataset | user_entry | ['user','entry','user_entry']

Details on some of the less obvious arguments:

  1. mode: in light and detail modes, the output is displayed in the terminal; detail mode provides more detailed information. tune mode uses Neural Network Intelligence (NNI) to provide a visual interface, and can be run with a config file such as config.yaml. For more details about NNI, please refer to the NNI Documentation.
  2. sampler: uniform uses UniformSampler. popularity samples according to item popularity (more popular items are sampled with higher probability). midx_uni and midx_pop are midx dynamic samplers; please refer to FastVAE for more details. cluster_uni and cluster_pop are cluster dynamic samplers; please refer to PRIS for more details.
  3. split_mode: user splits all users into train/valid/test datasets, so the users in those datasets are disjoint. entry splits all interactions across the three datasets. user_entry splits the interactions of each user into three parts.
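
The three split modes can be sketched as follows. This is an illustration under assumptions, not RecStudio's splitting code: the split and cut helpers are hypothetical, interactions are shuffled randomly (the library may order them differently, for example by time), and the ratios follow the split_ratio=[0.8, 0.1, 0.1] shown in the log above.

```python
import random
from collections import defaultdict


def cut(seq, ratios):
    """Cut a list into three consecutive parts according to the ratios."""
    a = int(len(seq) * ratios[0])
    b = a + int(len(seq) * ratios[1])
    return seq[:a], seq[a:b], seq[b:]


def split(interactions, mode, ratios=(0.8, 0.1, 0.1), seed=42):
    rng = random.Random(seed)
    if mode == "entry":
        # Split the interactions themselves, ignoring who they belong to.
        rows = list(interactions)
        rng.shuffle(rows)
        return cut(rows, ratios)

    by_user = defaultdict(list)
    for user, item in interactions:
        by_user[user].append((user, item))

    if mode == "user":
        # Split the users; each user's interactions stay together.
        users = sorted(by_user)
        rng.shuffle(users)
        return tuple([row for u in part for row in by_user[u]]
                     for part in cut(users, ratios))

    if mode == "user_entry":
        # Split every user's own interactions into three parts.
        out = ([], [], [])
        for u in sorted(by_user):
            rows = list(by_user[u])
            rng.shuffle(rows)
            for dst, part in zip(out, cut(rows, ratios)):
                dst.extend(part)
        return out

    raise ValueError(f"unknown split mode: {mode}")


# 10 users with 10 interactions each
interactions = [(u, i) for u in range(10) for i in range(10)]
for mode in ("user", "entry", "user_entry"):
    train, valid, test = split(interactions, mode)
    print(mode, len(train), len(valid), len(test))
```

With user, a model is tested on users it never saw during training; with user_entry, every user contributes to all three sets.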

You can also install RecStudio from PyPI:

pip install recstudio

Basic usage looks like this:

import recstudio"BPR", data_dir="./datasets/", dataset='ml-100k')

For more detailed information, please refer to our documentation.

Automatic Hyper-parameter Tuning

RecStudio integrates with the NNI module to tune hyper-parameters automatically. To use it, run the script with your own config file, modeled on the provided config.yaml.

For more detailed information about NNI, please refer to NNI Documentation.


Please let us know if you encounter a bug or have any suggestions by submitting an issue.

We welcome all contributions from bug fixes to new features and extensions.

We expect all contributions to be discussed in the issue tracker first and then submitted as PRs.

The Team

RecStudio is developed and maintained by USTC BigData Lab.

User | Contributions
@DefuLian | Framework design and construction
@AngusHuang17 | Sequential model, docs, bug fixing
@Xiuchen519 | Knowledge-based model, bug fixing
@JennahF | NCF, CML, logisticMF models
@HERECJ | AutoEncoder models
@BinbinJin | IRGAN model


RecStudio uses the MIT License.
