Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

A scalable on-line movie recommender using Spark and Flask

This Apache Spark tutorial will guide you step by step through using the MovieLens dataset to build a movie recommender based on collaborative filtering with Spark's Alternating Least Squares (ALS) implementation. It is organised in two parts. The first one is about getting and parsing movies and ratings data into Spark RDDs. The second is about building and using the recommender and persisting it for later use in our on-line recommender system.
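As a rough illustration of the two parts described above, the following is a minimal sketch rather than the notebook's actual code. It assumes the MovieLens CSV layout (`userId,movieId,rating,timestamp` for ratings and `movieId,title,genres` for movies) and uses the RDD-based `pyspark.mllib` ALS API. The function names, the hyperparameter defaults, and the `model_path` argument are hypothetical choices made for this example.

```python
import csv


def parse_rating(line):
    """Turn 'userId,movieId,rating,timestamp' into (user_id, movie_id, rating)."""
    user_id, movie_id, rating = line.split(",")[:3]
    return int(user_id), int(movie_id), float(rating)


def parse_movie(line):
    """Turn 'movieId,title,genres' into (movie_id, title).

    The csv module is used because some titles are quoted and contain commas.
    """
    fields = next(csv.reader([line]))
    return int(fields[0]), fields[1]


def load_ratings(sc, path):
    """Read a ratings CSV into an RDD of (user, movie, rating), skipping the header."""
    raw = sc.textFile(path)
    header = raw.first()
    return raw.filter(lambda line: line != header).map(parse_rating)


def train_and_save(sc, ratings_rdd, model_path, rank=8, iterations=10, reg=0.1, seed=5):
    """Train an ALS model and persist it (hyperparameters are illustrative only)."""
    # Imported lazily so the parsing helpers can be used without Spark installed.
    from pyspark.mllib.recommendation import ALS

    model = ALS.train(ratings_rdd, rank, iterations=iterations, lambda_=reg, seed=seed)
    model.save(sc, model_path)
    return model


def reload_model(sc, model_path):
    """Load a previously saved model for use in the on-line service."""
    from pyspark.mllib.recommendation import MatrixFactorizationModel

    return MatrixFactorizationModel.load(sc, model_path)
```

With a `SparkContext` named `sc`, something like `train_and_save(sc, load_ratings(sc, "ratings.csv"), "als_model")` would cover the whole path from raw file to persisted model; both the file name and the model path here are placeholders.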

This tutorial can be used independently to build a movie recommender model based on the MovieLens dataset. Most of the code in the first part, about how to use ALS with the public MovieLens dataset, comes from my solution to one of the exercises proposed in CS100.1x Introduction to Big Data with Apache Spark by Anthony D. Joseph on edX, which has also been publicly available since 2014 at Spark Summit. Starting from there, I've made minor modifications to use a larger dataset, then added code to store and reload the model for later use, and finally a web service using Flask.

In any case, the use of this algorithm with this dataset is not new (you can Google about it), which is why we put the emphasis on ending up with a usable model in an on-line environment, and on how to use it in different situations. But I was truly inspired by solving the exercise proposed in that course, and I highly recommend that you take it. There you will learn not just ALS but many other Spark algorithms.

The second part of the tutorial explains how to use Python/Flask to build a web service on top of Spark models. By doing so, you will be able to develop a complete on-line movie recommendation service.

Part I: Building the recommender

Part II: Building and running the web service

Quick start

The file server/ starts a CherryPy server running a Flask app, which exposes a RESTful web service wrapping a Spark-based context. Through its API we can perform on-line movie recommendations.
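The arrangement described above, a Flask app grafted onto CherryPy's WSGI server, can be sketched as follows. This is only an outline under stated assumptions, not the repository's server code: the `StubEngine` class stands in for the Spark-backed recommender, and the route pattern, port number, and function names are all hypothetical.

```python
import json


class StubEngine:
    """Hypothetical stand-in for the Spark-backed recommendation engine."""

    def __init__(self, predictions):
        # predictions maps user_id -> list of (title, predicted_rating)
        self.predictions = predictions

    def top_ratings(self, user_id, count):
        """Return the `count` highest-rated (title, rating) pairs for a user."""
        ranked = sorted(
            self.predictions.get(user_id, []), key=lambda p: p[1], reverse=True
        )
        return ranked[:count]


def create_app(engine):
    """Build a Flask app exposing the engine through a REST endpoint."""
    # Imported lazily so the engine can be exercised without Flask installed.
    from flask import Flask

    app = Flask(__name__)

    # Illustrative route; the real service may use different paths.
    @app.route("/<int:user_id>/ratings/top/<int:count>", methods=["GET"])
    def top_ratings(user_id, count):
        return json.dumps(engine.top_ratings(user_id, count))

    return app


def run_server(app, port=5432):
    """Mount the Flask WSGI app on CherryPy and block until shutdown."""
    import cherrypy

    cherrypy.tree.graft(app, "/")
    cherrypy.config.update(
        {"server.socket_host": "0.0.0.0", "server.socket_port": port}
    )
    cherrypy.engine.start()
    cherrypy.engine.block()
```

Keeping the engine separate from the web layer means the recommender can be swapped for one that holds a `SparkContext` and a trained model without touching the routes.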

Please refer to the second notebook for detailed instructions on how to run and use the service.


Contributions are welcome! For bug reports or requests please submit an issue.


Feel free to contact me to discuss any issues, questions, or comments.


This repository contains a variety of content; some developed by Jose A. Dianes, and some from third-parties. The third-party content is distributed under the license provided by those parties.

The content developed by Jose A. Dianes is distributed under the following license:

Copyright 2016 Jose A Dianes

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.