A Network Tour of Millenial Movies

Project for the course A Network Tour of Data Science

GitHub repository for a project completed by a team of four students for the A Network Tour of Data Science course (EE-558) at École polytechnique fédérale de Lausanne (EPFL). This README contains the project abstract, the libraries required to run the code, the datasets used, and the research questions together with the notebooks that address them. The code is in the Jupyter notebooks of this repository, and the report is in Project Report.pdf.

Libraries used

We used the following libraries for this project, with Python 3.6.6:

Computational:

numpy (as np)
pandas (as pd)
networkx (as nx)
scipy
sklearn
surprise
operator
collections
pandas_profiling

Graphical:

seaborn (as sns)
matplotlib (as plt)
IPython

Textual:

json
base64
codecs
re
io

We also used the libraries for Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.
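To illustrate what "fast localized spectral filtering" refers to, here is a minimal NumPy sketch of a Chebyshev-polynomial graph filter, the building block of the graph CNN in that paper. It is not code from those libraries: the function names, the use of the combinatorial Laplacian, and the dense matrices are our own simplifications (a real implementation would use sparse matrices and learn the coefficients `theta` during training).

```python
import numpy as np


def rescaled_laplacian(A):
    """Return L~ = 2 L / lambda_max - I for the combinatorial Laplacian L = D - A.

    Rescaling maps the spectrum of L into [-1, 1], where Chebyshev polynomials are defined.
    """
    L = np.diag(A.sum(axis=1)) - A
    lam_max = np.linalg.eigvalsh(L).max()  # only needed for the rescaling
    return 2.0 * L / lam_max - np.eye(len(A))


def chebyshev_filter(A, x, theta):
    """Compute y = sum_k theta[k] * T_k(L~) x via T_k = 2 L~ T_{k-1} - T_{k-2}.

    Only matrix-vector products with L~ are used, and a filter with K = len(theta)
    coefficients mixes values within at most K - 1 hops of each node.
    """
    L_t = rescaled_laplacian(A)
    t_prev, t_curr = x, L_t @ x  # T_0(L~) x and T_1(L~) x
    y = theta[0] * t_prev
    if len(theta) > 1:
        y = y + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * (L_t @ t_curr) - t_prev
        y = y + theta[k] * t_curr
    return y


# Usage: a path graph 0-1-2-3-4 and a signal concentrated on node 0.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y = chebyshev_filter(A, x, theta=[0.0, 0.0, 1.0])  # degree-2 filter
print(np.round(y, 3))  # nonzero only on nodes 0, 1 and 2
```

Because the filter is a degree-2 polynomial of the Laplacian, the output stays within two hops of node 0. The filtering step itself needs no eigendecomposition; the sketch computes the largest eigenvalue only to rescale the Laplacian.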

Abstract

As Walt Disney once said: "Movies can and do have tremendous influence in shaping young lives in the realm of entertainment towards the ideals and objectives of normal adulthood." But what do viewers really know about movies and what makes them successful? This project, based on the TMDb dataset, offers some interesting insights into movies from the past several decades. It shows how some of the movie features are correlated, explores how movies can be classified into genres using spectral graph analysis and CNNs, and gives a simple demo of a recommender system.

Datasets

The TMDb credits dataset
The TMDb movies dataset

The data folder contains the subsampled data that was used for the implementation.

Research Questions

Can spectral clustering, combined with the k-means algorithm, classify movies by genre? Can this technique be applied to different graphs, such as a cast-and-crew co-occurrence graph or a movie-keyword co-occurrence graph?
Can the movie genre classifier be improved by using Convolutional Neural Networks on graphs with Fast Localized Spectral Filtering? If so, how large is the improvement?
Can we recommend movies to users by building a movie recommendation engine?
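The first question combines a co-occurrence graph with spectral clustering and k-means. The sketch below shows one way to do this, as an illustration rather than the project's exact pipeline. We assume that two movies are linked with a weight equal to the number of keywords they share (cast or crew members could be used the same way). We also use normalized spectral clustering in the style of Ng, Jordan and Weiss. The keyword sets are hypothetical toy data, and k-means is written in plain NumPy to avoid extra dependencies; `sklearn.cluster.KMeans` would be the usual choice.

```python
import numpy as np


def cooccurrence_adjacency(item_features):
    """A[i, j] = number of features (e.g. keywords) shared by movies i and j."""
    vocab = sorted(set().union(*item_features))
    index = {f: j for j, f in enumerate(vocab)}
    M = np.zeros((len(item_features), len(vocab)))  # movie x keyword incidence
    for i, feats in enumerate(item_features):
        for f in feats:
            M[i, index[f]] = 1.0
    A = M @ M.T
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2}; isolated nodes get a zero row in the second term."""
    d = A.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    return np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(A, k):
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(A))  # ascending order
    U = eigvecs[:, :k]  # eigenvectors of the k smallest eigenvalues
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    U = U / np.where(norms > 0, norms, 1.0)  # row-normalise the spectral embedding
    return kmeans(U, k)


# Usage with hypothetical keyword sets: three sci-fi-like and three romance-like movies.
keywords = [{"space", "alien", "robot"}, {"space", "alien"}, {"robot", "space"},
            {"love", "wedding"}, {"love", "paris"}, {"wedding", "paris", "love"}]
labels = spectral_clustering(cooccurrence_adjacency(keywords), k=2)
print(labels)  # the first three movies share one label, the last three the other
```

Clusters found this way are unlabeled; comparing them with the TMDb genres is what turns the clustering into a genre classification.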

Structure of repo

The notebooks of the repository should be read in the following order:

Data Cleaning and Subsampling notebook
Data Exploration notebook
Data Exploitation - Spectral Graph Theory (Cast and Crew) notebook
Data Exploitation - Keyword co-occurrence graph notebook
Data Exploitation - CNNs notebook
Data Exploitation - Recommender Systems notebook

Additionally, there is a Gephi graph visualization notebook, which was used only to produce visualizations.
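The Recommender Systems notebook contains the project's recommender demo. The library list includes surprise, which is geared towards rating-based (collaborative filtering) recommenders, so the notebook may take that route; we do not reproduce it here. The sketch below instead shows a simpler content-based recommender. It ranks movies by the cosine similarity of their genre and keyword sets. The titles and features are hypothetical placeholders, not project data.

```python
import numpy as np


def feature_matrix(item_features):
    """Binary movie x feature matrix built from per-movie feature sets."""
    vocab = sorted(set().union(*item_features))
    index = {f: j for j, f in enumerate(vocab)}
    X = np.zeros((len(item_features), len(vocab)))
    for i, feats in enumerate(item_features):
        for f in feats:
            X[i, index[f]] = 1.0
    return X


def recommend(titles, item_features, query_title, top_n=2):
    """Return the top_n titles most cosine-similar to query_title."""
    X = feature_matrix(item_features)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms > 0, norms, 1.0)  # unit rows, so dot product = cosine
    q = titles.index(query_title)
    scores = X @ X[q]
    scores[q] = -np.inf  # never recommend the query movie itself
    order = np.argsort(-scores, kind="stable")[:top_n]
    return [titles[i] for i in order]


# Usage with hypothetical movies and features.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
features = [{"Action", "Science Fiction", "space"},
            {"Action", "Science Fiction", "robot"},
            {"Romance", "Comedy", "wedding"},
            {"Action", "Thriller"}]
print(recommend(titles, features, "Movie A"))  # ['Movie B', 'Movie D']
```

A content-based approach needs no user ratings, which makes it a natural fit for TMDb metadata. A collaborative approach instead needs a user-by-movie rating matrix.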

Authors

Milena Filipović
Kristijan Lopatichki
Jelena Malić
Davor Todorovski

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
