A Network Tour of Millennial Movies

Project for the course A Network Tour of Data Science

GitHub repository for the project done by a team of four students for the course A Network Tour of Data Science (EE-558), given at École polytechnique fédérale de Lausanne. This README contains the project abstract, the list of libraries required to run the code, the datasets used for the implementation, and the research questions and products that were analyzed. The code is in the Jupyter notebooks of this repository, and the report is in Project Report.pdf.

Libraries used

We used the following libraries for this project, with Python 3.6.6:

Computational:

numpy (as np)
pandas (as pd)
networkx (as nx)
scipy
sklearn
surprise
operator
collections
pandas_profiling

Graphical:

seaborn (as sns)
matplotlib (pyplot as plt)
IPython

Textual:

json
base64
codecs
re
io

We also utilized these libraries for Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.

Abstract

As Walt Disney once said: "Movies can and do have tremendous influence in shaping young lives in the realm of entertainment towards the ideals and objectives of normal adulthood." But what do viewers really know about movies and what makes them successful? This project, based on the TMDb dataset, offers some interesting insights into movies from the past several decades. It shows how some of the movie features are correlated, explores how movies can be classified into genres using spectral graph analysis and CNNs, and gives a simple demo of a recommender system.

Datasets

The data folder contains the subsampled data that was used for the implementation.

Research Questions

  • Can spectral clustering classify movie genres using the k-means algorithm? Can this technique be used in different graph settings, such as a cast and crew co-occurrence graph or a movie keyword co-occurrence graph?

  • Can the movie genre classifier be improved by using Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering? If so, how much is gained from this analysis?

  • Can we suggest movies to users by creating a movie recommendation engine?
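
To make the first question concrete, here is a minimal sketch of spectral clustering followed by k-means on a toy graph. It is not the code from the notebooks: the six-node graph (two triangles joined by one edge), the helper names `spectral_embedding` and `kmeans`, and the choice of the normalized Laplacian are all assumptions made for illustration. In the project, nodes would be movies and edge weights would come from shared cast, crew or keywords.

```python
import numpy as np


def spectral_embedding(adjacency, k):
    """Return the k eigenvectors of the normalized Laplacian with the smallest eigenvalues."""
    degrees = adjacency.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)  # assumes every node has at least one edge
    normalized = d_inv_sqrt[:, None] * adjacency * d_inv_sqrt[None, :]
    laplacian = np.eye(len(adjacency)) - normalized
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    return eigvecs[:, :k]


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dists, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical toy graph: nodes 0-2 and 3-5 form two triangles, linked by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

embedding = spectral_embedding(adjacency, k=2)
labels = kmeans(embedding, k=2)
print(labels)
```

On this toy graph the two triangles end up in separate clusters. Comparing such cluster labels with the known genres of each movie is one way to judge whether the graph structure carries genre information.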

Structure of repo

The notebooks of the repository should be read in the following order:

  • Data Cleaning and Subsampling notebook

  • Data Exploration notebook

  • Data Exploitation - Spectral Graph Theory (Cast and Crew) notebook

  • Data Exploitation - Keyword co-occurrence graph notebook

  • Data Exploitation - CNNs notebook

  • Data Exploitation - Recommender Systems notebook

Additionally, there is a Gephi graph visualization notebook that was only used for visualization.

Authors

  • Milena Filipović
  • Kristijan Lopatichki
  • Jelena Malić
  • Davor Todorovski

License

Copyright 2019 Milena Filipović, Kristijan Lopatichki, Jelena Malić and Davor Todorovski

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
