swouf/ntds_IMDb_team4

NTDS - TEAM 4 - EVOLUTION OF THE MOVIE INDUSTRY

License: MIT | Version: final

The idea of our project is to use a subset of the IMDB movie dataset, taken from Kaggle (https://www.kaggle.com/tmdb/tmdb-movie-metadata), to analyse the evolution of the movie industry over the years. More specifically, we take an economy-oriented approach: we look at properties such as the budget or the return on investment, and check whether trends can be identified from them.
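As a rough illustration of the economy-oriented view, the sketch below computes a median return on investment per release year. It is not taken from the project code: the function name `yearly_median_roi` is hypothetical, the definition ROI = (revenue - budget) / budget is one common convention assumed here, and the column names `budget`, `revenue` and `release_date` are assumed to match the Kaggle CSV. The rows are made up for the example.

```python
import pandas as pd


def yearly_median_roi(movies: pd.DataFrame) -> pd.Series:
    """Median return on investment per release year (hypothetical helper).

    ROI is taken here as (revenue - budget) / budget. Rows with a
    non-positive budget are dropped, since the ratio is undefined for them.
    """
    df = movies[movies["budget"] > 0].copy()
    df["roi"] = (df["revenue"] - df["budget"]) / df["budget"]
    df["year"] = pd.to_datetime(df["release_date"]).dt.year
    return df.groupby("year")["roi"].median()


# Toy rows, made up for illustration only (not real movie data).
toy = pd.DataFrame({
    "budget": [100, 200, 0, 50],
    "revenue": [300, 100, 10, 100],
    "release_date": ["1999-03-01", "1999-07-15", "2005-01-01", "2005-06-30"],
})
roi = yearly_median_roi(toy)
print(roi)
```

The median is used rather than the mean because a few very profitable low-budget movies can dominate an average; whether the notebook makes the same choice is not stated in this README.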

Structure of the repository

  • /: Contains the final notebook, this README and the folders listed below.
  • data/: Contains the data necessary for the project, such as .csv files and NumPy arrays.
  • milestones/: Contains the 4 milestone notebooks that were written during the semester.
  • pictures/: Contains figures that were exported from our different notebooks.
  • src/: Contains the Python code (scripts and functions) we wrote during the milestones to manipulate our data.

Notebook

The main code is in the Jupyter notebook final_project_ntds_2018.ipynb.

Python functions

A few functions were developed in their own files, which can be found in the src folder. The most important ones are the following:

  • load_data contains multiple functions used to clean the initial dataset and to create feature dataframes and adjacency matrices.
  • genre_graph contains functions used to create graphs based on the genres of the movies.
  • test_success contains functions that reorder adjacency matrices based on k-means results.
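To give an idea of what such helpers might look like, here is a minimal sketch that builds a genre-based adjacency matrix and reorders it by cluster labels. It does not reproduce the code in src: the names `genre_adjacency` and `reorder_by_labels` are hypothetical, the edge weight (number of shared genres) is an assumption, and the labels are hard-coded where the project would presumably use k-means output.

```python
import numpy as np


def genre_adjacency(genre_matrix):
    """Weighted adjacency between movies: number of genres two movies share.

    genre_matrix is a binary (n_movies, n_genres) array. Hypothetical helper.
    """
    G = np.asarray(genre_matrix, dtype=int)
    A = G @ G.T                # entry (i, j) counts the genres shared by i and j
    np.fill_diagonal(A, 0)     # no self-loops
    return A


def reorder_by_labels(adjacency, labels):
    """Permute rows and columns so that nodes with the same label are contiguous."""
    order = np.argsort(labels, kind="stable")
    return adjacency[np.ix_(order, order)], order


# Toy data, made up for illustration: 4 movies, 3 genres.
genres = np.array([
    [1, 0, 0],  # movie 0: Action
    [0, 1, 0],  # movie 1: Drama
    [1, 0, 1],  # movie 2: Action, Comedy
    [0, 1, 1],  # movie 3: Drama, Comedy
])
A = genre_adjacency(genres)

labels = np.array([0, 1, 0, 1])  # stand-in for cluster labels, e.g. from k-means
A_sorted, order = reorder_by_labels(A, labels)
print(order)
print(A_sorted)
```

Applying the same permutation to rows and columns keeps the graph unchanged while making any block structure induced by the clusters visible when the matrix is plotted.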