A dive into the music industry

This GitHub repository contains our final project for the Network Tour of Data Science course. It includes two notebooks: one that queries the data to build our dataframe, and one that analyses the data. You can also find our written report. Because the data is too large to upload to GitHub, we can send it to you directly so that you don't have to re-pull everything from Genius.

The project: starting point

In this project, we wanted to study the relationships between the many actors of the music industry. This means we are interested not only in well-known performers but also in producers and songwriters. Using graph theory, we tried to uncover important features of this world from a database of around 70 000 songs containing, for each song, the collaborations that led to its creation.
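To make "collaborations" concrete: one natural way to turn songs into a graph is to link every pair of people credited on the same song and weight each link by the number of shared songs. The sketch below does this with the standard library only. It is an illustration, not the notebook's code: the dict keys `"performers"`, `"producers"` and `"writers"` and the placeholder names are assumptions.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(songs):
    """Count, for every pair of actors, the number of songs they worked on together.

    Each song is a dict with the hypothetical keys "performers", "producers"
    and "writers", each mapping to a list of names.
    """
    weights = Counter()
    for song in songs:
        # Merge all roles: an edge links any two people credited on the same song.
        actors = set()
        for role in ("performers", "producers", "writers"):
            actors.update(song.get(role, []))
        # Sorting gives each undirected pair a single canonical (a, b) order.
        for a, b in combinations(sorted(actors), 2):
            weights[(a, b)] += 1
    return weights


# Toy input with placeholder names.
songs = [
    {"performers": ["A"], "producers": ["P"], "writers": ["A", "W"]},
    {"performers": ["A", "B"], "producers": ["P"]},
]
edges = collaboration_edges(songs)
print(edges[("A", "P")])  # → 2 (A and P share both songs)
```

The resulting `Counter` is a weighted edge list, which can be fed directly into `networkx.Graph` or turned into an adjacency matrix.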

Starting from this idea of studying working relationships between actors, we formulated the following questions and tried to answer them as well as possible:

First, we want to know whether the music industry is a connected world. If it is, we would like to know whether, by studying working relationships, we can detect a mixing of musical genres: do actors work on only one genre or on several, do they prefer a specific genre, and does the same conclusion hold for each group of actors (performers, producers, songwriters)? Along the same lines, we look for possible sub-genre divisions in the collaboration data, because sub-genres are not defined in the data retrieved from Spotify. Finally, we study whether a relationship with a big name is needed to earn a place in the music world.
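The first two questions map onto simple graph measurements. A minimal sketch with networkx, assuming a collaboration graph `G` and a hypothetical `actor_genres` mapping built from the genre labels of each actor's songs: "connected world" becomes the number of connected components and the share of actors in the largest one, and "genre mixing" becomes the share of actors credited on more than one genre.

```python
import networkx as nx


def connectivity_summary(G):
    """Return the number of connected components and the share of nodes in the largest one."""
    components = list(nx.connected_components(G))
    largest = max(components, key=len)
    return len(components), len(largest) / G.number_of_nodes()


def multi_genre_share(actor_genres):
    """Fraction of actors credited on songs from more than one genre.

    `actor_genres` maps an actor to the set of genres of the songs they
    worked on (assumed to come from the Spotify genre labels).
    """
    if not actor_genres:
        return 0.0
    multi = sum(1 for genres in actor_genres.values() if len(genres) > 1)
    return multi / len(actor_genres)


# Toy example with placeholder names.
G = nx.Graph([("A", "B"), ("B", "C"), ("D", "E")])
print(connectivity_summary(G))  # → (2, 0.6)
print(multi_genre_share({"A": {"Pop"}, "B": {"Pop", "Rap"}, "C": {"Rap"}}))
```

Computing `multi_genre_share` separately for performers, producers and songwriters is one way to compare the groups of actors.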

What we used to answer these questions

We took a graph-theory approach to answer the questions above. We built a matrix representation of our data (an adjacency matrix), used interactive plotting tools to explore the results, and applied clustering and community-detection algorithms (spectral clustering, DBSCAN, the Louvain method) to examine the graph in more depth. We also pulled the data ourselves, combining an existing Spotify dataset from Kaggle (https://www.kaggle.com/zaheenhamidani/ultimate-spotify-tracks-db) with the Genius API.
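To illustrate the adjacency-matrix and spectral-clustering steps, here is a hand-rolled two-way spectral split in NumPy. It is a sketch of the technique, not the notebook's implementation (which may well rely on scikit-learn's `SpectralClustering`): it builds a symmetric weighted adjacency matrix, forms the normalized Laplacian, and splits nodes by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue). A k-way version would instead run k-means on the first k eigenvectors.

```python
import numpy as np


def adjacency_matrix(edges, n):
    """Symmetric weighted adjacency matrix from (i, j, weight) triples on nodes 0..n-1.

    Assumes no self-loops (a loop would be counted twice on the diagonal).
    """
    A = np.zeros((n, n))
    for i, j, w in edges:
        A[i, j] += w
        A[j, i] += w
    return A


def spectral_bipartition(A):
    """Split nodes in two by the sign of the Fiedler vector of the normalized Laplacian.

    Assumes a connected graph with no isolated nodes (every degree > 0).
    """
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # L_sym = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    fiedler = eigvecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    return fiedler >= 0


# Two triangles joined by a weak edge (weight 0.1) between nodes 2 and 3.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 0.1)]
A = adjacency_matrix(edges, 6)
labels = spectral_bipartition(A)
print(labels)
```

The sign of an eigenvector is arbitrary, so only the grouping matters: nodes 0 to 2 land on one side and nodes 3 to 5 on the other.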

Tools needed to run the project

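To run this project you will need the following Python libraries installed:

pandas
numpy
re (standard library)
networkx
random (standard library)
seaborn
collections (standard library)
matplotlib
sklearn (pip package: scikit-learn)
community (pip package: python-louvain)
bokeh
requests
BeautifulSoup (pip package: beautifulsoup4, imported from bs4)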
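Before launching the notebooks, a quick way to see which of these modules are missing is to ask the import system for each one without importing it. This is a convenience sketch, not part of the repository; the list uses import names (for example `bs4` for BeautifulSoup), and `find_spec` only checks that a module with that name exists, so it cannot tell python-louvain's `community` apart from an unrelated package of the same name.

```python
import importlib.util

# Import names (not pip package names) of the modules listed above.
REQUIRED_MODULES = [
    "pandas", "numpy", "re", "networkx", "random", "seaborn", "collections",
    "matplotlib", "sklearn", "community", "bokeh", "requests", "bs4",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All modules found" if not missing else "Missing: " + ", ".join(missing))
```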

If you want the data we pulled so that you can rerun the notebooks, send us a message. If you prefer to pull the data yourself, you will need to create a Genius API account to obtain an access token.
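For readers pulling the data themselves, here is a minimal sketch of the two pieces involved: building an authenticated Genius search request, and extracting performers, producers and writers from a song response. It is not the notebook's code. The endpoint `https://api.genius.com/search`, the `Authorization: Bearer <token>` header and the song fields `primary_artist`, `featured_artists`, `producer_artists` and `writer_artists` reflect the public Genius API as we understand it; check them against the official documentation. The helper names are hypothetical.

```python
from urllib.parse import urlencode

GENIUS_API = "https://api.genius.com"


def search_request(query, token):
    """Return (url, headers) for a Genius /search call; the token comes from your API account."""
    url = f"{GENIUS_API}/search?{urlencode({'q': query})}"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers


def song_collaborators(song_json):
    """Extract performer, producer and writer names from a /songs/{id} JSON response.

    Field names are assumptions based on the public Genius song object.
    """
    song = song_json["response"]["song"]
    performers = [song["primary_artist"]["name"]]
    performers += [a["name"] for a in song.get("featured_artists") or []]
    return {
        "performers": performers,
        "producers": [a["name"] for a in song.get("producer_artists") or []],
        "writers": [a["name"] for a in song.get("writer_artists") or []],
    }
```

With requests installed, the search itself would be `requests.get(url, headers=headers).json()`; song ids found in the results can then be fetched from `/songs/{id}` the same way and passed to `song_collaborators`.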

Note: some algorithms will not necessarily produce the same result as the one shown in the notebook when rerun, because they are randomized heuristics for optimization problems (e.g. the Louvain method).
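If you need repeatable communities, fixing the random seed usually suffices. The sketch below uses networkx's own Louvain implementation (`louvain_communities`, available since networkx 2.8) rather than the python-louvain `community` package used in the project; recent python-louvain releases also accept a `random_state` argument in `best_partition` (check your installed version). Zachary's karate club graph stands in for the collaboration graph.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

G = nx.karate_club_graph()


def run_louvain(graph, seed):
    """Detect communities with networkx's Louvain implementation and a fixed seed."""
    communities = louvain_communities(graph, seed=seed)
    # Frozensets make the partition comparable regardless of community order.
    return {frozenset(c) for c in communities}


first = run_louvain(G, seed=42)
second = run_louvain(G, seed=42)
print(first == second)  # the same seed on the same graph gives the same partition
print(round(modularity(G, first), 3))
```

Different seeds can still give different partitions with similar modularity; that variability is what the note above warns about.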

Note: since Bokeh output is not always cached in Jupyter notebooks, you will probably have to rerun the whole notebook (approx. 1 h 40 min) to be able to navigate through the graphs. Otherwise, you can find the pictures corresponding to each graph in the images folder of this repository. To go faster, set the sample size of the test dataset to 0.1 instead of 0.2 (the observations will not necessarily hold, but at least you will see the graphs).
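A minimal sketch of what a reduced sample looks like with pandas, assuming the songs live in a dataframe. `SAMPLE_FRAC` and `sample_songs` are hypothetical names, not the notebook's actual variables; a fixed `random_state` keeps the subset identical across reruns.

```python
import pandas as pd

# Hypothetical name: the notebook's own variable for this fraction may differ.
SAMPLE_FRAC = 0.1  # 0.2 in the full run


def sample_songs(songs: pd.DataFrame, frac: float = SAMPLE_FRAC, seed: int = 0) -> pd.DataFrame:
    """Return a reproducible random subset containing `frac` of the rows."""
    return songs.sample(frac=frac, random_state=seed)


songs = pd.DataFrame({"track": range(100)})
print(len(sample_songs(songs)))  # → 10
```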

Authors

Sacha Leblanc, Etienne Caquot, Grégoire Mayrhefer, Alexis Mermet
