Awesome Open Source

Trend-Analyzer

Analyzes trends in anticipation for upcoming movies.

This project analyzes trends in movie anticipation by scraping data from IMDb and Twitter, then performing sentiment analysis on the tweets to identify the positive ones. Finally, it plots the resulting time series data to reveal patterns in movie anticipation.

The workflow takes place in three steps:

  1. Scraping of data
    In the scraper folder of the repo there are two scripts for scraping:
    a) movies.py - scrapes the popularity of movies from IMDb and stores it in MongoDB.
    (This code is run from crontab every 10 minutes.)
    b) tweets.py - scrapes tweets from Twitter and stores them in a text file.

  2. Modeling
    In the models folder of the repo there are three scripts for modeling:
    a) sentiment_analysis.py - performs sentiment analysis on the extracted tweets and finds the percentage of positive tweets pertaining to a movie.
    b) sent_pre_trained_naive_Bayes.py - trains a Naive Bayes model on the training data and stores the model as an object, so that the model does not have to be retrained every time sentiment analysis is performed.
    c) sent_predict.py - predicts the sentiment of the extracted tweets using the pretrained model and stores the percentage of positive tweets for each movie in MongoDB.
    (This code is run from crontab every 10 minutes.)

  3. Visualization
    In the visualization folder of the repo there are two scripts for visualizing the time series data:
    a) visualization.py - plots the popularity of movies, as extracted from IMDb by movies.py, against time.
    b) visuallize_timeseries.py - plots the percentage of positive tweets for each movie, as stored in MongoDB by sent_predict.py, against time.
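
The modeling step (train a Naive Bayes model once, store it as an object, then reload it to score new tweets) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in the repo: the README does not say which library the scripts use, so the from-scratch classifier, the function names, the toy training tweets, and the record layout are all assumptions.

```python
import math
import pickle
from collections import Counter, defaultdict
from datetime import datetime, timezone


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would clean tweets further."""
    return text.lower().split()


def train_naive_bayes(labeled_tweets):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of tweets
    vocabulary = set()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return {
        "word_counts": {label: dict(c) for label, c in word_counts.items()},
        "label_counts": dict(label_counts),
        "vocabulary": vocabulary,
    }


def predict(model, text):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    total_tweets = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, float("-inf")
    for label, n_tweets in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_tweets / total_tweets)
        for word in tokenize(text):
            if word not in model["vocabulary"]:
                continue  # ignore words never seen in training
            score += math.log((counts.get(word, 0) + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def percent_positive(model, tweets):
    """Percentage of tweets predicted as 'pos'; 0.0 for an empty batch."""
    if not tweets:
        return 0.0
    positives = sum(1 for t in tweets if predict(model, t) == "pos")
    return 100.0 * positives / len(tweets)


# Train once and serialize, so later runs can skip training.
# The training tweets below are made up for illustration.
training_data = [
    ("cannot wait for this movie", "pos"),
    ("the trailer looks amazing", "pos"),
    ("so excited love the cast", "pos"),
    ("this looks terrible", "neg"),
    ("boring trailer not interested", "neg"),
    ("awful cast will skip it", "neg"),
]
saved_model = pickle.dumps(train_naive_bayes(training_data))

# Later (for example on each scheduled run): reload the model and score new tweets.
model = pickle.loads(saved_model)
new_tweets = ["love the trailer", "looks boring", "so excited", "awful"]
record = {
    "movie": "Example Movie",  # hypothetical title
    "percent_positive": percent_positive(model, new_tweets),
    "timestamp": datetime.now(timezone.utc),
}
print(record["percent_positive"])
```

A timestamped record like the one built at the end is the kind of data point that could be written to MongoDB on each run and later plotted against time. In practice the serialized model would be written to a file with pickle.dump rather than kept in memory.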