Stock Market Prediction Via Google Trends

Attempt to predict future stock prices based on Google Trends data.


The data used is downloaded from Google Trends. The concept for this project came from the research paper "Quantifying Trading Behavior in Financial Markets Using Google Trends" by Tobias Preis, Helen Susannah Moat, and H. Eugene Stanley. This research found that the search volume for certain financial terms is linked to the price of the Dow Jones Industrial Average, and can in most cases predict a dip in the market. The purpose of this project is to combine this research with machine learning.


Two machine learning algorithms were explored for this project: XGBoost and MLPClassifier. The MLPClassifier clearly performed better than XGBoost. The best annual return achieved by XGBoost was 44.2%, whereas the best MLPClassifier model achieved 91.3% between 2008 and the present. A big contributor to these very high annual returns was the coronavirus: it caused a stock market crash, which could be a major source of profit for these algorithms.
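For context, an annual return figure is usually derived from a total return by compounding. The following is a minimal sketch, assuming the standard compound annual growth rate formula; the README does not state how its figures were computed, and `annualized_return` is a hypothetical helper, not part of this project.

```python
def annualized_return(start_value: float, end_value: float, years: float) -> float:
    """Constant yearly return that turns start_value into end_value over `years` years.

    Assumption: compound annual growth rate; the project may use another definition.
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Illustrative numbers only, not results from this project:
# doubling a portfolio in two years is about 41.4% per year.
rate = annualized_return(100.0, 200.0, 2.0)
print(f"{rate:.1%}")
```

Because returns compound, a single very profitable period (such as a correctly predicted crash) can lift the annualized figure for the whole test window.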


MLPClassifier performed very well on the test data. The algorithm learned that the small changes in the market between crashes were impossible to predict. Thus, for the most part, it followed a buy-and-hold strategy, but during a stock market crash (like the one caused by the coronavirus) or other, slightly bigger, changes, it performed well, as can be seen in Figure 8.

Comparison of the MLPClassifier, 10,000 random simulations and a buy-and-hold strategy

Figure 8. Comparison of the mean plus and minus 1 standard deviation of 10,000 random simulations, the MLPClassifier algorithm, and a buy-and-hold strategy.
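A random baseline like the one in Figure 8 can be sketched as follows. This is only an illustration under stated assumptions: the README does not specify how a random simulation trades, so here each day is a coin flip between being invested and holding cash, and only the final portfolio value is summarised. All function names are hypothetical.

```python
import random
import statistics


def simulate_random_strategy(daily_returns, rng):
    """Final value (starting at 1.0) when each day is a coin flip
    between being invested in the index and holding cash (assumption)."""
    value = 1.0
    for r in daily_returns:
        if rng.random() < 0.5:  # invested today
            value *= 1.0 + r
    return value


def random_baseline(daily_returns, n_simulations=10_000, seed=0):
    """Mean and sample standard deviation of the final value over many random runs."""
    rng = random.Random(seed)
    finals = [simulate_random_strategy(daily_returns, rng) for _ in range(n_simulations)]
    return statistics.mean(finals), statistics.stdev(finals)


def buy_and_hold(daily_returns):
    """Final value (starting at 1.0) when invested on every day."""
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + r
    return value


# Made-up daily returns for illustration only.
returns = [0.01, -0.02, 0.015, -0.03, 0.005]
mean_final, std_final = random_baseline(returns, n_simulations=1_000)
print(mean_final - std_final, mean_final + std_final, buy_and_hold(returns))
```

A model whose final value lies well above the mean plus one standard deviation of the random runs is doing better than chance would typically explain.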


XGBoost did not have the insight that the MLP had. It tried to predict the small changes and ultimately failed at this. However, XGBoost was still able to predict the stock market crash caused by the coronavirus, which is why it still had such a large annual return (44.2%).

Comparison of the MLPClassifier, 10,000 random simulations and a buy-and-hold strategy

Figure 9. Comparison of the mean plus and minus 1 standard deviation of 10,000 random simulations, the MLPClassifier algorithm, the XGBoost algorithm, and a buy-and-hold strategy.


Data Collection

Two datasets were needed for this project: the daily Google Trends data for a specific keyword and the daily stock price data for a specific ticker. To collect the daily Google Trends data, you have to download all 6-month increments, all 5-year increments, and the 2004 to present data within the 2004 to 2020 timespan. All this data is later adjusted so that it is relative across the whole timespan, instead of only within its own increment. To collect the daily stock price data for the ticker you want to predict, download it from a website like Yahoo Finance, where you can download the historical data of any ticker.

Data Visualisation


To show that there is indeed a correlation between Google Trends data (e.g. 'debt') and stock prices (e.g. the Dow Jones Industrial Average), I plotted the DJIA stock price with indicators of peaks in the search volume for "stock market". As you can see, before a major stock market crash, there are usually some peaks to be observed. There are also some peaks in the middle of a crash, but the peaks before the crash are quite indicative.

DJIA stock price data with peak-indicators of 'stock market'.

Figure 1. The stock price of the DJIA, with red dots marking peaks in the search volume for "stock market". The graph shows that erratic movement in search volume precedes a major stock market crash.
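One simple way to flag peaks like the red dots in Figure 1 is sketched below. The README does not say how peaks were detected, so the rule here is an assumption: a day counts as a peak when its search volume exceeds the mean of the preceding window by more than a chosen number of standard deviations. The function name and default parameters are hypothetical.

```python
import statistics


def find_peaks(values, window=7, threshold=2.0):
    """Indices where a value exceeds the trailing-window mean by more than
    `threshold` trailing (population) standard deviations.

    Note: after a completely flat window, any increase is flagged.
    """
    peaks = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.mean(history)
        std = statistics.pstdev(history)
        if values[i] > mean + threshold * std:
            peaks.append(i)
    return peaks


# Made-up search volumes with one obvious spike at index 7.
volume = [10, 11, 10, 11, 10, 11, 10, 30, 11, 10]
print(find_peaks(volume))
```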


After all adjustments, the daily data is relative across the entire timespan and looks as follows:

Adjusted daily data over entire timespan.

Figure 2. A graph in which the adjusted daily data is visualised.


All data on Google Trends is relative (0 to 100) within one timeframe. You can only get daily data in 6-month increments and weekly data in 5-year increments; only monthly data is provided for the entire available timespan. Aggregating all the data needed for this project was therefore quite a challenge, and because of these restrictions the result isn't completely accurate. However, the method I used was the only way to get daily data over the entire available timespan, which is crucial for this project.


To get all the data relative to each other, instead of only within its 6-month increment, I had to merge the increments based on the weekly data. However, the weekly data is only available in 5-year increments, so I had to merge these 5-year increments based on the monthly data, which is available for the whole timespan needed for this project. To merge all the 6-month and 5-year increments, I computed the percentage change of each data point within its respective increment. Afterwards, I took one data point per increment from the next coarser data (weekly for the daily increments, monthly for the weekly increments) and computed the remaining days by applying the percentage changes to that data point.
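The rescaling step described above can be sketched as follows. This is a simplified illustration, not the project's actual code: `rescale_increment` is a hypothetical helper, and it multiplies by the ratio to the anchor day, which is equivalent to chaining percentage changes outward from the anchor but also tolerates zero values between the anchor and the day being computed.

```python
def rescale_increment(daily, anchor_index, anchor_value):
    """Rescale one increment of relative daily values so that the day at
    anchor_index equals anchor_value, preserving the relative changes
    between days within the increment."""
    base = daily[anchor_index]
    if base == 0:
        raise ValueError("anchor day has zero search volume; pick another anchor")
    return [anchor_value * v / base for v in daily]


# Two made-up 6-month increments, each scaled 0 to 100 on its own.
increment_a = [50, 100, 75]
increment_b = [100, 50, 25]

# Made-up anchor values taken from the coarser (weekly) series,
# which is on one common scale across both increments.
adjusted_a = rescale_increment(increment_a, anchor_index=1, anchor_value=40)
adjusted_b = rescale_increment(increment_b, anchor_index=0, anchor_value=80)
print(adjusted_a + adjusted_b)
```

In this toy example both increments peak at 100 on their own scale, but after rescaling the second increment's peak is twice as high as the first, which is the information the per-increment scaling hides.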


An example for the search term 'debt' (the best search term to predict market change, according to the research mentioned earlier) in the timespan 2007 to 2009:

Before adjustments

Before adjustments of example.

Figure 3. A graph where the unadjusted relative daily data is visualised. The black vertical lines indicate the edges of the 6-month increments.

After adjustments

After adjustments of example.

Figure 4. A graph where the adjusted relative daily data is visualised. The graph follows the actual weekly data much better.


Actual monthly data.

Figure 5. The actual weekly data.


To get better results, features had to be engineered from the raw data. Two of the features used, the simple moving average delta and Bollinger bands, are illustrated in Figures 6 and 7.

After these features are computed, each of them is shifted by 3 through 10 days. This is because Google Trends data only becomes available three days after the fact, and the target may correlate well with data shifted further. This results in 272 features. The 50 features that correlate best with the target, according to the Pearson correlation coefficient, are used to train the model and predict the direction of the Dow Jones Industrial Average.
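The shifting and selection step can be sketched as follows. This is a minimal illustration under assumptions: all names are hypothetical, and ranking by the absolute value of the correlation is an assumption, since the README does not say whether strongly negative correlations count. Shifting by 3 through 10 days gives eight lagged copies per feature, which is consistent with 272 features coming from 34 base features (272 / 8), although the README does not list them.

```python
import math


def shift(series, days):
    """Lag a series: position t holds the value from t - days; the first `days` entries are None."""
    return [None] * days + list(series[:len(series) - days])


def pearson(x, y):
    """Pearson correlation over positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def top_k_shifted_features(features, target, lags=range(3, 11), k=50):
    """Build lagged copies of every feature and return the k names whose
    absolute Pearson correlation with the target is largest (assumption)."""
    scored = {}
    for name, series in features.items():
        for lag in lags:
            scored[f"{name}_shift{lag}"] = abs(pearson(shift(series, lag), target))
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Made-up data: the target simply repeats feature "a" three days later.
a = [0, 1, 4, 2, 7, 3, 9, 5, 8, 6, 2, 1, 0, 4, 3, 7, 5, 9, 8, 6]
target = [0, 0, 0] + a[:17]
print(top_k_shifted_features({"a": a, "flat": [1] * 20}, target, k=1))
```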

Simple Moving Average Delta

SMA delta.

Figure 6. When this feature becomes more volatile, the close price follows. This is a good indicator for a machine learning algorithm. It can also be seen that the close price percentage change loosely follows the line of the feature.
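The README does not define the simple moving average delta. The sketch below assumes it is the difference between a short-window and a long-window simple moving average; the window lengths and function names are hypothetical, and the project may compute the feature differently (for example, as the day-over-day change of a single moving average).

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i + 1 >= window else None)
    return out


def sma_delta(values, short_window=5, long_window=20):
    """Short SMA minus long SMA (assumed definition); None until both are defined."""
    short = sma(values, short_window)
    long_ = sma(values, long_window)
    return [
        s - l if s is not None and l is not None else None
        for s, l in zip(short, long_)
    ]


# Made-up values: on a steadily rising series the short average stays above the long one.
print(sma_delta([1, 2, 3, 4, 5, 6], short_window=2, long_window=4))
```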

Bollinger Bands

Bollinger bands.

Figure 7. When the 20-day simple moving average crosses the upper Bollinger band, the close price becomes more volatile. The stock close percentage change also loosely follows the lower Bollinger band.
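Bollinger bands are conventionally a moving average plus and minus a multiple of the rolling standard deviation. The sketch below uses the common defaults of a 20-day window and 2 standard deviations; the function name is hypothetical, and using the population standard deviation is an assumption, since the README does not state which parameters or series the project uses.

```python
import statistics


def bollinger_bands(values, window=20, num_std=2.0):
    """Return (middle, upper, lower) lists; entries are None until a full window is available."""
    middle, upper, lower = [], [], []
    for i in range(len(values)):
        if i + 1 < window:
            middle.append(None)
            upper.append(None)
            lower.append(None)
            continue
        chunk = values[i + 1 - window:i + 1]
        sma = statistics.fmean(chunk)
        std = statistics.pstdev(chunk)  # population standard deviation (assumption)
        middle.append(sma)
        upper.append(sma + num_std * std)
        lower.append(sma - num_std * std)
    return middle, upper, lower


# Made-up values with a short window so the result is easy to check by hand.
mid, up, low = bollinger_bands([1, 2, 3, 4, 5], window=3)
print(mid)
```

The bands widen when recent values are volatile and narrow when they are calm, which is why they are a natural input for a model that tries to anticipate larger market moves.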

Project Organisation

    ├── LICENSE
    ├── Makefile           <- Makefile with commands like `make data` or `make train`
    ├──          <- The top-level README for developers using this project.
    ├── data
    │   ├── processed      <- The final, canonical data sets for modeling.
    │   └── raw            <- The original, immutable data dump.
    ├── docs               <- A default Sphinx project; see for details
    ├── models             <- Trained and serialized models, model predictions, or model summaries
    ├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
    │                         the creator's initials, and a short `-` delimited description, e.g.
    │                         `1.0-jqp-initial-data-exploration`.
    ├── references         <- Data dictionaries, manuals, and all other explanatory materials.
    ├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
    │   └── figures        <- Generated graphics and figures to be used in reporting
    ├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
    │                         generated with `pip freeze > requirements.txt`
    ├──           <- makes project pip installable (pip install -e .) so src can be imported
    └── src                <- Source code for use in this project.
        ├──    <- Makes src a Python module
        ├── data           <- Scripts to download or generate data
        │   └──
        └── features       <- Scripts to turn raw data into features for modeling


MIT License

Copyright (c) 2020 Cristian Perez Jensen
