Awesome Open Source

DS-Banner by Brandon Beckett

31 data science, machine learning, and data engineering projects I completed in 2020. To view each project, simply click on the title of the project and it will take you to the corresponding Jupyter notebook.


Profitable Google Play and Apple App Profiles

An analysis to find the most profitable categories of free apps on the Apple App Store and Google Play store.


Exploring Hacker News Posts

An analysis of Ask HN and Show HN posts from Hacker News to determine what type of posts receive the most attention and at what time.


Exploring eBay Car Sales

An analysis of used car listings from the German classifieds site eBay Kleinanzeigen.

Python Pandas NumPy

Visualizing Earnings Based on College Majors

An exploration of earnings of individuals after graduation based on their college majors, and a look at some statistics for each major.

Python Pandas Matplotlib

Visualizing the Gender Gap in College Degrees

An exploration of the gender gap in college degrees across the US.

Python Pandas Matplotlib

Clean and Analyze Employee Exit Surveys

The cleaning and analysis of exit survey data from employees of the Department of Education, Training, and Employment (DETE), and the Technical and Further Education Body (TAFE) of Queensland, Australia.

Python Pandas NumPy Matplotlib

Analyzing NYC High School Data

A look at whether standardized tests like the SAT are unfair to certain demographics, based on the correlations between SAT scores and demographic factors in New York City high schools.

Python Pandas NumPy Matplotlib Regex

Star Wars Survey

An analysis of Star Wars survey data from fans of Star Wars movies.

Python Pandas NumPy Matplotlib


Analyzing CIA Factbook Data

Exploration of the CIA World Factbook database that contains demographic information for every country.


Answering Business Questions

Use SQL to answer business questions about a fictional digital music store whose data is stored across 11 tables.

Python SQL

Popular Data Science Questions

An analysis of data extracted from the Data Science Stack Exchange to determine the best content to write about for an education company that creates data science books, online articles, videos, or interactive text-based platforms.

Python SQL Pandas NumPy Matplotlib Seaborn


Investigating Fandango Movie Ratings

An analysis of movie ratings to determine whether or not Fandango has changed their biased rating system in 2016.

Python Pandas NumPy Matplotlib

Finding the Best Markets to Advertise In

An analysis of survey data from new coders to determine the best markets to advertise a company's online programming courses in.

Python Pandas Matplotlib Seaborn

Mobile App for Lottery Addiction

Work through probability calculations to contribute to the development of a mobile app that aims to prevent and treat lottery addiction by helping people better estimate their chances of winning.

Python Pandas
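The kind of probability calculation involved can be sketched in a few lines. This is only an illustration, assuming a lottery where 6 numbers are drawn from a pool of 49 (the description above does not state the format); the function names are hypothetical and not taken from the notebook.

```python
from math import comb


def one_ticket_probability(numbers_drawn=6, pool_size=49):
    """Chance that a single ticket matches every drawn number.

    The 6-from-49 defaults are an assumption made for this sketch.
    """
    return 1 / comb(pool_size, numbers_drawn)


def probability_of_k_matches(k, numbers_drawn=6, pool_size=49):
    """Chance that a single ticket matches exactly k of the drawn numbers."""
    ways_to_hit = comb(numbers_drawn, k)
    ways_to_miss = comb(pool_size - numbers_drawn, numbers_drawn - k)
    return ways_to_hit * ways_to_miss / comb(pool_size, numbers_drawn)


if __name__ == "__main__":
    print(f"Jackpot with one ticket: {one_ticket_probability():.10f}")
    print(f"Exactly 3 matches: {probability_of_k_matches(3):.6f}")
```

Framing the result as "1 in 13,983,816" rather than a small decimal is the sort of presentation that helps people judge their chances.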

Building a Spam Filter with Naive Bayes

Learn about the practical side of the multinomial Naive Bayes algorithm by building a spam filter for SMS messages that classifies new messages as spam or non-spam with an accuracy greater than 95%.

Python Pandas Regex
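A minimal sketch of multinomial Naive Bayes with additive (Laplace) smoothing, written with the standard library only. The class name, the tokenizer, and the toy messages are assumptions made for illustration; the notebook itself works with Pandas on a real SMS dataset.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lowercase the message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", message.lower())


class NaiveBayesSpamFilter:
    """Toy multinomial Naive Bayes classifier with labels 'spam' and 'ham'."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength

    def fit(self, messages, labels):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.label_counts = Counter(labels)
        for message, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(message))
        self.vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        return self

    def _log_score(self, words, label):
        total_words = sum(self.word_counts[label].values())
        denominator = total_words + self.alpha * len(self.vocabulary)
        # Start from the log prior, then add one log likelihood per known word.
        score = math.log(self.label_counts[label] / sum(self.label_counts.values()))
        for word in words:
            if word in self.vocabulary:  # words never seen in training are skipped
                count = self.word_counts[label][word]
                score += math.log((count + self.alpha) / denominator)
        return score

    def predict(self, message):
        words = tokenize(message)
        spam_score = self._log_score(words, "spam")
        ham_score = self._log_score(words, "ham")
        return "spam" if spam_score > ham_score else "ham"


if __name__ == "__main__":
    messages = [
        "win cash prize now",
        "free prize claim now",
        "are we meeting for lunch",
        "see you at home tonight",
    ]
    labels = ["spam", "spam", "ham", "ham"]
    spam_filter = NaiveBayesSpamFilter().fit(messages, labels)
    print(spam_filter.predict("claim your free cash prize"))
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on longer messages.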

Winning Jeopardy

Perform hypothesis testing to see if there are any good potential strategies for winning Jeopardy.

Python Pandas NumPy Regex SciPy


Predicting Car Prices

Use the k-nearest neighbors algorithm to predict a car's market price using data containing technical attributes for various cars.

Python Pandas NumPy Matplotlib scikit-learn
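The idea behind k-nearest neighbors regression can be sketched without any libraries: find the k cars most similar to the one being priced and average their prices. This is a hand-rolled illustration rather than the scikit-learn workflow the notebook uses; the function name and the (horsepower, curb weight) features are assumptions.

```python
import math


def knn_predict_price(train_features, train_prices, query, k=3):
    """Average the prices of the k training cars closest to `query`.

    Closeness is plain Euclidean distance over the feature tuples.
    """
    by_distance = sorted(
        (math.dist(features, query), price)
        for features, price in zip(train_features, train_prices)
    )
    nearest = by_distance[:k]
    return sum(price for _, price in nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical (horsepower, curb weight) pairs and their prices.
    features = [(100, 2000), (110, 2100), (200, 3000), (210, 3100)]
    prices = [10000, 11000, 20000, 21000]
    print(knn_predict_price(features, prices, (105, 2050), k=2))
```

In practice the features would usually be rescaled first, since a column with large values (such as weight) otherwise dominates the distance.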

Predicting House Sale Prices

Explore ways to build and improve a linear regression model by working with housing data for the city of Ames, Iowa from 2006 to 2010.

Python Pandas NumPy Matplotlib scikit-learn Regex

Predicting the Stock Market

Work with historical data from the S&P500 Index to develop a linear regression model that predicts future S&P500 prices.

Python Pandas scikit-learn

Predicting Bike Rentals

Predict the number of bikes that people rent in a given hour by creating several machine learning models (linear regression, decision tree, random forest) and evaluating their performance.

Python Pandas NumPy Matplotlib scikit-learn

Creating a Kaggle Workflow

A look at all the necessary steps when attempting a Kaggle competition. Here we'll work with the most popular Kaggle competition for beginners and predict which passengers survived the sinking of the Titanic.

Python Pandas NumPy Matplotlib scikit-learn


Building a Handwritten Digits Classifier

Build models that can classify handwritten digits. Explore image classification, observe the limitations of traditional machine learning models for image classification, and improve some neural networks for image classification.

Python Pandas NumPy Matplotlib scikit-learn


Building Fast Queries on a CSV

Create a class with methods that answer business questions about online inventory, with a focus on the time and space complexity of the algorithms: preprocessing the data to speed up queries, and sorting and searching it efficiently.
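A rough sketch of the preprocessing idea: pay a one-time cost when the data is loaded so that later queries avoid scanning every row. The class name, the `id` and `price` columns, and the sample rows are hypothetical; the notebook's actual schema may differ.

```python
import bisect


class Inventory:
    """Inventory rows with indexes built once at load time."""

    def __init__(self, rows):
        self.rows = list(rows)
        # O(n) preprocessing: dictionary index for O(1) average-case id lookups.
        self.by_id = {row["id"]: row for row in self.rows}
        # O(n log n) preprocessing: sorted prices enable binary search.
        self.sorted_prices = sorted(row["price"] for row in self.rows)

    def get_by_id(self, item_id):
        """Return the row with this id, or None if it is absent."""
        return self.by_id.get(item_id)

    def count_in_price_range(self, low, high):
        """Count items with low <= price <= high in O(log n) time."""
        left = bisect.bisect_left(self.sorted_prices, low)
        right = bisect.bisect_right(self.sorted_prices, high)
        return right - left


if __name__ == "__main__":
    inventory = Inventory([
        {"id": "a1", "price": 300},
        {"id": "b2", "price": 500},
        {"id": "c3", "price": 500},
        {"id": "d4", "price": 1200},
    ])
    print(inventory.count_in_price_range(400, 1000))
```

The trade-off is extra memory for the indexes in exchange for faster repeated queries.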


Building a Database for Crime Reports

Create a database from scratch using Boston crime data, create user groups, and assign proper privileges to those groups.

Python SQL Psycopg2 CSV Postgres

Practice Optimizing Dataframes and Processing in Chunks

Work with a large financial lending dataset in chunks and optimize its memory usage.

Python Pandas NumPy

Analyzing Startup Fundraising Deals from Crunchbase

An analysis of startup fundraising deals using a large database from

Python Pandas SQL

Analyzing Wikipedia Pages

An analysis of 54 megabytes of Wikipedia data by implementing a grep function to search textual data.

Python Pandas OS Multiprocessing MapReduce

Analyzing Stock Prices

A statistical analysis of a large quantity of historical stock market data from Yahoo Finance.

Python Pandas Pickle

Evaluating Numerical Expressions

Use a stack data structure to implement a function that can evaluate complex numerical expressions stored as a string.
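One common way to do this is with two stacks, one for values and one for pending operators. The sketch below assumes that approach; the notebook may instead convert to postfix notation first. It handles `+`, `-`, `*`, `/`, and parentheses, and does not support unary minus.

```python
import operator
import re

# Operator -> (precedence, function). Higher precedence binds tighter.
OPERATORS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def evaluate(expression):
    """Evaluate an infix arithmetic expression given as a string."""
    tokens = re.findall(r"\d+\.?\d*|[-+*/()]", expression)
    values, pending = [], []

    def apply_top_operator():
        _, func = OPERATORS[pending.pop()]
        right = values.pop()
        left = values.pop()
        values.append(func(left, right))

    for token in tokens:
        if token in OPERATORS:
            # Apply stacked operators of equal or higher precedence first,
            # which gives left-to-right evaluation for ties.
            while (pending and pending[-1] != "("
                   and OPERATORS[pending[-1]][0] >= OPERATORS[token][0]):
                apply_top_operator()
            pending.append(token)
        elif token == "(":
            pending.append(token)
        elif token == ")":
            while pending[-1] != "(":
                apply_top_operator()
            pending.pop()  # discard the matching "("
        else:
            values.append(float(token))

    while pending:
        apply_top_operator()
    return values[0]


if __name__ == "__main__":
    print(evaluate("2 + 3 * (4 - 1)"))
```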


Implementing a Key-Value Database

Create a fully functional save-to-disk key-value store using a B-tree data structure.


Hacker News Pipeline

Build a Hacker News pipeline from a JSON API that will filter, clean, aggregate, and summarize data.

Python io JSON

Thank you for checking out my work! Please don't hesitate to contact me if you're interested in collaborating on a project, want to have a virtual chat, or are in Berlin and want to grab a ☕️
