Harry Potter and a Data Scientist: Write a multi-class classifier using gradient descent optimization algorithm to replace the bewitched Sorting Hat and save Hogwarts! 🎩🧙‍♂️
Harry Potter and a Data Scientist

Subject PDF:


Cook Book:


DSLR (Datascience X Logistic Regression)

Oh no! Since its creation, the famous school of wizards, Hogwarts, has never known such an offense. The forces of evil have bewitched the Sorting Hat. It no longer responds and is unable to fulfill its role of sorting the students into the houses.

The new academic year is approaching. Fortunately, Professor McGonagall was able to take action in such a stressful situation, since it is impossible for Hogwarts not to welcome new students. She decided to call on you, a muggle "data scientist" who can work miracles with the tool that all muggles know how to use: a "computer". Despite the intrinsic reluctance of many wizards, the director of the school welcomes you to his office to explain the situation. You are here because his informant discovered that you are able to recreate a magic Sorting Hat using your muggle tools.

You explain to him that for your "muggle" tools to work, you need student data. Hesitantly, Professor McGonagall gives you a dusty spellbook. Fortunately for you, a simple "Digitalis!" and the book turns into a USB stick.

Data Visualization


Which Hogwarts course has a homogeneous score distribution across all four houses?

python3 histogram.py -d

  • -d: Display all the histograms.
  • -f: Show histogram of the feature 'n'.
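
The homogeneity question can also be checked numerically. The following is a minimal sketch, not the code of histogram.py: the function name `house_histograms`, the bin count, and the toy scores are all assumptions made for illustration. It bins each house's scores on shared bin edges and returns relative frequencies, so a course whose four rows look alike is a candidate for a homogeneous distribution.

```python
def house_histograms(scores_by_house, n_bins=10):
    """Bin each house's scores on shared edges; return relative frequencies.

    Hypothetical helper: assumes every house has at least one score.
    """
    all_scores = [s for scores in scores_by_house.values() for s in scores]
    lo, hi = min(all_scores), max(all_scores)
    # Fall back to a width of 1.0 when every score is identical.
    width = (hi - lo) / n_bins or 1.0
    result = {}
    for house, scores in scores_by_house.items():
        counts = [0] * n_bins
        for s in scores:
            # Clamp so the maximum score lands in the last bin.
            idx = min(int((s - lo) / width), n_bins - 1)
            counts[idx] += 1
        result[house] = [c / len(scores) for c in counts]
    return result


# Toy scores (made up), two houses only to keep the example short.
toy = {
    "Gryffindor": [1.0, 2.0, 3.0, 4.0],
    "Slytherin": [1.5, 2.5, 3.5, 4.0],
}
for house, freqs in house_histograms(toy, n_bins=3).items():
    print(house, freqs)
```

Relative frequencies are used instead of raw counts so that houses with different numbers of students remain comparable.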

Screen Shot 1

Scatter plot

Which two features are similar?

python3 scatter_plot.py [-f1{n1} -f2{n2}]

  • -f1: specify the first feature to use.
  • -f2: specify the second feature to use.
  • n1 and n2: indexes of the features to use.
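
A scatter plot shows similarity visually; one way to quantify it is the Pearson correlation coefficient. This is a hedged sketch rather than part of scatter_plot.py: the function `pearson` and the toy values are assumptions. A coefficient close to 1 or -1 suggests the two features carry nearly the same information.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values (made up): the second feature is an exact negative multiple
# of the first, so the coefficient is close to -1.
feature_a = [1.0, 2.0, 3.0, 4.0]
feature_b = [-2.0, -4.0, -6.0, -8.0]
print(round(pearson(feature_a, feature_b), 6))
```

Rows with missing values would need to be dropped or filled before calling the function; the sketch assumes complete data.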

Screen Shot 2

Pair plot

python3 pair_plot.py

Screen Shot 3

Data Analysis:

Some features are homogeneous across houses or strongly correlated with other features, so they are not needed to train the model and would only produce a more complex hypothesis that risks overfitting. We chose to remove:

  • Arithmancy: homogeneous
  • Astronomy: similar to 'Defense Against the Dark Arts'
  • Transfiguration: somewhat similar to 'History of Magic'
  • Potions: somewhat homogeneous
  • Care of Magical Creatures: somewhat homogeneous

Training the model

python3 logreg_train.py [-BGD | -SGD] <train dataset>

  • -BGD: Batch Gradient Descent Algorithm
  • -SGD: Stochastic Gradient Descent Algorithm
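
The difference between the two flags can be sketched as follows. This is an illustration under stated assumptions, not the code of logreg_train.py: it assumes a one-vs-rest scheme (one binary logistic model per house), already centered features, and hypothetical names (`train_binary`, `train_one_vs_rest`, `predict`), learning rate, and epoch count. Batch gradient descent makes one update per epoch from the gradient averaged over all rows; stochastic gradient descent makes one update per row.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, x):
    # weights[0] is the bias; the rest align with the feature values in x.
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def train_binary(X, y, mode="BGD", lr=0.1, epochs=300, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        if mode == "BGD":
            # One update per epoch, using the gradient averaged over all rows.
            grad = [0.0] * len(weights)
            for x, target in zip(X, y):
                err = predict_proba(weights, x) - target
                grad[0] += err
                for j, xj in enumerate(x):
                    grad[j + 1] += err * xj
            weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
        else:
            # SGD: one update per row, visiting rows in shuffled order.
            order = list(range(len(X)))
            rng.shuffle(order)
            for i in order:
                err = predict_proba(weights, X[i]) - y[i]
                weights[0] -= lr * err
                for j, xj in enumerate(X[i]):
                    weights[j + 1] -= lr * err * xj
    return weights


def train_one_vs_rest(X, labels, mode="BGD"):
    # One binary model per house: target 1.0 for that house, 0.0 otherwise.
    return {
        house: train_binary(X, [1.0 if lab == house else 0.0 for lab in labels], mode)
        for house in sorted(set(labels))
    }


def predict(models, x):
    # Pick the house whose binary model gives the highest probability.
    return max(models, key=lambda house: predict_proba(models[house], x))


# Toy data (made up): two centered features, one quadrant per house.
X = [[1.0, 1.0], [0.8, 1.2], [1.0, -1.0], [1.2, -0.8],
     [-1.0, 1.0], [-0.8, 1.2], [-1.0, -1.0], [-1.2, -0.8]]
labels = ["Gryffindor", "Gryffindor", "Hufflepuff", "Hufflepuff",
          "Ravenclaw", "Ravenclaw", "Slytherin", "Slytherin"]
for mode in ("BGD", "SGD"):
    models = train_one_vs_rest(X, labels, mode)
    print(mode, [predict(models, x) for x in X])
```

On real course scores the features would normally be standardized first, since gradient descent converges slowly when features have very different scales.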

Outputs a file named ./weights.csv that contains the weights of the model.

At the end of training, the program outputs the following, computed on the training set:

  • Accuracy of the model 98.06%
  • Confusion Matrix
  • F1 Score
  • Balanced Accuracy 98.71%
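
The four metrics above can all be derived from the confusion matrix. The sketch below uses common definitions that are assumptions here, since the README does not state which variants the program uses: balanced accuracy as the mean of per-class recall, and F1 as the macro average over classes. The function names and toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def balanced_accuracy(matrix):
    # Mean of per-class recall; classes with no true samples are skipped.
    recalls = [row[i] / sum(row) for i, row in enumerate(matrix) if sum(row) > 0]
    return sum(recalls) / len(recalls)


def macro_f1(matrix):
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(matrix))) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (made up) with a single misclassified Gryffindor student.
classes = ["Gryffindor", "Ravenclaw", "Slytherin"]
y_true = ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin", "Ravenclaw"]
y_pred = ["Gryffindor", "Slytherin", "Slytherin", "Slytherin", "Ravenclaw"]
m = confusion_matrix(y_true, y_pred, classes)
print(m)
print(accuracy(m), balanced_accuracy(m), macro_f1(m))
```

Balanced accuracy weights every house equally, so it can differ from plain accuracy when the houses have different numbers of students.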

Predict with the model

python3 logreg_predict.py [-p] <dataset> <weights>

  • -p: Print the results with the students' names to stdout.

Outputs a file named ./houses.csv that contains the indexes and the predicted house assigned to each student.
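
Writing such a file could look like the sketch below. The column names `Index` and `Hogwarts House` are assumptions, since the README only says the file holds the indexes and predicted houses, and `write_predictions` is a hypothetical helper rather than a function of logreg_predict.py. It writes to any file-like object, so the example uses an in-memory buffer instead of touching ./houses.csv.

```python
import csv
import io


def write_predictions(predicted_houses, file_obj):
    """Write one row per student: its index and its predicted house."""
    writer = csv.writer(file_obj, lineterminator="\n")
    writer.writerow(["Index", "Hogwarts House"])  # assumed header names
    for index, house in enumerate(predicted_houses):
        writer.writerow([index, house])


# Example with an in-memory buffer; a real run would instead use
# open("houses.csv", "w", newline="") as the file object.
buffer = io.StringIO()
write_predictions(["Gryffindor", "Slytherin"], buffer)
print(buffer.getvalue())
```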

Packages needed

  • pip3 install pandas
  • pip3 install matplotlib
  • pip3 install seaborn


  • ✔️ 125 [ Accuracy: (training data: 98.06%) (evaluation data: 99%) ]
  • Miss McGonagall is very happy with the results 🎉🥳


  • Oussama Belouche 1337
  • Anas Elouargui 1337
