Awesome Open Source


     DATA SCIENCE ROADMAP 🏴‍☠️ 2022

Data Science Roadmap for anyone interested in how to break into the field!

This repository is intended to provide a free Self-Learning Roadmap to learn the field of Data Science. I provide some of the best free resources.


  Our Previous Roadmap
   ⚠️ Before we start, ⚠️

If you don't know what Data Science is, what the project life cycle looks like (from Business Understanding to Deployment), which programming language to choose, what the job descriptions ask for, which soft and hard skills the field requires, where Data Science is applied, or what the most common mistakes are, then

📌This Video is for you (Highly Recommended ✔️)

Data Science vs Data Analytics vs Data Engineering - What's the Difference?



These terms are often wrongly used interchangeably. There are distinct differences:

🔸 Data Science: A multidisciplinary field that focuses on looking at raw and structured data sets and providing potentially actionable insights. It is concerned with asking the right questions rather than finding exact answers. Data Scientists require skill sets centered on Computer Science, Mathematics, and Statistics. They analyze data with techniques such as machine learning, trend analysis, linear regression, and predictive modeling. The tools they use to apply these techniques include Python and R.

🔸 Data Analytics: Focuses on looking at existing data sets and creating solutions to capture, process, and organize data in order to draw actionable insights. This field looks for general process, business, and engineering improvements we can make based on questions we don't yet know the answers to. Data Analysts require skill sets centered on Statistics, Mathematics, and a high-level understanding of Computer Science. The work involves data cleaning, data visualization, and simple modeling. Common Data Analytics tools include Microsoft Power BI, Tableau, and SQL.

🔸 Data Engineering: Focuses on creating the infrastructure and tools required to support the business. Data Engineers look for the optimal ways to store and extract data, which involves writing scripts and building data warehouses. Data Engineering requires skill sets centered on Software Engineering, Computer Science, and high-level Data Science. The tools Data Engineers use are mainly Python, Java, Scala, Hadoop, and Spark.

Prepare your workspace

Tip 1️⃣ : Pick one and stick to it. (📁Click)


Anaconda: It's a toolkit that covers everything you need to write and run code, from the PowerShell prompt to Jupyter Notebook and PyCharm, and even RStudio (if you are interested in trying R).


Atom: A more advanced Python interface, highly recommended by experts.
Google Colab: It's like a Jupyter Notebook but in the cloud. You don't need to install anything locally. All the important libraries are already installed, for example NumPy, Pandas, Matplotlib, and scikit-learn.
PyCharm: PyCharm is another excellent IDE that enables you to integrate with libraries such as NumPy and Matplotlib, allowing you to work with array viewers and interactive plots.
Thonny: Thonny is an IDE for teaching and learning programming. It is equipped with a debugger, supports code completion, and highlights syntax errors.
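
Whichever workspace you pick, it helps to confirm that the libraries named above are actually available before starting a course. The snippet below is a minimal sketch using only the standard library; the `LIBRARIES` mapping and the `check_libraries` helper are illustrative names, not part of any tool listed here.

```python
from importlib.util import find_spec

# Display name -> import name (note: scikit-learn is imported as "sklearn").
LIBRARIES = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def check_libraries(libraries):
    """Return a dict mapping each display name to True if it can be imported."""
    return {name: find_spec(module) is not None for name, module in libraries.items()}


if __name__ == "__main__":
    for name, installed in check_libraries(LIBRARIES).items():
        print(f"{name}: {'installed' if installed else 'missing'}")
```

If something is reported as missing in a local setup, installing it through your environment's package manager (for example `pip` or `conda`) is usually enough.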

Most learning platforms have integrated code exercises where you don't need to install anything locally. But to learn it right, you should have an IDE installed on your local machine. The choice is a crowded marketplace with many options and few real differences from one platform to another.

Tip 2️⃣ : Focus on one course at least.

Tip 3️⃣ : Don't chase certifications.

Tip 4️⃣ : Don't rush into ML without a good background in programming & maths.

This track is divided into 3 phases ⬇️ :

  1. Beginner: you get a basic understanding of data analysis, tools and techniques.

  2. Intermediate: dive deeper in more complex topics of ML, Math and data engineering.

  3. Advanced: where we learn more advanced Math, DL and Deployment.

🔔 For DataCamp courses, the GitHub Student Developer Pack gives 3 free months. Google how to get it.
If you have already used it, do not hesitate to contact us to get an account with free access.🌺

Legend

  • 📹 Video Content
  • 📕 Online Article Content / Book

💡 Roadmap Explanation ▶️ Youtube Video 🎥


🔰 Beginner 🔰

Algorithms Book Every piece of code could be called an algorithm, but this book covers the more interesting bits.
Specializations (data structures-algorithms)

1. Descriptive Stats.
   📹 Intro to descriptive statistics
   📕 Online statistics education
   📕 Intro to descriptive statistics Article1 & Article2
   📹 Arabic Course
   📹 Intro to Inferential Statistics++
   📕 Practical Statistics for Data Scientists
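
As a quick taste of what the descriptive statistics resources above cover, here is a minimal sketch that summarizes a small sample with Python's built-in `statistics` module. The `scores` data and the `describe` helper are made up for illustration; the courses go much deeper.

```python
import statistics

# Hypothetical sample (made-up numbers, for illustration only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),      # central tendency
        "median": statistics.median(values),  # middle value, robust to outliers
        "mode": statistics.mode(values),      # most frequent value
        "pstdev": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(scores)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Once you move on to Pandas, `DataFrame.describe()` gives a similar summary for every numeric column at once.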

2. Probability
   📹 Khan Academy
   📹 Arabic Course
   📕 Introduction to Probability

3. Python
   📹 Introduction to Python Programming
   📹 OOP
   📹 Arabic - Hassouna | Elzero
   📹 Python Full Course - FreeCodeCamp on YouTube
   📕 Intro to Python for CS and Data Science
   more in OOP
4. Pandas
   📹 Corey Schafer-Youtube
   📕 Kaggle
   📕 Docs
   📹 Data School-Youtube
   📹 Arabic Course
5. Numpy
   📕 Kaggle
   📹 Arabic Course
   📕 Tutorial
   📕 Docs
6. Scipy
   📕 Tutorial
   📕 Docs
7. Data Cleaning: One of the MOST important skills that you need to master to become a good data scientist, you need to practice on many datasets to master it.
   Read this
   📹 Course 1
   📕 Notebook1
   📕 Notebook2
   📕 Notebook3
   📕 Kaggle Data cleaning
8. Data Visualization 📊
   📹 Introduction to Data Visualization with Matplotlib or
   📹 Corey Schafer - Playlist on Youtube or
   📹 sentdex - Playlist on YouTube
   📕 Kaggle to Data Visualization with Seaborn
   📹 Playlist-Youtube
   📹 Course1: Intro to Data Visualization with Seaborn
   📹 Course2: Intermediate Data Visualization with Seaborn
   📹 Course3: Understanding and Visualizing with Python

9. EDA Note: it's already mentioned in the above probability course
   📹 DataCamp-EDA in Python
   📹 IBM-EDA for Machine Learning

10. Dashboards
Tableau
   📕 Tutorial
   📹 docs
   📹 course
Power BI
   📹 Power BI Desktop - Coursera
   📹 Power BI training
   📹 Arabic - Youtube

11. SQL and DB
   📹 Intro to SQL or IBM
   📹 Intro to Relational Databases in SQL
   📹 Arabic Course
   📹 Joining Data in SQL
   📝 Practice HackerRank & DataLemur
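
To practice joins without installing a database server, Python's built-in `sqlite3` module is enough. The sketch below assumes two hypothetical tables, `customers` and `orders`, with made-up rows; it joins them and aggregates per customer, which is the kind of query the courses above teach.

```python
import sqlite3


def top_customers(conn):
    """Return (name, number of orders, total amount) per customer, highest total first."""
    query = """
        SELECT c.name,
               COUNT(o.id) AS n_orders,
               COALESCE(SUM(o.amount), 0) AS total
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id  -- keep customers with no orders
        GROUP BY c.id, c.name
        ORDER BY total DESC, c.name
    """
    return conn.execute(query).fetchall()


# In-memory database with illustrative data (nothing is written to disk).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Bilal'), (3, 'Chen');
    INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 20.0), (3, 2, 15.5);
""")

rows = top_customers(conn)
for name, n_orders, total in rows:
    print(name, n_orders, total)
```

A `LEFT JOIN` is used so that customers without any orders still appear in the result; switching to `INNER JOIN` would drop them.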

12. Python Regular Expression
   📕 Tutorial
13. Time Series Analysis
   📹 Track
   📕 Book
   📕 fbprophet
   📹 Arabic Source Video1 & Video2

At the end of the Beginner phase, apply everything you've learned in a project.


🔰 Intermediate 🔰

1. Math for ML: consists of Linear Algebra, Calculus and PCA.
📹 Specialization
📹 Mathematics for Machine Learning - Most of the needed basics

🔹Linear Algebra
   📹 Khan Academy - Linear Algebra
   📹 Mathematics for Machine Learning: Linear Algebra
   📹 3Blue1Brown - Essence of Linear Algebra
🔹Calculus
   📹 Multivariate Calculus - Coursera
   📹 Essence of calculus - Youtube
🔹PCA
   📹 PCA - Coursera

2. Machine Learning
   📹 Coursera Free Course by Andrew Ng (Octave/Matlab)
   📹 Coursera Andrew Ng's new ML Specialization (Python)
   📹 Machine Learning Stanford Full Course on YouTube by Andrew Ng
   📹 CS480/680 Intro to Machine Learning - Spring 2019 - University of Waterloo
   📹 SYDE 522 Machine Intelligence (Winter 2018, University of Waterloo)
   📹 Introduction to Machine Learning Course - Udacity
   📹 Hesham Asem - Arabic content
   📹 IBM ML with Python
   📹 Machine Learning From Scratch - YouTube (Python Engineer)
   📕 Hands On ML (1st & 2nd & 3rd) Editions | example code 'Notebooks'
   📹 ML Algorithms in Practice
   📹 ML scientist
   📹 Project

3. Web Scraping/APIs
   📹 course
   📕 intro2
   📕 Tutorial
   📕 Book for both topics
APIs
   📕 Tutorial
   📕 Article
   📕 Tutorial
4. Stats.
   📕 Think Stats - Book
   📕 Think Bayes - Book
5. Advanced SQL
   📹 More advanced SQL
   📹 Joining Data in SQL

7. Feature Engineering
   📕 Tutorial
   📕 Article
   📕 Book
8. Interpret Shapley-based explanations of ML models.
   📕 SHAP
   📕 Kaggle ML explainability

After finishing this level, apply what you have learned to 2 or 3 good-sized projects.

Read this book, please 📖 Introduction to Statistical Learning with Applications in R


🔰 Advanced 🔰

1. Deep Learning
   📹 Deep Learning Fundamentals
   📹 Introduction to Deep Learning - MIT
   📹 Specialization
   📕 Dive into Deep Learning (En) | (Ar) version ➡️Part1 & Part2
   📹 Deep Learning UC Berkeley
   📕 github of Dive into DL
   📹 Stanford Lecture - Convolutional Neural Networks for Visual Recognition
   📹 University of Waterloo - ML / DL

2. Tensorflow
   📹 Specialization
   📹 Youtube
    fast.ai's Deep Learning Courses

3. Advanced Data Science
   📹 Advanced Data Science with IBM Specialization
4. NLP
   📹 Specialization
   📹 Arabic - Ahmed El Sallab
   📹 Introduction to Natural Language Processing in Python

5. Inferential Statistics
   📹 Specialization, 2nd & 3rd courses
   📹 course
6. Bayesian Statistics
   📹 1 - From Concept to Data Analysis
   📹 2 - Techniques and Models
   📹 3 - Mixture Models
7. Model Deployment
   📕 Flask tutorial
   📹 TensorFlow: Data and Deployment Specialization
   📹 Deploy Models with TensorFlow Serving and Flask
   📹 How to Deploy a Machine Learning Model to Google Cloud - Daniel Bourke
   If you're interested in more deployment methods, search for (FastAPI - Heroku - chitra)

8. Probabilistic Graphical Models

   📹 Specialization


Tasks and Projects will be added soon.


📌 Common Tools ⤵️

   Anaconda
   Git
   Course - Udacity
   Arabic - Youtube

📌 More Books ~ 📌 Check This!

   📕 🔥 65 Free Important Books 🔥
   📕 Mathematics for Machine Learning
   📕 An Introduction to Statistical Learning
   📕 Understanding Machine Learning: From Theory to Algorithms
   📕 Probabilistic Machine Learning: An Introduction
   📕 storytelling with data Important data visualization guide.


📌 Collection of the best Cheat sheets

  1. Importing Data

  2. Pandas

   - (1)    - (2)    - (3)

  1. Matplotlib

  2. Seaborn

  3. Probability

  4. Supervised Learning

  5. Unsupervised Learning

  6. Deep Learning

  7. Machine Learning Tips and Tricks

  8. Probabilities and Statistics

  9. Comprehensive Stanford Master Cheat Sheet

  10. Linear Algebra and Calculus

  11. Data Science Cheat Sheet

  12. Keras Cheat Sheet

  13. Deep Learning with Keras Cheat Sheet

  14. Visual Guide to Neural Network Infrastructures

  15. Scikit-Learn Python Cheat Sheet

  16. Scikit-learn Cheat Sheet: Choosing the Right Estimator

  17. Tensorflow Cheat Sheet

  18. Machine Learning Test Cheat Sheet


The best way to practice is to take part in competitions.🏆 🏆

Competitions will make you even more proficient in Data Science.
Among data science competition platforms, Kaggle is one of the most popular. It hosts many competitions that you can enter according to your knowledge level.

You can also check these platforms for data science competitions:
- Driven Data
- Codalab
- Iron Viz
- Topcoder
- CrowdANALYTIX Community
- Bitgrit


📓 Data Science Interview Questions: ▶️   - (1)  - (2)  - (3)  - (4)  - (5)  - (6) Arabic Podcast🎧
                    - (7) 30 days of interview preparation📖


📌 Data Analysis Recommendations.
    Books (📕 The Data Analysis Workshop &  📕 Head First Data Analysis)
    FWD - (The 3 Levels)
    Google Data Analytics Professional Certificate
    IBM Data Analyst Professional Certificate
   Note: Good knowledge of, and projects in, just Excel, SQL & Power BI / Tableau can bring you great opportunities.

📌 Data Engineering Recommendations.
    Roadmap 1
    Roadmap 2
    IBM Data Engineering Professional Certificate


📁

CV / Resumes 📝


📌 Data & AI Companies in Egypt   -   AI/ML Driven Companies In Egypt


Contact Me 📱
