This repository is a basic introduction to data science in Python. It should be a great help for anyone starting out with data science.
A data scientist needs the following skills:
- Basic Tools: such as Python, R, or SQL. You do not need to know all of them; learning how to use Python is enough to start.
- Basic Statistics: such as mean, median, and standard deviation. Once you know basic statistics, applying them in Python is easy.
- Data Munging: working with messy and difficult data, such as inconsistent date and string formatting. As you might guess, Python helps here too.
- Data Visualization: the title is self-explanatory. We will visualize data in Python with libraries such as Matplotlib and Seaborn.
- Machine Learning: you do not need to understand the math behind every machine learning technique. You only need to understand the basics of machine learning and learn how to implement it in Python.
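The data munging and basic statistics skills above can be illustrated with a small sketch: clean a few messy records (mixed date formats, padded and missing values), then compute the mean, median, and standard deviation. It uses only the Python standard library and avoids Python-3-only syntax, so it should run without the tutorial's libraries. The records, the list of accepted date formats (including the assumption that `01/03/2017` is day-first), and helper names such as `parse_date` and `clean` are hypothetical, chosen only for illustration.

```python
import math
from datetime import datetime

# Hypothetical messy records: dates in mixed formats, values as padded strings.
RAW = [
    ("2017-03-01", " 12.5 "),
    ("01/03/2017", "15"),          # assumed day-first (1 March 2017)
    ("March 2, 2017", "9.5"),
    ("2017-03-02", ""),            # missing value
]

# Date formats we assume may appear in the data; extend as needed.
# Note: %B expects English month names under the default C locale.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y"]


def parse_date(text):
    """Try each known format in turn; return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_value(text):
    """Strip whitespace and convert to float; empty strings become None."""
    text = text.strip()
    return float(text) if text else None


def clean(records):
    """Return (date, value) pairs, dropping records whose value is missing."""
    cleaned = []
    for date_text, value_text in records:
        value = parse_value(value_text)
        if value is not None:
            cleaned.append((parse_date(date_text), value))
    return cleaned


def mean(xs):
    return float(sum(xs)) / len(xs)


def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    # Odd length: middle element; even length: average of the two middle ones.
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2.0


def std(xs):
    """Population standard deviation (divides by n, not n - 1)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / float(len(xs)))


rows = clean(RAW)
values = [v for _, v in rows]
print(rows)
print("mean=%.2f median=%.2f std=%.2f" % (mean(values), median(values), std(values)))
```

The `std` helper uses the population form; pandas' `Series.std()` defaults to the sample form (dividing by n - 1), so results from the two can differ slightly on small data.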
Tutorial 1: Introduction to Python
Tutorial 2: Python Data Science Toolbox
Tutorial 3: Data Cleaning Methods
Tutorial 4: Introduction to Pandas
All the code is written in Jupyter Notebook (Python 2.7.x).
After installing Jupyter Notebook, start it from a terminal:
jupyter notebook
To install the libraries used in the tutorials:
NumPy is the fundamental package for scientific computing with Python.
sudo apt-get install python-pip
sudo pip install numpy scipy
pandas provides easy-to-use data structures and data analysis tools for the Python programming language.
sudo pip install pandas
Install Matplotlib, Seaborn, and any other libraries the same way:
sudo pip install matplotlib seaborn
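After installing, a short script can confirm that each library actually imports. This is a sketch, not part of the original tutorials: the `REQUIRED` list and the `check_installed` helper are hypothetical names, and the script relies only on the standard `importlib` module, which is available in both Python 2.7 and Python 3. For these five libraries the pip package name and the import name happen to be the same.

```python
import importlib

# Hypothetical list of the libraries the tutorials rely on.
REQUIRED = ["numpy", "scipy", "pandas", "matplotlib", "seaborn"]


def check_installed(names):
    """Return a dict mapping each module name to True if it imports, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Print one line per library so missing ones are easy to spot.
    for name, ok in sorted(check_installed(REQUIRED).items()):
        print("%-12s %s" % (name, "OK" if ok else "MISSING"))
```

Any library reported as `MISSING` can be installed with the same `sudo pip install` command used above.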