Project Name  Stars  Downloads  Repos Using This  Packages Using This  Most Recent Commit  Total Releases  Latest Release  Open Issues  License  Language 

Pandas  39,869  38,392  31,493  18 hours ago  116  June 28, 2023  3,632  bsd-3-clause  Python  
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more  
Data Science For Beginners  22,590  5 days ago  44  mit  Jupyter Notebook  
10 Weeks, 20 Lessons, Data Science for All!  
Ydata Profiling  11,186  80  106  3 days ago  40  February 03, 2023  194  mit  Python  
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.  
Pandas_exercises  9,277  25 days ago  30  bsd-3-clause  Jupyter Notebook  
Practice your pandas skills!  
Mlcourse.ai  8,803  4 months ago  4  other  Python  
Open Machine Learning Course  
Pandas Ai  8,624  a day ago  111  mit  Python  
PandasAI is the Python library that integrates Gen AI into pandas, making data analysis conversational  
Pygwalker  7,421  3  21 hours ago  72  August 03, 2023  22  apache-2.0  Python  
PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis  
Ai Learn  6,991  a year ago  19  
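AI learning roadmap: nearly 200 hands-on case studies and projects, with free accompanying course materials, taking you from zero background to job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch, tensorflow, machinelearning, deeplearning, dataanalysis, datamining, mathematics, datascience, artificialintelligence, python, tensorflow, tensorflow2, caffe, keras, pytorch, algorithm, numpy, pandas, matplotlib, seaborn, nlp, cv, and more.  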
Cudf  5,970  2  16 hours ago  29  June 08, 2023  945  apache-2.0  C++  
cuDF - GPU DataFrame Library  
Datasciencepython  4,776  5 months ago  11  mit  Python  
common data analysis and machine learning tasks using python 
Course in data science. Learn to analyze data of all types using the Python programming language. No programming experience is necessary.
Quick links: 📁 lessons ⏬ Lesson Schedule
Software covered:
Course topics include:
O'Reilly Media titles are free to UCSD affiliates with Safari Books Online.
Weekly take-home assignments will follow the course schedule, reinforcing skills with exercises to analyze and visualize scientific data. Assignments will be given out on Wednesdays and will be due the following Wednesday, using TritonEd. Assignments are worth 8 points each and will be graded on effort, completeness, and accuracy.
You will choose a dataset of your own or provided in one of the texts and write a Python program (or set of Python programs or mixture of .ipynb and .py/.sh scripts) to carry out a revealing data analysis or create a software tool. Have a look at Shaw Ex43-52 and McKinney Ch10-12 for more ideas. The final project is worth 20 points and will be graded on effort, creativity, and fulfillment of the requirements below.
Requirements:
pandas
and one or more packages from at least three (≥3) of the categories below:
matplotlib, seaborn
bokeh, pygal, plotly, mpld3, nvd3
scipy, statsmodels, scikit-learn
scikit-bio, biopython
cdms, iris
There are 100 points total possible for the course:
Participation is based on completing the pre-course survey, showing up to class (when you are able), and completing the course evaluation (this is on the honor system as I won't know who completes it). There are no midterm or final exams.
The course consists of 20 lessons. As a class, it is taught as two lessons per week for 10 weeks, but the material can be covered at any pace.
Lessons 1-3 will be an introduction to the command line. By the end of this tutorial, everyone will be familiar with basic Unix commands.
Lessons 4-9 will be an introduction to programming using Python. The main text will be Shaw's Learn Python 3 the Hard Way. For those with experience in a programming language other than Python, Lutz's Learning Python will provide a more thorough introduction to programming in Python. We will learn to use IPython and IPython Notebooks (also called Jupyter Notebooks), a much richer Python experience than the Unix command line or Python interpreter.
Lessons 10-18 will focus on Python packages for data analysis. We will work through McKinney's Python for Data Analysis, which is all about analyzing data, doing statistics, and making pretty plots. You may find that Python can emulate or exceed much of the functionality of R and MATLAB.
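As a taste of what these lessons cover, here is a minimal sketch of loading a small table into a pandas DataFrame, deriving a column, filtering rows, and summarizing. The column names and values are hypothetical, made up purely for illustration; they are not taken from the course materials.

```python
import pandas as pd

# Hypothetical measurements, invented for this example only.
df = pd.DataFrame(
    {
        "site": ["A", "A", "B", "B"],
        "temperature_c": [12.5, 14.0, 18.2, 17.8],
        "salinity": [35.1, 35.3, 34.8, 34.9],
    }
)

# Derive a new column from an existing one (Celsius to Fahrenheit).
df["temperature_f"] = df["temperature_c"] * 9 / 5 + 32

# Boolean indexing: keep only the rows warmer than 15 degrees C.
warm = df[df["temperature_c"] > 15]

# Count, mean, std, min, quartiles, and max for each numeric column.
summary = df.describe()

print(warm)
print(summary)
```

Running this in a Jupyter notebook instead of a script displays `warm` and `summary` as formatted tables, which is one reason the course leans on notebooks.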
Lessons 19-20 conclude the course with two skills useful in developing code: writing your own classes and modules, and sharing your code on GitHub.
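To illustrate what "writing your own classes and modules" can look like, here is a small sketch of a file that works both as an importable module and as a script. The file name `stats_tools.py` and the class `RunningMean` are hypothetical names chosen for this example, not part of the course texts.

```python
"""stats_tools.py: a hypothetical module name, used only for illustration."""


class RunningMean:
    """Accumulate values one at a time and report their mean."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        """Record one more value."""
        self.count += 1
        self.total += value

    def mean(self):
        """Return the mean of all values added so far."""
        if self.count == 0:
            raise ValueError("no values added yet")
        return self.total / self.count


# This block runs only when the file is executed directly,
# not when it is imported from another script or notebook.
if __name__ == "__main__":
    rm = RunningMean()
    for v in [2.0, 4.0, 6.0]:
        rm.add(v)
    print(rm.mean())
```

If the file were saved under that name, another script in the same directory could reuse the class with `from stats_tools import RunningMean`.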
Lessons are available as .md or .ipynb files by clicking on the lesson numbers below. Readings should be completed while typing out the code (this is integral to the Shaw readings) and doing any Study Drills (Shaw) and Chapter Quizzes (Lutz).
Lesson  Title  Readings  Topics  Assignment

1  Overview    Introductions and overview of course  Pre-course survey; Acquire texts
2  Command Line Part I  Shaw: Introduction, Ex0, Appendix A  Command line crash course; Text editors  Assignment 1: Basic Shell Commands
3  Command Line Part II  Yale: The 10 Most Important Linux Commands  Advanced commands in the bash shell  
4  Conda, IPython, and Jupyter Notebooks  Geohackweek: Introduction to Conda  Conda tutorial including Conda environments, Python packages, and pip; Python and IPython in the command line; Jupyter notebook tutorial; Python crash course  Assignment 2: Bash, Conda, IPython, and Jupyter
5  Python Basics, Strings, Printing  Shaw: Ex1-10; Lutz: Ch1-7  Python scripts, error messages, printing strings and variables, strings and string operations, numbers and mathematical expressions, getting help with commands and IPython  
6  Taking Input, Reading and Writing Files, Functions  Shaw: Ex11-26; Lutz: Ch9, 14-17  Taking input, reading files, writing files, functions  Assignment 3: Python Fundamentals I
7  Logic, Loops, Lists, Dictionaries, and Tuples  Shaw: Ex27-39; Lutz: Ch8-13  Logic and loops, lists and list comprehension, tuples, dictionaries, other types  
8  Python and IPython Review  McKinney: Ch1, Ch2, Ch3  Review of Python commands, IPython review  Assignment 4: Python Fundamentals II
9  Regular Expressions  Kuchling: Regular Expression HOWTO  Regular expression syntax; command-line tools: grep, sed, awk, perl -e; Python examples: built-in and re module  
10  Numpy, Pandas and Matplotlib Crash Course  Pratik: Introduction to Numpy and Pandas  Numpy, Pandas, and Matplotlib overview  Assignment 5: Regular Expressions
11  Pandas Part I  McKinney: Ch4, Ch5  Introduction to NumPy and Pandas: ndarray, Series, DataFrame, index, columns, dtypes, info, describe, read_csv, head, tail, loc, iloc, ix, to_datetime  
12  Pandas Part II  McKinney: Ch6, Ch7, Ch8  Data Analysis with Pandas: concat, append, merge, join, set_option, stack, unstack, transpose, dot-notation, values, apply, lambda, sort_index, sort_values, to_csv, read_csv, isnull  Assignment 6: Pandas Fundamentals
13  Plotting with Matplotlib  McKinney: Ch9; Johansson: Matplotlib 2D and 3D plotting in Python  Matplotlib tutorial from J.R. Johansson  
14  Plotting with Seaborn  Seaborn Tutorial  Seaborn tutorial from Michael Waskom  Assignment 7: Plotting
15  Pandas Time Series  McKinney: Ch11  Time series data in Pandas  
16  Pandas Group Operations  McKinney: Ch10  groupby, melt, pivot, inplace=True, reindex  Assignment 8: Time Series and Group Operations
17  Statistics Packages  Handbook of Biological Statistics  Statistics capabilities of Pandas, Numpy, Scipy, and scikit-bio  
18  Interactive Visualization with Bokeh  Bokeh User Guide  Quickstart guide to making interactive HTML and notebook plots with Bokeh  Assignment 9: Statistics and Interactive Visualization
19  Modules and Classes  Shaw: Ex40-52  Packaging your code so you and others can use it again  
20  Git and GitHub  GitHub Guides  Sharing your code in a public GitHub repository  Final Project
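Several of the pandas operations named in Lessons 11, 12, and 16 can be sketched together in a few lines. This is an illustrative example on hypothetical data, not an excerpt from the lessons. Note that two of the listed methods no longer exist in current pandas: `.ix` was removed in pandas 1.0 (use `.loc` or `.iloc`), and `DataFrame.append` was removed in pandas 2.0 (use `pd.concat`).

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
samples = pd.DataFrame(
    {
        "sample_id": ["s1", "s2", "s3", "s4"],
        "site": ["A", "A", "B", "B"],
        "depth_m": [5, 50, 5, 50],
        "temperature_c": [16.0, 12.0, 20.0, 14.0],
    }
)
sites = pd.DataFrame({"site": ["A", "B"], "region": ["north", "south"]})

# Label-based (loc) versus position-based (iloc) selection of the same cell.
first_by_label = samples.loc[0, "temperature_c"]
first_by_position = samples.iloc[0, 3]

# merge: attach site metadata to every sample row.
merged = samples.merge(sites, on="site", how="left")

# groupby: split by site, then take the mean temperature of each group.
site_means = merged.groupby("site")["temperature_c"].mean()

# pivot: reshape long to wide, one row per site and one column per depth.
wide = merged.pivot(index="site", columns="depth_m", values="temperature_c")

# melt: reshape wide back to long.
long = wide.reset_index().melt(
    id_vars="site", var_name="depth_m", value_name="temperature_c"
)

print(site_means)
print(wide)
print(long)
```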