Course in data science. Learn to analyze data of all types using the Python programming language. No programming experience is necessary.
Quick links: 📁 lessons ⏬ Lesson Schedule
Software covered:
Course topics include:
O'Reilly Media titles are free to UCSD affiliates with Safari Books Online.
Weekly take-home assignments will follow the course schedule, reinforcing skills with exercises to analyze and visualize scientific data. Assignments will be given out on Wednesdays and will be due the following Wednesday via TritonEd. Assignments are worth 8 points each and will be graded on effort, completeness, and accuracy.
You will choose a dataset of your own, or one provided in one of the texts, and write a Python program (or a set of Python programs, or a mixture of .ipynb and .py/.sh scripts) to carry out a revealing data analysis or create a software tool. Have a look at Shaw Ex43-52 and McKinney Ch10-12 for more ideas. The final project is worth 20 points and will be graded on effort, creativity, and fulfillment of the requirements below.
Requirements:
pandas, and one or more packages from at least three (≥3) of the categories below:
- matplotlib, seaborn
- bokeh, pygal, plotly, mpld3, nvd3
- scipy, statsmodels, scikit-learn
- scikit-bio, biopython
- cdms, iris
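The package rule above can be checked mechanically. The following is a minimal sketch, not part of the course materials: the function name `meets_package_requirement` and the category labels (`plotting`, `interactive`, and so on) are assumptions made for illustration, while the package groupings themselves come from the list above.

```python
# Category labels are illustrative guesses; only the package groupings
# are taken from the requirements list.
CATEGORIES = {
    "plotting": {"matplotlib", "seaborn"},
    "interactive": {"bokeh", "pygal", "plotly", "mpld3", "nvd3"},
    "statistics": {"scipy", "statsmodels", "scikit-learn"},
    "bioinformatics": {"scikit-bio", "biopython"},
    "climate": {"cdms", "iris"},
}


def meets_package_requirement(packages):
    """Return True if `packages` has pandas plus packages from >= 3 categories."""
    used = {name.lower() for name in packages}
    # A category counts as covered if at least one of its packages is used.
    covered = [label for label, members in CATEGORIES.items() if used & members]
    return "pandas" in used and len(covered) >= 3


print(meets_package_requirement(["pandas", "matplotlib", "bokeh", "scipy"]))  # True
print(meets_package_requirement(["pandas", "matplotlib", "seaborn"]))         # False
```

The second call fails the check because matplotlib and seaborn fall in the same category, so only one category is covered.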
There are 100 points total possible for the course:
Participation is based on completing the pre-course survey, showing up to class (when you are able), and completing the course evaluation (this is on the honor system as I won't know who completes it). There are no midterm or final exams.
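The point totals can be worked out from the figures given in this document. This is a rough sketch under two assumptions: that there are nine assignments (the lesson schedule lists Assignments 1 through 9), and that participation accounts for whatever remains of the 100 points. The participation figure is inferred by subtraction, not stated in the text.

```python
TOTAL_POINTS = 100
POINTS_PER_ASSIGNMENT = 8
FINAL_PROJECT_POINTS = 20
NUM_ASSIGNMENTS = 9  # assumption: Assignments 1-9 in the lesson schedule

assignment_points = NUM_ASSIGNMENTS * POINTS_PER_ASSIGNMENT
# Inferred, not stated in the text: the remainder is participation.
participation_points = TOTAL_POINTS - assignment_points - FINAL_PROJECT_POINTS

print(assignment_points, FINAL_PROJECT_POINTS, participation_points)  # 72 20 8
```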
The course consists of 20 lessons. As a class, it is taught as two lessons per week for 10 weeks, but the material can be covered at any pace.
Lessons 1-3 will be an introduction to the command line. By the end of this tutorial, everyone will be familiar with basic Unix commands.
Lessons 4-9 will be an introduction to programming using Python. The main text will be Shaw's Learn Python 3 the Hard Way. For those with experience in a programming language other than Python, Lutz's Learning Python will provide a more thorough introduction to programming Python. We will learn to use IPython and IPython Notebooks (also called Jupyter Notebooks), a much richer Python experience than the Unix command line or Python interpreter.
Lessons 10-18 will focus on Python packages for data analysis. We will work through McKinney's Python for Data Analysis, which is all about analyzing data, doing statistics, and making pretty plots. You may find that Python can emulate or exceed much of the functionality of R and MATLAB.
Lessons 19-20 conclude the course with two skills useful in developing code: writing your own classes and modules, and sharing your code on GitHub.
Lessons are available as .md or .ipynb files by clicking on the lesson numbers below. Readings should be completed while typing out the code (this is integral to the Shaw readings) and doing any Study Drills (Shaw) and Chapter Quizzes (Lutz).
| Lesson | Title | Readings | Topics | Assignment |
|---|---|---|---|---|
| 1 | Overview | | Introductions and overview of course | Pre-course survey; Acquire texts |
| 2 | Command Line Part I | Shaw: Introduction, Ex0, Appendix A | Command line crash course; Text editors | Assignment 1: Basic Shell Commands |
| 3 | Command Line Part II | Yale: The 10 Most Important Linux Commands | Advanced commands in the bash shell | |
| 4 | Conda, IPython, and Jupyter Notebooks | Geohackweek: Introduction to Conda | Conda tutorial including Conda environments, Python packages, and PIP; Python and IPython in the command line; Jupyter notebook tutorial; Python crash course | Assignment 2: Bash, Conda, IPython, and Jupyter |
| 5 | Python Basics, Strings, Printing | Shaw: Ex1-10; Lutz: Ch1-7 | Python scripts, error messages, printing strings and variables, strings and string operations, numbers and mathematical expressions, getting help with commands and IPython | |
| 6 | Taking Input, Reading and Writing Files, Functions | Shaw: Ex11-26; Lutz: Ch9,14-17 | Taking input, reading files, writing files, functions | Assignment 3: Python Fundamentals I |
| 7 | Logic, Loops, Lists, Dictionaries, and Tuples | Shaw: Ex27-39; Lutz: Ch8-13 | Logic and loops, lists and list comprehension, tuples, dictionaries, other types | |
| 8 | Python and IPython Review | McKinney: Ch1, Ch2, Ch3 | Review of Python commands, IPython review | Assignment 4: Python Fundamentals II |
| 9 | Regular Expressions | Kuchling: Regular Expression HOWTO | Regular expression syntax; command-line tools: grep, sed, awk, perl -e; Python examples: built-in and re module | |
| 10 | Numpy, Pandas and Matplotlib Crash Course | Pratik: Introduction to Numpy and Pandas | Numpy, Pandas, and Matplotlib overview | Assignment 5: Regular Expressions |
| 11 | Pandas Part I | McKinney: Ch4, Ch5 | Introduction to NumPy and Pandas: ndarray, Series, DataFrame, index, columns, dtypes, info, describe, read_csv, head, tail, loc, iloc, ix, to_datetime | |
| 12 | Pandas Part II | McKinney: Ch6, Ch7, Ch8 | Data Analysis with Pandas: concat, append, merge, join, set_option, stack, unstack, transpose, dot-notation, values, apply, lambda, sort_index, sort_values, to_csv, read_csv, isnull | Assignment 6: Pandas Fundamentals |
| 13 | Plotting with Matplotlib | McKinney: Ch9; Johansson: Matplotlib 2D and 3D plotting in Python | Matplotlib tutorial from J.R. Johansson | |
| 14 | Plotting with Seaborn | Seaborn Tutorial | Seaborn tutorial from Michael Waskom | Assignment 7: Plotting |
| 15 | Pandas Time Series | McKinney: Ch11 | Time series data in Pandas | |
| 16 | Pandas Group Operations | McKinney: Ch10 | groupby, melt, pivot, inplace=True, reindex | Assignment 8: Time Series and Group Operations |
| 17 | Statistics Packages | Handbook of Biological Statistics | Statistics capabilities of Pandas, Numpy, Scipy, and Scikit-bio | |
| 18 | Interactive Visualization with Bokeh | Bokeh User Guide | Quickstart guide to making interactive HTML and notebook plots with Bokeh | Assignment 9: Statistics and Interactive Visualization |
| 19 | Modules and Classes | Shaw: Ex40-52 | Packaging your code so you and others can use it again | |
| 20 | Git and GitHub | GitHub Guides | Sharing your code in a public GitHub repository | Final Project |
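Lesson 9 in the schedule lists Python examples using both built-in string methods and the `re` module. As a small illustration of the difference, here is a sketch that extracts the same information both ways. The sample string is made up for this example and is not taken from the lesson materials.

```python
import re

# Hypothetical sample text, made up for illustration.
text = "Assignment 5: Regular Expressions; Assignment 6: Pandas Fundamentals"

# Built-in string methods: split on the separators by hand.
titles = [part.split(": ")[1] for part in text.split("; ")]
print(titles)  # ['Regular Expressions', 'Pandas Fundamentals']

# re module: capture each assignment number and its title
# (the title runs up to the next semicolon or the end of the text).
pattern = re.compile(r"Assignment (\d+): ([^;]+)")
matches = pattern.findall(text)
print(matches)  # [('5', 'Regular Expressions'), ('6', 'Pandas Fundamentals')]
```

The string-method version works only while the separators stay exactly `"; "` and `": "`; the regular expression states the expected shape of each entry and also returns the assignment numbers.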