Awesome Open Source


Source code accompanying book:

Data Science on the Google Cloud Platform
Valliappa Lakshmanan
O'Reilly, Jan 2017

Try out the code on Google Cloud Platform

Open in Cloud Shell

The code on Qwiklabs (see below) is continually tested, and this repo is kept up-to-date. The code should work as-is for you; however, there are three very common problems that readers report:

  • Ch 2: Download data fails. The Bureau of Transportation website that hosts the airline dataset periodically goes down or changes availability due to government furloughs and the like. Please use the instructions in 02_ingest/ to copy the data from my bucket. The rest of the chapters work off the data in the bucket, and will be fine.
  • Ch 3: Permission errors. These typically occur because we expect that you will copy the airline data to your bucket. You don't have write access to gs://cloud-training-demos-ml/. The instructions will tell you to change the bucket name to one that you own. Please do that.
  • Ch 4, 10: Dataflow doesn't do anything. The real-time simulation requires that you run the simulate program and the Dataflow pipeline simultaneously. If the Dataflow pipeline is not progressing, make sure that the simulate program is still running.
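
For the Ch 3 permission errors, the fix amounts to pointing every gs:// path at a bucket you own. The following is only a minimal sketch of that substitution, not code from the repo: the helper name `retarget_gcs_uri`, the bucket `my-own-bucket`, and the object path `some/object.csv` are hypothetical and chosen purely for illustration. Only the source bucket name comes from the text above.

```python
from urllib.parse import urlparse

# Bucket named in this README; readers do not have write access to it.
SOURCE_BUCKET = "cloud-training-demos-ml"


def retarget_gcs_uri(uri: str, my_bucket: str) -> str:
    """Swap SOURCE_BUCKET for my_bucket in a gs:// URI (hypothetical helper).

    URIs that already point at another bucket are returned unchanged.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"not a gs:// URI: {uri!r}")
    if parsed.netloc != SOURCE_BUCKET:
        return uri  # already points somewhere else; leave it alone
    return f"gs://{my_bucket}{parsed.path}"


if __name__ == "__main__":
    # Both the object path and the target bucket are made-up examples.
    print(retarget_gcs_uri("gs://cloud-training-demos-ml/some/object.csv",
                           "my-own-bucket"))
```

In practice you would edit the bucket name wherever the chapter instructions tell you to; the sketch only shows that the object path stays the same while the bucket changes.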

If the code doesn't work for you, I recommend that you try the corresponding lab on Qwiklabs to see if there is some step that you missed. If you still have problems, please leave feedback in Qwiklabs, or file an issue in this repo.

Try out the code on Qwiklabs

Purchase book

Read on-line or download PDF of book

Buy on

Updates to book

I updated the book in Nov 2019 with TensorFlow 2.0, Cloud Functions, and BigQuery ML.
