PySpark Cookbook

PySpark Cookbook, published by Packt

This is the code repository for PySpark Cookbook, published by Packt.

Over 60 recipes for implementing big data processing and analytics using Apache Spark and Python

What is this book about?

Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. The PySpark Cookbook presents effective and time-saving recipes for leveraging the power of Python and putting it to use in the Spark ecosystem.

This book covers the following exciting features:

  • Configure a local instance of PySpark in a virtual environment
  • Install and configure Jupyter in local and multi-node environments
  • Create DataFrames from JSON and a dictionary using pyspark.sql
  • Explore regression and clustering models available in the ML module
  • Use DataFrames to transform data used for modeling
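As an illustration of the DataFrame item in the list above, here is a minimal sketch of building the same DataFrame from dictionaries and from JSON strings with pyspark.sql. It is not taken from the book: the sample records, the helper names (to_json_lines, build_dataframes, demo) and the application name are hypothetical, and the Spark part assumes PySpark and a Java runtime are installed locally.

```python
import json

# Hypothetical sample records, invented for illustration only.
RECORDS = [
    {"id": 1, "model": "MacBook Pro", "year": 2015},
    {"id": 2, "model": "MacBook", "year": 2016},
]


def to_json_lines(records):
    """Serialize each dictionary as one JSON document per string."""
    return [json.dumps(record, sort_keys=True) for record in records]


def build_dataframes(spark, records):
    """Build two DataFrames from the same records.

    The first comes from Python dictionaries (wrapped in Row objects),
    the second from an RDD of JSON strings read with spark.read.json.
    """
    from pyspark.sql import Row  # imported lazily so the helpers above work without Spark

    df_from_dicts = spark.createDataFrame([Row(**record) for record in records])

    json_rdd = spark.sparkContext.parallelize(to_json_lines(records))
    df_from_json = spark.read.json(json_rdd)

    return df_from_dicts, df_from_json


def demo():
    """Run the sketch on a local, single-threaded Spark session."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("cookbook-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        df_from_dicts, df_from_json = build_dataframes(spark, RECORDS)
        df_from_dicts.printSchema()
        df_from_json.show()
    finally:
        spark.stop()
```

Calling demo() on a machine with a working local Spark installation prints the inferred schema and the rows. The two DataFrames can differ in column order and inferred types, because one schema is inferred from Python objects and the other from JSON text.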

If you feel this book is for you, get your copy today!

Instructions and Navigations

All of the code is organized into folders, for example, Chapter02.

The code will look like the following:

if [ "${_check_R_req}" = "true" ]; then

What you need for this book: The PySpark Cookbook is for you if you are a Python developer looking for hands-on recipes for using the Apache Spark 2.x ecosystem. A thorough understanding of Python (and some familiarity with Spark) will help you get the most out of the book.

The software and hardware listed below are enough to run all of the code files in the book (Chapters 1-8).

Software and Hardware List

Chapter | Software required | OS required
1-8 | Apache Spark, Python, Jupyter, Cloudera QuickStart VM | Linux distro (preferably Ubuntu >14.04)

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Get to Know the Authors

Denny Lee is a technology evangelist at Databricks. He is a hands-on data science engineer with 15+ years of experience. His key focus is solving complex large-scale data problems, providing not only architectural direction but also hands-on implementation of such systems. He has extensive experience in building greenfield teams as well as being a turnaround/change catalyst. Prior to joining Databricks, he was a senior director of data science engineering at Concur and was part of the incubation team that built Hadoop on Windows and Azure (currently known as HDInsight).

Tomasz Drabas is a data scientist specializing in data mining, deep learning, machine learning, choice modeling, natural language processing, and operations research. He is the author of Learning PySpark and Practical Data Analysis Cookbook. He has a PhD from the University of New South Wales, School of Aviation. His research areas are machine learning and choice modeling for airline revenue management.

Suggestions and Feedback

Click here if you have any feedback or suggestions.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.
