Oreilly - 2018

This tutorial can be run either in spark-shell or in an IDE (IntelliJ or Scala IDE for Eclipse).

Below are the steps for the setup.

Pre-requisites for Installation

Java/JDK 1.8+ must be installed before proceeding with the steps below.
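As a quick sanity check (a hedged sketch, not part of the original steps), you can print the JVM version from any Scala REPL or script to confirm the prerequisite:

```scala
// Print the running JVM's specification version; it should be 1.8
// (or higher) before proceeding with the Spark setup below.
val javaVersion = System.getProperty("java.specification.version")
println(s"Java specification version: $javaVersion")
```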

Running in spark-shell

Download Spark 2.3.1

Download Spark 2.3.1 from here: http://spark.apache.org/downloads.html

Direct download link: https://www.apache.org/dyn/closer.lua/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz

Install Spark 2.3.1 on Mac/Linux

tar -zxvf spark-2.3.1-bin-hadoop2.7.tgz

export PATH=$PATH:/<path_to_downloaded_spark>/spark-2.3.1-bin-hadoop2.7/bin

Running spark-shell on Mac

  • spark-shell

Install Spark 2.3.1 on Windows

Unzip spark-2.3.1-bin-hadoop2.7.tgz

Add the Spark bin directory to Path: ...\spark-2.3.1-bin-hadoop2.7\bin

Set up winutils.exe on Windows (not needed on Mac)

  • download winutils.exe from https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin
  • move it to c:\hadoop\bin
  • set HADOOP_HOME in your environment variables
    • HADOOP_HOME = C:\hadoop
  • run from command prompt:
    • mkdir \tmp\hive
    • C:\hadoop\bin\winutils.exe chmod 777 \tmp\hive
  • run spark-shell from command prompt with extra conf parameter
    • spark-shell --driver-memory 2G --executor-memory 3G --executor-cores 2 --conf spark.sql.warehouse.dir=file:///c:/tmp/spark-warehouse

Pasting code in spark-shell

When pasting larger sections of code into spark-shell, enter paste mode first (finish the paste with Ctrl+D):

scala> :paste
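For example (a made-up snippet, not part of the tutorial), a multi-line definition plus code that uses it should be pasted as one block, which is exactly what :paste provides:

```scala
// A multi-line definition and its usage, evaluated together as one unit.
case class Point(x: Double, y: Double) {
  def dist(other: Point): Double = math.hypot(x - other.x, y - other.y)
}

val d = Point(0.0, 0.0).dist(Point(3.0, 4.0))
println(d)  // prints 5.0
```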

Running in IDE

If you prefer an IDE over spark-shell, follow the steps below.

You can either use IntelliJ or Scala IDE for Eclipse.
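If you import the project into IntelliJ as an sbt project, a minimal build definition might look like the following (a sketch; the project name is made up, and the artifact list and Scala version are assumptions based on Spark 2.3.1 being built against Scala 2.11):

```scala
// build.sbt (hypothetical) -- minimal build for a Spark 2.3.1 project
name := "oreilly-2018"

version := "0.1"

scalaVersion := "2.11.12"  // Spark 2.3.x pre-built binaries target Scala 2.11

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.1",
  "org.apache.spark" %% "spark-sql"  % "2.3.1"
)
```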


Scala IDE for Eclipse

Summary of Downloads needed

Have the following downloaded before the session


Nice to have

hadoop fs -copyToLocal /strata-nyc/transferlearning.tgz .

tar -zxvf transferlearning.tgz

pip install tensorflow

pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.3.0-cp27-none-linux_x86_64.whl

pip install numpy scipy

pip install scikit-learn

pip install pillow

pip install h5py

pip install keras
