Learning Spark

Practical examples of using Apache Spark in several different use cases

Sean's learning-spark project

This repo contains various Spark projects I've created to learn Spark myself, teach others, and give presentations, along with other useful information I've accumulated.

Exactly Once Message Delivery with Kafka & Cassandra

The exactlyonce project demonstrates how to implement Kafka's exactly-once message delivery semantics with Spark Streaming, Kafka, and Cassandra.
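
One common way to get an exactly-once effect on top of at-least-once delivery is to make the sink write idempotent, for example by keying each row on the message's topic, partition, and offset so that a replayed batch overwrites rows instead of duplicating them. The sketch below illustrates that idea in plain Python. It is not taken from the exactlyonce project and uses no Spark, Kafka, or Cassandra APIs; `Record`, `IdempotentStore`, and `process_batch` are hypothetical names, and the dictionary is only a stand-in for a table whose primary key is (topic, partition, offset).

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical message shape: where it came from plus its payload."""
    topic: str
    partition: int
    offset: int
    value: str


class IdempotentStore:
    """Stand-in for a table keyed by (topic, partition, offset)."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, int, int], str] = {}

    def upsert(self, record: Record) -> None:
        # Writing the same key twice overwrites the row rather than adding a new one.
        key = (record.topic, record.partition, record.offset)
        self.rows[key] = record.value


def process_batch(batch: Iterable[Record], store: IdempotentStore) -> None:
    """Write every record in the batch; safe to call again with the same batch."""
    for record in batch:
        store.upsert(record)


batch = [Record("events", 0, i, f"msg-{i}") for i in range(3)]
store = IdempotentStore()

process_batch(batch, store)
process_batch(batch, store)  # simulated redelivery after a failure

print(len(store.rows))
```

Because the second call replays the same offsets, the store still holds one row per message. Whether this matches the approach taken in the project itself is an assumption.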

StackOverflow.com Analysis

The stackanalysis project analyzes StackOverflow.com post data to discover insights about Scala questions asked on the site.

This project accompanied a presentation for the Scala Toronto meetup group in the winter of 2015.

GitHub Events Streaming

The githubstream project consumes data directly from the public GitHub Events API and demonstrates some common streaming capabilities of Apache Spark.

This project accompanied a presentation for the Scala Up North conference in the fall of 2015.
