Spark With Python Alternatives

Name: tirthajyoti/Spark-with-Python
Rating: 4.46 (98 reviews)

Fundamentals of Spark with Python (using PySpark), code examples

License: MIT
Most Recent Commit: about 6 years ago
Programming Language: Jupyter Notebook

Categories:
Programming Languages > Python
Data Processing > Jupyter Notebook
Data Storage > Database
Machine Learning > Machine Learning
Data Processing > SQL
Web Servers > Apache
Data Processing > Spark
Data Processing > Hadoop
Data Processing > Big Data
Control Flow > Dataframe
Data Storage > HDFS
Data Processing > MapReduce
Data Processing > Apache Spark
Data Processing > PySpark
Control Flow > Parallel Computing

Alternatives To tirthajyoti/Spark-with-Python

Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language	Description
microsoft/SynapseML	5,228	0	6	about 1 month ago	12	November 27, 2023	335	mit	Scala	Simple and Distributed Machine Learning
JohnSnowLabs/spark-nlp	3,578	0	30	over 2 years ago	134	December 08, 2023	43	apache-2.0	Scala	State of the Art Natural Language Processing
apache/linkis	3,407	0	38	6 days ago	3	July 29, 2023	215	apache-2.0	Java	Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between upper-layer applications and the underlying data engines.
ibis-project/ibis	3,404	24	29	over 2 years ago	68	December 10, 2023	157	apache-2.0	Python	The flexibility of Python with the scale and performance of modern SQL.
uber/petastorm	1,693	0	8	over 2 years ago	86	February 03, 2023	174	apache-2.0	Python	The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
hi-primus/optimus	1,540	0	0	over 1 year ago	32	June 19, 2022	29	apache-2.0	Python	Agile data preparation workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
jadianes/spark-py-notebooks	1,515	0	0	over 3 years ago	0		9	other	Jupyter Notebook	Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
combust/mleap	1,479	15	12	over 2 years ago	26	May 07, 2021	109	apache-2.0	Scala	MLeap: Deploy ML Pipelines to Production
awesome-spark/awesome-spark	1,461	0	0	about 3 years ago	0		20	cc0-1.0	Shell	A curated list of awesome Apache Spark packages and resources.
jupyter-incubator/sparkmagic	1,272	25	6	over 2 years ago	54	September 13, 2023	156	other	Python	Jupyter magics and kernels for working with remote Spark clusters

Popular PySpark Projects

kailashahirwar/cheatsheets-ai ⭐ 13,281

Essential Cheat Sheets for deep learning and machine learning researchers https://medium.com/@kailashahirwar/essential-cheat-sheets-for-machine-learning-and-deep-learning-researchers-efb6a8ebd2e5

ethen8181/machine-learning ⭐ 2,607

Machine learning tutorials (mainly in Python 3)

baidu/bigflow ⭐ 1,122

Baidu Bigflow is an interface for writing distributed computing programs that provides many simple, flexible, and powerful APIs. Using Bigflow, you can easily handle data of any scale. Bigflow processes 4P+ data inside Baidu and runs about 10k jobs every day.

lensacom/sparkit-learn ⭐ 1,054

PySpark + Scikit-learn = Sparkit-learn

logicalclocks/hopsworks ⭐ 1,041

Hopsworks - Data-Intensive AI platform with a Feature Store

Popular Spark Projects

apache/spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

donnemartin/data-science-ipython-notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

getredash/redash ⭐ 24,479

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

yeasy/docker_practice ⭐ 23,279

Learn and understand Docker & container technologies, with real DevOps practice!

DataTalksClub/data-engineering-zoomcamp ⭐ 19,461

Free Data Engineering course!