Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
Ibis	3,404	24	29	4 months ago	68	December 10, 2023	157	apache-2.0	Python
The flexibility of Python with the scale and performance of modern SQL.
Devops Python Tools	709			4 months ago			37	mit	Python
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Sagemaker Spark	285	2		8 months ago	36	August 26, 2022	34	apache-2.0	Scala
A Spark library for Amazon SageMaker.
Spark Jupyter Aws	255			7 years ago			2		Jupyter Notebook
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Aut	128			10 months ago	27	November 17, 2022	3	apache-2.0	Scala
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Spark With Python	98			4 years ago				mit	Jupyter Notebook
Fundamentals of Spark with Python (using PySpark), code examples
Apachespark	59			2 years ago					Python
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Big_data	55			4 months ago				mit	Jupyter Notebook
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Spark Training	52			3 years ago			3		Jupyter Notebook
Repository used for Spark Trainings
Datapipelines Essentials Python	45			a year ago			1	apache-2.0	Python
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Alternatives To Pyspark Ml

Ibis ⭐ 3,404

The flexibility of Python with the scale and performance of modern SQL.

dependent packages 29 | total releases 68 | most recent commit 4 months ago

Devops Python Tools ⭐ 709

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

most recent commit 4 months ago

Sagemaker Spark ⭐ 285

A Spark library for Amazon SageMaker.

total releases 36 | most recent commit 8 months ago

Spark Jupyter Aws ⭐ 255

A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support

most recent commit 7 years ago

Aut ⭐ 128

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

total releases 27 | most recent commit 10 months ago

Spark With Python ⭐ 98

Fundamentals of Spark with Python (using PySpark), code examples

most recent commit 4 years ago

Apachespark ⭐ 59

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.

most recent commit 2 years ago

Big_data ⭐ 55

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.

most recent commit 4 months ago

Spark Training ⭐ 52

Repository used for Spark Trainings

most recent commit 3 years ago

Datapipelines Essentials Python ⭐ 45

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

most recent commit a year ago

Popular Hadoop Projects

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

dependent packages 939 | total releases 46 | latest release May 09, 2021 | most recent commit 4 months ago

Data Science Ipython Notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

most recent commit 7 months ago

Xgboost ⭐ 25,253

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

dependent packages 972 | total releases 79 | latest release November 13, 2023 | most recent commit 4 months ago

Luigi ⭐ 17,046

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

dependent packages 76 | total releases 80 | latest release October 05, 2023 | most recent commit 4 months ago

Apijson ⭐ 16,586

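🏆 A zero-code, full-featured, highly secure ORM library. 🚀 Backend APIs and docs require no code, and the frontend (client) customizes the data and structure of the returned JSON. 🏆 A JSON Transmission Protocol and an ORM Library 🚀 provides APIs and Docs without writing any code.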

most recent commit a month ago

Popular Pyspark Projects

Cheatsheets Ai ⭐ 13,281

Essential Cheat Sheets for deep learning and machine learning researchers https://medium.com/@kailashahirwar/essential-cheat

most recent commit 5 years ago

Synapseml ⭐ 4,967

Simple and Distributed Machine Learning

dependent packages 6 | total releases 12 | latest release November 27, 2023 | most recent commit 21 days ago

Spark Nlp ⭐ 3,578

State of the Art Natural Language Processing

dependent packages 30 | total releases 134 | latest release December 08, 2023 | most recent commit 4 months ago

Linkis ⭐ 3,224

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.

dependent packages 38 | total releases 3 | latest release July 29, 2023 | most recent commit a month ago

Machine Learning ⭐ 2,607

🌎 Machine learning tutorials (mainly in Python 3)

most recent commit 4 months ago

Popular Data Processing Categories

Jupyter Notebook

Hadoop

Mnist

Imdb

Pyspark

Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.