Awesome Open Source
Search results for python spark
1,076 search results found
Spark
⭐
35,928
Apache Spark - A unified analytics engine for large-scale data processing
Data Science Ipython Notebooks
⭐
25,025
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Redash
⭐
23,271
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Horovod
⭐
13,339
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Deeplearning4j
⭐
12,966
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
Ds Cheatsheets
⭐
11,535
List of Data Science Cheatsheets to rule the world
It_book
⭐
8,543
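This project collects over a thousand good, commonly used books that I have read or heard about over the years; the book you are looking for may well be here. It covers the Internet…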
Dagster
⭐
7,590
An orchestration platform for the development, production, and observation of data assets.
H2o 3
⭐
6,303
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Dev Setup
⭐
5,802
macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Technical Books
⭐
5,129
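😆 Books written by leading Internet technologists in China and abroad: computer fundamentals, networking, front end, back end, databases, architecture, big data, deep learning.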
Mage Ai
⭐
4,796
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Bigdl
⭐
4,223
Fast, distributed, secure AI for Big Data
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Sqlglot
⭐
3,320
Python SQL Parser and Transpiler
Koalas
⭐
3,228
Koalas: pandas API on Apache Spark
Blaze
⭐
2,949
NumPy and Pandas interface to Big Data
Ibis
⭐
2,769
The flexibility of Python with the scale and performance of modern SQL.
Dpark
⭐
2,637
Python clone of Spark, a MapReduce-like framework in Python
Analytics Zoo
⭐
2,565
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Spark Deep Learning
⭐
1,915
Deep Learning Pipelines for Apache Spark
Benchm Ml
⭐
1,839
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Home
⭐
1,707
ApacheCN open source organization: announcements, introduction, members, activities, and contact channels
Petastorm
⭐
1,614
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Fugue
⭐
1,600
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Elephas
⭐
1,548
Distributed Deep learning with Keras & Spark
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Mleap
⭐
1,454
MLeap: Deploy ML Pipelines to Production
Seldon Server
⭐
1,420
Machine Learning Platform and Recommendation Engine built on Kubernetes
Optimus
⭐
1,382
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Cloudpickle
⭐
1,352
Extended pickling support for Python objects
Aws Glue Samples
⭐
1,286
AWS Glue code samples
Sparkmagic
⭐
1,210
Jupyter magics and kernels for working with remote Spark clusters
Bigflow
⭐
1,122
Baidu Bigflow is an interface that allows for writing distributed computing programs and provides lots of simple, flexible, powerful APIs. Using Bigflow, you can easily handle data of any scale. Bigflow processes 4P+ data inside Baidu and runs about 10k jobs every day.
Machine Learning
⭐
1,046
Principles of machine learning
Spark Sklearn
⭐
1,039
(Deprecated) Scikit-learn integration package for Apache Spark
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Pixiedust
⭐
1,030
Python Helper library for Jupyter Notebooks
Adam
⭐
946
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Around Dataengineering
⭐
926
A Data Engineering & Machine Learning Knowledge Hub
Coding Now
⭐
925
Some study notes, along with eBooks I have read, video resources, and blogs I have collected over time that I consider good,…
Incubator Livy
⭐
773
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Spark Movie Lens
⭐
757
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Cdap
⭐
707
An open source framework for building data analytic applications.
Machinelearning
⭐
684
Machine learning resources, including algorithms, papers, datasets, examples, and so on.
Splink
⭐
661
Fast, accurate and scalable probabilistic data linkage using your choice of SQL backend
Devops Python Tools
⭐
659
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Pythondatascience Collections
⭐
615
The most complete collection of data analysis materials (including Python, web scraping, databases, big data, Tableau, statistics, etc.)
Dist Keras
⭐
611
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Flintrock
⭐
604
A command-line tool for launching Apache Spark clusters.
Elasticsearch Spark Recommender
⭐
603
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Python Data Science Cheatsheet
⭐
590
Python data science cheat sheets
Sparkmeasure
⭐
561
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Enterprise_gateway
⭐
558
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
Listenbrainz Server
⭐
556
Server for the ListenBrainz project, including the front-end (javascript/react) code that it serves and all of the data processing components that LB uses.
Eat_pyspark_in_10_days
⭐
534
pyspark🍒🥭 is delicious, just eat it!😋😋
Aws Glue Libs
⭐
514
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Lopq
⭐
512
Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.
Timliu Python
⭐
492
Python resource collection and open source hardware
Data Science Learning Resources
⭐
490
A collection of data science and machine learning resources that I've found helpful (I only post what I've read!)
Traceml
⭐
473
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Popmon
⭐
450
Monitor the stability of a Pandas or Spark dataframe ⚙︎
Agile_data_code_2
⭐
435
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Findspark
⭐
428
Recommendersystems
⭐
421
Recommender systems
Complete Life Cycle Of A Data Science Project
⭐
417
Complete-Life-Cycle-of-a-Data-Science-Project
Zat
⭐
393
Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Azuredatabricksbestpractices
⭐
377
Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs
Spark Ec2
⭐
367
Scripts used to set up a Spark cluster on EC2
Learning Resource
⭐
351
A list of excellent learning resources for programmers
Sparklingpandas
⭐
338
Sparkling Pandas
Elasticluster
⭐
317
Create clusters of VMs on the cloud and configure them with Ansible.
Spark Standalone Cluster On Docker
⭐
311
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡️
Tensorspark
⭐
302
TensorFlow on Spark
Sparktorch
⭐
297
Train and run Pytorch models on Apache Spark.
Learning Pyspark
⭐
294
Code repository for Learning PySpark by Packt
Sparrow
⭐
292
Sparrow scheduling platform (U.C. Berkeley).
Sparkflow
⭐
290
Easy to use library to bring Tensorflow on Apache Spark
Datacompy
⭐
289
Pandas and Spark DataFrame comparison for humans
Sk Dist
⭐
283
Distributed scikit-learn meta-estimators in PySpark
Cc Pyspark
⭐
280
Process Common Crawl data with Python and Spark
Tidb Docker Compose
⭐
278
Azure Event Hubs
⭐
277
☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Beginner_de_project
⭐
276
Beginner data engineering project - batch edition
Pyspark Style Guide
⭐
264
This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.
Openuba
⭐
264
A robust, and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Sagemaker Spark
⭐
263
A Spark library for Amazon SageMaker.
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Butterfree
⭐
249
A tool for building feature stores.
Bisheserver
⭐
242
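This system is my graduation project, titled "Design and Implementation of a Movie Recommendation System Based on User Profiles". It mainly uses Django as…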
Installations_mac_ubuntu_windows
⭐
233
Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).
Data_science_blogs
⭐
232
A repository to keep track of all the code that I end up writing for my blog posts.
Gimel
⭐
230
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Intro_ds
⭐
229
Code to accompany Mastering Data Science from PT press
Raydp
⭐
227
RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
Spark Recommendation Engine
⭐
222
Joblib Spark
⭐
221
Joblib Apache Spark Backend
Ngods Stocks
⭐
217
New Generation Opensource Data Stack Demo
Learningapachespark
⭐
192
LearningApacheSpark
Related Searches
Python Python3 (857,414)
Python Flask (16,475)
Python Dataset (14,792)
Python Pytorch (14,667)
Python Machine Learning (14,099)
Python Docker (13,757)
Python Tensorflow (13,736)
Python Command Line (13,209)
Python Deep Learning (13,092)
Python Jupyter Notebook (12,976)
1-100 of 1,076 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.