Awesome Open Source
Search results for python spark
733 search results found
Flytekit
⭐
175
Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started and learn and highly extensible.
Spark Ml Streaming
⭐
175
Visualize streaming machine learning in Spark
Visions
⭐
166
Type System for Data Analysis in Python
Avenir
⭐
164
Set of Machine Learning and Stochastic Optimization tools based on Hadoop, Spark and Storm https://pkghosh.wordpress.com/
Cape Dataframes
⭐
162
Privacy transformations on Spark and Pandas dataframes backed by a simple policy language.
Juicy Bigdata
⭐
162
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course of the big data technology track 🎉🎉
Qb
⭐
160
QANTA Quiz Bowl AI
Seqr
⭐
158
web-based analysis tool for rare disease genomics
Lakehouse Engine
⭐
154
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Aztk
⭐
152
AZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure
Spark Extension
⭐
152
A library that provides useful extensions to Apache Spark and PySpark.
Ipython Spark Docker
⭐
151
Data Algorithms With Spark
⭐
151
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Geopyspark
⭐
151
GeoTrellis for PySpark
Tuhi
⭐
144
An application to access Wacom SmartPad devices
Albedo
⭐
142
A recommender system for discovering GitHub repos, built with Apache Spark
Sparknotebook
⭐
142
An example of running Apache Spark using Scala in ipython notebook
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Sqlflow
⭐
132
SQLflow is developed in Python and supports Spark as the underlying distributed computing engine. A unified set of configuration files drives batch processing, stream processing, and REST service development.
Spark Ar Creators
⭐
130
List of 9500 (and counting) Spark AR Creators. Open an issue or contact me if you want to be added.❤️
Handyspark
⭐
129
HandySpark - bringing pandas-like capabilities to Spark dataframes
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Trace Analysis
⭐
128
Scripts to analyze Spark's performance
Python Bigdata
⭐
128
Data science and Big Data with Python
Easy_sql
⭐
126
A library developed to ease the data ETL development process.
Emr Serverless Samples
⭐
124
Example code for running Spark and Hive jobs on EMR Serverless.
Data Science Tutorials
⭐
124
Python Tutorials for Data Science
Pyspark Stubs
⭐
116
Apache (Py)Spark type annotations (stub files).
Spark Df Profiling
⭐
115
Create HTML profiling reports from Apache Spark DataFrames
Movalytics Data Warehouse
⭐
114
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Coffeehack
⭐
113
Hack of our Jura coffee machine
Spark Knn Recommender
⭐
113
Item and User-based KNN recommendation algorithms using PySpark
Ml Resource
⭐
110
A concise resource repository for machine learning
Spark Website
⭐
109
Apache Spark Website
Yelper_recommendation_system
⭐
108
Yelper recommendation system
Webcrawlerforonlineinflation
⭐
107
Price Crawler - Tracking Price Inflation
De Zoomcamp Ui
⭐
107
🎨 UI for the Free Data Engineering Zoomcamp 2023 Course provided by DataTalksClub
Frank Kanes Taming Big Data With Apache Spark And Python
⭐
106
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Docs
⭐
102
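Source code for "Data Collection: From Getting Started to Giving Up". Contents: introduction to web crawlers, the job market, crawler engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK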
Pilotscope
⭐
102
PilotScope is a middleware to bridge the gaps of deploying AI4DB (Artificial Intelligence for Databases) algorithms into actual database systems.
Recordflux
⭐
100
Formal specification and generation of verifiable binary parsers, message generators and protocol state machines
Nd027 C3 Data Lakes With Spark
⭐
99
Til
⭐
99
Today I Learned. Daily commit.
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Streamify
⭐
97
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
Pyspark2pmml
⭐
93
Python library for converting Apache Spark ML pipelines to PMML
Relation_extraction
⭐
93
Relation Extraction using Deep Learning (CNN)
Stream4flow
⭐
93
A framework for the real-time network traffic analysis based on world-leading technologies for distributed stream processing, network traffic monitoring, and visualization.
Big Data Engineering Coursera Yandex
⭐
91
Big Data for Data Engineers Coursera Specialization from Yandex
Maggy
⭐
88
Distribution transparent Machine Learning experiments on Apache Spark
Qbox Blog Code
⭐
88
Code reference from my Qbox blog posts.
Phrase At Scale
⭐
84
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Dlflow
⭐
84
DLFlow is a deep learning framework.
Pyspark Cassandra
⭐
81
PySpark Cassandra brings back the fun in working with Cassandra data in PySpark.
Spark
⭐
81
Publication quality NGS track plotting
Spark_python_ml_examples
⭐
81
Spark 2.0 Python Machine Learning examples
Tiledb Vcf
⭐
79
Efficient variant-call data storage and retrieval library using the TileDB storage library.
Lehar
⭐
79
Visualize data using relative ordering
Spark Ui Proxy
⭐
78
Lightweight proxy to expose the UI of an Apache Spark cluster that is behind a firewall
Deeplearningreading
⭐
78
Deep Learning and Machine Learning mini-projects. Current Project: Deepmind Attentive Reader (rc-data)
Twitter Sentiment Analysis Using Spark Streaming And Kafka
⭐
78
Twitter Sentiment Analysis using Spark and Kafka
Cqu_bigdata
⭐
77
Labs and slides (PPT) for the "Big Data Course Group" at the College of Computer Science, Chongqing University
Resilient Ml Research Platform
⭐
76
Sparkadmm
⭐
75
Generic Implementation of Consensus ADMM over Spark
Holoclean Legacy Deprecated
⭐
74
A Machine Learning System for Data Enrichment.
Python Spark Streaming
⭐
73
Luigi Warehouse
⭐
73
A luigi powered analytics / warehouse stack
Labs
⭐
73
Research on distributed system
Algobox
⭐
72
Open Source algorithmic trading platform in Java / Python
Sit742
⭐
72
SIT742: Modern Data Science
Google Finance Stock Data Analysis
⭐
69
Developed a high performance data processing platform using Apache Kafka, Apache Cassandra, and Apache Spark to analyze stock price and related stock tweets sentiment.
Sparksteps
⭐
68
⭐ CLI tool to launch Spark jobs on AWS EMR
Jgit Spark Connector
⭐
67
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Deepsentiment
⭐
67
Speech Emotion Recognition using FFT and SVM
Pyspark Cassandra
⭐
67
pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Airflow Spark Operator Plugin
⭐
66
A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator
Fink Broker
⭐
66
Astronomy Broker based on Apache Spark
Sparklingml
⭐
65
Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python)
Mltoolkits
⭐
65
learningOrchestra is a distributed Machine Learning integration tool that facilitates and streamlines iterative processes in a Data Science project.
Airflow Spark
⭐
64
Docker with Airflow and Spark standalone cluster
Pythom
⭐
64
Code supporting Data Science articles at The Marketing Technologist, Floryn Tech Blog, and Pythom.nl
Apache Spark Hands On
⭐
64
Educational notes, hands-on problems w/ solutions for the Hadoop ecosystem
Pypmml
⭐
64
Python PMML scoring library
Pyspark_dist_explore
⭐
64
Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights in your Spark DataFrames.
Pyspark Twitter Stream Mining
⭐
63
Real-time Machine Learning with Apache Spark on Twitter Public Stream
Frovedis
⭐
62
Framework of vectorized and distributed data analytics
Panbin Python
⭐
61
Python open-source projects
Python_master_courses
⭐
61
Life is short, I use Python
Pypardis
⭐
61
A parallel distributed implementation of DBSCAN on Spark using Python
Sparkly
⭐
60
Helpers & syntactic sugar for PySpark.
Coursework
⭐
59
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. At the end of the course we also cover a few case studies.
Mylearningnotes
⭐
58
Because it's never too late to start taking notes and make them public...
Datapipeline
⭐
57
Real-time stock data pipeline: play with Kafka, Cassandra, Spark, Redis, Node.js, Zookeeper
Learning
⭐
57
Walkthrough notebooks for Deep Learning, Machine Learning, Reinforcement Learning, Spark, Statistics, Algorithms, Scala, Python
Serverless Spark Workshop
⭐
56
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
Pybigdata
⭐
56
Working with various big data components using Python
Rl Bakery
⭐
55
RL-Bakery makes it easy to build production, large scale, batch Deep Reinforcement Learning applications.
Onetl
⭐
55
One ETL tool to rule them all
Spark Examples
⭐
54
RAPIDS Spark examples
101-200 of 733 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.