Awesome Open Source
Search results for apache spark pyspark
63 search results found
Synapseml
⭐
4,967
Simple and Distributed Machine Learning
Spark Nlp
⭐
3,578
State of the Art Natural Language Processing
Awesome Spark
⭐
1,461
A curated list of awesome Apache Spark packages and resources.
Sparkit Learn
⭐
1,054
PySpark + Scikit-learn = Sparkit-learn
Quinn
⭐
572
pyspark methods to enhance developer productivity 📣 👯 🎉
Spark Gotchas
⭐
276
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Spark Jupyter Aws
⭐
255
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Sql Data Analysis And Visualization Projects
⭐
200
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
Azure Cosmosdb Spark
⭐
194
Apache Spark Connector for Azure Cosmos DB
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Big Data Mapreduce Course
⭐
135
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Pyspark Stubs
⭐
116
Apache (Py)Spark type annotations (stub files).
Dataproc Templates
⭐
103
Dataproc templates and pipelines for solving simple in-cloud data tasks
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Pyspark Cookbook
⭐
76
PySpark Cookbook, published by Packt
Jupyterlab Sparkmonitor
⭐
72
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Learn By Examples
⭐
72
Real-world Spark pipelines examples
Mmtf Pyspark
⭐
64
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Pyspark Twitter Stream Mining
⭐
63
Real-time Machine Learning with Apache Spark on Twitter Public Stream
Apachespark
⭐
59
This repository teaches Databricks concepts through examples. It covers the important topics a data engineer encounters in practice, using PySpark and Spark SQL for development. The course ends with a few case studies.
Pyspark Setup Guide
⭐
54
A guide for setting up Spark + PySpark under Ubuntu Linux
Mmtf Workshop 2018
⭐
53
Structural Bioinformatics Training Workshop & Hackathon 2018
Spark Hive Udf
⭐
47
Example project showing how to use Hive UDFs in Apache Spark
Sparkora
⭐
46
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Pyjaws
⭐
36
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
Dlsa
⭐
33
Distributed least squares approximation (dlsa) implemented with Apache Spark
Spark Twitter Sentiment Analysis
⭐
33
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Docker Pyspark
⭐
28
Docker image of Apache Spark with its Python interface, pyspark.
Isarn Sketches Spark
⭐
27
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Pyspark Asyncactions
⭐
26
Asynchronous actions for PySpark
Odsc_india_2018
⭐
26
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Amazon Emr Vscode Toolkit
⭐
25
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Detecting Malicious Url Machine Learning
⭐
23
Spark For Data Engineers
⭐
22
Apache Spark for data engineers
Spark3d
⭐
22
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Apache Spark Docker
⭐
21
Dockerizing an Apache Spark Standalone Cluster
Spark Sframe
⭐
19
This project contains the code to translate between Apache Spark and SFrame.
Covid 19 Data Engineering Pipeline
⭐
19
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
Bigdata Workshop Es
⭐
17
Big Data workshop in Spanish
Admml
⭐
17
ADMM based Scalable Machine Learning on Spark
Setup Spark
⭐
17
✨ Set up Apache Spark in GitHub Actions workflows
Sparklanes
⭐
16
A lightweight data processing framework for Apache Spark
Spark Fits
⭐
15
FITS data source for Spark SQL and DataFrames
Live_log_analyzer_spark
⭐
14
Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Bigdata Spark
⭐
12
BerkeleyX: CS100.1x, Introduction to Big Data with Apache Spark
Sparkql
⭐
10
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
Sparkitecture
⭐
9
A collection of “cookbook-style” scripts for simplifying data engineering and machine learning in Apache Spark.
Xmltocsv_stackexchange
⭐
8
XML to CSV converter for large files using Apache Spark
Traffic Data Analysis With Apache Spark Based On Mobile Robot Data
⭐
7
Mobile robot data were analyzed with Apache Spark to produce five statistics: travel time, waiting time, average speed, occupancy, and density.
Spark Tutorials
⭐
6
PySpark notebooks to learn Apache Spark (WIP)
Databrickstraining
⭐
6
Repository for Microsoft Databricks Training Events - Hosted by BlueGranite
Pyspark_pandas
⭐
6
Pyspark + pandas. This may get merged into the SparklingPandas project.
Pyspark Connectors
⭐
6
Spark Structured Streaming Kafka
⭐
5
Spark Structured Streaming + Kafka + Delta pipeline.
Sms Spam Filter Using Hortonworks
⭐
5
Build Spam Filter Model on HDP using Watson Studio Local
Dsr Spark Appliedml
⭐
5
DSR Class - Applied Machine Learning with Apache Spark
Docker Spark Anaconda
⭐
5
Spark and Anaconda in Docker
Apachespark Pyspark 2023
⭐
5
PySpark is a Python library for distributed data processing that lets you process large volumes of data on clusters using the Apache Spark framework, offering high performance and an integrated set of tools for large-scale data analysis and handling.
Spark Streaming In Python
⭐
5
Apache Spark 3 - Structured Streaming Course Material
Pyspark Docker
⭐
5
PySpark in Docker Containers
Spark For Dummies
⭐
5
Mastering Spark 2 from the very beginning
Machine Learning Pipeline Lr Pyspark
⭐
5
Power Plant ML Pipeline Application - Apache Spark
Copyright 2018-2024 Awesome Open Source. All rights reserved.