Awesome Open Source
Search results for pipeline spark
70 search results found
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Cube Studio
⭐
1,710
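Cube Studio is an open-source, cloud-native, one-stop machine learning/deep learning AI platform; supports SSO login, multi-tenancy/multiple project groups, data asset ...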
Mleap
⭐
1,479
MLeap: Deploy ML Pipelines to Production
Digandburied
⭐
645
Digging holes and filling them in
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Keystone
⭐
472
Simplifying robust end-to-end machine learning on Apache Spark.
Sparkflow
⭐
301
Easy to use library to bring Tensorflow on Apache Spark
Koober
⭐
301
Sparktorch
⭐
297
Train and run Pytorch models on Apache Spark.
Big Data Rosetta Code
⭐
283
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Butterfree
⭐
269
A tool for building feature stores.
Jpmml Sparkml
⭐
265
Java library and command-line application for converting Apache Spark ML pipelines to PMML
Hydro Serving
⭐
248
MLOps Platform
Whylogs Java
⭐
179
Profile and monitor your ML data pipeline end-to-end
Setl
⭐
177
A simple Spark-powered ETL framework that just works 🍺
Envelope
⭐
133
Build configuration-driven ETL pipelines on Apache Spark
Spark Nlp Models
⭐
100
Models and Pipelines for the Spark NLP library
Pyspark2pmml
⭐
93
Python library for converting Apache Spark ML pipelines to PMML
Qstreaming
⭐
89
A simplified, lightweight ETL pipeline framework for building stream/batch processing applications on top of Apache Spark
Smart Data Lake
⭐
87
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Mleap
⭐
76
MLeap allows for easily putting Spark ML pipelines into production
Data Engineering Nanodegree
⭐
76
Projects done in the Data Engineering Nanodegree by Udacity.com
Learn By Examples
⭐
72
Real-world Spark pipelines examples
Pipeline
⭐
68
Complete Pipeline Training at Big Data Scala By the Bay
Jgit Spark Connector
⭐
67
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Sparklingml
⭐
65
Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python)
Data Processing Pipeline
⭐
59
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra
Datapipeline
⭐
57
Real time stock data pipeline --play with Kafka, Cassandra, Spark, Redis, Node.js, Zookeeper
Data Stream Development With Apache Spark Kafka And Spring Boot
⭐
54
Data Stream Development with Apache Spark, Kafka and Spring Boot by Packt Publishing
Lighthouse
⭐
54
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
Pravda Ml
⭐
52
This project is used to capture machine learning pipelines created on top of Spark as OK
Bpmn.ai
⭐
49
Machine learning around business processes
Aardpfark
⭐
47
A library for exporting Spark ML models and pipelines to PFA
Sagemaker Sparkml Serving Container
⭐
46
This code is used to build & run a Docker container for performing predictions against a Spark ML Pipeline.
Spark Ml Serving
⭐
44
Spark ML Lib serving library
Trembita
⭐
43
Model complex data transformation pipelines easily
Sparkplug
⭐
42
A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster.
Hyperdrive
⭐
41
Extensible streaming ingestion pipeline on top of Apache Spark
Sparkxgb
⭐
40
R interface for XGBoost on Spark
Deep Learning Pyspark
⭐
40
Deep Learning with Apache Spark and Deep Cognition
Streamliner Starter
⭐
33
Starter project for building MemSQL Streamliner Pipelines
Spark Flow
⭐
32
Library for organizing batch processing pipelines in Apache Spark
Spark Ai
⭐
31
Toolbox for building Generative AI applications on top of Apache Spark.
Sparkpipe Core
⭐
30
Modular, non-linear pipeline framework for Spark
Basin
⭐
29
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Data Engineer Nanodegree Projects Udacity
⭐
27
Projects done in the Data Engineer Nanodegree Program by Udacity.com
Azure Synapse Analytics Ga Content Packs
⭐
26
Readiness content packs for Azure Synapse Analytics features released at GA.
Kraps Haskell
⭐
26
Experimental Haskell bindings to Spark Datasets and DataFrames
Odsc_india_2018
⭐
26
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Spark Examples
⭐
26
Spark pipelines that correspond to a series of Dataflow examples.
Spark Featureselection
⭐
24
Feature selection methods as Spark MLlib Pipelines
Spark Intro Ml Pipeline Workshop
⭐
23
A simple introduction to using spark ml pipelines
Streamliner Examples
⭐
23
Example code for building your own MemSQL Streamliner Pipelines
Suim
⭐
23
Analytic UIMA pipelines using Spark
Mleap
⭐
23
R Interface to MLeap
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Click Through Rate Prediction
⭐
21
Kaggle's click through rate prediction with Spark Pipeline API
Analysis Pipelines
⭐
20
Enables data scientists to compose pipelines of analysis which consist of data manipulation, exploratory analysis & reporting, as well as modeling steps. Data scientists can use tools of their choice through an R interface, and compose interoperable pipelines between R, Spark, and Python.
Pyspark_dl_pipeline
⭐
17
Real Time Stock Analyzer
⭐
16
Bigdata Pipeline
Sparklanes
⭐
16
A lightweight data processing framework for Apache Spark
Peapod
⭐
15
Dependency and data pipeline management framework for Spark and Scala
Data Factory R Server Apache Spark Pipeline
⭐
15
This tutorial highlights how to build a scalable machine-learning based data processing pipeline using Microsoft R Server with Apache Spark utilizing Azure Data Factory (ADF)
Graphsense Transformation
⭐
15
GraphSense Transformation Pipeline
Mleap Docs
⭐
14
Documentation for MLeap
Project Fortis Pipeline
⭐
14
Project Fortis is a data ingestion, analysis and visualization pipeline.
Pipeasy Spark
⭐
14
An easy way to define a preprocessing data pipeline (similar to sklearn-pandas but for Spark ML)
Virapipe
⭐
13
ViraPipe is an Apache Spark-based scalable pipeline for metagenome analysis from NGS read data
Nyc_taxi_pipeline
⭐
12
Design/Implement stream/batch architecture on NYC taxi data | #DE
Spark Ranking Algorithms
⭐
11
Ranking algorithms for Spark machine learning pipeline
Deepvariant On Spark
⭐
11
DeepVariant-on-Spark is a germline short variant calling pipeline that runs Google DeepVariant on Apache Spark at scale.
Pyspark_pipes
⭐
11
Helper functions for building complex Spark ML pipelines
Diem
⭐
10
DIEM Data Integration Engine Multipurpose
Stackexchange Spark Scala Analyser
⭐
10
Still in Beta
Pycodehash
⭐
9
PyCodeHash is a generic data and code hashing library that facilitates downstream caching.
Spark Pipeline
⭐
9
Machine learning pipeline for Apache Spark
Project Fortis Spark
⭐
9
A repository for all spark jobs running on fortis
Spark Kaggle
⭐
9
Spark in Kaggle competitions
Bpmn.ai Ui
⭐
9
Easy setup and control of your bpmn.ai data flow
Spark Pipeline
⭐
9
Example End-to-End Data Pipeline with Apache Spark from Data Analysis to Data Product
Voluseg
⭐
8
pipeline for volumetric cell segmentation
Insight Zone Defense
⭐
8
One-click automation of big data pipeline with monitoring
Apache Spark Etl Pipeline Example
⭐
8
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computing.
Airflow
⭐
8
This set of code and instructions is meant to instantiate a compiled environment from a set of Docker images (Airflow webserver, Airflow scheduler, PostgreSQL, PySpark), with a data pipeline that consumes data from a weather API, processes it with PySpark, and stores it in PostgreSQL
Spark Vs
⭐
7
Structure-Based Virtual Screening in Spark
Spark Streaming Twitter
⭐
7
Building a pipeline to process real-time data using Spark and MongoDB.
Streaming Pipeline
⭐
7
A real-time text classification based on Kafka and Spark.
Apache Beam Example
⭐
7
Apache Beam Example, China open source community
Pyspark Boilerplate Mehdio
⭐
7
Pyspark boilerplate for running prod ready data pipeline
Openmrs Etl
⭐
7
openmrs - mysql - debezium - kafka - spark - scala
Aws Etl
⭐
7
This is an ETL application on AWS that uses general open sales and customer data, which you can find here: https://github.com/camposvinicius/data/blob/main/A (a zipped file with some .csv files inside, to which we will apply transformations).
Morenlp
⭐
6
Capabilities of StanfordNLP and OpenNLP on Spark
Arc Jupyter
⭐
6
Arc-Jupyter is an interactive Jupyter Notebook extension for building Arc data pipelines via Jupyter Notebooks.
Sparklyr2pmml
⭐
6
R library for converting Apache Spark ML pipelines to PMML
Insight18b Sparksql Array
⭐
6
Repo of my Insight project. Extended SparkSQL functionality internally and tested its performance against UDFs. Additionally, implemented a batch pipeline HDFS->SparkSQL->MySQL->Flask and a streaming pipeline Kafka->Spark Streaming->MySQL->Flask to analyze Amazon User Data.
Sim
⭐
6
A set of helpers to build Apache Spark pipelines for Neuroimaging
Karps
⭐
5
Experimental Haskell bindings to Spark Datasets and DataFrames
Machine Learning Pipeline Lr Pyspark
⭐
5
Power Plant ML Pipeline Application - Apache Spark
Airflow Dags
⭐
5
Copyright 2018-2024 Awesome Open Source. All rights reserved.