Awesome Open Source
Search results for apache spark
583 search results found
Big Data Mapreduce Course
⭐
135
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Spark Tsne
⭐
134
Distributed t-SNE via Apache Spark
Envelope
⭐
133
Build configuration-driven ETL pipelines on Apache Spark
Hdfs_fdw
⭐
131
PostgreSQL foreign data wrapper for HDFS
Cobrix
⭐
131
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Mastering Apache Spark
⭐
130
This is the repository for my YouTube course on end-to-end Apache Spark on the AIEngineering YouTube channel
Griffon Vm
⭐
129
Griffon Data Science Virtual Machine
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Eclairjs
⭐
122
Main EclairJS Repository
Example Spark Kafka
⭐
118
Apache Spark and Apache Kafka integration example
Pyspark Stubs
⭐
116
Apache (Py)Spark type annotations (stub files).
Drizzle Spark
⭐
113
Drizzle integration with Apache Spark
Spark Atlas Connector
⭐
112
A Spark Atlas connector to track data lineage in Apache Atlas
Bunsen
⭐
110
Explore, transform, and analyze FHIR data with Apache Spark
Distributed Dataset
⭐
107
A distributed data processing framework in Haskell.
Frank Kanes Taming Big Data With Apache Spark And Python
⭐
106
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Spark Tpc Ds Performance Test
⭐
104
Use the TPC-DS benchmark to test Spark SQL performance
Pulsar Spark
⭐
103
Spark Connector to read and write with Pulsar
Dataproc Templates
⭐
103
Dataproc templates and pipelines for solving simple in-cloud data tasks
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Eclairjs Nashorn
⭐
94
JavaScript API for Apache Spark
Learning Spark
⭐
94
Practical examples of using Apache Spark in several different use cases
Mongo Spark
⭐
93
Example application showing how to use the mongo-hadoop connector with Spark
Spark Dynamodb
⭐
90
DynamoDB data source for Apache Spark
Qstreaming
⭐
89
A simplified, lightweight ETL pipeline framework for building stream/batch processing applications on top of Apache Spark
Maggy
⭐
88
Distribution-transparent machine learning experiments on Apache Spark
Spark States
⭐
88
Custom state store providers for Apache Spark
Qbox Blog Code
⭐
88
Code reference from my Qbox blog posts.
Sparkprojecttemplate.g8
⭐
88
Template for Spark Projects
Sparkcube
⭐
87
SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark.
Splash
⭐
86
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Cuesheet
⭐
85
A framework for writing Spark 2.x applications in a pretty way
Flowman
⭐
85
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Spork
⭐
84
Pig on Apache Spark
Spark.samples
⭐
81
Tutorials and samples that show you how to get the most out of IBM Analytics for Apache Spark
Fasttrackml
⭐
77
Experiment tracking server focused on speed and scalability
Docker Spark
⭐
77
🚢 Docker image for Apache Spark
Pyspark Cookbook
⭐
76
PySpark Cookbook, published by Packt
Spark Examples
⭐
75
Apache Spark jobs such as Principal Coordinate Analysis.
Euphoria
⭐
74
Euphoria is an open-source Java API for creating unified big-data processing flows. It provides an engine-independent programming model that can express both batch and stream transformations.
Jupyterlab Sparkmonitor
⭐
72
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Lighter
⭐
72
REST API for Apache Spark on K8S or YARN
Openstreetmap_h3
⭐
72
High-performance data loader for OSM planet dumps. Transforms OpenStreetMap World/Region PBF dumps into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 regions, and/or into ArrowIPC/Parquet dumps
Learn By Examples
⭐
72
Real-world Spark pipelines examples
Avocado
⭐
71
A Variant Caller, Distributed. Apache 2 licensed.
Scalaapacheaccesslogparser
⭐
71
An Apache access log parser written in Scala
Cleanframes
⭐
70
Type-class-based data cleansing library for Apache Spark SQL
Reforest
⭐
69
Random Forests in Apache Spark
Sparksql For Hbase
⭐
66
Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers
Fink Broker
⭐
66
Astronomy Broker based on Apache Spark
Spark
⭐
65
Open Source D-APM (Data-Application Performance Monitoring) for Apache Spark
Spark Lp
⭐
64
Distributed Linear Programming Solver on top of Apache Spark
Mmtf Pyspark
⭐
64
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Net.jgp.labs.spark
⭐
63
Apache Spark examples exclusively in Java
Lambda Arch Spark
⭐
63
Pyspark Twitter Stream Mining
⭐
63
Real-time Machine Learning with Apache Spark on Twitter Public Stream
Spark Overflow
⭐
62
A stack overflow for Apache Spark
Spark Etl
⭐
62
Apache Spark based ETL Engine
Net.jgp.books.spark.ch01
⭐
61
Spark in Action, 2nd edition - chapter 1 - Introduction
Apachespark
⭐
59
This repository will help you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. We will be using PySpark and Spark SQL for development. At the end of the course we also cover a few case studies.
Osm Parquetizer
⭐
58
A converter for the OSM PBFs to Parquet files
Spark Records
⭐
58
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Serverless Spark Workshop
⭐
56
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
Kafka Streaming Click Analysis
⭐
56
Use Kafka and Apache Spark streaming to perform click stream analytics
Lighthouse
⭐
54
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
Pyspark Setup Guide
⭐
54
A guide for setting up Spark + PySpark under Ubuntu Linux
Sparker
⭐
53
SparkER: an Entity Resolution framework for Apache Spark
Awesome Pulsar
⭐
53
A curated list of Pulsar tools, integrations and resources.
Mmtf Workshop 2018
⭐
53
Structural Bioinformatics Training Workshop & Hackathon 2018
Connected Component
⭐
52
MapReduce implementation of connected components on Apache Spark
Spark.jl
⭐
52
Spark in Julia for MIT 6.824
Spark Scala Maven Example
⭐
50
Example Maven configuration for a Spark, Scala project
Spark Tpcds Datagen
⭐
50
All the things about TPC-DS in Apache Spark
Spark Json Schema
⭐
50
JSON schema parser for Apache Spark
Prosparkstreaming
⭐
49
Code used in "Pro Spark Streaming: The Zen of Real-time Analytics using Apache Spark" published by Apress Publishing.
Spark Nkp
⭐
47
Natural Korean Processor for Apache Spark
Aardpfark
⭐
47
A library for exporting Spark ML models and pipelines to PFA
Spark Hive Udf
⭐
47
Example project showing how to use Hive UDFs in Apache Spark
Spark Activator
⭐
47
Spark Streaming with Scala and Akka Activator template
Sparkora
⭐
46
Powerful, rapid, automatic EDA and feature engineering library with a very easy-to-use API 🌟
Spark Tda
⭐
46
SparkTDA is a package for Apache Spark providing Topological Data Analysis functionality.
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, plus SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Liquidsvm
⭐
45
Support vector machines (SVMs) and related kernel-based learning algorithms are a well-known class of machine learning algorithms, for non-parametric classification and regression. liquidSVM is an implementation of SVMs whose key features are: fully integrated hyper-parameter selection, extreme speed on both small and large data sets, full flexibility for experts, and inclusion of a variety of different learning scenarios: multi-class classification, ROC, and Neyman-Pearson learning, and least-s
Geospark
⭐
45
Bring sf to Spark in production
Spark Es
⭐
44
ElasticSearch integration for Apache Spark
Vagrant Spark Zeppelin
⭐
43
Vagrant, Apache Spark and Apache Zeppelin VM for teaching
Spark Dataframe Introduction
⭐
42
This is an introduction of Apache Spark DataFrames.
Sparkplug
⭐
42
A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster.
Hyperdrive
⭐
41
Extensible streaming ingestion pipeline on top of Apache Spark
Stark
⭐
41
A framework for Spatio-Temporal Data Analytics on Spark
Ansible Spark Cluster
⭐
41
Ansible roles to install a Spark Standalone cluster (HDFS/Spark/Jupyter Notebook) or an Ambari-based Spark cluster
Sparkxgb
⭐
40
R interface for XGBoost on Spark
Deep Learning Pyspark
⭐
40
Deep Learning with Apache Spark and Deep Cognition
Dbscan Spark
⭐
40
DBSCAN implementation using Apache Spark
Spark Examples
⭐
40
Spark examples
Jpmml Sparkml Lightgbm
⭐
39
JPMML-SparkML plugin for converting LightGBM-Spark models to PMML
Spark With Python My Learning Notes
⭐
39
ETL pipeline using pyspark (Spark - Python)
Financial Market Data Analysis
⭐
39
Real-Time Financial Market Data Processing and Prediction application
Decision Tree Visualization Spark
⭐
39
🌲 Decision Tree Visualization for Apache Spark
Dblink
⭐
38
Distributed Bayesian Entity Resolution in Apache Spark
Related Searches
Scala Apache Spark (497)
101-200 of 583 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.