Awesome Open Source
Search results for big data apache spark
apache-spark
big-data
52 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Data Engineer Handbook
⭐
5,650
This is a repo with links to everything you'd ever want to learn about data engineering
Hudi
⭐
5,064
Upserts, Deletes And Incremental Processing on Big Data.
Synapseml
⭐
4,967
Simple and Distributed Machine Learning
Koalas
⭐
3,291
Koalas: pandas API on Apache Spark
Spark
⭐
1,963
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Spark Doc Zh
⭐
1,186
Chinese edition of the official Apache Spark documentation
Mobius
⭐
937
C# and F# language binding and extensions to Apache Spark
Tispark
⭐
872
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Incubator Livy
⭐
840
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Spark Rapids
⭐
619
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Spline
⭐
553
Data Lineage Tracking And Visualization Solution
Morpheus
⭐
329
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Parquet Dotnet
⭐
319
🏐 Apache Parquet for modern .NET
Mist
⭐
303
Serverless proxy for Spark cluster
Data Accelerator
⭐
293
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Succinct
⭐
239
Enabling queries on compressed data.
Azure Event Hubs Spark
⭐
225
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Spark.jl
⭐
180
Julia binding for Apache Spark
Bigdata Playground
⭐
154
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Spark On Lambda
⭐
144
Apache Spark on AWS Lambda
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Hydrograph
⭐
138
A visual ETL development and debugging tool for big data
Big Data Mapreduce Course
⭐
135
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Griffon Vm
⭐
129
Griffon Data Science Virtual Machine
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Frank Kanes Taming Big Data With Apache Spark And Python
⭐
106
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Splash
⭐
86
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Flowman
⭐
85
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Euphoria
⭐
74
Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine-independent programming model that can express both batch and stream transformations.
Cleanframes
⭐
70
type-class based data cleansing library for Apache Spark SQL
Spark
⭐
65
Open Source D-APM (Data-Application Performance Monitoring) for Apache Spark
Mmtf Pyspark
⭐
64
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Osm Parquetizer
⭐
58
A converter for the OSM PBFs to Parquet files
Spark Records
⭐
58
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Serverless Spark Workshop
⭐
56
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
Mmtf Workshop 2018
⭐
53
Structural Bioinformatics Training Workshop & Hackathon 2018
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Awesome Tools
⭐
32
curated list of awesome tools and libraries for specific domains
Baskerville
⭐
30
Security Analytics Engine - Anomaly Detection in Web Traffic
Detecting Malicious Url Machine Learning
⭐
23
Sparkucx
⭐
23
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Learn Hadoop And Spark
⭐
22
This repository gathers a curated list of resources for learning Hadoop for free.
Awesome Sparklyr
⭐
22
An awesome sparklyr related package collection
Spark.sas7bdat
⭐
21
Read in SAS data in parallel into Apache Spark
Mmtf Spark
⭐
19
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Sparkprogramminginscala
⭐
18
Apache Spark Course Material
Dspbench
⭐
17
a suite of benchmark applications for distributed data stream processing systems
Spark Streaming Monitoring With Lightning
⭐
15
Plot live stats as a graph from an Apache Spark application using Lightning-viz
Bigdata Projects
⭐
14
Student projects in Big Data field.
Bigdata Spark
⭐
12
BerkeleyX: CS100.1x, Introduction to Big Data with Apache Spark
Pybda
⭐
9
💻💻💻 A commandline tool for analysis of big biological data sets for distributed HPC clusters.
Spark Gp
⭐
9
Gaussian Process Classification and Regression on Apache Spark
Blaspark
⭐
8
Distributed linear algebra operations using Apache Spark
K8s Bigdata
⭐
8
Apache Spark with HDFS cluster within Kubernetes
Bigdata
⭐
8
Coding practice and research on the component technologies of big data pipelines
Ma Inf 4223 Dbda Lab
⭐
7
Repository for Lab “Distributed Big Data Analytics” (MA-INF 4223), University of Bonn
Traffic Data Analysis With Apache Spark Based On Mobile Robot Data
⭐
7
Mobile robot data were analyzed with Apache Spark to produce five statistics: travel time, waiting time, average speed, occupancy, and density.
Itmo_technologies_and_infrastructure_for_big_data
⭐
7
📊 My smth from ITMO; Dis - BiggusDatus
Bigdata And Machine Learning
⭐
6
Basics of Big Data and Machine Learning using Apache Spark and Scala
Spark Databricks
⭐
6
🔥 Master Apache Spark & Databricks! Dive into a world of big data with exclusive insights from Udemy courses, personal notes, and practical guides. Whether you're starting out or scaling new heights in data engineering, this is your ultimate resource hub! 🌟🚀
Bigdata
⭐
6
Big data study notes for beginners: learning roadmap and technology roadmap
Spark Most Frequent Word Counter
⭐
6
This Java program counts the most frequent word in a given file using Apache Spark
Samba
⭐
6
SAMbA: Extending Apache Spark for Scientific Computational Experiments
Genespark
⭐
5
geneSpark is a bioinformatics software program written in Python and Apache Spark for big data epigenetic histone modification ChIP-seq analysis.
Delta Dotnet
⭐
5
Delta Lake native library for .NET
Spark Streaming In Python
⭐
5
Apache Spark 3 - Structured Streaming Course Material
Related Searches
Python Big Data (588)
Spark Big Data (570)
Java Big Data (533)
Scala Apache Spark (497)
Copyright 2018-2024 Awesome Open Source. All rights reserved.