Awesome Open Source
Search results for spark big data
big-data
spark
186 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Data Science Ipython Notebooks
⭐
25,668
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Bigdata Notes
⭐
14,872
A beginner's guide to big data ⭐
Cookbook
⭐
12,557
The Data Engineering Cookbook
God Of Bigdata
⭐
8,483
Focused on big data learning and interviews; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive.
Delta
⭐
6,656
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
H2o 3
⭐
6,618
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Zeppelin
⭐
6,259
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Risingwave
⭐
5,799
The distributed streaming database. Engineered to offer the simplest and most cost-efficient way for stream processing and management.
Synapseml
⭐
4,967
Simple and Distributed Machine Learning
Sql Generator
⭐
3,346
🔨 Generate structured SQL statements from JSON. Built with Vue3 + TypeScript + Vite + Ant Design + MonacoEditor; a simple project (heavy on logic, light on UI) that is good for practice.
Koalas
⭐
3,291
Koalas: pandas API on Apache Spark
Dpark
⭐
2,637
Python clone of Spark, a MapReduce-like framework in Python
Bigdataguide
⭐
2,355
Learn big data from scratch; includes study videos and interview materials for each stage of big data learning
Lakesoul
⭐
2,248
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Spark
⭐
1,963
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Gaffer
⭐
1,724
A large-scale entity and relation database supporting aggregation of properties
Ytsaurus
⭐
1,694
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Incubator Paimon
⭐
1,647
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Optimus
⭐
1,446
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Carbondata
⭐
1,401
High performance data store solution
Bigdata Interview
⭐
1,397
🎯 🌟 [Big data interview questions] Big data interview questions I collected online, along with summaries of my own answers. Currently includes Hadoop
Bigdata Growth
⭐
1,256
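A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.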
Spark Doc Zh
⭐
1,186
Chinese edition of the official Apache Spark documentation
Utils4s
⭐
1,033
A collection of test cases and related materials gathered while using Scala and Spark
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Sparkling Water
⭐
957
Sparkling Water provides H2O functionality inside Spark cluster
Mobius
⭐
937
C# and F# language binding and extensions to Apache Spark
Coding Now
⭐
925
Study notes, along with eBooks I have read, video resources, and blogs I have collected and consider good,
Tispark
⭐
872
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Incubator Livy
⭐
840
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Blaze
⭐
784
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Spark Movie Lens
⭐
757
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Incubator Celeborn
⭐
725
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Delta Sharing
⭐
654
An open protocol for secure data sharing
Wedatasphere
⭐
624
WeDataSphere is a financial grade, one-stop big data platform suite.
Spark Rapids
⭐
619
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Listenbrainz Server
⭐
613
Server for the ListenBrainz project, including the front-end (javascript/react) code that it serves and all of the data processing components that LB uses.
Onedal
⭐
584
oneAPI Data Analytics Library (oneDAL)
Spline
⭐
553
Data Lineage Tracking And Visualization Solution
Metorikku
⭐
536
A simplified, lightweight ETL Framework based on Apache Spark
Magellan
⭐
509
Geo Spatial Data Analytics on Spark
Sidekick
⭐
503
High Performance HTTP Sidecar Load Balancer
Sparklearning
⭐
451
A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.
Kotlin Spark Api
⭐
425
This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Docker Spark Cluster
⭐
413
A simple Spark standalone cluster for your testing environment purposes
Sparkler
⭐
401
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Sylph
⭐
396
Stream computing platform for bigdata
Zdh_web
⭐
379
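Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, covering data collection, scheduling, permissions, approval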
Big_data_architect_skills
⭐
353
Skills a big data architect should master
Ecommercerecommendsystem
⭐
350
Real-time big data product recommendation system. Frontend: Vue + TypeScript + ElementUI; backend: Spring + Spark
Hyperspace
⭐
334
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Every Single Day I Tldr
⭐
311
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Mist
⭐
303
Serverless proxy for Spark cluster
Data Accelerator
⭐
293
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Compass
⭐
284
Compass is a task diagnosis platform for bigdata
Big Data Rosetta Code
⭐
283
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Geni
⭐
268
A Clojure dataframe library that runs on Spark
Succinct
⭐
239
Enabling queries on compressed data.
Gimel
⭐
230
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Bigdata_docker
⭐
226
Big Data Ecosystem Docker
Azure Event Hubs Spark
⭐
225
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Bigdata
⭐
219
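A learning path for big data processing technologies (continuously updated...). Bigdata notes --> slowly but surely~ Big data technologies include offline processing, real-time processing, OLAP, and so on, such as hadoop, spark, flink, hive,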
Datacompare
⭐
195
Big data comparison and data profiling platform: low code, data comparison, and data profiling
Sparkrdma
⭐
191
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Bigdata Hub
⭐
187
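A knowledge system for data construction and big data technology, covering the mainstream frameworks hadoop, hive, spark, and flink and their related frameworks,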
Spark Notes
⭐
183
Javaorbigdata Interview
⭐
180
Interview knowledge points compiled for Java developers and big data developers
Spark.jl
⭐
180
Julia binding for Apache Spark
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Qbeast Spark
⭐
171
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Kafka Book
⭐
167
Code for the book 《Kafka技术内幕》 (Kafka Technical Internals)
Juicy Bigdata
⭐
162
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course for the big data technology track 🎉🎉
Incubator Wayang
⭐
162
Apache Wayang (incubating) is the first cross-platform data processing system.
Webank All Project
⭐
156
A collection of links to all projects that WeBank has established or participated in.
Bigdata In Practice
⭐
154
Big data practice projects: Hadoop, Spark, Kafka, Hbase, Flink.....
Lakehouse Engine
⭐
154
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Lambda Arch
⭐
151
A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra.
Data Algorithms With Spark
⭐
151
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Geopyspark
⭐
151
GeoTrellis for PySpark
Spark On Lambda
⭐
144
Apache Spark on AWS Lambda
Bigdata
⭐
142
hadoop,hbase,storm,spark,etc..
Pyspark Cheatsheet
⭐
140
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Bigdata Learning
⭐
136
Big data study notes
Big Data Mapreduce Course
⭐
135
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Sparkling Graph
⭐
134
SparklingGraph provides an easy-to-use set of features that give you the ability to process large-scale graphs using Spark and GraphX.
Hodor
⭐
130
A distributed scheduling framework supporting DAG workflow for big data and regular jobs, providing programmable job types across different languages.
Python Bigdata
⭐
128
Data science and Big Data with Python
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Xichuan_note
⭐
114
xichuan's study summary notes, covering java, spring, other common java frameworks, and big data components
Asakusafw
⭐
113
Asakusa Framework
Spark Website
⭐
109
Apache Spark Website
Bigdataclass
⭐
109
Two-day workshop that covers how to use R to interact with databases and Spark
Spark R Notebooks
⭐
109
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Clustering4ever
⭐
109
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Frank Kanes Taming Big Data With Apache Spark And Python
⭐
106
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Logisland
⭐
106
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Sift
⭐
91
Knowledge extraction from web data
Related Searches
Scala Spark (3,279)
Python Spark (2,053)
Java Spark (1,587)
Apache Spark (1,207)
Spark Hadoop (1,188)
Jupyter Notebook Spark (1,151)
Spark Kafka (985)
Spark Streaming (817)
Spark Pyspark (812)
Docker Spark (701)
1-100 of 186 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.