Awesome Open Source

Search results for spark big data

186 search results found

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

Data Science Ipython Notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Bigdata Notes ⭐ 14,872

A beginner's guide to big data ⭐

Cookbook ⭐ 12,557

The Data Engineering Cookbook

God Of Bigdata ⭐ 8,483

Focused on big data learning and interviews; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive.

Delta ⭐ 6,656

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

H2o 3 ⭐ 6,618

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Zeppelin ⭐ 6,259

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Risingwave ⭐ 5,799

The distributed streaming database. Engineered to offer the simplest and most cost-efficient way for stream processing and management.

Synapseml ⭐ 4,967

Simple and Distributed Machine Learning

Sql Generator ⭐ 3,346

🔨 Generate structured SQL statements from JSON. Built with Vue3 + TypeScript + Vite + Ant Design + MonacoEditor; a simple project (heavy on logic, light on UI) that is good for practice~

Koalas ⭐ 3,291

Koalas: pandas API on Apache Spark

Dpark ⭐ 2,637

Python clone of Spark, a MapReduce-like framework in Python

Bigdataguide ⭐ 2,355

Big data learning: learn big data from scratch, with study videos and interview materials for each stage of learning

Lakesoul ⭐ 2,248

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

Spark ⭐ 1,963

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Gaffer ⭐ 1,724

A large-scale entity and relation database supporting aggregation of properties

Ytsaurus ⭐ 1,694

YTsaurus is a scalable and fault-tolerant open-source big data platform.

Incubator Paimon ⭐ 1,647

Apache Paimon(incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.

Spark Py Notebooks ⭐ 1,515

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Optimus ⭐ 1,446

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Carbondata ⭐ 1,401

High performance data store solution

Bigdata Interview ⭐ 1,397

🎯 🌟 [Big data interview questions] Big data interview questions I collected online, along with my own answer summaries. Currently covers Hadoop

Bigdata Growth ⭐ 1,256

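A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.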

Spark Doc Zh ⭐ 1,186

Chinese edition of the official Apache Spark documentation

Utils4s ⭐ 1,033

A collection of test cases and related materials gathered while using Scala and Spark

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

Sparkling Water ⭐ 957

Sparkling Water provides H2O functionality inside Spark cluster

C# and F# language binding and extensions to Apache Spark

Coding Now ⭐ 925

Study notes, plus some eBooks I have read, video resources, and blogs I have collected and found worthwhile,

Tispark ⭐ 872

TiSpark is built for running Apache Spark on top of TiDB/TiKV

Incubator Livy ⭐ 840

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.

Spark Movie Lens ⭐ 757

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Incubator Celeborn ⭐ 725

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

Delta Sharing ⭐ 654

An open protocol for secure data sharing

Wedatasphere ⭐ 624

WeDataSphere is a financial grade, one-stop big data platform suite.

Spark Rapids ⭐ 619

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

Listenbrainz Server ⭐ 613

Server for the ListenBrainz project, including the front-end (javascript/react) code that it serves and all of the data processing components that LB uses.

oneAPI Data Analytics Library (oneDAL)

Data Lineage Tracking And Visualization Solution

Metorikku ⭐ 536

A simplified, lightweight ETL Framework based on Apache Spark

Magellan ⭐ 509

Geo Spatial Data Analytics on Spark

Sidekick ⭐ 503

High Performance HTTP Sidecar Load Balancer

Sparklearning ⭐ 451

A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.

Kotlin Spark Api ⭐ 425

This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x

Docker Spark Cluster ⭐ 413

A simple Spark standalone cluster for your testing environment purposes

Sparkler ⭐ 401

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stream computing platform for bigdata

Zdh_web ⭐ 379

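Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, covering data collection, scheduling, permissions, approval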

Big_data_architect_skills ⭐ 353

Skills a big data architect should master

Ecommercerecommendsystem ⭐ 350

Real-time big data product recommendation system. Frontend: Vue + TypeScript + ElementUI; backend: Spring + Spark

Hyperspace ⭐ 334

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Every Single Day I Tldr ⭐ 311

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Serverless proxy for Spark cluster

Data Accelerator ⭐ 293

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.

Compass ⭐ 284

Compass is a task diagnosis platform for bigdata

Big Data Rosetta Code ⭐ 283

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

A Clojure dataframe library that runs on Spark

Succinct ⭐ 239

Enabling queries on compressed data.

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Bigdata_docker ⭐ 226

Big Data Ecosystem Docker

Azure Event Hubs Spark ⭐ 225

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Bigdata ⭐ 219

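A learning path for big data processing technologies (continuously updated...). Bigdata notes --> taking it slowly~ Big data technologies include offline processing, real-time processing, OLAP, and more, such as hadoop, spark, flink, hive,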

Datacompare ⭐ 195

Big data comparison and data profiling platform: low code, data comparison, and data profiling

Sparkrdma ⭐ 191

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Bigdata Hub ⭐ 187

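A knowledge system for data infrastructure and big data technology, covering mainstream frameworks such as hadoop, hive, spark, and flink, and related framework families,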

Spark Notes ⭐ 183

Javaorbigdata Interview ⭐ 180

Interview knowledge points compiled for Java developers and big data developers

Spark.jl ⭐ 180

Julia binding for Apache Spark

A simple Spark-powered ETL framework that just works 🍺

Qbeast Spark ⭐ 171

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

Kafka Book ⭐ 167

Code for the book 《Kafka技术内幕》 (Kafka Internals)

Juicy Bigdata ⭐ 162

🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | An opening course for the big data technology track 🎉🎉

Incubator Wayang ⭐ 162

Apache Wayang(incubating) is the first cross-platform data processing system.

Webank All Project ⭐ 156

A collection of links to all the projects that WeBank has established or participated in.

Bigdata In Practice ⭐ 154

Big data practice projects: Hadoop, Spark, Kafka, Hbase, Flink.....

Lakehouse Engine ⭐ 154

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

Lambda Arch ⭐ 151

A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra.

Data Algorithms With Spark ⭐ 151

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Geopyspark ⭐ 151

GeoTrellis for PySpark

Spark On Lambda ⭐ 144

Apache Spark on AWS Lambda

Bigdata ⭐ 142

hadoop,hbase,storm,spark,etc..

Pyspark Cheatsheet ⭐ 140

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Bigdata Learning ⭐ 136

Big data study notes

Big Data Mapreduce Course ⭐ 135

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Sparkling Graph ⭐ 134

SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.

A distributed scheduling framework supporting DAG workflow for big data and regular jobs, providing programmable job types across different languages.

Python Bigdata ⭐ 128

Data science and Big Data with Python

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Xichuan_note ⭐ 114

xichuan's study summary notes, covering java, spring, other common java frameworks, and big-data-related components

Asakusafw ⭐ 113

Asakusa Framework

Spark Website ⭐ 109

Apache Spark Website

Bigdataclass ⭐ 109

Two-day workshop that covers how to use R to interact with databases and Spark

Spark R Notebooks ⭐ 109

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Clustering4ever ⭐ 109

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

Frank Kanes Taming Big Data With Apache Spark And Python ⭐ 106

Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt

Logisland ⭐ 106

Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.

Spark With Python ⭐ 98

Fundamentals of Spark with Python (using PySpark), code examples

Knowledge extraction from web data

Related Searches

Scala Spark (3,279)

Python Spark (2,053)

Java Spark (1,587)

Apache Spark (1,207)

Spark Hadoop (1,188)

Jupyter Notebook Spark (1,151)

Spark Kafka (985)

Spark Streaming (817)

Spark Pyspark (812)

Docker Spark (701)

1-100 of 186 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.