Awesome Open Source
Search results for spark
4,231 search results found
Spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
Data Science Ipython Notebooks
⭐
25,668
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Redash
⭐
24,479
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Docker_practice
⭐
23,279
Learn and understand Docker&Container technologies, with real DevOps practice!
Data Engineering Zoomcamp
⭐
19,461
Free Data Engineering course!
Bigdata Notes
⭐
14,872
A beginner's guide to big data ⭐
Chuanhuchatgpt
⭐
14,595
GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.
Horovod
⭐
13,921
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Flink Learning
⭐
13,801
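Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, source code analysis, and more. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and related topics, plus case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Flink in Practice and Performance Optimization" is welcome.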
Technology Talk
⭐
13,579
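[Big Tech Interview Column] A technical guide for Java programmers, covering interview questions, system architecture, career tips, mainstream middleware, and more, so that…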
Deeplearning4j
⭐
13,397
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
Cookbook
⭐
12,557
The Data Engineering Cookbook
Ds Cheatsheets
⭐
11,535
List of Data Science Cheatsheets to rule the world
Doris
⭐
11,243
Apache Doris is an easy-to-use, high performance and unified analytics database.
Spark
⭐
9,543
A simple expressive web framework for java. Spark has a kotlin DSL https://github.com/perwendel/spark-kotlin
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
It_book
⭐
8,543
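This project collects over a thousand good, commonly used books that I have read or heard about over the years; the book you are looking for may well be here. Covers the internet…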
God Of Bigdata
⭐
8,483
Focused on big data learning and interviews; start your road to big data mastery. Flink/Spark/Hadoop/Hbase/Hive.
Angel
⭐
6,690
A Flexible and Powerful Parameter Server for large-scale machine learning
Delta
⭐
6,656
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
H2o 3
⭐
6,618
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Alluxio
⭐
6,612
Alluxio, data orchestration for analytics and machine learning in the cloud
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Zeppelin
⭐
6,259
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Dev Setup
⭐
5,802
macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Risingwave
⭐
5,799
The distributed streaming database. Engineered to offer the simplest and most cost-efficient way for stream processing and management.
Spark
⭐
5,649
▁▂▃▅▂▇ in your shell.
Technical Books
⭐
5,519
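😆 Books written by leading internet technology experts in China and abroad: computer fundamentals, networking, front end, back end, databases, architecture, big data, deep learning.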
Iceberg
⭐
5,179
Apache Iceberg
Synapseml
⭐
4,960
Simple and Distributed Machine Learning
Hudi
⭐
4,901
Upserts, Deletes And Incremental Processing on Big Data.
Bigdl
⭐
4,728
Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm
Sparkinternals
⭐
4,665
Notes talking about the design and implementation of Apache Spark
Sqlglot
⭐
4,652
Python SQL Parser and Transpiler
Pipeline
⭐
4,158
PipelineAI
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Learning Spark
⭐
3,804
Example code from Learning Spark book
Helk
⭐
3,633
The Hunting ELK
Spark Nlp
⭐
3,578
State of the Art Natural Language Processing
Coolplayspark
⭐
3,447
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Ibis
⭐
3,404
The flexibility of Python with the scale and performance of modern SQL.
Sql Generator
⭐
3,346
🔨 Generate structured SQL statements from JSON. Built with Vue3 + TypeScript + Vite + Ant Design + MonacoEditor; a simple project (heavy on logic, light on UI) that is well suited for practice.
Roaringbitmap
⭐
3,308
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Tablesaw, and many others
Koalas
⭐
3,291
Koalas: pandas API on Apache Spark
Linkis
⭐
3,224
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Imgui Node Editor
⭐
3,153
Node Editor built using Dear ImGui
Spark Notebook
⭐
3,147
Interactive and Reactive Data Science using Scala and Spark.
Deequ
⭐
3,044
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Dataspherestudio
⭐
2,860
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Spark Jobserver
⭐
2,837
REST job server for Apache Spark
Dpark
⭐
2,637
Python clone of Spark, a MapReduce-like framework in Python
Analytics Zoo
⭐
2,592
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Spark On K8s Operator
⭐
2,526
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Scio
⭐
2,505
A Scala API for Apache Beam and Google Cloud Dataflow.
React Trend
⭐
2,387
📈 Simple, elegant spark lines
Bigdataguide
⭐
2,355
Big data learning: learn big data from scratch, with study videos for each learning stage and interview materials
Spring Boot Quick
⭐
2,282
🌿 Quick-start learning examples based on springboot, integrating open source frameworks I have come across, such as rabbitmq (delayed queues), K…
Lakesoul
⭐
2,248
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Ballista
⭐
2,244
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Zio Quill
⭐
2,135
Compile-time Language Integrated Queries for Scala
Transmogrifai
⭐
2,099
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Szt Bigdata
⭐
2,055
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Sparks
⭐
2,010
A typeface for creating sparklines in text without code.
Easyml
⭐
1,966
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
Spark
⭐
1,963
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Quicksql
⭐
1,939
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Spark Cassandra Connector
⭐
1,929
DataStax Connector for Apache Spark to Apache Cassandra
Spark Deep Learning
⭐
1,915
Deep Learning Pipelines for Apache Spark
Elasticsearch Hadoop
⭐
1,914
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
Vega
⭐
1,904
A new arguably faster implementation of Apache Spark from scratch in Rust
Kyuubi
⭐
1,849
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Benchm Ml
⭐
1,839
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Fugue
⭐
1,821
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Docker Spark
⭐
1,783
Apache Spark docker image
Gaffer
⭐
1,724
A large-scale entity and relation database supporting aggregation of properties
.github
⭐
1,722
ApacheCN open source organization: announcements, introduction, members, events, and contact channels
Jvm Profiler
⭐
1,717
JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
Cube Studio
⭐
1,710
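cube studio is an open source, cloud-native, one-stop machine learning/deep learning AI platform, supporting SSO login, multi-tenancy/multiple project groups, data asset…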
Spark Ml Source Analysis
⭐
1,710
Analysis of spark ml algorithm principles and of their concrete source code implementations
Ytsaurus
⭐
1,694
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Petastorm
⭐
1,693
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Incubator Paimon
⭐
1,647
Apache Paimon(incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
Elassandra
⭐
1,633
Elassandra = Elasticsearch + Apache Cassandra
Gatk
⭐
1,576
Official code repository for GATK versions 4 and up
Almond
⭐
1,560
A Scala kernel for Jupyter
Spark The Definitive Guide
⭐
1,558
Spark: The Definitive Guide's Code Repository
Elephas
⭐
1,548
Distributed Deep learning with Keras & Spark
Spark Py Notebooks
⭐
1,515
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Cloudpickle
⭐
1,514
Extended pickling support for Python objects
Aas
⭐
1,493
Code to accompany Advanced Analytics with Spark from O'Reilly Media
Mleap
⭐
1,479
MLeap: Deploy ML Pipelines to Production
Spark Testing Base
⭐
1,475
Base classes to use when writing tests with Spark
Awesome Spark
⭐
1,461
A curated list of awesome Apache Spark packages and resources.
Movie_recommend
⭐
1,441
A Spark-based movie recommendation system, including a crawler project, a web site, an admin back end, and a spark recommendation system
Optimus
⭐
1,438
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Seldon Server
⭐
1,420
Machine Learning Platform and Recommendation Engine built on Kubernetes
Tutorial
⭐
1,414
A summary of the full-stack knowledge architecture for back-end development (Java, Golang)
Carbondata
⭐
1,401
High performance data store solution
Bigdata Interview
⭐
1,397
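🎯 🌟 [Big data interview questions] Sharing big data interview questions I collected online, along with my own summarized answers. Currently includes Hadoop…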
Dji Firmware Tools
⭐
1,344
Tools for handling firmwares of DJI products, with focus on quadcopters.
Related Searches
Scala Spark (3,279)
Python Spark (2,053)
Java Spark (1,587)
Apache Spark (1,207)
Spark Hadoop (1,188)
Jupyter Notebook Spark (1,151)
Spark Kafka (985)
Spark Streaming (817)
Spark Pyspark (812)
1-100 of 4,231 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.