Awesome Open Source
Search results for spark
4,230 search results found
Awesome Opensource Data Engineering
⭐
1,331
An Awesome List of Open-Source Data Engineering Projects
Dr Elephant
⭐
1,301
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Ecosystem
⭐
1,289
Integration of TensorFlow with other open-source frameworks
Caffeonspark
⭐
1,272
Distributed deep learning on Hadoop and Spark clusters.
Sparkmagic
⭐
1,272
Jupyter magics and kernels for working with remote Spark clusters
Bigdata Growth
⭐
1,256
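A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.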
Taier
⭐
1,220
Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display
Spark Doc Zh
⭐
1,186
Chinese edition of the official Apache Spark documentation
Vue Trend
⭐
1,178
🌈 Simple, elegant spark lines for Vue.js
Killrweather
⭐
1,174
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.
Dockerfiles
⭐
1,171
50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak
Spark
⭐
1,141
A simple Android sparkline chart view.
Bigflow
⭐
1,122
Baidu Bigflow is an interface that allows for writing distributed computing programs and provides lots of simple, flexible, powerful APIs. Using Bigflow, you can easily handle data of any scale. Bigflow processes 4P+ data inside Baidu and runs about 10k jobs every day.
Machine Learning
⭐
1,046
Principles of machine learning
Spark Sklearn
⭐
1,039
(Deprecated) Scikit-learn integration package for Apache Spark
Snappydata
⭐
1,037
Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Kylo
⭐
1,035
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Pixiedust
⭐
1,035
Python Helper library for Jupyter Notebooks
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Utils4s
⭐
1,033
A collection of test cases and related materials gathered while using Scala and Spark
Spark Csv
⭐
1,009
CSV Data Source for Apache Spark 1.x
Spark Nlp Workshop
⭐
977
Public runnable examples of using John Snow Labs' NLP for Apache Spark.
Data Algorithms Book
⭐
973
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Pyspark Tutorial
⭐
959
PySpark-Tutorial provides basic algorithms using PySpark
Sparkling Water
⭐
957
Sparkling Water provides H2O functionality inside a Spark cluster
Graphframes
⭐
944
Splink
⭐
939
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Mobius
⭐
937
C# and F# language binding and extensions to Apache Spark
Spark Redis
⭐
926
A connector for Spark that allows reading and writing to/from Redis cluster
Around Dataengineering
⭐
926
A Data Engineering & Machine Learning Knowledge Hub
Coding Now
⭐
925
Study notes, plus eBooks I have read, video resources, and blogs I have collected and found worthwhile
Sparklyr
⭐
922
R interface for Apache Spark
Spark Scala Tutorial
⭐
922
A free tutorial for Apache Spark.
Livy
⭐
911
Livy is an open source REST interface for interacting with Apache Spark from anywhere
Spark Kotlin
⭐
909
A Spark DSL in idiomatic kotlin // dependency: com.sparkjava:spark-kotlin:1.0.0-alpha
Spark
⭐
904
A performance profiler for Minecraft clients, servers, and proxies.
Sparkctr
⭐
896
CTR prediction model based on Spark (LR, GBDT, DNN)
Tispark
⭐
872
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Frameless
⭐
869
Expressive types for Spark.
Incubator Livy
⭐
840
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Zingg
⭐
828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Datasophon
⭐
823
The next generation of cloud-native big data management expert, aiming to help users rapidly build stable, efficient, and scalable cloud-native platforms for big data.
Hadoop_study
⭐
817
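Regularly updated documentation for commonly used big data components in the Hadoop ecosystem, with focus in this order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos (the project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code; continuously updated!)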
Useractionanalyzeplatform
⭐
810
A big data platform for analyzing e-commerce user behavior
Extraction Framework
⭐
802
The software used to extract structured data from Wikipedia
Flint
⭐
796
A Time Series Library for Apache Spark
Spark
⭐
791
✨Spark is a web-based, cross-platform and full-featured Remote Administration Tool (RAT) written in Go that allows you to monitor and control all your devices anywhere.
Blaze
⭐
784
Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.
Docker Spark
⭐
769
Scriptis
⭐
767
Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Nessie
⭐
762
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
Spark Movie Lens
⭐
757
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Spark Daria
⭐
738
Essential Spark extensions and helper methods ✨😲
Cdap
⭐
735
An open source framework for building data analytic applications.
Design Inspiration
⭐
733
A collection of websites to spark creativity
Kafka Storm Starter
⭐
729
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Incubator Celeborn
⭐
725
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Incubator Toree
⭐
721
Mirror of Apache Toree (Incubating)
Thermostat
⭐
719
A place for all things related to ye olde Spark Thermostat Hackathon
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Pkpmspark
⭐
697
Awesome 3D data mining, data analysis & recommendation
Mongo Spark
⭐
692
The MongoDB Spark Connector
Machinelearning
⭐
684
Machine learning resources, including algorithms, papers, datasets, examples, and so on.
Vegas
⭐
682
The missing MatPlotLib for Scala + Spark
Coral
⭐
680
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
Fregata
⭐
674
A lightweight, super fast, large-scale machine learning library on Spark.
Data Science With Ruby
⭐
664
Practical Data Science with Ruby based tools.
Delta Sharing
⭐
654
An open protocol for secure data sharing
Sparkr Pkg
⭐
649
R frontend for Spark
Digandburied
⭐
645
Digging holes and filling them in
Justenoughscalaforspark
⭐
643
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Streaming Readings
⭐
640
Papers and reading material related to streaming systems
Flintrock
⭐
627
A command-line tool for launching Apache Spark clusters.
Datafusion
⭐
626
DataFusion has now been donated to the Apache Arrow project
Docker Spark
⭐
626
Docker build for Apache Spark
Blinkdb
⭐
625
BlinkDB: Sub-Second Approximate Queries on Very Large Data.
Wedatasphere
⭐
624
WeDataSphere is a financial grade, one-stop big data platform suite.
Spark Rapids
⭐
619
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Kafka Spark Consumer
⭐
616
High Performance Kafka Connector for Spark Streaming. Supports Multi Topic Fetch, Kafka Security. Reliable offset management in Zookeeper. No data loss. No dependency on HDFS and WAL. In-built PID rate controller. Supports Message Handler. Offset Lag checker.
Freestyle
⭐
615
A cohesive & pragmatic framework of FP centric Scala libraries
Pythondatascience Collections
⭐
615
The most complete collection of data analysis materials (including Python, web scraping, databases, big data, Tableau, statistics, and more)
Reference Apps
⭐
615
Spark reference applications
Listenbrainz Server
⭐
613
Server for the ListenBrainz project, including the front-end (javascript/react) code that it serves and all of the data processing components that LB uses.
Dist Keras
⭐
611
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Enterprise_gateway
⭐
607
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
Elasticsearch Spark Recommender
⭐
603
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch
Sparkmeasure
⭐
603
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Sparknet
⭐
601
Distributed Neural Networks for Spark
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Python Data Science Cheatsheet
⭐
590
Python data science cheat sheets
Wiki2vec
⭐
587
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby
Yanagishima
⭐
584
Web UI for Trino, Hive and SparkSQL
Onedal
⭐
584
oneAPI Data Analytics Library (oneDAL)
Spark
⭐
576
Emergency web server
Cassandra Lucene Index
⭐
574
Lucene based secondary indexes for Cassandra
Sparklearning
⭐
573
Learning Apache Spark, including code and data. Most parts can run locally.
Learningsparkv2
⭐
570
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Aws Glue Libs
⭐
568
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Streaming Benchmarks
⭐
560
Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, ...
Related Searches
Scala Spark (3,279)
Python Spark (2,053)
Java Spark (1,587)
Apache Spark (1,207)
Spark Hadoop (1,188)
Jupyter Notebook Spark (1,151)
Spark Kafka (985)
Spark Streaming (817)
Spark Pyspark (812)
101-200 of 4,230 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.