Awesome Open Source
Search results for spark hadoop
489 search results found
Distributed Statistical Computing
⭐
99
Teaching materials for Distributed Statistical Computing (big data distributed computing)
Ros_hadoop
⭐
98
Hadoop splittable InputFormat for ROS. Process rosbag files with Hadoop, Spark, and other HDFS-compatible systems.
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Mongo Spark
⭐
93
Example application on how to use mongo-hadoop connector with Spark
Flink Spark Submiter
⭐
92
Submit Flink/Spark jobs from a local IDEA installation to a Yarn/k8s cluster
Correlation Approximation
⭐
90
Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets
Focusbigdata
⭐
89
[The road to big data mastery: learning path, interview experiences, and résumés]
Druid Spark Batch
⭐
89
Druid indexing plugin for using Spark in batch jobs
Smart Data Lake
⭐
87
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Guacamole
⭐
86
Spark-based variant calling, with experimental support for multi-sample somatic calling (including RNA) and local assembly
Flowman
⭐
85
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Scalding Example Project
⭐
85
The Scalding WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR
Spork
⭐
84
Pig on Apache Spark
Dlflow
⭐
84
DLFlow is a deep learning framework.
Caochong
⭐
80
Set up a Hadoop and/or Spark cluster running within Docker containers on a single physical machine
Hadoop_cookbook
⭐
80
Cookbook to install Hadoop 2.0+ using Chef
Vagrant Hadoop 2.4.1 Spark 1.0.1
⭐
79
Vagrant project to spin up a cluster of virtual machines with Hadoop v2.4.1 and Spark v1.0.1
Docker Spark
⭐
77
🚢 Docker image for Apache Spark
Cqu_bigdata
⭐
77
Labs and slides (PPT) for the "Big Data Course Group" at the College of Computer Science, Chongqing University
Resilient Ml Research Platform
⭐
76
Waimak
⭐
73
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
The Apache Ignite Book
⭐
72
All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
Kafka_spark_hbase_demo
⭐
72
Log statistics with Kafka, Spark, and HBase
Mynote
⭐
72
This project is deprecated; for the collected and organized notes, see:
Sparkplugins
⭐
70
Code and examples of how to write and deploy Apache Spark Plugins with Spark 3.x. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
Vagrant Hadoop Spark Hive
⭐
68
Vagrant project to spin up a single virtual machine running current versions of Hadoop, Hive and Spark
Terraform Aws Emr Cluster
⭐
67
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
Sparkbwa
⭐
67
SparkBWA is a new tool that exploits the capabilities of a Big Data technology such as Apache Spark to boost the performance of one of the most widely adopted sequence aligners, the Burrows-Wheeler Aligner (BWA).
Splittablegzip
⭐
66
Splittable Gzip codec for Hadoop
Apache Spark Hands On
⭐
64
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Airflow Spark
⭐
64
Docker with Airflow and Spark standalone cluster
Spark Gpu
⭐
61
Spark GPU and SIMD Support
Spark Submit Ui
⭐
60
A Play Framework-based UI for submitting Spark applications
Hadoop Spark Installer
⭐
59
Hadoop-related tools
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Coursework
⭐
59
Platys Modern Data Platform
⭐
58
Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....
Mylearningnotes
⭐
58
Because it's never too late to start taking notes and publishing them...
Titandataoperationsystem
⭐
57
The best big data project. "Titan Data Operation System" is a full-stack, closed-loop system; we use web-based ECharts, among others, for data visualization;
Serverless Spark Workshop
⭐
56
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
Pybigdata
⭐
56
Work with various big data components using Python
Bigdataanalytics_infoh515
⭐
56
Material for the Big Data Analytics exercise classes - INFOH515 - Big Data : Distributed Data Management and Scalable Analytics - Université Libre de Bruxelles
Big_data
⭐
55
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Geodocker
⭐
55
Central repository for the GeoDocker project
Bigdataparty
⭐
54
All-in-One Dockerfile for big data components
Vagrant Hadoop Hive Spark
⭐
53
Vagrant project to spin up a single node VM running current versions of Hadoop, Hive and Spark
Bestconf
⭐
53
A tool that automatically improves the performance of large-scale systems by finding better configuration settings
Spark Training
⭐
52
Repository used for Spark Trainings
Docker Hadoop
⭐
51
A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Iotdb
⭐
51
This repository is now read-only. Please go to https://github.com/apache/incubator-iotdb
Movie Recommender Demo
⭐
50
This project walks through how you can create recommendations using Apache Spark machine learning. There are a number of Jupyter notebooks that you can run on IBM Data Science Experience, and there is a live demo of a movie recommendation web application you can interact with. The demo also uses IBM Message Hub (Kafka) to push application events to a topic, where they are consumed by a Spark Streaming job running on IBM BigInsights (Hadoop).
Spark Install
⭐
50
Installation guide for Apache Spark + Hadoop on Mac/Linux
Hadoop Training
⭐
50
Hadoop training material from free MapR courses.
Dplyr Spark
⭐
49
Spark backend for dplyr
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Hadoop Spark Hive Cluster Docker
⭐
45
hadoop-spark-hive-cluster-docker
Simr
⭐
45
Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure
Docker Hadoop Workbench
⭐
44
A Hadoop cluster based on Docker, including Hive and Spark.
Docker Spark Cluster
⭐
44
A Spark cluster setup running on Docker containers
Sparkoscope
⭐
43
Enabling Spark Optimization through Cross-stack Monitoring and Visualization
Code Of Spark Big Data Business Trilogy
⭐
42
Code for the book "Spark Big Data Business Trilogy"
Spark Ai Summit Europe 2018 10
⭐
42
Spark+AI Summit Europe 2018 slide (PPT) downloads [95 in total]
Sqoop On Spark
⭐
42
Sqoop on Apache Spark Engine
Yuzhouwan
⭐
42
Code Library for My Blog
Mongodb Spark Demo
⭐
41
Spark app that demonstrates reading data from and writing data to MongoDB and BSON files
Devops
⭐
40
DevOps
Garmadon
⭐
39
Java event log collector for Hadoop and frameworks
Openspark
⭐
39
An out-of-the-box environment for Hadoop/Spark applications
Hbrdd
⭐
39
A library for bulk-importing data from Spark into HBase
Spark1.52
⭐
38
Chinese annotations of the Spark source code
Distributed Tf
⭐
38
Distributed TensorFlow Examples for O'Reilly
Weblogsanalysissystem
⭐
37
A big data platform for analyzing web access logs
Xxhadoop
⭐
37
Data analysis using Hadoop, Spark, Storm, ElasticSearch, machine learning, etc. These are my daily notes, code, and demos. Don't fork, just star!
Bigdata Getting Started
⭐
37
Hands-on projects with big data frameworks (Hadoop, Spark, Storm, Flink)
Big Data
⭐
37
Python tools for big data
Swordfish
⭐
37
Open-source distributed workflow scheduling tool that also supports streaming tasks.
Hadoop Guide
⭐
36
🐘 Study notes on big data frameworks such as HDFS, Yarn, MapReduce, HBase, Hive, Pig, Sqoop, Flume, Zoo, etc.
Spark2 Hadoop2.6 Hbase Labs
⭐
36
Musketeer
⭐
35
The Musketeer workflow manager.
Data Infra Projects
⭐
34
List of some interesting projects
Sparkdemo
⭐
34
The most complete set of Spark example code (Java, Scala)
Visitante
⭐
34
Set of Hadoop, Spark and Storm based tools for web and customer analytics
Bigdata Docker Compose
⭐
33
Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file.
Telemetry Analysis Service
⭐
33
Telemetry Analysis Service
Distributed Extraction Framework
⭐
33
DBpedia Distributed Extraction Framework: Extract structured data from Wikipedia in a parallel, distributed manner
Cipher
⭐
33
Computation on unstructured video data based on HDFS and Spark
Engineeringteam
⭐
32
A place for organizing the YBIGTA engineering team's materials.
Awesome Tools
⭐
32
curated list of awesome tools and libraries for specific domains
Mastering Scala Machine Learning
⭐
32
Mastering-Scala-Machine-Learning
Disq
⭐
31
A library for manipulating bioinformatics sequencing formats in Apache Spark
Geotriples
⭐
31
Publishing Big Geospatial data as Linked Open Geospatial Data
Dockerfiles
⭐
31
Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
Bigdata Docker
⭐
30
Build a big data development and learning environment with Docker
Spark Vagrant Vm
⭐
29
Spark Vagrant VM definition and runnable examples.
Mapreduce
⭐
29
Tsinghua big data assignment: processing several hundred GB of JSON data with MapReduce
Learning Spark
⭐
29
Tidy up Spark and Hadoop tutorials.
Basin
⭐
29
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Spark Openstack
⭐
29
Scripts to set up a Spark cluster (any version) in any OpenStack environment, with optional useful tools.
Netapp Hadoop Nfs Connector
⭐
29
This project provides an NFSv3 connector for Hadoop. Using the connector, Apache Hadoop and Apache Spark can use an NFSv3 server as their storage backend.
Flokkr
⭐
28
Documentation placeholder and utilities for all the other containers.
101-200 of 489 search results