Awesome Open Source

Search results for spark hadoop

489 search results found

Distributed Statistical Computing ⭐ 99

Teaching Materials for Distributed Statistical Computing (teaching materials for big data distributed computing)

Ros_hadoop ⭐ 98

Hadoop splittable InputFormat for ROS. Process rosbag files with Hadoop, Spark, and other HDFS-compatible systems.

Spark With Python ⭐ 98

Fundamentals of Spark with Python (using PySpark), code examples

Mongo Spark ⭐ 93

Example application showing how to use the mongo-hadoop connector with Spark

Flink Spark Submiter ⭐ 92

Submit Flink/Spark jobs from a local IDEA to a Yarn/k8s cluster

Correlation Approximation ⭐ 90

Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets

Focusbigdata ⭐ 89

[The road to big data mastery: learning path + interview experiences + résumés]

Druid Spark Batch ⭐ 89

Druid indexing plugin for using Spark in batch jobs

Smart Data Lake ⭐ 87

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Guacamole ⭐ 86

Spark-based variant calling, with experimental support for multi-sample somatic calling (including RNA) and local assembly

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

Scalding Example Project ⭐ 85

The Scalding WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR

Pig on Apache Spark

DLFlow is a deep learning framework.

Caochong ⭐ 80

Set up a Hadoop and/or Spark cluster running within Docker containers on a single physical machine

Hadoop_cookbook ⭐ 80

Cookbook to install Hadoop 2.0+ using Chef

Vagrant Hadoop 2.4.1 Spark 1.0.1 ⭐ 79

Vagrant project to spin up a cluster of virtual machines with Hadoop v2.4.1 and Spark v1.0.1

Docker Spark ⭐ 77

🚢 Docker image for Apache Spark

Cqu_bigdata ⭐ 77

Labs and slides for the "Big Data Course Group" at the College of Computer Science, Chongqing University

Resilient Ml Research Platform ⭐ 76

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

The Apache Ignite Book ⭐ 72

All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above.

Kafka_spark_hbase_demo ⭐ 72

Kafka Spark HBase log statistics

This project is deprecated; for the collected and organized notes, see:

Sparkplugins ⭐ 70

Code and examples of how to write and deploy Apache Spark plugins with Spark 3.x. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.

Vagrant Hadoop Spark Hive ⭐ 68

Vagrant project to spin up a single virtual machine running current versions of Hadoop, Hive and Spark

Terraform Aws Emr Cluster ⭐ 67

Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS

Sparkbwa ⭐ 67

SparkBWA is a new tool that exploits the capabilities of a Big Data technology such as Apache Spark to boost the performance of one of the most widely adopted sequence aligners, the Burrows-Wheeler Aligner (BWA).

Splittablegzip ⭐ 66

Splittable Gzip codec for Hadoop

Apache Spark Hands On ⭐ 64

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Airflow Spark ⭐ 64

Docker with Airflow and Spark standalone cluster

Spark Gpu ⭐ 61

Spark GPU and SIMD Support

Spark Submit Ui ⭐ 60

A Play Framework-based application for submitting Spark apps

Hadoop Spark Installer ⭐ 59

Hadoop-related tools

Apachespark ⭐ 59

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. We use PySpark and Spark SQL for development. At the end of the course, we also cover a few case studies.

Coursework ⭐ 59

Platys Modern Data Platform ⭐ 58

Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....

Mylearningnotes ⭐ 58

Because it's never too late to start taking notes and making them public...

Titandataoperationsystem ⭐ 57

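The best big data project. "Titan Data Operation System": this project is a full-stack, closed-loop system; for data visualization we have a web front end using ECharts, among others;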

Serverless Spark Workshop ⭐ 56

Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service

Pybigdata ⭐ 56

Working with various big data components using Python

Bigdataanalytics_infoh515 ⭐ 56

Material for the Big Data Analytics exercise classes - INFOH515 - Big Data : Distributed Data Management and Scalable Analytics - Université Libre de Bruxelles

Big_data ⭐ 55

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.

Geodocker ⭐ 55

Central repository for the GeoDocker project

Bigdataparty ⭐ 54

All-in-one Dockerfile for big data components

Vagrant Hadoop Hive Spark ⭐ 53

Vagrant project to spin up a single node VM running current versions of Hadoop, Hive and Spark

Bestconf ⭐ 53

A tool that automatically improves the performance of large-scale systems by finding better configuration settings

Spark Training ⭐ 52

Repository used for Spark Trainings

Docker Hadoop ⭐ 51

A Docker container with a full Hadoop cluster setup with Spark and Zeppelin

This repository is now read-only. Please go to https://github.com/apache/incubator-iotdb

Movie Recommender Demo ⭐ 50

This project walks through how you can create recommendations using Apache Spark machine learning. There are a number of Jupyter notebooks that you can run on IBM Data Science Experience, and there is a live demo of a movie recommendation web application you can interact with. The demo also uses IBM Message Hub (Kafka) to push application events to a topic, where they are consumed by a Spark Streaming job running on IBM BigInsights (Hadoop).

Spark Install ⭐ 50

Installation guide for Apache Spark + Hadoop on Mac/Linux

Hadoop Training ⭐ 50

Hadoop training material from free MapR courses.

Dplyr Spark ⭐ 49

Spark backend for dplyr

Datapipelines Essentials Python ⭐ 45

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Hadoop Spark Hive Cluster Docker ⭐ 45

hadoop-spark-hive-cluster-docker

Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure

Docker Hadoop Workbench ⭐ 44

A Hadoop cluster based on Docker, including Hive and Spark.

Docker Spark Cluster ⭐ 44

A Spark cluster setup running on Docker containers

Sparkoscope ⭐ 43

Enabling Spark Optimization through Cross-stack Monitoring and Visualization

Code Of Spark Big Data Business Trilogy ⭐ 42

Code for the book "Spark Big Data Business Trilogy"

Spark Ai Summit Europe 2018 10 ⭐ 42

Spark+AI Summit Europe 2018 slide downloads [95 in total]

Sqoop On Spark ⭐ 42

Sqoop on Apache Spark Engine

Yuzhouwan ⭐ 42

Code Library for My Blog

Mongodb Spark Demo ⭐ 41

Spark app that demonstrates reading and writing data to and from MongoDB and BSON files

Garmadon ⭐ 39

Java event log collector for Hadoop and frameworks

Openspark ⭐ 39

An out-of-the-box environment for Hadoop/Spark applications

A library for bulk-importing data into HBase with Spark

Spark1.52 ⭐ 38

Chinese annotations of the Spark source code

Distributed Tf ⭐ 38

Distributed TensorFlow Examples for O'Reilly

Weblogsanalysissystem ⭐ 37

A big data platform for analyzing web access logs

Xxhadoop ⭐ 37

Data analysis using Hadoop, Spark, Storm, ElasticSearch, machine learning, etc. These are my daily notes, code, and demos. Don't fork, just star!

Bigdata Getting Started ⭐ 37

Hands-on projects for big data frameworks (Hadoop, Spark, Storm, Flink)

Big Data ⭐ 37

Python tools for big data

Swordfish ⭐ 37

Open-source distributed workflow scheduling tool that also supports streaming tasks.

Hadoop Guide ⭐ 36

🐘 Study notes on big data frameworks such as HDFS, Yarn, MapReduce, HBase, Hive, Pig, Sqoop, Flume, Zoo, and others

Spark2 Hadoop2.6 Hbase Labs ⭐ 36

Musketeer ⭐ 35

The Musketeer workflow manager.

Data Infra Projects ⭐ 34

List of some interesting projects

Sparkdemo ⭐ 34

Complete Spark example code (Java, Scala)

Visitante ⭐ 34

Set of Hadoop, Spark, and Storm based tools for web and customer analytics

Bigdata Docker Compose ⭐ 33

Hadoop, Hive, Spark, Zeppelin and Livy: all in one Docker-compose file.

Telemetry Analysis Service ⭐ 33

Telemetry Analysis Service

Distributed Extraction Framework ⭐ 33

DBpedia Distributed Extraction Framework: Extract structured data from Wikipedia in a parallel, distributed manner

Computation on unstructured video data based on HDFS and Spark

Engineeringteam ⭐ 32

A place for organizing the materials of the YBIGTA engineering team.

Awesome Tools ⭐ 32

Curated list of awesome tools and libraries for specific domains

Mastering Scala Machine Learning ⭐ 32

Mastering-Scala-Machine-Learning

A library for manipulating bioinformatics sequencing formats in Apache Spark

Geotriples ⭐ 31

Publishing Big Geospatial data as Linked Open Geospatial Data

Dockerfiles ⭐ 31

Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)

Bigdata Docker ⭐ 30

Build a big data development and learning environment with Docker

Spark Vagrant Vm ⭐ 29

Spark Vagrant VM definition and runnable examples.

Mapreduce ⭐ 29

Tsinghua big data assignment: processing several hundred GB of JSON data with MapReduce

Learning Spark ⭐ 29

Tidy up Spark and Hadoop tutorials.

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Spark Openstack ⭐ 29

Scripts to set up a Spark cluster (any version) in any OpenStack environment, with optional useful tools.

Netapp Hadoop Nfs Connector ⭐ 29

This project provides an NFSv3 connector for Hadoop. Using the connector, Apache Hadoop and Apache Spark can use an NFSv3 server as their storage backend.

Documentation placeholder and utilities for all the other containers.

101-200 of 489 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.