Awesome Open Source


Search results for spark hdfs

212 search results found

Spark Emr ⭐ 17

Spark Elastic MapReduce bootstrap and runnable examples.

Bidmach_spark ⭐ 16

Code to allow running BIDMach on Spark including HDFS integration and lightweight sparse model updates (Kylix).

Spark2 Etl Examples ⭐ 16

A project with examples of using a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0

Spark Cnn ⭐ 16

CS848 Final Project (using Spark to speed up CNNs)

Hdfs Spark Hive Dev Setup ⭐ 15

This repository contains a make script and instructions for setting up a local HDFS + Spark + Hive environment.

Yandex Big Data Engineering ⭐ 15

Spark Fits ⭐ 15

FITS data source for Spark SQL and DataFrames

Big Data Study Notes for Beginners ⭐

Minispark ⭐ 15

Java implementation of a mini Spark-like framework named MiniSpark that can run on top of an HDFS cluster. MiniSpark supports operators including Map, FlatMap, MapPair, Reduce, ReduceByKey, Collect, Count, Parallelize, Join and Filter.

Featurestore ⭐ 15

Building blocks and patterns for building data prep transformations and feature engineering in Spark.

Cloud Local ⭐ 14

Install script for a local one-node cloud... no excuses, folks

A Big Data Platform Prototype Project

Sparkphoenix ⭐ 14

Spark Example using Phoenix to interact with HBase

Big Data Course ⭐ 14

Practice course on Big Data

Copybookinputformat ⭐ 14

Using JRecord to build a mapred and mapreduce inputformat for HDFS, MAPREDUCE, PIG, HIVE, Spark, ...

Minikube for big data with Scala and Spark

Bigdata Fun ⭐ 14

A complete (distributed) BigData stack, running in containers

Local Hashicorp Stack ⭐ 14

Local Hashicorp Stack for DevOps Development without Hypervisor or Cloud

Hdfs Geohex ⭐ 13

(Web)Mapping Elephants with Sparks

Spark Playground ⭐ 13

Playground for experimenting with Apache Spark

Bigdata_docker ⭐ 13

Big Data, Docker, Data Science, Spark, Spark3, Hadoop, HDFS, Scala, Python, Artificial Intelligence, Machine Learning, Jupyter Lab Notebook

Taller_sparkr ⭐ 12

SparkR workshop for the Jornadas de Usuarios de R (R Users Conference)

Camus Compressor ⭐ 12

Camus Compressor merges files created by Camus and saves them in a compressed format.

Sparkfaultbench ⭐ 12

A Spark Reliability Testing Suite

Cmsspark ⭐ 12

General purpose framework to run CMS experiment workflows on HDFS/Spark platform

Cloudera Framework ⭐ 12

Spark Benchmarks ⭐ 12

Benchmarking suite for Apache Spark

Spring Boot Spark Integration Demo ⭐ 12

Demo of how to integrate Spring Data JPA, Apache Spark and GraphX using mixed Java and Scala code

Bigdataguide ⭐ 11

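I taught myself and landed a job in the autumn recruiting season, but self-study is very hard, so I want to compile detailed big data development materials covering fundamentals | architecture | source code, to help other self-learners avoid detours. If you have questions, follow the WeChat official account 大数据老刘 (Big Data Lao Liu) and contact Lao Liu!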

using FM latent vectors as embedding features

Dcos Jupyterlab Service ⭐ 11

JupyterLab Notebook for Mesosphere DC/OS

Spark_mllib_algorithm_1.6.0 ⭐ 11

Algorithm wrappers for Spark MLlib version 1.6.0

Git Influencer ⭐ 11

Insight Data Engineering project: A platform built in HDFS, Spark and Airflow to help you to find social influencers from GitHub Network.

Cca175 Exam Preparation ⭐ 11

Cloudera CCA175 Spark and Hadoop Developer exam preparation

Spark Tpcds Benchmark ⭐ 11

Utility for benchmarking changes in Spark using TPC-DS workloads

Easterbunny ⭐ 11

EasterBunny data analysis

Artmosphere ⭐ 11

Data Engineering Project at Insight

Sparknow ⭐ 11

Deploy Spark on OpenStack. Now!

Literate Computing Hadoop ⭐ 11

Literate Computing for Reproducible Infrastructure - Hadoop Practice

Masterdatcom_bdcc_practice ⭐ 10

Practice and workshop on Big Data and cloud computing using Docker containers and OpenNebula: HDFS, Hadoop and Spark + R

TPC-DS benchmarks including data generation with Spark and queries with Spark

SmartFD: Efficient and Scalable Functional Dependency Discovery on Distributed Data-Parallel Platforms

Tis Ansible ⭐ 10

TIS deployment script

Bigdata Etl Pipeline ⭐ 10

The Data Pipeline and Analytics Stack is a comprehensive solution designed for processing, storing, and visualizing data. Explore a complete data pipeline with all components seamlessly set up and ready to use

Spark On Yarn Cluster ⭐ 10

A procedure to create a YARN cluster based on Docker, run Spark, and run a TPC-DS performance test.

Bigdata20180301 ⭐ 10

Course materials for Introduction to Big Data

Hadoop On Kubernetes ⭐ 10

Hadoop on Kubernetes, including the configuration for HDFS and YARN

Docker Mesos Pyspark Hdfs ⭐ 9

Example of a simulated multi-node Mesos/(Py)Spark cluster using Docker containers

Telecom Streaming ⭐ 9

Telecom scenarios implemented with streaming techniques

Hackathonclt2019 ⭐ 9

DGST: Efficient and Scalable Generalized Suffix Tree Construction on Apache Spark

Bigdatademo ⭐ 9

A demo of using Kafka, Spark, Hive, Cassandra, etc. with Docker. It produces a production-ready environment for any kind of big data project related to the Hadoop ecosystem.

Bigdata Docker ⭐ 9

Run Hadoop Cluster within Docker Containers.

Imb Sampling Ros_and_rus ⭐ 9

Spark implementations of two data sampling methods (random oversampling and random undersampling) for imbalanced classification datasets

Fastunfolding ⭐ 9

Spark2 H2o R Zeppelin ⭐ 9

A stack for data mining using Spark2, H2O, R and Zeppelin running on Cloudera Hadoop Distribution

Lambda_poc ⭐ 8

Example lambda architecture using Kafka, Spark, Cassandra and Hadoop

Spark Yarn Hadoop Cluster Vagrant ⭐ 8

Vagrant project to spin up a cluster of 4 nodes with Spark, YARN and Hadoop

Geotrellis Geomesa Template Project ⭐ 8

Tutorial with Spark, GeoTrellis and GeoMesa examples

Distributed linear algebra operations using Apache Spark

Geotrellis Ec2 Cluster ⭐ 8

Scripts to deploy a GeoTrellis Spark cluster on EC2

Docker Spark Yarn Cluster Mode ⭐ 8

Run Spark 2.0.2 on YARN and HDFS inside Docker containers in multi-node cluster mode

Hadoop Hands On ⭐ 8

Learning how to tame the Big Data with Hadoop and related technologies

Streamsx.sparkmllib ⭐ 8

Toolkit for real-time scoring using Apache Spark MLLib library

Hands On Hadoop ⭐ 8

Hadoop, MapReduce, HDFS, Spark, Pig, Hive, HBase, MongoDB, Cassandra, Flume - the list goes on! Over 25 technologies.

Coding practice and research on the component technologies of a big data pipeline

Vagrant Jilla Hadoop ⭐ 8

Vagrant setup to spin up a VM-based Hadoop cluster

2018 Hadoop ⭐ 7

A place to store code resources and exchange big data development techniques. Let's grow and improve together.

Spark Kuromoji Tokenizer ⭐ 7

Kuromoji Tokenizer for Spark DataFrames

Etl Processes Using Sqoop Hadoop Hive Spark And Scala ⭐ 7

I implemented various ETL processes: loading data from MySQL into HDFS with Sqoop, transforming the data with Spark and Scala, performing analytics with Spark and Scala, and loading the data back into HDFS.

Spark Tpc Ds ⭐ 7

Spark job for the TPC-DS benchmark

Tidyr.big ⭐ 7

Scalable backend for tidyr

Docker Hdfs Alluxio Spark ⭐ 7

Docker images and deployment configurations for a cluster of HDFS, Alluxio and Spark, focusing on data locality. Supports OpenShift 3.4, with more coming.

Spark Kubernetes Demo ⭐ 7

Spark on Kubernetes for Demo

spark + kuromoji + d3.js = "tweet big data" that anyone can easily build

Example Spark Scala Read And Write From Hdfs ⭐ 7

Spark All Pairs Shortest Path ⭐ 7

Distributedml ⭐ 6

Distributed Machine Learning for Stock Price Prediction

Spark Es Csv ⭐ 6

Export HDFS files to JSON or CSV with Spark

Easynotes ⭐ 6

EasyNotes (简记, "Brief Notes"), synced with GitBook.

Big Data Stack ⭐ 6

Hadoop-based Big Data stack (hdfs, yarn, spark, etc)

Big Data infrastructure: Hadoop + NiFi + Spark + Hive using Docker

Big Data Cluster ⭐ 6

The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is solely intended for usage in a development environment. Do not use it to run any production workloads.

Sparkdatalineagecapture ⭐ 6

Capture the logical plan from Spark (SQL)

Big Data Knowledge ⭐ 6

📖 A collection of big data knowledge

Docker Single Node Hadoop ⭐ 6

This Docker image is used to create a single-node Hadoop installation with YARN enabled

Map_reduce Ntua ⭐ 6

Lab exercise of Advanced Topics in Database Systems course in NTUA regarding Map Reduce

Fantasysportsleagues ⭐ 6

Implementation of a website that tracks fantasy sports leagues.

Spark Twitter Example ⭐ 6

Spark example app that demonstrates, on a broad level, the various aspects of Spark.

Distributable_docker_sql_on_hadoop ⭐ 6

Toy Hadoop cluster combining various SQL-on-Hadoop variants

Loganalysis ⭐ 6

Log analysis project

Virgo Spark Cluster ⭐ 6

Docker Images for the Virgo Spark Cluster. Distribution including HDFS, YARN, Hive, Spark 2.3+

Bigdata Platform ⭐ 6

End to end big data project, that aims to show how to implement different big data layers, from the infrastructure layer to the end user one. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter

Sahab Cloud Service

Big data study notes for beginners, with a learning roadmap and a technology roadmap

Bigdata Ecosystem Architecture ⭐ 6

Life cycle: the internal workings of HDFS, Sqoop, Hive, Spark, HBase and Kafka, with code.

Cluster In A Box ⭐ 5

Contains a Dockerised Spark cluster including Cassandra, YARN, HDFS and Zeppelin. For education only.

Hadoopsparkeigenfaces ⭐ 5

SVD computation via Hadoop and Spark for Eigenfaces face recognition

Related Searches

Scala Spark (3,279)

Python Spark (2,053)

Java Spark (1,587)

Jupyter Notebook Spark (1,268)

Apache Spark (1,207)

Spark Hadoop (1,188)

Hadoop Hdfs (1,075)

Spark Kafka (985)

Spark Streaming (817)

Spark Pyspark (812)

101-200 of 212 search results


Copyright 2018-2024 Awesome Open Source. All rights reserved.