Awesome Open Source

Search results for spark hdfs

212 search results found

Bigdata Notes ⭐ 14,872

A beginner's guide to big data ⭐

God Of Bigdata ⭐ 8,483

Focused on big data study and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive.

Tensorflowonspark ⭐ 3,851

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

The flexibility of Python with the scale and performance of modern SQL.

Bigdata Interview ⭐ 1,397

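🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from around the web, with my own summarized answers. Currently covers Hadoop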

Bigdata Growth ⭐ 1,256

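A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.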

Devops Python Tools ⭐ 709

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Real Time Analytics and Data Pipelines based on Spark Streaming

Docker Hadoop Spark Workbench ⭐ 503

[EXPERIMENTAL] This repo includes deployment instructions for running HDFS/Spark inside docker containers. Also includes spark-notebook and HDFS FileBrowser.

Zdh_web ⭐ 379

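A big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, covering data collection, scheduling, permissions, and approvals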

Spindle ⭐ 333

Next-generation web analytics processing with Scala, Spark, and Parquet.

Tensorspark ⭐ 302

TensorFlow on Spark

Pysparkling ⭐ 253

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

Bigdata_docker ⭐ 226

Big Data Ecosystem Docker

Zeppelin Notebooks ⭐ 206

Gallery of Apache Zeppelin notebooks

⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Wifiprobeanalysis ⭐ 189

Commercial big data analytics based on Wi-Fi probes

Spark Notes ⭐ 183

Magpie contains a number of scripts for running Big Data software in HPC environments, including Hadoop and Spark. There is support for Lustre, Slurm, Moab, Torque, LSF, Flux, and more.

Juicy Bigdata ⭐ 162

🎉🎉🐳 Datawhale's Introduction to Big Data Processing tutorial | The opening course of the big data technology track 🎉🎉

Dcos Commons ⭐ 162

DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.

Bigdata In Practice ⭐ 154

Hands-on big data projects: Hadoop, Spark, Kafka, Hbase, Flink.....

Lambda Arch ⭐ 151

A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra.

Ipython Spark Docker ⭐ 151

Distributed Graph Analytics ⭐ 135

Distributed Graph Analytics (DGA) is a compendium of graph analytics written for Bulk-Synchronous-Parallel (BSP) processing frameworks such as Giraph and GraphX. The analytics included are High Betweenness Set Extraction, Weakly Connected Components, Page Rank, Leaf Compression, and Louvain Modularity.

Hdfs_fdw ⭐ 131

PostgreSQL foreign data wrapper for HDFS

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Vagrant Hadoop Spark Cluster ⭐ 121

Vagrant project to spin up a cluster of four 32-bit CentOS 6.5 Linux virtual machines with Hadoop v2.6.0 and Spark v1.1.1

Spark With Python ⭐ 98

Fundamentals of Spark with Python (using PySpark), code examples

Jaws Spark Sql Rest ⭐ 92

Big Data Engineering Coursera Yandex ⭐ 91

Big Data for Data Engineers Coursera Specialization from Yandex

Correlation Approximation ⭐ 90

Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets

Focusbigdata ⭐ 89

[Road to big data mastery: learning path + interview write-ups + résumés]

Cuesheet ⭐ 85

A framework for writing Spark 2.x applications in a pretty way

Hadoop_cookbook ⭐ 80

Cookbook to install Hadoop 2.0+ using Chef

Vagrant Hadoop 2.4.1 Spark 1.0.1 ⭐ 79

Vagrant project to spin up a cluster of virtual machines with Hadoop v2.4.1 and Spark v1.0.1

Ros_hadoop ⭐ 75

Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop, Spark, and other HDFS-compatible systems.

Sparkplugins ⭐ 70

Code and examples of how to write and deploy Apache Spark Plugins with Spark 3.x. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.

Sparkmultitool ⭐ 66

Tools for Spark that we use on a daily basis

Platys Modern Data Platform ⭐ 58

Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....

Mylearningnotes ⭐ 58

Because it's never too late to start taking notes and make them public...

Spark Docker ⭐ 55

Apache Spark Docker Image

Geodocker ⭐ 55

Central repository for the GeoDocker project

Bigdataparty ⭐ 54

All-in-One Dockerfile for big data components

Spark Compaction ⭐ 52

File compaction tool that runs on top of the Spark framework.

Sparkstreaming.sessionization ⭐ 51

NRT Sessionization with Spark Streaming landing on HDFS and putting live stats in HBase

Docker Hadoop ⭐ 51

A Docker container with a full Hadoop cluster setup with Spark and Zeppelin

Spark Parquet Thrift Example ⭐ 44

Example Spark project using Parquet as a columnar store with Thrift objects.

Sparkoscope ⭐ 43

Enabling Spark Optimization through Cross-stack Monitoring and Visualization

Spark Cluster Deployment ⭐ 43

Automates Spark standalone cluster tasks with Puppet and Fabric.

Datashark ⭐ 41

dataShark is a Security & Network Event Analytics Framework built on Apache Spark

Seahorse Workflow Executor ⭐ 41

Spark Scala Maven Boilerplate Project ⭐ 40

This is a skeleton of a Scala project with Maven for getting started with Spark

Garmadon ⭐ 39

Java event logs collector for hadoop and frameworks

Etl Light ⭐ 38

A light Kafka to HDFS/S3 ETL library based on Apache Spark

Xxhadoop ⭐ 37

Data analysis using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. These are my daily notes/code/demos. Don't fork, just star!

Spark Docker Compose ⭐ 37

Spark + HDFS cluster using docker compose

Hadoop Guide ⭐ 36

🐘 Study notes on big data frameworks such as HDFS, Yarn, MapReduce, HBase, Hive, Pig, Sqoop, Flume, Zoo, and others

Opendataplatform ⭐ 34

An open source, enterprise-scale, vendor-neutral data platform accelerating solution delivery.

Computation on unstructured video data based on HDFS and Spark

Bucketing and partitioning system for Parquet

Mapreduce ⭐ 29

Tsinghua big data coursework: processing several hundred GB of JSON data with MapReduce

Starlake ⭐ 29

Starlake is an On Premise and Cloud ELT/ETL Framework for Batch & Stream Processing

Spark Vagrant Vm ⭐ 29

Spark Vagrant VM definition and runnable examples.

Learning Spark ⭐ 29

Tidy up Spark and Hadoop tutorials.

Topnotch ⭐ 29

A framework for systematically quality controlling big data.

Sparqlgx ⭐ 28

Efficient Distributed Evaluation of SPARQL with Apache Spark

Enceladus ⭐ 28

Dynamic Conformance Engine

A big data project that analyzes user behavior logs

Glm Parser ⭐ 26

Tree-adjoining grammar based statistical dependency parser using a general linear model (glm).

Sparkhbaseexample ⭐ 26

Spark code to analyze HBase Snapshots

Peel is a framework that helps you to define, execute, analyze, and share experiments for distributed systems and algorithms.

Sparkproject ⭐ 26

Using Apache Spark in an ArcMap Toolbox

Bigdata Doc ⭐ 25

Big data study notes, learning roadmaps, and a collection of technical case studies.

WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Sansa Notebooks ⭐ 25

Interactive Spark Notebooks for running SANSA examples.

Demo Spark Sensor Data ⭐ 25

Demo Spark application to transform data gathered on sensors for a heatmap application

Spark Example ⭐ 24

Spark MLlib example

Neo4j Graphx ⭐ 23

Similar to Shifu - Neo4j-GraphX extends Neo4j graph database to process big data graph algorithms with HDFS and Apache Spark on a scalable data set

Streamingstopgraceful ⭐ 23

Example to show how to stop the Spark Streaming Application Gracefully

Bigdata Tutorial ⭐ 22

Spark Workshop ⭐ 22

Code examples and docker environment for Spark

Kafka Spark Streaming ⭐ 22

Project for reading data from Kafka and writing to Kafka and HBase with Kerberos

Fastdata Cluster ⭐ 22

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

De 100 Days ⭐ 22

data engineering 100 days 🤖 🧲 🦾 | #DE

Whakapai ⭐ 22

Various Python Data Science Projects available on PyPI

Spark_log_data ⭐ 21

Flume-to-Spark-Streaming Log Parser

Spark In Space ⭐ 21

Heroku Button Deploy of Apache Spark Clusters in private spaces + buildpack for deploying spark jobs

Spark Gdb ⭐ 20

A library for parsing and querying an Esri File Geodatabase with Apache Spark.

Tensoronspark ⭐ 20

Running TensorFlow on Spark in a scalable, fast, and compatible style

Spark Yarn Rest Api ⭐ 20

Demonstrates how to submit a job to Spark on HDP directly via YARN's REST API from any workstation

Offlineesindexgenerator ⭐ 19

Offline Elasticsearch index generator

Spark Hdfs On Kubernetes ⭐ 18

Jun_bigdata ⭐ 18

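jun_bigdata is a big data platform service framework. It implements real-time Kafka data filtering, cleansing, transformation, and consumption; implements reading and writing of data in non-relational databases such as Redis and MongoDB via Sp SQL; and integrates a rule engine, on top of which it can implement…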

Data Pipeline Project ⭐ 18

Data pipeline project

Conductor ⭐ 18

Efficient, distributed downloads of large files from S3 to HDFS using Spark.

Spark Notes ⭐ 18

Notes on anything that comes up while writing Spark or Scala

1-100 of 212 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.