Awesome Open Source
Search results for spark hdfs
212 search results found
Bigdata Notes
⭐
14,872
A getting-started guide to big data ⭐
God Of Bigdata
⭐
8,483
Focused on big data learning and interviews; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive.
Tensorflowonspark
⭐
3,851
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Ibis
⭐
3,404
The flexibility of Python with the scale and performance of modern SQL.
Bigdata Interview
⭐
1,397
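🎯 🌟[Big data interview questions] Sharing big data interview questions I collected online, along with my own summarized answers. Currently covers Hadoop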
Bigdata Growth
⭐
1,256
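A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.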
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Sparta
⭐
526
Real Time Analytics and Data Pipelines based on Spark Streaming
Docker Hadoop Spark Workbench
⭐
503
[EXPERIMENTAL] This repo includes deployment instructions for running HDFS/Spark inside docker containers. Also includes spark-notebook and HDFS FileBrowser.
Zdh_web
⭐
379
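Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, covering data collection, scheduling, permissions, and approvals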
Spindle
⭐
333
Next-generation web analytics processing with Scala, Spark, and Parquet.
Tensorspark
⭐
302
TensorFlow on Spark
Pysparkling
⭐
253
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Bigdata_docker
⭐
226
Big Data Ecosystem Docker
Zeppelin Notebooks
⭐
206
Gallery of Apache Zeppelin notebooks
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Wifiprobeanalysis
⭐
189
Commercial big data analytics technology based on WiFi probes
Spark Notes
⭐
183
Magpie
⭐
182
Magpie contains a number of scripts for running Big Data software in HPC environments, including Hadoop and Spark. There is support for Lustre, Slurm, Moab, Torque, LSF, Flux, and more.
Juicy Bigdata
⭐
162
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course of the big data technology track 🎉🎉
Dcos Commons
⭐
162
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
Bigdata In Practice
⭐
154
Hands-on big data projects: Hadoop, Spark, Kafka, Hbase, Flink.....
Lambda Arch
⭐
151
A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra.
Ipython Spark Docker
⭐
151
Distributed Graph Analytics
⭐
135
Distributed Graph Analytics (DGA) is a compendium of graph analytics written for Bulk-Synchronous-Parallel (BSP) processing frameworks such as Giraph and GraphX. The analytics included are High Betweenness Set Extraction, Weakly Connected Components, Page Rank, Leaf Compression, and Louvain Modularity.
Hdfs_fdw
⭐
131
PostgreSQL foreign data wrapper for HDFS
Cobrix
⭐
131
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Vagrant Hadoop Spark Cluster
⭐
121
Vagrant project to spin up a cluster of 4 32-bit CentOS6.5 Linux virtual machines with Hadoop v2.6.0 and Spark v1.1.1
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Jaws Spark Sql Rest
⭐
92
Big Data Engineering Coursera Yandex
⭐
91
Big Data for Data Engineers Coursera Specialization from Yandex
Correlation Approximation
⭐
90
Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets
Focusbigdata
⭐
89
[Path to big data mastery: learning roadmap + interview experiences + résumés]
Cuesheet
⭐
85
A framework for writing Spark 2.x applications in a pretty way
Hadoop_cookbook
⭐
80
Cookbook to install Hadoop 2.0+ using Chef
Vagrant Hadoop 2.4.1 Spark 1.0.1
⭐
79
Vagrant project to spin up a cluster of virtual machines with Hadoop v2.4.1 and Spark v1.0.1
Ros_hadoop
⭐
75
Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.
Sparkplugins
⭐
70
Code and examples of how to write and deploy Apache Spark Plugins with Spark 3.x. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
Sparkmultitool
⭐
66
Tools for Spark that we use on a daily basis
Platys Modern Data Platform
⭐
58
Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....
Mylearningnotes
⭐
58
Because it's never too late to start taking notes and make them public...
Spark Docker
⭐
55
Apache Spark Docker Image
Geodocker
⭐
55
Central repository for the GeoDocker project
Bigdataparty
⭐
54
All-in-One Dockerfile for big data components
Spark Compaction
⭐
52
File compaction tool that runs on top of the Spark framework.
Sparkstreaming.sessionization
⭐
51
NRT Sessionization with Spark Streaming landing on HDFS and putting live stats in HBase
Docker Hadoop
⭐
51
A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Spark Parquet Thrift Example
⭐
44
Example Spark project using Parquet as a columnar store with Thrift objects.
Sparkoscope
⭐
43
Enabling Spark Optimization through Cross-stack Monitoring and Visualization
Spark Cluster Deployment
⭐
43
Automates Spark standalone cluster tasks with Puppet and Fabric.
Datashark
⭐
41
dataShark is a Security & Network Event Analytics Framework built on Apache Spark
Seahorse Workflow Executor
⭐
41
Spark Scala Maven Boilerplate Project
⭐
40
This is a skeleton of a Scala project with maven to start using Spark
Devops
⭐
40
DevOps
Garmadon
⭐
39
Java event logs collector for hadoop and frameworks
Etl Light
⭐
38
A light Kafka to HDFS/S3 ETL library based on Apache Spark
Xxhadoop
⭐
37
Data Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. These are my daily notes/code/demos. Don't fork, just star!
Spark Docker Compose
⭐
37
Spark + HDFS cluster using docker compose
Hadoop Guide
⭐
36
🐘 Study notes on big data frameworks such as HDFS, Yarn, MapReduce, HBase, Hive, Pig, Sqoop, Flume, Zoo
Opendataplatform
⭐
34
An open source, enterprise-scale, vendor-neutral data platform accelerating solution delivery.
Cipher
⭐
33
Computation on unstructured video data, based on hdfs and spark
Pucket
⭐
29
Bucketing and partitioning system for Parquet
Mapreduce
⭐
29
Tsinghua big data assignment: processing several hundred GB of JSON data with MapReduce
Starlake
⭐
29
Starlake is an On Premise and Cloud ELT/ETL Framework for Batch & Stream Processing
Spark Vagrant Vm
⭐
29
Spark Vagrant VM definition and runnable examples.
Learning Spark
⭐
29
Tidy up Spark and Hadoop tutorials.
Topnotch
⭐
29
A framework for systematically quality controlling big data.
Sparqlgx
⭐
28
Efficient Distributed Evaluation of SPARQL with Apache Spark
Enceladus
⭐
28
Dynamic Conformance Engine
Aaocp
⭐
27
A big data project that analyzes user behavior logs
Glm Parser
⭐
26
Tree-adjoining grammar based statistical dependency parser using a general linear model (glm).
Sparkhbaseexample
⭐
26
Spark code to analyze HBase Snapshots
Peel
⭐
26
Peel is a framework that helps you to define, execute, analyze, and share experiments for distributed systems and algorithms.
Sparkproject
⭐
26
Using Apache Spark in an ArcMap Toolbox
Bigdata Doc
⭐
25
A collection of big data study notes, learning roadmaps, and technical case studies.
Wasp
⭐
25
WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Spash
⭐
25
Spash
Sansa Notebooks
⭐
25
Interactive Spark Notebooks for running SANSA examples.
Demo Spark Sensor Data
⭐
25
Demo Spark application to transform data gathered on sensors for a heatmap application
Spark Example
⭐
24
Spark MLlib example
Neo4j Graphx
⭐
23
Similar to Shifu - Neo4j-GraphX extends Neo4j graph database to process big data graph algorithms with HDFS and Apache Spark on a scalable data set
Streamingstopgraceful
⭐
23
Example to show how to stop the Spark Streaming Application Gracefully
Bigdata Tutorial
⭐
22
Spark Workshop
⭐
22
Code examples and docker environment for Spark
Kafka Spark Streaming
⭐
22
Project for reading data from Kafka and writing to Kafka and HBase with Kerberos
Fastdata Cluster
⭐
22
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
De 100 Days
⭐
22
data engineering 100 days 🤖 🧲 🦾 | #DE
Whakapai
⭐
22
Various Python Data Science Projects available in PyPi
Spark_log_data
⭐
21
Flume-to-Spark-Streaming Log Parser
Knn_is
⭐
21
Spark In Space
⭐
21
Heroku Button Deploy of Apache Spark Clusters in private spaces + buildpack for deploying spark jobs
Spark Gdb
⭐
20
A library for parsing and querying an Esri File Geodatabase with Apache Spark.
Tensoronspark
⭐
20
Running Tensorflow on Spark in the scalable, fast and compatible style
Spark Yarn Rest Api
⭐
20
Demonstrates how to submit a job to Spark on HDP directly via YARN's REST API from any workstation
Offlineesindexgenerator
⭐
19
Offline Elasticsearch index generator
Spark Hdfs On Kubernetes
⭐
18
Jun_bigdata
⭐
18
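jun_bigdata, a big data platform service framework. Implements real-time Kafka data filtering, cleaning, transformation, and consumption; implements reading and writing of data in non-relational databases such as Redis and MongoDB via Sp SQL; integrates a rules engine, on top of which it can implement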
Data Pipeline Project
⭐
18
Data pipeline project
Conductor
⭐
18
Efficient, distributed downloads of large files from S3 to HDFS using Spark.
Spark Notes
⭐
18
Notes on anything that comes up while writing Spark or Scala
Related Searches
Scala Spark (3,279)
Python Spark (2,053)
Java Spark (1,587)
Jupyter Notebook Spark (1,268)
Apache Spark (1,207)
Spark Hadoop (1,188)
Hadoop Hdfs (1,075)
Spark Kafka (985)
Spark Streaming (817)
Spark Pyspark (812)
1-100 of 212 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.