Awesome Open Source
Search results for mapreduce
671 search results found
Data Science Ipython Notebooks
⭐
25,668
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Redisson
⭐
22,647
Redisson - Easy Redis Java client with features of In-Memory Data Grid. Sync/Async/RxJava/Reactive API. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache ...
Bigdata Notes
⭐
14,872
A beginner's guide to big data ⭐
Cookbook
⭐
12,557
The Data Engineering Cookbook
Powerjob
⭐
6,249
Enterprise job scheduling middleware with distributed computing ability.
Dev Setup
⭐
5,802
macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Hive
⭐
5,222
Apache Hive
Scalding
⭐
3,433
A Scala API for Cascading
Gleam
⭐
3,266
Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributed.
Mit 6.824
⭐
2,976
Basic Sources for MIT 6.824 Distributed Systems Class
Dpark
⭐
2,637
Python clone of Spark, a MapReduce-like framework in Python
Mrjob
⭐
2,584
Run MapReduce jobs on Hadoop or Amazon Web Services
Summingbird
⭐
2,117
Streaming MapReduce with Scalding and Storm
Poseidon
⭐
1,543
A search engine which can hold 100 trillion lines of log data.
Mongo Hadoop
⭐
1,511
MongoDB Connector for Hadoop
Bigdata Interview
⭐
1,397
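🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from around the web, with my own summarized answers. Currently includes Hadoop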
Bigdata Growth
⭐
1,256
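A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.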
Data Algorithms Book
⭐
973
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Mobius
⭐
937
C# and F# language binding and extensions to Apache Spark
Bashreduce
⭐
889
mapreduce in bash
Numaflow
⭐
866
Kubernetes-native platform to run massively parallel data/streaming jobs
Cdap
⭐
735
An open source framework for building data analytic applications.
Corral
⭐
652
🐎 A serverless MapReduce framework written for AWS Lambda
Perfectdocs
⭐
548
Reference and documentation for Perfect, a software library for server-side Swift.
Elephantdb
⭐
540
Distributed database specialized in exporting key/value data from Hadoop
Bigdata Ecosystem
⭐
536
BigData Ecosystem Dataset
Bigslice
⭐
525
A serverless cluster computing system for the Go programming language
Scoobi
⭐
485
A Scala productivity framework for Hadoop.
Mincemeatpy
⭐
467
Lightweight MapReduce in python
Transducers.jl
⭐
414
Efficient transducers for Julia
Bigdata
⭐
358
💎🔥 Big data study notes
Lambda Refarch Mapreduce
⭐
355
This repo presents a reference architecture for running serverless MapReduce jobs. This has been implemented using AWS Lambda and Amazon S3.
Redisgears
⭐
339
Dynamic execution framework for your Redis data
Cascading
⭐
337
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
Tdigest
⭐
332
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Incubator Uniffle
⭐
332
Uniffle is a high performance, general purpose Remote Shuffle Service.
Threadsx.jl
⭐
301
Parallelized Base functions
Mapreduce
⭐
285
C++ MapReduce library for efficient multi-threading on a single machine
Behemoth
⭐
284
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Compass
⭐
284
Compass is a task diagnosis platform for big data
Parkour
⭐
261
Hadoop MapReduce in idiomatic Clojure.
Hadoopy
⭐
244
Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.
Firestorm
⭐
240
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Mapreduce Lite
⭐
230
A C++ implementation of MapReduce without a distributed filesystem
Appengine Mapreduce
⭐
229
A library for running MapReduce jobs on App Engine
Ddia
⭐
225
Chinese translation of "Designing Data-Intensive Applications"
Hadoop Docker
⭐
210
A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Commoncrawl Crawler
⭐
208
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)
Hadoop Pcap
⭐
202
Hadoop library to read packet capture (PCAP) files
Mapreduce
⭐
201
An easy-to-use MapReduce parallel-computing framework in Go, inspired by the 2021 6.824 lab1. It currently supports multiple worker threads and multiple processes on a single machine.
Wonderdog
⭐
193
Bulk loading for Elasticsearch
Etl Language Comparison
⭐
185
Count the number of times certain words were said in a particular neighborhood. Performed as a basic MapReduce job against 25M tweets. Implemented in different programming languages as an educational exercise.
Sharding Method
⭐
169
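A new approach to table and database sharding: a service-layer sharding framework compatible with all SQL and all databases, with ACID properties consistent with the native database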
Terrapin
⭐
168
Serving system for batch generated data sets
Juicy Bigdata
⭐
162
🎉🎉🐳 Datawhale introductory tutorial on big data processing | The opening course of the big data technology track 🎉🎉
Learning Hadoop And Spark
⭐
160
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Awesome Couchdb
⭐
159
CouchDB - curated meta resources & best practices list
Cc Mrjob
⭐
157
Demonstration of using Python to process the Common Crawl dataset with the mrjob framework
Bigdata In Practice
⭐
154
Big data practice projects: Hadoop, Spark, Kafka, HBase, Flink, ...
Pda_book
⭐
154
Code examples for data science using Python
Mr.lda
⭐
153
Scalable Topic Modeling using Variational Inference in MapReduce
Data Algorithms With Spark
⭐
151
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Huaweicloud Mrs Example
⭐
150
Examples for HUAWEI CLOUD MRS.
Spatialhadoop2
⭐
148
The second generation of SpatialHadoop that ships as an extension
Gopark
⭐
145
A naive, local Go port of Spark/DPark
Hadoop R
⭐
135
Example code for running R on Hadoop
Big Data Mapreduce Course
⭐
135
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Distribution System
⭐
132
Learning distributed systems
Hpaste
⭐
130
HBase DSL for Scala with MapReduce support
Goriakpbc
⭐
129
A golang riak client inspired by the Ruby riak-client from Basho and riakpbc from mrb
Hipi
⭐
128
HIPI: Hadoop Image Processing Interface
Python Bigdata
⭐
128
Data science and Big Data with Python
Hadoopdemo
⭐
128
Simple Hadoop application examples, including MapReduce, word count, basic HDFS operations, web log analysis, Zoo
Mupd8
⭐
126
Muppet
Ctenopharyngodon Idella
⭐
125
Distributed crawling of data on all Chinese universities from the 掌上高考 site, using Hadoop and MapReduce.
Kaylee
⭐
123
MapReduce with ZeroMQ
Aliyun Emapreduce Demo
⭐
123
Mapreduce
⭐
122
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Sequenceiq Samples
⭐
119
SequenceIQ Hadoop examples
Dtail
⭐
119
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Hackathon
⭐
114
Library and resources for hack/reduce Hackathon events
Asakusafw
⭐
113
Asakusa Framework
Flox
⭐
112
Fast & furious GroupBy operations for dask.array
Avro Hadoop Starter
⭐
111
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Gora
⭐
111
The Apache Gora open source framework provides an in-memory data model and persistence for big data.
Hadron
⭐
110
Construct and run Hadoop MapReduce programs in Haskell
Dynamometer
⭐
110
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
6.824 Lecture Notes
⭐
106
6.824 Distributed Systems: Lecture notes (edited a little and formatted with Markdown)
Introtohadoopandmr__udacity_course
⭐
103
🐘 Source code for assignments of Udacity course "Introduction to Hadoop and MapReduce"
Babar
⭐
101
Profiler for large-scale distributed java applications (Spark, Scalding, MapReduce, Hive,...) on YARN.
Dampr
⭐
101
Python Data Processing library
Crunch
⭐
100
Mirror of Apache Crunch (Incubating)
Distributed Statistical Computing
⭐
99
Teaching materials for Distributed Statistical Computing (distributed computing for big data)
Spark With Python
⭐
98
Fundamentals of Spark with Python (using PySpark), code examples
Big Data Engineering Coursera Yandex
⭐
91
Big Data for Data Engineers Coursera Specialization from Yandex
Focusbigdata
⭐
89
[Big data road-to-mastery learning path + interview experiences + résumés]
Nativescript Couchbase
⭐
88
Annotated Wikiextractor
⭐
88
Simple Wikipedia plain text extractor with article link annotations and Hadoop support.
Elastic Mapreduce Ruby
⭐
86
Amazon's Elastic MapReduce Ruby client. Ruby 1.9.X compatible
Lemur
⭐
85
Lemur is a tool to launch hadoop jobs locally or on EMR, based on a configuration file, referred to as a jobdef. The jobdef file describes your EMR cluster, local environment, pre- and post-actions and zero or more "steps".
Related Searches
Hadoop Mapreduce (851)
Java Mapreduce (759)
Python Mapreduce (383)
1-100 of 671 search results