Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
Data Science Ipython Notebooks	25,668			6 months ago			34	other	Python
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Bigdata Notes	14,872			4 months ago			39		Java
Getting Started Guide to Big Data :star:
Cookbook	12,557			4 months ago			111	apache-2.0
The Data Engineering Cookbook
Hive	5,222			3 months ago			89	apache-2.0	Java
Apache Hive
Scalding	3,433	37	40	a year ago	43	September 14, 2016	319	apache-2.0	Scala
A Scala API for Cascading
Mrjob	2,584	112	2	2 years ago	62	December 15, 2021	211	other	Python
Run MapReduce jobs on Hadoop or Amazon Web Services
Poseidon	1,543			7 years ago			9	bsd-3-clause	Go
A search engine which can hold 100 trillion lines of log data.
Mongo Hadoop	1,511	78	10	2 years ago	14	January 27, 2017	16		Java
MongoDB Connector for Hadoop
Bigdata Interview	1,397			3 years ago
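:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from around the web, with the author's own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.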
Bigdata Growth	1,256			6 days ago			1	mit	Shell
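A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.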

Alternatives To Cc Mrjob

Data Science Ipython Notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

most recent commit 6 months ago

Bigdata Notes ⭐ 14,872

Getting Started Guide to Big Data :star:

most recent commit 4 months ago

Cookbook ⭐ 12,557

The Data Engineering Cookbook

most recent commit 4 months ago

Hive ⭐ 5,222

Apache Hive

most recent commit 3 months ago

Scalding ⭐ 3,433

A Scala API for Cascading

dependent packages 40 | total releases 43 | most recent commit a year ago

Mrjob ⭐ 2,584

Run MapReduce jobs on Hadoop or Amazon Web Services

dependent packages 2 | total releases 62 | most recent commit 2 years ago

Poseidon ⭐ 1,543

A search engine which can hold 100 trillion lines of log data.

most recent commit 7 years ago

Mongo Hadoop ⭐ 1,511

MongoDB Connector for Hadoop

dependent packages 10 | total releases 14 | most recent commit 2 years ago

Bigdata Interview ⭐ 1,397

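:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from around the web, with the author's own summarized answers. Currently covers H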

most recent commit 3 years ago

Bigdata Growth ⭐ 1,256

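A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.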

most recent commit 6 days ago

Alternative Project Comparisons

Cc Mrjob vs Data Science Ipython Notebooks

Cc Mrjob vs Bigdata Notes

Cc Mrjob vs Mongo Hadoop

Cc Mrjob vs Bigdata Interview

Cc Mrjob vs Bigdata Growth

Popular Hadoop Projects

Spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

dependent packages 939 | total releases 46 | latest release May 09, 2021 | most recent commit 3 months ago

Xgboost ⭐ 25,253

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

dependent packages 972 | total releases 79 | latest release November 13, 2023 | most recent commit 3 months ago

Luigi ⭐ 17,046

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

dependent packages 76 | total releases 80 | latest release October 05, 2023 | most recent commit 3 months ago

Apijson ⭐ 16,586

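🏆 A zero-code, full-featured, highly secure ORM library 🚀 Backend APIs and docs require no code, and the front end (client) customizes the data and structure of the returned JSON. 🏆 A JSON Transmission Protocol and an ORM Library 🚀 provides APIs and Docs without writing any code.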

most recent commit 19 days ago

Deeplearning4j ⭐ 13,397

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.

dependent packages 119 | total releases 54 | latest release August 10, 2022 | most recent commit a month ago

Popular Mapreduce Projects

Redisson ⭐ 22,647

Redisson - Easy Redis Java client with features of In-Memory Data Grid. Sync/Async/RxJava/Reactive API. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache ...

dependent packages 358 | total releases 218 | latest release October 24, 2023 | most recent commit 14 days ago

Powerjob ⭐ 6,249

Enterprise job scheduling middleware with distributed computing ability.

dependent packages 5 | total releases 13 | latest release September 03, 2023 | most recent commit 3 months ago

Dev Setup ⭐ 5,802

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

most recent commit 2 years ago

Gleam ⭐ 3,266

Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly.

dependent packages 1 | total releases 1 | latest release May 13, 2021 | most recent commit 5 months ago

Mit 6.824 ⭐ 2,976

Basic Sources for MIT 6.824 Distributed Systems Class

most recent commit 7 months ago

Popular Data Processing Categories