Hadoopy

Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.

Brandyn White, Andrew Miller

Source: https://github.com/bwhite/hadoopy/
Issues: https://github.com/bwhite/hadoopy/issues
Docs: http://bwhite.github.com/hadoopy/

IRC: #hadoopy @ freenode.net

Requirements: Python development headers (python-dev), build tools (build-essential)

Optional: Cython (>= 0.13). Without it, the build falls back to the pregenerated .c files.

Features

  • Oozie support
  • Automated job parallelization 'auto-oozie' available in the hadoopy_flow project (maintained out of branch)
  • typedbytes support (very fast)
  • Local execution of unmodified MapReduce job with launch_local
  • Read/write sequence files of TypedBytes directly to HDFS from python (readtb, writetb)
  • Works on OS X
  • Allows printing to stdout and stderr in Hadoop tasks without causing problems (uses the 'pipe hopping' technique, both are available in the task's stderr)
  • Critical path is in Cython
  • Works on clusters with no extra installation, no Python, and no Python libraries (uses the PyInstaller copy included in this source tree)
  • Simple HDFS access (readtb and ls) inside Python, even inside running jobs
  • Unit test interface
  • Reporting using status and counters (and print statements! no need to be scared of them in Hadoopy)
  • Supports design patterns in the Lin/Dyer book (http://www.umiacs.umd.edu/~jimmylin/book.html)
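To make the mapper/reducer style behind these features concrete, here is a minimal word-count sketch. It is an illustration, not code from the Hadoopy tree. The names mapper, reducer, and run_in_memory are hypothetical, and run_in_memory is only a plain-Python stand-in for a local run so the example works without Hadoop. The hadoopy.run(mapper, reducer) entry point is written as recalled from the project docs; check it against the Docs link before relying on it.

```python
"""wc.py: word count in the generator style used by Hadoopy jobs (sketch)."""
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # key: record id (ignored here); value: one line of text.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # values: an iterator over every count emitted for this word.
    yield key, sum(values)


def run_in_memory(kvs):
    """Hypothetical helper: map, sort by key, group, reduce (no Hadoop)."""
    mapped = [kv for k, v in kvs for kv in mapper(k, v)]
    mapped.sort(key=itemgetter(0))
    out = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        out.extend(reducer(key, (v for _, v in group)))
    return out


if __name__ == '__main__':
    try:
        import hadoopy  # assumption: Hadoopy is installed
    except ImportError:
        # No Hadoopy available: exercise the same functions in memory.
        print(run_in_memory([(0, 'a b a'), (1, 'b c')]))
    else:
        # Assumed entry point; lets Hadoop (or launch_local) drive the job.
        hadoopy.run(mapper, reducer)
```

With Hadoopy installed, a script like this is the kind of unmodified job that launch_local is meant to run locally, with readtb and writetb used to read and write its input and output on HDFS. The exact arguments of those functions are not shown here; see the Docs link.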

Limitations

Used in

  • A Case for Query by Image and Text Content: Searching Computer Help using Screenshots and Keywords (to appear in WWW'11)
  • Web-Scale Computer Vision using MapReduce for Multimedia Data Mining (at KDD'10)
  • Vitrieve: Visual Search engine
  • Picarus: Hadoop computer vision toolbox

Ubuntu Install (others are similar)

    sudo apt-get install python-dev build-essential
    sudo python setup.py install
