Project Name | Description | Stars | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|
Android Nosql | Lightweight, simple structured NoSQL database for Android | 287 | 3 years ago | | | 3 | apache-2.0 | Java |
Elastic Mapreduce Ruby | Amazon's Elastic MapReduce Ruby client, compatible with Ruby 1.9.x | 86 | 9 years ago | | | 8 | apache-2.0 | Ruby |
Lemur | Lemur is a tool to launch Hadoop jobs locally or on EMR, based on a configuration file referred to as a jobdef. The jobdef file describes your EMR cluster, local environment, pre- and post-actions, and zero or more "steps". | 85 | 6 years ago | | | 8 | apache-2.0 | Clojure |
Cc Helloworld | CommonCrawl Hello World example | 33 | 9 years ago | | | 1 | | Java |
Ceteri Mapred | MapReduce examples | 19 | 12 years ago | | | | | Python |
Spark Emr | Spark Elastic MapReduce bootstrap and runnable examples | 17 | 10 years ago | | | 9 | | Scala |
Rdfgrid | [Unmaintained] RDFgrid is a framework for batch-processing RDF data with Hadoop and Amazon Elastic MapReduce. | 16 | 13 years ago | 1 | March 27, 2010 | | unlicense | Ruby |
Sofia | Code example that uses Elasticsearch as the vector provider for a Mahout classifier (and for data exploration). It will also contain sample code for moving from the sequential setup to a distributed Hadoop implementation. The example data is the StackOverflow dump published on Kaggle, among other places. | 14 | 10 years ago | | | | | Java |
Big Data Architecture | A study of the big data technology architectures of foreign internet companies | 10 | 8 years ago | | | | | |
Elastic Mapreduce | Amazon's command-line client for invoking EMR (Elastic MapReduce) | 8 | 13 years ago | | | | apache-2.0 | Ruby |

Lemur is a tool to launch Hadoop jobs locally or on EMR, based on a configuration file referred to as a jobdef. The jobdef file describes your EMR cluster, your local environment, pre- and post-actions (aka hooks), and zero or more "steps". A step is Amazon's name for a task or job submitted to the cluster. Lemur reads your jobdef; at the end of the jobdef, you call (fire! ...) to make things happen. Keep in mind that the jobdef is an interpreted clj file, so you can insert arbitrary Clojure code to be executed anywhere in the file (but see HOOKS below for a better way).

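To make the parts of a jobdef concrete, here is a minimal Python model of what it describes: a cluster, pre- and post-action hooks, and zero or more steps that are submitted when the job is fired. This is only an illustration, not Lemur's API; the names `Cluster`, `Step`, `JobDef`, `fire` and `fake_submit`, and the fields on each class, are hypothetical. Running the pre-hooks before the steps and the post-hooks after them is an assumption based on their names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Cluster:
    # Hypothetical, heavily trimmed cluster description (a real EMR cluster has many more settings).
    name: str
    num_instances: int


@dataclass
class Step:
    # A "step" is Amazon's name for a task or job submitted to the cluster.
    name: str
    main_class: str
    args: List[str] = field(default_factory=list)


@dataclass
class JobDef:
    cluster: Cluster
    steps: List[Step] = field(default_factory=list)  # zero or more steps
    pre_hooks: List[Callable[[], None]] = field(default_factory=list)
    post_hooks: List[Callable[[], None]] = field(default_factory=list)


def fire(jobdef: JobDef, submit: Callable[[Cluster, Step], str]) -> List[str]:
    """Run the pre-hooks, submit every step in order, then run the post-hooks.

    `submit` stands in for whatever actually launches a step (EMR or local
    Hadoop) and returns an identifier for the submitted step.
    """
    for hook in jobdef.pre_hooks:
        hook()
    step_ids = [submit(jobdef.cluster, step) for step in jobdef.steps]
    for hook in jobdef.post_hooks:
        hook()
    return step_ids


# Usage: a fake submitter that records calls instead of talking to EMR.
log: List[str] = []


def fake_submit(cluster: Cluster, step: Step) -> str:
    log.append(step.name)
    return f"{cluster.name}:{step.name}"


jd = JobDef(
    cluster=Cluster("demo", 3),
    steps=[Step("wordcount", "org.example.WordCount", ["in/", "out/"])],
    pre_hooks=[lambda: log.append("pre")],
    post_hooks=[lambda: log.append("post")],
)
ids = fire(jd, fake_submit)
```

Passing `submit` in as a function keeps the model independent of where steps run, which loosely mirrors Lemur's ability to launch the same jobdef on EMR or on local Hadoop.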
Lemur does not try to replace elastic-mapreduce. While there is some overlap, lemur is focused on launching jobs, and it provides no replacement for many common elastic-mapreduce activities, such as "elastic-mapreduce --list". I recommend that you install elastic-mapreduce alongside lemur (or rely on the AWS Console for those activities).

Lemur uses DefaultAWSCredentialsProviderChain from the AWS SDK for Java to obtain the AWS credentials it needs to access various AWS services.

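DefaultAWSCredentialsProviderChain tries a fixed sequence of credential sources (including environment variables, Java system properties, the shared `~/.aws/credentials` profile file, and EC2 instance-profile credentials) and uses the first one that yields credentials. The Python sketch below illustrates only that first-match-wins pattern with two simplified sources; it is not the SDK's implementation, and `env_provider`, `profile_provider`, `first_available` and `default_chain` are hypothetical names.

```python
import configparser
import os
from pathlib import Path
from typing import Callable, List, Mapping, Optional, Tuple

# (access key id, secret access key)
Credentials = Tuple[str, str]
# A provider returns credentials, or None if its source has none.
Provider = Callable[[], Optional[Credentials]]


def env_provider(env: Mapping[str, str]) -> Provider:
    """Simplified environment-variable source (two standard variable names only)."""
    def provide() -> Optional[Credentials]:
        key = env.get("AWS_ACCESS_KEY_ID")
        secret = env.get("AWS_SECRET_ACCESS_KEY")
        return (key, secret) if key and secret else None
    return provide


def profile_provider(path: Path, profile: str = "default") -> Provider:
    """Simplified shared-credentials-file source (one INI-style profile)."""
    def provide() -> Optional[Credentials]:
        parser = configparser.ConfigParser()
        # read() silently skips missing files and returns the list of files it parsed.
        if not parser.read(path) or not parser.has_section(profile):
            return None
        key = parser.get(profile, "aws_access_key_id", fallback=None)
        secret = parser.get(profile, "aws_secret_access_key", fallback=None)
        return (key, secret) if key and secret else None
    return provide


def first_available(providers: List[Provider]) -> Credentials:
    """Return credentials from the first provider in the chain that has them."""
    for provider in providers:
        creds = provider()
        if creds is not None:
            return creds
    raise RuntimeError("no provider in the chain returned AWS credentials")


def default_chain() -> List[Provider]:
    # Environment variables are consulted before the shared credentials file.
    return [
        env_provider(os.environ),
        profile_provider(Path.home() / ".aws" / "credentials"),
    ]
```

Calling `first_available(default_chain())` returns the first credentials found, or raises if none are configured. The real SDK chain has more sources and more variable names than this sketch.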
Clojure version by Lemur release:

- v0.9.7: Clojure 1.2
- v1.0.1+: Clojure 1.3
- v1.4.0+: Clojure 1.5

I've used lemur on Mac OS X and Linux. It MAY work on Windows (if you use Cygwin). If you try it on Windows, I would be interested in hearing about your experience (patches welcome).

The general command-line format is:

```
bin/lemur <command> <jobdef-file> [options] [remaining]

bin/lemur help - Display this help text
bin/lemur run ./jobdef.clj - Run a job on EMR
bin/lemur dry-run ./jobdef.clj - Dry-run, i.e. just print out what would be done
bin/lemur start ./jobdef.clj - Start an EMR cluster, but don't run the steps (jobs)
bin/lemur local ./jobdef.clj - Run the job using local Hadoop (e.g. standalone mode)
bin/lemur submit ./jobdef.clj --jobflow j-123456789 - Submit steps to an existing jobflow (running cluster)
```

Examples:

```
lemur run clj/wb-clj/scripts/launch/hrap-jobdef.clj --dataset ahps --num-days 10
lemur start clj/wb-clj/src/weatherbill/lemur/sample-jobdef.clj
```

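The command-line format splits into a command, a jobdef path, and trailing options. As an illustration only, here is a Python sketch of that split; `parse_command_line` and `Invocation` are hypothetical and are not Lemur's actual argument handling. Treating every trailing argument as a `--name value` pair (so valueless flags are rejected) is a simplifying assumption.

```python
from typing import Dict, List, NamedTuple

# Commands taken from the help text above.
COMMANDS = {"help", "run", "dry-run", "start", "local", "submit"}


class Invocation(NamedTuple):
    command: str
    jobdef: str
    options: Dict[str, str]


def parse_command_line(argv: List[str]) -> Invocation:
    """Split `<command> <jobdef-file> [--name value ...]` into its parts."""
    if not argv or argv[0] not in COMMANDS:
        raise ValueError(f"expected one of {sorted(COMMANDS)}")
    command = argv[0]
    if command == "help":
        return Invocation(command, "", {})
    if len(argv) < 2:
        raise ValueError(f"'{command}' requires a jobdef file")
    jobdef, rest = argv[1], argv[2:]
    options: Dict[str, str] = {}
    i = 0
    while i < len(rest):
        name = rest[i]
        # Assumption: every option is a `--name value` pair.
        if not name.startswith("--") or i + 1 >= len(rest):
            raise ValueError(f"expected '--name value' pairs, got {rest[i:]}")
        options[name[2:]] = rest[i + 1]
        i += 2
    # `submit` targets an existing jobflow, so it needs --jobflow.
    if command == "submit" and "jobflow" not in options:
        raise ValueError("submit requires --jobflow")
    return Invocation(command, jobdef, options)


inv = parse_command_line(["submit", "./jobdef.clj", "--jobflow", "j-123456789"])
```

Here `inv.options` holds `{"jobflow": "j-123456789"}`; the first example invocation would likewise yield the options `dataset` and `num-days`.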
Feedback and feature requests are welcome!