| Project Name | Description | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Elastic Mapreduce Ruby | Amazon's elastic mapreduce ruby client. Ruby 1.9.X compatible | 86 | | | | 9 years ago | | | 8 | apache-2.0 | Ruby |
| Lemur | Lemur is a tool to launch hadoop jobs locally or on EMR, based on a configuration file, referred to as a jobdef. The jobdef file describes your EMR cluster, local environment, pre- and post-actions and zero or more "steps" | 85 | | | | 6 years ago | | | 8 | apache-2.0 | Clojure |
| Rail | Scalable RNA-seq analysis | 70 | | | | 3 years ago | | | 26 | other | Python |
| Social Graph Analysis | Social Graph Analysis using Elastic MapReduce and PyPy | 56 | | | | 12 years ago | | | | other | Python |
| Elasticrawl | Launch AWS Elastic MapReduce jobs that process Common Crawl data | 50 | 1 | | | 7 years ago | 10 | February 15, 2017 | 1 | mit | Ruby |
| Terraform Aws Emr Cluster | A Terraform module to create an Amazon Web Services (AWS) Elastic MapReduce (EMR) cluster | 35 | | | | 4 years ago | | | 3 | apache-2.0 | HCL |
| Cc Helloworld | CommonCrawl Hello World example | 33 | | | | 9 years ago | | | 1 | | Java |
| Emrio | Elastic MapReduce instance optimizer | 30 | | | | 9 years ago | | | | | Python |
| Ceteri Mapred | MapReduce examples | 19 | | | | 12 years ago | | | | | Python |
| Spark Emr | Spark Elastic MapReduce bootstrap and runnable examples | 17 | | | | 10 years ago | | | 9 | | Scala |
Elastic MapReduce instance optimizer
EMRio helps you save money on Elastic MapReduce by using your last two months of usage to estimate how many EC2 reserved instances you should buy for the next year.
Elastic MapReduce is an Amazon service that makes it easy to use MapReduce. EMR runs on machines called EC2 instances, which come in many flavors, from memory-heavy to CPU-heavy. Businesses usually start out paying for EMR as they go. After some time, the number of instances you use tends to stabilize, and if you run enough instances consistently, it can make sense to switch from the pay-as-you-go pricing, or On-Demand service, to the pay-upfront pricing, or Reserved Instances service.
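As a rough illustration of that trade-off, compare a year of On-Demand hours against the same hours on a Reserved Instance. The dollar figures below are made up for the example; they are not current AWS rates:

    # Illustrative break-even arithmetic; the rates are placeholders, not AWS prices.
    HOURS_RUNNING = 6000          # hours this instance type runs in a year
    ON_DEMAND_RATE = 0.40         # hypothetical On-Demand $/hour
    RESERVED_RATE = 0.16          # hypothetical Reserved $/hour
    RESERVED_UPFRONT = 500.00     # hypothetical one-year upfront fee

    on_demand_cost = HOURS_RUNNING * ON_DEMAND_RATE                    # 2400.00
    reserved_cost = RESERVED_UPFRONT + HOURS_RUNNING * RESERVED_RATE   # 1460.00
    print("Reserving saves $%.2f" % (on_demand_cost - reserved_cost))  # 940.00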
You can read about how Reserved Instances work here. If you think switching to Reserved Instances is a good plan but don't know how many to buy, that's what EMRio is for!
EMRio first pulls your EMR job flow history (Amazon only keeps the last two months of it) and extrapolates it as if the same usage recurred for a full year, since Reserved Instances only pay off over a year-long term. It then simulates different purchase configurations against that projected usage and reports the pool of Reserved Instances that would cost the least to buy.
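The simulation boils down to something like the following sketch. This is not EMRio's actual code and it covers a single instance type only; the rates are the same made-up placeholders as above. It just shows the idea of charging each projected hour against a hypothetical reserved pool and keeping the cheapest pool size:

    # Minimal sketch of the idea, not EMRio's implementation. `hourly_usage`
    # has one entry per hour of the projected year: how many instances of
    # this type were running that hour. Rates are placeholders.
    ON_DEMAND_RATE = 0.40
    RESERVED_RATE = 0.16
    RESERVED_UPFRONT = 500.00

    def yearly_cost(hourly_usage, num_reserved):
        """Total cost for the year if `num_reserved` instances are reserved."""
        cost = num_reserved * RESERVED_UPFRONT
        for running in hourly_usage:
            on_reserved = min(running, num_reserved)
            cost += on_reserved * RESERVED_RATE
            cost += (running - on_reserved) * ON_DEMAND_RATE
        return cost

    def best_pool_size(hourly_usage):
        """Try every pool size up to the observed peak and keep the cheapest."""
        peak = max(hourly_usage) if hourly_usage else 0
        return min(range(peak + 1), key=lambda n: yearly_cost(hourly_usage, n))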
First, download the source.
Then, go to the root directory and run:

    python setup.py install
Once the dependencies are installed, you need to set up your boto configuration file. Look at our example boto config, fill in your AWS key and region information, and copy it to either /etc/boto.cfg or ~/.boto.
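If you want to confirm that boto can see your credentials before running EMRio, a quick check like the one below should list your recent job flows. This is not part of EMRio; it assumes boto 2.x, and you should substitute the region from your own config:

    # Not part of EMRio: a quick check that the boto config is picked up.
    # Assumes boto 2.x; replace the region with the one in your config.
    import boto.emr

    conn = boto.emr.connect_to_region('us-east-1')
    for flow in conn.describe_jobflows():
        print("%s %s" % (flow.jobflowid, flow.state))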
After setup is done, run:

    emrio
This should take a minute or two to grab the information off S3, run a few simulations, and output the resulting optimized instance pool.
If you want to see instance usage over time (how many instances are running at the same time), run:

    emrio -g

After it calculates the same data, you will see graphs of each instance type's usage over time.
Re-calculating the optimal instances on the same data is pointless, so you can save and load optimal instance configurations. To save the results to output.json:

    emrio --cache=output.json

The file is JSON-encoded; an example instance file can be found in the tests folder. To load previously saved results:

    emrio --optimized=output.json
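Since the cache is plain JSON, you can also peek at it with the standard library. The exact structure is whatever EMRio writes; see the example file in the tests folder:

    # Pretty-print a saved instance pool; the structure is defined by EMRio.
    import json

    with open('output.json') as f:
        pool = json.load(f)
    print(json.dumps(pool, indent=2))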
If you want to see all the options, try --help:

    emrio --help