Elastic MapReduce instance optimizer

EMRio helps you save money on Elastic MapReduce by using your last two months of usage to estimate how many EC2 reserved instances you should buy for the next year.


Elastic MapReduce (EMR) is a service provided by Amazon that makes it easy to use MapReduce. EMR runs on machines called EC2 instances, which come in many different flavors, from memory-heavy to CPU-heavy. When businesses start using EMR, they typically pay as they go. After some time, the number of instances you use can become stable. If you use enough instances over time, it might make sense to switch from the pay-as-you-go service, or On-Demand service, to a pay-upfront service, or Reserved Instances service.

Amazon's EC2 documentation explains how Reserved Instances work. If you think that switching to Reserved Instances is a good plan but don't know how many to buy, that's what EMRio is for!

How It Works

EMRio first looks at your EMR job flow history, which is limited to the last two months. It then treats that history as if it recurred for a full year, because a year's worth of usage is needed to judge whether Reserved Instances are worth their cost. Finally, it simulates different Reserved Instance configurations against that projected usage and reports the best pool of instances to buy.
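The idea can be illustrated with a minimal sketch. This is not EMRio's actual code: the prices, function names, and the single-instance-type cost model below are all assumptions made for illustration, and real AWS pricing differs by instance type, region, and reservation plan.

```python
# Hypothetical prices, chosen only for illustration.
ON_DEMAND_HOURLY = 0.34     # assumed on-demand price per instance-hour
RESERVED_HOURLY = 0.12      # assumed discounted price per reserved instance-hour
RESERVED_UPFRONT = 1000.00  # assumed one-time yearly fee per reserved instance


def extend_to_year(history, hours_in_year=365 * 24):
    """Repeat the observed hourly history until it covers a full year."""
    if not history:
        return [0] * hours_in_year
    repeats = hours_in_year // len(history) + 1
    return (history * repeats)[:hours_in_year]


def yearly_cost(hourly_usage, reserved_count):
    """Total cost for the year if `reserved_count` reserved instances are bought.

    hourly_usage holds the number of instances running in each hour.
    """
    cost = reserved_count * RESERVED_UPFRONT
    for running in hourly_usage:
        on_reserved = min(running, reserved_count)  # reserved capacity is used first
        on_demand = running - on_reserved           # overflow is billed on demand
        cost += on_reserved * RESERVED_HOURLY + on_demand * ON_DEMAND_HOURLY
    return cost


def best_reserved_count(hourly_usage):
    """Simulate every pool size from 0 up to peak usage and return the cheapest."""
    peak = max(hourly_usage, default=0)
    return min(range(peak + 1), key=lambda n: yearly_cost(hourly_usage, n))


if __name__ == "__main__":
    # One observed day: 2 instances running all day, 5 during a one-hour peak.
    history = [2] * 23 + [5]
    year = extend_to_year(history)
    print("reserved instances to buy:", best_reserved_count(year))
```

In this sketch, an instance is only worth reserving if it runs enough hours for the hourly discount to outweigh the upfront fee, so steady baseline usage gets reserved while short peaks stay on demand.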


Dependencies

  • boto
  • tzinfo
  • matplotlib

Installation and Setup

First, download the source

Then, go to the root directory and run:

python setup.py install

Once you have the dependencies installed, you need to set up your boto configuration file. Look at our boto config as an example. Once you fill in the AWS key and region information, copy it to either /etc/boto.cfg or ~/.boto
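Before copying the file into place, it can help to check that the credentials are filled in. The sketch below assumes the standard boto layout, a [Credentials] section with aws_access_key_id and aws_secret_access_key. The helper name and placeholder values are hypothetical, and the region settings are left out, so follow the project's example config for those.

```python
import configparser

# Placeholder values only; substitute your own keys.
EXAMPLE_BOTO_CONFIG = """
[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
"""

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credentials(config_text):
    """Return the credential keys that are absent from a boto config string."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("Credentials"):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS
            if not parser.has_option("Credentials", key)]


if __name__ == "__main__":
    print("missing keys:", missing_credentials(EXAMPLE_BOTO_CONFIG))
```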


After you have the setup done, run the emrio command with no options.


This should take a minute or two to grab the information off S3, do a few simulations, and output the resultant optimized instance pool.

If you want to see instance usage over time (how many instances are running at the same time), run:

emrio -g

After it calculates the same data, you will see graphs of each instance type's usage over time.

[Image: graph of instance usage over time]
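As a rough sketch of what "instances running at the same time" means, the snippet below turns a list of job flow runs into a usage timeline. It is an illustration rather than EMRio's own implementation. The function name and the (start, end, instance_count) tuple format are assumptions.

```python
from datetime import datetime, timedelta


def concurrent_usage(runs):
    """Compute how many instances run concurrently over time.

    runs: list of (start, end, instance_count) tuples.
    Returns a list of (time, running_instances) pairs, one per change point.
    """
    events = []
    for start, end, count in runs:
        events.append((start, count))   # instances come up
        events.append((end, -count))    # instances shut down
    events.sort()

    timeline = []
    running = 0
    for time, delta in events:
        running += delta
        if timeline and timeline[-1][0] == time:
            # Several events at the same instant collapse into one point.
            timeline[-1] = (time, running)
        else:
            timeline.append((time, running))
    return timeline


if __name__ == "__main__":
    t0 = datetime(2012, 1, 1)
    hour = timedelta(hours=1)
    runs = [(t0, t0 + 3 * hour, 2), (t0 + hour, t0 + 2 * hour, 3)]
    for time, running in concurrent_usage(runs):
        print(time, running)
```

Each pair in the result marks a moment where the count changes, which is the kind of step series a usage-over-time graph would plot.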

Now, re-calculating the optimal instances is kind of pointless on the same data, so in order to save and load optimal instance configurations, use this:

emrio --cache=output.json

This saves the results in output.json. The file is JSON-encoded; an example instance file can be found in the tests folder. To load the saved results, run:

emrio --optimized=output.json

If you want to see all the commands, try --help.

emrio --help