[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine
This project has been deprecated. Please use Google Cloud Dataproc to create managed Apache Hadoop and Apache Spark instances on Google Compute Engine.


bdutil is a command-line script used to manage Apache Hadoop and Apache Spark instances on Google Compute Engine. bdutil manages deployment, configuration, and shutdown of your Hadoop instances.


bdutil depends on the Google Cloud SDK and is supported in any POSIX-compliant shell running Bash version 3 or greater.
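A quick way to confirm the running shell meets the Bash v3 requirement (an illustrative check, not part of bdutil itself):

```shell
# Returns success if the current Bash major version is at least 3.
bdutil_bash_ok() {
  (( BASH_VERSINFO[0] >= 3 ))
}

if bdutil_bash_ok; then
  echo "Bash ${BASH_VERSION}: OK for bdutil"
else
  echo "Bash ${BASH_VERSION}: too old, bdutil needs v3 or greater" >&2
fi
```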


See the QUICKSTART file in the docs directory to learn how to set up your Hadoop instances using bdutil.

  1. Install and configure the Google Cloud SDK if you have not already done so
  2. Clone this repository with git clone
  3. Modify the following variables in the bdutil environment configuration file:
  4. PROJECT - Set to the project ID used for all bdutil commands. The project value is resolved in the following order (where 1 overrides 2, and 2 overrides 3):
       1. the -p flag value, or if not specified then
       2. the PROJECT value in the environment configuration file, or if not specified then
       3. the gcloud default project value
  5. CONFIGBUCKET - Set to a Google Cloud Storage bucket to which your project has read/write access.
  6. Run bdutil --help for a list of commands.
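The project-resolution order in step 4 can be sketched in shell (an illustration of the precedence rules, not bdutil's actual code):

```shell
# Resolve the project ID the way step 4 describes: the -p flag wins,
# then the PROJECT variable from the env file, then the gcloud default.
resolve_project() {
  local flag_project="$1"     # value of the -p flag, may be empty
  local env_project="$2"      # PROJECT from the env file, may be empty
  local gcloud_project="$3"   # gcloud default project, may be empty
  if [[ -n "${flag_project}" ]]; then
    echo "${flag_project}"
  elif [[ -n "${env_project}" ]]; then
    echo "${env_project}"
  else
    echo "${gcloud_project}"
  fi
}
```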

The script implements the following commands, which differ mainly in how much configuration they apply:

  • bdutil create creates and starts instances, but will not apply most configuration settings. You can call bdutil run_command_steps on instances afterward to apply configuration settings to them. Typically you would not use this command directly; use bdutil deploy instead.
  • bdutil deploy creates and starts instances with all the configuration options specified in the command line and any included configuration scripts.
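A typical session might look like the following sketch (the ./bdutil invocation form and cluster teardown via delete are assumptions; flags shown are those documented above):

```shell
# Full deployment: create instances and apply all configuration
# in one step (the usual path).
./bdutil -p my-project deploy

# Alternatively, create bare instances first...
./bdutil -p my-project create
# ...and apply configuration steps to them afterward.
./bdutil -p my-project run_command_steps

# Tear the cluster down when finished (see the Shutdown doc).
./bdutil -p my-project delete
```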

Components installed

The latest release of bdutil is 1.3.5. This bdutil release installs the following versions of open source components:

  • Apache Hadoop - 1.2.1 (2.7.1 if you use the -e argument)
  • Apache Spark - 1.5.0
  • Apache Pig - 0.12
  • Apache Hive - 1.2.1
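To get the Hadoop 2.7.1 install mentioned above, pass an alternate environment file with the -e argument at deploy time (the file name hadoop2_env.sh is an assumption; check your bdutil checkout for the exact name shipped with your release):

```shell
# Default deployment installs Hadoop 1.2.1.
./bdutil deploy

# Deploying with an alternate env file via -e installs Hadoop 2.7.1.
# (hadoop2_env.sh is assumed; substitute the env file from your
# bdutil release.)
./bdutil -e hadoop2_env.sh deploy
```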


The following documentation is useful for bdutil.

  • Quickstart - A guide on how to get started with bdutil quickly.
  • Jobs - How to submit jobs (work) to a bdutil cluster.
  • Monitoring - How to monitor a bdutil cluster.
  • Shutdown - How to shut down a bdutil cluster.