Java Dataproc


Google Dataproc Client for Java

Java idiomatic client for Dataproc.


🚌 In October 2022, this library moved to google-cloud-java/java-dataproc. This repository will be archived in the future, and future releases will appear in the new repository (https://github.com/googleapis/google-cloud-java/releases). The Maven artifact coordinates (com.google.cloud:google-cloud-dataproc) remain the same.

Quickstart

If you are using Maven with BOM, add this to your pom.xml file:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.1.3</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-dataproc</artifactId>
  </dependency>
</dependencies>

If you are using Maven without BOM, add this to your dependencies:

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-dataproc</artifactId>
  <version>4.0.8</version>
</dependency>

If you are using Gradle 5.x or later, add this to your dependencies:

implementation platform('com.google.cloud:libraries-bom:26.1.4')

implementation 'com.google.cloud:google-cloud-dataproc'

If you are using Gradle without BOM, add this to your dependencies:

implementation 'com.google.cloud:google-cloud-dataproc:4.2.0'

If you are using SBT, add this to your dependencies:

libraryDependencies += "com.google.cloud" % "google-cloud-dataproc" % "4.2.0"

Authentication

See the Authentication section in the base directory's README.

Authorization

The client application making API calls must be granted authorization scopes required for the desired Dataproc APIs, and the authenticated principal must have the IAM role(s) required to access GCP resources using the Dataproc API calls.

Getting Started

Prerequisites

You will need a Google Cloud Platform Console project with the Dataproc API enabled, and billing must be enabled for the project to use Dataproc. Follow these instructions to get your project set up. You will also need to set up the local development environment by installing the Google Cloud SDK and running the following commands on the command line: gcloud auth login and gcloud config set project [YOUR PROJECT ID].

Installation and setup

You'll need to obtain the google-cloud-dataproc library. See the Quickstart section to add google-cloud-dataproc as a dependency in your code.

About Dataproc

Dataproc is a managed Apache Spark and Apache Hadoop service that offers a faster, easier, more cost-effective way to run them.

See the Dataproc client library docs to learn how to use this Dataproc Client Library.

Samples

Samples are in the samples/ directory.

Sample                                 Source Code    Try it
Create Cluster                         source code    Open in Cloud Shell
Create Cluster With Autoscaling        source code    Open in Cloud Shell
Instantiate Inline Workflow Template   source code    Open in Cloud Shell
Quickstart                             source code    Open in Cloud Shell
Submit Hadoop Fs Job                   source code    Open in Cloud Shell
Submit Job                             source code    Open in Cloud Shell

Troubleshooting

To get help, follow the instructions in the shared Troubleshooting document.

Transport

Dataproc uses gRPC for the transport layer.
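
Because the transport is gRPC, the client connects to a host:port endpoint. The sketch below builds such an endpoint string, assuming the regional convention REGION-dataproc.googleapis.com:443, which this README does not state. The class name DataprocEndpoint and the method forRegion are hypothetical names chosen for illustration.

```java
/** Illustrative helper; not part of the google-cloud-dataproc library. */
final class DataprocEndpoint {

    /**
     * Builds a gRPC endpoint for a region, assuming the
     * "REGION-dataproc.googleapis.com:443" naming convention.
     */
    static String forRegion(String region) {
        if (region == null || region.isEmpty()) {
            throw new IllegalArgumentException("region must be non-empty");
        }
        return String.format("%s-dataproc.googleapis.com:443", region);
    }

    public static void main(String[] args) {
        // prints "us-central1-dataproc.googleapis.com:443"
        System.out.println(forRegion("us-central1"));
    }
}
```

With the library on the classpath, a string like this would typically be passed to ClusterControllerSettings.newBuilder().setEndpoint(...) before creating a ClusterControllerClient.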

Supported Java Versions

Java 8 or above is required for using this client.
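
An application can check this requirement at startup rather than fail later with a class-loading error. The following is a minimal sketch that uses only the standard java.specification.version system property. The class JavaVersionCheck and its methods are hypothetical names and are not provided by this library.

```java
/** Illustrative startup guard for the Java 8 minimum stated above. */
final class JavaVersionCheck {
    static final int MIN_MAJOR = 8;

    /** Parses "1.8" (legacy scheme) or "11", "17" (current scheme) into a major version. */
    static int parseMajor(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Java 8 and earlier report "1.x"; Java 9 and later report the major number directly.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** Returns true when the given specification version meets the minimum. */
    static boolean isSupported(String specVersion) {
        return parseMajor(specVersion) >= MIN_MAJOR;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        if (!isSupported(version)) {
            throw new IllegalStateException(
                "Java " + MIN_MAJOR + " or above is required, found " + version);
        }
        System.out.println("Java " + version + " is supported");
    }
}
```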

Google's Java client libraries, Google Cloud Client Libraries and Google Cloud API Libraries, follow the Oracle Java SE support roadmap (see the Oracle Java SE Product Releases section).

For new development

In general, new feature development occurs with support for the lowest Java LTS version covered by Oracle's Premier Support (which typically lasts 5 years from initial General Availability). If the minimum required JVM for a given library is changed, it is accompanied by a semver major release.

Java 11 and (as of September 2021) Java 17 are the best choices for new development.

Keeping production systems current

Google tests its client libraries with all current LTS versions covered by Oracle's Extended Support (which typically lasts 8 years from initial General Availability).

Legacy support

Google's client libraries support legacy versions of Java runtimes through long-term stable libraries that don't receive feature updates. This support is provided on a best-effort basis, as it may not be possible to backport all patches.

Google provides updates on a best-effort basis to apps that continue to use Java 7, though apps might need to upgrade to a current version of the library that supports their JVM.

Where to find specific information

The latest versions and the supported Java versions are identified on the individual GitHub repository github.com/GoogleAPIs/java-SERVICENAME and on google-cloud-java.

Versioning

This library follows Semantic Versioning.
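
Under Semantic Versioning, a change in the first (MAJOR) component signals potentially incompatible changes, which is also how this README says a raised minimum JVM is announced. The sketch below flags such upgrades when comparing two version strings. SemverCheck and its methods are hypothetical names, and 5.0.0 is a made-up version used only for illustration.

```java
/** Illustrative helper for spotting semver major upgrades; not part of this library. */
final class SemverCheck {

    /** Returns the MAJOR component of a MAJOR.MINOR.PATCH version string. */
    static int major(String version) {
        int dot = version.indexOf('.');
        return Integer.parseInt(dot < 0 ? version : version.substring(0, dot));
    }

    /** Returns true when moving from one version to another crosses a major boundary. */
    static boolean isMajorUpgrade(String from, String to) {
        return major(to) > major(from);
    }

    public static void main(String[] args) {
        // Versions taken from the Quickstart above: same major, prints "false".
        System.out.println(isMajorUpgrade("4.0.8", "4.2.0"));
        // Hypothetical future release: major changes, prints "true".
        System.out.println(isMajorUpgrade("4.2.0", "5.0.0"));
    }
}
```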

Contributing

Contributions to this library are always welcome and highly encouraged.

See CONTRIBUTING for more information on how to get started.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See Code of Conduct for more information.

License

Apache 2.0 - See LICENSE for more information.

CI Status

Java Version      Status
Java 8            Kokoro CI
Java 8 OSX        Kokoro CI
Java 8 Windows    Kokoro CI
Java 11           Kokoro CI

Java is a registered trademark of Oracle and/or its affiliates.
