Apache Hadoop Connectors

Libraries and tools for interoperability between Apache Hadoop related open-source software and Google Cloud Platform.

Google Cloud Storage connector for Apache Hadoop (HCFS)

The Google Cloud Storage connector for Hadoop implements the Hadoop FileSystem interface, which lets MapReduce jobs run directly on data in Google Cloud Storage. For details, see the connector's own README.

Building the Cloud Storage connector

Note that the build requires Java 11 or later and fails with older Java versions.
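
Because the build fails on older JDKs, it can help to check the Java version before starting. The following is a minimal sketch and not part of this repository: the function names java_major and java_meets_minimum are hypothetical, and it assumes the version string has the usual form reported by java -version (for example 1.8.0_292 or 11.0.2).

```shell
# Hypothetical helpers (not part of this repository).

# Print the major version for a Java version string.
# Legacy strings such as 1.8.0_292 map to 8; modern ones such as 11.0.2 map to 11.
java_major() {
  jm_version="$1"
  case "$jm_version" in
    1.*) jm_version="${jm_version#1.}" ;;  # drop the legacy "1." prefix
  esac
  # Keep only the leading run of digits.
  printf '%s\n' "${jm_version%%[!0-9]*}"
}

# Succeed if the version string is at least the given minimum (default 11).
java_meets_minimum() {
  jm_major="$(java_major "$1")"
  [ -n "$jm_major" ] && [ "$jm_major" -ge "${2:-11}" ]
}

# Example use before building (assumes `java` is on PATH and prints a quoted
# version string to stderr):
#   version="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
#   java_meets_minimum "$version" || echo "Java 11+ is required" >&2
```

The parsing is kept separate from the java invocation so that the check can be exercised with fixed version strings.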

To build the connector for a specific Hadoop version, run the following command from the main directory:

./mvnw clean package

To verify test coverage for a specific Hadoop version, run the following command from the main directory:

./mvnw -P coverage clean verify

The Cloud Storage connector JAR can be found in the gcs/target/ directory.
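
To confirm what a build produced, the JAR files in that directory can be listed. This is a minimal sketch under stated assumptions: list_connector_jars is a hypothetical helper, the exact JAR file names are not specified here, and the find in use is assumed to support -maxdepth (GNU and BSD find both do).

```shell
# Hypothetical helper (not part of this repository).
# List JAR files directly under a build output directory (default: gcs/target).
list_connector_jars() {
  lcj_dir="${1:-gcs/target}"
  if [ ! -d "$lcj_dir" ]; then
    echo "directory not found: $lcj_dir (has the build been run?)" >&2
    return 1
  fi
  find "$lcj_dir" -maxdepth 1 -type f -name '*.jar' | sort
}

# Example use from the main directory after a successful build:
#   list_connector_jars
```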

Adding the Cloud Storage connector to your build

The Maven group ID is com.google.cloud.bigdataoss and the artifact ID for the Cloud Storage connector is gcs-connector.

To add a dependency on the Cloud Storage connector using Maven, use the following:

<dependency>
  <groupId>com.google.cloud.bigdataoss</groupId>
  <artifactId>gcs-connector</artifactId>
  <version>hadoop3-2.2.10</version>
</dependency>
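
The same coordinates can also be used to fetch the artifact outside of a project build, for example with the Maven Dependency Plugin's dependency:get goal. This is a minimal sketch: gcs_connector_coordinate is a hypothetical helper, and the commented example assumes mvn is installed and Maven Central is reachable.

```shell
# Hypothetical helper (not part of this repository).
# Print the groupId:artifactId:version coordinate for a connector version.
gcs_connector_coordinate() {
  if [ -z "$1" ]; then
    echo "usage: gcs_connector_coordinate <version>" >&2
    return 1
  fi
  printf '%s\n' "com.google.cloud.bigdataoss:gcs-connector:$1"
}

# Example use (downloads into the local Maven repository):
#   mvn dependency:get -Dartifact="$(gcs_connector_coordinate hadoop3-2.2.10)"
```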

Resources

On Stack Overflow, use the tag google-cloud-dataproc for questions about the connectors in this repository. This tag receives responses from the Stack Overflow community and Google engineers, who monitor the tag and offer unofficial support.
