Compute Hadoop Java Python Alternatives

This software demonstrates one way to create and manage a cluster of Hadoop nodes running on Google Compute Engine.

Stars: 28
License: apache-2.0
Open Issues: 1
Most Recent Commit: almost 11 years ago
Programming Language: Python
Dependent Repos: 0
Dependent Packages: 0
Total Releases: 0

Categories

Programming Languages > Python

Companies > Google

Data Processing > Hadoop

Data Storage > Hdfs

Data Processing > Mapreduce

Data Storage > Google Storage

Alternatives To googlearchive/compute-hadoop-java-python

Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
GoogleCloudDataproc/hadoop-connectors	278	22	58	over 2 years ago	597	November 03, 2023	63	apache-2.0	Java
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
spotify/spydra	132	0	0	over 4 years ago	20	December 08, 2020	12	apache-2.0	Java
Ephemeral Hadoop clusters using Google Compute Platform
GoogleCloudDataproc/bdutil	114	0	0	over 6 years ago	0		32	apache-2.0	Shell
[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine
GoogleCloudPlatform/solutions-google-compute-engine-cluster-for-hadoop	81	0	0	over 8 years ago	0		8	apache-2.0	Python
This sample app gets you up and running quickly with a Hadoop cluster on Google Compute Engine. For more information on running Hadoop on GCE, read the papers at https://cloud.google.com/resources/.
GoogleCloudPlatform/Data-Pipeline	79	0	0	over 12 years ago	0		2	apache-2.0	Python
Data Pipeline is a tool for running data-loading pipelines. It is an open-source App Engine app that users can extend to suit their own needs. Out of the box it loads files from a source, transforms them, and outputs them (for example, by writing to a file or loading them into a data analysis tool). It is designed to be modular and to support various sources, transformation technologies, and output types. Transformations can be chained together to form complex pipelines.
googlearchive/compute-hadoop-java-python	28	0	0	almost 11 years ago	0		1	apache-2.0	Python
This software demonstrates one way to create and manage a cluster of Hadoop nodes running on Google Compute Engine.
GoogleCloudDataproc/hive-bigquery-storage-handler	19	0	0	about 3 years ago	0		8	apache-2.0	Java
Hive Storage Handler for interoperability between BigQuery and Apache Hive
googlearchive/solutions-apache-hive-and-pig-on-google-compute-engine	19	0	0	over 8 years ago	0		0	apache-2.0	Shell
This sample app gets you up and running quickly with Hive and/or Pig on a Hadoop cluster on Google Compute Engine. For more information on running Hadoop on GCE, read the papers at https://cloud.google.com/resources/.
googleapis/nodejs-dataproc	15	1	0	about 3 years ago	39	May 18, 2022	0	apache-2.0
This repository is deprecated. All of its content and history has been moved to googleapis/google-cloud-node.
googleapis/java-dataproc	13	13	8	about 3 years ago	99	September 16, 2021	3	apache-2.0
This library has moved to https://github.com/googleapis/google-cloud-java/tree/main/java-dataproc.
