Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Spark | 35,945 | 2,394 | 882 | 17 hours ago | 46 | May 09, 2021 | 274 | apache-2.0 | Scala | |
Apache Spark - A unified analytics engine for large-scale data processing | ||||||||||
Data Science Ipython Notebooks | 25,025 | a month ago | 33 | other | Python | |||||
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. | ||||||||||
Bigdata Notes | 13,291 | 4 months ago | 33 | Java | ||||||
Getting-started guide to big data :star: | ||||||||||
Deeplearning4j | 12,970 | 38 | 21 | 19 hours ago | 15 | January 27, 2017 | 614 | apache-2.0 | Java | |
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation. | ||||||||||
Cookbook | 11,769 | 2 months ago | 110 | apache-2.0 | ||||||
The Data Engineering Cookbook | ||||||||||
It_book | 8,543 | 2 years ago | 7 | |||||||
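This project collects over a thousand good, commonly used books read or heard about over the years; the book you are looking for may well be here. It covers most books for the internet industry plus interview questions and experience. Series include artificial intelligence (common deep learning frameworks TensorFlow, pytorch, keras; NLP, machine learning, deep learning, etc.), big data (Spark, Hadoop, Scala, kafka, etc.), and programmer essentials (C, C++, java, data structures, linux, design patterns, databases, etc.) | ||||||||||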
Doris | 8,467 | 17 hours ago | 1,716 | apache-2.0 | Java | |||||
Apache Doris is an easy-to-use, high performance and unified analytics database. | ||||||||||
God Of Bigdata | 7,992 | 2 months ago | 2 | |||||||
Focused on big data study and interview preparation: the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive... | ||||||||||
H2o 3 | 6,304 | 18 | 30 | 18 hours ago | 232 | September 19, 2022 | 2,692 | apache-2.0 | Jupyter Notebook | |
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc. | ||||||||||
Alluxio | 6,263 | 31 | 45 | 18 hours ago | 54 | August 05, 2022 | 850 | apache-2.0 | Java | |
Alluxio, data orchestration for analytics and machine learning in the cloud |
THIS REPOSITORY IS DEPRECATED. ALL OF ITS CONTENT AND HISTORY HAS BEEN MOVED TO GOOGLE-CLOUD-NODE
Google Cloud Dataproc API client for Node.js
A comprehensive list of changes in each version may be found in the CHANGELOG.
Read more about the client libraries for Cloud APIs, including the older Google APIs Client Libraries, in Client Libraries Explained.
```bash
npm install @google-cloud/dataproc
```
```javascript
// This quickstart sample walks a user through creating a Dataproc
// cluster, submitting a PySpark job from Google Cloud Storage to the
// cluster, reading the output of the job and deleting the cluster, all
// using the Node.js client library.
'use strict';

function main(projectId, region, clusterName, jobFilePath) {
  const dataproc = require('@google-cloud/dataproc');
  const {Storage} = require('@google-cloud/storage');

  // Create a cluster client with the endpoint set to the desired cluster region
  const clusterClient = new dataproc.v1.ClusterControllerClient({
    apiEndpoint: `${region}-dataproc.googleapis.com`,
    projectId: projectId,
  });

  // Create a job client with the endpoint set to the desired cluster region
  const jobClient = new dataproc.v1.JobControllerClient({
    apiEndpoint: `${region}-dataproc.googleapis.com`,
    projectId: projectId,
  });

  async function quickstart() {
    // Create the cluster config
    const cluster = {
      projectId: projectId,
      region: region,
      cluster: {
        clusterName: clusterName,
        config: {
          masterConfig: {
            numInstances: 1,
            machineTypeUri: 'n1-standard-2',
          },
          workerConfig: {
            numInstances: 2,
            machineTypeUri: 'n1-standard-2',
          },
        },
      },
    };

    // Create the cluster and wait for the long-running operation to finish
    const [operation] = await clusterClient.createCluster(cluster);
    const [response] = await operation.promise();

    // Output a success message
    console.log(`Cluster created successfully: ${response.clusterName}`);

    try {
      const job = {
        projectId: projectId,
        region: region,
        job: {
          placement: {
            clusterName: clusterName,
          },
          pysparkJob: {
            mainPythonFileUri: jobFilePath,
          },
        },
      };

      // Submit the job and wait for it to finish
      const [jobOperation] = await jobClient.submitJobAsOperation(job);
      const [jobResponse] = await jobOperation.promise();

      // Split the driver output location gs://BUCKET/OBJECT into its parts
      const matches =
        jobResponse.driverOutputResourceUri.match(/gs:\/\/(.*?)\/(.*)/);

      const storage = new Storage();
      // download() resolves to an array whose first element is a Buffer
      const [output] = await storage
        .bucket(matches[1])
        .file(`${matches[2]}.000000000`)
        .download();

      // Output a success message.
      console.log(`Job finished successfully: ${output.toString()}`);
    } finally {
      // Delete the cluster once the job has terminated, even if the job
      // failed, so that a failed run does not leave the cluster running.
      const deleteClusterReq = {
        projectId: projectId,
        region: region,
        clusterName: clusterName,
      };

      const [deleteOperation] = await clusterClient.deleteCluster(
        deleteClusterReq
      );
      await deleteOperation.promise();

      // Output a success message
      console.log(`Cluster ${clusterName} successfully deleted.`);
    }
  }

  // Report failures instead of leaving an unhandled promise rejection
  quickstart().catch(err => {
    console.error(err);
    process.exitCode = 1;
  });
}

const args = process.argv.slice(2);
if (args.length !== 4) {
  console.error(
    'Incorrect number of parameters provided. Please make sure a ' +
      'PROJECT_ID, REGION, CLUSTER_NAME and JOB_FILE_PATH are provided, in this order.'
  );
  process.exit(1);
}
main(...args);
```
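The quickstart locates the job's output by splitting `driverOutputResourceUri` with a regular expression and appending `.000000000` to the object path. A minimal sketch of that step as standalone, testable helpers follows. `parseGcsUri` and `firstDriverOutputFile` are hypothetical names, the `.000000000` suffix is carried over from the quickstart rather than taken from a documented guarantee, and the bucket and path in the usage lines are made up.

```javascript
'use strict';

// Split a Cloud Storage URI of the form gs://BUCKET/OBJECT into its parts.
// Same idea as the quickstart's regex, but anchored, and it throws a clear
// error instead of returning null on unexpected input.
function parseGcsUri(uri) {
  const match = /^gs:\/\/([^/]+)\/(.+)$/.exec(uri);
  if (!match) {
    throw new Error(`Not a gs://BUCKET/OBJECT URI: ${uri}`);
  }
  return {bucket: match[1], object: match[2]};
}

// The quickstart reads the first chunk of driver output, which it assumes
// is stored as <driverOutputResourceUri>.000000000.
function firstDriverOutputFile(driverOutputResourceUri) {
  const {bucket, object} = parseGcsUri(driverOutputResourceUri);
  return {bucket, file: `${object}.000000000`};
}

// Hypothetical URI; real values come from the finished Job resource.
const example = firstDriverOutputFile(
  'gs://my-staging-bucket/google-cloud-dataproc-metainfo/abc/jobs/job-1/driveroutput'
);
console.log(example.bucket); // my-staging-bucket
console.log(example.file); // google-cloud-dataproc-metainfo/abc/jobs/job-1/driveroutput.000000000
```

Keeping the parsing separate from the client calls makes it possible to check the URI handling without creating a cluster.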
Samples are in the `samples/` directory. Each sample's `README.md` has instructions for running its sample.
Sample | Source Code |
---|---|
Create Cluster | source code |
Instantiate an inline workflow template | source code |
Quickstart | source code |
Submit Job | source code |
The Google Cloud Dataproc Node.js Client API Reference documentation also contains samples.
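Every Dataproc call in the quickstart takes a plain request object, so the job request can be built and validated before any network call is made. The sketch below does that for a PySpark job. `buildPySparkJobRequest` is a hypothetical helper; the field names `placement.clusterName` and `pysparkJob.mainPythonFileUri` come from the quickstart, `pysparkJob.args` is the PySpark job's argument list in the Dataproc v1 API, and the gs://-only check is an assumption of this sketch, not a rule of the service.

```javascript
'use strict';

// Build a request for jobClient.submitJob / submitJobAsOperation using the
// same shape as the quickstart. Hypothetical helper, not part of the library.
function buildPySparkJobRequest({
  projectId,
  region,
  clusterName,
  mainPythonFileUri,
  args = [],
}) {
  // Fail fast on missing or empty identifiers.
  const required = {projectId, region, clusterName, mainPythonFileUri};
  for (const [name, value] of Object.entries(required)) {
    if (typeof value !== 'string' || value.length === 0) {
      throw new Error(`${name} must be a non-empty string`);
    }
  }
  // Assumption of this sketch: the job file is staged in Cloud Storage,
  // as in the quickstart. Dataproc itself may accept other URI schemes.
  if (!mainPythonFileUri.startsWith('gs://')) {
    throw new Error('mainPythonFileUri should be a gs:// URI in this sketch');
  }
  return {
    projectId,
    region,
    job: {
      placement: {clusterName},
      pysparkJob: {mainPythonFileUri, args},
    },
  };
}

// Hypothetical project, cluster and file names.
const request = buildPySparkJobRequest({
  projectId: 'my-project',
  region: 'us-central1',
  clusterName: 'example-cluster',
  mainPythonFileUri: 'gs://my-bucket/jobs/word_count.py',
});
console.log(JSON.stringify(request, null, 2));
```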
Our client libraries follow the Node.js release schedule. Libraries are compatible with all current active and maintenance versions of Node.js. If you are using an end-of-life version of Node.js, we recommend that you update as soon as possible to an actively supported LTS version.
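A script can warn early when it runs on a Node.js release outside the supported range by comparing `process.version` with a minimum major version. This is a sketch only: `MIN_SUPPORTED_MAJOR` is a placeholder, not this library's actual requirement, which should be read from the release you install (for example, its `package.json` `engines` field).

```javascript
'use strict';

// Placeholder minimum; substitute the range the installed release declares.
const MIN_SUPPORTED_MAJOR = 14;

// Extract the major version from a string such as "v18.17.1" or "18.17.1".
function nodeMajor(version) {
  const match = /^v?(\d+)\./.exec(version);
  if (!match) {
    throw new Error(`Unrecognized Node.js version string: ${version}`);
  }
  return Number(match[1]);
}

function isSupportedNode(version, minMajor = MIN_SUPPORTED_MAJOR) {
  return nodeMajor(version) >= minMajor;
}

// Warn rather than exit: legacy runtimes may still work on a best-efforts basis.
if (!isSupportedNode(process.version)) {
  console.warn(
    `Node.js ${process.version} is older than v${MIN_SUPPORTED_MAJOR}; ` +
      'consider upgrading to an actively supported LTS release.'
  );
}
```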
Google's client libraries support legacy versions of Node.js runtimes on a best-efforts basis with the following warnings:
Client libraries targeting some end-of-life versions of Node.js are available and can be installed through npm dist-tags. The dist-tags follow the naming convention `legacy-(version)`. For example, `npm install @google-cloud/dataproc@legacy-8` installs client libraries for versions compatible with Node.js 8.
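The `legacy-(version)` convention reduces to string formatting. In the sketch below, `legacyInstallSpec` is a hypothetical helper. Only some end-of-life Node.js versions have a matching tag, so check which tags actually exist (for example with `npm dist-tag ls @google-cloud/dataproc`) before relying on the result.

```javascript
'use strict';

// Build an npm install spec for a legacy dist-tag: <package>@legacy-<major>.
function legacyInstallSpec(packageName, nodeMajor) {
  if (!Number.isInteger(nodeMajor) || nodeMajor <= 0) {
    throw new Error(
      `Expected a positive integer Node.js major version, got ${nodeMajor}`
    );
  }
  return `${packageName}@legacy-${nodeMajor}`;
}

console.log(`npm install ${legacyInstallSpec('@google-cloud/dataproc', 8)}`);
// prints: npm install @google-cloud/dataproc@legacy-8
```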
This library follows Semantic Versioning.
This library is considered to be stable. The code surface will not change in backwards-incompatible ways unless absolutely necessary (e.g. because of critical security issues) or with an extensive deprecation period. Issues and requests against stable libraries are addressed with the highest priority.
More Information: Google Cloud Platform Launch Stages
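The Semantic Versioning promise can be made concrete with a small check of whether moving between two releases might break callers. This is a sketch under simplifying assumptions: it handles only plain `MAJOR.MINOR.PATCH` strings, treats any downgrade as potentially breaking, and treats every change within major version zero as potentially breaking, since the spec allows anything to change there. `mayBreakCallers` and the version numbers in the usage lines are illustrative, not actual releases of this library.

```javascript
'use strict';

// Parse "MAJOR.MINOR.PATCH"; pre-release and build metadata are not handled.
function parseSemver(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  return match.slice(1, 4).map(Number);
}

// Numeric comparison: -1 if a < b, 1 if a > b, 0 if equal.
function compareSemver(a, b) {
  const pa = parseSemver(a);
  const pb = parseSemver(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) {
      return pa[i] < pb[i] ? -1 : 1;
    }
  }
  return 0;
}

function mayBreakCallers(fromVersion, toVersion) {
  const cmp = compareSemver(fromVersion, toVersion);
  if (cmp === 0) return false; // same release
  if (cmp > 0) return true; // downgrade: conservative choice of this sketch
  const [fromMajor] = parseSemver(fromVersion);
  const [toMajor] = parseSemver(toVersion);
  if (fromMajor === 0) return true; // 0.y.z: anything may change
  return toMajor !== fromMajor; // 1.0.0 and later: only a major bump may break
}

console.log(mayBreakCallers('4.1.0', '4.2.3')); // false
console.log(mayBreakCallers('4.2.3', '5.0.0')); // true
```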
Contributions welcome! See the Contributing Guide.
Please note that this `README.md`, the `samples/README.md`, and a variety of configuration files in this repository (including `.nycrc` and `tsconfig.json`) are generated from a central template. To edit one of these files, make an edit to its templates in the template directory.
Apache Version 2.0
See LICENSE