Project Name | Description | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language
---|---|---|---|---|---|---|---|---|---|---|---
Cloudquery | The open source high performance data integration platform built for developers. | 5,136 | 7 | | | 15 hours ago | 345 | May 22, 2023 | 330 | mpl-2.0 | Go
Google Cloud Python | Google Cloud Client Library for Python | 4,341 | 123 | | | 18 hours ago | 38 | August 03, 2023 | 143 | apache-2.0 | Python
Apprtc | appr.tc has been shut down. Please use the Dockerfile to run your own test/dev instance. | 4,001 | | | | a month ago | 5 | September 15, 2020 | 124 | bsd-3-clause | JavaScript
Awesome Gcp Certifications | Google Cloud Platform Certification resources. | 3,420 | | | | 3 months ago | | | | mit |
Google Cloud Node | Google Cloud Client Library for Node.js | 2,655 | 321 | 172 | | 2 days ago | 73 | August 10, 2023 | 94 | apache-2.0 | TypeScript
Professional Services | Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product. | 2,572 | | | | 4 days ago | | | 50 | apache-2.0 | Python
Google Cloud Java | Google Cloud Client Library for Java | 1,772 | 139 | 19 | | 15 hours ago | 201 | August 08, 2023 | 72 | apache-2.0 | Java
Google Cloud Ruby | Google Cloud Client Library for Ruby | 1,293 | 71 | | | 4 hours ago | 12 | September 12, 2023 | 250 | apache-2.0 | Ruby
Google Cloud Php | Google Cloud Client Library for PHP | 1,001 | 191 | 58 | | 5 hours ago | 220 | June 23, 2022 | 102 | apache-2.0 | PHP
Google Cloud Dotnet | Google Cloud Client Libraries for .NET | 872 | 44 | 114 | | 18 hours ago | 24 | June 07, 2022 | 10 | apache-2.0 | C#
sparkbq is a sparklyr extension package that integrates Apache Spark with Google BigQuery. It builds on top of spark-bigquery, which provides a Google BigQuery data source to Apache Spark.
You can install the released version of sparkbq from CRAN via

```r
install.packages("sparkbq")
```

or the latest development version via

```r
devtools::install_github("miraisolutions/sparkbq", ref = "develop")
```
The following table provides an overview of the supported versions of Apache Spark, Scala, and Google Dataproc:

sparkbq | spark-bigquery | Apache Spark | Scala | Google Dataproc
---|---|---|---|---
0.1.x | 0.1.0 | 2.2.x and 2.3.x | 2.11 | 1.2.x and 1.3.x
sparkbq is based on the Spark package spark-bigquery, which is available in a separate GitHub repository.
```r
library(sparklyr)
library(sparkbq)
library(dplyr)

config <- spark_config()
sc <- spark_connect(master = "local[*]", config = config)

# Set Google BigQuery default settings
bigquery_defaults(
  billingProjectId = "<your_billing_project_id>",
  gcsBucket = "<your_gcs_bucket>",
  datasetLocation = "US",
  serviceAccountKeyFile = "<your_service_account_key_file>",
  type = "direct"
)

# Read the public shakespeare data table
# https://cloud.google.com/bigquery/public-data/
# https://cloud.google.com/bigquery/sample-tables
hamlet <-
  spark_read_bigquery(
    sc,
    name = "hamlet",
    projectId = "bigquery-public-data",
    datasetId = "samples",
    tableId = "shakespeare") %>%
  filter(corpus == "hamlet") # NOTE: predicate pushdown to BigQuery!

# Retrieve the results into a local tibble
hamlet %>% collect()

# Write the result into the "mysamples" dataset in our BigQuery (billing) project
spark_write_bigquery(
  hamlet,
  datasetId = "mysamples",
  tableId = "hamlet",
  mode = "overwrite")
```
When running outside of Google Cloud, it is necessary to specify a service account JSON key file. Information on how to generate service account credentials can be found at https://cloud.google.com/storage/docs/authentication#service_accounts. The service account key file can be passed either as the parameter `serviceAccountKeyFile` to `bigquery_defaults`, or directly to `spark_read_bigquery` and `spark_write_bigquery`. Alternatively, the environment variable `GOOGLE_APPLICATION_CREDENTIALS` can be set to the path of the key file (see https://cloud.google.com/docs/authentication/getting-started for more information). When running on Google Cloud, e.g. on Google Cloud Dataproc, application default credentials (ADC) may be used, in which case it is not necessary to specify a service account key file.
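The environment-variable approach can be sketched in a shell session as follows; the key file path is a placeholder, to be replaced with the location of your downloaded service account key:

```shell
# Point credential lookup at a service account key file
# (placeholder path; substitute your own key file location)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account_keyfile.json

# Confirm the variable is visible to child processes (e.g. the Spark driver)
echo "$GOOGLE_APPLICATION_CREDENTIALS"
```

Set the variable before starting the R session so that the sparklyr connection and its Spark processes inherit it.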