Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Spark | 36,850 | 2,394 | 903 | 8 hours ago | 46 | May 09, 2021 | 244 | apache-2.0 | Scala | |
Apache Spark - A unified analytics engine for large-scale data processing | ||||||||||
Redash | 23,904 | 3 | 11 hours ago | 2 | May 05, 2020 | 591 | bsd-2-clause | Python | ||
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data. | ||||||||||
Doris | 9,630 | 9 hours ago | 5 | July 20, 2023 | 2,067 | apache-2.0 | Java | |||
Apache Doris is an easy-to-use, high performance and unified analytics database. | ||||||||||
Mage Ai | 5,590 | 8 hours ago | 278 | August 08, 2023 | 138 | apache-2.0 | Python | |||
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data. | ||||||||||
Sqlglot | 3,806 | 45 | 8 hours ago | 401 | August 14, 2023 | 3 | mit | Python | ||
Python SQL Parser and Transpiler | ||||||||||
Ibis | 3,164 | 24 | 24 | 9 hours ago | 48 | August 13, 2023 | 102 | apache-2.0 | Python | |
The flexibility of Python with the scale and performance of modern SQL. | ||||||||||
Linkis | 3,136 | 38 | 2 days ago | 3 | July 29, 2023 | 228 | apache-2.0 | Java | ||
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines. | ||||||||||
Quicksql | 1,939 | a year ago | 84 | mit | Java | |||||
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources | ||||||||||
Sql Generator | 1,923 | a year ago | 1 | May 18, 2022 | 1 | apache-2.0 | Vue | |||
🔨 Generate structured SQL statements from JSON. Built with Vue3 + TypeScript + Vite + Ant Design + MonacoEditor; a simple project (heavy on logic, light on pages) that is well suited for practice. | ||||||||||
Fugue | 1,732 | 18 | 15 hours ago | 120 | August 20, 2023 | 44 | apache-2.0 | Python | ||
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites. |
THIS PROJECT IS IN MAINTENANCE MODE BECAUSE IT IS NOT WIDELY USED WITHIN SPOTIFY. WE'LL PROVIDE BEST-EFFORT SUPPORT FOR ISSUES AND PULL REQUESTS, BUT EXPECT DELAYS IN RESPONSES.
Google BigQuery support for Spark, SQL, and DataFrames.
spark-bigquery version | Spark version | Comment |
---|---|---|
0.2.x | 2.x.y | Active development |
0.1.x | 1.x.y | Development halted |
To use the package in a Google Cloud Dataproc cluster:
Install org.apache.avro_avro-ipc-1.7.7.jar to ~/.ivy2/jars, then start the shell with the package:
spark-shell --packages com.spotify:spark-bigquery_2.10:0.2.2
To use it in a local SBT console:
import com.spotify.spark.bigquery._
// Set up GCP credentials
sqlContext.setGcpJsonKeyFile("<JSON_KEY_FILE>")
// Set up BigQuery project and bucket
sqlContext.setBigQueryProjectId("<BILLING_PROJECT>")
sqlContext.setBigQueryGcsBucket("<GCS_BUCKET>")
// Set up BigQuery dataset location, default is US
sqlContext.setBigQueryDatasetLocation("<DATASET_LOCATION>")
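For the local SBT console, the package has to be on the build's classpath. A minimal sketch of a build.sbt line, reusing the coordinates passed to spark-shell --packages above; placing it in build.sbt is an assumption, and Spark itself would need to be declared separately:

```scala
// build.sbt (sketch): same coordinates as the spark-shell --packages example.
// The artifact name carries the Scala binary version (_2.10) explicitly,
// so this line does not depend on the project's scalaVersion setting.
libraryDependencies += "com.spotify" % "spark-bigquery_2.10" % "0.2.2"
```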
Usage:
// Load everything from a table
val table = sqlContext.bigQueryTable("bigquery-public-data:samples.shakespeare")
// Load results from a SQL query
// Only legacy SQL dialect is supported for now
val df = sqlContext.bigQuerySelect(
"SELECT word, word_count FROM [bigquery-public-data:samples.shakespeare]")
// Save data to a table
df.saveAsBigQueryTable("my-project:my_dataset.my_table")
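The calls above take table references of the form project:dataset.table, and legacy SQL wraps the same reference in square brackets. The sketch below shows one way to build both forms from the same parts. BigQueryTables, TableRef, and parse are hypothetical names for illustration and are not part of spark-bigquery; the parser assumes the project ID contains no colon.

```scala
object BigQueryTables {
  // Hypothetical model of a "project:dataset.table" reference.
  final case class TableRef(project: String, dataset: String, table: String) {
    // Form passed to bigQueryTable / saveAsBigQueryTable in the examples above.
    def spec: String = s"$project:$dataset.$table"
    // Legacy SQL form: the same reference wrapped in square brackets.
    def legacySql: String = s"[$spec]"
  }

  // Assumes no colon in the project ID and no dot in the dataset name.
  private val Pattern = """([^:]+):([^.]+)\.(.+)""".r

  // Returns None when the string does not look like project:dataset.table.
  def parse(s: String): Option[TableRef] = s match {
    case Pattern(project, dataset, table) => Some(TableRef(project, dataset, table))
    case _                                => None
  }
}
```

With this helper, a query string could be assembled as s"SELECT word, word_count FROM ${ref.legacySql}" and the result passed to sqlContext.bigQuerySelect.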
If you'd like to write nested records to BigQuery, be sure to specify an Avro Namespace.
BigQuery is unable to load Avro namespaces with a leading dot (.nestedColumn) on nested records.
// BigQuery is able to load fields with namespace 'myNamespace.nestedColumn'
df.saveAsBigQueryTable("my-project:my_dataset.my_table", tmpWriteOptions = Map("recordNamespace" -> "myNamespace"))
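To see where the leading dot comes from, the sketch below assumes that a nested record's namespace is formed by joining the record namespace and the column name with a dot. That assumption is consistent with the two names mentioned above (.nestedColumn and myNamespace.nestedColumn) but is not confirmed here; AvroNamespaces and its methods are hypothetical names, not library code.

```scala
object AvroNamespaces {
  // Illustration only: joins the parent namespace and the nested column name.
  def nestedRecordNamespace(recordNamespace: String, fieldName: String): String =
    s"$recordNamespace.$fieldName"

  // A namespace starting with a dot is the case BigQuery is said to reject.
  def hasLeadingDot(namespace: String): Boolean = namespace.startsWith(".")
}
```

Under this assumption, an empty record namespace yields a name with a leading dot, while passing "myNamespace" avoids it.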
See also Loading Avro Data from Google Cloud Storage for data type mappings and limitations. For example, loading arrays of arrays is not supported.
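Since arrays of arrays cannot be loaded, it may help to check a schema before saving. The following is a rough sketch over a simplified, hypothetical schema model (FieldType, Scalar, ArrayOf, Record); these are stand-ins for illustration and are not Spark's or spark-bigquery's own types.

```scala
object SchemaCheck {
  // Hypothetical, simplified stand-ins for the types in a DataFrame schema.
  sealed trait FieldType
  case object Scalar extends FieldType
  final case class ArrayOf(element: FieldType) extends FieldType
  final case class Record(fields: Map[String, FieldType]) extends FieldType

  // True if any array has another array as its direct element type.
  def hasArrayOfArrays(t: FieldType): Boolean = t match {
    case ArrayOf(ArrayOf(_)) => true
    case ArrayOf(inner)      => hasArrayOfArrays(inner)
    case Record(fields)      => fields.values.exists(hasArrayOfArrays)
    case Scalar              => false
  }
}
```

An array of records that themselves contain arrays passes this check; only directly nested arrays are flagged.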
Copyright 2016 Spotify AB.
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0