Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Spark Bigquery | 149 | 4 years ago | 6 | November 29, 2017 | 34 | apache-2.0 | Scala | |||
Google BigQuery support for Spark, SQL, and DataFrames | ||||||||||
Bdutil | 114 | 4 years ago | 32 | apache-2.0 | Shell | |||||
[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine | ||||||||||
Laravel Spark Google2fa | 82 | 4 years ago | 15 | December 14, 2019 | 1 | mit | PHP | |||
Google Authenticator support for Laravel Spark | ||||||||||
Token Dispenser | 56 | 4 years ago | 9 | gpl-2.0 | Java | |||||
Stores email-password pairs, gives out Google Play Store tokens | ||||||||||
Spark Google Spreadsheets | 46 | 4 years ago | 10 | August 21, 2019 | 7 | apache-2.0 | Scala | |||
Google Spreadsheets datasource for SparkSQL and DataFrames | ||||||||||
Spark_gce | 45 | 8 years ago | 1 | apache-2.0 | Python | |||||
Spark GCE Script Helps you deploy Spark cluster on Google Cloud. | ||||||||||
Google Api Client Codeigniter Spark | 36 | 11 years ago | 1 | apache-2.0 | PHP | |||||
A carbon copy of the Google distributed PHP API Client, made available to the Sparks repository with some integration tips | ||||||||||
Spark Google Analytics | 33 | 6 years ago | 8 | June 21, 2017 | 5 | apache-2.0 | Scala | |||
A Spark package for retrieving data from Google Analytics | ||||||||||
Spree_analytics_trackers | 19 | 2 | 2 months ago | 7 | February 16, 2021 | 9 | bsd-3-clause | Ruby | ||
Integrate your Spree application with Google Analytics and Segment.com | ||||||||||
Spark Streaming With Google Cloud Example | 19 | 7 years ago | apache-2.0 | Scala | ||||||
an example of integrating Spark Streaming with Google Pub/Sub and Google Datastore |
A library for querying Google Analytics data with Apache Spark, for Spark SQL and DataFrames.
This library requires Spark 1.4+.
You can link against this library in your program at the following coordinates:
Scala 2.10:
groupId: com.crealytics
artifactId: spark-google-analytics_2.10
version: 1.1.2
Scala 2.11:
groupId: com.crealytics
artifactId: spark-google-analytics_2.11
version: 1.1.2
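In an sbt build, the coordinates above could be declared roughly as follows. This is a minimal sketch: the `scalaVersion` value is an assumption chosen for illustration, and it presumes the artifact can be resolved from the repositories your build already uses.

```scala
// build.sbt (sketch; the Scala version below is an assumed example value)
scalaVersion := "2.11.12"

// %% appends the Scala binary version to the artifact name, so with a
// 2.11.x scalaVersion this resolves to spark-google-analytics_2.11.
libraryDependencies += "com.crealytics" %% "spark-google-analytics" % "1.1.2"
```

Using `%%` instead of hard-coding the `_2.10` or `_2.11` suffix keeps the dependency in step with whichever Scala version the project is compiled against.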
This package can be added to Spark using the --packages command line option. For example, to include it when starting the spark shell:
Scala 2.11:
$SPARK_HOME/bin/spark-shell --packages com.crealytics:spark-google-analytics_2.11:1.1.2
Scala 2.10:
$SPARK_HOME/bin/spark-shell --packages com.crealytics:spark-google-analytics_2.10:1.1.2
This package allows querying Google Analytics reports as Spark DataFrames. The API accepts several options (see the Google Analytics developer docs for details):
serviceAccountId: an account ID for accessing the Google Analytics API ([email protected])
keyFileLocation: a key file that you have to generate from the developer console
clientId: an ID that you have to generate from the developer console using the OAuth 2.0 credentials option
clientSecret: the client secret that you obtain from the developer console for the OAuth 2.0 client ID you have already generated
refreshToken: a refresh token obtained when the user whose GA data you want to collect logs in; the token is returned in the response to the appropriate login call. See OAuth2WebServer Offline for more information
ids: the ID of the site for which you want to pull the data
startDate: the start date for the report
endDate: the end date for the report
queryIndividualDays: fetches each day from the chosen date range individually in order to minimize sampling (only works if date is chosen as a dimension)
calculatedMetrics: the suffixes of any calculated metrics (defined in your GA view) you want to query
Spark 1.4+:
import org.apache.spark.sql.SQLContext
Option 1: Authentication with Service Account ID and P12 Key File
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.crealytics.google.analytics")
  .option("serviceAccountId", "[email protected]")
  .option("keyFileLocation", "the_key_file.p12")
  .option("ids", "ga:12345678")
  .option("startDate", "7daysAgo")
  .option("endDate", "yesterday")
  .option("queryIndividualDays", "true")
  .option("calculatedMetrics", "averageEngagement")
  .load()
// You need to select the date column if using queryIndividualDays
df.select("date", "browser", "city", "users", "calcMetric_averageEngagement").show()
OR
Option 2: Authentication with Client ID, Client Secret and Refresh Token
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.crealytics.google.analytics")
  .option("clientId", "XXXXXXXX-xyxyxxxxyxyxxxxxyyyx.apps.googleusercontent.com")
  .option("clientSecret", "73xxYxyxy-XXXYZZx-xZ_Z")
  .option("refreshToken", "1/ezzzxZYzxxyyXYXzyyXXYYyxxxxyyyyxxxy")
  .option("ids", "ga:12345678")
  .option("startDate", "7daysAgo")
  .option("endDate", "yesterday")
  .option("queryIndividualDays", "true")
  .option("calculatedMetrics", "averageEngagement")
  .load()
// You need to select the date column if using queryIndividualDays
df.select("date", "browser", "city", "users", "calcMetric_averageEngagement").show()
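Since the two authentication methods differ only in which options are set, one way to avoid mixing them up is to build the option map from a small typed value. The following is a sketch only: `GaAuth`, `ServiceAccount`, `OAuthClient` and `GaOptions` are hypothetical names introduced here and are not part of this library. Only the option keys come from the list above.

```scala
// Hypothetical helper types: these names are illustrative and are NOT
// provided by spark-google-analytics.
sealed trait GaAuth
final case class ServiceAccount(serviceAccountId: String, keyFileLocation: String) extends GaAuth
final case class OAuthClient(clientId: String, clientSecret: String, refreshToken: String) extends GaAuth

object GaOptions {
  /** Builds an option map using the option names documented in this README. */
  def build(
      auth: GaAuth,
      ids: String,
      startDate: String,
      endDate: String,
      queryIndividualDays: Boolean = false,
      calculatedMetrics: Option[String] = None): Map[String, String] = {
    // Exactly one authentication method contributes its keys.
    val authOptions: Map[String, String] = auth match {
      case ServiceAccount(id, keyFile) =>
        Map("serviceAccountId" -> id, "keyFileLocation" -> keyFile)
      case OAuthClient(id, secret, token) =>
        Map("clientId" -> id, "clientSecret" -> secret, "refreshToken" -> token)
    }
    val reportOptions: Map[String, String] = Map(
      "ids" -> ids,
      "startDate" -> startDate,
      "endDate" -> endDate,
      "queryIndividualDays" -> queryIndividualDays.toString)
    // calculatedMetrics is passed through verbatim, and only when provided.
    authOptions ++ reportOptions ++ calculatedMetrics.toList.map(m => "calculatedMetrics" -> m)
  }
}

// Possible usage with Spark (needs a SQLContext and valid credentials):
// val df = sqlContext.read
//   .format("com.crealytics.google.analytics")
//   .options(GaOptions.build(
//     ServiceAccount("<service account id>", "the_key_file.p12"),
//     ids = "ga:12345678", startDate = "7daysAgo", endDate = "yesterday"))
//   .load()
```

The sealed trait means a caller cannot supply half of one credential set and half of the other; whether the data source itself rejects such a mix is not stated here, so the check is done on the caller's side.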
This library is built with SBT, which is automatically downloaded by the included shell script. To build a JAR file, simply run sbt/sbt package from the project root. The build configuration includes support for both Scala 2.10 and 2.11.