Spark Google Analytics

A Spark package for retrieving data from Google Analytics

Spark Google Analytics Library

A library for querying Google Analytics data with Apache Spark, for Spark SQL and DataFrames.

Requirements

This library requires Spark 1.4+.

Linking

You can link against this library in your program at the following coordinates:

Scala 2.10

groupId: com.crealytics
artifactId: spark-google-analytics_2.10
version: 1.1.2

Scala 2.11

groupId: com.crealytics
artifactId: spark-google-analytics_2.11
version: 1.1.2
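
If your project is built with SBT, the same coordinates fit on a single dependency line. This is a sketch that assumes a standard build.sbt, where %% appends the project's Scala binary version and so selects the _2.10 or _2.11 artifact automatically:

```scala
// build.sbt: %% resolves to spark-google-analytics_2.10 or _2.11
// depending on the project's scalaVersion
libraryDependencies += "com.crealytics" %% "spark-google-analytics" % "1.1.2"
```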

Using with Spark shell

This package can be added to Spark using the --packages command-line option. For example, to include it when starting the Spark shell:

Spark compiled with Scala 2.11

$SPARK_HOME/bin/spark-shell --packages com.crealytics:spark-google-analytics_2.11:1.1.2

Spark compiled with Scala 2.10

$SPARK_HOME/bin/spark-shell --packages com.crealytics:spark-google-analytics_2.10:1.1.2


Features

This package allows querying Google Analytics reports as Spark DataFrames. The API accepts several options (see the Google Analytics developer docs for details):

  • serviceAccountId: the service account ID (an email address) used to access the Google Analytics API
  • keyFileLocation: the path to the P12 key file that you generate in the developer console for that service account
  • clientId: an OAuth 2.0 client ID that you generate in the developer console using the OAuth 2.0 credentials option
  • clientSecret: the client secret that the developer console issues for that OAuth 2.0 client ID
  • refreshToken: a refresh token obtained when the user whose GA data you want to collect logs in and grants offline access; the token is returned in the response to that authorization call. See OAuth2WebServer Offline for more information
  • ids: the ID of the Google Analytics view (profile) for which you want to pull the data, prefixed with ga: (e.g. ga:12345678)
  • startDate: the start date for the report (e.g. 7daysAgo, yesterday, today or YYYY-MM-DD)
  • endDate: the end date for the report, in the same formats as startDate
  • queryIndividualDays: fetches each day of the chosen date range individually in order to minimize sampling (only works if date is selected as a dimension)
  • calculatedMetrics: the names of any calculated metrics defined in your GA view, without the calcMetric_ prefix (e.g. averageEngagement, which is returned as the column calcMetric_averageEngagement)
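
The queryIndividualDays option replaces one report over the whole date range with one report per day. The sketch below only illustrates that expansion; it assumes GA's relative date keywords (today, yesterday, NdaysAgo) alongside YYYY-MM-DD dates, and the DaySplitSketch object and its methods are hypothetical, not this package's implementation:

```scala
import java.time.LocalDate

// Hypothetical helpers illustrating the idea behind queryIndividualDays;
// they are not part of this package's API.
object DaySplitSketch {
  // Matches GA-style relative dates such as "7daysAgo" (whole string only).
  private val DaysAgo = """(\d+)daysAgo""".r

  // Resolve a GA-style date string relative to an explicit `today`.
  def resolve(date: String, today: LocalDate): LocalDate = date match {
    case "today"     => today
    case "yesterday" => today.minusDays(1)
    case DaysAgo(n)  => today.minusDays(n.toLong)
    case iso         => LocalDate.parse(iso) // expects YYYY-MM-DD
  }

  // Expand [startDate, endDate] (inclusive) into one date per day, so each
  // day could be requested as its own, smaller report. Empty if start > end.
  def individualDays(startDate: String, endDate: String, today: LocalDate): List[LocalDate] = {
    val from = resolve(startDate, today)
    val to   = resolve(endDate, today)
    Iterator.iterate(from)(_.plusDays(1)).takeWhile(d => !d.isAfter(to)).toList
  }
}
```

Passing today explicitly, rather than reading the system clock inside the helper, keeps the expansion deterministic and easy to test.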

Scala API

Spark 1.4+:

import org.apache.spark.sql.SQLContext

Option 1: Authentication with Service Account ID and P12 Key File

// sc is the SparkContext provided by spark-shell
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
    .format("com.crealytics.google.analytics")
    .option("serviceAccountId", "<service-account-email>")
    .option("keyFileLocation", "the_key_file.p12")
    .option("ids", "ga:12345678")
    .option("startDate", "7daysAgo")
    .option("endDate", "yesterday")
    .option("queryIndividualDays", "true")
    .option("calculatedMetrics", "averageEngagement")
    .load()

// The date column must be selected when queryIndividualDays is enabled
df.select("date", "browser", "city", "users", "calcMetric_averageEngagement").show()


Option 2: Authentication with Client ID, Client Secret and Refresh Token

// sc is the SparkContext provided by spark-shell
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
    .format("com.crealytics.google.analytics")
    .option("clientId", "<client-id>")
    .option("clientSecret", "73xxYxyxy-XXXYZZx-xZ_Z")
    .option("refreshToken", "1/ezzzxZYzxxyyXYXzyyXXYYyxxxxyyyyxxxy")
    .option("ids", "ga:12345678")
    .option("startDate", "7daysAgo")
    .option("endDate", "yesterday")
    .option("queryIndividualDays", "true")
    .option("calculatedMetrics", "averageEngagement")
    .load()

// The date column must be selected when queryIndividualDays is enabled
df.select("date", "browser", "city", "users", "calcMetric_averageEngagement").show()
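
The two authentication options differ only in which credential options they pass. As a sketch, the option map can be assembled and checked in plain Scala before it reaches Spark; the GaAuth, ServiceAccountAuth, OAuthAuth and GaOptions.build names are hypothetical and not part of this package:

```scala
// Hypothetical helper (not part of this package) that builds the reader
// options for either authentication method shown above.
sealed trait GaAuth {
  def options: Map[String, String]
}

// Option 1: service account ID plus P12 key file.
final case class ServiceAccountAuth(accountId: String, keyFile: String) extends GaAuth {
  def options: Map[String, String] =
    Map("serviceAccountId" -> accountId, "keyFileLocation" -> keyFile)
}

// Option 2: OAuth 2.0 client ID, client secret and refresh token.
final case class OAuthAuth(clientId: String, clientSecret: String, refreshToken: String) extends GaAuth {
  def options: Map[String, String] =
    Map("clientId" -> clientId, "clientSecret" -> clientSecret, "refreshToken" -> refreshToken)
}

object GaOptions {
  // Combine credentials with the report parameters. Calculated metrics are
  // given without the calcMetric_ prefix that appears in the column names.
  def build(
      auth: GaAuth,
      ids: String,
      startDate: String,
      endDate: String,
      queryIndividualDays: Boolean = false,
      calculatedMetrics: Option[String] = None): Map[String, String] = {
    // Fail early on a view ID that lacks the ga: prefix
    require(ids.startsWith("ga:"), s"ids should look like ga:12345678, got $ids")
    val report = Map(
      "ids" -> ids,
      "startDate" -> startDate,
      "endDate" -> endDate,
      "queryIndividualDays" -> queryIndividualDays.toString)
    val calc = calculatedMetrics.fold(Map.empty[String, String])(m => Map("calculatedMetrics" -> m))
    auth.options ++ report ++ calc
  }
}
```

Under these assumptions, the resulting map could replace the chain of .option(...) calls through Spark's DataFrameReader.options, e.g. sqlContext.read.format("com.crealytics.google.analytics").options(opts).load(). Modelling the two credential sets as separate case classes makes it impossible to mix service-account and OAuth options by accident.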

Building From Source

This library is built with SBT, which is automatically downloaded by the included shell script. To build a JAR file simply run sbt/sbt package from the project root. The build configuration includes support for both Scala 2.10 and 2.11.
