Oct 12, 2015

Query your Google Analytics Data with the GAR package

Google Analytics API connection with R
Recently my friend Andrew Geisler released a new version of the GAR package. Like other similar packages, GAR is designed to help you retrieve data from Google Analytics using R, but it comes with some new features.

I have been playing a bit with the package, and the feature I enjoy the most is the ability to query multiple Google Analytics View IDs in a single query. To do that, you simply pass a vector of View IDs to the gaRequest() function, and you get back one data frame with each view/profile clearly identified, along with all the metrics and dimensions you included in the query.
Pretty simple, no?
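In the query example later in this post, each View ID is written as 'ga:' followed by the numeric ID, which is the format the Google Analytics Core Reporting API uses for view identifiers. If you keep your View IDs as plain numbers copied from the Admin panel, a small sketch like this builds the vector; the variable names are just illustrative. The IDs are kept as strings on purpose: converting numbers such as 10000000 to text in R can produce scientific notation ("1e+07").

```r
# View IDs as shown in the Google Analytics admin panel, kept as strings
# (these are the five IDs used in the example query later in the post)
view_ids <- c("83424646", "77989457", "82857332", "65743580", "65743194")

# Prefix each one with "ga:" to get the form used in gaRequest(id = ...)
ga_ids <- paste0("ga:", view_ids)
print(ga_ids)
```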

I think this is a very useful feature that makes GAR stand out from similar packages (as far as I know, there are currently four Google Analytics packages for R: RGoogleAnalytics, RGA, ganalytics and, of course, GAR).

You could also build a loop in R to query multiple View IDs one after another, and this is actually what I did previously with the RGoogleAnalytics package. But having this feature built into the package just makes your life easier!
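For comparison, the loop approach looks roughly like the sketch below. It is only an illustration of the pattern: query_one_view() is a hypothetical placeholder, not a function from RGoogleAnalytics or GAR, standing in for whatever single-view call your package offers. It returns empty (NA) rows so the snippet runs offline.

```r
# Hypothetical stand-in for a single-view query: one row per day, no real data
query_one_view <- function(view_id, start, end) {
  dates <- seq(as.Date(start), as.Date(end), by = "day")
  data.frame(
    profileId = view_id,
    date      = format(dates, "%Y%m%d"),
    sessions  = NA_integer_,   # placeholder; a real call would return metrics
    stringsAsFactors = FALSE
  )
}

view_ids <- c("ga:83424646", "ga:77989457", "ga:82857332")

# Query each view in turn and keep one data frame per view ...
results <- list()
for (id in view_ids) {
  results[[id]] <- query_one_view(id, "2015-10-10", "2015-10-11")
}

# ... then stack them into a single table
combined <- do.call(rbind, results)
rownames(combined) <- NULL
print(combined)
```

With GAR, the loop and the final rbind() step are not needed, because passing the whole vector to gaRequest() already returns one combined data frame.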

The GAR package is available on CRAN (v1.1 was released on 17 Sep 2015), and you can install and load it with the following commands:

install.packages('GAR', type = 'source')  # 'source' must be quoted
library(GAR)

Getting the data from Google Analytics

Getting data from Google Analytics is easy and works much like it does in other packages.

First of all, you need to:

  1. Create a new project in the Google Developers Console, if you have not done so already.
  2. Authenticate using your project credentials.

You can find a detailed explanation of these two steps in the GAR tutorial on GitHub.

So, assuming you got the authentication right and obtained a token, you need to make sure the token is refreshed every time you retrieve data (Google Analytics access tokens expire), and then run your query from R.

To refresh the token, use the tokenRefresh() function. The new access token is stored as an environment variable that the GAR package can access.


To get the data, use the gaRequest() function:

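df <- gaRequest(id = 'ga:XXXXXXXX', dimensions = 'ga:date',  # placeholder View ID
  metrics = 'ga:sessions, ga:users, ga:pageviews',
  start = 'YYYY-MM-DD', end = 'YYYY-MM-DD')                  # placeholder period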

The arguments of this function mirror the structure of a typical Google Analytics API call, so this is where you specify all the parameters of your query (metrics, dimensions, period, etc.). In particular, this is where you specify the Google Analytics View IDs you want to get data from.
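In the examples in this post, metrics, dimensions and dates are passed to gaRequest() as plain strings. If you prefer to keep them as R vectors and Date objects, a sketch like the one below builds those strings. It only uses base R; the variable names are illustrative, and the "last 7 days" window is just one possible choice of period.

```r
# Metrics and dimensions as character vectors, collapsed into the
# comma-separated strings used by the Google Analytics API
metrics    <- paste(c("ga:sessions", "ga:users", "ga:pageviews"), collapse = ",")
dimensions <- paste(c("ga:date", "ga:month"), collapse = ",")

# A fixed reporting period, formatted as YYYY-MM-DD
start <- format(as.Date("2015-10-10"), "%Y-%m-%d")
end   <- format(as.Date("2015-10-11"), "%Y-%m-%d")

# Or a rolling period: the last 7 full days before today
start_rolling <- format(Sys.Date() - 7, "%Y-%m-%d")
end_rolling   <- format(Sys.Date() - 1, "%Y-%m-%d")

cat(metrics, dimensions, start, end, sep = "\n")
```

These strings can then be passed as gaRequest(metrics = metrics, dimensions = dimensions, start = start, end = end, ...), using the same argument names as the example query below.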

The gaRequest() function then authenticates using the access token previously stored as an environment variable.

Let's run an example. In the query below, I ask the Google Analytics API for sessions and pageviews from 10 Oct 2015 to 11 Oct 2015, for five distinct View IDs.

df <- gaRequest(
  id = c('ga:83424646', 'ga:77989457', 'ga:82857332', 'ga:65743580', 'ga:65743194'),
  dimensions = 'ga:date,ga:month',
  metrics = 'ga:sessions, ga:pageviews',
  start = '2015-10-10', end = '2015-10-11'
)

As expected, the resulting dataset has a total of 10 rows (5 View IDs x 2 days).
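The row count follows from the dimensions in the query. Assuming each view returns one row per day, there is one row per combination of View ID and ga:date. Because both dates fall in October 2015, ga:month maps one-to-one onto ga:date and adds no extra rows. A quick offline check with base R's expand.grid():

```r
ids   <- c("ga:83424646", "ga:77989457", "ga:82857332", "ga:65743580", "ga:65743194")
dates <- seq(as.Date("2015-10-10"), as.Date("2015-10-11"), by = "day")

# Both dates fall in the same month, so ga:month does not split any rows
months <- unique(format(dates, "%Y-%m"))

# One expected row per (View ID, date) combination
grid <- expand.grid(id = ids, date = format(dates, "%Y%m%d"),
                    stringsAsFactors = FALSE)
nrow(grid)  # 10 = 5 View IDs x 2 days
```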

GAR package Query Output

As you can see in the screenshot, in addition to the metrics and dimensions you requested, the resulting data frame also contains details about your request, such as:

  • profile ID (or View ID)
  • accountId
  • webPropertyId
  • internalWebPropertyId
  • profileName (or View name)
  • tableId
  • start-date
  • end-date

Now that you have your output data frame, you might want to group different websites or Views according to specific criteria and apply aggregate functions (sum, average). That depends on your internal business reporting needs. The key point is that all the data you requested is in a single table, ready to be analysed with R.
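As an illustration, here is a sketch of grouping views into categories and aggregating their metrics with base R's aggregate(). The data frame is a toy stand-in for the gaRequest() output with dummy values, not real data. The column names (profileName, sessions, pageviews) and the category mapping are assumptions, so check the actual names in your own output with names(df).

```r
# Toy stand-in for the gaRequest() output (dummy values, not real data)
df <- data.frame(
  profileName = c("Blog", "Blog", "Shop", "Shop", "Docs", "Docs"),
  date        = rep(c("20151010", "20151011"), times = 3),
  sessions    = c(10, 20, 30, 40, 50, 60),
  pageviews   = c(15, 25, 35, 45, 55, 65),
  stringsAsFactors = FALSE
)

# Hypothetical business mapping from view name to category
categories  <- c(Blog = "Content", Docs = "Content", Shop = "Commerce")
df$category <- unname(categories[df$profileName])

# Total sessions and pageviews per category
totals <- aggregate(cbind(sessions, pageviews) ~ category, data = df, FUN = sum)
print(totals)

# Average daily sessions per view
avg_sessions <- aggregate(sessions ~ profileName, data = df, FUN = mean)
print(avg_sessions)
```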

Happy analysis!
