May 19, 2015

Query Multiple Google Analytics View IDs with R

Query Multiple View IDs with R

Extracting Google Analytics data from one website is pretty easy, and there are several options to do it quickly. But what if you need to extract data from multiple websites or, to be more precise, from multiple Views? And perhaps you also need to summarize it within a single data frame?

Not long ago I was working on a reporting project, where the client owned over 60 distinct websites. All of them tracked using Google Analytics.


Given the high number of sites managed and the nature of their business, it did not make sense for them to report & analyse data for each single website. It was much more effective to group those websites into categories (let say category 1, category 2, category 3, etc.) and report/analyse data at a category level rather than at a website level.

In other words, they needed to:

  1. Collect data from each website
  2. Categorize websites data according to specific internal business criteria
  3. Report and visualize data at a category level (through an internal reporting system)


Very soon I realized that steps 1 & 2 were critical both in terms of time needed for extracting data and of the risk of copy/paste errors, especially if the extraction process was executed directly from Google Analytics platform.

But luckily that's where R and the RGoogleAnalytics package came in handy, allowing me to automate the extraction process with a simple for loop.

Let's quickly go through the different options I had to tackle points 1. and 2.

a) Download data from Google Analytics platform as Excel format

This would have meant doing the same operation for each one of the 60 sites! Too long. Plus a subsequent manual copy/paste work to group sites data into different categories. Boring and too risky! Moreover, given the segmentation required by the client, I could not find the info directly from Google Analytics standard reports.

b) Google Analytics Query Explorer

Google Analytics Query Explorer is very very handy and I use it a lot. You can connect to Google Analytics API and build complex queries quickly thanks to their easy to use interface. So I could obtain the required segmentation of data quite fast.

However, the current Query Explorer version allows you to query only one View ID at a time. Despite its plural nomenclature (ids), the View ID is a unique value as explained in Core Reporting API documentation, and you will  have to run your request several times in order to query multiple websites.

Hence, even if you use Query Explorer, you will have to query each website/view at a time. Download the data and merge it together your "websites category".

c) Google Spreadsheet Add-on

Thanks to the Google Analytics Spreadsheet Add-on, it's easy to run a query via Google Analytics API and obtain your web data. You can also run more than one query  at a time, which means you can query more than one Vew ID at a time.

I love Google Sheets Add-on, though in this particular case (query and categorize over 60 websites), you would still have some manual copy/paste work to do once you extracted the data into the spreadsheet.

d) Automate the extraction process with R (the solution I cover in this post)

R to extract Google analytics data

There are a few packages in R that let you connect to Google Analytics API. One of them is RGoogleAnalytics. But R is also a powerful programming language which allows you to automate complex operations.

So, I thought that combining the RGoogleAnalytics package with a simple R control structure like a for loop, could do the job quickly and with low margin of error.
for (var in seq) expr

Here below I provide a bit more details of how I run multiple queries in R,  and obviously, the code!


For loop to query multiple Google Analytics View IDs with R


What I did, was running a simple for loop that iterates over each View ID of my category, and retrieves the corresponding data using the query. Each time appending the new data in a data frame that will eventually become the final category data frame.

Let's break it down in a few steps to make it clearer.


Step 1: Authenticate to Google Analytics via RGoogleAnalytics package


I assume you are familiar with the RGoogleAnalytics package. If not, please check out this brilliant post which explains in details how to connect Google Analytics with R.

What you have to do, is first of all create a new project using the Google developers Console. Once created, you will grab your credentials ("client.id" and "client.secret" variables in the code), and use them to create and validate your token.

Of course you need to have the RGoogleAnalytics library loaded to do all of this.

library(RGoogleAnalytics)
client.id <- "yourClientID"
client.secret <- "yourClientSecret"
 
# if no token is found within your worrking directory, a new token will be created. Otherwise the existing one will be loaded
 
if (!file.exists("./oauth_token")) {
oauth_token <- Auth(client.id,client.secret)
oauth_token <- save(token,file="./oauth_token")
} else {
load("./oauth_token")
}
 
ValidateToken(token)


Step 2:  Create the View IDs category


Using the "GetProfiles" command, you can get a list with all the Views(or profiles) you have access to with your token. And the corresponding View IDs too, which are actually the parameters you need to build your query.

From that list you can easily select the ones you need to build your category. Or otherwise you can create your category directly by entering the IDs manually. As an example, below I create 3 categories, each containing a certain number of IDs.

Each category will be a vector of charachter class.

viewID<-GetProfiles(token)
viewID
 
category1<- c("79242136", "89242136", "892421","242136","242138","242140","242141")
category2<- c("54120", "54121", "54125","54126")
category3<- c("60123", "60124", "60125")

Step 3: Initialize an empty data frame


Before executing the loop, I create an empty data frame named "df". I will need this to store the data extracted through the multiple queries.

df<-data.frame()

As you will see in next step, each time a new query is run for a specific View ID, the resulting data will be appended below the last row of the previous data frame using the function rbind.


Step 4: Run the for loop over each category


Now that we have the websites's categories set up and the a data frame ready to store data, we can finally run the loop. What I do here, is using a variable called "v" and iterate it over a specific category, let say "category1". In other words, the Google Analytics query is run for each single View ID included in the category.

The resulting object of each query is a data frame called "ga.data". To collect the result of each query in the same data frame, each time the loop is run, the "df" data frame created previously is joined vertically using a "rbind" function.

for (v in category1){
     start.date <- "2015-04-01"
     end.date <- "2015-04-30"
     view.id <- paste("ga:",v,sep="") #the View ID parameter need to have "ga:" in front of the ID 
 
     query.list <- Init(start.date = start.date, end.date = end.date, dimensions = "ga:date,    ga:deviceCategory, ga:channelGrouping,", metrics = "ga:sessions, ga:users, ga:bounceRate, ga:goalCompletions1", table.id = view.id)
     ga.query <- QueryBuilder(query.list)
     ga.data <- GetReportData(ga.query, token, paginate_query = F)
 
     df<-rbind(df,ga.data)
}

Query Multiple Google Analytics View IDs output


This for loop would query data only for category 1. To query websites belonging to category 2, you would need run the same loop again, this time iterating over category 2. Remember to re-initialize the "df" data frame when you change category, otherwise all nes results will be joined below your previous data frame.

Step 5: Do whatever you want with your data frame!


At this point, you should have all the Google Analytics data available in your R workspace. And most importantly, categorized!

You might need now to perform some cleaning on your data, visualize it or export it into another format. Fortunately R offers you so many functions and packages that you can do basically whatever you want with those data.

If you need for example to export your data frame into a .csv. file, you can do it very quickly using the write.csv command:

write.csv(df,file="category1.csv")

Another data munging operation you might want to do on your Google Analytics data, is converting dates in a more friendly format. Infact, the dates you extract from Google Analytics comes into R as character data type, with the "yyyyMMdd" format. You can do this with the following code:

class(ga.data$date) # dates come as character
newDate<-as.Date(ga.data$date,"%Y%M%d")  #convert into date data type
newFormat<- format(newDate,"%m/%d/%y") #to change format, but it convets it back to character class
newFormat<- as.Date(newFormat,"%d/%m/%y")  #convert it back to date data type

In general I suggest you use the dplyr package for any data manipulation operation you might need to perform on your data frame.

And of course, you could include all the data cleaning/manipulation commands inside the above for loop if you like. By doing that, you would automatize your process even more, and end up with a data frame ready to be reported or visualized for your audience.  

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.