May 18, 2015

Query Multiple Google Analytics View IDs with R

Query Multiple View IDs with R

Extracting Google Analytics data from one website is pretty easy, and there are several options to do it quickly. But what if you need to extract data from multiple websites or, to be more precise, from multiple Views? And perhaps you also need to summarize it within a single data frame?

Not long ago I was working on a reporting project, where the client owned over 60 distinct websites. All of them tracked using Google Analytics.

Given the high number of sites managed and the nature of their business, it did not make sense for them to report & analyse data for each single website. It was much more effective to group those websites into categories (let say category 1, category 2, category 3, etc.) and report/analyse data at a category level rather than at a website level.

In other words, they needed to:
1) Collect data from each website
2) Categorize websites data according to specific internal business criteria
3) Report and visualize data at a category level (through an internal reporting system)

Very soon I realized that steps 1 & 2 were critical both in terms of time needed for extracting data and of the risk of copy/paste errors, especially if the extraction process was executed directly from Google Analytics platform.

But luckily that's where R and the RGoogleAnalytics package came in handy, allowing me to automate the extraction process with a simple for loop.

Let's quickly go through the different options I had to tackle points 1) and 2):

a) Download data from Google Analytics platform as Excel format
This would have meant doing the same operation for each one of the 60 sites! Too long. Plus a subsequent manual copy/paste work to group sites data into different categories. Boring and too risky! Moreover, given the segmentation required by the client, I could not find the info directly from Google Analytics standard reports.

b) Google Analytics Query Explorer
Google Analytics Query Explorer is very very handy and I use it a lot. You can connect to Google Analytics API and build complex queries quickly thanks to their easy to use interface. So I could obtain the required segmentation of data quite fast.

However, the current Query Explorer version allows you to query only one View ID at a time. Despite its plural nomenclature (ids), the View ID is a unique value as explained in Core Reporting API documentation, and you will  have to run your request several times in order to query multiple websites.

Hence, even if you use Query Explorer, you will have to query each website/view at a time. Download the data and merge it together your "websites category".

c) Google Spreadsheet Add-on
Thanks to the Google Analytics Spreadsheet Add-on, it's easy to run a query via Google Analytics API and obtain your web data. You can also run more than one query  at a time, which means you can query more than one Vew ID at a time.

I love Google Sheets Add-on, though in this particular case (query and categorize over 60 websites), you would still have some manual copy/paste work to do once you extracted the data into the spreadsheet.

d) Automate the extraction process with R
There are a few packages in R that let you connect to Google Analytics API. One of them is RGoogleAnalytics. But R is also a powerful programming language which allows you to automate complex operations.

So, I thought that combining the RGoogleAnalytics package with a simple R control structure like a for loop, could do the job quickly and with low margin of error.

Here below I provide a bit more details of how I run multiple queries in R,  and obviously, the code!


For loop to query multiple Google Analytics View IDs with R


What I did, was running a simple for loop that iterates over each View ID of my category, and retrieves the corresponding data using the query. Each time appending the new data in a data frame that will eventually become the final category data frame.

Let's break it down in a few steps to make it clearer.

Step 1: Authenticate to Google Analytics via RGoogleAnalytics package

I assume you are familiar with the RGoogleAnalytics package. If not, please check out this brilliant post which explains in details how to connect Google Analytics with R.

What you have to do, is first of all create a new project using the Google developers Console. Once created, you will grab your credentials ("client.id" and "client.secret" variables in the code), and use them to create and validate your token.

Of course you need to have the RGoogleAnalytics library loaded to do all of this.

library(RGoogleAnalytics)
client.id <- "yourClientID"
client.secret <- "yourClientSecret"
 
# if no token is found within your worrking directory, a new token will be created. Otherwise the existing one will be loaded
 
if (!file.exists("./oauth_token")) {
oauth_token <- Auth(client.id,client.secret)
oauth_token <- save(token,file="./oauth_token")
} else {
load("./oauth_token")
}
 
ValidateToken(token)


Step 2:  Create the View IDs category

Using the "GetProfiles" command, you can get a list with all the Views(or profiles) you have access to with your token. And the corresponding View IDs too, which are actually the parameters you need to build your query.

From that list you can easily select the ones you need to build your category. Or otherwise you can create your category directly by entering the IDs manually. As an example, below I create 3 categories, each containing a certain number of IDs.

Each category will be a vector of charachter class.

viewID<-GetProfiles(token)
viewID
 
category1<- c("79242136", "89242136", "892421","242136","242138","242140","242141")
category2<- c("54120", "54121", "54125","54126")
category3<- c("60123", "60124", "60125")

Step 3: Initialize an empty data frame

Before executing the loop, I create an empty data frame named "df". I will need this to store the data extracted through the multiple queries.

df<-data.frame()

As you will see in next step, each time a new query is run for a specific View ID, the resulting data will be appended below the last row of the previous data frame using the function rbind.


Step 4: Run the for loop over each category

Now that we have the websites's categories set up and the a data frame ready to store data, we can finally run the loop. What I do here, is using a variable called "v" and iterate it over a specific category, let say "category1". In other words, the Google Analytics query is run for each single View ID included in the category.

The resulting object of each query is a data frame called "ga.data". To collect the result of each query in the same data frame, each time the loop is run, the "df" data frame created previously is joined vertically using a "rbind" function.

for (v in category1){
     start.date <- "2015-04-01"
     end.date <- "2015-04-30"
     view.id <- paste("ga:",v,sep="") #the View ID parameter need to have "ga:" in front of the ID 
 
     query.list <- Init(start.date = start.date, end.date = end.date, dimensions = "ga:date,    ga:deviceCategory, ga:channelGrouping,", metrics = "ga:sessions, ga:users, ga:bounceRate, ga:goalCompletions1", table.id = view.id)
     ga.query <- QueryBuilder(query.list)
     ga.data <- GetReportData(ga.query, token, paginate_query = F)
 
     df<-rbind(df,ga.data)
}

Query Multiple Google Analytics View IDs output


This for loop would query data only for category 1. To query websites belonging to category 2, you would need run the same loop again, this time iterating over category 2. Remember to re-initialize the "df" data frame when you change category, otherwise all nes results will be joined below your previous data frame.

Step 5: Do whatever you want with your data!

At this point, you should have all the Google Analytics data available in your R workspace. And most importantly, categorized!

You might need now to perform some cleaning on your data, visualize it or export it into another format. Fortunately R offers you so many functions and packages that you can do basically whatever you want with those data.

If you need for example to export your data frame into a .csv. file, you can do it very quickly using the write.csv command:

write.csv(df,file="category1.csv")

Another data munging operation you might want to do on your Google Analytics data, is converting dates in a more friendly format. Infact, the dates you extract from Google Analytics comes into R as character data type, with the "yyyyMMdd" format. You can do this with the following code:

class(ga.data$date) # dates come as character
newDate<-as.Date(ga.data$date,"%Y%M%d")  #convert into date data type
newFormat<- format(newDate,"%m/%d/%y") #to change format, but it convets it back to character class
newFormat<- as.Date(newFormat,"%d/%m/%y")  #convert it back to date data type

In general I suggest you use the dplyr package for any data manipulation operation you might need to perform on your data frame.

And of course, you could include all the data cleaning/manipulation commands inside the above for loop if you like. By doing that, you would automatize your process even more, and end up with a data frame ready to be reported or visualized for your audience.  

Mar 29, 2015

R Stats + Digital Analytics: 8 Blogs you should Follow


Are you interested in using R for your digital analytics projects? Do you need to perform prediction modelling and visualizations on your digital data and Excel can´t just do the job as you wanted?

Or, you simply have no idea how R could help you in your digital analytics problems and you would like to see some real working examples first?

Well, there are 2 good news for you.

The first one is that you are not alone. There is a quite vibrant community out there, sharing more and more examples on how to get real value from using R in digital analytics. They often post/tweet around the #rstats hashtag.

The second news is that I decided to write a post on this. I am going to list here the main blogs (and people) that might be useful to add to your "R Stats + Digital Analytics" reading list.

Jan 27, 2015

Google Analytics Dashboards with R & Shiny


Google Analytics Dashboards with R & Shiny
One of the key activities of any web or digital analyst is to design and create dashboards. The main objective of a web analytics dashboard is to display the current status of your key web metrics and arrange them on a single view, so that information can be monitored at a glance. Great dashboards should allow you/your boss or client to take action quickly and spot trends in data.

There are plenty of tools for creating dashboard out there. You can decide to create your dashboard directly in Google Analytics, using a spreadsheets (e.g. Excel or Google Sheets) or you might decide to go for an ad hoc dashboarding solution such as Tableau, or Klipfolio (I am a heavy user of the latter).

In this blogpost I aim to move away a bit from traditional dashboarding tools, and I wil show you an example of Google Analytics dashboard I've built using the R programming language and the Shiny package. Finally, I will also summarize the main benefits of using such tools for creating dashboards and perform data analysis in a digital analytics context.