Oct 12, 2015

Query your Google Analytics Data with the GAR package

Google Analytics API connection with R
Recently my friend Andrew Geisler released a new version of the GAR package. Like other similar packages, GAR is designed to help you retrieve data from Google Analytics using R, but it adds some new features.

I have been playing a bit with the package, and the feature I enjoy the most is the ability to query multiple Google Analytics View IDs in the same query. To do that, you simply pass a vector of View IDs to the gaRequest() function, and you get back a data frame in which each view/profile is clearly identified alongside the metrics and dimensions you included in the query. Pretty simple, no?

I think this is a very useful feature which makes the GAR package stand out from other similar packages out there (as far as I know there are currently 4 Google Analytics packages available: RGoogleAnalytics, RGA, ganalytics and GAR of course).

You could also build a loop in R to query multiple View IDs at once, and this is actually what I did previously using the RGoogleAnalytics package. But having this feature included in a package just makes your life easier!
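For comparison, the loop-based approach might look roughly like the sketch below. Note that fetch_view() is a hypothetical stand-in for whatever single-view query function you use; it is not part of GAR or RGoogleAnalytics, and it returns made-up numbers so the example runs without credentials. The View IDs are invented too.

```r
# Hypothetical stand-in for a single-view query. It returns fake data so the
# sketch runs offline; swap in a real API call in practice.
fetch_view <- function(view_id, start, end) {
  dates <- seq(as.Date(start), as.Date(end), by = "day")
  data.frame(
    viewId   = view_id,
    date     = dates,
    sessions = seq_along(dates) * 100,  # placeholder values
    stringsAsFactors = FALSE
  )
}

view_ids <- c("ga:11111111", "ga:22222222", "ga:33333333")  # made-up IDs

# Query each view in turn, then stack the results into one data frame
results   <- lapply(view_ids, fetch_view, start = "2015-10-10", end = "2015-10-11")
all_views <- do.call(rbind, results)

nrow(all_views)  # 3 views x 2 days = 6 rows
```

Passing a vector of IDs to a single function call replaces the lapply() and rbind() steps, which is the convenience described above.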

The GAR package is available on the CRAN repository (v1.1 was released on 17 Sep 2015), and you can install and load it with the following commands:

install.packages('GAR', type='source')
library(GAR)

Getting the data from Google Analytics

Getting data from Google Analytics is easy and works much as it does in other packages.

First of all you need to:

  1. Create a new project in the Google Developers Console, if you have not done so before.
  2. Authenticate using your project credentials. 

You can find a detailed explanation of these two steps in the GAR GitHub tutorial here.

So, assuming you got the authentication right and obtained a token, you now need to make sure your token is refreshed (GA access tokens expire) every time you need to retrieve data, and finally execute your query from R.

To refresh the token you use the tokenRefresh() function. The resulting access token is stored as an environment variable accessible to the GAR package.


To get the data, you will use the gaRequest() function.

df <- gaRequest(
  metrics='ga:sessions, ga:users, ga:pageviews'
  # id, dimensions, start and end also go here; see the full example below
)

The arguments of this function are based on the structure of a typical API call to Google Analytics. So it is here that you specify all the parameters of your query (metrics, dimensions, period, etc.), and in particular the Google Analytics View IDs you would like to get data from.

The gaRequest() function authenticates using the access token previously stored as an environment variable.

Let's run an example. In the query below I am asking the Google Analytics API to retrieve sessions and pageviews between 10 Oct 2015 and 11 Oct 2015, from five distinct View IDs.

df <- gaRequest(
  id=c('ga:83424646','ga:77989457','ga:82857332','ga:65743580','ga:65743194'),
  dimensions='ga:date,ga:month',
  metrics='ga:sessions, ga:pageviews',
  start='2015-10-10',
  end='2015-10-11'
)

As expected, the resulting dataset has a total of 10 rows (5 View IDs x 2 days).

GAR package Query Output

As you can see in the screenshot, in addition to the metrics and dimensions you requested, the resulting data frame also contains details about your request, such as:

  • profile ID (or View ID)
  • accountId
  • webPropertyId
  • internalWebPropertyId
  • profileName (or View name)
  • tableId
  • start-date
  • end-date

Now that you have your output data frame, you might want to categorize different websites or Views according to specific criteria and apply aggregate functions (sum, average). It's up to you and your internal business reporting needs. The key thing is that all the data you requested is included in a single table, ready to be analysed with R.
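As a rough illustration of that categorize-and-aggregate step, here is a minimal base R sketch. The data frame is a mock with invented values and only a few of the columns listed above, and the category lookup table is hypothetical. If your metrics come back as text, convert them with as.numeric() first.

```r
# Mock of the kind of data frame described above; all values are invented
df <- data.frame(
  profileName = rep(c("Shop UK", "Shop ES", "Blog"), each = 2),
  date        = rep(c("20151010", "20151011"), times = 3),
  sessions    = c(120, 150, 80, 95, 40, 35),
  pageviews   = c(480, 560, 300, 310, 90, 70),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table assigning each view to a category
categories <- data.frame(
  profileName = c("Shop UK", "Shop ES", "Blog"),
  category    = c("ecommerce", "ecommerce", "content"),
  stringsAsFactors = FALSE
)

# Attach the category to every row, then sum the metrics per category
df     <- merge(df, categories, by = "profileName")
totals <- aggregate(cbind(sessions, pageviews) ~ category, data = df, FUN = sum)
print(totals)
```

Swapping FUN = sum for FUN = mean gives per-category averages instead.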

Happy analysis!

Aug 17, 2015

Playing with R, Shiny Dashboard and Google Analytics Data

In this post, I want to share some examples of data visualization I have been playing with recently. As on many other occasions, my field of application is digital analytics data, and more precisely data from Google Analytics.

You might remember a previous post where I built a tentative dashboard using R, Shiny and Google Charts. The final result was not too bad, but the layout was somewhat rigid, since I was using the "merge" command to combine the charts into the final dashboard.

So I decided to spend some time improving my previous dashboard and to include a couple of new visualizations, which I hope will be inspiring. Of course, I am still using R and Shiny, and in particular shinydashboard: an ad hoc package for building dashboards with R.

The dashboard I've made makes use of the following visualizations:

  • Value boxes
  • Interactive Time Series (dygraphs)
  • Bubble charts
  • Streamgraphs
  • Treemaps 

You can see the final dashboard at shinyapps.io (though, because of the limits of the basic plan, it might be temporarily unavailable), or better still, you can check the code on GitHub. Here is a screenshot:

Let's go quickly through each visualization to see which Google Analytics dimensions and metrics it shows.

Value Boxes

When you build a dashboard, boxes are probably the main building blocks, since they let you organize the information you want to show on the page. When I build a dashboard, I normally start by sketching the layout, and this means placing the main boxes.

A particular type of box available in the shinydashboard package is the valueBox, which lets you display numeric or text values and also add an icon. Value boxes are great components to place at the top of a dashboard to display the main KPIs or percentage changes, or to add a description to the rest of the dashboard.

In my dashboard I placed 3 boxes at the top, showing the values of my 3 main KPIs: sessions to the website, transactions (conversions) and conversion rate. The code to build a value box with shinydashboard is very simple. If you want dynamic values, as in my case, you have to define the box in both the server.R and ui.R files of your Shiny app:
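A minimal sketch of that two-sided setup is shown here, written as a single script for brevity rather than as separate ui.R and server.R files. The output ID sessionsBox, the title and the hard-coded session count are placeholders of mine and are not taken from the original dashboard code.

```r
library(shiny)
library(shinydashboard)

# ui.R side: reserve a slot where the value box will be rendered
ui <- dashboardPage(
  dashboardHeader(title = "KPI sketch"),
  dashboardSidebar(),
  dashboardBody(
    fluidRow(valueBoxOutput("sessionsBox"))
  )
)

# server.R side: compute the value and build the box
server <- function(input, output) {
  output$sessionsBox <- renderValueBox({
    total_sessions <- 12345  # placeholder; would come from the GA data
    valueBox(
      value    = format(total_sessions, big.mark = ","),
      subtitle = "Sessions",
      icon     = icon("users"),
      color    = "blue"
    )
  })
}

# Launch the app only when run interactively
if (interactive()) shinyApp(ui, server)
```

The same pattern (one valueBoxOutput() in the UI, one renderValueBox() in the server) would be repeated for the transactions and conversion rate boxes.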

Value Boxes with Shiny Dashboard

Interactive Time Series (dygraphs)

Time series charts can get chaotic and fail to provide clear insights when filled with too much data and too many series (you might end up with the so-called "spaghetti effect").

But if time series are interactive, users can easily explore and make sense of complex datasets.

For example, users can highlight specific data points, include or exclude time series, zoom in on specific time intervals, enrich the graph with shaded regions or annotations, etc. All of these features are offered by the dygraphs JavaScript charting library.

I used the R dygraphs package (which provides an interface to the dygraphs JavaScript charting library) to make an interactive time series with my Google Analytics dataset. The simple chart I made shows 3 metrics: sessions, transactions and conversion rate (of those transactions) over the period selected by the user. Sessions and transactions use the left axis, while conversion rate uses the right one. I included a dyRangeSelector at the bottom of the chart that lets you narrow down the time interval.
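A chart along those lines could be sketched as follows. The daily figures are invented, and the variable names and axis label are my own; only the general structure (three series, conversion rate on the secondary axis, a range selector) follows the description above.

```r
library(dygraphs)
library(xts)

# Invented daily figures standing in for the Google Analytics export
dates        <- seq(as.Date("2015-08-01"), by = "day", length.out = 7)
sessions     <- c(900, 950, 1010, 980, 1200, 700, 650)
transactions <- c(18, 21, 25, 19, 30, 10, 9)

# dygraph() expects time-series data, so build an xts object indexed by date
ga_xts <- xts(
  cbind(sessions, transactions,
        conversion_rate = round(100 * transactions / sessions, 2)),
  order.by = dates
)

chart <- dygraph(ga_xts, main = "Sessions, transactions and conversion rate")
chart <- dySeries(chart, "conversion_rate", axis = "y2")  # right-hand axis
chart <- dyAxis(chart, "y2", label = "Conversion rate (%)")
chart <- dyRangeSelector(chart)                           # selector below the chart

if (interactive()) print(chart)
```

In a Shiny app the chart would be wrapped in renderDygraph() on the server side and displayed with dygraphOutput() in the UI.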

Dygraphs with R Shiny

Bubble charts

With bubble charts you can show three dimensions of data. I used a bubble chart to visualize the performance of traffic channels: the x axis represents the number of sessions, the y axis the average pages per session, and the size of each bubble is proportional to transactions (the ultimate objective of many websites). The larger the bubble, the higher the number of transactions produced by that traffic channel.

To make this chart I used the googleVis package.
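Here is a minimal sketch of such a chart with googleVis. The channel figures are invented, and the channelType column used for colouring is an addition of mine, not something described in the post.

```r
library(googleVis)

# Invented channel-level figures
channels <- data.frame(
  channel      = c("Organic", "Direct", "Paid", "Referral"),
  sessions     = c(5200, 3100, 1800, 900),
  pagesPerSess = c(3.1, 2.4, 4.0, 2.8),
  channelType  = c("Unpaid", "Unpaid", "Paid", "Unpaid"),  # used for colour only
  transactions = c(95, 40, 60, 12),
  stringsAsFactors = FALSE
)

bubble <- gvisBubbleChart(
  channels,
  idvar    = "channel",       # bubble label
  xvar     = "sessions",      # x axis
  yvar     = "pagesPerSess",  # y axis
  colorvar = "channelType",   # bubble colour
  sizevar  = "transactions",  # bubble size
  options  = list(
    hAxis = "{title: 'Sessions'}",
    vAxis = "{title: 'Avg. pages per session'}"
  )
)

# Opens the chart in a browser when run interactively
if (interactive()) plot(bubble)
```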

Bubble Charts to visualize Traffic Channel Performance

In the dashboard I've also included a one-dimensional bubble chart using the bubbles library. This type of chart works similarly to a bar chart, though the latter makes it easier to judge the actual values you are showing.

On the other hand, this bubble chart can look more attractive than a bar chart, and it lets you display lots of values in a small area. I used this chart to show screen resolution data from the Google Analytics mobile reports.

Bubbles showing Sessions by Screen Resolution


Streamgraphs

Streamgraphs are a type of stacked area chart displaced around a central horizontal axis. Streamgraphs are very effective for visualizing data series that vary over time, especially if you need to show many categories.

The result is a flowing, organic shape, with strong aesthetic appeal, which is why streamgraphs are becoming more and more popular.

In the dashboard I made a streamgraph to visualize the evolution of sessions across devices (desktop, mobile, tablet) over the past years. To do it in R, I had to play a bit with the streamgraph package.

Below is the final data viz (I am not completely happy with this visualization: for some reason, when I mouse over a series, the value shown is always the total for the period, not the value for the specific date I am pointing at. Any help?).

Streamgraph to show Devices Share of Traffic

Another interesting application for web analytics data would be using streamgraphs to analyse each channel's share of traffic over time (direct vs organic vs paid vs referral, etc.).


Treemaps

Treemap visualizations are very effective at showing hierarchical (tree-structured) data in a compact way. They can display a lot of information within a limited space and at the same time allow users to drill down into the represented segments.

An example of hierarchical data in Google Analytics reports is devices as the principal segment (main rectangles) and browser as the sub-segment (nested rectangles). The area of each rectangle is proportional to the number of sessions produced by its corresponding segment/sub-segment.

To make it in R, I used the treemap library (unfortunately the visualization is not interactive, but you can give the d3treeR library a try).
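A minimal sketch of such a device and browser treemap with the treemap package is shown here. The session counts are invented, and the column names and title are my own.

```r
library(treemap)

# Invented sessions by device and browser
device_browser <- data.frame(
  device   = c("desktop", "desktop", "desktop", "mobile", "mobile", "tablet"),
  browser  = c("Chrome", "Firefox", "Safari", "Chrome", "Safari", "Safari"),
  sessions = c(4200, 1500, 900, 2100, 1800, 600),
  stringsAsFactors = FALSE
)

# Draws the treemap on the current graphics device
tm <- treemap(
  device_browser,
  index = c("device", "browser"),  # hierarchy: device first, then browser
  vSize = "sessions",              # rectangle area
  title = "Sessions by device and browser"
)
```

The order of the columns in index defines the nesting, so swapping them would group by browser first.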

Treemap to show Devices and OS Share of Sessions.

I hope you can get inspiration from these visualizations and include some of them in your digital analytics dashboards or reports. My plan is to keep adding more interesting visualizations (ones not currently offered in Google Analytics reports) to this dashboard, to better show digital data. If you have suggestions, please leave a comment here or share them via the GitHub repo.

May 18, 2015

Query Multiple Google Analytics View IDs with R

Query Multiple View IDs with R

Extracting Google Analytics data from one website is pretty easy, and there are several options to do it quickly. But what if you need to extract data from multiple websites or, to be more precise, from multiple Views? And perhaps you also need to summarize it within a single data frame?

Not long ago I was working on a reporting project where the client owned over 60 distinct websites, all of them tracked using Google Analytics.

Mar 29, 2015

R Statistics for Digital Analytics: 8 Blogs you should Follow

Are you interested in using R for your digital analytics projects? Do you need to perform predictive modelling and visualizations on your digital data, and Excel just can't do the job the way you want?

Or do you simply have no idea how R could help with your digital analytics problems, and you would like to see some real working examples first?

Well, there are two pieces of good news for you.

The first one is that you are not alone. There is quite a vibrant community out there, sharing more and more examples of how to get real value from using R in digital analytics. They often post and tweet using the #rstats hashtag.

The second is that I decided to write a post on this. I am going to list here the main blogs (and people) that might be useful to add to your "R Stats + Digital Analytics" reading list.

Jan 27, 2015

Google Analytics Dashboards with R & Shiny

One of the key activities of any web or digital analyst is to design and create dashboards. The main objective of a web analytics dashboard is to display the current status of your key web metrics and arrange them on a single view, so that information can be monitored at a glance. Great dashboards should allow you/your boss or client to take action quickly and spot trends in data.

There are plenty of tools for creating dashboards out there. You can create your dashboard directly in Google Analytics, use a spreadsheet (e.g. Excel or Google Sheets), or go for an ad hoc dashboarding solution such as Tableau or Klipfolio (I am a heavy user of the latter).

In this blog post I aim to move away a bit from traditional dashboarding tools, and I will show you an example of a Google Analytics dashboard I've built using the R programming language and the Shiny package. Finally, I will also summarize the main benefits of using such tools for creating dashboards and performing data analysis in a digital analytics context.

[UPDATE: I've recently built a more sophisticated and better looking dashboard using the shinydashboard package. Click here to see it.]

Nov 23, 2014

Drawbacks of Using Time Metrics to Measure Blogs

When it comes to blogging, we all know that CONTENT is king. We also understand that SOCIAL interactions and reader ENGAGEMENT play a primary role in making a blog successful.

So far, so good.

But then it's time to analyse data and make decisions...and that's where we often fail.

We usually take a web analytics tool like Google Analytics, install the basic tracking code on our pages, and analyse the blog like any other website. We look at the most common metrics and take them as standard references to evaluate future performance. But we forget about the unique features that differentiate blogs from other digital properties: content consumption and social interactions.

This post will help you understand two of the most misused metrics for measuring blog performance: time on page and time on site. Most bloggers don't understand what time metrics actually measure. So, first of all, I will try to explain how they are calculated in a typical web analytics tool (it might be different from what you think!).

I will then discuss some of the drawbacks of using time metrics to measure blog performance and finally suggest a couple of more solid KPIs to better measure content engagement.

After reading this post, I am sure you will start looking at time metrics with a bit more critical thinking than before. And perhaps shift your blog analytics focus to other more powerful metrics.

Let's go!

Oct 27, 2014

How I Measure Success for my Blog. A Framework using Google Analytics.

If you are serious about blogging, then you must have a measurement plan. It doesn't matter whether you have just started and have only a dozen visitors, or you already have a very popular blog whose primary purpose is making revenue from advertising. As long as you have some objectives for your blog, you must decide what you need to measure.

Why? Because this is the only way to understand your blog performance and whether you are successful or not for your readers (I assume you are not writing only for yourself!).

Developing a measurement plan is the only way to understand whether you are successful or not for your readers.

In this post I am going to draft a measurement plan for MY BLOG and use it as a learning exercise to discuss critical aspects like choosing KPIs (Key Performance Indicators) and segments to analyse performance. Google Analytics will be my reference platform for implementing the measurement plan.

Sep 6, 2014

All Data Journalism Graduates in a Map

This week I got my certificate of completion from the course "Doing Journalism with Data: First Steps, Skills and Tools" (if you would like to know more about data journalism, check out my post "3 Great Examples of Data Journalism Stories"). I enjoyed the course a lot, and I am proud to be one of the 1250 people who successfully completed it. I was a bit surprised we were only 1250 graduates!

Aug 12, 2014

How to Test Universal Analytics Before Upgrading: via Google Tag Manager

Test Universal Analytics with Google Tag Manager

Since Universal Analytics came out of beta last April, more and more users have started the upgrade process from classic Google Analytics. Although Google strongly encourages users to upgrade, and reassures them that the migration will not cause any loss of data (perhaps just a few seconds of traffic), some of us still remain a bit worried about the change. This is especially true in the case of big websites with a large number of tags already implemented through classic Google Analytics.

Will there be any significant difference in data after the complete migration? Will Universal Analytics inflate or reduce some metrics compared to the classic tracking code? These questions should motivate you to perform some testing before moving completely to a new standard.

In this post I am going to suggest a step-by-step process for conducting your upgrade to Universal Analytics with the help of Google Tag Manager. Yes, this post is also about Google Tag Manager. It's actually about taking the opportunity of the transition to a new standard (Universal Analytics) and making it in the most efficient and safest way (Google Tag Manager).

The main idea of this step-by-step process is to keep the upgrade "under control" and make sure you get the same quality of data as before.

Jul 15, 2014

3 Great Examples of Data Journalism Stories

Over the last month I've been spending part of my free time learning about an emerging discipline in the area of data analytics: Data Journalism. I am doing it firstly because I find the combination "data analysis + journalism" very fascinating, but also because, as a Web Analyst, I believe there are some very important skills I can absorb from Data Journalists (here is a post where I talk about Web Analyst skills).

The aim of this post is to introduce you to this emerging discipline and show you some practical examples of data journalism. To do so, I've selected 3 published data journalism stories and analysed each of them by answering four key questions:
  1. What does the story do?
  2. How was it created (methodology)? 
  3. How was it illustrated?
  4. What technologies were used to create and present the story to readers?