Jan 27, 2015

Google Analytics Dashboards with R & Shiny

One of the key activities of any web or digital analyst is to design and create dashboards. The main objective of a web analytics dashboard is to display the current status of your key web metrics and arrange them on a single view, so that information can be monitored at a glance. Great dashboards should allow you/your boss or client to take action quickly and spot trends in data.

There are plenty of tools for creating dashboards out there. You can create your dashboard directly in Google Analytics, use a spreadsheet (e.g. Excel or Google Sheets), or go for a dedicated dashboarding solution such as Tableau or Klipfolio (I am a heavy user of the latter).

In this blog post I aim to move away a bit from traditional dashboarding tools, and I will show you an example of a Google Analytics dashboard I've built using the R programming language and the Shiny package. Finally, I will summarize the main benefits of using these tools to create dashboards and perform data analysis in a digital analytics context.

R and Shiny introduction 

R is a very powerful platform for data analysis. It is good at many things, including statistical modelling and data visualization, and it relies on a very large and enthusiastic community of users and developers who keep the product growing and improving. For all these reasons, R is widely used today by scientists, researchers, and statisticians. Many companies routinely use R for data analysis too: Google, Facebook, The New York Times, Twitter, and Coursera, to name a few. As Dave Smith wrote in a recent paper, "R is still hot and getting hotter".

Shiny, on the other hand, is an R package developed by the team at RStudio that allows you to build interactive web applications using R code. So let's say you have performed some data analysis with R: you can now wrap it into an app and share it with other people, who do not need to be R users.

With the development of Shiny, R is gaining more interactivity and is becoming quite an attractive option for analysts who want to build interactive dashboards and share data with their boss, clients or co-workers. Pretty cool, isn't it?

Let's show you an example...

A simple dashboard scenario: segment traffic by device

The Shiny application I created simulates a simple dashboard scenario where users can segment data by traffic device (desktop vs mobile vs tablet) through a radio button.

The dashboard is composed of 4 visualizations:

  1. A line chart showing the daily trend of sessions and sign-ups. Sessions are measured on the primary axis and sign-ups on the secondary axis.
  2. A bubble chart plotting three metrics for each traffic channel: number of sessions, avg. pages per session and revenue. This visualization is quite useful for analysing channel performance with respect to the website objective (e.g. revenue), and it is currently not available in the Google Analytics acquisition reports.
  3. A line chart showing the daily evolution of the bounce rate.
  4. A world map visualizing the number of new users: the darker the country, the more new users visited the site from that country.

Click this link to view the dashboard and interact with it yourself. The app is currently hosted on Shinyapps.io, a dedicated service where you can deploy and share your Shiny applications online.

As you can see, it's a very simple scenario, both in terms of the user interface and of the calculations running in the background. Nothing complex, no statistical modelling involved, though that would definitely be a very powerful feature to include in an R-coded dashboard.

What I did was play around a little with Google Charts visualizations through the googleVis package. googleVis is an R package that provides an interface to the Google Charts API and makes creating interactive plots quite easy. Interactive means that users can manipulate data and look for the info they need.

Except for the bubble chart, all the charts I used to create the dashboard are available in Google Analytics reports. But if you like, you can do much, much more. Among the charts available in the googleVis package are scatter charts, histograms, stepped area charts, org charts, tree maps, gauge charts and boxplots. Here is the complete list of visualizations you can do with Google Charts.

Like all Shiny applications, this dashboard app is made of two code files (ui.R and server.R) which must be placed in the same directory:

  • ui.R = defines how the web application looks to users. All the calls you make in this file generate HTML code.
  • server.R = normal R code where you perform your data analysis.
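
As a rough illustration of how the two parts fit together, here is a minimal single-file sketch (it is not the code of my dashboard). The `traffic` data frame, its column names and the ids `device` and `trafficTable` are made up for the example; in a two-file app the `ui` object would live in ui.R and the `server` function in server.R.

```r
library(shiny)

# Hypothetical sample data: in a real app this would come from Google Analytics
traffic <- data.frame(
  device   = c("desktop", "desktop", "mobile", "mobile", "tablet", "tablet"),
  date     = c("2015-01-01", "2015-01-02", "2015-01-01", "2015-01-02",
               "2015-01-01", "2015-01-02"),
  sessions = c(120, 135, 80, 95, 20, 25),
  stringsAsFactors = FALSE
)

# User interface: a radio button to pick the device and a table for the result
ui <- fluidPage(
  titlePanel("Sessions by device"),
  sidebarLayout(
    sidebarPanel(
      radioButtons("device", "Device:",
                   choices = c("desktop", "mobile", "tablet"))
    ),
    mainPanel(tableOutput("trafficTable"))
  )
)

# Server logic: the reactive expression is re-run whenever input$device changes
server <- function(input, output, session) {
  dataDevice <- reactive({
    traffic[traffic$device == input$device, ]
  })
  output$trafficTable <- renderTable(dataDevice())
}

# Uncomment to launch the app locally
# shinyApp(ui = ui, server = server)
```

The radio button only changes `input$device`; Shiny takes care of re-running `dataDevice()` and refreshing the table that depends on it.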

To build the actual 4-chart dashboard, I first created each chart object separately, and then merged them in pairs as follows:

library(googleVis)

# One chart object per visualization
D1 <- gvisLineChart(dataDevice, "date", c("sessions","signup"))
D2 <- gvisBubbleChart(channelsDevice, idvar="channel", xvar="sessions", yvar="pages.sessions")
D3 <- gvisLineChart(dataDevice, xvar="date", yvar="bounce.rate")
D4 <- gvisGeoChart(countriesDevice, "country", "new.users")

# Merge the charts in pairs (side by side), then stack the two rows
D12 <- gvisMerge(D1, D2, horizontal=TRUE)
D34 <- gvisMerge(D3, D4, horizontal=TRUE)
D12D34 <- gvisMerge(D12, D34, horizontal=FALSE)

All of the code for this dashboard application lives in this GitHub repo. The raw data was downloaded manually from Google Analytics in .csv format, though this operation can be automated by connecting directly to the Google Analytics API (see the RGoogleAnalytics package).
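
To give an idea of the data preparation step, here is a small sketch of loading a Google Analytics export and keeping only the rows for one device. The column names and values are invented for the example, and the CSV is read from an inline string so that the snippet runs on its own; with a real export you would pass the file path to read.csv() instead.

```r
# Hypothetical export: column names and values are assumptions
csv_text <- "date,device,sessions,signup
2015-01-01,desktop,120,4
2015-01-01,mobile,80,2
2015-01-02,desktop,135,5
2015-01-02,mobile,95,3"

ga <- read.csv(text = csv_text, stringsAsFactors = FALSE)
ga$date <- as.Date(ga$date)

# Keep only the rows for the selected device
selectedDevice <- "mobile"
dataDevice <- ga[ga$device == selectedDevice, ]

# Total sessions and sign-ups for that device
totals <- colSums(dataDevice[, c("sessions", "signup")])
print(totals)
```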

Benefits of using R & Shiny to create a Google Analytics Dashboard

So, what are the main benefits of using R & Shiny to create a Google Analytics dashboard? And to answer a broader question: why should you use R for web analytics?

With the development of a package such as Shiny, R definitely becomes a more attractive option for analysts to build dashboards. Below I have put together a list of the 12 main benefits you gain by using R to create a Google Analytics dashboard:

  1. Advanced statistics capabilities & prediction models. R was born as a statistical language and remains the reference language for statisticians. It has lots of packages for specialized tasks and it stays up to date thanks to its open source nature. Using R for web analytics allows you to incorporate sophisticated prediction models in your dashboard and, more importantly, lets your boss/client explore and interact with the models you have built (e-commerce is a very interesting field in which to apply prediction models).
  2. State-of-the-art visualizations. R has very advanced graphics capabilities which let you create beautiful and interactive dashboards. R offers several powerful packages like googleVis (the one I used in the above dashboard), ggplot2, ggvis or dygraphs.
  3. Connect directly to the Google Analytics API. In my dashboard example I manually downloaded the data in .csv format (mainly for privacy reasons), but you can automate the retrieval of data through dedicated R packages. Check out this recent post that explains how to connect Google Analytics to R using the RGoogleAnalytics package.
  4. No web development knowledge is required, although if you know some HTML/CSS/JavaScript you can fully customize the user interface and make it suitable for you and your final users.
  5. Attractive default UI theme, based on Twitter Bootstrap.
  6. Shiny can integrate JavaScript libraries like d3.js for visualizations.
  7. Shiny uses a reactive programming model like modern web applications do, which means that when the user changes a value in a UI control (e.g. the radio button), the R code in the background is recalculated and the output that is bound to the UI (e.g. the 4 charts in the dashboard) is re-rendered.
  8. Reproducibility. This is a very important concept at the basis of R (and other programming languages too), and it means being able to capture each step of your data analysis so that you or other people can reproduce it. In a business scenario, reproducibility means being able to reuse complex functions and dashboards for more than one client.
  9. Scalability. R is much more powerful and solid than tools like Excel when it comes to processing large amounts of data.
  10. Integrate different data sources. R can read almost any type of data (.txt, .csv, etc.). There are R packages specifically designed to read Excel, JSON, XML, etc., and you can even scrape data from websites and execute SQL queries. This means you could potentially integrate different sources of data in the same dashboard. Once you have imported and cleaned the data, you can build a data frame on which you can use all R functions. Very powerful.
  11. R is an open source project, which means it is continually improved, upgraded, enhanced, and expanded by a global community of incredibly passionate developers and users. Currently R has over 5,000 add-on packages.
  12. It's free!
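
Point 10 can be sketched in a few lines: two sources with a shared key are joined with merge(). Both data frames below are hypothetical stand-ins (one for a Google Analytics export, one for revenue figures from another system), and the column names are assumptions.

```r
# Hypothetical Google Analytics export (sessions per channel)
ga <- data.frame(
  channel  = c("organic", "paid", "email"),
  sessions = c(1000, 400, 200),
  stringsAsFactors = FALSE
)

# Hypothetical revenue data from another system, e.g. a CRM export
crm <- data.frame(
  channel = c("organic", "paid", "email"),
  revenue = c(5000, 3000, 1500),
  stringsAsFactors = FALSE
)

# Join the two sources on the shared key
combined <- merge(ga, crm, by = "channel")

# A metric that needs both sources
combined$revenue_per_session <- combined$revenue / combined$sessions
print(combined)
```

Once the sources sit in one data frame, any chart or model in the dashboard can use columns from both.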

What do you think about implementing a dashboard with R & Shiny? What are the main obstacles you might encounter when moving from traditional dashboarding tools to R?

Do you see R & Shiny playing an important role in digital analytics in the near future?

Share your thoughts and be social!

Nov 23, 2014

Drawbacks of Using Time Metrics to Measure Blogs

When it comes to blogging, we all know that CONTENT is king. We also understand that SOCIAL interactions and reader ENGAGEMENT play a primary role in making a blog successful.

So far, so good.

But then it's time to analyse data and make decisions...and that's where we often fail.

We usually take a web analytics tool like Google Analytics, install the basic tracking code on our pages, and analyze the blog like any other website. We look at the most common metrics and take them as standard references to evaluate future performance. But we forget about the unique features that differentiate blogs from other digital properties: content consumption and social interactions.

This post will help you understand two of the most misused metrics for measuring blog performance: time on page and time on site. Most bloggers don't understand what time metrics actually measure. So, first of all, I will try to explain how they are calculated in a typical web analytics tool (it might be different from what you think!).

I will then discuss some of the drawbacks of using time metrics to measure blog performance and finally suggest a couple of more solid KPIs to better measure content engagement.
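
As a quick preview of the idea, here is a simplified sketch of how a tool like Google Analytics derives time on page: the time spent on a page is the gap between its timestamp and the timestamp of the next pageview in the same session, so the last page of a session gets no time at all. The `hits` data is invented, and real tools apply extra rules that are ignored here.

```r
# Hypothetical pageview log: session A has three pageviews, session B only one
hits <- data.frame(
  session   = c("A", "A", "A", "B"),
  page      = c("/home", "/post-1", "/about", "/post-2"),
  timestamp = as.POSIXct(c("2014-11-23 10:00:00", "2014-11-23 10:02:30",
                           "2014-11-23 10:03:00", "2014-11-23 11:00:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# Make sure hits are in chronological order within each session
hits <- hits[order(hits$session, hits$timestamp), ]

# Timestamp of the next pageview in the same session (NA for the last page)
secs    <- as.numeric(hits$timestamp)
next_ts <- ave(secs, hits$session, FUN = function(x) c(x[-1], NA))

# Time on page in seconds: unknown (NA) for the exit page of every session
hits$time_on_page <- next_ts - secs
print(hits)
```

Note how session B, a single-page visit, ends up with no measured time at all, no matter how long the reader actually stayed on the post.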

After reading this post, I am sure you will start looking at time metrics with a bit more critical thinking than before. And perhaps shift your blog analytics focus to other more powerful metrics.

Let's go!

Oct 27, 2014

How I Measure Success for my Blog. A Framework using Google Analytics.

If you are serious about blogging, then you must have a measurement plan. It doesn't matter whether you have just started and have only a dozen visitors, or you already have a very popular blog whose primary purpose is making revenue from advertising. As long as you have some objectives for your blog, you must decide what you need to measure.

Why? Because this is the only way to understand your blog performance and whether you are successful or not for your readers (I assume you are not writing only for yourself!).

In this post I am going to draft a measurement plan for MY BLOG and use it as a learning exercise to discuss critical aspects like choosing KPIs (Key Performance Indicators) and segments to analyse performance. Google Analytics will be my reference platform for implementing the measurement plan.

Sep 6, 2014

All Data Journalism Graduates in a Map

This week I got my certificate of completion from the course "Doing Journalism with Data: First Steps, Skills and Tools" (if you would like to know more about data journalism, check out my post "3 Great Examples of Data Journalism Stories"). I enjoyed the course a lot, and I am proud of being one of the 1250 people who successfully completed it. I was a bit surprised we were only 1250 graduates!

So, where did we come from and who are we? Above is a map I built using the R programming language, and in particular the googleVis package. googleVis is a great package that provides an interface to the Google Charts API and makes creating interactive plots quite easy. Interactive means that users can manipulate data and look for the info they need. Here is a list of visualizations you can do with Google Charts.

The other great thing about this visualization is that you can make it available as HTML, like I did above (you can edit the HTML if you like). No more static charts on your desktop then, but beautiful, interactive visualizations shared on the web!

Below is the simple R code I used to prepare the data and plot the charts. To plot the data about graduates' titles (the title people indicated when they enrolled in the course) I first used Google Refine and some of its clustering methods to clean and group the data (e.g. "journalist", "journalists" or "periodista" fell into the general category of "Journalist"). Then I loaded it into R as a .csv file.

library(googleVis)

# Load the cleaned data and count graduates per country
ddj <- read.csv("ddjCleaned.csv")
studCountry <- as.data.frame(table(ddj$country))
names(studCountry) <- c("country", "graduates")

# Count graduates per title
studTitle <- as.data.frame(table(ddj$title))
names(studTitle) <- c("title", "graduates")

# World map of graduates by country
C <- gvisGeoChart(studCountry, locationvar = "country", colorvar = "graduates", options = list(width = 500, height = 400))

# Pie chart of the 10 most common titles
T <- gvisPieChart(head(studTitle[order(studTitle$graduates, decreasing = TRUE), ], 10), labelvar = "title", numvar = "graduates", options = list(width = 500, height = 300))

# Stack the map on top of the pie chart
CT <- gvisMerge(C, T, horizontal = FALSE)

# To get the HTML code of your visualization you can either execute the following command:

print(CT)  # print the object you have just created

# or you can click on the Chart ID link below your visualization.

Aug 12, 2014

How to Test Universal Analytics Before Upgrading: via Google Tag Manager

Since Universal Analytics came out of beta last April, more and more users have been starting the upgrade process from classic Google Analytics. Although Google strongly encourages users to upgrade, and reassures them that the migration will not cause any loss of data (perhaps just a few seconds of traffic), some of us still remain a bit worried about the change. This is especially true in the case of big websites with a large number of tags already implemented through classic Google Analytics.

Will there be any significant difference in data after the complete migration? Will Universal Analytics inflate or reduce some metrics compared to the classic tracking code? These questions should motivate you to perform some testing before moving completely to a new standard.

In this post I am going to suggest a step-by-step process for conducting your upgrade to Universal Analytics with the help of Google Tag Manager. Yes, this post is also about Google Tag Manager. It's actually about taking the opportunity of the transition to a new standard (Universal Analytics) and making it in the most efficient and safest way (Google Tag Manager).

The main idea of this step by step process is to keep the upgrade "under control" and make sure you are going to get the same quality of data as before.
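
One simple way to check this, once both trackers have been running side by side for a while, is to compare the same metric day by day. The sketch below is written in R and uses made-up daily session counts for the classic and Universal Analytics properties; the column names are assumptions.

```r
# Hypothetical daily sessions recorded by the two tracking codes
daily <- data.frame(
  date      = as.Date("2014-08-01") + 0:3,
  classic   = c(1000, 1200, 800, 1000),
  universal = c(1010, 1188, 808, 990)
)

# Percentage difference of Universal Analytics relative to classic
daily$pct_diff <- 100 * (daily$universal - daily$classic) / daily$classic

# Average size of the gap, ignoring its direction
mean_abs_diff <- mean(abs(daily$pct_diff))

print(daily)
print(mean_abs_diff)
```

If the gap stays small and shows no systematic direction, you have some reassurance that the two implementations are measuring the same thing.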

Jul 15, 2014

3 Great Examples of Data Journalism Stories

Over the last month I've been spending part of my free time learning about an emerging discipline in the area of data analytics: Data Journalism. I am doing it firstly because I find the combination "data analysis + journalism" very fascinating, but also because, as a Web Analyst, I believe that there are some very important skills I can absorb from Data Journalists (here is a post where I talk about Web Analyst skills).

The aim of this post is to introduce you to this emerging discipline and show you a few practical examples of data journalism. To do so, I've selected 3 published data journalism stories and analysed each of them by answering four key questions:
  1. What does the story do?
  2. How was it created (methodology)? 
  3. How was it illustrated?
  4. What technologies were used to create and present the story to readers?   

Jun 23, 2014

Performing ANOVA Test in R: Results and Interpretation

When testing a hypothesis with a categorical explanatory variable and a quantitative response variable, the tool normally used in statistics is Analysis of Variance, also called ANOVA.

In this post I am performing an ANOVA test, using the R programming language, on a dataset of new breast cancer cases across continents.

The objective of the ANOVA test is to analyse whether there is a (statistically) significant difference in new breast cancer cases between continents. In other words, I am interested in seeing whether new episodes of breast cancer are more likely to take place in some regions than in others.

Beyond analysing this specific breast cancer dataset, I hope with this post to create a short tutorial about ANOVA and how to do simple linear models in R.
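
To give a flavour of what the test looks like in code, here is a minimal sketch on simulated data (not the breast cancer dataset analysed in the post). The data frame `cancer` and its columns `continent` and `new_cases` are invented for illustration.

```r
set.seed(42)

# Simulated data: a categorical explanatory variable and a quantitative response
cancer <- data.frame(
  continent = factor(rep(c("Africa", "Asia", "Europe"), each = 10)),
  new_cases = c(rnorm(10, mean = 20, sd = 5),
                rnorm(10, mean = 30, sd = 5),
                rnorm(10, mean = 60, sd = 5))
)

# One-way ANOVA: does the mean of new_cases differ between continents?
fit <- aov(new_cases ~ continent, data = cancer)
print(summary(fit))

# The same model written as a simple linear model
fit_lm <- lm(new_cases ~ continent, data = cancer)
print(anova(fit_lm))

# Post-hoc test: which pairs of continents differ?
print(TukeyHSD(fit))
```

A small p-value in the ANOVA table only says that at least one group mean differs; the Tukey test is one way to see which pairs are responsible.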

May 13, 2014

Web Scraping for Non-Programmers: 3 easy Tools to Extract Data from Websites

If you work with data and use the web as your main source for datasets, then you might have heard the words "web scraping". If you have not come across the term yet, you have surely happened to find some interesting data on the web with no download option available. No CSV file or Excel download. Nothing. Nada. Niente. And even your desperate copy-and-paste attempt has failed you. This is where web scraping comes in handy.

This post is an introduction to web scraping, and I am going to present 3 tools any of us can use to "scrape" the web. Two of them can be used directly from your browser, while the other is available through Google Spreadsheets. But, most importantly, they are all free, very quick and easy to use, and do not require programming skills.

All right, let's define the topic of this post first. What the heck is Web Scraping?

Apr 29, 2014

How to Move your Blog from Tumblr to Blogger in 10 Steps

Premise: this post is not about web analytics. Still, I've decided to include it here as this is the place where I am present online. At the end of the day I am a blogger too. And like all bloggers, sooner or later we have to get to know different blogging platforms, choose the most suitable one for our own objectives, and focus on writing about the things we like. Hope you find it useful!

Less than two months ago I decided to move all my blog posts from Tumblr to Blogger. Why? There were actually things I really enjoyed about being part of the Tumblr community. But I clearly had more reasons (I am not going into detail here) to make the move to a simple and popular platform like Blogger.

Below I report the main steps I took (successfully, thank God) to move my blog posts from Tumblr to Blogger. I will try to do it in a very quick and simple way, as I hope your migration will be. For each step I provide the link to the tool/tutorial you can refer to for more info. Ok, let's start. Good luck.


Mar 26, 2014

Gathering Business Requirements for a Google Analytics Project. A quick Template.

Every business has unique objectives and data needs. Because of that, a good Web Analytics project should always start with an audit aimed at understanding and gathering these unique business requirements.

In fact, it's only by capturing the objectives of the website/online business that we will be able to create and implement an effective measurement plan and eventually take action on data. In Avinash Kaushik's words:
There is one difference between winners and losers when it comes to web analytics. Winners, well before they think data or tool, have a well structured Digital Marketing & Measurement Model. Losers don't.
In this post I suggest a simple and quick template that will help you capture the business and technical requirements for a web analytics project, so that you are able, later, to develop an implementation plan. I am thinking in particular of digital analysts/agencies who offer Google Analytics consulting services to their clients.

The following template can be useful when you sit down for the first time with a client (or a prospective one), and you know very little or nothing about their business on the web. Of course you should complement this initial audit with:
  1. A visit to the website (do it before meeting the client!): see how it looks, go to product pages, navigate as if you were a potential customer. And try to identify their business objectives and potential technical requirements yourself.
  2. A more detailed analysis of their Web Analytics tool (if they are currently using one, and if you have already been given access). This step will let you audit the type of implementation they currently have and get a first picture of the data.
Scroll down the form to keep reading.

A couple of notes about the above template, before closing this post:

- the above questions were formulated with a Google Analytics audit in mind (especially the questions about GA implementation features). However, I believe most of them could be replicated for any Web Analytics tool;

- I created the template using Google Forms, though it is not intended to collect answers (Google Forms is normally used to create and send surveys, for example). I simply used it because I liked the format and I could incorporate it easily in my post;

- the template could be sent to clients by email, instead of filling it out during a face-to-face meeting;

- some of the questions will need more detailed answers than the ones I put on the template. I didn't leave a blank space in the form, but I encourage you to do it. The more detailed the info you can get from the client, the better your measurement and implementation plans will be.

I hope this simple template will be of use in your daily work. Please feel free to share advice, critique or other questions you reckon I could add to the template.

Thank you.