Oct 3, 2016

How to Upgrade R on Windows: The Easy Way Recommended on CRAN

Today I found myself needing to upgrade R. The main reason was that my current version, R-3.2.1, did not support some new graphics packages: to install them I needed at least R-3.3.

After a bit of initial hesitation (will I lose my packages during the new installation? etc. etc.) I finally plucked up the courage and decided to follow the official documentation on CRAN. Everything worked just fine, and I have now installed the latest R version available on CRAN: at the time of writing it's R-3.3.1.

The upgrade process was really easy, so I thought I'd share it step by step. Enjoy :)

Sep 18, 2016

Analyzing Stack Overflow questions and tags with the StackLite dataset

The guys at Stack Overflow have recently released a very interesting dataset containing the entire history of questions asked by users since the site launched back in 2008. It's called StackLite, and it contains the following data for each Stack Overflow question:
  • Question ID
  • Creation Date
  • Closed Date (when applicable)
  • Deletion Date (when applicable)
  • Score
  • Owner user ID
  • Number of answers
  • Tags 

As David Robinson explains in his introductory post, the StackLite dataset is designed to be easy to read and analyze with any programming language or statistical tool. A fantastic resource if you are a data analyst/scientist and want to crunch some real data!

I decided to give it a go and perform some exploratory analysis in R. More specifically, I am going to answer the following business questions:
  • What are the most popular tags?
  • How many questions have more than one tag?
  • What is the overall closure rate for the site and which tags present higher values?
  • How long does it take, on average, to close a question?
  • Which tags tend to have higher/lower score?
  • And in particular: how do data science languages perform on the above questions?
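Most of these questions boil down to a few counts and ratios. Here is a minimal sketch of them in Python (the post itself uses R), run on a handful of made-up rows. The column names (`Id`, `CreationDate`, `ClosedDate`, `Score`) and the separate table of question/tag pairs are assumptions modeled on the fields listed above, not checked against the real StackLite files.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Made-up rows mimicking the StackLite fields (column names are assumptions).
questions = [
    {"Id": 1, "CreationDate": "2016-01-01T10:00:00Z", "ClosedDate": "2016-01-03T10:00:00Z", "Score": 2},
    {"Id": 2, "CreationDate": "2016-01-02T09:00:00Z", "ClosedDate": None, "Score": 5},
    {"Id": 3, "CreationDate": "2016-01-05T12:00:00Z", "ClosedDate": "2016-01-06T00:00:00Z", "Score": -1},
    {"Id": 4, "CreationDate": "2016-01-07T08:00:00Z", "ClosedDate": None, "Score": 0},
]
# One (question Id, tag) pair per row, i.e. a long-format tags table.
question_tags = [(1, "r"), (1, "ggplot2"), (2, "python"), (3, "r"), (4, "python"), (4, "pandas")]


def parse(ts):
    """Parse an ISO-8601 UTC timestamp such as 2016-01-01T10:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


by_id = {q["Id"]: q for q in questions}

# Most popular tags
tag_counts = Counter(tag for _, tag in question_tags)

# Share of questions with more than one tag
tags_per_question = Counter(qid for qid, _ in question_tags)
multi_tag_share = sum(n > 1 for n in tags_per_question.values()) / len(questions)

# Overall closure rate
closed = [q for q in questions if q["ClosedDate"]]
closure_rate = len(closed) / len(questions)

# Average time to close, in days
days_to_close = mean(
    (parse(q["ClosedDate"]) - parse(q["CreationDate"])).total_seconds() / 86400 for q in closed
)

# Per-tag closure rate and average score
tag_closed, tag_scores = defaultdict(list), defaultdict(list)
for qid, tag in question_tags:
    tag_closed[tag].append(by_id[qid]["ClosedDate"] is not None)
    tag_scores[tag].append(by_id[qid]["Score"])
closure_by_tag = {tag: sum(flags) / len(flags) for tag, flags in tag_closed.items()}
avg_score_by_tag = {tag: mean(scores) for tag, scores in tag_scores.items()}

# Focus on data science languages
ds_languages = {t: (closure_by_tag[t], avg_score_by_tag[t]) for t in ("r", "python")}

print("Top tags:", tag_counts.most_common(2))
print(f"Multi-tag share: {multi_tag_share:.0%}, closure rate: {closure_rate:.0%}")
print(f"Mean days to close: {days_to_close:.2f}")
print("Data science languages (closure rate, avg score):", ds_languages)
```

On the full dataset the same logic would read the two CSV files instead of the inline lists; the per-tag step is a join on the question ID.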

Aug 11, 2016

Google Analytics makes Demo Account available to all

Playing with GA data is much, much easier now.

Last week's biggest news was definitely Google making a Google Analytics Demo Account available to everyone. As the word "demo" suggests, its main purpose is to demonstrate all the features and reports GA offers and to serve as a learning platform for analysts. But the numbers are real! All the data come from the Google Merchandise Store (which sells Google-branded merchandise), so you can apply your favorite algorithm, find valuable insights and show off your analytics skills to others.

Click on this link to access the GA Demo Account.

  • If you already have a Google Analytics account, Google will add the demo account to it (then you can access it via the Home tab in Google Analytics).
  • If you do not have a Google Analytics account, it will create one for you in association with your Google account (yes, you need a Google account first) and add the demo account to it.

What can you do with the GA Demo Account?

Jun 17, 2016

Where to Live in Barcelona in a Dashboard

Barcelona best barrio visualization

Sometimes data can tell a story much faster and more effectively than words. That's why I've decided to start sharing more data stories on this blog, hoping to both:

  1. address specific topics readers want to dive into (often these readers will not be data people: they will be new to my blog, probably arriving after googling a specific question, e.g. "which are the best boroughs to live in Barcelona?");
  2. showcase data visualization tools and best practices for presenting data (for the data people, yes, you, my regular readers, who might like to see a tool in action).

Mar 27, 2016

Enhance your Blog Measurement with these Google Analytics Calculated Metrics

Calculated Metrics in GA

Google Analytics has recently introduced a powerful new feature that offers more flexibility in measuring your own business objectives: calculated metrics.

In this post I am going to suggest a list of calculated metrics that you can easily configure in Google Analytics to better measure your blog performance.

As a blogger, when it comes to measuring the performance of my content, I am very focused on reader engagement with the content I publish. I am also constantly looking to grow my reader base, giving my blog more exposure and acquiring new subscribers. Here is an outline of my measurement plan using Google Analytics (I highly recommend reading it if you are new to the concept of a digital measurement plan).

The new calculated metrics feature gives me the opportunity to customize my own measurement plan. How?
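To make the idea concrete: a calculated metric is essentially a formula built from metrics GA already collects. The Python sketch below only mirrors that arithmetic on hypothetical monthly totals for a blog; the metric values, the choice of formulas and the `calculated_metric` helper are assumptions for illustration, and in GA itself the formula is configured in the interface rather than computed in code.

```python
# Hypothetical monthly totals for a blog; in GA these come from standard reports.
totals = {
    "Pageviews": 1200,
    "Sessions": 800,
    "Users": 600,
    "Goal Completions": 24,  # assumed goal: new newsletter subscribers
}


def calculated_metric(numerator: str, denominator: str, data: dict, scale: float = 1.0) -> float:
    """Hypothetical helper: scale * numerator / denominator, or 0.0 if the denominator is zero."""
    denom = data[denominator]
    return data[numerator] * scale / denom if denom else 0.0


# Two illustrative formulas a blogger might track (not taken from the post)
pageviews_per_user = calculated_metric("Pageviews", "Users", totals)
subs_per_1000_pageviews = calculated_metric("Goal Completions", "Pageviews", totals, scale=1000)

print(f"Pageviews per user: {pageviews_per_user:.2f}")
print(f"Subscriptions per 1,000 pageviews: {subs_per_1000_pageviews:.1f}")
```

The zero-denominator guard matters in practice: a ratio metric on a day or segment with no traffic should not crash a report.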

Feb 8, 2016

What happens when you have outliers in your data?

In this post I am going to talk briefly about outliers and the effect they might have on your data, with an example, of course. Let's start by defining the word "outlier": what is an outlier in math/statistics?

An outlier is basically a number (or data point) in a data set that is either much smaller or much bigger than most of the other data points.

Let's go through a practical example in order to understand the implications of having an outlier within your data set.
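As a quick sketch of the effect, assuming a small hypothetical series of daily values: a single extreme point drags the mean far more than the median. The `iqr_outliers` helper applies Tukey's 1.5 × IQR fences, one common rule of thumb for flagging outliers; it is an illustration, not necessarily the approach used in the post.

```python
from statistics import mean, median, quantiles

# Hypothetical daily values; the added 100 is an obvious outlier.
typical = [10, 12, 11, 13, 12, 11]
with_outlier = typical + [100]

print(f"Without outlier: mean={mean(typical):.2f}, median={median(typical)}")
print(f"With outlier:    mean={mean(with_outlier):.2f}, median={median(with_outlier)}")


def iqr_outliers(data, k=1.5):
    """Return points below Q1 - k*IQR or above Q3 + k*IQR (Tukey's fences)."""
    q1, _, q3 = quantiles(data, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


print("Flagged as outliers:", iqr_outliers(with_outlier))
```

The mean jumps from 11.5 to about 24 once the extreme value is added, while the median barely moves (11.5 to 12), which is why the median is often the safer summary for skewed data.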

Jan 17, 2016

Scheduling R Markdown Reports via Email

GA markdown report using R
R Markdown is an amazing tool that allows you to blend bits of R code with ordinary text and produce well-formatted data analysis reports very quickly. You can export the final report in many formats, such as HTML, PDF or MS Word, which makes it easy to share with others. And of course, you can modify or update it with fresh data very easily.

I have recently been using R Markdown to pull data from various data sources, such as the Google Analytics API and a MySQL database, perform several operations on it (merging, for example) and present the outputs as tables, visualizations and insights (text).

But what about automating the whole report generation and emailing the final report as an attached document every month at a specific time?
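One way to sketch the emailing half of that question, swapped into Python's standard library for illustration (the post itself works in R): build a message with the rendered report attached, and work out when the next monthly run is due. The addresses, SMTP host, file name and the day-of-month/hour defaults are hypothetical, and the actual triggering would still be left to a scheduler such as cron or Windows Task Scheduler.

```python
import smtplib
from datetime import datetime
from email.message import EmailMessage


def build_report_email(pdf_bytes: bytes, filename: str, sender: str, recipient: str) -> EmailMessage:
    """Create an email with the rendered report attached as a PDF."""
    msg = EmailMessage()
    msg["Subject"] = f"Monthly report: {filename}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Hi,\n\nPlease find this month's report attached.\n")
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf", filename=filename)
    return msg


def send_report(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send through an SMTP server using STARTTLS (host and credentials are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)


def next_monthly_run(now: datetime, day: int = 1, hour: int = 8) -> datetime:
    """Next `day` of the month (keep it 1-28 so it exists in every month) at `hour`:00 after `now`."""
    candidate = now.replace(day=day, hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        candidate = candidate.replace(year=year, month=month)
    return candidate


if __name__ == "__main__":
    report = build_report_email(b"%PDF-1.4 placeholder", "ga_report.pdf",
                                "reports@example.com", "me@example.com")
    print(report["Subject"])
    print("Next run:", next_monthly_run(datetime.now()))
    # send_report(report, "smtp.example.com", 587, "user", "password")  # fill in real settings
```

Keeping message construction separate from sending makes it easy to check the attachment locally before pointing the script at a real mail server.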