Apr 16, 2013

Plotting data over a map with R

After searching for a few hours on the web, I’ve been able to get my R code working and plot breast cancer data on a world map. It might not the best looking map possible (R graphics is incredible!), but I am happy with that for now.

To produce the map I used the “maps” package available through CRAN repository. And of course I needed longitude and latitude coordinates for each country, which I searched on the web and added to my original data set. Here are the steps I followed:


1) Load a .csv file containing lat/long coordinates for all countries

> countryCoord<- read.csv (“~/Rworkdir/data/countryCoord.csv”)

2) Add lat/long coordinates to my original breast cancer data set (dataset is called “gapCleaned”). To do this I used the function “merge”, specifying to merge the two data sets by the variable “country” (both the datasets have this variable in common), and used left outer join (here is a good explanation of merge command)

> mergedCleaned<- merge(gapCleaned, countryCoord, by=”country”, all.x=TRUE)

All right, now I have two new columns in my data set, indicating lat and long coordinates for each country :) Cool, next step is finally drawing a map with the data.

3) Draw a world map and tell R where to plot breast cancer data

> library(maps)

> map(“world”,col=”gray90”, fill=TRUE)

image


I size the breast cancer symbol according to breast cancer value for each country in my data set

> radius <- 3^sqrt (mergedCleaned$breastcancer)

Finally, I give R instructions to plot my breast cancer data over the world map

> symbols(mergedCleaned$lon, mergedCleaned$lat, bg = “blue”, fg = “red”, lwd = 1, circles = radius, inches = 0.175, add = TRUE)


                    New cases of breast cancer in the world, 2002

image


I am sure we can do prettier plots with R (I know there are other interesting packages suitable for this, such as ggplot), but I am happy for now. I’ve learned something new and been able to visualize and communicate data in a more effective way than just a scatterplot.

Conclusions:

Looking at the map, we can quickly identify the countries/areas with the highest number of breast cancer cases and hypothesize patterns. As reported on my last post, these are United States, New Zealand, Israel, Central/Northern Europe and in general highly developed economies rather than developing countries.