Mar 28, 2013

Ready for a new Statistics Course: my Research Questions

As part of the “Passion Driven Stats” course offered through Coursera, I have decided to analyze a dataset downloaded from Gapminder, a non-profit venture which seeks to increase the use and understanding of statistics about social, economic and environmental development at a global level.

I will provide more details about the specific dataset, variables, etc. later on. For now, I am only posting the main questions I will try to answer with my research.


1. Is there any association between people’s ‘unhealthy’ lifestyle and new cases of cancer diagnosed globally?

Variables used:

- new cases of cancer: I will use the number of new cases of breast cancer per 100,000 female residents in each country included in the dataset. This should be a relevant indicator of new cases of cancer among women in general. I might add other types of cancer in women (e.g. lung or liver cancer) or also include cancer in men. I will have to check what data is available on Gapminder.

- people’s ‘unhealthy’ lifestyle: I will mainly use the average alcohol consumption per adult and the amount of CO2 emissions in a country. I might also add smoking prevalence, if available through Gapminder.

My hypothesis:

An unhealthy lifestyle is generally associated with a higher number of new cases of cancer, at a global level.
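One simple way to put a number on this kind of association is a Pearson correlation coefficient between a lifestyle variable and the cancer rate. The snippet below is only a minimal sketch of that idea, written in plain Python for illustration: the function name `pearson_r` and the variable names are my own, and the numbers are made-up placeholders, not Gapminder data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only, NOT real Gapminder figures.
alcohol_per_adult = [1.0, 2.0, 3.0, 4.0, 5.0]
breast_cancer_rate = [20.0, 25.0, 33.0, 38.0, 50.0]

r = pearson_r(alcohol_per_adult, breast_cancer_rate)
print(f"Pearson r = {r:.3f}")
```

A value of r close to +1 would be in line with the hypothesis, a value close to 0 would suggest no linear association, and in either case a correlation alone says nothing about causation.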

2. Is there any significant difference in the relationship studied above between Latin American countries and the rest of the world?

I am interested in looking at Latin America in more detail, as in recent years the region has reported several cases of cancer among its political leaders. Hugo Chavez, President of Venezuela, died of pelvic cancer this month. Before this tragic event, other major leaders, such as Cristina Fernandez (Argentina), Luiz Lula (Brazil), Fernando Lugo (Paraguay) and Dilma Rousseff (Brazil), had undergone cancer treatments. These odd coincidences have also given rise to some conspiracy theories about the causes of cancer.

Does Latin America show a higher number of cancer cases than the rest of the world? Can we see the same association between variables as in the rest of the world (the one studied above), or is there a significantly different pattern?
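A first, rough look at this question could be to split the countries into two groups and compare the average cancer rate in each. The snippet below is a small sketch of that split in plain Python. The record layout, the function name `mean_by_group` and the country labels are hypothetical, and the values are placeholders rather than real data.

```python
from statistics import mean


def mean_by_group(records):
    """Average rate for Latin America and for the rest of the world.

    Each record is a tuple: (country label, is_latin_america, rate).
    """
    groups = {True: [], False: []}
    for _country, is_latin_america, rate in records:
        groups[is_latin_america].append(rate)
    labels = {True: "latin_america", False: "rest_of_world"}
    # Skip a group that has no countries, so mean() never sees an empty list
    return {labels[key]: mean(rates) for key, rates in groups.items() if rates}


# Placeholder records for illustration only, NOT real Gapminder figures.
records = [
    ("country_a", True, 40.0),
    ("country_b", True, 50.0),
    ("country_c", False, 30.0),
    ("country_d", False, 60.0),
]

print(mean_by_group(records))
```

Comparing two averages like this is only descriptive. Deciding whether a difference is statistically significant would need a proper test on the real data.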

Hopefully I will be able to find something interesting through my analysis and, most importantly, statistically significant. More details will follow in the next posts.

Oh, I am going to use RStudio…

