Jul 15, 2014

3 Great Examples of Data Journalism Stories

Over the last month I've been spending part of my free time learning about an emerging discipline in the area of data analytics: Data Journalism. I am doing it firstly because I find the combination "data analysis + journalism" fascinating, but also because, as a Web Analyst, I believe there are some very important skills I can pick up from data journalists (here is a post where I talk about Web Analyst skills).

The aim of this post is to introduce you to this emerging discipline, and show you a couple of practical examples of data journalism. To do so, I've selected 3 published data journalism stories and analysed each of them by answering four key questions:
  1. What does the story do?
  2. How was it created (methodology)? 
  3. How was it illustrated?
  4. What technologies were used to create and present the story to readers?   

If, after reading this post, you feel you would like to know more about data journalism, I encourage you to sign up for this MOOC: Doing Journalism with Data, offered by the European Journalism Centre through the Canvas Network platform. The course started in May and the material will be available until 30 July, so you still have some time.

But first, of course, keep scrolling down this post and get started here with Data Journalism!


What is Data Journalism?


To put it very simply, Data Journalism is Journalism done with Data.

I know this is not a very helpful definition, and the reality is that if we asked different journalists for a definition of data journalism, we would probably get several different answers.

To make things a bit clearer, Simon Rogers (Data Editor @ Twitter) looks at some key aspects of doing journalism with data. To define it, he suggests that data journalism is about:
  • telling stories with numbers
  • finding the best way to tell this story
  • the techniques with which you tell the story (which keep changing all the time)
But hold on. Is data journalism a totally new discipline? Not really. Data have always been at the heart of stories. In some areas, such as sports, data have always been an essential part of the work delivered to the reader.

So what has changed in recent years to make data journalism emerge so rapidly?

1) First of all, more and more data are becoming accessible to everyone, as never before. The media can now draw data for their stories from several different sources, such as public databases on government spending, leaked documents published by Wikileaks, or "big data" generated by social networks such as Facebook or Twitter;

2) Secondly, data analysis tools are available to most of us. Who does not have a copy of Excel (or equivalent spreadsheet software) installed on their computer? Some very powerful tools are even free or open source. Think of the R statistical programming language.

3) Last, data analysis tools are much more powerful and easier to use than in the past. In many cases you can produce beautiful visualizations or maps without technical experience or coding knowledge.


Because of these changes, the journalism field is being shaken up, and so is the set of skills needed by "new journalists". Indeed, journalists will have to become knowledgeable in searching, cleaning, processing, analysing and visualizing data. They will have to mine the data, make sense of it and turn it into something interesting for the reader.

Finally, we can try to group the main activities of a data journalist (or the set of skills needed) into 4 categories:
  1. Find data to support stories
  2. Analyse the data to discover potential stories
  3. Clean the data
  4. Tell stories through visualizations

Some pioneering journalists and newspapers are already demonstrating how data can deliver unique insights into what is happening around us, and they are creating very interesting stories from it. Below I am going to show you three examples of published pieces of data journalism, and I will try to analyse each of them through the four questions listed at the start of this post.


Three Great Data Journalism Stories


Story #1. Afghanistan War Logs: a Selection of Significant Incidents

Published by: The Guardian Data Blog


afghan accidents data journalism


What does the story do?
This great data story, created and published by The Guardian, shows a selection of key events that happened during the war in Afghanistan, such as coalition forces' attacks on civilians, friendly fire incidents (coalition troops mistakenly firing on each other), and Afghan forces attacking each other. To do so, The Guardian uses data made available through Wikileaks, which disclose previously confidential military facts about the war in Afghanistan.

How was it created?
What we are talking about here is one of the biggest leaks in intelligence history. The Guardian got a huge Excel file from Wikileaks, logging the history of the war in Afghanistan. The file contained over 90,000 rows of data, some of which were of course empty or poorly formatted.

The data obviously needed some cleaning, and Excel showed its limits when processing such a large amount of data. Reporters could not access the data easily, so it was hard to extract meaningful stories. What the Guardian data team did was build an internal database to store and access the data, so that reporters could look for stories by searching for keywords and events. One of the key stories found in the war data was the rise in IED (improvised explosive device) attacks.

They then mapped the latitude and longitude coordinates of every event, made a selection of key events to include in the story, and finally created a graphic visualization with the help of Google Maps.

How was it illustrated?
The story was visualized by plotting points (key events) of different colours on a geographical map, in this case Google Maps. As you can see in the picture above, colours identify the category of event (Afghan friendly fire vs Coalition friendly fire, etc.).

If you click on the categories within the map legend, you can hide or show them on the map. Also, if you click on any data point within the map, a small window will open showing a brief description of the event, its category, and the date and time it occurred. By clicking on "Read the full log entry" you will be able to see the complete log of that event.

I recommend you explore the Afghan War Logs map yourself and play a bit with the data.

What technologies were used to create and present the story to readers?
The Wikileaks data came as a spreadsheet of about 92,000 records. The team then built a simple database and queried it with SQL. Finally, I guess they used the Google Maps API to produce the map.
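To make the "spreadsheet into a searchable database" step more concrete, here is a minimal sketch in Python using the built-in sqlite3 module. This is not The Guardian's actual setup, which I don't know in detail: the column names, the sample rows and the helper functions load_events and search_events are all invented for illustration.

```python
import csv
import io
import sqlite3

# Invented sample rows standing in for a large log export (not real data).
SAMPLE_CSV = """date,category,latitude,longitude,summary
2009-01-05,IED Explosion,31.61,65.71,Roadside IED struck a patrol vehicle
2009-02-11,Friendly Fire,34.52,69.18,Two units exchanged fire by mistake
2009-03-20,IED Found,32.37,62.11,IED found and cleared near checkpoint
2009-04-02,,,,
"""

def load_events(conn, csv_text):
    """Create an events table and load every usable CSV row into it."""
    conn.execute(
        "CREATE TABLE events (date TEXT, category TEXT, "
        "latitude REAL, longitude REAL, summary TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["date"], r["category"], float(r["latitude"]),
         float(r["longitude"]), r["summary"])
        for r in reader
        # Basic cleaning: skip rows with no category or no coordinates.
        if r["category"] and r["latitude"] and r["longitude"]
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)

def search_events(conn, keyword):
    """Return events whose category or summary mentions the keyword."""
    pattern = f"%{keyword}%"
    cursor = conn.execute(
        "SELECT date, category, summary FROM events "
        "WHERE category LIKE ? OR summary LIKE ? ORDER BY date",
        (pattern, pattern),
    )
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
loaded = load_events(conn, SAMPLE_CSV)
ied_events = search_events(conn, "IED")
print(loaded, "rows loaded;", len(ied_events), "mention IED")
```

The point of the design is that once the rows sit in a database, a reporter only needs a keyword to pull out the matching events, instead of scrolling through a spreadsheet that is too big to handle comfortably.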


Story #2. Weapons from Croatia Spread through the Conflict in Syria

Published by: the final article was published by The New York Times, though the story was originally uncovered by Eliot Higgins, a.k.a. Brown Moses

Weapons Smuggled into Syria - Data Journalism Story

What does the story do?
The story reveals how, at some point during the war in Syria, some very unusual weapons appeared in the conflict, apparently all coming from the former Yugoslavia. The insights discovered led the journalist to draw a very important conclusion from the data: the Saudis had purchased those weapons from Croatia, shipped them to Jordan, and started smuggling them into Syria to support the Free Syrian Army in its fight against President Assad. All of this probably happened with the knowledge of the US Government.
   
How was it created?
It all began in 2012, when an unemployed finance worker named Eliot Higgins started a blog about the Syrian civil war. As he said, his early posts were rather unorganised collections of videos he had seen on Facebook and Twitter. But after a couple of months, he started adopting a more systematic approach to collecting and examining videos coming from Syria.

What he did was gather a list of all the channels that were posting from each specific area of Syria. He ended up monitoring a list of over 500 YouTube channels daily, searching for images of weapons and tracking when new types of weapons appeared in the conflict, where, and with which armed group. To identify the types of weapons, he mainly relied on Google. He eventually collected all this data into a spreadsheet and analysed it.
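The kind of tracking described above (when a weapon type first shows up, and where it is seen most) can be sketched in a few lines of Python. This is only my illustration, not Higgins's actual spreadsheet: the sightings, region labels and weapon names below are invented, and first_appearances is a hypothetical helper.

```python
from collections import Counter
from datetime import date

# Invented sightings for illustration only: (date seen, region, weapon type).
sightings = [
    (date(2013, 1, 9), "South", "weapon type B"),
    (date(2013, 1, 3), "South", "weapon type A"),
    (date(2013, 1, 20), "North", "weapon type A"),
    (date(2013, 2, 2), "South", "weapon type A"),
]

def first_appearances(records):
    """Map each weapon type to the (date, region) of its earliest sighting."""
    first = {}
    for seen_on, region, weapon in records:
        if weapon not in first or seen_on < first[weapon][0]:
            first[weapon] = (seen_on, region)
    return first

first = first_appearances(sightings)

# Count how often each weapon type was seen in each region.
counts = Counter((region, weapon) for _, region, weapon in sightings)

for weapon, (seen_on, region) in sorted(first.items()):
    print(f"{weapon}: first seen {seen_on.isoformat()} in {region}")
print(counts.most_common(1))
```

With real data, a weapon type that suddenly appears in one region and keeps being counted there is exactly the kind of pattern that hints at a story.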

Once he noticed there might be an important story behind the data (unusual weapons mainly appearing in the southern region of Syria, near the border with Jordan), he first published his findings on his blog, and later went to the New York Times with an article summarising what he had found.

The New York Times did further investigations and eventually published the article.

A very interesting point made by Eliot Higgins is about the effectiveness of his data collection methodology. By monitoring social media he was able to track the arrival of those unusual weapons in Syria, which is something he might not have picked up by staying on the ground. For this type of analysis, he had a much better picture of what was going on in Syria than a journalist based locally. If you are interested in this fascinating story, you should read Higgins's story on School of Data, or just google it.

How was it illustrated?
I have not found any published visualizations of the findings. The story published in the New York Times is text only, though it refers to the Brown Moses blog, where you can find various pictures and videos reporting on the conflict in Syria and the weapons discovered.

As a personal thought, it would be very interesting to summarize and show the story through some type of visualization. One idea could be a geographical map with points plotted on it indicating the different types of arms used in the conflict, together with a timeline showing when Croatian weapons started spreading through the conflict. Any other ideas?

What technologies were used to create and present the story?
He regularly monitored images and videos from social media such as YouTube, Facebook and Twitter, and collected all the data into a spreadsheet (I guess Excel).


Story #3. The Cholera Map of John Snow (1854)

Published by: John Snow in 1854
Article link: I am not sure if the original publication is available on the internet, but his amazing work has been widely discussed by many experts in the field of data visualization and journalism. I recommend reading the interesting article from The Guardian, where they also recreate John Snow's story with an interactive map. Below is the original cholera map.

John Snow Cholera Map - Data Journalism Story


What does the story do?
Until the 1870s, most scientists believed that cholera, like many other diseases, was caught by breathing "polluted" air. In 1854, a severe outbreak of cholera in the London district of Soho gave the physician John Snow the occasion to study the phenomenon closely and to defy the dominant theory by hypothesizing that cholera was a waterborne disease caused by germs.

He concluded that the source of the London outbreak was the public water pump located on Broad Street (now called Broadwick Street). So cholera was spread by contaminated water and not by "polluted" air, as most scientists believed.

How was it created?
John Snow talked to local residents, found out where the cholera cases had happened and collected all the data. He then made a map of all the cases, representing each death as a bar placed on the map at the exact point where it happened. Thanks to the map, Snow could show the story behind the data: most cholera deaths clustered around the Broad Street pump, which (as was later discovered) had been contaminated by fecal matter from a sick baby.

How was it illustrated?
With bars on a map representing the number of cholera cases in each area of Soho, London, and circles showing the locations of water pumps.

This type of data visualization allowed him to relate the two variables: pump locations and number of cholera cases.
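To see how relating those two variables works in practice, here is a minimal Python sketch that assigns each death to its nearest pump and counts the deaths per pump. The coordinates and pump names are made up for illustration (they are not Snow's data), and nearest_pump is a hypothetical helper. It also uses straight-line distance, which is a simplification: people walk along streets, not through buildings.

```python
import math
from collections import Counter

# Hypothetical grid coordinates, invented for illustration (not Snow's data).
pumps = {"Pump A": (0.0, 0.0), "Pump B": (10.0, 0.0), "Pump C": (0.0, 10.0)}
deaths = [(1.0, 1.0), (2.0, 0.5), (0.5, 2.0), (9.0, 1.0), (1.5, 1.5)]

def nearest_pump(location, pump_locations):
    """Return the name of the pump closest to a location (straight-line distance)."""
    return min(
        pump_locations,
        key=lambda name: math.dist(location, pump_locations[name]),
    )

# Count how many deaths fall closest to each pump.
deaths_per_pump = Counter(nearest_pump(d, pumps) for d in deaths)
print(deaths_per_pump.most_common())
```

If one pump collects far more nearby deaths than the others, as Pump A does in this toy example, it becomes the obvious place to investigate, which is the same reasoning Snow's map made visible at a glance.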

What technologies were used to create and show the story?
At that time, I guess Snow produced the map with just a pen and paper.

As I mentioned earlier, his data story has been revisited recently by The Guardian, who recreated the cholera map using modern mapping tools such as CartoDB and Stamen style maps.

For more about the impact of Snow's story on current data journalism and data visualization, you can check the very interesting posts from Simon Rogers and Alberto Cairo.


Conclusions: What can we Learn from Data Journalism?


In this post I introduced the emerging field of data journalism and showed, with 3 examples, how some pioneering journalists are analysing data to find insightful stories.

The technologies used to create and present stories were more or less sophisticated. Still, all journalists followed a common process to build the story: they had to find data, analyze it, clean it, and finally communicate the story possibly using some graphic visualization.

As a Web Analyst, I think that looking at how journalists are creating stories with data is a great learning exercise. It encourages us to be more creative and curious, and to develop a good critical attitude, which will help us in our daily job. And, very importantly, it should make us realise that we share many aspects of our job with data journalists. Indeed, both of us follow the same process in building the story (find, analyse, clean, visualize) and have to communicate the final results to someone (who is sometimes new to the subject).

Okay, I guess in many cases our job (web analysis) is more "standardized" than a journalist's, in the sense that we tend to stick to the same sources of data, tools and techniques to collect, analyse and visualize data. But still, this should not stop us from thinking outside the box and seeing if there is a better way to solve our data problems. For example, we might start asking ourselves questions like:

How could we collect more interesting data to perform our web analysis? Is there any other available data that we could combine with our website clickstream data, so that we can make better decisions? Think of the Syria story and how Eliot Higgins started monitoring social media to get insights about the conflict going on in Syria. Would he have been able to get the same data by staying on the ground? Probably not.

And finally, what would be the best graphic format for our monthly/weekly report? Could we replace current tables/charts with more insightful graphic visualizations? Sometimes it can be a great idea to just grab pen and paper and sketch what we want to show with the data. John Snow's cholera map is an excellent example of an insightful data visualization.

Thanks for reading. See you in the next post!